STEWARDSHIP
the assumption of responsibility for the welfare of the world

TRANSCRIPTION AND TRANSLITERATION PROTOCOLS

O.T. FORD
Updated 2016 September 6

Transcription is rendering something into a script, or writing system. Since most writing systems exist to represent speech, most transcription involves rendering speech into a particular script. When societies become literate, their dialects are written in a particular system, or occasionally more than one, and a standard way of writing the dialect in each script emerges, either because the writing system is straightforward in its relation to speech, or because of tradition. This standard way of writing a dialect in a script is an orthography. The word transcription is most often used when speech is rendered into a script for which there is no orthography. For example, English is almost exclusively written in the Latin script. If I write English in the Latin script, I will use the orthography, rather than attempting to match speech with letters. But if I choose to write English in the Arabic script, one way to do this is to transcribe the sounds of English into Arabic characters, based on a regular interpretation of the sound value of each character. Or if I travel to a remote area where the dialect has never been written, I might transcribe the local speech into the Latin script, again based on a regular interpretation of the sound value of each character. If another person understands the writing system as I have used it, it should be possible for that person to recreate the original speech, within limits (for instance, that person will not necessarily be able to reproduce the foreign sounds in the original speech). Every character (or combination of characters) in the script should represent only one pronunciation.

Transliteration, when it has a distinct meaning, is rendering one script into another. This falls under the broadest definition of transcription, but the point of transliteration is to represent not the speech of the original but its writing. If another person understands the second writing system as I have used it, it should be possible for that person to recreate the original writing. Every character (or combination) should represent only one character (or combination) from the original script. The term transliteration is, however, frequently used for something else: transcription of dialects that are normally written in a different script. This is why many systems of transliteration do not in fact faithfully represent the original orthography. For example, the Γ of Ancient Greek usually represented the sound [ɡ], the g of get. But it could also be used, before the velars (Κ, Χ, Ξ, or another Γ), to represent the sound [ŋ], the ng of sing. The Romans represented the [ŋ] sound, in similar pre-velar environments (before C, G, CH, and X), with N; when they borrowed Greek words with this sound, they spelled those words with their own standard N. In other words, they were transcribing [ŋ], not transliterating the original Γ, since for them Γ would have corresponded to G. Following this Latin practice, modern English transliterations of Ancient Greek do not actually transliterate the nasal Γ as g; rather, they transcribe it as n, so that ἄγγελος is conventionally written angelos.
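The Γ case can be made concrete with a short Python sketch. It assumes the Greek table given later on this page (Γ-G/Ŋ) and applies it one letter at a time, writing Γ as Ŋ whenever the next letter is Κ, Χ, Ξ, or Γ. Passing `nasal="N"` reproduces the Latin-derived practice of transcribing the nasal as n instead. The function name is hypothetical. The sketch handles unaccented capitals only, and it always gives Η its Ē value; the table's alternative H is not modelled.

```python
# Greek capitals and their values in the site's table (Η takes its Ē value).
GREEK = dict(zip(
    "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ",
    ["A", "B", "G", "D", "E", "Z", "Ē", "T‛", "I", "K", "L", "M",
     "N", "KS", "O", "P", "R", "S", "T", "U", "P‛", "K‛", "PS", "Ō"],
))
VELARS = set("ΚΧΞΓ")  # before these letters Γ stands for [ŋ]


def transliterate_greek(word: str, nasal: str = "Ŋ") -> str:
    """Transliterate unaccented Greek capitals one letter at a time."""
    out = []
    for i, ch in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else ""
        if ch == "Γ" and following in VELARS:
            out.append(nasal)  # nasal Γ
        else:
            out.append(GREEK.get(ch, ch))  # unlisted characters pass through
    return "".join(out)


print(transliterate_greek("ΑΓΓΕΛΟΣ"))             # AŊGELOS
print(transliterate_greek("ΑΓΓΕΛΟΣ", nasal="N"))  # ANGELOS (Latin-style transcription)
```

The Ŋ output still lets a reader recover the original Γ. The N output cannot be told apart from a genuine Ν, which is exactly the loss the paragraph describes.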

What follows is the regular system of transliteration and transcription used on the Stewardship site. It is proposed for wider adoption, as an improvement over the disconnected and inexact systems currently in use, and it is also a central feature of the multilingual standard used on many pages of this site.

This multilingual standard, advocated and employed here, rests on the belief that inclusion of other linguistic traditions is important for cultural heritage and for intercultural understanding. This is particularly true in English, given its strengthening global dominance. The first principle is that all names, of persons, places, organizations, and concepts, should be left in their original form wherever possible. The main Hellenistic alphabets (Latin, Cyrillic, and Greek itself) should be presented in the original alone. The assumption is that most computers are equipped with the characters these scripts require. Virtually all names in Latin-script languages can be printed unaltered. Most names in Cyrillic-script languages can also be shown unaltered. Occasionally a character or diacritical mark is unavailable; it should then be approximated rather than transliterated, with the remaining characters in the name left in the original form. Modern Greek can be shown in its native form; Ancient Greek should be presented in its classical form. For other scripts no such assumption can be made, so names and words of non-English origin should be printed both in the original (where possible) and in transliterated or transcribed form. The transliteration or transcription employed here attempts to be regular, though mistakes have certainly been made, and corrections are welcome.

The second principle, where transliteration is required, is that it should be possible to reconstruct the original; the transliterated form should unambiguously represent the original writing. Therefore, while occasionally an original grapheme (character) will be transliterated by more than one Latin symbol (primarily when a single grapheme represents different phonemes), only rarely (when unavoidable) will one Latin symbol represent more than one original grapheme.
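The reversibility principle can be checked mechanically for any table. List every Latin value claimed by each original grapheme, then look for values claimed by more than one grapheme. This Python sketch assumes a table shaped as grapheme → list of values, and the function name is hypothetical. It checks single-letter values only, so it does not catch ambiguity between a multi-letter value and a sequence of letters.

```python
from collections import defaultdict


def ambiguous_symbols(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every Latin symbol that stands for more than one original grapheme."""
    sources: dict[str, list[str]] = defaultdict(list)
    for grapheme, symbols in table.items():
        for symbol in symbols:
            sources[symbol].append(grapheme)
    return {s: g for s, g in sources.items() if len(g) > 1}


# One grapheme with two values (Γ, Η) is allowed by the principle...
greek_subset = {"Γ": ["G", "Ŋ"], "Η": ["Ē", "H"], "Κ": ["K"], "Χ": ["K‛"], "Ε": ["E"]}
print(ambiguous_symbols(greek_subset))  # {}

# ...but a hypothetical scheme writing both Κ and Χ as K cannot be reversed.
print(ambiguous_symbols({"Κ": ["K"], "Χ": ["K"]}))  # {'K': ['Κ', 'Χ']}
```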

There are several minor principles concerning transliteration. Unlike in the International Phonetic Alphabet and numerous transliteration schemes, capitals and lowercase should not be used distinctively, so that they may be used, following normal practice, as contextual variants of each other. That is, most letters used for transliteration can be written in capital or lowercase form, as appropriate. Where scripts are related, transliterations should also be related, as with the Semitic or Indian scripts below. And finally, borrowings should not be transliterated as written, but rather restored to or transliterated as the original.
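Because capitals carry no distinct value, a transliterator can look up every letter in a single table of capitals and then restore the case of the input. This Python sketch assumes a small subset of the Cyrillic table given at the end of this page, and the function name is hypothetical. A capital followed by a lowercase letter is treated as title case, so a multi-letter value such as Я JA becomes Ja.

```python
# A subset of the site's Cyrillic table, keyed by capital letters only.
CYRILLIC = dict(zip(
    "АБВГДЕЗИЙКЛМНОПРСТУФХЦЧ",
    ["A", "B", "V", "G", "D", "E", "Z", "I", "Ĭ", "K", "L", "M",
     "N", "O", "P", "R", "S", "T", "U", "F", "X", "C", "Č"],
))
CYRILLIC.update({"Ю": "JU", "Я": "JA"})


def transliterate_cyrillic(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        symbol = CYRILLIC.get(ch.upper())
        if symbol is None:
            out.append(ch)  # characters outside the subset pass through
        elif ch.islower():
            out.append(symbol.lower())
        elif i + 1 < len(text) and text[i + 1].islower():
            out.append(symbol.capitalize())  # title case: Я before lowercase -> Ja
        else:
            out.append(symbol)
    return "".join(out)


print(transliterate_cyrillic("Москва"))  # Moskva
print(transliterate_cyrillic("Ялта"))    # Jalta
```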

Since 漢字 xan⁵¹ ci⁵¹ (hànzì) does not directly (and often does not indirectly) represent sound, it is merely transcribed, with each syllable transcribed separately. A standardized transcription has been employed here for all dialects using 漢字 xan⁵¹ ci⁵¹, following many of the same principles as the transliterations below. Tones are transcribed as superscripts ranging from 5 (highest) to 1 (lowest), or as sequences of these. Conventional romanizations are included in parentheses:

for 漢語 Xan⁵¹ ²¹⁴ (Hànyǔ), the widely recognized 拼音 p‛in⁵ in⁵ (pīnyīn);
for 廣話 Kuoŋ³⁵³⁵ (Gwóng Wa), Yale romanization;
for 閩南語 Ban³⁵ Lam³⁵ Gu⁵³ (Bân-lâm-gú), 白話字 Peɔ⁴ Ue² Zi² (Pe̍h-ōe-jī);
and for 閩北語 Miŋ⁵³ Pak²¹³ Ŋ³ (Mng-bk-ngṳ̄), Foochow romanization.
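When tones are typed as plain digits, as in xan51, they can be converted mechanically to the superscripts the system calls for. This Python sketch assumes that a tone is any run of the digits 1 to 5 directly following a letter. The function name is hypothetical.

```python
import re

SUPERSCRIPTS = str.maketrans("12345", "¹²³⁴⁵")
# A run of tone digits (5 = highest, 1 = lowest) immediately after a letter.
TONE_RUN = re.compile(r"(?<=[^\W\d_])[1-5]+")


def superscript_tones(text: str) -> str:
    return TONE_RUN.sub(lambda m: m.group().translate(SUPERSCRIPTS), text)


print(superscript_tones("xan51 ci51"))        # xan⁵¹ ci⁵¹
print(superscript_tones("Ban35 Lam35 Gu53"))  # Ban³⁵ Lam³⁵ Gu⁵³
```

Requiring a preceding letter keeps free-standing numbers, such as a year, unchanged.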

Northwest Semitic paradigm: א Ɔ ב B ג G ד D ה H ו Ŭ ז Z ח H ט T י J כ K ל L מ M נ N ס S ע C פ P צ S ק Q ר R ש ת T. No separate encoding is used for this script here; by convention, the modern Hebrew letterforms stand in for it.

― Arabic: ا Ɔ-Ā ب B ت T ث ج Ĝ ح H خ X د D ذ ر R ز Z س S ش ص S ض D ط T ظ ع C غ Ğ ف F ق Q ك K ل L م M ن N ه H و Ŭ-Ū ى J-Ī (ة Ḧ ئ Ỉ ؤ Ủ ء Ɔ). The three short vowels, which are only occasionally written in the original, are transcribed as A, I, and U. Related scripts are transliterated on the same principles but include additional characters (ک K گ G ژ پ P چ Č ڠ Ŋ ڤ P ڽ ۏ V) and additional vowel transcriptions.
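The Arabic table shows one grapheme taking different values by context: ا is Ɔ or Ā. This Python sketch assumes a small consonant subset of the table and the three vowel marks fatha, kasra, and damma (A, I, U). It treats alif after fatha as lengthening the vowel to Ā, and alif elsewhere as Ɔ. The function name is hypothetical. Sukun (no vowel) is simply dropped.

```python
CONSONANTS = {
    "\u0628": "B", "\u062a": "T", "\u062f": "D", "\u0631": "R",  # ب ت د ر
    "\u0632": "Z", "\u0633": "S", "\u0641": "F", "\u0642": "Q",  # ز س ف ق
    "\u0643": "K", "\u0644": "L", "\u0645": "M", "\u0646": "N",  # ك ل م ن
}
SHORT_VOWELS = {"\u064e": "A", "\u0650": "I", "\u064f": "U"}  # fatha, kasra, damma
SUKUN = "\u0652"  # marks the absence of a vowel
ALIF = "\u0627"


def transliterate_arabic(text: str) -> str:
    out = []
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch == SUKUN:
            continue
        elif ch == ALIF:
            if out and out[-1] == "A":
                out[-1] = "Ā"  # fatha + alif: long vowel
            else:
                out.append("Ɔ")  # alif as a consonant
        else:
            out.append(ch)
    return "".join(out)


print(transliterate_arabic("\u0643\u0650\u062a\u064e\u0627\u0628"))  # KITĀB (كِتَاب)
```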

― Hebrew: א Ɔ ב B ג G ד D ה H ו Ŭ-Ū ז Z ח H ט T י J-Ī כ K ל L מ M נ N ס S ע C פ P צ S ק Q ר R ש -Ś ת T. A broader range of vowels has been transcribed.

― Syriac: ܐ Ɔ-Ā ܒ B ܓ G ܕ D ܗ H ܘ Ŭ-Ū ܙ Z ܚ H ܛ T ܝ J-Ī ܟ K ܠ L ܡ M ܢ N ܣ S ܥ C ܦ P ܨ S ܩ Q ܪ R ܫ ܬ T.

― Amharic/Ethiopic: አ ኡ U ኢ I ኣ A ኤ E እ Ə ኦ O ሀ H ለ L ሐ H መ M ሠ Ś ረ R ሰ S ሸ ቀ Q ቈ QŬ በ B ቨ V ተ T ቸ Č ኀ Ĥ ኈ ĤŬ ነ N ኘ አ Ɔ ከ K ኰ KŬ ኸ X ዀ XŬ ወ Ŭ ዐ C ዘ Z ዠ የ J ደ D ጀ ገ G ጐ GŬ ጠ TČPSD ፈ F ፐ P ጘ Ŋ ⶓ ŊŬ
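Ethiopic letters are syllables. Unicode encodes most of them in rows of eight that start at multiples of 8 from U+1200, with the first-order (row-initial) form followed by the other vowel orders. The consonant and vowel can therefore be recovered arithmetically. This Python sketch assumes that arrangement, plus the vowel values of the table's carrier series (ኡ U, ኢ I, ኣ A, ኤ E, እ Ə, ኦ O) and a subset of its consonants. The table above gives no separate value for the first-order vowel, so that vowel is a parameter, empty by default. The function name is hypothetical. Irregular labialized rows such as ቈ are not handled.

```python
# Vowels of orders 2-7, keyed by offset within the row (offset 0 = first order).
ORDER_VOWELS = {1: "U", 2: "I", 3: "A", 4: "E", 5: "Ə", 6: "O"}
# First-order letters of regular rows and their consonant values from the table.
ROWS = {
    "\u1200": "H", "\u1208": "L", "\u1218": "M", "\u1228": "R",  # ሀ ለ መ ረ
    "\u1230": "S", "\u1240": "Q", "\u1260": "B", "\u1270": "T",  # ሰ ቀ በ ተ
    "\u1290": "N", "\u12a8": "K", "\u12c8": "Ŭ", "\u12d8": "Z",  # ነ ከ ወ ዘ
    "\u12e8": "J", "\u12f0": "D", "\u1308": "G", "\u1348": "F",  # የ ደ ገ ፈ
    "\u1350": "P",                                               # ፐ
}


def transliterate_ethiopic(text: str, first_order: str = "") -> str:
    out = []
    for ch in text:
        offset = ord(ch) % 8          # 0x1200 is a multiple of 8, so this is the order
        row = chr(ord(ch) - offset)   # first-order letter of the same row
        if row in ROWS and offset in ORDER_VOWELS:
            out.append(ROWS[row] + ORDER_VOWELS[offset])
        elif row in ROWS and offset == 0:
            out.append(ROWS[row] + first_order)
        else:
            out.append(ch)            # unlisted or irregular letters pass through
    return "".join(out)


print(transliterate_ethiopic("\u1209\u1232"))  # LUSI (ሉሲ)
```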

Dēvanāgarī: अ A आ Ā इ I ई Ī उ U ऊ Ū ऋ Ŗ ए Ē ऐ ओ Ō औ क K ख K‛ ग G घ G‛ ङ Ŋ च C छ C‛ ज JJ‛ ञ ट TT‛ ड DD‛ ण N त T थ T‛ द D ध D‛ न N प P फ P‛ ब B भ B‛ म M य J र R ल L व V श Ś ष S स S ह H ळ L क़ Q ख़ X ग़ Ğ ज़ Z फ़ F
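Devanagari consonants carry an inherent vowel, written A here after the table's अ A. A following vowel sign replaces that vowel, and a virama (्) suppresses it. This Python sketch assumes a subset of the table and gives each dependent vowel sign the value of its independent counterpart (for example े as ए Ē). The function name is hypothetical. It does no schwa deletion, so a final inherent vowel is always written.

```python
CONSONANTS = {
    "\u0915": "K", "\u0917": "G", "\u0924": "T", "\u0926": "D", "\u0928": "N",  # क ग त द न
    "\u092a": "P", "\u092c": "B", "\u092d": "B‛", "\u092e": "M", "\u092f": "J",  # प ब भ म य
    "\u0930": "R", "\u0932": "L", "\u0935": "V", "\u0938": "S", "\u0939": "H",  # र ल व स ह
}
VOWEL_SIGNS = {"\u093e": "Ā", "\u093f": "I", "\u0940": "Ī", "\u0941": "U",
               "\u0942": "Ū", "\u0947": "Ē", "\u094b": "Ō"}
INDEPENDENT_VOWELS = {"\u0905": "A", "\u0906": "Ā", "\u0907": "I", "\u0908": "Ī",
                      "\u0909": "U", "\u090a": "Ū", "\u090f": "Ē", "\u0913": "Ō"}
VIRAMA = "\u094d"


def transliterate_devanagari(text: str) -> str:
    out = []
    inherent = False  # True while out[-1] is an unwritten inherent A
    for ch in text:
        if ch in CONSONANTS:
            out += [CONSONANTS[ch], "A"]
            inherent = True
            continue
        if inherent and ch in VOWEL_SIGNS:
            out[-1] = VOWEL_SIGNS[ch]  # vowel sign replaces the inherent A
        elif inherent and ch == VIRAMA:
            out.pop()  # virama suppresses the inherent A
        else:
            out.append(INDEPENDENT_VOWELS.get(ch, ch))
        inherent = False
    return "".join(out)


print(transliterate_devanagari("\u0928\u092e\u0938\u094d\u0924\u0947"))  # NAMASTĒ (नमस्ते)
```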

― Bengali: অ A আ Ā ই I ঈ Ī উ U ঊ Ū ঋ Ŗ এ Ē ঐ ও Ō ঔ ক K খ K‛ গ G ঘ G‛ ঙ Ŋ চ C ছ C‛ জ JJ‛ ঞ ট TT‛ ড DD‛ ণ N ত T থ T‛ দ D ধ D‛ ন N প P ফ P‛ ব B ভ B‛ ম M য J য় J-Ŭ র R ল L শ Ś ষ S স S হ H ড় RR‛ ঃ  ং ~  ঁ ~

― Gurmukhi/Punjabi: ਅ A ਆ Ā ਇ I ਈ Ī ਉ U ਊ Ū ਏ Ē ਐ ਓ Ō ਔ ਕ K ਖ K‛ ਗ G ਘ G‛ ਙ Ŋ ਚ C ਛ C‛ ਜ JJ‛ ਞ ਟ TT‛ ਡ DD‛ ਣ N ਤ T ਥ T‛ ਦ D ਧ D‛ ਨ N ਪ P ਫ P‛ ਬ B ਭ B‛ ਮ M ਯ J ਰ R ਲ L ਵ V ਸ S ਹ H ਸ਼ ਲ਼ L ਖ਼ X ਗ਼ Ğ ਜ਼ Z ੜ R ਫ਼ F  ੰ ~  ਂ ~  ੱ :

― Gujarati: અ A આ Ā ઇ I ઈ Ī ઉ U ઊ Ū ઋ Ŗ ૠ Ŗ: ઍ E એ Ē ઐ ઑ O ઓ Ō ઔ ક K ખ K‛ ગ G ઘ G‛ ઙ Ŋ ચ C છ C‛ જ JJ‛ ઞ ટ TT‛ ડ DD‛ ણ N ત T થ T‛ દ D ધ D‛ ન N પ P ફ P‛ બ B ભ B‛ મ M ય J ર R લ L ળ L વ V શ Ś ષ S સ S હ H  ં ~ ઃ

― Sinhalese: අ A ආ Ā ඇ EĒ ඉ I ඊ Ī උ U උ Ū ඍ Ŗ ඎ Ŗ: ඏ Ļ ඐ Ļ: එ E ඒ Ē ඓ ඔ O ඕ Ō ඖ ක K ඛ K‛ ග G ඝ G‛ ඞ Ŋ ඟ ŊG ච C ඡ C‛ ජ JJ‛ ඤ ඦ JTT‛ ඩ DD‛ ණ NND ත T ථ T‛ ද D ධ D‛ න N ඳ ND ප P ඵ P‛ බ B භ B‛ ම M ඹ MB ය J ර R ල L ව V ශ Ś ෂ S ස S හ H ළ L ෆ F  ං ~  ඃ

― Oriya: ଅ A ଆ Ā ଇ I ଈ Ī ଉ U ଊ Ū ଋ Ŗ ୠ Ŗ: ଌ Ļ ୡ Ļ: ଏ Ē ଐ ଓ Ō ଔ କ K ଖ K‛ ଗ G ଘ G‛ ଙ Ŋ ଚ C ଛ C‛ ଜ JJ‛ ଞ ଟ TT‛ ଡ DD‛ ଣ N ତ T ଥ T‛ ଦ D ଧ D‛ ନ N ପ P ଫ P‛ ବ B ଭ B‛ ମ M ୟ J ଯ ର R ଲ L ଳ L ଶ Ś ଷ S ସ S ହ H ଡ଼ R ଢ଼ R‛  ଁ ~ ଃ .

― Malayalam: അ A ആ Ā ഇ I ഈ Ī ഉ U ഊ Ū ഋ Ŗ എ E ഏ Ē ഐ ഒ O ഓ Ō ഔ ക K ഖ K‛ ഗ G ഘ G‛ ങ Ŋ ച C ഛ C‛ ജ JJ‛ ഞ ട TT‛ ഡ DD‛ ണ N ത T ഥ T‛ ദ D ധ D‛ ന N പ P ഫ P‛ ബ B ഭ B‛ മ M യ J ര R ല L വ V ശ Ś ഷ S സ S ഹ H ള LLR  ം ~ ഃ

― Kannada: ಅ A ಆ Ā ಇ I ಈ Ī ಉ U ಊ Ū ಋ Ŗ ೠ Ŗ: ಌ Ļ ೡ Ļ: ಎ E ಏ Ē ಐ ಒ O ಓ Ō ಔ ಕ K ಖ K‛ ಗ G ಘ G‛ ಙ Ŋ ಚ C ಛ C‛ ಜ JJ‛ ಞ ಟ TT‛ ಡ DD‛ ಣ N ತ T ಥ T‛ ದ D ಧ D‛ ನ N ಪ P ಫ P‛ ಬ B ಭ B‛ ಮ M ಯ J ರ R ಱ R ಲ L ಳ L ವ V ಶ Ś ಷ S ಸ S ಹ H ೞ F  ಂ ~ ಃ

― Telugu: అ A ఆ Ā ఇ I ఈ Ī ఉ U ఊ Ū ఋ Ŗ ౠ Ŗ: ఌ Ļ ౡ Ļ: ఎ E ఏ Ē ఐ ఒ O ఓ Ō ఔ క K ఖ K‛ గ G ఘ G‛ ఙ Ŋ చ C ఛ C‛ జ JJ‛ ఞ ట TT‛ డ DD‛ ణ N త T థ T‛ ద D ధ D‛ న N ప P ఫ P‛ బ B భ B‛ మ M య J ర R ఱ R ల L ళ L వ V శ Ś ష S స S హ H  ం ~ ః

― Tamil: அ A ஆ Ā இ I ஈ Ī உ U ஊ Ū எ E ஏ Ē ஐ ஒ O ஓ Ō ஔ க K ங Ŋ ச C ஞ ட TN த T ந N ப P ம M ய J ர R ல L வ V ழ ZL ற Ř ன Ň (ஜ J ஷ Ś ஸ S ஹ H)

― Khmer: ឣ A ឤ Ā ឥ I ឦ Ī ឧ U (ឨ UK ឩ Ū) ឪ Ū ឫ Ŗ ឬ Ŗ: ឭ Ļ ឮ Ļ: ឯ Ē ឰ ឱ O ឲ Ō ឳ ក K ខ K‛ គ G ឃ G‛ ង Ŋ ច C ឆ C‛ ជ JJ‛ ញ ដ TT‛ ឌ DD‛ ណ N ត T ថ T‛ ទ D ធ D‛ ន N ប P ផ P‛ ព B ភ B‛ ម M យ J រ R ល L វ V ឝ Ś ឞ S ស S ហ H ឡ L អ Ɔ

― Thai: อะ A อั A อา Ā อิ I อี Ī อึ I อื Ī อุ U อู Ū [อฺ ] อเ Ē อแ E อโ Ō อใ อไ ก K ข K‛ ฃ K‛ ค G ฅ G ฆ G‛ ง Ŋ จ C ฉ C‛ ช JJJ‛ ญ ฎ 'DTT‛ ฑ DD‛ ณ N ด 'D ต T ถ T‛ ท D ธ D‛ น N บ 'B ป P ผ P‛ ฝ P‛ พ B ฟ B ภ B‛ ม M ย J ร R ฤ Ŗ ล L ฦ Ļ ว Ŭ ศ Ś ษ S ส S ห H ฬ L อ ฮ H อํ ~
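In the Thai table, some vowels (เ Ē, แ E, โ Ō) are written before the consonant they follow in speech. A letter-by-letter transliteration therefore has to hold such a vowel back until its consonant has been written. This Python sketch assumes a subset of the table's consonants and vowel signs, and the function name is hypothetical. It ignores tone marks, consonant clusters, and unwritten vowels.

```python
CONSONANTS = {
    "\u0e01": "K", "\u0e07": "Ŋ", "\u0e14": "'D", "\u0e15": "T", "\u0e17": "D",  # ก ง ด ต ท
    "\u0e19": "N", "\u0e1a": "'B", "\u0e1b": "P", "\u0e1e": "B", "\u0e21": "M",  # น บ ป พ ม
    "\u0e23": "R", "\u0e25": "L", "\u0e2a": "S",                                  # ร ล ส
}
PREPOSED = {"\u0e40": "Ē", "\u0e41": "E", "\u0e42": "Ō"}  # เ แ โ
FOLLOWING = {"\u0e30": "A", "\u0e32": "Ā", "\u0e34": "I",  # ะ า ิ
             "\u0e35": "Ī", "\u0e38": "U", "\u0e39": "Ū"}  # ี ุ ู


def transliterate_thai(text: str) -> str:
    out = []
    pending = ""  # a vowel written before its consonant
    for ch in text:
        if ch in PREPOSED:
            pending = PREPOSED[ch]
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch] + pending)  # emit the held vowel after it
            pending = ""
        else:
            out.append(FOLLOWING.get(ch, ch))
    return "".join(out) + pending


print(transliterate_thai("\u0e42\u0e15"))  # TŌ (โต)
print(transliterate_thai("\u0e21\u0e32"))  # MĀ (มา)
```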

― Lao: ອະ A ອັ A ອາ Ā ອິ I ອີ Ī ອຶ I ອື Ī ອຸ U ອູ Ū [ອົ O ອຼ ] ອຽ IA ອເ Ē ອແ E ອໂ Ō ອໃ ອໄ [ອໆ ] ກ K ຂ K‛ ຄ G ງ Ŋ ຈ C ຊ J ຍ ດ 'D ຕ T ຖ T‛ ທ D ນ N ບ 'B ປ P ຜ P‛ ຝ P⁣ ພ B ຟ B ມ M ຢ J ຣ R ລ L ວ Ŭ ສ S ຫ H ອ ຮ H ອໍ ~

― Burmese: က K K‛ G G‛ Ŋ C C‛ J Jဉ/ည T T D D N T T‛ D D‛ N P P‛ B B‛ M J R L Ŭ S H L .

― Tibetan: ཨ A ཨི I ཨུ U ཨེ E ཨོ O ཀ K ཁ K‛ ག G ང Ŋ ཅ C ཆ C‛ ཇ J ཉ (ཊ TT‛ ཌ DN) ཏ T ཐ T‛ ད D ན N པ P ཕ P‛ བ B མ M ཙ SS‛ ཛ Z ཝ Ŭ ཞ ཟ Z འ ཡ J ར R ལ L ཤ Ś (ཥ S) ས S ཧ H .

― Thaana: އަ A އާ Ā އި I އީ Ī އު U އޫ Ū އެ E އޭ Ē އޮ O އޯ Ō ހ H ށ S ނ N ރ R ބ B ޅ L ކ K އ Ɔ ވ V މ M ފ F ދ D ތ T ލ L ގ G ޏ ސ S ޑ D ޒ Z ޓ T ޔ J ޕ P ޖ ޗ Č ( ޘ ޙ H ޚ X ޛ ޜ ޝ ޞ [S] ޟ [D] ޠ [T] ޡ ޢ C ޣ Ğ ޤ Q ޥ Ŭ ޱ N). Thaana is partially rooted in Brahmi (and is used for an Indic dialect), but also borrows heavily from Arabic.

Japanese: ア A イ I ウ U エ E オ O カ K ガ G サ S ザ Z タ T ダ D ナ N ハ H バ B パ P マ M ヤ J ラ R ワ Ŭ. Kana, like any syllabary, are transliterated directly, including long vowels, on the basis of a phonemic transcription. A small ッ marking a geminate consonant is written (tu); this again corresponds to the kana while also indicating the lengthened pronunciation. Kanji are transcribed phonemically, in the same way as kana, with a separate block for each character.
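The Japanese table lists only the first kana of each row. This Python sketch assumes the rest of each row combines the row's consonant with the vowels A, I, U, E, O in order, giving phonemic values such as シ SI and ツ TU. The function name is hypothetical. Kana outside the grid (for example ン, ッ, or the long-vowel mark) pass through unchanged.

```python
VOWELS = ["A", "I", "U", "E", "O"]
ROWS = {
    "": "アイウエオ", "K": "カキクケコ", "G": "ガギグゲゴ", "S": "サシスセソ",
    "Z": "ザジズゼゾ", "T": "タチツテト", "D": "ダヂヅデド", "N": "ナニヌネノ",
    "H": "ハヒフヘホ", "B": "バビブベボ", "P": "パピプペポ", "M": "マミムメモ",
    "R": "ラリルレロ",
}
# Each kana = row consonant + column vowel (phonemic, so シ is SI, not SHI).
KANA = {kana: consonant + vowel
        for consonant, row in ROWS.items()
        for kana, vowel in zip(row, VOWELS)}
KANA.update({"ヤ": "JA", "ユ": "JU", "ヨ": "JO", "ワ": "ŬA"})


def transliterate_kana(text: str) -> str:
    return "".join(KANA.get(ch, ch) for ch in text)


print(transliterate_kana("カタカナ"))  # KATAKANA
print(transliterate_kana("サシミ"))    # SASIMI
```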

Korean: ᅡ A ᅣ JA ᅥ E ᅧ JE ᅩ O ᅭ JO ᅮ U ᅲ JU ᅳ U ᅵ I ᄀ K ᄂ N ᄃ T ᄅ L ᄆ M ᄇ P ᄉ S ᄋ /Ŋ ᄌ C ᄎ C‛ ᄏ K‛ ᄐ T‛ ᄑ P‛ ᄒ H. Since han kul is always transliterated with breaks between clusters (syllable blocks), the elements within each cluster are transliterated separately, yielding ᅢ AI ᅤ JAI ᅦ EI ᅨ JEI ᅪ OA ᅫ OAI ᅬ OI ᅯ UE ᅰ UEI ᅱ UI ᅴ UI ᄁ KK ᄄ TT ᄈ PP ᄊ SS ᄍ CC. Note that ᅬ OI is pronounced [ø] or [we], and ᅱ UI [y] or [wi].
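Precomposed Hangul syllables (U+AC00 to U+D7A3) are encoded in a fixed order: 19 initials, by 21 vowels, by 28 finals, including "no final". Each block can therefore be split into its elements arithmetically, and the elements transliterated separately. This Python sketch makes three assumptions. First, the initial ᄋ is treated as silent, since the table's initial value for ᄋ is not clearly given. Second, cluster finals are written element by element. Third, blocks are separated by spaces. The function name is hypothetical, and the sketch is meant for Hangul words only.

```python
# Values in Unicode's jamo order, following the table above.
INITIALS = ["K", "KK", "N", "T", "TT", "L", "M", "P", "PP", "S", "SS",
            "", "C", "CC", "C‛", "K‛", "T‛", "P‛", "H"]  # initial ᄋ assumed silent
VOWELS = ["A", "AI", "JA", "JAI", "E", "EI", "JE", "JEI", "O", "OA", "OAI",
          "OI", "JO", "U", "UE", "UEI", "UI", "JU", "U", "UI", "I"]
FINALS = ["", "K", "KK", "KS", "N", "NC", "NH", "T", "L", "LK", "LM", "LP",
          "LS", "LT‛", "LP‛", "LH", "M", "P", "PS", "S", "SS", "Ŋ", "C", "C‛",
          "K‛", "T‛", "P‛", "H"]


def transliterate_hangul(text: str) -> str:
    blocks = []
    for ch in text:
        index = ord(ch) - 0xAC00
        if 0 <= index < 19 * 21 * 28:  # a precomposed syllable block
            initial, rest = divmod(index, 21 * 28)
            vowel, final = divmod(rest, 28)
            blocks.append(INITIALS[initial] + VOWELS[vowel] + FINALS[final])
        else:
            blocks.append(ch)
    return " ".join(blocks)


print(transliterate_hangul("한글"))  # HAN KUL
print(transliterate_hangul("서울"))  # SE UL
```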

Armenian: Աա A Բբ B Գգ G Դդ D Եե E Զզ Z Էէ Ē Ըը Ə Թթ T‛ Ժժ Իի I Լլ L Խխ X Ծծ C Կկ K Հհ H Ձձ Z Ղղ Ğ Ճճ Č Մմ M Յյ J Նն N Շշ Ոո O Չչ Č‛ Պպ P Ջջ Ռռ Ř Սս S Վվ V Տտ T Րր R Ցց C‛ Ււ U Փփ P‛ Քք K‛ Օօ Ō Ֆֆ F և eu. The eastern pronunciation has been used as a standard.

Georgian: ა A ბ B გ G დ D ე E ვ V ზ Z თ T‛ ი I კ K ლ L მ M ნ N ო O პ P ჟ რ R ს S ტ T უ U ფ P‛ ქ K‛ ღ Ğ ყ Q შ ჩ Č‛ ც C‛ ძ Z წ C ჭ Č ხ X ჯ ჰ H ჱ Ē ჲ J ჳ ŬI ჴ Q‛ ჵ Ō ჶ F.

Other conventions used across systems: ‛ indicates aspiration, and ~ nasalization; Ɔ represents [ʔ], J represents [j], Ŭ represents [w], X represents [x], Z represents [dz], represents [dʒ], represents [θ], and represents []; due to typographical restrictions, underscoring _ replaces a subscript dot.

In contexts where the Hellenistic scripts are transliterated, the following system is used:

Greek: Α-A Β-B Γ-G/Ŋ Δ-D Ε-E Ζ-Z Η-Ē/H Θ-T‛ Ι-I Κ-K Λ-L Μ-M Ν-N Ξ-KS Ο-O Π-P Ρ-R Σ-S Τ-T Υ-U Φ-P‛ Χ-K‛ Ψ-PS Ω-Ō

― Cyrillic: А-A Б-B В-V Г-G Ґ-Ġ Д-D Ђ-Đ Ҙ- Ѓ-Ǵ Е-E Ё-JO Є- Ж- Ѕ-Z З-Z Ӡӡ-Z И-I Ѳ-T‛ І- Ї-J Й-Ĭ Ј-J К-K Ҝ-Ġ Қ-Q Ҡ-Q Л-L Љ-Ĺ М-M Н-N Њ-Ń Ң-Ŋ Ѯ-KS О-O Ө- П-P Р-R С-S Ҫ- Т-T Ћ-Ć Ќ-Ḱ У-U Ў-Ŭ Ұ-U Ү- Ф-F Х-X Һ-H Ѱ-PS Ѡ-Ō Ц-C Ч-Č Ҹ- Џ- Ш- Щ-Ŝ Ъ-Ə Ы-Y Ь-Ĵ Ѣ-Ě Э- Ю-JU Я-JA Ѧ- Ѩ-J Ѫ- Ѭ-J Ѵ- Ѥ- Ғ-Ğ Ӣ-Ī Ӯ-Ū Ҳ-H Ҷ-.

The Stewardship site encodes its characters as numeric Unicode references. Most of the characters used on this site are available in Unicode fonts. For every unfamiliar (non-English) word, a translation into the standard English form appears when the cursor is placed over the word. In many cases I have been forced to use the only form available to me, already transliterated; in some cases I have even retransliterated into what I suppose to be the original. Corrections to these usages are welcome and should be sent to the Stewardship Project, project @ the-stewardship.org.
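Assuming that "numeric Unicode" here means HTML numeric character references (of the form &#NNNN;), this Python sketch shows how non-ASCII text can be written as such references. It also shows that the standard library's `html.unescape` turns them back into the original characters. The function name is hypothetical.

```python
import html


def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


encoded = to_numeric_references("漢字 xan")
print(encoded)                 # &#28450;&#23383; xan
print(html.unescape(encoded))  # 漢字 xan
```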

 
