TRANSCRIPTION PROTOCOLS

O.T. FORD

UPDATED 2010 MARCH 6

 

The language usage here advocated and employed is based on the belief that, particularly in English with its strengthening global dominance, inclusion of other linguistic traditions is important for cultural heritage and for the facilitation of intercultural understanding. The first principle is that all names, of persons, places, organizations, and concepts, should be left in the original form wherever possible. The main Hellenistic alphabets ― Latin, Cyrillic, and Greek itself ― should be presented in the original alone. The assumption, in computerized communication, is that most computers are equipped with the requisite characters for these Hellenistic scripts. Virtually all names in the Latin scripts can be printed unaltered. Most names in the Cyrillic scripts can also be shown unaltered. Occasional characters or diacritical marks are unavailable, and these should be approximated rather than transliterated, with the remaining characters in a name left in the original form. Modern Greek can be shown in its native form; Ancient Greek should be presented in its classical form. But for other scripts no such assumption can be made, and names and words of non-English origin should be printed both in the original (where possible) and in transliterated or transcribed form. The transliteration or transcription employed here attempts to be regular, though mistakes have certainly been made, and corrections are welcome.

The second principle, where transliteration is required, is that it should be possible to reconstruct the original; the transliterated form should unambiguously represent the original writing. Therefore, while occasionally an original grapheme will be transliterated by more than one Latin symbol (primarily when a single grapheme represents different phonemes), only rarely (when unavoidable) will one Latin symbol represent more than one original grapheme.
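The reconstruction requirement can be checked mechanically. What follows is a minimal sketch, not the method used to build the tables here: it takes a subset of the Georgian table given further below, reports any Latin symbol that stands for more than one original grapheme, and inverts a transliteration by trying the longest Latin symbol first. The function names (`collisions`, `transliterate`, `reconstruct`) and the choice of sample word are illustrative assumptions.

```python
from collections import defaultdict

ASP = "\u201b"  # the aspiration mark « ‛ » used in the tables

# A subset of the Georgian table from this document (illustrative, not complete).
GEORGIAN = {
    "ა": "A", "ბ": "B", "გ": "G", "დ": "D", "ე": "E", "ვ": "V",
    "თ": "T" + ASP, "ი": "I", "კ": "K", "ლ": "L", "მ": "M", "ნ": "N",
    "ო": "O", "რ": "R", "ს": "S", "ტ": "T", "უ": "U", "ქ": "K" + ASP,
    "ხ": "X",
}

def collisions(table):
    """Return the Latin symbols that stand for more than one original grapheme."""
    sources = defaultdict(list)
    for grapheme, latin in table.items():
        sources[latin].append(grapheme)
    return {latin: found for latin, found in sources.items() if len(found) > 1}

def transliterate(text, table):
    """Replace each grapheme by its Latin symbol; pass anything else through."""
    return "".join(table.get(ch, ch) for ch in text)

def reconstruct(latin_text, table):
    """Invert the table, trying the longest Latin symbol first at each position."""
    reverse = {latin: grapheme for grapheme, latin in table.items()}
    longest = max(len(symbol) for symbol in reverse)
    out, i = [], 0
    while i < len(latin_text):
        for size in range(longest, 0, -1):
            chunk = latin_text[i:i + size]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += len(chunk)
                break
        else:
            out.append(latin_text[i])  # not in the table: keep unchanged
            i += 1
    return "".join(out)

word = "თბილისი"
latin = transliterate(word, GEORGIAN)     # "T‛BILISI"
restored = reconstruct(latin, GEORGIAN)   # equal to word when the table is unambiguous
```

Longest-match-first matters because a symbol such as T‛ begins with another symbol, T; reading the longer one first keeps the two graphemes distinct on the way back.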

There are several minor principles concerning transliteration. Unlike in the International Phonetic Alphabet and numerous transliteration schemes, capital and lowercase letters should not be used distinctively, so that they remain available, following normal practice, as contextual variants of each other. That is, most letters used for transliteration can be written in capital or lowercase form, as appropriate. Where scripts are related, their transliterations should also be related, as with the Semitic and Indian scripts below. Finally, borrowings should not be transliterated as written in the borrowing language, but rather restored to the original form or transliterated from it.

Since 漢字 « xan⁵¹ ci⁵¹ » (hànzì) does not directly (and often does not indirectly) represent sound, it is merely transcribed, with each syllable transcribed separately. A standardized transcription has been employed here for all dialects using 漢字 « xan⁵¹ ci⁵¹ », following many of the same principles as the transliterations below. Tones are transcribed as superscripts ranging from 5 (highest) to 1 (lowest), or as sequences of these. Parenthetical conventional transcriptions have been included: for 漢語 « Xan⁵¹ Ü²¹⁴ » (Hànyǔ), the widely recognized 拼音 « p‛in⁵ in⁵ » (pīnyīn); for 廣話 « Kuoŋ³⁵³⁵ » (Gwóng Wáa), Yale romanization; for 閩南語 « Ban³⁵ Lam³⁵ Gu⁵³ » (Bân-lâm-gú), 白話字 « Peɔ⁴ Ue² Zi² » (Pe̍h Ōe Jī); and for 閩北語 « Miŋ⁵³ Paök²¹³ Ŋü³ » (Mìng-bák-ngṳ̄), Foochow romanization.
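The tone notation can be illustrated with a short sketch. It assumes that a transcription is typed with plain digits after each syllable, and that a run of digits from 1 to 5 is always a tone contour; the function names are hypothetical and are not part of the scheme described here.

```python
import re

# Map the ASCII digits 1-5 to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans("12345", "\u00b9\u00b2\u00b3\u2074\u2075")

def superscript_tones(transcription):
    """Raise every run of tone digits (1-5) to superscript."""
    return re.sub(r"[1-5]+",
                  lambda m: m.group().translate(SUPERSCRIPTS),
                  transcription)

def tone_contour(syllable):
    """Return the tone levels that end a syllable, 5 being highest and 1 lowest."""
    match = re.search(r"[1-5]+$", syllable)
    return [int(digit) for digit in match.group()] if match else []

shown = superscript_tones("xan51 ci51")   # digits become superscripts
levels = tone_contour("xan51")            # [5, 1]: a fall from highest to lowest
```

Keeping the contour as a list of levels makes the "sequences of these" reading explicit: 51 is two levels in succession, not the number fifty-one.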

Northwest Semitic paradigm: א Ɔ ב B ג G ד D ה H ו Ŭ ז Z ח H ט T י J כ K ל L מ M נ N ס S ע C פ P צ S ק Q ר R ש Š ת T. No separate encoding exists; convention replaces it with modern Hebrew forms.

― Arabic: ا Ɔ-Ā ب B ت T ث Þ ج Ĝ ح H خ X د D ذ Ð ر R ز Z س S ش Š ص S ض D ط T ظ Ð ع C غ Ğ ف F ق Q ك K ل L م M ن N ه H و Ŭ-Ū ى J-Ī (ة Ḧ ئ Ỉ ؤ Ủ ء Ɔ). The basic vowel structure (only occasionally written in the original) has been transcribed as A-I-U. Related scripts have been transliterated along the same principles but will include additional characters (ک K گ G ژ Ž پ P چ Č ڠ Ŋ ڤ P´ ڽ Ñ ۏ‎ V ) and additional vowel transcriptions.

― Hebrew: א Ɔ ב B ג G ד D ה H ו Ŭ-Ū ז Z ח H ט T י J-Ī כ K ל L מ M נ N ס S ע C פ P צ S ק Q ר R ש Š-Ś ת T. A broader range of vowels has been transcribed.

― Syriac: ܐ Ɔ-Ā ܒ B ܓ G ܕ D ܗ H ܘ Ŭ-Ū ܙ Z ܚ H ܛ T ܝ J-Ī ܟ K ܠ L ܡ M ܢ N ܣ S ܥ C ܦ P ܨ S ܩ Q ܪ R ܫ Š ܬ T.

― Amharic/Ethiopic: አ Ä ኡ U ኢ I ኣ A ኤ E እ Ə ኦ O ሀ H ለ L ሐ H መ M ሠ Ś ረ R ሰ S ሸ Š ቀ Q ቈ QŬ በ B ቨ V ተ T ቸ Č ኀ Ĥ ኈ ĤŬ ነ N ኘ Ñ አ Ɔ ከ K ኰ KŬ ኸ X ዀ XŬ ወ Ŭ ዐ C ዘ Z ዠ Ž የ J ደ D ጀ Ž ገ G ጐ GŬ ጠ TČPSD ፈ F ፐ P ጘ Ŋ ⶓ ŊŬ

Dēvanāgarī: अ A आ Ā इ I ई Ī उ U ऊ Ū ऋ Ŗ ए Ē ऐ Æ ओ Ō औ Å क K ख K‛ ग G घ G‛ ङ Ŋ च C छ C‛ ज JJ‛ ञ Ñ ट TT‛ ड DD‛ ण N त T थ T‛ द D ध D‛ न N प P फ P‛ ब B भ B‛ म M य J र R ल L व V श Ś ष S स S ह H ळ L क़ Q ख़ X ग़ Ğ ज़ Z फ़ F

― Bengali: অ A আ Ā ই I ঈ Ī উ U ঊ Ū ঋ Ŗ এ Ē ঐ Æ ও Ō ঔ Å ক K খ K‛ গ G ঘ G‛ ঙ Ŋ চ C ছ C‛ জ JJ‛ ঞ Ñ ট TT‛ ড DD‛ ণ N ত T থ T‛ দ D ধ D‛ ন N প P ফ P‛ ব B ভ B‛ ম M য J য় J-Ŭ র R ল L শ Ś ষ S স S হ H ড় RR‛ ঃ ’  ং ~  ঁ ~

― Gurmukhi/Punjabi: ਅ A ਆ Ā ਇ I ਈ Ī ਉ U ਊ Ū ਏ Ē ਐ Æ ਓ Ō ਔ Å ਕ K ਖ K‛ ਗ G ਘ G‛ ਙ Ŋ ਚ C ਛ C‛ ਜ JJ‛ ਞ Ñ ਟ TT‛ ਡ DD‛ ਣ N ਤ T ਥ T‛ ਦ D ਧ D‛ ਨ N ਪ P ਫ P‛ ਬ B ਭ B‛ ਮ M ਯ J ਰ R ਲ L ਵ V ਸ S ਹ H ਸ਼ Š ਲ਼ L ਖ਼ X ਗ਼ Ğ ਜ਼ Z ੜ R ਫ਼ F  ੰ ~  ਂ ~  ੱ :

― Gujarati: અ A આ Ā ઇ I ઈ Ī ઉ U ઊ Ū ઋ Ŗ ૠ Ŗ: ઍ E એ Ē ઐ Æ ઑ O ઓ Ō ઔ Å ક K ખ K‛ ગ G ઘ G‛ ઙ Ŋ ચ C છ C‛ જ JJ‛ ઞ Ñ ટ TT‛ ડ DD‛ ણ N ત T થ T‛ દ D ધ D‛ ન N પ P ફ P‛ બ B ભ B‛ મ M ય J ર R લ L ળ L વ V શ Ś ષ S સ S હ H  ં ~ ઃ ’

― Sinhalese: අ A ආ Ā ඇ EĒ ඉ I ඊ Ī උ U උ Ū ඍ Ŗ ඎ Ŗ: ඏ Ļ ඐ Ļ: එ E ඒ Ē ඓ Æ ඔ O ඕ Ō ඖ Å ක K ඛ K‛ ග G ඝ G‛ ඞ Ŋ ඟ ŊG ච C ඡ C‛ ජ JJ‛ ඤ Ñ ඦ ÑJTT‛ ඩ DD‛ ණ NND ත T ථ T‛ ද D ධ D‛ න N ඳ ND ප P ඵ P‛ බ B භ B‛ ම M ඹ MB ය J ර R ල L ව V ශ Ś ෂ S ස S හ H ළ L ෆ F  ං ~  ඃ ’

― Oriya: ଅ A ଆ Ā ଇ I ଈ Ī ଉ U ଊ Ū ଋ Ŗ ୠ Ŗ: ଌ Ļ ୡ Ļ: ଏ Ē ଐ Æ ଓ Ō ଔ Å କ K ଖ K‛ ଗ G ଘ G‛ ଙ Ŋ ଚ C ଛ C‛ ଜ JJ‛ ଞ Ñ ଟ TT‛ ଡ DD‛ ଣ N ତ T ଥ T‛ ଦ D ଧ D‛ ନ N ପ P ଫ P‛ ବ B ଭ B‛ ମ M ୟ J ଯ Ž ର R ଲ L ଳ L ଶ Ś ଷ S ସ S ହ H ଡ଼ R ଢ଼ R‛  ଁ ~ ଃ ’.

― Malayalam: അ A ആ Ā ഇ I ഈ Ī ഉ U ഊ Ū ഋ Ŗ എ E ഏ Ē ഐ Æ ഒ O ഓ Ō ഔ Å ക K ഖ K‛ ഗ G ഘ G‛ ങ Ŋ ച C ഛ C‛ ജ JJ‛ ഞ Ñ ട TT‛ ഡ DD‛ ണ N ത T ഥ T‛ ദ D ധ D‛ ന N പ P ഫ P‛ ബ B ഭ B‛ മ M യ J ര R ല L വ V ശ Ś ഷ S സ S ഹ H ള LLR  ം ~ ഃ ’

― Kannada: ಅ A ಆ Ā ಇ I ಈ Ī ಉ U ಊ Ū ಋ Ŗ ೠ Ŗ: ಌ Ļ ೡ Ļ: ಎ E ಏ Ē ಐ Æ ಒ O ಓ Ō ಔ Å ಕ K ಖ K‛ ಗ G ಘ G‛ ಙ Ŋ ಚ C ಛ C‛ ಜ JJ‛ ಞ Ñ ಟ TT‛ ಡ DD‛ ಣ N ತ T ಥ T‛ ದ D ಧ D‛ ನ N ಪ P ಫ P‛ ಬ B ಭ B‛ ಮ M ಯ J ರ R ಱ R ಲ L ಳ L ವ V ಶ Ś ಷ S ಸ S ಹ H ೞ F  ಂ ~ ಃ ’

― Telugu: అ A ఆ Ā ఇ I ఈ Ī ఉ U ఊ Ū ఋ Ŗ ౠ Ŗ: ఌ Ļ ౡ Ļ: ఎ E ఏ Ē ఐ Æ ఒ O ఓ Ō ఔ Å క K ఖ K‛ గ G ఘ G‛ ఙ Ŋ చ C ఛ C‛ జ JJ‛ ఞ Ñ ట TT‛ డ DD‛ ణ N త T థ T‛ ద D ధ D‛ న N ప P ఫ P‛ బ B భ B‛ మ M య J ర R ఱ R ల L ళ L వ V శ Ś ష S స S హ H  ం ~ ః ’

― Tamil: அ A ஆ Ā இ I ஈ Ī உ U ஊ Ū எ E ஏ Ē ஐ Æ ஒ O ஓ Ō ஔ Å க K ங Ŋ ச C ஞ Ñ ட TN த T ந N ப P ம M ய J ர R ல L வ V ழ ZL ற Ř ன Ň (ஜ J ஷ Ś ஸ S ஹ H)

― Khmer: ឣ A ឤ Ā ឥ I ឦ Ī ឧ U (ឨ UK ឩ Ū´) ឪ Ū ឫ Ŗ ឬ Ŗ: ឭ Ļ ឮ Ļ: ឯ Ē ឰ Æ ឱ O ឲ Ō ឳ Å ក K ខ K‛ គ G ឃ G‛ ង Ŋ ច C ឆ C‛ ជ JJ‛ ញ Ñ ដ TT‛ ឌ DD‛ ណ N ត T ថ T‛ ទ D ធ D‛ ន N ប P ផ P‛ ព B ភ B‛ ម M យ J រ R ល L វ V ឝ Ś ឞ S ស S ហ H ឡ L អ Ɔ

― Thai: อะ A อั A อา Ā อิ I อี Ī อึ I อื Ī อุ U อู Ū [อฺ ] อเ Ē อแ E อโ Ō อใ Æ อไ Æ ก K ข K‛ ฃ K‛´ ค G ฅ G´ ฆ G‛ ง Ŋ จ C ฉ C‛ ช JJ´ ฌ J‛ ญ Ñ ฎ 'DTT‛ ฑ DD‛ ณ N ด 'D ต T ถ T‛ ท D ธ D‛ น N บ 'B ป P ผ P‛ ฝ P‛´ พ B ฟ B´ ภ B‛ ม M ย J ร R ฤ Ŗ ล L ฦ Ļ ว Ŭ ศ Ś ษ S ส S ห H ฬ L อ ’ ฮ H อํ ~

― Lao: ອະ A ອັ A ອາ Ā ອິ I ອີ Ī ອຶ I ອື Ī ອຸ U ອູ Ū [ອົ ອຼ ] ອຽ IA ອເ Ē ອແ E ອໂ Ō ອໃ Æ ອໄ Æ [ອໆ ] ກ K ຂ K‛ ຄ G ງ Ŋ ຈ C ຊ J ຍ Ñ ດ 'D ຕ T ຖ T‛ ທ D ນ N ບ 'B ປ P ຜ P‛ ຝ P⁣´ ພ B ຟ B´ ມ M ຢ J ຣ R ລ L ວ Ŭ ສ S ຫ H ອ ’ ຮ H ອໍ ~

― Burmese: က K K‛ G G‛ Ŋ C C‛ J Jဉ/ည Ñ T T D D N T T‛ D D‛ N P P‛ B B‛ M J R L Ŭ S H L ’.

― Tibetan: ཨ A ཨི I ཨུ U ཨེ E ཨོ O ཀ K ཁ K‛ ག G ང Ŋ ཅ C ཆ C‛ ཇ J ཉ Ñ (ཊ TT‛ ཌ DN) ཏ T ཐ T‛ ད D ན N པ P ཕ P‛ བ B མ M ཙ SS‛ ཛ Z ཝ Ŭ ཞ Ž ཟ Z འ ’ ཡ J ར R ལ L ཤ Ś (ཥ S) ས S ཧ H .

― Thaana: އަ A އާ Ā އި I އީ Ī އު U އޫ Ū އެ E އޭ Ē އޮ O އޯ Ō ހ H ށ S ނ N ރ R ބ B ޅ L ކ K އ Ɔ ވ V މ M ފ F ދ D ތ T ލ L ގ G ޏ Ñ ސ S ޑ D ޒ Z ޓ T ޔ J ޕ P ޖ Ž ޗ Č ( ޘ Þ ޙ H ޚ X ޛ Ð ޜ Ž ޝ Š ޞ [S] ޟ [D] ޠ [T] ޡ Ð ޢ C ޣ Ğ ޤ Q ޥ Ŭ ޱ N). Thaana is partially rooted in Brahmi (and is used for an Indic dialect), but also borrows heavily from Arabic.

Japanese: ア A イ I ウ U エ E オ O カ K ガ G サ S ザ Z タ T ダ D ナ N ハ H バ B パ P マ M ヤ J ラ R ワ Ŭ. Kana, like any syllabary, are transliterated directly, long vowels included, on the basis of a phonemic transcription. ( tu ) stands for small ッ TU where it marks a geminate consonant or an extended pronunciation, again corresponding to the kana. Kanji are transcribed phonemically, like the kana above, with a separate block for each character.

Korean: ᅡ A ᅣ JA ᅥ E ᅧ JE ᅩ O ᅭ JO ᅮ U ᅲ JU ᅳ U ᅵ I ᄀ K ᄂ N ᄃ T ᄅ L ᄆ M ᄇ P ᄉ S ᄋ Ø/Ŋ ᄌ C ᄎ C‛ ᄏ K‛ ᄐ T‛ ᄑ P‛ ᄒ H. Since the han kul is always transliterated with cluster (syllable) breaks, the elements of the han kul are transliterated separately, yielding ᅢ AI ᅤ JAI ᅦ EI ᅨ JEI ᅪ OA ᅫ OAI ᅬ OI ᅯ UE ᅰ UEI ᅱ UI ᅴ UI ᄁ KK ᄄ TT ᄈ PP ᄊ SS ᄍ CC, noting that ᅬ OI is rendered [ø] or [we], and ᅱ UI [y] or [wi].
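The element-by-element treatment of the syllable clusters can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it relies on the standard arithmetic layout of precomposed Hangul syllables in Unicode (starting at U+AC00, with 19 initials, 21 vowels, and 28 finals including "none"); it reads « Ø/Ŋ » as silent in initial position and Ŋ in final position; and it spells final clusters by joining the values of their elements, which the table above does not state explicitly. The function name is hypothetical.

```python
ASP = "\u201b"  # the aspiration mark « ‛ » used in the tables

# Transliterations listed in Unicode's canonical order of initials, vowels, finals.
LEADS = ["K", "KK", "N", "T", "TT", "L", "M", "P", "PP", "S", "SS", "",
         "C", "CC", "C" + ASP, "K" + ASP, "T" + ASP, "P" + ASP, "H"]
VOWELS = ["A", "AI", "JA", "JAI", "E", "EI", "JE", "JEI", "O", "OA", "OAI",
          "OI", "JO", "U", "UE", "UEI", "UI", "JU", "U", "UI", "I"]
# Assumption: final clusters are spelled by joining their elements.
TAILS = ["", "K", "KK", "KS", "N", "NC", "NH", "T", "L", "LK", "LM", "LP",
         "LS", "LT" + ASP, "LP" + ASP, "LH", "M", "P", "PS", "S", "SS", "Ŋ",
         "C", "C" + ASP, "K" + ASP, "T" + ASP, "P" + ASP, "H"]

HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = len(LEADS) * len(VOWELS) * len(TAILS)  # 19 * 21 * 28

def transliterate_hangul(text):
    """Transliterate a run of Hangul syllables, one space-separated block each."""
    blocks = []
    for ch in text:
        index = ord(ch) - HANGUL_BASE
        if 0 <= index < SYLLABLE_COUNT:
            lead, rest = divmod(index, len(VOWELS) * len(TAILS))
            vowel, tail = divmod(rest, len(TAILS))
            blocks.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            blocks.append(ch)  # not a precomposed syllable: keep unchanged
    return " ".join(blocks)

name = transliterate_hangul("한글")   # "HAN KUL"
```

The result for 한글 agrees with the form « han kul » used in the paragraph above, with one block per syllable cluster.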

Armenian: Աա A Բբ B Գգ G Դդ D Եե E Զզ Z Էէ Ē Ըը Ə Թթ T‛ Ժժ Ž Իի I Լլ L Խխ X Ծծ C Կկ K Հհ H Ձձ Z Ղղ Ğ Ճճ Č Մմ M Յյ J Նն N Շշ Š Ոո O Չչ Č‛ Պպ P Ջջ Ž Ռռ Ř Սս S Վվ V Տտ T Րր R Ցց C‛ Ււ U Փփ P‛ Քք K‛ Օօ Ō Ֆֆ F և eu. The eastern pronunciation has been used as a standard.

Georgian: ა A ბ B გ G დ D ე E ვ V ზ Z თ T‛ ი I კ K ლ L მ M ნ N ო O პ P ჟ Ž რ R ს S ტ T უ U ფ P‛ ქ K‛ ღ Ğ ყ Q შ Š ჩ Č‛ ც C‛ ძ Z წ C ჭ Č ხ X ჯ Ž ჰ H ჱ Ē ჲ J ჳ ŬI ჴ Q‛ ჵ Ō ჶ F.

Other conventions used across systems: « ‛ » indicates aspiration, and « ~ » nasalization; « Ɔ » represents [ʔ], « J » represents [j], « Ŭ » represents [w], « X » represents [x], « Z » represents [dz], « Ž » represents [dʒ], « Þ » represents [θ], and « Ð » represents [ð]; due to typographical restrictions, underscoring « _ » replaces a subscript dot.

In contexts where the Hellenistic scripts are transliterated, the following system is used:

Greek: Α-A Β-B Γ-G/Ŋ Δ-D Ε-E Ζ-Z Η-Ē/H Θ-T‛ Ι-I Κ-K Λ-L Μ-M Ν-N Ξ-KS Ο-O Π-P Ρ-R Σ-S Τ-T Υ-U Φ-P‛ Χ-K‛ Ψ-PS Ω-Ō

― Cyrillic: А-A Б-B В-V Г-G Ґ-Ġ Д-D Ђ-Ð Ѓ-Ǵ Е-E Ё-JO Є-É Ж-Ž Ѕ-Z З-Z Ӡӡ-Z И-I Ѳ-T‛ І-Í Ї- Й-Ĭ Ј-J К-K Ҝ-Ġ Л-L Љ-Ĺ М-M Н-N Њ-Ń Ң-Ŋ Ѯ-KS О-O Ө-Ö П-P Р-R С-S Т-T Ћ-Ć Ќ-Ḱ У-U Ў-Ŭ Ұ-U Ү-Ü Ф-F Х-X Һ-H Ѱ-PS Ѡ-Ō Ц-C Ч-Č Ҹ-Ž Џ-Ž Ш-Š Щ-Ŝ Ъ-Ə Ы-Y Ь-Ĵ Ѣ-Ě Э-È Ю-JU Я-JA Ѧ- Ѩ-J Ѫ-Õ Ѭ- Ѵ-Ý Ѥ- Ғ-Ğ Ӣ-Ī Қ-Q Ӯ-Ū Ҳ-H Ҷ-Ž.

The stewardship site uses numerical Unicode. Most of the characters used in this site are available in Unicode fonts. In all cases of unfamiliar (non-English) words, a translation into the standard English form will appear if the cursor is placed over the word. In many cases, I have been forced to use the only form available to me, already transliterated. In some cases, I have even retransliterated into what I suppose to be the original. Corrections to these usages are sought, and should be sent to the Stewardship Project, project the-stewardship.org.
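If "numerical Unicode" is read as HTML numeric character references (an assumption; the site's actual markup is not reproduced here), the encoding can be sketched with the Python standard library. The function name is hypothetical.

```python
import html

def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal reference such as &#352;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

encoded = to_numeric_references("Š")   # "&#352;"
decoded = html.unescape(encoded)       # back to "Š"
```

A page encoded this way is pure ASCII, so the characters survive any transport, and they display correctly wherever the reader's font contains them.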

 

 
