Newsgroups: sci.lang,alt.usage.english
From: evan@hpl.hp.com (Evan Kirshenbaum)
Subject: Final(?) Draft of ASCII/IPA Representation
Message-ID: <1992Dec23.195025.27611@hplabsz.hpl.hp.com>
Date: Wed, 23 Dec 1992 19:50:25 GMT
Reply-To: kirshenbaum@hpl.hp.com
References: <1992Dec23.194326.27000@hplabsz.hpl.hp.com>
Organization: Hewlett-Packard Laboratories
Lines: 498

This article describes a standard scheme for representing IPA transcriptions in ASCII for use in Usenet articles and email.

The following guidelines were kept in mind:

  o It should be usable for both phonemic and narrow phonetic transcription.
  o It should be possible to represent *all* symbols and diacritics in the IPA.
  o The previous guideline notwithstanding, it is expected that (as in the past) most use will be in transcribing English, so where tradeoffs are necessary, decisions should be made in favor of ease of representation of phonemes which are common in English.
  o The representation should be readable.
  o It should be possible to mechanically translate from the representation to a character set which includes IPA. The reverse would also be nice.

In order to be able to represent a wide range of segments while making common segments easy to type, we allow more than one representation for a given segment. Each segment has an "explicit" representation, which is a set of features between curly braces ("{" and "}"). Each feature is represented as a three-letter abbreviation taken from a standardized set. The phoneme /b/ (a voiced bilabial stop) could be represented as

    /{vcd,blb,stp}/

A first cut at the feature set appears in appendix A below. The word "tag" could thus be represented phonemically as

    /{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/

and phonetically as

    [{vls,asp,alv,stp}{low,fnt,lng,unr,vwl}{unx,vcd,vel,stp}]

This works, but it's a bit of a pain.
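The explicit form is completely regular, so it is easy to check mechanically. The following Python is a minimal sketch, not part of the proposal. `parse_explicit`, `FEATURES`, and `SEGMENT` are hypothetical names. The sketch assumes a transcription written entirely in explicit form, with no stress marks, spaces, or implicit symbols. It checks every abbreviation against the appendix A list.

```python
import re

# The feature abbreviations listed in appendix A.
FEATURES = {
    "vcd", "vls", "blb", "lbd", "dnt", "alv", "rfx", "pla", "pal", "vel",
    "lbv", "uvl", "phr", "glt", "stp", "frc", "nas", "orl", "apr", "vwl",
    "lat", "ctl", "trl", "flp", "clk", "ejc", "imp", "hgh", "smh", "umd",
    "mid", "lmd", "low", "fnt", "cnt", "bck", "unr", "rnd", "asp", "unx",
    "syl", "mrm", "lng", "vzd", "lzd", "pzd", "rzd", "nzd", "fzd",
}

# One explicit segment: a brace-delimited, comma-separated feature list.
SEGMENT = re.compile(r"\{([a-z,]+)\}")


def parse_explicit(transcription: str) -> list[frozenset[str]]:
    """Split an explicit transcription such as /{vcd,blb,stp}/ into feature sets."""
    body = transcription.strip("/[]")  # drop the phonemic or phonetic delimiters
    segments = []
    pos = 0
    for match in SEGMENT.finditer(body):
        if match.start() != pos:  # nothing may appear between explicit segments
            raise ValueError(f"unexpected text {body[pos:match.start()]!r}")
        feats = match.group(1).split(",")
        unknown = [f for f in feats if f not in FEATURES]
        if unknown:
            raise ValueError(f"unknown feature(s) {unknown}")
        segments.append(frozenset(feats))
        pos = match.end()
    if pos != len(body):
        raise ValueError(f"trailing text {body[pos:]!r}")
    return segments


tag = parse_explicit("/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/")
print([sorted(seg) for seg in tag])
# [['alv', 'stp', 'vls'], ['fnt', 'low', 'unr', 'vwl'], ['stp', 'vcd', 'vel']]
```

Each segment is returned as a set rather than a list because the order of features inside the braces carries no meaning.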
To simplify transcription, we allow an "implicit" representation for a segment, which consists of a (generally alphabetic) symbol followed by diacritics. Thus /b/ stands for /{vcd,blb,stp}/. Case is significant (/n/ and /N/ are different segments). The segment symbols are given in appendix B below. The word "tag" can thus be represented phonemically as

    /t&g/

The diacritics for a segment are written between angle brackets ("<" and ">") and consist of symbols or features. (In the common case where the diacritic symbol is a single character that does not encode a segment, the brackets may be omitted.) The features that the diacritics map to override those of the segment. The word "tag" thus becomes, narrowly,

    [t<asp>&<lng>g<unx>]  or  [t<h>&<:>g<o>]  or  [t<h>&:g<o>]

Some diacritic symbols encode more than one feature set; which one is meant should be apparent from context. For example, "." stands for "{rnd}" when attached to a vowel, but "{rfx}" when attached to a consonant.

Clicks are common to many languages (especially in Africa), but there is no IPA diacritic that means "click". Rather than use up several characters for clicks (which are infrequent in the languages most often discussed), we instead use the diacritic "!" after the homorganic voiceless stop. Thus /t!/ (= /t<clk>/ = /{alv,clk}/) is the sound commonly written "tsk" and used in English to show disapproval.

The complete set of diacritic symbols appears in appendix C below. Appendices D and E contain representations of segments more or less ordered by feature (appendix D in tabular form, appendix E as a list). Appendix F contains a list of all of the ASCII characters and the uses they have been pressed into.

For transcription of any specific language, a group can by convention alter the character mappings (for example, for Spanish /R/ may be better used to represent /{alv,trl}/ than /{mid,cnt,rzd,vwl}/).
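The override rule for implicit segments can be sketched the same way. This Python is illustrative only, and several parts of it are assumptions:

  o `parse_implicit` and `resolve` are hypothetical names.
  o Only a dozen appendix B symbols are included.
  o Only the diacritics mentioned above are handled: "h", ":", "o", and the context-dependent ".".
  o A bracketed diacritic holds a single symbol or feature.
  o "Override" is read as "replace any segment feature in the same group (voicing, place, height, backness, rounding)". The text does not spell this rule out.

Clicks, stress marks, spaces, and tone are not handled.

```python
import string

# A few implicit segment symbols from appendix B and their feature sets.
SEGMENTS = {
    "b": {"vcd", "blb", "stp"}, "d": {"vcd", "alv", "stp"}, "g": {"vcd", "vel", "stp"},
    "p": {"vls", "blb", "stp"}, "t": {"vls", "alv", "stp"}, "k": {"vls", "vel", "stp"},
    "n": {"alv", "nas"}, "N": {"vel", "nas"},
    "i": {"hgh", "fnt", "unr", "vwl"}, "y": {"hgh", "fnt", "rnd", "vwl"},
    "&": {"low", "fnt", "unr", "vwl"}, "@": {"mid", "cnt", "unr", "vwl"},
}

# Every character used as a segment symbol (appendix B plus the reserved
# "$" and "%"); a bare, unbracketed diacritic must not be one of these.
SEGMENT_SYMBOLS = set(string.ascii_lowercase) | set("ABCDEGHIJLMNOPQRSTUVWXYZ?@&$%")

# Diacritic symbols used in the text; "." depends on context (value None).
DIACRITICS = {"h": "asp", ":": "lng", "o": "unx", ".": None}

# Assumption: a diacritic feature replaces segment features in the same group
# (e.g. rfx replaces alv); features outside these groups are simply added.
GROUPS = [
    {"vcd", "vls"},
    {"blb", "lbd", "dnt", "alv", "rfx", "pla", "pal", "vel", "lbv", "uvl", "phr", "glt"},
    {"hgh", "smh", "umd", "mid", "lmd", "low"},
    {"fnt", "cnt", "bck"},
    {"unr", "rnd"},
]


def resolve(mark: str, feats: set[str]) -> str:
    """Turn one diacritic (a symbol or a feature abbreviation) into a feature."""
    if mark == ".":
        return "rnd" if "vwl" in feats else "rfx"
    if mark in DIACRITICS:
        return DIACRITICS[mark]
    if len(mark) == 3 and mark.isalpha() and mark.islower():
        return mark  # a feature written out directly, e.g. <asp>
    raise ValueError(f"unknown diacritic {mark!r}")


def parse_implicit(text: str) -> list[frozenset[str]]:
    body = text.strip("/[]")  # drop the phonemic or phonetic delimiters
    segments = []
    i = 0
    while i < len(body):
        if body[i] not in SEGMENTS:
            raise ValueError(f"symbol {body[i]!r} is not in this sketch's table")
        feats = set(SEGMENTS[body[i]])
        i += 1
        while i < len(body):  # consume the diacritics attached to this segment
            if body[i] == "<":
                end = body.index(">", i)
                mark, i = body[i + 1:end], end + 1
            elif body[i] in DIACRITICS and body[i] not in SEGMENT_SYMBOLS:
                mark, i = body[i], i + 1
            else:
                break
            new = resolve(mark, feats)
            for group in GROUPS:
                if new in group:
                    feats.difference_update(group)
            feats.add(new)
        segments.append(frozenset(feats))
    return segments


print(parse_implicit("[t<h>&:g<o>]") == parse_implicit("[t<asp>&<lng>g<unx>]"))  # True
```

Under this grouping, /i./ comes out identical to /y/, and /t./ yields {vls,rfx,stp}. A bare "h" or "o" is not taken as a diacritic, because both are also segment symbols and so must be bracketed.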
An author may also press a little-used symbol (for the language under consideration) into service to highlight a distinction. Such an alteration should be stated explicitly to avoid confusion. The diacritics "+" and "=" and the segment symbols "$" and "%" are deliberately left unspecified so that they can be used to mark language-specific features that are otherwise cumbersome to mark. Such symbols can be assigned either by convention for a specific language or in an ad hoc manner by an individual author.

Stress marks are prepended to the syllable they attach to: "'" signals primary stress, and "," signals secondary stress. Spaces should be used to separate words (cliticized words may be written unseparated). When discussing single words, it may be helpful to insert a space before each syllable that doesn't carry a suprasegmental marker. Thus "I hear the secretary" for an American might be something like

    /aI hir D@ 'sEkrI,t&ri/

while for an Englishman it might be more like

    /aI hi@ DI 'sEkr^tri/

Transcribing tone is harder. Here's an attempt. For register tone languages (e.g., Hausa, Navajo), numbers should be used, with "1" being the lowest tone. Thus in Navajo, "1" is low tone and "2" is high. In Yoruba, "1" is low, "2" is mid, and "3" is high. The language's "default" tone need not be specified. For contour tone languages (e.g., Mandarin, Thai), there is generally a numeric system already in place (Mandarin: "1" is high, "2" is rising, "3" is falling-rising, "4" is falling). The tone indication should follow the syllable (vowel?).

The symbol "#" is used to represent a syllable or word boundary.

Appendix A. Feature Abbreviations
----------------------------------

vcd  voiced            nas  nasal           fnt  front
vls  voiceless         orl  oral            cnt  center
                       apr  approximant     bck  back
blb  bilabial          vwl  vowel
lbd  labio-dental      lat  lateral         unr  unrounded
dnt  dental            ctl  central         rnd  rounded
alv  alveolar          trl  trill
rfx  retroflex         flp  flap            asp  aspirated
pla  palato-alveolar   clk  click           unx  unexploded
pal  palatal           ejc  ejective        syl  syllabic
vel  velar             imp  implosive       mrm  murmured
lbv  labio-velar                            lng  long
uvl  uvular            hgh  high            vzd  velarized
phr  pharyngeal        smh  semi-high       lzd  labialized
glt  glottal           umd  upper-mid       pzd  palatalized
                       mid  mid             rzd  rhoticized
stp  stop              lmd  lower-mid       nzd  nasalized
frc  fricative         low  low             fzd  pharyngealized

Appendix B. Segment Symbols
----------------------------

This table lists the symbol, the associated feature set, and the Unicode character code and name for the corresponding IPA character. In some cases (e.g., /I/) more than one IPA character is in use for the segment; I have listed both. In some cases (e.g., /j/), the IPA symbol seems to be ambiguous (generally between an approximant and the homorganic voiced fricative). The entries marked with "??" are those I am least sure of. When I have listed more than one possibility for a symbol, the first is my current preference.
Sym  Features                 Unicode  Name
---  -----------------------  -------  ----
a    {low,cnt,unr,vwl}        U+0061   LATIN SMALL LETTER A
b    {vcd,blb,stp}            U+0062   LATIN SMALL LETTER B
c    {vls,pal,stp}            U+0063   LATIN SMALL LETTER C
d    {vcd,alv,stp}            U+0064   LATIN SMALL LETTER D
e    {umd,fnt,unr,vwl}        U+0065   LATIN SMALL LETTER E
f    {vls,lbd,frc}            U+0066   LATIN SMALL LETTER F
g    {vcd,vel,stp}            U+0067   LATIN SMALL LETTER G
                              U+0261   LATIN SMALL LETTER SCRIPT G
h    {glt,apr}                U+0068   LATIN SMALL LETTER H
i    {hgh,fnt,unr,vwl}        U+0069   LATIN SMALL LETTER I
j    {pal,apr}/{vcd,pal,frc}  U+006A   LATIN SMALL LETTER J
k    {vls,vel,stp}            U+006B   LATIN SMALL LETTER K
l    {vcd,alv,lat}            U+006C   LATIN SMALL LETTER L
m    {blb,nas}                U+006D   LATIN SMALL LETTER M
n    {alv,nas}                U+006E   LATIN SMALL LETTER N
o    {umd,bck,rnd,vwl}        U+006F   LATIN SMALL LETTER O
p    {vls,blb,stp}            U+0070   LATIN SMALL LETTER P
q    {vls,uvl,stp}            U+0071   LATIN SMALL LETTER Q
r    {alv,apr}                U+0279   LATIN SMALL LETTER TURNED R
s    {vls,alv,frc}            U+0073   LATIN SMALL LETTER S
t    {vls,alv,stp}            U+0074   LATIN SMALL LETTER T
u    {hgh,bck,rnd,vwl}        U+0075   LATIN SMALL LETTER U
v    {vcd,lbd,frc}            U+0076   LATIN SMALL LETTER V
w    {lbv,apr}/{vcd,lbv,frc}  U+0077   LATIN SMALL LETTER W
x    {vls,vel,frc}            U+0078   LATIN SMALL LETTER X
y    {hgh,fnt,rnd,vwl}        U+0079   LATIN SMALL LETTER Y
z    {vcd,alv,frc}            U+007A   LATIN SMALL LETTER Z
A    {low,bck,unr,vwl}        U+0251   LATIN SMALL LETTER SCRIPT A
B    {vcd,blb,frc}            U+03B2   GREEK SMALL LETTER BETA
C    {vls,pal,frc}            U+00E7   LATIN SMALL LETTER C CEDILLA
D    {vcd,dnt,frc}            U+00F0   LATIN SMALL LETTER ETH
E    {lmd,fnt,unr,vwl}        U+025B   LATIN SMALL LETTER EPSILON
F    -- Unused --
G    {vcd,uvl,stp}            U+0262   LATIN LETTER SMALL CAPITAL G
H    {vls,phr,frc}            U+0127   LATIN SMALL LETTER H BAR
I    {smh,fnt,unr,vwl}        U+026A   LATIN LETTER SMALL CAPITAL I
                              U+0269   LATIN SMALL LETTER IOTA
J    {vcd,pal,stp}            U+025F   LATIN SMALL LETTER DOTLESS J BAR
K    -- Unused --
L    {vcd,vel,lat} ??         U+026B   LATIN SMALL LETTER L WITH MIDDLE TILDE
                              U+029F   LATIN LETTER SMALL CAPITAL L
     {vls,alv,lat,frc} ??     U+026C   LATIN SMALL LETTER L BELT
M    {lbd,nas}                U+0271   LATIN SMALL LETTER M HOOK
N    {vel,nas}                U+014B   LATIN SMALL LETTER ENG
O    {lmd,bck,rnd,vwl}        U+0254   LATIN SMALL LETTER OPEN O
P    {vls,blb,frc}            U+03A6   GREEK CAPITAL LETTER PHI
Q    {vcd,vel,frc}            U+0263   LATIN SMALL LETTER GAMMA
R    {mid,cnt,rzd,vwl} ??     U+025A   LATIN SMALL LETTER SCHWA HOOK
     {alv,trl} ??             U+0280   LATIN LETTER SMALL CAPITAL R
S    {vls,pla,frc}            U+0283   LATIN SMALL LETTER ESH
T    {vls,dnt,frc}            U+03B8   GREEK SMALL LETTER THETA
U    {smh,bck,rnd,vwl}        U+028A   LATIN SMALL LETTER UPSILON
                              U+0277   LATIN SMALL LETTER CLOSED OMEGA
V    {lmd,bck,unr,vwl}        U+028C   LATIN SMALL LETTER TURNED V
W    {lmd,fnt,rnd,vwl} ??     U+0153   LATIN SMALL LETTER O E
X    {vls,uvl,frc}            U+03C7   GREEK SMALL LETTER CHI
Y    {umd,fnt,rnd,vwl} ??     U+00F8   LATIN SMALL LETTER O SLASH
     {lmd,fnt,rnd,vwl} ??     U+0153   LATIN SMALL LETTER O E
Z    {vcd,pla,frc}            U+0292   LATIN SMALL LETTER YOGH
?    {glt,stp}                U+0294   LATIN SMALL LETTER GLOTTAL STOP
@    {mid,cnt,unr,vwl}        U+0259   LATIN SMALL LETTER SCHWA
&    {low,fnt,unr,vwl}        U+00E6   LATIN SMALL LETTER A E