These are two dataframes that contain conventional phonetic features of the consonants and vowels used by the CMU phonetic alphabet.
Format
An object of class tbl_df (inherits from tbl, data.frame) with 24 rows and 10 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 17 rows and 12 columns.
Details
Most of the features are self-evident and definitional. For example, /p/ is the bilabial voiceless stop. For fuzzier features, I consulted the general IPA chart and the Wikipedia page on English phonology. These sources settled questions such as which vowels are lax and which consonants are approximants: the last two rows of the consonant table are approximants, so /r,l,j/ are approximants.
Some features have alternative feature sets in order to accommodate degrees of aggregation. For example, /r,l,j,w/ are approximant in manner but divided into liquid and glide in manner_alt.
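The two levels of aggregation can be compared with a short sketch. It uses a hand-copied subset of the consonant table on this page rather than the packaged data, so the `approximants` object is an illustration only; with the package loaded, data_features_consonants could presumably be tabulated the same way.

```r
# Hand-copied subset of the consonant table shown on this page
# (illustrative object, not part of the package).
approximants <- data.frame(
  phone      = c("l", "r", "w", "j"),
  manner     = c("approximant", "approximant", "approximant", "approximant"),
  manner_alt = c("liquid", "liquid", "glide", "glide"),
  stringsAsFactors = FALSE
)

# Coarse coding: all four phones fall into a single group.
manner_counts <- table(approximants$manner)
manner_counts

# Finer coding: the same phones split into liquids and glides.
manner_alt_counts <- table(approximants$manner_alt)
manner_alt_counts
```

Which column to group by depends on how much detail the analysis needs.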
Consonants
data_features_consonants is a dataframe with 24 rows and 10 variables.
knitr::kable(data_features_consonants)
phone | cmubet | wiscbet | voicing | manner | manner_alt | place | place_fct | sonorance | sonorance_alt |
p | P | p | voiceless | stop | stop | labial | labial | obstruent | obstruent |
b | B | b | voiced | stop | stop | labial | labial | obstruent | obstruent |
t | T | t | voiceless | stop | stop | alveolar | alveolar | obstruent | obstruent |
d | D | d | voiced | stop | stop | alveolar | alveolar | obstruent | obstruent |
k | K | k | voiceless | stop | stop | velar | velar | obstruent | obstruent |
g | G | g | voiced | stop | stop | velar | velar | obstruent | obstruent |
tʃ | CH | tsh | voiceless | affricate | affricate | postalveolar | postalveolar | obstruent | strident |
dʒ | JH | dzh | voiced | affricate | affricate | postalveolar | postalveolar | obstruent | strident |
m | M | m | voiced | nasal | nasal | labial | labial | sonorant | sonorant |
n | N | n | voiced | nasal | nasal | alveolar | alveolar | sonorant | sonorant |
ŋ | NG | ng | voiced | nasal | nasal | velar | velar | sonorant | sonorant |
f | F | f | voiceless | fricative | fricative | labiodental | labiodental | obstruent | strident |
v | V | v | voiced | fricative | fricative | labiodental | labiodental | obstruent | strident |
θ | TH | th | voiceless | fricative | fricative | dental | dental | obstruent | obstruent |
ð | DH | dh | voiced | fricative | fricative | dental | dental | obstruent | obstruent |
s | S | s | voiceless | fricative | fricative | alveolar | alveolar | obstruent | strident |
z | Z | z | voiced | fricative | fricative | alveolar | alveolar | obstruent | strident |
ʃ | SH | sh | voiceless | fricative | fricative | postalveolar | postalveolar | obstruent | strident |
ʒ | ZH | zh | voiced | fricative | fricative | postalveolar | postalveolar | obstruent | strident |
h | HH | h | voiceless | fricative | fricative | glottal | glottal | obstruent | obstruent |
l | L | l | voiced | approximant | liquid | alveolar | alveolar | sonorant | sonorant |
r | R | r | voiced | approximant | liquid | postalveolar | postalveolar | sonorant | sonorant |
w | W | w | voiced | approximant | glide | labiovelar | NA | sonorant | sonorant |
j | Y | j | voiced | approximant | glide | palatal | palatal | sonorant | sonorant |
Description of each column:
- phone
phone in IPA
- cmubet
phone in the CMU alphabet
- wiscbet
phone in an older system used by our lab
- voicing
voiced versus voiceless
- voicing_alt
spread_glottis versus plain
- manner
manner of articulation
- manner_alt
alternative manner coding that separates approximants into liquids and glides
- place
place of articulation
- place_fct
place coded as a factor and ordered based on frontness of the articulators. labiovelar is recoded as NA.
- sonorance
obstruent versus sonorant
- sonorance_alt
obstruent versus sonorant versus strident.
Levels of the factor columns:
Considerations about consonant features
The CMU alphabet does not include a glottal stop.
Here /f,v/ are coded as strident following Wikipedia and Sound Pattern of English. If this feature value doesn't seem right, we should probably use an alternative feature of sibilant for the stridents minus /f,v/.
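If a sibilant feature were wanted, it could be derived from sonorance_alt along the following lines. This is only a sketch: the `sibilant` column is hypothetical and does not exist in the data, and the rows are hand-copied from the table above using their CMU codes.

```r
# Affricates and fricatives from the consonant table, identified by
# their CMU codes (hand-copied; illustrative object only).
obstruents <- data.frame(
  cmubet = c("CH", "JH", "F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH"),
  sonorance_alt = c(
    "strident", "strident", "strident", "strident",
    "obstruent", "obstruent",
    "strident", "strident", "strident", "strident",
    "obstruent"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical feature: the stridents minus /f,v/.
obstruents$sibilant <-
  obstruents$sonorance_alt == "strident" &
  !(obstruents$cmubet %in% c("F", "V"))

sibilants <- obstruents$cmubet[obstruents$sibilant]
sibilants
```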
The alternative voicing scheme was suggested by a colleague because of how the voiced-voiceless phonetic contrast is achieved with different articulatory strategies in different languages. Note that voicing_alt does not assign a feature to nasals or approximants.
Vowels
data_features_vowels is a dataframe with 17 rows and 12 variables.
knitr::kable(data_features_vowels)
phone | cmubet | wiscbet | hint | manner | manner_alt | tenseness | height | height_fct | backness | backness_fct | rounding |
i | IY | i | beat | vowel | vowel | tense | high | high | front | front | unrounded |
ɪ | IH | I | bit | vowel | vowel | lax | high | high | front | front | unrounded |
eɪ | EY | eI | bait | vowel | vowel | tense | mid | mid | front | front | unrounded |
ɛ | EH | E | bet | vowel | vowel | lax | mid | mid | front | front | unrounded |
æ | AE | ae | bat | vowel | vowel | lax | low | low | front | front | unrounded |
ʌ | AH | ^ | but | vowel | vowel | lax | mid | mid | central | central | unrounded |
ə | AH | 4 | comma | vowel | vowel | lax | mid | mid | central | central | unrounded |
u | UW | u | boot | vowel | vowel | tense | high | high | back | back | rounded |
ʊ | UH | U | book | vowel | vowel | lax | high | high | back | back | rounded |
oʊ | OW | oU | boat | vowel | vowel | tense | mid | mid | back | back | rounded |
ɔ | AO | c | bought | vowel | vowel | tense | mid | mid | back | back | rounded |
ɑ | AA | @ | bot | vowel | vowel | tense | low | low | back | back | unrounded |
aʊ | AW | @U | bout | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
aɪ | AY | @I | bite | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
ɔɪ | OY | cI | boy | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
ɝ | ER | 3^ | letter | vowel | r-colored | r-colored | mid | mid | central | central | r-colored |
ɚ | ER | 4^ | burt | vowel | r-colored | r-colored | mid | mid | central | central | r-colored |
Description of each column:
- phone
phone in IPA
- cmubet
phone in the CMU alphabet
- wiscbet
phone in an older system used by our lab
- hint
a word containing the selected vowel
- manner
manner of articulation
- manner_alt
alternative manner with vowel, diphthong and r-colored
- tenseness
tense versus lax (versus diphthong and r-colored)
- height
vowel height
- height_fct
height coded as a factor ordered high, mid, low. diphthong is recoded to NA.
- backness
vowel backness
- backness_fct
backness coded as a factor ordered front, central, back. diphthong is recoded to NA.
- rounding
unrounded versus rounded (versus diphthong and r-colored)
Levels of the factor columns:
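A minimal sketch of how such levels arise, assuming the factor columns are built by passing the level order described above to base R's `factor()` (the package's actual construction is not shown on this page). Any value missing from `levels`, here diphthong, becomes NA.

```r
# Example values taken from the height and backness columns above.
height   <- c("high", "mid", "low", "diphthong")
backness <- c("front", "central", "back", "diphthong")

# Values not listed in `levels` are converted to NA.
height_fct   <- factor(height,   levels = c("high", "mid", "low"))
backness_fct <- factor(backness, levels = c("front", "central", "back"))

levels(height_fct)
levels(backness_fct)

# Only the diphthong entry is missing after recoding.
is.na(height_fct)
```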
Considerations about vowel features
I don't consider /eɪ/ and /oʊ/ to be diphthongs, but perhaps manner_alt could encode the difference of these vowels from the others.
In the CMU alphabet and ARPAbet, vowels can include a number to indicate vowel stress, so AH1 or AH2 is /ʌ/ but AH0 is /ə/.
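The stress rule for AH could be applied with a small helper like the one below. The function `ah_to_ipa()` is hypothetical and not part of the documented data; it only covers the AH case described here and returns NA for anything else, including AH without a stress digit.

```r
# Hypothetical helper: map stress-marked AH codes to IPA.
# "\u028c" is the IPA turned v (stressed), "\u0259" is schwa (unstressed).
ah_to_ipa <- function(cmubet) {
  base   <- sub("[0-9]$", "", cmubet)    # code without the stress digit
  stress <- sub("^[A-Z]+", "", cmubet)   # stress digit, "" if absent

  ipa <- rep(NA_character_, length(cmubet))
  ipa[base == "AH" & stress %in% c("1", "2")] <- "\u028c"
  ipa[base == "AH" & stress == "0"]           <- "\u0259"
  ipa
}

ah_to_ipa(c("AH1", "AH2", "AH0", "IY1"))
```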
The vowel features for General American English, according to Wikipedia, are as follows:
 | Front lax | Front tense | Central lax | Central tense | Back lax | Back tense |
---|---|---|---|---|---|---|
Close | ɪ | i | | | ʊ | u |
Mid | ɛ | eɪ | ə | (ɜ) | | oʊ |
Open | æ | | (ʌ) | ɑ | | (ɔ) |
Diphthongs | aɪ ɔɪ aʊ | | | | | |
I adapted these features in this way:
- tense and lax features were directly borrowed. Diphthongs and r-colored vowels were not assigned a tenseness.
- /ʌ,ɔ/ were raised to mid (following the general IPA chart)
- /ɑ/ was moved to back (following the general IPA chart)
- diphthongs have no backness or height
- r-colored vowels were given the backness and height of /ʌ,ə/
Based on the assumption that /ʌ,ə/ are the same general vowel with differing stress, these vowels have the same features. This definition clashes with the general IPA chart, which places /ʌ/ as a back vowel. However, /ʌ/ is a conventional notation. Quoting Wikipedia again: "Although the notation /ʌ/ is used for the vowel of STRUT in RP and General American, the actual pronunciation is closer to a near-open central vowel [ɐ] in RP and advanced back [ʌ̟] in General American." That is, /ʌ/ is fronted in American English (hence, central rather than back).