
These are two dataframes that contain conventional phonetic features of the consonants and vowels used by the CMU phonetic alphabet.

Usage

data_features_consonants

data_features_vowels

Format

An object of class tbl_df (inherits from tbl, data.frame) with 24 rows and 10 columns.

An object of class tbl_df (inherits from tbl, data.frame) with 17 rows and 12 columns.

Details

Most of the features are self-evident and definitional. For example, /p/ is the bilabial voiceless stop. For fuzzier features, I consulted the general IPA chart and the Wikipedia page on English phonology. These were questions like which vowels count as lax, or how to classify /r,l,j/: the last two rows of the chart's consonant table are approximants, so /r,l,j/ are approximants.

Some features have alternative feature sets in order to accommodate degrees of aggregation. For example, /r,l,j,w/ are approximant in manner but divided into liquid and glide in manner_alt.
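As a rough illustration of the two levels of aggregation, the sketch below tabulates both manner codings. It uses a small data frame copied by hand from the consonant table rather than the package data, so it runs in base R alone; the name `approximants_demo` is made up for this example.

```r
# Toy subset copied by hand from the consonant table (not the package data)
approximants_demo <- data.frame(
  cmubet     = c("L", "R", "W", "Y", "M", "N"),
  manner     = c("approximant", "approximant", "approximant", "approximant",
                 "nasal", "nasal"),
  manner_alt = c("liquid", "liquid", "glide", "glide", "nasal", "nasal")
)

# Coarse grouping: all four approximants fall into one class
coarse <- table(approximants_demo$manner)

# Finer grouping: the same phones split into liquids and glides
fine <- table(approximants_demo$manner_alt)

coarse
fine
```

Grouping by manner gives two classes here, while grouping by manner_alt gives three.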

Consonants

data_features_consonants is a dataframe with 24 rows and 10 variables.

knitr::kable(data_features_consonants)

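| phone | cmubet | wiscbet | voicing | manner | manner_alt | place | place_fct | sonorance | sonorance_alt |
|---|---|---|---|---|---|---|---|---|---|
| p | P | p | voiceless | stop | stop | labial | labial | obstruent | obstruent |
| b | B | b | voiced | stop | stop | labial | labial | obstruent | obstruent |
| t | T | t | voiceless | stop | stop | alveolar | alveolar | obstruent | obstruent |
| d | D | d | voiced | stop | stop | alveolar | alveolar | obstruent | obstruent |
| k | K | k | voiceless | stop | stop | velar | velar | obstruent | obstruent |
| g | G | g | voiced | stop | stop | velar | velar | obstruent | obstruent |
| tʃ | CH | tsh | voiceless | affricate | affricate | postalveolar | postalveolar | obstruent | strident |
| dʒ | JH | dzh | voiced | affricate | affricate | postalveolar | postalveolar | obstruent | strident |
| m | M | m | voiced | nasal | nasal | labial | labial | sonorant | sonorant |
| n | N | n | voiced | nasal | nasal | alveolar | alveolar | sonorant | sonorant |
| ŋ | NG | ng | voiced | nasal | nasal | velar | velar | sonorant | sonorant |
| f | F | f | voiceless | fricative | fricative | labiodental | labiodental | obstruent | strident |
| v | V | v | voiced | fricative | fricative | labiodental | labiodental | obstruent | strident |
| θ | TH | th | voiceless | fricative | fricative | dental | dental | obstruent | obstruent |
| ð | DH | dh | voiced | fricative | fricative | dental | dental | obstruent | obstruent |
| s | S | s | voiceless | fricative | fricative | alveolar | alveolar | obstruent | strident |
| z | Z | z | voiced | fricative | fricative | alveolar | alveolar | obstruent | strident |
| ʃ | SH | sh | voiceless | fricative | fricative | postalveolar | postalveolar | obstruent | strident |
| ʒ | ZH | zh | voiced | fricative | fricative | postalveolar | postalveolar | obstruent | strident |
| h | HH | h | voiceless | fricative | fricative | glottal | glottal | obstruent | obstruent |
| l | L | l | voiced | approximant | liquid | alveolar | alveolar | sonorant | sonorant |
| r | R | r | voiced | approximant | liquid | postalveolar | postalveolar | sonorant | sonorant |
| w | W | w | voiced | approximant | glide | labiovelar | NA | sonorant | sonorant |
| j | Y | j | voiced | approximant | glide | palatal | palatal | sonorant | sonorant |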

Description of each column:

phone

phone in IPA

cmubet

phone in the CMU alphabet

wiscbet

phone in an older system used by our lab

voicing

voiced versus voiceless

voicing_alt

spread_glottis versus plain

manner

manner of articulation

manner_alt

alternative manner coding that separates approximants into liquids and glides

place

place of articulation

place_fct

place coded as a factor and ordered based on frontness of the articulators. labiovelar is recoded as NA.

sonorance

obstruent versus sonorant

sonorance_alt

obstruent versus sonorant versus strident.

Levels of the factor columns:

data_features_consonants |>
  lapply(levels) |>
  Filter(length, x = _)
#> $place_fct
#> [1] "labial"       "labiodental"  "dental"       "alveolar"     "postalveolar"
#> [6] "palatal"      "velar"        "glottal"
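The recoding of labiovelar to NA can be reproduced with a plain `factor()` call. This is only a sketch under assumptions: the place labels are copied by hand from the table, the level order is the one printed above, and how the package itself builds place_fct is not shown in this page.

```r
# Place labels for a few phones, copied by hand from the consonant table
place <- c(p = "labial", f = "labiodental", t = "alveolar",
           w = "labiovelar", j = "palatal", k = "velar")

# Levels in front-to-back order, as printed above; "labiovelar" is left out
place_levels <- c("labial", "labiodental", "dental", "alveolar",
                  "postalveolar", "palatal", "velar", "glottal")

# Values that are not listed in `levels` become NA, so /w/ ends up missing
place_fct <- factor(place, levels = place_levels)

# The integer codes follow the front-to-back order of the levels
as.integer(place_fct)
```

Because the codes follow the level order, sorting or plotting by place_fct arranges phones from the lips back to the glottis.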

Considerations about consonant features

The CMU alphabet does not include a glottal stop.

Here /f,v/ are coded as strident following Wikipedia and Sound Pattern of English. If this feature value doesn't seem right, we should probably use an alternative feature of sibilant for the stridents minus /f,v/.

The alternative voicing scheme was suggested by a colleague because the voiced-voiceless phonetic contrast is achieved with different articulatory strategies in different languages. Note that voicing_alt does not assign a feature to nasals or approximants.
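The sibilant alternative mentioned above could be derived from sonorance_alt along these lines. The `sibilant` column and the `stridents_demo` data frame are hypothetical and exist only in this sketch; the rows are copied by hand from the consonant table.

```r
# Toy subset copied by hand from the consonant table (not the package data)
stridents_demo <- data.frame(
  cmubet        = c("F", "V", "TH", "S", "Z", "SH", "ZH", "CH", "JH"),
  sonorance_alt = c("strident", "strident", "obstruent", "strident", "strident",
                    "strident", "strident", "strident", "strident")
)

# Hypothetical `sibilant` feature: the stridents minus F and V
stridents_demo$sibilant <-
  (stridents_demo$sonorance_alt == "strident") &
  !(stridents_demo$cmubet %in% c("F", "V"))

# Phones that would count as sibilant under this coding
stridents_demo$cmubet[stridents_demo$sibilant]
```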

Vowels

data_features_vowels is a dataframe with 17 rows and 12 variables.

knitr::kable(data_features_vowels)

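| phone | cmubet | wiscbet | hint | manner | manner_alt | tenseness | height | height_fct | backness | backness_fct | rounding |
|---|---|---|---|---|---|---|---|---|---|---|---|
| i | IY | i | beat | vowel | vowel | tense | high | high | front | front | unrounded |
| ɪ | IH | I | bit | vowel | vowel | lax | high | high | front | front | unrounded |
| eɪ | EY | eI | bait | vowel | vowel | tense | mid | mid | front | front | unrounded |
| ɛ | EH | E | bet | vowel | vowel | lax | mid | mid | front | front | unrounded |
| æ | AE | ae | bat | vowel | vowel | lax | low | low | front | front | unrounded |
| ʌ | AH | ^ | but | vowel | vowel | lax | mid | mid | central | central | unrounded |
| ə | AH | 4 | comma | vowel | vowel | lax | mid | mid | central | central | unrounded |
| u | UW | u | boot | vowel | vowel | tense | high | high | back | back | rounded |
| ʊ | UH | U | book | vowel | vowel | lax | high | high | back | back | rounded |
| oʊ | OW | oU | boat | vowel | vowel | tense | mid | mid | back | back | rounded |
| ɔ | AO | c | bought | vowel | vowel | tense | mid | mid | back | back | rounded |
| ɑ | AA | @ | bot | vowel | vowel | tense | low | low | back | back | unrounded |
| aʊ | AW | @U | bout | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
| aɪ | AY | @I | bite | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
| ɔɪ | OY | cI | boy | vowel | diphthong | diphthong | diphthong | NA | diphthong | NA | diphthong |
| ɝ | ER | 3^ | letter | vowel | r-colored | r-colored | mid | mid | central | central | r-colored |
| ɚ | ER | 4^ | burt | vowel | r-colored | r-colored | mid | mid | central | central | r-colored |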

Description of each column:

phone

phone in IPA

cmubet

phone in the CMU alphabet

wiscbet

phone in an older system used by our lab

hint

a word containing the selected vowel

manner

manner of articulation

manner_alt

alternative manner with vowel, diphthong and r-colored

tenseness

tense versus lax (versus diphthong and r-colored)

height

vowel height

height_fct

height coded as a factor ordered high, mid, low. diphthong is recoded to NA.

backness

vowel backness

backness_fct

backness coded as a factor ordered front, central, back. diphthong is recoded to NA.

rounding

unrounded versus rounded (versus diphthong and r-colored)

Levels of the factor columns:

data_features_vowels |>
  lapply(levels) |>
  Filter(length, x = _)
#> $height_fct
#> [1] "high" "mid"  "low" 
#> 
#> $backness_fct
#> [1] "front"   "central" "back"
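One use for these factor columns is arranging vowels in chart order. The sketch below rebuilds the two factors on a hand-copied subset of the vowel table, assuming plain `factor()` calls with the levels printed above, and sorts with base R `order()`; the `vowels_demo` name is made up for this example.

```r
# Toy subset copied by hand from the vowel table (not the package data)
vowels_demo <- data.frame(
  cmubet   = c("AA", "IY", "AY", "AH", "UW", "AE"),
  height   = c("low", "high", "diphthong", "mid", "high", "low"),
  backness = c("back", "front", "diphthong", "central", "back", "front")
)

# "diphthong" is not among the levels, so it is recoded to NA
vowels_demo$height_fct <- factor(
  vowels_demo$height,
  levels = c("high", "mid", "low")
)
vowels_demo$backness_fct <- factor(
  vowels_demo$backness,
  levels = c("front", "central", "back")
)

# Sort by height, then backness; order() puts NA rows (diphthongs) last
sorted <- vowels_demo[order(vowels_demo$height_fct, vowels_demo$backness_fct), ]
sorted$cmubet
```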

Considerations about vowel features

I don't consider /eɪ/ and /oʊ/ to be diphthongs, but perhaps manner_alt could encode the difference of these vowels from the others.

In the CMU alphabet and ARPAbet, vowels can include a number to indicate vowel stress, so AH1 or AH2 is /ʌ/ but AH0 is /ə/.
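A minimal sketch of handling the stress digit, assuming every label ends in a single digit (0 for unstressed, 1 or 2 for stressed). The variable names are made up for this example, and only the AH case described above is mapped to IPA.

```r
# CMU-style vowel labels with a trailing stress digit
labels <- c("AH0", "AH1", "AH2", "IY1", "ER0")

# Split each label into the bare symbol and its stress digit
symbol <- sub("[0-9]$", "", labels)
stress <- as.integer(substring(labels, nchar(labels)))

# Map AH to schwa when unstressed and to the STRUT vowel otherwise.
# "\u0259" is the schwa and "\u028c" is the turned v; other symbols are left as NA.
ipa <- ifelse(
  symbol == "AH",
  ifelse(stress == 0L, "\u0259", "\u028c"),
  NA_character_
)

data.frame(labels, symbol, stress, ipa)
```

Stripping the digit first also makes it possible to join stressed labels against the cmubet column, which stores the bare symbol.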

The vowel features for General American English, according to Wikipedia, are as follows:

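| | Front lax | Front tense | Central lax | Central tense | Back lax | Back tense |
|---|---|---|---|---|---|---|
| Close | ɪ | i | | | ʊ | u |
| Mid | ɛ | eɪ | ə | (ɜ) | | oʊ |
| Open | æ | | (ʌ) | ɑ | | (ɔ) |

Diphthongs: aɪ ɔɪ aʊ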

I adapted these features in this way:

  • tense and lax features were directly borrowed. Diphthongs and r-colored vowels were not assigned a tenseness.

  • /ʌ,ɔ/ raised to mid (following the general IPA chart)

  • /ɑ/ moved to back (following the general IPA chart)

  • diphthongs have no backness or height

  • r-colored vowels were given the backness and height of /ʌ,ə/

Based on the assumption that /ʌ,ə/ are the same general vowel with differing stress, these vowels have the same features. This definition clashes with the general IPA chart, which places /ʌ/ as a back vowel. However, /ʌ/ is a conventional notation. Quoting Wikipedia again: "Although the notation /ʌ/ is used for the vowel of STRUT in RP and General American, the actual pronunciation is closer to a near-open central vowel [ɐ] in RP and advanced back [ʌ̟] in General American." That is, /ʌ/ is fronted in American English (hence, central).