☢ Phones & Pronunciation ☢
☆ General Info & Tips ☆
This page goes over the list of phones included in the prewritten lists, as well as some that might be useful for others to include in their own voicebanks. The symbols used are in X-SAMPA, with minor adjustments for compatibility with UTAU and Windows file name restrictions, and generally correspond to contemporary vocals sung in General American English.
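For context on the file name restrictions: standard X-SAMPA writes [ɹ] as `r\` and vowel length with `:`, and Windows doesn't allow either of those characters (the full set is `< > : " / \ | ? *`) in file names. If you script your alias lists, a minimal Python 3.9+ sketch like the one below can flag symbols that would cause trouble. The function name and sample aliases are just for illustration, and it only checks characters, not other Windows rules such as reserved device names.

```python
# Characters Windows rejects in file names (control characters are also rejected).
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def unsafe_chars(alias: str) -> set[str]:
    """Return the characters in an alias that Windows would not allow in a file name."""
    return {ch for ch in alias if ch in WINDOWS_FORBIDDEN or ord(ch) < 32}


# Standard X-SAMPA spellings ("r\\" for [ɹ], "i:" for long [i]) vs. symbols used here.
for alias in ["r\\", "i:", "r", "i", "{~", "@r"]:
    bad = unsafe_chars(alias)
    status = "OK" if not bad else "contains " + " ".join(sorted(bad))
    print(f"{alias}: {status}")
```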
The reclists use phonemic transcription wherever possible; phonetic transcription is reserved for cases in which an allophonic distinction is beneficial for more natural synthesis. For example, the velarized lateral approximant [5] is only used in medial CVs and transitional VCs where its distinction from the non-velar lateral approximant [l] is relevant. Elsewhere, [l] is used regardless of how the phoneme may surface. Likewise, clusters such as /str/ will still be transcribed as [str] even if they surface phonetically as [ʃ͜tʃɹ].
Because of this, the actual sound you're uttering may not match the X-SAMPA character used to represent it. Rather than trying to force a "correct" pronunciation, you should instead try to match that symbol to how you would naturally produce the sound in the example word. For consonants this is relatively straightforward, though speakers of outer-circle dialects and L2 speakers may have some variation. For vowels, it can be kind of tricky to nail down, so if you want to be certain of how you personally should produce each vowel in the reclist, you can try the vowel quality self-assessment.
Thus, the reclist should generally work for speakers of different dialects and be able to accommodate more specific vocal styles. Essentially, as long as the voicebank is internally consistent, meaning a given symbol corresponds to the same phone throughout the voicebank, it should function just fine and produce natural-sounding results.
That being said, the additional phones included in the FULL list may not be the most relevant to people with dialects that are quite different from the perceived General American. In the future, I would like to develop lists that are optimized for other English varieties, but for now, if you want a voicebank better tailored to your own idiolect, I recommend customizing the included phones, as long as all of the required phones (i.e. those in the LITE list) are still present.
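If you're customizing the phone set, a quick sanity check is to compare it against the LITE requirements. The sketch below is a hypothetical helper, not part of the reclist files: the two sets are copied from the Required vowel and consonant tables on this page, and `custom` is a made-up example phone set.

```python
# Required (LITE) phones, copied from the Required tables on this page.
REQUIRED_VOWELS = {"@", "{", "A", "E", "I", "i", "O", "U", "u", "V",
                   "aI", "eI", "OI", "aU", "oU", "@r", "@l"}
REQUIRED_CONSONANTS = {"p", "b", "t", "d", "k", "g", "'", "tS", "dZ", "f", "v",
                       "T", "D", "s", "z", "S", "Z", "h", "m", "n", "N", "4",
                       "j", "w", "l", "r"}
REQUIRED = REQUIRED_VOWELS | REQUIRED_CONSONANTS


def missing_required(voicebank_phones: set[str]) -> set[str]:
    """Return the required phones that a customized phone set is missing."""
    return REQUIRED - voicebank_phones


# Hypothetical custom set: adds [Q] and [3r], but wrongly drops [O] instead of adding to it.
custom = (REQUIRED - {"O"}) | {"Q", "3r"}
print(sorted(missing_required(custom)))  # ['O']
```

Extra phones never show up as a problem here; only missing required ones do, which matches the idea that additions are fine as long as the LITE phones are all still present.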
Phone Distinction
If two sounds here are different and the phones are differentiated in the LITE core reclist:
They are separate phonemes and should both be included in your voicebank.
EX. FLEECE and THOUGHT are almost certainly going to be different vowels.
If two sounds here are different but the phones are not differentiated in the LITE core reclist:
They are either separate phonemes in your idiolect, they are allophones of one phoneme, or they exist in free variation. Either way, while not a requirement for the voicebank to function, including both may allow for more natural synthesis.
EX. [ɒ] is only present for some speakers, so someone with this vowel can choose whether or not to include it.
If two sounds here are the same and the phones are differentiated in the LITE core reclist:
They are considered separate phonemes by many native speakers, but they might not be for you, or they might sound so similar you can't tell the difference. I'd recommend still including both in your voicebank for ease-of-use, either as separate recordings or as duplicated lines with alternative aliases.
EX. If FLEECE and KIT both surface as [i], it can still be beneficial to distinguish them.
If two sounds here are the same and the phones are not differentiated in the LITE core reclist:
They do not need to be distinguished in your voicebank.
EX. If HAND and TRAP both surface as [æ], then you only need [{] in your voicebank because [{~] is an optional allophone. That is, unless you want to have duplicate samples for ease-of-use.
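The four cases above boil down to two yes/no questions, so they can be summarized as a tiny decision function. This is only a paraphrase of the guidance in Python; the function name and the return strings are invented for illustration.

```python
def distinction_advice(sound_the_same: bool, split_in_lite: bool) -> str:
    """Paraphrase of the four Phone Distinction cases."""
    if not sound_the_same and split_in_lite:
        return "Separate phonemes: include both."
    if not sound_the_same and not split_in_lite:
        return "Optional: including both may sound more natural."
    if sound_the_same and split_in_lite:
        return "Still include both, as separate recordings or duplicate aliases."
    return "No need to distinguish them (duplicates are optional)."


# FLEECE vs. KIT for a speaker who produces both as [i]; LITE keeps [i] and [I] apart.
print(distinction_advice(sound_the_same=True, split_in_lite=True))
# HAND vs. TRAP both as [æ]; LITE only has [{], and [{~] is optional.
print(distinction_advice(sound_the_same=True, split_in_lite=False))
```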
If you want to include any additional phones not covered by the symbols provided here, use their typical X-SAMPA character if possible, and try to match the notation style of the existing phones, such as using [h] to denote aspiration.
TL;DR: Most voicebank variation will be in the inclusion of additional phones not in the LITE core list. Your voicebank should be able to function with a UST made using only the required samples, and a UST tailored to your voicebank generally shouldn't require excessive editing to be used with a different one.
☆ Vowels ☆
Note about Schwas:
Schwa, /ə/ [@], can be one of the hardest vowels to parse for someone with no phonetic ear/mouth training, as it is often the reduced form of other vowels, and is phonetically quite similar to the vowels in STRUT and FOOT. That said, if you are a native English speaker, you almost certainly produce this vowel in unstressed environments, and its inclusion will help significantly in avoiding overpronunciation. The best tip I've found for producing schwa intentionally is to make the "laziest" vowel sound you can.
Note about LOT-THOUGHT (aka COT-CAUGHT) merging:
Even if you can't tell the difference between [O] and [A], you are very likely still producing [O] in some environments, so it is needed for the production of those words, especially in the absence of extra vowels like [Or] and [Ol].
Think of it like [Or] without the [r], like [OI] without the [I], or like a vowel halfway between [A] and [oU].
Note about R-Coloured and L-Coloured vowels:
I'm going to use the terms "R-Coloured" and "L-Coloured" here for clarity, but it's worth mentioning that the technical terms for these are rhotic vowels (as a result of a following rhotic approximant) and velarized vowels (as a result of a following velarized lateral approximant).
If you speak a non-rhotic variety of English (i.e., you don't pronounce the <r> at the end of a syllable), you still have R-Coloured vowels, though they will surface as long vowels or as centralizing diphthongs. This is largely why I am using this terminology. R-Colouring that results in long vowels like START /ɑː/ likely won't sound different in singing, so those don't need to be included, but you might choose to still include centralizing diphthongs like that in NEAR /ɪə/, in which case it should still be notated as [Ir] for ease-of-use.
Generally speaking, the ending consonants of these vowels are reduced when singing, so try to avoid overpronunciation for a more natural sound.
Required
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
@ | ə | ABOUT / COMMA | Schwa; very central and very lax |
{ | æ | TRAP / BATH | Can be tense or lax |
A | ɑ | FATHER / LOT / THOUGHT / BATH | See note above |
E | ɛ | DRESS | |
I | ɪ | KIT | |
i | i | FLEECE | Might surface as a diphthong |
O | ɔ | THOUGHT / CLOTH / HAWK | See note above |
U | ʊ | FOOT | |
u | u | GOOSE | Might surface as a diphthong |
V | ʌ | STRUT | |
aI | aɪ | PRICE / PRIZE | |
eI | eɪ | FACE | Phonemically /e/ |
OI | ɔɪ | CHOICE | |
aU | aʊ | CLOUT / CLOUD | |
oU | oʊ | GOAT | Phonemically /o/ |
@r | ɚ / ɹ̩ | NURSE / LETTER | R-Coloured central vowel or syllabic /ɹ/; can be tense or lax |
@l | ə̴ / l̩ | CIRCLE | L-Coloured central vowel or syllabic /l/ |
Recommended for GenAm
Optional allophones included in the FULL list for more natural pronunciation.
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
1 | ɨ | ROSES / TODAY | Centralized /i, ɪ/ or raised schwa |
a | a | PRICE | Reduced /aɪ/; /ɪ/ is weak or deleted |
{~ | æ̃ | HAND | Nasal /æ/ |
E~ | ɛ̃ | SANG | Nasal /e, ɛ/ |
I~ | ɪ̃ | SING | Nasal /i, ɪ/ |
Ar | ɑ˞ / ɑɹ | START | R-coloured /ɑ/ |
Er | ɛ˞ / ɛɹ | SQUARE | R-coloured /e, ɛ/ |
Ir | ɪ˞ / ɪɹ | NEAR | R-coloured /i, ɪ/ |
Or | ɔ˞ / ɔɹ | NORTH / FORCE | R-coloured /o, ɔ/ |
Al | ɑ̴ / ɑɫ | FALL | L-coloured /ɑ/ |
il | i̴ / iɫ | FEEL | L-Coloured /i/ |
Ol | ɔ̴ / ɔɫ | COLD | L-Coloured /ɔ/ |
ul | u̴ / uɫ | FOOL | L-Coloured /u/ |
mm | m̩ | RHYTHM | Syllabic /m/ |
nn | n̩ | BUTTON | Syllabic /n/ |
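One way to think about these optional FULL phones is as substitutions that fall back to a required phone when a voicebank doesn't have them, which is why a LITE-only voicebank still works. The sketch below assumes a simplified mapping keyed on the vowel and the following consonant, built loosely from the example words in the table above; the mapping and the helper name are hypothetical and ignore stress, syllable boundaries, and dialect.

```python
from typing import Optional

# Hypothetical, simplified mapping: (base vowel, following consonant) -> optional variant.
VARIANTS = {
    ("A", "r"): "Ar", ("E", "r"): "Er", ("I", "r"): "Ir", ("O", "r"): "Or",
    ("A", "l"): "Al", ("i", "l"): "il", ("O", "l"): "Ol", ("u", "l"): "ul",
    ("{", "n"): "{~", ("{", "m"): "{~", ("I", "N"): "I~",
}


def pick_vowel(vowel: str, next_consonant: Optional[str], available: set[str]) -> str:
    """Use an optional variant if the voicebank has it, else fall back to the base vowel."""
    variant = VARIANTS.get((vowel, next_consonant))
    if variant is not None and variant in available:
        return variant
    return vowel


lite = {"A", "{", "I"}
full = lite | {"Ar", "{~", "I~"}
print(pick_vowel("A", "r", full))  # Ar
print(pick_vowel("A", "r", lite))  # A (falls back to the required vowel)
```

Whether the following [r] or [l] then becomes a separate note or is absorbed into the coloured vowel depends on how your UST and oto are set up, which this sketch doesn't model.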
Other
Nonexhaustive list of vowels which are only applicable to non-American or specific regional dialects, are lower frequency by either dialect or production, or just aren't distinct enough for most speakers to warrant differentiation in the main lists.
It would be impossible for me to try to accommodate every possible vowel variation, so I strongly recommend researching your own dialect and testing your own vowels if you want to be precise.
Do not use these to replace required vowels! These are for differentiating these vowels from those already present in the reclist, for example using [3r] as an additional symbol to represent a tense variant of [@r], not in place of it.
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
} | ʉ | GOOSE / CUE | Centralized /u, ʊ/ or raised + rounded schwa |
3 | ɜ | NURSE | If distinguishing non-rhotic from /ɚ/ and /ɜɹ/ |
e | e | FACE | Reduced /eɪ/; /ɪ/ is weak or deleted |
o | o | GOAT | Reduced /oʊ/; /ʊ/ is weak or deleted |
Q | ɒ | LOT / CLOTH / HAWK | Rounded /ɑ/ |
VI | ʌɪ | PRICE | Raised /aɪ/; if diff from PRIZE |
VU | ʌʊ | CLOUT | Raised /aʊ/; if diff from CLOUD |
aU~ | aʊ̃ | POUND | Nasal /aʊ/ |
3r | ɝ / ɜɹ | NURSE | If distinguishing the tense variant of /ɚ/ |
ar | a˞ / aɹ | FIRE / HOUR | Reduced + R-coloured /aɪ/ or /aʊ/ |
or | o˞ / oɹ | FORCE | R-Coloured /o/; if diff from NORTH |
Ur | ʊ˞ / ʊɹ | CURE | R-Coloured /ʊ/; if diff from NURSE |
aIr | aɪ˞ / aɪɹ | FIRE | R-coloured /aɪ/ |
aUr | aʊ˞ / aʊɹ | HOUR | R-coloured /aʊ/ |
{l | æ̴ / æɫ | SCALP | L-Coloured /æ/ |
al | a̴ / aɫ | WHILE / OWL | Reduced + L-Coloured /aɪ/ or /aʊ/ |
El | ɛ̴ / ɛɫ | SHELF | L-Coloured /ɛ/ |
el | e̴ / eɫ | WHALE | L-Coloured /e/ |
Il | ɪ̴ / ɪɫ | MILK | L-Coloured /ɪ/ |
Ul | ʊ̴ / ʊɫ | FULL | L-Coloured /ʊ/; if diff from COLD and CIRCLE |
aIl | aɪ̴ / aɪɫ | WHILE | L-coloured /aɪ/ |
aUl | aʊ̴ / aʊɫ | OWL | L-coloured /aʊ/ |
NN | ŋ̩ | | Syllabic /ŋ/; doesn't occur phonemically but might be useful phonetically |
☆ Consonants ☆
Required
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
p | p | SPOT / POT / CUP | |
b | b | BOT / CUB | |
t | t | STOP / TOP / LIT | |
d | d | DOG / LID | |
k | k | SKILL / KILL / LACK | |
g | g | GILL / LAG | |
' | ʔ | UH-OH | Grouped with the vowel strings |
tS | t͜ʃ | CHEAP / MERCH | |
dZ | d͜ʒ | JEEP / MERGE | |
f | f | FILE / SAFE | |
v | v | VILE / SAVE | |
T | θ | THING / BATH | |
D | ð | THIS / BATHE | |
s | s | SIP / BUS | |
z | z | ZIP / BUZZ | |
S | ʃ | SHIP / BUSH | |
Z | ʒ | MEASURE / GENRE | |
h | h | HOP | Does not occur in codas |
m | m | MICE / SIM | |
n | n | NICE / SIN | |
N | ŋ | SING | Does not occur in onsets |
4 | ɾ | LADDER | Flapped /t, d/ |
j | j | YOUR | Does not occur in codas |
w | w | WORE | Does not occur in codas |
l | l | LORE / CALL | |
r | ɹ | ROAR / CAR | |
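The "Does not occur in codas/onsets" notes above amount to a small phonotactic check. A minimal sketch, assuming syllables have already been split into onset and coda lists of phone symbols; the helper name is hypothetical.

```python
# Restrictions taken from the Notes column of the Required consonants table.
NOT_IN_CODA = {"h", "j", "w"}
NOT_IN_ONSET = {"N"}


def phonotactic_problems(onset: list[str], coda: list[str]) -> list[str]:
    """List any required consonants that appear in a position the table rules out."""
    problems = [f"[{c}] in coda" for c in coda if c in NOT_IN_CODA]
    problems += [f"[{c}] in onset" for c in onset if c in NOT_IN_ONSET]
    return problems


print(phonotactic_problems(onset=["s", "t", "r"], coda=["N"]))  # [] (STRING)
print(phonotactic_problems(onset=["N"], coda=["h"]))  # ['[h] in coda', '[N] in onset']
```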
Recommended for GenAm
Optional allophones included in the FULL list for more natural pronunciation.
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
ph | pʰ | POT | Aspirated /p/; only occurs in initial position |
th | tʰ | TOP | Aspirated /t/; only occurs in initial position |
kh | kʰ | KILL | Aspirated /k/; only occurs in initial position |
C | ç | HUE | Palatal /h/; occurs before /i, ɪ, j/ |
5 | ɫ | CALL | Velarized /l/; occurs in codas |
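The Notes column above can be read as simple positional rules. The sketch below is a simplified, hypothetical reading of those notes: aspiration only on a plosive that begins the onset (so SPOT keeps plain [p]), [C] before /i, ɪ, j/, and [5] outside the onset. It falls back to the required phone when the optional one isn't recorded. Real distribution is messier than this, so treat it as a starting point rather than a rule set.

```python
from typing import Optional

ASPIRATED = {"p": "ph", "t": "th", "k": "kh"}
PALATAL_TRIGGERS = {"i", "I", "j"}  # /i, ɪ, j/ per the [C] row


def consonant_allophone(phone: str, in_onset: bool, first_in_onset: bool,
                        next_phone: Optional[str], available: set[str]) -> str:
    """Pick an optional FULL-list allophone, falling back to the required phone."""
    if phone in ASPIRATED and in_onset and first_in_onset:
        candidate = ASPIRATED[phone]  # POT -> [ph], but not the [p] in SPOT
    elif phone == "h" and next_phone in PALATAL_TRIGGERS:
        candidate = "C"  # HUE -> [C]
    elif phone == "l" and not in_onset:
        candidate = "5"  # CALL -> [5]
    else:
        candidate = phone
    return candidate if candidate in available else phone


print(consonant_allophone("t", True, True, "A", {"t", "th"}))  # th (TOP)
print(consonant_allophone("t", True, False, "A", {"t", "th"}))  # t (STOP)
```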
Other
X-SAMPA | IPA | Pronunciation | Notes |
---|---|---|---|
4~ | ɾ̃ | TWENTY | Flapped /n/ |
x | x | LOCH | Typically only in loanwords; often merged with /k/ |
W | ʍ | WHAT | Only retained in some dialects; often merged with /w/ |
☆ Consonant Clusters ☆
High-frequency clusters are those that are found in many words or words that occur commonly. They are included in the LITE list cluster add-on.
Low-frequency clusters are those that are found in uncommon words or only in specific varieties of English. They are included in the FULL list cluster add-on.
Clusters that are hypothetically possible according to English phonotactics but that aren't found in any known words are not included, but can still be formed from the regular consonant blends.
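Cluster strings in the tables below are written as plain runs of X-SAMPA symbols (e.g. `ntSt` is [n tS t]), so the two-character affricates [tS] and [dZ] need to be read as single phones. A minimal sketch, assuming greedy longest-match splitting over the required consonants and assuming a "blend" is the transition between two adjacent consonants; the helper names are hypothetical.

```python
# Required consonants that can appear inside clusters (from the Required table).
CONSONANTS = {"p", "b", "t", "d", "k", "g", "tS", "dZ", "f", "v", "T", "D",
              "s", "z", "S", "Z", "h", "m", "n", "N", "j", "w", "l", "r"}


def split_cluster(cluster: str) -> list[str]:
    """Split a cluster string like 'ntSt' into phones, preferring 2-character symbols."""
    phones, i = [], 0
    while i < len(cluster):
        for size in (2, 1):
            piece = cluster[i:i + size]
            if len(piece) == size and piece in CONSONANTS:
                phones.append(piece)
                i += size
                break
        else:  # no symbol matched at this position
            raise ValueError(f"unknown symbol at {cluster[i:]!r}")
    return phones


def blends(cluster: str) -> list[tuple[str, str]]:
    """Adjacent consonant pairs that would chain together to form the cluster."""
    phones = split_cluster(cluster)
    return list(zip(phones, phones[1:]))


print(split_cluster("ntSt"))  # ['n', 'tS', 't']
print(blends("str"))  # [('s', 't'), ('t', 'r')]
```

Greedy matching can't tell a [d] + [Z] sequence from the affricate [dZ]; the tables here don't seem to need that distinction, but it is a limitation of this sketch.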
High-Frequency Onset Clusters
Type | Clusters | Example Words |
---|---|---|
C + approximant | pl bl kl gl fl sl | play blood clue glue flood sled |
| | pr br tr dr kr gr fr Tr Sr | pride brown tree dream crow grass free three shrimp |
| | pj bj kj fj vj hj mj | pure beauty cute few view hue music |
| | tw kw sw | twin queen swim |
sibilant + C (+ approximant) | sp st sk sm sn | spin stim skill small snail |
| | spl spr str skr skj skw | split spring string scream skew squeeze |
Low-Frequency Onset Clusters
Type | Clusters | Example Words |
---|---|---|
C + approximant | vl Tl Sl | Vlad thlipsis schlep |
| | vr sr | vroom syringe |
| | tj dj gj tSj dZj Tj sj zj Sj nj lj | Tuesday due argue chew jury thew suit Zeus sure new lieu |
| | pw bw dw gw Tw Sw nw | puissance boire dwarf Guam thwart schwa noir |
sibilant + C (+ approximant) | sf sT Sp St Sm Sn | sphere sthenic spiel shtick schmuck schnoz |
| | skl sfr spj stj smj | sclera sphragistics spew stew smew |
High-Frequency Coda Clusters
Type | Clusters | Example Words |
---|---|---|
C (+ C) + plosive | sp mp pt kt ft tSt st St nt bd gd vd dZd zd md nd Nd sk Nk | cusp camp accept act laughed latched fast dashed dent grabbed bagged saved edged paused slammed hand hanged ask think |
| | spt mpt skt Nkt kst nst ntSt ndZd | clasped camped asked thanked risked danced inched hinged |
C + affricate | ntS ndZ | inch hinge |
C (+ C) + fricative | ps ts ks fs Ts ns bz dz gz vz mz nz Nz | laps cats sacks laughs paths dance tabs dads bags saves hams pens things |
| | sps mps pts kts fts sts nts sks Nks ndz | clasps camps accepts acts lifts lists ants asks thanks hands |
approximant + C (+ C) | lp lt ld lk lf lv lT ls lz lm | help salt held milk elf shelve health else sells film |
| | lpt lkt lvd lmd | helped milked shelved filmed |
| | lps lts lks ldz lvz lmz | helps hilts milks folds shelves films |
| | rp rt rd rk rtS rdZ rf rv rT rs rz rS rm rn rl | harp heart board fork arch barge scarf carve earth curse tears harsh harm turn curl |
| | rpt rkt rtSt rft rTt rst rSt rdZd rvd rmd rnd rld | harped forked arched scarfed earthed first harshed barged carved harmed turned curled |
| | rps rts rks rfs rTs rdz rvz rmz rns rlz | harps hearts forks scarfs earths herds carves harms turns curls |
Low-Frequency Coda Clusters
Type | Clusters | Example Words |
---|---|---|
C (+ C) + plosive | Tt mt Nt Dd Zd Ng | smithed dreamt instinct bathed collaged thing |
| | mft pst tst mst Nst | triumphed lapsed amidst glimpsed jinxed |
C (+ C) + fricative | mf pT dT kT fT mT nT NT ms Ns | triumph depth width sixth fifth warmth month length glimpse jinx |
| | mfs pTs dTs kTs fTs mTs nTs NTs mts Nts Ngz | triumphs depths widths sixths fifths warmths months lengths tempts instincts things |
approximant + C (+ C) | lb ltS ldZ lS ln | bulb mulch bulge Welsh kiln |
| | ltSt lft lTt lst lSt lbd ldZd lnd | mulched shelft stealthed pulsed Welshed bulbed bulged kilned |
| | lfs lTs lbz lnz | elf's tilths bulbs kilns |
| | rb rg rD | barb morgue birth |
| | rbd rDd | barbed birthed |
| | rbz rgz rDz | barbs morgues births |
☆ Phone Conversion Chart ☆
Quick reference for converting phones between different vocal synth transcription systems. Only phones with equivalents across systems are included.
Salem's X-SAMPA | Delta's X-SAMPA | Vocaloid SAMPA | ARPAbet | CZ's Notation | IPA |
---|---|---|---|---|---|
@ | @ | @ | ax | x | ə |
@l | @l | | el | | ə̴ / l̩
@r | 3 | @r | er | 3 | ɚ / ɹ̩ |
{ | { | { | ae | @ | æ |
{~ | & | | | | æ̃
1 | | | ix | | ɨ
4 | 4 | | dx | dd | ɾ
A | A | @ | aa | a | ɑ |
aI | aI | aI | ay | I | aɪ |
aU | aU | aU | aw | 8 | aʊ |
b | b | b | b | b | b |
D | D | D | dh | dh | ð |
d | d | d | d | d | d |
dZ | dZ | dZ | jh | j | d͜ʒ |
E | E | e | eh | e | ɛ |
eI | eI | eI | ey | A | eɪ |
f | f | f | f | f | f |
g | g | g | g | g | g |
h / C | h | h / C | hh | h / hh | h / ç |
I | I | I | ih | i | ɪ |
I~ | 1 | | | | ɪ̃
i | i | i: | iy | E | i |
j | j | j | y | y | j |
k / kh | k | k / kh | k | k | k / kʰ |
l / 5 | l | l / l0 | l | l | l / ɫ
m | m | m | m | m | m |
N | N | N | ng | ng | ŋ |
n | n | n | n | n | n |
O | O | O: | ao | 9 | ɔ |
OI | OI | OI | oy | Q | ɔɪ |
oU | oU | @U | ow | O | oʊ |
p / ph | p | p / ph | p | p | p / pʰ |
r | r | r | r | r | ɹ |
S | S | S | sh | sh | ʃ |
s | s | s | s | s | s |
T | T | T | th | th | θ |
t / th | t | t / th | t | t | t / tʰ |
tS | tS | tS | ch | ch | t͜ʃ |
U | U | U | uh | 6 | ʊ |
u | u | u: | uw | o | u |
V | V | V | ah | u | ʌ |
v | v | v | v | v | v |
w | w | w | w | w | w |
Z | Z | Z | zh | zh | ʒ |
z | z | z | z | z | z |
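If you need to convert transcriptions programmatically, the chart translates directly into a lookup table. The sketch below covers only the Salem's X-SAMPA and ARPAbet columns for rows where the chart gives a clear one-to-one ARPAbet value. It raises an error for phones like [Q] or ['] that have no listed ARPAbet equivalent rather than guessing. The names are hypothetical.

```python
# Subset of the chart: Salem's X-SAMPA -> ARPAbet (allophones map to their base ARPAbet phone).
SALEM_TO_ARPABET = {
    "@": "ax", "@r": "er", "{": "ae", "A": "aa", "aI": "ay", "aU": "aw",
    "E": "eh", "eI": "ey", "I": "ih", "i": "iy", "O": "ao", "OI": "oy",
    "oU": "ow", "U": "uh", "u": "uw", "V": "ah",
    "tS": "ch", "dZ": "jh", "S": "sh", "Z": "zh", "T": "th", "D": "dh",
    "N": "ng", "j": "y", "h": "hh", "C": "hh", "r": "r",
    "ph": "p", "th": "t", "kh": "k", "5": "l",
}
# Symbols the chart writes identically in both columns.
SAME_IN_BOTH = set("pbtdkgfvszmnlw")


def to_arpabet(phones: list[str]) -> list[str]:
    """Convert a list of Salem's X-SAMPA phones to ARPAbet, failing on unlisted phones."""
    out = []
    for p in phones:
        if p in SALEM_TO_ARPABET:
            out.append(SALEM_TO_ARPABET[p])
        elif p in SAME_IN_BOTH:
            out.append(p)
        else:
            raise KeyError(f"no ARPAbet equivalent listed for [{p}]")
    return out


print(to_arpabet(["ph", "I", "N", "k"]))  # ['p', 'ih', 'ng', 'k'] (PINK)
```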