☢ Phones & Pronunciation ☢

☆ General Info & Tips ☆

This page goes over the list of phones included in the prewritten lists, as well as some that might be useful for others to include in their own voicebanks. The symbols used are in X-SAMPA, with minor adjustments for compatibility with UTAU and Windows file name restrictions, and generally correspond to contemporary vocals sung in General American English.

The reclists use phonemic transcription wherever possible; phonetic transcription is reserved for cases in which an allophonic distinction is beneficial for more natural synthesis. For example, the velarized lateral approximant [5] is only used in medial CVs and transitional VCs where its distinction from the non-velar lateral approximant [l] is relevant. Elsewhere, [l] is used regardless of how the phoneme may surface. Likewise, clusters such as /str/ will still be transcribed as [str] even if they surface phonetically as [ʃ͜tʃɹ].

Because of this, the actual sound you're uttering may not match the X-SAMPA character used to represent it. Rather than trying to force a "correct" pronunciation, you should instead try to match that symbol to how you would naturally produce the sound in the example word. For consonants this is relatively straightforward, though speakers of outer-circle dialects and L2 speakers may have some variation. For vowels, it can be kind of tricky to nail down, so if you want to be certain of how you personally should produce each vowel in the reclist, you can try the vowel quality self-assessment.

Thus, the reclist should generally be compatible with speakers of different dialects and able to accommodate more specific vocal styles. Essentially, as long as the voicebank is internally consistent, meaning a given symbol corresponds to the same phone throughout the voicebank, it should function just fine and produce naturalistic results.

That being said, the additional phones included in the FULL list may not be the most relevant to people with dialects that are quite different from the perceived General American. In the future, I would like to develop lists that are optimized for other English varieties, but for now, if you want a voicebank better tailored to your own idiolect, I recommend customizing the included phones, as long as all of the required phones (i.e. those in the LITE list) are still present.
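As a rough illustration of that last requirement, the sketch below checks that a customized phone set still contains every required phone. The `REQUIRED_PHONES` set is only a sample of vowel symbols taken from this page and assumed to be required, not the full LITE list, and `missing_required` is a hypothetical helper name.

```python
# Illustrative subset only: a handful of vowel symbols from this page,
# assumed here to be required. The real LITE list is longer.
REQUIRED_PHONES = {"@", "{", "A", "i", "O", "u", "eI", "oU", "@r", "@l"}


def missing_required(custom_phones, required=REQUIRED_PHONES):
    """Return the required phones absent from a customized phone set, sorted."""
    return sorted(set(required) - set(custom_phones))


# A customized set that adds the optional [Q] and [3r] but forgets [O].
custom = {"@", "{", "A", "i", "u", "eI", "oU", "@r", "@l", "Q", "3r"}
print(missing_required(custom))  # ['O']
```

An empty result means the customized set is still safe to use with a UST written for the required phones only.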

Phone Distinction

If two sounds here are different and the phones are differentiated in the LITE core reclist:

They are separate phonemes and should both be included in your voicebank.
EX. FLEECE and THOUGHT are almost certainly going to be different vowels.

If two sounds here are different but the phones are not differentiated in the LITE core reclist:

They are either separate phonemes in your idiolect, allophones of one phoneme, or sounds that exist in free variation. Either way, while not a requirement for the voicebank to function, including both may allow for more natural synthesis.
EX. [ɒ] is only present for some speakers, so someone with this vowel can choose whether or not to include it.

If two sounds here are the same and the phones are differentiated in the LITE core reclist:

They are considered separate phonemes by many native speakers, but they might not be for you, or they might sound so similar you can't tell the difference. I'd recommend still including both in your voicebank for ease-of-use, either as separate recordings or as duplicated lines with alternative aliases.
EX. If FLEECE and KIT both surface as [i], it can still be beneficial to distinguish them.

If two sounds here are the same and the phones are not differentiated in the LITE core reclist:

They do not need to be distinguished in your voicebank.
EX. If HAND and TRAP both surface as [æ], then you only need [{] in your voicebank because [{~] is an optional allophone. That is, unless you want to have duplicate samples for ease-of-use.
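The four cases above boil down to two yes/no questions, which can be sketched as a small lookup. This is only a summary aid: the function name and the wording of the returned strings are assumptions made for the example.

```python
def recommendation(sounds_differ: bool, differentiated_in_lite: bool) -> str:
    """Map the two yes/no questions above to this page's advice."""
    if differentiated_in_lite:
        if sounds_differ:
            return "required: separate phonemes, include both"
        return "recommended: include both, as separate recordings or duplicated aliases"
    if sounds_differ:
        return "optional: including both may allow more natural synthesis"
    return "not needed: no distinction required"


# FLEECE vs THOUGHT: different sounds, differentiated in the LITE list.
print(recommendation(True, True))
# HAND vs TRAP both surfacing as [æ]: same sound, not differentiated.
print(recommendation(False, False))
```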

If you want to include any additional phones not covered by the symbols provided here, use their typical X-SAMPA character if possible, and try to match the notation style of the existing phones, such as using [h] to denote aspiration.

TL;DR: Most voicebank variation will be in the inclusion of additional phones not in the LITE core list. Your voicebank should be able to function with a UST made using only the required samples, and a UST tailored to your voicebank generally shouldn't require excessive editing to be used with a different one.

☆ Vowels ☆

Note about Schwas:

Schwa, /ə/ [@], can be one of the hardest vowels to parse for someone with no phonetic ear/mouth training, as it is often the reduced form of other vowels and is phonetically quite similar to the vowels in STRUT and FOOT. That said, if you are a native English speaker, you almost certainly produce this vowel in unstressed environments, and its inclusion will help significantly with overpronunciation. The best tip I've found for producing schwa intentionally is to make the "laziest" vowel sound you can.

Note about LOT-THOUGHT (aka COT-CAUGHT) merging:

Even if you can't tell the difference between [O] and [A], you are very likely still producing [O] in some environments, so it is needed for the production of those words, especially in the absence of extra vowels like [Or] and [Ol].

Think of it like [Or] without the [r], like [OI] without the [I], or like a vowel halfway between [A] and [oU].

Note about R-Coloured and L-Coloured vowels:

I'm going to use the terms "R-Coloured" and "L-Coloured" here for clarity, but it's worth mentioning that the technical terms for these are rhotic vowels (as a result of a following rhotic approximant) and velarized vowels (as a result of a following velarized lateral approximant).

If you speak a non-rhotic variety of English (i.e., you don't pronounce the <r> at the end of a syllable), you still have R-Coloured vowels, though they will surface as long vowels or as centralizing diphthongs. This is largely why I am using this terminology. R-Colouring that results in long vowels, like START /ɑ:/, likely won't sound different in singing, so those don't need to be included, but you might still choose to include centralizing diphthongs like the one in NEAR /ɪə/, in which case it should still be notated as [Ir] for ease-of-use.

Generally speaking, the ending consonants of these vowels are reduced when singing, so try to avoid overpronunciation for a more natural sound.


@ | ə | ABOUT / COMMA | Schwa; very central and very lax
{ | æ | TRAP / BATH | Can be tense or lax
A | ɑ | FATHER / LOT / THOUGHT / BATH | See note above
i | i | FLEECE | Might surface as a diphthong
O | ɔ | THOUGHT / CLOTH / HAWK | See note above
u | u | GOOSE | Might surface as a diphthong
eI | eɪ | FACE | Phonemically /e/
oU | oʊ | GOAT | Phonemically /o/
@r | ɚ / ɹ̩ | NURSE / LETTER | R-Coloured central vowel or syllabic /ɹ/; can be tense or lax
@l | ə̴ / l̩ | CIRCLE | L-Coloured central vowel or syllabic /l/

Recommended for GenAm

Optional allophones included in the FULL list for more natural pronunciation.

1 | ɨ | ROSES / TODAY | Centralized /i, ɪ/ or raised schwa
a | a | PRICE | Reduced /aɪ/; /ɪ/ is weak or deleted
{~ | æ̃ | HAND | Nasal /æ/
E~ | ɛ̃ | SANG | Nasal /e, ɛ/
I~ | ɪ̃ | SING | Nasal /i, ɪ/
Ar | ɑ˞ / ɑɹ | START | R-coloured /ɑ/
Er | ɛ˞ / ɛɹ | SQUARE | R-coloured /e, ɛ/
Ir | ɪ˞ / ɪɹ | NEAR | R-coloured /i, ɪ/
Or | ɔ˞ / ɔɹ | NORTH / FORCE | R-coloured /o, ɔ/
Al | ɑ̴ / ɑɫ | FALL | L-coloured /ɑ/
il | i̴ / iɫ | FEEL | L-Coloured /i/
Ol | ɔ̴ / ɔɫ | COLD | L-Coloured /ɔ/
ul | u̴ / uɫ | FOOL | L-Coloured /u/
m | m | RHYTHM | Syllabic /m/
n | n | BUTTON | Syllabic /n/


Nonexhaustive list of vowels which are only applicable to non-American or specific regional dialects, are lower frequency by either dialect or production, or just aren't distinct enough for most speakers to warrant differentiation in the main lists.

It would be impossible for me to accommodate every possible vowel variation, so I strongly recommend researching your own dialect and testing your own vowels if you want to be precise.

Do not use these to replace required vowels! They are for differentiating these vowels from those already present in the reclist. For example, [3r] can serve as an additional symbol for a tense variant of [@r], but should not be used in place of it.

} | ʉ | GOOSE / CUE | Centralized /u, ʊ/ or raised + rounded schwa
3 | ɜ | NURSE | If distinguishing non-rhotic from /ɚ/ and /ɜɹ/
e | e | FACE | Reduced /eɪ/; /ɪ/ is weak or deleted
o | o | GOAT | Reduced /oʊ/; /ʊ/ is weak or deleted
Q | ɒ | LOT / CLOTH / HAWK | Rounded /ɑ/
VI | ʌɪ | PRICE | Raised /aɪ/; if diff from PRIZE
VU | ʌʊ | CLOUT | Raised /aʊ/; if diff from CLOUD
aU~ | aʊ̃ | POUND | Nasal /aʊ/
3r | ɝ / ɜɹ | NURSE | If distinguishing the tense variant of /ɚ/
ar | a˞ / aɹ | FIRE / HOUR | Reduced + R-coloured /aɪ/ or /aʊ/
or | o˞ / oɹ | FORCE | R-Coloured /o/; if diff from NORTH
Ur | ʊ˞ / ʊɹ | CURE | R-Coloured /ʊ/; if diff from NURSE
aIr | aɪ˞ / aɪɹ | FIRE | R-coloured /aɪ/
aUr | aʊ˞ / aʊɹ | HOUR | R-coloured /aʊ/
{l | æ̴ / æɫ | SCALP | L-Coloured /æ/
al | a̴ / aɫ | WHILE / OWL | Reduced + L-Coloured /aɪ/
El | ɛ̴ / ɛɫ | SHELF | L-Coloured /ɛ/
el | e̴ / eɫ | WHALE | L-Coloured /e/
Il | ɪ̴ / ɪɫ | MILK | L-Coloured /ɪ/
Ul | ʊ̴ / ʊɫ | FULL | L-Coloured /ʊ/; if diff from COLD and CIRCLE
aIl | aɪ̴ / aɪɫ | WHILE | L-coloured /aɪ/
aUl | aʊ̴ / aʊɫ | OWL | L-coloured /aʊ/
NNŋ̩Syllabic /ŋ/; doesn't occur phonemically but might be useful phonetically

☆ Consonants ☆


' | ʔ | UH-OH | Grouped with the vowel strings
h | h | HOP | Does not occur in codas
N | ŋ | SING | Does not occur in onsets
4 | ɾ | LADDER | Flapped /t, d/
j | j | YOUR | Does not occur in codas
w | w | WORE | Does not occur in codas

Recommended for GenAm

Optional allophones included in the FULL list for more natural pronunciation.

ph | pʰ | POT | Aspirated /p/; only occurs in initial position
th | tʰ | TOP | Aspirated /t/; only occurs in initial position
kh | kʰ | KILL | Aspirated /k/; only occurs in initial position
C | ç | HUE | Palatal /h/; occurs before /i, ɪ, j/
5 | ɫ | CALL | Velarized /l/; occurs in codas


4~ | ɾ̃ | TWENTY | Flapped /n/
x | x | LOCH | Typically only in loanwords; often merged with /k/
W | ʍ | WHAT | Only retained in some dialects; often merged with /w/

☆ Consonant Clusters ☆

High-frequency clusters are those that are found in many words or words that occur commonly. They are included in the LITE list cluster add-on.

Low-frequency clusters are those that are found in uncommon words or only in specific varieties of English. They are included in the FULL list cluster add-on.

Clusters that are hypothetically possible according to English phonotactics but that aren't found in any known words are not included, but can still be formed from the regular consonant blends.
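To see which individual consonants a cluster is built from, a cluster string can be split with a greedy longest-match pass. This is a minimal sketch: the symbol inventory below is an assumption covering the plain consonants and the two affricates used in the cluster tables, and `split_cluster` is a hypothetical helper, not part of any reclist tooling.

```python
# Multi-character symbols must be tried before single characters.
# Only the affricates are listed; this inventory is an assumption for the sketch.
MULTI_CHAR = ("tS", "dZ")
SINGLE_CHAR = set("pbtdkgfvTDszSZhmnNlrjw")


def split_cluster(cluster):
    """Split an X-SAMPA cluster string into a list of consonant symbols."""
    phones = []
    i = 0
    while i < len(cluster):
        for symbol in MULTI_CHAR:
            if cluster.startswith(symbol, i):
                phones.append(symbol)
                i += len(symbol)
                break
        else:
            # No multi-character symbol matched at this position.
            if cluster[i] not in SINGLE_CHAR:
                raise ValueError(f"unknown symbol {cluster[i]!r} in {cluster!r}")
            phones.append(cluster[i])
            i += 1
    return phones


print(split_cluster("ndZd"))  # ['n', 'dZ', 'd']
print(split_cluster("skw"))   # ['s', 'k', 'w']
```

Aspirated symbols such as [ph] are deliberately left out, since they would be ambiguous with a plain consonant followed by [h].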

High-Frequency Onset Clusters

 | Clusters | Example Words
C + approximant | pl bl kl gl fl sl | play blood clue glue flood sled
 | pr br tr dr kr gr fr Tr Sr | pride brown tree dream crow grass free three shrimp
 | pj bj kj fj vj hj mj | pure beauty cute few view hue music
 | tw kw sw | twin queen swim
sibilant + C (+ approximant) | sp st sk sm sn | spin stim skill small snail
 | spl spr str skr skj skw | split spring string scream skew squeeze

Low-Frequency Onset Clusters

 | Clusters | Example Words
C + approximant | vl Tl Sl | Vlad thlipsis schlep
 | vr sr | vroom syringe
 | tj dj gj tSj dZj Tj sj zj Sj nj lj | Tuesday due argue chew jury suit Zeus sure new lieu
 | pw bw dw gw Tw Sw nw | puissance boire dwarf Guam thwart schwa noir
sibilant + C (+ approximant) | sf sT Sp St Sm Sn | sphere sthenic spiel shtick schmuck schnoz
 | skl sfr spj stj smj | sclera sphragistics spew stew smew

High-Frequency Coda Clusters

 | Clusters | Example Words
C (+ C) + plosive | sp mp pt kt ft tSt st St nt bd gd vd dZd zd md nd Nd sk Nk | cusp camp accept act laughed latched fast dashed dent grabbed bagged saved edged paused slammed hand hanged ask think
 | spt mpt skt Nkt kst nst ntSt ndZd | clasped camped asked thanked risked danced inched hinged
C + affricate | ntS ndZ | inch hinge
C (+ C) + fricative | ps ts ks fs Ts ns bz dz gz vz mz nz Nz | laps cats sacks laughs paths dance tabs dads saves hams pens things
 | sps mps pts kts fts sts nts sks Nks ndz | clasps camps accepts acts lifts lists ants asks thanks hands
approximant + C (+ C) | lp lt ld lk lf lv lT ls lz lm | help salt held milk elf shelve health else sells film
 | lpt lkt lvd lmd | helped milked shelved filmed
 | lps lts lks ldz lvz lmz | helps hilts milks folds shelves films
 | rp rt rd rk rtS rdZ rf rv rT rs rz rS rm rn rl | harp heart board fork arch barge scarf carve earth curse tears harsh harm turn curl
 | rpt rkt rtSt rft rTt rst rSt rdZd rvd rmd rnd rld | harped forked arched scarfed earthed first harshed barged carved harmed turned curled
 | rps rts rks rfs rTs rdz rvz rmz rns rlz | harps hearts forks scarfs earths herds carves harms turns curls

Low-Frequency Coda Clusters

 | Clusters | Example Words
C (+ C) + plosive | Tt mt Nt Dd Zd Ng | smithed dreamt instinct bathed collaged thing
 | mft pst tst mst Nst | triumphed lapsed amidst glimpsed jinxed
C (+ C) + fricative | mf pT dT kT fT mT nT NT ms Ns | triumph depth width sixth fifth warmth month length glimpse jinx
 | mfs pTs dTs kTs fTs mTs nTs NTs mts Nts Ngz | triumphs depths widths sixths fifths warmths months lengths tempts instincts things
approximant + C (+ C) | lb ltS ldZ lS ln | bulb mulch bulge Welsh kiln
 | ltSt lft lTt lst lSt lbd ldZd lnd | mulched shelft stealthed Welshed bulbed bulged kilned
 | lfs lTs lbz lnz | elf's tilths bulbs kilns
 | rb rg rD | barb morgue birth
 | rbd rDd | barbed birthed
 | rbz rgz rDz | barbs morgues births

☆ Phone Conversion Chart ☆

Quick reference for converting phones between different vocal synth transcription systems. Only phones with equivalents across systems are included.

Salem's X-SAMPA | Delta's X-SAMPA | Vocaloid SAMPA | ARPAbet | CZ's Notation | IPA
@l@lel̴ə / l̩
@r | 3 | @r | er | 3 | ɚ / ɹ̩
h / C | h | h / C | hh | h / hh | h / ç
k / kh | k | k / kh | k | k | k / kʰ
l / 5 | l | l0 / l | l | l | l / ɫ
p / ph | p | p / ph | p | p | p / pʰ
t / th | t | t / th | t | t | t / tʰ
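For converting a UST or phone list programmatically, the chart can be turned into a simple lookup. The sketch below covers only the Salem's X-SAMPA to ARPAbet direction, using the rows above as read here; the dictionary and the `to_arpabet` helper are hypothetical names, and the mapping should be checked against the chart before being relied on.

```python
# Salem's X-SAMPA -> ARPAbet, transcribed from the chart above.
# Allophones collapse onto one ARPAbet symbol, so the mapping is lossy.
SALEM_TO_ARPABET = {
    "@r": "er",
    "h": "hh", "C": "hh",
    "k": "k", "kh": "k",
    "l": "l", "5": "l",
    "p": "p", "ph": "p",
    "t": "t", "th": "t",
}


def to_arpabet(phones):
    """Convert a list of Salem's X-SAMPA phones; raises KeyError on unknown symbols."""
    return [SALEM_TO_ARPABET[p] for p in phones]


print(to_arpabet(["kh", "@r", "5"]))  # ['k', 'er', 'l']
```

Because several allophones share one ARPAbet symbol, converting back in the other direction would need extra context, such as position in the syllable.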