☢ How to Use UTAU ☢

Before we jump into making USTs, first make sure you're on Mode 2, which is the improved note editing mode.

Next, for now, make sure the P and ~ buttons in the bottom left of the window are deselected; this will make it easier to see what we're doing.

I'm not going to go over literally every function of UTAU here; some I feel are relatively intuitive, such as clicking Delete to delete notes, and others exist but aren't really necessary to know when you're just starting out. Even if you aren't planning on making your own USTs from scratch, you'll likely come across ones made by others that you'll want to make adjustments to.

If you're looking for an easy-to-use voicebank to help get you acquainted with the software, I actually created one for this purpose back in 2021.

☆ Tempo, Length, & Pitch ☆

To change the tempo of the UST, click on the number next to Tempo, enter the desired BPM (beats per minute) in the pop-up window, and click OK. The untranslated text in that window just reads "cancel".

Quantization refers to the interval that note lengths will snap to. In other words, if it is set to L8 8th note, drawing notes or editing note lengths will snap them to eighth-note intervals. The default value of L32 is usually fine, but lowering it might be helpful if you're worried about the timing getting off, and raising it is useful for when timing needs to be more precise.
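If it helps to see the snapping as arithmetic, here is a minimal sketch. It assumes UTAU's usual resolution of 480 ticks per quarter note, and the function names are made up for illustration; they aren't part of UTAU.

```python
# Assumption: 480 ticks per quarter note, so a whole note is 1920 ticks.
TICKS_PER_QUARTER = 480


def grid_ticks(division):
    """Length in ticks of one 1/division note (8 means an eighth note)."""
    return TICKS_PER_QUARTER * 4 // division


def snap(length_ticks, division):
    """Round a length to the nearest grid step, never shorter than one step."""
    step = grid_ticks(division)
    steps = (length_ticks + step // 2) // step  # integer round-half-up
    return max(1, steps) * step


print(grid_ticks(8))   # 240
print(grid_ticks(32))  # 60
print(snap(100, 32))   # 120
```

With a coarser setting like L8, a sloppy 250-tick drag lands on a clean 240-tick eighth note; with L32, the grid is fine enough to keep lengths like 120 ticks.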

Next to quantization, we can also set the default note length. With the pencil tool selected, clicking once on the piano roll will create a note of this length at the indicated pitch. Clicking and dragging will allow you to set the note length as you draw it.

To resize an existing note, click and drag the right boundary. If there is another note after it, a normal click+drag will only extend it up to the point where that note starts, or will create a rest note between it and the following note. SHIFT+click+drag will extend or contract the note length while shifting the positions of all following notes, and CTRL+click+drag will resize the following note to compensate without moving the rest of the UST.
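To make the difference between the SHIFT and CTRL drags concrete, here is a rough model that treats a phrase as a plain list of note lengths in ticks. This is only a sketch of the behavior described above, not how UTAU works internally, and the function names are hypothetical.

```python
def resize_shift(lengths, index, new_length):
    """SHIFT+drag: only this note's length changes; later notes slide along."""
    result = list(lengths)
    result[index] = new_length
    return result


def resize_ctrl(lengths, index, new_length):
    """CTRL+drag: the following note absorbs the change, so the total is unchanged."""
    if index + 1 >= len(lengths):
        raise ValueError("no following note to compensate")
    delta = new_length - lengths[index]
    if lengths[index + 1] - delta <= 0:
        raise ValueError("following note would have no length left")
    result = list(lengths)
    result[index] = new_length
    result[index + 1] -= delta
    return result


phrase = [480, 480, 960]
print(resize_shift(phrase, 0, 720))  # [720, 480, 960]
print(resize_ctrl(phrase, 0, 720))   # [720, 240, 960]
```

Notice that the SHIFT version makes the whole phrase longer, while the CTRL version keeps the total length the same, which is why it doesn't move the rest of the UST.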

Unlike most other MIDI software, UTAU handles rests by treating them as a special note type rather than empty space between notes. To create a rest note at the default length, click the rest note button next to the lyric box. These can also be resized by adjusting the right boundary, but they behave slightly differently; a normal click and drag will behave like a CTRL+click for a regular note, and dragging either a rest or a regular note over the entirety of a following rest will delete it.

Finally, you can right click on a note and select Length to type in a numeric value yourself. I don't recommend doing this until you're familiar with how these values correspond to musical length (e.g. quarter notes, eighth notes, etc.), because entering the wrong value here can throw off the timing of the entire UST. You generally shouldn't need to get more precise than the highest level of quantization will allow anyway.

To change the pitch of a note, you can click and drag it up and down on the piano roll. Dragging left and right will change its position in the sequence. To select multiple adjacent notes, you can SHIFT+click the first and last note, highlight the desired region with the cursor tool, or press CTRL+A to select every note in the UST. If an M is present in this section of the window, this means that you can move all of the notes in a selection together. If there isn't an M here, click the area to enable it (or disable it if it is active). This only works for changing pitch, though, not position.

Additionally, you can transpose an entire selection of notes by navigating to Edit > Move Region Up(U) to move it up by one semitone, Edit > Move Region Down(D) to move it down by one semitone, or Edit > Move Region By Number(N) to enter a specific number of semitones. Entering 12 will transpose the selection up by a whole octave, and -12 down a whole octave. If you're doing a cover, unless you're planning to transpose the instrumental track as well, stick to multiples of 12 here, or else your vocals will be in the wrong key for the song.
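Here is a small sketch of what transposing does to the note numbers. It assumes MIDI-style numbering where 60 is C4; the helper names are made up for this example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note_num):
    """Name a note number, assuming MIDI-style numbering where 60 is C4."""
    return f"{NOTE_NAMES[note_num % 12]}{note_num // 12 - 1}"


def transpose(note_nums, semitones):
    """Move every note in the selection by the same number of semitones."""
    return [n + semitones for n in note_nums]


def keeps_key(semitones):
    """Only whole octaves (multiples of 12) leave the key unchanged."""
    return semitones % 12 == 0


melody = [60, 64, 67]
print([note_name(n) for n in transpose(melody, 12)])  # ['C5', 'E5', 'G5']
print(keeps_key(-12), keeps_key(5))                   # True False
```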

☆ Lyrics ☆

Differences in Input Based on Voicebank Type

The nomenclature here is a little misleading. As explained earlier, UTAU does not have a dictionary, and will not recognize orthographic input. It will only recognize the labels of the phonetic samples that are indicated in the loaded voicebank's configuration file, known as the oto.ini. These labels are called aliases.

So, rather than inputting whole words into each note, we must instead input the alias of the sample we want it to sing. For example, if we want it to sing the word 「木」(ki; tree), we'll need to input whatever alias the voicebank uses for the sounds of the word. In a typical Japanese CV, this will be either the hiragana character [き], or the romaji characters [ki], not the kanji. Multisyllabic words like 「猫」(neko; cat) will have one mora on each of two separate notes, e.g. [ね][こ] or [ne][ko].

For a VCV voicebank, things are slightly more complicated, but not significantly so. If a sample is preceded by a rest note, it begins with a hyphen and a space, [- CV], and if it comes after another sample, it begins with the vowel of the previous diphone and a space, [V CV]. For example, 「猫」(neko; cat) becomes [- ね][e こ].

Note: Japanese VCV voicebanks are almost always written with hiragana diphones. Using romaji would make it pretty difficult to find compatible USTs / voicebanks.

As part of a larger phrase like 「この猫は…」(kono neko wa...; this cat is...), we get something like [- こ][o の][o ね][e こ][o わ]. See how the [ね] (ne) still connects with the previous sample even though it is a separate word, and how the particle (wa) is written phonetically as [わ] even though it is spelled with the character 「は」 (ha).
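The VCV rule can be written out as a short sketch. The vowel table below only covers the kana in this example, and using "R" to mark a rest is an assumption for illustration; conversion plugins do this with a full table.

```python
# Tiny illustrative table: the vowel each kana ends in.
VOWEL_OF = {"こ": "o", "の": "o", "ね": "e", "わ": "a"}


def cv_to_vcv(lyrics, rest="R"):
    """Prefix each CV lyric with '- ' after a rest, or with the previous vowel."""
    result = []
    previous = None  # None means the phrase is starting (or follows a rest)
    for lyric in lyrics:
        if lyric == rest:
            result.append(lyric)
            previous = None
            continue
        head = "-" if previous is None else VOWEL_OF[previous]
        result.append(f"{head} {lyric}")
        previous = lyric
    return result


print(cv_to_vcv(["こ", "の", "ね", "こ", "わ"]))
# ['- こ', 'o の', 'o ね', 'e こ', 'o わ']
```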

CVVC voicebanks work very similarly to VCV, except that the transitional samples must also be split into their own notes. So, 「この猫は…」(kono neko wa...; this cat is...) becomes [こ][o n][の][o n][ね][e k][こ][o w][わ]. For languages with ending consonants (codas) such as English, the VC is also split into its own note, so "cat", transcribed in X-SAMPA, becomes something like [k{][{ t], but those are beyond the scope of this tutorial.
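For the Japanese case, the CVVC split can be sketched like this. The table only covers the kana in the example, and the sketch ignores note timing entirely (in a real UST the VC note takes its length from the end of the note before it).

```python
# Tiny illustrative table: (consonant, vowel) for each kana in the example.
SOUNDS = {"こ": ("k", "o"), "の": ("n", "o"), "ね": ("n", "e"), "わ": ("w", "a")}


def cv_to_cvvc(lyrics):
    """Insert a [V C] transition between each pair of neighboring CV lyrics."""
    result = []
    for i, lyric in enumerate(lyrics):
        result.append(lyric)
        if i + 1 < len(lyrics):
            vowel = SOUNDS[lyric][1]
            next_consonant = SOUNDS[lyrics[i + 1]][0]
            result.append(f"{vowel} {next_consonant}")
    return result


print(cv_to_cvvc(["こ", "の", "ね", "こ", "わ"]))
# ['こ', 'o n', 'の', 'o n', 'ね', 'e k', 'こ', 'o w', 'わ']
```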


Prefixes and Suffixes

For multipitch and multiexpression voicebanks, different versions of the same sample are differentiated by the use of prefixes and suffixes, which are simply additional characters added to the beginning or end of the alias. For example, a multipitch voicebank might have a [ねC3] sample and a [ねE3] sample to indicate what pitch each [ね] was recorded at.

These might not be necessary to type in yourself if the voicebank has a prefix map — a configuration file that tells UTAU what prefix or suffix to default to for a lyric based on the pitch of the note. I'm not going to cover making or editing these in this tutorial, but it's good to know they exist.
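Conceptually, the lookup works something like the sketch below. The ranges and suffixes here are hypothetical and do not reflect the actual prefix map file format; they just show the idea of picking a suffix from the note's pitch (with 60 as C4, C3 is note 48).

```python
# Hypothetical ranges: (highest note number covered, suffix to append).
SUFFIX_RANGES = [(49, "C3"), (127, "E3")]


def resolve_alias(lyric, note_num):
    """Append the suffix of the first range that covers this note's pitch."""
    for highest_note, suffix in SUFFIX_RANGES:
        if note_num <= highest_note:
            return lyric + suffix
    return lyric  # no range matched, so leave the lyric as typed


print(resolve_alias("ね", 45))  # ねC3
print(resolve_alias("ね", 55))  # ねE3
```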


Entering Lyrics into a UST

The main way of inputting and editing lyrics is to simply double click on a note and type it into the text box. Alternatively, you can use the text box labeled Lyric at the top of the screen.

To use the lyric box, you'll type in the aliases of each sample you want to call. If they are written only in kana, you can just type them with no spaces, e.g. このねこわ. If they are written in romaji and do not contain special characters, you can write them separated by single spaces, e.g. ko no ne ko wa. For other aliases (like VCVs), you can enclose each alias in "quotes", e.g. "- こ""o の""o ね""e こ""o わ".
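If you're preparing lyrics outside of UTAU, the three cases above can be sketched as a small helper that builds the text to paste into the lyric box. The function name is made up, and the kana check is a simple approximation based on Unicode ranges.

```python
import re

KANA = re.compile(r"[\u3041-\u3096\u30a1-\u30fa\u30fc]+")  # hiragana, katakana, ー
PLAIN_ROMAJI = re.compile(r"[a-z]+")


def lyric_box_text(aliases):
    """Build the lyric box text for a list of aliases, following the three cases."""
    if all(KANA.fullmatch(a) for a in aliases):
        return "".join(aliases)                 # kana only: no spaces
    if all(PLAIN_ROMAJI.fullmatch(a) for a in aliases):
        return " ".join(aliases)                # plain romaji: single spaces
    return "".join(f'"{a}"' for a in aliases)   # anything else: quote each alias


print(lyric_box_text(["ko", "no", "ne", "ko", "wa"]))  # ko no ne ko wa
print(lyric_box_text(["- こ", "o の"]))                 # "- こ""o の"
```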

Using this method, you can use the substitute lyrics button to apply these lyrics to a selected group of notes on the piano roll, or you can use the insert lyrics button to create five new notes at the default length with these lyrics. This can be useful for batch editing.


Lyric Editing & Conversion Plugins

These can be pretty handy, especially when editing existing USTs for a different voicebank.

Simple Plugins:

  • Roman to Kana by yuuboku — basic romaji → hiragana diphone conversion. Can be easily edited for other conversions.
  • Roman to Kana Switch by Rai — edit of Roman to Kana that converts both ways.
  • Suffix Selector by nmasao1 — allows for editing the prefixes and suffixes of multiple notes at the same time without having to manually type them.

More Complex Plugins:

  • Iroiro by bizz — very useful plugin with many functions, including many types of lyric conversion. Interface can be set to English (if it isn't by default). Might give this its own tutorial later.
  • autoCVVC by delta_kuro — another useful plugin for lyric conversion. Its main function is to automatically split notes to create VCs, but it is imperfect, so they usually need to be edited. Also probably deserves its own tutorial.

☆ Envelope & Crossfade ☆

The envelope of the note is the visual representation of its length and volume from start to finish. Due to the configuration settings, as you'll see later, the place where the sample starts is not exactly the same as the place where the note starts. Crossfade refers to the intersection of two notes, where one fades out as the other fades in.

To see the note envelopes on the piano roll, click the ~ button in the bottom left of the screen. To view and manually edit the envelope of an individual note, right click on it and select Envelope. I'll be honest, though; I hardly ever need to do this, thanks to other more straightforward ways of envelope editing.

To adjust intensity (volume) manually, you can click and drag the top boundary of a note, or you can right click on it and select Property to open the Note Properties window. In this window, you can type in a value of 0 to 200 (100 is default, 0 is silent).

You can also adjust the modulation (stability) here from 0 to 100, where 0 is no pitch modulation (max stabilization) between notes and 100 is maximum pitch modulation (no stabilization). I recommend manually setting this to 0 most of the time to ensure stable transitions. To have the modulation visible underneath each note on the piano roll, click the P button on the bottom left of the screen.

To automatically crossfade notes, select two or more notes on the piano roll and click either the p2p3 or p1p4 buttons on the top right of the toolbar. I typically use p2p3, but it doesn't matter that much. This ensures smooth crossfading from one note into the next.

To reset the envelopes, hit the reset button (this doesn't reset intensity, though). I don't really use the others, though if you accidentally click ACPT, you may need to open the Note Properties window and clear the Preutterance and Overlap settings to be sure this doesn't interfere with automatic crossfading.

☆ Pitchbends ☆

Pitchbending is often what is being referred to when vocal synth users talk about tuning, although that can also include all means of UST editing. The gist is that the red lines on the piano roll represent the specific pitch that the sample is tuned to at a given moment regardless of what the base note is set to.

While the western classical scale consists of only twelve notes at different octaves, the reality is there are many many pitches between these set frequencies, all of which can be captured when humans sing. Think of the note like a target that the pitchbends can move around; our brains fill in the average frequency and register it as a single pitch.
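To put numbers on those in-between pitches: in standard equal temperament with A4 tuned to 440 Hz, each semitone is divided into 100 cents. This sketch is general music math rather than anything UTAU-specific, and it assumes MIDI-style note numbers where 69 is A4.

```python
def frequency_hz(note_num, cents=0.0):
    """Frequency of a note (69 = A4 = 440 Hz) bent by some number of cents."""
    return 440.0 * 2 ** ((note_num - 69 + cents / 100) / 12)


print(round(frequency_hz(69), 2))       # 440.0
print(round(frequency_hz(60), 2))       # 261.63
print(round(frequency_hz(69, 50), 2))   # halfway between A4 and A#4
```

A bend of 100 cents lands exactly on the next semitone, and everything in between is a pitch that has no key on the piano roll.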

Vibrato is a singing technique that involves rapidly moving the pitch of one's voice up and down on top of the target note. Portamento (for our purposes) is the pitch transition from one note into the next, capturing all of the frequency intervals between targets; both of these are visualized in UTAU by the use of pitchbends.

To enable or disable both of these on one or more notes, right click on a selection and go to Pitch to open the Pitch Control window, and check or un-check the box next to each word. For most notes on a given UST, portamento should be enabled to ensure smooth pitch transitions rather than sharp jumps, and vibrato should typically be disabled, since most contemporary singing styles do not implement it on every single note. Adding vibrato to every note will make the voice sound warbly and unstable.

While you can also use this window to edit the numerical values, I personally find pitchbends easier to edit on the piano roll; just make sure that the ~ button is pressed so that they are visible. I'm not going to go over all of the different ways that the pitchbends can be edited here, because there are a lot of different tuning techniques out there. I recommend playing around with the boundaries and anchor points yourself to help get a feel for what they do.

☆ Using Existing USTs ☆

When you're opening up a UST for the first time or for a new project, you'll need to set the voicebank, resampler, and wavtool you want to use it with, though remember you can always go back and change these settings later.

Once these are set and I've loaded the UST, the first thing I do is press CTRL+A on my keyboard to select all the notes, right click on the selection, and go to Region Property to open up a Note Properties window that will affect the entire selection. I set Intensity to 100, Modulation to 0, and clear all other boxes. You don't have to do this, technically, but with how different each voicebank is, settings intended for other voicebanks can make it sound funky. At the very least, be sure to clear the Preutterance and Overlap as well as the Flags, as those are particularly voicebank-specific.
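In effect, the cleanup does something like the sketch below to each note. The dictionary keys are my assumption about how these properties are named, chosen for illustration; in practice you'd just do this through the Note Properties window.

```python
# Assumed property names; only these three survive the cleanup untouched.
KEEP = ("Length", "Lyric", "NoteNum")


def reset_note(note):
    """Keep timing, lyric, and pitch; set Intensity/Modulation; drop everything else."""
    cleaned = {key: note[key] for key in KEEP if key in note}
    cleaned["Intensity"] = 100
    cleaned["Modulation"] = 0
    return cleaned


note = {"Length": 480, "Lyric": "ね", "NoteNum": 60,
        "Intensity": 80, "Modulation": 30,
        "PreUtterance": 60, "VoiceOverlap": 20, "Flags": "g-5"}
print(reset_note(note))
# {'Length': 480, 'Lyric': 'ね', 'NoteNum': 60, 'Intensity': 100, 'Modulation': 0}
```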

Next, if the UST isn't in an ideal range for the voicebank I'm using, I transpose it up or down an octave. Typically this is lowering a female part for a male voice to sing or raising a male part for a female voice to sing.

After that, if necessary, I convert the lyrics to match the type of voicebank I'm using, often with the Iroiro plugin (mentioned earlier). If I'm converting to CVVC, I usually split and time all the VCs myself, though sometimes I use autoCVVC (also mentioned earlier), but that's getting into more complicated territory.

Finally, I go through the UST and make other adjustments to the tuning and lyrics as needed to match the samples included in the voicebank and to make use of other features the voicebank may have.

If I want to tune from scratch, I reset all of the pitchbends by pressing CTRL+A to select everything, right clicking on the selection, going to Pitch, unchecking and rechecking Portamento, and unchecking Vibrato.