☢ All About Voicebanks ☢

☆ What are Voicebanks? ☆

Voicebank, or 音源 (ongen) in Japanese, is a catchall term for the library of sound files that a synth engine will sample from in order to render vocals. A given UTAU character, or UTAUloid, will have one or more voicebanks associated with it.

There are also derivative UTAUloids, which are derived from existing characters using non-standard render settings. A common example is a genderbend like Ted Kasane, who is a derivative of Teto rendered to sound more masculine. Some UTAU have officially recognized derivatives, and others will typically outline their policy on derivative creation in their Terms of Use (TOU) / End User License Agreement (EULA).

The TOU for an UTAU are usually found in the readme file of the voicebank, and cover things such as commercial usage and content permissions. It's a good idea to read these before using a particular voicebank to avoid issues.

☆ Voicebank Classification ☆

Language Support

When categorizing voicebanks, the most obvious thing to look at is language support. While a lot of voicebanks will offer some amount of multilingual compatibility, and you can brute force one to sing just about anything, it's often easier to look for a voicebank that's designed specifically for the language you need.

For vocal synth fans, this will commonly be Japanese, which is one of the easier languages to work with since it's not very phonologically complex, the methodology is really standardized, and it's what UTAU was built for.

Using voicebanks for other languages can be more complicated, especially for phonologically complex languages like English. While some established methodologies exist for other languages, there is a lot less standardization, so these voicebanks can be more difficult to work with and more difficult to create. Vanilla UTAU requires manual phonetic transcription; this is less of an issue in OpenUTAU thanks to the existence of Phonemizers, but even these likely won't be able to fully capture all the phonetic nuances.

Japanese voicebanks will typically indicate the characters used in the voicebank's aliases (what is used to write lyrics in the software) and encoding (how the sound files are named). These will either be in kana (often strictly hiragana), which means they're written using Japanese characters, or romaji, which means they're written using the Roman alphabet (same as English). Kana is most common for aliasing, and while it's definitely useful to learn how to read it, there are romaji ↔ hiragana conversion plugins you can use, as I'll go over later, and a built-in conversion feature in OpenUTAU.
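To give a rough idea of what a romaji ↔ hiragana conversion does, here is a minimal Python sketch. The table only covers a handful of syllables, and the function names are made up for this example; actual plugins and OpenUTAU's own conversion feature cover far more cases and may work differently.

```python
# A tiny, incomplete romaji-to-hiragana table (for illustration only).
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}

# The reverse table is built from the forward one.
HIRAGANA_TO_ROMAJI = {kana: romaji for romaji, kana in ROMAJI_TO_HIRAGANA.items()}


def to_hiragana(lyrics):
    """Convert a list of per-note romaji lyrics; unknown lyrics are left as-is."""
    return [ROMAJI_TO_HIRAGANA.get(lyric, lyric) for lyric in lyrics]


def to_romaji(lyrics):
    """Convert a list of per-note hiragana lyrics; unknown lyrics are left as-is."""
    return [HIRAGANA_TO_ROMAJI.get(lyric, lyric) for lyric in lyrics]


print(to_hiragana(["ka", "sa", "ne"]))  # ['か', 'さ', 'ね']
print(to_romaji(["て", "と"]))          # ['te', 'to']
```

Because each note holds one lyric, a plain per-note lookup is enough for this sketch.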


Voicebank Labels

There are three overarching kinds of labels you'll see applied to voicebanks: CV, VCV, and CVVC. The C's stand for consonant, the V's stand for vowel, and the order of them indicates the main types of voice samples the voicebank contains.

CV voicebanks, called 単独音 (tandoku-on) in Japanese, are the smallest and most basic type of voicebank. They consist primarily of consonant-vowel pairs called diphones, and are typically the easiest to use. However, they won't give you the most naturalistic vocals, and are often described as sounding "choppy" and "robotic" — but that doesn't mean they are inherently poor quality or not worth using. You may also see voicebanks labeled as れんたん (rentan), but this is just a specific recording method for CV; I'll go over what this means in a future section.

Next we have VCV voicebanks, or 連続音 (renzoku-on). These are the largest voicebank type, and contain not only every CV, but also every vowel + diphone combination, hence the name. These function by blending the leading vowel of a sample with the ending vowel of the previous sample, creating a more natural transition between notes.
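As a rough illustration of the "leading vowel" idea, the Python sketch below turns a list of romaji syllables into VCV-style lyrics. It assumes one common aliasing convention, where each lyric is written as the previous sound, a space, and the current syllable, with "-" marking the start of a phrase. That convention and the function name are assumptions for this example, so check how your particular voicebank is aliased.

```python
def to_vcv_aliases(syllables):
    """Prefix each romaji syllable with the final sound of the one before it.

    Assumes every syllable ends in a vowel or is a lone "n",
    and that "-" marks the first note of a phrase.
    """
    aliases = []
    previous = "-"
    for syllable in syllables:
        aliases.append(f"{previous} {syllable}")
        previous = syllable[-1]  # the sound the next sample will blend from
    return aliases


print(to_vcv_aliases(["ka", "sa", "ne"]))  # ['- ka', 'a sa', 'a ne']
```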

And finally we have CVVC voicebanks. These may also be labeled as CVC, VCCV, or pseudo-VCV, but fundamentally these are all different techniques for the same voicebank type: one built on both CV samples and VC samples. For many languages, the VCs are necessary to capture the final consonant of a given syllable, but for languages built on open syllables, like Japanese, a VC can act as the transition point, much like the leading vowel of a VCV. You can think of it as a VCV sample split in half, with the consonant as the blending point.
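To picture the "VCV split in half" idea, here is a small Python sketch that inserts a VC transition between romaji syllables. The alias format (vowel, space, consonant) is a convention I'm assuming for the example, and the function name is made up; real CVVC voicebanks and plugins may label and place these samples differently.

```python
VOWELS = "aiueo"


def to_cvvc_aliases(syllables):
    """Insert a VC transition between each CV syllable and the next one.

    Assumes romaji syllables that end in a vowel (or a lone "n").
    No VC is added when the next syllable starts with a vowel.
    """
    aliases = []
    for index, syllable in enumerate(syllables):
        aliases.append(syllable)
        if index + 1 < len(syllables):
            # Strip the trailing vowel to get the next syllable's consonant.
            next_consonant = syllables[index + 1].rstrip(VOWELS)
            if next_consonant:
                aliases.append(f"{syllable[-1]} {next_consonant}")
    return aliases


print(to_cvvc_aliases(["ka", "sa", "ne"]))  # ['ka', 'a s', 'sa', 'a n', 'ne']
```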

Another term you will frequently come across is multipitch (as compared to monopitch). Sometimes these are labeled something like dipitch, tripitch, powerscale, or キレ (kire), but these are all different classifications of the same idea. Multipitch works by recording the same samples in, well, multiple pitches, and configuring the voicebank so that when a sample falls on a specific musical note, the voicebank will render it from the sound file that corresponds with that pitch. Because a voicebank will sound clearer and more natural the closer the notes it's singing are to the pitch it was recorded at, this can create more natural-sounding voicebanks overall and give the voicebank a wider ideal range.
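The pitch selection described above can be sketched in Python as "use whichever recorded pitch is closest to the note". This is only an approximation of the idea: in a real voicebank the mapping is whatever the creator configured, and it doesn't have to follow a nearest-pitch rule. The recorded pitches and function names below are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_number(note):
    """Turn a name like 'F#3' into a semitone number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    return (octave + 1) * 12 + NOTE_NAMES.index(name)


def pick_recorded_pitch(note, recorded_pitches):
    """Return the recorded pitch closest to the note being sung."""
    target = note_to_number(note)
    return min(recorded_pitches, key=lambda p: abs(note_to_number(p) - target))


recorded = ["A3", "C4", "F4"]  # hypothetical tripitch voicebank
for note in ["G3", "D4", "G4"]:
    print(note, "->", pick_recorded_pitch(note, recorded))
```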

Building off of this, we also have multiexpression voicebanks, which are recorded in more than one vocal style but can be rendered together in the software. These may or may not also be multipitch.


Vocal Range

This brings us to vocal range. One of the appeals of vocal synths is that their range can easily extend past that of human vocalists. But while we can technically render any UTAU at any note from C1 to B7, we typically don't want to force them too low or too high, or they start to sound distorted and hard to listen to. So, whether you're creating a cover or an original track, you'll generally want to stick to a note range that's well suited to the voicebank. We can also transpose the song if necessary (that is, move every note up or down by the same number of semitones), but we'll go over how to do that later.
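Transposition itself is simple arithmetic, which the Python sketch below illustrates. It is not how you transpose inside UTAU or OpenUTAU; it only shows what "move every note by the same amount" means. The function names are made up for the example, and the C1 to B7 limits come from the range mentioned above.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_number(note):
    """Turn a name like 'F#3' into a semitone number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    return (octave + 1) * 12 + NOTE_NAMES.index(name)


def number_to_note(number):
    """Turn a semitone number back into a name like 'F#3'."""
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


LOWEST = note_to_number("C1")
HIGHEST = note_to_number("B7")


def transpose(notes, semitones):
    """Shift every note by the same number of semitones (negative = down)."""
    shifted = []
    for note in notes:
        number = note_to_number(note) + semitones
        if not LOWEST <= number <= HIGHEST:
            raise ValueError(f"{note} would leave the C1 to B7 range")
        shifted.append(number_to_note(number))
    return shifted


print(transpose(["C4", "E4", "G4"], 2))    # ['D4', 'F#4', 'A4']
print(transpose(["C4", "E4", "G4"], -12))  # ['C3', 'E3', 'G3']
```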

Most humans have a range of about two octaves, or 24 semitones, but naturally there's a lot of variation. Oftentimes a voicebank will list out what pitches it was recorded at, as well as an approximate range of notes it sounds best in, but if not we can pretty easily estimate its ideal range. A monopitch voicebank will likely sound best in a single-octave range with its recorded pitch in the middle. For example, a voicebank recorded at C4 will likely sound best from about F#3 to F#4, AKA half an octave (6 semitones) in either direction, though of course this is all subjective. Similarly, if you're working with a multipitch UTAU with a wide range of recorded pitches, a good estimate for its ideal range runs from half an octave below its lowest pitch to half an octave above its highest.
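The half-octave rule of thumb above is easy to work out by hand, but here it is as a short Python sketch in case that's clearer. The function names are made up for the example, and the result is only the rough estimate described above, not a measurement of how a voicebank actually sounds.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_number(note):
    """Turn a name like 'F#3' into a semitone number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    return (octave + 1) * 12 + NOTE_NAMES.index(name)


def number_to_note(number):
    """Turn a semitone number back into a name like 'F#3'."""
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


def estimate_ideal_range(recorded_pitches, margin=6):
    """Extend half an octave (6 semitones) below the lowest recorded
    pitch and half an octave above the highest."""
    numbers = [note_to_number(pitch) for pitch in recorded_pitches]
    return number_to_note(min(numbers) - margin), number_to_note(max(numbers) + margin)


print(estimate_ideal_range(["C4"]))              # ('F#3', 'F#4')
print(estimate_ideal_range(["A3", "D4", "G4"]))  # ('D#3', 'C#5')
```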

We can also approach this from the angle of western classical ranges, and indeed some UTAU may be labeled with these terms. Note that, while there's some overlap in terminology, these are not necessarily the same as choral harmony parts.

The classical male ranges are Bass at the lowest (E2~E4), Baritone in the middle (A2~A4), and Tenor at the highest (C3~C5). So, generally, a masculine voicebank should be rendered between about E2 and C5.

For the classical female ranges, we have Contralto at the lowest (E3~E5), Mezzo-Soprano in the middle (A3~A5), and Soprano at the highest (C4~C6). The recommended range for a feminine UTAU is about E3 to C6, one octave higher than for masculine UTAUs.

Other ranges you might see are Contrabass, which is extremely low (C2~C4), Coloratura Soprano, which is extremely high (E4~E6), and Countertenor, which is the label for any masculine voice higher than Tenor, typically the Contralto range (E3~E5). Contralto as a label may also refer to any feminine voice lower than Mezzo-Soprano.

But know that these aren't hard rules by any means, just general classifications. A lot of UTAUs' ranges will exceed two octaves, and there are plenty of UTAU which don't fit the gendered schema here, whether because the UTAU in question is trans or nonbinary, their voice provider is, or they're just voiced by someone of a different gender. All of these are pretty common in the community.
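If it helps to see the classical ranges listed above side by side, here is a small Python lookup that reports which labels fully cover a given span of notes. The ranges are copied from this section; the function names are made up, and as noted, the labels are general classifications rather than rules.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Ranges as listed in this section. Countertenor is left out because
# it typically shares the Contralto range.
CLASSICAL_RANGES = {
    "Contrabass": ("C2", "C4"),
    "Bass": ("E2", "E4"),
    "Baritone": ("A2", "A4"),
    "Tenor": ("C3", "C5"),
    "Contralto": ("E3", "E5"),
    "Mezzo-Soprano": ("A3", "A5"),
    "Soprano": ("C4", "C6"),
    "Coloratura Soprano": ("E4", "E6"),
}


def note_to_number(note):
    """Turn a name like 'F#3' into a semitone number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    return (octave + 1) * 12 + NOTE_NAMES.index(name)


def ranges_covering(lowest, highest):
    """List every label whose range contains both the lowest and highest note."""
    low, high = note_to_number(lowest), note_to_number(highest)
    return [
        label
        for label, (start, end) in CLASSICAL_RANGES.items()
        if note_to_number(start) <= low and high <= note_to_number(end)
    ]


print(ranges_covering("D3", "A4"))  # ['Baritone', 'Tenor']
```

A song spanning more than two octaves will return an empty list, which matches the point that many UTAU fall outside these labels.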

☆ Finding and Installing Voicebanks ☆

Voicebank Databases

There are lots of ways to find voicebanks you may want to use, and there are literally thousands of them out there. You may come across one by chance, such as from a song or cover you like, but there are also a few different databases you can browse or search:

  • UTAU Wiki 2.0 (EN) — This is the one I recommend most for English speakers, because it's the easiest to search, and the page format is consistent and easy to navigate. Here, you can search for voicebanks that are labeled with specific tags, such as voicebank type, range, language support, and country of origin. You can also browse the entire database alphabetically.
  • UTAU Fandom Wiki (EN) — At time of writing, it's going through some remodeling, so the page format can be a bit inconsistent. It's probably the largest database, but not all of the pages have been kept up with over the years. Like Wiki 2.0, you can also search or browse voicebanks by specific tags.
  • UtaForum Showcase (EN) — This is the smallest database, and mostly used by forum regulars, but it's still a good way to find new voicebanks to use, especially if you're looking for more recent ones.
  • UTAU Visual Archive (JP) — Large database of UTAU with visual references. You can browse specific categories based on character traits.
  • UTAU DB (JP) — Nice search function (if you know Japanese), but not the best for general browsing.
  • Vocaloid Database (EN) — Also covers producers and other synths, but there are lots of UTAU catalogued on this site as well.

If you already know what UTAU you want to use, however, it's usually just a matter of typing their name into a search engine. A good number of UTAU, such as mine, even have their own webpages, which may be more up-to-date than their wikis.


Downloading Voicebanks

For UTAU:

Once you've located the download link for the voicebank you want to use, you can open it up and download the file. UTAU voicebanks usually come in .zip or .rar files, but occasionally you might find one that's stored in an executable. For convenience, I usually download them directly into the voice folder within my UTAU program folder.

Next, I open up the file in WinRAR to see whether the creator zipped the entire folder, or if the voicebank contents are loose inside of the archive. If the entire folder is preserved, I right click the file and select Extract Here. If not, I select Extract to [filename].
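If you'd rather check this without opening the archive by hand, the same question can be answered with a short Python sketch using the standard zipfile module. It only handles .zip files (Python's standard library can't read .rar), and the function names are made up for this example.

```python
import zipfile
from pathlib import PurePosixPath


def is_wrapped_in_folder(member_names):
    """True if every entry in the archive sits inside one top-level folder."""
    names = [name for name in member_names if name.strip("/")]
    top_levels = {PurePosixPath(name).parts[0] for name in names}
    if len(top_levels) != 1:
        return False  # several loose files or folders at the top level
    # A lone loose file also yields one top-level name, so make sure each
    # entry is either the folder itself or something inside it.
    return all(name.endswith("/") or len(PurePosixPath(name).parts) > 1 for name in names)


def archive_is_wrapped(zip_file):
    """Check a .zip archive, given as a path or an open binary file."""
    with zipfile.ZipFile(zip_file) as archive:
        return is_wrapped_in_folder(archive.namelist())
```

A result of True corresponds to the Extract Here case above, and False to Extract to [filename].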

After the voicebank folder is extracted, I open it up to make sure everything looks good inside of it. The root folder of the voicebank — that is, the one that is stored directly inside of the voice folder — should have at the very least two text files inside of it: character.txt and readme.txt. character.txt is the file that registers and labels the voicebank inside of UTAU. The root folder may also contain the .wav files and oto.ini (configuration file), or these may be found in one or more subfolders within the voicebank. There may also be other files or folders included, but don't worry about those for now.
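The checks described above can also be sketched as a short Python function. This is just my checklist in code form, not an official validation: the function name and the wording of the messages are made up, and a voicebank that fails one of these checks may still work.

```python
from pathlib import Path


def check_voicebank(root_folder):
    """Return a list of things that look off about an extracted voicebank."""
    root = Path(root_folder)
    problems = []

    # The two text files expected directly inside the root folder.
    for required in ("character.txt", "readme.txt"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")

    # oto.ini and the .wav files may sit in the root or in any subfolder.
    if not any(root.rglob("oto.ini")):
        problems.append("no oto.ini found")
    if not any(path.suffix.lower() == ".wav" for path in root.rglob("*")):
        problems.append("no .wav files found")

    return problems
```

Calling check_voicebank on the extracted folder and getting an empty list back means everything on the checklist was found.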

If you open the main folder and see nothing but another folder inside of it, simply cut and paste the contents from this subfolder into the main one. It's okay if the voicebank has subfolders, as those are often used for organization, but you don't need that unnecessary layer.
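Cutting and pasting by hand is usually quickest, but for reference, removing that extra layer could look like this in Python. The function name is made up for the example, and it moves files for real, so try it on a copy first.

```python
import shutil
from pathlib import Path


def flatten_single_subfolder(main_folder):
    """If the folder holds nothing but one subfolder, move that subfolder's
    contents up a level and delete it. Returns True if anything was moved."""
    main = Path(main_folder)
    entries = list(main.iterdir())
    if len(entries) != 1 or not entries[0].is_dir():
        return False  # nothing to flatten

    # Rename the inner folder first, so an item inside it that shares
    # its name can still be moved up without a clash.
    inner = entries[0].rename(main / (entries[0].name + ".__unwrap__"))
    for item in list(inner.iterdir()):
        shutil.move(str(item), str(main / item.name))
    inner.rmdir()
    return True
```

Subfolders inside the voicebank are moved as they are, so any organizational folders stay intact.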

For OpenUTAU:

By default, OpenUTAU will have you install singers by navigating to Tools > Install Singer... and selecting the voicebank's .zip or .rar file. OpenUTAU will then extract the contents of the file into the folder labeled Singers within the OpenUTAU program folder. Alternatively, you can follow the same method outlined above, placing the voicebanks into this folder instead.

If you plan to use both programs, there is an easy way of sharing voicebanks between them:

Step 1. Navigate to Tools > Preferences....

[Screenshot: Step 1]

Step 2. In the preferences window, scroll down to the section labeled Paths.

Step 3. Under Additional Singer Path, click on the Select button and locate your UTAU voice folder.

[Screenshot: Steps 2 and 3]

Now, all of the voicebanks you've set up for UTAU will be accessible in OpenUTAU, so you do not need to have two copies of every voicebank you want to use.


Loading Voicebanks

For UTAU:

Inside of UTAU, you can change the voicebank selected by going to Project(P) > Project Property(R), which will pull up the Project Configurations window.

Alternatively, you can click on the voicebank name in the top left corner underneath the icon.

Here, you can either select the voicebank name from the dropdown menu labeled Voice Bank, or click on the three dots to browse through your files yourself. Make sure you select the root folder of the voicebank, not any of the subfolders inside of it.

To view the voicebank's information, click on the info button in your project settings, or click on their icon in the top left corner. This usually includes a profile picture, voice sample, author credit, and any other information in the voicebank's readme.

If you saved your voicebank elsewhere on your computer, or it just isn't showing up in the dropdown menu, you can register it into UTAU manually:

Step 1. Navigate to Tools(T) > Option(O), and click on the tab labeled Bank regist..

Step 2. Click on the Select... button, navigate to the root folder of the voicebank, and click Okay.

Step 3. If the name doesn't load automatically, type it into the box above the file path and click Add.

Step 4. Click Okay. Now, the voicebank should show up in your project settings.

For OpenUTAU:

OpenUTAU allows you to use multiple voicebanks within the same file. To change the voicebank of a given track, click on the button labeled Select Singer (or the currently loaded voicebank's name) and find the voicebank you want to use in the dropdown menu.

To view information about all of the voicebanks you have installed, go to Tools > Singers.

In addition to the character.txt and readme.txt files, voicebanks set up for OpenUTAU may also have a character.yaml file containing information specifically for OpenUTAU, such as definitions of specific voice colours (expressions) and the default phonemizer.