☢ Character Encoding ☢

☆ What is Character Encoding? ☆

Character encoding refers to the way that text characters are stored as numerical values. These values tell the computer which characters to display in a text sequence.

UTF-8 is the most common encoding on the web and the default in many text editors, but many other encodings exist, and each maps characters to numbers in its own way. As a result, they are not all mutually compatible, especially when dealing with characters outside of the Latin alphabet.
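As a rough illustration, the short Python sketch below prints the numbers behind a single character. Python and the sample character あ are choices made for this example only; nothing here is specific to UTAU.

```python
# One kana, two encodings: the character is the same, the stored bytes are not.
kana = "あ"

code_point = ord(kana)                  # the character's Unicode number
utf8_bytes = kana.encode("utf-8")       # how UTF-8 stores it
sjis_bytes = kana.encode("shift_jis")   # how Shift JIS stores it

print(f"U+{code_point:04X}")   # U+3042
print(utf8_bytes.hex(" "))     # e3 81 82
print(sjis_bytes.hex(" "))     # 82 a0

# A basic Latin letter is stored identically in both encodings,
# which is why plain English text rarely shows the problem.
latin_same = "A".encode("utf-8") == "A".encode("shift_jis")
print(latin_same)              # True
```

The two byte sequences for あ have nothing in common, so a program that expects one and receives the other has no way to display the right character.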

Shift JIS and Mojibake

Classic UTAU is encoded in Shift JIS, an encoding designed for Japanese characters. UTAU also functions almost entirely by reading text files (TXT, INI, and even UST files). Given that many voicebanks and USTs are in Japanese and therefore use Japanese characters, you might imagine that a text file with the wrong encoding could cause problems.

Indeed, wrongly encoded text files are a common error encountered by UTAU users. If you've ever opened up an UTAU's files and seen Japanese text replaced by strings of garbage characters, you've encountered the dreaded mojibake.

Mojibake is the term for the garbage characters that result from reading text in the wrong encoding; neither the software nor its users can read them correctly. The same problem arises when trying to use OREMO or setParam, as both of them also require Shift JIS encoding for Japanese characters.
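To see how mojibake comes about, here is a minimal Python sketch that saves kana as UTF-8 and then reads the same bytes back as if they were Shift JIS, which is roughly what happens when a Shift JIS program opens a UTF-8 file. The function name and sample text are made up for this example.

```python
def misread_as_shift_jis(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were Shift JIS."""
    raw = text.encode("utf-8")
    # errors="replace" substitutes U+FFFD for bytes that are not valid
    # Shift JIS instead of raising an exception.
    return raw.decode("shift_jis", errors="replace")


original = "あいうえお"
garbled = misread_as_shift_jis(original)

print(original)
print(garbled)  # unrelated kanji and symbols instead of the original kana
```

Nothing in the file itself is damaged here; the bytes are simply being interpreted with the wrong table, which is why re-encoding the file fixes it.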

Luckily, there is an easy fix; we simply need to convert the text files to Shift JIS.
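Before converting anything, it can help to confirm which files are actually the problem. The sketch below is a heuristic, not a guaranteed detector: it assumes the wrongly encoded files are UTF-8, and it relies on the fact that Japanese text saved as Shift JIS almost never happens to form valid UTF-8. The function name is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Guess whether raw file bytes are UTF-8 text containing non-ASCII characters."""
    if data.isascii():
        # Pure ASCII reads the same in UTF-8 and Shift JIS, so there is
        # nothing to convert.
        return False
    try:
        data.decode("utf-8")  # strict decoding: any invalid byte raises
    except UnicodeDecodeError:
        return False
    return True
```

Reading a file with `open(path, "rb").read()` and passing the result to this function gives a quick yes-or-no answer; a file that returns True is a candidate for the conversion steps that follow.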

Note: This error can be avoided by using newer software for voicebank development and usage, namely RecStar, vLabeler, and/or OpenUTAU, but given that many users still use OREMO, setParam, and/or classic UTAU, this tutorial still seems useful.

☆ Encoding in Shift JIS ☆

Converting Existing Files in Windows Notepad

If your computer's locale is set to Japanese (which it should be if you're using classic UTAU), the process is fairly straightforward.

Step 1. Open the improperly encoded text file in Notepad.

Step 2. Navigate to File > Save As or press CTRL+SHIFT+S.

Step 3. In the dropdown menu next to Encoding, select ANSI.

(Screenshot: converting to Shift JIS in Windows Notepad)

Step 4. Hit Save and click Yes to overwrite the existing file.

Note that saving the file with ANSI encoding simply tells Notepad to use your computer's default non-Unicode code page, which is Shift JIS only when the locale is set to Japanese. If the problem persists, try the method below using Notepad++.

Converting Existing Files in Notepad++

Notepad++ is a free text editor I highly recommend for its versatility and ease-of-use, but similar processes can be done in other text editors.

Step 1. Open the improperly encoded text file in Notepad++.

Step 2. Select all of the text in the file. This can be done manually or by hitting CTRL+A on your keyboard.

Step 3. Copy the text onto your clipboard by right-clicking and selecting Copy or by hitting CTRL+C. You can paste the text into a temporary Notepad window if you like, but it's unnecessary as long as you don't copy anything else during this process.

Step 4. In the menu bar, navigate to Encoding > Character sets > Japanese and click on Shift JIS. This reinterprets the file's existing bytes as Shift JIS rather than converting them, so the characters will turn into mojibake, which is why we have preserved the text elsewhere temporarily.

(Screenshot: converting to Shift JIS in Notepad++)

Step 5. With all of the text in the file still selected (or reselected), paste the original text back into the file by right-clicking and selecting Paste or by hitting CTRL+V. This overwrites the garbage characters.

Step 6. Verify the file is now correctly encoded by opening the Encoding menu and making sure that Shift JIS is selected.

Step 7. Save the file.

If everything was done correctly, this should resolve the problem.
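For anyone with many files to fix, the same conversion can be scripted. The following is a minimal Python sketch, not part of any UTAU tool: it assumes the broken file is UTF-8 (with or without a byte order mark), and the function name is hypothetical. It overwrites the file in place, just like the steps above, so keep a backup.

```python
from pathlib import Path


def convert_to_shift_jis(path: str, source_encoding: str = "utf-8-sig") -> None:
    """Rewrite a text file in place as Shift JIS.

    source_encoding is an assumption about the file's current encoding;
    "utf-8-sig" reads UTF-8 and drops a leading byte order mark if present.
    """
    file = Path(path)
    text = file.read_bytes().decode(source_encoding)
    # Encode before writing, so a character that Shift JIS cannot represent
    # raises UnicodeEncodeError while the original file is still untouched.
    encoded = text.encode("shift_jis")
    file.write_bytes(encoded)
```

Calling `convert_to_shift_jis("oto.ini")` would convert a single file. Working on raw bytes rather than text mode leaves the file's existing line endings exactly as they were.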

Creating New Shift JIS Files

If you are writing an oto.ini, readme.txt, or other such UTAU file, you may want to encode it in Shift JIS from the get-go and avoid encountering problems later. The process is very similar to the above.

Files already encoded in Shift JIS typically have no problems when you type or paste Japanese text into them, for example when pasting in a reclist or base oto that contains kana.

In Windows Notepad, select ANSI from the dropdown menu whenever you go to save the file.

In Notepad++, before typing anything in the new file, navigate to Encoding > Character sets > Japanese and click on Shift JIS.
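If you generate such files with a script instead of typing them, the encoding can be set at the moment of writing. This is a minimal Python sketch under that assumption; the function name is hypothetical and is not part of UTAU or any related tool.

```python
from pathlib import Path


def write_shift_jis(path: str, text: str) -> None:
    """Create (or overwrite) a text file encoded as Shift JIS."""
    # Encoding first means an unsupported character (an emoji, for instance)
    # raises UnicodeEncodeError before any file is created.
    data = text.encode("shift_jis")
    Path(path).write_bytes(data)
```

For example, `write_shift_jis("readme.txt", "テスト\r\n")` would produce a one-line readme. The line ending is written exactly as given in the string, so include `\r\n` if you want Windows-style line breaks.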

And that's it. It's quite simple.