☢ How to Create Base OTOs ☢

A comprehensive guide on writing base configuration files (otos) for Classic UTAU and OpenUTAU. Also applicable to DIY synths that use the same configuration parameters. This resource assumes at least a basic knowledge of UTAU. If you need help with terminology or linguistic concepts, check the vocabulary page.

☆ What is a Base OTO? ☆

A base oto is a template for an UTAU voicebank configuration file, or oto.ini. Essentially, they do the work of splitting up a string into samples ahead of time so that this does not have to be done manually during the configuration process. They also often, but not always, provide numerical values for parameters with fixed values and rough estimates for those which will be adjusted later.

They are not meant to serve as a replacement for manual oto'ing; most samples will still need to be adjusted (human speech is imperfect, after all). Rather, they exist to make the process faster and easier.

There are a lot of benefits to creating a base oto, and they are not terribly difficult to set up. Having the samples already split into their own lines and named takes care of one of the more tedious parts of oto'ing, and if a certain parameter's value is going to be the same for every string, it can save a lot of time to have it already in place. This is especially true if you plan to use the same reclist for multiple voicebanks, multiple pitches in the same voicebank, or to release a reclist publically for other people's usage.


Things to Keep in Mind

First thing's first, you'll need to understand the reclist you're working with. Especially if it's one you've written yourself, you should already have in mind what samples are going to be pulled from each string and how they're going to be aliased.

Note: When building a base, it's best to only have one line for each sample, even if you plan to create identical duplicates with alternative aliases later, such as having both kana and romaji aliases for Japanese CVs. Since the parameters for it should be the same, the duplication should be done after the sample is finished being configured. Duplicate samples which are not taken from the same audio source and therefore would not be truly identical can be included separately, but should have something in the alias to distinguish them from one another (ARPAsing numbers them, for example).

You should also be familiar with the structure of the oto. Ideally, you should have a general idea of which values will go in the parameters of each sample type, but these can always be adjusted later after testing the base. If you need a refresher, each line of the oto follows this format:

[file_name].wav=[alias],[offset],[consonant],[cutoff],[preutterance],[overlap]

I've colour-coded the parameters to make it a little more obvious what each value is refering to in the visual editor.

Lastly, you'll want to know what tempo the reclist is meant to be recorded at, as this will affect the parameter values, notably the offset and cutoff.

With all of this in mind, you're ready to start building your base.

☆ Making Base OTOs from Scratch ☆

You wil need:

  • A complete reclist or an unfilled oto.ini from a fully recorded voicebank. Essentially, you just need something that has all of the file names in it.
  • A text editor. The default Notepad app will work fine, but one which is able to use regular expressions (regex) and which will allow you to change the character encoding is more efficient. For Windows users, I recommend Notepad++; it's free, open source, and easy to use.

The steps presented here do not have to be done in a strict order, this is just order of presenting them I felt was most logical.


Copy, Paste, Find, Replace

For most reclists, you are going to have a lot of strings which are identical in structure and differ only by what phones are included. As such, it's often more efficient to build the base of one string of each type which can then be duplicated and edited rather than having to manually create every line of the reclist.

Which strings this can be done with will of course depend on the reclist, but it can save a lot of time. The caveat is that if there is a mistake in the initial string, like a typo in the alias, this will also get copied to every other string, so be sure to proofread your work carefully.

For example, let's say we're making a base for a 2-mora CVVC and the majority of our strings look like CV-CVC. Rather than manually writing each line of the reclist, we can build the base for the first string, then copy it for the others. So, we alias the first line like so:

ka-kak.wav=- ka,,,,,
ka-kak.wav=a k,,,,,
ka-kak.wav=ka,,,,,
ka-kak.wav=a k-,,,,,

Then, we copy+paste it right underneath:

ka-kak.wav=- ka,,,,,
ka-kak.wav=a k,,,,,
ka-kak.wav=ka,,,,,
ka-kak.wav=a k-,,,,,
ka-kak.wav=- ka,,,,,
ka-kak.wav=a k,,,,,
ka-kak.wav=ka,,,,,
ka-kak.wav=a k-,,,,,

Now, we can highlight the duplicated string, open up the find and replace menu using CTRL+H, and replace a with i. Make sure that In selection is enabled so that other instances of a won't be effected, as well as Match case because UTAU is case-sensitive. Hit Replace All. find and replace example screenshot 1

Uh-oh, looks like we accidentally changed the .wav part of the oto for the new lines. There's an easy way to fix this, however; we repeat the find and replace process on the section by finding wiv and replacing it with wav.

Alternatively, this problem can be avoided altogether by creating a placeholder string which uses characters that won't be found elsewhere in the reclist and using this for creating new strings, for example:

XY-XYX.wav=- XY,,,,,
XY-XYX.wav=Y X,,,,,
XY-XYX.wav=XY,,,,,
XY-XYX.wav=Y X-,,,,,

But this isn't strictly necessary. Which placeholders are used is obviously going to depend on the specific reclist and string type as well.

When we have all of our vowels taken care of, we can duplicate the entire section and repeat the find and replace process with the next consonant. find and replace example screenshot 2

This can also be done the other way around for reclists which are organized by vowel first and consonant second, but general idea remains the same; no reason to create extra work for yourself.

Of course, this becomes trickier when dealing with triphones like VCVs, consonant clusters, and generally non-Latin characters, and this whole process is reliant on strings being uniform, but having this in your toolbelt is quite useful when applicable.

Ideally this will also be done with all of the parameters of each line set to avoid even more tedious work, but I wanted to show how to do it before diving into the parameters themselves.


Alias

Before setting any of the parameter values, it's a good idea to set the sample alises of the string you're working with. This will help you see what types of samples are being pulled from the string and how many there are of each type.

For an unstringed CV reclist, the alias is going to be the same as the file name (or its kana equivalent). Even if they are exactly the same, it's good practice to set the alias anyways, especially if you plan to put the audio files into a subfolder, just to ensure they'll be pulled correctly by the resampler.

For any other type of reclist, you'll want to duplicate the string for as many samples as will be pulled from it. This can be done by copypasting or by hitting CTRL+D in Notepad++.

Regex Technique: If you know exactly how many lines you need, or want to duplicate multiple strings by the same amount, you can highlight the string(s), find ^(.*)$ and replace with $1\r$1\r$1... with as many $1's as you need separated by \r.

For example, if we have a Japanese VCV string like ka-ka-ki-ka-ku-ka-N-ka, we would duplicate it 6 times for 7 lines total, one for the initial CV and six for each VCV. Assuming that [ん] is taken care of elsewhere in the reclist, we do not need to create a line for [a ん] here.

Next, fill in each of the aliases. Remember that these go between the equals sign and the first comma, and any whitespace will be read as part of the alias. So, our VCV string ends up looking like this:

ka-ka-ki-ka-ku-ka-N-ka.wav=- か,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=a か,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=a き,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=i か,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=a く,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=u か,,,,,
ka-ka-ki-ka-ku-ka-N-ka.wav=n か,,,,,

For an English CVVC string like A-pA-bA-tA-dA-kA-gA, we get something like this, with 6 VCs and 6 CVs making for a total of 12 lines per string:

A-pA-bA-tA-dA-kA-gA.wav=A p,,,,,
A-pA-bA-tA-dA-kA-gA.wav=A b,,,,,
A-pA-bA-tA-dA-kA-gA.wav=A t,,,,,
A-pA-bA-tA-dA-kA-gA.wav=A d,,,,,
A-pA-bA-tA-dA-kA-gA.wav=A k,,,,,
A-pA-bA-tA-dA-kA-gA.wav=A g,,,,,
A-pA-bA-tA-dA-kA-gA.wav=pA,,,,,
A-pA-bA-tA-dA-kA-gA.wav=bA,,,,,
A-pA-bA-tA-dA-kA-gA.wav=tA,,,,,
A-pA-bA-tA-dA-kA-gA.wav=dA,,,,,
A-pA-bA-tA-dA-kA-gA.wav=kA,,,,,
A-pA-bA-tA-dA-kA-gA.wav=gA,,,,,

It's a good idea to generally keep your aliases in sequential order, as this will help you keep track of where you are in the audio file and make it more obvious if your offsets are out of whack, but as we see with the CVVC strings, we don't have to be strict about it. In that case, I prefer to group like-samples together to make for easier oto'ing, so that way you won't have to be constantly alternating between VCs and CVs. This is a matter of personal preference for organization, however; the order of the lines in the oto doesn't actually matter.

For some base otos, the process stops here, but most of the time, you'll want to fill in some of the numerical values as well.


Timing

As mentioned, in order to figure out our approximate parameter values, we need to know how the samples are going to be timed in the recordings. Most importantly, we need need to know the tempo (BPM) the strings will be recorded at, as this will tell us the duration of every beat and by extension how long every syllable will be held.

The oto is measured in milliseconds (ms), and thus voicebanks are usually recorded at tempos where every beat's duration is nice number to work with. Here's a handy chart:

Tempo1/4 beat1/2 beat1 beat2 beats4 beats8 beats
100 BPM150 ms300 ms600 ms1200 ms2400 ms4800 ms
120 BPM125 ms250 ms500 ms1000 ms2000 ms4000 ms
150 BPM100 ms200 ms400 ms800 ms1600 ms3200 ms

A longer vowel gives the config and by extension the resampler more to work with, but holding a note too long can be more difficult for the vocalist or start to create irregularities. A good note length is usually somewhere between 400 and 2400 ms. As such, voicebanks are usually recorded with each syllable held from 1 to 4 beats at somewhere between 100 and 150 BPM.

Note: Going above or below these BPMs is a bit cyclical with how the math works out. For example, 2 beats at 100 BPM is equivalent to 1 beat at 50 BPM and 4 beats at 200 BPM.

Using this, we can figure out how far apart each of our samples will be in the oto; for instance, if we're recording at 120 BPM and holding each syllable for 1 beat, then each syllable in the string is going to be 500 ms apart. This is what we'll use to determine our parameter values.


Preutterance and Overlap

It may seem counter-intuitive to start with these as they are are at the end of the oto line, but these values are often the easiest to predict, and are additionally used for determining what other parameters should be set to.

For samples like VCVs, VCs, and VVs as well as in some cases initial CVs, these are going to have fixed values, meaning that they won't be changed during configuration. How long the preutterance is will be somewhat dependant on how much of the previous vowel we have to work with and how long we want the transition to last from note to note. The overlap will usually be 1/4th or 1/3rd the length of the preutterance, and should not be more than half of it.

For a longer transition with more overlap (typical for VCVs), I use 300 for the preutterance and 100 for overlap, and for a shorter transition with less overlap (typical for VCs), I use 200 for the preutterance and 50 for the overlap.

Note: I don't really recommend going above or below those preutterance values for this sample type; setting the preutterance above 300 can cause problems with rendering shorter notes, and setting it below 200 can be too short to capture enough of the previous vowel.

For typical CVs, I find that a preutterance of 100 and an overlap of 25 is a good starting point, as a typical consonant is going to fall somewhere around that length, but these can be refined later if they're too long or short for a particular phone. For medial CCs, I use a preutterance of 75 and an overlap of 25 since slightly less of the consonant is included.

So, our examples from before now look like this:

ka-ka-ki-ka-ku-ka-N-ka.wav=- か,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a か,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a き,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=i か,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a く,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=u か,,,,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=n か,,,,300,100


A-pA-bA-tA-dA-kA-gA.wav=A p,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=A b,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=A t,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=A d,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=A k,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=A g,,,,200,50
A-pA-bA-tA-dA-kA-gA.wav=pA,,,,100,25
A-pA-bA-tA-dA-kA-gA.wav=bA,,,,100,25
A-pA-bA-tA-dA-kA-gA.wav=tA,,,,100,25
A-pA-bA-tA-dA-kA-gA.wav=dA,,,,100,25
A-pA-bA-tA-dA-kA-gA.wav=kA,,,,100,25
A-pA-bA-tA-dA-kA-gA.wav=gA,,,,100,25


Cutoff

Next, let's determine the cutoff. A cutoff with a value of 0 would tell UTAU to stop the sample at the very end of the .wav file, which is almost never what we want. A cutoff with a positive value counts backwards from the end of the file to the beginning. We could, theoretically, calculate exactly where each of our cutoffs should be relative to that, but especially if you're not recording with a guideBGM, that can be tricky to determine.

Instead, there's a much easier solution: setting the cutoff to a negative number instead counts the milliseconds forward from the position of the offset. This is where the tempo comes into play; if we know how many beats a syllable is being held at at a particular tempo, we know, then, that the cutoff of a CV should be around that value in the negative.

More precisely, the cutoff of any sample ending in a vowel should be slightly less than the duration of a single syllable + the length of the preutterance. This is because we are not just considering the duration of the syllable, but also anything that's being captured in the syllable before it in the case of VCV. We also want to stop the sample a little bit before the syllable actually ends to avoid instability in the vowel; usually shaving off about 200 ms does the trick, depending on how much vowel we're working with.

To summarize our formula: cutoff = (syllable duration + preutterance - 200) * -1.

So, a CV sample held for 1 beat at 120 BPM winds up with a cutoff of -400. A VCV sample held for 1 beat at 120 BPM winds up with a cutoff of -600.

For transitional VCs, the cutoff will be much shorter, since we're using the consonant as the blending point. For these, we only have around 100 ms of duration to work with, but also much less of the end that needs to be cropped. I find leaving about 50 ms between the preutterance and cutoff is usually good.

Putting this together, we get:

ka-ka-ki-ka-ku-ka-N-ka.wav=- か,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a か,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a き,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=i か,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a く,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=u か,,,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=n か,,,-600,300,100


A-pA-bA-tA-dA-kA-gA.wav=A p,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A b,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A t,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A d,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A k,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A g,,,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=pA,,,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=bA,,,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=tA,,,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=dA,,,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=kA,,,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=gA,,,-400,100,25


Consonant

The consonant parameter, called the fixed parameter in vLabler, should fall between the absolute values of the preutterance and cutoff — meaning if the preutterance is 300 and the cutoff is -600, the consonant parameter should be between 300 and 600.

Ideally we want as much as possible of the final phone in the sequence to be stretched/looped by the resampler, so we want the consonant value to be large enough to capture any irregularities at the start of the final phone while still leaving room for that to happen.

For samples ending in vowels, setting this to be 100 to 200 more than the preutterance is a good starting point. For samples ending in consonants, usually we want this to be no more than 25 over than the preutterance since there's not much area to work with.

So, we get something like this:

ka-ka-ki-ka-ku-ka-N-ka.wav=- か,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a か,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a き,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=i か,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a く,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=u か,,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=n か,,400,-600,300,100


A-pA-bA-tA-dA-kA-gA.wav=A p,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A b,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A t,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A d,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A k,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A g,,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=pA,,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=bA,,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=tA,,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=dA,,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=kA,,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=gA,,250,-400,100,25


Offset

Finally, we get to the offset parameter. So far, all of our values have been identical for like-samples, but here is where we finally set them apart.

First, we need to figure out where our utterance actually begins in the .wav file. If using a guideBGM, this is straightforwardly predictable, otherwise, we can use our best guess. Typically, we want at least a beat or so of rest before the utterance to serve as a buffer and make sure none of the initial syllable gets cutoff. My own guideBGMs leave two.

Once again using our BPM to calculate the duration of each beat, we can determine where the first beat of the utterance lands. If we have two beats of rest at 120 BPM, then the first beat of utterance falls at approximately 1000 ms. But before we set the first offset to 1000 and call it a day, we can't forget about our preutterance value. See, the utterance does not truly begin at the first beat; the first vowel hits on the first beat; any consonants uttered at the start of the syllable will fall before that mark.

Thus, the parameter of a CV or VCV that hits exactly on the beat is not the offset, but rather the preutterance, so in these cases offset = position of the current beat - preutterance... in theory. In actuality, I find this is not always very accurate, but it's a good starting point and can be adjusted later after the base has been tested.

For VCs, offset = offset of the following syllable's CV - the VC's preutterance. Again, though, this is not always very accurate to real utterances.

The offsets of subsequent samples will simply be the previous offset + the duration of one syllable. Using these estimates, we end up with:

ka-ka-ki-ka-ku-ka-N-ka.wav=- か,700,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a か,1200,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a き,1700,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=i か,2200,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=a く,2700,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=u か,3200,400,-600,300,100
ka-ka-ki-ka-ku-ka-N-ka.wav=n か,3700,400,-600,300,100


A-pA-bA-tA-dA-kA-gA.wav=A p,1200,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A b,1700,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A t,2200,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A d,2700,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A k,3200,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=A g,3700,225,-250,200,50
A-pA-bA-tA-dA-kA-gA.wav=pA,1400,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=bA,1900,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=tA,2400,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=dA,2900,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=kA,3400,250,-400,100,25
A-pA-bA-tA-dA-kA-gA.wav=gA,3900,250,-400,100,25

These aren't exactly the offsets used in my own bases due to the aforementioned inaccuracies, but they're pretty close.

Now that we've completed a whole string of the oto, and we can use this string as a template for creating the rest of the base via the methods discussed earlier.

☆ Making Bases from Existing OTOs ☆

AKA, how to calculate the average values of an oto.ini and use them to make a base. This does not require actual math on your part.

I recommend still following the previous section in terms of setting up the first string of the oto (or a placeholder string) and then duplicating and modifying it for each of the following strings of the same type, rather than going through and setting the averaged values for every string manually.

This is also a good way to check how accurate an existing base oto is; see the next section for instructions on how to easily modify or update an existing base.

You will need:

  • A fully configured oto.ini from an existing voicebank.
  • A text editor. The default Notepad app will work fine, but one which is able to use regular expressions (regex) and which will allow you to change the character encoding is more efficient. For Windows users, I recommend Notepad++; it's free, open source, and easy to use.
  • A spreadsheet editor with built-in formulas, like Microsoft Excel, Google Sheets, or LibreOffice Calc.

Setting Up the Spreadsheet

First, open the fully configured oto.ini in your text editor.

Now, we want to get the oto into a format where every value will be placed in a different cell of a spreadsheet. There are two ways to accomplish this, but both have the same end result.

CSV Method

Shout out to slashhearts for suggesting this method.

CSV starts for "comma separate value"; it is a type of text file that will read each comma as, well, a separator. Handy for us then that the oto.ini already does this.

We will want to separate our aliases, too, however, but this can be done by opening the Find and Replace menu (CTRL+H) and replacing = with , in the entire file.

Next, save the file as something like oto.csv, and open the newly created .csv file in your spreadsheet editor.
CSV method example

Tab Method

This method involves finding and replacing all of the commas in the oto with a tab character.

Open up the Find and Replace menu (CTRL+H), find , and replace with a tab (you may need to copypaste it into the box). Next, find = and replace it with tab as well.

Now, create a new file in your spreadsheet editor. Select the entire contents of the edited oto (CTRL+A) and copypaste them into the blank sheet.
tab method example

Regardless of which method was used, we should now have a spreadsheet that looks something like this:
oto spreadsheet

If we have different types of strings, the next step is to separate them by string type. In this reclist, this is already done thanks to the different string prefixes, but that might not always be the case. I'll use the CVVC strings for this demonstration, so I separated them into their own page of the spreadsheet.

If we have different types of samples within the same string, as we do here, we'll also want to separate those from each other. It wouldn't be very useful to average, say, the values of a CV and VC together. This is why we separated the aliases into their own cells, as this makes sorting them by sample type easier.

Select the collumn with the aliases in it, right click, and select Sort sheet A to Z. I'm working in Google Sheets, but this process should be similar in other spreadsheet editors.
sort sheet A to Z

You'll likely still have to do a little bit of manual sorting by dragging the rows around, but this should speed up the process. I separate the sample groups with a few empty rows for visibility. You can also move like-samples to their own spreadsheet pages if it makes it easier for you.


Calculating Most Averages

Now comes the calculations. For every parameter except the offset, we can get one big average value using every sample of the same type. To do this, create an empty row underneathe the last sample, and enter this formula into the cells corresponding to the consonant, cutoff, preutterance, and overlap:

=AVERAGE(NX:NY)

Where N represents the current collumn letter, X represents the first row of the range, and Y represents the last row of the range.

It can also be useful to calculate the median value to make sure the average isn't being thrown off by outliers.
average and median values

In our case, they're pretty comperable, which is a good sign that our oto is consistant. We could plug these values into our base exactly as they are, but this would only representative of one speaker's production, so it can be better to round them to "nicer" values. We can do that by creating a new row underneath our average and entering this formula:

=MROUND(number, factor)

To round the average values to, say, the nearest 25 milliseconds, we end up with something like this:
rounding the average

Note: To use MROUND with a negative number, the factor needs to also be negative. So, for the cutoff value, I used -25.

Thus, the VCs in our base oto would end up with these values, which you might notice are the same as the ones I recommended if building a base from scratch:

[file_name].wav=V C,[offset],225,-250,200,50


Calculating Average Offsets

Finally, let's figure out our offsets. Since these will be different depending on which syllable in the string a sample is being taken from, we'll need to organize our samples by this.

To do this, select all the rows in the spreadsheet corresponding to the sample type we're looking at, right click on the selection, and navigate to View more cell actions > Sort range.
sort range 1

In this menu, select the collumn that corresponds to the offset value and sort it A to Z. Again, this process might not be exactly the same in other spreadsheet editors, but should be similar.
sort range 2

Now our samples should be more-or-less organized by which syllable they're taken from, and we can repeat the averaging process for each group of them. Where to split the groups should be somewhat obvious by a sudden spike in the offset value by a few hundred milliseconds.
first offset average

Next, let's compare the average offsets of each syllable position. In my case, I've ended up with 1336, 1860, 2344, 2864, 3362, and 3863. As expected, the initial offset hits at around 200 ms (the length of the preutterance) less than the position of fourth beat (the second syllable of a string preceeded by 2 beats of rest) and each offset is roughly 500 ms apart (the duration of 1 beat at 120 BPM).

Most of these values MROUND to ending in 50, so let's use that for our base value.

Putting it altogether, we get the following base values for our VCs:

[file_name].wav=V C1,1350,225,-250,200,50
[file_name].wav=V C2,1850,225,-250,200,50
[file_name].wav=V C3,2350,225,-250,200,50
[file_name].wav=V C4,2850,225,-250,200,50
[file_name].wav=V C5,3350,225,-250,200,50
[file_name].wav=V C6,3850,225,-250,200,50

These are pretty close to the base offset values we calculated manually, but should be more accurate on the whole. You may also notice that these values are the same as what I use in my own CVVC base otos, and this is exactly how I calculated them.

If we want to be even more precise, we could run these calculations separately for consonants with different manners of articulation, like taps which are quite short versus sibilants which are a bit longer than average, but taking a broad average is usually good enough for a base.

☆ Modifying Base OTOs ☆

If you want to modify an existing base oto, whether because you've just averaged your values and found more precise numbers to use, because you want to record at a different tempo than a base is set up for, or some other reason, the process is pretty straightforward.

Basically, you can just find and replace the values you want to change. Be careful not to get unintented lines of the oto caught in the crossfire, but usually matching and replacing all of the numerical values of the line isn't likely to match unintended samples.

For example, matching all instances of 225,-250,200,50 will affect the consonant, cutoff, preutterance, and overlap of the VCs in the previous examples, but is not likely to match with those parameters on any unintended sample. So if we want to, say, change the preutterance to 300 (and subsequently the other oto values), we could enter 325,-350,300,100 in the replacement box.