Adventure Game Studio | Forums

AGS Support => Advanced Technical Forum => Topic started by: GarageGothic on Mon 12/05/2008 00:49:15

Title: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Mon 12/05/2008 00:49:15
While surfing around, I came across this open-source lipsync software (http://www.annosoft.com/sapi_lipsync/docs/). It analyzes a voice clip (.wav, .ogg, .mp3*) and automatically assigns phonemes to it. It then writes the data in a fully documented file format, but I assume the source code can be changed to output data to Pamela and other formats. (To see it in action, check out the demo for Lipsync Tool (http://www.annosoft.com/demos.htm), a commercial implementation of the technology. I tried it on a few old voice samples and was quite impressed.)

With the growing number of voice-acted AGS games, lipsync support is becoming a standard feature of AGS. Unfortunately the only supported lipsync program, Pamela, is crash-prone and slow to work with. Everything has to be done manually, and the text of each voice line must be pasted in. The Al Emmo team spent a month or more just on the lipsyncing. It's probably wise that Dave Gilbert hasn't tried to use it on his games, or we would still be waiting for Blackwell Legacy.

However, with this source code lipsyncing would become a batch process with little-to-no work from the developer.

I don't really think this needs to be integrated into the AGS engine; it could work fine as a stand-alone tool (which would also prevent any license issues for the finished game). Unless CJ sees any need to add further lipsync format support, though, it would have to be customized to output Pamela files. Unfortunately I don't have the programming skills to do any of this, but I thought it might be a good idea to make you aware of this technology, and maybe hear what people think about its possible use with AGS.

Edit: I see now that it requires Microsoft's SAPI SDK to be installed, but I don't think that would be a problem for developers dedicated enough to recruit voice actors.

*Edit 2: It seems that, unlike the Lipsync Tool, the source code only supports .wav files. I don't know how much work it would be to integrate .ogg and .mp3 decoding before running the sync.

Edit 3: Made some further tests. I ran the compiled binary from the source code distribution file on a couple of wave files (purely automated, textless sync) and then played the output back in Lipsync Tool. They were Blackwell Convergence samples of Joey and Rosangela that Dave posted on his forums, and both voices synced up great. I'm also quite impressed by how many of the words the speech recognition identified correctly in the text output (not that the exact words as written are all that important for the sync).
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: Rui 'Trovatore' Pires on Mon 12/05/2008 13:43:44
All of this interests me hugely and immensely. I have no programming skills outside AGS (which is "scripting", not "programming", anyway), so I can't help... but this sounds extraordinarily useful, if possible.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Mon 12/05/2008 14:35:11
I uploaded a short test clip. Click here to view directly in your browser (http://www.zippyvideos.com/2049755757583616/). Please don't sue me for copyright infringement, Dave :). I should also add that I recorded some lines in my own language, Danish, and the lipsync works just as well with non-English voice samples.

Edit: Fixed the video format and uploaded to streaming server so you don't have to download anything. Note that the lipsyncing was done entirely with the open-source tools. The commercial Lipsync Tool software is only used for playback in the example.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: Pumaman on Mon 12/05/2008 19:39:57
Impressive. Yeah, I think this is something better suited to a standalone utility, or editor plugin, if anyone feels so inclined.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 13/05/2008 03:29:28
CJ, I know that voice lipsyncing is an unsupported feature of AGS. But would it be possible to get some documentation on how AGS interprets the Pamela format? I haven't been able to track down any official documentation of the format, and looking through the source code didn't help me. I'm messing about a bit with Visual C++ to see how hard it would be to output .pam files from the automatic lipsync source.

I assume that since you can customize your own phonemes, that part of the conversion isn't a problem. As long as the phonemes put out by the automatic lipsync match those in AGS' lipsync table everything should be fine, right?

The timings puzzle me, however. I thought they were measured in frames (Pamela's fps, not AGS loops), but the numbers seem far too high. The timings are not in sequence either, so they can't be times measured from the beginning of the wave file. Are they additive (i.e. first phoneme "1215:S" runs from 0 to 1215, second phoneme "765:IH1" runs from 1215 to 1215+765=1980)? If so, how is silence handled? I see no pauses in my .pam files.

Edit: Playing around a bit, I discovered that the timings are indeed measured from the beginning of the wave file, they're just listed totally out of order in the .pam file. Does this (lack of) order have any special significance? I still can't work out what the timings are measured in though. Changing the framespersecond doesn't seem to alter the timings. And framesperphoneme seems only to be used internally in Pamela when breaking up the phonemes from a text?
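A minimal Python sketch of reading those entries, assuming each phoneme line has the `TIME:PHONEME` shape seen above (e.g. `1215:S`) and that any other line (section headers, settings) can be skipped. It only parses and re-sorts the entries by start time; it makes no claim about the rest of the .pam layout, and `parse_pam_entries` is a made-up name.

```python
def parse_pam_entries(lines):
    """Parse 'TIME:PHONEME' lines (e.g. "1215:S") into (time, phoneme)
    tuples sorted by time. Lines without that shape are ignored."""
    entries = []
    for raw in lines:
        line = raw.strip()
        time_part, sep, phoneme = line.partition(":")
        if not sep or not time_part.isdigit() or not phoneme:
            continue  # not a phoneme entry (header, setting, blank line...)
        entries.append((int(time_part), phoneme.strip()))
    # The file may list entries out of order, so sort by start time.
    entries.sort(key=lambda e: e[0])
    return entries


print(parse_pam_entries(["1215:S", "0:ZH", "765:IH1"]))
# [(0, 'ZH'), (765, 'IH1'), (1215, 'S')]
```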
While playing around with Pamela, I also found out something about the lack of pauses. It seems that each phoneme frame is held until the next one starts, and that the default phoneme set of Pamela phonemes doesn't include silence (closed mouth frame). But if we were to associate a frame in AGS with a symbol to indicate pause (the lipsync software uses "), and used the same symbol in the .pam file, it should work, I believe. Can you please confirm this?
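The hold-until-next behaviour can be sketched as a lookup over time-sorted entries. This is Python, assuming `(start, phoneme)` tuples sorted by start time and a caller-chosen silence symbol for the time before the first phoneme; it illustrates the behaviour described here, not AGS's actual code. With this model, a pause is simply another entry whose phoneme is mapped to a closed-mouth frame.

```python
import bisect


def active_phoneme(entries, t, silence=None):
    """Return the phoneme showing at time t.

    entries: (start, phoneme) tuples sorted by start time.
    Each phoneme is held until the next one starts; before the first
    entry, the caller-supplied silence symbol is returned.
    """
    starts = [start for start, _ in entries]
    # Index of the last entry whose start is <= t (or -1 if none).
    i = bisect.bisect_right(starts, t) - 1
    return entries[i][1] if i >= 0 else silence


entries = [(100, "S"), (250, "ZH"), (400, "B")]
print(active_phoneme(entries, 300))  # ZH
```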

Also, just to make sure, is it possible for AGS to interpret timing values that aren't multiples of 15 (which seems to be the default in Pamela, and isn't very precise)? And does AGS parse the framespersecond and framesperphoneme values at all?

Edit 2: Sorry about the constant updates, it's just that I keep playing around with the programs. After adding a phoneme to a distinctive section of a voice clip in Pamela and locating the same part in Audacity, I worked out that Pamela values are the timing in seconds multiplied by 360, for some unknown reason.

So the basic conversion would now be:

*Retrieve phoneme start value from automatic lipsync in milliseconds
*Multiply this value by 0.360
*Retrieve the phoneme as-is. If we set the lipsync in AGS up properly we don't need any conversion
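The conversion steps above, as a minimal Python sketch. It assumes Annosoft start times arrive as whole milliseconds and uses integer arithmetic with round-half-up, which avoids floating-point surprises from 0.36; the phoneme is passed through unchanged, as the last step says. `ms_to_pam` and `to_pam_line` are illustrative names only.

```python
def ms_to_pam(start_ms):
    """Convert a start time in milliseconds to Pamela timing units.

    Pamela units = seconds * 360 = milliseconds * 0.360.
    Adding 500 before the integer division rounds to the nearest unit.
    """
    return (int(start_ms) * 360 + 500) // 1000


def to_pam_line(start_ms, phoneme):
    """Build a 'TIME:PHONEME' line; the phoneme itself is not converted."""
    return f"{ms_to_pam(start_ms)}:{phoneme}"


print(to_pam_line(3375, "S"))  # 1215:S
```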

The above is a very simple process which could even be accomplished in AGS script. However, for the lipsync process to be manageable for the user, we'd still need:

*Batch processing of a whole audio folder (*.wav doesn't currently work as a parameter)
*Decoding of .mp3 and .ogg files to a temporary .wav before lipsync (could be done with external program before syncing, but would be nice to have integrated)
*Output of (modified as specified above) lipsync data to files named after the source audio files with the extension .pam (currently the program outputs data to a console)
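For the batch and naming requirements, a small Python sketch of the planning step: given the files in a speech folder, pick out the audio files and work out the matching .pam name. It does not run the lipsync or decode anything; the call to the Annosoft binary is left out because its command-line options aren't covered here. `pam_targets` is a hypothetical helper name.

```python
from pathlib import Path

# Audio types mentioned in the thread; matched case-insensitively.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def pam_targets(files):
    """Pair each audio file with the .pam file it should produce.

    The .pam file sits next to the audio file and shares its base name,
    e.g. Speech/EGO1.wav -> Speech/EGO1.pam. Non-audio files are skipped.
    """
    pairs = []
    for f in map(Path, files):
        if f.suffix.lower() in AUDIO_EXTENSIONS:
            pairs.append((f, f.with_suffix(".pam")))
    return sorted(pairs)


# Typical use would be pam_targets(Path("Speech").iterdir()).
for audio, pam in pam_targets(["Speech/EGO2.OGG", "Speech/EGO1.wav", "Speech/notes.txt"]):
    print(audio.name, "->", pam.name)
```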

As mentioned above, I downloaded Visual C++ Express to try to change the source code. But so far I haven't even figured out how to import it!  :P I convert it without any errors reported, and am then told it can't be opened in this version of Visual Studio. Perhaps I should try another compiler.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Tue 13/05/2008 14:20:24
I can certainly vouch for the fact that it takes a very, very long time to lipsynch lines for an AGS game. A faster and easier way to do this would be most welcome!

I had previously looked into these Annosoft programs, but they seemed very pricey. I never knew that they also offered a free version of their source code. Nice find!

With automated lipsynch, you're probably not going to get as accurate results as you would from manually syncing the lines yourself. Background noise and bad quality recordings can result in phantom phonemes being added during an automated process.  But, of course, you could always go back and tweak them in Pamela later on (if they're indeed going to get output to .pam format.)

That Annosoft lipsync program also has tons of phonemes (a lot of the transitional ones) that are entirely unnecessary for a 2D AGS game. I've done a fair bit of experimenting with getting decent-looking results in AGS with a minimum number of portrait frames, and you really only need 8 phoneme frames in AGS to have convincing lipsync animations. I use the following Pamela phonemes to represent the visual mouth frames, and would suggest that all of the redundant phonemes in Annosoft's program revert to the most relevant of the 8 frames below (or, if you want to use more or fewer phonemes, this aspect could even be entirely tweakable):

ZH = (Mouth closed frame)
AY0 = Mouth in A position. Used for the letters: A, I, U
W = Mouth in W position. Used for the letters: Q, W
EH0 = Mouth in E position. Used for the letters: C, E, G, H, K, R, Y
S = Mouth in S position. Used for the letters: CH, D, J, N, S, SH, T, X, Z
F = Mouth in F position. Used for the letters: F, V
L = Mouth in L position. Used for the letter: L
AO0 = Mouth in O position. Used for the letter: O
B = Mouth in M position. Used for the letters: B, M, P

This source code looks promising, though. Hopefully somebody will have luck turning it into something that can be used to simplify this time-consuming process!
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 13/05/2008 17:28:13
Thanks for your input, AGD2. Great to hear from someone experienced with lipsyncing!

Of course an automated process will never be as accurate as manual lipsyncing. But considering the time saved, I'm quite happy with my tests so far. Even low quality samples (a low-bitrate clip sampled from an old VHS tape and voice recorded with a cheap headset) seemed to process quite well. As you say, it's always possible to tweak the file in Pamela, and it probably will be necessary for non-vocal sounds such as a character coughing.

Your phoneme list will also be very useful. I just have to figure out whether it makes most sense to force simplified phonemes during file format conversion or just let the developer set it up himself within AGS. I guess the latter solution would be more flexible, though you're right that the total amount of phonemes output by Annosoft's code is overkill for non-3D games.

Regarding my attempts with the source code, it turns out that the source depends on Microsoft's ATL classes which are not included with the free Visual C++ Express. Is there anyone out there with the full Visual C++/Visual Studio version who would like to give it a try?
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: Pumaman on Tue 13/05/2008 19:41:51
Quote from: GarageGothic on Tue 13/05/2008 03:29:28
CJ, I know that voice lipsyncing is an unsupported feature of AGS. But would it be possible to get some documentation on how AGS interprets the Pamela format?

Ok, I've uploaded an extract from the AGS editor source code. This is the function that compiles the PAM files, so that you can see what AGS does with them:
http://www.adventuregamestudio.co.uk/temp/pamreader.txt

Basically, it just reads each phoneme and converts the timing, and that's it.
Currently it uses a hardcoded assumption of how to translate the pamela timings into milliseconds.
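The linked extract isn't reproduced here, but the general idea of turning sorted .pam entries into millisecond segments with end times can be sketched in Python. Assumptions: entries are `(pam_time, phoneme)` tuples already sorted by time, Pamela units are 1/360 s (the factor worked out earlier in the thread), and the clip length in milliseconds is known so the last phoneme has an end. This is an illustration, not AGS's code.

```python
def pam_to_ms(pam_time):
    """Pamela units (1/360 s) to whole milliseconds, rounded to nearest."""
    return (pam_time * 1000 + 180) // 360


def pam_to_segments(entries, clip_ms):
    """Turn sorted (pam_time, phoneme) entries into (start_ms, end_ms, phoneme).

    Each phoneme ends where the next one starts; the last ends at clip_ms.
    """
    segments = []
    for i, (pam_time, phoneme) in enumerate(entries):
        start = pam_to_ms(pam_time)
        end = pam_to_ms(entries[i + 1][0]) if i + 1 < len(entries) else clip_ms
        segments.append((start, end, phoneme))
    return segments


print(pam_to_segments([(0, "ZH"), (765, "IH1"), (1215, "S")], 4000))
# [(0, 2125, 'ZH'), (2125, 3375, 'IH1'), (3375, 4000, 'S')]
```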

QuoteRegarding my attempts with the source code, it turns out that the source depends on Microsoft's ATL classes which are not included with the free Visual C++ Express. Is there anyone out there with the full Visual C++/Visual Studio version who would like to give it a try?

I don't think compiling the pamela source code would be particularly useful for this -- all you'd need to do would be to write a separate application that could convert output files from Annosoft into .PAM files that AGS can read.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 13/05/2008 20:56:21
Thanks for the code, CJ! It's nice to know exactly which parts of the file AGS interprets and which it ignores. It's a bit ironic that AGS internally interprets the data to a format much closer to Annosoft's (in milliseconds and with end timings). For a moment the timing calculation confused me, but it all adds up to 0.360 as I had discovered (dividing a number by (1000/15)/24=2.777777 is the same as multiplying it by 0.36). That is only true if the fps setting hasn't been changed in Pamela, but with AGS using hardcoded values it's not a problem.
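The arithmetic in one place. Reading 24 as Pamela's default frames per second and 15 as timing units per frame is an inference from the factors and from the earlier observation that default timings come in multiples of 15, not something taken from Pamela's documentation:

$$
\text{pam} = 360\,t_{\mathrm{s}} = 0.36\,t_{\mathrm{ms}}, \qquad 360 = 24 \times 15
$$

$$
t_{\mathrm{ms}} = \text{pam}\cdot\frac{1000/15}{24} = \frac{1000}{360}\,\text{pam} \approx 2.778\,\text{pam},
\qquad \text{e.g. } 1215 \cdot \frac{1000}{360} = 3375, \quad 3375 \cdot 0.36 = 1215
$$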

Quote from: Pumaman on Tue 13/05/2008 19:41:51
QuoteRegarding my attempts with the source code, it turns out that the source depends on Microsoft's ATL classes which are not included with the free Visual C++ Express. Is there anyone out there with the full Visual C++/Visual Studio version who would like to give it a try?

I don't think compiling the pamela source code would be particularly useful for this -- all you'd need to do would be to write a separate application that could convert output files from Annosoft into .PAM files that AGS can read.

Ah, no I meant for compiling the modified Annosoft code. I'm not touching the Pamela source at all, only using it for reference.

Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Tue 13/05/2008 23:51:25
I've made a test editor plugin that converts the output of the Annosoft program to Pamela's format:
http://ueberlicht.googlepages.com/AGS.Plugin.generatepam.dll
(only wav files atm; sapi_lipsync.exe has to be in the editor folder)

I think I'll add that to the audio manager plugin...
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: SSH on Wed 14/05/2008 10:20:26
Awesome job, smiley!
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Wed 14/05/2008 15:55:13
Excellent! Do you plan to add .ogg and .mp3 support? It would definitely be great to be able to batch process speech files from the Audio Manager menu.

I guess this means that I can cancel my, ahem, acquiring of the 3GB+ Visual Studio :)
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Wed 14/05/2008 19:21:46
Wow, very nice work, Smiley!  I've tested it out briefly and it's pretty impressive. At the moment it doesn't work when loading the generated .pam files into Pamela, on account of some of the letters being lowercase and others not having a number after them. I'll post some more about this tomorrow.  (Pamela only recognizes upper case letters, although AGS probably isn't as picky.)

Oh, one thing that would be handy is to have an option to offset all the phonemes to play a little earlier. The reason being that when speech is lip synced on-the-fly, the program has to process the sound first and then generate the letters. But in real life, people visibly move their lips into position before the vocalizations are produced.  So having the ability to offset the phonemes in that manner would really help it look more natural.
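A possible way to implement that offset, sketched in Python under the assumption that phonemes are `(start, phoneme)` tuples sorted by start, in any single time unit (milliseconds or Pamela units). Everything moves earlier by the same lead time; anything pushed before the start of the clip is clamped to 0, and if several entries land on 0 only the last one is kept, since that is the one that would be showing at that moment. `offset_earlier` is a hypothetical name.

```python
def offset_earlier(entries, lead):
    """Shift sorted (start, phoneme) entries earlier by `lead`, clamping at 0.

    When several entries collapse onto the same start, the later one wins,
    because it is the phoneme that would actually be visible there.
    """
    result = []
    for start, phoneme in entries:
        shifted = max(0, start - lead)
        if result and result[-1][0] == shifted:
            result[-1] = (shifted, phoneme)  # later entry replaces earlier one
        else:
            result.append((shifted, phoneme))
    return result


print(offset_earlier([(0, "ZH"), (50, "S"), (300, "B")], 80))
# [(0, 'S'), (220, 'B')]
```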

Great work though! I'll post more info on the complete phoneme list soon.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Wed 14/05/2008 23:50:09
Quote from: GarageGothic
Do you plan to add .ogg and .mp3 support?
Yes. Probably by converting them back to wav...

Quote from: AGD2
At the moment it doesn't work when loading the generated .pam files into Pamela, on account of some of the letters being lowercase and others not having a number after them.
And I didn't include the "Preferences" section in the .pam file. ;)


I definitely want to add an editor for .pam files which also shows a preview of the speech animation.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Thu 15/05/2008 00:03:35
QuoteYes. Probably by converting them back to wav...

Yeah, you could just decode them to a temporary file, use it as a source for Annosoft and then delete it. Since your AudioManager already supports .mp3 and .ogg playback, the lack of a wav wouldn't be an issue in your own phoneme editor.
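The decode-then-delete approach as a Python context manager sketch. Decoding itself is not shown: `decode(src, dst)` is a hypothetical caller-supplied function that writes WAV data to `dst` (for example by calling whatever decoder the plugin already uses). Plain .wav files are used directly, and the temporary file is removed even if the lipsync step fails.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_wav(source_path, decode):
    """Yield a .wav path for source_path, decoding to a temp file if needed.

    decode(src, dst) is supplied by the caller (hypothetical here) and must
    write WAV data to dst. The temporary file is deleted afterwards.
    """
    if source_path.lower().endswith(".wav"):
        yield source_path  # already a wave file, nothing to clean up
        return
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # the decoder opens the file itself
    try:
        decode(source_path, tmp_path)
        yield tmp_path
    finally:
        os.remove(tmp_path)


# Usage idea:
# with temporary_wav("Speech/EGO3.ogg", my_decoder) as wav_path:
#     run the lipsync tool on wav_path
```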

QuoteI definitely want to add an editor for .pam files which also shows a preview of the speech animation.

That would be absolutely awesome! Especially with Pamela being so buggy, only working with .wav files and having a far from smooth preview playback. Being able to import, automatically lipsync and then tweak the lipsync animation, all within the editor would make the process so much smoother.

This whole thing has completely changed my view on adding speech to my game. Now it's beginning to seem manageable, despite the huge job of recording and processing the voice clips.

Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Fri 16/05/2008 06:24:26
Quote from: Smiley
And I didn't include the "Preferences" section in the .pam file. ;)

I think AGS can get by without the "Preferences" section in the .pam file, but Pamela needs it to locate the directory of the .wav file. Maybe the directory could be written to the .pam file based on where you opened the .wav file from.

Quote from: Smiley
I definitely want to add an editor for .pam files which also shows a preview of the speech animation.

That would be an excellent addition. Would this editor be only for preview purposes? Or would it also allow you to move phonemes around at will and change them to other values like Pamela's editor?



Anyhow, here's the full phoneme list from Pamela. It shows which of the 8 selected "AGS" phonemes I have each Pamela phoneme revert to. Perhaps this could be considered for usage as the 'default' setting, but you could also allow users to tweak, change, add and delete phonemes as they see fit.

Note that the table below encompasses ALL existing Pamela phonemes. This is exactly how I set them up in AGS's "Lip sync" section:

0:  ZH/None
1: AY0/AY1/AY2/AA0/AA1/AA2/AH0/AH1/AH2/AE0/AE1/AE2
2: W/OW0/OW1/OW2/OY0/OY1/OY2/UW0/UW1/UW2
3: EH0/EH1/EH2/CH/ER0/ER1/ER2/EY0/EY1/EY2/G/K/R/Y/HH
4: S/Z/IH0/IH1/IH2/IY0/IY1/IY2/SH/T/TH/D/DH/JH/N/NG
5: F/V
6: L
7: AO0/AO1/AO2/AW0/AW1/AW2/UH0/UH1/UH2
8: B/M/P

Those graphical frames in order are:

0: Mouth Closed
1: A frame
2: W frame
3: E frame
4: S frame
5: F frame
6: L frame
7: O frame
8: B frame
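AGD2's table expressed as a Python lookup, which a converter or preview could use to go from a Pamela phoneme to a mouth frame. The phoneme groups are copied from the table above; treating unknown phonemes as frame 0 (mouth closed) mirrors how 'None' is handled there but is otherwise an assumption. Names like `FRAME_TABLE` and `frame_for` are made up for the example.

```python
# AGD2's setup: frame number -> Pamela phonemes that use that frame.
FRAME_TABLE = {
    0: "ZH/None",
    1: "AY0/AY1/AY2/AA0/AA1/AA2/AH0/AH1/AH2/AE0/AE1/AE2",
    2: "W/OW0/OW1/OW2/OY0/OY1/OY2/UW0/UW1/UW2",
    3: "EH0/EH1/EH2/CH/ER0/ER1/ER2/EY0/EY1/EY2/G/K/R/Y/HH",
    4: "S/Z/IH0/IH1/IH2/IY0/IY1/IY2/SH/T/TH/D/DH/JH/N/NG",
    5: "F/V",
    6: "L",
    7: "AO0/AO1/AO2/AW0/AW1/AW2/UH0/UH1/UH2",
    8: "B/M/P",
}

# Inverted, upper-cased index so a phoneme can be looked up directly.
PHONEME_TO_FRAME = {
    phoneme.upper(): frame
    for frame, group in FRAME_TABLE.items()
    for phoneme in group.split("/")
}


def frame_for(phoneme):
    """Return the mouth frame for a Pamela phoneme (case-insensitive).
    Unknown phonemes fall back to frame 0, the closed mouth."""
    return PHONEME_TO_FRAME.get(phoneme.upper(), 0)


print(frame_for("AY1"), frame_for("b"), frame_for("None"))  # 1 8 0
```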

And here's an example of a dialogue portrait with those 8 frames in order:

(http://www.agdinteractive.com/images/dialogue_Graham2.gif)

Hope this helps.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Fri 16/05/2008 11:45:57
Quote from: AGD2 on Fri 16/05/2008 06:24:26I think AGS can get by without the "Preferences" section in the .pam file, but Pamela needs it to locate the directory of .wav file.  Maybe the directory could be written to the .pam file based on where you opened the .wav file from.

If you add phoneme editing to the plugin, it would make sense to read the .wav (or, hopefully, .mp3 or .ogg) directly from the speech folder. Especially as people tend to move their files around when upgrading to a new AGS version. But for those wanting to use that god-awful Pamela tool, legacy support is a good idea.

How did you come up with that phoneme list? Trial and error? It seems to differ just slightly from the Preston Blair mouth shapes (see here (http://minyos.its.rmit.edu.au/aim/a_notes/mouth_shapes_01.html) for original, here (http://z.about.com/d/animation/1/0/7/2/mouthpositions.jpg) for an alternate version, and here (http://www.garycmartin.com/mouth_shapes.html) for 3D version). I think it's definitely important to set up a good default scheme, since that's what a lot of people will end up using. Perhaps even with a default animation to model your own artwork on and to test lipsyncing for characters who don't yet have speech animations. Would it make sense to have two default phoneme setups, simple phonemes (for pixel art) and extended phonemes (for hi-res or pre-rendered art)?
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Fri 16/05/2008 15:27:20
Quote from: GarageGothic
How did you come up with that phoneme list? Trial and error? It seems to differ just slightly from the Preston Blair mouth shapes (see here (http://minyos.its.rmit.edu.au/aim/a_notes/mouth_shapes_01.html) for original, here (http://z.about.com/d/animation/1/0/7/2/mouthpositions.jpg) for an alternate version, and here (http://www.garycmartin.com/mouth_shapes.html) for 3D version).

Yes, this was completely trial and error. I didn't base the visual mouth shapes strictly off the Pamela sample ones, nor off any other set. I just figured out the absolute minimum amount of frames that could be used to cover all phonemes and still look convincing. I merged some of the vowel phonemes. Rather than having individual phonemes for each of the 5 vowels, now 3 phonemes cover them all (A and U were merged. E and I were merged.) The above list was the final result that I came up with. For Al Emmo, there were actually 9 phonemes in use (an additional T phoneme), but I decided that it could be dropped to bring it down to only 8. The resulting mouth animation is virtually undetectable, as S covers T very well.

Keep in mind that this is strictly from a Pamela/AGS perspective and doesn't take into account which Annosoft phonemes currently get assigned to each Pamela Phoneme in the conversion process. You would probably need to first compile a full list of all the Annosoft phonemes, compare them visually, and figure out which mouth frames look similar. Then you'd need to group all of those Annosoft phonemes into the  A, W, E, S, F, L, O, B, and 'Mouth Closed' categories, so that Annosoft and Pamela are both working with the same "AGS" phoneme set (if that makes sense.)

Quote from: GarageGothic
Would it make sense to have two default phoneme setups, simple phonemes (for pixel art) and extended phonemes (for hi-res or pre-rendered art)?

To be honest, I don't think it'd be necessary to have a separate set of phonemes for high-res and pixel-art portraits. The mouth frames tend to move so quickly that you don't really notice how many phonemes there are. Looking at the Graham one I posted above, you'd probably be hard pressed to tell that there are 8 phonemes involved by just casually looking at it. 

I guess this is a good argument as to why it'd be ideal to allow people to add more phonemes if they think they'll need them for higher res pre-rendered portraits. After all, no sense in limiting people to one standard.  But personally speaking, I don't think having two defaults would make a great deal of visual difference. People tend to not look at the lip movements closely after a while either.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Fri 16/05/2008 15:53:59
From a quick comparison it seems that Annosoft uses pretty much the same phonemes as Pamela, though without the numbers (not sure what they signify, stresses?). Here's their phoneme list from Microsoft SAPI 5.1, which Annosoft uses:

Quote.SYM Example PhoneID
- syllable boundary (hyphen) 1
! Sentence terminator (exclamation mark) 2
& word boundary 3
, Sentence terminator (comma) 4
. Sentence terminator (period) 5
? Sentence terminator (question mark) 6
_ Silence (underscore) 7
1 Primary stress 8
2 Secondary stress 9
aa father 10
ae cat 11
ah cut 12
ao dog 13
aw foul 14
ax ago 15
ay bite 16
b big 17
ch chin 18
d dig 19
dh then 20
eh pet 21
er fur 22
ey ate 23
f fork 24
g gut 25
h help 26
ih fill 27
iy feel 28
jh joy 29
k cut 30
l lid 31
m mat 32
n no 33
ng sing 34
ow go 35
oy toy 36
p put 37
r red 38
s sit 39
sh she 40
t talk 41
th thin 42
uh book 43
uw too 44
v vat 45
w with 46
y yard 47
z zap 48
zh pleasure 49

So conversion shouldn't be necessary, except the few compatibility issues already mentioned - capital letters and the number after the phoneme. If your phoneme list doesn't distinguish between AY0, AY1, AY2 and so on, it should be safe to just add 0 to the phonemes needing a number. I haven't seen the stress values used in any of my Annosoft scripts so far, and I assume they're just not part of the automatic lipsync output - perhaps they're used if you also input a source text?
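A minimal Python sketch of that conversion for a single SAPI symbol: uppercase it and append stress 0 to the vowels that carry a stress digit in Pamela's list (AA, AE, AH, AO, AW, AY, EH, ER, EY, IH, IY, OW, OY, UH, UW, taken from AGD2's full table). The special cases are assumptions, not established mappings: SAPI `h` becomes Pamela's `HH`, `ax` (absent from that list) becomes `AH0`, and silence `_` becomes `ZH` following AGD2's mouth-closed convention, which means a real `zh` sound also lands on the closed-mouth frame. Boundary, terminator and stress symbols return None so the caller can drop them.

```python
# Pamela vowels that carry a stress digit (0, 1 or 2).
STRESSED_VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
                   "IH", "IY", "OW", "OY", "UH", "UW"}

# Assumed mappings for SAPI symbols without an identical Pamela name.
SPECIAL_CASES = {"h": "HH", "ax": "AH0", "_": "ZH"}

# Syllable/word boundaries, sentence terminators and stress markers.
NON_PHONEMES = {"-", "!", "&", ",", ".", "?", "1", "2"}


def sapi_to_pam(symbol):
    """Convert one SAPI 5.1 phoneme symbol to a Pamela-style name.

    Returns None for symbols that are not phonemes.
    """
    if symbol in NON_PHONEMES:
        return None
    if symbol in SPECIAL_CASES:
        return SPECIAL_CASES[symbol]
    name = symbol.upper()
    # Assumes the AGS lipsync setup ignores stress, so 0 is always used.
    return name + "0" if name in STRESSED_VOWELS else name


print([sapi_to_pam(s) for s in ["h", "eh", "l", "ow", "&"]])
# ['HH', 'EH0', 'L', 'OW0', None]
```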
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Fri 16/05/2008 17:02:41
Looks pretty similar, yeah.  And yes, the numbers after Pamela phonemes represent stresses. Pamela was originally designed to assist with lip syncing for another program called "Magpie", which makes use of stresses. However, since AGS simply holds the phoneme in place until the next one supersedes it, stresses aren't needed in AGS .pam files and all of them can simply have 0 at the end.


Quote from: GarageGothic- perhaps they're used if you also input a source text?

Speaking of which, I don't suppose that feature is available in the free source code that Annosoft offers? I know that one of their commercial SDKs allows you to open a wav file AND also type the line's text in order to have both methods work together to calculate the best phoneme placement along the waveform.  This method seemed more accurate than plain on-the-fly lip syncing in Annosoft's program.


--EDIT--

Forgot to mention that the ZH phoneme is one that I hi-jacked for the "mouth closed" frame. In both Pamela and Annosoft it's used for a 'Z' sound, but since 'S' covers that (and Pamela doesn't have a default "mouth closed" frame), I reserved ZH for the "mouth closed" frame instead.

Note that Pamela also has a 'None' phoneme that displays if you forget to assign a phoneme letter to the bar. In this case, I also made unassigned 'None' phonemes revert to the "mouth closed" frame.  Just some things to keep in mind when doing the Annosoft>Pamela phoneme conversion.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Fri 16/05/2008 17:28:17
Oh yes, the free Annosoft code does indeed allow you to sync to both text and wav. It seems a bit ambitious for a plugin to try to retrieve the lipsynced text strings from AGS's dialog and script files, but if the Audio Manager got a new field for the spoken line (people might want to use the "Description" field for other things), perhaps that string could be used for syncing it.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Sat 17/05/2008 21:15:31
Quote from: AGD2
Or would it also allow you to move phonemes around at will and change them to other values like Pamela's editor?
Yes. Wouldn't be much of an editor if you couldn't do that ;). The layout will be similar to Pamela.

Thanks for the list. I've thought about letting the user add automatic conversion rules, e.g. AY0 always becomes AA0 etc. (after the Annosoft->Pamela conversion).  But you will always be able to change the value manually.
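One way such rules could work, as a Python sketch: a dictionary of replacements applied to a copy of the phoneme list, so the unconverted Annosoft result can still be kept. The `(time, phoneme)` tuple shape and the `apply_rules` name are assumptions for illustration. Rules are applied once, not chained, so with `{"AY0": "AA0", "AA0": "AH0"}` an AY0 becomes AA0 and stops there.

```python
def apply_rules(entries, rules):
    """Return a new list where each phoneme is replaced per `rules`.

    entries: (time, phoneme) tuples; rules: e.g. {"AY0": "AA0"}.
    Each phoneme is looked up once (rules are not chained), and the
    input list is not modified.
    """
    return [(time, rules.get(phoneme, phoneme)) for time, phoneme in entries]


original = [(0, "AY0"), (120, "B"), (300, "AY1")]
print(apply_rules(original, {"AY0": "AA0"}))
# [(0, 'AA0'), (120, 'B'), (300, 'AY1')]
```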

Quote from: GarageGothic on Fri 16/05/2008 17:28:17
It seems a bit ambitious for a plugin to try to retrieve the lipsynced text strings from AGS's dialog and script files
Since the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.
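A rough Python sketch of pulling voiced lines out of a script, assuming AGS's voice-speech convention of prefixing the spoken text with `&N` (as in `cEgo.Say("&12 Hello there.");`, where N is the speech file number). It only recognises plain `.Say()` calls with a string literal, so lines built at runtime, dialog-script syntax or custom dialog systems would need their own handling. `speech_lines` is a hypothetical helper, not part of the plugin.

```python
import re

# Matches e.g.  cEgo.Say("&12 Hello there.");
# group 1: character, group 2: speech number, group 3: spoken text
# (escaped quotes such as \" inside the string are allowed).
SAY_PATTERN = re.compile(r'(\w+)\.Say\(\s*"&(\d+)\s+((?:[^"\\]|\\.)*)"')


def speech_lines(script_text):
    """Return (character, speech_number, text) for each voiced Say() call."""
    return [(m.group(1), int(m.group(2)), m.group(3))
            for m in SAY_PATTERN.finditer(script_text)]


script = '''
cEgo.Say("&12 Hello there.");
cEgo.Say("No voice for this one.");
player.Say("&3 Hmm.");
'''
print(speech_lines(script))
# [('cEgo', 12, 'Hello there.'), ('player', 3, 'Hmm.')]
```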
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Sat 17/05/2008 22:26:35
Quote from: smiley on Sat 17/05/2008 21:15:31Thanks for the list. I've thought about letting the user add automatical converting rules, e.g. AY0 always becomes AA0 etc (After Annosoft->Pamela conversion).  But you will always be able to change the value manually.

Wouldn't it make more sense to keep the original source format (with all phonemes), and do any customization in AGS' lipsync setup (just add AA0 to the same frame as AY0 if using your example)? That way the information would still be stored in case the user decides to add more mouth shapes later.

QuoteSince the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.

Awesome! And even better to hear that it can also retrieve it from script since I use a custom dialog system which doesn't at all refer to the AGS dialog script.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Sun 18/05/2008 05:07:20
Quote from: smiley
Yes. Wouldn't be much of an editor if you couldn't do that ;). The layout will be similar to Pamela.

That's great! If I may make a suggestion about the implementation of this: Instead of having those excruciatingly annoying draggable vertical phoneme bars like Pamela (which are quite painful on the eyes) how about if phoneme placement and re-positioning were handled by simple clicks?  For example, you could have the full phoneme list displayed  in a constantly visible menu, up top (or left, or right, or wherever, really.) And whenever the user wants to place a new vertical phoneme bar, they'd simply left-click the phoneme they want from the menu, and then left-click again on the waveform to indicate exactly where they wish to place it. If they accidentally placed the phoneme bar in the wrong position, then a simple left-click would re-position it to the current mouse position along the waveform.

Left-clicking directly ON a phoneme bar already positioned along the waveform would allow that phoneme to become the active one that can be re-positioned (meaning 2 phonemes would not be allowed to occupy exactly the same space, like they currently can in Pamela.)

Left-clicking or Right-clicking on phoneme bar letters positioned on the waveform would allow you to change the currently assigned phoneme (just like Pamela.) 

Right-clicking directly ON the phoneme bar would delete the phoneme (again, just like Pamela.)

I think a simple, non-draggable placement method like this would save incredible amounts of time and frustration!


Quote from: smiley
Since the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.

Nice! Will the tool also allow a user to type their text in manually if they wish, rather than grabbing it automatically from the game script/dialog? As sometimes I've noticed with Annosoft, that you have to intentionally misspell a word in order to get the phoneme selection to look just right. Though, I suppose these kinds of  touch-ups would likely be few and far between, and could probably be performed manually with the editor.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Sun 01/06/2008 08:40:13
Okay, the first beta is ready:
http://ueberlicht.googlepages.com/AGS.Plugin.AudioManager0.7beta.zip

Speech files have now two new context menu entries:
"Create .pam file"
-mp3s and oggs are temporarily converted to wavs
-existing .pam files will be overridden/deleted

"Edit .pam file" (if the file exists...d'oh)
-left-click selects/places a phoneme
-right-click deselects it
-middle-click deletes it (alternatively via toolbar button)
-new phonemes can be added by selecting the type in the listbox and left-clicking the waveform
-the type of the phoneme can be changed by selecting it and choosing the new type from the listbox
-you can replace an existing phoneme by placing a new or selected one on it
-by default selected/new phonemes follow the mouse cursor when you hover over the waveform. This can be turned off with the second toolbar button.
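The placement rule (a new or selected phoneme dropped onto an existing one replaces it, so two phonemes never share a position) can be sketched in Python as an insert into a time-sorted list. This illustrates the described behaviour, not the plugin's code; the exact-match comparison is an assumption, and a real editor would more likely compare within a small pixel or millisecond tolerance.

```python
import bisect


def place_phoneme(entries, time, phoneme):
    """Place (time, phoneme) into entries, which is kept sorted by time.

    If a phoneme already sits at exactly `time`, it is replaced;
    otherwise the new one is inserted at the right position.
    The list is modified in place and also returned for convenience.
    """
    times = [t for t, _ in entries]
    i = bisect.bisect_left(times, time)
    if i < len(entries) and entries[i][0] == time:
        entries[i] = (time, phoneme)
    else:
        entries.insert(i, (time, phoneme))
    return entries


track = [(0, "ZH"), (100, "S")]
place_phoneme(track, 50, "B")   # new phoneme between the two
place_phoneme(track, 100, "F")  # replaces the S at 100
print(track)
# [(0, 'ZH'), (50, 'B'), (100, 'F')]
```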

There's still some stuff missing:
-text-based lipsync is currently not included
-warning messages for the things you probably don't want to do...
-undo editing
-reload .pam file
-auto-convert phonemes
-multi-select phonemes
-batch creating .pam files
-automatic waveform zoom
-renaming the .pam file if you rename the speech file
-...

Known issue:
-phoneme positions differ slightly from Pamela's after you save the file, caused by rounding errors
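A rough illustration of where that kind of drift comes from, assuming positions are saved as whole numbers of ticks at some fixed rate. The actual unit used in .pam files isn't stated in this thread, so `ticks_per_second` here is a placeholder, not Pamela's real value.

```python
def to_ticks(seconds, ticks_per_second):
    """Store a position as a whole number of ticks (Python rounds halves to even)."""
    return round(seconds * ticks_per_second)


def to_seconds(ticks, ticks_per_second):
    return ticks / ticks_per_second


def round_trip(seconds, ticks_per_second):
    """Position as it comes back after one save/load through the tick grid."""
    return to_seconds(to_ticks(seconds, ticks_per_second), ticks_per_second)
```

A phoneme the editor placed at 0.1234 s comes back as 0.12 s with 100 ticks per second, or about 0.133 s with 15; only positions already on the grid survive unchanged.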

Quote from: GarageGothic on Sat 17/05/2008 22:26:35
Wouldn't it make more sense to keep the original source format (with all phonemes), and do any customization in AGS' lipsync setup (just add AA0 to the same frame as AY0 if using your example)?
Well, the plugin can change the lipsync characters, but the changes will only be reflected in the lipsync tab if you save and reload the game.
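As a sketch of what that customization amounts to: if each speech-view frame's entry in the lipsync setup is a slash-separated list of phoneme codes (e.g. "AY0/AA0"), the lookup is just a phoneme-to-frame dictionary. The function names and the fallback frame are hypothetical; this is not AGS's own code.

```python
def build_phoneme_map(frame_entries):
    """Map each phoneme code to the speech-view frame that should show it.

    frame_entries[i] is the slash-separated text entered for frame i.
    """
    mapping = {}
    for frame, entry in enumerate(frame_entries):
        for code in entry.split("/"):
            code = code.strip().upper()
            if code:
                mapping[code] = frame
    return mapping


def frame_for(code, mapping, default_frame=0):
    """Frame for a phoneme; unknown codes fall back to an assumed default."""
    return mapping.get(code.upper(), default_frame)
```

With entries `["M/B/P", "AY0/AA0", "EH0"]`, both AY0 and AA0 map to frame 1, so the source data keeps every phoneme and only the mapping decides which ones share a frame.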

Quote from: AGD2 on Sun 18/05/2008 05:07:20
If I may make a suggestion about the implementation of this:
The controls are currently a bit different, but I think future versions will allow some degree of customization.

Quote
Will the tool also allow a user to type their text in manually if they wish, rather than grabbing it automatically from the game script/dialog?
Yes
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Sun 01/06/2008 12:09:19
Wow, smiley, that's pretty damn impressive!

I didn't have time to test it out much yet, but the new phoneme editor seems really solid. It's much better than Pamela, and the preview function is great (there's a seemingly endless row of "Object reference not set to an instance of an object" messages if you forget to assign it to a character, but I guess that's covered in your missing stuff list). I did find one small issue that you didn't mention: the previous frame isn't cleared, so each new frame is drawn on top of the previous ones. That's a slight problem if your talking animation isn't just a static head with lip movement. But other than that, it's absolutely excellent. I'll run more tests in a day or two.

I never expected that we'd get something this professional-looking when I originally suggested the lipsync feature. Keep up the good work, man. The audio manager is such a great improvement to AGS, and the auto lipsync will definitely speed up development for a lot of people.

Suggestion: Would it be possible to have an audio preview as you move the phonemes around (playback of the few milliseconds following the phoneme timing)? It would make it a bit easier than just looking at the waveform to try to match the phonemes. I guess random access on a millisecond basis may not work with all audio formats, but perhaps it could convert to .wav while you edit the .pam file?
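One way such a preview could pick what to play, sketched under assumptions: play from the clicked phoneme up to the next one (or the end of the clip), optionally capped to a few milliseconds, and convert that to sample offsets. The function name and `max_ms` cap are hypothetical.

```python
def preview_segment(phoneme_times_ms, index, clip_length_ms, sample_rate,
                    max_ms=None):
    """Return (start_sample, end_sample) to play for the clicked phoneme.

    index refers to the phoneme's position in time order.
    """
    times = sorted(phoneme_times_ms)
    start_ms = times[index]
    # Play up to the next phoneme, or to the end of the clip for the last one.
    end_ms = times[index + 1] if index + 1 < len(times) else clip_length_ms
    if max_ms is not None:
        end_ms = min(end_ms, start_ms + max_ms)

    def to_sample(ms):
        return ms * sample_rate // 1000

    return to_sample(start_ms), to_sample(end_ms)
```

With a decoded .wav, Python's `wave` module could then seek with `Wave_read.setpos(start)` and read `readframes(end - start)` frames for playback.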
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 03/06/2008 09:07:54
Bump! OK, I imported some phoneme sprites to test the plugin out, and the preview seems to work great. However, when I try to watch the synced dialog in-game the framerate is very slow, so slow that it's difficult to tell if anything is actually syncing up. Any idea why this could be? I set the delay of my character's speech view to 1 just to rule that out. The actual framerate of the game never dips below 40 while the animation plays, so it's not hardware related.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Tue 03/06/2008 17:44:38
Quote from: GarageGothic on Tue 03/06/2008 09:07:54
Any idea why this could be?
Do you see any difference when you edit & save the pam file?

Quote
(there's a seemingly endless row of "Object reference not set to an instance of an object" messages if you forget to assign it to a character, but I guess that's covered in your missing stuff list). I did find one small issue that you didn't mention: the previous frame isn't cleared, so each new frame is drawn on top of the previous ones.
Thanks. Will be fixed in the next version.

Quote
Suggestion: Would it be possible to have an audio preview as you move the phonemes around (playback of the few milliseconds following the phoneme timing)?
Yeah, I definitely want to add something like that. Should be possible without conversion. I have to look into it.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 03/06/2008 18:52:50
Problem solved, or rather, at least it's not at your end. I tried some different audio formats, and while .ogg and .wav work perfectly in-game, the animation framerate gets choppy whenever an .mp3 voice clip plays. I tried reusing the .pam file created for the working .ogg and .wav files with the same audio in .mp3 format, and got the same result. So the problem must be in AGS' mp3 voice playback routine, and I'll mention this to CJ.

So finally, I can now report that the .PAM files generated in your plugin work great!

Edit: CJ clarified the issue:

Quote
It's because the MP3 decoder decodes a (for example) 0.25 second chunk of audio and then plays it, so AGS can only get the current position to the nearest 0.25 seconds. There's no realistic fix for this, other than moving to OGG instead.
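A small simulation makes CJ's explanation concrete. It's a sketch, not AGS engine code: it assumes the current phoneme is looked up from the reported playback position once per game frame (25 ms at 40 fps), and that with MP3 the position only advances in whole chunks. The 250 ms chunk is CJ's example value, not a fixed constant.

```python
def reported_position(true_ms, chunk_ms):
    """Playback position as seen by the engine when audio is decoded in chunks."""
    return (true_ms // chunk_ms) * chunk_ms


def frames_shown(phonemes, clip_ms, chunk_ms, tick_ms):
    """Sequence of phonemes actually displayed during playback.

    phonemes: list of (start_ms, code) sorted by start time.
    """
    shown = []
    for t in range(0, clip_ms, tick_ms):          # one lookup per game frame
        pos = reported_position(t, chunk_ms)
        current = None
        for start, code in phonemes:
            if start <= pos:
                current = code
            else:
                break
        if current is not None and (not shown or shown[-1] != current):
            shown.append(current)
    return shown
```

For the clip `[(0, "M"), (80, "AA0"), (160, "M"), (260, "AY0")]` over 500 ms, a 1 ms "chunk" shows all four mouth shapes, while a 250 ms chunk only ever shows "M": the position jumps from 0 to 250 ms, skipping AA0 entirely and never reaching AY0.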

Perhaps this should be reflected in the plugin. Either just by not accepting mp3s for .pam conversion (and explaining why), or maybe asking if the plugin should convert the file to .ogg. Is the full .ogg codec already in there or just the decoder?
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Fri 06/06/2008 13:09:06
I only just had time to test this out now. Fantastic job on getting this tool up to its current usability in such a short time. This is really good, Smiley.

Here are some things I noted:

1) When watching the .pam preview in the audio editor, my dialogue pictures are drawn half outside the right-most edge of the box and there's no way to view the full dialogue picture. Being able to switch the .pam preview pane to full-screen mode, and having it automatically adjust to fit the width and height of any given Sierra-style dialogue portrait, would help.

2) I second GarageGothic on the request for some type of audio playback feature like Pamela has. For example, when clicking on a particular phoneme, it could play the audio segment between the phoneme you clicked and the next one, as Pamela does.

3) When you generate a .pam file from an .ogg or .mp3, and then look at the 'preferences' section of the .pam file, the path will list the file as an .ogg or .mp3.  However, Pamela can only open .wav files, so if you try to open this AGS-generated .pam file in Pamela it will give an 'incorrect path' error and refuse to open.  I guess auto-generated .pam files should always have the filename pointing to .wav in the path so that the backwards compatibility with Pamela still works.

4) The auto-lipsyncing seems to work quite well with male voices and I only had to tweak a few phonemes. However, female voices (or voices with echo) seem to give very inaccurate results.  Doesn't the Annosoft SDK allow you to specify whether it's a male or female voice file you're syncing, prior to doing the auto-generation of the phonemes?  Any chance of including that feature?

That's all I have for now. But really great work once again. Once this is finalized, it'll be a total breeze to lip-sync a game's worth of lines. I'm really looking forward to seeing this with text-based lip-syncing support implemented!

Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: fovmester on Fri 06/06/2008 13:33:27
I am very impressed by this development!

The difficulty of lip-synching is one reason I have never tried to do voice-acting in my games. Now suddenly it all seems much more appealing! The idea of making the application a plugin is just plain marvelous! Good work!
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Tue 10/06/2008 17:34:24
Quote from: GarageGothic on Tue 03/06/2008 18:52:50
Perhaps this should be reflected in the plugin. Either just by not accepting mp3s for .pam conversion (and explaining why), or maybe asking if the plugin should convert the file to .ogg. Is the full .ogg codec already in there or just the decoder?
I think I'll include a warning message for mp3s. Conversion will probably be in a future version (using lame and oggenc).
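A minimal sketch of how the warning and a lame/oggenc conversion might be wired up. The warning text and function names are hypothetical; `lame --decode in.mp3 out.wav` and `oggenc in.wav -o out.ogg` are the standard command lines for those tools. Note that decoding an mp3 and re-encoding it as ogg is lossy-to-lossy, so encoding the ogg from the original recording is preferable when it's available.

```python
from pathlib import Path

# Hypothetical warning text for the plugin UI.
MP3_WARNING = ("MP3 speech is decoded in chunks, so in-game lip sync timing "
               "will be coarse. Consider converting this file to OGG.")


def needs_mp3_warning(speech_file):
    """True for files whose lip sync will suffer from MP3 chunked decoding."""
    return Path(speech_file).suffix.lower() == ".mp3"


def conversion_commands(mp3_file):
    """Command lines for mp3 -> wav (lame) followed by wav -> ogg (oggenc)."""
    src = Path(mp3_file)
    wav = src.with_suffix(".wav")
    ogg = src.with_suffix(".ogg")
    return [
        ["lame", "--decode", str(src), str(wav)],
        ["oggenc", str(wav), "-o", str(ogg)],
    ]
```

Each command could be run with `subprocess.run(cmd, check=True)` after `shutil.which("lame")` and `shutil.which("oggenc")` confirm the tools are on the PATH.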

Quote from: AGD2 on Fri 06/06/2008 13:09:06
1) When watching the .pam preview in the audio editor, my dialogue pictures are drawn half outside the right-most edge of the box and there's no way to view the full dialogue picture. Being able to switch the .pam preview pane to full-screen mode, and having it automatically adjust to fit the width and height of any given Sierra-style dialogue portrait, would help.
Yeah, at the moment all sprites are scaled to fit the preview pane height (maintaining the aspect ratio), but that will definitely be different in the next release.

Quote
2) I second GarageGothic on the request for some type of audio playback feature like Pamela has. For example, when clicking on a particular phoneme, it could play the audio segment between the phoneme you clicked and the next one, as Pamela does.
Will be in the next beta...

Quote
3) When you generate a .pam file from an .ogg or .mp3, and then look at the 'preferences' section of the .pam file, the path will list the file as an .ogg or .mp3.  However, Pamela can only open .wav files, so if you try to open this AGS-generated .pam file in Pamela it will give an 'incorrect path' error and refuse to open.  I guess auto-generated .pam files should always have the filename pointing to .wav in the path so that the backwards compatibility with Pamela still works.
Hmm, that's strange. I don't have any problems opening the pam files with Pamela. Even playback seems to work with mp3s, only oggs are messed up.

Quote
4) The auto-lipsyncing seems to work quite well with male voices and I only had to tweak a few phonemes. However, female voices (or voices with echo) seem to give very inaccurate results.  Doesn't the Annosoft SDK allow you to specify whether it's a male or female voice file you're syncing, prior to doing the auto-generation of the phonemes?  Any chance of including that feature?
That's currently not possible with Annosoft. Female voices could be recognized better if you add a new profile in Control Panel->Speech and let a woman train it (then you have to switch the profiles manually for male/female voices). Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Tue 10/06/2008 18:20:30
Quote from: smiley on Tue 10/06/2008 17:34:24
Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.

For echoes, assuming they're digital effects and not due to poor voice recording, you could always run the sync on a clean version of the audio file and then substitute it with a filtered version before compiling the speech.vox.
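Since a .pam file only stores timings, the sync from the clean take should carry over as long as the filtered file isn't noticeably longer or shorter (a long reverb tail could extend it). Here is a sketch of that check using Python's standard `wave` module; the 10 ms tolerance is an arbitrary assumption, and `silent_wav` is a hypothetical helper for trying it out.

```python
import io
import wave


def duration_ms(wav_source):
    """Length of a .wav file (path or file object) in milliseconds."""
    with wave.open(wav_source, "rb") as w:
        return w.getnframes() * 1000 // w.getframerate()


def can_reuse_sync(clean_wav, filtered_wav, tolerance_ms=10):
    """True if the filtered take is close enough in length to reuse the .pam."""
    return abs(duration_ms(clean_wav) - duration_ms(filtered_wav)) <= tolerance_ms


def silent_wav(n_frames, rate=22050):
    """Hypothetical helper: an in-memory 16-bit mono .wav of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf
```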
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Tue 10/06/2008 23:26:59
Quote from: smiley
Hmm, that's strange. I don't have any problems opening the pam files with Pamela. Even playback seems to work with mp3s, only oggs are messed up.

Sorry, I didn't actually test this with mp3s, only with an ogg. When the ogg didn't work I just assumed mp3s would be the same. I just tested an mp3 in Pamela and confirmed that it works. My bad. It may just be oggs, then.

Quote from: smiley
That's currently not possible with Annosoft. Female voices could be recognized better if you add a new profile in Control Panel->Speech and let a woman train it (then you have to switch the profiles manually for male/female voices). Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.

Where exactly is this control panel>speech setting accessed from to train voices?  (I also discovered that temporarily lowering the pitch of female speech files so that they sound roughly as deep as a male voice, can sometimes give better phoneme results.)

I just noticed that it's not female voices that cause the inaccurate phoneme placement.  The inaccurate placement actually seems to occur when you have an .mp3 or .ogg speech file listed, then you right click and choose "Create .pam file". It seems that somewhere in the process of converting it to a .wav and then adding phonemes, it barely places any phonemes along the waveform.  Yet, if I place a .wav file of exactly the same speech file in the directory, right-click it, and choose "Create .pam file", then it will add many more phonemes across the entire length of the waveform, all of them being much more accurate.  This is the case with both male and female speech.

Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: GarageGothic on Wed 11/06/2008 08:00:02
Quote from: AGD2 on Tue 10/06/2008 23:26:59
Where exactly is this control panel>speech setting accessed from to train voices?

He means Windows' own "Control Panel" (accessed from the Start menu). There should be a "Speech" option available after installing SAPI 5.1. Potentially you could set up a unique profile for every voice actor if you just record them reading the text from the training wizard. I'm not sure, but I think it might also be a good idea to deactivate the "Background adaptation" option for the default profile, or the speech detection could become confused over time when you sync different voices using the same profile.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Sun 15/06/2008 15:10:47
Quote from: AGD2 on Tue 10/06/2008 23:26:59
It seems that somewhere in the process of converting it to a .wav and then adding phonemes, it barely places any phonemes along the waveform.  Yet, if I place a .wav file of exactly the same speech file in the directory, right-click it, and choose "Create .pam file", then it will add many more phonemes across the entire length of the waveform, all of them being much more accurate.  This is the case with both male and female speech.
From the testing I've done, it looks like the resulting .pam file differs (in the number and type of phonemes, and in accuracy) each time, even when you use the same source file. The conversion from mp3/ogg to wav is lossless, so that shouldn't be the problem. What bitrate are you using for the mp3s/oggs? Do you get better results if you encode the original waves at a higher quality?

Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Thu 19/06/2008 01:32:44
Hmm, I've been trying to get it to occur again, but now it seems to be giving me fairly consistent results with both ogg and wavs.  I've turned off that "Background adaptation" option that GarageGothic mentioned since then, so perhaps that may have had something to do with the varying results before.

I'll keep messing around with it, and see what happens. It will be interesting to see how much the accuracy improves when the tool can also use the text as a guide. When I tested text syncing in Annosoft, having the text present seemed to make a huge difference.
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: AGD2 on Mon 23/06/2008 01:48:16
Just another few things to mention:

1) The unrecognized phonemes UW and H get added to .pam files. Instead, they should be UW0 and HH. There may be others, but those were the ones I noticed.

2) If you have a file named EGO200.wav and you right-click the speech file in the lip-syncing tool and assign it to another character (or even the same character), it renames the file to EGO1.wav (or whatever the character's script name is). In this case, wouldn't it make more sense to simply keep the same number?

3) A way to view/edit the phonemes animating with any character's dialog picture view, without having to rename the speech file in the process, might be handy.

4) Will the final version allow you to highlight a bunch of files and then click once to generate all the .pam files as a batch (rather than having to do them one at a time)?
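The phoneme fix from point 1 could be a simple post-processing pass before writing the .pam file. This is a sketch: only the UW→UW0 and H→HH corrections come from this thread, and the `known` set passed to `unknown_codes` is whatever list of accepted codes you check against.

```python
# Corrections reported in this thread; other bad codes may exist.
PHONEME_FIXES = {"UW": "UW0", "H": "HH"}


def normalize_phonemes(codes, fixes=PHONEME_FIXES):
    """Upper-case each code and swap in the corrected spelling where known."""
    return [fixes.get(c.upper(), c.upper()) for c in codes]


def unknown_codes(codes, known):
    """Codes not in the accepted set, to help spot further unrecognized ones."""
    return sorted({c for c in codes if c not in known})
```

Running `unknown_codes` over a batch of generated files would be one way to find the "others" that may exist.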
Title: Re: Suggestion (not just for CJ): Automatic lipsync
Post by: smiley on Sat 28/06/2008 19:46:27
Next beta:
http://ueberlicht.googlepages.com/AGS.Plugin.AudioManager0.7beta.zip

The main features are complete, except for batch conversion...

The new release fixes a bug in the waveform generation for mp3 and ogg files.
The speech animation preview now looks like AGS 'character view(centre pivot)' preview.

@AGD2:
'Assign to character' now works as you suggested if there isn't already a file with that number. The new beta has a combobox above the animation preview that allows you to change the character. And thanks for finding those phonemes.
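A sketch of that renaming rule, assuming AGS's convention of speech files named after the first four letters of the character's script name followed by the line number (e.g. EGO200.wav). Keeping the number when it's free is what the post describes; falling back to the lowest unused number when it's taken is an assumption, since the post doesn't say what happens in that case. Function names are hypothetical.

```python
import re
from pathlib import Path


def speech_filename(prefix, number, ext):
    """AGS-style speech file name: up to 4 letters of the script name + number."""
    return f"{prefix[:4]}{number}{ext}"


def reassigned_name(old_name, new_prefix, existing):
    """New file name when a speech file is assigned to a (possibly same) character.

    Keeps the old line number if that name is free; otherwise (assumed
    fallback) uses the lowest unused number. Comparisons ignore case.
    """
    p = Path(old_name)
    m = re.fullmatch(r"[A-Za-z]+(\d+)", p.stem)
    number = int(m.group(1)) if m else 1
    # The file being renamed doesn't block its own name.
    taken = {n.upper() for n in existing if n.upper() != p.name.upper()}
    candidate = speech_filename(new_prefix, number, p.suffix)
    if candidate.upper() not in taken:
        return candidate
    n = 1
    while speech_filename(new_prefix, n, p.suffix).upper() in taken:
        n += 1
    return speech_filename(new_prefix, n, p.suffix)
```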