Suggestion (not just for CJ): Automatic lipsync

Started by GarageGothic, Mon 12/05/2008 00:49:15


GarageGothic

#20
Oh yes, the free Annosoft code does indeed allow you to sync to both text and wav. It seems a bit ambitious for a plugin to try to retrieve the lipsynced text strings from AGS's dialog and script files, but if the Audio Manager got a new field for the spoken line (people might want to use the "Description" field for other things), perhaps that string could be used for syncing it.

smiley

Quote from: AGD2
Or would it also allow you to move phonemes around at will and change them to other values like Pamela's editor?
Yes. Wouldn't be much of an editor if you couldn't do that ;). The layout will be similar to Pamela.

Thanks for the list. I've thought about letting the user add automatic conversion rules, e.g. AY0 always becomes AA0, etc. (after the Annosoft->Pamela conversion). But you will always be able to change the value manually.
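A rule pass like the one smiley describes could be as simple as a lookup table applied to the converted track. The sketch below is not the plugin's code: it assumes a hypothetical in-memory track of `(time_ms, phoneme)` pairs and a user-supplied `rules` dictionary, and it leaves timings untouched so every phoneme can still be edited by hand afterwards.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a lipsync track: (time in ms, phoneme code).
Track = List[Tuple[int, str]]


def apply_conversion_rules(track: Track, rules: Dict[str, str]) -> Track:
    """Return a new track with each phoneme that has a rule replaced.

    Phonemes without a rule are kept as-is, timings are never changed,
    and the input list is not modified.
    """
    return [(time_ms, rules.get(code, code)) for time_ms, code in track]


track = [(0, "HH"), (120, "AY0"), (300, "M")]
print(apply_conversion_rules(track, {"AY0": "AA0"}))
# [(0, 'HH'), (120, 'AA0'), (300, 'M')]
```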

Quote from: GarageGothic on Fri 16/05/2008 17:28:17
It seems a bit ambitious for a plugin to try to retrieve the lipsynced text strings from AGS's dialog and script files
Since the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.
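For games that use AGS's numbered voice speech (`cEgo.Say("&200 ...")` in script, `ego: &200 ...` in dialog scripts), pulling out the text for a given clip could start with two regular expressions. This is a rough sketch under those assumptions, not how the Audio Manager actually does it: it ignores escaped quotes and string concatenation, and mapping a script name such as `cEgo` to its speech-file prefix is left out.

```python
import re
from typing import List, NamedTuple


class SpokenLine(NamedTuple):
    speaker: str  # script name as written, e.g. "cEgo" or "ego"
    number: int   # voice clip number from the "&N" prefix
    text: str     # text that a text-based lipsync pass could use


# character.Say("&N text") in ordinary script (escaped quotes not handled)
SAY_CALL = re.compile(r'(\w+)\.Say\(\s*"&(\d+)\s+([^"]*)"')
# "name: &N text" lines in dialog scripts
DIALOG_LINE = re.compile(r'^\s*(\w+):\s*&(\d+)\s+(.*?)\s*$', re.MULTILINE)


def find_spoken_lines(source: str) -> List[SpokenLine]:
    """Collect numbered speech lines: Say() calls first, then dialog lines."""
    lines = []
    for pattern in (SAY_CALL, DIALOG_LINE):
        for m in pattern.finditer(source):
            lines.append(SpokenLine(m.group(1), int(m.group(2)), m.group(3)))
    return lines


script = 'cEgo.Say("&200 Hello there.");\nego: &201 Nice day.\n'
for line in find_spoken_lines(script):
    print(line.speaker, line.number, line.text)
```

A custom dialog system like GarageGothic's would only be covered if its calls still end up in `Say()` with the `&N` prefix.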

GarageGothic

Quote from: smiley on Sat 17/05/2008 21:15:31
Thanks for the list. I've thought about letting the user add automatic conversion rules, e.g. AY0 always becomes AA0, etc. (after the Annosoft->Pamela conversion). But you will always be able to change the value manually.

Wouldn't it make more sense to keep the original source format (with all phonemes), and do any customization in AGS' lipsync setup (just add AA0 to the same frame as AY0 if using your example)? That way the information would still be stored in case the user decides to add more mouth shapes later.
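GarageGothic's alternative keeps the stored track untouched and does the merging at lookup time. As a sketch: the frame table below, with slash-separated phoneme codes per frame, is only loosely modelled on AGS's lip sync setup, and the exact format and the fallback "rest" frame are assumptions. Because AY0 survives in the track, giving it its own mouth shape later is just an edit to the table.

```python
from typing import Dict, List, Tuple


def build_lookup(frame_table: Dict[int, str]) -> Dict[str, int]:
    """Invert {frame: "AA0/AY0"} into {"AA0": frame, "AY0": frame}."""
    lookup: Dict[str, int] = {}
    for frame, codes in frame_table.items():
        for code in codes.split("/"):
            code = code.strip()
            if code:
                lookup[code] = frame
    return lookup


def frames_for(track: List[Tuple[int, str]], lookup: Dict[str, int],
               rest_frame: int = 0) -> List[Tuple[int, int]]:
    """Map (time_ms, phoneme) pairs to (time_ms, frame); unknown codes rest."""
    return [(t, lookup.get(code, rest_frame)) for t, code in track]


# AY0 shares frame 1 with AA0, but the track itself still says AY0.
table = {0: "", 1: "AA0/AY0", 2: "M/B/P"}
print(frames_for([(0, "AY0"), (100, "M"), (200, "ZH")], build_lookup(table)))
# [(0, 1), (100, 2), (200, 0)]
```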

Quote
Since the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.

Awesome! And even better to hear that it can also retrieve it from script since I use a custom dialog system which doesn't at all refer to the AGS dialog script.

AGD2

#23
Quote from: smiley
Yes. Wouldn't be much of an editor if you couldn't do that ;). The layout will be similar to Pamela.

That's great! If I may make a suggestion about the implementation of this: instead of having those excruciatingly annoying draggable vertical phoneme bars like Pamela's (which are quite painful on the eyes), how about if phoneme placement and repositioning were handled by simple clicks? For example, you could have the full phoneme list displayed in a constantly visible menu up top (or left, or right, or wherever, really). Whenever the user wants to place a new vertical phoneme bar, they'd simply left-click the phoneme they want from the menu, and then left-click again on the waveform to indicate exactly where they wish to place it. If they accidentally placed the phoneme bar in the wrong position, a simple left-click would reposition it to the current mouse position along the waveform.

Left-clicking directly ON a phoneme bar already positioned along the waveform would make that phoneme the active one, which can then be repositioned (meaning two phonemes would not be allowed to occupy exactly the same space, as they currently can in Pamela).

Left-clicking or right-clicking on the phoneme bar letters positioned on the waveform would allow you to change the currently assigned phoneme (just like Pamela).

Right-clicking directly ON the phoneme bar would delete the phoneme (again, just like Pamela).

I think a simple, non-draggable placement method like this would save incredible amounts of time and frustration!


Quote from: smiley
Since the plugin can already find the usage of audio files in scripts, it should be easy to get the text...if it's already in the script/dialog.

Nice! Will the tool also allow a user to type their text in manually if they wish, rather than grabbing it automatically from the game script/dialog? I've sometimes noticed with Annosoft that you have to intentionally misspell a word in order to get the phoneme selection to look just right. Though I suppose these kinds of touch-ups would likely be few and far between, and could probably be performed manually with the editor.

smiley

#24
Okay, the first beta is ready:
http://ueberlicht.googlepages.com/AGS.Plugin.AudioManager0.7beta.zip

Speech files now have two new context menu entries:
"Create .pam file"
-mp3s and oggs are temporarily converted to wavs
-existing .pam files will be overwritten

"Edit .pam file" (if the file exists...d'oh)
-left-click selects/places a phoneme
-right-click deselects it
-middle-click deletes it (alternatively via toolbar button)
-new phonemes can be added by selecting the type in the listbox and left-clicking the waveform
-the type of the phoneme can be changed by selecting it and choosing the new type from the listbox
-you can replace an existing phoneme by placing a new or selected one on it
-by default selected/new phonemes follow the mouse cursor when you hover over the waveform. This can be turned off with the second toolbar button.
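The replace-on-placement rule above (together with AGD2's wish that two phonemes never share a spot) maps naturally onto a time-sorted list. This sketch is an illustration, not the plugin's implementation; the `(time_ms, phoneme)` track format and the snapping tolerance `snap_ms` are hypothetical.

```python
import bisect
from typing import List, Tuple


def place_phoneme(track: List[Tuple[int, str]], time_ms: int, code: str,
                  snap_ms: int = 5) -> None:
    """Place (time_ms, code) in a time-sorted track, modifying it in place.

    If an existing phoneme lies within snap_ms of time_ms, the nearest one is
    replaced instead of adding a second phoneme at (almost) the same spot.
    """
    times = [t for t, _ in track]
    i = bisect.bisect_left(times, time_ms)
    # The nearest existing phoneme is one of the two neighbours of index i.
    near = [j for j in (i - 1, i)
            if 0 <= j < len(track) and abs(track[j][0] - time_ms) <= snap_ms]
    if near:
        j = min(near, key=lambda k: abs(track[k][0] - time_ms))
        track[j] = (time_ms, code)  # stays sorted: time_ms lies between neighbours
    else:
        track.insert(i, (time_ms, code))


track = [(0, "HH"), (100, "AY0"), (300, "M")]
place_phoneme(track, 102, "AA0")  # replaces the AY0 at 100 ms
place_phoneme(track, 200, "L")    # nothing nearby, so it is inserted
print(track)
# [(0, 'HH'), (102, 'AA0'), (200, 'L'), (300, 'M')]
```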

There's still some stuff missing:
-text-based lipsync is currently not included
-warning messages for the things you probably don't want to do...
-undo editing
-reload .pam file
-auto-convert phonemes
-multi-select phonemes
-batch creating .pam files
-automatic waveform zoom
-renaming the .pam file if you rename the speech file
-...

Known issue:
-slightly different Pamela phoneme positions after saving the file, caused by rounding errors

Quote from: GarageGothic on Sat 17/05/2008 22:26:35
Wouldn't it make more sense to keep the original source format (with all phonemes), and do any customization in AGS' lipsync setup (just add AA0 to the same frame as AY0 if using your example)?
Well, the plugin can change the lipsync characters, but the changes will only be reflected in the lipsync tab if you save and reload the game.

Quote from: AGD2 on Sun 18/05/2008 05:07:20
If I may make a suggestion about the implementation of this:
The controls are currently a bit different, but I think future versions will allow some degree of customization.

Quote
Will the tool also allow a user to type their text in manually if they wish, rather than grabbing it automatically from the game script/dialog?
Yes

GarageGothic

Wow, smiley, that's pretty damn impressive!

I haven't had time to test it much yet, but the new phoneme editor seems really solid. Much better than Pamela, and the preview function is great (there's a seemingly endless row of "Object reference not set to an instance of an object" messages if you forget to assign it to a character, but I guess that's covered in your missing stuff list). I did find one small issue that you didn't mention, which is that the previous frame isn't cleared, so each new frame is drawn on top of the previous ones - a slight problem if your talking animation isn't just a static head with lip movement. But other than that, it's absolutely excellent. I'll run more tests in a day or two.

I never expected that we'd get something this professional looking when I originally suggested the lipsync feature. Keep up the good work, man. The audio manager is such a great improvement to AGS. And the auto lipsync will definitely speed up development for a lot of people.

Suggestion: Would it be possible to have an audio preview as you move the phonemes around (playback of the few milliseconds following the phoneme timing)? It would make it a bit easier than just looking at the waveform to try to match the phonemes. I guess random access on a millisecond basis may not work with all audio formats, but perhaps it could convert to .wav while you edit the .pam file?
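One way to get such a preview, once the clip is available as a WAV (the plugin already converts mp3/ogg temporarily when creating .pam files), is to cut out the stretch from the clicked phoneme to the next one, or to the end of the clip, and hand it to whatever playback API the editor uses. A sketch with Python's standard `wave` module; the `(time_ms, phoneme)` track format is hypothetical, and playback itself is left out.

```python
import io
import wave
from typing import List, Tuple


def segment_bounds(track: List[Tuple[int, str]], index: int,
                   total_ms: int) -> Tuple[int, int]:
    """(start_ms, end_ms) from phoneme `index` to the next phoneme or the end."""
    start = track[index][0]
    end = track[index + 1][0] if index + 1 < len(track) else total_ms
    return start, end


def extract_segment(wav_bytes: bytes, start_ms: int, end_ms: int) -> bytes:
    """Return a standalone WAV file containing only [start_ms, end_ms)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate, total = src.getframerate(), src.getnframes()
        first = min(rate * start_ms // 1000, total)
        last = min(rate * end_ms // 1000, total)
        src.setpos(first)
        frames = src.readframes(max(0, last - first))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)  # same format; header frame count is corrected after writing
        dst.writeframes(frames)
    return out.getvalue()
```

Working on a WAV copy sidesteps the question of random access in compressed formats, at the cost of keeping the decoded clip around while the .pam file is open.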

GarageGothic

#26
Bump! Ok, I imported some phoneme sprites to test the plugin out, and the preview seems to work great. However, when I try to watch the synced dialog in-game the framerate is very slow, so slow that it's difficult to tell if anything is actually syncing up. Any idea why this could be? I set the delay of my character's speech view to 1 just to rule that out. The actual framerate of the game never dips below 40 while the animation plays, so it's not hardware related.

smiley

Quote from: GarageGothic on Tue 03/06/2008 09:07:54
Any idea why this could be?
Do you see any difference when you edit & save the pam file?

Quote
(there's a seemingly endless row of "Object reference not set to an instance of an object" messages if you forget to assign it to a character, but I guess that's covered in your missing stuff list). I did find one small issue that you didn't mention, which is that the previous frame isn't cleared so that each new frame is drawn on top of the previous ones.
Thanks. Will be fixed in the next version.

Quote
Suggestion: Would it be possible to have an audio preview as you move the phonemes around (playback of the few milliseconds following the phoneme timing)?
Yeah, I definitely want to add something like that. Should be possible without conversion. I have to look into it.

GarageGothic

#28
Problem solved, or rather, at least it's not at your end. I tried some different audio formats, and while .ogg and .wav work perfectly in-game, whenever you play an .mp3 voice clip the animation framerate gets choppy. I tried copying the .pam file created for the .ogg and .wav files that worked (using the same audio, just in .mp3 format), and I get the same result. So the problem must be in AGS' mp3 voice playback routine, and I'll mention this to CJ.

So finally, I can now report that the .PAM files generated in your plugin work great!

Edit: CJ clarified the issue:

Quote
It's because the MP3 decoder decodes a (for example) 0.25 second chunk of audio and then plays it, so AGS can only get the current position to the nearest 0.25 seconds. There's no realistic fix for this, other than moving to OGG instead.

Perhaps this should be reflected in the plugin. Either just by not accepting mp3s for .pam conversion (and explaining why), or maybe asking if the plugin should convert the file to .ogg. Is the full .ogg codec already in there or just the decoder?
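CJ's explanation can be illustrated with a simplified model (not AGS's actual code): assume the game polls the voice position once per frame, the position only advances in whole decoded chunks, and each frame shows the phoneme whose timestamp was most recently passed. With 0.25 s chunks, any phoneme that starts and ends between two chunk boundaries is never shown, which fits the choppy animation described above. The track, the 25 ms frame time (40 fps) and the chunk sizes are example values.

```python
import bisect
from typing import List, Optional, Tuple

Track = List[Tuple[int, str]]


def reported_position(true_ms: int, chunk_ms: int) -> int:
    """Playback position rounded down to the start of the current chunk."""
    return (true_ms // chunk_ms) * chunk_ms


def active_phoneme(track: Track, pos_ms: int) -> Optional[str]:
    """The last phoneme whose timestamp is <= pos_ms (None before the first)."""
    i = bisect.bisect_right([t for t, _ in track], pos_ms) - 1
    return track[i][1] if i >= 0 else None


def shown_phonemes(track: Track, duration_ms: int, tick_ms: int,
                   chunk_ms: int) -> List[str]:
    """Phonemes displayed in order, recording each change, polling every tick_ms."""
    shown: List[str] = []
    for now in range(0, duration_ms, tick_ms):
        p = active_phoneme(track, reported_position(now, chunk_ms))
        if p is not None and (not shown or shown[-1] != p):
            shown.append(p)
    return shown


track = [(0, "M"), (80, "AA0"), (160, "P"), (240, "IY0")]
print(shown_phonemes(track, 320, 25, 1))    # ['M', 'AA0', 'P', 'IY0']
print(shown_phonemes(track, 320, 25, 250))  # ['M', 'IY0']
```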

AGD2

I only just had time to test this out now. Fantastic job on getting this tool up to its current usability in such a short time. This is really good, Smiley.

Here are some things I noted:

1) When watching the .pam preview in the audio editor, my dialogue pictures are drawn half outside the right-most edge of the box and there's no way to view the full dialogue picture. Being able to switch the .pam preview pane to full-screen mode and have it auto-adjust to fit the width and height of any given Sierra-style dialogue portrait would help.

2) I second GarageGothic on the request for some type of audio playback feature like Pamela has. For example, when clicking on a particular phoneme, it could play the audio segment between the phoneme you clicked and the next one, as Pamela does.

3) When you generate a .pam file from an .ogg or .mp3, and then look at the 'preferences' section of the .pam file, the path will list the file as an .ogg or .mp3.  However, Pamela can only open .wav files, so if you try to open this AGS-generated .pam file in Pamela it will give an 'incorrect path' error and refuse to open.  I guess auto-generated .pam files should always have the filename pointing to .wav in the path so that the backwards compatibility with Pamela still works.

4) The auto-lipsyncing seems to work quite well with male voices and I only had to tweak a few phonemes. However, female voices (or voices with echo) seem to give very inaccurate results.  Doesn't the Annosoft SDK allow you to specify whether it's a male or female voice file you're syncing, prior to doing the auto-generation of the phonemes?  Any chance of including that feature?

That's all I have for now. But really great work once again. Once this is finalized, it'll be a total breeze to lip-sync a game's worth of lines. I'm really looking forward to seeing this with the text-based lip-syncing support implemented!


fovmester

I am very impressed by this development!

The difficulty of lip-synching is one reason I have never tried to do voice-acting in my games. Now suddenly it all seems much more appealing! The idea of making the application a plugin is just plain marvelous! Good work!

smiley

Quote from: GarageGothic on Tue 03/06/2008 18:52:50
Perhaps this should be reflected in the plugin. Either just by not accepting mp3s for .pam conversion (and explaining why), or maybe asking if the plugin should convert the file to .ogg. Is the full .ogg codec already in there or just the decoder?
I think I'll include a warning message for mp3s. Conversion will probably be in a future version (using lame and oggenc).

Quote from: AGD2 on Fri 06/06/2008 13:09:06
1) When watching the .pam preview in the audio editor, my dialogue pictures are drawn half outside the right-most edge of the box and there's no way to view the full dialogue picture. Being able to switch the .pam preview pane to full-screen mode and have it auto-adjust to fit the width and height of any given Sierra-style dialogue portrait would help.
Yeah, at the moment all sprites are stretched vertically to fit the preview pane height (maintaining the aspect ratio), but that will definitely be different in the next release.
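Scaling to the pane height alone lets wide portraits spill off the right-hand edge, which is what AGD2 described. Taking the smaller of the width and height ratios keeps the whole portrait visible while preserving its aspect ratio. A small sketch; whether small sprites should also be scaled up (`allow_upscale`) is a hypothetical option, not something the thread specifies.

```python
from typing import Tuple


def fit_inside(sprite_w: int, sprite_h: int, pane_w: int, pane_h: int,
               allow_upscale: bool = False) -> Tuple[int, int]:
    """Size that fits inside the pane while keeping the sprite's aspect ratio."""
    scale = min(pane_w / sprite_w, pane_h / sprite_h)
    if not allow_upscale:
        scale = min(scale, 1.0)  # never enlarge small sprites
    return max(1, round(sprite_w * scale)), max(1, round(sprite_h * scale))


# A 320x200 portrait in a 160x200 pane: fitting to height alone would keep it
# 320 wide; fitting to both dimensions halves it.
print(fit_inside(320, 200, 160, 200))  # (160, 100)
```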

Quote
2) I second GarageGothic on the request for some type of audio playback feature like Pamela has. For example, when clicking on a particular phoneme, it could play the audio segment between the phoneme you clicked and the next one, as Pamela does.
Will be in the next beta...

Quote
3) When you generate a .pam file from an .ogg or .mp3, and then look at the 'preferences' section of the .pam file, the path will list the file as an .ogg or .mp3.  However, Pamela can only open .wav files, so if you try to open this AGS-generated .pam file in Pamela it will give an 'incorrect path' error and refuse to open.  I guess auto-generated .pam files should always have the filename pointing to .wav in the path so that the backwards compatibility with Pamela still works.
Hmm, that's strange. I don't have any problems opening the pam files with Pamela. Even playback seems to work with mp3s, only oggs are messed up.

Quote
4) The auto-lipsyncing seems to work quite well with male voices and I only had to tweak a few phonemes. However, female voices (or voices with echo) seem to give very inaccurate results.  Doesn't the Annosoft SDK allow you to specify whether it's a male or female voice file you're syncing, prior to doing the auto-generation of the phonemes?  Any chance of including that feature?
That's currently not possible with Annosoft. Female voices could be recognized better if you add a new profile in Control Panel->Speech and let a woman train it (then you have to switch the profiles manually for male/female voices). Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.

GarageGothic

Quote from: smiley on Tue 10/06/2008 17:34:24
Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.

For echoes, assuming they're digital effects and not due to poor voice recording, you could always run the sync on a clean version of the audio file and then substitute it with a filtered version before compiling the speech.vox.

AGD2

Quote from: smiley
Hmm, that's strange. I don't have any problems opening the pam files with Pamela. Even playback seems to work with mp3s, only oggs are messed up.

Sorry, I didn't actually test this with mp3s, only with an ogg. And when the ogg didn't work I just assumed mp3s would be the same. I just tested an mp3 in Pamela and confirmed that it works. My bad. It may just be oggs, then.

Quote from: smiley
That's currently not possible with Annosoft. Female voices could be recognized better if you add a new profile in Control Panel->Speech and let a woman train it (then you have to switch the profiles manually for male/female voices). Echoed voices may be possible if I add some cleaning stuff after the recognition. I haven't tested it yet and I'm not very optimistic about it.

Where exactly is this control panel>speech setting accessed from to train voices?  (I also discovered that temporarily lowering the pitch of female speech files so that they sound roughly as deep as a male voice, can sometimes give better phoneme results.)

I just noticed that it's not female voices that cause the inaccurate phoneme placement.  The inaccurate placement actually seems to occur when you have an .mp3 or .ogg speech file listed, then you right click and choose "Create .pam file". It seems that somewhere in the process of converting it to a .wav and then adding phonemes, it barely places any phonemes along the waveform.  Yet, if I place a .wav file of exactly the same speech file in the directory, right-click it, and choose "Create .pam file", then it will add many more phonemes across the entire length of the waveform, all of them being much more accurate.  This is the case with both male and female speech.


GarageGothic

Quote from: AGD2 on Tue 10/06/2008 23:26:59
Where exactly is this control panel>speech setting accessed from to train voices?

He means Windows' own "Control Panel" (accessed from Start menu). There should be a "Speech" option available after installing SAPI 5.1. Potentially you could set up a unique profile for every voice actor if you just record them reading the text from the training wizard. I'm not sure, but I think it might also  be a good idea to deactivate the "Background adaptation" option for the default profile, or the speech detection could become confused over time when you sync different voices using the same profile.

smiley

Quote from: AGD2 on Tue 10/06/2008 23:26:59
It seems that somewhere in the process of converting it to a .wav and then adding phonemes, it barely places any phonemes along the waveform.  Yet, if I place a .wav file of exactly the same speech file in the directory, right-click it, and choose "Create .pam file", then it will add many more phonemes across the entire length of the waveform, all of them being much more accurate.  This is the case with both male and female speech.
From the testing I've done, it looks like the resulting .pam file is even different (amount and type of phonemes and accuracy) each time you use the same source file. The conversion from mp3/ogg to wav is lossless, so that shouldn't be the problem. What bitrate are you using for the mp3s/oggs? Do you get better results if you encode the original WAVs at a higher quality?


AGD2

Hmm, I've been trying to get it to occur again, but now it seems to be giving me fairly consistent results with both ogg and wavs.  I've turned off that "Background adaptation" option that GarageGothic mentioned since then, so perhaps that may have had something to do with the varying results before.

I'll keep messing around with it, and see what happens. It will be interesting to see how much the accuracy improves when the tool can also use the text as a guide. When I tested text syncing in Annosoft, having the text present seemed to make a huge difference.

AGD2

Just another few things to mention:

1) The unrecognized phonemes UW and H get added to .pam files. Instead, they should be UW0 and HH. There may be others, but those were the ones I noticed.

2) If you have a file named EGO200.wav and then you right-click on the speech file in the lip-syncing tool and assign it to another character (or even the same character), it'll rename the file to EGO1.wav (or whatever the character's script name is). In this case, would it make more sense to simply keep the same number?

3) A way to view/edit the phonemes animating with any character's dialog picture view, without having to rename the speech file in the process, might be handy.

4) Will the final version allow you to highlight a bunch of files and then click once to generate all the .pam files as a batch (rather than having to do them one at a time)?
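Item 1 suggests validating phoneme codes before they are written to a .pam file. This sketch assumes Pamela expects CMU-dictionary-style ARPAbet codes, with a stress digit (0, 1 or 2) on vowels and none on consonants; that is inferred from codes like AY0, UW0 and HH in this thread, not taken from Pamela's documentation. The `ALIASES` table is hypothetical.

```python
# Assumed phoneme inventory: the 39 ARPAbet phonemes used by the CMU dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}
ALIASES = {"H": "HH"}  # hypothetical fix-ups for codes seen in the wild


def is_valid(code: str) -> bool:
    """Consonants have no digit; vowels need exactly one stress digit."""
    if code in CONSONANTS:
        return True
    return len(code) == 3 and code[:2] in VOWELS and code[2] in "012"


def normalise(code: str) -> str:
    """Return a valid code, repairing the two cases reported in item 1."""
    if is_valid(code):
        return code
    if code in ALIASES:
        return ALIASES[code]
    if code in VOWELS:
        return code + "0"  # bare vowel such as UW -> UW0 (unstressed)
    raise ValueError(f"unrecognised phoneme code: {code!r}")


print([normalise(c) for c in ["UW", "H", "AY0", "M"]])
# ['UW0', 'HH', 'AY0', 'M']
```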

smiley

Next beta:
http://ueberlicht.googlepages.com/AGS.Plugin.AudioManager0.7beta.zip

The main features are complete, except for batch conversion...

The new release fixes a bug in the waveform generation for mp3 and ogg files.
The speech animation preview now looks like the AGS 'character view (centre pivot)' preview.

@AGD2:
'Assign to character' now works as you suggested if there isn't already a file with that number. The new beta has a combobox above the animation preview that allows you to change the character. And thanks for finding those phonemes.
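The new 'Assign to character' rule (keep the number unless a file with that number already exists) could look roughly like this. The thread doesn't say what happens on a clash; this sketch assumes a fallback to the lowest free number, which matches the EGO1.wav result AGD2 reported. `reassigned_name` and the file-name pattern are illustrative, not the plugin's code, and renaming the matching .pam file is not handled.

```python
import re
from typing import Iterable

# AGS-style speech file name: letters, then the line number, then an extension.
SPEECH_NAME = re.compile(r"^([A-Za-z]+)(\d+)\.(\w+)$")


def reassigned_name(old_name: str, new_prefix: str,
                    existing: Iterable[str]) -> str:
    """Name for old_name after assigning it to the character new_prefix.

    Keeps the line number when that name is free; otherwise falls back to
    the lowest free number. Comparison is case-insensitive, as on Windows.
    """
    m = SPEECH_NAME.match(old_name)
    if not m:
        raise ValueError(f"not a speech file name: {old_name!r}")
    number, ext = m.group(2), m.group(3)
    # The file being renamed frees up its own current name.
    taken = {name.upper() for name in existing} - {old_name.upper()}
    candidate = f"{new_prefix}{number}.{ext}"
    if candidate.upper() not in taken:
        return candidate
    n = 1
    while f"{new_prefix}{n}.{ext}".upper() in taken:
        n += 1
    return f"{new_prefix}{n}.{ext}"


files = {"EGO200.wav", "EGO1.wav", "ROGE200.wav"}
print(reassigned_name("EGO200.wav", "BOB", files))   # BOB200.wav
print(reassigned_name("EGO200.wav", "ROGE", files))  # ROGE1.wav
```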
