EDITOR PLUGIN: LipSync Manager (for AGS 3.01+, .Net 3.0)

Started by smiley, Sun 17/05/2009 14:00:59

smiley

This plugin allows you to:
1) automatically create Pamela lip sync files from your speech files.
2) edit existing Pamela lip sync files.

This was part of my MediaManager plugin, but since the upcoming version 3.2 of AGS will make it pretty much obsolete, it's now starting its solo career.

Main difference:
Speech recognition is now written in C# instead of using an external tool, which led to the following sentence:
This plugin requires at least .Net 3.0!
So XP (Vista comes with 3.0) users who haven't installed it already, please download it from here.

Download:
http://sites.google.com/site/infralicht/home/ags/AGS.Plugin.LipSync.zip
Unpack the content of the zip file to the editor folder.

GarageGothic

#1
Beautiful work! I was actually thinking about this plugin just the other day and about what would happen to the MediaManager/lipsync tool after CJ rewrote the audio system. Is it still based on the Annosoft code, or did you rewrite the whole thing from scratch? It seems this doesn't even require the Microsoft Speech API to be installed anymore, or is that part of .NET 3.0?

Unfortunately I don't have time to test it thoroughly at the moment, so I'll have to wait about two weeks to really play around with it and give you my feedback. But what a wonderful contribution to the community, and so shortly after the release of AGS 3.2. Color me impressed!

Edit: I ran a quick test on one of the samples I had also used with the old lipsync plugin, and it seems the new one doesn't handle pauses as well. In the sample there's a distinct pause, and while the old version closed the character's mouth for a second or so, with the new detection his lips keep moving. I'm sure this is related to Microsoft's speech recognition and not something you can do anything about, but I just thought I'd mention it.
Also, what role does the text-based lipsync actually play? Even if I put in the correct reading of the line, the autosync still interprets "watch" as "want" for instance.

smiley

It uses the same length for phonemes and a similar method for aligning them as the Annosoft program, the rest had to be rewritten.

.Net 3.0 only works on XP and Vista, which iirc have the Speech API already installed.

Quote from: GarageGothic on Sun 17/05/2009 14:37:45
Edit: I ran a quick test on one of the samples I had also used with the old lipsync plugin, and it seems the new one doesn't handle pauses as well.
It should be similar if you check the "Add 'None' at the end of a recognized phrase" option.
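
To illustrate what an option like this could amount to, here is a minimal Python sketch. It assumes the recognizer returns one list of timed phonemes per phrase plus the phrase's end time, and it appends a "None" (closed mouth) entry at that end time. The `Phoneme` class, the `add_none_after_phrases` function, the phoneme codes and the millisecond timings are all hypothetical and not taken from the plugin's source.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    time_ms: int  # start time of the mouth shape (hypothetical unit)
    code: str     # phoneme code; "None" stands for a closed mouth


def add_none_after_phrases(phrases):
    """Flatten phrases into one timeline, closing the mouth after each phrase.

    phrases: list of (phonemes, phrase_end_ms) tuples (assumed input shape).
    """
    timeline = []
    for phonemes, phrase_end_ms in phrases:
        timeline.extend(phonemes)
        # Without this entry the last mouth shape of the phrase would
        # stay active until the next phrase starts.
        timeline.append(Phoneme(phrase_end_ms, "None"))
    return timeline


phrases = [
    ([Phoneme(0, "AY"), Phoneme(120, "OW")], 400),
    ([Phoneme(900, "W")], 1200),
]
timeline = add_none_after_phrases(phrases)
```

In this sketch the gap between 400 ms and 900 ms is covered by a "None" entry, so no mouth movement is shown during the pause.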

GarageGothic

#3
Quote:
It should be similar if you check the "Add 'None' at the end of a recognized phrase" option.

Thanks, that seems to work better. Still no closed mouth, but at least no movement during the pause. Perhaps I just need to clean up the audio.

Edit: As an experiment I filtered the noise from the sound file, but the pause was still recognized as containing four different phonemes. I then copied the pause into a separate sound file and let the lipsync tool analyze it, and it didn't find a single phoneme. I get the impression that it recognizes the preceding sound as the beginning of a specific word and then decides to "fill in the blank" following it.

A bit more feedback:

1) When playing back individual phonemes (brilliant feature by the way), I must right click afterwards (to deselect) to avoid moving the phoneme position when clicking on the next one. Some kind of "lock phoneme positions" button would be a big help since usually you just want to either play them or change them, not actually move stuff around.

2) It would be nice if the plugin remembered your previous settings in the Create Pamela File options ("Add none" and "Save + Close after recognizing"). It could be quite annoying having to do this for every single file. Speaking of which, any plans for batch conversion of selected files from the list?

3) Also, standard hotkey support (CTRL+Z undo, CTRL+Y redo, DEL delete) would be greatly appreciated.

smiley

Quote from: GarageGothic on Sun 17/05/2009 14:37:45
Also, what role does the text-based lipsync actually play?
The recognition engine tries to construct/identify phrases which begin with that text. This should lead to better results.
For instance in my tests, the phrase "This is a test." was always recognized as "Jessica's antitrust" without text-based lipsync.

Quote from: GarageGothic on Sun 17/05/2009 16:16:01
I then copied the pause into a separate sound file and let the lipsync tool analyze it, and it didn't find a single phoneme. I get the impression that it recognizes the preceding sound as the beginning of a specific word and then decides to "fill in the blank" following it.
Recognition doesn't work on a per-phoneme basis. It always tries to recognize words, and build a phrase out of them.
Are the other phonemes placed correctly? It's possible that the duration of the pause is included in the total duration of the phrase, which would mess up the position of the phonemes.
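
A small Python sketch of why that would matter. It assumes, purely for illustration, that phonemes are spaced evenly over the phrase duration; the plugin's real alignment method is not shown in this thread, and `place_phonemes`, the phoneme codes and the timings are hypothetical.

```python
def place_phonemes(codes, start_ms, duration_ms):
    """Spread phoneme codes evenly over a phrase (simplified assumption)."""
    if not codes:
        return []
    step = duration_ms / len(codes)
    return [(round(start_ms + i * step), code) for i, code in enumerate(codes)]


codes = ["AY", "OW", "N", "L"]

# Phrase is really 400 ms of speech followed by a 400 ms pause.
speech_only = place_phonemes(codes, 0, 400)
with_pause = place_phonemes(codes, 0, 800)
```

With the pause counted in the duration, the last two phonemes land at 400 ms and 600 ms, which is inside the silent part, while the speech-only placement ends at 300 ms.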


Your suggestions will be implemented in the next version.

GarageGothic

#5
Quote from: smiley on Tue 19/05/2009 08:43:42
The recognition engine tries to construct/identify phrases which begin with that text. This should lead to better results.

In my tests it has made absolutely zero difference whether I put text in or not, but I'm still only testing it on one sample. I'll look further into this when I have some more audio files to test with.

Quote:
Are the other phonemes placed correctly? It's possible that the duration of the pause is included in the total duration of the phrase, which would mess up the position of the phonemes.

I think they are more or less correctly placed. What I've discovered though is that the phonemes during the pause actually belong to the words (or rather, the erroneously identified words) after the pause and not those preceding it as I first thought.

Quote:
For instance in my tests, the phrase "This is a test." was always recognized as "Jessica's antitrust" without text-based lipsync.

You can get some hilarious recognitions. When I tested the noise-filtered version of my file (which returns a much poorer result than the original clip), even with correct text input it interprets the line "I only watch pirate movies" as "I a lot higher than things" :)

Quote:
Your suggestions will be implemented in the next version.

Excellent! Thank you very much. If I may ask one more thing, could you please allow the "Recognized Phrase:" line in the console to linebreak? It's quite helpful to see what words it got wrong so you know where to focus your attention in the editing. Some kind of copy function could also be handy since the .pam file doesn't store the identified text but only phonemes.

Olleh19

Is this obsolete or can it still work?
I could not find the Pamela program though, even with intense googling. Anybody? Is this still based on the Annosoft automatic sync feature? Because that's what I'm interested in:
dropping a wave file in and getting a dat or pam format data file out, to be put in the speech folder.

morganw

I've never used it, but here is a link:
https://users.monash.edu.au/~myless/catnap/pamela3/
(the link is in the AGS manual but it looks like their webserver has some sort of broken redirection implemented on it)

Olleh19

Quote from: morganw on Tue 13/10/2020 17:29:14
I've never used it, but here is a link:
https://users.monash.edu.au/~myless/catnap/pamela3/
(the link is in the AGS manual but it looks like their webserver has some sort of broken redirection implemented on it)

Exactly, it does not direct me to the link, so I thought it was gone.
Thanks Morgan, I will give it a try later tonight.
