TotalLipSync
(Character head/animation by Preston Blair. King Graham sprite ripped from King's Quest II VGA by AGDI. Background from AGS Awards Ceremony by Ali.)

TotalLipSync is a module for voice-based lip sync. It allows you to play back speech animations that have been synchronized with voice clips. TotalLipSync has all the same capabilities as the voice-based lip sync that is built into AGS, and offers the following additional advantages:
- It works with LucasArts-style speech (as well as with Sierra-style and full-screen speech modes).
- In addition to the Pamela (.pam) and Papagayo/Moho Switch (.dat) formats supported by AGS, it also reads Annosoft SAPI 5.1 (.anno) and Rhubarb (.tsv) lip sync files.
- In particular, Rhubarb support means that lip syncing can be 100% automated (with decent results): no manual tracking of the speech clips is required.
- It is more flexible: you can switch speech styles mid-game, change the phoneme mapping, use files with different data formats, etc.
- You don't have to do the phonemes-to-frames mapping manually: The module comes with a default auto-mapping.
How to use

- Create the lip sync data files for the speech clips, using a tool that outputs one of the supported formats (personally I would recommend Papagayo for manual tracking and Rhubarb for automatic lip syncing, but the Lip Sync Manager plugin is good too). The filename of each sync file should be the same as the speech clip except for the extension, and you need to place the files in your compiled game folder (by default, in a folder named "sync/" inside the game folder).
- Create the speech animation for your character(s), with different animation frames for the different phonemes (see below), and set it as their speech view.
- Download and import the TotalLipSync module into your AGS project.
- Make sure your game settings are correct: the AGS built-in lip sync (in the project tree under "Lip sync") should be set to "disabled".
- If you are going to use Sierra-style (or full-screen) speech for your lip sync animations, you must create a dummy view. Make sure to give it exactly one loop and one frame. If you name the view TLS_DUMMY it will automatically be used by the module. Otherwise you can set the view to use with TotalLipSync.SetSierraDummyView().
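If you go that route, here is a minimal sketch of pointing the module at a custom dummy view (MY_SPEECH_DUMMY is a hypothetical view name; the call is assumed to take the view by its script-name constant, i.e. its view number):

// Only needed for Sierra-style / full-screen speech when the dummy view
// (one loop, one frame) is NOT named TLS_DUMMY:
TotalLipSync.SetSierraDummyView(MY_SPEECH_DUMMY);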
You are now ready to use the module. Add the code to initialize TotalLipSync on startup:
function game_start()
{
  TotalLipSync.Init(eLipSyncRhubarb); // Or whatever lip sync format you're using
  TotalLipSync.AutoMapPhonemes();
}
Or if you want a custom phonemes-to-frames mapping:
function game_start()
{
  TotalLipSync.Init(eLipSyncPamelaIgnoreStress);
  TotalLipSync.AddPhonemeMappings("None", 0);
  TotalLipSync.AddPhonemeMappings("B/M/P", 1);
  TotalLipSync.AddPhonemeMappings("S/Z/IH/IY/SH/T/TH/D/DH/JH/N/NG/ZH", 2);
  TotalLipSync.AddPhonemeMappings("EH/CH/ER/EY/G/K/R/Y/HH", 3);
  TotalLipSync.AddPhonemeMappings("AY/AA/AH/AE", 4);
  TotalLipSync.AddPhonemeMappings("AO/AW/UH", 5);
  TotalLipSync.AddPhonemeMappings("W/OW/OY/UW", 6);
  // Frame 7 unassigned to match default Moho mapping
  TotalLipSync.AddPhonemeMappings("F/V", 8);
  TotalLipSync.AddPhonemeMappings("L", 9);
}
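Because the module can switch speech styles and data formats mid-game, you can also re-initialize it later on. A minimal sketch, assuming that simply re-running Init() and rebuilding the mapping is the intended way to do this (here swapping from Rhubarb to Pamela files part-way through the game):

// Somewhere in your game script, after the original setup:
TotalLipSync.Init(eLipSyncPamelaIgnoreStress);  // switch to Pamela (.pam) data files
TotalLipSync.AutoMapPhonemes();                 // rebuild the phoneme-to-frame mapping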
To speak a line with lip syncing, you simply call the extender functions Character.SaySync() or Character.SayAtSync(), using a speech clip prefix:
cGraham.SaySync("&1 This line will be animated with lip sync");
cGraham.SayAtSync(320, 100, 240, "&2 ... and so will this"); // x_left, y_top, width, message
And that's all there is to it! (If you don't use a speech clip prefix, or if there is no matching sync file, the speech animation won't play at all.)
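For instance (the clip numbers here are hypothetical):

// With a clip prefix and a matching sync file in "sync/", the line is lip synced:
cGraham.SaySync("&3 This line has a voice clip and a sync file.");
// Without a clip prefix (or without a matching sync file), the speech text is
// still displayed, but no speech animation plays:
cGraham.SaySync("This line will not be animated.");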
Phoneme-to-frame mappings

The principle of lip syncing is that different sounds (phonemes) correspond to different mouth shapes. If we display an animation frame with the right mouth shape at the same time as that sound appears in the audio being played, the animation will seem to match the speech. The first step, then, is to identify the phonemes and their timing in the speech (that's what the tools mentioned above are for), and the second step is to choose an appropriate animation frame for each phoneme. We usually don't use a different animation frame for every phoneme, so we combine phonemes into groups that are all mapped to a single frame. The different tools have different sets of phonemes (or phoneme groups), so we have to define different mappings from phonemes to frames.
So here is the default mapping for each data format used by TotalLipSync. It has been set up for a speech animation with ten different frames, each representing a different mouth position. (This is a fairly standard setup.) If you stick to these frames and these mappings, you can use the same speech view no matter what lip sync tool or data format you use:
Frame | Description | Rhubarb phoneme ID | Moho phoneme | Pamela phonemes
------|-------------|--------------------|--------------|----------------
0 | Mouth closed (or slack; can simply be the same frame as 1) | X | rest | None
1 | M, B, P | A | MBP | M/B/P
2 | Various consonants (Rhubarb: Ee-type sounds) | B | etc | K/S/T/D/G/DH/TH/R/HH/CH/Y/N/NG/SH/Z/ZH/JH
3 | Eh-type sounds (non-Rhubarb: Ee-type sounds) | C | E | IH/IY/EH/AH/EY/AW/ER
4 | Ah-type and I-type sounds | D | AI | AA/AE/AY
5 | Aww-type sounds; Ow-type sounds (can also go in 6) | E | O | AO/OW
6 | U-type and Oo-type sounds (non-Moho: W) | F | U | UW/OY/UH
7 | Moho: W | [same as 6] | WQ | W
8 | F, V | G | FV | F/V
9 | L (Th-type sounds can also go here, rather than in 2) | H | L | L
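As an illustration, here is what the Rhubarb column of the table amounts to if you were to set the mapping up by hand instead of calling AutoMapPhonemes() (a sketch derived from the table above; the module's built-in auto-mapping may differ in detail):

TotalLipSync.Init(eLipSyncRhubarb);
TotalLipSync.AddPhonemeMappings("X", 0);  // mouth closed / rest
TotalLipSync.AddPhonemeMappings("A", 1);  // M, B, P
TotalLipSync.AddPhonemeMappings("B", 2);  // various consonants
TotalLipSync.AddPhonemeMappings("C", 3);  // Eh-type sounds
TotalLipSync.AddPhonemeMappings("D", 4);  // Ah-type and I-type sounds
TotalLipSync.AddPhonemeMappings("E", 5);  // Aww/Ow-type sounds
TotalLipSync.AddPhonemeMappings("F", 6);  // U/Oo-type sounds and W
TotalLipSync.AddPhonemeMappings("G", 8);  // F, V
TotalLipSync.AddPhonemeMappings("H", 9);  // L
// Frame 7 is left unused: Rhubarb has no separate W shape.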
Where to get it

TotalLipSync is hosted on Github (mainly just as a way for me to learn about how Github works):
https://github.com/messengerbag/TotalLipSync
You can download the current release from there.
Known bugs

None
Change log

0.5
-Added APIs to get the currently lip syncing character, the current phoneme and current frame.
0.4
-Fixed support for Sierra-style speech
-Minor bug fixes for edge cases
-Documentation
0.2 (pre-release)
-Added support for Papagayo/Moho Switch (.dat), Annosoft SAPI 5.1 LipSync (.anno) and Rhubarb (.tsv)
0.1 (pre-release)
-Pamela support for LucasArts-style speech
Originally based on code by Calin Leafshade (though very little of it remains in the current version).
Thanks to Grundislav for providing a speech view used in development and testing of the module!