What should a proper audio system look like?

Started by Crimson Wizard, Sat 19/05/2018 15:17:59


Snarky

As it seems like AGS 4.0 is moving closer to release, I would like to revive this thread, as I feel an improved audio API should be part of that revision.

I think the basic idea of getting rid of AudioChannel in favor of something tied to the playback instance, and guaranteed to be non-null, is probably the right approach. The API might not need to change all that much, but there are a few different things to consider:

- AudioClip
- Voice clips
- Frame-linked audio (esp. on looping animations)
- AudioType
- Playback with repeat

For example, with a looping or repeating sound, is it always the same playback instance, or a new one each time? And how would you configure frame-linked audio (volume, etc)? If there was a persistent playback instance, that might offer a good API.
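
To make the comparison concrete, here is a rough sketch of what a playback-instance API could read like in script. All names here (AudioPlayback, its members, and the clip aThemeMusic) are illustrative only, not an agreed design; only the Play() enums are taken from the current API:

    // Hypothetical: Play() returns a never-null AudioPlayback that controls
    // this one playback instance, instead of an AudioChannel slot.
    AudioPlayback* music = aThemeMusic.Play(eAudioPriorityNormal, eRepeat);
    music.Volume = 80;            // affects only this playback instance
    if (music.Position > 5000) {  // position in ms, as with AudioChannel today
        music.Stop();
    }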

I think @Crimson Wizard has also sometimes mentioned more advanced effects, like changing playback speed or adding echo. I would suggest that this shouldn't be a priority to implement at this time, but it might be worth keeping in mind how it might be added in future.

Crimson Wizard

Quote from: Snarky on Sat 29/03/2025 13:09:38
I think the basic idea of getting rid of AudioChannel in favor of something tied to the playback instance, and guaranteed to be non-null, is probably the right approach.

Not ready to give a full reply right now, just a couple of quick notes.
I am not certain whether getting rid of AudioChannel is a good thing or not. It may be a good thing not to have a channel as a return value from Play(), but at the same time I have found audio channels to be an interesting logical concept for organizing multiple simultaneous playbacks.

In regards to the "null pointer" problem, something hit me recently, and I wonder why I did not think of this earlier. What if Play() returned a dummy audio playback? There are 2 reasons for Play to fail: failure to decode and failure to play. The former happens when the file cannot be opened at all, the latter when the sound device is not working or the game is set to use "no driver".
We are mostly interested in preventing script errors in the second case, because the first case will be noticed immediately during game development. In audio software that works with filter chains there's a concept of a "null" output, where the processed sound is simply discarded. So what we could do is process the sound internally (which will even update its Position property), but then discard it through such a "null" output. I have not checked yet, but it's possible that our sound library supports that already.
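
If that works, script code would not need to special-case failure at all. A sketch of the intended behaviour (AudioPlayback and the clip aCutsceneTheme are illustrative names, not existing API):

    // Hypothetical: Play() never returns null. If the sound device is
    // unavailable, the engine still decodes the clip but discards the samples
    // via the "null" output, so Position keeps advancing and timing logic
    // built on it keeps working.
    AudioPlayback* music = aCutsceneTheme.Play();
    while (music.Position < 12000) {
        Wait(1);  // cutscene steps synced to the track behave the same either way
    }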

Snarky

#22
Quote from: Crimson Wizard on Sat 29/03/2025 13:19:35
Not ready to give a full reply right now, just a couple of quick notes.
I am not certain whether getting rid of AudioChannel is a good thing or not. It may be a good thing not to have a channel as a return value from Play(), but at the same time I have found audio channels to be an interesting logical concept for organizing multiple simultaneous playbacks.

I think if there is something like an AudioPlayback instance for everything that's currently playing, that will basically take the place of AudioChannels. You might still want a way to iterate through all the playing sounds (for example if you need to adjust the overall volume), but I'm not sure a fixed list of AudioChannels (that exist regardless of whether anything is playing on the channel or not) is the right way to do that if the limit on simultaneous playback is removed.

AudioTypes seem more useful to me as a way to easily set up game logic for sound. MaxChannels would still help control whether a track replaces what is currently playing or not, for example. It's also important to consider how speech will work. It would be good if the system could support voiced background speech; some way to control the speech "channel(s)"/get speech playback instances would be needed.

Quote from: Crimson Wizard on Sat 29/03/2025 13:19:35
In regards to the "null pointer" problem, something hit me recently, and I wonder why I did not think of this earlier. What if Play() returned a dummy audio playback? There are 2 reasons for Play to fail: failure to decode and failure to play. The former happens when the file cannot be opened at all, the latter when the sound device is not working or the game is set to use "no driver".
We are mostly interested in preventing script errors in the second case, because the first case will be noticed immediately during game development. In audio software that works with filter chains there's a concept of a "null" output, where the processed sound is simply discarded. So what we could do is process the sound internally (which will even update its Position property), but then discard it through such a "null" output. I have not checked yet, but it's possible that our sound library supports that already.

I think it's clear that if it must never return null but playback may fail for whatever reason, then it must return a "dummy". If I understand you correctly, though, what you're saying is that the dummy may still "pretend" to play, specifically in terms of reporting playback position. Yes, I think that could be useful, because other game logic may depend on it (using it for timing, for example, as when a cut-scene has been designed to sync to a music track), but at the same time I think it's important that it should be possible to somehow tell that playback has failed.

I think there may be cases where the file fails to open/play but this isn't known at design time (e.g. when using AudioClip.GetByName() or Game.PlayVoiceClip(), or an external AUDIO.VOX file; perhaps a way to load audio files from the file system, or even streaming, will also be reintroduced in the future), but there is already an AudioClip.IsAvailable property that can deal with that, provided the AudioClip isn't simply null.
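
For example, something along these lines already works for clips resolved at runtime, using the AudioClip.GetByName() and IsAvailable mentioned above ("ambient_rain" is a placeholder clip name):

    // Verify that the clip exists and its sound data can actually be loaded
    // before attempting playback.
    AudioClip* clip = AudioClip.GetByName("ambient_rain");
    if (clip != null && clip.IsAvailable) {
        clip.Play();
    }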

Crimson Wizard

#23
I've been busy with other things, but now I will probably return to this issue. It's been a while since I've given any thought to the audio system, so I will have to spend some time and think this through again.

I do believe that there has to be a way of limiting the number of simultaneous playbacks according to user-defined rules. The question is whether we support this as a native engine feature, expose enough API to let users write their own "audio mixer", or both. A native feature may still be useful, because limiting e.g. music to 1 channel with automatic replacement is very convenient functionality, used in almost every game.

There's one thing regarding the ideas mentioned above, which I do not like:

Quote from: Snarky on Sun 30/03/2025 08:00:23
AudioTypes seem more useful to me as a way to easily set up game logic for sound. MaxChannels would still help control whether a track replaces what is currently playing or not, for example. It's also important to consider how speech will work. It would be good if the system could support voiced background speech; some way to control the speech "channel(s)"/get speech playback instances would be needed.

The reason I do not like this approach is that if AudioTypes control the number of simultaneous tracks, then there are still "audio channels" out there, except now they are "hidden" once AudioChannel is removed. We remove AudioChannels from script, and instead get secret "channels" inside AudioTypes.

If we allow access to a list of AudioPlayback instances per AudioType, that makes AudioType a sort of owning container of playback references (which seems like a strange organization to me). And if we would like to let users read the contents of these containers, then a) we have to expose AudioType as a struct in script, and b) we practically recreate audio channels in a less convenient way.

Why not keep AudioChannel, but:
* remove the hard engine limit, and let users define as many channels as they like;
* strip its API by moving playback control to AudioPlayback, keeping the channel object only as a "slot" which may contain a playback?
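
As a hypothetical sketch of how that split might read in script (every name here is illustrative, not existing API):

    // Hypothetical: channels are user-defined "slots" with no engine cap;
    // all playback control lives on the AudioPlayback occupying the slot.
    AudioChannel* musicSlot = AudioChannel.Create("Music");
    AudioPlayback* pb = aThemeMusic.PlayOnChannel(musicSlot);
    pb.Volume = 75;                   // control belongs to the playback...
    if (musicSlot.Playback != null) { // ...the slot only exposes its occupant
        musicSlot.Playback.Stop();
    }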

eri0o

One thing I would like is to be able to pass a volume right in the Play call, and have that volume be a percentage that gets multiplied by the audio type's volume percentage - so I could have a per-type volume slider in a menu.

Crimson Wizard

Quote from: eri0o on Mon 02/06/2025 21:03:21
One thing I would like is to be able to pass a volume right in the Play call, and have that volume be a percentage that gets multiplied by the audio type's volume percentage - so I could have a per-type volume slider in a menu.

Having volume as a multiplier everywhere is the correct way too, IMO. I've posted this somewhere before: the volume should be adjusted as a combination of multipliers:

System (master) volume -> Audio Type volume -> Clip volume

and optionally -> Emitter volume (e.g. character) -> Animation volume (for view frame linked sounds)
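
As a minimal sketch, the final gain is then simply the product of the chain. The helper name is hypothetical, and each level is assumed to be already converted from a percentage to a 0.0-1.0 factor:

    // Hypothetical helper: combine the multiplier chain into one final gain.
    float CombineVolume(float master, float typeVol, float clipVol,
                        float emitterVol, float animVol)
    {
        return master * typeVol * clipVol * emitterVol * animVol;
    }

With this scheme, a per-type slider at 50% halves every sound of that type without disturbing any per-clip or per-emitter settings.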

eri0o

Yeah, and also, the volume could be an optional parameter right in the play call, instead of requiring a channel?

From what I remember of previous discussions, people forget to deal with the null audio channel. It would help if there was some way to catch audio errors - like when there is no audio device or the audio file doesn't exist - though I'm not sure how.

The other thing is that some people like positional audio, but they understand it as either being relative to the player character or relative to the position on screen. There was also at some point a question about having different regions in a room with different music and some crossfade transition between them. But these nuances could also be left to scripting.

The other thing I remember being asked about is some kind of filter for when the player is in a cave or under water. I don't think this is easy to do with MojoAL. I think this was done in Strangeland through a plugin, and maybe Laura Hunt asked about it too at some point.

eri0o

It would be nice if there was some way to connect something from the audio output to a shader input, for fun music-synced effects.

Crimson Wizard

Quote from: eri0o on Sat 14/06/2025 13:42:59
It would be nice if there was some way to connect something from the audio output to a shader input, for fun music-synced effects.

Changes in shaders are done by setting constants. What remains is reading the audio. I'd just generate a separate file with instructions, similar to how voice-based lip sync is done, then read that file and apply changes to shaders using timestamps.
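
A minimal sketch of that idea, assuming hypothetical names throughout: beatTimes/beatValues are loaded at startup from a file generated offline from the track (similar to how lip-sync data is prepared), gMusic stands in for the playing track's playback object, and myShader.SetConstantF() stands in for whatever the shader API provides:

    // Hypothetical: push precomputed per-timestamp values into a shader
    // constant as the music position passes each timestamp.
    int beatTimes[128];    // timestamps in ms, loaded at startup from the file
    float beatValues[128]; // effect intensity per timestamp, same source
    int beatIndex = 0;

    function repeatedly_execute()
    {
        while (beatIndex < 128 && beatTimes[beatIndex] <= gMusic.Position)
        {
            myShader.SetConstantF("pulse", beatValues[beatIndex]);
            beatIndex++;
        }
    }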

Crimson Wizard

#29
I keep getting distracted, but I suppose this issue has to be dealt with somehow, at least by making a draft.

I still must sit down and think, and get a full picture in my head.

Meanwhile, as a few quick notes...



I remembered an "audio mixer" I coded for tzachs's MonoAGS project. Tzachs did not want to have AudioChannels either, IIRC, so I wanted to see how an engine user could code one themselves. The mixer was written as a separate library; here's its code (it's in C#):
https://github.com/ivan-mogilko/MonoAGSGames/tree/master/Libs/AudioMixer

Its structure is this:

- A Mixer allocates a number of AudioChannels; their number is dynamic and may be changed at any time.
- A Mixer also has a dictionary of "Audio Rules" attached to Tags. Audio Rules include things like default priority, default volume, and so on. A Tag is just a unique string.
- AudioClips may have any combination of tags.
- AudioChannels may have tags.
- A clip may only be played on a channel that has no tags or whose tags match the clip's (if a channel has tags, the clip must have at least one matching tag). If playback is allowed, the Mixer creates an AudioPlayback object and places it on a suitable channel.

In the above concept, the AudioTypes that we know in AGS are replaced by AudioRule objects associated with Tags. Reserving channels is done by applying tags to them: e.g. you may create a "Music" tag, create 8 channels, and then assign the "Music" tag to 4 of them. So it's kind of done the other way around.

There's a "demo game" which uses this system:
https://github.com/ivan-mogilko/MonoAGSGames/blob/master/Games/AudioMixerGame/Rooms/MixerRoom.cs
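
Transposed into AGS-style script for illustration (the library above is C#; all of the names below are hypothetical):

    // Hypothetical: channels are reserved by tags rather than AudioTypes.
    Mixer.CreateChannels(8);
    Mixer.SetRule("Music", 100, eAudioPriorityNormal); // tag -> default volume/priority
    Mixer.TagChannels("Music", 4);  // reserve 4 of the 8 channels for "Music"
    aThemeMusic.AddTag("Music");    // the clip carries a matching tag...
    Mixer.Play(aThemeMusic);        // ...so it may only occupy a "Music" channel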

I am posting this here just as an additional thought.



Also, I did not reply to some notes left by @Snarky above:

Quote from: Snarky on Sat 29/03/2025 13:09:38
For example, with a looping or repeating sound, is it always the same playback instance, or a new one each time? And how would you configure frame-linked audio (volume, etc)? If there was a persistent playback instance, that might offer a good API.

I think that a looping sound should be the same playback instance; it works like that internally now, and that's how VideoPlayer works in AGS 4.0 too. I doubt the opposite approach would work well.

About frame-linked audio: naturally, one would need to configure the future playback instead of doing so repeatedly every time a sound plays (which won't work reliably). OTOH, I do not think that having a "persistent playback" - in the sense that it always exists and just has different sounds played through it - is a good idea. I believe that a playback object should be valid only until the sound stops. Also, character frames may run several sounds in quick succession - and these may be different sounds, even played simultaneously.

I'd rather suggest classifying objects that may play linked sounds as Audio Emitters, and having "emitter properties" on them. I followed this principle when adding Character.AnimationVolume in AGS 3.6.0:
https://adventuregamestudio.github.io/ags-manual/Character.html#characteranimationvolume
https://adventuregamestudio.github.io/ags-manual/Object.html#objectanimationvolume
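
For reference, the existing property is used like this (the value is a percentage, multiplied with the other volume levels rather than replacing them):

    // Scale all of this character's frame-linked sounds to 40%,
    // on top of the clip and system volume levels.
    player.AnimationVolume = 40;
    player.Animate(2, 4, eRepeat, eNoBlock); // loop 2's frame sounds play at 40%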
