[FEATURE REQUEST] Script Objects compression

Started by eri0o, Sat 26/08/2023 22:04:02

eri0o

I've noticed that in my games that are either script-intensive or have a lot of dialog, much of the game's size comes from the scripts. I usually work in low resolution, so the image files are already very small.

I am not sure how to proceed with this. In the past, games usually compressed their text using some strategy; this was really common in very old games that had to fit in a ROM cartridge with very limited storage. Compressing only the strings is one strategy, but I wonder if there is any way to directly compress the script objects in their entirety.

I don't know if this is the best approach or if some other approach is better, say having the packaging format itself support compression. I also don't know if it's reasonable to just decompress on game boot and work with the uncompressed script objects, or if they would need to be streamed from the compressed package.

Anyway, because of the unknowns around this, I decided to write it down here just to start thinking about it.

Crimson Wizard

#1
Could you give at least some specifics of this problem, how large are these compiled scripts? Is it size of bytecode, or size of text that makes them large on disk?
I think it's best to investigate the cause first, and make sure this is not because of mistakes or inefficient functionality in AGS.

For instance, I should mention that the way AGS allocates global variables currently is that it writes their whole initialization to the script data. Of course that does not work for arrays, because AGS currently does not support syntax for initializing arrays at compile time, but global arrays are written to script data anyway.
What this means in practice is that if you have a global array of 1000 ints in script, then the script compiler will write 1000 ints into script data.
I recall a number of users had problems with this in the past, because they did not use dynamic arrays, and stored lots of stuff in fixed arrays, which increased the size of their compiled game by many megabytes.
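
To put rough numbers on the array case, here is a minimal Python sketch. It assumes a 4-byte int and an array that is left at zero, and it uses zlib only as a stand-in for whatever compression might eventually be chosen; none of it reflects the actual layout of AGS script data.

```python
import zlib

INT_SIZE = 4  # assumption: one script int occupies 4 bytes on disk

def zeroed_array_bytes(count: int) -> bytes:
    """Bytes written for `count` ints that are all zero."""
    return bytes(count * INT_SIZE)

raw = zeroed_array_bytes(1000)
packed = zlib.compress(raw)

print(len(raw))     # size of the array as written, uncompressed
print(len(packed))  # size after compression; tiny, since every byte is zero
```

Under these assumptions, an uninitialized fixed array costs its full size on disk but almost nothing once compressed.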

Regarding compression,
Script objects must be read completely before creating them in memory, otherwise the operations on addresses won't work. Looking from this perspective, does it make sense to compress only the text when we may compress the script as a whole?

In regards to the text,
there are also translations. Each translation currently holds a full duplicate of all the game texts as keys (at least unless we change how translation keys work), plus translated items for 99% of that text. If game text is the culprit, then shouldn't the translations also be compressed?

In the past we discussed moving towards game data packaged simply in ZIP files (they don't have to have zip extension). I know that engines often do a similar thing. The advantage to this is compression and ease of working, as zip package may be read and modified by any suitable tool. The disadvantage of course is that it's easier to view the contents of the game, if that matters.

eri0o

Quote from: Crimson Wizard on Sat 26/08/2023 22:48:06
Could you give at least some specifics of this problem, how large are these compiled scripts? Is it size of bytecode, or size of text that makes them large on disk?



I don't know how to break it apart, but in my Sandwalker project the biggest file is game28.dta, which contains all the scripts.

Quote
For instance, I should mention that the way AGS allocates global variables currently is that it writes their whole initialization to the script data. Of course that does not work for arrays, because AGS currently does not support syntax of initializing arrays at compile time, but global arrays are written to script data anyway.
What this means in practice, is that if you have a global array of 1000 ints in script, then script compiler will write 1000 ints into script data.

There is a chance that 20 MB of this is the map code then (there are 16 of these here, with around 1.25 MB each), because it's a very big int array.

There is still other code, and other things whose effect on the size I don't understand.

I am alright with the game script being duplicated in RAM because of decompression if necessary, but my guess is that this may or may not matter depending on how heavily scripted the game is.

Crimson Wizard

#3
Quote from: eri0o on Sat 26/08/2023 23:41:42
I don't know how to break it apart, but in my Sandwalker project the biggest file is game28.dta, which contains all the scripts.

It may be worth writing a tool that analyzes this, either picks things out, or just prints a table of contents. The reading code may be found in Common lib, entry point is in main_game_file.h/cpp.
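
Short of a real parser, a cruder first pass is possible without knowing the file format at all. The Python sketch below does not read the game file structure and is not the tool described above; it only compresses fixed-size windows of a file with zlib and reports how well each one shrinks, which may point at large runs of repetitive data such as zeroed arrays. The function names and the 64 KB window are arbitrary choices for illustration.

```python
import zlib

def compressibility_map(data: bytes, window: int = 64 * 1024):
    """Yield (offset, raw_size, compressed_size) for each fixed-size window."""
    for offset in range(0, len(data), window):
        chunk = data[offset:offset + window]
        yield offset, len(chunk), len(zlib.compress(chunk))

def report(path: str, window: int = 64 * 1024) -> None:
    """Print one line per window: offset, raw size, compressed size, ratio."""
    with open(path, "rb") as f:
        data = f.read()
    for offset, raw, packed in compressibility_map(data, window):
        print(f"{offset:>10}  {raw:>8}  {packed:>8}  {packed / raw:6.1%}")
```

Calling `report()` on the data file would list the windows in order; long stretches with a very low ratio are candidates for a closer look.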

Quote from: eri0o on Sat 26/08/2023 23:41:42
There is still other code, and other things whose effect on the size I don't understand.

If it's something serious, then perhaps setting up a test case may help to find out. Even if we just support compression, such an investigation may help improve how AGS works in the future.

Quote from: eri0o on Sat 26/08/2023 23:41:42
I am alright with the game script being duplicated in RAM because of decompression if necessary, but my guess is that this may or may not matter depending on how heavily scripted the game is.

This will be only 1 script that is duplicated at a time when loading a game or a room (unless we make a multithreaded loader at some point...).
Also, it might be possible to implement a decompressing Stream type, that buffers up a fixed portion of compressed and decompressed data.
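
As a rough illustration of such a stream, here is a minimal Python sketch built on zlib's incremental decompressor. The class name `InflateReader` and the chunk size are hypothetical, and this is not AGS's Stream interface; it only shows the idea of pulling a fixed amount of compressed input at a time and capping how much output each step produces.

```python
import io
import zlib

class InflateReader:
    """Read-only stream that inflates zlib data from `source` in fixed-size steps."""

    def __init__(self, source, chunk_size: int = 4096):
        self._source = source          # any object with read(n) -> bytes
        self._chunk_size = chunk_size  # cap on input read and output produced per step
        self._inflater = zlib.decompressobj()
        self._pending = b""            # decompressed bytes not yet handed out
        self._eof = False

    def read(self, size: int) -> bytes:
        """Return up to `size` decompressed bytes, or b"" at end of data."""
        while len(self._pending) < size and not self._eof:
            # Use input left over from the previous step before reading more.
            data = self._inflater.unconsumed_tail or self._source.read(self._chunk_size)
            if not data:
                self._pending += self._inflater.flush()
                self._eof = True
                break
            self._pending += self._inflater.decompress(data, self._chunk_size)
        out, self._pending = self._pending[:size], self._pending[size:]
        return out

# Usage: read a few bytes without inflating the whole payload up front.
payload = b"room script data " * 10_000
reader = InflateReader(io.BytesIO(zlib.compress(payload)), chunk_size=256)
first = reader.read(17)
```

With this shape, memory use is bounded by the requested read size plus one chunk, rather than by the size of the whole script.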

eri0o

About the decompression stream, I think it could be used then by multifilelib, so it could have each asset in it individually compressed and then just load them. Zip files also do individual file compression.
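
To make the per-asset idea concrete, here is a toy Python sketch of a package where each asset is compressed (or not) on its own and an index records where it lives. The layout, the function names, and the asset names used with it are all invented for illustration and have nothing to do with the real multifilelib format.

```python
import io
import struct
import zlib

def pack(assets: dict[str, bytes], compress: set[str]) -> bytes:
    """Build a toy package: asset blobs, then an index, then the index offset."""
    body = io.BytesIO()
    index = []
    for name, data in assets.items():
        stored = zlib.compress(data) if name in compress else data
        index.append((name, body.tell(), len(stored), name in compress))
        body.write(stored)
    index_offset = body.tell()
    for name, offset, size, packed in index:
        encoded = name.encode("utf-8")
        body.write(struct.pack("<H", len(encoded)) + encoded)
        body.write(struct.pack("<QQ?", offset, size, packed))
    body.write(struct.pack("<Q", index_offset))
    return body.getvalue()

def read_asset(package: bytes, wanted: str) -> bytes:
    """Find one asset through the index and decompress it only if flagged."""
    end = len(package) - 8
    (pos,) = struct.unpack_from("<Q", package, end)
    while pos < end:
        (name_len,) = struct.unpack_from("<H", package, pos)
        name = package[pos + 2:pos + 2 + name_len].decode("utf-8")
        offset, size, packed = struct.unpack_from("<QQ?", package, pos + 2 + name_len)
        pos += 2 + name_len + 17  # 17 = two 8-byte fields plus the 1-byte flag
        if name == wanted:
            blob = package[offset:offset + size]
            return zlib.decompress(blob) if packed else blob
    raise KeyError(wanted)
```

Because the flag is per asset, something that must stay directly addressable can be stored plain while scripts are compressed, and only the requested asset is ever decompressed.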

I don't remember if .ags packages have some magic number identifier. But I could imagine a version of multifilelib that supports both regular .ags packages and .zip files, and if there is some unique magic number they could even share the same extension, and the engine would then use whichever format the game was packed in.
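
A minimal Python sketch of the detection half of that idea follows. It only recognizes ZIP archives, using the standard `zipfile.is_zipfile` check (ZIP keeps its directory record at the end of the file, so this looks at more than the leading bytes); anything else is reported as "other", because the .ags signature is not known here. The function name is hypothetical.

```python
import io
import zipfile

def detect_package_kind(data: bytes) -> str:
    """Return "zip" if the bytes form a ZIP archive, otherwise "other"."""
    return "zip" if zipfile.is_zipfile(io.BytesIO(data)) else "other"
```

A loader could branch on the result and hand "other" to the existing package reader, regardless of the file extension.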

Crimson Wizard

#5
Quote from: eri0o on Sat 02/09/2023 12:41:51
I don't remember if .ags packages have some magic number identifier. But I could imagine a version of multifilelib that supports both regular .ags packages and .zip files, and if there is some unique magic number they could even share the same extension, and the engine would then use whichever format the game was packed in.

The way the spritefile is read now, it cannot be used while it's compressed as a whole; it has to be plain data with strict offsets that are known beforehand. This means that either the compression should be applied not to all assets, but to individual ones (where some assets may have it and some not); or the current streaming method should be dropped and replaced with another strategy; or the spritefile would need to be decompressed to a temporary file elsewhere on disk at game launch, and read from there.

This is going to be a major issue if AGS were to support a game packed in a regular zip file. I don't know whether zip allows storing different entries with different compression.
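
On the zip question: the ZIP format records a compression method per entry, so stored and deflated entries can coexist in one archive. A minimal Python sketch using the standard zipfile module shows this; the entry names are hypothetical and the data is filler.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Sprite-like data is stored as-is; script-like data is deflated.
    archive.writestr("sprites.bin", bytes(range(256)) * 16,
                     compress_type=zipfile.ZIP_STORED)
    archive.writestr("script.bin", bytes(4000),
                     compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buffer) as archive:
    entries = archive.infolist()
    methods = {info.filename: info.compress_type for info in entries}
    sizes = {info.filename: (info.file_size, info.compress_size) for info in entries}
```

For the stored entry the compressed size equals the original size, so its bytes sit in the archive unmodified. Whether that is enough for the spritefile's offset-based reading is a separate question that this sketch does not answer.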

I guess this is a common problem for big files that contain lots of data, where not all data is necessary at once. Like the spritefile is a data package inside a data package.

eri0o

#6
Well, we don't have to support zip files then; I was just imagining things when looking into multifilelib. We could instead support looking into individual files and deal with compression in a second layer when opening them. I really liked the zip stream idea.
