Why isn't AGS script much, much faster?

Started by Monsieur OUXX, Tue 21/09/2021 12:50:10


Monsieur OUXX

Context:
AGS has no problem working with high-resolution games : With hardware acceleration it can display lots of sprites and tint them, etc. The engine is also performant enough for scripting effects when you stick to built-in operations (native sprite scaling, native transparency, etc.).

But AGS becomes slower when you want the AGS script to handle more intensive algorithms, such as iterating over large arrays of hundreds of thousands of items, performing per-pixel operations on larger sprites, or running code whose function call stack exceeds a certain depth.

This is why when it comes to scripting, especially special effects, AGS is better suited for low-resolution games : 320x200 or 640x480. And that's also why advanced shaders written with the AGS scripting language are out of the question.


I sometimes ask myself why that is. What are the reasons why the AGS scripting language is not lightning fast?

Having very little knowledge of the engine's code, I can't seem to come up with a reason :
- It uses SDL, which is very close to the hardware. If not the fastest possible option, it's pretty close.
- It's written in C/C++
- The virtual machine is running pre-compiled code, literally just byte code.

So, what is slow in all this?
Is it that the VM's instructions are not mapped directly to the underlying CPU instructions? Is it that the engine uses simple data structures with a lot of overhead and lookups? Etc.

 

Crimson Wizard

#1
Slow in what exactly? Rendering, running script, something else? In which circumstances? Separate components will have separate reasons for being slow. Separate game objects will have separate reasons for being slow too (for instance, GUI updates were very unoptimized until 3.5.1).

Quote from: Monsieur OUXX on Tue 21/09/2021 12:50:10
Is it that the VM's instructions are not mapped directly to the underlying CPU instructions?

AGS script is slower than many other scripting languages, because it's compiled to generally inefficient bytecode; it emulates x86 asm, but has many repeating operations that are redundant in a scripting language. I wrote a post about this many years ago: https://www.adventuregamestudio.co.uk/forums/index.php?topic=47320.0

Additionally, the changes made to the open-sourced version made it even slower, for a few reasons, primarily making it run on 64-bit machines and keeping backward compatibility.
With regard to 64-bit systems, AGS script was unfortunately designed to store real memory addresses in the bytecode, which is 32-bit; the current engine therefore works around that by storing them differently, and that made it slower.
There could have been other ways to resolve these problems, but we never had a chance to address them again.
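
To make the 64-bit issue concrete, here is a minimal C++ sketch of one possible kind of workaround: the script only ever holds 32-bit handles, and a table translates them into real pointers. This is an illustration under assumptions, not the engine's actual code; `AddressTable`, `Register` and `Resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not AGS engine code.
// The script side only sees 32-bit handles; the table maps them to real
// pointers, which may be 64 bits wide on the host.
class AddressTable {
public:
    // Stores a real pointer and returns a 32-bit handle for it.
    // Handle 0 is reserved to mean "null".
    int32_t Register(void* ptr) {
        assert(slots_.size() < static_cast<std::size_t>(INT32_MAX));
        slots_.push_back(ptr);
        return static_cast<int32_t>(slots_.size()); // handles are 1-based
    }

    // Translates a handle back into a real pointer, or nullptr if invalid.
    // This extra lookup is the kind of cost that direct addresses avoid.
    void* Resolve(int32_t handle) const {
        if (handle <= 0 || static_cast<std::size_t>(handle) > slots_.size())
            return nullptr;
        return slots_[static_cast<std::size_t>(handle) - 1];
    }

private:
    std::vector<void*> slots_;
};
```

Every access through `Resolve` adds a bounds check and an indirection, which hints at why a workaround of this kind can be slower than storing the address itself.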

eri0o

Modern CPUs use a lot of advanced techniques to be efficient. I think the code for the script runner and other functions just doesn't use these newer CPU features well.

Generally, the code is not written to prevent cache misses or to enforce a predictable code path; the CPU drawing/bitmap code doesn't make use of SIMD instructions (Allegro 4 dates from before this was a thing...); and object references could be direct but they use a hashmap/btree instead, ... None of these things is easy to change while also keeping compatibility and portability. But these are the main things I kinda noticed... I have made occasional experiments, but overall I find the existing performance is alright for my needs, so the work required to change it is just too much for me to care. Overall, I find it's better to favor portability and flexibility over performance.

Monsieur OUXX

I'm mostly talking about the script language being slow. It's a bit suicidal to iterate a lot over some data in AGS, or to implement processing that does a lot of function calls. Both the stack and data access are rather slow.
Again, it's enough for a 320x200 game, but I'm just wondering why it's not going at the speed of light.

Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

Quote
AGS script is slower than many other scripting languages, because it's compiled to generally inefficient bytecode; it emulates x86 asm, but has many repeating operations that are redundant in a scripting language. I wrote a post about this many years ago: https://www.adventuregamestudio.co.uk/forums/index.php?topic=47320.0
Interesting read. Thanks!
 

Crimson Wizard

#4
Quote from: Monsieur OUXX on Wed 22/09/2021 12:12:00
Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

I do not know if eri0o is referring to what I suspect he's referring to, but if it's about accessing managed objects in script, the primary access operations are done through a flat array; the hashmap is only used for the backward lookup (object pointer to managed handle), and that is done in limited cases (like disposing of an object).
Of course a hashmap or btree may be somewhat slower than accessing objects directly, but there has to be actual research to find out whether that is the cause of the slowdown, and how much it slows anything down in relative terms.
There's much more going on there when working with managed objects.
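
The layout described above can be sketched roughly as follows. This is a hedged illustration, not the engine's actual managed pool: `ManagedPool` and its member names are hypothetical, and the real code does much more (reference counting, serialization, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: forward lookups use a flat array, backward lookups
// use a hash map that is only consulted in rare cases such as disposal.
class ManagedPool {
public:
    // Registers an object and returns its handle (an index into the array).
    int32_t Add(void* object) {
        int32_t handle = static_cast<int32_t>(objects_.size());
        objects_.push_back(object);
        handle_of_[object] = handle;
        return handle;
    }

    // Hot path: handle -> object is a plain array index.
    void* Get(int32_t handle) const {
        if (handle < 0 || static_cast<std::size_t>(handle) >= objects_.size())
            return nullptr;
        return objects_[static_cast<std::size_t>(handle)];
    }

    // Cold path: object -> handle goes through the hash map.
    int32_t HandleOf(void* object) const {
        auto it = handle_of_.find(object);
        return it == handle_of_.end() ? -1 : it->second;
    }

    // Disposal needs the backward lookup, then clears the array slot.
    bool Dispose(void* object) {
        auto it = handle_of_.find(object);
        if (it == handle_of_.end())
            return false;
        assert(objects_[static_cast<std::size_t>(it->second)] == object);
        objects_[static_cast<std::size_t>(it->second)] = nullptr;
        handle_of_.erase(it);
        return true;
    }

private:
    std::vector<void*> objects_;                    // handle -> object
    std::unordered_map<void*, int32_t> handle_of_;  // object -> handle
};
```

In this arrangement the hash map never sits on the path that scripts exercise most, which is consistent with the point that it is unlikely to be the main cause of slowdown.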

Monsieur OUXX

Quote from: Crimson Wizard on Wed 22/09/2021 13:01:55
Of course hashmap or btree may be somewhat slower than accessing objects directly, but

Indeed, if it's not for direct access but only for disposing and such it's no big deal.
 

eri0o

Quote from: Crimson Wizard on Wed 22/09/2021 13:01:55
Quote from: Monsieur OUXX on Wed 22/09/2021 12:12:00
Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

I do not know if eri0o is referring to what I suspect he's referring to, but if it's about accessing managed objects in script, the primary access operations are done through a flat array; the hashmap is only used for the backward lookup (object pointer to managed handle), and that is done in limited cases (like disposing of an object).
Of course a hashmap or btree may be somewhat slower than accessing objects directly, but there has to be actual research to find out whether that is the cause of the slowdown, and how much it slows anything down in relative terms.

So, in the experiments I did (unfortunately I blew up the recent branch with this experiment by accident), there was pretty much no impact at all, or the way I was measuring was wrong. I only mentioned it because I thought it was something that could matter, but when I tested changing it, it was a lot of work and I couldn't perceive any gains.

This is the other problem: I have feelings and ideas about what I think is impacting performance, but when I go to measure stuff, it's never simple or obvious.

VS has a reasonable profiler, for instance; in theory I should be able to use it and pick out which things are strangling performance. It produces flame graphs and other visualizations, but in my experiments nothing stands out.

Crimson Wizard

#7
As far as I know, the biggest recent impact on script performance came from using the RuntimeScriptValue struct, or rather from the extra function calls (sometimes virtual function calls) and pointer accesses that followed as a consequence of using this struct. Originally this struct did not exist and all memory accesses were direct although anonymous (you could not tell what object or variable was being accessed, not even its type). That was very fast, but because of how the interpreter worked (and how the bytecode was compiled) it limited pointers to 32 bits, which of course made it incompatible with 64-bit systems.
There were other reasons and purposes for this change, but today I no longer think these reasons and purposes were right.
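
As a rough illustration of the trade-off, the C++ sketch below contrasts an anonymous direct read with a read through a tagged value. It is only a toy under assumptions: `ReadDirect` and `TaggedValue` are hypothetical names and are not the actual RuntimeScriptValue implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// "Direct but anonymous" access: read 4 bytes at an offset. Nothing records
// what kind of data lives there, so there is nothing to check.
inline int32_t ReadDirect(const uint8_t* memory, std::size_t offset) {
    int32_t v;
    std::memcpy(&v, memory + offset, sizeof(v));
    return v;
}

// Tagged access: every value carries its type, and reads go through a check.
enum class ValueType { Int32, Pointer };

struct TaggedValue {
    ValueType type;
    int32_t   ivalue;
    void*     ptr;

    static TaggedValue FromInt(int32_t v) { return {ValueType::Int32, v, nullptr}; }
    static TaggedValue FromPtr(void* p)   { return {ValueType::Pointer, 0, p}; }

    // Returns false (and leaves `out` untouched) if this is not an integer.
    bool TryGetInt(int32_t& out) const {
        if (type != ValueType::Int32)
            return false;
        out = ivalue;
        return true;
    }
};
```

The tagged form is larger and adds a branch per access; multiplied over every interpreted instruction, small costs of this kind can add up, although only measurement can say by how much.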

In theory it might be possible to rewrite the script interpreter again and solve this differently. It might also be possible to do some partial changes without big rewrite and improve things somewhat.

But knowing that most of trouble comes from backward compatibility, the better way could be to redesign the bytecode and its compilation, and make it not rely on the pointer size when generating code.

Most of the trouble with AGS kept happening because back in 2012 we (the 3 or 4 people who were working on it at the time) used a backward-compatible engine (meant for running old games) as the base for further development. Thus the engine had 2 purposes which often conflicted with each other, restricting what we could do.


PS. And of course there's always the option of replacing the scripting language with something which is faster and has a better list of features. For instance, years ago several people were advocating for Lua (there has even been a Lua plugin). Personally, my main concern was the huge difference in syntax between Lua and AGS script. Well, this has all been discussed many times on these forums.

Monsieur OUXX

#8
Maybe AGS 4 should be 64 bits and ditch retro-compatibility.


This is from 2018 :

[imgzoom]https://www.extremetech.com/wp-content/uploads/2018/04/Modern-OS-Marketshare-640x367.png[/imgzoom]


96% of Windows installations were already 64-bit.
 

Danvzare

#9
Quote from: Monsieur OUXX on Wed 22/09/2021 14:00:47
Maybe AGS 4 should be 64 bits and ditch retro-compatibility.
As someone who occasionally has to use an old 32-bit Vista laptop, I'm all for this. Despite sometimes having to use that old 32-bit Vista laptop.
If I recall correctly, AGS 4 is going to practically be a fresh start, so ditching the 32-bit support altogether makes sense if it results in a better overall product, especially since so few computers are 32-bit nowadays.
But if ditching the 32-bit support wouldn't have any effect, then of course I say keep it.

That's just my two-cents on the matter.



Quote from: Crimson Wizard on Wed 22/09/2021 13:37:47
For instance, years ago several people were advocating for Lua (there even has been a Lua plugin).
Ugh, Lua. I'm glad that change wasn't implemented.  :-X

Crimson Wizard

Quote from: Danvzare on Wed 22/09/2021 14:32:57
If I recall correctly, AGS 4 is going to practically be a fresh start

No, it's fully based on current AGS, with just certain things cut off.

eri0o

I am not sure, but I think the 32-bit compatibility mentioned here means being able to run, in the current engine on a 64-bit system, a game made on a 32-bit computer with an old version of AGS that worked differently than today.

Anyway, we need to support 32-bit processors for Android: as far as I can tell, there are still a good number of 32-bit ARM processors in use.

Crimson Wizard

#12
There's no "32-bit compatibility" in the engine at the moment, rather there's "64-bit compatibility".

The whole problem was that the script's bytecode was made to assume that variables store memory addresses directly and that these are 32-bit values; therefore all of its internal arithmetic is based on that assumption. This is how the script compiler currently works.

These are the alternative solutions I can think of:

1. Stick to 32-bit memory arithmetic in bytecode, and restrict memory allocations to the lower 32 bits on all systems, at least for data directly referenced by the script. This has been suggested a couple of times in the past. This will also keep backward compatibility with old games. As a downside, the engine may have trouble if a game requires lots of RAM. Also, existing plugins that create managed objects will not work on 64-bit systems, so there will be a need to change the plugin API and make the engine allocate these objects for plugins on demand instead.
Tbh I don't know if this is directly supported on all OSes. There's also a way to create "virtual heap" management in the engine (I think Nick Sonneveld had similar experiments in his repository).

2. Move to 64-bit memory arithmetic completely; this will break backward compatibility with older games, and there will be some complications in making it work correctly on 32-bit systems.

3. Create a bytecode converter that adjusts the bytecode to a given pointer bitness, either as a tool or as an engine component. This will make the script depend on the system, and if the converter is inside the engine it will slow down game startup.

4. Redesign the script bytecode and interpreter to not depend on pointer size.
(Additionally, there's always an option to create a bytecode converter that adjusts old game bytecode to the new format, if backward compatibility is wanted.)
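
The "virtual heap" idea mentioned under option 1 could look roughly like the C++ sketch below: script-visible addresses are 32-bit offsets into one arena, so they mean the same thing on 32-bit and 64-bit hosts. This is a toy under assumptions (a bump allocator with no freeing); `VirtualHeap`, `Alloc` and `ToPointer` are hypothetical names and do not come from the engine or from any existing experiment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "virtual heap": the script stores 32-bit offsets, and the
// engine converts them to real pointers only when it touches the memory.
class VirtualHeap {
public:
    explicit VirtualHeap(std::size_t capacity)
        : arena_(capacity), top_(kFirstOffset) {
        assert(capacity <= UINT32_MAX); // offsets must fit in 32 bits
    }

    // Bump-allocates `size` bytes (rounded up to 8); returns 0 on failure.
    uint32_t Alloc(uint32_t size) {
        uint32_t aligned = (size + 7u) & ~7u;
        if (aligned < size)
            return 0; // rounding overflowed
        if (top_ > arena_.size() || arena_.size() - top_ < aligned)
            return 0; // out of arena space
        uint32_t offset = top_;
        top_ += aligned;
        return offset;
    }

    // Converts a script offset into a real pointer, or nullptr if invalid.
    uint8_t* ToPointer(uint32_t offset) {
        if (offset < kFirstOffset || offset >= top_)
            return nullptr;
        return arena_.data() + offset;
    }

private:
    static constexpr uint32_t kFirstOffset = 8; // offset 0 acts as "null"
    std::vector<uint8_t> arena_;
    uint32_t top_;
};
```

The downside noted in option 1 shows up directly: everything the script can address has to fit inside the arena, and objects allocated elsewhere (for example by plugins) cannot be referenced this way.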

eri0o

#13
About bytecode, it's possible to generate both 32-bit and 64-bit bytecode and have the runner just use the one it's compatible with.

(In case 4 is too hard)




Uhm, there's one concept I see mentioned when people talk about interpreters that I am not familiar with: JIT. Looking online, it appears to be an important part of why the JVM is independent of bytecode "bitness". But again, I don't know what that concept means.




I took a brief look at the compiler, and it seems there are some byte offsets that are multiples of 4 (instead of 8 for 64-bit), but beyond that I can't readily identify where the 32-bit limits are in the compiler. As for running the script, assuming most problems occur in cc_instance.cpp, it looks like there's a lot of dependency on 32-bit. These are parts of the code I haven't played with much, so this is just from quickly glancing at it.

fernewelten

#14
If you take a “typical” AGS adventure program, then I'd say that most of the heavy work is done in functions such as Character.Say. So if the aim is to speed up the engine, I'd start looking for bottlenecks in these functions first, and then in the way that calls are handled.

Most probably the engine will be bored stiff most of the time waiting for the d*mn user to move their mouse around or click it again at long last.

I'd also plead for getting at actual data as a first step. Collect some typical games and some “problematical” games and get heat maps.

As far as I know, the engine misuses the "high bits" of memory locations in some circumstances to store additional information there (e.g., for linking purposes). They evidently don't believe that even all 32 bits of the address space are needed. So do we have proof that doubling the memory bits will actually buy actual game writers or actual gamers some coffee?

Crimson Wizard

#15
Quote from: fernewelten on Wed 22/09/2021 16:17:26
As far as I know, the engine misuses the "high bits" of memory locations in some circumstances to store additional information there (e.g., for linking purposes). They evidently don't believe that even all 32 bits of the address space are needed.

I don't remember all the details right now, but even if these "high bits" are filled, this is before the "fixups". When the real memory addresses are saved in script data (variables, registers), they don't contain any extra packed information.

Quote from: fernewelten on Wed 22/09/2021 16:17:26
So do we have proof that doubling the memory bits will actually buy actual game writers or actual gamers some coffee?

Back in 2012 it was not a question of providing enough RAM support; it was simply a matter of letting the engine work with 64-bit memory addresses, because without special treatment the address of an allocated object could be anywhere beyond the 32-bit range. Plus there were engine plugins which could also allocate managed objects, and these were not controllable by the engine.

Whether or not full 64-bit or full 32-bit addresses are required for the script to run is a question to which we currently have no clear answer, and there's a lack of game performance statistics in general (and little to no means to gather these, as the engine itself does not provide any means for this).

Personally I would speculate that most games created with AGS so far may fit into 32-bit memory, and those which don't likely are overusing memory due to low optimization (again speculating, but I suspect that majority of game authors are either not much concerned about this, or are not savvy enough to do this properly).

In general, I doubt that standard adventure games *need* to occupy that much memory at once. And even then, most of the mem is likely to be taken by resources (sprites, sounds), and these are not exposed into script VM, so not part of this address issue. What is left for VM addresses is: script variables and managed objects. Most managed objects are merely "wrappers" which contain ID of an actual object in the engine. So probably most managed script memory goes to: dynamic containers (arrays, etc), and dynamic sprites.

eri0o

I in particular would love to be able to do the stuff I can do in löve with Lua but in AGS.

In addition, having a faster language would enable porting some libraries and things to AGS Script directly instead of as a plugin. When doing the ImGi module I had to be reaaaally economical in my scripting, and it still runs much slower than the original version of that library (microui).

eri0o

I am curious if it would be possible to disconnect the AGS Script runner from AGS Engine to be easier to play with it, and then reconnect it back after. I think this would reduce the scope to see what can be optimized in it.

fernewelten

As far as I know, the Engine does much more than the microprocessor it is modelled on would do. Mainly, it double-checks things, e.g., that the registers contain the type of data that they're supposed to. Seen from this vantage point, the script runner is a “debugging” script runner, only the “debugging” runs all the time, no matter whether the code is still being debugged.

There might be two angles to speed this up:

  • Make the script runner trust the Compiler. The Compiler only runs once whereas the script runner runs each time. So when the script runner abstains from checking those things that the compiler should have already made sure of, this can potentially cut out a lot of superfluous runtime activity.
  • Provide a sleek “production” script runner that only does the bare minimum, side by side with the current script runner. Some kind of flag would determine whether the code is run with the careful current runner or with the fast “production” runner. Those users that need a very fast engine could set the flag appropriately and forgo the checks. Or else we could set things up as follows: when the game is started with F5, the engine uses the careful current script runner; when it is compiled with F7 and run, the engine uses the sleek “production” script runner.
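
A minimal C++ sketch of the two-runner idea, assuming a toy stack machine rather than the real AGS bytecode: the same interpreter loop is compiled twice, once with checks and once without, and a flag picks one. All names here (`Op`, `Instr`, `Run`, `RunScript`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class Op : uint8_t { Push, Add, Halt };

struct Instr {
    Op      op;
    int32_t arg; // only used by Push
};

// Runs a toy stack program. With Checked = true the runner validates the
// stack before each operation; with Checked = false it trusts the compiler,
// so malformed programs are undefined behavior in that mode.
template <bool Checked>
int32_t Run(const std::vector<Instr>& code) {
    std::vector<int32_t> stack;
    for (const Instr& in : code) {
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.arg);
            break;
        case Op::Add: {
            if (Checked && stack.size() < 2)
                throw std::runtime_error("stack underflow in Add");
            int32_t b = stack.back(); stack.pop_back();
            int32_t a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        case Op::Halt:
            if (Checked && stack.empty())
                throw std::runtime_error("empty stack at Halt");
            return stack.back();
        }
    }
    throw std::runtime_error("program ended without Halt");
}

// Picks the runner from a flag, e.g. careful for test runs, fast for release.
int32_t RunScript(const std::vector<Instr>& code, bool careful) {
    return careful ? Run<true>(code) : Run<false>(code);
}
```

Because `Checked` is a compile-time constant, the compiler can drop the checks from the fast variant entirely; the price is that a compiler bug would then show up as a crash instead of a readable error.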

eri0o

#19
I would not trust the compiler as much. I don't think the checks are that burdensome - they only happen at the frontier in the API, which is not what I meant.

Anyway, if I went that route I would just add all relevant event registering to the plugin API, and then compile AGS Script to C++, similar to YACC.

Python, Lua, and most scripting languages still detect failures at runtime, so the error checking is not something that has to be removed. What could be done in the script runner is to add a step where it looks at the compiled assembly-like code and works out what can be optimized. For example, I believe some instructions could be merged at runtime to leverage a more sensible modern AMD64 instruction.
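
One simple form of such a step is a peephole pass over the loaded bytecode. The C++ sketch below is a toy under assumptions, not AGS bytecode: `Opcode`, `Instruction` and `MergeAdds` are hypothetical, and the toy has no jumps, whereas a real pass would have to avoid merging across jump targets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Opcode : uint8_t { LoadConst, AddConst };

struct Instruction {
    Opcode  op;
    int     reg;   // target register index
    int32_t value; // immediate operand
};

// Peephole pass: folds consecutive AddConst instructions that target the
// same register into a single AddConst with the summed immediate.
std::vector<Instruction> MergeAdds(const std::vector<Instruction>& code) {
    std::vector<Instruction> out;
    for (const Instruction& in : code) {
        if (!out.empty() && in.op == Opcode::AddConst &&
            out.back().op == Opcode::AddConst && out.back().reg == in.reg) {
            // Add in unsigned space so that wraparound stays well defined.
            out.back().value = static_cast<int32_t>(
                static_cast<uint32_t>(out.back().value) +
                static_cast<uint32_t>(in.value));
        } else {
            out.push_back(in);
        }
    }
    return out;
}
```

Since the pass runs once at load time and the merged program runs many times, even small savings per instruction could pay off, though whether they would in AGS is something only profiling can answer.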
