Why isn't AGS script much, much faster?

Started by Monsieur OUXX, Tue 21/09/2021 12:50:10


Monsieur OUXX

Context:
AGS has no problem working with high-resolution games: with hardware acceleration it can display lots of sprites and tint them, etc. The engine is also fast enough for scripting effects as long as you stick to built-in operations (native sprite scaling, native transparency, etc.).

But AGS becomes slower when you make AGS script handle more intensive algorithms, such as iterating over arrays of hundreds of thousands of items, performing per-pixel operations on larger sprites, or when your call stack exceeds a certain depth.

This is why, when it comes to scripting (especially special effects), AGS is better suited to low-resolution games: 320x200 or 640x480. That's also why advanced shaders written in the AGS scripting language are out of the question.


I sometimes ask myself why that is. What are the reasons why the AGS scripting language is not lightning fast?

Having very little knowledge of the engine's code, I can't come up with a reason:
- It uses SDL, which is very close to the hardware. If not the fastest possible option, it's pretty close.
- It's written in C/C++.
- The virtual machine runs pre-compiled code, literally just bytecode.

So, what is slow in all this?
Is it that the VM's instructions are not mapped directly to the underlying CPU instructions? Is it that the engine uses simple data structures with a lot of overhead and lookups? Etc.

 

Crimson Wizard

#1
Slow in what exactly? Rendering, running script, something else? In which circumstances? Separate components have separate reasons for being slow. Separate game objects have their own reasons too (for instance, GUI updates were very poorly optimized until 3.5.1).

Quote from: Monsieur OUXX on Tue 21/09/2021 12:50:10
Is it that the VM's instructions are not mapped directly to the underlying CPU instructions?

AGS script is slower than many other scripting languages because it's compiled to generally inefficient bytecode: it emulates x86 asm, but has many repetitive operations that are redundant in a scripting language. I wrote a post about this many years ago: https://www.adventuregamestudio.co.uk/forums/index.php?topic=47320.0
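To illustrate why an x86-style bytecode is costly inside an interpreter, here is a toy sketch. The opcodes are hypothetical and only mimic the register-machine style; they are not the real AGS instruction set. A single `c = a + b` turns into several instructions, and each one pays for a trip through the dispatch switch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy x86-like register machine.
// These are NOT the real AGS instructions; they only mimic the style.
enum Op : std::uint8_t { LOADVAR, PUSH, POP_BX, ADD_AX_BX, STOREVAR, HALT };

struct Instr {
    Op op;
    int arg;  // variable index for LOADVAR/STOREVAR, unused otherwise
};

// Executes `code` against `vars` and returns how many instructions were
// dispatched, i.e. how many trips through the switch the program cost.
int run(const std::vector<Instr>& code, std::vector<int>& vars) {
    int ax = 0, bx = 0;
    std::vector<int> stack;
    int dispatched = 0;
    for (std::size_t pc = 0;; ++pc) {
        const Instr& in = code[pc];
        ++dispatched;
        switch (in.op) {
            case LOADVAR:   ax = vars[in.arg]; break;
            case PUSH:      stack.push_back(ax); break;
            case POP_BX:    bx = stack.back(); stack.pop_back(); break;
            case ADD_AX_BX: ax += bx; break;
            case STOREVAR:  vars[in.arg] = ax; break;
            case HALT:      return dispatched;
        }
    }
}

// `c = a + b` in register-machine style: load, push, load, pop, add, store.
const std::vector<Instr> kAddProgram = {
    {LOADVAR, 0}, {PUSH, 0}, {LOADVAR, 1}, {POP_BX, 0},
    {ADD_AX_BX, 0}, {STOREVAR, 2}, {HALT, 0},
};
```

Running `kAddProgram` with `vars = {2, 3, 0}` leaves `vars[2] == 5` after seven dispatches, where native code would need roughly one add plus the loads and stores.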

Additionally, the changes made to the open-sourced version made it even slower, for a few reasons, primarily to make it run on 64-bit machines and to keep backward compatibility.
Regarding 64-bit systems, AGS script was unfortunately designed to store real memory addresses in the bytecode, and these are 32-bit; therefore the current engine works around that by storing them differently, which made it slower.
There could have been other ways to solve these problems, but we never had a chance to revisit them.
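A minimal sketch of the underlying problem, assuming a bytecode whose address slots are 32 bits wide: on a 64-bit system a real pointer may simply not fit. The `AddressTable` below is one hypothetical workaround (not the actual AGS design): keep real pointers in a side table and store only an index, which works on any pointer size but adds a lookup to every access.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The bytecode's view of memory: every address slot is 32 bits wide.
using Slot = std::uint32_t;

// Could this pointer be stored as-is in a 32-bit slot? On a 64-bit system
// the answer depends on where the allocator happened to place the object.
bool fits_in_slot(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) <= UINT32_MAX;
}

// Hypothetical workaround: real pointers live in a side table and the slot
// holds only their index. Correct on any pointer size, but every access
// now pays for an extra lookup.
class AddressTable {
public:
    Slot add(void* p) {
        ptrs_.push_back(p);
        return static_cast<Slot>(ptrs_.size() - 1);
    }
    void* resolve(Slot s) const { return ptrs_[s]; }

private:
    std::vector<void*> ptrs_;
};
```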

eri0o

Modern CPUs use a lot of advanced techniques to be efficient. I think the code for the script runner and other functions just doesn't make good use of these newer CPU features.

Generally, the code is not written to avoid cache misses or to keep a predictable code path; the CPU drawing/bitmap code doesn't use SIMD instructions (Allegro 4 predates this being common...); and object references could be direct but they use a hashmap/btree instead, ... None of these things are easy to change while also keeping compatibility and portability. But these are the main things I noticed... I have made occasional experiments, but overall I find the existing performance is alright for my needs, so the work required to change it is just too much for me to care. Overall, I find it's better to favor portability and flexibility over performance.

Monsieur OUXX

I'm mostly talking about the script language being slow. It's a bit suicidal to iterate a lot over data in AGS, or to implement processing that makes a lot of function calls. Both the stack and data access are rather slow.
Again, it's enough for a 320x200 game; I'm just wondering why it's not going at the speed of light.

Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

Quote
AGS script is slower than many other scripting languages, because it's compiled to generally inefficient bytecode; it emulates x86 asm, but have many repeating operations that are redundant in a scripting language. I wrote a post about this many years ago: https://www.adventuregamestudio.co.uk/forums/index.php?topic=47320.0
Interesting read. Thanks!
 

Crimson Wizard

#4
Quote from: Monsieur OUXX on Wed 22/09/2021 12:12:00
Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

I do not know if eri0o is referring to what I suspect he's referring to, but if it's about accessing managed objects in script, the primary access operations go through a flat array; the hashmap is only used for the reverse lookup (object pointer to managed handle), and that is done only in limited cases (like disposing an object).
Of course a hashmap or btree may be somewhat slower than accessing objects directly, but it would take actual research to find out whether that is the cause of the slowdown, and how much it slows things down in relative terms.
There's much more going on there when working with managed objects.
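A minimal sketch of the split described above, with hypothetical names: handles index a flat array on the hot path, and a hash map is consulted only for the rarer reverse lookup from an object's address back to its handle, e.g. when disposing.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical pool of managed objects.
class ManagedPool {
public:
    int add(void* obj) {
        int handle = static_cast<int>(objects_.size());
        objects_.push_back(obj);
        reverse_[obj] = handle;
        return handle;
    }

    // Hot path: a plain array index, no hashing.
    void* get(int handle) const { return objects_[handle]; }

    // Cold path: reverse lookup (object pointer -> handle).
    int handle_of(void* obj) const {
        auto it = reverse_.find(obj);
        return it == reverse_.end() ? -1 : it->second;
    }

    void dispose(void* obj) {
        int h = handle_of(obj);
        if (h < 0) return;
        objects_[h] = nullptr;  // keep the slot so other handles stay valid
        reverse_.erase(obj);
    }

private:
    std::vector<void*> objects_;
    std::unordered_map<void*, int> reverse_;
};
```

In this layout the hash map never appears on the ordinary read/write path, which is consistent with the point that its cost would only matter in limited cases.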

Monsieur OUXX

Quote from: Crimson Wizard on Wed 22/09/2021 13:01:55
Of course hashmap or btree may be somewhat slower than accessing objects directly, but

Indeed, if it's not for direct access but only for disposing and such it's no big deal.
 

eri0o

Quote from: Crimson Wizard on Wed 22/09/2021 13:01:55
Quote from: Monsieur OUXX on Wed 22/09/2021 12:12:00
Quote from: eri0o on Tue 21/09/2021 21:28:56
object references could be direct but they use a hashmap/btree
Ouch.

I do not know if eri0o is refering to what I may suspect he's refering to, but if it's about accessing managed objects in script, primary access operations are done through flat array; hashmap is only used for the backward lookup (object pointer to managed handle) and that is done in limited cases (like disposing an object).
Of course hashmap or btree may be somewhat slower than accessing objects directly, but there has to be an actual research to find out if that is the cause of slowdown, and how much does it slow down anything in relative terms.

So, in the experiments I did (unfortunately I blew away the recent branch with this experiment by accident), there was pretty much no impact at all, or the way I was measuring was wrong. I only mentioned it because I thought it was something that could matter, but when I tested changing it, it was a lot of work and I couldn't perceive any gains.

This is another problem: I have feelings and ideas about what is impacting performance, but when I go to measure things, it's never simple or obvious.

Visual Studio has a reasonable profiler, for instance; in theory I should be able to use it to pick out what is strangling performance. But in my experiments nothing stands out, even though it produces flame graphs and other visualizations.

Crimson Wizard

#7
As far as I know, the biggest recent impact on script performance was caused by using the RuntimeScriptValue struct, or rather by the extra function calls (sometimes virtual function calls) and pointer accesses that followed from using this struct. Originally this struct did not exist and all memory accesses were direct, although anonymous (you could not tell what object or variable was being accessed, not even its type), which was very fast; but due to how the interpreter worked (and how the bytecode was compiled), it limited pointers to 32 bits, which of course made it incompatible with 64-bit systems.
There were other reasons and purposes for this change, but today I no longer think these reasons and purposes were right.
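To make the cost difference concrete, here is a simplified before/after sketch. The tagged struct is a hypothetical simplification and not the real RuntimeScriptValue layout: "before", reading an int is one load through a raw pointer; "after", every read branches on a type tag and, for objects, may also make a virtual call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Before (simplified): a script address is a raw pointer, so reading an
// int is essentially a single memory load.
inline std::int32_t read_raw(const unsigned char* addr) {
    std::int32_t v;
    std::memcpy(&v, addr, sizeof v);
    return v;
}

// After (hypothetical simplification): the value carries a type tag, and
// object memory is reached through an interface.
struct IObjectManager {
    virtual std::int32_t ReadInt32(const unsigned char* obj, int offset) const = 0;
    virtual ~IObjectManager() = default;
};

enum class Kind { Data, DynamicObject };

struct TaggedValue {
    Kind kind;
    const unsigned char* ptr;
    int offset;
    const IObjectManager* mgr;  // only used when kind == DynamicObject
};

inline std::int32_t read_tagged(const TaggedValue& v) {
    switch (v.kind) {  // a branch on the tag for every access...
        case Kind::Data:          return read_raw(v.ptr + v.offset);
        case Kind::DynamicObject: return v.mgr->ReadInt32(v.ptr, v.offset);  // ...plus a virtual call
    }
    return 0;  // unreachable for valid kinds
}

// Trivial manager whose objects are plain byte buffers.
struct BufferManager : IObjectManager {
    std::int32_t ReadInt32(const unsigned char* obj, int offset) const override {
        return read_raw(obj + offset);
    }
};
```

Each extra step is cheap on its own; the concern raised in the thread is that it is paid on every access.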

In theory it might be possible to rewrite the script interpreter again and solve this differently. It might also be possible to do some partial changes without big rewrite and improve things somewhat.

But knowing that most of the trouble comes from backward compatibility, the better way could be to redesign the bytecode and its compilation so that code generation does not rely on the pointer size.

Most of the trouble with AGS kept happening because back in 2012 we (the 3 or 4 people working on it at the time) used a backward-compatible engine (meant for running old games) as the base for further development. Thus the engine had two purposes that often conflicted with each other, restricting what we could do.


PS. And of course there's always the option to replace the scripting language with something faster that has a better feature set. For instance, years ago several people were advocating Lua (there even was a Lua plugin). Personally, my main concern was the huge difference in syntax between Lua and AGS script. Well, this has all been discussed many times on these forums.

Monsieur OUXX

#8
Maybe AGS 4 should be 64 bits and ditch retro-compatibility.


This is from 2018 :

[Chart: modern OS market share, 2018 (ExtremeTech): https://www.extremetech.com/wp-content/uploads/2018/04/Modern-OS-Marketshare-640x367.png]


96% of Windows installations were already 64-bit.
 

Danvzare

#9
Quote from: Monsieur OUXX on Wed 22/09/2021 14:00:47
Maybe AGS 4 should be 64 bits and ditch retro-compatibility.
As someone who occasionally has to use an old 32-bit Vista laptop, I'm still all for this.
If I recall correctly, AGS 4 is going to practically be a fresh start, so ditching the 32-bit support altogether makes sense if it results in a better overall product, especially since so few computers are 32-bit nowadays.
But if ditching 32-bit support wouldn't make any difference, then of course I say keep it.

That's just my two-cents on the matter.



Quote from: Crimson Wizard on Wed 22/09/2021 13:37:47
For instance, years ago several people were advocating for Lua (there even has been a Lua plugin).
Ugh, Lua. I'm glad that change wasn't implemented.  :-X

Crimson Wizard

Quote from: Danvzare on Wed 22/09/2021 14:32:57
If I recall correctly, AGS 4 is going to practically be a fresh start

No, it's fully based on current AGS, with just certain things cut off.

eri0o

I am not sure, but I think the "32-bit compatibility" mentioned refers to running a game made on a 32-bit computer with an old version of AGS, which worked differently, in the current engine on a 64-bit system.

Anyway, we need to support 32-bit processors for Android: as far as I can tell, there are still a good number of 32-bit ARM processors in use.

Crimson Wizard

#12
There's no "32-bit compatibility" in the engine at the moment, rather there's "64-bit compatibility".

The whole problem was that the script's bytecode was made to assume that variables store memory addresses directly and that these are 32-bit values; therefore all of its internal arithmetic is based on that assumption. This is how the script compiler currently works.

These are the alternative solutions I can think of:

1. Stick to 32-bit memory arithmetic in the bytecode, and restrict memory allocations to the lower 32-bit address range on all systems, at least for data directly referenced by the script. This has been suggested a couple of times in the past. It would also keep backward compatibility with old games. As a downside, the engine may have trouble if a game requires lots of RAM. Also, existing plugins that create managed objects will not work on 64-bit systems, so the plugin API would need to change so that the engine allocates these objects for plugins on demand instead.
Tbh I don't know if this is directly supported on all OSes. There's also the option of implementing "virtual heap" management in the engine (I think Nick Sonneveld had similar experiments in his repository).

2. Move to 64-bit memory arithmetic completely; this will break backward compatibility with older games, and there will be some complications in making it work correctly on 32-bit systems.

3. Create a bytecode converter that adjusts it to a particular pointer size, either as a tool or as an engine component. This will make the script depend on the system, and if the converter is inside the engine, it will slow down game startup.

4. Redesign the script bytecode and interpreter to not depend on pointer size.
(Additionally, there's always an option to create a bytecode converter that adjusts old game bytecode to the new format, if backward compatibility is wanted.)
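The "virtual heap" idea from option 1 can be sketched roughly like this, assuming all script-visible data lives in one arena of at most 4 GiB. Script "addresses" become 32-bit offsets from the arena base, so the bytecode's 32-bit arithmetic stays valid whatever the host pointer size is. `VirtualHeap` and its bump allocator are hypothetical illustrations, not code from the engine or from Nick Sonneveld's experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical "virtual heap": script addresses are 32-bit offsets into
// one arena. Capacity is assumed to be at most 4 GiB. No bounds checks on
// reads/writes, for brevity.
class VirtualHeap {
public:
    explicit VirtualHeap(std::size_t capacity) : mem_(capacity) {}

    // Bump allocator: returns a 32-bit script address, or 0 on failure.
    std::uint32_t alloc(std::size_t size) {
        if (top_ + size > mem_.size()) return 0;
        std::uint32_t addr = static_cast<std::uint32_t>(top_);
        top_ += size;
        return addr;
    }

    void write_i32(std::uint32_t addr, std::int32_t v) {
        std::memcpy(&mem_[addr], &v, sizeof v);
    }
    std::int32_t read_i32(std::uint32_t addr) const {
        std::int32_t v;
        std::memcpy(&v, &mem_[addr], sizeof v);
        return v;
    }

private:
    std::vector<unsigned char> mem_;
    std::size_t top_ = 8;  // start above 0 so that address 0 can mean "null"
};
```

Two consecutive 4-byte allocations end up exactly 4 apart, so plain `+`/`-` on script addresses keeps working; the cost is that the engine, not the OS, now owns that memory layout.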

eri0o

#13
About bytecode: it's possible to generate both 32-bit and 64-bit bytecode and have the runner use the one it's compatible with.

(In case 4 is too hard)




Uhm, there's one concept I'm not familiar with that I see mentioned when people talk about interpreters: JIT. Looking online, it appears to be an important reason why the JVM is independent of bytecode "bitness". But again, I don't know what that concept means.




I took a brief look at the compiler, and it seems there are some byte offsets that are multiples of 4 (instead of 8 for 64-bit), but beyond that I can't readily identify where the 32-bit limits are in the compiler. As for running the script, assuming most problems occur in cc_instance.cpp, it looks like there's a lot of dependency on 32-bit. These are parts of the code I haven't played with much, so this is just from quickly glancing at it.

fernewelten

#14
If you take a “typical” AGS adventure program, then I'd say that most of the heavy work is done in functions such as Character.Say. So if the aim is to speed up the engine, I'd start looking for bottlenecks in these functions first, and then in the way that calls are handled.

Most probably the engine will be bored stiff most of the time waiting for the d*mn user to move their mouse around or click it again at long last.

I'd also plead for getting at actual data as a first step. Collect some typical games and some “problematical” games and get heat maps.

As far as I know, the engine misuses the "high bits" of memory locations in some circumstances to store additional information there (e.g., for linking purposes). They evidently don't believe that even all 32 bits of the address space are needed. So do we have proof that doubling the memory bits will actually buy actual game writers or actual gamers some coffee?

Crimson Wizard

#15
Quote from: fernewelten on Wed 22/09/2021 16:17:26
As far as I know, the engine misuses the "high bits" of memory locations in some circumstances to store additional information there (e.g., for linking purposes). They evidently don't believe that even all 32 bits of the address space are needed.

I don't remember all the details right now, but even if these "high bits" are filled, this happens before the "fixups". When real memory addresses are saved in script data (variables, registers), they don't contain any extra packed information.

Quote from: fernewelten on Wed 22/09/2021 16:17:26
So do we have proof that doubling the memory bits will actually buy actual game writers or actual gamers some coffee?

Back in 2012 it was not a question of providing enough RAM; it was simply a matter of letting the engine work with 64-bit memory addresses, because without special treatment the address of an allocated object could be anywhere, including beyond the 32-bit range. Plus there were engine plugins that could also allocate managed objects, and these were outside the engine's control.

Whether full 64-bit or full 32-bit addresses are required for the script to run is a question to which we currently have no clear answer, and there's a lack of game performance statistics in general (and few to no means to gather them, as the engine itself does not provide any).

Personally, I would speculate that most games created with AGS so far fit into 32-bit memory, and those that don't are likely overusing memory due to poor optimization (again speculating, but I suspect that the majority of game authors are either not much concerned about this, or not savvy enough to do it properly).

In general, I doubt that standard adventure games *need* to occupy that much memory at once. And even then, most of the memory is likely taken by resources (sprites, sounds), and these are not exposed to the script VM, so they are not part of this address issue. What is left for VM addresses is script variables and managed objects. Most managed objects are merely "wrappers" that contain the ID of an actual object in the engine. So probably most managed script memory goes to dynamic containers (arrays, etc.) and dynamic sprites.

eri0o

Personally, I would love to be able to do in AGS the stuff I can do in LÖVE with Lua.

In addition, having a faster language would enable porting some libraries and such to AGS Script directly instead of as a plugin. When making the ImGi module I had to be reaaaally economical in my scripting, and it still runs much slower than the original version of that library (microui).

eri0o

I am curious whether it would be possible to disconnect the AGS Script runner from the AGS Engine to make it easier to experiment with, and then reconnect it afterwards. I think this would narrow the scope when looking for what can be optimized in it.

fernewelten

As far as I know, the Engine does much more than the microprocessor it was modeled on would do. Mainly double checks, e.g., that the registers contain the type of data that they're supposed to. Seen from this vantage point, the script runner is a “debugging” script runner, only the “debugging” runs all the time, whether or not the code is still being debugged.

There might be two angles to speed this up:

  • Make the script runner trust the Compiler. The Compiler only runs once whereas the script runner runs each time. So when the script runner abstains from checking those things that the compiler should have already made sure of, then this can potentially cut out a lot of superfluous runtime activity.
  • Provide a sleek “production” script runner that only does the bare minimum, side by side with the current script runner. Some kind of flag would determine whether the code is run with the careful current runner or with the fast “production” runner. Users who need a very fast engine could set the flag appropriately and forgo the checks. Or else we could set things up as follows: when the game is started with F5, the engine uses the careful current script runner; when it is compiled with F7 and run, the engine uses the sleek “production” script runner.
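Both proposals above can share one interpreter loop, as in this sketch under stated assumptions: the register model and the `FAdd` instruction are hypothetical. The loop is a template instantiated twice; with `Checked = true` it validates register contents, and with `Checked = false` the checks are removed at compile time (C++17 `if constexpr`). The flag is examined once, when picking the runner, not on every instruction.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical register model: each register remembers what was stored.
enum class RegType { Empty, Int, Float };
struct Reg {
    RegType type = RegType::Empty;
    float f = 0.0f;
};

// Hypothetical instruction: regs[dst].f += regs[src].f
struct FAdd { int dst; int src; };

template <bool Checked>
void run(std::vector<Reg>& regs, const std::vector<FAdd>& code) {
    for (const FAdd& in : code) {
        Reg& dst = regs[in.dst];
        const Reg& src = regs[in.src];
        if constexpr (Checked) {
            // "Debugging" runner: verify what the compiler should guarantee.
            if (dst.type != RegType::Float || src.type != RegType::Float)
                throw std::runtime_error("fadd: register does not hold a float");
        }
        dst.f += src.f;  // "production" runner does only this
    }
}

// Chosen once, e.g. from a "debug run" vs "release build" flag.
void run_script(bool debug, std::vector<Reg>& regs, const std::vector<FAdd>& code) {
    if (debug) run<true>(regs, code);
    else       run<false>(regs, code);
}
```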

eri0o

#19
I would not trust the compiler as much. I don't think the checks are that burdensome - they only happen at the frontier in the API, which is not what I meant.

Anyway, if I went that route, I would just add all the relevant event registration to the plugin API, and then compile AGS Script to C++, similar to YYC (the YoYo Games Compiler).

Python, Lua, and most scripting languages still detect failures at runtime, so error checking is not something that has to be removed. What could be done in the script runner is a pass that looks at the compiled assembly code and works out what can be optimized; for example, I believe some instructions could be merged at runtime to leverage a more sensible modern AMD64 instruction.

Crimson Wizard

#20
Quote from: eri0o on Sat 16/04/2022 17:16:13
I would not trust the compiler as much. I don't think the checks are that burdensome - they only happen at the frontier in the API, which is not what I meant.

What fernewelten is referring to, probably, are the checks done by the script interpreter during certain bytecode operations, like stack operations. They happen all the time, for every little move on the stack (I know that some were disabled in the past, but some may remain).

As for trusting the compiler or not, frankly, in all these years these checks have found a minimal number of errors, maybe one in several years. fernewelten's proposal to have separate "production" and "debugging" interpreters makes sense, in my opinion. (There is a ticket about this, although it's more of a note than a full plan: https://github.com/adventuregamestudio/ags/issues/843)




Then there are things introduced during the earlier rewrite, when trying to handle the 64-bit problem and backward compatibility, that may be bad, like virtual method calls in dynamic object wrappers. Where the old interpreter would simply access a memory pointer and read or write a buffer, the new interpreter dereferences 2-4 pointers and maybe makes a virtual method call.
In other operations, where the old engine would do simple pointer math (+/- a single value), the new one does more computation (e.g. see https://github.com/adventuregamestudio/ags/issues/869).
This all objectively slows things down; it may not be much for a single operation, but it accumulates when there are a lot of them, especially when working with dynamic arrays, I think. What makes things worse, some of these were probably necessary only in certain cases, yet they affect many operations.

In any case, whether something slows things down or not, and how much, should not be guessed, it has to be measured.

Personally, I believe it may be worth going back to the old script interpreter, looking at the system as a whole, writing down notes about the big picture, and searching for a better solution to the problems that JJS and I were trying to solve with this rewrite. Maybe some of these problems have much better and simpler workarounds.

For reference, there is also the earlier ScummVM AGS port by fuzzie, from which I took some inspiration; strangely, we came to similar ideas about the interpreter. My memories of it are vague now, but maybe it's worth checking how she solved the same issues in her rewrite: https://github.com/adventuregamestudio/scummvm/tree/ags




As for merging instructions, this may also be looked into in parallel. There's an old thread about this here, but there has been no work on it. There's also a code optimizer written by rofl0r called "agsoptimize"; from what I understand, it does not introduce new instructions, but squashes multiple instructions into existing ones.
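The "squash into existing instructions" idea can be illustrated with a tiny peephole pass. This is a sketch over hypothetical opcodes, not agsoptimize's actual rules: it replaces `push rX; pop rY` with the single existing instruction `mov rY, rX`. A real pass would also have to confirm that no jump targets the `pop`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes and operands:
// Push a | Pop a | Mov a <- b | Add a += b
enum class Op { Push, Pop, Mov, Add };
struct Instr { Op op; int a; int b; };

// Peephole pass: "push rX; pop rY" becomes "mov rY, rX".
// Assumes straight-line code (no jump lands on the pop).
std::vector<Instr> squash_push_pop(const std::vector<Instr>& in) {
    std::vector<Instr> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i].op == Op::Push && in[i + 1].op == Op::Pop) {
            out.push_back({Op::Mov, in[i + 1].a, in[i].a});
            ++i;  // skip the pop that was just merged
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

Fewer instructions means fewer dispatches in the interpreter loop, without changing the instruction set the engine has to understand.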

BTW, rofl0r also has a tool that runs pure bytecode without the engine, if that's what you meant by "disconnect the AGS Script runner from AGS Engine". It's called agssim:
https://github.com/rofl0r/agsutils/blob/master/agssim.c

fernewelten

#21
Quote from: Crimson Wizard on Sat 16/04/2022 19:25:38
What fernewelten is refering to, probably, are checks that are done by the script interpreter during certain bytecode operations. Like - stack operations. They probably happen all the time, for every little move on the stack (I may be mistaken here, because some were disabled in the past).

Yes, that's the gist of what I was trying to say. When the script interpreter is told to float-add two things, for example, it doesn't just do that; it first checks whether the respective register etc. has been loaded, whether a value of type 'float' has been written into it, and so on. That's good for checking whether the Compiler has done its thing correctly. That mechanism is bound to have discovered lots of bugs by this time.

But the trouble is, if the Compiler has done the job properly that it was programmed for, then those checks might still be done thousands of times at runtime, over and over again, even for the same source code statement and the same block of Bytecode, when it needn't have been done at all.

There are certain things that the Compiler can't check, for instance whether a pointer location that needs to be dereferenced contains a null pointer at runtime. It's sensible to make the script interpreter check these things. The Compiler may even tell the script interpreter to check something specific, by issuing a suitable Bytecode instruction.

On the other hand, there are a lot of things that the Compiler could make sure of, given our AGS language. For instance, it should be able to make sure that a memory location that is supposed to contain a float hasn't been loaded with an int instead. In these cases, a lot of time can be saved by letting the Compiler do its thing, and telling the interpreter: “Look, Buster, Master has told you that these two things are floats and commanded you to add them, so just obey right now without wasting Master's time!”

It would mean requiring that every Bytecode that is given to the Engine has been produced by the Compiler, or at least, that if you do give "hand-written assembly Bytecode" to the Engine, side-stepping the Compiler, you do it at your own risk.

eri0o

#22
Modern interpreters usually don't just run the bytecode; they do a preliminary step, something like compiling the bytecode, which performs some of these checks. AGS Script is not modified at runtime, there is no eval, and it doesn't have reflection.

It could pass the bytecode through a second tool before actually executing it, which would analyze the bytecode and make sure things run smoothly. This is what I mean by not trusting the compiler. There's no need to forgo the checking if we can just run it once beforehand, when the game loads.
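A one-time verification pass of the kind described here could look roughly like the following sketch. The instruction format is hypothetical, and the sketch assumes straight-line code (no jumps) so that stack-depth tracking stays trivial. If `verify` succeeds at load time, the execution loop no longer needs per-instruction checks for these particular properties.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction format, for illustration only.
enum Op : int { PUSH = 0, POP = 1, ADD = 2, HALT = 3 };
constexpr int kNumOps = 4;
constexpr int kNumRegs = 8;
struct Instr { int op; int reg; };

// Runs once before execution. Checks opcode validity, register indices,
// stack balance, and that the program ends with HALT.
bool verify(const std::vector<Instr>& code) {
    int depth = 0;
    for (const Instr& in : code) {
        if (in.op < 0 || in.op >= kNumOps) return false;     // unknown opcode
        if (in.reg < 0 || in.reg >= kNumRegs) return false;  // bad register
        if (in.op == PUSH) ++depth;
        if (in.op == POP && --depth < 0) return false;       // stack underflow
    }
    return !code.empty() && code.back().op == HALT && depth == 0;
}
```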

About disconnecting the Script Runner: if it were less entrenched in the engine, this would help with two things:


  • ease with improving it
  • reducing the barrier to connect other script runners (e.g.: Lua Jit)

Also just to further explain what I meant before about the YoYo Games Compiler, here's Game Maker Studio:

https://help.yoyogames.com/hc/en-us/articles/235186048-Setting-Up-For-Windows

You can see it can build/run games using the VM setting, which is similar to AGS, or using YYC, which essentially translates Game Maker Script Language to C++ and then runs the resulting code through a native C++ compiler. This results in a really fast executable. (This would be like building a plugin in AGS, if everything that can be done through scripting could be done through the plugin interface.)

Crimson Wizard

#23
Quote from: eri0o on Sat 16/04/2022 21:13:00
About disconnecting the Script Runner, if it was less entrenched in the engine, this would make two things:


  • ease with improving it
  • reducing the barrier to connect other script runners (e.g.: Lua Jit)

As a note, this task will probably have to be completed in order to decouple the engine from the current interpreter:
https://github.com/adventuregamestudio/ags/issues/1223

as explained in the ticket itself, where it says:
Quote
The way these "translator" functions are registered, and because they are using internal engine's types not meant for anything else, prevents from sharing same registration with other potential users.

eri0o

#24
Quote
https://github.com/adventuregamestudio/ags/issues/1223

I still don't understand which kind of solution you would find acceptable for that. For example, the template approach you mention: would it be accepted?

Or are you looking into the reverse, like the Lua FFI interface (https://luajit.org/ext_ffi.html), which would register C functions with the AGS Script interface, so that it becomes mostly about exporting the AGS Engine API as C functions? The problem is that this second option is not something we could then repurpose for other languages; it would be new functionality in the script runner.

Looking through the AGS code, there are three things there:


  • RuntimeScriptValue / macros like API_OBJCALL_VOID: these apparently are what makes a C++ function symbol compatible with being imported into the Script Runtime.
  • ccAddExternalObjectFunction-like functions: these make it possible to access the function symbol from the script runner.
  • a header for runtime linkage (agsdefns.sh for the internal AGS API)


Crimson Wizard

#25
Quote from: eri0o on Sat 16/04/2022 22:51:11
Quote
https://github.com/adventuregamestudio/ags/issues/1223

I still doesn't understand which type of solution for that you would find acceptable. Like the template approach you mention, would it be accepted?

I can't tell which "template approach" you are referring to.
Personally, I wanted to try the switch method described in the ticket, similar to what fuzzie did in her ScummVM port. It may not be hard to make a minimal version with only a few registered functions and test it for correctness and any performance changes.
On the function-registration side, this approach requires functions to be registered along with a "type description" that notes the number and types of parameters and the type of the return value. This description may then even be made available through the plugin interface.

Quote from: eri0o on Sat 16/04/2022 22:51:11
Or are you looking into the reverse, like the Lua FFI interface (https://luajit.org/ext_ffi.html), which would register C functions to the AGS Script interface, and then it's mostly about exporting the AGS Engine API as C functions.

I don't know what "FFI" is; I would have to research it first to have any opinion. But the short-term goal is to be able to easily share registered functions at least between the script interpreter and plugins, because a plugin may implement any scripting language and use the engine API to call the functions, similar to how the Lua plugin was done in the past.

eri0o

Sorry, FFI means Foreign Function Interface, if I am not mistaken. Other languages often call this interop, like here, D interfacing with C++: https://dlang.org/spec/cpp_interface.html

By template, I mean the approach that evolves from the issue you describe: the last proposal in your list, in the top post.

About fuzzie port, you meant the interface here: https://github.com/adventuregamestudio/scummvm/blob/ags/engines/ags/scripting/character.cpp#L2130-L2142 ?
(and in that fork, the ags branch could be made the default one on GitHub to make it easier to browse)

Crimson Wizard

#27
Quote from: eri0o on Sat 16/04/2022 23:28:20
About template, I mean like evolving from the issue you describe, it's the last proposition you present in your list, on the top post.

Hmm, if you are refering to the paragraphs starting with "An example of a very straightforward solution for type safety could be helper function template, where implementations would deduce a type and pass it further to actual registration."
That was merely a suggestion for registration helpers. It was supposed to be on top of the actual system, and its only purpose is type safety, not changing anything in how it works.

Quote from: eri0o on Sat 16/04/2022 23:28:20
About fuzzie port, you meant the interface here: https://github.com/adventuregamestudio/scummvm/blob/ags/engines/ags/scripting/character.cpp#L2130-L2142 ?

We have a similar interface, but the difference is that she also passes the function type along. In her variant this type is defined as a string: "i", "iii" and so on.
But looking at how this solution works again now, she does not have a switch, which I somehow thought she had. In fact, she came to the opposite variant, where the API functions actually are like our "wrapper" functions. E.g. like this.

This is an alternative approach, which for some reason I forgot to mention in the ticket: expose this kind of function type in the plugin API, and make plugins work with it instead of calling functions of potentially unknown prototype.

The consequences of such an approach are:
* any previously existing plugins which use script functions will no longer work;
* plugins will likely have to pack parameters in an array in order to pass them into this function (extra work for them).
* a big cleanup job, replacing the calls to "real functions" in the wrappers with the working code itself, similar to how it's done in fuzzie's port.
I might add this information into the ticket later.
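The uniform-wrapper approach and its consequences can be sketched as follows. Everything here is hypothetical (`Registry`, `ScriptFn`, `Api_Max`) and is not fuzzie's actual code: every exported function has the same shape, taking an array of arguments, and is registered with a type string such as "ii". A caller like a plugin packs its parameters into an array (the extra work noted in the list), and the registry can reject calls whose argument count does not match the signature.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical uniform calling convention: every exported function takes
// an array of int arguments and returns one int.
using ScriptFn = int (*)(const int* args, int count);

struct Export {
    std::string sig;  // e.g. "ii" = two int parameters
    ScriptFn fn;
};

class Registry {
public:
    void add(const std::string& name, const std::string& sig, ScriptFn fn) {
        table_[name] = Export{sig, fn};
    }
    // Callers pack parameters into an array; the type string lets the
    // registry check the argument count (one char per parameter here).
    bool call(const std::string& name, const std::vector<int>& args, int& result) const {
        auto it = table_.find(name);
        if (it == table_.end() || it->second.sig.size() != args.size()) return false;
        result = it->second.fn(args.data(), static_cast<int>(args.size()));
        return true;
    }

private:
    std::map<std::string, Export> table_;
};

// An API function written directly in the uniform style.
int Api_Max(const int* args, int) { return args[0] > args[1] ? args[0] : args[1]; }
```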

eri0o

It looks like there are three things going on:


  • we want tying a C or C++ function to a script to be convenient, and possibly type safe
  • we don't care much about the performance of making these available to script, since this happens only once, before the game starts
  • script calls to C++ functions should have low overhead: since we know the types beforehand, nothing should need to be figured out at runtime
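One way to get both convenience and low call overhead is to let templates generate the adapter at compile time: the registration site names a typed C++ function once, and the compiler emits a thunk that unpacks an argument array into a direct, typed call. This is a sketch under simplifying assumptions (ints only; `UniformFn`, `Thunk` and `ExampleAdd` are hypothetical names), written for C++11, so it defines its own index sequence instead of using C++14's `std::index_sequence`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical uniform signature the interpreter would call through.
typedef int (*UniformFn)(const int *params, int count);

// C++11 stand-in for std::index_sequence / std::make_index_sequence.
template <std::size_t... I> struct Indices {};
template <std::size_t N, std::size_t... I>
struct MakeIndices : MakeIndices<N - 1, N - 1, I...> {};
template <std::size_t... I>
struct MakeIndices<0, I...> { typedef Indices<I...> type; };

template <typename F, F Fn> struct Thunk;

// Specialization for functions returning int: the parameter types are
// deduced from the function pointer type, so nothing is parsed at runtime.
template <typename... Args, int (*Fn)(Args...)>
struct Thunk<int (*)(Args...), Fn> {
    static int Call(const int *params, int count)
    {
        assert(count == static_cast<int>(sizeof...(Args)));
        (void)count;
        return Invoke(params, typename MakeIndices<sizeof...(Args)>::type());
    }

private:
    template <std::size_t... I>
    static int Invoke(const int *params, Indices<I...>)
    {
        (void)params;  // unused when the function takes no arguments
        return Fn(static_cast<Args>(params[I])...);
    }
};

// Hypothetical engine function to expose to script.
int ExampleAdd(int a, int b) { return a + b; }
```

Registration would then store `&Thunk<decltype(&ExampleAdd), &ExampleAdd>::Call` once; each script call is a plain indirect call with no per-call type inspection.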

Quote
We have a similar interface, but the difference is that she also passes the function type along. In her variant this type is defined as a string "i", "iii" and so on.

I feel there has to be some way to leverage compile-time introspection through templates that we could use to tie the primitive C++ types to AGS types. Unfortunately, whenever I look into these things (like https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error) I kinda fail to grasp how to actually code this stuff...
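For the type-string side, a small amount of template machinery is enough to derive a signature such as "iii" from a function pointer at registration time, instead of writing it by hand. The one-letter codes and the names `TypeCode` and `SignatureOf` are hypothetical, loosely modeled on the strings described for fuzzie's port; the code is C++11 compatible.

```cpp
#include <cassert>
#include <string>

// Hypothetical mapping from C++ parameter types to one-letter codes.
// The primary template is left undefined, so registering a function with an
// unsupported parameter type fails at compile time instead of at runtime.
template <typename T> struct TypeCode;
template <> struct TypeCode<int>         { static const char value = 'i'; };
template <> struct TypeCode<float>       { static const char value = 'f'; };
template <> struct TypeCode<const char*> { static const char value = 's'; };

// Deduce the parameter types from the function pointer and build the string.
template <typename Ret, typename... Args>
std::string SignatureOf(Ret (*)(Args...))
{
    // Pack expansion inside a braced initializer works in C++11.
    const char codes[] = { TypeCode<Args>::value..., '\0' };
    return std::string(codes);
}
```

No SFINAE is needed here: plain partial deduction of `Ret (*)(Args...)` does the introspection, and the missing primary definition of `TypeCode` acts as the compile-time type check.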

eri0o

I was looking at this: https://github.com/adventuregamestudio/ags/blob/master/Engine/script/runtimescriptvalue.cpp

I had an idea for a refactor here: instead of the type being a field, have different versions of this class implementing the same interface, each with its own behavior, instead of all those IFs per type. I don't know if this helps yet, but I was looking at it, and in theory this would reduce the branching.
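Roughly what this idea looks like with virtual functions, as a hypothetical sketch (the class names are invented, and the real `RuntimeScriptValue` covers many more cases): each kind of value implements its own read, so the call site has no per-type `if` chain, at the cost of a virtual call per read and, typically, a separate heap allocation per object.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical common interface for script values.
struct ScriptValueBase {
    virtual ~ScriptValueBase() {}
    virtual int32_t ReadInt32() const = 0;
};

// A value stored inline in the object.
struct IntValue : ScriptValueBase {
    explicit IntValue(int32_t v) : value(v) {}
    int32_t ReadInt32() const override { return value; }
    int32_t value;
};

// A value referring to memory owned elsewhere (e.g. an engine variable).
struct MemoryValue : ScriptValueBase {
    explicit MemoryValue(const int32_t *p) : ptr(p) {}
    int32_t ReadInt32() const override { return *ptr; }
    const int32_t *ptr;
};

// Call site: one virtual dispatch per value, no branching on a type field.
int32_t Sum(const std::unique_ptr<ScriptValueBase> *values, int count)
{
    int32_t total = 0;
    for (int i = 0; i < count; ++i)
        total += values[i]->ReadInt32();
    return total;
}
```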

Crimson Wizard

#30
Quote from: eri0o on Thu 13/10/2022 21:13:04
I had an idea for refactor here, instead of the type being a thing, having different versions of this class from the same interface that each implemented their own behavior instead of all those IFs per type.

Are you speaking of a virtual inheritance? In such case these objects would have to be allocated dynamically too, one at a time, and accessed through a pointer to a base class.

Alternatively, you could have a pointer to a vtable in each object; then the object itself could be the same struct, allocated regularly, but have a pointer to a table of functions that may differ (this is the C style of overriding, seen, for example, in Allegro 4's BITMAP struct).
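The C-style alternative described above keeps each value a plain struct that can live by value in an ordinary array, and only stores a pointer to a shared table of functions. A hypothetical sketch; `ValueOps`, `ScriptValue` and the read functions are invented names:

```cpp
#include <cassert>
#include <cstdint>

struct ScriptValue;

// Table of operations shared by all values of the same kind.
struct ValueOps {
    int32_t (*read_int32)(const ScriptValue *self);
};

// Plain struct: no virtual functions, so it can be stored by value in arrays.
struct ScriptValue {
    const ValueOps *ops;   // points to a static table, like a hand-made vtable
    int32_t         inline_value;
    const int32_t  *ptr;
};

static int32_t ReadInline(const ScriptValue *self) { return self->inline_value; }
static int32_t ReadMemory(const ScriptValue *self) { return *self->ptr; }

static const ValueOps kInlineOps = { &ReadInline };
static const ValueOps kMemoryOps = { &ReadMemory };

// Dispatch through the table instead of branching on a type field.
int32_t ReadInt32(const ScriptValue &v) { return v.ops->read_int32(&v); }
```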

Quote from: eri0o on Thu 13/10/2022 21:13:04
I don't know if this helps yet, but was looking at it and in theory this would reduce the branches.

In theory, most if not all of these if branches may be removed by replacing if/else with a switch, similarly to how it was done with ReadValue:
https://github.com/adventuregamestudio/ags/blob/master/Engine/script/runtimescriptvalue.h#L316

I tried that recently, but found that it actually reduced the fps a little in a project I've been testing with, so I decided to leave it for later.
Maybe I did something wrong, or the test was wrong. Or this particular branching was not the main problem for that particular project.
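A hypothetical illustration of the switch form (the enum and function are invented, not the engine's actual `ReadValue`): with dense enum values, a compiler can turn the `switch` into a jump table instead of a chain of comparisons, although, as the test above suggests, that alone does not guarantee a faster frame.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical storage kinds; dense values let the compiler build a jump table.
enum ValueKind { kKindInline = 0, kKindByte, kKindInt16, kKindInt32 };

struct Value {
    ValueKind   kind;
    int32_t     inline_value;
    const void *ptr;
};

// Read through a switch instead of a chain of if/else on the kind.
int32_t ReadValue(const Value &v)
{
    switch (v.kind) {
    case kKindInline: return v.inline_value;
    case kKindByte:   return *static_cast<const uint8_t *>(v.ptr);
    case kKindInt16: {
        int16_t x;
        std::memcpy(&x, v.ptr, sizeof x);  // memcpy avoids unaligned access
        return x;
    }
    case kKindInt32: {
        int32_t x;
        std::memcpy(&x, v.ptr, sizeof x);
        return x;
    }
    }
    return 0;  // unreachable for valid kinds
}
```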

EDIT:
My belief is that ideally there should not be any branching or behavior switch at all, and all memory access should be implemented the same way, somehow.
One major reason this was written in the first place is that AGS compiled script assumes a 32-bit pointer size, so it won't work on 64-bit systems as-is. A couple of people have suggested implementing virtual memory instead, using virtual 32-bit addresses instead of the real ones, which might fix this issue.
I suppose this is what Nick Sonneveld started to write in one of his experimental branches a few years ago.
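To make the virtual-memory suggestion concrete, here is a minimal sketch under heavy simplifying assumptions (a single fixed arena, bump allocation, no freeing; `VirtualHeap` is a hypothetical name): script data lives in one block, and the bytecode only ever sees 32-bit offsets into it, so the compiled script does not depend on the host pointer size.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

class VirtualHeap {
public:
    // Offset 0 is reserved so that it can act as a "null" address.
    explicit VirtualHeap(uint32_t size) : mem_(size), top_(4) {}

    // Returns a 32-bit virtual address, or 0 when the heap is exhausted.
    uint32_t Alloc(uint32_t bytes)
    {
        uint32_t aligned = (bytes + 3u) & ~3u;  // keep 4-byte alignment
        if (top_ > mem_.size() || aligned > mem_.size() - top_) return 0;
        uint32_t addr = top_;
        top_ += aligned;
        return addr;
    }

    // Script-visible loads and stores go through offsets, never raw pointers.
    int32_t ReadInt32(uint32_t addr) const
    {
        int32_t v;
        std::memcpy(&v, &mem_[addr], sizeof v);
        return v;
    }
    void WriteInt32(uint32_t addr, int32_t v) { std::memcpy(&mem_[addr], &v, sizeof v); }

private:
    std::vector<uint8_t> mem_;
    uint32_t top_;
};
```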

eri0o

Quote from: Crimson Wizard on Thu 13/10/2022 22:05:15
I suppose this is what Nick Sonneveld started to write in one of his experimental branches few years ago.

Uhm, not sure; I think you mean this branch: https://github.com/sonneveld/ags/commits/ags--script

Quote from: Crimson Wizard on Thu 13/10/2022 22:05:15
Are you speaking of a virtual inheritance?

Yeahp, that was my thought!

I think the new compiler is a good step in the right direction, and perhaps in the ags4 realm there's something that could improve things by breaking bytecode compatibility, but unfortunately I don't understand that well enough to participate in such a discussion. Maybe in the future.

Crimson Wizard

#32
Quote from: Crimson Wizard on Thu 13/10/2022 22:05:15
One major reason this was written in the first place was because AGS compiled script assumes 32-bit pointer size, so it won't work with 64-bit systems. Couple of people have suggested to implement a virtual memory instead, and use virtual 32-bit addresses instead of the real ones, which might fix this issue.
I suppose this is what Nick Sonneveld started to write in one of his experimental branches few years ago.

I've been testing a couple of script-heavy games with Nick Sonneveld's rewrite of the script interpreter
https://github.com/sonneveld/ags/commits/ags--script
using "infinite fps mode" (where the game runs as fast as possible without frame delays), and depending on the game and situation it gives about a 20-25% improvement in fps compared to the 3.5.0 engine it was based on. In one game fps rose from 70 to around 84, in another from 330 to 400+.

Code-wise it's a bit dirty in places, and I don't know if it's fully feature complete.
There are a few things it probably does not do that the current engine does, such as addressing explicit variables of the engine structs exposed to script, instead of letting the interpreter read/write raw memory without knowing where it reads or writes (which may be dangerous). But maybe it uses an alternate safety mechanism which I have not understood yet.

It also does very few safety checks, which is very good for performance, but may make debugging mistakes harder. If it had these checks under some compilation flag, that could improve debugging too.

Implementation-wise, it solves the memory issue with a joint virtual/real memory mapper. Whenever possible, script data is allocated on a virtual memory "heap" whose size is limited to a 32-bit range, which lets it be referenced by 32-bit offsets instead of real addresses. When that is not possible (or not wanted for some reason), it uses a virtual-to-real memory map, which translates 32-bit handles to addresses of whatever width. The latter is like the classic managed object handles, except it seems to work for anything. I haven't looked too deep into this, but I can imagine this memory map being used for plugins too, which allocate on their own and thus cannot be restricted to a virtual heap.
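The virtual-to-real half of that scheme can be sketched as a handle table that hands out 32-bit handles for memory the script heap does not own, such as a plugin's own allocations. This is a hypothetical illustration of the idea as described, not the code from the branch; `HandleMap` and its methods are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class HandleMap {
public:
    HandleMap() : slots_(1, nullptr) {}  // handle 0 is reserved as "null"

    // Registers a real pointer of any width and returns a 32-bit handle.
    uint32_t Register(void *real)
    {
        if (!free_.empty()) {  // reuse a released slot first
            uint32_t h = free_.back();
            free_.pop_back();
            slots_[h] = real;
            return h;
        }
        slots_.push_back(real);
        return static_cast<uint32_t>(slots_.size() - 1);
    }

    // Translates a handle back to the real address (nullptr if invalid).
    void *Resolve(uint32_t h) const
    {
        return h < slots_.size() ? slots_[h] : nullptr;
    }

    void Release(uint32_t h)
    {
        if (h != 0 && h < slots_.size() && slots_[h] != nullptr) {
            slots_[h] = nullptr;
            free_.push_back(h);
        }
    }

private:
    std::vector<void *> slots_;
    std::vector<uint32_t> free_;
};
```

Each access costs one extra indexed load compared with a raw pointer, which is why such a map would be the fallback, with the 32-bit virtual heap as the common path.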

I actually wonder why we didn't try at least this virtual-to-real map mechanism back in 2012/13; it alone might have performed better than the solution I made. It seems a quite logical thing to try.




Separately, I'd like to revisit two of my past comments in this thread:

Quote from: Crimson Wizard on Fri 24/09/2021 17:49:06
Personally I would speculate that most games created with AGS so far may fit into 32-bit memory, and those which don't likely are overusing memory due to low optimization.

So, over the past year it turned out that Dave Gilbert's new full-HD game actually goes above the 32-bit RAM limit, but this was mostly due to the graphics. We did a number of memory optimizations, which reduced RAM usage by a few hundred MB, but apparently reaching the limit is realistic. If this becomes a problem again, we might use the 64-bit engine, which supports much, much more RAM.


Quote from: Crimson Wizard on Fri 24/09/2021 17:49:06
most of the mem is likely to be taken by resources (sprites, sounds), and these are not exposed into script VM, so not part of this address issue. What is left for VM addresses is: script variables and managed objects. Most managed objects are merely "wrappers" which contain ID of an actual object in the engine. So probably most managed script memory goes to: dynamic containers (arrays, etc), and dynamic sprites.

I must correct the last statement here: dynamic sprites do not store the image data in script memory; it is stored inside the sprite storage (aka sprite cache), and therefore this data does not have to be restricted by size or address. The script's memory only stores minimal reference info.

eri0o

#33
Hey, when I played with that Nick branch, I kinda didn't push it to GitHub after properly recovering it, and lost my work. If you have a somewhat working version of it, I'd advise pushing it to GitHub, or somewhere, just to have a backup.

Crimson Wizard

#34
Quote from: Crimson Wizard on Sat 22/04/2023 16:44:03
I've been testing a couple of script-heavy games with a Nick Sonneveld's script interpreter's rewrite
https://github.com/sonneveld/ags/commits/ags--script
using "infinite fps mode" (where the game runs as fast as possible without frame delays) and depending on a game and situation it gives about 20-25% improvement in fps, compared to the 3.5.0 engine it was based on. In one game it raised from 70 fps to around 84 fps, in another - from 330 to 400+ fps.

For further experiments, I downported one of the "script heavy" games mentioned above to AGS 3.2.1. Here are the results across several versions of AGS:

1. AGS 3.2.1: 95-98 fps
2. AGS 3.4.3: 75 fps
3. AGS 3.5.0: 70-73 fps
4. AGS 3.5.1: 69-70 fps
5. AGS 3.6.0: 65-66 fps (with certain fixes was able to increase to around 68.5 fps so far; UPD: 72 fps now)
6. Nick Sonneveld's interpreter rewrite: 85-89 fps (85 when running a game made in 3.4.3, and 89 when running a game made in 3.2.1 for some reason).

This illustrates the consequences of my edits to the script interpreter in 2012-13, when I was trying to make it work on 64-bit systems and to improve the engine's compatibility with old games. Basically, starting with AGS 3.3, script execution lost about 1/3 of its potential speed.
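Taking the midpoints of the 3.2.1 and unpatched 3.6.0 ranges listed above, the "about 1/3" estimate works out roughly like this (frame time being 1000 ms divided by fps):

```latex
\begin{aligned}
&\frac{65.5}{96.5} \approx 0.68 \;\Rightarrow\; \text{about } 32\%\ \text{fewer frames per second},\\
&t_{3.2.1} = \frac{1000\ \text{ms}}{96.5} \approx 10.4\ \text{ms}, \qquad
 t_{3.6.0} = \frac{1000\ \text{ms}}{65.5} \approx 15.3\ \text{ms}.
\end{aligned}
```

In other words, each frame of this test takes roughly 5 ms longer on the newer engine.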

Guess this also answers @Monsieur OUXX 's original question to some degree.

Of course, the above "only" matters when you're doing a huge amount of calculations and data manipulation in your game, like 3D matrix math and physics. This may also be related to the way data is stored, and how often you create and delete dynamic objects (dynamic arrays, managed structs), although I do not have an assessment of how much impact, relatively, managed objects have in this. It might be interesting to reimplement the test game I've been using here, and maybe another game of mine (car racing), relying strictly on non-managed variables, and record the difference, for statistics.

In all the other cases the performance issues would likely be caused by unoptimized graphics, etc.

LimpingFish

I'm way out of my depth here, but I'd just like to point out that Kweepa's Panorama modules run considerably slower in post 3.21 versions of AGS.
Steam: LimpingFish
PSN: LFishRoller
XB: TheActualLimpingFish
Spotify: LimpingFish

Crimson Wizard

#36
Quote from: LimpingFish on Thu 27/04/2023 00:33:28
I'm way out of my depth here, but I'd just like to point out that Kweepa's Panorama modules run considerably slower in post 3.21 versions of AGS.

I could imagine it's slower, but do you have any data on this?

For the reference, I tried a demo game from this thread:
https://www.adventuregamestudio.co.uk/forums/modules-plugins-tools/module-panorama3d-v1-3/msg636644446/#msg636644446

Results on my system were:
- Original exe (made in AGS 3.12): 31-33 fps with Software renderer (was called Allegro/DX5); 27-28 fps with Direct3D.
- 3.6.0 engine: ~30 fps when using Software renderer; 25-26 fps with Direct3D/OpenGL.

(I have a mid-range, 8-year-old PC, if that matters)

My first assumption is that the main culprit is the inefficient Get/SetPixel commands in AGS. EDIT: Ah, no, it seems to do this instead by repeatedly creating dynamic sprites and rotating them each time in the "redraw" function.
Of course, if AGS script had some kind of a 3D polygon API, and engine did the main calculation/drawing internally, then things could be done much more efficiently.


UPDATE
I've been experimenting with speed fixes for script lately, and surprisingly my latest attempt runs the Panorama Demo faster than the original by a couple of fps:
https://www.dropbox.com/s/q3ga78b09vo09m6/acwin-361-perffixes.zip?dl=0

Of course, I don't assume I made script run faster than in 3.1, so my explanation is that this module or the older engine had performance problems elsewhere than in the script itself, which newer engines improved.

eri0o

Quote from: Crimson Wizard on Thu 27/04/2023 01:42:01
if AGS script had some kind of a 3D polygon API, and engine did the main calculation/drawing internally, then things could be done much more efficiently.

SDL2 has a polygon API, but it depends on the SDL2 renderer. In SDL3 it should be rewritten to use the new SDL GPU backend, a new API in development in SDL3 for generic, shader-first 3D rendering, in the spirit of Metal/Vulkan/Direct3D 12. Just mentioning it in case we ever want to walk that road; it should be easier in the not-so-far future.

eri0o

@Crimson Wizard I tried some of my games; apparently most of my stuff is bound by the drawing operations, but my ImGi module benefited enormously from your changes.

This module has its own software renderer, where each drawing command is hashed and only the rectangles whose hash has changed are redrawn. This math is still reasonably expensive on regular AGS, but with your changes the processing time for it is more than cut in half.
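The general technique (hash the drawing commands that touch each region, and redraw only regions whose hash changed since the previous frame) can be sketched in C++ as follows. This is not ImGi's code, which is written in AGS script; the FNV-1a hash, the fixed region grid and all names here are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 32-bit FNV-1a, a simple non-cryptographic hash.
inline uint32_t Fnv1a(uint32_t h, const void *data, std::size_t len)
{
    const uint8_t *p = static_cast<const uint8_t *>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;  // FNV prime
    }
    return h;
}

// Hypothetical draw command; region is assumed to be a valid region index.
struct DrawCmd { int32_t region; int32_t x, y, w, h, color; };

class DirtyTracker {
public:
    explicit DirtyTracker(int regions)
        : prev_(static_cast<std::size_t>(regions), 0u),
          cur_(static_cast<std::size_t>(regions), 0u) {}

    // Call once per frame with that frame's commands; returns regions to redraw.
    std::vector<int> Update(const std::vector<DrawCmd> &cmds)
    {
        for (std::size_t r = 0; r < cur_.size(); ++r) cur_[r] = 2166136261u;  // FNV offset basis
        for (const DrawCmd &c : cmds)
            cur_[c.region] = Fnv1a(cur_[c.region], &c, sizeof c);
        std::vector<int> dirty;
        for (std::size_t r = 0; r < cur_.size(); ++r)
            if (cur_[r] != prev_[r]) dirty.push_back(static_cast<int>(r));
        prev_.swap(cur_);  // this frame's hashes become the reference
        return dirty;
    }

private:
    std::vector<uint32_t> prev_, cur_;
};
```

Hashing is cheap relative to redrawing, but it still runs over every command every frame, which fits the observation that this math benefits directly from faster script execution.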

Crimson Wizard

Final version of a performance fix (for now, probably):
https://cirrus-ci.com/task/6713476341563392

Bumps game fps by 15-20% in the test games with lots of array manipulations.

Crimson Wizard

#40
Today there has been a new breakthrough.
After @eri0o gave me an interesting idea for improving something, I had a "eureka" moment and pushed that change further, which cumulatively gave around 15% more improvement.

Here's the test build:
https://cirrus-ci.com/task/4504035709943808

My tests currently are:
1. A 3D space sim (https://github.com/ivan-mogilko/ags-wcproto)
improved from 65 fps to 84 fps
2. A racing game (https://github.com/ivan-mogilko/ags-lastfurious)
improved from 150 fps to 210 fps

So, if my math is correct, that gives around a 30% improvement? Hmm, or even 40% for the second game.
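For the record, measured as relative fps gain, the two results above work out to:

```latex
\frac{84 - 65}{65} \approx 0.29 \;(\approx 29\%), \qquad
\frac{210 - 150}{150} = 0.40 \;(40\%)
```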

EDIT
Panorama3D demo now runs at 42 fps for me, compared to the ~33 fps it had with the original engine...

@LimpingFish, could you try it if you have spare moment?

eri0o

#41
I was thinking: if at some point we want both a script runner with checks and one without checks in the same engine, it could be done with some preprocessor tricks, by moving the script runner into its own file and including it twice with different defines (the Allegro library does this). There may be some way to do this with C++ templates too, but I don't know much about how to use those in C++11-restricted land.
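The template variant could look roughly like this, as a hypothetical toy interpreter (the opcodes and names are invented, not AGS's `SCMD_*` set): the check flag is a template parameter, so in each instantiation the condition is a compile-time constant, and the compiler can drop the checks entirely from the unchecked runner. It is plain C++11; C++17's `if constexpr` would make the intent more explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set.
enum Op : uint8_t { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };
struct Instr { Op op; int32_t arg; };

// Returns true and sets result on OP_HALT; returns false on a detected error.
template <bool Checked>
bool Run(const std::vector<Instr> &code, int32_t &result)
{
    int32_t stack[64];
    std::size_t sp = 0;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        const Instr &in = code[pc];
        switch (in.op) {
        case OP_PUSH:
            if (Checked && sp >= 64) return false;  // overflow check
            stack[sp++] = in.arg;
            break;
        case OP_ADD:
        case OP_MUL:
            if (Checked && sp < 2) return false;    // underflow check
            --sp;
            stack[sp - 1] = in.op == OP_ADD ? stack[sp - 1] + stack[sp]
                                            : stack[sp - 1] * stack[sp];
            break;
        case OP_HALT:
            if (Checked && sp == 0) return false;
            result = stack[sp - 1];
            return true;
        }
    }
    return false;  // ran off the end without OP_HALT
}
```

An engine could then pick `Run<true>` in debug builds or when a debugger is attached, and `Run<false>` otherwise, with both compiled from a single source.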

Obviously this is something for the future, a what-if case. The results so far are really good! I am sure there are ways to push things further, but probably with more complicated rewrites, maybe the approach from Nick's branch; though now I am unsure whether the speed improvements in his branch also come from not doing the checks.

If we are free to change the compiler and interpreter, it looks like we are not benefiting from the register-based design as much as we could, since certain types of values are passed with stack-like operations.

eri0o

#42
Just to note here: recently I was thinking about cache locality and how it plays out with local memory inside a function and member variables in a struct, and I found out that the compiler does not reorder member variables in a struct, apparently because C and C++ require members to be laid out in the order they are declared. But I found this super fun VS extension on the marketplace:

https://github.com/Viladoman/StructLayout

So I decided to play around with the order of the members of the cc_instance struct and test what this would change regarding fps on these script intensive games running with unlimited fps.

So, with the current improvements, I normally get 108 fps in CW's space game; I managed to make things worse, down to 98 fps, but I could not find any order that makes things faster. Still, this is one more hidden thing to look out for when changing code that runs a LOT of times per frame.

Ah right, on modern processors a cache line is typically 64 bytes (according to information on the internet), so if you can make frequently run code access only that much memory locally, you get some performance improvement.
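A small illustration of why member order matters (the structs are hypothetical, and exact sizes depend on the platform ABI; the sizes in the comments assume a typical 64-bit target with 8-byte pointers): members are laid out in declaration order, so interleaving small and large members inserts padding, while grouping them keeps the struct smaller and more of it fits in one 64-byte cache line.

```cpp
#include <cassert>
#include <cstdint>

// Interleaved: each 1-byte flag is followed by padding before the next pointer.
struct Interleaved {      // 40 bytes on a typical 64-bit target
    bool     running;
    void    *code;
    bool     aborted;
    void    *globals;
    int32_t  pc;
};

// Same members, largest first: padding only at the end.
struct Grouped {          // 24 bytes on a typical 64-bit target
    void    *code;
    void    *globals;
    int32_t  pc;
    bool     running;
    bool     aborted;
};

// Hot state aligned to a cache-line boundary so it never straddles two lines
// (guaranteed for statics and locals; heap allocation honors it from C++17).
struct alignas(64) HotState {
    Grouped regs;
};
static_assert(alignof(HotState) == 64, "aligned to a 64-byte cache line");
static_assert(sizeof(Grouped) <= 64, "fits in one cache line");
```

As the measurements above show, a smaller layout is not automatically faster; it mainly matters when the hot fields of a frequently accessed struct end up spread across several cache lines.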

So from all this, I feel like reducing branching is a better place to find hidden performance gains so far. Something more for the future, just wanted to note down these things as they were new for me.

Edit: I had an idea to change the order of the instructions in the switch: if I put the cases in the same order as their values (SCMD_ADD first, ...), I get a ~2 fps improvement (108 fps -> 110 fps) in the space game.

Also found an interesting old thread and source code.

LimpingFish

#43
Quote from: Crimson Wizard on Sat 29/04/2023 12:21:12
Panorama3D demo now runs in 42 fps for me, compared to ~33 fps it had with original engine...

@LimpingFish, could you try it if you have spare moment?

It's definitely faster than other post-3.21 versions.

For the record, I get sub-30fps (which I consider "normal" for my PC's specs) running 2.72, sub-25fps running 3.21 (I mistakenly thought 3.21 was on par with 2.72, but it is marginally slower), and sub-15 (closer to 10) running anything past that. With your new version I'm back to sub-25fps. Not quite as fast as 2.72 (which might be down to my machine more than anything), but more or less the same as 3.21, and much better than anything else.

Thanks for taking the time to check it out. :)
                                 
EDIT: Also for the record, I always assumed that my PC just had an easier time running 2.72, and once AGS jumped to 3.x, and moved to newer, modern libraries, that it simply took more effort for my machine to run AGS games (particularly calculation-heavy games, such as those made with Panorama3D), thus resulting in slower fps. I'm not sure if that's true (or if it's even relevant to the conversation!), but it's very interesting to see your improvements making such a difference.

Crimson Wizard

#44
A small update: these performance fixes were merged into the dev master branch and have already required a few fixes, so the latest version, if anybody wants to try it, is this:
https://cirrus-ci.com/task/5080770293792768

(might be still building at the time of me posting this, so if no downloads are available, wait for 15-20 mins)

I also noticed a curious thing: the builds made by our CI server are a little faster than the ones I build on my PC (maybe it uses a newer compiler), so final fps stats may be higher by 3-5 fps.



Here are the stats testing the aforementioned "3D space sim" game using the engine builds from CI:
v3.6.0: 65 fps
master branch (3.6.1 update in dev): 88-90 fps
ags4 branch, using classic compiler: 87-88 fps (got slightly slower)
ags4 branch, using new compiler by @fernewelten: 110-111 fps !! (it produces optimized bytecode)

Testing racing game is more difficult because the fps jumps all the time, so I chose the highest values:
v3.6.0: 150 fps
master branch: 210-215 fps
ags4 branch, using classic compiler: 200-210 fps
ags4 branch, using new compiler: 225-235 fps

