Why isn't AGS script much, much faster?

Started by Monsieur OUXX, Tue 21/09/2021 12:50:10


Crimson Wizard

#40
Today there has been a new breakthrough.
After @eri0o gave me an interesting idea for improving something, I had a "eureka" moment and pushed that change further, which cumulatively gave around 15% more improvement.

Here's the test build:
https://cirrus-ci.com/task/4504035709943808

My tests currently are:
1. A 3D space sim (https://github.com/ivan-mogilko/ags-wcproto)
improved from 65 fps to 84 fps
2. A racing game (https://github.com/ivan-mogilko/ags-lastfurious)
improved from 150 fps to 210 fps

So, if my math is correct, that gives around a 30% improvement, or even 40% for the second game.
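For reference, those percentages can be checked with two small helpers. The function names below are hypothetical, made up for this illustration and not part of the AGS codebase. The frame-time view is often more useful than the fps gain when reasoning about engine cost.

```cpp
#include <cassert>

// Relative speed-up in percent when the frame rate goes from `before` to `after`.
double FpsGainPercent(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

// Average time spent per frame, in milliseconds, at a given frame rate.
double FrameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

With the numbers above, FpsGainPercent(65, 84) is about 29.2% and FpsGainPercent(150, 210) is 40%. In frame-time terms the space sim drops from about 15.4 ms to about 11.9 ms per frame, which is a roughly 22.6% reduction in the work done each frame.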

EDIT
Panorama3D demo now runs at 42 fps for me, compared to ~33 fps with the original engine...

@LimpingFish, could you try it if you have spare moment?

eri0o

#41
I was thinking here: if at some point we want to have both the script runner with checks and the one without checks in the same engine, it could be done with some preprocessor tricks, by separating the script runner into its own file and including it twice with different defines (the Allegro library does this). There may also be a way to do this with C++ templates, but I don't know much about using those in C++11-restricted land.
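The template variant can be sketched in C++11 roughly as follows. This uses a made-up two-opcode bytecode (`OP_PUSH`, `OP_ADD`, `Instr` and `Run` are all hypothetical names, not AGS's), so treat it as an illustration of the technique rather than a drop-in design. A `bool` template parameter selects whether the checks exist. `if constexpr` is C++17, so the sketch uses a plain `if` on the compile-time constant, and optimizing compilers normally remove the dead branch in that case.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical two-opcode bytecode, only to illustrate the technique;
// this is not the AGS instruction set.
enum Opcode : std::uint8_t { OP_PUSH = 0, OP_ADD = 1 };

struct Instr {
    Opcode op;
    int arg; // operand for OP_PUSH, ignored by OP_ADD
};

// One interpreter body, compiled twice: Run<true> keeps the safety checks,
// Run<false> omits them. Because kChecked is a compile-time constant, the
// compiler can drop the disabled checks entirely.
// Returns false if a check fails (only possible when kChecked is true);
// with kChecked == false, malformed bytecode is undefined behavior.
template <bool kChecked>
bool Run(const std::vector<Instr>& code, int& result) {
    std::vector<int> stack;
    for (const Instr& in : code) {
        switch (in.op) {
        case OP_PUSH:
            stack.push_back(in.arg);
            break;
        case OP_ADD: {
            if (kChecked && stack.size() < 2)
                return false; // stack underflow, caught only in checked mode
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        }
    }
    if (kChecked && stack.empty())
        return false;
    result = stack.back();
    return true;
}

// Explicit instantiations: both variants end up in the same binary.
template bool Run<true>(const std::vector<Instr>&, int&);
template bool Run<false>(const std::vector<Instr>&, int&);
```

The engine would then pick `Run<true>` or `Run<false>` once, for example from a config option, and the per-instruction loop itself would carry no extra runtime flag test.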

Obviously this is something for the future, a what-if case. The results so far are really good! I am sure there are ways to push things further, but probably with more complicated rewrites, maybe the approach from Nick's branch; though now I am unsure whether the speed improvements in his branch also come from not doing the checks.

If we are free to change the compiler and the interpreter, it looks like we are not benefiting from the register-based design as much as we could, since certain types of values are still passed with stack-like operations.

eri0o

#42
Just to note here: recently I was thinking about cache locality and how it plays out with local memory inside a function and member variables in a struct, and I found out that the compiler does not optimize the order of member variables in a struct. Apparently this is because C guarantees that members are laid out in declaration order (code that serializes structs relies on that), and C++ keeps the rule for compatibility. But I found this super fun VS extension on the marketplace:

https://github.com/Viladoman/StructLayout

So I decided to play around with the order of the members of the cc_instance struct and test how it affects fps in these script-intensive games running with unlimited fps.

With the current improvements I normally get 108 fps in CW's space game, and by reordering members I managed to make things worse, down to 98 fps. I could not find any order that makes things faster. Still, this is one more hidden thing to look out for when changing code that runs a LOT of times per frame.
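A generic illustration of why declaration order matters follows. These are not the actual cc_instance members. The compiler keeps members in the order written and inserts padding to satisfy alignment, so the same fields can take more or fewer bytes (and therefore more or fewer cache lines) depending only on their order. Exact sizes are implementation-defined. The numbers in the comments are typical for x86-64.

```cpp
#include <cassert>
#include <cstddef>

// Same three fields, two declaration orders.
struct Interleaved {   // typically 24 bytes on x86-64
    char   flag;       // 1 byte, then 7 bytes padding so `value` is 8-aligned
    double value;      // 8 bytes
    char   mode;       // 1 byte, then 7 bytes tail padding
};

struct Grouped {       // typically 16 bytes on x86-64
    double value;      // 8 bytes
    char   flag;       // 1 byte
    char   mode;       // 1 byte, then 6 bytes tail padding
};

// offsetof confirms that members really are laid out in declaration order.
static_assert(offsetof(Interleaved, flag) < offsetof(Interleaved, value),
              "members follow declaration order");
static_assert(offsetof(Interleaved, value) < offsetof(Interleaved, mode),
              "members follow declaration order");
```

Smaller is not automatically faster, as the experiment above shows. What matters is which fields the hot loop touches together, which is why tools like StructLayout that show the real offsets are handy.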

Ah right, on modern processors a cache line is typically 64 bytes, so if the code that runs most often only touches data within that much contiguous memory, you can get some performance improvement.
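One common way to act on this is a hot/cold split: fields read on every instruction are grouped into one cache-line-sized, cache-line-aligned block, and rarely used data is kept elsewhere. The sketch below assumes a 64-byte line. All the struct and field names are hypothetical and do not come from the AGS engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for current x86-64 and many ARM
// cores, but not guaranteed on every CPU).
constexpr std::size_t kCacheLine = 64;

// Hypothetical interpreter state touched on every instruction, aligned to
// the start of a cache line so one line load brings in all of it.
// Note: before C++17, plain `new` is not required to honor alignments larger
// than alignof(std::max_align_t), so heap-allocated copies may be misaligned.
struct alignas(kCacheLine) HotState {
    const std::int32_t* code;   // current bytecode pointer
    std::int32_t pc;            // program counter
    std::int32_t registers[8];  // general-purpose registers
};

// Rarely used data lives in a separate block so it does not share the hot line.
struct ColdState {
    const char* script_name;    // for error messages
    std::int32_t line_number;   // for error messages
    std::int32_t loop_counter;  // e.g. for infinite-loop detection
};

struct Interpreter {
    HotState hot;
    ColdState cold;
};

static_assert(sizeof(HotState) <= kCacheLine,
              "hot fields should fit in a single cache line");
```

Whether this helps in practice depends on the actual access pattern, so, as with the member-order experiment, it needs measuring.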

So from all this, I feel that reducing branching is a better place to look for hidden performance gains so far. Something more for the future; I just wanted to note these things down as they were new to me.
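A tiny generic example of what "reducing branching" means at the instruction level follows. It is not code from the AGS interpreter. A comparison that stores 1 or 0 can be written without an `if`, because a `bool` converts to exactly 0 or 1.

```cpp
#include <cassert>
#include <cstdint>

// Branchy version: the CPU has to predict which path is taken.
std::int32_t GreaterBranchy(std::int32_t a, std::int32_t b) {
    if (a > b)
        return 1;
    else
        return 0;
}

// Branch-free version: the bool result of the comparison is converted
// directly to 0 or 1, which compilers typically emit as compare + set-flag.
std::int32_t GreaterBranchless(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(a > b);
}
```

Optimizing compilers often produce identical code for both versions. The branches that matter in an interpreter loop are usually the ones the compiler cannot prove unnecessary, such as per-instruction safety checks.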

Edit: I had an idea to change the order of the instructions in the switch: if I put the cases in the same order as their opcode values (SCMD_ADD first, ...), I get a ~2 fps improvement in the space game (108 fps -> 110 fps).
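What "the same order as their values" looks like is shown in the following sketch. It uses hypothetical opcodes modeled loosely on names like SCMD_ADD. The names and values are illustrative, not the real AGS opcode table. For a large, dense range of case values, compilers usually build a jump table regardless of the order the cases are written in. A gain from reordering therefore probably comes from secondary effects, such as how the case bodies end up laid out in memory. That is a guess worth measuring, not a rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical dense opcode values starting at 1 (illustrative only).
enum Op : std::int32_t {
    OP_ADD = 1,
    OP_SUB = 2,
    OP_MUL = 3,
    OP_DIV = 4,
};

// Cases written in ascending opcode order, matching their numeric values.
std::int32_t Apply(Op op, std::int32_t a, std::int32_t b) {
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_DIV: return b != 0 ? a / b : 0; // the sketch returns 0 on division by zero
    }
    return 0; // unknown opcode
}
```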

Also found an interesting old thread and source code.

LimpingFish

#43
Quote from: Crimson Wizard on Sat 29/04/2023 12:21:12
Panorama3D demo now runs in 42 fps for me, compared to ~33 fps it had with original engine...

@LimpingFish, could you try it if you have spare moment?

It's definitely faster than other post-3.21 versions.

For the record, I get sub-30fps (which I consider "normal" for my PC's specs) running 2.72, sub-25fps running 3.21 (I mistakenly thought 3.21 was on par with 2.72, but it is marginally slower), and sub-15 (closer to 10) running anything after that. With your new version I'm back to sub-25fps. Not quite as fast as 2.72 (which might be down to my machine more than anything), but more or less the same as 3.21, and much better than anything else.

Thanks for taking the time to check it out. :)
                                 
EDIT: Also for the record, I always assumed that my PC just had an easier time running 2.72, and that once AGS jumped to 3.x and moved to newer, modern libraries, it simply took more effort for my machine to run AGS games (particularly calculation-heavy games, such as those made with Panorama3D), resulting in lower fps. I'm not sure if that's true (or if it's even relevant to the conversation!), but it's very interesting to see your improvements make such a difference.

Crimson Wizard

#44
A small update: these performance fixes were merged into the dev master branch and have already needed a few follow-up fixes, so the latest version, if anybody wants to try it, is here:
https://cirrus-ci.com/task/5080770293792768

(it might still be building at the time I post this, so if no downloads are available, wait 15-20 minutes)

I also noticed a curious thing: the builds made by our CI server are a little faster than the ones I build on my PC (maybe it uses a newer compiler), so fps numbers measured with CI builds may be 3-5 fps higher.



Here are the stats from testing the aforementioned "3D space sim" game with engine builds from CI:
v3.6.0: 65 fps
master branch (3.6.1 update in dev): 88-90 fps
ags4 branch, using classic compiler: 87-88 fps (got slightly slower)
ags4 branch, using new compiler by @fernewelten: 110-111 fps !! (it produces optimized bytecode)

Testing the racing game is more difficult because the fps fluctuates constantly, so I took the highest values:
v3.6.0: 150 fps
master branch: 210-215 fps
ags4 branch, using classic compiler: 200-210 fps
ags4 branch, using new compiler: 225-235 fps

