Script code optimization

Started by Crimson Wizard, Tue 25/12/2012 20:31:34

Previous topic - Next topic

Crimson Wizard

Seeing as my changes to script interpreter caused noticable slowdown on PSP port (don't know about others), I became interested in how much feasible is would be to optimize the script byte code at runtime.
I say runtime, because we deal with existing games. If this concept will prove useful, the optimization could be added to the editor too.

(I have other ideas on how to improve the speed a bit, but that's irrelevant in the context of this thread.)

Thing is that AGS byte code has lots of tiny assembler-like sub-operations for each basic operation, that may be logical from compiler's "point of view", but is somewhat excessive from the engine's "point of view". And if we keep in mind, that each instruction is associated with number of safety checks, address fixups, etc, this results in significant overhead.

But what if we parse the script at load, and "fix" it up to certain degree?

Please, correct me if I am wrong in my reasonings anywhere below.
Let's examine couple of code "snippets" (I chose them at random while debugging real game).

1. This is how a comparison of a variable value with literal value is done:
Code: "asm"

PUSHREG ax         <--- copy first value from Ax register to stack
LITTOREG ax, arg2  <--- copy second (literal) value to Ax register
POPREG bx          <--- copy first value from stack to Bx register
LESSTHAN bx, ax    <--- compare first value to second and store boolean result in Bx register
REGTOREG bx, ax    <--- copy result from Bx to Ax

What we basically do here, is comparing Ax with literal value and storing result in Ax. So, this is practically the same as:
Code: "asm"

LESSTHANLIT ax, arg2  <--- compare first value in Ax with to second (literal) and store boolean result in Ax register

Here I "invented" new LESSTHANLIT instruction that compares register with literal value.

2. Allocating a local variable on stack:
Code: "asm"

LITTOREG ax, arg2  <--- copy the literal number (initial value) to Ax register
REGTOREG sp, mar   <--- copy stack ptr address to memory address register (MAR)
MEMWRITE ax, mar   <--- write value from Ax register to where MAR points to (here - stack ptr)
ADD      sp, arg2  <--- advance stack ptr by N bytes (4, if that was integer), thus completing new local variable

This is practically the same as:
Code: "asm"

MEMWRITE arg1, sp  <--- write literal number at stack pointer
ADD      sp, arg2  <--- advance stack ptr by N bytes

Or, if we go further and invent new "allocate integer on stack" instruction, this may be done in one turn, something like:
Code: "asm"

PUSHINT32 arg1     <--- write integer at stack ptr and advance stack ptr by 4 bytes



And so forth.

Of course, this requires careful implementation, for we must correctly define which sequences produce full "meta-operation", otherwise we may screw up next instructions. Also, to what extent should we limit our optimization; like in second example, is it ok to create new instruction for such specific case as pushing integer to stack, or we should've stop at converting 4 instructions to 2?

JJS

I think inventing new instructions is a good idea if it gives more speed.

Of course implementation really has to be done carefully because a register that appears to be temporary could theoretically be reused later. But I don't know if the compiler is smart enough to do this or if it emits the stack push/pop operation every time it accesses a variable.
Also it has to be clarified how much the compiler interleaves instruction blocks, this will influence how easy it is to automatically identify the meaning behind the code for rewriting.
Ask me about AGS on PSP, Android and iOS! Source, Daily builds

Crimson Wizard

#2
After some more thinking, I decided to scrap the idea to seek for certain code sequences, because that would result in long execution (reading instructions, appending to sequence, comparing with each known variant).
There probably could be better way, and more generic one too. The idea behind optimization is to skip intermediate calculations when the result could be deduced from starting conditions. Only operations that adds uncertainty are calls to script functions.
Also, program is mainly about passing values from one place to another, and arithmetics.

This means we could read the code, remembering the calculation changing history. As soon as we meet instruction with undefined result, or end of operation (LINENUM instruction, for instance), we observe what is known and squeeze the code.

Example: setting global variable's value:
Code: asm

LITTOREG ax, integer
LITTOREG mar, global data ptr
MEMWRITE ax (write ax value at address stored in mar)

The history:
Code: text

integer X is literal, originates from byte-code array, at some offset
integer X moves to Ax
address Z is literal (fixupped actually), originates from byte-code array at some offset
address Z moves to MAR
integer X is taken from Ax
address Z is taken from MAR
X is written at address Z

Optimization would be one instruction: write literal X to destination, defined by literal Z, with 2 arguments (both literals, one requires address fixup).

Not that it is trivial task... but probably worth to try out some day.
Also, I am still not sure how fast that would be to process all existing scripts anyway. As an option, engine could do this only once, and store script cache in game's directory for the future use.

Quote from: JJS on Wed 26/12/2012 10:10:26
Of course implementation really has to be done carefully because a register that appears to be temporary could theoretically be reused later. But I don't know if the compiler is smart enough to do this or if it emits the stack push/pop operation every time it accesses a variable.
That's a good question. I think that it does not reuse registers, but more tests could be made to make sure.



//--------------------------
EDIT:
In fact, we may not need all the history of movements. What we need is value's origin (where it came from) and current location.
But what about arithmetic operations? The "location" may be defined as "result of instruction with certain arguments".

Example: int a = 10 + 20;
Code: asm

LITTOREG ax, 10
PUSHREG ax
LITTOREG ax, 20
POPREG bx
ADDREG bx, ax
REGTOREG bx, ax
REGTOREG sp, mar
MEMWRITE ax
ADD sp, 4


So, we have value A (10), value B (20), value C (result) and value Z (stack pointer).
Code: text

A: origin: literal (code array)
A: current location: Ax, then stack
B: origin: literal (code array)
B: current location: Bx
C: origin: result of instruction ADD with args A and B.
C: current location: Bx, then Ax
Z: origin: stack pointer
Z: current location: MAR
C is being written at Z.

SpeechCenter

Quote from: Crimson Wizard on Tue 25/12/2012 20:31:34
Seeing as my changes to script interpreter caused noticable slowdown on PSP port (don't know about others)...
How much slower is it now compared to before? Have you considered profiling the code on PSP (or if no tool is available try to add measurements)?
The reason I'm mentioning this is that profiling usually reveals a lot of information that is very hard to know just by viewing the code. If in fact the solution is some kind of JIT optimizer then profiling will also help knowing where it's best to focus the effort.

Crimson Wizard

All I know is that there were script-heavy places in game which run with 7 fps, and now they run with 4 fps. So, it's nearly twice slower.
I don't own PSP myself and frankly have little idea about the system. But I know few places in the code that have excessive work being done, so I am going to fix them first and see if that improves anything.

Monsieur OUXX

I think it's a great idea to introduce new instructions.

I suppose that historically the old instructions were meant to map the x86's assembler as closely as possible, when AGS was running on MS-DOS, or later on Windows in "real mode".
But nowadays, there are so many layers between the engine and the actual processor. Therefore I think it's OK to invent new instructions that will require less parsing and will let Windows do the actual computing.
You could say that parsing the engine's "assembler" has become more costly than executing fancy calculations.

Actually, my opinion would be that, on the long run, the engine should contain as many instructions as there are built-in high-level instructions in the AGS API -- but that's off-topic.


 

SpeechCenter

I read the instructions and followed on what they do and there is quite a lot already and not the simplest to know. I wouldn't recommend adding more instructions without some kind of review process of the existing opcodes and the proposed new ones.

The idea of runtime optimizer is very common when you have an IL and it could also help when dealing with specific processor architecture.

SMF spam blocked by CleanTalk