Topic: (Upcoming) experiment of supporting pointers in managed structs

Crimson Wizard

  • Local Moderator
    • Lifetime Achievement Award Winner
    • Best Innovation Award Winner 2013, for spearheading the AGS 3.3.0 project
    • Crimson Wizard worked on one or more games that won an AGS Award!
    • Crimson Wizard worked on one or more games that was nominated for an AGS Award!
It looks like we may be a few steps away from actually supporting managed pointers inside managed structs.

To clarify, since not everyone knows why they are not supported: the problem is that the engine does not know the real contents of a struct at runtime, and so it does not know which members are pointers.
Because of that, it cannot deduce whether something inside the struct also has to be released. If managed structs could contain pointers, the objects those pointers refer to would never get deleted, creating a so-called memory leak. Eventually this may lead to the program crashing with an "out of memory" error.
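To illustrate the leak, here is a minimal Python model (not engine code). It assumes a hypothetical heap where each managed object is just a flat list of int slots, which is how the engine sees struct memory without type information. When the outer struct is released, nothing tells the engine that one of its slots holds a handle, so the inner object is never released:

```python
class Heap:
    """Hypothetical managed heap without type information."""

    def __init__(self):
        self.objects = {}     # handle -> list of int slots (raw struct memory)
        self.refs = {}        # handle -> reference count
        self.next_handle = 1  # 0 is reserved for null

    def new(self, slots):
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = list(slots)
        self.refs[handle] = 1
        return handle

    def release(self, handle):
        self.refs[handle] -= 1
        if self.refs[handle] == 0:
            # The slots are only ints here: nothing says that one of them is
            # a handle, so the objects they refer to are not released.
            del self.objects[handle]
            del self.refs[handle]


heap = Heap()
shield = heap.new([5])           # struct Shield { int Damage; }
weapon = heap.new([shield, 12])  # struct Weapon { Shield* Shield; int Power; }
# The only reference to `shield` now lives inside `weapon`'s memory.
heap.release(weapon)
print(heap.objects)  # {1: [5]}: the Shield is still allocated but unreachable
```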
So we had to artificially forbid them.

The proposed solution was to introduce so-called RTTI (run-time type information): a table containing a description of all the types in the game script. This table would tell not only which types exist, but also what members they have and what types those members are. Specifically for this problem, the description must tell which of these members are pointers.
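A minimal sketch of what such a table could contain, in Python for brevity. The names `Member`, `RTTI` and `pointer_members` are hypothetical and not taken from the experiment branch; the point is only that, for each type, the table lists its members in order, with their types and a flag saying whether each one is a pointer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    type_name: str
    is_pointer: bool


# Hypothetical RTTI table: type name -> members in declaration order.
RTTI = {
    "Shield": [Member("Damage", "int", False)],
    "Weapon": [Member("Shield", "Shield", True), Member("Power", "int", False)],
}


def pointer_members(type_name):
    """Return the indices of the members of a type that are managed pointers."""
    return [i for i, m in enumerate(RTTI[type_name]) if m.is_pointer]


print(pointer_members("Weapon"))  # [0]
```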
A related ticket: https://github.com/adventuregamestudio/ags/issues/1259


I did a small experiment for fun, generating such a table along with the script data and letting the engine consult it when disposing of managed structs.
https://github.com/ivan-mogilko/ags-refactoring/commits/experiment--rtti

And it works: if you create a chain of managed structs in script, the engine correctly deletes everything when necessary, without any memory leaks.
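Here is a sketch of the disposal side as a small Python model (hypothetical, not the branch's actual code). It assumes the RTTI table has been reduced to a map from type name to the slot indices that hold pointers; when an object's reference count drops to zero, the objects it points to are released as well. A worklist is used instead of recursion so that a long chain cannot overflow the stack:

```python
# Hypothetical data derived from RTTI: type name -> slots that hold handles.
POINTER_SLOTS = {"Node": [0]}  # struct Node { Node* Next; int Value; }


class Heap:
    def __init__(self):
        self.objects = {}     # handle -> (type name, list of int slots)
        self.refs = {}        # handle -> reference count
        self.next_handle = 1  # 0 means null

    def new(self, type_name, slots):
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = (type_name, list(slots))
        self.refs[handle] = 1
        return handle

    def add_ref(self, handle):
        self.refs[handle] += 1

    def release(self, handle):
        pending = [handle]
        while pending:
            h = pending.pop()
            if h == 0:
                continue  # null pointer, nothing to release
            self.refs[h] -= 1
            if self.refs[h] > 0:
                continue  # still referenced from elsewhere
            type_name, slots = self.objects.pop(h)
            del self.refs[h]
            # RTTI tells us which slots are handles, so release them too.
            pending.extend(slots[i] for i in POINTER_SLOTS.get(type_name, []))


heap = Heap()
third = heap.new("Node", [0, 3])
second = heap.new("Node", [third, 2])
first = heap.new("Node", [second, 1])
heap.release(first)
print(heap.objects)  # {}: the whole chain is gone
```

An object that is still referenced from somewhere else survives, because its count only reaches zero when the last reference is released.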

The code in that branch is extremely dirty, though, so it will take time to prepare a polished version.
« Last Edit: 15 Jul 2021, 11:42 by Crimson Wizard »

eri0o

Hey, could this give some C#-like serialization capabilities? (Imagining once generics are a thing.)

Crimson Wizard

Quote
Hey, could this give some C# like serialization capabilities? (imagining once generics are a thing)

Could you give an example of what you are referring to?

Hypothetically, RTTI means reflection and, as a consequence, the ability to watch variable values. Further on, it could also mean virtual function tables, but I am not looking that far ahead at the moment.
Oh, and of course, dynamic type casting, from parent to child.
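For the parent-to-child casting, RTTI only needs to record each type's parent. A minimal Python sketch under that assumption (the type names `Item`, `Weapon`, `Sword`, `Key` and the functions are hypothetical): a downcast succeeds only if the object's actual runtime type is the target type or derives from it, and otherwise yields null:

```python
# Hypothetical RTTI fragment: type name -> parent type name (None for roots).
PARENT = {"Item": None, "Weapon": "Item", "Sword": "Weapon", "Key": "Item"}


def is_a(actual_type, wanted_type):
    """True if actual_type is wanted_type or one of its descendants."""
    t = actual_type
    while t is not None:
        if t == wanted_type:
            return True
        t = PARENT[t]
    return False


def downcast(handle, actual_type, target_type):
    """Checked cast from a parent pointer: return the handle or null (0)."""
    return handle if is_a(actual_type, target_type) else 0


# An Item* that actually points to a Sword object (handle 7):
print(downcast(7, "Sword", "Weapon"))  # 7
print(downcast(7, "Sword", "Key"))     # 0
```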

As for generics, I have no idea whether that will ever be possible to implement for AGS script, or whether it is a suitable language for that sort of thing. But some kind of "any-type" symbol would be nice... like "any dynamic array", for instance, which could be passed into a specialized API. Just imagining things.
« Last Edit: 15 Jul 2021, 13:54 by Crimson Wizard »

eri0o

What I meant is: if the script could be aware of the types of struct members, it would be possible to traverse them and handle each member to work out the serialization, in script. Google's protobuf does this in its C# implementation, though it accesses memory through Marshal after picking up the references, and uses sizeof to figure things out.
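A rough idea of what such a traversal could look like, sketched in Python rather than AGS script. It assumes a hypothetical RTTI table that also stores member names (which reflection would require), objects stored as flat slot lists, and handles where 0 means null. It does not handle cycles:

```python
import json

# Hypothetical RTTI with member names and types, in slot order.
# A trailing "*" marks a managed pointer.
RTTI = {
    "Shield": [("Damage", "int")],
    "Weapon": [("Power", "int"), ("Shield", "Shield*")],
}


def serialize(heap, type_name, slots):
    """Walk the members described by RTTI and build a plain dictionary."""
    out = {}
    for (name, member_type), value in zip(RTTI[type_name], slots):
        if member_type.endswith("*"):
            target = member_type[:-1]
            out[name] = None if value == 0 else serialize(heap, target, heap[value])
        else:
            out[name] = value
    return out


heap = {1: [5]}   # handle 1: a Shield with Damage = 5
weapon = [12, 1]  # Power = 12, Shield -> handle 1
print(json.dumps(serialize(heap, "Weapon", weapon)))
# {"Power": 12, "Shield": {"Damage": 5}}
```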

Generics are the proper way to ensure type safety; among other things, auto-complete is a lot easier to get working with them.

Quote
RTTI means reflection

Reflection would be huge, but wouldn't it mean storing variable names too? I think that is information that otherwise doesn't need to be stored. I think Java does this by storing only the public members of classes (not sure, though); see http://openjdk.java.net/jeps/118

Quote
It looks like we may be few steps away from actually supporting managed pointers inside managed structs.
:-D :-D

« Last Edit: 15 Jul 2021, 19:18 by fernewelten »

Crimson Wizard

A bit unrelated, but I've been idly thinking about all that reflection stuff, and also about this feature proposal by fernewelten, and it suddenly came to me that there may be a relatively uncomplicated way of implementing a variable watch in the Editor.

Assuming the Editor is running a game that it has just compiled, it may have a saved table of variables from the compiler (or rather from the tokenizer, the first compilation stage). That table gives a mapping between variable names and their memory offsets. The Editor may keep it in memory, or rather save it to a file somewhere in its workspace for future use.

When the user is running a test and wants to know the current value of a variable, the Editor would pass a command to the engine asking for the value at a certain offset in script memory (global memory or the function stack), and the engine would simply return it.
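A minimal Python sketch of that round trip. The symbol table layout, the type-to-format mapping and the 32-bit little-endian values are assumptions for illustration; `engine_read` is a hypothetical stand-in for whatever command the Editor would send to the engine:

```python
import struct

# Hypothetical table saved by the compiler: variable name -> (offset, type).
SYMBOLS = {"score": (0, "int"), "speed": (4, "float")}
FORMATS = {"int": "<i", "float": "<f"}  # assumed 32-bit little-endian values

# Stand-in for the engine's global script memory.
memory = bytearray(8)
struct.pack_into("<i", memory, 0, 150)
struct.pack_into("<f", memory, 4, 2.5)


def engine_read(offset, fmt):
    """Engine side: return the value stored at an offset in script memory."""
    return struct.unpack_from(fmt, memory, offset)[0]


def editor_watch(name):
    """Editor side: look up the variable's offset, then ask the engine."""
    offset, type_name = SYMBOLS[name]
    return engine_read(offset, FORMATS[type_name])


print(editor_watch("score"), editor_watch("speed"))  # 150 2.5
```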

fernewelten

I've been thinking about Crimson's idea a little.

Here's a very tentative and high-level idea that builds on this.

Currently, the Editor holds a complete copy of the Compiler.
When a program is run in debug mode and halted, the Editor is still running.
This means that the Compiler is available at that point in time and can be called.
It might "compile" an expression such as "Weapon[15].Shield.Damage",
and then you'd get the Bytecode for it, i.e., step-by-step instructions telling the Engine just how to reach that value at this point in time.
The Engine already knows how to run Bytecode.
It could be configured to save its current state (the contents of the registers and so on) and then temporarily execute the Bytecode of the expression.
Then it would have the expression result in AX (or in the memory cell that MAR points to), ready to be passed to the Editor to be displayed.
Afterwards, it would restore its saved state and continue running.
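The save/execute/restore part could look roughly like this. This is a toy Python model: the register names follow the text (AX, MAR), but the opcodes `SET_MAR`, `ADD_MAR`, `DEREF` and `LOAD_AX` are made up and are not AGS's real bytecode. Memory is a sparse dict; `Weapon` is assumed to be a global array of pointers at address 0, and each `Weapon` struct is assumed to keep its `Shield` pointer at offset 1:

```python
class Engine:
    """Toy engine with a few hypothetical opcodes."""

    def __init__(self, memory):
        self.mem = memory
        self.regs = {"ax": 0, "mar": 0, "ip": 0}

    def run(self, code):
        self.regs["ip"] = 0
        while self.regs["ip"] < len(code):
            op, *args = code[self.regs["ip"]]
            self.regs["ip"] += 1
            if op == "SET_MAR":
                self.regs["mar"] = args[0]
            elif op == "ADD_MAR":
                self.regs["mar"] += args[0]
            elif op == "DEREF":  # follow the pointer stored at MAR
                self.regs["mar"] = self.mem[self.regs["mar"]]
            elif op == "LOAD_AX":  # read the value MAR points to
                self.regs["ax"] = self.mem[self.regs["mar"]]
            else:
                raise ValueError(f"unknown opcode {op}")

    def evaluate_watch(self, code):
        """Save state, run temporary bytecode, report AX, restore state."""
        saved = dict(self.regs)
        try:
            self.run(code)
            return self.regs["ax"]
        finally:
            self.regs = saved


# Weapon[15] -> struct at 100, whose Shield pointer (offset 1) -> struct at 200.
engine = Engine({15: 100, 101: 200, 200: 33})
engine.regs = {"ax": 7, "mar": 3, "ip": 42}  # state of the halted game
watch = [("SET_MAR", 0), ("ADD_MAR", 15), ("DEREF",), ("ADD_MAR", 1),
         ("DEREF",), ("ADD_MAR", 0), ("LOAD_AX",)]
print(engine.evaluate_watch(watch))  # 33
print(engine.regs)  # {'ax': 7, 'mar': 3, 'ip': 42}
```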

Now, we don't need to see the compiler as a cumbersome, heavyweight blob of software that can only deal with complete programs:
it consists of several separate classes that can operate independently, such as a scanning class.
The parts of the compiler that specifically deal with expressions could be further split off, giving an "expression parser class".
The compiler itself would use this "expression parser class", and the "expression parser class" could also be used independently.
That's what would happen at debugging time:
For each expression to be watched, the debugging mode of the Editor would call the scanner class and then the "expression parser class".
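To make the split concrete, here is a toy scanner and expression parser in Python. The function names follow the `scan`/`expression_parse` steps named later in this post, but everything else is an assumption: the symbol tables are hypothetical, every array element and member takes one memory cell, and the output uses made-up opcodes (`SET_MAR`, `ADD_MAR`, `DEREF`, `LOAD_AX`) rather than real AGS bytecode:

```python
import re

# Hypothetical symbol information at the halted line.
GLOBALS = {"Weapon": (0, "Weapon*[]")}  # name -> (address, type)
STRUCTS = {                             # struct -> member -> (offset, type)
    "Weapon": {"Power": (0, "int"), "Shield": (1, "Shield*")},
    "Shield": {"Damage": (0, "int")},
}


def scan(text):
    """Split an expression into identifiers, numbers and punctuation."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[\[\].]", text)


def expression_parse(tokens):
    """Turn tokens such as Weapon [ 15 ] . Shield into MAR-walking bytecode."""
    address, t = GLOBALS[tokens[0]]
    code = [("SET_MAR", address)]
    i = 1
    while i < len(tokens):
        if tokens[i] == "[" and t.endswith("[]"):
            if tokens[i + 2] != "]":
                raise ValueError("expected ']'")
            code.append(("ADD_MAR", int(tokens[i + 1])))  # one cell per element
            t = t[:-2]
            i += 3
        elif tokens[i] == ".":
            if t.endswith("*"):
                code.append(("DEREF",))  # follow the pointer first
                t = t[:-1]
            offset, t = STRUCTS[t][tokens[i + 1]]
            code.append(("ADD_MAR", offset))
            i += 2
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
    code.append(("LOAD_AX",))
    return code


print(expression_parse(scan("Weapon[15].Shield.Damage")))
```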

It's inevitable that the compiler will evolve over time.
Modifications will be made, new features will be added, and so on.
In these cases, the "debugger" will neatly follow suit.
It uses the same parts that the compiler uses, so when the compiler changes, the debugger changes.
So the "debugger code" can't get out of sync with the "compiler code".

This is an advantage of the approach. But this can also be a disadvantage:
The compiler would no longer be free to do whatever it wants to.
There would be an interface that the debugger uses,
and that interface would need to remain rather stable.
And so the compiler would also need to be rather stable as far as this interface is concerned.

An expression such as "Weapon[15].Shield.Damage" changes its meaning depending on the context.
For instance, "Weapon" might be a local variable at some point in time, and another local variable at another point in time.
So we will need to construct the context at the point that is being debugged.
When the compiler runs its proper compiling run, it has this context available: When it's in the middle of compiling some function, it knows exactly what variables are at what stack offsets and so on.
When the debugger runs and wants to query a variable, this context must be reconstructed.
For instance, when the debugger halts, e.g., on line 42 of "globalscript.asc", then the context at this line 42 must be reconstructed.
This means that the compiler must write the various contexts into, e.g., a debugging file in the proper compiling run,
and a function such as "recall_context(debugging_file, "globalscript.asc", 42)" must be provided that the debugger can use.
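One possible shape for that, as a Python sketch: the compiler appends an event whenever a local variable comes into or goes out of scope, and `recall_context` replays the events up to the requested line. The event format is an assumption, and for brevity this replays from the start of the file instead of from the start of the enclosing function:

```python
# Hypothetical debugging-file contents: per script, events in source order
# as (line, action, variable name, stack offset).
DEBUGGING_FILE = {
    "globalscript.asc": [
        (10, "add", "i", 0),
        (12, "add", "Weapon", 4),
        (30, "remove", "i", None),
        (40, "add", "count", 0),
        (50, "remove", "Weapon", None),
    ],
}


def recall_context(debugging_file, script, line):
    """Return the local variables (name -> stack offset) visible at `line`."""
    context = {}
    for event_line, action, name, offset in debugging_file[script]:
        if event_line > line:
            break  # events are sorted by line, so we are done
        if action == "add":
            context[name] = offset
        else:
            del context[name]
    return context


print(recall_context(DEBUGGING_FILE, "globalscript.asc", 42))
# {'Weapon': 4, 'count': 0}
```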

So, in total, when the debugger halts at line 42 of file "globalscript.asc" and wants to know the value of "Weapon[15].Shield.Damage", it must:
- call recall_context(debugging_file, "globalscript.asc", 42)
- call expression_parse(scan("Weapon[15].Shield.Damage"))
- pass the resulting Bytecode to the Engine for executing
- the Engine must save its current state,
- execute the Bytecode
- report AX (or the value of the memory that MAR points to)
- restore its current state

------

We have:
the compiler
the Engine

We need
- to split off the expression parser from the compiler. This is time-consuming, but will strongly improve the compiler (it will become more modularized and easier to modify and extend).
- to make the compiler write the context information. This information is available in the symbol table. It gets there incrementally, so at the point where a local variable is added or invalidated, this can also be written into the "debugging file". This part is probably fairly easy.
- to write recall_context(). One way of going about it is to start with a given context, e.g., at the start of a function, and then "replay" the context steps until the line in question is reached. This can probably be done with middling effort.
- The last two steps must be developed in tandem, of course.

- to prepare the Engine for
    = saving its state, executing "temporary" bytecode, restoring its state
    = passing register AX (or the value that MAR points to) to the Editor
I don't know enough about the Engine to make an educated guess about how cumbersome this is.

By the way, if the compiler can find an instruction sequence that makes MAR point to the memory that contains the value, then it is possible in principle to "patch in" another value at debugging time. So variables can not only be watched but even modified. This may open a can of worms, so we'd better not do this right from the beginning. But it's a perspective.
« Last Edit: 03 Aug 2021, 08:19 by fernewelten »

eri0o

Isn't it possible to generate something like a .pdb instead of reusing the compiler at runtime? Like a symbol map or a DWARF file.
« Last Edit: 04 Aug 2021, 12:07 by eri0o »

fernewelten

Quote
I did a small experiment for fun, generating such table along with the script data, and letting engine consult it when disposing managed structs.
https://github.com/ivan-mogilko/ags-refactoring/commits/experiment--rtti

Having a look to see just what the compiler would need to provide.

So currently, what I basically have as type information in the symbol table is this:

A type is either atomic
  • primitive types: int, short, char; float; string
  • enumerations (currently equivalent to int)
or compound
  • non-dynamic array with specific_number elements of a type
  • dynarray of a type
  • dynpointer of type
  • struct, composed of
    • optionally the parent (another struct)
    • a list of (fieldname, type)

There's no built-in restriction in the symbol table itself that compound types must consist of atomic types. Compound types can consist of compound types. (But the compiler ensures with code that the types are limited by what the engine can currently handle. For instance, no dynarrays of structs that contain pointers and non-pointers.)
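As a rough Python model of that type structure, and of how the offsets of all inline pointers could be derived from it. Assumptions: 4 bytes for int and for every managed handle, members laid out back to back with no padding, parent members placed first, and enums modeled as a 4-byte primitive. All class and function names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

POINTER_SIZE = 4  # assumption: a managed handle takes 4 bytes in a struct


@dataclass
class Primitive:  # int, short, char, float; enums behave like int
    name: str
    size: int


@dataclass
class StaticArray:  # non-dynamic array with a fixed number of elements
    element: object
    count: int


@dataclass
class DynArray:
    element: object


@dataclass
class DynPointer:
    target: object


@dataclass
class Struct:
    name: str
    fields: List[Tuple[str, object]] = field(default_factory=list)
    parent: Optional["Struct"] = None


def size_of(t):
    if isinstance(t, Primitive):
        return t.size
    if isinstance(t, (DynArray, DynPointer)):
        return POINTER_SIZE
    if isinstance(t, StaticArray):
        return size_of(t.element) * t.count
    if isinstance(t, Struct):
        inherited = size_of(t.parent) if t.parent else 0
        return inherited + sum(size_of(ft) for _, ft in t.fields)
    raise TypeError(f"unknown type {t!r}")


def pointer_offsets(t, base=0):
    """Yield (offset, type) for every managed reference stored inline in t."""
    if isinstance(t, (DynArray, DynPointer)):
        yield base, t
    elif isinstance(t, StaticArray):
        step = size_of(t.element)
        for i in range(t.count):
            yield from pointer_offsets(t.element, base + i * step)
    elif isinstance(t, Struct):
        if t.parent:
            yield from pointer_offsets(t.parent, base)
            base += size_of(t.parent)
        for _, ft in t.fields:
            yield from pointer_offsets(ft, base)
            base += size_of(ft)


INT = Primitive("int", 4)
Shield = Struct("Shield", [("Damage", INT)])
Item = Struct("Item", [("Id", INT)])
Weapon = Struct("Weapon", [("Shield", DynPointer(Shield)), ("Power", INT),
                           ("Spares", StaticArray(DynPointer(Shield), 2))],
                parent=Item)
print(size_of(Weapon), [off for off, _ in pointer_offsets(Weapon)])  # 20 [4, 12, 16]
```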

That's what has already been implemented, i.e., what the (new) compiler is currently working with.
« Last Edit: 08 Aug 2021, 14:06 by fernewelten »

Of all this, the engine only needs to know, for structs, at what offsets the pointers are and what types these pointers have.

We could store this separately for the benefit of the engine as a compact list of (offset, type-id), e.g.,
<id> -->
<number_of_entries>
<offset_1><type_id_1>
<offset_2><type_id_2>
....
<offset_n><type_id_n>
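A minimal sketch of writing and reading such a list, assuming every number (id, number of entries, offset, type id) is stored as a little-endian unsigned 32-bit integer; the actual field widths would be up to the script format. The function names are hypothetical:

```python
import struct

ENTRY = struct.Struct("<II")  # two little-endian uint32 values


def encode_pointer_table(type_id, entries):
    """Write <id><number_of_entries> followed by <offset><type_id> pairs."""
    data = bytearray(ENTRY.pack(type_id, len(entries)))
    for offset, target_type_id in entries:
        data += ENTRY.pack(offset, target_type_id)
    return bytes(data)


def decode_pointer_table(data, pos=0):
    """Read one table back; return (id, entries, position after the table)."""
    type_id, count = ENTRY.unpack_from(data, pos)
    pos += ENTRY.size
    entries = []
    for _ in range(count):
        entries.append(ENTRY.unpack_from(data, pos))
        pos += ENTRY.size
    return type_id, entries, pos


blob = encode_pointer_table(7, [(4, 3), (12, 3), (16, 3)])
print(len(blob), decode_pointer_table(blob))
# 32 (7, [(4, 3), (12, 3), (16, 3)], 32)
```

Returning the end position lets the engine read several such tables stored one after another.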
« Last Edit: 08 Aug 2021, 19:01 by fernewelten »

Monsieur OUXX

  • Mittens Vassal
  • Cavefish
  • Mittens Half Initiate
    • I can help with proof reading
    • I can help with translating
    • I can help with voice acting
    • Monsieur OUXX worked on one or more games that won an AGS Award!
    • Monsieur OUXX worked on one or more games that was nominated for an AGS Award!
A bit late to the party, but: generics are hard to implement (at least syntactically in the compiler).
There is, however, a much more reachable goal: an untyped pointer type (for example, Pointer*) that supports casting. This is how genericity has been achieved in languages since forever, before C++ had templates and so on (void* in C, Object in early Java... even object in C#, if you're a psychopath).

This idea is already partially implemented in AGS when you have a GUIControl* and you cast it to Button*, Label*, etc.


Apart from that I am extremely excited to put my hands on pointers in managed structs.
« Last Edit: 03 Sep 2021, 12:04 by Monsieur OUXX »