Supporting pointers in managed structs

Started by Crimson Wizard, Thu 15/07/2021 11:35:59


Crimson Wizard

UPDATED in 2023

Test build: https://cirrus-ci.com/task/5586565305466880
Ticket: https://github.com/adventuregamestudio/ags/pull/1923

UPDATED later in 2023

This is now fully merged into the ags4 branch.




It looks like we may be a few steps away from actually supporting managed pointers inside managed structs.

To clarify, since not everyone knows why they are not supported: the problem is that the engine does not know the real contents of a struct at runtime, and does not know which members are pointers.
Because of that it cannot deduce whether something inside the struct also has to be released. If there were a managed struct with pointers, the objects they point to would never get deleted, creating a so-called memory leak. Eventually this may lead to the program crashing with an "out of memory" error.
So we had to artificially forbid them.

The proposed solution was to introduce a so-called RTTI: a table containing a description of all the types in the game script. This table would not only tell which types exist, but also which members they have and what types these members are. For this particular problem, the description must tell which of these members are pointers.
A related ticket: https://github.com/adventuregamestudio/ags/issues/1259
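
To make the idea concrete, here is a minimal Python sketch of how a type table could drive disposal. Everything in it (RTTI_TABLE, the Heap class, the field layout) is a hypothetical illustration, not the engine's actual data format or code.

```python
# Hypothetical sketch: a type table tells the runtime which fields of a
# managed struct hold pointers, so they can be released on disposal.
# None of these names come from the AGS engine.

# type name -> list of (field name, is_managed_pointer)
RTTI_TABLE = {
    "Node": [("value", False), ("next", True)],
}


class Heap:
    """Toy reference-counted pool of managed objects."""

    def __init__(self):
        self.objects = {}    # handle -> (type name, dict of field values)
        self.refcount = {}   # handle -> number of references
        self.next_handle = 1

    def new(self, type_name, **fields):
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = (type_name, fields)
        self.refcount[handle] = 0
        # Storing a handle in a pointer field counts as a reference.
        for name, is_pointer in RTTI_TABLE[type_name]:
            if is_pointer and fields.get(name) is not None:
                self.add_ref(fields[name])
        return handle

    def add_ref(self, handle):
        self.refcount[handle] += 1

    def release(self, handle):
        self.refcount[handle] -= 1
        if self.refcount[handle] > 0:
            return
        type_name, fields = self.objects.pop(handle)
        del self.refcount[handle]
        # Without the table the runtime could not tell which fields are
        # pointers, and the objects they refer to would leak.
        for name, is_pointer in RTTI_TABLE[type_name]:
            if is_pointer and fields.get(name) is not None:
                self.release(fields[name])


heap = Heap()
tail = heap.new("Node", value=2, next=None)
head = heap.new("Node", value=1, next=tail)
heap.add_ref(head)   # a script variable points at the head
heap.release(head)   # the variable goes out of scope
print(len(heap.objects))
```

Releasing the head also releases the tail, because the table marks "next" as a pointer field. If that entry is set to False, the tail stays in the pool, which is exactly the leak described above.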


I did a small experiment for fun, generating such table along with the script data, and letting engine consult it when disposing managed structs.
https://github.com/ivan-mogilko/ags-refactoring/commits/experiment--rtti

And it works - if you create a chain of managed structs in script - it correctly deletes everything when necessary without any memory leaks.

The code above is extremely dirty though, so it will take time to prepare a polished version.

eri0o

Hey, could this give some C# like serialization capabilities? (imagining once generics are a thing)

Crimson Wizard

Quote from: eri0o on Thu 15/07/2021 13:18:00
Hey, could this give some C# like serialization capabilities? (imagining once generics are a thing)

Could you give an example of what you are referring to?

Hypothetically, RTTI means reflection, and as a consequence, watching variable values. In the longer run, also virtual function tables, but I am not looking that far atm.
Oh, and of course, dynamic type casting - from parent to child.

In regards to generics - no idea if that will ever be possible to implement for AGS script, or if it is a suitable language for that sort of thing. But some kind of "any-type" symbol would be nice... like "any dynamic array" for instance, that could be sent into a specialized API. Just imagining things.

eri0o

What I meant is: if the script can be aware of the types of struct members, then it would be possible to traverse them and get each member to work out the serialization in script. Google's protobuf does this in its C# implementation, but uses access to the Marshall Memory after picking the references and sizeof to figure things out.

Generics are the proper way to ensure type safety. Among other things, auto-complete is a lot easier to get working with them.

Quote: RTTI means reflection

Reflection would be huge, but this would mean storing variable names too? I think that is information that does not need to be stored. I think Java handles this by only storing public members of classes (not sure though), see http://openjdk.java.net/jeps/118

fernewelten

Quote from: Crimson Wizard on Thu 15/07/2021 11:35:59
It looks like we may be few steps away from actually supporting managed pointers inside managed structs.
:-D :-D


Crimson Wizard

Bit unrelated, but I've been randomly thinking about all that reflection stuff, and also this feature proposal by fernewelten, and it suddenly came to me that there may be a relatively non-complicated way of implementing a variable watch in the Editor.

Assuming the Editor is running a game that it just compiled, it may have a saved table of variables from the compiler (or rather from the tokenizer, the first compilation stage). That gives a mapping between variable names and their memory offsets. The Editor may keep that in memory, or rather save it to a file somewhere in its workspace for future use.

When the user is running a test and wants to know the current value of a variable, the Editor would pass a command to the engine, asking for the value at a certain offset in the script memory (global or function stack), and the engine would just return that.
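
A rough Python sketch of that exchange follows. The table layout, the byte formats and the function names are assumptions made up for illustration; they are not an existing Editor or engine API.

```python
import struct

# Hypothetical sketch of the variable-watch idea; the table layout and the
# function names are assumptions, not an existing Editor or engine API.

# Saved by the "compiler": variable name -> (byte offset, struct format)
SYMBOL_OFFSETS = {
    "score": (0, "<i"),
    "health": (4, "<i"),
    "speed": (8, "<f"),
}

# Stands in for the engine's global script memory.
script_memory = bytearray(12)
struct.pack_into("<i", script_memory, 0, 150)
struct.pack_into("<i", script_memory, 4, 75)
struct.pack_into("<f", script_memory, 8, 1.5)


def engine_read(offset, fmt):
    """Engine side: return the value stored at a raw memory offset."""
    return struct.unpack_from(fmt, script_memory, offset)[0]


def editor_watch(name):
    """Editor side: translate a name to an offset, then ask the engine."""
    offset, fmt = SYMBOL_OFFSETS[name]
    return engine_read(offset, fmt)


print(editor_watch("health"))
```

The engine side never sees a variable name here; only the Editor needs the name table, so it would not have to ship with the game.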

fernewelten

I've been thinking about Crimson's idea a little.

Here's a very tentative and high-level idea that builds on this.

Currently, the Editor is holding a complete copy of the Compiler.
When a program is run in debug mode and halted, then the Editor is running.
This means that the Compiler is available at that point in time and can be called.
It might "compile" an expression such as "Weapon[15].Shield.Damage",
and then you'd get the Bytecode for it, i.e., a step-for-step instruction for the Engine of just how to reach that value at this point in time.
The Engine already knows how to run Bytecode.
It could be configured to save its current state (content of the registers and so on) and then temporarily execute the Bytecode of the expression.
Then it would have the expression result in AX (or in the memory cell that MAR points to), ready to be passed to the Editor to be displayed.
Afterwards, it would retrieve its current state and continue running.

Now we don't need to see the compiler as a cumbersome and heavy-weight blob of software that can only deal with complete programs:
it consists of several separate classes that can operate independently, such as a scanning class.
These parts of the compiler that specifically deal with expressions could be further split off, giving an "expression parser class".
The compiler itself would use this "expression parser class", and the "expression parser class" could also be used independently.
That's what would happen at debugging time:
For each expression to be watched, the debugging mode of the Editor would call the scanner class and then the "expression parser class".

It's inevitable that the compiler will evolve over time.
Modifications will be made, new features will be added, and so on.
In these cases, the "debugger" will neatly follow suit.
It uses the same parts that the compiler uses, so when the compiler changes, the debugger changes.
So the "debugger code" can't get out of sync with the "compiler code".

This is an advantage of the approach. But this can also be a disadvantage:
The compiler would no longer be free to do whatever it wants to.
There would be an interface that the debugger uses,
and that interface would need to remain rather stable.
And so the compiler would also need to be rather stable, as concerns calling this interface.

An expression such as  "Weapon[15].Shield.Damage" changes its meaning dependent on the context.
For instance, "Weapon" might be a local variable at some point in time, and another local variable at another point in time.
So we will need to construct the context at the point that is being debugged.
When the compiler runs its proper compiling run, it has this context available: When it's in the middle of compiling some function, it knows exactly what variables are at what stack offsets and so on.
When the debugger runs and wants to query a variable, this context must be reconstructed.
For instance, when the debugger halts, e.g., on line 42 of "globalscript.asc", then the context at this line 42 must be reconstructed.
This means that the compiler must write the various contexts into, e.g., a debugging file in the proper compiling run,
and a function such as "recall_context(debugging_file, "globalscript.asc", 42)" must be provided that the debugger can use.

So in total, when the debugger halts at line 42 of file "globalscript.asc" and wants to know the value of "Weapon[15].Shield.Damage", it must:
- call recall_context(debugging_file, "globalscript.asc", 42)
- call expression_parse(scan("Weapon[15].Shield.Damage"))
- pass the resulting Bytecode to the Engine for executing
- the Engine must save its current state,
- execute the Bytecode
- report AX (or the value of the memory that MAR points to)
- restore its current state
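
The sequence above can be sketched in Python as follows. The names recall_context and expression_parse are taken from the list; their bodies, the one-instruction toy bytecode and the ToyEngine class are purely illustrative assumptions and do not reflect the real compiler or Engine.

```python
import copy

# Hypothetical sketch of the watch sequence. Only the function names
# recall_context and expression_parse come from the post; everything else
# is an illustrative assumption.


def recall_context(debugging_file, script, line):
    """Return the variables visible at (script, line) with stack offsets."""
    return debugging_file[(script, line)]


def expression_parse(expression, context):
    """'Compile' a bare variable name into toy bytecode."""
    return [("LOAD_STACK", context[expression])]


class ToyEngine:
    def __init__(self, stack):
        self.registers = {"AX": 0, "PC": 100}
        self.stack = stack

    def evaluate(self, bytecode):
        saved = copy.deepcopy(self.registers)      # save current state
        for opcode, operand in bytecode:           # execute the bytecode
            if opcode == "LOAD_STACK":
                self.registers["AX"] = self.stack[operand]
        result = self.registers["AX"]              # report AX
        self.registers = saved                     # restore the state
        return result


debugging_file = {("globalscript.asc", 42): {"damage": 1, "armor": 0}}
engine = ToyEngine(stack=[7, 30])

context = recall_context(debugging_file, "globalscript.asc", 42)
bytecode = expression_parse("damage", context)
print(engine.evaluate(bytecode))
```

After the call the registers hold the same values as before, so the halted program could continue as if nothing had happened.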

------

We have:
the compiler
the Engine

We need
- to split off the expression parser from the compiler. This is time-consuming, but will strongly improve the compiler (it will become more modularized and easier to modify and extend).
- to make the compiler write the context information. This information is available in the symbol table. It gets there incrementally, so at the point where a local variable is added or invalidated, this can also be written into the "debugging file". This part is probably fairly easy.
- to write recall_context(). One way of going about it is to start with a given context, e.g., at the start of a function, and then "replay" the context steps until the line in question is reached. This can probably be done with middling effort.
- The last two steps must be developed in tandem, of course.

- to prepare the Engine for
    = saving its state, executing "temporary" bytecode, restoring its state
    = passing register AX (or the value that MAR points to) to the Editor
I don't know enough of the Engine to make educated guesses on how cumbrous this is.
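
As for recall_context(), the "replay" approach could look roughly like the Python sketch below. The event format is an assumption; all that is suggested above is that additions and invalidations of local variables could be written to the debugging file as they happen.

```python
# Hypothetical sketch of rebuilding a context by replay. The event format
# is an assumption made for illustration.

# (line, action, variable name, stack offset), in compilation order
CONTEXT_EVENTS = [
    (10, "add", "i", 0),
    (12, "add", "weapon", 4),
    (20, "remove", "weapon", 4),
    (22, "add", "shield", 4),
]


def recall_context(events, line):
    """Replay events recorded at or before `line`; return name -> offset."""
    context = {}
    for event_line, action, name, offset in events:
        if event_line > line:
            break
        if action == "add":
            context[name] = offset
        else:
            context.pop(name, None)
    return context


print(recall_context(CONTEXT_EVENTS, 15))
print(recall_context(CONTEXT_EVENTS, 42))
```

In this toy data the same stack offset 4 belongs to "weapon" at line 15 and to "shield" at line 42, which is why the context has to be rebuilt for the exact line where the debugger halted.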

By the way, if the compiler can find an instruction sequence that makes MAR point to the memory that contains the value, then it is possible in principle to "patch in" another value at debugging time. So variables can not only be watched but even modified. This may open a can of worms, so we'd better not do this right from the beginning. But it's a perspective.

eri0o

Isn't it possible to generate something like a .pdb instead of reusing the compiler at runtime? Like a symbol map or a DWARF file.

fernewelten

Quote from: Crimson Wizard on Thu 15/07/2021 11:35:59
I did a small experiment for fun, generating such table along with the script data, and letting engine consult it when disposing managed structs.
https://github.com/ivan-mogilko/ags-refactoring/commits/experiment--rtti

Having a look to see just what the compiler would need to provide.

fernewelten

So currently, what I basically have as type information in the symbol table is this:

A type is either atomic

  • primitive types: int, short, char; float; string
  • enumerations (currently equivalent to int)
or compound

  • non-dynamic array with specific_number elements of a type
  • dynarray of a type
  • dynpointer of type
  • struct, composed of

    • optionally the parent (another struct)
    • a list of (fieldname, type)

There's no built-in restriction in the symbol table itself that compound types must consist of atomic types. Compound types can consist of compound types. (But the compiler ensures with code that the types are limited by what the engine can currently handle. For instance, no dynarrays of structs that contain pointers and non-pointers.)

That's what has already been implemented, i.e., what the (new) compiler is currently working with.

fernewelten

Of all this, the engine only needs to know, for each struct, at what offsets the pointers are and what the types of these pointers are.

We could store this separately for the benefit of the engine as a compact list of (offset, type-id), e.g.,
<id> -->
<number_of_entries>
<offset_1><type_id_1>
<offset_2><type_id_2>
....
<offset_n><type_id_n>

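A small Python sketch of writing and reading such a list is shown below. The 32-bit little-endian layout is an assumption chosen only for illustration, not a proposed file format.

```python
import struct

# Hypothetical sketch of the compact pointer table. The 32-bit
# little-endian layout is an assumption chosen for illustration.


def pack_pointer_table(entries):
    """Encode [(offset, type_id), ...] as <count><offset><type_id>..."""
    data = struct.pack("<I", len(entries))
    for offset, type_id in entries:
        data += struct.pack("<II", offset, type_id)
    return data


def unpack_pointer_table(data):
    """Decode the bytes produced by pack_pointer_table."""
    (count,) = struct.unpack_from("<I", data, 0)
    return [struct.unpack_from("<II", data, 4 + 8 * i) for i in range(count)]


# A struct whose pointers sit at byte offsets 8 and 16.
entries = [(8, 3), (16, 5)]
blob = pack_pointer_table(entries)
print(len(blob), unpack_pointer_table(blob))
```

With this layout a struct with n pointers costs 4 + 8n bytes, and the engine can walk the list without knowing anything about the non-pointer fields.
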
Monsieur OUXX

A bit late to the party, but: generics are hard to implement (at least syntactically in the compiler).
Whereas there's a much more reachable goalpost: an untyped pointer type (for example: Pointer*) that supports casting. This is how genericity was achieved in early languages, before C++ had templates and so on (void* in C, Object in early Java... even object in C# if you're a psychopath).

This idea is already partially implemented in AGS when you have a GUIControl* and you cast it to Button*, Label*, etc.


Apart from that I am extremely excited to get my hands on pointers in managed structs.
 

Crimson Wizard

As a heads up, I consider the RTTI (and expanding the managed pointers support in particular) my priority task after the 3.6.0 release. So I am returning to this issue, and have posted an update on the RTTI table format in the ticket:
https://github.com/adventuregamestudio/ags/issues/1259#issuecomment-1412573652

eri0o

Is there a way to have the table of names be built separately so it's not part of the release build? I think the names are not needed unless the game is being debugged? Maybe not consider this on the first iteration of the feature, but something to consider at some point.

Crimson Wizard

Quote from: eri0o on Sat 04/02/2023 12:57:03
Is there a way to have the table of names be built separately so it's not part of the release build? I think the names are not needed unless the game is being debugged? Maybe not consider this on the first iteration of the feature, but something to consider at some point.

Technically, the strings may be skipped: with the updated format (as described in the latest comment in the ticket) the strings are saved in a separate table, so it's a matter of not generating this data during compilation, and skipping that step during writing.

The problem, though, is that the full type names are used as global keys when merging RTTI tables from different script units. Either something else should be used to identify types globally, or these names may be obfuscated.
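
To illustrate why the names matter for merging, here is a rough Python sketch. The table shape and the merge function are assumptions made up for this example and do not show the actual RTTI format.

```python
# Hypothetical sketch of merging per-unit type tables by full type name.
# The table shape and the merge function are assumptions for illustration.


def merge_rtti(unit_tables):
    """Merge per-unit {type name: local id} tables.

    Returns the global name -> id table and, per unit, a local -> global
    id mapping for rewriting references.
    """
    global_ids = {}
    remaps = []
    for table in unit_tables:
        remap = {}
        for name, local_id in table.items():
            # The full type name is the only key shared between units.
            global_id = global_ids.setdefault(name, len(global_ids))
            remap[local_id] = global_id
        remaps.append(remap)
    return global_ids, remaps


unit_a = {"int": 0, "Character": 1, "Weapon": 2}
unit_b = {"int": 0, "Weapon": 1, "Shield": 2}
global_ids, remaps = merge_rtti([unit_a, unit_b])
print(global_ids)
print(remaps[1])
```

The merge only needs the key to be unique and identical across units, so in this sketch any stable replacement for the readable name (an obfuscated string, for example) would work the same way.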

eri0o

Uhm, do the builtin types (Character, Object, ...) also have this type name? And say, a built-in type from a plugin.

Just trying to figure out what happens with things that have no "script unit" - imagining rtti is already in and I am looking at a variable through the new rtti powered AGS debugger.

Crimson Wizard

Quote from: eri0o on Sat 04/02/2023 14:47:12
Uhm, do the builtin types (Character, Object, ...) also have this type name? And say, a built-in type from a plugin.

Just trying to figure out what happens with things that have no "script unit" - imagining rtti is already in and I am looking at a variable through the new rtti powered AGS debugger.

Their "unit" is actually called a "BuiltinScriptHeader" (or something like that).
I imagine a plugin's "unit" will also have some generated name like that.

By the way, it is already possible to see these names through a program debugger, if using my new wip branch:
https://github.com/ivan-mogilko/ags-refactoring/tree/ags4--rtti

There's a number of debug fields I added that work as cross-references between types and fields, and let you explore the table after it's built by the compiler.
(I think I will move these to a separate struct later)

Crimson Wizard

Opened two consecutive pull requests (second depends on the first):

RTTI: https://github.com/adventuregamestudio/ags/pull/1922
Support managed pointers inside managed structs: https://github.com/adventuregamestudio/ags/pull/1923

Downloadable temp build:
https://cirrus-ci.com/task/4863501278117888

With this test build you should be able to have pointers inside managed structs, and therefore any kind of struct relations. Deleting these structs should not cause memory leaks (the program's memory use should go down if you delete them).
There are still some things not complete: for example, managed structs get broken after restoring a save, but that is because I have not fixed that yet.
Well, see the ticket for the full description of the situation.

EDIT: Oh, another thing, this build also supports "--print-rtti" command line arg, which tells the engine to dump type information into the log.
For instance, you may run your game from command line as:
"gamename.exe --log-file=all --print-rtti" and the type table will be printed in the engine log (e.g. "%USERPROFILE%\Saved Games\Adventure Game Studio" on Windows).

Crimson Wizard

An update: fingers crossed, the saves problem seems to be fixed to some "minimal necessary" degree.
(details are in comments to the ticket https://github.com/adventuregamestudio/ags/pull/1922)

The latest code I wrote there is quite ugly, so I will take a couple more days cleaning it up and testing.

There are still certain things that you may do to managed structs that will break older saves. There may be potential solutions, but not all of these are worth considering at the moment (too much work, and likely not much use on its own).

For example, the biggest break is if you change the order of member variables inside a struct, so that pointers appear at different offsets. This actually breaks the structs in saves even without pointers.

Crimson Wizard

Updated again. There's a working solution. Can't tell if it's the most optimal one, but it formally works.
Technical details are here: https://github.com/adventuregamestudio/ags/pull/1923#issuecomment-1502145700

May be downloaded and tested with this build (UPDATED 17th April):
https://cirrus-ci.com/task/4872943738552320

WARNING:
* Changes compiled game format and save format;
* Based on AGS 4 experimental version, which means that it will irreversibly update your game project too if you are coming from AGS 3.*. Better test on a project's copy.
* If you are already an AGS 4 user, this does NOT change the AGS 4 project format further, so it's safe to test on your real projects, except that you'll have to fully rebuild if changing back to a previous AGS 4; also the new engine will produce incompatible saves.
