Supporting pointers in managed structs

Crimson Wizard · Tue 18/04/2023 18:39:44

Unfortunately, this is still not done. There's another problem which I forgot about: it's circular dependencies. The objects not referenced from script may still reference each other, which keeps them in memory until the program ends.

I never researched how this is done in other managed languages, but from a quick check they somehow are able to distinguish whether reference is from a script or internally from an object, and then consider every object that does not have a "script" reference a garbage amd remove it during garbage collection.

Need to invent something similar for AGS script now.

I really wish someone mentioned this earlier to me! Maybe it's going to be even larger work than reflection with RTTI... which means this functionality is not as close to completion as i supposed.

Crimson Wizard · Tue 18/04/2023 20:27:50

Speaking theoretically, the GC should be able to detect, which managed objects are referenced from within the normal script memory, and which are not. If an object is not referenced from the script mem, - such object is detached and considered garbage.

NUANCE: engine may also hold "internal" managed references, so GC must also take that into account (not certain how yet).

Supposing we only run GC between the scripts, during the game frame update. That means that the only place we need to check is global script memory (because stack will be reset at the time). If the script memory would have contained hints on the data types in it, we could know where are "dynamic object handles" among it.

Hypothetically, the GC process then may work like this:

* scan the global script memory;
* for each value in that memory, which has a "dynamic handle" type,
* mark that handle as used;
* retrieve the dynamic object itself, find out if there are managed handles inside too, mark these as used;
* repeat recursively for each connected dynamic object?

* after all this, compare list of existing handles with the list of "used" ones, and dispose objects with "unused" handles.

eri0o · Tue 18/04/2023 21:40:01

I don't think I get quite well what circular dependency case you mean here, can you explain it with some example?

I am curious if it's something that would be possible to detect at compile time and just not allow for the time being.

QuoteSupposing we only run GC between the scripts, during the game frame update

Would the GC not need to happen on each scope change? Or would this only affect the circular dependent object?

Crimson Wizard · Tue 18/04/2023 21:49:18

Quote from: eri0o on Tue 18/04/2023 21:40:01I don't think I get quite well what circular dependency case you mean here, can you explain it with some example?

Simply when two objects reference each other, directly or indirectly.

Code: ags

// Example 1
struct LinkedItem {
    LinkedItem* next;
    LinkedItem* prev;
};

// Example 2
struct Parent {
    Child* children[];
};

struct Child {
    Parent* parent;
};

Quote from: eri0o on Tue 18/04/2023 21:40:01
QuoteSupposing we only run GC between the scripts, during the game frame update

Would the GC not need to happen on each scope change? Or would this only affect the circular dependent object?

What is "scope change"?

GC does not have a strict requirement to happen on particular events, it's okay to run it periodically.
It may be run by a timer, or under some trigger conditions based on managed pool statistics.

It also may be run on a separate thread, but only during some script-unrelated processes, like render or game update, that do not make any immediate script runs.

Crimson Wizard · Tue 18/04/2023 22:16:55

Quote from: Crimson Wizard on Tue 18/04/2023 20:27:50If the script memory would have contained hints on the data types in it, we could know where are "dynamic object handles" among it.

In regards to this: as the RTTI feature is already merged to ags4, it's potentially possible to expand and generate a script global variables table, which would have a structure similar to RTTI tables.

RTTI already has a Field type, which may as well be reused here.
It may contain even variable names, but that's not essential, and may be skipped for now.

To summarize, the necessary changes for GC would require:
1) compilers generating a reflection table for script's global variables (each script); the table is saved as another extension to script format, similar to how RTTI table is saved now.
2) writing a search algorithm that finds managed handles in the script memory.
3) figuring out a good trigger condition for running GC.

PS. I'd need to write a proper ticket for this; imo it's better to do this separately, and prior to the task of supporting nested managed pointers.

eri0o · Tue 18/04/2023 22:27:49

Not sure if it's because I am doing a lot of python recently, but the first result for garbage collection in Google for me was this: https://devguide.python.org/internals/garbage-collector/

From the link, see

QuoteIn this example, container holds a reference to itself, so even when we remove our reference to it (the variable "container") the reference count never falls to 0 because it still has its own internal reference. Therefore it would never be cleaned just by simple reference counting. For this reason some additional machinery is needed to clean these reference cycles between objects once they become unreachable. This is the cyclic garbage collector, usually called just Garbage Collector (GC), even though reference counting is also a form of garbage collection.

Crimson Wizard · Tue 18/04/2023 22:36:09

Quote from: eri0o on Tue 18/04/2023 22:27:49Not sure if it's because I am doing a lot of python recently, but the first result for garbage collection in Google for me was this: https://devguide.python.org/internals/garbage-collector/

Alright, I will read this, maybe this will give ideas how to optimize the whole thing...

Previously I found this funny article about how to write a GC memory in C language:
https://maplant.com/gc.html

eri0o · Tue 18/04/2023 22:38:28

If you want to instead play around in a debugger, I found an implementation of it that is not that big

https://github.com/micropython/micropython/blob/master/py/gc.c

I am trying to look if I find some version of this that is even smaller, perhaps for some of those tiny languages available in GH. But this one has a "design document" to compare it seems.

But the most interesting parts of the code above are the comments and the general skeleton of the garbage collector.

Crimson Wizard · Tue 18/04/2023 22:55:57

Alright, after reading this article, I can see that Python's gc has an opposite approach, where it does not check which pointers exist in script, instead it checks only the pointers in the "managed objects", to speak in AGS terms.

In a nutshell, there's a separate "reference counter" meant only for GC. It's initialized with real "reference counter" at the start. Then GC scans all the managed objects, and subrefs the objects it references, but it changes the "gc ref count" (not the real one). After that the objects that have 0 ref counts remaining are the ones that do not have link from the script memory.

Then it does a second pass, restoring "gc ref counts", but only starting at the objects that still kept positive "gc ref count" (these are ones that have a link from the script memory).

Finally in the end only the completely unattached objects remain with "gc ref count" = 0.

This is all really clever!, and this is possible to do in AGS without adding anything extra to script format, or checking internal engine refs; because this algorithm does not care about all that, it only cares to find out which refs come from withing the managed objects themselves.

eri0o · Wed 19/04/2023 00:15:26

So, rereading your explanation and the article on python garbage collection, I think any actual code reference has to be the actual python code, so if necessary, it's here: https://github.com/python/cpython/blob/main/Modules/gcmodule.c (I mostly like code references to steal variable names...)

But I would say to stick to the concepts you narrowed down in your comment, which I think is a good reference for the AGS version of a GC.

The MicroPython code - lets call it just MP to avoid confusion - is doing something a bit different, which I think is some optimization on top of a Mark and Sweep implementation. A note that like, 20% of issues from MP are Garbage collector related - it seems it matters a lot in embedded systems. The Mark and Sweep in the wikipedia article on GC, it looks like it's the basic GC to implement - not optimal but reasonably minimal. There is a good visual in the article for it.

Crimson Wizard · Wed 19/04/2023 00:42:00

The "Mark & Sweep" is probably the method which I described earlier, but, from what I understood, it requires to have a "root" reference which AGS does not have. This could be worked around by generating script reflection.

Comparing these two, I'd rather first try the one that does not require this.

eri0o · Wed 19/04/2023 01:11:33

I found another text for the same pictures in the Python GC article here: https://csrgxtu.github.io/2020/02/18/CPython-s-Garbage-Collector/

Reading this shorter version I could understand it better how it works. (there is also a for dummies video here, but it doesn't go into specifics)

Spoiler

thinking about the concept of generations Python GC implements, for optimization, if we want to do that at some point, it could be like, a collection that runs per frame and the old generation is collected in room transitions.

thinking about the basic of the garbage collection, I wonder if it needs the managed pool, the rtti for cross-object references, but not the object themselves for it to work, meaning it can be implemented reasonably separated - which I guess is what you meant when you said the same thing.

Design wise, maybe the GC holds a reference to both things. And I was thinking, that, in the text, when it's designing the memory layout, it mentions that because it's a better performance to do that way instead of the GC holding a map that would contain the objects and their extra properties (gc_ref), and also because this avoids the need to keep things in sync.

[close]

Quote from: Crimson Wizard on Wed 19/04/2023 00:42:00Comparing these two, I'd rather first try the one that does not require this.

Agreed.

Crimson Wizard · Thu 20/04/2023 01:24:02

Alright, I wrote something, based on Python's idea:
https://github.com/adventuregamestudio/ags/pull/1923#issuecomment-1515510131
Test build:
https://cirrus-ci.com/task/6681544534786048

It seems working using the simple script test:

Spoiler

Code: ags

managed struct ListItem
{
    ListItem* prev;
    ListItem* next;
    
    String name;
    int data;
};

managed struct LinkedList
{
    ListItem* first;
    ListItem* last;
};

import ListItem* newItem(ListItem* add_after, String name, int data);
import void      detachItem(ListItem* item);
import ListItem* findNthItem(ListItem* first, int n);
import void      printItems(ListItem* first, ListBox* lb);

import ListItem* addItem(this LinkedList*, String name, int data);
import ListItem* remItem(this LinkedList*, ListItem* item);
import ListItem* remItemN(this LinkedList*, int index);
import void      clear(this LinkedList*);

Code: ags


ListItem* newItem(ListItem* add_after, String name, int data)
{
    ListItem* item = new ListItem;
    item.name = name;
    item.data = data;
    if (add_after != null)
    {
        ListItem* next = add_after.next;
        add_after.next = item;
        item.prev = add_after;
        if (next != null)
        {
            next.prev = item;
            item.next = next;
        }
    }
    return item;
}

void detachItem(ListItem* item)
{
    if (item.prev != null)
        item.prev.next = item.next;
    if (item.next != null)
        item.next.prev = item.prev;
    item.next = null;
    item.prev = null;
}

ListItem* findNthItem(ListItem* first, int n)
{
    ListItem* item = first;
    for (int i = 0; i < n; i++)
    {
        if (item.next == null)
            return null;
        item = item.next;
    }
    return item;
}

void printItems(ListItem* first, ListBox* lb)
{
    lb.Clear();
    if (first == null)
        return;
    ListItem* item = first;
    do
    {
        lb.AddItem(String.Format("%s : %d", item.name, item.data));
        item = item.next;
    }
    while (item != null);
}

ListItem* addItem(this LinkedList*, String name, int data)
{
    ListItem* item = newItem(this.last, TextBox1.Text, Random(32000));
    if (this.first == null)
        this.first = item;
    this.last = item;
    return item;
}

ListItem* remItem(this LinkedList*, ListItem* item)
{
    if (item == this.first)
        this.first = item.next;
    if (item == this.last)
        this.last = item.prev;
    detachItem(item);
    return item;
}

ListItem* remItemN(this LinkedList*, int index)
{
    ListItem* item = findNthItem(this.first, ListBox1.SelectedIndex);
    if (item == null)
        return null;
    this.remItem(item);
    return item;
}

void clear(this LinkedList*)
{
    this.first = null;
    this.last = null;
}

[close]

eri0o · Sun 23/04/2023 23:13:05

Hey, after applying this, would pursuing a refactor to port the script rewrite idea from Nick's branch much harder or the same difficulty? Because if it doesn't change much, perhaps applying it and later figuring out how to recover some performance by applying that approach would be a way to go.

Crimson Wizard · Sun 23/04/2023 23:29:10

The updated version, with some fixes related to game saves, and plugins which create their own objects:
https://cirrus-ci.com/task/5586565305466880
I made a comment about performance:
https://github.com/adventuregamestudio/ags/pull/1923#issuecomment-1519166283

In summary, it reduces slightly, but for me it was only noticeable in script-heavy games when running in "unlimited fps" mode. But I have not tested anything "heavy" that would have objects (cross-)referencing objects, simply because I don't have a ready game with that.

Still, some things in this branch were done as a quick first attempt, and there's room for improvement. Some optimization may be done quicker, other would require a redesign of e.g. managed object storage internally.

Quote from: eri0o on Sun 23/04/2023 23:13:05Hey, after applying this, would pursuing a refactor to port the script rewrite idea from Nick's branch much harder or the same difficulty?

No, I do not think that it makes it more difficult, as the main principles of the script executor's and managed pool's work were not changed.
The biggest changes are related to
* generating extra data (rtti) as a post-step in script compiling;
* loading extra data (rtti), and generating helper data from it;
* disposing user structs and dynamic arrays (changes are contained within Dispose method);
* garbage collector was practically correctly written now.

Crimson Wizard · Wed 03/05/2023 15:41:29

Updated version, built on top of the script performance fixes:
https://cirrus-ci.com/task/6166331867791360
New related comment on performance:
https://github.com/adventuregamestudio/ags/pull/1923#issuecomment-1532781830

Crimson Wizard · Thu 11/05/2023 18:45:46

Well, I've been waiting for anybody interested in testing this, but it's been too long, so I ~~cancelled the feature~~ merged this into ags4 branch.