SUGGESTION : Data structures classes

Monsieur OUXX · Mon 13/02/2017 09:51:50

OK, I know this is not easy, but bear with me.

The biggest issue I encounter when programming in AGS is the lack of data structures classes -- stacks, heaps, dictionaries, linked lists, hashmaps, etc.
Apart from arrays (whether dynamic or static), there's nothing.

Why I'm saying that :

Spoiler

At the moment, even arrays (even dynamic arrays!) are a real pain. Because even if you write a fancy module (to implement, let's say, a stack) (and God knows I and others have done that a trillion times), then, because of AGS language limitations, it will immediately become quite complex to make an array of those custom "stacks". Close to impossible to store that array into another array. And clearly impossible to reach a third level of arrays referencing each other.

Are you wondering why on earth that would be required?

Spoiler

Well, that sort of things is required all the time when you want to do "serious" progrmaming.

Look, for the skeptics, here is an example that I'm inventing right now, on the spot :
- a sprite is virtually an array of int
- a collection of sprites (for example, a loop) is an array of the previous
- a view is an array of the previous
- a character has an array of views
- a game has an array of characters
See, even before you realize it, you have 5 levels of arrays referencing each other.

[close]

The example above is taken from AGS built-in situations. So you don't need such strcuture. But as soon as you're trying to escape that, you're screwed. if some arrays contain int, some contain String, some DynamicSprite, etc. ... It's a nightmare.

Some clever people manage to get away with it (temporarily) by resorting to some tricks, such as :
- keeping a unique int ID for each item in their collections (that they can then use as some sort of pointer)
- moving the data storage array outside of the "struct" to be able to create arrays of complex types
- providing a unique interface (a struct with public functions) for all the data structures of the same type (for example, one struct dedicated to all stacks of int.

But then again, you quickly reach the maximum level of complexity for a normal human being if you want each new array to be put into another array. Especially if you don't just grow your data strucutres, but (a perfectly legitimate need) also want to be able to reduce memory use each time you delete stuff. You quickly have a case of memory management involved. At first it's easy. But after 3 levels of referencing between arrays, storing different data types, good luck!

[close]

I hope I've convinced you. Every language does that. Not AGS.

So now, for the suggestion :
The most natural suggestion would be...

Spoiler

... to allow pointers to unmanaged structures, allow (dynamic) arrays of them, and let arrays of structs store such dynamic arrays --or that kind of difficult stuff. I say difficult, because it requires deep modifications of the script parser and compiler. Only someone with a good understanding of the AGS parser classes and the virtual machine pseudo-code can do that.
I'm sure it will come eventually, it's on its way. CW and the boys are working hard on it. But there's no guarantee.

[close]

It would take forever, so I'm putting that aside.

There's something much simpler though.
Think fo the way GUIControl* is implemented.

Spoiler

There's the base class, GUIControl.
It's polymorphic. You can give any kind of control to a functin that expects a GUICOntrol, but then you can cast it using .AsButton, .AsList, whatever. And then you can use it as that specific type of control, with the methods that come with it.

[close]

Now imagine if AGS was providing a generic new type :

Code: ags


DataStructure* //the equivalent to GUIControl*

Then you would have several types of standard data structures:

Code: ags


DataArray //the basic wrapper to dynamic arrays. It offers 'count' (aka 'length') conveniency
DataDictionary //for quickly matching keys and values. Quick reading of values
DataList //For easily inserting or removing items anywhere in the list
DataStack //for pushing and popping shit
DataHeap //you know what a heap is
DataHashMap //Expensive in terms of space, but storing and reading values is almost immediate ( O(1) )

Then you would do this:

Code: ags


DataStack* myStack;
DataDictionnary* myDictionnary;
DataStructure* myDataStructure = myStack; //Just like you would do GUIControl* c = myButton

//you can also use it in arrays of mixed data structures types : 
DataStructure* myDataArray[10]; //just like you can do GUIControl* myControlsArray[10];
myDataArray[0] = myStack;
myDataArray[1] = myDictionnary;

Casting types would also be permitted (again: like it's already possible with CUIControls) :

Code: ags


DataStructure* myDataStructure1 = myDataArray[0];
DataStructure* myDataStructure2 = myDataArray[1];

myStack = myDataStructure1.AsStack; //throws an error or returns null if myDataStructure1 is not an actual stack. Just like ".AsButton" does
myDictionnary = myDataStructure2.AsDictionnary; //throws an error or returns null if myDataStructure2 is not an actual dictionnary. Just like ".AsListBox" does

Instanciation would look like what exists for DynamicSprites :

Code: ags


DataStack* myStack = DataStack.Create(10); //initial size: 10

Each data structure type would have the usual functions know to each of them :

Code: ags


//Stack : 
   - ID (a unique int value, just like DynamicSprites or GUIControls have one)
   - create (initial size)
   - push (value)
   - pop (value)
   - count

//dictionary : 
   - ID (a unique int value, just like DynamicSprites or GUIControls have one)
   - Create
   - AddValue(key, value)
   - GetValue(key)
   - KeyExists(key)
   - ...

//array:
   - ID (a unique int value, just like DynamicSprites or GUIControls have one)
   - Create (initial size)
   - Add (at the end)
   - Insert (arbitrary place)
   - Sort (by value if contains [i]int[/i], by alphabetical order if it contains [i]String[/i], by ID other wise) 
   - ...

Now all of that is, I suppose, well within reach if we stick to structures meant to store int.

It could get more complicated for structures meant to contain several types of values:

Spoiler

It might be needed to duplicate all the prototypes of all the functions :

Code: ags


DataDictionnary.Create_int
DataDictionnary.Create_String
DataDictionnary.Create_DataStack

DataDictionnary.AddValue_int
DataDictionnary.AddValue_String
DataDictionnary.AddValue_DataStack

...But I suppose that code wizards could use the code generation features of their favorite IDE (such as Visual Studio with ReSharper) to generate billions of functions without needing to write much code.

[close]

The beauty of this is that, apart from the wrapper AGS classes, not much coding would be required, since everything else is already implemented within the underlying language and/or libraries. Don't tell me that Allegro or whatever does not offer a Dictionnary class that already implements all the functions I suggested?

So what do you think of that?

Radiant · Mon 13/02/2017 09:57:44

Frankly, this sounds like something that should be in a module (that's also what it is in languages like C++). This is just not something developers frequently need when designing adventure games.

Monsieur OUXX · Mon 13/02/2017 10:24:11

Quote from: Radiant on Mon 13/02/2017 09:57:44
Frankly, this sounds like something that should be in a module (that's also what it is in languages like C++). This is just not something developers frequently need when designing adventure games.

I addressed all this in the first "hidden" section. I frankly think the exact opposite, both from a functional perspective and a technical perspective.

I do need that all the time, and it's clearly a glass ceiling that prevents from achieving powerful modules in AGS (for example the source code of the "tweens" module would be 50% more compact with such classes -- and we can consider ourselves extremely lucky that Edmundito is a good enough programmer to have worked around those issues with hard work and persistence. Otherwise we wouldn't have tweens).

Consider the Stacks module by monkey0506. It's quite powerful, and it is a good starting point to achieve what I'm describing. And yet : it's hard to go beyond one "level" of storing those structures into arrays, because they're unmanaged structs. On top of that, the module relies on lots of function calls to get/set/convert. Performance-wise, it's not very good (stil because it's AGS script handling all that). Whereas a simple wrapper from AGS to the underlying C++ would go at the speed of light.

About C++

Spoiler

This example is not very relevant because, by nature, C++ is a continuation of C, which was historically designed to loosely follow underlying hardware structures. It was meant to be the best compromise between assembly and high-level languages. That's why data structures are (by choice) out of its scope. But every "modern" language offers such structures natively : Java, C#, Python, etc.

Also C++ has something that AGS doesn't have, and I will repeat it again and again: pointers. I'm not saying that AGS needs pointers, I'm saying that in C++ pointers allow virtually infinite re-usability of components, since you can store anything into anything without being annoyed by casting types.

[close]

Crimson Wizard · Mon 13/02/2017 19:20:45

I wanted to address this post from different angle, but since you probably expect opinions on your suggestion, I will start with that. Also, after re-reading your post several times, I kinda altered my mind to some degree.

In regards to your workaround suggestion

First of all, I must confess that I do not understand your idea about being able to cast one container into another. I think you need to elaborate on that. I have certain mild guess, but I do not like to guess in occasions like these.

Now, I am not directly opposing idea of built-in containers, in fact rewriting containers in scripts may be a tough task, because that is not always trivial on its own (even assuming AGS script provides necessary code constructs), and not trivial to achieve good efficiency, while STL we use in he engine already has everything. Furthermore, built-in containers mean that you mahe have them right out of the AGS installation.

On the other hand, I strongly oppose the idea of copy&pasted code for each separate type. Not only this is looking bad, but that will be hell to maintain in the future. And what will you do if you need to keep user objects there? And how do you get what types of objects the container stores, at all?

That said, I'd rather assume, that we may consider creating a set of containers for Integer type, as most useful type of all, that can keep both values and IDs, keeping in mind a plan to deprecate and replace these containers with proper generic/template containers in some "ideal future".

BTW, have you actually thought about using plugins? C++ plugin can do same thing, because plugins can declare new managed types.

In regards to your ideal (or "natural") suggestion

You say that:

Quote
... to allow pointers to unmanaged structures, allow (dynamic) arrays of them, and let arrays of structs store such dynamic arrays

I need to elaborate that having pointers to unmanaged structures will automatically require to make them managed, otherwise -- all kind of memory access problems all the time (imagine someone storing a pointer to local variable). Managed pointers were introduced (in C#, Java and AGS) for these reasons too (among others).

Current most wanted changes to script are:
- being able to store managed pointers inside managed (user) structs. This alone solves tonn of problems.
- being able to get length of dynamic array. Self-explanatory.

An interesting part is storing dynamic arrays inside dynamic arrays. Technically, from engine's side, there is literally no reason why these could not exist. Dynamic arrays, being managed objects themselves, are perfectly able to store other managed object pointers (dynamic array of character pointers, or dynamic sprites, for instance).
So it seems to me that this is rather a compiler's limitation, because it does not have proper syntax for declaring array of arrays of arrays of some objects inside them (i.e. multi-dimensional arrays or templates/generics).

In regards to your motive of choosing workaround (warning: big rant ahead)

Quote
I say difficult, because it requires deep modifications of the script parser and compiler. Only someone with a good understanding of the AGS parser classes and the virtual machine pseudo-code can do that.
I'm sure it will come eventually, it's on its way. CW and the boys are working hard on it. But there's no guarantee.

You are asking to implement less difficult solution, but who is going to work on it? "CW and boys" too, is not it? Which means that, if we agree, we would have to drop other things, and spend time on this limited workaround instead of working on better solutions.

There was a topic in General discussions titled "What grinds my gears"; this is how I learnt this idiomatic expression.

So, what grinds my gears all this time, actually, is that this was supposed to be community project, but it became one-two-three men project, which is sad, frustrating and annoying. Because the biggest difficulty we are facing is not the difficult code (it may become easier when you study it, and certainly after you clean it), but the lack of human resources. This is genuinely amazing (and not amazing in a good way), that we few have to do EVERYTHING, even something that could be done by less experienced coders, and even something that does not require coders.

Like check this manual thread, for instance: http://www.adventuregamestudio.co.uk/forums/index.php?topic=51340.0
It is two years old now (and there was another one even before that). What prevented the "main" developers to solve that? Probably we were busy with some objectively or subjectively more critical issues. But what prevented OTHERS from working on that? Can someone tell?
At some point I was so frustrated that I wrote a post there... which I later deleted, but I wrote another one (here is the link). In that post I basically asked -- "is the reason why no one is interested in working on this - that people are expecting for me to do something first? And what is that?"
I got couple of replies to that post, but neither lead to any actual work done, nor anyone actually answered to my most serious question of that moment.

Because neither back then neither now I can comprehend, how, in the presence of numerous people who can write pretty advanced game scripts, there are zero volunteers to help us work on AGS. Not necessarily difficult parts, but at least some parts, which would let us focus on some important issues. Is it because people are afraid? Or expecting me to say or do something?

What I am trying to say... You are telling that going "ideal" way requires "good understanding of the AGS parser classes" etc, assuming that only me, or Gurok can do this. But that is ridiculous! Of course we will be only ones who can do this -- and yet not have enough free time to do this -- if no one else even TRIES to also get that understanding of AGS parser and things.

Mind you, that when JJS was making his ports, and when I was researching and refactoring script virtual machine, and when Gurok was researching script parser -- we had no source of knowledge whatsoever, no one to ask, we could only rely on ourselves, trying to figure out what does the code mean and how it works.

But now, for instance, if someone would like to know how engine's virtual machine works, I can explain this to them. I already (and still) have that knowledge (part of it is already written in the wiki). So for this potential newcomer, it would be times easier to get a grasp of that code. It's just that no one is interested in getting it!!

Monsieur OUXX · Mon 13/02/2017 20:39:44

thanks for taking time to think about all this.
I'll just address the additional questions you raised:

- about plugins : we made a choice not to use them for our game, because we don't have the skills/maintenance workforce to ensure final compatibility on all the targetted platforms. (I would have difficulties making one for Windows, because like everything there's a learning curve, especially in C++, but then I'd probably go insane trying to compile it for Linux, and then trying to figure out it's possible at all to make it work on Android, iPhone, etc.)

- about managed/unmanaged : I keep mixing up the two words, so I probably meant "managed". As in : custom structs. My point being : you can't have pointers on those. You addressed the core issue with this sentence : "Current most wanted changes to script are: being able to store managed pointers inside managed (user) structs. This alone solves tons of problems." --> Well, if you think this is within reach, then go for it and skip my suggestion. I'd be deliriously happy.

- about base type "DataStructure": I can't give you a solid reason right now why it should create a bridge between all the subtypes, it's just my instinct. If that makes no sense, then just drop that aspect. If each of those types instances have an int ID, then the rest is secondary for manipulating and storing them. That said, if it turns out to be like DynamicSprite, where an instance gets disposed of if a pointer to it is not stored, then it would be much simpler for the programmer to keep an array of all of them together, rather than an array for each type.

Monsieur OUXX · Tue 14/02/2017 09:12:32

Haha i didnt have time to read the second part of your message, I do it only now. Sorry for contributing to grinding your gears! Your ranting is perfectly legitimate.
About the team project: truth is, I'm not good enough in C++. I can program in C++, but not well enough. Believe it or not, yesterday I pulled the engine project from git to see if I could start a branch that does what I'm suggesting. I haven't been too far yet, but like every time I have a feeling I'll get overwhelmed. That said, considering how much time I've wasted on those stupid data structures, the engine C+++ learning curve could be worth it .

About the online manual: another funny coincidence: yesterday a friend of mine stumbled upon an online manual for AGS. I was puzzled as I thought this actually never went through, bit it seems rather official. That's excellent news. But if your concern is the manual not being up to date, then you're totally right, people like me should work on updating it. I'll see what I can do. -- I forgot literally everything I learnt from the help file generation, from the time I worked on it.

Crimson Wizard · Tue 14/02/2017 12:33:19

Since monkey's Stack module was mentioned here, I think I'd mention my own module I wrote back in 2010 to be used in conjunction with Stack:
http://www.adventuregamestudio.co.uk/forums/index.php?topic=41243.msg545630

Module is called StructStream and what it does basically is encoding a list of varied parameters into string. Such strings then may be stored inside other strings (encoded similarily) or with dynamic arrays, monkey's Stack, etc.

While it does not solve speed issue, I think it is convenient to use if you have a complicated data structure.

Monsieur OUXX · Wed 15/02/2017 14:49:26

Quote from: Crimson Wizard on Tue 14/02/2017 12:33:19
Since monkey's Stack module was mentioned here, I think I'd mention my own module I wrote back in 2010 to be used in conjunction with Stack:
http://www.adventuregamestudio.co.uk/forums/index.php?topic=41243.msg545630

Module is called StructStream and what it does basically is encoding a list of varied parameters into string. Such strings then may be stored inside other strings (encoded similarily) or with dynamic arrays, monkey's Stack, etc.

While it does not solve speed issue, I think it is convenient to use if you have a complicated data structure.

Thanks for the link. I know that module, it's definitely a cool module based on a cool concept. And every now and then I consider using it -- but then I remember that it wouldn't work well in a large-scale use (large arrays), because all the String conversion would impact performance.

Monsieur OUXX · Mon 20/02/2017 10:17:21

I forgot to mention: another (relatively easy) way to unlock most of AGS issues would be to let arrays be passed by reference into functions, instead of by copy.

Crimson Wizard · Mon 20/02/2017 10:53:42

Quote from: Monsieur OUXX on Mon 20/02/2017 10:17:21
I forgot to mention: another (relatively easy) way to unlock most of AGS issues would be to let arrays be passed by reference into functions, instead of by copy.

Dynamic arrays can be passed by reference. They can also be returned from function too.
Non-managed arrays cannot be passed by reference, for the same reason I mentioned above - AGS script supports only managed references.

Actually, there is no way to pass arrays by copy in AGS.

Monsieur OUXX · Tue 21/02/2017 10:22:48

if by "managed arrays" you mean "arrays (whether static or dynamic) of managed types", then let's forget about those for a moment. They are of limited interest for the paradigm that i'm trying to address here.
The really important arrays here are arrays of unamanaged types. arrays of int for general use, arrays of String for dictionaries, arrays of float for coordinates, etc. Those are indeed always passed by-copy.
Anyway, you get my point. I was just suggesting another "relatively easy" way of unlocking some difficulties in AGS scripting.

Spoiler

Just to avoid a misunderstanding in "programmers talk" -- yes, the pointer's array is passed by reference, but then if you change values inside the function it does not reflect back when the array is returned from it. So in the end you may say it's passed by copy.

[close]

EDIT I'm so confused right now. I recal running tests some time ago to check if arrays of unmanaged types were passed by copy or by ref, and I came to the conclusion that they were passed by copy.
I even wrote it here (and I got conforted in my result by the fact that nobody contradicted me)

I just ran this test :

Code: ags


void dummyFunc(int ar2[])
{
    ar2[0]=88; ar2[1]=99;
}

void Test2()
{
    int ar[] = new int[2];  ar[0]=66;  ar[1]=77;
    dummyFunc(ar);
    Console.W("values after call: %d, %d", ar[0],  ar[1]);

}

...And indeed the console shows : "values after call: 88, 99"

Maybe it's because I ran my tests on Strings back then, and made a mistake. I'll never know.
It's still not possible to reference an array from another array, ~~but it's made unnecessary~~ since arrays can be passed around and worked on, with no limits, very cheaply. Now I'm praying not to trip on some unexpected AGS bullshit (you know, something like "even though we just pass the array by reference, the function call is still proportionally slow to the number of items in the array"), but I can't see why something like that should arise.

EDIT: passing arrays by reference does not solve every problem (I'll update with a new post)

Crimson Wizard · Tue 21/02/2017 10:37:22

Quote from: Monsieur OUXX on Tue 21/02/2017 10:22:48
The really important arrays here are arrays of unamanaged types. arrays of int for general use, arrays of String for dictionaries, arrays of float for coordinates, etc. Those are indeed always passed by-copy.

By managed array I mean dynamic array.
In AGS arrays are never supposed to be passed by copy, only by reference (and only managed, aka dynamic ones).

Quote from: Monsieur OUXX on Tue 21/02/2017 10:22:48
Just to avoid a misunderstanding in "programmers talk" -- yes, the pointer's array is passed by reference, but then if you change values inside the function it does not reflect back when the array is returned from it. So in the end you may say it's passed by copy.

If what you say is true, then this is very strange and should not be happening. I will check that out.

Monsieur OUXX · Tue 21/02/2017 10:39:07

while you were writing I ran a test and edited my previous message

Crimson Wizard · Tue 21/02/2017 10:43:57

Quote from: Monsieur OUXX on Tue 21/02/2017 10:22:48
I'm so confused right now. I recal running tests some time ago to check if arrays of unmanaged types were passed by copy or by ref, and I came to the conclusion that they were passed by copy.
I even wrote it here (and I got conforted in my result by the fact that nobody contradicted me)

I might have missed that one. Also, TBH I did not check that table very carefully at the first time, because I was very busy back then (except for making few
remarks on user managed types).

We should review this table for the latest 3.4.0 release I guess. Also, maybe make it sticky topic?

Monsieur OUXX · Tue 21/02/2017 12:58:28

Quote from: Crimson Wizard on Tue 21/02/2017 10:43:57
We should review this table for the latest 3.4.0 release I guess. Also, maybe make it sticky topic?

i have edited it to update with the info I got here.
and I'm using 3.4 patch 2 for my tests

Monsieur OUXX · Fri 24/02/2017 09:23:10

Just for the record: passing arrays around by reference does not solve the core issue I raised, but it does help on some things:
- it's still not possible to store arrays into arrays. (except using the StructStream trick, which is rather expensive)
- But it does help separating data from functions (a complex structure stored into an array can still be passed around to several modules for a very low cost).
- It also helps in another way (at least, it facilitates) : passing arrays by reference facilitates emulating the storage of an array (or a pointer to that array) into a container array. For example, it's not too expensive to store the index of a dictionnary in the first half of the dynamic array, and the values in the second half. then it's easy to make the index "point" to the values, within the container array. However beyond a certain level of complexity, it becomes expensive and inconvenient again.

Crimson Wizard · Fri 24/02/2017 12:23:46

When you say "expensive" you need to be more specific. Naturally, copying stuff, and parsing strings to extract some data is relatively more expensive than passing data by pointers and accessing them from proper containers. But how does it look in absolute measurement? And for human players?

For the reference, I used monkey0506's vector class, and a prototype functionality of my structstream module to make an arcade game. That game did have about 8 or 10 enemies on screen simultaneously (plus their shots, which were temporary objects), that were moving, attacking, updating, playing distinct animations on various events, etc. And on every game tick in repeatedly_execute I unpacked their data from the string, and then put it back if I had to modify it. And I never noticed any slowdowns.

In the end, I thought that using strings to store data was more inconvenient than slow, because frankly with the number of elements and data complexity I had it would be simplier to just have fixed-sized arrays. But, well... that was all for an experiment.

Monsieur OUXX · Mon 27/02/2017 09:29:26

Quote from: Crimson Wizard on Fri 24/02/2017 12:23:46
When you say "expensive" you need to be more specific. Naturally, copying stuff, and parsing strings to extract some data is relatively more expensive than passing data by pointers and accessing them from proper containers. But how does it look in absolute measurement? And for human players?

For the reference, I used monkey0506's vector class, and a prototype functionality of my structstream module to make an arcade game. That game did have about 8 or 10 enemies on screen simultaneously (plus their shots, which were temporary objects), that were moving, attacking, updating, playing distinct animations on various events, etc. And on every game tick in repeatedly_execute I unpacked their data from the string, and then put it back if I had to modify it. And I never noticed any slowdowns.

In the end, I thought that using strings to store data was more inconvenient than slow, because frankly with the number of elements and data complexity I had it would be simplier to just have fixed-sized arrays. But, well... that was all for an experiment.

[not saying that in an arrogant or condescending way] My statement was something about the language abilities, therefore I'd rather not enter detailed discussions about performance because it would lead us away from that. I'll just say that I'm skeptical about storing/reading an int into/from an array being on the same performance scale as performing Substring operations. But apart frmo that it meets the requirement to sort arrays into arrays.