arrays of arrays : what should be the syntax?

Started by Monsieur OUXX, Thu 16/03/2017 16:38:53


Monsieur OUXX

Let's imagine for a second that I branched from the AGS engine, and let's imagine that I manage to make heads or tails of the interpreter's memory management.  :-D

If we decide that there is no good reason to prevent a managed struct from storing a dynamic array, and if we go further along those lines and allow storing an array inside another array, then... what should the syntax be?
Code: ags

int array[] = new int[100];
int* arrayOfArrays[] = new int*[10]; //this?
arrayOfArrays[0] = array;
int* arrayPointer = array; //this?


Also, as you know, C++ has both pointers and references. The two are very similar but have some differences (which I won't detail here). A long time ago, Chris Jones chose to make referencing go down the "pointers" road instead of the "references" road (except in some very specific cases). I suppose that felt more natural for something that grew out of C rather than C++.
However, if I implemented references, how would you picture them? Would there be some syntax traps caused by AGS's C-like-but-not-completely syntax?
Actually, maybe it's already there. I never try to pass structs (managed or unmanaged) as parameters to functions, except to built-in ones, so I have no clue.


 

Crimson Wizard

#1
Quote from: Monsieur OUXX on Thu 16/03/2017 16:38:53
Let's imagine for a second that I branched from the AGS engine, and let's imagine that I manage to make heads or tails of the interpreter's memory management.  :-D

From what I remember, the interpreter already supports dynamic arrays inside dynamic arrays. If that is correct (I am 95% sure), then it is primarily a matter of making the compiler work with them.
But, yes, to edit either the compiler or the interpreter you would need to learn how AGS assembly language works...


Regarding references and pointers, to be frank, I am very strongly against C++-like syntax and features, and in favour of more C#- or Java-like features and syntax. When OO was introduced in AGS, the script was built around the idea of safe memory management, including a garbage collector. If you add something like real references, you open a path for all kinds of memory errors (like returning a pointer/reference to a local variable from a function, and similar).

Perhaps introducing something similar to C#'s "ref" and "out" ways of passing a variable into a function could work better... just a thought.
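Roughly speaking, C# "ref" passes the caller's variable itself, so the callee may read and modify it, and "out" is the same except that the callee must assign it before returning. As a sketch, assuming plain C++ reference parameters are an acceptable stand-in (C++ cannot enforce the "must assign" rule of "out"), with hypothetical function names:

```cpp
#include <cassert>

// "ref"-like: the callee reads and modifies the caller's variable.
void AddScore(int& score, int bonus)
{
    score += bonus;
}

// "out"-like: the callee only writes results back to the caller.
// Unlike C#, C++ does not check that these are assigned before returning.
void DivMod(int a, int b, int& quotient, int& remainder)
{
    quotient = a / b;
    remainder = a % b;
}
```

The reference only lives for the duration of the call, so as long as the callee does not store it anywhere, this avoids the "reference to a dead local variable" problem mentioned above.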

Monsieur OUXX

#2
By references I mean C# and Java-like references. Not the memory-leakish pointerish sort.

But my initial question still stands: how do we formally (syntactically) handle arrays of arrays? It doesn't exist in C# and Java (apart from 2D arrays, like int[][], which is something relatively different) because they use generics (List<int>).
In JavaScript they just do this: var array1 = []; var array2 = []; array1[0] = array2;



On an unrelated note: the new compiler (on the Editor side) is wonderfully written. The tokenizer, precompiler, etc. are really easy to read. That's pretty cool, I must say. Amongst many things, I liked the way it piles up modifiers until it encounters an actual attribute or function. It maintains the "read tokens left to right" stream without adding unnecessary states to the syntax state machine.
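To illustrate the "pile up modifiers, then apply them" idea, here is a minimal C++ sketch. It is not the actual AGS compiler code (which lives in the Editor and is written in C#); the modifier set and the Declaration type are hypothetical. Modifiers are collected into a pending set while scanning left to right and are consumed as soon as a declaration (a type followed by a name) is reached:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result of parsing one declaration.
struct Declaration
{
    std::set<std::string> modifiers; // e.g. "import", "readonly"
    std::string type;
    std::string name;
};

// Modifier keywords are piled up in a pending set; when a non-modifier token
// arrives it is taken as the type, the next token as the name, and the pending
// modifiers are attached to that declaration and cleared.
// No extra parser state is needed for the modifiers themselves.
std::vector<Declaration> ParseDeclarations(const std::vector<std::string>& tokens)
{
    static const std::set<std::string> kModifiers = {"import", "readonly", "static", "protected"};
    std::vector<Declaration> result;
    std::set<std::string> pending;

    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
        const std::string& tok = tokens[i];
        if (tok == ";")
            continue;
        if (kModifiers.count(tok))
        {
            pending.insert(tok); // keep piling up
            continue;
        }
        if (i + 1 >= tokens.size())
            break; // a type with no name: ignored in this sketch
        Declaration d;
        d.modifiers = pending;
        d.type = tok;
        d.name = tokens[++i];
        result.push_back(d);
        pending.clear();
    }
    return result;
}
```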

 

Crimson Wizard

#3
I have a question: why do you think that the int[][] syntax is not suitable here? That is what I was going to suggest at first thought.

Relating to the existing syntax: in AGS script, int* a would mean a managed pointer to an integer, and int* arr[] would mean an array of managed pointers to integers.

Compare with Character *a, which means a pointer to one Character, and Character *arr[], meaning an array of pointers to Character.

Monsieur OUXX

Quote from: Crimson Wizard on Thu 16/03/2017 19:46:08
in AGS script int* a would mean managed pointer to integer, and int* arr[] - would mean array of managed pointers to integer.
Yes, but as you know, in C/C++ an int* is both a pointer to an int and an array. The symbols [] are used "only for declaration", like a syntactic trick, but the "real" type of an array of int, to manipulate it afterwards, is int*. (That is explained very clumsily; I hope you understand what I mean.)

I don't know exactly why int[][] would not be suitable here. It's just my gut telling me that if C/C++ made "[][]" mean something else (2D arrays), then there must be a core incompatibility between those syntaxes: an ambiguity somewhere, either in parsing or in meaning, that we don't detect at first sight (you know how C++ can be tricky syntactically). But if you think I'm wrong, then why not. I do find it less confusing.

I think I'm just completely forgetting my Java. I forgot that [] is used all the time in Java.

I think my instinct is triggered by the fact that in C you cannot do this:
Code: ags

int[] newArray(int size)
{
    int[] newArray = new int[size];
    return newArray;
}

Instead you are forced to do this:
Code: c

#include <stdlib.h>

int * newArray(int size){
    int * array = malloc(sizeof(int) * size); /* caller must free() it */
    return array;
}


...But I completely forgot that the first syntax is possible in Java and C#.
However, do these languages also allow returning int[][]?



 

Crimson Wizard

#5
Quote
Yes but as you know, in C/C++, an int* is both a pointer to an int and an array. The symbols [] are used "only for declaration", like a syntactic trick, but the "real" type of an array of int, to manipulate it afterwards, is int*. (That is explained very clumsily, I hope you understand what I mean).

While I know what you mean, formally that is not true. In C/C++ int* is never anything other than a pointer. Indeed it may be used as a pointer to the first element of an array, which allows accessing the whole array with certain offsets, but that does not give "int*" a double meaning. It has only one meaning: a pointer to an "int" in memory. Nothing in it tells you that there is an array. There may be one or not... in fact, there may be a completely unrelated object there, or even gigabytes of unallocated memory :tongue:.


"[]" syntax can be used for a function parameter, although AFAIK it is not used very often (people usually prefer either a pointer + size, or a reference to a struct/class that groups these, e.g. std::vector):
https://www.tutorialspoint.com/cprogramming/c_passing_arrays_to_functions.htm
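To make the points above concrete, a small C++ sketch (the function names are made up for illustration): a real array type knows its length, but once it is handled as int* only an address is left, and a "[]" parameter is silently adjusted to a pointer. That is why pointer + size, or a container such as std::vector, is the usual choice.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// A real array type keeps its length as part of the type.
template <std::size_t N>
std::size_t ElementCount(const int (&)[N])
{
    return N;
}

// An int* carries only an address; sizeof gives the pointer's own size.
std::size_t SizeSeenThroughPointer(const int* p)
{
    return sizeof(p);
}

// "int values[]" in a parameter list is adjusted to "int* values",
// so the caller must pass the length separately.
int SumArray(int values[], std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}
static_assert(std::is_same<decltype(SumArray), int(int*, std::size_t)>::value,
              "an array parameter is really a pointer parameter");

// std::vector groups the data and its size, so no separate count is needed.
int SumVector(const std::vector<int>& values)
{
    int total = 0;
    for (int v : values)
        total += v;
    return total;
}
```

The same int* parameter accepts a pointer into an array or a pointer to a lone int; nothing in the type distinguishes the two.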


On the other hand, like I said, I do not like comparing AGS Script to C/C++, because in my opinion it has more of C# or Java in how it deals with pointers. It's just that Chris Jones decided to declare pointers with "*", which is a kind of syntax mix, and may be confusing if you start comparing it to known languages.
What I mean is that I do not see much reason to look at C when planning AGS syntax and ask what something would mean in C, because AGS script is really not C.

I do not remember Java well, but in C# the "int[][][]" syntax means "array of arrays", where each nested array can be of a different size.
Meanwhile the "[,,,]" syntax means a multidimensional array (where all dimensions are determined at the moment of creation):
http://stackoverflow.com/questions/12567329/multidimensional-array-vs
https://msdn.microsoft.com/en-us/library/2yd9wwz4.aspx
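C++ has no direct counterpart to C#'s int[][] vs int[,], but the distinction can be sketched with standard containers, as an analogy only: a vector of vectors is jagged (each row has its own length), while a nested std::array is rectangular. The std::array version is even stricter than C#, since its dimensions are fixed at compile time rather than when the array is created.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Jagged, like C# int[][]: each inner array is a separate object with its own length.
std::vector<std::vector<int>> MakeJagged()
{
    std::vector<std::vector<int>> rows(3); // three empty rows
    rows[0] = {1};
    rows[1] = {1, 2, 3};
    // rows[2] stays empty, which is fine too
    return rows;
}

// Rectangular, like C# int[,]: both dimensions are fixed,
// so every row has exactly the same length.
using Grid = std::array<std::array<int, 4>, 2>; // 2 rows x 4 columns

std::size_t TotalCells(const Grid& g)
{
    std::size_t n = 0;
    for (const auto& row : g)
        n += row.size();
    return n;
}
```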

Looking at your script example in the first post, an array of arrays is probably what you want.
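In a managed language, putting an array into another array stores a reference, not a copy, so arrayOfArrays[0] and array end up naming the same object. Assuming std::shared_ptr as a rough stand-in for an AGS managed pointer (an approximation; this is not how the engine implements it), the first-post example maps to something like:

```cpp
#include <cassert>
#include <memory>
#include <vector>

using IntArray = std::shared_ptr<std::vector<int>>; // stand-in for a managed int[]

// Mirrors the first-post script:
//   int array[] = new int[100];
//   int* arrayOfArrays[] = new int*[10]; //proposed syntax
//   arrayOfArrays[0] = array;
std::vector<IntArray> BuildArrayOfArrays(IntArray array)
{
    std::vector<IntArray> arrayOfArrays(10); // 10 null references
    arrayOfArrays[0] = array;                // stores a reference, not a copy
    return arrayOfArrays;
}
```

Writing through arrayOfArrays[0] is visible through array, which is the behaviour the JavaScript example above also has.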



//----------------------------------------------------------------------

EDIT: A day later I am wondering if my argument about C/C++ pointers is clear, so I decided to add a little elaboration:
Spoiler
What I mean is that the differences between C/C++ and C# syntax are very much related to the difference between unmanaged and managed memory.
In C/C++ there is no such built-in thing as an "array" object whatsoever. Even a static array (int a[10];) is just a chunk of memory allocated on the stack (or in static memory, for a global).
"int *a" is just a pointer to some memory address, and neither the compiler nor the runtime environment can tell what the memory there means: whether it is a single variable, an array, a struct, etc. The program does not care. Only the programmer knows what it is supposed to be (or rather... assumes).

For instance, you can do this:
Code: cpp

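struct MyStruct
{
    int a,b,c,d,e,f,g;
};

int main()
{
    MyStruct obj = {1, 2, 3, 4, 5, 6, 7};
    int *a = &obj.a;
    int f = a[5]; // reads obj.f in practice, though formally this is undefined behavior
    return f;
}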

And that will WORK! Even though I did not create any arrays.

The conceptual difference between unmanaged and managed memory languages is:
- In unmanaged language YOU (the programmer) tell the program how to treat the pointed memory, and machine abides, as much as it can without breaking something.
- In managed language PROGRAM tells you what the pointer is pointing to. And it protects such object from misuse (to some degree).

This is why there is not, and cannot be, any "array" type of object in unmanaged memory. There are simply "chunks of bytes", and various ways to interpret them.
That is the main reason why I think it is not a good idea to refer to C or C++ when developing AGS script.
[close]
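A rough C++ illustration of "the program protects such an object from misuse (to some degree)": std::vector::at checks the index against the length the container knows and throws std::out_of_range, while a raw pointer has no length to check against. This mimics only one aspect of managed memory (bounds checking); it is not garbage collection, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: the container knows its own length and refuses bad indices,
// similar in spirit to a managed runtime reporting "index out of bounds".
bool TryReadChecked(const std::vector<int>& v, std::size_t index, int& out)
{
    try
    {
        out = v.at(index);
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}

// Unchecked access through a raw pointer: nothing stops an out-of-range index.
// Reading past the end is undefined behavior, so only call this with valid indices.
int ReadUnchecked(const int* p, std::size_t index)
{
    return p[index];
}
```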

Monsieur OUXX

You're very clear.
In the back of my head I have a vague idea that some day the use of "*" should be completely removed, to stop making it look like these are pointers, since they are more like C#/Java references.
Anyway, back on topic: I'll toy around in C# with [][] to see if I'm not overlooking something (I usually use List<int>) and then see what I can do.
On the engine side, the code is much less clear to me (not to mention it's C++, with which I used to be much more comfortable a long time ago). Maybe you could point me towards the place where "a pointer to int" is read (and interpreted) from the compiled script?
 

Crimson Wizard

#7
Quote from: Monsieur OUXX on Fri 17/03/2017 09:28:50
On the engine side, the code is much less clear to me (not to mention it's C++, with which I used to be much more comfortable a long time ago). Maybe you could point me towards the place where "a pointer to int" is read (and interpreted) from the compiled script?

I have been planning to write an explanation of AGS byte-code and the script interpreter for a long time; I hope to do some parts this weekend. The main problem is that there is no explicit "get value by this pointer" action, but rather a sequence of instructions emulating how an x86 processor works.
That looks like something very scary at first, but I believe it is possible to understand over time, especially with a reference chart of some kind.
But... I'd rather not go too far in there until I have an opportunity to write an elaborate text.
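To show what "no explicit get-value-by-pointer action, but a sequence of instructions" means, here is a toy register machine in C++. The opcodes, registers and memory layout are invented for illustration and do not match AGS's real SCMD_* instruction set; the point is only that reading a value through a pointer takes several small steps (load the address into a register, then read memory at that address).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes, loosely in the style of a simple CPU.
enum class Op : std::uint8_t
{
    LoadConst, // reg[a] = b
    LoadMem,   // reg[a] = memory[reg[b]]
    AddReg,    // reg[a] += reg[b]
    Halt
};

struct Instr
{
    Op op;
    int a;
    int b;
};

// Executes the program and returns register 0 when it halts.
int Run(const std::vector<Instr>& program, const std::vector<int>& memory)
{
    int reg[4] = {0, 0, 0, 0};
    for (std::size_t pc = 0; pc < program.size(); ++pc)
    {
        const Instr& in = program[pc];
        switch (in.op)
        {
        case Op::LoadConst: reg[in.a] = in.b; break;
        case Op::LoadMem:   reg[in.a] = memory[reg[in.b]]; break;
        case Op::AddReg:    reg[in.a] += reg[in.b]; break;
        case Op::Halt:      return reg[0];
        }
    }
    return reg[0];
}
```

For example, if a pointer variable is stored at address 2, reading the value it points to takes three instructions: load the constant 2 into a register, load memory at that register (the pointer value), then load memory at the pointer value. Indexing an array element adds an offset with AddReg before the final load.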

Monsieur OUXX

#8
Found it: ccInstance::Run

Something that puzzles me is this:
If the purpose of the byte codes is to map x86 instructions, or some specific CPU's instructions, then I suppose that originally CJ would let DOS or some other low-level system execute that byte code directly where possible (by copying the instructions into the registers, or through DOS interrupts?) and execute some "meta" code the rest of the time.

But nowadays that doesn't make much sense, does it? It's like writing an assembly compiler manually. Wouldn't it be simpler, considering AGS syntax is almost exactly C++, to let the compiler compile it to some sort of DLL with a big interface containing all our custom script functions, which the engine could just call? We wouldn't have to care about syntax or optimization or "escape" instructions at all anymore.

(Oh, by the way, don't worry, I'm not having a rewrite-itis attack. I'm not saying that everything should be thrown in the bin. I'm just wondering, that's all.)

 

Crimson Wizard

#9
Quote from: Monsieur OUXX on Fri 17/03/2017 13:18:51
But nowadays that doesn't make much sense, does it? It's like writing an assembly compiler manually. Wouldn't it be simpler, considering AGS syntax is almost exactly C++, to let the compiler compile it to some sort of DLL with a big interface containing all our custom script functions, which the engine could just call? We wouldn't have to care about syntax or optimization or "escape" instructions at all anymore.

Uhhh, you keep saying that AGS is almost like C++, but it is not! Syntax is not everything. C++ is an unmanaged language, but AGS script is managed (well, relatively).

Anyway, there are two issues here.

Firstly, having a script interpreter written in an x86-processor fashion did not make much sense from the very beginning, because the instruction list gets bloated and execution is slower than it could be. For example, Lua script run with the "Lua for AGS" plugin is faster, because Lua's engine is actually optimized for its script language.
I don't know what Chris Jones's plans were; I tend to assume that he simply thought reusing an existing execution model (the x86 processor's) would make it easier for him to program.

BTW, I do not know if instructions in AGS script actually map to x86 instructions by their codes.


Regarding compiling as DLL... this is like saying: let's write AGS scripts as C++ plugins. Yes, it's an option, Unreal Engine 4 does something like that.
Consequences are:
- you screw over every AGS user who does not know how to program in C++;
- game developers will have to compile their game for exactly the system that needs to run it (because, native code...).

EDIT ..Hmmm.. I just realized that by "let the compiler compile it to some sort of DLL" you could mean something different?


On the other hand, we could allow AGS scripts to be written in C#, like Unity does. Or we could take the Lua-for-AGS plugin and incorporate it so that AGS is scripted in Lua from now on.
At one point I mentioned the possibility of making a "script interpreter" plugin interface that would let you attach other script engines and write scripts in anything you want.
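One possible shape for such a "script interpreter" plugin interface, sketched in C++. Nothing like this exists in AGS as described here; the class and method names (IScriptEngine, LoadScript, CallFunction, EngineCallScript) and the toy SumEngine are hypothetical, meant only to show the engine talking to any language through one interface.

```cpp
#include <cassert>
#include <memory>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// Hypothetical interface the engine would call; each plugin (Lua, C#, AGS script...) implements it.
class IScriptEngine
{
public:
    virtual ~IScriptEngine() = default;
    virtual bool LoadScript(const std::string& moduleName, const std::string& source) = 0;
    // Calls a function such as "repeatedly_execute"; returns false if it does not exist.
    virtual bool CallFunction(const std::string& name, const std::vector<int>& args, int& result) = 0;
};

// Toy engine for testing: the "source" is just the name of one function,
// and calling that function returns the sum of its arguments.
class SumEngine : public IScriptEngine
{
public:
    bool LoadScript(const std::string& /*moduleName*/, const std::string& source) override
    {
        functions_.insert(source);
        return true;
    }
    bool CallFunction(const std::string& name, const std::vector<int>& args, int& result) override
    {
        if (functions_.count(name) == 0)
            return false;
        result = std::accumulate(args.begin(), args.end(), 0);
        return true;
    }

private:
    std::set<std::string> functions_;
};

// Engine side: only the interface is visible, not the language behind it.
bool EngineCallScript(IScriptEngine& engine, const std::string& name, const std::vector<int>& args, int& result)
{
    return engine.CallFunction(name, args, result);
}
```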



Monsieur OUXX

#10
I said C++ but indeed I meant C# or any language that is "managed". (I keep bringing up C++ because AGS brings back a flavor of old C).
Apart from that, you got me: yes, writing AGS scripts would be like writing plugins (in a managed language!) EXCEPT that we could keep the AGS compiler as a sort of syntax checker that would force the use of only basic C# features that are very similar to the current AGS language. Any language feature that could cause risks of memory leaks or unnecessarily complex debugging would be forbidden, to keep it a script-like language.
Imagine the Lua plugin, just with C# syntax if you will.

But then again, I'm just daydreaming. I'm not pushing for that kind of deep refactoring. Unless you think that incorporating the Lua plugin (or any C-like syntax language -- Lua is weird) is within reach. Then yes, that would be marvellous and, in my humble opinion, the best option on the long run.
 

Crimson Wizard

I posted a couple of articles to the project Wiki today that may be useful for researching how script code works. One of them is a rewrite of short notes previously created by JJS:
https://github.com/adventuregamestudio/ags/wiki/Script-memory
https://github.com/adventuregamestudio/ags/wiki/Script-execution

Monsieur OUXX

#12
I'm having a look. Very interesting!

I've seen in the code that it refers to "instances".
Just to be sure: at one given time of the execution of a game, how many instances are there?
- Is it correct to say that these "instances" have nothing to do with the potential multiple execution threads?
- Or are they precisely these execution threads?
 

Crimson Wizard

#13
Quote from: Monsieur OUXX on Mon 20/03/2017 12:30:24
Just to be sure: at one given time of the execution of a game, how many instances are there?
- Is it correct to say that these "instances" have nothing to do with the potential multiple execution threads?
- Or are they precisely these execution threads?

They are, in a sense, potential execution threads; one is created per script module, but AFAIK only one runs at a time. I recall seeing some code related to multithreading plans, but it was never actually used that way.

The functions like game_start, repeatedly_execute, etc. are called with "RunScriptFunctionIfExists" and "DoRunScriptFuncCantBlock"; if you find them in the code you will see that they are run in strict sequence, one module at a time.

The only case in which this instance ID is used is the SCMD_CALLAS instruction (calling a script function from another module), so this ID is actually the module ID. And, ironically, even then execution does not switch to another instance; instead only the code address is switched in the current instance, and it continues executing the "external" block of code.
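A simplified C++ model of the behaviour described above, under these assumptions: one instance per script module, callbacks run module by module in sequence, and a cross-module call (the SCMD_CALLAS case) does not switch instances but only points the current instance at the other module's code. The types and functions here (Module, Instance, RunInEachModule, CallFarFunction) are hypothetical and far simpler than the engine's ccInstance.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical module: its "code" is a table of named functions.
struct Module
{
    std::string name;
    std::map<std::string, std::function<void(Log&)>> functions;
};

// Hypothetical instance: one per module; remembers whose code it is executing right now.
struct Instance
{
    const Module* home;     // module this instance was created for
    const Module* codeFrom; // module whose code is currently being run
};

// Like calling a callback in every module: strictly one module after another.
void RunInEachModule(std::vector<Instance>& instances, const std::string& fn, Log& out)
{
    for (Instance& inst : instances)
    {
        auto it = inst.home->functions.find(fn);
        if (it != inst.home->functions.end())
            it->second(out);
    }
}

// Cross-module call: no other instance becomes active. The current instance
// switches to the other module's code, runs it, and switches back.
void CallFarFunction(Instance& inst, const Module& target, const std::string& fn, Log& out)
{
    const Module* saved = inst.codeFrom;
    inst.codeFrom = &target;
    out.push_back(inst.home->name + " instance runs " + inst.codeFrom->name + "::" + fn);
    target.functions.at(fn)(out);
    inst.codeFrom = saved;
}
```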


PS: It also looks like some things in those articles became obsolete after we rewrote parts of the interpreter. I will fix them later.

Monsieur OUXX

So in effect it acts more like a "scope".
 
