So, let's make the sequence irrelevant in which functions are defined

Started by fernewelten, Fri 01/02/2019 00:45:35

Previous topic - Next topic

fernewelten

Currently, functions in AGS must be ordered with an important restriction in mind: At every point in the source code, we can only call functions that have been defined above that point. This is unfortunate in two ways:

  • Firstly, many programmers like to order the functions differently, for instance to keep similar functions near each other.
  • Secondly, in many algorithms there are functions that mutually call each other: a function A calls a function B and function B calls function A â€" this is known as "mutual recursion". With the restriction as it is now, we can't use any of these algorithms!

In case you're wondering: There's a real-life use case for such algorithms: Right in my very first AGS program ("Commando raid" for the MAGS 2018/5 competition) I wanted to code a maze dynamically. The easiest way to do it was using functions that called each other.

To allow for this, other languages such as C or C++ have a mechanism that says, in effect, “There's a function that I will define later on below; but right now, I'm only telling you its parameters so that I can code calls for it.” That's a forward declaration.

I found out to my delight that the AGS language have a forward declaration, too, using the import keyword:
Code: ags
import function A(int);
means that a function A will follow somewhere below that takes one parameter that is an int.

But I was foiled all the same: Turns out that even though AGS does have forward declarations, you still need to order the functions in such a way that any function that calls another one must have its body completely defined below the body of that other one! (wtf)  What a shame!

So why is this so hard for the AGS compiler?

(Technical reason below...)
Spoiler
There are two quite different uses for the import keyword. On the one hand, it can be a forward declaration as described above - it tells the parameters of a local function that will be defined below later on. On the other hand, it can be a declaration that an external function has been defined and exported in another module, so it must be retrieved from there when the program is bundled together.

So when an "import" comes along, the compiler must guess which meaning is meant. And it guesses that an external, imported function is declared. And it prepares everything for an import in all kinds of lists and tables.

Later on, when the compiler encounters an identically named local function that has a body, it goes, “OOPS!! Eeek! Oh my goodness. I screwed up back then. It had been a forward declaration after all. Let's sweep all those wrong preparations under the rug and pretend that that import never happened.” Then it hides all the traces as well as it can.

Trouble is that under the hood, an external function is called in a completely different way than a local function.  So if that “imported” function has already been called when it turns out it was local after all, it's too late for the compiler to clean up the mess. So that's a technical reason why the compiler insists that all the local functions must have been defined with their bodies before they are called.
[close]

How can we fix that?

(Proposed technical solution below)
Spoiler
Let's first eliminate the “oops” effect, shall we? The compiler would need to quickly scan the input to see what functions are defined locally; then it can process the input as before. Whenever it encounters an “import” statement, it would now know whether it was an import proper or a forward declaration in disguise. So it can avoid doing the wrong preparations from the get-go.

Granted, the compiler would still only know where the local method starts when it had reached and compiled it. So whenever the function is called before that point, the compiler would need to note down somewhere the position of the call in the code. When the body of the function comes along in good time, it could go through its list and patch the address in that it now knows.

We're still not in the clear, because from time to time, the compiler will emit some code, then rip it out of the codebase, then insert it at adifferent place - or even several times at several different places. So the code that keeps that list of calls to be patched needs to be aware of this activity.
[close]

Some snags with this idea:
Spoiler
This approach will only enable arbitrary ordering for functions that reside in the same compilation unit. The reason is that modules must be ordered in such a way that any module that calls another module must be placed below that other module. So you can't have mutual recursion that spans modules.

Also, as of now, you can't have forward declarations in room files -- that is expressly forbidden (I haven't found out the rationale yet, but there's sure to be a very good one). So you can't have mutual recursion in room files, it will only work in globaldata or in custom modules.
[close]


I can offer a “proof-of-concept” implementation on https://github.com/fernewelten/ags in the branch "Mutual_recursion".

(Details below)
Spoiler
It's lined up behind an oversized “monster” pull request that has unfortunately gotten rather huge (I'm at fault for that) and so will take its time, so I'm leaving the proof-of-concept for mutual recursion on my fork for the moment.

I've centralized all the main functionality in the class FuncCallpointMgr of files cs_parser.cpp / cs_parser.h
[close]
What do you think?


fernewelten


Crimson Wizard

#3
Being able to not depend on functions order is of course long wanted feature. I am not sure about the feature title "mutual recursion" though, because that's just one use case, and even more common use case is simply being able to put functions in order of your preference and not bother about it. But that's just my ramblings.

To be honest I am not fully sure what exactly do you suggest from the user perspective. You say:
Quote from: fernewelten on Fri 01/02/2019 00:45:35
The compiler would need to quickly scan the input to see what functions are defined locally; then it can process the input as before. Whenever it encounters an “import” statement, it would now know whether it was an import proper or a forward declaration in disguise. So it can avoid doing the wrong preparations from the get-go.

Granted, the compiler would still only know where the local method starts when it had reached and compiled it. So whenever the function is called before that point, the compiler would need to note down somewhere the position of the call in the code. When the body of the function comes along in good time, it could go through its list and patch the address in that it now knows.

So, do I understand correctly that this will work without any forward declarations for local functions (i.e. C#-style as opposed to C-style, so to speak)?


Quote from: fernewelten on Fri 01/02/2019 00:45:35
Also, as of now, you can't have forward declarations in room files -- that is expressly forbidden (I haven't found out the rationale yet, but there's sure to be a very good one).

Hmm can you elaborate on this, what commands exactly do not work in room script?

fernewelten

#4
Quote from: Crimson Wizard on Sat 02/02/2019 17:12:53
To be honest I am not fully sure what exactly do you suggest from the user perspective. […]
So, do I understand correctly that this will work without any forward declarations for local functions (i.e. C#-style as opposed to C-style, so to speak)?

We _could_ have it this way with moderate effort I think.

So far, my proof-of-concept implementation only adds a "pre-phase" where it skims quickly through the source and collects the names all the locally-defined functions it can find. After the "pre-phase", it backs up to the start of the code and hands over to the old parser code.

We could extend that pre-phase and not only collect the names of the functions, but their headers too. In essence, whenever there's an "int foo(short bar) {" in the code, then don't just collect the "foo", but instead collect the whole "int foo(short bar)". Then, when the "old" parser starts after the pre-phase, it already knows all about all the local functions (except where they start) and can generate proper code when they are called. The sequence of function definitions wouldn't matter any more.

We might need to watch out for strange edge cases if enums or struct definitions are interspersed with the functions, I don't know for sure, but if there are, there will be a way to handle them.

Quote from: fernewelten on Fri 01/02/2019 00:45:35
Also, as of now, you can't have forward declarations in room files -- that is expressly forbidden (I haven't found out the rationale yet, but there's sure to be a very good one).

Quote from: Crimson Wizard on Sat 02/02/2019 17:12:53
Hmm can you elaborate on this, what commands exactly do not work in room script?

That was a false alarm, I think. I found in the code that called the script compiler (file Editor\AGS.Native\ScriptCompiler.cpp around line 75) an option "NOIMPORTOVERRIDE":
Code: c#

ccSetOption(SCOPT_EXPORTALL, 1);
ccSetOption(SCOPT_LINENUMBERS, 1);
// Don't allow them to override imports in the room script
ccSetOption(SCOPT_NOIMPORTOVERRIDE, isRoomScript);
ccSetOption(SCOPT_OLDSTRINGS, !game->Settings->EnforceNewStrings);
if (exceptionToThrow == nullptr)
{
    scrpt = ccCompileText(mainScript, mainScriptName);


However, checking the code of the script compiler itself (including the original one, before I rewrote it), it doesn't query "SCOPT_NOIMPORTOVERRIDE" so the flag is ignored. The context shows that the flag was once used for forbidding that an import declaration was followed by a local function definition after all.


Crimson Wizard

#5
Quote from: fernewelten on Mon 04/02/2019 02:00:09
Quote from: Crimson Wizard on Sat 02/02/2019 17:12:53
To be honest I am not fully sure what exactly do you suggest from the user perspective. […]
So, do I understand correctly that this will work without any forward declarations for local functions (i.e. C#-style as opposed to C-style, so to speak)?

We _could_ have it this way with moderate effort I think.

Yes, that would be awesome if that's going to work!

Quote from: fernewelten on Mon 04/02/2019 02:00:09
Quote from: Crimson Wizard on Sat 02/02/2019 17:12:53
Quote from: fernewelten on Fri 01/02/2019 00:45:35
Also, as of now, you can't have forward declarations in room files -- that is expressly forbidden (I haven't found out the rationale yet, but there's sure to be a very good one).
Hmm can you elaborate on this, what commands exactly do not work in room script?

That was a false alarm, I think. I found in the code that called the script compiler (file Editor\AGS.Native\ScriptCompiler.cpp around line 75) an option "NOIMPORTOVERRIDE":
Code: c#

ccSetOption(SCOPT_EXPORTALL, 1);
ccSetOption(SCOPT_LINENUMBERS, 1);
// Don't allow them to override imports in the room script
ccSetOption(SCOPT_NOIMPORTOVERRIDE, isRoomScript);
ccSetOption(SCOPT_OLDSTRINGS, !game->Settings->EnforceNewStrings);
if (exceptionToThrow == nullptr)
{
    scrpt = ccCompileText(mainScript, mainScriptName);


However, checking the code of the script compiler itself (including the original one, before I rewrote it), it doesn't query "SCOPT_NOIMPORTOVERRIDE" so the flag is ignored. The context shows that the flag was once used for forbidding that an import declaration was followed by a local function definition after all.

Only my guess, but maybe this is to prevent declaring room functions with import keyword? Since there are no other modules that can read and call these except for same room script itself.

fernewelten

Quote from: Crimson Wizard on Mon 04/02/2019 10:36:12
Yes, that would be awesome if that's going to work!

I've looked at the parser code again and I think that the parts that handle function headers are fairly easy to relocate and repurpose. So I suggest I'll have a go at making the sequence irrelevant in which functions are defined.

For now, this would be strictly within each compilation unit.

SMF spam blocked by CleanTalk