Adventure Game Studio

AGS Support => Advanced Technical Forum => Topic started by: Crimson Wizard on Sun 09/04/2023 01:42:22

Title: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Sun 09/04/2023 01:42:22
This question is partly about how these languages are written and partly about a technical issue in AGS.

There are a number of languages that are written right-to-left (Arabic, Persian, Hebrew, and a few others).

AGS has a built-in "Right to left" text mode, but it looks like it was meant strictly for ASCII text and is not fully usable with Unicode texts. The thing is that Unicode texts may already be written right-to-left (they may contain control characters for this, if I understand correctly).

The problem with AGS, though, is that the way it wraps text assumes the text was originally written left-to-right (in script or translation files). In right-to-left mode the engine wraps the text the same way as for left-to-right, but then reverses each line separately.

This does not work at all with "proper" R-to-L Unicode texts. As they are already reversed, the line splitting may result in the beginning of the sentence appearing on the last line.
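To make the failure concrete, here is a minimal C++ sketch of the wrap-then-reverse behaviour described above, applied once to text stored in reading order and once to text stored pre-reversed (display order). It is not the engine's code: widths are counted in characters rather than pixels, ASCII letters stand in for Arabic/Persian ones, and `wrapWords`/`legacyRtlLines` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap that always scans forward from index 0. Width is counted in
// characters for simplicity; the engine measures pixels with the current font.
std::vector<std::string> wrapWords(const std::string& text, std::size_t width) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, current;
    while (words >> word) {
        if (current.empty()) {
            current = word;                 // a too-long word still gets its own line
        } else if (current.size() + 1 + word.size() <= width) {
            current += ' ' + word;
        } else {
            lines.push_back(current);
            current = word;
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}

// Wrap as if the text were LTR, then reverse each line on its own. The order
// of the lines is not changed, so lines[0] is still the one drawn at the top.
// (std::reverse flips bytes: fine for ASCII stand-ins, but UTF-8 text would
// have to be reversed by code point.)
std::vector<std::string> legacyRtlLines(const std::string& text, std::size_t width) {
    std::vector<std::string> lines = wrapWords(text, width);
    for (std::string& line : lines) std::reverse(line.begin(), line.end());
    return lines;
}
```

With reading-order input, `legacyRtlLines("ab cd ef", 5)` gives `{"dc ba", "fe"}`: read right to left, the top line starts with `ab`, as intended. With the same text pre-reversed, `legacyRtlLines("fe dc ba", 5)` gives `{"cd ef", "ab"}`: the beginning of the text ends up on the last line.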

My first thought was that the fix should simply be to reverse the order of the split lines instead.
Or, more correctly: do the line splitting by scanning the text in reverse.

But then I started considering things like multiple sentences in the same string, or even multiple paragraphs in the same string, and got very confused.

My question is: suppose you have 2 sentences in, say, Arabic or Persian. How are those 2 sentences normally written? Is their order also reversed? If yes, then the above fix will work... If not, then it won't, because you will need to have the first sentence arranged first (which may take multiple lines), then the second, and so on.

And then there's the case when you have a number of paragraphs in one string. Obviously, the order of paragraphs must not be reversed. But if a single sentence inside a paragraph is wrapped, then the wrapping of that particular sentence will have to be reversed...
In other words, there may be groups of lines wrapped in reverse order, but groups themselves have to be arranged in original order...

So, thinking about this, I came to the idea that there's no trivial generic solution here.

Or rather, the only "trivial" generic solution is to write R-to-L languages completely left-to-right in the source texts, and let the engine reverse them after splitting them into lines.

Without this, the engine would have to parse the text using punctuation, finding where the sentences are (separated by full stops) and where the paragraphs are (separated by manual linebreaks).
Well, this is not impossible, but nothing like that has been done in the AGS engine before.

Does anybody have ideas on this?
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Sun 09/04/2023 07:33:22
I think you've confused yourself there. I am 99% sure that in RTL scripts, everything works about the same as in LTR scripts, except mirrored.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Sun 09/04/2023 12:58:42
I was discussing this late at night on Discord, so I may have had strange ideas. I will try to explain my confusion with images.

If you type Arabic text, using google translation for example, this is how the text appears:
(https://i.imgur.com/HNfA8hp.png)

In English this says "I love beaches. The water is warm."
Notice there are 2 sentences. Not only are the words and letters inside a sentence reversed R->L, but the sentences themselves are also in R->L order.

Now, if I put this full text into the AGS script as-is, it will look like this:
(https://i.imgur.com/gMLlHga.png)

Is this how it's supposed to be in the script and translation files?

Or should we require the sentences in L->R order, and only the text within each sentence R->L? Like:
(https://i.imgur.com/2uuxud5.png)


Now, let's assume that there are paragraphs in this string.
(https://i.imgur.com/IsRQxQV.png)
In English this says: "I love beaches. The water is warm. \n The sun is high. \n"

How do we treat these sections separated by \n? In R->L or L->R order?





Regarding line splitting and text wrapping: if everything is treated R->L (sentences are R->L and paragraphs are R->L), then we basically just need a reverse splitting algorithm, that is: run through the string from back to front and split it that way.
That's the easiest solution.
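A minimal C++ sketch of that reverse splitting, assuming everything is stored pre-reversed: characters, sentences and paragraphs alike, so the last paragraph in memory is the first one read. Widths are counted in characters instead of pixels, ASCII letters stand in for Arabic/Persian ones, and `wrapBackward`/`splitOn` are hypothetical names, not engine functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `s` on a separator character, keeping empty pieces.
static std::vector<std::string> splitOn(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0, pos;
    while ((pos = s.find(sep, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    parts.push_back(s.substr(start));
    return parts;
}

// Wraps pre-reversed (display-order) text by scanning from the end of the
// string towards index 0. Paragraphs ('\n') are also visited last-to-first.
// Each line keeps the stored order of its characters and is drawn as-is,
// right-aligned; lines are returned top to bottom.
std::vector<std::string> wrapBackward(const std::string& text, std::size_t width) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::vector<std::string> paragraphs = splitOn(text, '\n');
    for (auto p = paragraphs.rbegin(); p != paragraphs.rend(); ++p) {
        std::vector<std::string> words = splitOn(*p, ' ');
        std::string current;
        for (auto w = words.rbegin(); w != words.rend(); ++w) {
            if (w->empty()) continue;              // skip repeated spaces
            if (current.empty()) {
                current = *w;
            } else if (w->size() + 1 + current.size() <= width) {
                current = *w + ' ' + current;      // prepend: we are moving leftwards
            } else {
                lines.push_back(current);
                current = *w;
            }
        }
        lines.push_back(current);                  // an empty paragraph yields an empty line
    }
    return lines;
}
```

Reading each returned line right to left recovers the text in order: for `"hg\nfe dc ba"` (the pre-reversed form of `"ab cd ef\ngh"`) at width 5 the lines are `"dc ba"`, `"fe"`, `"hg"`.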



After this there will still be a logical problem of string concatenation, e.g.:
Code (ags) Select
String s = "احب الشواطئ."; // I love beaches.
s = s.Append(" الماء الحار."); // The water is warm.
s = s.Append(" الشمس عالية."); // The sun is high.
This will be displayed in the wrong order on screen.
But I guess we cannot do anything about it, unless we introduce an R->L-aware "String.Concatenate" function?
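Whether these `Append` calls come out in the wrong order depends on how the strings are stored. If the literals are in reading order, `Append` already does the right thing (Snarky's point 3 below); the problem only appears for pre-reversed (display-order) strings. Because reverse(a + b) == reverse(b) + reverse(a), an R->L-aware concatenation of pre-reversed strings has to prepend. A minimal C++ sketch; `appendPreReversed` and `reversed` are hypothetical helpers, not AGS functions, and the test uses ASCII stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Both arguments hold text in display order (already reversed, e.g. by a
// converter with its "reverse" option on). Appending in reading order means
// putting the new piece in front of the existing one in memory.
std::string appendPreReversed(const std::string& storedSoFar, const std::string& storedPiece) {
    return storedPiece + storedSoFar;
}

// Builds the display-order form of ASCII sample text.
// (Byte-wise reversal; real UTF-8 text would have to be reversed by code point.)
std::string reversed(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}
```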
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Sun 09/04/2023 16:20:28
These sound like problems with the way AGS and the AGS editor treat RTL strings. To me it seems like the correct behavior is quite clear in each case:

1. An RTL string has its beginning on the right and its end on the left, so obviously the first sentence goes first (that is, on the right).

2. And it is processed from beginning (right) to end (left), just like an LTR string is (from beginning to end, so in that case left to right). OK, so there is a question of how the '\n' separator is parsed, but the way you have written it here, I would guess that there must be an invisible control code that switches from RTL to LTR for that bit, so that it's read:

[5:LTR "\n"],[4:RTL "The sun is high."],[3:LTR "\n"],[2:RTL "The water is warm."],[1:RTL "I love beaches."]

If there is no such control character, and it's all part of the RTL string, I would guess you'd type '\' and then 'n' as normal, but your cursor would be moving "backwards" so that it would appear onscreen as "n\".

3. Here I think you're making the incorrect assumption that an RTL string still begins on the left and ends on the right, and that its internal representation has simply been reversed, so that something concatenated to it is added to its right side. I believe if the strings have been correctly set to RTL, concatenating them will make the added part show up on the left. Similarly, if you took s.Chars[0] in your first example it should return the rightmost character (the one that looks like a bar, '|').

Now, it could very well be that the AGS editor doesn't properly support RTL strings in string literals, so that this bears little resemblance to how it actually works in AGS, but intuitively this is how I think it should treat them. Though it's probably worth verifying that with someone who actually uses an RTL script and is familiar with how it is embedded and edited in otherwise LTR text.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Sun 09/04/2023 17:46:00
So the important question is how the characters of these strings are positioned in the char array.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Mon 10/04/2023 08:31:14
OK, so I've done a minimal amount of research to check my intuitions, and it does look like I was right.

The correct order in the array is always the reading order. String.Chars[0] is the start of the text, and whether that first symbol is displayed on the left or right end of the string depends on the script. If the text is totally RTL, this is no more complex than in LTR: you can use the same layout logic, only mirrored. It does get more complex if you have a mix of text chunks going in different directions (bidirectional text), such as when RTL strings are embedded in LTR code. Unicode has an algorithm for determining the layout in those cases: http://unicode.org/faq/bidi.html

If we hack RTL by just reversing the strings so that .Chars[0] is the end of the text, that is going to break ligatures etc., as @Mehrdad reports.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Mon 10/04/2023 09:03:43
EDIT: scrap this, I need to check something out when I have more spare time.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Mon 10/04/2023 15:05:03
Quote from: Snarky on Mon 10/04/2023 08:31:14The correct order in the array is always the reading order.

Before you deleted it, @Crimson Wizard, you asked what this was based on and if I had tested it in AGS. I have not tested it in AGS, because I'm talking generally about how it is meant to work (specifically in Unicode), whether or not that is how it is currently implemented in AGS.

The Unicode document I linked to states it explicitly or implicitly in a few places (e.g. "No matter how the layout is resolved the order of characters in memory essentially follows the order they are typed."), but I also think it's kind of true more or less by definition: A text string starts at the beginning and ends at the end, so that the first symbol (apart from any metadata like control characters) is the one at index 0. The layout of the string for display is a secondary consideration or formatting convention that is not strictly part of the string itself, comparable to things like font face and line-wrapping. (And like for line-wrapping, it is sometimes necessary or desirable to embed hints/overrides within the string representation to get the correct/preferred appearance, e.g. non-breaking spaces, optional hyphens, explicit line breaks.)

Now, that said, for all I know it is possible that some tools we want to support (perhaps indeed the Scintilla editor) do provide "RTL" text by just reversing the string and treating that as a LTR string. If so, I think some (compile-time?) process to convert it back to the correct representation is required, if we want to do it right.

BTW, does TextBox support RTL input?
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Tue 11/04/2023 20:21:51
@Snarky, I did more actual tests and you are correct. When I copy "proper" Unicode RTL text into the editor, it's saved LTR: the first "syntactical" character appears in Chars[0], and the rest follow. In other words, it only looks RTL when drawn in the application, but the data is LTR.

I did the test by simply printing Chars[0] on a separate label.

What this means is that if you print "proper Unicode" RTL text, everything just works, as long as you set "Right-to-left" mode in the settings.

There's still a letter-linking problem, but that's an issue on its own.



The letter linking may be solved by a program called "Parsi Negar" that comes with its own fonts, suited for this particular purpose.
https://leomoon.com/downloads/desktop-apps/leomoon-parsinegar/
This was pointed out by @Mehrdad.

This program somehow converts the text. I am not 100% certain, but I think it merges the linking letters into single glyphs with special codes, and in combination with the special fonts this allows Persian (and maybe Arabic?) text to be drawn visually correctly in AGS.

The problem with ParsiNegar is that by default it actually stores the text RTL, meaning it's already reversed. This does not work with AGS line wrapping, neither the LTR nor the RTL one.

I found that it's possible to work around this by disabling the "reverse" option in this program. If you do, then it seemingly works well in combination with AGS's RTL mode.
The nuance is that the text possibly looks "weird" in the script and translation file. I suspect it will be hard for native speakers to read (as it looks literally reversed).

I am currently in the process of checking this out with Mehrdad on Discord. If he thinks the workaround I found is not convenient for him, then the only option for us would be to introduce a third choice for the "text direction" option, which would assume the text is stored reversed.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Wed 12/04/2023 08:27:00
I'm glad to hear you're figuring out the ins and outs of it. If I understand what you're saying correctly, I believe there is still one part of your thinking that isn't quite right (at least, it confuses me), and it might be good to clear it up right away:

Quote from: Crimson Wizard on Tue 11/04/2023 20:21:51it only looks RTL when drawn in the application, but the data is LTR.

I think by this you mean that the string is stored with its beginning (i.e. the letter you're supposed to start reading at) at Chars[0]. Great, that sounds like how it's meant to be. But it's wrong to say that this means "the data is LTR": the memory doesn't have any particular direction. We are used to arranging consecutive characters from left to right, but that's just habit, not anything inherent in the string or data (or properly designed Unicode string rendering libraries).

If I understand the Unicode documentation correctly, there is normally no explicit information stored in the string about whether it should be displayed LTR or RTL. Instead, each character (code point) is associated with a directionality flag, depending on the convention of the script it belongs to: letters in the Latin, Greek, Cyrillic, etc. alphabets are LTR, while letters in Arabic and Hebrew scripts are RTL. Punctuation characters are typically neutral (can go either way), and take their directionality from the surrounding characters. There are override codes that can force a particular directionality, which can be used for example to clarify whether punctuation between an LTR and an RTL text chunk belongs to the preceding or following chunk (if it's the end of the LTR chunk, it should be placed at the right end of that; if it's the start of the RTL chunk, it should be placed at the right end of that).
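As an illustration of "each character carries a directionality" and of how a renderer can pick a paragraph's direction without any flag in the string, here is a C++ sketch of the first-strong-character rule (a simplified form of rules P2/P3 in the Unicode Bidirectional Algorithm, UAX #9). The classification table is a tiny, approximate subset chosen for this example, not the real Unicode data; `Dir`, `bidiClass` and `paragraphDirection` are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class Dir { LTR, RTL, Neutral };

// A tiny, approximate subset of the Unicode Bidi_Class property. The real data
// gives every code point a class and distinguishes weak types (numbers,
// combining marks, ...); here anything not listed is treated as Neutral.
Dir bidiClass(char32_t cp) {
    if ((cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z'))
        return Dir::LTR;                       // basic Latin letters
    if (cp >= 0x00C0 && cp <= 0x024F && cp != 0x00D7 && cp != 0x00F7)
        return Dir::LTR;                       // Latin-1 and Latin Extended letters
    if (cp >= 0x05D0 && cp <= 0x05EA)
        return Dir::RTL;                       // Hebrew letters
    if ((cp >= 0x0620 && cp <= 0x064A) || (cp >= 0x0671 && cp <= 0x06D3))
        return Dir::RTL;                       // Arabic letters, incl. Persian additions
    if ((cp >= 0xFB50 && cp <= 0xFDFF) || (cp >= 0xFE70 && cp <= 0xFEFC))
        return Dir::RTL;                       // Arabic presentation forms (approximate)
    return Dir::Neutral;                       // digits, spaces, punctuation, everything else
}

// Simplified P2/P3: the paragraph direction comes from the first strongly
// directional character. Embeddings, overrides and isolates are ignored here.
Dir paragraphDirection(const std::u32string& text) {
    for (char32_t cp : text) {
        Dir d = bidiClass(cp);
        if (d != Dir::Neutral) return d;
    }
    return Dir::Neutral;   // no strong character; UAX #9 then defaults to LTR
}
```

Leading digits and punctuation do not decide anything: `U"123, \u0627\u062D\u0628 abc"` is classified RTL because the first strong character is the Arabic alef.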

tl;dr: LTR/RTL is not a property of the way strings are stored in memory, and it gets confusing if we talk as if LTR is the true and natural way memory is ordered. For accuracy and clarity, the question is whether the strings are reversed, so that Chars[0] represents the end (as it would be read in that script).
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Wed 12/04/2023 08:38:58
Maybe it would be useful to make a table, to keep track of how AGS treats text that is:

-LTR (we know this works)
-RTL (Unicode)
-bidirectional (Unicode)
-pseudo-RTL (reversed, non-Unicode?)

In the case of text that is:

-Copied into IDE from an external editor
-Entered/edited in IDE
-Entered in translation file (Unicode)

For:
-Display in IDE
-Display in-game

As well as:
-Text entry in game (TextBox)

With AGS set to:
-RTL mode
-"Normal" mode

Edit: And by "how AGS treats," I mean:
-Do the characters appear in the right order/direction?
-Are lines of text correctly aligned?
-Are ligatures displayed correctly?
-Are string operations performed correctly? (.Append(), .Substring(), etc., as well as string substitution tags like %s)
-Does editing work correctly? (in IDE and TextBox), i.e. caret movement on entering/deleting a character
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Mehrdad on Wed 12/04/2023 08:46:21
@Crimson Wizard 
Unchecking "reverse" is a nice idea and works perfectly in the game. But the text doesn't display correctly in the editor, and I can't read it.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: eri0o on Wed 12/04/2023 11:50:15
A small memory dump: FreeType can draw text, but it needs HarfBuzz for ligatures - and they have a circular dependency where both libraries depend on each other. So if you aren't building both from source together, you have to do a weird thing: build FreeType, build HarfBuzz pointing to FreeType, and then build FreeType again pointing to HarfBuzz. Ligatures also have special functions for emojis, such as changing the color or type of an emoji.

Now, neither of them supports the characters that invert text direction, so you need a bidi library. Care is also needed to find a library with a suitable license.

Here's the issue for the feature of bidi in SDL_ttf : https://github.com/libsdl-org/SDL_ttf/issues/135

I remember there was also some drawing library (Cairo?) that did have support and it was what chromium used.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Wed 12/04/2023 14:15:37
From the engine's perspective it is of course desirable to have a minimal number of settings and "types" of data, with most conversions done by the compiler. For example, the compiler could un-reverse the "reversed" RTL text before packaging the game.

But in our case this will be either impossible or difficult to achieve, as the game script is allowed to have strings in multiple languages, and the scripting language allows switching the text direction at runtime (see SetGameOption(OPT_RIGHTTOLEFT) (https://adventuregamestudio.github.io/ags-manual/Globalfunctions_General.html#setgameoption)). This means script strings may contain text of all kinds.

Resolving this at game compilation time would likely require a new preprocessor feature, using some kind of annotations added to the string literals. And that would definitely bring more issues with texts loaded from custom files.

Considering the above, the easiest option at the moment is to add a new mode to OPT_RIGHTTOLEFT, and update every place where OPT_RIGHTTOLEFT matters to account for this new mode.

That is, OPT_RIGHTTOLEFT will be:
0 - normal LTR;
1 - normal RTL (that is - characters stored in the order of natural reading);
2 - pre-reversed RTL (that is - characters stored against the order of natural reading, and in order of how they are displayed left-to-right).
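As a rough illustration of what the three values imply for wrapped text (labels, speech, display boxes), here is a C++ sketch that maps each mode to splitter and renderer behaviour, based on the descriptions in this thread. `TextDirection`, `LayoutRules` and `rulesFor` are hypothetical names, not engine code.

```cpp
#include <cassert>

// Values of the proposed OPT_RIGHTTOLEFT setting.
enum class TextDirection { LTR = 0, RTL = 1, RTLReversed = 2 };

// What each mode implies for line splitting and drawing of wrapped text.
struct LayoutRules {
    bool scanFromEnd;      // build lines starting from the end of the string
    bool reverseEachLine;  // flip the characters of each line before drawing
    bool alignRight;       // right-align lines by default
};

LayoutRules rulesFor(TextDirection mode) {
    switch (mode) {
    case TextDirection::LTR:         return {false, false, false};
    case TextDirection::RTL:         return {false, true,  true};   // current RTL mode
    case TextDirection::RTLReversed: return {true,  false, true};   // proposed mode 2
    }
    assert(false && "unknown text direction");
    return {false, false, false};
}
```

The design point is that mode 2 never reverses characters: the stored order already is the display order, so only the direction of the line splitting and the alignment change.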

How the engine handles the new mode 2 is purely internal to the engine. If new font libraries are added later, which may somehow affect the situation, this behavior may be adjusted accordingly.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Thu 13/04/2023 23:15:35
Also, I found something weird again: apparently there are GUI controls that never reverse the text in RTL mode: Buttons and ListBoxes.

Basically, only text that supports wrapping does this: Labels, speech, display boxes.

I'm not sure whether dialog options or drawing text on a drawing surface do this... DrawTextWrapped probably does, since it wraps.

It has been like this since at least AGS 3.2.1.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 00:38:19
The preliminary (unoptimized) version of "reverse RTL" solution code may be found here:
https://github.com/ivan-mogilko/ags-refactoring/tree/361--reversertl

It's unoptimized, because it reverses the text twice when splitting (once before the split, and again after). I would have to somehow rewrite the splitting algorithm and make it capable of reading forwards and backwards...
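The double-reverse idea can be sketched in C++ like this: reverse the whole string back into reading order, wrap it with the ordinary forward splitter, then reverse each resulting line into display order again. This is a sketch of the idea only, not the code from the linked branch; widths are in characters, the text is ASCII, and the function names are hypothetical. The two extra passes over the text are what a splitter able to scan in either direction would avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Ordinary forward greedy wrap (widths in characters). Explicit '\n' is
// treated like any other whitespace here.
static std::vector<std::string> wrapForward(const std::string& text, std::size_t width) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, current;
    while (words >> word) {
        if (current.empty()) {
            current = word;
        } else if (current.size() + 1 + word.size() <= width) {
            current += ' ' + word;
        } else {
            lines.push_back(current);
            current = word;
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}

// Wrap pre-reversed (display-order) text by reversing it twice:
// 1) reverse the whole string, restoring reading order;
// 2) wrap it normally, so the first line holds the beginning of the text;
// 3) reverse every line again, so each one is drawn in display order.
std::vector<std::string> wrapPreReversedTwice(std::string text, std::size_t width) {
    std::reverse(text.begin(), text.end());          // byte-wise: ASCII stand-ins only
    std::vector<std::string> lines = wrapForward(text, width);
    for (std::string& line : lines) std::reverse(line.begin(), line.end());
    return lines;
}
```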
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: eri0o on Fri 14/04/2023 01:23:14
Just to try to understand, all the context of what you are talking here is ags3? Because on ags4 (we are unicode only there, right?) we could use a bidi library - it's not simple to integrate and requires some refactoring, but it's possible. (There's also the issue of vertical text with horizontal typesetting as a next hard thing, which requires a bit more thought)
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 01:32:42
Quote from: eri0o on Fri 14/04/2023 01:23:14Just to try to understand, all the context of what you are talking here is ags3?

Yes, because the Unicode support there needs enhancing to let people use these languages properly, and because other solutions require major changes to the font rendering, for which we do not have any estimate yet.

Quote from: eri0o on Fri 14/04/2023 01:23:14Because on ags4 (we are unicode only there, right?) we could use a bidi library

The case I'm trying to resolve is the "incorrect" unicode RTL text, which is stored in memory in reverse way compared to what it should normally be (it does not have these direction control characters). Because of that it has to be drawn as if it were LTR, but wrapped differently.

The reason this kind of text data is used is that our font renderer does not have correct Arabic, Persian etc. ligature support. So users rely on a special converter that generates text represented by a different kind of data, meant to be displayed with very particular fonts (I mentioned this in a comment above (https://www.adventuregamestudio.co.uk/forums/index.php?msg=636654665)).
If the font renderer was drawing these languages correctly, with proper ligatures, then this converter hack would likely not even be necessary. The proper solution would require at least a replacement of a font renderer to something modern (I guess that's "Harfbuzz" that you mentioned). But until then users of these languages have to resolve to this hack, whether that is ags3 or ags4.

Currently I'm looking for a minimalistic solution to the problem, with as few changes as possible.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 14/04/2023 09:29:05
Quote from: Crimson Wizard on Fri 14/04/2023 01:32:42The case I'm trying to resolve is the "incorrect" unicode RTL text, which is stored in memory in reverse way compared to what it should normally be
Quote from: Crimson Wizard on Fri 14/04/2023 01:32:42The proper solution would require at least a replacement of a font renderer to something modern (I guess that's "Harfbuzz" that you mentioned). But until then users of these languages have to resolve to this hack, whether that is ags3 or ags4.

In that case, I would strongly urge you to consider how much effort to put into a stopgap solution built on a hack (and introducing even more hacks). Because if I understand correctly, this is not a regression or really even a bug: it's just that the very limited, hacky support AGS has long had for RTL scripts imposes some inconveniences on devs that make text formatting difficult (e.g. not being able to rely on automatic wrapping, which simply means they'll have to wrap explicitly). So is it really worth it, or would it be better to devote that energy to a proper solution down the road?

Is it possible that some of the inconveniences can even be solved in script without any engine changes, simply by reversing the strings, doing any manipulations on them (including, potentially, finding the linebreak points, reordering the lines and inserting explicit linebreaks), and re-reversing them? (Or not re-reversing them, in the case of some GUI Controls apparently.)
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 12:30:42
Quote from: Snarky on Fri 14/04/2023 09:29:05So is it really worth it, or would it be better to devote that energy to a proper solution down the road?

Indeed, it would be better to aim for a proper solution. The thing is, I don't know when this will be done, in which version etc., and at least a few users have already asked for this to work (some have been asking for years).

Before this whole conversation I was not fully aware of how RTL text is normally stored internally, so I needed to test this out anyway.

As for the quick solution I've been testing, so far it is this:
* expand the existing "Right to left mode", adding a new choice (Right-to-left reversed);
* if this option is on, then when doing text splitting scan the text from right to left.
* also set "alignment to right" as with normal RTL.
* everything else stays the same.

I spent 3-4 hours yesterday adding this option and testing things, but got stuck at the line splitting, because the algorithm is hardcoded for left-to-right scanning. So I wanted to see whether I could write a "generic" one instead.

Meanwhile, I also found and fixed 2 actual bugs in 3.6.0, so that time was not entirely wasted...


Quote from: Snarky on Fri 14/04/2023 09:29:05Is it possible that some of the inconveniences can even be solved in script without any engine changes, simply by reversing the strings, doing any manipulations on them (including, potentially, finding the linebreak points, reordering the lines and inserting explicit linebreaks), and re-reversing them? (Or not re-reversing them, in the case of some GUI Controls apparently.)

Afaik this is what Mehrdad is currently doing: he puts linebreaks himself, everywhere where necessary.

Another workaround, which I mentioned above, is to keep using this converter program, but turn off the option to "reverse" text during conversion. In this case it works in the game (with the normal RTL setting), but it makes all the texts look reversed in the script and translation file, obviously making it inconvenient for the user (I don't know whether people can normally read Arabic and Persian in reverse, the way e.g. English can be read in reverse).
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 14/04/2023 12:37:23
Quote from: Crimson Wizard on Fri 14/04/2023 12:30:42Afaik this is what Mehrdad is currently doing: he puts linebreaks himself, everywhere where necessary.

Right, that's the simple solution, but what I meant was that one could also just write a "reversed-pRTL" (pseudo-right-to-left) linebreak algorithm in script (measuring line widths from the end of the string, and reordering the lines so that the end of the string is at the top), rather than implement it in the engine.

Because to me this seems like cruft: special-case code that isn't useful for 99% of users, and adds complications to the engine while not really being the "right" solution to the problem.

(Right-alignment as an option in more GUI Controls would be useful, but shouldn't be tied to a RTL setting.)
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 12:43:52
Quote from: Snarky on Fri 14/04/2023 12:37:23
Quote from: Crimson Wizard on Fri 14/04/2023 12:30:42Afaik this is what Mehrdad is currently doing: he puts linebreaks himself, everywhere where necessary.

Right, that's the simple solution, but what I meant was that you could also just write a "reversed-pRTL" (pseudo-left-to-right) linebreak algorithm in script (measuring line widths from the end of the string, and reordering the lines so that the end of the string is at the top), rather than implement it in the engine.

I guess you may try that, except you will have to then apply this everywhere throughout the game, for each existing string that may be wrapped:
- when assigning a string to GUI labels;
- when calling Display (and variants);
- when calling DrawingSurface.DrawTextWrapped;
- when displaying speech (so you'd have to write a custom speech function);
- when displaying dialog options (so you'd have to write custom dialog options rendering).

This will also make adding e.g. Arabic / Persian translations to existing games quite difficult.

Implementing this as an option in the text wrapping algorithm would likely cover this case for the time being (again, I cannot tell when the new font renderer will come, or which new problems it would bring).

Quote from: Snarky on Fri 14/04/2023 12:37:23(Right-alignment as an option in more GUI Controls would be useful, but shouldn't be tied to a RTL setting.)

Controls have an alignment setting, but that's not the issue: they never reverse text. I don't know why; either this is a regression in the newer 3.* engine, or the RTL feature was never complete.
This won't be an issue with this converter solution though, as it does not need text reverse.

Actually, if we have the font render that can account for direction control chars, as eri0o mentioned, maybe we won't need RTL setting in the engine at all (rather than for backwards compatibility).
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 14/04/2023 13:36:10
Quote from: Crimson Wizard on Fri 14/04/2023 12:43:52I guess you may try that, except you will have to then apply this everywhere throughout the game, for each existing string that may be wrapped:

Easily done with a few helper functions: DisplayRtl, SayRtl, DrawStringWrappedRtl, Label.SetTextRtl...
Yes, you might have to implement custom dialog options rendering, but again, this seems like an edge-case of an edge-case.

Quote from: Crimson Wizard on Fri 14/04/2023 12:43:52Actually, if we have the font render that can account for direction control chars, as eri0o mentioned, maybe we won't need RTL setting in the engine at all (rather than for backwards compatibility).

We might want a way to do RTL text input, but it should probably be a setting per TextBox rather than game-wide. (Though if the input box was made fully bidi-compatible, the only effect of this setting might be that the caret would start off right-aligned rather than left-aligned.) And in the meantime, the TextField module could relatively easily be modified to do so.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 15:18:25
Quote from: Snarky on Fri 14/04/2023 13:36:10
Quote from: Crimson Wizard on Fri 14/04/2023 12:43:52I guess you may try that, except you will have to then apply this everywhere throughout the game, for each existing string that may be wrapped:

Easily done with a few helper functions: DisplayRtl, SayRtl, DrawStringWrappedRtl, Label.SetTextRtl...
Yes, you might have to implement custom dialog options rendering, but again, this seems like an edge-case of an edge-case.

Well, then someone will have to help with these script functions, or write a module...

At least in case of a speech one would have to comply to the undocumented width calculations of the speech overlays. Or write completely custom speech to not depend on that.

Another thing to consider is that you will have to add these to your game if you even suspect that it might get an Arabic/Persian translation, because you cannot add these helper functions through the translation file itself. If the game was already done when you realized that you want such translation, then you'll have to edit all the text assignments and Say calls in script.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 14/04/2023 23:12:33
Quote from: Crimson Wizard on Fri 14/04/2023 15:18:25Well, then someone will have to help with these script functions, or write a module...

At least in case of a speech one would have to comply to the undocumented width calculations of the speech overlays. Or write completely custom speech to not depend on that.

The SpeechBubble module already has a version of this calculation (for LucasArts speech).

(Edit: The SpeechBubble calculation differs a little from the engine calculation, in order to accommodate the border around a speech bubble. Example modified to match engine.)

Code (ags) Select
int lecSpeechWidth(Character* c)
{
  int vpWidth = Screen.Viewport.Width;
  Point* cp = Screen.RoomToScreenPoint(c.x, c.y);
  int cx = cp.x;
  int w = vpWidth/2 + vpWidth/6;
  if(cx < vpWidth/4 || cx > vpWidth - vpWidth/4)
    w -= vpWidth/5;
  return w;
}
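For checking the numbers, here is the same arithmetic as a plain C++ function: a hypothetical port that takes the viewport width and the character's screen x directly, instead of reading `Screen.Viewport` and calling `Screen.RoomToScreenPoint`. With a 320-pixel viewport the limit is 320/2 + 320/6 = 160 + 53 = 213 pixels, reduced by 320/5 = 64 to 149 when the character stands in the outer quarter on either side (x < 80 or x > 240).

```cpp
#include <cassert>

// Width limit for LucasArts-style speech, mirroring the AGS script above.
// All divisions are integer divisions, as in AGS script.
int lecSpeechWidth(int vpWidth, int screenX) {
    assert(vpWidth > 0);
    int w = vpWidth / 2 + vpWidth / 6;
    if (screenX < vpWidth / 4 || screenX > vpWidth - vpWidth / 4)
        w -= vpWidth / 5;   // narrower text near the screen edges
    return w;
}
```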

So, assuming a String.BreakRtl(int width, FontType font) extender function, SayRtl() could be implemented like so:

Code (ags) Select
function SayRtl(this Character*, String message)
{
  String rtlMessage = message.BreakRtl(lecSpeechWidth(this), Game.SpeechFont);
  this.Say(rtlMessage);
}

I think in each case all the helper functions would be a one- or two-liner.

Quote from: Crimson Wizard on Fri 14/04/2023 15:18:25If the game was already done when you realized that you want such translation, then you'll have to edit all the text assignments and Say calls in script.

Is this really a problem in practice, though? If this situation were to occur, there are two options:

1. Edit all the relevant calls and string assignments and rebuild. Most of that can probably be done by search-replace.
2. Do the linewrapping manually in the translation file.

Keep in mind that we're talking about support for a hacky workaround as a stopgap solution.

I don't know, man. You do what you want. It's not like I can stop you. But you do periodically bemoan how you keep getting distracted by patching up and making minor additions and tweaks to the current, out-of-date code instead of focusing on a forward-looking architecture.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 14/04/2023 23:18:54
Well, I'd like to see if this script solution would work for @Mehrdad. Also, AFAIK @Wesley wanted to try Arabic translations in his game.

I already wrote an experimental change for the engine (found here (https://github.com/adventuregamestudio/ags/pull/1977)), but I will postpone merging it until it is clearer whether a scripting solution works conveniently.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 21/04/2023 14:50:34
Someone has to write a script module for handling these special cases.

I think what is required is this:
* a function that calculates the width of drawn text (given a string, font and width limit) and inserts linebreaks into it where needed.
* special handling for the case when the text has to be read right-to-left. This probably means that either the splitting has to be done in reverse, or the text has to be reversed character by character twice: first before the splitting, and then each separate part (between the linebreaks) is reversed on its own again.
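A minimal sketch of the double-reversal approach in AGS script. It assumes AGS 3.6 in Unicode mode (so `String.Length` and `String.Chars[]` count characters, not bytes) and that a line's rendered width does not depend on character order (no glyph shaping). `ReverseChars`, `ReverseLines` and `WrapGreedy` are hypothetical helper names; `BreakRtl` follows the extender signature Snarky assumed earlier in the thread. One deliberate deviation: the first reversal is applied to each LF-separated paragraph separately rather than to the whole string, so paragraphs keep their original order. The wrapper only breaks at spaces, never splits a word wider than the limit, and ignores '[' linebreaks.

```ags
// Hypothetical helper: reverse a String character by character.
String ReverseChars(String s)
{
  String r = "";
  for (int i = s.Length - 1; i >= 0; i--)
    r = r.AppendChar(s.Chars[i]);
  return r;
}

// Hypothetical helper: reverse each LF-separated segment on its own,
// keeping the segments (paragraphs or wrapped lines) in their original order.
String ReverseLines(String text)
{
  String result = "";
  int start = 0;
  for (int i = 0; i <= text.Length; i++)
  {
    // The end of the text counts as a final segment boundary
    bool isBreak = (i == text.Length);
    if (!isBreak)
      isBreak = (text.Chars[i] == 10); // 10 = LF
    if (isBreak)
    {
      if (i > start)
        result = result.Append(ReverseChars(text.Substring(start, i - start)));
      if (i < text.Length)
        result = result.Append("\n");
      start = i + 1;
    }
  }
  return result;
}

// Hypothetical helper: greedy word wrap of logical-order text, breaking only at spaces.
// Existing LF characters are kept as hard breaks.
String WrapGreedy(String text, int width, FontType font)
{
  String result = "";
  String line = "";
  int wordStart = 0;
  for (int i = 0; i <= text.Length; i++)
  {
    bool atEnd = (i == text.Length);
    bool isSpace = false;
    bool isBreak = atEnd;
    if (!atEnd)
    {
      isSpace = (text.Chars[i] == ' ');
      isBreak = (text.Chars[i] == 10);
    }
    if (isSpace || isBreak)
    {
      String word = "";
      if (i > wordStart)
        word = text.Substring(wordStart, i - wordStart);
      wordStart = i + 1;
      if (line.Length == 0)
        line = word; // the first word always goes on the line, even if too wide
      else
      {
        String candidate = String.Format("%s %s", line, word);
        if (GetTextWidth(candidate, font) <= width)
          line = candidate;
        else
        {
          // Line is full: emit it and start a new one with this word
          result = result.Append(line);
          result = result.Append("\n");
          line = word;
        }
      }
      if (isBreak)
      {
        // Hard break or end of text: flush the current line
        result = result.Append(line);
        if (!atEnd)
          result = result.Append("\n");
        line = "";
      }
    }
  }
  return result;
}

// Visual-order RTL text in, wrapped visual-order text out.
String BreakRtl(this String*, int width, FontType font)
{
  String logical = ReverseLines(this);
  String wrapped = WrapGreedy(logical, width, font);
  return ReverseLines(wrapped);
}
```

To call `BreakRtl` from other scripts, the module header would also need `import String BreakRtl(this String*, int width, FontType font);`.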


For reference, a long time ago I wrote a function that splits text, for the TypedText module:
https://github.com/ivan-mogilko/ags-script-modules/blob/f55bc9015f6e6443de7f4d293b5b199779b79e88/scripts/gui/TypedText/TypedText.asc#L41
but it actually had a bug, mentioned here, with a proposed fix:
https://github.com/ivan-mogilko/ags-script-modules/issues/5

I had planned to pull this function out into a separate module, but never found the time to do it.

Or it could be rewritten from scratch.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 21/04/2023 15:05:23
A linebreak function would be useful to have in general. I'm interested in having a crack at it. One question: how will manual linebreaks made using "/n" appear in the reversed "RTL" string? As "/n", as "n/", or already converted to a newline code? (Actually, it might be best to optionally support all of these as well as the old ']', using some kind of configuration bit field.)
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 21/04/2023 15:13:27
Quote from: Snarky on Fri 21/04/2023 15:05:23
One question: how will manual linebreaks made using "/n" appear in the reversed "RTL" string? As "/n", as "n/", or already converted to a newline code? (Actually, it might be best to optionally support all of these as well as the old ']', using some kind of configuration bit field.)

No, the line breaking chars do not need to be reversed, that makes no sense whatsoever. The `\n` (escaped 'n') is not treated as two characters, it's processed as a single special character called LF (ascii code 10).
Converting '[' to ']' will only make sense if that's a displayed character, but won't if it's a special break character, in which case it must retain its code.
Both '\n' and '[' are treated by AGS during wrapped string drawing (I think it converts one to another for consistency, but I forgot the details); if they will be reversed, then nothing will work.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Fri 21/04/2023 17:33:28
Quote from: Crimson Wizard on Fri 21/04/2023 15:13:27
No, the line breaking chars do not need to be reversed, that makes no sense whatsoever. The `\n` (escaped 'n') is not treated as two characters, it's processed as a single special character called LF (ascii code 10).
Converting '[' to ']' will only make sense if that's a displayed character, but won't if it's a special break character, in which case it must retain its code.
Both '\n' and '[' are treated by AGS during wrapped string drawing (I think it converts one to another for consistency, but I forgot the details); if they will be reversed, then nothing will work.

If you think I meant to turn '\' into '/' and '[' into ']', then I certainly agree that makes no sense whatsoever. That was merely me not remembering which symbols are actually used.

As for the rest, it depends on which character code is actually stored in the string at string manipulation time. If it is always LF, then things are simple and it is clear what to do. However, you say that the conversion happens "during wrapped string drawing," and my recollection is also that you do sometimes have access to strings with '[' (and "\n"?) not yet parsed. In that case we have to ensure that we are handling them correctly. It might be that the '[' is a linebreak and should be treated as such in our logic, or it might be that it really is a '[' character, and should simply be kept and wrapped as any other character.

And because we are dealing with reversed strings, with "\n" it becomes a question of whether it was inserted before or after the reversing. If it was before, it might conceivably appear as "n\" (though if correctly converted from a proper bidirectional text, I believe it shouldn't). Since AGS at no point accepts that as a linebreak, we might even need to do a replacement.
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Crimson Wizard on Fri 21/04/2023 17:39:12
Quote from: Snarky on Fri 21/04/2023 17:33:28
As for the rest, it depends on which character code is actually stored in the string at string manipulation time. If it is always LF, then things are simple and it is clear what to do.

AFAIK all the true escape sequences ('\\', '\n', '\r', '\t' and so forth) are dealt with at compilation time, and all the '\n' in the string you typed in the Editor will be LF in the compiled data.

What the engine does at runtime is convert '[' into '\n'. This is also where it checks for a backslash before the '['.
This is done just before the string wrapping algorithm, so that the latter can work strictly with '\n's.

So, by the time the engine draws a line of text, each '[' has either become "\n" or remains escaped as "\\[". I suppose it's best to just replace everything with '\n' in script.
How to treat "\\[" and "[\\" in otherwise reversed text in script is indeed open to interpretation...
One solution is to ignore these completely and suggest that users type '\n' in their texts wherever they want a manual linebreak, because '\n' always becomes a single character. Then they will be dealt with by the compiler. Although, I don't know how it will be displayed if you type real Unicode text in RTL mode.
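A sketch of "replace everything with '\n' in script" for left-to-right (logical-order) text, in AGS script. The name `BracketsToLinebreaks` is hypothetical, and the behaviour is modelled on the description above rather than on the engine source: each '[' not preceded by a backslash becomes an LF, while an escaped "\[" is copied unchanged so the engine still displays a literal bracket. It does not recognise an escaped backslash in front of a bracket, nor the "[\" form that appears in reversed RTL text.

```ags
// Hypothetical helper: convert every '[' that is not preceded by a backslash
// into an LF character. An escaped "\[" is copied unchanged, so the engine
// will still display it as a literal bracket.
String BracketsToLinebreaks(String text)
{
  String result = "";
  int start = 0; // start of the run of characters not yet copied to result
  for (int i = 0; i < text.Length; i++)
  {
    if (text.Chars[i] == '[')
    {
      bool escaped = false;
      if (i > 0)
        escaped = (text.Chars[i - 1] == 92); // 92 = backslash
      if (!escaped)
      {
        if (i > start)
          result = result.Append(text.Substring(start, i - start));
        result = result.Append("\n");
        start = i + 1;
      }
    }
  }
  if (start < text.Length)
    result = result.Append(text.Substring(start, text.Length - start));
  return result;
}
```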
Title: Re: How is the "proper" right-to-left text written, and wrapped?
Post by: Snarky on Thu 22/06/2023 07:30:50
I suppose I should share the module I made here (https://drive.google.com/file/d/1z1oTItnDJbezREEIVGoitep3wudmdxSV/view?usp=sharing).

In implementing it, I realized a few things:

While the logic to calculate the line width of LucasArts speech is relatively simple, the calculations for Sierra speech are far more complicated, depending both on Text Window settings and on the speaking character's Speech View. There are also a lot of different configuration settings and other properties that affect how text is displayed, so replicating the full logic would indeed be pretty complicated. I've taken a simpler approach of automatically calculating it for LucasArts speech, but just allowing users to set it as a static value for Sierra speech.

There are a few subtleties with "[" and "\n". The first is that if the player enters text via a TextBox, "\n" is not converted to a LF character, but remains a backslash-n sequence in the String (like when you escape it in Strings in the editor), and is displayed as such. (This is probably correct behavior for most normal purposes.) However, AGS still interprets a "[" entered in a TextBox as a linebreak for most String display purposes, so there is arguably some inconsistency there.
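If a game does want a typed backslash-n to act as a linebreak (probably not desirable for most normal input), the conversion can be done with `String.Replace` before wrapping. A minimal sketch; the function name is hypothetical:

```ags
// Hypothetical helper: turn the two-character sequence backslash + 'n' that a
// TextBox stores into a real LF character, so wrapping code treats it as a break.
String TypedTextToLinebreaks(String typed)
{
  // "\\n" in script source is a backslash followed by 'n'; "\n" is a single LF
  return typed.Replace("\\n", "\n");
}
```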

The second is that not all AGS controls allow linebreaks in Strings, and those that don't (Buttons, TextBoxes and ListBoxes, IIRC) treat "\n" and "[" differently. These controls render "[" normally as an open-square-bracket, while a "\n" (i.e., LF character) is ignored or turned into a space (I forget which).

Finally, the linewrapping code should check for the escape sequence "\[" that AGS uses to actually display an open-square-bracket, so that it doesn't incorrectly interpret this as a linebreak. The current version of the module does not correctly deal with this in RTL mode.