Strings Again

Started by Gurok, Sun 31/01/2016 11:26:24

Previous topic - Next topic

Gurok

I was going to reply to the 3.4.0.6 thread, but I thought it best to break this off into a new discussion.

Quote from: Crimson Wizard on Fri 11/12/2015 17:47:11
Quote from: Monsieur OUXX on Fri 11/12/2015 17:21:55
I don't know if this was the case before, but in this version you cannot do :
Code: ags

String s=null;
if (s=="")
{
   //do something
}


If you do that, the game stops with message "null pointer referenced" when trying to evaluate t=="".
I understand the underlying mechanism (there's probably an implicit something like s.GetValue() going on at some stage) but it doesn't feel right.


This was always like that because "==" operator between 2 strings was translated into "compare two strings".
It does not seem like there is any other reasonable effect of comparing a string reference and string literal. The operator cannot compare a pointer value and "", because "" is a valid string object (and null reference is not a valid object).
This is also why there is String.IsNullOrEmpty() function to be used instead of 's == ""'.

This explains why it happens, but not what the rationale was for making it this way.

I found another thread referencing this design decision:

http://www.adventuregamestudio.co.uk/forums/index.php?topic=28977.msg370265#msg370265

From what I can tell, this isn't like C#. C# does a reference comparison between the two objects unless they're both non-null. There are no null reference exceptions thrown.
It's somewhat like Java's approach, but Java doesn't balk if the second value is null, e.g. "ABC".CompareTo(null) causes an exception in AGS but wouldn't in Java.
C++ doesn't allow null std:strings and the C equivalent (strcmp) leaves the behaviour when passing NULLs as undefined.

So where DID this style of handling nulls come from? It seems ultra conservative. Is there any desire to bring this more in line with C#? Where:

Code: ags
"a".CompareTo(null); // false
String a = null; a.CompareTo("b"); // null reference exception
"a" == null // false
String a = null; a == "b" // false


I understand the principles of error hiding, but it feels to me like these are sensible defaults and not masking programming errors. Nobody is assuming a value is non-null if myString != "abc" succeeds. It also currently means developers must have some idea of how strings work under the hood (and the limitations of the CompareTo function). It's generally not a good design when the end user has to be aware of the implementation.
[img]http://7d4iqnx.gif;rWRLUuw.gi

Snarky

String comparison is tricky, because you need to be able to compare both objects (which can be null) and literals. If you let the '==' operator just do a simple pointer comparison (as it does for most other objects), you'll get counter-intuitive results where two strings with exactly the same content (possibly even two string literals, depending on the underlying implementation) may evaluate as unequal, which makes it more or less useless. (Your C# examples don't test this case, but if it avoids it, it is only be considerable complexity under the hood.)

If you just make '==' a shortcut for the string-comparison method a.equals(b) (as AGS does), it will choke if the first parameter is null. If you make this string-compare accept nulls in the argument, you'll now be in a situation where 'a == b' and 'b == a' will evaluate differently if b is null, which I would argue is very bad.

For these reasons, most C-like languages discourage use of '==' for string comparisons (though some simply treat null as an empty string); the decision to support it in AGS may not have been the best, but we're stuck with it now.

monkey0506

Quote from: Snarky on Sun 31/01/2016 15:14:50For these reasons, most C-like languages discourage use of '==' for string comparisons (though some simply treat null as an empty string)

I'm not sure what "most C-like languages" you're referring to. C doesn't have any support for operator overloading (and doesn't have traditional function overloading for that matter). C++ allows overloading the global operator== for char* and char const*, and provides a built-in overload (as part of the STL) for std::string. C# has a built-in operator== for System.String.

As gurok already pointed out:

* C's strcmp is undefined behavior if any of the parameters are not a null-terminated byte string.
* C++ has explicit reference types (references and pointers) that are not comparable to objects, so there is no such thing as a "null" std::string.
* C# checks references first. If both are null, then the result is true. Otherwise, if either is null, then the result is false. Otherwise, it compares the text of the two strings.

Of the three common, major "C-like languages", two of them support operator==. Of those two, only one has a by-reference string type, and it produces exactly the expected output when using operator==.

As expected, C# throws a NullReferenceException if the "this" reference in String.CompareTo is null, but produces expected output for a valid String object with a null parameter. (test)

Quote from: Gurok on Sun 31/01/2016 11:26:24Is there any desire to bring this more in line with C#? Where:

Code: ags
"a".CompareTo(null); // false
String a = null; a.CompareTo("b"); // null reference exception
"a" == null // false
String a = null; a == "b" // false

This is completely consistent with the way C# handles null String references, so I would say absolutely, yes. The first line, calling a function on a string-literal, isn't currently supported either, but I see that as a tangential point. Regarding the handling of String comparison, I definitely feel that it should mirror C# when working with null references.

Snarky

C, C++ and C# are not the only major C-like languages, monkey. For starters, you have Java, PHP, Perl, Objective C, JavaScript...

In C, attempting to compare strings with == is a common mistake. Java uses reference-comparison, but also has a string pool, which means that '==' will work as a string comparison under some circumstances (which can make bugs even harder to find). PHP lets you use the equals operators to do string comparisons; in this case (null == "") is true, but (null === "") is false (a lot of other == comparisons you wouldn't expect also come out true, and it's generally considered wrong to use it in most cases). I think JavaScript works more or less the same. Perl tries to convert the two values to ints, with potentially unpredictable results. Objective C uses reference comparison.

So in most of these languages, '==' is not the right way to do a string comparison.

Gurok

#4
But Snarky, you're making an argument against overloading the == operator (and to some extent the confusion that arises from having an === operator). I don't see a clear argument for why we shouldn't try to make the current operator more consistent.

At the moment, we have a situation where the literal null has a different meaning to a variable with a null value.

Code: ags

String s = "";
if(s == null) { ... } // fine

String t = null;
if(s == t) { ... } // runtime error


Both languages you mentioned with an overloaded == operator (Javascript and PHP) will not give you a runtime error for comparing null strings using that operator.

I think we should be doing reference comparison when either side has a null value foremost because it is more consistent and predictable. A null value should not be different from the null keyword. As I mentioned before, it also doesn't put a burden of knowledge about the implementation of strings on the user.

Another thing I didn't mention earlier is that this would have no impact on existing code.

monkey0506, I honestly didn't think about the method call on a literal not being supported. I could probably add support for that too. I think we're on the same wavelength with the standardisation thing. I am really just using C# as a reference as that seems to be the closest analogue.
[img]http://7d4iqnx.gif;rWRLUuw.gi

SMF spam blocked by CleanTalk