Interest in regex?

Started by monkey0506, Wed 06/01/2010 21:55:46


monkey0506

I've asked this before (years ago) in the AGS IRC channel, but I didn't get much of a response. I've been working with regex lately, and although I don't have much experience with it, I can definitely see how powerful it is and how useful it can be if you're doing any kind of text parsing.

For those of you who don't know, regex is short for "regular expressions", a small special-purpose language for searching and parsing text. Wikipedia and the PHP documentation both cover regex if you're interested in learning more. I mention the PHP documentation because it's the best information on regex I've found. ;)

Regex works a lot like String.IndexOf, String.StartsWith, and String.EndsWith, but gives you total control over defining what you're looking for.
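As a rough illustration of that comparison, here is a sketch in Python (AGS has no built-in regex, so this uses Python's standard `re` module; the helper names `index_of`, `starts_with` and `ends_with` are made up for this example) showing how each of those String methods corresponds to a regex search:

```python
import re

text = "Hello there, adventurer"

# Like String.IndexOf: position of the first match, or -1 if there is none.
def index_of(pattern: str, s: str) -> int:
    m = re.search(pattern, s)
    return m.start() if m else -1

# Like String.StartsWith: re.match only tries to match at position 0.
def starts_with(pattern: str, s: str) -> bool:
    return re.match(pattern, s) is not None

# Like String.EndsWith: \Z anchors the pattern to the very end of the string.
def ends_with(pattern: str, s: str) -> bool:
    return re.search(rf"(?:{pattern})\Z", s) is not None

print(index_of("there", text))        # 6
print(index_of(r"[0-9]", text))       # -1, no digits anywhere
print(starts_with(r"[A-Z]", text))    # True, starts with a capital letter
print(ends_with(r"advent\w+", text))  # True
```

The difference from the String methods is that the thing being searched for is a pattern (any digit, any capital letter, "advent" plus more letters) rather than one fixed substring.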

Say for example you wanted to have the player enter a full name such as "John Q. Smith". You could use IndexOf to check for two spaces and the full-stop/period. But what if the player entered numbers? This might be a rather obtuse example (::)) but regex can make the process simpler. With regex you might use a pattern such as:

Code: ags
[A-Za-z]+ [A-Za-z]\. [A-Za-z]+


This tells the engine you're looking for one or more alphabetic characters, followed by a space, followed by exactly one alphabetic character, followed by a full-stop, another space, and then one or more alphabetic characters.

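If there are any numbers or other characters where letters are expected, the match fails (as long as the pattern has to cover the whole input, for example by anchoring it as ^...$; an unanchored search would still find "John Q. Smith" inside "John Q. Smith3"). As I said, this example might be a bit obtuse, but it shows how powerful regex matching is.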
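Here is the same full-name check sketched with Python's standard `re` module (used only for illustration, since AGS has no regex support; `is_full_name` is a made-up helper). It also shows why anchoring matters: `fullmatch` requires the whole input to fit the pattern, while `search` happily finds a match inside a longer string:

```python
import re

# The pattern from the post: word, space, initial, full-stop, space, word
NAME = re.compile(r"[A-Za-z]+ [A-Za-z]\. [A-Za-z]+")

def is_full_name(s: str) -> bool:
    # fullmatch succeeds only if the *entire* input fits the pattern
    return NAME.fullmatch(s) is not None

print(is_full_name("John Q. Smith"))   # True
print(is_full_name("John Q. Sm1th"))   # False: digit where a letter is expected
print(is_full_name("John Q. Smith3"))  # False: trailing digit

# An unanchored search, by contrast, still finds a match inside the input
print(NAME.search("John Q. Smith3").group())  # John Q. Smith
```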

My question, then, is whether there would be a reasonable amount of interest in having regex functionality in AGS. I imagine it could open up a world of possibilities, particularly for those who want to implement, or are already implementing, text parsers in their games.

When I asked before scotch told me that if I thought regex was needed it would probably be easier for CJ to implement than for me to script. But what self-respecting programmer would ask another programmer to do something they could program themselves? ;D

Actually, I don't think there would be a widespread call for regex in adventure games, but again, I can see situations where it would be useful. So I opened up my old script and realized I had gotten a lot further than I remembered. I updated the code a bit, and now I have the following test code:

Code: ags
function game_start() {
  String p = "...thisw --is--\\text --is-- \"some\" \\text.\\..text!";
  // NOTE: all matches are considered caseless (right now)
  Display("match \"\\c\": %d", Regex.Match(p, "\\c")); // displays 0
  Display("match \"--is--\[^\\-a-zA-Z]\": %d", Regex.Match(p, "--is--[^\\-a-zA-Z]")); // displays 9
  Display("match \"text\\.\": %d", Regex.Match(p, "text\\.")); // displays 36
  Display("match \"text\[^.]\": %d", Regex.Match(p, "text[^.]")); // displays 16
  Display("match \"\[^\\\\]text\[^.]\": %d", Regex.Match(p, "[^\\\\]text[^.]")); // displays 43
  Display("match \"\[W-g]\": %d", Regex.Match(p, "[W-g]")); // displays 7
  Display("match \"\[!-\\]]\": %d", Regex.Match(p, "[!-\\]]")); // displays 0
  Display("match \"\\w\[^\\Wthisw]\": %d", Regex.Match(p, "\\w[^\\Wthisw]")); // displays 16
  Display("match \"\[^\\\\]text\": %d", Regex.Match(p, "[^\\\\]text")); // displays 43
  Display("match \"\[a-z]\[a-z]\": %d", Regex.Match(p, "[a-z][a-z]")); // displays 3
  // speed test
  Display("starting speed test!");
  DateTime *now = DateTime.Now;
  int rt = now.RawTime;
  int i = 0;
  i = Regex.Match(p, "\\c");
  i = Regex.Match(p, "--is--[^\\-a-zA-Z]");
  i = Regex.Match(p, "text\\.");
  i = Regex.Match(p, "text[^.]");
  i = Regex.Match(p, "[^\\\\]text[^.]");
  i = Regex.Match(p, "[W-g]");
  i = Regex.Match(p, "[!-\\]]");
  i = Regex.Match(p, "\\w[^\\Wthisw]");
  i = Regex.Match(p, "[^\\\\]text");
  i = Regex.Match(p, "[a-z][a-z]");
  now = DateTime.Now;
  Display("Raw time difference: %d", now.RawTime - rt); // displays 0 - execution of 10 regex matches in under a second
}


As the comments show, every pattern matches where expected (if you're interested in what the patterns do or why they match where they do, feel free to ask). I then ran a speed test that matched all 10 patterns again in under a second. Since RawTime only counts whole seconds, that's an upper bound rather than a precise measurement, but it suggests my current implementation is reasonably efficient.
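The expected positions in the script's comments can be cross-checked against an existing engine. This sketch uses Python's `re` module with `re.IGNORECASE` (standing in for the "all matches are caseless" note) and `re.search`, which returns the leftmost match as `Regex.Match` appears to; `match_index` is a hypothetical helper. The `\c` pattern is left out because Python rejects it as a bad escape and the post doesn't say what it means in this engine:

```python
import re

# The AGS test string after unescaping: \\ becomes \ and \" becomes "
p = '...thisw --is--\\text --is-- "some" \\text.\\..text!'

# (pattern, index reported in the script's comments). Raw strings avoid the
# extra layer of escaping that the AGS string literals need.
cases = [
    (r"--is--[^\-a-zA-Z]", 9),
    (r"text\.", 36),
    (r"text[^.]", 16),
    (r"[^\\]text[^.]", 43),
    (r"[W-g]", 7),
    (r"[!-\]]", 0),
    (r"\w[^\Wthisw]", 16),
    (r"[^\\]text", 43),
    (r"[a-z][a-z]", 3),
]

def match_index(pattern: str, s: str) -> int:
    # Leftmost match position, case-insensitive, or -1 if nothing matches
    m = re.search(pattern, s, re.IGNORECASE)
    return m.start() if m else -1

for pattern, expected in cases:
    got = match_index(pattern, p)
    print(f"{pattern!r:24} -> {got} (expected {expected})")
```

Case-insensitivity does matter here: without `re.IGNORECASE`, `[W-g]` skips the lowercase `w` at index 7 and first matches the backslash at index 15.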

I still have to implement quantifiers and the vertical bar for alternative pattern matching...but is anybody interested in using something like this? :=
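For reference, here's what those two missing features look like in an engine that already has them, again sketched with Python's `re` module. The `COMMAND` pattern is a hypothetical parser rule, not something from the script above:

```python
import re

# Quantifiers control how many times the preceding item may repeat:
#   ?  zero or one      *  zero or more
#   +  one or more      {m,n}  between m and n times
print(bool(re.fullmatch(r"colou?r", "colour")))   # True
print(bool(re.fullmatch(r"colou?r", "color")))    # True
print(bool(re.fullmatch(r"ha{2,3}!", "haaaa!")))  # False: too many a's

# The vertical bar tries alternatives; the group limits how far it reaches.
# Hypothetical parser rule: a verb, an optional "the", then a noun.
COMMAND = re.compile(r"(take|get|pick up) +(?:the +)?(\w+)", re.IGNORECASE)

m = COMMAND.fullmatch("Pick up the lamp")
print(m.group(1), m.group(2))  # Pick up lamp
```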

suicidal pencil

The question I must ask is: NFA or DFA regex engine?

My only concern is that regexes can be VERY hard to learn and get used to. There is a reason why there are books written about the damn things  :-\

monkey0506

Well, based on this, I guess what I've been writing would be considered an NFA engine.
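For comparison, here is a very small backtracking matcher in Python, adapted from Rob Pike's well-known C matcher in Kernighan and Pike's *The Practice of Programming*. It only handles literals, `.`, `*`, `^` and `$`, and returns the leftmost match index the way `Regex.Match` appears to in the script above. In the terminology of Friedl's *Mastering Regular Expressions*, an engine that retries alternatives like this is a "traditional NFA"; this is a sketch of that approach, not how the AGS module works:

```python
def match(regexp: str, text: str) -> int:
    # Leftmost match index of regexp in text, or -1.
    # Supports literal characters, '.', 'c*', a leading '^' and a trailing '$'.
    if regexp.startswith("^"):
        return 0 if match_here(regexp[1:], text) else -1
    for i in range(len(text) + 1):  # include the empty suffix at the end
        if match_here(regexp, text[i:]):
            return i
    return -1

def match_here(regexp: str, text: str) -> bool:
    # Does regexp match at the beginning of text?
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return not text
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False

def match_star(c: str, regexp: str, text: str) -> bool:
    # Try zero repetitions of c first, then consume one more c and retry.
    # Retrying after a failed attempt is the backtracking step.
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == "." or text[i] == c):
            i += 1
        else:
            return False

print(match("c.t", "the cat sat"))  # 4
print(match("a*b", "xaab"))         # 1
print(match("^ab$", "abc"))         # -1
```

A DFA-based engine would instead compile the pattern into a state table and read each input character exactly once, which avoids backtracking but makes features like backreferences hard to support.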

And I don't think regex is any harder to learn than any other computer language. It has a set of specific rules. You learn the rules. Then you find out there are exceptions to the rules and it blows your mind. := Then you learn the exceptions and everything's okay again.

A simple regex pattern can actually be quite easy to write. Besides, there's tons of online documentation on various sites about building patterns for specific matches... :)
