[SOLVED] String Split method

Started by eri0o, Sat 12/10/2019 21:03:59


eri0o

Edit: Figured out a way to split a string on needles or tokens...

Code: ags

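// Counts how many times token occurs in this string.
// Note: String.IndexOf ignores case, so "x" also matches "X".
int CountToken(this String*, String token){
  if(token.Length == 0) return 0; // an empty token would never advance the loop
  String sub = this.Copy();
  int count = 0;
  
  while(sub.Length > 0){
    int cur = sub.IndexOf(token);
    if(cur==-1){
      return count;
    }
    // drop everything up to and including this occurrence
    sub = sub.Substring(cur+token.Length, sub.Length);
    count++;
  }
  return count;
}

// Splits this string on token. The returned array ends with a null entry.
String[] Split(this String*, String token){
  int count = this.CountToken(token);
  
  if(count<=0){
    // token not found: the whole string is the only piece
    String whole[] = new String[2];
    whole[0] = this;
    whole[1] = null;
    return whole;
  }
  
  // count tokens give count+1 pieces, plus one slot for the null terminator
  String r[] = new String[count+2];
  String sub = this.Copy();
  
  int i = 0;
  int cur = 0;
  
  while(i < count){
    cur = sub.IndexOf(token); // always found: CountToken used the same search
    r[i] = sub.Substring(0, cur); // text before the token
    sub = sub.Substring(cur+token.Length, sub.Length); // text after the token
    i++;
  }
  r[i] = sub; // the text after the last token
  i++;
  r[i] = null;
  return r;
}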


So what I wanted is for a split of "22, 33, 44, 55" on ", " to return four strings:


  • "22"
  • "33"
  • "44"
  • "55"

These strings don't contain the token itself. The array also ends with a null string, so you can tell you've reached the last piece by testing for null.

Edit:
Figured it out and fixed it. It's working!

Here's a simple test of it in a room:

Code: ags
// room script file
function room_AfterFadeIn() {
  String mstr = "1, 232, 9, 55, 744";
  
  String b[] = mstr.Split(", ");
  int i=0;
  while(b[i]!=null){
    Display("%s[[%s", mstr, b[i]);
    i++;  
  }
  Display("the end");
}

eri0o

#1
MorganW sent me an alternative version that would be case sensitive. My problem is restricted to symbols and numbers, but I will leave this here in case it's useful to anyone.
Code: ags

String[] Split(this String*, String token) {
  String ret[];
  int reti = 0;
  
  if (token.Length == 0 || token.Length > this.Length)
  {
    ret = new String[2];
    ret[reti] = this;
    reti ++;
  }
  else
  {
    ret = new String[this.Length + 1];
    int offset = 0;

    for (int i = 0; i < this.Length; i ++)
    {
      if (this.Substring(i, token.Length) == token)
      {
        if (i - offset != 0) // don't use 0 length slices
        {
          ret[reti] = this.Substring(offset, i - offset);
          reti ++;
        }
        
        offset = i + token.Length;
        i = offset - 1;
      }
    }

    if (this.Length - offset != 0)  // don't use 0 length remainder
    {
      ret[reti] = this.Substring(offset, this.Length - offset);
      reti ++;
    }
  }

  ret[reti] = null;
  return ret;
}
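A quick way to see the difference is a room test like the one in the first post. This sketch assumes MorganW's Split above is the only Split compiled into the game (both versions share the name, so only one can exist at a time). Because this version compares with ==, which is case-sensitive, only the lower-case "x" should act as a delimiter here.

```ags
// room script file
// Assumes MorganW's Split() is the version compiled into the game.
function room_AfterFadeIn() {
  String mstr = "1x2X3";
  String b[] = mstr.Split("x");  // case-sensitive: "X" is not a delimiter
  int i = 0;
  while (b[i] != null) {
    Display("piece %d: %s", i, b[i]);
    i++;
  }
  // expected with MorganW's version: "1", then "2X3"
}
```

With the IndexOf-based version, which ignores case, the same call should instead give "1", "2" and "3".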

Gurok

#2
Quote from: eri0o on Sun 13/10/2019 12:50:05
MorganW sent me an alternative version that would be case sensitive. My problem is restricted to symbols and numbers, but I will leave this here in case it's useful to anyone.

I would make a few small amendments to MorganW's version. The for loop only needs to run while i <= this.Length - token.Length instead of i < this.Length: in the last few iterations the substring is shorter than the token, so it can't match. You could also similarly optimise the constructor for the array. This might be personal taste, but I would keep zero-length matches and allow splitting by a zero-length delimiter (e.g. "abc".Split("") => ["a", "b", "c", null]).
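A rough sketch of what those amendments could look like, written as a separate extender so it doesn't clash with Split. The name SplitKeepEmpty is made up for this example and the code is a sketch, not a tested module. It keeps zero-length pieces, treats an empty token as "split into single characters", stops searching at this.Length - token.Length (inclusive), and sizes the array as (this.Length / token.Length) + 2: at most that many non-overlapping tokens, one more piece than tokens, and one slot for the null.

```ags
// Hypothetical variant following Gurok's suggestions (not from the thread).
String[] SplitKeepEmpty(this String*, String token) {
  String ret[];
  int reti = 0;

  if (token.Length == 0) {
    // empty token: one piece per character, plus the null terminator
    ret = new String[this.Length + 1];
    for (int c = 0; c < this.Length; c ++) {
      ret[reti] = this.Substring(c, 1);
      reti ++;
    }
    ret[reti] = null;
    return ret;
  }

  // at most Length / token.Length tokens, one more piece, one null slot
  ret = new String[(this.Length / token.Length) + 2];
  int offset = 0;
  int i = 0;

  // a token can only start at positions 0 .. Length - token.Length
  while (i <= this.Length - token.Length) {
    if (this.Substring(i, token.Length) == token) {
      ret[reti] = this.Substring(offset, i - offset); // may be ""
      reti ++;
      offset = i + token.Length;
      i = offset; // continue right after the token
    } else {
      i ++;
    }
  }

  ret[reti] = this.Substring(offset, this.Length - offset); // may be ""
  reti ++;
  ret[reti] = null;
  return ret;
}
```

For example, "a,b," split on "," should give "a", "b", "" and then null, where MorganW's version drops the trailing empty piece.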

morganw

Quote from: Gurok on Mon 14/10/2019 02:25:37
You could also similarly optimise the constructor for the array.
This was the part I was most interested in trying to optimise, but maths isn't my strong suit. I would guess that trying to work out the maximum number of delimiter instances that yield a value would allow the array size to not be so large, but the substring checks required would just present a similar problem to the string splitting, and performance wise it is just better to over-commit the array size?

Gurok

Quote from: morganw on Mon 14/10/2019 18:02:36
Quote from: Gurok on Mon 14/10/2019 02:25:37
You could also similarly optimise the constructor for the array.
This was the part I was most interested in trying to optimise, but maths isn't my strong suit. I would guess that trying to work out the maximum number of delimiter instances that yield a value would allow the array size to not be so large, but the substring checks required would just present a similar problem to the string splitting, and performance wise it is just better to over-commit the array size?

I just meant you could use this.Length - token.Length + 1 for the constructor. I suppose you could do (this.Length / token.Length) + 1 for the maximum you're talking about. Not sure about performance, but I'd say there'd be an additional cost for very small arrays and a significant saving for large ones.
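As a worked check of these array sizes, take a string of length $L$ split on a token of length $t \ge 1$, with the array holding the pieces plus one null terminator. Neither version lets occurrences overlap, so there are at most $\lfloor L/t \rfloor$ of them and at most one more piece than that:

$$ \text{slots needed} \le \left\lfloor \frac{L}{t} \right\rfloor + 1 + 1 = \left\lfloor \frac{L}{t} \right\rfloor + 2 $$

For the test string "1, 232, 9, 55, 744" ($L = 18$, $t = 2$) that gives $9 + 2 = 11$ slots, against $L + 1 = 19$ in MorganW's version and $L - t + 1 = 17$; the split actually needs $5 + 1 = 6$. One slot fewer is not always enough: "a###b" split on "###" ($L = 5$, $t = 3$) yields two pieces plus the null, i.e. 3 slots, but $\lfloor 5/3 \rfloor + 1 = 2$. Likewise $L - t + 1$ is one short for MorganW's version when the token is as long as the string but doesn't match: "ab" split on "xy" needs 2 slots, while $L - t + 1 = 1$.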
