Adventure Game Studio

AGS Support => Advanced Technical Forum => Topic started by: mode7 on Sat 18/06/2011 18:43:50

Title: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Sat 18/06/2011 18:43:50
Hi Guys,

I'm currently writing a tile engine that imports TMX files. Parsing the file format and loading the maps already works quite well, but now I've run into a performance problem when loading the maps.

When parsing a TMX file I end up with a string like this for each layer

0,0,226,227,228,0,0,0,226,227,228.....
.
Im using this to load the values into a struct when loading the map:


// This subfunction returns the the single value of the csv
String DecodeCSV (String source)
{
  //Display ("Parsing %s", source);
  if (source.IndexOf(",") != -1) End = source.IndexOf(",");
  else {
    return "ERROR";
  }
  String Result = source.Substring(0, End);
  //Display ("Decode CSV: %s", Result);
  return Result;
}

// Write Layers
  lStatus.Text = "Reading Data for Layers";
  Wait(1);
  String RAWLayerData[10];

  int CL;
  while (CL < Map[0].LayerCount) {
    RAWLayerData[CL] = ExtractTag (RAWLayer[CL], "<data encoding=\"csv\">","</data>");
    RAWLayerData[CL] = RAWLayerData[CL].Append(",");
    int t;
    String temp;
    while (DecodeCSV(RAWLayerData[CL]) != "ERROR") { //this is what takes most time
      temp = DecodeCSV(RAWLayerData[CL]);
      Map[CL+1].Tile[t] = temp.AsInt;
      RAWLayerData[CL] = RAWLayerData[CL].Substring(End+1, RAWLayerData[CL].Length);
      Map[CL+1].TileCount++;
      t++;
    }
    //Display ("%d Tiles Written for Layer %d", Map[CL+1].TileCount,  CL+1);
    CL++;
  }


I works pretty well for small maps but big maps like 150x150 tiles take almost 5 minutes to load.
I guess the way I'm decoding the csv values with the DecodeCSV string is overly complicated, but this is pretty new stuff for me so my question is: Is there any way do this more easily and faster.

Thanks in advance
-mode7
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: monkey0506 on Sat 18/06/2011 22:22:50
Well something that stands out to me is that worst-case scenario you're checking the index of every substring four times; and best-case scenario you're only doing it once, but that's if there's no data. You're also running that while loop one more time than necessary (best-case scenario you're running it only once, but with an empty string).

String operations aren't necessarily slow, per se, but they're definitely not as fast as, say, integer operations. A bit of optimization in your code certainly wouldn't hurt things, but I'm not sure how much speed you will gain for very large chunks of data.

You can avoid the additional indexing by storing the value into a temporary variable initially, and then using the length of the resultant substring as the remainder substring's offset afterwards. You can avoid the additional loop and its checks by actually using the EOS (end-of-string) instead of forcing an additional and unnecessary comma to be appended and tossing an error string at the new EOS.

So, something like this might help save some CPU cycles:

String DecodeCSV(String source)
{
  int i = source.IndexOf(","); // store this so we only have to call IndexOf once instead of twice
  if (i == -1) return source; // reached EOS, this is the last value
  return source.Substring(0, i); // trim the value from the list and return it directly, we don't need an additional variable for this unless we're doing something else with it
}

  // Write Layers
  lStatus.Text = "Reading Data for Layers";
  Wait(1);
  String RAWLayerData[] = new String[Map[0].LayerCount]; // using a dynamic array makes this more flexible if LayerCount changes later ;)
  int CL;
  while (CL < Map[0].LayerCount)
  {
    RAWLayerData[CL] = ExtractTag(RAWLayer[CL], "<data encoding=\"csv\">", "</data>"); // why are you passing a null String as the first parameter here?
    int t;
    String temp = ""; // initialize this so we don't get a null pointer error
    while (temp.Length != RAWLayerData[CL].Length) // end the loop after we've grabbed the last value
    {
      temp = DecodeCSV(RAWLayerData[CL]); // grab a value
      Map[CL + 1].Tile[t] = temp.AsInt;
      if (temp.Length != RAWLayerData[CL].Length) RAWLayerData[CL] = RAWLayerData[CL].Substring(temp.Length + 1, RAWLayerData[CL].Length - (temp.Length + 1));
      // else..set both of them to an empty string? (optional, depending if you need it for some reason)
      Map[CL + 1].TileCount++;
      t++;
    }
    CL++;
  }


I've modified some of the logic with the loops so that it doesn't have to run any extra loops unnecessarily. So long as there's any data at all then it will grab the first value from the CSV list, and if that value was in fact the last value then it will break out of the loop after it is done using it. You'll notice that I also changed the length parameter you're passing into the Substring function. AGS Strings are null-terminated, so I would imagine that any operations such as substring, append, etc. would be optimized to break out of the operation at the first null-character (ASCII 0, '\0') encountered, but I don't actually know that for sure, so it certainly wouldn't hurt to just trim the fat yourself. ;)

As I said, I don't know how much difference these optimizations will make for your larger maps, but go ahead and give it a try and see what comes of it. If it's still not fast enough, I'd like to see what you have in the ExtractTag function (for example, why are you passing a null String as the first parameter? AGS doesn't let you pass Strings by reference, which is why you're having to return the value you need...).

I've never done any speed tests as a comparison, but one other thing you could consider trying out if speed is still a problem is perhaps just using File.ReadRawChar and parsing the file one character at a time. I would imagine that would be significantly slower than the String operations, but as I said I've never done a speed test to know one way or the other.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Sun 19/06/2011 13:15:21
Thanks a lot for your effort monkey. This really did help. Loading times droped from several minutes to 35 seconds (on my netbook) which is still kind of long, yet the map IS really huge.

Here's the extract tag function you wanted to have a look at. It is slow but it only runs once for each layer so its kind of insgnificant:


String ExtractTag (String source, String start, String end)
{
 
  if (source.IndexOf(start) == -1) {
    //Display ("Start tag %s not found", start);
    return "ERROR";
  }
  else Start = source.IndexOf(start);
  String Sub = source.Substring(Start, source.Length);
  if (Sub.IndexOf(end) == -1) {
    //Display ("End tag %s not found", end);
    return "ERROR";
  }
  else End = Sub.IndexOf(end);
  String result = source.Substring(Start+start.Length, End-start.Length);
  //Display ("Result: %s", result);
 
  return result;
}


Also I was wondering. Dynamic Arrays in AGS? How does that work. Would it work for my struct arrays too? AGS seems to reserve their space in the exe so it grows really large for higher values.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: Yeppoh on Sun 19/06/2011 16:15:18
You could create an Explode function that converts your String into an Array of char values.

Something like this :


char[] Explode(this String*)
{
 short index = 0, in2 = 0, count = 0;
 char buffer[], list[];
 String sBuffer = this;
 list = new char[1];
 list[0] = count; // index 0 is the length of the array
   
 while (sBuffer.Length > 0) {
       
     buffer = new char[count+1];
     count++;
     while (in2 < list[0]) { // stores the values in a temporary array
       buffer[in2] = list[in2];
       in2++;
     }
     in2 = 0;
     list = new char[count+1];
     while (in2 < count-1) { // creates the new array with the values
       list[in2] = buffer[in2];
       in2++;
     }
     in2 = 0;
     
     list[0] = count;  // new length of the array in index 0
     
     index = sBuffer.IndexOf(",");
     
     if (index != -1) { //stores the value in new index
       list[count] = sBuffer.Substring(0, index).AsInt;
       sBuffer = sBuffer.Substring(index+1, sBuffer.Length-index);
     } else {
       list[count] = sBuffer.AsInt;
       sBuffer = "";
     }
     
 }
 
 return list;  
 
}


You can also store this in a global array, and then get the value you need from it in your script.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: RickJ on Sun 19/06/2011 21:04:48
Why not just use binary files in the release version.   If the game is in debug mode then prompt to read ascii or binary file(s).  If ascii is selected the current parser to populate your struct from the ascii file and save to binary file.  If binary is selected then rad from the binary file.  
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: monkey0506 on Sun 19/06/2011 21:25:43
@Nefasto:

Your function is probably a bit bloated because it's creating and releasing dynamic arrays every single iteration of your while loop. What would actually be more efficient is just to create an overly large buffer array, then copy it into an appropriately sized array after you know the actual size.

Something else to consider is that although the largest value he gave here is 228, is there a chance the value might be greater than 255? I don't know anything about TMX files, but if any of the values in the extracted String could be greater than 255, he'd need to use a short at least, but ints are more efficient time-wise.

Oh, and you can't directly access objects returned by functions in AGS, they have to be stored in temporary variables.

int[] ExplodeCSV(this String*)
{
 if (!this.Length) return new int[1]; // index 0 will hold the number of values
 int buffer[] = new int[this.Length];
 int count = 0;
 int i = this.IndexOf(",");
 String sbuffer = this;
 while (i != -1)
 {
   String sbuffer2 = sbuffer.Substring(0, i);
   buffer[count] = sbuffer2.AsInt;
   count++;
   if ((i + 1) < sbuffer.Length)
   {
     sbuffer = sbuffer.Substring(i + 1, sbuffer.Length - (i + 1));
     i = sbuffer.IndexOf(",");
   }
   else
   { // string ends with a comma
     sbuffer = null;
     i = -1;
   }
 }
 if (sbuffer != null)
 { // string doesn't end with a comma, store last value
   buffer[count] = sbuffer.AsInt;
   count++;
 }
 int buffer2[] = new int[count + 1];
 buffer2[0] = count;
 i = 0;
 while (i < count)
 {
   buffer2[i + 1] = buffer[i];
   i++;
 }
 return buffer2;
}


@mode7:

This function has the same problems as I described about your first code snippet. Also, unless you're setting a value after the array is created (static or dynamic), the source parameter would always be null when you call it from the above snippet, and would throw a null-pointer reference error.

String ExtractTag(String source, String start, String end)
{
 if ((source == null) || (start == null) || (end == null)) return "ERROR";
 int startIndex = source.IndexOf(start);
 if (startIndex == -1) return "ERROR";
 String sub = source.Substring(startIndex + start.Length, source.Length - (startIndex + start.Length));
 int endIndex = sub.IndexOf(end);
 if (endIndex == -1) return "ERROR";
 return sub.Substring(0, endIndex);
}


This makes sure none of the Strings are null, and only checks for each index once (instead of twice if the start/end tag is valid). Also, I've changed the return result to be based on the substring you're already extracting because that's probably more efficient than reusing the entire String to get the result.

Oh, and dynamic arrays only work with the basic and built-in data types, not custom structs unfortunately. You also can't include a dynamic array as part of a custom struct, but you can have a custom struct function return a dynamic array or use a dynamic array as a parameter (I think to return one you have to use an extender method though).

@RickJ:

You posted while I was typing, but I don't understand what you mean at all. Unless there's some plugin being used to utilize a binary file then I don't understand how you could use a binary file like that in AGS.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Sun 19/06/2011 23:07:40
Thanks a lot for your effort everyone

@Nefasto that looks like an interesting idea, unfortunately I can' getting it to run. Even after fixing the function I still get a infinite loop. I'll have to investigate into this further. Also monkey is probably right about the 255 char issue. This would limit a tilset to be made up of 16x16 tiles max and this is actually a bit to little

@RickJ
I was thinking about that before. If I knew how to do it that would be great.

@Monkey
Thanks for the info. A shame I only found out that AGS supports dynamic sized arrays after I had planed out my structs. Still a nice thing to know.

Well as the loading is a little faster now I figured that it might still be a big issue especially as the loading times multiply with several tile layers (im currently using only two)
I just wonder there might be a much easier and faster way I failed to see yet. I just wonder because the tiled map editor loads them in a slit second.
Another option would be to use base64 encoded data instead of the CSV. I have no idea how to pull that off but it seems like option to try..well I dont even now if its possible in AGS. I'll have a look into it though.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: Yeppoh on Mon 20/06/2011 09:58:20
Good point monkey. I wrote it off the top of my head so I didn't double check for optimization. Though I wonder if that wouldn't use more memory.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: monkey0506 on Mon 20/06/2011 10:28:45
No, it absolutely wouldn't use more memory. Well, I mean I used int instead of char, but I'm allocating significantly fewer variables. If you examine your code with the example string given of "0,0,226,227,228,0,0,0,226,227,228", then we could lay out some simplified pseudo-code that would look something like this:

// start of function
list = [1]
// extract first item
0,
buffer = [1]
list = [2]
// extract second item
0,
buffer = [2]
list = [3]
// so-on...
226,
buffer = [3]
list = [4]
227,
buffer = [4]
list = [5]
228,
buffer = [5]
list = [6]
0,
buffer = [6]
list = [7]
0,
buffer = [7]
list = [8]
0,
buffer = [8]
list = [9]
226,
buffer = [9]
list = [10]
227,
buffer = [10]
list = [11]
228
buffer = [11]
list = [12]


So if you add all that up, you're allocating 144 chars to ultimately hold 11 (+1 for the array size) values.

With my code it would look more like this:

buffer = [33]
0,
0,
226,
227,
228,
0,
0,
0,
226,
227,
228
buffer2 = [12]


Add that up and I'm allocating 45 integers. That's 3.75 times the ultimate number of values stored being allocated versus the 12 times memory allocation with your function.

The most memory-conscious method would be to iterate and parse the entire string twice, but that's actually much less time-efficient. With no way of knowing the exact number of values at the start, allocating excessive memory is the price we pay for parsing strings like this, but at 4-bytes per int, the extra 132 bytes I'm allocating aren't really that big of a deal. Arguably since you were using single-byte chars instead of ints, you were also only allocating an extra 132 bytes of memory, but as mode7 said, the char type isn't really going to work for his purposes here.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Mon 20/06/2011 11:00:25
Ok I got Monkey's version of Nefastos explode function running and made some speed tests (With RawTime - is there a more precise way of doing that?)

The old DecodeCSV function which monkey optimized took about 6seconds on my DualCore
The explode function took about the same time. As the data still needs some reformating and putting it into my struct arrays this will probably be slower.

But thanks a lot for the effort, anyway.

I managed to optimize my tileset loading algorithm so the map loading times is currently the only bottleneck when loading a map. On slow computers that will probably get annoying when changing level, kinda like on the old Playstation games.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: Yeppoh on Mon 20/06/2011 13:40:46
Hoho. I see, monkey. So it doesn't overwrite the array and adding a little allocation for the new index at every iteration as I instinctively thought. It just adds a new allocation for every new array, if I undertood it correctly. Shown that way my method is indeed a memory hog.

Anyway, glad we could help you, mode7.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: monkey0506 on Mon 20/06/2011 19:23:48
Regarding the bottlenecking, if the map is too large then you could consider breaking the loading into smaller chunks, which would allow you to show a "Loading" screen of some type at least. You could actually use a dynamic array here to split the extracted string:

// top of script
String datas[];
int dataCount = 0;
int dataID = 0;

void SplitData(String source)
{
  int length = 100; // the estimated length of each substring
  dataCount = FloatToInt(IntToFloat(source.Length) / IntToFloat(length), eRoundUp); // integer division always rounds down, we need to round up
  int i = 0;
  while ((i < dataCount) && (source != null))
  {
    if (source.Length >= length)
    {
      int j = length;
      while ((j < source.Length) && (source.Chars[j] != ',')) j++;
      datas[i] = source.Substring(0, j - 1);
      if ((j + 1) < source.Length) source = source.Substring(j + 1, source.Length - j);
      else source = null;
    }
    else
    {
      datas[i] = source;
      if (datas[i].EndsWith(",")) datas[i] = datas[i].Truncate(datas[i].Length - 1);
      source = null;
    }
    i++;
  }
  if (i < dataCount) // we've actually used fewer items than we allocated
  {
    String dd[] = new String[i];
    dataCount = i;
    i = 0;
    while (i < dataCount)
    {
      dd = datas[i];
      i++;
    }
    datas = dd;
  }
}

// when you're actually pulling the data from the file
String data = ExtractTag(source, "<data encoding=\"csv\">", "</data>");
SplitData(data);

// whatever script..
function repeatedly_execute()
{
  if (dataCount)
  {
    // extract the data from datas[dataID] via DecodeCSV..
    // update a loading GUI..
    // for example, if you have a button that is 100px wide to represent the loaded percentage (make sure ClipImage is true
    btnLoading.Width = (dataID * 100) / dataCount; // set the button width in pixels to match the loaded percentage..
    dataID++;
    if (dataID == dataCount) // done reading the data
    {
      datas = null;
      dataCount = 0;
      dataID = 0;
    }
  }
}
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Tue 21/06/2011 20:06:40
Thanks a lot for the suggestion monkey.
I  rewrote the whole algoritm. saved me another second or so.
More important it's a really small piece of code now (no more service functions)
I even implented a loading bar, this takes away some time again, but 0.5 seconds doesn't make much of a diference.


 bLoad.Width = 0; //Loading Bar
 String RAWLayerData[] = new String[Map[0].LayerCount]; // using a dynamic array makes this more flexible if LayerCount changes later ;)
 int CL;
 while (CL < Map[0].LayerCount)
 {
   lStatus.Text = String.Format("Reading %d tiles for layer %d/%d",Map[0].TileCount, CL+1, Map[0].LayerCount);
   Wait(1);
   RAWLayerData[CL] = ExtractTag(RAWLayer[CL], "<data encoding=\"csv\">", "</data>");
   RAWLayerDataC[CL] = RAWLayerData[CL]; //This double string is for tilset loading operations
   // Loading Algorithm
   int index;
   int tilecount;
   int lc=1;//Loading bar multiplier
   while (index != -1) {
     Map[CL+1].Tile[tilecount] = RAWLayerData[CL].AsInt;
     index =RAWLayerData[CL].IndexOf(",");
     RAWLayerData[CL] = RAWLayerData[CL].Substring (index+1, RAWLayerData[CL].Length-index);
     tilecount++;
     if (tilecount == (Map[0].TileCount/100)*(10*lc)) {
       bLoad.Width = lc*50; //Make loading bar wider
       lc++;
       Wait(1);
     }
   }
   CL++;
///End of loading
 }

It's still too slow and my long term goal is to load it so quick I wont need a loading bar.

Well the loading bar showed me that the second layer is loading much faster than the first. The second layer is for alpha channel sprites only, so there are a lot of '0' in there. This makes the parsed string much smaller. One more thing I realized is that at the end of the loading bar it moves faster than at the beginning (Taking most of the time for the first 20 or so %)
So String length is the issue here. But it's not the IndexOf function. I'm using it with the raw strings for the tileset and its very fast.
This brings me to the conclusion that the SubString function it the guilty one here. Constantly rewriting the string which is probably about 50kb or more of data probably is slow (correct me if I'm wrong)

So I wondered if there is a way to avoid the SubString function or at least the issue of constantly rewriting almost the whole string?
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: monkey0506 on Wed 22/06/2011 00:59:26
There's not really a way to get around creating new Strings, but it might be faster to build up a single short String instead of cutting out the short String portion at the start and then trimming off the much longer String at the end each time.

Also, is that code working for you? Coz unless I'm misreading something, you're using the AsInt property against the entire CSV string, not just the relevant portion. I'm not entirely sure how AGS handles mixed strings like that with regard to AsInt, but I'm guessing that if it is indeed working for you then it's just truncating the int at the first non-numeric character. In any case, the below doesn't use the Substring function at all, so it might actually be slightly faster (again because it's building small strings instead of truncating large ones).

 bLoad.Width = 0; //Loading Bar
 String RAWLayerData[] = new String[Map[0].LayerCount]; // using a dynamic array makes this more flexible if LayerCount changes later ;)
 int CL;
 while (CL < Map[0].LayerCount)
 {
   lStatus.Text = String.Format("Reading %d tiles for layer %d/%d",Map[0].TileCount, CL+1, Map[0].LayerCount);
   Wait(1);
   RAWLayerData[CL] = ExtractTag(RAWLayer[CL], "<data encoding=\"csv\">", "</data>");
   RAWLayerDataC[CL] = RAWLayerData[CL]; //This double string is for tilset loading operations
   // Loading Algorithm
   int index;
   String buffer = "";
   int tilecount;
   int lc=1;//Loading bar multiplier
   while (index != -1) {
     //Map[CL+1].Tile[tilecount] = RAWLayerData[CL].AsInt;
     //index =RAWLayerData[CL].IndexOf(",");
     bool break = false;
     while ((index < RAWLayerData[CL].Length) && (!break))
     {
       if (RAWLayerData[CL].Chars[index] != ',') buffer = buffer.AppendChar(RAWLayerData[CL].Chars[index]);
       else break = true;
       index++;
     }
     //RAWLayerData[CL] = RAWLayerData[CL].Substring (index+1, RAWLayerData[CL].Length-index);
     Map[CL + 1].Tile[tilecount] = buffer.AsInt; // use the short string we built instead of getting a substring
     buffer = ""; // clear the buffer for the next value
     if (index == RAWLayerData[CL].Length) index = -1; // if we reached the end-of-string, break the loop
     // otherwise, don't change index at all because we'll need its value for the next loop
     tilecount++;
     if (tilecount == (Map[0].TileCount/100)*(10*lc)) {
       bLoad.Width = lc*50; //Make loading bar wider
       lc++;
       Wait(1);
     }
   }
   CL++;
///End of loading
 }


I haven't tested to see if that's actually faster, but I imagine it might be if the data string is quite long.
Title: Re: Loading comma seperated values (CSV) from a String into a struct
Post by: mode7 on Wed 22/06/2011 07:45:01
Most interesting monkey. I'm checking it out right away.
And yes, using AsInt on a string returns the first numeric value.

EDIT:
Okay tested your algorithm. Unfortunately it didn't do the trick. It took about 12seconds while mine takes 5. I was thinking about a similar algorithm before but couldnt pull it of so I'll take another look at this.
Big thanks again for all the effort you put into this thread.