Algorithms: Optimize tile iteration

Started by abstauber, Mon 23/05/2011 11:04:28

Previous topic - Next topic

abstauber

Hi there,

I'm currently trying to optimize my tile engine, but I'm out of thoughts right now.
Here's a sample tilemap:



I'm currently looping through my tile array like this:
Code: ags

    i = 0;
    k = 1; // please ignore ;)

    while ( i < num_tiles_x )
    {
      j = 0;
      
      while ( j < num_tiles_y )
      {
        int index = j * num_tiles_x + i; 
        if (tile[index].tileno[k] != 0) {
          // and so on...



Now this works fine and it  loops from index 0 to index 1380 all the time. But it would be only necessary to loop through the indices inside the red square, which is the viewpoint of the player.
(the tiles aren't drawn anyway, because I check the tile's coordinates inside the loop and compare it to the player's coordinates.)


My question is now: Is it possible to modify these two loops to consider only the tiles inside the red square? I do have the index of the player's tile, so it could be a starting point to calculate the index. But I'm terribly clueless at the moment.


Any help is appreciated :)





TomatoesInTheHead

#1
something like

Code: ags
    
    //predefined size of view rectangle
    int num_view_tiles_x = 20;
    int num_view_tiles_y = 10;
    //player position in view
    int player_view_offset_x = 3;
    int player_view_offset_y = 9;

    //player index to (x,y) position
    int player_y = player_index / num_tiles_x;
    int player_x = player_index % num_tiles_x;

    //compute the x and y coordinate of the upper right corner of the view rect
    int view_offset_x = player_x - player_view_offset_x;
    int view_offset_y = player_y - player_view_offset_y;
    if (view_offset_x < 0) view_offset_x = 0;
    //and/or similar range checks...

    i = view_offset_x;
    k = 1; // I ignore it ;)

    while ( i < view_offset_x+num_view_tiles_x && i < num_tiles_x ) //last bit is also a range check
    {
      j = view_offset_y;
      
      while ( j < view_offset_y+num_view_tiles_y && j < num_tiles_y )
      {
        int index = j * num_tiles_x + i; 
        if (tile[index].tileno[k] != 0) {
          // and so on...

should do the trick.

Also, I don't know if it really makes a difference in performance, but I usually prefer to go in the y direction first and then in the x direction, to traverse the fields in the same order as they are in the array. It may have a better performance if the region around the field you access are cached (like, when you access array[0], array[0] through array[32] are read into the cache; now you would access fields 0, 70, 140, ..., 1, 71... in that order and not benefit from such caching).

Calin Leafshade

I suggest that you *dont* check if the tile should draw or not.

The engine already makes this check for you and it does so in optimised c++ so the cpu cost for just running the DrawImage method anyway is *less* that making all the checks required.

abstauber

#3
@TomatosInTheHead
Thanks a million! I'll try that ASAP :D

@Calin
Oh, I didn't know that. In that particular function I also check if the tile is animating, emitting particles and so on. Let's see what's faster. If Tomatos' snippet works that check will be needless anyway.


Thanks you two!


edit: Framerate stabilized a lot, but there's still room for improvement. The framerate at the bottom right corner is still dropping very low (20 fps on a 2Ghz Intel C2D). So the next thing I'll try is caching.

tzachs

Quote from: abstauber on Mon 23/05/2011 13:08:53
edit: Framerate stabilized a lot, but there's still room for improvement. The framerate at the bottom right corner is still dropping very low (20 fps on a 2Ghz Intel C2D). So the next thing I'll try is caching.

If there's one thing I learned the hard way, is that it's often better to first do some measurements and identify the source of the problem, even if you're convinced that you know where the problem is coming from.
I have once worked on some utility that has to parse files, and it took a lot of time, and I was sure it was due to having to parse large files, and I've worked a lot on reducing the amount of parsing.
While that helped a little, it wasn't nearly enough, and then I started to measure times of different parts of the code.
As it turns out, that was not the large bottleneck. The problem was in a completely different area, where there was a copy of an array that took a whole lot of time, and the solution was actually really simple...
Anyways, might not be relevant here, but just in case it is...

abstauber

Hmm... I'm pretty sure that accessing array values at the end of the array is just painfully slow.

Even if the player is spawned at the bottom right corner and doesn't move at all (no collision checks interfering), the framerate is down.

But could it be that accessing tiles[7000].id takes longer than accessing tiles[70].id

Calin Leafshade

Are you calculating the position of the tiles each time?

I did this for my iso engine and it was *painfully* slow. The calculations were more intensive than the actual drawing.

so in the end i had a struct like this:

Code: ags

struct Tile{
  int Sprite;
  Point2D *PixelPos;
  Point2D *TilePos;
  int Height;
};


And calculated the 'PixelPos' when i generated the map.

and then i have a custom type like this:

Code: ags

Rectangle *viewport;


and then when it comes round to drawing them I dont have to worry about X or Y at all I can just iterate through the entire array without any checks since comparisons in AGS seem to be relatively slow. The only calculation required is adding the viewport position to the tiles position.


abstauber

#7
Yeah, the initial coordinate is calculated when I load the map. But due to scrolling I've calculate an offset and apply that to the position.

The scrolling algorithm is also pretty unoptimized, but that's a different story ;)

I've now switched the order of the two loops, now the outer loop is y and the inner loop is x. That helped a lot: a 200x80 tilemap (16px tiles)  now runs without significant fps drops.

Now I just need a way to simulate an atom processor so I can add optimizations and actually see the result.

edit: Argh, switching loops just swapped the problem from bottom right to top left.

Khris

I gotta say, I was highly skeptical of both the order and position of array elements being accessed having any effect on the frame rate.

But being skeptical isn't enough so I did a simple test. Random strings of twenty characters are put into a 10,000 element string array.
Then I timed four loops; each time the game prints 500,000 strings to the screen.
The first loop would take a random string from the first 2000,
the second loop would print the first 2000 in order over and over again.
The third and fourth would print the last 2000, randomized and in order.

Everytime, no matter what I chose, the result was the same: 21 seconds.

And regarding caching: afaik, caching means to load stuff into the RAM in order to access it faster (e.g. files on a DVD or HD or network location).
Thus, caching stuff that's already in the RAM is completely pointless.

abstauber: since the framerate drops depending on the character's position, there's obviously some little typo or something in your code. Switching the loops moved the frame rate drop, so it's got to be something like you aren't calculating the visible rectangle right.
For instance if you by accident iterate from visible edge to map edge, the character's position changes the amount of tiles being processed.

Snarky

The computer has multiple levels of caches, starting with the CPU registers (which are in effect instantaneous) and typically including at least an L1 cache and an L2 cache, which are both much quicker to access than the main memory (but smaller, of course). In principle, designing your algorithms so all the data they need can be cached in one of the faster caches can speed up execution significantly, although I'm not sure how much control you have over that when writing in something high-level like AGS script.

Khris

That was sort of my point, I imagine AGS stores all variables in the same "virtual space", if you will, so moving it about will at best slow things down. Of course with coding in assembler, it's a completely different story.

Wyz

I think I'd use a different method to speed things up. A lot of the tiles (mainly the background) remains static during the game, they only ever scroll along the screen. You can make a huge dynamic sprite to which you draw all static tiles when loading the map. Then you can simply show the right portion of this background cache. You can even use AGS' built-in background scrolling system for this with the SetViewport functions.

As for the dynamic sprites like traps, you can store them in a kd-tree or a range tree or quad tree or whatever floats your boat. What that basically does is decrease the time to find out if something resides in the viewport or not. But it guess they are not easily implemented given that AGS does not support dynamic structs :(
Life is like an adventure without the pixel hunts.

Ryan Timothy B

Quote from: Wyz on Tue 24/05/2011 15:24:44
You can even use AGS' built-in background scrolling system for this with the SetViewport functions.
Doing this would require that you either import a large background that'll accommodate for any size level, therefor making things much slower than it needs to be for even small levels, or to have several rooms for each sized background.

Making an arcade game, the backgrounds could be at any size. So limiting yourself to a 'fixed' background size in AGS isn't an option in my eyes.


What I think you could do if you really wanted to decrease the 'drawing' time of non animating/moving sprites, you could do them in chunks. Store them in a DynamicSprite in smaller portions like the size of the screen. So the most you see at once is 4 chunks. You wouldn't have to refresh those 'chunks' because you'd simply store them as static background sprites. Simply drawing these chunks to your 'background' with the animations or overlays on top or behind of it.

The reason for the chunks would be simply if the level was very large, you wouldn't have to store a gigantic background image in the RAM, you'd just delete the DynamicSprite as you go. And recreating it whenever you get within a certain distance from that chunk. Pretty much like Minecraft how it loads the chunk into the video ram, and deletes it if you've passed the viewing distance. (Then it stores it to the harddrive, but that's only because the game is constantly sucking up more than a gig of ram).

I personally haven't done this, but I can see it being much faster than drawing the same tiles each loop for no reason. Of course I don't actually know what method you've chosen, nor have I looked at your code.

abstauber

First of all, I have to apologize. Khris was right, there was a simple i=0 hidden inside the spaghetti loop. And instead of being rational I wanted to believe in coding voodoo   ::)

So thanks to you all, the framerate is once again stable at 40 fps and I can finally start looking for real optimizations.
Merging tiles into supertiles is an definitely an option, but then I suppose I'd have to ditch tile animation and destrucion support - or at least I can't imagine how this would work with prerendered chunks.

But it's giant relief that this has nothing to do with array sizes!

Ryan Timothy B

#14
I was mainly just throwing it out there as a possibility to limit the drawing calls. But it definitely adds to the difficulty of it all and possibly even more CPU time if not done right - which, in my head, I'm not quite sure is the best method yet.

abstauber

Quick and dirty would be to increase the tile size, that way the active tilecount drops a lot :)

Ryan Timothy B

I had written this in my post above, but then deleted it to re familiarize myself with it again.

What I wrote was. I'm surprised no one here has suggested just using a 2 dimensional struct array (20 x 69 = 1380) instead of a large array consisting of 1380 indices. I know you could simply have a function return a value to find out what index should be the tile beside you etc, but already grouping it in a standard 2 dimensional array seems the most logical to me. That is of course if the tiles remain static.

Something like this:

Code: ags

managed struct sValues {
  int sprite;
  bool whatever;
};

struct sArrayX {
  sValues *Y[20];
};

sArrayX ArrayX[69];


That way you can simply use:
Code: ags

ArrayX[1].Y[2].sprite = 121;


I've always done it this way. Is it wrong?

Wyz

That would actually be much slower then one long array.
Life is like an adventure without the pixel hunts.

Ryan Timothy B

Ok, I'm curious now.
What's slower about it? Accessing a struct within struct? The Y while statement running on each X index?

I honestly don't know. Never tested for speed differences between the two, and don't really know much about reading/writing variables from/to memory. I know that AGS runs a bunch of different functions internally just to read/write the contents of a variable.

I figured that would be better than calling a function each time to see what index is around him on index 121. On each game loop, you'd constantly have to divide the index you're on by tile X amount to get the Y position, then the tile index you're on get the remainder to find the X position. Etc. Then for each tile you're checking around him, you'd have to do the opposite for those to find out what the neighboring tile is.

Seems like a lot more calculations to me than needed. But as I said, I don't know AGS's speed for reading/writing variables and sructs within structs. Not a clue. Maybe I'll do a speed test one of these days.

abstauber

Quote from: Ryan Timothy on Tue 24/05/2011 19:29:47
Maybe I'll do a speed test one of these days.
+1 :D


@Calin:
Since the rectangle now works, I've removed the addition if-clauses and draw checks inside the tile drawing functions. On the one hand this is good, because the function isn't so cluttered anymore. But on the speed side it's not noticable.

The engine currently runs with about 60 fps, no matter if the clauses are in or not.

SMF spam blocked by CleanTalk