Sorting a "large" array issue

Started by papste, Tue 23/06/2009 09:04:25

Previous topic - Next topic

papste

Long explanation of the problem at hand included so please bare with me until the end:)

Due to the text parser limitation of allowing only 1500 entries i have created my own module including a text parser that reads vocabulary entries from a text file and fills an array. The reading is done at game start at a special room that i call "loading screen". The whole process used to take not more than 1 second to read around 2500 entries from the file and fill my array which worked fine for me. I would like to keep this module even if the parser limitation was raised, because its much easier for me to edit the text file and insert/remove words from it, rather than have to enter them in the parser one by one.

However during the game when the user inputs some text, my module splits the sentence into words and compares each word individually with the 2500 entries in the array deciding whether the word exists in the vocabulary whether it is a noun, adjective, verb, etc, and whether it is an ignored word or not. This process slows down the game as the user has to wait 2-5 seconds (depending on the machine's speed) after the text is entered for the calculations to take place. Also the background music which is playing tends to "tatter" during the calculation time.

So i came up with an idea to sort the array alphabetically in order to make it easier to find words in the array and not have to go all over the 2500 vocabulary entries multiple times for every word that the user enters. I have thus created a function that uses the bubble sort method and i have assigned the function to run at the loading screen, just after the array is read from the text file.

My problem is that the sorting procedure takes around 2-3 minutes and the user is forced to wait at the loading screen, which i find too much for a text adventure and is very annoying. The computer that i use is a fast one and i suspect the problem will be far too obvious on slower machines.

Following is the code that i used for the sorting. I had to use the noloopcheck command in order to achieve this. If anyone can help achieve faster sorting times i would be grateful for the help. The custom parser module that i created has solved a lot of problems for me. I would hate to have to throw it away just because the sorting takes too much time. My other option was to pre-sort the vocabulary file in Excel, but i would like to avoid this.

Code: ags

textParser.ash
------------------

#define MAX_WORDS_IN_VOCAB 5000

struct vocab {
  int groupId;
  String word;
  String attributes;
};

import vocab vocabEntry[MAX_WORDS_IN_VOCAB];

textParser.asc
-------------------

vocab vocabEntry[MAX_WORDS_IN_VOCAB];
export vocabEntry;

static function noloopcheck text_parser::sort_vocab() {
  bool swapped = true;
  vocab tempEntry;
  int loopCount = 0;
  while (swapped) {
    int i;
    swapped = false;
    while ((i <= MAX_WORDS_IN_VOCAB - 2) && (!String.IsNullOrEmpty(vocabEntry[i + 1].word))) {
      if (vocabEntry[i].word.CompareTo(vocabEntry[i + 1].word) > 0) {
        tempEntry.groupId = vocabEntry[i].groupId;
        tempEntry.word = vocabEntry[i].word;
        tempEntry.attributes = vocabEntry[i].attributes;
        vocabEntry[i].groupId = vocabEntry[i + 1].groupId;
        vocabEntry[i].word = vocabEntry[i + 1].word;
        vocabEntry[i].attributes = vocabEntry[i + 1].attributes;
        vocabEntry[i + 1].groupId = tempEntry.groupId;
        vocabEntry[i + 1].word = tempEntry.word;
        vocabEntry[i + 1].attributes = tempEntry.attributes;
        swapped = true;
      }
      i++;
    }
  }
// End of function.
}


I also figured i should include a part describing the text file structure in order to make it easier to understand how the array is set. So this is a small sample of the file containing only a few lines. The line is read and split into 3 parts (groupId, word, and attributes) and is passed into the vocab struct array.

vocab.txt
------------

1 the article
2 a article
3 an article
4 el article
5 la article
6 los article
50 and conjuction
51 or conjuction
52 nor conjuction
54 if conjuction
55 but conjuction
100 it pronoun
101 this pronoun
102 these pronoun
103 that pronoun
104 those pronoun
105 them pronoun
...

Gilbert

#1
I'm lazy, so I didn't really read your codes, but I think the slowness was possibly caused by assigning the String elements in the array entries.
An idea, would be, instead of sorting that large array with String elements, make another reference index array of only integers, something like:

int vocabref[MAX_WORDS_IN_VOCAB];

which holds only the indices of the original array after sorting, i.e. instead of swapping the String elements in the original array you keep it intact, while the sorted indices are held in vocabref[], so the entries of:
vocabEntry[vocabref[0]]
vocabEntry[vocabref[1]]
vocabEntry[vocabref[2]]
...
are sorted in order.

Of course, it would be unfriendly and hard to read to type all your codes like this. Maybe after creating vocabref[] you may try to rebuild the original array based on these sorted indices. As in this way each entry needs to be reassigned once only, it might be a bit faster.


However, my question is:
Why wouldn't you store the words in alphabetical order in the data file in the first place?

You can even use this function you made to sort the entries and reoutput a new data file, so that the end-user wouldn't need to make the wait.

abstauber

As far as I remember, BubbleSort is the slowest of all sorting algorithms (and the easiest to understand/implement)

Maybe you'll find something more performant here.
http://en.wikipedia.org/wiki/Sorting_algorithm


Though I second the question of sort the text file directly  ;)

papste

The reason why i chose the bubble sort wasn't the simplicity of the code but mainly because its requires the least memory of all the sorting algorithms and also it preserves the relative order of records with equal keys.
But mainly it was a memory vs time issue. I will try more memory intensive algorithms tho, just to see how the game handles...

SMF spam blocked by CleanTalk