Working with a LOT of text files

Started by JpSoft, Sat 30/07/2011 22:41:39


JpSoft

OK guys, there are a lot of geniuses here who can probably see the solution to my problem  ???

In my game I need to work with a lot of text files. All of these files have the same extension, ".txt", and their names all have the same length; for example:

tregor.txt
regore.txt
benedy.txt
.
.
.

Also, all the files have an identical five-line format:

integer A
integer B
integer C
integer D
integer E

I know how to get and process the information (opening, reading and closing the files), but I can't see how to work with a LOT (hundreds, maybe thousands) of files whose names are NOT known in advance. I mean, I need a script that can read the data from any ".txt" file contained in "\directory" to create something like a database.

Can this be done in AGS? I really need this, and even though I can do some very nice things with scripting, I can't imagine how to handle this.

Khris

The File.Open command uses a string for the filename.
All you have to do is generate the filename and append ".txt".

Code: ags
  String fn = Filename(...);
  // keep the returned handle so the file can be read and closed afterwards
  File *f = File.Open(fn.Append(".txt"), eFileRead);

monkey0506

The ListBox.FillDirList function might also be useful to you as it supports DOS/Windows-style wildcards. ListBoxes do have a maximum of 200 items, so that would certainly be relevant, but you could, for example, check a wildcard such as:

Code: ags
lstFilter.FillDirList("*regor*.txt");


That would match the files "tregor.txt" and "regore.txt" from your example. Then you could use the results stored in ListBox.Items to 1) check for files that exist and 2) help find the file you need.
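
As a rough sketch of point 1 (checking that a file exists), something like the following could work. The helper name AnyFileMatches is made up for illustration, and lstFilter is assumed to be an existing ListBox that can be reused as scratch space:

Code: ags
// Hypothetical helper: returns true if at least one file matches the mask.
// Assumes lstFilter is a ListBox whose current contents may be overwritten.
bool AnyFileMatches(String mask)
{
  lstFilter.Clear();
  lstFilter.FillDirList(mask);
  return (lstFilter.ItemCount > 0);
}

For example, AnyFileMatches("tregor.txt") would report whether that one file is present in the current directory.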

JpSoft

OK, but I don't know how to get the filenames from the directory, since they are not known in advance.

ListBox.FillDirList was my first idea, but the 200-item limit is a problem; there are actually around 3K txt files (randomly generated) to process  :-\ and their names change continuously.

Is there any way to increase this limit (200 items per listbox)? Any workaround?



Calin Leafshade

since wildcards are supported you could do it like this:

get all files
if listbox.length == 200 then get all the files starting with the same letter as the last entry.
if listbox.length == 200 then get all the files starting with the same 2 letters as the last entry.
etc etc
if listbox.length < 200 then get all the files starting with the *next* letter or 2 letters or n letters.

Sorry that was badly explained.. but something like that.

monkey0506

I get the first part of what Calin is saying...

Code: ags
#define TEXT_FILE_COUNT 3000
String TextFileNames[];

void ReadTextFileNames()
{
  if (TextFileNames == null) TextFileNames = new String[TEXT_FILE_COUNT];
  lstFilter.Clear();
  String filter = "*.txt";
  lstFilter.FillDirList(filter);
  int i = 1;
  while (lstFilter.ItemCount == 200)
  {
    String s = lstFilter.Items[199].Substring(0, i);
    i++;
    filter = s.Append("*.txt");
    lstFilter.FillDirList(filter);
  }
  int j = 0;
  i = 0;
  while (i < lstFilter.ItemCount)
  {
    TextFileNames[j] = lstFilter.Items[i];
    i++;
    j++;
  }
  // ???
}


That seems pretty good, but how do you get the rest of the list after you've truncated it to less than 200 members? If we could use regex then we could specify the first N characters are not the same as the first N characters of the last item in the list...but I don't see how we could do that with a regular filepath filter...short of iterating the rest of the possible values. Looping through 309 million possibilities (26^6; roughly 19.8 billion if it's case-sensitive, 52^6) to find a maximum of 3000 entries actually seems rather daunting. And by daunting I mean basically impossible. I think Calin might have overlooked the fact that we are using the FillDirList function to get the full file list.




Now then, what I'd propose you do, Jp, is this. You said that the files (or at least the file names) are randomly generated...is there a technical reason you couldn't split the files into subdirectories? With a maximum of 200 files per subdirectory and 200 subdirectories in total, that would grant you up to 40,000 files. What I'm getting at here is that FillDirList works with directories, so you could break the total file list into smaller chunks which FillDirList could then be used to read. So then, you could have something like this:

Code: ags
#define TEXT_FILE_COUNT 3000
String TextFileNames[];

void ReadTextFileNames()
{
  if (TextFileNames == null) TextFileNames = new String[TEXT_FILE_COUNT];
  lstFilter.Clear();
  String filter = "$SAVEGAMEDIR$/data/";
  lstFilter.FillDirList(filter.Append("*"));
  int itemCount = lstFilter.ItemCount;
  // we need to save the subdir list (temporarily) so we don't need two ListBoxes
  // (one spare slot so the array is never zero-sized)
  String subdirs[] = new String[itemCount + 1];
  int dirCount = 0;
  int dir = 0;
  while (dir < itemCount)
  {
    // skip the "." and ".." entries wherever they appear in the list
    if ((lstFilter.Items[dir] != ".") && (lstFilter.Items[dir] != ".."))
    {
      subdirs[dirCount] = lstFilter.Items[dir];
      dirCount++;
    }
    dir++;
  }
  dir = 0;
  int file = 0;
  while (dir < dirCount)
  {
    lstFilter.Clear();
    lstFilter.FillDirList(filter.Append(subdirs[dir].Append("/*.txt")));
    int f = 0;
    // stop at TEXT_FILE_COUNT so we never write past the end of the array
    while ((f < lstFilter.ItemCount) && (file < TEXT_FILE_COUNT))
    {
      TextFileNames[file] = filter.Append(subdirs[dir].AppendChar('/'));
      TextFileNames[file] = TextFileNames[file].Append(lstFilter.Items[f]);
      f++;
      file++;
    }
    dir++;
  }
}


Then from there you would have the full file list of all of your data files in the array, ready for use with the File functions. Really I don't see a better way of doing this, but I imagine simply adding subdirectories to the equation won't be too vastly difficult.

I've tested this, and come across a few interesting points (such as the fact that FillDirList will in fact list the . and .. "directories", which I didn't originally account for). A couple of things that you should take note of if you decide to use this code are 1) it won't work if you use the "Test Game" button...apparently regardless of the actual value, testing the game always expands $SAVEGAMEDIR$ to the project directory?? That seems like a bug to me, but it caused me quite a bit of trouble figuring out why the tag wasn't working properly, and I certainly didn't have "agsfnt0.wfn" in the Compiled directory :P, and 2) you must explicitly set the save directory or $SAVEGAMEDIR$ still doesn't expand properly. I'm not sure what it expands to in that case, but my tests invariably came up with no results looking for the subdirectories unless I had explicitly set the save directory first (even just setting it to "." worked fine, and is, unless I'm mistaken, the same as the default behaviour).

So, in any case, I hope this helps you. :)

P.S. If you do want to use this, the two things you'd need to update are 1) the value of TEXT_FILE_COUNT, and 2) the path being pointed to by the "filter" variable at the top of the function. The names of the subdirectories are actually irrelevant since we use the FillDirList function to determine those anyway. ;)

JpSoft

Thanks guys for your ideas  ;D

Splitting the files into 20 subdirectories looks like a great suggestion; there is no problem with that (actually, 20 subdirectories are enough for 4000 files).

But it gave me an even better idea: just add a number from 00 to 99 to the end of each file name when it is created (with a counter so that no more than 200 files share the same number); then I can search the directory in 100 partial passes and get all the files  :) Since this is only necessary when the game loads, it's no problem if it takes a while to complete.

It was the idea of splitting the work that gave me the workaround. I will try it right now and post the resulting code here if it works (if not, I'll be back looking for more help  ::) )
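
A minimal sketch of that 100-pass search, not the code actually used in the game. It assumes names such as "tregor07.txt" (the two-digit number comes right before ".txt"), at most 200 files per number, and an existing ListBox called lstFilter as in the earlier posts; MAX_FILES, AllFileNames, AllFileCount and CollectFileNames are names made up for illustration:

Code: ags
// Sketch only: 100 numbers x 200 files per number = 20000 names at most.
#define MAX_FILES 20000
String AllFileNames[];
int AllFileCount; // how many names were actually found

void CollectFileNames()
{
  if (AllFileNames == null) AllFileNames = new String[MAX_FILES];
  AllFileCount = 0;
  int bucket = 0;
  while (bucket < 100)
  {
    // "%02d" pads to two digits, giving the masks "*00.txt" through "*99.txt"
    lstFilter.Clear();
    lstFilter.FillDirList(String.Format("*%02d.txt", bucket));
    int i = 0;
    while ((i < lstFilter.ItemCount) && (AllFileCount < MAX_FILES))
    {
      AllFileNames[AllFileCount] = lstFilter.Items[i];
      AllFileCount++;
      i++;
    }
    bucket++;
  }
}

Each pass stays under the 200-item ListBox limit only as long as the counter really does cap every number at 200 files; a pass that returns exactly 200 items may have been cut short.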

;D ;D ;D ;D

JpSoft

OK, first problem  :-[

Can I use a variable with listbox.FillDirList()?

I created the entire loop, but now I've run into this issue. Any ideas?

SOLVED!!


RickJ

JP,

I don't understand your last question; perhaps you could clarify. You can use the ListBox type to declare a pointer to a list box if that's what you want.

How/why are you creating 4000 files?   Did you write or have the ability to modify the program that is doing this?   If so perhaps you should consider putting it all in one file?


JpSoft

RickJ;

Actually, I can change how the program assigns names to the files, but the program creates a new .txt file every time it is used, and that can't be changed  :( It is hard to explain in detail, but dealing with many files is the only way.

My last post was about using a variable inside listbox.FillDirList(variable); I had forgotten that I can create this:

String variable = String.Format("%s%s%s", parameter1, parameter2, parameter3);

And then just put it inside the brackets: listbox.FillDirList(variable); just by adjusting the parameters for variable I can fill the listbox with the desired files (obvious, but I had a lapse and panicked).

I have already solved the problem just by adding a number at the end of the file names, using a counter to avoid more than 200 files with the same number. Since I can now handle 20,000 files and the game will only need around 3,000, I can continue  :=

I also want to say that the game works incredibly fast in the tests (faster than I expected). It opens, reads and closes the 3,000 files in less than 1 second  :o even though my PC is not really fast.
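
For reference, reading one of the five-line files described in the first post might look roughly like this. It is only a sketch, assuming each line holds a single integer as plain text; FileValues and ReadFileValues are hypothetical names, not taken from the game:

Code: ags
int FileValues[5]; // integers A to E from the most recently read file

// Sketch only: returns true if all five integers were read.
bool ReadFileValues(String fileName)
{
  File *f = File.Open(fileName, eFileRead);
  if (f == null) return false; // file missing or could not be opened
  int line = 0;
  while ((line < 5) && (!f.EOF))
  {
    String text = f.ReadRawLineBack(); // one line of plain text
    FileValues[line] = text.AsInt;     // AsInt gives 0 if the text is not a number
    line++;
  }
  f.Close();
  return (line == 5);
}

Calling this once per name collected through FillDirList would build the in-memory "database" mentioned in the original question.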
