Author Topic: String array sort with fat files  (Read 9757 times)

jj2007

  • Member
  • *****
  • Posts: 10537
  • Assembler is fun ;-)
    • MasmBasic
Re: String array sort with fat files
« Reply #15 on: March 27, 2015, 06:44:50 AM »
:t Did you try to use LARGEADDRESSAWARE switch?

Thanks, but this would just hide the problem. Of course, if I had a concrete application that needs more memory, I could use that, but it's not a solution. I must work on the design of the algo. Again, for files under 200 MB there is no problem, it works fine.
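
For anyone who wants to check whether a given 32-bit build actually carries the flag that the LARGEADDRESSAWARE linker switch sets, here is a minimal sketch in Python. It reads the `Characteristics` field of the COFF header and tests the `IMAGE_FILE_LARGE_ADDRESS_AWARE` bit (0x0020). The function name and the fake test image are assumptions made for illustration only, they are not part of the attached program.

```python
import struct

# Bit in the COFF header's Characteristics field that the linker switch sets.
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image has the large-address-aware bit set.

    Hypothetical helper: 'image' is the raw content of an .exe file.
    """
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF header follows the 4-byte signature; Characteristics is
    # its last field, 18 bytes into the header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def fake_image(characteristics: int) -> bytes:
    """Build a tiny header-only stand-in for an .exe (illustration only)."""
    image = bytearray(0x40 + 24)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)
    image[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", image, 0x40 + 22, characteristics)
    return bytes(image)


if __name__ == "__main__":
    print(is_large_address_aware(fake_image(0x0102)))
    print(is_large_address_aware(fake_image(0x0122)))
```

On a real file you would pass `open(path, "rb").read()` instead of the fake image. As said above, setting the flag only raises the ceiling; it does not change how much memory the algo needs.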

Antariy

  • Member
  • ****
  • Posts: 551
Re: String array sort with fat files
« Reply #16 on: March 27, 2015, 06:48:15 AM »
Well, I did not say it's a solution, I just mentioned it in case it is useful for you. Do you currently read the entire file into memory before you start scanning it and splitting the lines into pointers? Do you do in-place separation, inserting a zero in place of the CR/LF?

jj2007

  • Member
  • *****
  • Posts: 10537
  • Assembler is fun ;-)
    • MasmBasic
Re: String array sort with fat files
« Reply #17 on: March 27, 2015, 07:36:52 AM »
Do you do inplace separation, inserting zero in the CR/LF place?

Yes, exactly. Masm32 ltok and Hutch's new tokeniser use the same technique afaik. It's extremely efficient for read-only arrays (the frequent case). It is a bit less efficient when you start heap-allocating new strings, but even in the most extreme case, only the memory of the original file hangs around. No harm done as long as it doesn't reach the gigabyte range.

The alternative would be to copy each string to a new heap-allocated string and delete the original array afterwards. That is awfully slow, and given that each heap string needs a header and has trailing unused bytes, it is hardly more memory-efficient.
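
The in-place technique described above can be sketched roughly as follows. This is Python rather than assembler, and it is not the MasmBasic or Masm32 code: the function names are made up for illustration. The file content sits in one buffer, each CR/LF is overwritten with zero bytes, and only a table of line start offsets is built and later sorted.

```python
def tokenise_in_place(buf: bytearray) -> list:
    """Overwrite CR and LF with zero bytes; return the start offset of each line.

    Illustrative sketch only. No line is copied; the buffer keeps its size.
    """
    starts = []
    pos, size = 0, len(buf)
    while pos < size:
        starts.append(pos)
        end = pos
        # Scan to the next CR or LF (13 or 10).
        while end < size and buf[end] not in (13, 10):
            end += 1
        if end < size:
            if buf[end] == 13 and end + 1 < size and buf[end + 1] == 10:
                buf[end] = 0          # CR -> zero terminator
                buf[end + 1] = 0      # LF -> zero
                end += 2
            else:
                buf[end] = 0          # lone CR or lone LF
                end += 1
        pos = end
    return starts


def line_at(buf: bytearray, start: int) -> bytes:
    """Return the zero-terminated line that begins at 'start'."""
    end = buf.find(0, start)
    if end < 0:                       # last line without a line break
        end = len(buf)
    return bytes(buf[start:end])


if __name__ == "__main__":
    text = bytearray(b"pear\r\napple\r\nfig\n")
    offsets = tokenise_in_place(text)
    # Sorting reorders only the offset table; the text itself never moves.
    offsets.sort(key=lambda s: line_at(text, s))
    for s in offsets:
        print(s, line_at(text, s))
```

Note that `line_at` copies each line to build a sort key, which is a Python convenience; an assembler version would compare the zero-terminated strings directly in the buffer.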

jj2007

  • Member
  • *****
  • Posts: 10537
  • Assembler is fun ;-)
    • MasmBasic
Re: String array sort with fat files
« Reply #18 on: March 28, 2015, 11:56:41 AM »
New version with strongly improved memory management. Now six million 'bible' lines can be sorted without problems. There is a version with 10 million lines, but it choked sometimes - see bx$ in the source.

Code: [Select]
Getting text files
Reading Bible200.txt
Reading 6076600 lines took 0.658 seconds
Reading 6076600 lines took 0.640 seconds
***
CRT QSort for Bible200.txt:
6076600 lines sorted in 7.45 seconds
6076600 lines sorted in 7.44 seconds
6076600 lines sorted in 7.44 seconds
Writing 6076600 lines took 5.52 seconds

t0=[(According as it is written, God hath
t6076599=[Ziph, and Telem, and Bealoth, .

MasmBasic QSort for Bible200.txt:
6076600 lines sorted in 6.21 seconds
6076600 lines sorted in 6.37 seconds
6076600 lines sorted in 6.34 seconds
Writing 6076600 lines took 5.34 seconds

t0=[(According as it is written, God hath
t6076399=[Ziph, and Telem, and Bealoth, .
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: String array sort with fat files
« Reply #19 on: March 28, 2015, 12:47:27 PM »
 :redface:

rrr314159

  • Member
  • *****
  • Posts: 1382
Re: String array sort with fat files
« Reply #20 on: March 28, 2015, 01:57:50 PM »
I move that old computers be banned ... anyone second the motion?

Perhaps we could pass the hat to retire these aging warhorses, save a lot of trouble chasing down anachronistic bugs
Get dedndave a nice new Windows 8.1 machine, maybe even Windows 10 :biggrin:

[edit] just kidding of course - actually, as we all know, "Older is Better"
I am NaN ;)

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: String array sort with fat files
« Reply #21 on: March 28, 2015, 02:10:10 PM »
i like my P4 prescott   :biggrin:
XP media center edition
4 GB ram (3 usable, i guess)
had it 10 years - it's been a good machine
although, i may pick up a used win7-64 laptop for business use
win8, win10 - no thank you

sinsi

  • Guest
Re: String array sort with fat files
« Reply #22 on: March 28, 2015, 02:40:46 PM »
Crash again, with the i7-4790 and Win8.1Pro
Code: [Select]
Windows 7 Ultimate 64-bit
Getting text files
Downloading Canterbury Corpus
4047392 21.03.1997  15:31:58    bible.txt
Unzipping to C:\Users\tester\Desktop\Sort6MioLines\Bible.txt
4638690 26.02.1997  11:12:00    E.coli
2473400 18.03.1997  15:13:34    world192.txt
Building 50*Bible.txt as Bible50.txt
Building 200*Bible.txt as Bible200.txt
Reading Bible200.txt
Reading 6076600 lines took 0.963 seconds
Reading 6076600 lines took 1.28 seconds
Reading 6076600 lines took 1.11 seconds
***
CRT QSort for Bible200.txt:
6076600 lines sorted in 9.80 seconds
6076600 lines sorted in 9.81 seconds
6076600 lines sorted in 9.72 seconds
Writing 6076600 lines took 2.79 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t6076599=[Ziph, and Telem, and Bealoth, ...]

MasmBasic QSort for Bible200.txt:
6076600 lines sorted in 9.07 seconds
6076600 lines sorted in 9.00 seconds
6076600 lines sorted in 8.93 seconds
Writing 6076600 lines took 1.85 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t6076399=[Ziph, and Telem, and Bealoth, ...]
Last error: The operation completed successfully.
AMD A10-7850K APU with Radeon(TM) R7 Graphics   (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
--- hit any key ---

Windows 10 Preview 64-bit
Getting text files
Downloading Canterbury Corpus
4047392 21.03.1997  15:31:58    bible.txt
Unzipping to C:\Users\user\Desktop\Sort6MioLines\Bible.txt
4638690 26.02.1997  11:12:00    E.coli
2473400 18.03.1997  15:13:34    world192.txt
Building 50*Bible.txt as Bible50.txt
Building 200*Bible.txt as Bible200.txt
Reading Bible200.txt
Reading 6076600 lines took 1.37 seconds
Reading 6076600 lines took 1.30 seconds
Reading 6076600 lines took 1.30 seconds
***
CRT QSort for Bible200.txt:
6076600 lines sorted in 10.2 seconds
6076600 lines sorted in 10.2 seconds
6076600 lines sorted in 9.87 seconds
Writing 6076600 lines took 23.3 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t6076599=[Ziph, and Telem, and Bealoth, ...]

MasmBasic QSort for Bible200.txt:
6076600 lines sorted in 9.12 seconds
6076600 lines sorted in 9.03 seconds
6076600 lines sorted in 8.77 seconds
Writing 6076600 lines took 9.23 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t6076399=[Ziph, and Telem, and Bealoth, ...]
Last error: The operation completed successfully.
Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1)
--- hit any key ---

Windows 8.1 Professional 64-bit
Getting text files
Downloading Canterbury Corpus
4047392 21.03.1997  15:31:58    bible.txt
Unzipping to C:\Users\sinsi\Desktop\Sort6MioLines\Bible.txt
4638690 26.02.1997  11:12:00    E.coli
2473400 18.03.1997  15:13:34    world192.txt
Building 50*Bible.txt as Bible50.txt
Building 200*Bible.txt as Bible200.txt
Reading Bible200.txt
Reading 6076600 lines took 0.572 seconds
Reading 6076600 lines took 0.545 seconds
Reading 6076600 lines took 0.653 seconds
*
Fatal error: HeapAlloc

jj2007

  • Member
  • *****
  • Posts: 10537
  • Assembler is fun ;-)
    • MasmBasic
Re: String array sort with fat files
« Reply #23 on: March 28, 2015, 07:21:41 PM »
Thanks a lot, Sinsi & Dave - especially for the Windows 10 preview test :bgrin:

Re HeapAlloc error: it's 1.6 GB at the peak, which is quite a lot. But it works much better than the previous version, which choked earlier.
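
A rough back-of-the-envelope check of that 1.6 GB figure, using the sizes printed in the logs above. The assumptions are loud ones: Bible200.txt is taken to be exactly 200 copies of the 4047392-byte bible.txt, pointers are 4 bytes (32-bit process), and the peak is modelled as two full copies of the text plus one pointer table. That model is only a guess that happens to fit the number, not a description of what the program actually allocates.

```python
# Figures taken from the logs in this thread.
BIBLE_BYTES = 4_047_392      # size of bible.txt
COPIES = 200                 # "Building 200*Bible.txt as Bible200.txt"
LINES = 6_076_600            # "Reading 6076600 lines"

# Assumptions, not stated in the thread.
POINTER_BYTES = 4                     # 32-bit pointers
USER_SPACE_32BIT = 2 * 1024 ** 3      # default user address space, no LARGEADDRESSAWARE

file_bytes = BIBLE_BYTES * COPIES
pointer_table = LINES * POINTER_BYTES
two_copies_peak = 2 * file_bytes + pointer_table
headroom = USER_SPACE_32BIT - two_copies_peak

if __name__ == "__main__":
    print("file size      :", file_bytes, "bytes")
    print("pointer table  :", pointer_table, "bytes")
    print("two-copy peak  :", two_copies_peak, "bytes")
    print("headroom to 2GB:", headroom, "bytes")
```

Under these assumptions the peak lands near 1.64 GB, leaving roughly half a gigabyte of the default 2 GB address space. A HeapAlloc request needs one contiguous free block, so it can fail from fragmentation well before the total is exhausted, which would fit the fact that some machines pass and others fail.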

nidud

  • Member
  • *****
  • Posts: 1980
    • https://github.com/nidud/asmc
Re: String array sort with fat files
« Reply #24 on: March 28, 2015, 11:12:06 PM »

Getting text files
Downloading Canterbury Corpus
4047392 21.03.1997  15:31:58    bible.txt
Unzipping to K:\test\SortFatFile\Bible.txt
4638690 26.02.1997  11:12:00    E.coli
2473400 18.03.1997  15:13:34    world192.txt
Building 50*Bible.txt as Bible50.txt
Building 200*Bible.txt as Bible200.txt
Reading Bible200.txt
Reading 6076600 lines took 1.06 seconds
Reading 6076600 lines took 1.19 seconds
Reading 6076600 lines took 1.18 seconds
Fatal Error: HeapAlloc


jj2007

  • Member
  • *****
  • Posts: 10537
  • Assembler is fun ;-)
    • MasmBasic
Re: String array sort with fat files
« Reply #25 on: March 29, 2015, 12:22:37 AM »
Thanks. Here is WinXP, Celeron M (failing with HeapAlloc):
Getting text files
Downloading Canterbury Corpus
4047392 21.03.1997  15:31:58    bible.txt
Unzipping to C:\DOCUME~1\USER\DOCUME~1\DOWNLO~1\Bible.txt
4638690 26.02.1997  11:12:00    E.coli
2473400 18.03.1997  15:13:34    world192.txt
Building 50*Bible.txt as Bible50.txt
Building 200*Bible.txt as Bible200.txt
Reading Bible200.txt
Reading 6076600 lines took 37.9 seconds
Reading 6076600 lines took 32.8 seconds
Reading 6076600 lines took 32.5 seconds