News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

String array sort with fat files

Started by jj2007, March 26, 2015, 07:48:47 PM

Previous topic - Next topic

jj2007

Inspired by Alex' and Nidud's friendly exchange of views, I wrote a testbed that downloads bible.txt from the Canterbury Large Corpus, multiplies it 50x and sorts the resulting file:

            ; if the file is not in the current folder, get it online
            void FileRead$("http://www.data-compression.info/Corpora/LargeCanterburyCorpus.zip")

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Masm32 assort:
1519100 lines sorted in 6.4 seconds
1519100 lines sorted in 6.4 seconds
1519100 lines sorted in 6.4 seconds
1519100 lines sorted in 6.4 seconds
1519100 lines sorted in 6.3 seconds
t0=[(According as it is written, God hath given them the spirit ...]
tn=[Ziph, and Telem, and Bealoth, ...]
MasmBasic QSort:
1519150 lines sorted in 1.3 seconds
1519150 lines sorted in 1.3 seconds
1519150 lines sorted in 1.3 seconds
1519150 lines sorted in 1.3 seconds
1519150 lines sorted in 1.3 seconds
t0=[(According as it is written, God hath given them the spirit ...]
tn=[Ziph, and Telem, and Bealoth, ...]
CRT QSort:
1519150 lines sorted in 1.7 seconds
1519150 lines sorted in 1.7 seconds
1519150 lines sorted in 1.7 seconds
1519150 lines sorted in 1.7 seconds
1519150 lines sorted in 1.7 seconds
t0=[(According as it is written, God hath given them the spirit ...]
tn=[Ziph, and Telem, and Bealoth, ...]


Source & exe attached. Timings welcome, even more welcome theories why the algos are equally fast for smaller files (crt1.exe) but diverge so much for the large ones. Note that MasmBasic QSort is a quick sort but not a quicksort - it's actually a (stable) merge sort.

One more observation: For a perfectly random array, timings for MB and CRT are closer.
      Recall esi, t$()      ; load a text array from file
      Scramble t$()
      NanoTimer()

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
MasmBasic QSort:
1519150 lines sorted in 1.64 seconds
1519150 lines sorted in 1.64 seconds
1519150 lines sorted in 1.64 seconds

CRT QSort:
1519150 lines sorted in 1.76 seconds
1519150 lines sorted in 1.75 seconds
1519150 lines sorted in 1.77 seconds


EDIT: attachment removed, see reply #6

sinsi


Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
BibleSortCrt1
Masm32 assort:   1519100 lines sorted in 4.3 seconds
MasmBasic QSort: 1519150 lines sorted in 0.85 seconds
CRT QSort:       1519150 lines sorted in 1.1 seconds

BibleSortCrt50
Masm32 assort:    11473858 lines sorted in 53. seconds
MasmBasic QSort:  Fatal error: HeapAlloc


jj2007

Thanks, sinsi. 11473858 lines looks very odd, 7.55 times the expected value. What are your sizes of bible.txt and Bible50.txt?
Btw it seems that HeapAlloc likes failing for allocations above about one GB - stuff for another thread.

sinsi

   202,369,600 bible.txt
10,118,480,000 Bible50.txt

sinsi

Second try...

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)

BibleSortCrt1
Masm32 assort:      30382 lines sorted in 0.0067 seconds
MasmBasic QSort:    30383 lines sorted in 0.0068 seconds
CRT QSort:          30383 lines sorted in 0.011 seconds

BibleSortCrt50
Masm32 assort:    1519100 lines sorted in 4.3 seconds
MasmBasic QSort:  1519150 lines sorted in 0.84 seconds
CRT QSort:        1519150 lines sorted in 1.0 seconds

    4,047,392 bible.txt
202,369,600 Bible50.txt


nidud

#5
deleted

jj2007

#6
Thanks, sinsi & nidud. There seems to be a problem with my logic :bgrin:
What kind of crash did you get, nidud? HeapAlloc? I've searched around a bit, and it seems more people have problems to allocate big chunks, see http://stackoverflow.com/questions/10365653/why-does-dynamic-memory-allocation-fail-after-600mb for an example.

EDIT: Version 2 attached - source & single exe, the logic is now OK.

dedndave

i told ya god was gonna get ya   :biggrin:


...and his wrath was great
and he did smote them
and their bodies were strewn along the side of Heap Mountain

jj2007

Quote from: dedndave on March 27, 2015, 01:38:29 AM
i told ya god was gonna get ya   :biggrin:

You are confused, Dave - it's just the heap manager :P

OK, here is one for the poor owners of a P4 :biggrin:

If I start with the crt version, MB chokes here, too. Perhaps it's simply heap fragmentation, but the phenomenon starts intriguing me. Having 4GB memory (of which 2 or 3 should be available in Win32) and getting problems at 200 MB is strange.

Interesting read: Overcoming Windows Memory Allocation Limitations

Antariy

Jochen, specify /LARGEADDRESSAWARE switch to the linker's command line when linking, this for some extent may "solve" the memory allocation problem, especially on 64 bit Windows versions (you'll be probably able to allocate in one chunk ~1 GB on 32 bit OS and ~2 GB on 64 bit OS with this switch specified). I did not tested the prog yet.

Antariy

Jochen, when you do a Recall every time - does it with each call free the previously allocated buffer? That may be the reason if the code does not free the buffer previously allocated, so after some runs the memory gets full (leaks of losted previous arrays of data).

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (MMX, SSE, SSE2, SSE3)

Masm32 assort for Bible.txt:
30382 lines sorted in 0.0251 seconds
30382 lines sorted in 0.0257 seconds
30382 lines sorted in 0.0228 seconds
30382 lines sorted in 0.0233 seconds
30382 lines sorted in 0.0234 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t30381=[Ziph, and Telem, and Bealoth, ...]

MasmBasic QSort for Bible.txt:
30383 lines sorted in 0.0273 seconds
30383 lines sorted in 0.0271 seconds
30383 lines sorted in 0.0273 seconds
30383 lines sorted in 0.0273 seconds
30383 lines sorted in 0.0281 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t30381=[Ziph, and Telem, and Bealoth, ...]

CRT QSort for Bible.txt:
30383 lines sorted in 0.0338 seconds
30383 lines sorted in 0.0339 seconds
30383 lines sorted in 0.0408 seconds
30383 lines sorted in 0.0353 seconds
30383 lines sorted in 0.0348 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t30382=[Ziph, and Telem, and Bealoth, ...]

Masm32 assort for Bible20.txt:
607640 lines sorted in 4.30 seconds
607640 lines sorted in 4.30 seconds
607640 lines sorted in 4.28 seconds
607640 lines sorted in 4.30 seconds
607640 lines sorted in 4.30 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t607639=[Ziph, and Telem, and Bealoth, ...]

MasmBasic QSort for Bible20.txt:
607660 lines sorted in 1.22 seconds
607660 lines sorted in 1.21 seconds
607660 lines sorted in 1.22 seconds
607660 lines sorted in 1.21 seconds
607660 lines sorted in 1.21 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t607639=[Ziph, and Telem, and Bealoth, ...]

CRT QSort for Bible20.txt:
607660 lines sorted in 1.33 seconds
607660 lines sorted in 1.33 seconds
607660 lines sorted in 1.33 seconds
607660 lines sorted in 1.34 seconds
607660 lines sorted in 1.35 seconds
t0=[(According as it is written, God hath given them the spirit ...]
t607659=[Ziph, and Telem, and Bealoth, ...]

dedndave

QuoteIf I start with the crt version, MB chokes here, too.

if you want to see something weird
put the processor identification at the end instead of the beginning
compare times between the 2 programs   :redface:

jj2007

Quote from: Antariy on March 27, 2015, 05:16:34 AM
Jochen, when you do a Recall every time - does it with each call free the previously allocated buffer? That may be the reason if the code does not free the buffer previously allocated, so after some runs the memory gets full (leaks of losted previous arrays of data).

It does free the buffer. There is no leak.

Quote from: dedndave on March 27, 2015, 05:26:05 AMif you want to see something weird
put the processor identification at the end instead of the beginning
compare times between the 2 programs   :redface:

Can't see any difference here with 50x version. Timings and strings identical. This is weird indeed.

What I found in the meantime is that I have a flaw in the design of Recall: It allocates too generously memory for the string pointers, assuming that the worst case is a file with CrLfs only. This buffer gets shrunk (reallocated) at the end, but on entry it can be a problem. In fact, VMMap says that up to 1.05 GB are committed, which is far too much for a 200MB file. I will have to change that.

Anyway, the last version posted works fine on my Win7-64 system, and the timings are interesting, but obviously I have to do some homework :icon_cool:

Thanxalot to both of you :icon14:

Antariy

Quote from: jj2007 on March 27, 2015, 06:27:50 AM
Quote from: Antariy on March 27, 2015, 05:16:34 AM
Jochen, when you do a Recall every time - does it with each call free the previously allocated buffer? That may be the reason if the code does not free the buffer previously allocated, so after some runs the memory gets full (leaks of losted previous arrays of data).

It does free the buffer. There is no leak.

What I found in the meantime is that I have a flaw in the design of Recall: It allocates too generously memory for the string pointers, assuming that the worst case is a file with CrLfs only. This buffer gets shrunk (reallocated) at the end, but on entry it can be a problem. In fact, VMMap says that up to 1.05 GB are committed, which is far too much for a 200MB file. I will have to change that.

Anyway, the last version posted works fine on my Win7-64 system, and the timings are interesting, but obviously I have to do some homework :icon_cool:

Thanxalot to both of you :icon14:

:t Did you try to use LARGEADDRESSAWARE switch? Remember than on x64 system 32 bit program has much more memory than on 32 bit system, so what is working on x64 may be out of memory on 32 bit OS.

As for the explanation why it performs better than MASM32 lib algo - well, for that we should know how your algo is implemented :biggrin: But actually I have not experiencies with sorting algos at all, I even didn't look to the MASM32's algo :biggrin: So without info about implementation from your side it's hard to think why it's so.

Large files test was useful and in this extent - improve design of the algo which was not related to the speed but to data handling :t