
Fast string sort app for testing.

Started by hutch--, July 24, 2017, 05:53:56 AM


hutch--

The test app is a string sort specifically for sorting lines of text. It handles either ascending or descending sorts and is the fastest string sort algo I have seen (there may be faster, but I have not come across one). The test is only a binary at the moment, as it contains a couple of algos that are not yet in the published library; a further complication is that the sort is written in C (not C++) and I don't have a good enough 64-bit disassembler to convert it properly. As an aside, I did a lot of work on the same algo in 32 bit and no amount of optimisation would make it go any faster, so it is close to a perfect algo. The 64-bit version is no faster but will handle much larger sources because it is built /LARGEADDRESSAWARE, although it may be hard to find a source that big to test on.

The sort app is "ssort.exe". It is supplied with an old and slow random word generator (whose source code I lost long ago) and a batch file that runs both: the word list creator, followed by the sort app. The batch file tells the random word generator to create 1,000,000 random words, which the sort app then sorts and writes to disk. In terms of memory usage, this version allocates a second buffer the same size as the source, because writing the sorted result back to disk in one block is far faster than a line-by-line disk write. I have the line-by-line version running, but its disk write is a lot slower due to the number of calls it must make.
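
The ssort source is not in this post, but the in-memory approach is simple enough to sketch in C: load the whole file into one buffer, terminate each line in place, and keep a pointer array for the sort. The helper below is illustrative only, not the ssort code; it assumes the buffer was allocated len + 1 bytes with buf[len] = '\0' so the last line is terminated even without a trailing newline.

/* Illustrative sketch, not the ssort source: tokenise a loaded text
   buffer into line pointers by terminating each line in place.
   Assumes buf holds len bytes of CRLF- or LF-delimited text and was
   allocated len + 1 bytes with buf[len] = '\0'.                     */
#include <stdlib.h>

static size_t tokenise(char *buf, size_t len, char ***lines)
{
    size_t count = 0, cap = 1024;
    char **arr = malloc(cap * sizeof *arr);
    char *p = buf, *end = buf + len;

    while (p < end) {
        if (count == cap)
            arr = realloc(arr, (cap *= 2) * sizeof *arr);
        arr[count++] = p;                  /* start of this line */
        while (p < end && *p != '\n')
            p++;
        if (p > buf && p[-1] == '\r')
            p[-1] = '\0';                  /* strip the CR       */
        if (p < end)
            *p++ = '\0';                   /* terminate at LF    */
    }
    *lines = arr;                          /* caller frees       */
    return count;
}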

Fixed version below.  :greensml:

:P
Licensing conditions are extremely generous: you are welcome to use it, and to place it in a commercial app which you sell, without having to give credit for it. But if you make more than 1 million USD (AFTER TAX), you must send me a carton (12 bottles) of 12-year-old Glenfiddich.  :biggrin:

adeyblue

It's fast, but there are nulls being output after each word. So if you sort a file, then sort the sorted file the other way, it doesn't work.


Top is the output of this tool; bottom is the built-in sort.exe.

jj2007

Quote from: adeyblue on July 24, 2017, 07:10:00 AM
It's fast, but there's nulls being output after each word.

Indeed. QSort doesn't have that problem, but it is admittedly slower - 650 ms instead of 450 ms on my Core i5:

include \masm32\MasmBasic\MasmBasic.inc         ; download
  Init
  Recall CL$(), L$()  ; CL$() returns the commandline
  QSort L$()          ; sort the string array in place
  Store "Out.txt", L$() ; write the sorted array back to disk
EndOfCode


P.S.: For timing such executables, use the attached ticks.exe; usage (for example): ticks testme args

EDIT: ticks.exe removed - take the version attached below, which also displays the CPU.

hutch--

Thanks guys, I have tracked it down: the disk output version does not have the problem, so the stuffup is in the memory write before the result goes to disk. The tokeniser and sort seem to be fine, as the two versions I have differ only in the write to disk.
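
The source isn't posted, but the class of bug is easy to illustrate: when the sorted pointer array is joined back into one output block, copying each string's terminating zero along with its characters drops a NUL after every word. A hypothetical sketch, not the ssort code:

/* Hypothetical illustration only: rebuild the output block from the
   sorted pointer array. Copying n + 1 bytes per string (terminator
   included) is exactly what embeds a NUL after every word.          */
#include <string.h>

static size_t join_lines(char **lines, size_t count, char *out)
{
    char *d = out;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(lines[i]);
        memcpy(d, lines[i], n);     /* n bytes, NOT n + 1          */
        d += n;
        *d++ = '\r';                /* restore the CRLF separator  */
        *d++ = '\n';
    }
    return (size_t)(d - out);       /* byte count to write to disk */
}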

hutch--

OK, I am back. The stuffup was in the memory writes after the sort was finished. This is the batch file I tested it with.

@echo off

@echo Sort ascending 100,000,000 words
ssort big.txt result.txt /a

@echo -------------------------------
@echo Sort descending 100,000,000 words
ssort result.txt next.txt /d

@echo -------------------------------
@echo Sort ascending 100,000,000 words
ssort next.txt result.txt /a

@echo -------------------------------
@echo Sort descending 100,000,000 words
ssort result.txt big.txt /d

@echo -------------------------------
@echo Sort ascending 100,000,000 words
ssort big.txt big.txt /a

: ----------
: hex editor
: ----------
hxd big.txt

pause

Run it 4 times back and forth and there are no embedded zeros. Thanks for finding this; I was a slob and did not exhaustively test the faster output version, as it was producing the same file length. The file IO is still the slowest part of the app; the tokeniser and sort algo are hard to time without a massive source file. I have tested it on 10 million words, and it will handle bigger with no problems.

LATER : Here is the output with 100 million words.

Sort ascending 100,000,000 words
big.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing result.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
result.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing next.txt to disk
That's all folks !
-------------------------------
Sort ascending 100,000,000 words
next.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing result.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
result.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing big.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
big.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing big.txt to disk
That's all folks !
Press any key to continue . . .

jj2007

You can time it with the attached Ticks.exe:
Ticks ssort words.txt result.txt /a
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
words.txt loaded at 10003562 bytes
1000000 lines tokenised
1000000 lines sorted
Writing result.txt to disk
That's all folks !
283 ms


Note this is NanoTimer() precision, i.e. QPC, not the imprecise GetTickCount.
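
Ticks.exe's source isn't posted here, but the technique is just QPC around a spawned child process. A bare-bones C sketch, with the command line simplified to a single quoted argument:

/* Sketch: time a child process with QueryPerformanceCounter.
   Usage (simplified): timer "child.exe args"                  */
#include <windows.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    LARGE_INTEGER freq, t0, t1;
    STARTUPINFOA si = { sizeof si };
    PROCESS_INFORMATION pi;

    if (argc < 2) return 1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    if (!CreateProcessA(NULL, argv[1], NULL, NULL, FALSE, 0,
                        NULL, NULL, &si, &pi))
        return 1;
    WaitForSingleObject(pi.hProcess, INFINITE);
    QueryPerformanceCounter(&t1);

    printf("%lld ms\n",
           (t1.QuadPart - t0.QuadPart) * 1000 / freq.QuadPart);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}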

It's time to post the source, Hutch :icon_mrgreen:


hutch--

Tim,

>    Multikey quicksort, a radix sort algorithm for arrays of character strings by Bentley and Sedgewick.

It is the Sedgewick/Bentley hybrid, sourced long ago from Dr. Dobb's Journal. In 32 bit I have it as assembler, but I don't have a good enough disassembler for the 64-bit C file, so I just used the object module from the CL-compiled version. It is a radix/quick/insertion sort hybrid. I spent some time trying to optimise the 32-bit version but could never get it faster than the C-compiled version, so it is a near perfect algorithm by its design rather than by good coding.
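
For anyone without the Dr. Dobb's article handy, the published algorithm (ssort1 from the Bentley/Sedgewick paper "Fast Algorithms for Sorting and Searching Strings") looks roughly like this; the hybrid adds an insertion-sort cutoff for small runs, which is left out of this sketch:

/* Sketch of Bentley/Sedgewick multikey quicksort (their ssort1):
   three-way partition on the depth-th character, recurse on the
   < and > blocks at the same depth and on == at depth + 1.      */
#include <stdlib.h>

#define CH(i)     ((unsigned char)x[i][depth])
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static void pswap(char **x, int a, int b)
{ char *t = x[a]; x[a] = x[b]; x[b] = t; }

static void vecswap(char **x, int i, int j, int n)
{ while (n-- > 0) pswap(x, i++, j++); }

static void mkqsort(char **x, int n, int depth)
{
    int a, b, c, d, r, v;
    if (n <= 1) return;
    pswap(x, 0, rand() % n);        /* random pivot to the front   */
    v = CH(0);
    a = b = 1; c = d = n - 1;
    for (;;) {
        while (b <= c && (r = CH(b) - v) <= 0) {
            if (r == 0) pswap(x, a++, b);
            b++;
        }
        while (b <= c && (r = CH(c) - v) >= 0) {
            if (r == 0) pswap(x, c, d--);
            c--;
        }
        if (b > c) break;
        pswap(x, b++, c--);
    }
    r = MIN(a, b - a);         vecswap(x, 0, b - r, r);
    r = MIN(d - c, n - d - 1); vecswap(x, b, n - r, r);
    r = b - a; mkqsort(x, r, depth);
    if (CH(r) != 0)            /* stop at the string terminator    */
        mkqsort(x + r, a + n - d - 1, depth + 1);
    r = d - c; mkqsort(x + n - r, r, depth);
}

Called as mkqsort(lines, (int)count, 0) on the pointer array from the tokeniser, it sorts the pointers in place.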

I just did a quick hack on a copy to get the timings, but with such a short duration the timings are highly unreliable, even on 1,000,000 lines. The timings are for the tokenising and the sorting.

rand.txt loaded at 10003562 bytes
31 ms tokenised
1000000 lines tokenised
31 ms sorted
That's all folks !
Press any key to continue . . .


This is the result with ten times the line count:

random.txt loaded at 99997972 bytes
282 ms tokenised
10000000 lines tokenised
282 ms sorted
That's all folks !
Press any key to continue . . .