Microsoft 64 bit MASM > MASM64 SDK

Fast string sort app for testing.

(1/2) > >>

hutch--:
The test app is a string sort specifically for sorting lines of text. It will handle either ascending or descending sorts and is the fastest string sort algo I have seen. There may be faster  but I have not seen it. The test is only a binary at the moment as it contains a couple of algos that are not yet in the published library and another complication is that the sort is written in C (not ++) and I don't have a good enough 64 bit disassembler to convert it properly. As an aside, I did a lot of work on the same algo in 32 bit and no amount of optimisation would make it go any faster so it is nearly a perfect algo. The 64 bit version is no faster but will handle much larger sources due to it being built as /LARGEADDRESSAWARE although it may be hard to find a source that big to test on.

The sort app is "ssort.exe" and it is supplied with an old and slow random word generator that I long ago lost the source code for and a batch file that will run both, the word list creator followed by the sort app. The batch file tells the random word generator to create 1,000,000 random words which the sort app sorts and writes to disk. In terms of memory usage this version allocates the same amount of memory as the source as it writes the sorted result back to disk far faster than a line by line disk write. I have the line by line version running but the disk write is a lot slower due to the number of calls it must make.

Fixed version below.  :greensml:

 :P
Licencing conditions are extremely generous, you are welcome to use it, place it in a commercial app which you sell and not have to give credit for it  but if you make more than 1 million USD$ (AFTER TAX) you must send me a carton (12 bottles) of 12 year old Glenfiddich.  :biggrin:

adeyblue:
It's fast, but there's nulls being output after each word. So if you sort a file, then sort the sorted file the other way, it doesn't work.


Top is output of this tool, bottom is the built-in sort.exe

jj2007:

--- Quote from: adeyblue on July 24, 2017, 07:10:00 AM ---It's fast, but there's nulls being output after each word.
--- End quote ---

Indeed. QSort doesn't have that problem, but it is admittedly 30% slower - 650 ms instead of 450 on my Core i5:

include \masm32\MasmBasic\MasmBasic.inc         ; download
  Init
  Recall CL$(), L$()  ; CL$() returns the commandline
  QSort L$()
  Store "Out.txt", L$()
EndOfCode

P.S.: For timing such executables, use the attached ticks.exe; usage (for example): ticks testme args

EDIT: ticks.exe removed - take the version attached below, which displays also the CPU

hutch--:
Thanks guys, I have tracked it down as the disk output version does not have the problem. The stuffup is in the memory write before it is written to disk. The tokeniser and sort seem to fine as the two versions I have only differ in the write to disk.

hutch--:
OK, I am back, stuff up was in the memory writes after the sort was finished. This is the batch file I tested it with.

@echo off

@echo Sort ascending 100,000,000 words
ssort big.txt result.txt /a

@echo -------------------------------
@echo Sort descending 100,000,000 words
ssort result.txt next.txt /d

@echo -------------------------------
@echo Sort ascending 100,000,000 words
ssort next.txt result.txt /a

@echo -------------------------------
@echo Sort descending 100,000,000 words
ssort result.txt big.txt /d

@echo -------------------------------
@echo Sort ascending 100,000,000 words
ssort big.txt big.txt /a

: ----------
: hex editor
: ----------
hxd big.txt

pause

Run 4 times back and forth and there are no embedded zeros. Thanks for finding this, I was a slob and did not exhaustively test the faster output version as it was producing the same file length. The file IO is still the slowest part of the app, the tokeniser and sort algo are hard to time without a massive source file, I have tested it on 10 million words and it will handle bigger with no problems.

LATER : Here is the output with 100 million words.

Sort ascending 100,000,000 words
big.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing result.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
result.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing next.txt to disk
That's all folks !
-------------------------------
Sort ascending 100,000,000 words
next.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing result.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
result.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing big.txt to disk
That's all folks !
-------------------------------
Sort descending 100,000,000 words
big.txt loaded at 1000041101 bytes
100000000 lines tokenised
100000000 lines sorted
Writing big.txt to disk
That's all folks !
Press any key to continue . . .

Navigation

[0] Message Index

[#] Next page

Go to full version