The MASM Forum

General => The Laboratory => Topic started by: jj2007 on September 29, 2014, 07:48:21 AM

Title: Create a file with RANDOM dwords
Post by: jj2007 on September 29, 2014, 07:48:21 AM
For use with all kinds of sorting algos - executable and source attached.

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  mov ecx, Val(Input$("Rand(): #elements to create: ", "1000000"))
  .if signed ecx>0
      shl ecx, 2      ; get DWORDs
      NanoTimer()
      Let edi=New$(ecx)
      push edi      ; we need it later
      push ecx
      .Repeat
            void Rand(-1)
            stosd
            sub ecx, DWORD
      .Until Sign?
      pop ecx
      pop edi
      Let esi=Str$("Creating %i random DWORDs took ", ecx/DWORD)+Str$("%i ms\n", NanoTimer(ms))
      Open "O", #1, "RandomDwords.dat"
      PrintBuffer #1, edi, ecx
      Close #1
      Inkey esi, CrLf$, 'To read back the data into a buffer,', CrLf$, 'MasmBasic fans may use', CrLf$, 'Let esi=FileRead$("RandomDwords.dat")', CrLf$, 'otherwise check Masm32 InputFile'
  .endif
  Exit
end start
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 29, 2014, 08:32:57 AM
Thank you. Here is the result
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on September 29, 2014, 09:04:15 AM
Looks OK:
0       3080
1       9158
2       13464
3       15479
4       17700
5       22867
6       26355
7       29191
8       32229
9       42286
999991  4294921291
999992  4294921704
999993  4294929912
999994  4294934235
999995  4294945952
999996  4294946053
999997  4294951189
999998  4294958802
999999  4294960543



include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  Dim MyDw() As DWORD
  ArrayRead (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1179) MyDw(), "SortedDwords.dat"
  push eax      ; #elements
  xor ecx, ecx
  .Repeat
      mov eax, stack
      sub eax, 10
      .if ecx<10 || ecx>eax
            PrintLine Str$(ecx), Str$("\t%u", MyDw(ecx))
      .endif
      inc ecx
  .Until ecx>=stack
  pop eax
  Inkey "ok?"
  Exit
end start


And the timings?
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 29, 2014, 10:00:57 AM
Tks, i´m uploading now the one from nrQsortA to compare. As i explained the resultatn data does not seems to be accurated. In fact, it is unsorted when feeding it with 1 million of random values. This explains why it took only a couple of secs to sort.

It seems like nrQsortA needs a review, because the sort it is being breaked when feeded with 1 million random values. Also, if i try to feed it with the already sorted data, nrQsortA ruins the result and produce unsorted values. There is a problem in the middle of the nrQsortA function causing it to stop and producing a wrong result.

Masm Result when feeded with RandomDwords.dat (Also, the same result is retrieved when you feede it with SortedDwords.dat):
Quote0   2147484709
1   2147490471
2   2147491291
3   2147491787
4   2147501199
5   2147513906
6   2147518668
7   2147519738
8   2147520028
9   2147521313
999991   2147439241
999992   2147439798
999993   2147440421
999994   2147450671
999995   2147450920
999996   2147454024
999997   2147454425
999998   2147476822
999999   2147481168
ok?

Guga result when feeded with RandomDwords.dat (This is the same correct result when feeded with SortedDwords.dat)
Quote0   3080
1   9158
2   13464
3   15479
4   17700
5   22867
6   26355
7   29191
8   32229
9   42286
999991   4294921291
999992   4294921704
999993   4294929912
999994   4294934235
999995   4294945952
999996   4294946053
999997   4294951189
999998   4294958802
999999   4294960543
ok?

I also gave a test on nrQsortA feeding it with an already sorted data (SortedDwords.dat), and even if it took only 16 milisecs to the function finishes, it completelly ruined the result, and changed. I didn´t analyzed nrQsortA to see why it is being breaked (perhaps, i´m porting it incorrecty to RosAsm)
On the other hand, on my algo, even if i pass a already sorted file it takes almost nothing to sort, n fact, if i feed it with SortedDwords.dat it took also only 16 milisecs to the algo finishes and the result is the same as in the loaded file.


Btw....did you tried to feed your algo (ArraySort) on this file (RandomDwords.dat) to test how much time it takes to parse ? Do you have the timmings of it ?


P.S: About the timmings, i´ve got this (i7 CPU 870, 2.93 Ghz)

QuoteGuga Algo 1 = 723718 miliseconds = 12.061966667 minutes
Guga Algo 1 = 719438 miliseconds = 11.990633333 minutes

Guga Algo 2 = 707640 miliseconds = 11.794 minutes


Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on September 29, 2014, 06:23:33 PM
Hi Gustavo,

I don't understand what you mean with "wrong results". They do look different, but it's because your algo sorts unsigned values, while mine sorts signed DWORDs:

#       signed  unsigned hex (sorted by Guga)
     0  3080    3080    00000C08
     1  9158    9158    000023C6
     2  13464   13464   00003498
     3  15479   15479   00003C77
     4  17700   17700   00004524
     5  22867   22867   00005953
     6  26355   26355   000066F3
..
999994  -33061  4294934235      FFFF7EDB
999995  -21344  4294945952      FFFFACA0
999996  -21243  4294946053      FFFFAD05
999997  -16107  4294951189      FFFFC115
999998  -8494   4294958802      FFFFDED2
999999  -6753   4294960543      FFFFE59F

#       signed          unsigned        hex (sorted by ArraySort)
     0  -2147482587     2147484709      80000425
     1  -2147476825     2147490471      80001AA7
     2  -2147476005     2147491291      80001DDB
     3  -2147475509     2147491787      80001FCB
     4  -2147466097     2147501199      8000448F
     5  -2147453390     2147513906      80007632
     6  -2147448628     2147518668      800088CC
..
999994  2147450671      2147450671      7FFF7F2F
999995  2147450920      2147450920      7FFF8028
999996  2147454024      2147454024      7FFF8C48
999997  2147454425      2147454425      7FFF8DD9
999998  2147476822      2147476822      7FFFE556
999999  2147481168      2147481168      7FFFF650


Re timings: ArraySort() takes 110 ms on my trusty old Celeron. If you provide your algo in Masm format, I'll try to put it into a testbed.
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 30, 2014, 01:41:20 AM
Hi JJ

I didn´t tested your algo yet, i used as a test the one from Masm lib (nrQsortA from qsorta.asm)

But, i´m getting confused, because on small chuncks  nrQsortA takes more time to parse, according to my benchmarks tests.

Can you test my algo ? I´m not understanding what´s happenning, because on a real file (SortedDwords.dat), my algo takes absurd 12 minutes, while nrQsortA takes 94 ms ????


My algo in masm, should be like this, i believe:


Guga_Insort proc Array:DWORD, ArrayLen:DWORD


push edi
push ebx
mov ebx, 1
mov edi, Array
mov eax, ebx
sub eax, ArrayLen
jge OutFunction

MainLoop:
mov ecx, [edi+ebx*4]

InternalLoop:
mov eax, [edi+ebx*4-4]
cmp eax, ecx
jbe OutInternalLoop
mov [edi+ebx*4], eax
dec ebx
jnz InternalLoop

OutInternalLoop:
mov [edi+ebx*4], ecx
inc ebx
cmp ebx, ArrayLen
jb MainLoop

OutFunction:
pop ebx
pop edi
ret ; <--- Is Endp in masm already using a retn ??? If it does, then, this ret must be removed
Guga_Insort endp


I´m not sure about the ret instruction in masm. In RosAsm "endp" is a macro that already contains a retn. So, if ret is ok in my attemtp to port, keep it, otherwise, just remove the ret.

I disassembled my own algo, if it makes easy to port to Masm.

Guga_insort proc near
Array = dword ptr  8
ArrayLen = dword ptr  0Ch

push ebp
mov ebp, esp
push edi
push ebx
mov ebx, 1
mov edi, [ebp+Array]
mov eax, ebx
sub eax, [ebp+ArrayLen]
jge short loc_4069FA

loc_4069DC:
mov ecx, [edi+ebx*4]

loc_4069DF:
mov eax, [edi+ebx*4-4]
cmp eax, ecx
jbe short loc_4069ED
mov [edi+ebx*4], eax
dec ebx
jnz short loc_4069DF

loc_4069ED:
mov [edi+ebx*4], ecx
inc ebx
cmp ebx, [ebp+ArrayLen]
jb loc_4069DC

loc_4069FA:
pop ebx
pop edi
mov esp, ebp
pop ebp
retn 8
insort endp

Title: Re: Create a file with RANDOM dwords
Post by: guga on September 30, 2014, 02:23:17 AM
Btw, here are some benchmarks i´m testing.

It seems that my algo (Insort optimized) is only faster for small chuncks of elements. The difference starts when there are more then 20 elements of the array. Up to that amount, nrQsortA  is faster

For a accurated result. I inserted before each one of the algos, a function called "RandomGen", that copies to a buffer the array elements on the original order before the algos starts. This is to avoid that after, using the algo, the array is sorted, and the next algo will use this already sorted array and not a unsorted one.



Note: The file BenchMark_100Elements_NoGen.zip shows what happens when RandomGen is not being used. My algo parses 10 times faster when he array is already sorted, while nrQsortA tries to sort all over again, even if they are already sorted
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on September 30, 2014, 02:54:50 AM
Hi Gustavo,
I've added your algo to the testbed, but it seems there is quadratic behaviour present. Even when taking only 2% of the elements used by the other algos, it is slower, sorry :(

Guga_Insort proc uses edi ebx Array:DWORD, ArrayLen:DWORD
mov ebx, 1
mov edi, Array
mov eax, ebx
sub eax, ArrayLen
jge OutFunction

MainLoop:
mov ecx, [edi+ebx*4]

InternalLoop:
mov eax, [edi+ebx*4-4]
cmp eax, ecx
jbe OutInternalLoop
mov [edi+ebx*4], eax
dec ebx
jnz InternalLoop

OutInternalLoop:
mov [edi+ebx*4], ecx
inc ebx
cmp ebx, ArrayLen
jb MainLoop

OutFunction:
ret
Guga_Insort endp
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 30, 2014, 05:03:38 AM
Damn...it was too good to be true  :icon_mrgreen:

I´ll give a test on dualPivotQuicksort to see the results.

Btw.does your algo (or another from masmlib) sorts unsigned integers ?
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 30, 2014, 02:27:42 PM
(...)Edited. JJ, If you read this post before i edited, forget it. I suceeded to make masmbasic work properly :)
I forgot to also rename Jwasm to mlv615.exe and not only renaming it with ml.exe  :t
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on September 30, 2014, 05:45:43 PM
Hi Gustavo,

Just checked, it seems the minimum content of \Masm32\bin\ is this:

ml.exe       ; can be 6.15 ... 10.0, or a renamed JWasm.exe (better)
link.exe     ; the old 5.12 linker that comes with Masm32
mspdb50.dll  ; the old linker needs this one

Title: Re: Create a file with RANDOM dwords
Post by: hutch-- on September 30, 2014, 07:16:31 PM
Guga,

At an algo level the only difference between a signed and unsigned compare is the jump you use after the compare.
Title: Re: Create a file with RANDOM dwords
Post by: guga on September 30, 2014, 11:32:49 PM
Tks you for the feedback guys

I fixed the algo and mixed with other one i´building

The results are:

Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2)
Sorting 2000000 items:
DwRange=-1
224 ms for ArraySort MasmBasic
202 ms for nrQsortA Masm32 lib
388 ms for crt_qsort with callback
176 ms for Gugasedgesort
DwRange=-1
232 ms for ArraySort MasmBasic
208 ms for nrQsortA Masm32 lib
390 ms for crt_qsort with callback
174 ms for Gugasedgesort
DwRange=-1
221 ms for ArraySort MasmBasic
202 ms for nrQsortA Masm32 lib
390 ms for crt_qsort with callback
171 ms for Gugasedgesort
ok


JJ, can you see if the masm syntax of the algorithm is correct ?

If my porting to masm are correct, it is 15% faster then  nrQsortA. Probably i can optimize it a bit more if i succeed to remove the recursion and merge the functions i used. I´ll also finish the dualpivot (tri partition quicksort) to compare the results

I´m analyzing mine algos and see if i can reach something around 80-100 ms (Probably this timing can be reached if i suceed to make dualpivot algorithm work as expected and mix it with the other algos i built: insort and partial_quickersort)
Title: Re: Create a file with RANDOM dwords
Post by: guga on October 01, 2014, 01:15:38 AM
Btw, I forgot

Steve, which/Where change the JCC instruction to make nrQsortA to sort unsigned ?

I would like to compare the timmings for the algos works on the same way (I mean, unsorted and sorted) to check for the differences
Title: Re: Create a file with RANDOM dwords
Post by: guga on October 01, 2014, 02:37:14 AM
I´m almost there at 80 ms  :icon_mrgreen:

Updated gugasedgesort
Can someone test the results for accuracy ?

Btw...JJ, if the results are accurated, then the 120 ms can be even more faster, once i remove the recursion. I estimate a gain of 30% (at least) of speed without recursion
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on October 01, 2014, 03:56:50 AM
Definitely fast...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (MMX, SSE, SSE2, SSE3)
Sorting 2000000 items:
DwRange=-1
495 ms for ArraySort    MasmBasic
457 ms for nrQsortA     Masm32 lib
878 ms for crt_qsort    with callback
248 ms for Gugasedgesort
DwRange=-1
499 ms for ArraySort    MasmBasic
387 ms for nrQsortA     Masm32 lib
820 ms for crt_qsort    with callback
249 ms for Gugasedgesort
DwRange=-1
488 ms for ArraySort    MasmBasic
388 ms for nrQsortA     Masm32 lib
837 ms for crt_qsort    with callback
264 ms for Gugasedgesort
Title: Re: Create a file with RANDOM dwords
Post by: guga on October 01, 2014, 11:00:12 AM
Wahoo  :icon_mrgreen: :icon_mrgreen:

Now, i´ll try to make it even faster.. I´m converting the dualpivot as i explained, and will later, remove the recursion on the gugasedgesort algo. I´m positive that gugasedgesort can be improved when removing the recursion and perhaps have a gain of 30% (or more) of it´s speed.

Also, once i made the tests on dualpivot, i´ll compare the results, and see if i can mix between gugasedgesort in order to force sedgesort to work with a tri partition technique. I hope i can make it work, because i´[m trying to achieve a speed of 40-80 ms on my I7 (Which means something around 120- 160 on your Celeron)

Btw, i´ll need to name the algo. "gugasedgesort" or GugaSomethign" is not exactly an algo´s name :icon_mrgreen:. I´m thinking in naming it as UltraSort or LightSpeed  :icon_mrgreen:
Title: Re: Create a file with RANDOM dwords
Post by: MichaelW on October 01, 2014, 11:15:49 AM
For what it's worth, if I take the Microsoft qsort source that was distributed with my Server 2003 PSDK and hack it to sort 32-bit integers, the result is essentially twice as fast as the original (1021/511 on my P3, but only 125/78 on my Core-i3). The function is non-recursive, but does byte operations for everything (the explanation for this being to avoid alignment problems), and as you know calls an external comparison function from the inner loop (so it can be rigged to handle any data type). My modifications replaced the byte operations with integer operations, replaced the comparison function with a macro that is expanded inline, and did the same for the internal swap procedure.

I doubt that Microsoft would want me distributing their source file, hacked or otherwise, so I included only the object module in the attachment, along with my C test source and the EXE, compiled with the Visual C++ Toolkit 2003 compiler and /O2 /G6 optimizations.
Title: Re: Create a file with RANDOM dwords
Post by: guga on October 01, 2014, 11:52:41 AM
 MichaelW, M$ qsort code is distributed for free for research. Take a look at the Windows Research Kernel source code (If i remember well, it is in one of the M$ pages. You need to register in order to dl). Essentially what is inside msvcrt (client and server) can be way more improved as you did, but, still, are slower then the ones i´m testing/implementing.

Also, if i remember well, the source code for qsort uses cdecl calling convention, while a regular stdcall would be enough. If your modifications removed the recursion, there seems be no need for using cdecl. If you try to change to stdcall will it be a bit faster ? And you could, as well, remove the 5 alignments inside the code.

If i can be able to make this algos even faster, i can then, try to add a 3rd  argument to be used as the "compare" argument of qsort, but, i´ll need to make all of this faster before give a try to add another argument. Using recursion, proved to be a bad way on those algos, since it slow down the speed, but, in some cases, we have no other way to go (well...at least, i didn´t touched the algo to remove the recursion yet)

Btw...can u test the benchmark i posted ? I would like to check the timmings on your P3
Title: Re: Create a file with RANDOM dwords
Post by: Gunther on October 01, 2014, 06:57:21 PM
Gustavo,

the results from your last test program:

Intel(R) Xeon(R) CPU           X3330  @ 2.66GHz (MMX, SSE, SSE2, SSE3, SSSE3, SS
E4.1)
Sorting 2000000 items:
DwRange=-1
392 ms for ArraySort    MasmBasic
296 ms for nrQsortA     Masm32 lib
517 ms for crt_qsort    with callback
172 ms for Gugasedgesort
DwRange=-1
387 ms for ArraySort    MasmBasic
312 ms for nrQsortA     Masm32 lib
498 ms for crt_qsort    with callback
154 ms for Gugasedgesort
DwRange=-1
386 ms for ArraySort    MasmBasic
299 ms for nrQsortA     Masm32 lib
553 ms for crt_qsort    with callback
194 ms for Gugasedgesort
ok


Gunther
Title: Re: Create a file with RANDOM dwords
Post by: MichaelW on October 01, 2014, 10:15:36 PM
Quote from: guga on October 01, 2014, 11:52:41 AM
Btw...can u test the benchmark i posted ? I would like to check the timmings on your P3

These benchmarks?

10
Results 8 pass average
timing nrQsortA (Masm Lib) 26150 ms
timing sedgesort Guga6995 ms
timing insort Guga6316 ms
20
Results 8 pass average
timing nrQsortA (Masm Lib) 41865 ms
timing sedgesort Guga24775 ms
timing insort Guga24013 ms
30
Results 8 pass average
timing nrQsortA (Masm Lib) 84504 ms
timing sedgesort Guga55358 ms
timing insort Guga54557 ms
52
Results 8 pass average
timing nrQsortA (Masm Lib) 134079 ms
timing sedgesort Guga149397 ms
timing insort Guga146562 ms
100
Results 8 pass average
timing nrQsortA (Masm Lib) 289919 ms
timing sedgesort Guga472609 ms
timing insort Guga471819 ms
100 NoGen
Results 8 pass average
timing nrQsortA (Masm Lib) 143974 ms
timing sedgesort Guga9573 ms
timing insort Guga8752 ms

The above took hours on a 500MHz P3.
Title: Re: Create a file with RANDOM dwords
Post by: guga on October 01, 2014, 11:02:53 PM
Tks guys :)


Btw...MichaelW, Did you tested the last one (ArraySortTimingsDword6a.zip ) ? For your results, i guess you tested the version that had problems with quadratic behaviour. Do´t use the ones from "NoGen". Those are the ones that had errors.

I fixed this on the last version (ArraySortTimingsDword6a.zip)

Can u test on a P3 this version below that i modified for your P3 ? I included only my algo to make the benchmark run faster. I would like to compare the results on a old P3.
Title: Re: Create a file with RANDOM dwords
Post by: MichaelW on October 01, 2014, 11:17:33 PM
ArraySortTimingsDword6a.zip and ArraySortTimingsDword6c.exe, both apparently using MasmBasic, require SSE2 support so they won't run on a P3.
Title: Re: Create a file with RANDOM dwords
Post by: nidud on October 03, 2014, 01:04:14 AM
deleted
Title: Re: Create a file with RANDOM dwords
Post by: Gunther on October 03, 2014, 02:04:44 AM
Hi nidud,

the results:

41      18467   6334    26500   19169   15724   11478   29358   26962   24464
5705    28145   23281   16827   9961    491     2995    11942   4827    5436


qsort:
41      491     2995    4827    5436    5705    6334    9961    11478   11942
15724   16827   18467   19169   23281   24464   26500   26962   28145   29358

qsortmod:
41      491     2995    4827    5436    5705    6334    9961    11478   11942
15724   16827   18467   19169   23281   24464   26500   26962   28145   29358

sortdd:
41      491     2995    4827    5436    5705    6334    9961    11478   11942
15724   16827   18467   19169   23281   24464   26500   26962   28145   29358

qsort:          3100
qsortmod:       540
sortdd:         480


Gunther
Title: Re: Create a file with RANDOM dwords
Post by: hutch-- on October 03, 2014, 11:01:12 AM
There is an alternative to exchange based sorts if you know the range from lowest to highest value and it can be fitted into memory. They were long ago called "letter box" sorts and I have heard them called "pidgeon hole" sorts but the technique is much the same. If just for example you were working on a numeric range of 1 million, you allocate an array of 4 million bytes zero filled then scan through the data incrementing each array member that matches the number scanned as it is found.

Once the first scan is completed you then scan the target array and for non zero members and write them back to the original source sorted. If the range you are working on is limited, this technique will kill an exchange sort dead, they are so much faster.
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on October 03, 2014, 02:30:03 PM
Quote from: hutch-- on October 03, 2014, 11:01:12 AM
There is an alternative to exchange based sorts if you know the range from lowest to highest value and it can be fitted into memory. They were long ago called "letter box" sorts and I have heard them called "pidgeon hole" sorts ...  If the range you are working on is limited, this technique will kill an exchange sort dead, they are so much faster.

Kind of bucket sorts, yes. My ArraySort() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1176) uses a variant called radix sort, which has no memory problems and beats nrQsortA up to a range of 400,000 or so (and crt qsort has no chance even for the full dword range).

Re the timings above: They are so sensational that you should get a patent for "sorting small arrays repeatedly". But check the timings for "sorting a fat array once" ;)
Title: Re: Create a file with RANDOM dwords
Post by: Gunther on October 03, 2014, 07:06:07 PM
Jochen,

Quote from: jj2007 on October 03, 2014, 02:30:03 PM
They are so sensational that you should get a patent for "sorting small arrays repeatedly". But check the timings for "sorting a fat array once" ;)

these are very different things.

Gunther
Title: Re: Create a file with RANDOM dwords
Post by: jj2007 on October 03, 2014, 10:02:35 PM
Yes indeed. How is the weather in Berlin?
Title: Re: Create a file with RANDOM dwords
Post by: Gunther on October 04, 2014, 12:16:43 AM
Jochen,

Quote from: jj2007 on October 03, 2014, 10:02:35 PM
Yes indeed. How is the weather in Berlin?

the weather is fine here: sunny and a warm wind. Good weather for a public holiday.

But back to the thread. Qsort needs in any case a good center for the data. That's necessary for such Divide et impera (divide and conquer) algorithm. Sedgwick has an own chapter for that point.

Gunther