Title: Comparing 128-bit numbers aka OWORDs
Post by: jj2007 on August 12, 2013, 08:25:24 PM

Following the "What is the fastest way (performance wise) to compare two 128 bit integers" thread (http://masm32.com/board/index.php?topic=2213.0) in the Campus, here is a first attempt at timing comparisons of 128-bit unsigned integers.

Fifteen cycles per comparison is quite a lot, so if you have a better algo, please post it. You can test it beforehand by inserting it at line 90 of the attached source.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
15580 cycles for 1000 * cmp128 (2 globals)
88587 cycles for 1000 * cmp128b (loop)
27107 cycles for 1000 * cmp128p (calls proc)
28611 cycles for 1000 * two pointers

P.S.: Googling yields almost nothing; apparently there are not many applications for this :(

Title: Re: Comparing 128-bit numbers aka OWORDs
Post by: sinsi on August 12, 2013, 08:45:03 PM

Que?

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
loop overhead is approx. 1824/1000 cycles

3385 cycles for 1000 * cmp128
?? cycles for 1000 * cmp128 xx

3489 cycles for 1000 * cmp128
?? cycles for 1000 * cmp128 xx

3335 cycles for 1000 * cmp128
?? cycles for 1000 * cmp128 xx

Title: Re: Comparing 128-bit numbers aka OWORDs
Post by: jj2007 on August 12, 2013, 10:23:54 PM

John, your CPU doesn't respect the speed limits, as usual :eusa_naughty:

OK, version B is attached to the first post. It features a loop-based macro:

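cmp128b MACRO ow0, ow1   ; both operands must be memory variables; trashes eax, ecx, edx
push esi
push edi
mov esi, offset ow0
mov edi, offset ow1
mov ecx, 16              ; 16 bytes, compared from the most significant one down
.Repeat
   dec ecx
   .if Sign?             ; ran past byte 0: all 16 bytes were equal
      inc ecx            ; ecx=0 sets the zero flag; inc leaves the carry flag untouched
      .Break
   .endif
   movzx eax, byte ptr [esi+ecx]
   movzx edx, byte ptr [edi+ecx]
   cmp eax, edx          ; unsigned compare of the current byte pair
.Until !Zero?            ; stop at the first difference
pop edi                  ; pop does not change the flags
pop esi
ENDM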
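
A short usage sketch for the macro above, assuming two hypothetical globals owA and owB declared as four dwords each. The names, values and labels are for illustration only and do not come from the attached source. Since the macro exits on the flags of an unsigned byte compare, the unsigned condition codes (je, ja, jb) should be the ones to use afterwards. The fragment is meant to be pasted into an existing MASM source that already contains the macro:

```asm
.data
owA dd 0, 0, 0, 1        ; hypothetical value 2^96 (least significant dword first)
owB dd -1, -1, -1, 0     ; hypothetical value 2^96-1

.code
   cmp128b owA, owB      ; leaves the flags as for an unsigned compare of owA with owB
   je  IsEqual           ; all 16 bytes identical
   ja  IsAbove           ; taken with the values above: byte 12 is 1 in owA, 0 in owB
   ; fall through: owA is below owB
   jmp Done
IsEqual:
   ; owA = owB
   jmp Done
IsAbove:
   ; owA > owB
Done:
```

Keep in mind that eax, ecx and edx are overwritten by the macro, so anything needed in them has to be saved before the call.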

Title: Re: Comparing 128-bit numbers aka OWORDs
Post by: sinsi on August 12, 2013, 10:36:36 PM

>John, your CPU doesn't respect the speed limits, as usual :eusa_naughty:
Hah! It's you being disrespectful to my CPU.

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
loop overhead is approx. 1694/1000 cycles

3158 cycles for 1000 * cmp128
58915 cycles for 1000 * cmp128b

3162 cycles for 1000 * cmp128
58929 cycles for 1000 * cmp128b

3124 cycles for 1000 * cmp128
59191 cycles for 1000 * cmp128b

Title: Re: Comparing 128-bit numbers aka OWORDs
Post by: jj2007 on August 12, 2013, 11:38:25 PM

One more, adding a generic one which expects two pointers:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 2998/1000 cycles

15567 cycles for 1000 * cmp128 (2 globals)
88584 cycles for 1000 * cmp128b (loop)
27141 cycles for 1000 * cmp128p (calls proc)
29387 cycles for 1000 * two pointers

15562 cycles for 1000 * cmp128 (2 globals)
88587 cycles for 1000 * cmp128b (loop)
27100 cycles for 1000 * cmp128p (calls proc)
29431 cycles for 1000 * two pointers

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 1765/1000 cycles

9116 cycles for 1000 * cmp128 (2 globals)
73639 cycles for 1000 * cmp128b (loop)
17461 cycles for 1000 * cmp128p (calls proc)
24831 cycles for 1000 * two pointers
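
The source of the "two pointers" variant is only in the attachment. As a rough sketch of what a pointer-based comparison could look like (an assumption, not the code that was timed above), the hypothetical macro cmp128ptr below expects the two addresses in esi and edi and compares dword by dword from the most significant end, leaving at the first difference. No claim is made about its speed:

```asm
cmp128ptr MACRO
LOCAL done
   ; Hypothetical sketch, not the code from the attachment
   ; in:  esi, edi = pointers to two unsigned OWORDs as stored in memory (little-endian)
   ; out: flags as after an unsigned cmp; eax is overwritten
   mov eax, [esi+12]     ; most significant dwords first
   cmp eax, [edi+12]
   jne done
   mov eax, [esi+8]
   cmp eax, [edi+8]
   jne done
   mov eax, [esi+4]
   cmp eax, [edi+4]
   jne done
   mov eax, [esi]        ; least significant dwords decide if all others match
   cmp eax, [edi]
done:
ENDM
```

To use it, load esi and edi with the addresses of the two operands, invoke cmp128ptr, then branch with je, ja or jb as after any unsigned cmp.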

Title: Re: Comparing 128-bit numbers aka OWORDs
Post by: Gunther on August 13, 2013, 03:16:45 AM

Jochen,

your timings:

The MASM Forum
