Author Topic: Comparing 128-bit numbers aka OWORDs  (Read 122008 times)

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #75 on: August 20, 2013, 12:15:31 AM »
your idea is better anyways, nidud
it just occured to me that OR AL,1 does not necessarily clear the ZF if the pre-existing value has bit 7 set
Code: [Select]
    sub     eax,eax
    inc     eax
i like it   :P

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #76 on: August 20, 2013, 12:18:46 AM »
something in there doesn't like my P4   :P

Cmp128TimingsNidudB
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
+19 of 20 tests valid, loop overhead is approx. 2078/1000 cycles

23549   cycles for 1000 * Ocmp (JJ)
21190   cycles for 1000 * Ocmp2 (JJ)
31788   cycles for 1000 * cmp128n (nidud)
36806   cycles for 1000 * cmp128 qWord
17161   cycles for 1000 * JJAxCMP128bit

23934   cycles for 1000 * Ocmp (JJ)
21858   cycles for 1000 * Ocmp2 (JJ)
31505   cycles for 1000 * cmp128n (nidud)
37015   cycles for 1000 * cmp128 qWord
17209   cycles for 1000 * JJAxCMP128bit

24145   cycles for 1000 * Ocmp (JJ)
21179   cycles for 1000 * Ocmp2 (JJ)
31363   cycles for 1000 * cmp128n (nidud)
36887   cycles for 1000 * cmp128 qWord
17515   cycles for 1000 * JJAxCMP128bit

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #77 on: August 20, 2013, 01:06:42 AM »
i still think you can stop comparing when you find a high-order mismatch
and - no need to ripple the CF all the way from low-order to high-order if they are all equal

something like this - i need to do some testing
Code: [Select]
Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands

    mov     eax,dword ptr OwA[12]
    mov     edx,dword ptr OwA[8]
    sub     eax,dword ptr OwB[12]
    .if ZERO?
        cmp     edx,dword ptr OwB[8]
        mov     ecx,dword ptr OwA[4]
        .if ZERO?
            cmp     ecx,dword ptr OwB[4]
            mov     edx,dword ptr OwA[0]
            .if ZERO?
                cmp     edx,dword ptr OwB[0]
                .if !ZERO?
                    sbb     eax,0
                .endif
            .endif
        .endif
    .endif

ENDM

nidud

  • Member
  • *****
  • Posts: 2112
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #78 on: August 20, 2013, 01:16:02 AM »
By the way: Why sbb all over the place?

   jnz @NE1
   mov eax,dword ptr ow0[4]
   .if Carry?
      INT 3
   .endif
   sbb eax,dword ptr ow1[4]


   jnz @NE1
   mov eax,dword ptr ow0[4]
   sub eax,dword ptr ow1[4]

   .if Carry?
      dec eax
      INT 3
   .endif

Quote
AMD Athlon(tm) II X2 245 Processor (SSE3)
loop overhead is approx. 2014/1000 cycles

9249   cycles for 1000 * Ocmp (JJ)
9036   cycles for 1000 * Ocmp2 (JJ)
2368   cycles for 1000 * cmp128n (nidud)
7067   cycles for 1000 * cmp128 qWord
9577   cycles for 1000 * JJAxCMP128bit

jj2007

  • Member
  • *****
  • Posts: 11306
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #79 on: August 20, 2013, 01:23:16 AM »
   jnz @NE1
   ; so here we are inside a ZERO branch, and there is definitely NO CARRY
   mov eax,dword ptr ow0[4]
   sbb eax,dword ptr ow0[4]  ; therefore sbb behaves exactly like sub

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #80 on: August 20, 2013, 01:53:18 AM »
New attempt:

Code: [Select]
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 2198/1000 cycles

24957   cycles for 1000 * Ocmp (JJ)
22338   cycles for 1000 * Ocmp2 (JJ)
32907   cycles for 1000 * cmp128n (nidud)
39508   cycles for 1000 * cmp128 qWord
6635    cycles for 1000 * AxCMP128bit

24604   cycles for 1000 * Ocmp (JJ)
22414   cycles for 1000 * Ocmp2 (JJ)
32845   cycles for 1000 * cmp128n (nidud)
38513   cycles for 1000 * cmp128 qWord
6367    cycles for 1000 * AxCMP128bit

24608   cycles for 1000 * Ocmp (JJ)
22294   cycles for 1000 * Ocmp2 (JJ)
34059   cycles for 1000 * cmp128n (nidud)
38513   cycles for 1000 * cmp128 qWord
6388    cycles for 1000 * AxCMP128bit


--- ok ---

Did not look over your latest archive yet, Jochen, got it just now.

Can you please add my code to newest testbed?

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #81 on: August 20, 2013, 02:01:20 AM »
Ooops, entangled the order :greensml:

Here is my code:

Code: [Select]
AxCMP128bit MACRO ow0:REQ, ow1:REQ
LOCAL @l1, @l2, @l3, @l1_1, @l2_1, @l3_1, @l0


mov eax,dword ptr [ow0+12]
cmp eax,dword ptr [ow1+12]
jnz @l0

mov eax,dword ptr [ow0+8]
cmp eax,dword ptr [ow1+8]
jnz @l3

mov eax,dword ptr [ow0+4]
cmp eax,dword ptr [ow1+4]
jnz @l2

mov eax,dword ptr [ow0]
cmp eax,dword ptr [ow1]
jz @l0


@l1:
mov edx,dword ptr [ow0+12]
mov ecx,dword ptr [ow1+12]
mov dx,word ptr [ow0+2]
mov cx,word ptr [ow1+2]
cmp dx,word ptr [ow1+2]
jnz @l1_1
mov cx,word ptr [ow1]
mov dx,ax
@l1_1:
cmp edx,ecx
jmp @l0

@l2:
mov edx,dword ptr [ow0+12]
mov ecx,dword ptr [ow1+12]
mov dx,word ptr [ow0+2+4]
mov cx,word ptr [ow1+2+4]
cmp dx,word ptr [ow1+2+4]
jnz @l2_1
mov cx,word ptr [ow1+4]
mov dx,ax
@l2_1:
cmp edx,ecx
jmp @l0

@l3:
mov edx,dword ptr [ow0+12]
mov ecx,dword ptr [ow1+12]
mov dx,word ptr [ow0+2+8]
mov cx,word ptr [ow1+2+8]
cmp dx,word ptr [ow1+2+8]
jnz @l3_1
mov cx,word ptr [ow1+8]
mov dx,ax
@l3_1:
cmp edx,ecx


@l0:

ENDM

Timings:
Code: [Select]
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 2280/1000 cycles

25091   cycles for 1000 * Ocmp (JJ)
23674   cycles for 1000 * Ocmp2 (JJ)
33530   cycles for 1000 * cmp128n (nidud)
39507   cycles for 1000 * cmp128 qWord
12115   cycles for 1000 * AxCMP128bit

25075   cycles for 1000 * Ocmp (JJ)
22759   cycles for 1000 * Ocmp2 (JJ)
34451   cycles for 1000 * cmp128n (nidud)
39244   cycles for 1000 * cmp128 qWord
10880   cycles for 1000 * AxCMP128bit

25092   cycles for 1000 * Ocmp (JJ)
22990   cycles for 1000 * Ocmp2 (JJ)
32880   cycles for 1000 * cmp128n (nidud)
38631   cycles for 1000 * cmp128 qWord
10777   cycles for 1000 * AxCMP128bit


--- ok ---

nidud

  • Member
  • *****
  • Posts: 2112
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #82 on: August 20, 2013, 02:03:08 AM »
   jnz @NE1
   ; so here we are inside a ZERO branch, and there is definitely NO CARRY
   mov eax,dword ptr ow0[4]
   sbb eax,dword ptr ow0[4]  ; therefore sbb behaves exactly like sub
hmm :icon_redface:
yes, you may use sub instead

EDIT: JJ  :t
« Last Edit: August 20, 2013, 04:25:17 AM by nidud »

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #83 on: August 20, 2013, 02:19:44 AM »
Archive with the right code.

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #84 on: August 20, 2013, 02:20:59 AM »
nidud - SUB would be faster because it does not have to wait for the carry condition to be set

Alex - odds are that the high order compare would find a mismatch
of all the possible 128-bit integers, only 1 in every 4294967296 have a specific high dword value

of all combinations of two 128-bit values,
only 1 combination in every 42949672962 (roughly) has matching high-order dwords

as you said, this is very application dependant, but i like playing those odds   :P

the timing tests probably aren't set up to test a wide range of values

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #85 on: August 20, 2013, 02:25:33 AM »
Alex - odds are that the high order compare would find a mismatch
...
as you said, this is very application dependant, but i like playing those odds   :P

No-no, Dave, I agree with you, just not get it from first time (in the first pages of the thread me too "voted" for checking the highest order elements), it's late here, sorry :greensml:

nidud

  • Member
  • *****
  • Posts: 2112
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #86 on: August 20, 2013, 02:31:00 AM »
Quote
1365   cycles for 1000 * cmp128 nidud
1363   cycles for 1000 * cmp128 Dave

 :biggrin:

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #87 on: August 20, 2013, 02:36:58 AM »
lol nidud
you'd have to select specific values to compare to validate that theory

the values used in the timing test don't cover a very comprehensive range of comparisons

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #88 on: August 20, 2013, 02:40:26 AM »
results from Alex's latest code

prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 2134/1000 cycles

23563   cycles for 1000 * Ocmp (JJ)
21133   cycles for 1000 * Ocmp2 (JJ)
31419   cycles for 1000 * cmp128n (nidud)
37262   cycles for 1000 * cmp128 qWord
10163   cycles for 1000 * AxCMP128bit

23335   cycles for 1000 * Ocmp (JJ)
21130   cycles for 1000 * Ocmp2 (JJ)
32047   cycles for 1000 * cmp128n (nidud)
37288   cycles for 1000 * cmp128 qWord
10208   cycles for 1000 * AxCMP128bit

23770   cycles for 1000 * Ocmp (JJ)
21134   cycles for 1000 * Ocmp2 (JJ)
31739   cycles for 1000 * cmp128n (nidud)
37136   cycles for 1000 * cmp128 qWord
10172   cycles for 1000 * AxCMP128bit

Gunther

  • Member
  • *****
  • Posts: 3720
  • Forgive your enemies, but never forget their names
Re: Comparing 128-bit numbers aka OWORDs
« Reply #89 on: August 20, 2013, 02:43:45 AM »
My results from Alex's latest test:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++++++3 of 20 tests valid, loop overhead is approx. 1678/1000 cycles

2364    cycles for 1000 * Ocmp (JJ)
2351    cycles for 1000 * Ocmp2 (JJ)
2177    cycles for 1000 * cmp128n (nidud)
10741   cycles for 1000 * cmp128 qWord
3756    cycles for 1000 * AxCMP128bit

2340    cycles for 1000 * Ocmp (JJ)
2346    cycles for 1000 * Ocmp2 (JJ)
2188    cycles for 1000 * cmp128n (nidud)
4684    cycles for 1000 * cmp128 qWord
3749    cycles for 1000 * AxCMP128bit

2420    cycles for 1000 * Ocmp (JJ)
2431    cycles for 1000 * Ocmp2 (JJ)
7600    cycles for 1000 * cmp128n (nidud)
4545    cycles for 1000 * cmp128 qWord
9999    cycles for 1000 * AxCMP128bit

--- ok ---

Gunther
Get your facts first, and then you can distort them.