Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM


nidud

#120
deleted

Antariy

With the help of Dave's test number patterns, here is the checking testbed :t

Currently only my algo and Jochen's pass the check.


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 2111/1000 cycles

#######################################################
Testing algo: Cmp128Dave [esi],[edi]
1970169159 - Test failed: 00000000_00000000_00000000_00000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_00000000_00000000  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_00000000_00000000_00000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_00000000_00000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000001_00000001_00000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000001_00000001_00000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000001_00000001_00000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000100_00000000_00000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000100_00000000_00000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000100_00000000_00000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000001_00000000_00000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000001_00000000_00000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000001_00000000_00000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000000_00000000_01000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_00000000_01000000  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_00000000_00000000_01000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_00000000_01000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000000_40000000_00000001  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_40000000_00000001  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_00000000_40000000_00000001  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_40000000_00000001  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000001_00000000_00000000_40010000  00000001_00000000
_C0000100_C0000000
1970169159 - Test failed: 00000000_00000000_41000000_40000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_41000000_40000000  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_00000000_41000000_40000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_41000000_40000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_00000000_00000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000001
_00000001_00000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000100
_00000000_00000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000001
_00000000_00000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_00000000_01000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_40000000_00000001
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_41000000_40000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000001
_00000000_80000100
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_80000000_40000001_00000000  00000000_00000000
_00000001_C0000001
1970169159 - Test failed: 00000000_00000001_00000000_80000100  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000001_00000000_80000100  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000001_00000000_80000100  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_00000000
_00000000_00000000
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_00000000
_00000000_01000000
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_00000000
_40000000_00000001
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_00000000
_41000000_40000000
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_00000000
_00000001_C0000001
1970169159 - Test failed: 00000000_00000000_80010000_80000000  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_00000000_00000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000001
_00000001_00000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000100
_00000000_00000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000001
_00000000_00000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_00000000_01000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_40000000_00000001
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_41000000_40000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000001
_00000000_80000100
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_81000000_80000000_00000001  00000000_00000000
_00000001_C0000001
1970169159 - Test failed: C0000000_80000001_00000000_00000000  C0000000_00000000
_00000000_00000000
1970169159 - Test failed: C0000000_00000000_00000000_00000000  C0000000_80000001
_00000000_00000000
1970169159 - Test failed: 00000000_00000000_00000001_C0000001  00000000_80000000
_40000001_00000000
1970169159 - Test failed: 00000000_00000000_00000001_C0000001  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_00000000_00000001_C0000001  00000000_81000000
_80000000_00000001
1970169159 - Test failed: 00000000_00000000_00000001_C0000001  00000000_C0010000
_C0000000_00000000
1970169159 - Test failed: 00000001_00000000_C0000100_C0000000  00000001_00000000
_00000000_40010000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_00000000_00000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000001
_00000001_00000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000100
_00000000_00000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000001
_00000000_00000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_00000000_01000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_40000000_00000001
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_41000000_40000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000001
_00000000_80000100
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_80010000_80000000
1970169159 - Test failed: 00000000_C0010000_C0000000_00000000  00000000_00000000
_00000001_C0000001
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3FFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3FFFFFFF  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3FFFFFFF  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFE_3FFFFFFE_3FFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFE_3FFFFFFE_3FFFFFFF  FFFFFFFF_FFFFFFFE
_FFFFFFFF_BFFFFEFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3FFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFE
_3FFFFFFE_3FFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3EFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_BFFFFFFF
_7FFFFFFE_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFE
_FFFFFFFF_BFFFFEFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_BFFEFFFF_BFFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_BEFFFFFF
_BFFFFFFF_FFFFFFFE
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_BFFFFFFE
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFE_FFFFFFFE
1970169159 - Test failed: FFFFFFFF_3FFFFEFF_3FFFFFFF_FFFFFFFF  FFFFFFFF_FFFEFFFF
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3EFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3EFFFFFF  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_3EFFFFFF  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3EFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_FFFFFFFF
_BFFEFFFF_BFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_FFFFFFFF
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7FFFFFFF_3FFFFFFE  FFFFFFFF_FFFFFFFF
_FFFFFFFE_FFFFFFFE
1970169159 - Test failed: FFFFFFFE_7FFFFFFE_7FFFFFFF_FFFFFFFF  FFFFFFFE_FFFFFFFF
_FFFFFFFF_7FFEFFFF
1970169159 - Test failed: FFFFFFFE_7FFFFFFE_7FFFFFFF_FFFFFFFF  FFFFFFFE_FFFFFFFF
_FFFFFEFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFE_7FFFFFFE_7FFFFFFF_FFFFFFFF  FFFFFFFE_FFFFFFFF
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFE_FFFFFFFF_FFFFFFFF_7FFEFFFF  FFFFFFFE_7FFFFFFE
_7FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_3EFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_FFFFFFFF
_BFFEFFFF_BFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_7EFFFFFF_7FFFFFFF  FFFFFFFF_FFFFFFFF
_FFFFFFFE_FFFFFFFE
1970169159 - Test failed: FFFFFFFF_BFFFFFFF_7FFFFFFE_FFFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFE_FFFFFFFF_BFFFFEFF  FFFFFFFF_FFFFFFFE
_3FFFFFFE_3FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFE_FFFFFFFF_BFFFFEFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_BFFEFFFF_BFFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_BFFEFFFF_BFFFFFFF  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_BFFEFFFF_BFFFFFFF  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFF_BEFFFFFF_BFFFFFFF_FFFFFFFE  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_BFFFFFFE_FFFFFFFF_FFFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFE_FFFFFFFE  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFE_FFFFFFFE  FFFFFFFF_FFFFFFFF
_7FFFFFFF_3FFFFFFE
1970169159 - Test failed: FFFFFFFF_FFFFFFFF_FFFFFFFE_FFFFFFFE  FFFFFFFF_FFFFFFFF
_7EFFFFFF_7FFFFFFF
1970169159 - Test failed: FFFFFFFE_FFFFFFFF_FFFFFEFF_FFFFFFFF  FFFFFFFE_7FFFFFFE
_7FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFF_FFFEFFFF_FFFFFFFF_FFFFFFFF  FFFFFFFF_3FFFFEFF
_3FFFFFFF_FFFFFFFF
1970169159 - Test failed: FFFFFFFE_FFFFFFFF_FFFFFFFF_FFFFFFFF  FFFFFFFE_7FFFFFFE
_7FFFFFFF_FFFFFFFF
Test done


#######################################################
Testing algo: cmp128n [esi],[edi]
1970169159 - Test failed: 80000000_00000000_00000000_00000001  7FFFFFFF_FFFFFFFF
_FFFFFFFE_FFFFFFFF
Test done


#######################################################
Testing algo: cmp128q [esi],[edi]
1970169159 - Test failed: 80000000_00000000_00000000_00000001  7FFFFFFF_FFFFFFFF
_FFFFFFFE_FFFFFFFF
Test done


#######################################################
Testing algo: Ocmp2 [esi],[edi]
Test done


#######################################################
Testing algo: AxCMP128bit [esi],[edi]
Test done


Dave, thank you very much for the test data :t

Antariy

Quote from: nidud on August 20, 2013, 10:32:55 PM
Alex,
I just used the test I posted here, adding qWord's number

I then added Small=1 for the cmp(1,-1), which fails

Then I added this to Dave's test

C128nidud PROC USES ESI EDI lpOp1:LPVOID,lpOp2:LPVOID
      mov esi,lpOp1
      mov edi,lpOp2
      mov eax,[esi]
      sub eax,[edi]
      jnz @NE1
      mov eax,[esi+4]
      sub eax,[edi+4]
      jnz @NE2
      mov eax,[esi+8]
      sub eax,[edi+8]
      jnz @NE3
      mov eax,[esi+12]
      sbb eax,[edi+12]
      jmp @end
@NE1: mov eax,[esi+4]
      sbb eax,[edi+4]
@NE2: mov eax,[esi+8]
      sbb eax,[edi+8]
@NE3: mov eax,[esi+12]
      sbb eax,[edi+12]
      jo @OV
      jnz @end
      inc eax
@end: ret
@OV:  jc @end
      mov eax,80000000h
      sub eax,7FFFFFFFh
      jmp @end
C128nidud endp



Hmm... I checked it with qWord's number, too - it worked. Maybe I did not get something in your method?

nidud

#123
deleted

dedndave

nidud's latest code passes my test
are you saying it doesn't pass yours Alex ?

on my algo, this fails....
cmp 00000000_00000000_00000000_00000000 , 80000000_00000000_00000000_00000001
was: OV NG NZ CY should be: NV PL NZ CY

tells me that my theory about early exit on high-order compare is not a good theory - lol

it appears that you DO have to ripple from low to high
so, we are looking at something like nidud's code

i added a few blank lines for readability....
;***********************************************************************************************

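C128nidud PROC USES ESI EDI lpOp1:LPVOID,lpOp2:LPVOID

      mov esi,lpOp1
      mov edi,lpOp2
      mov eax,[esi]           ; dword 0 (lowest)
      sub eax,[edi]
      jnz @NE1                ; differs: finish the borrow chain with SBB

      mov eax,[esi+4]         ; dword 1
      sub eax,[edi+4]
      jnz @NE2

      mov eax,[esi+8]         ; dword 2
      sub eax,[edi+8]
      jnz @NE3

      mov eax,[esi+12]        ; low three dwords equal, so CF = 0 here
      sbb eax,[edi+12]        ; and SBB acts as SUB: the flags are final
      jmp @end

@NE1: mov eax,[esi+4]
      sbb eax,[edi+4]

@NE2: mov eax,[esi+8]
      sbb eax,[edi+8]

@NE3: mov eax,[esi+12]
      sbb eax,[edi+12]
      jo @OV

      jnz @end

      inc eax                 ; top result is 0 but a lower dword differed: INC clears ZF, keeps CF

@end: ret

@OV:  jc @end

      mov eax,80000000h       ; force OF=1 with SF=0, ZF=0, CF=0
      sub eax,7FFFFFFFh
      jmp @end

C128nidud ENDP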

;***********************************************************************************************


i like the algo, except it seems a little messy at the end   :P
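
A minimal C sketch of that low-to-high ripple, as a model only and not code from the thread. The names `flags128` and `sub128_flags` are made up for illustration, and the operands are assumed to be stored as four little-endian dwords (index 0 is the lowest). It computes the four flags that a true 128-bit CMP would leave behind. For the failing pair above (zero compared with 80000000_00000000_00000000_00000001) it gives OF=0, SF=0, ZF=0, CF=1, which is the "NV PL NZ CY" line.

```c
#include <stdint.h>

/* Flag results of a 128-bit CMP, named as in the flag dumps:
   OF (OV/NV), SF (NG/PL), ZF (ZR/NZ), CF (CY/NC). */
typedef struct {
    int of, sf, zf, cf;
} flags128;

/* Hypothetical helper: flags of a - b for two 128-bit values held as four
   little-endian dwords, rippling the borrow from low to high like one SUB
   followed by three SBBs. */
flags128 sub128_flags(const uint32_t a[4], const uint32_t b[4])
{
    uint32_t r[4];
    unsigned borrow = 0;

    for (int i = 0; i < 4; i++) {
        /* 64-bit subtraction wraps when a borrow is needed, so bit 32
           of the wide result is the outgoing borrow */
        uint64_t wide = (uint64_t)a[i] - b[i] - borrow;
        r[i] = (uint32_t)wide;
        borrow = (unsigned)((wide >> 32) & 1u);
    }

    flags128 f;
    f.cf = (int)borrow;                        /* unsigned a < b            */
    f.zf = (r[0] | r[1] | r[2] | r[3]) == 0;   /* needs ALL dwords, not just the top */
    f.sf = (int)(r[3] >> 31);                  /* sign of the top dword     */
    /* signed overflow: operand signs differ and the result sign differs from a */
    f.of = (int)(((a[3] ^ b[3]) & (a[3] ^ r[3])) >> 31);
    return f;
}
```

The ZF line shows why a plain SUB/SBB chain is not enough on its own: the last SBB only reports whether the top dword result is zero, so a lower dword that differed has to be folded back in somehow.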

nidud

#125
deleted

Antariy

Here is the new one. My testing method works differently from Dave's and does not need an additional DWORD to check the results (it relies on an "Etalone" reference proc, which is now complete), so I used that extra DWORD in the tests, too: by varying the offset into the table and the increment size, this DWORD becomes part of an OWORD in a second pass.

You may play with this, too - see the comments in these lines:

   add edi,16;+offst
   dec ebx
   jnz @l2
   add esi,16;+offst

Antariy

Quote from: dedndave on August 20, 2013, 11:42:54 PM
nidud's latest code passes my test
are you saying it doesn't pass yours Alex ?

on my algo, this fails....
cmp 00000000_00000000_00000000_00000000 , 80000000_00000000_00000000_00000001
was: OV NG NZ CY should be: NV PL NZ CY

tells me that my theory about early exit on high-order compare is not a good theory - lol

I probably used his old code. ATM I'm working on the testing method and did not notice that it had been updated :biggrin: The goal was to get the testing method working - it's OK now, so I'll add the new code.

BTW, Dave, is this your latest code?

Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands

    mov     eax,dword ptr OwA[12]
    mov     edx,dword ptr OwA[8]
    sub     eax,dword ptr OwB[12]
    .if ZERO?
        cmp     edx,dword ptr OwB[8]
        mov     ecx,dword ptr OwA[4]
        .if ZERO?
            cmp     ecx,dword ptr OwB[4]
            mov     edx,dword ptr OwA[0]
            .if ZERO?
                cmp     edx,dword ptr OwB[0]
                .if !ZERO?
                    sbb     eax,eax
                .endif
            .endif
        .endif
    .endif

ENDM

dedndave

well - that is the latest that doesn't work   :lol:

Antariy

C128nidud doesn't fail.

One more update - just for completeness, it makes more passes over the number stream. Some numbers repeat.

I think no one has looked at my testbed, so no one knows how neat and flexible it is :P Testing results + timings in one testbed.

Note again: this testing method does not require manual data construction - you may even run it over any random data, because it compares the results against the "milestone". But Dave's data is much, much better than random data, because it is hand-crafted.
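
The general idea of checking against a reference can be sketched in C as below. This is only an illustration under assumptions, not the testbed itself: `cmp128_fn`, `ref_cmp128`, `count_mismatches` and `top_only_cmp128` are hypothetical names, operands are taken as four little-endian dwords, and only the ordering (less, equal, greater as signed 128-bit values) is compared, not the individual CPU flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by the reference and every candidate under test:
   returns <0, 0 or >0 for a < b, a == b, a > b (signed 128-bit). */
typedef int (*cmp128_fn)(const uint32_t a[4], const uint32_t b[4]);

/* Reference comparison: meant to be obviously correct rather than fast.
   The top dword decides as a signed value, the lower three as unsigned. */
int ref_cmp128(const uint32_t a[4], const uint32_t b[4])
{
    /* flipping the sign bit turns a signed compare into an unsigned one */
    uint32_t ta = a[3] ^ 0x80000000u, tb = b[3] ^ 0x80000000u;
    if (ta != tb)
        return ta < tb ? -1 : 1;
    for (int i = 2; i >= 0; i--)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

static int sign_of(int v) { return (v > 0) - (v < 0); }

/* Runs every ordered pair from the table through both functions and counts
   the disagreements, so no expected results have to be written by hand. */
size_t count_mismatches(cmp128_fn candidate, const uint32_t (*table)[4], size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (sign_of(candidate(table[i], table[j])) !=
                sign_of(ref_cmp128(table[i], table[j])))
                bad++;
    return bad;
}

/* Deliberately flawed candidate for demonstration: it looks at the top
   dword only and ignores the three lower ones. */
int top_only_cmp128(const uint32_t a[4], const uint32_t b[4])
{
    uint32_t ta = a[3] ^ 0x80000000u, tb = b[3] ^ 0x80000000u;
    return (ta > tb) - (ta < tb);
}
```

Because the expected answer comes from the reference, the table can hold anything, including random values, though crafted boundary patterns are far more likely to hit the overflow and borrow cases.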

Timings are at the bottom of the listing:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
++++16 of 20 tests valid, loop overhead is approx. 2433/1000 cycles

22910   cycles for 1000 * Ocmp (JJ)
21054   cycles for 1000 * Ocmp2 (JJ)
38189   cycles for 1000 * cmp128n (nidud)
36171   cycles for 1000 * cmp128 qWord
9852    cycles for 1000 * AxCMP128bit

22886   cycles for 1000 * Ocmp (JJ)
20740   cycles for 1000 * Ocmp2 (JJ)
38777   cycles for 1000 * cmp128n (nidud)
36293   cycles for 1000 * cmp128 qWord
9835    cycles for 1000 * AxCMP128bit

23170   cycles for 1000 * Ocmp (JJ)
20988   cycles for 1000 * Ocmp2 (JJ)
37485   cycles for 1000 * cmp128n (nidud)
36515   cycles for 1000 * cmp128 qWord
9851    cycles for 1000 * AxCMP128bit


--- ok ---



The message exceeded 20000 chars, so I removed all the data except the timings. When you run it, it first displays the correctness test results for every algo.

Can I ask for timings, please?

dedndave

this code works but as i recall, SAHF is a slow instruction
C128Dave PROC USES ESI EDI lpOp1:LPVOID,lpOp2:LPVOID

    mov     esi,lpOp1
    mov     edi,lpOp2

    mov     eax,[esi]
    cmp     eax,[edi]
    jnz     c1

    mov     eax,[esi+4]
    cmp     eax,[edi+4]
    jnz     c2

    mov     eax,[esi+8]
    cmp     eax,[edi+8]
    jnz     c3

    mov     eax,[esi+12]
    cmp     eax,[edi+12]
    jmp short cz

c1: mov     eax,[esi+4]
    sbb     eax,[edi+4]

c2: mov     eax,[esi+8]
    sbb     eax,[edi+8]

c3: mov     eax,[esi+12]
    sbb     eax,[edi+12]
    jnz     cz

    lahf
    lea     eax,[eax-4000h]
    sahf

cz: ret

C128Dave ENDP

Antariy

Quote from: dedndave on August 21, 2013, 01:16:47 AM
this code works, but i think SAHF is slow

Well, just add it in the testbed I posted above :P :biggrin: I got bored with this stuff :P :biggrin:

dedndave

this is the kind of stuff i enjoy
i should be working on something else - lol

Antariy

To check your code, you may just insert it into the testbed and add this at the start:

   CheckIt <invoke C128Dave,esi,edi>


And, yes, it passes the check :t Though adding it to the timings part is up to you :P

dedndave

here is a nice algo
i think this one is a winner   :t

it can be turned into a macro that addresses the operands directly
and preloading registers may also provide some improvement
but this gives you the basic concept.....
C128Dave PROC USES ESI EDI lpOp1:LPVOID,lpOp2:LPVOID

    mov     esi,lpOp1
    mov     edi,lpOp2
    xor     edx,edx
    mov     eax,[esi]
    cmp     eax,[edi]
    .if !ZERO?
        inc     edx
    .endif
    mov     eax,[esi+4]
    sbb     eax,[edi+4]
    .if !ZERO?
        inc     edx
    .endif
    mov     eax,[esi+8]
    sbb     eax,[edi+8]
    .if !ZERO?
        inc     edx
    .endif
    mov     eax,[esi+12]
    mov     ecx,[edi+12]
    sbb     al,cl
    .if !ZERO?
        inc     edx
    .endif
    .if CARRY?
        mov     cl,dl
        mov     al,dh
    .else
        mov     al,dl
        mov     cl,dh
    .endif
    cmp     eax,ecx
    ret

C128Dave ENDP


the idea is to gather the info for the 3 low-order DWORD compares
then, when we get to the last one, compare the low bytes (with borrow)
then, replace the low bytes with the accumulated results for a single DWORD CMP instruction

notice that INC does not affect the CF
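
For anyone who wants to check the idea outside the assembler, here is a rough C model of it. This is a sketch under assumptions, not a translation that has been run against the proc: `cmp128_folded` is a made-up name, operands are four little-endian dwords, and it returns -1, 0 or 1 instead of leaving CPU flags behind.

```c
#include <stdint.h>

/* Model of "gather the low-order results, then do one final compare". */
int cmp128_folded(const uint32_t a[4], const uint32_t b[4])
{
    unsigned borrow = 0;   /* models CF rippling through the CMP/SBB chain */
    unsigned count  = 0;   /* models EDX: how many partial results were non-zero */

    /* the three low-order dwords */
    for (int i = 0; i < 3; i++) {
        uint64_t wide = (uint64_t)a[i] - b[i] - borrow;
        borrow = (unsigned)((wide >> 32) & 1u);
        if ((uint32_t)wide != 0)
            count++;
    }

    /* low bytes of the top dwords, still consuming the borrow (SBB AL,CL) */
    unsigned lo = (a[3] & 0xFFu) - (b[3] & 0xFFu) - borrow;
    borrow = (lo >> 8) & 1u;          /* borrow out of the byte subtraction */
    if ((lo & 0xFFu) != 0)
        count++;

    /* replace the low bytes: a borrow means the low 104 bits of a are
       smaller, so b gets the count and a gets 0; otherwise the reverse */
    uint32_t ta = a[3] & 0xFFFFFF00u;
    uint32_t tb = b[3] & 0xFFFFFF00u;
    if (borrow)
        tb |= count;
    else
        ta |= count;

    /* one signed dword compare; XOR with the sign bit makes it unsigned */
    ta ^= 0x80000000u;
    tb ^= 0x80000000u;
    return (ta > tb) - (ta < tb);
}
```

The count is zero only when the low 104 bits are equal, and it is at least 1 whenever a borrow occurred, because the step where a borrow first appears always has a non-zero result. That is what lets the replaced low byte stand in for everything below the top 24 bits.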