News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM

Previous topic - Next topic

nidud

#90
deleted

dedndave

 :biggrin:  now, you're just picking on me - lol

nidud

#92
deleted

jj2007

Quote from: Antariy on August 20, 2013, 02:01:20 AM
Ooops, entangled the order :greensml:

Attached :t

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 1766/1000 cycles

12963   cycles for 1000 * Ocmp2 (JJ)
6266    cycles for 1000 * Cmp128Dave
6270    cycles for 1000 * cmp128n (nidud)
9508    cycles for 1000 * cmp128 qWord
10295   cycles for 1000 * AxCMP128bit


Alex, there's something wrong, we have the slowest algos!! :dazzled:

nidud

#94
deleted

nidud

#95
deleted

Antariy

Quote from: jj2007 on August 20, 2013, 04:56:39 AM
Quote from: Antariy on August 20, 2013, 02:01:20 AM
Ooops, entangled the order :greensml:

Attached :t


Alex, there's something wrong, we have the slowest algos!! :dazzled:

Thank you, Jochen :t

Strange and interesting thing: this not-too-complicated algos seem to be very CPU-dependent.

Timings for your latest archive:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 2217/1000 cycles

22772   cycles for 1000 * Ocmp2 (JJ)
7344    cycles for 1000 * Cmp128Dave
33425   cycles for 1000 * cmp128n (nidud)
39887   cycles for 1000 * cmp128 qWord
10968   cycles for 1000 * AxCMP128bit

22775   cycles for 1000 * Ocmp2 (JJ)
7318    cycles for 1000 * Cmp128Dave
34449   cycles for 1000 * cmp128n (nidud)
39424   cycles for 1000 * cmp128 qWord
11218   cycles for 1000 * AxCMP128bit

22814   cycles for 1000 * Ocmp2 (JJ)
7333    cycles for 1000 * Cmp128Dave
33557   cycles for 1000 * cmp128n (nidud)
40793   cycles for 1000 * cmp128 qWord
10702   cycles for 1000 * AxCMP128bit


--- ok ---


jj2007

Quote from: nidud on August 20, 2013, 08:17:15 AM

cmp128 1, -1
jle error
jnb error

fail: cmp128 qWord
fail: cmp128n (nidud)
fail: cmp128 Dave

Alex version works

Are you sure?
oPlusOne GREATER oMinusOne (jj)
oPlusOne greater oMinusOne (nidud)

Antariy

Quote from: nidud on August 20, 2013, 05:44:59 AM
So, sorry Dave, but I'm picking on you again  :lol:

Your code fails (I think), and is equal to this (I think):

Yes, nidud is right here, Dave's code is more or less equal to the failing one I suggested couple pages ago (it consisted just from SUB/SBB/SBB/SBB) - if the number 1 is bigger than number 2 then it returned ZF flag set, too because of latest SUB (in Dave's version it's latest SBB).
(I.e. FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF is GREATER than FFFFFFFFFFFFFFFFFFFFFFFF00000001, but also is EQUAL :biggrin: - JGE/JAE/JE/JZ will have false jump.)

dedndave

Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands

still - my code may fail - and so may some others
we need a good validation routine before we worry about timing   :P

Antariy

Quote from: dedndave on August 20, 2013, 08:46:44 AM
still - my code may fail - and so may some others

I think you might just "trace" the code "in mind" with different conditions to get general idea if it works (like you did when I suggested brute SUB/SBB/SBB/SBB). There are not much different numbers are needed, actually, the ones chosen by Jochen are already enough - you may see that, for an instance, for you code, when numbers oBigF and oSmallF are compared (it returns GREATER and ZERO set).

But you're right here:
Quote from: dedndave on August 20, 2013, 08:46:44 AM
we need a good validation routine before we worry about timing   :P

We have such a validation here :P
Though, it's boring to parse results looking on a bunch of reported passed jump conditions :biggrin:

Antariy

My code is a bit bloated, but I think I checked it pretty thoroughly. And you can see that in general the idea is the same as in Jochen's SSE-powered algo: it just constructs final comparing number from highest order element (WORD which becomes high-order WORD of a constructed DWORD) and highest-order-different element (WORD which becomes low-order WORD of constructed DWORD). So, even if it's a bit bloated, it should work properly with no doubts, if it's implemented properly (no "mistypos" etc).

dedndave

test i will use to fix my code
by the way, 3200 tests are made (40 are repeats) - my current code only fails 2440 of them   :lol:

for the moment - back to work on my graph code
i will play with this some more later this week

;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        .List

;###############################################################################################

C128Dave PROTO :LPVOID,:LPVOID
TestCmp  PROTO :LPVOID,:LPVOID

;###############################################################################################

Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands

    mov     eax,dword ptr OwA[12]
    mov     edx,dword ptr OwA[8]
    sub     eax,dword ptr OwB[12]
    .if ZERO?
        cmp     edx,dword ptr OwB[8]
        mov     ecx,dword ptr OwA[4]
        .if ZERO?
            cmp     ecx,dword ptr OwB[4]
            mov     edx,dword ptr OwA[0]
            .if ZERO?
                cmp     edx,dword ptr OwB[0]
                .if !ZERO?
                    sbb     eax,eax
                .endif
            .endif
        .endif
    .endif

ENDM

;###############################################################################################

        .DATA

;each line is one set of comparison values: a DWORD and an OWORD
;the flag result after comparing the DWORD's from any 2 lines should
;be the same as after comparing the OWORD's on the same 2 lines
;each pair of lines is compared both ways (CMP a,b and CMP b,a) for a total of 3200 tests

TestVal dd 0,0,0,0,0
        dd 1,1,0,0,0
        dd 100h,0,1,0,0
        dd 10000h,0,0,1,0
        dd 1000000h,0,0,0,1

        dd 40000000h,0,0,0,40000000h
        dd 40000001h,1,0,0,40000000h
        dd 40000100h,0,1,0,40000000h
        dd 40010000h,0,0,1,40000000h
        dd 41000000h,0,0,0,40000001h

        dd 80000000h,0,0,0,80000000h
        dd 80000001h,1,0,0,80000000h
        dd 80000100h,0,1,0,80000000h
        dd 80010000h,0,0,1,80000000h
        dd 81000000h,0,0,0,80000001h

        dd 0C0000000h,0,0,0,0C0000000h
        dd 0C0000001h,1,0,0,0C0000000h
        dd 0C0000100h,0,1,0,0C0000000h
        dd 0C0010000h,0,0,1,0C0000000h
        dd 0C1000000h,0,0,0,0C0000001h

        dd 3FFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,3FFFFFFFh
        dd 3EFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFEh

        dd 7FFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,7FFFFFFFh
        dd 7EFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFEh

        dd 0BFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,0BFFFFFFFh
        dd 0BEFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFEh

        dd 0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh
        dd 0FEFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh

;***********************************************************************************************

;        .DATA?

;###############################################################################################

        .CODE

;***********************************************************************************************

_main   PROC

        mov     esi,offset TestVal
        mov     ebx,40
        mov     edi,esi

loop00: push    ebx
        push    edi
        mov     ebx,40

loop01: INVOKE  TestCmp,esi,edi
        INVOKE  TestCmp,edi,esi
        dec     ebx
        lea     edi,[edi+20]
        jnz     loop01

        pop     edi
        pop     ebx
        add     esi,20
        dec     ebx
        jnz     loop00

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

_main   ENDP

;***********************************************************************************************

C128Dave PROC USES ESI EDI lpOp1:LPVOID,lpOp2:LPVOID

    mov     esi,lpOp1
    mov     edi,lpOp2
    mov     eax,[esi+12]
    mov     edx,[esi+8]
    sub     eax,[edi+12]
    .if ZERO?
        cmp     edx,[edi+8]
        mov     ecx,[esi+4]
        .if ZERO?
            cmp     ecx,[edi+4]
            mov     edx,[esi]
            .if ZERO?
                cmp     edx,[edi]
                .if !ZERO?
                    sbb     eax,eax
                .endif
            .endif
        .endif
    .endif
    ret

C128Dave ENDP

;***********************************************************************************************

TestCmp PROC USES EBX ESI EDI lpOp1:LPVOID,lpOp2:LPVOID

;OF = bit 11
;SF = bit 7
;ZF = bit 6
;CF = bit 0

    mov     esi,lpOp1
    mov     edi,lpOp2
    mov     eax,[esi]
    cmp     eax,[edi]
    push    ebp
    pushfd
    add     esi,4
    add     edi,4
;   INVOKE  C128Dave,esi,edi
    pushfd
    pop     ebx               ;EBX = OWORD compare result flags
    pop     ebp               ;EBP = DWORD compare result flags
    and     ebx,8C1h
    and     ebp,8C1h          ;OF SF ZF CF only
    .if ebx!=ebp
        print   chr$('cmp ')
        mov     eax,[esi+12]
        print   uhex$(eax),'_'
        mov     eax,[esi+8]
        print   uhex$(eax),'_'
        mov     eax,[esi+4]
        print   uhex$(eax),'_'
        mov     eax,[esi]
        print   uhex$(eax),' , '
        mov     eax,[edi+12]
        print   uhex$(eax),'_'
        mov     eax,[edi+8]
        print   uhex$(eax),'_'
        mov     eax,[edi+4]
        print   uhex$(eax),'_'
        mov     eax,[edi]
        print   uhex$(eax),13,10,'was: '
        .if ebx&800h
            print   chr$('OV ')
        .else
            print   chr$('NV ')
        .endif
        .if ebx&80h
            print   chr$('NG ')
        .else
            print   chr$('PL ')
        .endif
        .if ebx&40h
            print   chr$('ZR ')
        .else
            print   chr$('NZ ')
        .endif
        .if ebx&1
            print   chr$('CY')
        .else
            print   chr$('NC')
        .endif
        print   chr$(' should be: ')
        .if ebp&800h
            print   chr$('OV ')
        .else
            print   chr$('NV ')
        .endif
        .if ebp&80h
            print   chr$('NG ')
        .else
            print   chr$('PL ')
        .endif
        .if ebp&40h
            print   chr$('ZR ')
        .else
            print   chr$('NZ ')
        .endif
        .if ebp&1
            print   chr$('CY')
        .else
            print   chr$('NC')
        .endif
        print   chr$(13,10)
    .endif
    pop     ebp
    ret

TestCmp ENDP

;###############################################################################################

        END     _main

dedndave

the test program only displays on failure
here is an example of one fail:
cmp 00000000_00000000_00000000_00000000 , 40000000_00000001_00000000_00000000
was: NV PL NZ NC should be: NV NG NZ CY

Antariy

Quote from: dedndave on August 20, 2013, 10:03:15 AM
for the moment - back to work on my graph code
i will play with this some more later this week

:t