Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM

jj2007

test_correct.zip:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
18963   cycles for 1000 * Ocmp (JJ)
18407   cycles for 1000 * Ocmp2 (JJ)
15392   cycles for 1000 * cmp128n (nidud)
8043    cycles for 1000 * cmp128 qWord
5544    cycles for 1000 * AxCMP128bit

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
14256   cycles for 1000 * Ocmp (JJ)
13015   cycles for 1000 * Ocmp2 (JJ)
15534   cycles for 1000 * cmp128n (nidud)
9508    cycles for 1000 * cmp128 qWord
10279   cycles for 1000 * AxCMP128bit


Ocmp2 passed all tests.

nidud

#136
deleted

dedndave

here it is in macro form - should be a bit faster   :P
Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands
;Example: Cmp128Dave offset Oword1,offset Oword2

    mov     eax,dword ptr OwA[0]
    xor     ecx,ecx
    cmp     eax,dword ptr OwB[0]
    mov     edx,dword ptr OwA[4]
    .if !ZERO?
        inc     ecx
    .endif
    sbb     edx,dword ptr OwB[4]
    mov     eax,dword ptr OwA[8]
    .if !ZERO?
        inc     ecx
    .endif
    sbb     eax,dword ptr OwB[8]
    mov     edx,dword ptr OwB[12]
    mov     eax,dword ptr OwA[12]
    .if !ZERO?
        inc     ecx
    .endif
    sbb     al,dl
    .if !ZERO?
        inc     ecx
    .endif
    .if CARRY?
        mov     dl,cl
        mov     al,ch
    .else
        mov     al,cl
        mov     dl,ch
    .endif
    cmp     eax,edx

ENDM

hutch--

I have not been following this topic in any real detail, so I don't know how relevant this is, but it may be useful to someone: a .486-compatible unsigned QWORD comparison algo.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    cmpqword PROTO :DWORD,:DWORD

    .data
      value1 QWORD 0000000000000000h
      value2 QWORD 0000000000000001h
      value3 QWORD 0000000000000002h

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    invoke cmpqword,ADDR value1,ADDR value2     ; value1 < value2, returns -1
    print str$(eax),13,10

    invoke cmpqword,ADDR value3,ADDR value3     ; value3 = value3, returns 0
    print str$(eax),13,10

    invoke cmpqword,ADDR value2,ADDR value1     ; value2 > value1, returns 1
    print str$(eax),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

cmpqword proc pquad1:DWORD,pquad2:DWORD

  ; ----------------------
  ; unsigned QWORD compare
  ; ----------------------
    mov eax, [esp+4]
    mov edx, [esp+8]

    mov ecx, [eax+4]
    cmp ecx, [edx+4]    ; high DWORD 1st
    ja greater
    jb lessthan

    mov ecx, [eax]
    cmp ecx, [edx]      ; low DWORD next
    ja greater
    jb lessthan

    xor eax, eax        ; return 0 on equal
    ret 8

  lessthan:
    mov eax, -1         ; return -1 on less than
    ret 8

  greater:
    mov eax, 1          ; return 1 on greater
    ret 8

cmpqword endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
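
The same high-to-low approach extends to the 128-bit case this thread is about. The following is only a minimal sketch under a few assumptions: `cmpoword`, `ow1` and `ow2` are hypothetical names that do not appear in the posts above, the OWORDs are stored as four little-endian DWORDs, and the comparison is unsigned. It returns -1, 0 or 1 like `cmpqword`.

```masm
    include \masm32\include\masm32rt.inc

    cmpoword PROTO :DWORD,:DWORD

    .data
      ow1 DWORD 0,0,0,1             ; 2^96 (dwords stored low to high)
      ow2 DWORD 0FFFFFFFFh,0,0,0    ; 2^32 - 1

    .code

start:
    invoke cmpoword,ADDR ow1,ADDR ow2     ; ow1 > ow2, should print 1
    print str$(eax),13,10
    invoke cmpoword,ADDR ow2,ADDR ow1     ; ow2 < ow1, should print -1
    print str$(eax),13,10
    invoke cmpoword,ADDR ow1,ADDR ow1     ; equal, should print 0
    print str$(eax),13,10
    inkey
    exit

cmpoword proc pA:DWORD,pB:DWORD

  ; -----------------------------------------
  ; unsigned OWORD compare, highest DWORD 1st
  ; -----------------------------------------
    mov eax, pA
    mov edx, pB

    mov ecx, [eax+12]
    cmp ecx, [edx+12]
    jne differ
    mov ecx, [eax+8]
    cmp ecx, [edx+8]
    jne differ
    mov ecx, [eax+4]
    cmp ecx, [edx+4]
    jne differ
    mov ecx, [eax]
    cmp ecx, [edx]
    jne differ

    xor eax, eax        ; all four DWORDs equal, return 0
    ret

  differ:
    sbb eax, eax        ; CF still holds the cmp result: below -> -1, above -> 0
    or  eax, 1          ; -1 stays -1, 0 becomes 1
    ret

cmpoword endp

end start
```

The first pair of DWORDs that differs decides the result, so the lower DWORDs are only read when everything above them is equal.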

nidud

#139
deleted

dedndave

here is that one in macro form
i measure 11 cycles on my P4, which is pretty good

the values that are tested may not be an honest reflection of SAHF usage
i won't throw that other macro away, just yet   :P

Cmp128Dave MACRO OwA:REQ,OwB:REQ

;OwA and OwB are pointers to memory operands
;Example: Cmp128Dave offset Oword1,offset Oword2

    LOCAL   c1,c2,c3,c4

    mov     eax,dword ptr OwA[0]
    cmp     eax,dword ptr OwB[0]
    jnz     c1

    mov     eax,dword ptr OwA[4]
    cmp     eax,dword ptr OwB[4]
    jnz     c2

    mov     eax,dword ptr OwA[8]
    cmp     eax,dword ptr OwB[8]
    jnz     c3

    mov     eax,dword ptr OwA[12]
    cmp     eax,dword ptr OwB[12]
    jmp short c4

c1: mov     eax,dword ptr OwA[4]
    sbb     eax,dword ptr OwB[4]

c2: mov     eax,dword ptr OwA[8]
    sbb     eax,dword ptr OwB[8]

c3: mov     eax,dword ptr OwA[12]
    sbb     eax,dword ptr OwB[12]
    jnz     c4

    lahf
    lea     eax,[eax-4000h]
    sahf

c4:

ENDM
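
For reference, the subtract-with-borrow chain that these macros build on can be written as a plain procedure. This is a minimal sketch, not dedndave's code: `scmp128`, `owMinus1` and `owPlus1` are hypothetical names, and the operands are assumed to be signed OWORDs stored as four little-endian DWORDs. After the last `sbb`, SF and OF describe the sign and overflow of the full 128-bit difference, but ZF only reflects the top DWORD, so equality is tested separately by OR-ing the four partial differences.

```masm
    include \masm32\include\masm32rt.inc

    scmp128 PROTO :DWORD,:DWORD

    .data
      owMinus1 DWORD 0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh  ; -1 as signed OWORD
      owPlus1  DWORD 1,0,0,0                                      ; +1

    .code

start:
    invoke scmp128,ADDR owMinus1,ADDR owPlus1   ; -1 < +1, should print -1
    print str$(eax),13,10
    invoke scmp128,ADDR owPlus1,ADDR owMinus1   ; +1 > -1, should print 1
    print str$(eax),13,10
    invoke scmp128,ADDR owPlus1,ADDR owPlus1    ; equal, should print 0
    print str$(eax),13,10
    inkey
    exit

scmp128 proc uses esi edi ebx pA:DWORD,pB:DWORD

    mov esi, pA
    mov edi, pB

    mov eax, [esi]          ; load all four DWORDs of A first: mov leaves the flags alone
    mov ecx, [esi+4]
    mov edx, [esi+8]
    mov ebx, [esi+12]

    sub eax, [edi]          ; A - B, lowest DWORD first
    sbb ecx, [edi+4]        ; the borrow travels upwards in CF
    sbb edx, [edi+8]
    sbb ebx, [edi+12]       ; SF/OF now belong to the 128-bit difference
    jl  is_less             ; signed less: SF <> OF

    or  eax, ecx            ; ZF alone is not enough, so merge the differences
    or  eax, edx
    or  eax, ebx
    jz  done                ; all zero: equal, eax is already 0
    mov eax, 1              ; otherwise A > B
  done:
    ret

  is_less:
    mov eax, -1
    ret

scmp128 endp

end start
```

Nothing between the `sub` and the final `sbb` may touch CF, which is why the loads are done up front and the `or` instructions come only after the chain has finished.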

dedndave

not sure what the scaling factor is for cycles, but here's your last test   :P
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
---------------------------------------------------------
5663779 cycles for Cmp128Dave
8939593 cycles for Cmp128Dave2
8382453 cycles for Cmp128Nidud
---------------------------------------------------------
6032710 cycles for Cmp128Dave
9247293 cycles for Cmp128Dave2
7744632 cycles for Cmp128Nidud
---------------------------------------------------------

dedndave

wow - looking at the values again, it would seem that they all take the SAHF path
i am surprised that code does so well   :redface:

nidud

#143
deleted

nidud

#144
deleted

dedndave

ahhh - good point on the validation test data alignment
we can pad that with "empty" dwords to make it align
Alex's code doesn't use a control value, so that's another way to go

as for that timing chart.....

yes - it was a fast instruction in days of old
however, many instructions that explicitly manipulate flags seem to run slow under NT
CMC, STC, CLC are exceptions to that rule - they are ok

but CLD, STD, POPFD seem to be really slow under NT
i figured SAHF would be also
i think it's related to the protected OS thing
it has to verify that the flag change is allowed with current privileges before continuing

jj2007

I've tinkered with another one, it's fast but it doesn't pass all tests :(
@Alex: Could you modify CheckIt so that it produces less output? Such as: #tests failed?

ocjj=0
oqDeb=0
OcmpJJ MACRO ow0, ow1
LOCAL z0, z1
  ocjj=ocjj+1
  z0 CATSTR <ocJJ0>, %ocjj
  z1 CATSTR <ocJJ1>, %ocjj
   mov eax, dword ptr ow0[12]
   cmp eax, dword ptr ow1[12]
   jne z0   ; no test byte ptr
   mov eax, dword ptr ow0[8]
   mov edx, dword ptr ow1[8]
   cmp eax, edx
   jne z1
   mov eax, dword ptr ow0[4]
   mov edx, dword ptr ow1[4]
   cmp eax, edx
   jne z1
   mov eax, dword ptr ow0[0]
   mov edx, dword ptr ow1[0]
z1:
   test byte ptr ow1[15], 80h
   usedeb=01
   .if Sign?
      cmp eax, edx
      .if ! (Carry? && Sign?)
         xchg eax, edx      ; qSmallN
      .endif
   .endif
   cmp eax, edx
z0:
  if oqDeb
  .if Zero?
   print "&ow0", " equals  &ow1", 13, 10
  .elseif !Sign?
   print "&ow0", " greater &ow1", 13, 10
  .else
   print "&ow0", " lesser &ow1", 13, 10
  .endif
   print chr$(13, 10)
  endif
ENDM


Good night from Europe ;-)

nidud

#147
deleted

Antariy

Quote from: jj2007 on August 21, 2013, 02:32:04 AM
test_correct.zip:

Ocmp2 passed all tests.

AxCMP128bit too! :biggrin:

Quote from: jj2007 on August 21, 2013, 07:48:44 AM
@Alex: Could you modify CheckIt so that it produces less output? Such as: #tests failed?

Here it is. It now prints the offsets of the numbers instead of the numbers themselves, so with the binary at hand you know where to look, and the output is a lot smaller. At the same place an int 3 is executed if the program is running under a debugger, followed by a jump back to repeat the failed test: you can trace through it, or jump over this jump (pun intended).
Is this suitable?

I also simplified Etalone a bit; it is more straightforward now.

Antariy

Dave, your checking method requires the flags returned by the OWORD compare to match exactly the flags returned by comparing the control DWORDs. That is not quite the right test, because the SF and OF flags may be laid out differently and still be correct: according to the documentation, signed jumps only check whether OF and SF are equal or not, and there are no notes on exactly which layout of these flags a compare has to produce (and I think this may be hardware-dependent). I.e., if one compare returns OF=1 and SF=0 and the other returns OF=0 and SF=1, the two results are equivalent, because JL/JLE (and their aliases JNGE/JNG) will jump in both cases.

My checking code is aware of this, but yours is not, which is why my comparison code doesn't pass your check. But it works, and works properly, because the exact SF/OF layout is not fixed by any standard.

In your case you could do the same. After this:

    and     ebx,8C1h
    and     ebp,8C1h          ;OF SF ZF CF only

do the check this way:

   xor ebp,ebx
   .if ebp!=0 && ebp!=100010000000Y ; if OF and SF were "swapped" then XOR will make both bits set
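
The relaxed check can also be packaged as a small stand-alone routine. This is a hedged sketch rather than the code from either testbed: `SameOrdering` is a hypothetical name, and the two arguments are assumed to be EFLAGS images (as obtained with pushfd/pop) from the control compare and from the OWORD compare. It returns 1 when CF and ZF match and OF/SF are either identical or both inverted, and 0 otherwise.

```masm
    include \masm32\include\masm32rt.inc

    SameOrdering PROTO :DWORD,:DWORD

    .code

start:
    invoke SameOrdering,800h,80h    ; OF=1,SF=0 vs OF=0,SF=1: should print 1
    print str$(eax),13,10
    invoke SameOrdering,800h,0      ; OF=1,SF=0 vs OF=0,SF=0: should print 0
    print str$(eax),13,10
    invoke SameOrdering,41h,40h     ; CF differs: should print 0
    print str$(eax),13,10
    inkey
    exit

SameOrdering proc flagsA:DWORD,flagsB:DWORD

    mov eax, flagsA
    xor eax, flagsB         ; set bits mark the flags that differ
    and eax, 8C1h           ; OF SF ZF CF only

    ; 0    = all four flags identical
    ; 880h = OF and SF both inverted, so "OF equals SF" is unchanged
    .if eax == 0 || eax == 880h
        mov eax, 1
    .else
        xor eax, eax
    .endif
    ret

SameOrdering endp

end start
```

A difference in only one of OF and SF is still rejected, because that changes whether the two flags are equal and therefore changes the outcome of a signed conditional jump.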



An update: the checking is stricter now, and Jochen's new experimental algo from the post above has been added.

Jochen, I was busy with the testbed, but the idea of that algo looks interesting.