News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM

Previous topic - Next topic

dedndave

eliminated most of the repeat tests (doh)
now, 1600 tests are performed

nidud

#166
deleted

Antariy

Quote from: dedndave on August 22, 2013, 05:03:58 AM
Alex is missing an important point
it's not enough that SF=OF or SF<>OF
because YOU DON'T KNOW WHICH Jxx INSTRUCTION WILL BE USED
you have to set the flags so that any of the ones listed above will work

No, this is you missing the point.

My comparsion code will work for every kind of signed/unsigned numbers-comparsion jumps. My code sets CF and ZF properly, but it may set SF and OF differently than YOUR CHECKING METHOD WAITS it will do. But with CPU IT WILL WORK PROPERLY, because CPU IS NOT YOUR CHECKING CODE.

Your checking code is incomplete and thus it fails itself. You may argue with this more, but this is senseless disput, since you totally don't get what I'm trying to say.

You may try to find any numbers with which my code + following JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE will fail, but you would not find that numbers.

My code is working and working properly. You may even bet for contrary for $100000000000, but you will never win.
My checking code is the only code in the thread that properly checks the flags returned for numbers comparsion and following JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE.

Specially for those, who still don't get my point from previous posts, where I omitted CF and ZF flags (which's state was assummed to be equal in both flags sets).
These flags sets:

CF=1, ZF=0, OF=1, SF=1 ARE THE SAME AS CF=1, ZF=0, OF=0, SF=0

CF=1, ZF=0, OF=0, SF=1 ARE THE SAME AS CF=1, ZF=0, OF=1, SF=0

And these:

CF=0, ZF=1, OF=1, SF=1 ARE THE SAME AS CF=0, ZF=1, OF=0, SF=0

CF=1, ZF=0, OF=0, SF=1 ARE THE SAME AS CF=1, ZF=0, OF=1, SF=0

And so on. Take your special attention on the MUTUAL STATE (I said these words much in the thread :greensml:) OF and SF flags in these tables, then read the docs, then take attention on the table again, and then read the docs.

This is CPUs behaviour. But your code isn't aware of this. Though this is only question of two instructions to make code aware - and I showed that instructions.



And then, if it's still not enough, well, try to write the code which will use my code to compare any numbers you will chose, and then execute each of these instructions: JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE, and then will fail. This is challenge, yeah. One may decline this challenge with words like "it's waste of time, I'm not in duty to do that", of course. But this still will not prove that one's point.


I theoretically and practically proved my point in this thread - if the descriptions "are not enough" for opponents - that's their own problems. When opponents do not want to understand or to check what was said to them - this is not adequate disput. It's not so much to grab any "failed" numbers, then compare them with my code, and then try every of only 9 conditional jumps we are all make working code for.

No one will be able theoretically prove contrary if the one will read documentation. Anyone may practically try to find numbers with which my code + following JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE will fail, but one will not find such numbers.

dedndave

my friend Alex   :biggrin:

rather than sorting all that out, let's look at some cases that fail one test, but pass the other

here is one that passes your test, but fails mine

cmp 00000000_00000000_00000000_00000000 , 80000000_00000000_00000000_00000001
was: OV NG NZ CY should be: NV PL NZ CY

if we subtracted 00000000_00000000_00000000_00000000 - 80000000_00000000_00000000_00000001
the result would be 7FFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF (sign = 0)

you can make a simple test:
mov eax,0
cmp eax,80000001h
js is_neg


if i execute a JG, JGE, JL, or JLE instruction after comparing these, your code will work as it should
however, if i execute a JS, JNS, JO, or JNO instruction, it will not

i ran your code through my test and got 48 failures
attached is the text file.....

sinsi

If you guys are working on 4 dwords, why not work on 4 bytes?
The logic is the same and we can easily test for errors. Forget speed, get the basics going.

Once you figure it out it should be easy to do n-bit comparisons.

Antariy

Quote from: dedndave on August 22, 2013, 07:33:19 PM
if i execute a JG, JGE, JL, or JLE instruction after comparing these, your code will work as it should
however, if i execute a JS, JNS, JO, or JNO instruction, it will not

Dave, but the point was to make a comparsion code, i.e. just compare numbers: which one is greater, lesser, above, below, or they equal. There was no points about J(N)S and J(N)O jumps - only compare jumps. Thus, all my posts based on this - only "value compare" point.

The target was make code which will properly work for JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE, but not J(N)O, and this probably was stated in first pages of the thread, as well as in my thread about "128bit CMP" in the lab I said this point - and said that JS will not work properly, too, but one may use test byte ptr [oword+15],80h and then JS (just about ~15 bytes length code and 2 instructions) - instead of using such a bloated algos like we write. It's question of usability - I get it that.

So, thus all my posts was based on the point that we all working on algos which will satisfy these jumps: JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE, as well as my checking algo based on this, and my comparsion algo, too.

Since it's very unlikely anyone may want to compare two 128 bit numbers, and then execute J(N)O jump.

Here is one more variation - partially based on Jochen's experimental algo posted on page 10.

It passes my check, and doesn't passes Dave's check, but anyone may see that the logic is so simple that it just may not fail for JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE. So, we probably worked on checking algos having different specifications of the target :lol:


option prologue:none
option epilogue:none
AxCMP128bitProc2 proc n1,n2
mov eax,[esp+4]
mov edx,[esp+8]
xor ecx,ecx
mov [esp+4],esi
mov [esp+8],edi


mov esi,[eax+12]
test dword ptr [eax+12],80000000h
mov edi,[eax+8]

sets cl

cmp esi,[edx+12]
jnz @l0

cmp edi,[edx+8]
mov esi,[eax+4]
jnz @l1

cmp esi,[edx+4]
mov edi,[eax]
jnz @l1

cmp edi,[edx]
jnz @l1

mov esi,[esp+4]
mov edi,[esp+8]
ret 8

@l1:

ja @F
mov esi,[predata1+ecx*8]
cmp esi,[predata1+4+ecx*8]
mov esi,[esp+4]
mov edi,[esp+8]
ret 8

@@:
mov esi,[predata2+ecx*8]
cmp esi,[predata2+4+ecx*8]

@l0:
mov esi,[esp+4]
mov edi,[esp+8]
ret 8

align 4
predata1 label dword
dd 1,2,-2,-1

predata2 label dword
dd 2,1,-1,-2

AxCMP128bitProc2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


Though this code is slower on my machine than my macro code based on first Jochen's idea.

Antariy

And, yes, I'm glad that you understand me - and agree that it will work for JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE - and we sorted out why were unable to understand the points of each other.

Antariy

I think probably qWord's code was based on idea of only-comparsion, too.


So, shortly speaking, for your checking code, if you see that the state of OF and SF is just reversed contrary to the "should be", after testing of my code, then it's OK, because it's "by design" :biggrin:

I.e., here in your test if you see something like

was: OV NG NZ CY should be: NV PL NZ CY


OF=1, SF=1, should be OF=0, SF=0, ZF and CF are the same - so, that's OK for JA/JB/JAE/JBE/JG/JL/JGE/JLE/JE. But not for JS or JO, but I did not write the comparsion code for support of JS and JO.

nidud

#173
deleted

nidud

#174
deleted

Antariy

Quote783467   cycles for Cmp128Axel (xor)

*Suspeciously looking to the proc name and then to the source*

LOL

The algo not only may reverse the state of a flags but reverse the order of letters in their name (if it was named by the author's name) :biggrin:
Very funny :t

results:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
---------------------------------------------------------
5881815 cycles for Cmp128Dave
9914640 cycles for Cmp128Dave2
5894460 cycles for Cmp128Nidud
2936083 cycles for Cmp128NidudSEE (xor)
2444879 cycles for Cmp128NidudSEEU (unsigned)
994634  cycles for Cmp128Axel (xor)
941599  cycles for Cmp128DaveU (unsigned)
962220  cycles for Cmp128NidudU (unsigned)
---------------------------------------------------------
5905129 cycles for Cmp128Dave
9810117 cycles for Cmp128Dave2
5910750 cycles for Cmp128Nidud
2766487 cycles for Cmp128NidudSEE (xor)
2443439 cycles for Cmp128NidudSEEU (unsigned)
969855  cycles for Cmp128Axel (xor)
933817  cycles for Cmp128DaveU (unsigned)
858812  cycles for Cmp128NidudU (unsigned)
---------------------------------------------------------

--- ok ---





Results for attached archive - added Jochen's code and my earlier tweak of his code.
Also renamed some labels in the source, Axel was not contrary :biggrin:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
---------------------------------------------------------
5843868 cycles for Cmp128Dave
9540057 cycles for Cmp128Dave2
5861747 cycles for Cmp128Nidud
2684675 cycles for Cmp128NidudSEE (xor)
2440403 cycles for Cmp128NidudSEEU (unsigned)
946568  cycles for Cmp128Alex (xor)
950729  cycles for Cmp128DaveU (unsigned)
988245  cycles for Cmp128NidudU (unsigned)
3537702 cycles for JJAxCMP128bit (SSE)
5887886 cycles for Ocmp2 - JJ's (SSE)
---------------------------------------------------------
5847877 cycles for Cmp128Dave
9582836 cycles for Cmp128Dave2
5915661 cycles for Cmp128Nidud
2731934 cycles for Cmp128NidudSEE (xor)
2439047 cycles for Cmp128NidudSEEU (unsigned)
1040627 cycles for Cmp128Alex (xor)
956229  cycles for Cmp128DaveU (unsigned)
950693  cycles for Cmp128NidudU (unsigned)
3533400 cycles for JJAxCMP128bit (SSE)
5874002 cycles for Ocmp2 - JJ's (SSE)
---------------------------------------------------------

--- ok ---




nidud, I inserted your Cmp128NidudSEEU algo into my testbed, it doesn't passes the check - many numbers, but here is first one:

00000000 00000000 00000000 00000000 and 00000000 80000000 40000001 00000000

It returns that first number is above and greater than second (CF=0, SF=0, OF=0, ZF=0) - JA/JG/JAE/JGE will jump.

SSE comparsion instructions are only-signed, that's the reason probably (i.e., they decide the OWORD just as set of signed DWORDs).


Here is the checking code of my testbed - it gets too entangled and contains too much old/not working code, so, probably it's better to post it as "external".



; EAX = BITS: ... CF ZF SF OF
FlagsToEAX MACRO
pushfd
xor eax,eax
pop edx
bt edx,0
rcl eax,1
bt edx,6
rcl eax,1
bt edx,7
rcl eax,1
bt edx,11
rcl eax,1
ENDM

Etalone MACRO ow0, ow1
LOCAL @l1, @l2, @l3, @l0, @l4

push ebx

mov eax,dword ptr [ow0+12]
mov edx,dword ptr [ow1+12]
cmp eax,edx
jnz @l1 ; just save flags

mov ecx,dword ptr [ow0+8]
mov ebx,dword ptr [ow1+8]
cmp ecx,ebx
jnz @l2

mov ecx,dword ptr [ow0+4]
mov ebx,dword ptr [ow1+4]
cmp ecx,ebx
jnz @l2

mov ecx,dword ptr [ow0]
mov ebx,dword ptr [ow1]
cmp ecx,ebx
jz @l1



@l2:
mov eax,0
ja @l0 ; if it's above - the number is bigger because this isn't MSD

;mov byte ptr [esp+3],1 ; CF set, below than (unsigned)

mov eax,1001Y ; CF and OF set, below than and less than

; setting of a sign flag has no meaning in this case, it' superfluous

jmp @l0


@l1:
FlagsToEAX

@l0:

if 0
push eax
mov ebx,eax
test ebx,1
jz @F
print "OF "
@@:
test ebx,2
jz @F
print "SF "
@@:
test ebx,4
jz @F
print "ZF "
@@:
test ebx,8
jz @F
print "CF "
@@:

print chr$(13,10)
pop eax
endif

pop ebx

ENDM


TestVal dd 0,0,0,0,0
        dd 1,1,0,0,0
        dd 100h,0,1,0,0
        dd 10000h,0,0,1,0
        dd 1000000h,0,0,0,1

        dd 40000000h,0,0,0,40000000h
        dd 40000001h,1,0,0,40000000h
        dd 40000100h,0,1,0,40000000h
        dd 40010000h,0,0,1,40000000h
        dd 41000000h,0,0,0,40000001h

        dd 80000000h,0,0,0,80000000h
        dd 80000001h,1,0,0,80000000h
        dd 80000100h,0,1,0,80000000h
        dd 80010000h,0,0,1,80000000h
        dd 81000000h,0,0,0,80000001h

        dd 0C0000000h,0,0,0,0C0000000h
        dd 0C0000001h,1,0,0,0C0000000h
        dd 0C0000100h,0,1,0,0C0000000h
        dd 0C0010000h,0,0,1,0C0000000h
        dd 0C1000000h,0,0,0,0C0000001h

        dd 3FFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,3FFFFFFFh
        dd 3FFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,3FFFFFFFh
        dd 3EFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,3FFFFFFEh

        dd 7FFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,7FFFFFFFh
        dd 7FFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,7FFFFFFFh
        dd 7EFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,7FFFFFFEh

        dd 0BFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,0BFFFFFFFh
        dd 0BFFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,0BFFFFFFFh
        dd 0BEFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0BFFFFFFEh

        dd 0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFFFFFEh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFFFEFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFEFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh,0FFFFFFFFh
        dd 0FEFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFEh
       
        dd 0FFFF0001H,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        dd 0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
        ArraySize EQU $-TestVal
        ;dd 123456h
       
       
CheckIt MACRO howToInvoke:REQ
CheckIt2 <howToInvoke>, 0, 0
CheckIt2 <howToInvoke>, 4, 0
CheckIt2 <howToInvoke>, 0, 4
CheckIt2 <howToInvoke>, 4, 4
ENDM

CheckIt2 MACRO howToInvoke, offst1:=<0>, offst2:=<0>
LOCAL @l3, @l2, @l1
;########################## CHECK

push esi
push edi
push ebx

print "#######################################################",13,10
print "Testing algo: ",@CatStr(<!">,<howToInvoke>,<!">)," offst1: ",@CatStr(<!">,<offst1>,<!">)," offst2: ",@CatStr(<!">,<offst2>,<!">),13,10
mov esi,offset TestVal+offst1

@l3:
mov ebx,ArraySize/16
mov edi,offset TestVal+offst1

@l2:
howToInvoke
FlagsToEAX
push eax
Etalone esi, edi
pop ecx
xor eax,ecx
jz @l1 ; test OK
cmp eax,3 ; layout of SF and OF flag may differ, but if they are both not equal
jz @l1 ; in first comparsion and in second comparsion, then it's proper result
@@: ; since signed less than is OF != SF with no difference which flags are (un)set
;int 3
if 0
print str$(ebx)," - Test failed: "
print uhex$(dword ptr [esi+12])
print "_"
print uhex$(dword ptr [esi+8])
print "_"
print uhex$(dword ptr [esi+4])
print "_"
print uhex$(dword ptr [esi])
print "  "
print uhex$(dword ptr [edi+12])
print "_"
print uhex$(dword ptr [edi+8])
print "_"
print uhex$(dword ptr [edi+4])
print "_"
print uhex$(dword ptr [edi]),13,10
else
print uhex$(esi)," "
print uhex$(edi),"   "
invoke IsDebuggerPresent
test eax,eax
jz @F
int 3
jmp @l2 ; repeat test to see it under the debugger, or skip this instruction
@@:


endif

@l1:

add edi,16+offst2
dec ebx
jnz @l2
add esi,16+offst2
cmp esi,offset TestVal+ArraySize
jb @l3

pop ebx
pop edi
pop esi

print "Test done",13,10,13,10,13,10

;##########################
ENDM



(this is the code from prog posted in the bottom of 10 page)

To test your algos just insert them into source, insert the code above, and then use this:

   CheckIt <Cmp128NidudSEEU [esi],[edi]>
   
   CheckIt <Cmp128NidudSEE [esi],[edi]>


To check algos that are procs and not a macroses, use this construct:


CheckIt <invoke AlgoName,esi,edi>


(esi and edi are obligatory to be specified as params)

It will print offsets of the failed numbers, and, if you are running the prog under the debugger, will break and then re-run failed comparsion, so you may trace things.

Did it pass Dave's check? :eek:


nidud

#176
deleted

nidud

#177
deleted

jj2007

#178
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
-----------------------------------------------
1996511 cycles for Cmp128Dave
3028413 cycles for Cmp128Dave2
1932345 cycles for Cmp128Nidud
1350691 cycles for Cmp128NidudSEE (xor)
1028161 cycles for Cmp128Alex (xor)
848824  cycles for Cmp128JJ (xor)
838612  cycles for Cmp128DaveU (unsigned)
838743  cycles for Cmp128NidudU (unsigned)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
------------------------------------------------------
1897321 cycles for Cmp128Dave
6395118 cycles for Cmp128Dave2
1833315 cycles for Cmp128Nidud
1987302 cycles for Cmp128NidudSEE (xor)
921104  cycles for Cmp128Alex (xor)
737550  cycles for Cmp128JJ (xor)
873656  cycles for Cmp128DaveU (unsigned)
714518  cycles for Cmp128NidudU (unsigned)

nidud

#179
deleted