Author Topic: Comparing 128-bit numbers aka OWORDs  (Read 118709 times)

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #30 on: August 14, 2013, 12:57:03 PM »
Jochen, if you limit code only to ZF and CF flags to check, how would you check if negative number is greater than zero, for an instance? That's why I preferred the standard flags setting in my version, and it seems more logical to use comparsion macro-(or call)-"instruction" in a way as usually do with CMP.

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #31 on: August 14, 2013, 02:34:34 PM »
Here is the test showing that all the value-comparsion-condition conditional jumps are working with my code. Request for a test :biggrin:

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #32 on: August 14, 2013, 06:50:53 PM »
yes - signed branches use the OF flag, as well
sign, carry, overflow - the one they give you that noone cares about is parity   :P
aux carry is rarely used, too - mainly for BCD math, i think

i was thinking of implementing a function that could compare integers of any size
Code: [Select]
INVOKE  ArbCmp,nNumberOfBytes,lpFirst,lpSecond
it could compare the high-order bytes as bytes until alignment on the second operand is found
after that (if still equal), you could use an aligned method until inequality is found
don't want to use JECXZ because it's slow

jj2007

  • Member
  • *****
  • Posts: 11135
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #33 on: August 14, 2013, 10:22:11 PM »
Here is the test showing that all the value-comparsion-condition conditional jumps are working with my code. Request for a test :biggrin:

Hi Alex,
Here it is ;-)

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
18742   cycles for 1000 * Ocmp (JJ)
88733   cycles for 1000 * cmp128b (loop)
62612   cycles for 1000 * AxCMP128bit

18750   cycles for 1000 * Ocmp (JJ)
88605   cycles for 1000 * cmp128b (loop)
63719   cycles for 1000 * AxCMP128bit


And Ocmp passes all your tests now, i.e. it behaves exactly like a cmp eax, edx.

P.S.: The extra speed comes from the bswap instruction.

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #34 on: August 15, 2013, 04:05:40 AM »
Hi Jochen, here are results.

Code: [Select]
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 2255/1000 cycles

25281   cycles for 1000 * Ocmp (JJ)
104235  cycles for 1000 * cmp128b (loop)
34615   cycles for 1000 * AxCMP128bit

25063   cycles for 1000 * Ocmp (JJ)
103227  cycles for 1000 * cmp128b (loop)
34196   cycles for 1000 * AxCMP128bit

28798   cycles for 1000 * Ocmp (JJ)
101244  cycles for 1000 * cmp128b (loop)
34112   cycles for 1000 * AxCMP128bit


--- ok ---


:t

jj2007

  • Member
  • *****
  • Posts: 11135
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #35 on: August 15, 2013, 04:45:37 AM »
Thanks to everybody, especially Alex :t

I have posted a "CMP defeats intuition" thread in the Campus because I was really surprised that a negative number can be "greater" than a positive one (and yes I know this is a stupid noob error :biggrin:).

Note the FPU behaves differently.

Gunther

  • Member
  • *****
  • Posts: 3689
  • Forgive your enemies, but never forget their names
Re: Comparing 128-bit numbers aka OWORDs
« Reply #36 on: August 15, 2013, 05:39:44 AM »
Jochen,

the new timings for you:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 2141/1000 cycles

1959    cycles for 1000 * Ocmp (JJ)
60752   cycles for 1000 * cmp128b (loop)
6356    cycles for 1000 * AxCMP128bit

2096    cycles for 1000 * Ocmp (JJ)
60573   cycles for 1000 * cmp128b (loop)
6196    cycles for 1000 * AxCMP128bit

1918    cycles for 1000 * Ocmp (JJ)
59383   cycles for 1000 * cmp128b (loop)
6058    cycles for 1000 * AxCMP128bit

--- ok ---

Gunther
Get your facts first, and then you can distort them.

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #37 on: August 15, 2013, 06:10:19 PM »
Thanks to everybody, especially Alex :t

:icon_redface: You're welcome, Jochen :t

I have posted a "CMP defeats intuition" thread in the Campus because I was really surprised that a negative number can be "greater" than a positive one (and yes I know this is a stupid noob error :biggrin:).

Note the FPU behaves differently.

The most annoying thing in that is that there is not a full set of instructions to work with the flags selectively, like CLC/STC/CMC/CLD/STD/CLI/STI. For example - how to simply (un)set OF flag? Is there any other, simpler way than I used (i.e., via RCR - it is the only instruction I know of that may change OF flag and preserve ZF and allows to restore CF to its previous state (before RCR))?

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #38 on: August 18, 2013, 01:53:45 AM »
if you try to manipulate the flags directly, you are likely to be disappointed by the speed
STC/CLC/CMC aren't too bad
POPF and SAHF are slower than you think they ought to be
SAHF doesn't allow you to manipulate the OF - stupid mistake by intel

but, you could come up with a set of operands/operations to generate the flag conditions, as desired

Code: [Select]
mov al,7Fh
mov ah,88h
sub al,ah

so, at the end of your code, you could have
Code: [Select]
SetFlags:
sub al,88h

and branch to that location with different values in AL
something like that   :P

jj2007

  • Member
  • *****
  • Posts: 11135
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #39 on: August 18, 2013, 02:12:44 AM »
The solution for setting the flags in the Ocmp & Qcmp macros was actually inspired by Alex' insistence that the macro should behave exactly like a cmp reg32, reg32. So the trick is to take two OWORDs, for example:

  oSmall   OWORD 88000000000000000000000000000100h   ; 88=NEGATIVE
  oBig   OWORD 88000000000000000000000000000300h

.. to scan them with pcmpeqb & bsr for the first different byte, and then "construct" two reg32:
  eax = 88000001
  edx = 88000003

Then, a simple cmp eax, edx sets the flags. Simple and fast...

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #40 on: August 18, 2013, 02:15:39 AM »
that is a great solution   :t

jj2007

  • Member
  • *****
  • Posts: 11135
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #41 on: August 18, 2013, 02:20:39 AM »
that is a great solution   :t
Thanks ;-)
By the way, if you put the pcmpeqb part into a loop, it should be easy to implement the arbitrary length algo you proposed. If the length is below OWORD, you will have to clear the last bits after bsr.

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #42 on: August 18, 2013, 02:29:59 AM »
i like the flag-setting solution
i am not as fond of using BSR   :P

jj2007

  • Member
  • *****
  • Posts: 11135
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #43 on: August 18, 2013, 03:01:41 AM »
Then cmps is your candidate, Dave ;-)

qWord

  • Member
  • *****
  • Posts: 1476
  • The base type of a type is the type itself
    • SmplMath macros
Re: Comparing 128-bit numbers aka OWORDs
« Reply #44 on: August 18, 2013, 04:14:31 AM »
just as addition, one might try this x86 solution. If I'm not wrong, it also behaves like the CMP instruction:
Code: [Select]
cmp128 macro ow0,ow1
LOCAL @NE1,@NE2,@NE3,@end
lea esi,ow0
lea edi,ow1
mov eax,[esi+0*DWORD]
mov ecx,[esi+1*DWORD]
mov edx,[esi+2*DWORD]
mov ebx,[esi+3*DWORD]
sub eax,[edi+0*DWORD]
jnz @NE1
sbb ecx,[edi+1*DWORD]
jnz @NE2
sbb edx,[edi+2*DWORD]
jnz @NE3
sbb ebx,[edi+3*DWORD]
;/* equal */
jmp @end

@NE1: sbb ecx,[edi+1*DWORD]
@NE2: sbb edx,[edi+2*DWORD]
@NE3: sbb ebx,[edi+3*DWORD]
;/* LT or GT */
jnz @end
or ebx,2    ; MOV may be better, because it breaks the dependency chain...
cmp ebx,1 ; new flags: A GT B
@end:
endm
MREAL macros - when you need floating point arithmetic while assembling!