Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM


nidud

#240
deleted

FORTRANS

Quote from: dedndave on August 23, 2013, 10:20:37 PM
i did create a new set of values
but, i haven't had time to validate the standard flags proc

Hi,

   In Reply #187 Dave had an array of test values.  I just tested
my algorithm against those, and passed.  I created the check
values as he had in his earlier validation program as that was what
I based mine on.  Would that still be of use to anyone else?  I know
you want fast algorithms, and mine is most probably slow.  (And it
is 16-bit to run it on an 80186.)  Anyone interested in it?

Regards,

Steve N.

Antariy

Quote from: nidud on August 27, 2013, 01:12:12 AM
Quote from: Antariy on August 26, 2013, 12:38:23 PM
Quote from: nidud on August 26, 2013, 11:35:51 AM

mov eax,1
bsf eax,eax


Does this work? :icon_eek:

The trick is to clear the zero flag without changing any of the other flags. In Dave's code this is done like this:
    jnz     c4
    lahf
    lea     eax,[eax-4000h]
    sahf


BSF is one of the few opcodes that modify only ZF, but it is a bit slow.
Clocks
BSF 6-42
BSR 6-103


The problem is that BSF defines only the ZF flag; the other flags are "undefined" after the instruction. For flags, this means that their state is unpredictable: on my CPU (and maybe (!) on every Intel), they are all zeroed except ZF. In short, BSF cannot be used for this purpose with any robustness (if the flags aren't touched on one CPU, they may still be trashed on another). Check it on your CPU: does BSF trash the other flags? Did it pass Dave's check? If so, then your CPU doesn't change the other flags with BSF; otherwise it should not pass the check.
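For reference, the quoted LAHF/SAHF sequence relies only on documented flag behavior, which is what makes it robust where BSF is not. Below is a commented sketch of the same idea. The label name is made up, and placing it right after SAHF is an assumption, since the quote does not show where c4 is.

```asm
        jnz     ZfIsClear           ; ZF is already 0: nothing to do
        lahf                        ; AH = SF ZF 0 AF 0 PF 1 CF, so ZF sits in bit 6 of AH (bit 14 of EAX)
        lea     eax, [eax-4000h]    ; ZF is known to be 1 here, so this only clears bit 14; LEA writes no flags
        sahf                        ; write SF ZF AF PF CF back from AH; SAHF leaves OF alone
ZfIsClear:                          ; hypothetical label, stands in for c4
```

AH is clobbered. An AND or BTR on EAX would not do in place of LEA, because those instructions write the flags themselves and SAHF cannot put OF back.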



Can I ask everyone for more timings for this archive? http://masm32.com/board/index.php?topic=2222.msg23743#msg23743
It's interesting to see how worthwhile the rework of the old code is.

Antariy


dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2044430 cycles [x][x][x] - Cmp128Nidud
2232933 cycles [x][x][x] - Cmp128NidudSSE
2631658 cycles [x][x][x] - Cmp128Dave
3862003 cycles [x][x][x] - Cmp128Dave2
1601513 cycles [x][x][x] - Cmp128JJAlexSSE_1
1559401 cycles [x][x][x] - Cmp128JJAlexSSE_2
1791892 cycles [x][x][x] - Cmp128JJAlexSSE_3
935826  cycles [x][x][ ] - Cmp128Alex
1729147 cycles [x][x][x] - Cmp128Alex_2
1779960 cycles [x][x][x] - Cmp128Alex_3
1913773 cycles [x][x][ ] - Cmp128JJSSE
1302324 cycles [x][x][ ] - AxCMP128bitProc3
1253729 cycles [x][x][ ] - AxCMP128bitProc3c (cmov)
701808  cycles [x][ ][ ] - Cmp128DaveU
752020  cycles [x][ ][ ] - Cmp128NidudU

CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2033683 cycles [x][x][x] - Cmp128Nidud
2247710 cycles [x][x][x] - Cmp128NidudSSE
2628652 cycles [x][x][x] - Cmp128Dave
3813015 cycles [x][x][x] - Cmp128Dave2
1629220 cycles [x][x][x] - Cmp128JJAlexSSE_1
1591177 cycles [x][x][x] - Cmp128JJAlexSSE_2
1794286 cycles [x][x][x] - Cmp128JJAlexSSE_3
936215  cycles [x][x][ ] - Cmp128Alex
1725124 cycles [x][x][x] - Cmp128Alex_2
1782223 cycles [x][x][x] - Cmp128Alex_3
1900926 cycles [x][x][ ] - Cmp128JJSSE
1331104 cycles [x][x][ ] - AxCMP128bitProc3
1260544 cycles [x][x][ ] - AxCMP128bitProc3c (cmov)
696199  cycles [x][ ][ ] - Cmp128DaveU
734917  cycles [x][ ][ ] - Cmp128NidudU

Siekmanski


Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
700120  cycles [x][x][x] - Cmp128Nidud
730067  cycles [x][x][x] - Cmp128NidudSSE
972859  cycles [x][x][x] - Cmp128Dave
3028178 cycles [x][x][x] - Cmp128Dave2
784587  cycles [x][x][x] - Cmp128JJAlexSSE_1
890498  cycles [x][x][x] - Cmp128JJAlexSSE_2
928216  cycles [x][x][x] - Cmp128JJAlexSSE_3
680786  cycles [x][x][ ] - Cmp128Alex
1108150 cycles [x][x][x] - Cmp128Alex_2
1114646 cycles [x][x][x] - Cmp128Alex_3
1069239 cycles [x][x][ ] - Cmp128JJSSE
871461  cycles [x][x][ ] - AxCMP128bitProc3
889968  cycles [x][x][ ] - AxCMP128bitProc3c (cmov)
592212  cycles [x][ ][ ] - Cmp128DaveU
570113  cycles [x][ ][ ] - Cmp128NidudU

--- ok ---

Creative coders use backward thinking techniques as a strategy.

Antariy

Thank you very much, Dave and Marinus! :biggrin:

sinsi

Here you go Alex

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
---------------------------------------------------
332168  cycles [x][x][x] - Cmp128Nidud
339908  cycles [x][x][x] - Cmp128NidudSSE
671045  cycles [x][x][x] - Cmp128Dave
1346792 cycles [x][x][x] - Cmp128Dave2
299201  cycles [x][x][x] - Cmp128JJAlexSSE_1
386241  cycles [x][x][x] - Cmp128JJAlexSSE_2
398699  cycles [x][x][x] - Cmp128JJAlexSSE_3
508040  cycles [x][x][ ] - Cmp128Alex
610825  cycles [x][x][x] - Cmp128Alex_2
608946  cycles [x][x][x] - Cmp128Alex_3
378717  cycles [x][x][ ] - Cmp128JJSSE
467459  cycles [x][x][ ] - AxCMP128bitProc3
417007  cycles [x][x][ ] - AxCMP128bitProc3c (cmov)
360011  cycles [x][ ][ ] - Cmp128DaveU
383098  cycles [x][ ][ ] - Cmp128NidudU
🍺🍺🍺

nidud

#248
deleted

Antariy

Quote from: sinsi on August 27, 2013, 05:07:12 PM
Here you go Alex

Thank you very much, John! :biggrin:

Quote from: nidud on August 27, 2013, 05:22:34 PM
Quote from: Antariy on August 27, 2013, 12:32:27 PM
The problem is that BSF defines only the ZF flag; the other flags are "undefined" after the instruction. For flags, this means that their state is unpredictable: on my CPU (and maybe (!) on every Intel), they are all zeroed except ZF. In short, BSF cannot be used for this purpose with any robustness (if the flags aren't touched on one CPU, they may still be trashed on another). Check it on your CPU: does BSF trash the other flags? Did it pass Dave's check? If so, then your CPU doesn't change the other flags with BSF; otherwise it should not pass the check.

Unless this is specifically stated in the Intel manual, that can't be the case. It would be the same as saying that "on some CPUs the opcode INC sometimes clears the CF flag", which is not the case.

I may be wrong in this claim, and if so, the attached test will fail on some (yours?) CPUs.


It's stated in Intel's manual. Maybe you're using some portable text (shortened) version of it, but in the full version the other flags are stated as "undefined".

Interestingly enough, it seems that your AMD doesn't trash the other flags.

(results truncated since too long)

cmp 00000000_00000000_00000000_00000000 , 00000000_00000000_00000000_00000001
AX:DX 0000EB94 was: NO NS NZ NC should be: NO SF NZ CY
cmp 00000000_00000000_00000000_00000000 , 00000000_00000000_00000001_FFFFFFFF
AX:DX 0000EB94 was: NO NS NZ NC should be: NO SF NZ CY

AX:DX 0000EB94 was: NO NS NZ NC should be: NO SF NZ NC
cmp C0000001_00000000_00000000_00000000 , 40000001_00000000_00000000_00000000
AX:DX 0000EB94 was: NO NS NZ NC should be: NO SF NZ NC

1365 Failures

Press any key to continue ...


Can anyone here with an AMD CPU or an Intel CPU run the test attached in the post above?
(Maybe we've found the fastest "CPUID" functionality for the IsIntelOrAMD routine :biggrin:)

dedndave

with nidud's Cmp128Eval program, i get 1365 failures

here is a little test program....
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

    All Flags Set: OV NG ZR AC PE CY
            BSF 1: NV PL NZ NA PE NC
            BSR 1: NV PL NZ NA PE NC

All Flags Cleared: NV PL NZ NA PO NC
            BSF 1: NV PL NZ NA PE NC
            BSR 1: NV PL NZ NA PE NC


judging from the parity flag, it looks like it explicitly sets some of the flags, other than the ZF
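A check of this kind can be sketched as below. This is not the attached test program; the proc name and the return convention are assumptions made for illustration.

```asm
; Hypothetical sketch: returns in EAX the six status flags as seen
; right after BSF, starting from OF, SF, ZF, AF, PF and CF all set.
        .686
        .model flat, stdcall
        option casemap:none
        .code
FlagsAfterBsf PROC
        pushfd
        pop     eax
        or      eax, 08D5h      ; OF(800h) SF(80h) ZF(40h) AF(10h) PF(04h) CF(01h)
        push    eax
        popfd                   ; all six status flags are now set
        mov     ecx, 1
        bsf     ecx, ecx        ; source is non-zero, so ZF is defined to come out 0
        pushfd
        pop     eax
        and     eax, 08D5h      ; keep only the six status flags
        ret
FlagsAfterBsf ENDP
        end
```

A CPU that leaves the other flags alone returns 0895h (everything except ZF still set). The Pentium 4 line above, NV PL NZ NA PE NC, corresponds to 0004h.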

Antariy

Yes, I have the same results


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

    All Flags Set: OV NG ZR AC PE CY
            BSF 1: NV PL NZ NA PE NC
            BSR 1: NV PL NZ NA PE NC

All Flags Cleared: NV PL NZ NA PO NC
            BSF 1: NV PL NZ NA PE NC
            BSR 1: NV PL NZ NA PE NC

Press any key to continue ...


Quote from: dedndave on August 27, 2013, 09:00:40 PM
judging from the parity flag, it looks like it explicitly sets some of the flags, other than the ZF
Quote from: Antariy on August 27, 2013, 12:32:27 PM
for my CPU (and maybe (!) for every Intel), they are all (except ZF) zeroed

They are all zeroed, and parity seems to be set properly for the result.

FORTRANS

Hi,

Quote from: Antariy on August 27, 2013, 01:20:42 PM
Quote from: FORTRANS on August 27, 2013, 08:48:04 AM
Anyone interested in it?

Yes, Steve, of course! I'm interested :t

   Okay, here it is.  16-bit, but it would be easy to convert to 32-bit.


; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Compare two large numbers.  Bigger than can fit into a register to be
; compared with the CMP instruction.  This Algorithm is based (loosely)
; on a discussion between dedndave and jj2007 of the MASM Forum.  With
; commentary from others.  See Comparing 128-bit numbers aka OWORDs, in
; The Laboratory.  Note, that the source and destination are subtracted
; differently between CMPS and CMP.  And that does not matter here as I
; only test for equality, where order doesn't matter.  The final result
; is from a CMP.
; SRN, 22/25 August 2013.
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;  INPUT:  (E)SI pointing to an OWORD number.
;          (E)DI pointing to an OWORD number.
; OUTPUT:  Flags set from comparison.
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CMPSVal PROC
        PUSH    SI      ; Dave is using these as counters, so preserve.
        PUSH    DI

        ADD     SI,15   ; Point to last (high) byte of OWORD.
        ADD     DI,15

        MOV     AH,[DI] ; Put OWORD high bytes into AH and DH.
        MOV     DH,[SI]

        MOV     CX,16
        STD             ; Go from high to low order bytes.
   REPE CMPSB           ; Do the comparison.

        CMP     CX,15   ; Fixed it.  Almost.
        JNE     CV_1

   REPE CMPSB
CV_1:
        CLD

        MOV     AL,[DI+1] ; Put lower order byte into AL and DL.
        MOV     DL,[SI+1]
        CMP     AX,DX     ; Return flags.

        POP     DI
        POP     SI

        RET

CMPSVal ENDP


Enjoy,

Steve N.
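A mechanical 32-bit translation of the routine above might look like the sketch below. It is untested, the name CMPSVal32 and its label are made up, and it assumes a flat memory model (DS = ES), which CMPSB needs. It inherits whatever edge cases the 16-bit version has.

```asm
;  INPUT:  ESI pointing to an OWORD number.
;          EDI pointing to an OWORD number.
; OUTPUT:  Flags set from comparison.  EAX, ECX, EDX are clobbered.
CMPSVal32 PROC
        push    esi
        push    edi

        add     esi, 15         ; point to last (high) byte of each OWORD
        add     edi, 15

        mov     ah, [edi]       ; high bytes go into AH and DH
        mov     dh, [esi]

        mov     ecx, 16
        std                     ; go from high to low order bytes
   repe cmpsb                   ; stop at the first differing byte

        cmp     ecx, 15         ; did the high bytes themselves differ?
        jne     CV32_1
   repe cmpsb                   ; yes: look for the next differing byte
CV32_1:
        cld                     ; restore the direction flag before returning

        mov     al, [edi+1]     ; byte where the scan stopped goes into AL and DL
        mov     dl, [esi+1]
        cmp     ax, dx          ; return flags

        pop     edi             ; POP does not change the flags
        pop     esi
        ret
CMPSVal32 ENDP
```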

Siekmanski

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)

    All Flags Set: OV NG ZR AC PE CY
            BSF 1: OV NG NZ AC PE CY
            BSR 1: OV NG NZ AC PE CY

All Flags Cleared: NV PL NZ NA PO NC
            BSF 1: NV PL NZ NA PO NC
            BSR 1: NV PL NZ NA PO NC

Press any key to continue ...

Antariy

No, it will not work as IsIntelOrAMD check :biggrin:
Thank you, Marinus :t