The MASM Forum

General => The Campus => Topic started by: xanatose on August 09, 2013, 03:57:13 PM

Title: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: xanatose on August 09, 2013, 03:57:13 PM
Assumed that they are aligned to 16 bytes boundaries on a 64 bit machine windows.
What would be faster way (performance wise) to compare two 128 bits integers?
Not only for equality but also for greater and less.


Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: jj2007 on August 09, 2013, 04:06:34 PM
PCMPEQB/PCMPEQW/PCMPEQD--Packed Compare for Equal
and
PCMPGTB/PCMPGTW/PCMPGTD--Packed Compare for Greater Than

Depends on what you really want to achieve, though. One 128 bit number, or four dwords? If it's one number, use PCMPGTB, then check with bsf for the first bit set by e.g. PMOVMSKB eax, xmm0
Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: qWord on August 09, 2013, 09:29:36 PM
For signed integers it is maybe more effective to use SUB+SBB for compare, because of the needed carry-logic.
Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: FORTRANS on August 09, 2013, 09:49:03 PM
Hi,

   Using the compare strings instructions is easy.

REPNE CMPSD

Steve
Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: dedndave on August 09, 2013, 11:45:29 PM
the problem with using string instructions is that you want to compare the high dword's first
so, you'd have to use STD and CLD, which adds about 200 cycles

here is a brute-force compare using 2 dword registers   :P
OwordA OWORD ?
OwordB OWORD ?

        mov     eax,dword ptr OwordA[12]
        mov     edx,dword ptr OwordA[8]
        cmp     eax,dword ptr OwordB[12]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[8]
        mov     eax,dword ptr OwordA[4]
        jnz     FlagsSet

        cmp     eax,dword ptr OwordB[4]
        mov     edx,dword ptr OwordA[0]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[0]

FlagsSet:

;flags are set as though you had executed CMP OwordA,OwordB
Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: hool on August 12, 2013, 05:45:09 AM
A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).
PCMPGT(B/W/D) are completely useless since they treat their sub elements as signed numbers
For 4byte x86 regs SSE might have some advantage, however using general purpose regs offers premature ending of code execution with nowadays fast short conditional jumps

Title: Re: What is the fastest way (performance wise) to compare two 128 bit integers
Post by: jj2007 on August 12, 2013, 11:43:53 AM
EDIT: See timings in the Lab (http://masm32.com/board/index.php?topic=2222.0)

Quote from: hool on August 12, 2013, 05:45:09 AM
A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).

Depends on the nature of the data - unfortunately the OP has not been very specific. With PCMPEQB, you could quickly eliminate the equal cases, and if not equal, decide where to enter the loop.

include \masm32\MasmBasic\MasmBasic.inc        ; download (http://masm32.com/board/index.php?topic=94.0)
.data
O0        OWORD 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh
O1        OWORD 0FFFFFFFFCCFFFFFFFFFFCCFFFFFFFFFFh

        Init
        movups xmm0, O0
        movups xmm1, O1
        movaps xmm2, xmm0        ; copy for pcmpeqb
        PCMPEQB xmm2, xmm1
        Print CrLf$, "16 bits", Tb$, Tb$, Tb$, "5432109876543210", CrLf$
        PMOVMSKB eax, xmm2        ; show in ax where xmm0 differs to xmm1
        xor ecx, ecx
        not ax        ; we are interested in bits set
        deb 4, "MSKB", b:ax
        bsr ecx, eax
        deb 4, "msb set", ecx
        bsf ecx, eax
        deb 4, "lsb set", ecx
        Inkey
        Exit
end start


Output:
16 bits                 5432109876543210
MSKB    b:ax            0000100000100000
msb set ecx             11
lsb set ecx             5