News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

What is the fastest way (performance wise) to compare two 128 bit integers

Started by xanatose, August 09, 2013, 03:57:13 PM

Previous topic - Next topic

xanatose

Assumed that they are aligned to 16 bytes boundaries on a 64 bit machine windows.
What would be faster way (performance wise) to compare two 128 bits integers?
Not only for equality but also for greater and less.



jj2007

PCMPEQB/PCMPEQW/PCMPEQD--Packed Compare for Equal
and
PCMPGTB/PCMPGTW/PCMPGTD--Packed Compare for Greater Than

Depends on what you really want to achieve, though. One 128 bit number, or four dwords? If it's one number, use PCMPGTB, then check with bsf for the first bit set by e.g. PMOVMSKB eax, xmm0

qWord

For signed integers it is maybe more effective to use SUB+SBB for compare, because of the needed carry-logic.
MREAL macros - when you need floating point arithmetic while assembling!

FORTRANS

Hi,

   Using the compare strings instructions is easy.

REPNE CMPSD

Steve

dedndave

the problem with using string instructions is that you want to compare the high dword's first
so, you'd have to use STD and CLD, which adds about 200 cycles

here is a brute-force compare using 2 dword registers   :P
OwordA OWORD ?
OwordB OWORD ?

        mov     eax,dword ptr OwordA[12]
        mov     edx,dword ptr OwordA[8]
        cmp     eax,dword ptr OwordB[12]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[8]
        mov     eax,dword ptr OwordA[4]
        jnz     FlagsSet

        cmp     eax,dword ptr OwordB[4]
        mov     edx,dword ptr OwordA[0]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[0]

FlagsSet:

;flags are set as though you had executed CMP OwordA,OwordB

hool

A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).
PCMPGT(B/W/D) are completely useless since they treat their sub elements as signed numbers
For 4byte x86 regs SSE might have some advantage, however using general purpose regs offers premature ending of code execution with nowadays fast short conditional jumps


jj2007

EDIT: See timings in the Lab

Quote from: hool on August 12, 2013, 05:45:09 AM
A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).

Depends on the nature of the data - unfortunately the OP has not been very specific. With PCMPEQB, you could quickly eliminate the equal cases, and if not equal, decide where to enter the loop.

include \masm32\MasmBasic\MasmBasic.inc        ; download
.data
O0        OWORD 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh
O1        OWORD 0FFFFFFFFCCFFFFFFFFFFCCFFFFFFFFFFh

        Init
        movups xmm0, O0
        movups xmm1, O1
        movaps xmm2, xmm0        ; copy for pcmpeqb
        PCMPEQB xmm2, xmm1
        Print CrLf$, "16 bits", Tb$, Tb$, Tb$, "5432109876543210", CrLf$
        PMOVMSKB eax, xmm2        ; show in ax where xmm0 differs to xmm1
        xor ecx, ecx
        not ax        ; we are interested in bits set
        deb 4, "MSKB", b:ax
        bsr ecx, eax
        deb 4, "msb set", ecx
        bsf ecx, eax
        deb 4, "lsb set", ecx
        Inkey
        Exit
end start


Output:
16 bits                 5432109876543210
MSKB    b:ax            0000100000100000
msb set ecx             11
lsb set ecx             5