What is the fastest way (performance wise) to compare two 128 bit integers

xanatose · August 09, 2013, 03:57:13 PM

Assumed that they are aligned to 16 bytes boundaries on a 64 bit machine windows.
What would be faster way (performance wise) to compare two 128 bits integers?
Not only for equality but also for greater and less.

jj2007 · August 09, 2013, 04:06:34 PM

PCMPEQB/PCMPEQW/PCMPEQD--Packed Compare for Equal
and
PCMPGTB/PCMPGTW/PCMPGTD--Packed Compare for Greater Than

Depends on what you really want to achieve, though. One 128 bit number, or four dwords? If it's one number, use PCMPGTB, then check with bsf for the first bit set by e.g. PMOVMSKB eax, xmm0

qWord · August 09, 2013, 09:29:36 PM

For signed integers it is maybe more effective to use SUB+SBB for compare, because of the needed carry-logic.

FORTRANS · August 09, 2013, 09:49:03 PM

Hi,

Using the compare strings instructions is easy.

REPNE CMPSD

Steve

dedndave · August 09, 2013, 11:45:29 PM

the problem with using string instructions is that you want to compare the high dword's first
so, you'd have to use STD and CLD, which adds about 200 cycles

here is a brute-force compare using 2 dword registers :P

Code Select

OwordA OWORD ?
OwordB OWORD ?

        mov     eax,dword ptr OwordA[12]
        mov     edx,dword ptr OwordA[8]
        cmp     eax,dword ptr OwordB[12]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[8]
        mov     eax,dword ptr OwordA[4]
        jnz     FlagsSet

        cmp     eax,dword ptr OwordB[4]
        mov     edx,dword ptr OwordA[0]
        jnz     FlagsSet

        cmp     edx,dword ptr OwordB[0]

FlagsSet:

;flags are set as though you had executed CMP OwordA,OwordB

hool · August 12, 2013, 05:45:09 AM

A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).
PCMPGT(B/W/D) are completely useless since they treat their sub elements as signed numbers
For 4byte x86 regs SSE might have some advantage, however using general purpose regs offers premature ending of code execution with nowadays fast short conditional jumps

jj2007 · August 12, 2013, 11:43:53 AM

EDIT: See timings in the Lab

Quote from: hool on August 12, 2013, 05:45:09 AM
A stock 6 instruction method with 8byte general purpose regs and 2 jumps will be the fastest (first you compare high digits and then low).

Depends on the nature of the data - unfortunately the OP has not been very specific. With PCMPEQB, you could quickly eliminate the equal cases, and if not equal, decide where to enter the loop.

include \masm32\MasmBasic\MasmBasic.inc ; download
.data
O0 OWORD 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh
O1 OWORD 0FFFFFFFFCCFFFFFFFFFFCCFFFFFFFFFFh

Init
movups xmm0, O0
movups xmm1, O1
movaps xmm2, xmm0 ; copy for pcmpeqb
PCMPEQB xmm2, xmm1
Print CrLf$, "16 bits", Tb$, Tb$, Tb$, "5432109876543210", CrLf$
PMOVMSKB eax, xmm2 ; show in ax where xmm0 differs to xmm1
xor ecx, ecx
not ax ; we are interested in bits set
deb 4, "MSKB", b:ax
bsr ecx, eax
deb 4, "msb set", ecx
bsf ecx, eax
deb 4, "lsb set", ecx
Inkey
Exit
end start

Output:
16 bits 5432109876543210
MSKB b:ax 0000100000100000
msb set ecx 11
lsb set ecx 5

The MASM Forum

News:

What is the fastest way (performance wise) to compare two 128 bit integers

xanatose

jj2007

qWord

FORTRANS

dedndave

hool

jj2007