I don't see how qWord's code is any faster than this:
OwordA OWORD ?
OwordB OWORD ?
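mov eax,dword ptr OwordA[12]    ;highest dword of A
mov edx,dword ptr OwordA[8]     ;pre-load the next dword
cmp eax,dword ptr OwordB[12]
jnz FlagsSet                    ;high dwords differ, flags are final
cmp edx,dword ptr OwordB[8]
mov eax,dword ptr OwordA[4]     ;MOV does not alter the flags
jnz FlagsSet
cmp eax,dword ptr OwordB[4]
mov edx,dword ptr OwordA[0]     ;MOV does not alter the flags
jnz FlagsSet
cmp edx,dword ptr OwordB[0]     ;lowest dword decides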
FlagsSet:
;ZF and CF are set as though you had executed CMP OwordA,OwordB (unsigned: use JA/JB/JE)
In particular, register-indirect addressing is a little slower than direct addressing,
and my code only pre-loads one dword ahead of each compare, not all 4.
In addition to all that, I do not rely on the carry flag being forwarded for SBB, so the compares do not depend on one another through the carry flag.
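
For anyone who wants to sanity-check the CMP/JNZ chain against a reference, here is a minimal C sketch of the same high-to-low comparison. It is only a model of the logic, not a replacement for the assembly: the type oword_t and the function oword_cmp_unsigned are hypothetical names made up for this example, and it assumes an unsigned comparison with d[0] as the lowest dword (byte offset 0).

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as four dwords.
   d[0] is the lowest dword (offset 0), d[3] the highest (offset 12). */
typedef struct { uint32_t d[4]; } oword_t;

/* Unsigned compare, scanning from the highest dword down and stopping
   at the first pair that differs, the same order as the CMP/JNZ chain.
   Returns -1 if a < b, 0 if a == b, 1 if a > b. */
static int oword_cmp_unsigned(const oword_t *a, const oword_t *b)
{
    for (int i = 3; i >= 0; --i) {
        if (a->d[i] != b->d[i])
            return (a->d[i] < b->d[i]) ? -1 : 1;
    }
    return 0; /* all four dwords matched */
}
```

A result of -1 corresponds to CF=1 (JB taken) at FlagsSet, 0 corresponds to ZF=1 (JE taken), and 1 corresponds to CF=0 with ZF=0 (JA taken).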