MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords

Started by frktons, November 25, 2012, 02:48:06 AM


habran

nidud's code:

Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
---------------------------------------------------------
2988    cycles for XMM/pcmpeqd
3004    cycles for XMM/psubd
---------------------------------------------------------
2987    cycles for XMM/pcmpeqd
3012    cycles for XMM/psubd
---------------------------------------------------------
2978    cycles for XMM/pcmpeqd
3001    cycles for XMM/psubd
---------------------------------------------------------

--- ok ---
Cod-Father

frktons


----------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------
9242    cycles for MOV AX - Test OK
8731    cycles for LEA - Test OK
4144    cycles for MMX/PUNPCKLBW - Test OK
3158    cycles for XMM/PSHUFB - I shot - Test OK
2368    cycles for XMM/PSHUFB - II shot - Test OK
12328   cycles for STOSB - Test OK
2070    cycles for CheckDest - Test OK
547     cycles for CheckDestC - Test OK
544     cycles for CheckDestX - Test OK
----------------------------------------------------
9241    cycles for MOV AX - Test OK
8728    cycles for LEA - Test OK
4130    cycles for MMX/PUNPCKLBW - Test OK
3153    cycles for XMM/PSHUFB - I shot - Test OK
2379    cycles for XMM/PSHUFB - II shot - Test OK
12335   cycles for STOSB - Test OK
2069    cycles for CheckDest - Test OK
548     cycles for CheckDestC - Test OK
543     cycles for CheckDestX - Test OK
----------------------------------------------------


CheckDestC is a modified version of nidud's code. For the CPU and SSE level
detection I used Alex's routine.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

nidud

deleted

habran

nidud's latest code produces this on my laptop:


---------------------------------------------------------
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
---------------------------------------------------------
6675    cycles for STOSB - Test OK
4240    cycles for LEA - Test OK
3353    cycles for MOV DX - Test OK
3276    cycles for MOV AX - Test OK
1924    cycles for MMX/PUNPCKLBW - Test OK
1213    cycles for XMM/PSHUFB - xmm0,xmm1 - Test OK
832     cycles for XMM/PSHUFB - I shot - Test OK
1539    cycles for XMM/PSHUFB - II shot - Test OK
---------------------------------------------------------
6093    cycles for STOSB - Test OK
3806    cycles for LEA - Test OK
3403    cycles for MOV DX - Test OK
3277    cycles for MOV AX - Test OK
1945    cycles for MMX/PUNPCKLBW - Test OK
808     cycles for XMM/PSHUFB - xmm0,xmm1 - Test OK
904     cycles for XMM/PSHUFB - I shot - Test OK
1490    cycles for XMM/PSHUFB - II shot - Test OK
---------------------------------------------------------
6289    cycles for STOSB - Test OK
3805    cycles for LEA - Test OK
3668    cycles for MOV DX - Test OK
3684    cycles for MOV AX - Test OK
3044    cycles for MMX/PUNPCKLBW - Test OK
888     cycles for XMM/PSHUFB - xmm0,xmm1 - Test OK
832     cycles for XMM/PSHUFB - I shot - Test OK
901     cycles for XMM/PSHUFB - II shot - Test OK
---------------------------------------------------------
6289    cycles for STOSB - Test OK
3805    cycles for LEA - Test OK
3240    cycles for MOV DX - Test OK
3255    cycles for MOV AX - Test OK
2527    cycles for MMX/PUNPCKLBW - Test OK
833     cycles for XMM/PSHUFB - xmm0,xmm1 - Test OK
832     cycles for XMM/PSHUFB - I shot - Test OK
858     cycles for XMM/PSHUFB - II shot - Test OK
---------------------------------------------------------

--- ok ---

frktons

Quote from: nidud on November 28, 2012, 05:53:33 PM
I rewrote the test file with a common loop count for all tests to even out the results. I was wondering if using the xmm0 register might be faster than xmm1, but the test seems to give random results, at least on this machine.

With regard to using pcmpeqd or psubd, I think the latter would be the better choice since it returns 0.

Edit: renamed test_pshufb to test_pshufb0

Since you changed the structure of some routines, the results are a little
bit different, I mean quite a lot different.
I still don't understand the logic of comparing two XMM registers with PSUBD.
If they are equal the result is zero, and after PMOVMSKB it is possible to
test the final result register for zero.
But what happens if the source register is 1 greater than the destination one?
Does PMOVMSKB detect the difference or not? According to what I've
understood up to now, it shouldn't.  ::)
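One quick way to see what PMOVMSKB does with a PSUBD result is to model it in plain C: the instruction gathers bit 7 of each of the 16 bytes into a 16-bit mask. The sketch below is a hypothetical scalar model (the names `pmovmskb_model`, `broadcast_dword` and `mask_of_broadcast` are illustrative, and the byte order assumes little-endian lanes as on x86), not the instruction itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMOVMSKB r32, xmm:
   bit i of the result is bit 7 (the sign bit) of byte i of the source. */
static unsigned pmovmskb_model(const uint8_t bytes[16])
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (unsigned)(bytes[i] >> 7) << i;
    return mask;
}

/* Lay out four copies of a dword as 16 little-endian bytes,
   the way PSHUFD xmm, xmm, 0 fills a register from its low dword. */
static void broadcast_dword(uint32_t v, uint8_t out[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int b = 0; b < 4; b++)
            out[lane * 4 + b] = (uint8_t)(v >> (8 * b));
}

/* Modelled PMOVMSKB of a register holding v in every dword. */
static unsigned mask_of_broadcast(uint32_t v)
{
    uint8_t bytes[16];
    broadcast_dword(v, bytes);
    return pmovmskb_model(bytes);
}
```

With a difference of 1 in every dword (the xmm1 = xmm0 + 1 case), `mask_of_broadcast(1)` is 0, the same value equal registers give. A difference only shows up if some byte of it has bit 7 set: 80000000h gives 8888h, and if the subtraction ran the other way (FFFFFFFFh per dword) the mask would be FFFFh.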

jj2007

Quote from: frktons on November 28, 2012, 10:51:37 PM
The PMOVMSKB does or doesn't detect the difference?

It does. Launch some tests with Olly to see what happens. Anyway, PCM*** does the same job as PSUBD, and they are equally fast (e.g. one cycle on my AMD).

frktons

I compared two XMM registers, one of them greater
than the other.
According to this test, PSUBD doesn't detect the difference ::)

------------------------------------
Test on PCMPEQD - Test ERR
------------------------------------
Test on PSUBD   - Test OK
------------------------------------

Press any key to continue ...


This is the code I used. Did I make an error?


; ---------------------------------------------------------------------
; TEST_PSUBD.ASM--
; http://www.masm32.com/board/index.php?topic=770.0
;-------------------------------------------------------------------------------
; Test the difference between PCMPEQD and PSUBD when comparing two XMM
; registers.
; 28/Nov/2012 - MASM FORUM - frktons
;-------------------------------------------------------------------------------



.nolist
include \masm32\include\masm32rt.inc
.686
.xmm


.data

align 8
Check db  8  dup(20h),0,0,0,0
PtrCheck dd  Check

align 8
TestOK db  "Test OK ",0,0,0,0
align 8
TestERR db  "Test ERR",0,0,0,0


.code

start:


      print "---------------------------------------------------------", 13, 10
      print "Test on PCMPEQD - "
      call  PCMP_TEST
      print PtrCheck, 13, 10
      print "---------------------------------------------------------", 13, 10

      print "Test on PSUBD   - "
      call  PSUB_TEST
      print PtrCheck, 13, 10
      print "---------------------------------------------------------", 13, 10, 13, 10
      inkey

      exit
     
; -----------------------------------------------------------------------------------------------
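PSUB_TEST proc uses ebx         ; EBX is callee-saved, so let MASM preserve it


    mov ebx, 32323232h          ; pattern for every dword of xmm0
    mov edx, 00000001h          ; increment for every dword of xmm2

    movd xmm2, edx
    pshufd xmm2, xmm2, 0        ; xmm2 = 00000001h in all four dwords

    movd xmm0, ebx
    pshufd xmm0, xmm0, 0        ; xmm0 = 32323232h in all four dwords

    movdqa xmm1, xmm0

    paddd  xmm1, xmm2           ; xmm1 = xmm0 + 1 per dword, so xmm1 differs from xmm0

    psubd xmm1,xmm0             ; xmm1 = xmm1 - xmm0 per dword

    pmovmskb edx, xmm1          ; bit i of edx = bit 7 of byte i of xmm1
    cmp   dx, 0                 ; a zero mask is read as "equal"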

    jne CheckErr

CheckOK:

    lea eax, Check
    movq mm0, qword ptr TestOK
    movq qword ptr [eax], mm0
    jmp  EndCheck

CheckErr:

    lea eax, Check
    movq mm0, qword ptr TestERR
    movq qword ptr [eax], mm0

EndCheck:

    emms                        ; clear MMX state (mm0 was used above) before returning
    ret

PSUB_TEST endp

; -----------------------------------------------------------------------------------------------
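PCMP_TEST proc uses ebx         ; EBX is callee-saved, so let MASM preserve it


    mov ebx, 32323232h          ; pattern for every dword of xmm0

    mov edx, 00000001h          ; increment for every dword of xmm2

    movd xmm2, edx
    pshufd xmm2, xmm2, 0        ; xmm2 = 00000001h in all four dwords

    movd xmm0, ebx
    pshufd xmm0, xmm0, 0        ; xmm0 = 32323232h in all four dwords

    movdqa xmm1, xmm0

    paddd  xmm1, xmm2           ; xmm1 = xmm0 + 1 per dword, so xmm1 differs from xmm0

    pcmpeqd xmm1,xmm0           ; FFFFFFFFh in each equal dword, 0 otherwise

    pmovmskb edx, xmm1          ; bit i of edx = bit 7 of byte i of xmm1
    cmp   dx, 0FFFFh            ; all 16 bits set is read as "equal"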

    jne CheckErr

CheckOK:

    lea eax, Check
    movq mm0, qword ptr TestOK
    movq qword ptr [eax], mm0
    jmp  EndCheck

CheckErr:

    lea eax, Check
    movq mm0, qword ptr TestERR
    movq qword ptr [eax], mm0

EndCheck:

    emms                        ; clear MMX state (mm0 was used above) before returning
    ret

PCMP_TEST endp


end start
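For readers without MASM at hand, here is a rough C equivalent of PSUB_TEST and PCMP_TEST using SSE2 intrinsics. It assumes an x86/x86-64 compiler that provides `<emmintrin.h>`; the function names mirror the procs, and returning 1 for "Test OK" and 0 for "Test ERR" (instead of copying the string with MMX) is a simplification.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Same setup as the MASM procs: 32323232h in every dword of xmm0,
   and xmm1 = xmm0 + 1 in every dword, so the two registers differ. */
static __m128i make_xmm0(void) { return _mm_set1_epi32(0x32323232); }
static __m128i make_xmm1(void) { return _mm_add_epi32(make_xmm0(), _mm_set1_epi32(1)); }  /* paddd */

/* PSUB_TEST: psubd, pmovmskb, then "Test OK" (1) if the mask is 0. */
static int psub_test(void)
{
    __m128i diff = _mm_sub_epi32(make_xmm1(), make_xmm0());  /* psubd xmm1, xmm0 */
    return _mm_movemask_epi8(diff) == 0;                      /* pmovmskb + cmp dx, 0 */
}

/* PCMP_TEST: pcmpeqd, pmovmskb, then "Test OK" (1) if the mask is FFFFh. */
static int pcmp_test(void)
{
    __m128i eq = _mm_cmpeq_epi32(make_xmm1(), make_xmm0());  /* pcmpeqd xmm1, xmm0 */
    return _mm_movemask_epi8(eq) == 0xFFFF;                   /* pmovmskb + cmp dx, 0FFFFh */
}
```

Printing `pcmp_test() ? "Test OK" : "Test ERR"` and the same for `psub_test()` should reproduce the output posted above: the PCMPEQD version reports "Test ERR" because the registers really differ, while the PSUBD version reports "Test OK", i.e. it treats the registers as equal.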


jj2007

The logic is inverted:
pcmpeqb for xmm1=xmm0: xmm1 becomes ffffffffffffffffh
psubd  for xmm1=xmm0: xmm1 becomes 0h
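The inversion for equal operands can be checked with a few SSE2 intrinsics: PCMPEQD leaves FFFFFFFFh in every dword (so PMOVMSKB gives FFFFh), while PSUBD leaves 0 in every dword (so PMOVMSKB gives 0). A sketch assuming an x86 compiler with `<emmintrin.h>`; the helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Lowest dword of an XMM value (movd r32, xmm). */
static uint32_t lane0(__m128i v) { return (uint32_t)_mm_cvtsi128_si32(v); }

/* Both operands equal, as in the xmm1 = xmm0 case. */
static __m128i equal_operand(void) { return _mm_set1_epi32(0x32323232); }

/* pcmpeqd on equal operands: every dword becomes FFFFFFFFh. */
static __m128i pcmpeqd_equal(void) { return _mm_cmpeq_epi32(equal_operand(), equal_operand()); }

/* psubd on equal operands: every dword becomes 0. */
static __m128i psubd_equal(void) { return _mm_sub_epi32(equal_operand(), equal_operand()); }
```

Both are valid equality tests for this case; they just need opposite checks after PMOVMSKB (FFFFh versus 0).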

frktons

Quote from: jj2007 on November 29, 2012, 01:08:17 AM
The logic is inverted:
pcmpeqb for xmm1=xmm0: xmm1 becomes ffffffffffffffffh
psubd  for xmm1=xmm0: xmm1 becomes 0h


So what is my error? I was aware that the logic is inverted
and I tested:

    cmp    dx, 0

    jne CheckErr

for PSUBD, and


    cmp   dx, 0FFFFh

    jne CheckErr


for PCMPEQD. ::)

jj2007

It seems pcmpeqb always returns zero, unless the xmm bytes are FFh...
---------------------------------------------------------
Test on PCMPEQD -
pcmpeqd in
xmm1            3617008641903833650
xmm0            3617008641903833650
pcmpeqd out     xmm1            -1

pmovmskb
xmm1            -1
edx             65535
Test OK
---------------------------------------------------------
Test on PSUBD   -
PSubD in
xmm1            3617008641903833650
xmm0            3617008641903833650
PSubD out       xmm1            0

pmovmskb
xmm1            0
dx              0
Test OK
---------------------------------------------------------

---------------------------------------------------------
Test on PCMPEQD -
pcmpeqd in
xmm1            3617008646198800947
xmm0            3617008641903833650
pcmpeqd out     xmm1            0

pmovmskb
xmm1            0
edx             0
Test ERR
---------------------------------------------------------
Test on PSUBD   -
PSubD in
xmm1            3617008646198800947
xmm0            3617008641903833650
PSubD out       xmm1            4294967297

pmovmskb
xmm1            4294967297
dx              0
Test OK
---------------------------------------------------------
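The dump prints the low 64 bits of each register in decimal, which hides the byte pattern. Converting them to hex shows what is going on: xmm0 is 32323232h in each dword, xmm1 is 32323233h in each dword, and the PSUBD result is 00000001h in each dword, so no byte has bit 7 set and PMOVMSKB returns 0. The constants below are copied from the dump; `show` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Decimal values printed in the dump above (low 64 bits of each register). */
static const uint64_t XMM0_LOW  = 3617008641903833650ULL;  /* xmm0 */
static const uint64_t XMM1_LOW  = 3617008646198800947ULL;  /* xmm1 = xmm0 + 1 per dword */
static const uint64_t PSUBD_LOW = 4294967297ULL;           /* "PSubD out" */

/* Print a 64-bit value as two dwords in hex, high dword first. */
static void show(const char *name, uint64_t v)
{
    printf("%-6s %08X %08X\n", name,
           (unsigned)(v >> 32), (unsigned)(v & 0xFFFFFFFFu));
}
```

For example, `show("PSubD", PSUBD_LOW)` prints the two dwords `00000001 00000001`.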

nidud

deleted

nidud

deleted

frktons

When I read about PMOVMSKB in the Intel manuals,
I found something that doesn't fit with using it to
compare two XMM registers for equality:

Creates a mask made up of the most significant bit of each byte of the source
operand (second operand) and stores the result in the low byte or word of the destination
operand (first operand).

If only the MSBs are stored into the destination operand and the difference is in other
bits, it will not be detected.
So my idea is that after PSUBD we have to use a different opcode to
detect whether there are differences other than in the MSBs of the bytes we are testing.

On the other hand, using PCMPEQD we can test both equality and difference
between the XMM registers with PMOVMSKB.
This is what I've understood so far.
Using PSUBD is a smart solution, but in my opinion it needs to be followed by something
other than PMOVMSKB.

So far, nidud's solution is the one I understand. I'm waiting for other solutions.
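One SSE2-only way to keep PSUBD and still examine every bit is to compare its result against zero with PCMPEQD before PMOVMSKB. This costs one instruction more than PCMPEQD alone, because a - b is zero in a dword lane exactly when the two lanes are equal. A sketch with intrinsics, assuming an x86 compiler with `<emmintrin.h>`; the function names are hypothetical. On SSE4.1 hardware, PTEST on the PSUBD result (`_mm_testz_si128(diff, diff)`, which returns 1 when all 128 bits are zero) would be another option.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns 1 if a and b are bitwise equal: PSUBD, then PCMPEQD against zero,
   so every bit of each difference is examined, not only the byte MSBs. */
static int equal_via_psubd(__m128i a, __m128i b)
{
    __m128i diff = _mm_sub_epi32(a, b);                  /* psubd: 0 in a lane iff the lanes match */
    __m128i zero = _mm_setzero_si128();                  /* typically compiled to pxor xmm, xmm */
    __m128i lane_is_zero = _mm_cmpeq_epi32(diff, zero);  /* pcmpeqd: FFFFFFFFh where diff == 0 */
    return _mm_movemask_epi8(lane_is_zero) == 0xFFFF;    /* pmovmskb: all 16 bytes must be FFh */
}

/* Direct form: PCMPEQD on the operands themselves. */
static int equal_via_pcmpeqd(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi32(a, b)) == 0xFFFF;
}
```

Both functions give the same answer for any inputs, including the xmm1 = xmm0 + 1 case that the plain PSUBD + PMOVMSKB test misses.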

frktons

Quote from: nidud on November 29, 2012, 04:37:01 AM
Maybe you could use CMPNEQPS
The result should then be zero if equal


Yes, probably this opcode will work as well.
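CMPNEQPS does give zero for equal registers in the usual case, but it compares each dword as a single-precision float rather than as a bit pattern, so it is not a strict bitwise equality test: +0.0 (00000000h) and -0.0 (80000000h) compare equal, and a NaN compares not-equal even to an identical NaN. A small sketch, assuming an x86 compiler with SSE2 headers; `cmpneqps_mask` is a hypothetical helper that uses MOVMSKPS to collect one bit per lane.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#include <stdint.h>

/* Broadcast two 32-bit patterns, compare them with CMPNEQPS and
   return the 4-bit MOVMSKPS mask (bit set = lanes compared not equal). */
static int cmpneqps_mask(uint32_t a_bits, uint32_t b_bits)
{
    __m128 a = _mm_castsi128_ps(_mm_set1_epi32((int)a_bits));
    __m128 b = _mm_castsi128_ps(_mm_set1_epi32((int)b_bits));
    return _mm_movemask_ps(_mm_cmpneq_ps(a, b));  /* cmpneqps + movmskps */
}
```

For the 32323232h data in this thread the float view happens to agree with the bit view, but for arbitrary bytes the integer compares (PCMPEQB/PCMPEQD) are the safer choice.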

Quote
What does pxor xmm0, xmm0 do ?
The same thing that xor rax, rax ?

Yes again. So far I think the PCMPEQD variant is the complete one
for testing equality. In my opinion, something is missing in the PSUBD variant.

qWord

MREAL macros - when you need floating point arithmetic while assembling!