Title: Whats wrong with these SSE instructions
Post by: RuiLoureiro on July 27, 2018, 07:51:51 AM

Hi all

We have 4 cases:
1) X4x4 (INTEGER) * Y4x4 (INTEGER) -> all 0
2) INTEGER * REAL4 -> all integers
3) REAL4 * INTEGER -> all integers
4) REAL4 * REAL4 -> REAL4, correct

Why do we get 0 in case 1) and integers in cases 2) and 3)?
Use multiplyX4x4_Y4x4_v1 (.asm/.exe)

Title: Re: Whats wrong with these SSE instructions
Post by: Siekmanski on July 27, 2018, 08:22:16 AM

Look for SIMD integer addition and integer multiplication instructions.
paddd, pmuludq, etc. (I can't check it; I'm not at home.)

Title: Re: Whats wrong with these SSE instructions
Post by: RuiLoureiro on July 27, 2018, 08:36:17 AM

Hi Siekmanski,
OK. It seems that, for example, mulps xmm4,xmm0 gives an integer if one of xmm4 or xmm0 holds integers and the other holds real4, but gives zero if both hold integers. Why is this? Is it a rule?

Title: Re: Whats wrong with these SSE instructions
Post by: Siekmanski on July 27, 2018, 05:09:14 PM

Hi Rui,

You cannot multiply int32 values by real4 values; mulps treats every lane as real4.
But you can convert them:

cvtdq2ps xmm0,xmm0 ; convert 4 packed int32 to 4 packed real4
cvtps2dq xmm0,xmm0 ; convert 4 packed real4 to 4 packed int32
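
A likely reason for the results in cases 1) to 3), assuming the default MXCSR settings (flush-to-zero and denormals-are-zero both off): mulps never converts anything, it just reads the 32 bits in each lane as real4. A small int32 such as 3 then looks like the denormal 3 * 2^-149. Denormal times denormal underflows to 0.0, while denormal times a whole-number real4 such as 2.0 gives another denormal whose bit pattern is 6, which prints as an integer. The sketch below checks this in portable C rather than MASM; all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the 32 bits of an int32 as a real4,
   which is what mulps does to each lane (no conversion). */
float bits_as_real4(int32_t v)
{
    float f;
    assert(sizeof f == sizeof v);
    memcpy(&f, &v, sizeof f);
    return f;
}

/* Hypothetical helper: read the 32 bits of a real4 as an int32,
   as happens when the result is printed as an integer. */
int32_t real4_as_bits(float f)
{
    int32_t v;
    assert(sizeof v == sizeof f);
    memcpy(&v, &f, sizeof v);
    return v;
}

/* Case 1: both operands are int32 bit patterns. */
int32_t mul_int_bits_by_int_bits(int32_t a, int32_t b)
{
    float p = bits_as_real4(a) * bits_as_real4(b);
    return real4_as_bits(p);
}

/* Cases 2 and 3: one operand is int32 bits, the other a true real4. */
int32_t mul_int_bits_by_real4(int32_t a, float b)
{
    float p = bits_as_real4(a) * b;
    return real4_as_bits(p);
}

/* The cvtdq2ps route: convert the int32 first, then multiply. */
float mul_converted(int32_t a, float b)
{
    return (float)a * b;
}
```

With these, mul_int_bits_by_int_bits(3, 4) returns 0, mul_int_bits_by_real4(3, 2.0f) returns the bit pattern 6, and mul_converted(3, 2.0f) returns 6.0. Only the last one is a real multiplication of 3 by 2.0.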

Two examples that multiply two 4x4 matrices:
one routine for real4 values and one routine for int32 values.

.data
align 16
M1real4         real4  1.0,  2.0,  3.0,  4.0
                real4  5.0,  6.0,  7.0,  8.0
                real4  9.0, 10.0, 11.0, 12.0
                real4 13.0, 14.0, 15.0, 16.0

M2real4         real4  1.0,  2.0,  3.0,  4.0
                real4  5.0,  6.0,  7.0,  8.0
                real4  9.0, 10.0, 11.0, 12.0
                real4 13.0, 14.0, 15.0, 16.0

MresultReal4    real4 16 dup (0.0)    

M1              dd  1,  2,  3,  4
                dd  5,  6,  7,  8
                dd  9, 10, 11, 12
                dd 13, 14, 15, 16

M2              dd  1,  2,  3,  4
                dd  5,  6,  7,  8
                dd  9, 10, 11, 12
                dd 13, 14, 15, 16

Mresult         dd 16 dup (0)    
    
; result after M1 * M2
; 90   100  110  120
; 202  228  254  280
; 314  356  398  440
; 426  484  542  600

      
.code

; 4x4 Matrix Multiply real4
; result = M1real4 * M2real4

    mov     eax,offset M2real4

    movaps  xmm4,[eax]
    movaps  xmm5,[eax+16]
    movaps  xmm6,[eax+32]
    movaps  xmm7,[eax+48]

    mov     eax,offset M1real4
    mov     edi,offset MresultReal4

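    i = 0                           ; byte offset of the current row
    REPT 4                          ; unrolled once per row of M1real4
    movaps  xmm0,[eax+i]            ; load row: a0 a1 a2 a3
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    movaps  xmm3,xmm0
    shufps  xmm0,xmm0,00000000b     ; broadcast a0 to all 4 lanes
    shufps  xmm1,xmm1,01010101b     ; broadcast a1
    shufps  xmm2,xmm2,10101010b     ; broadcast a2
    shufps  xmm3,xmm3,11111111b     ; broadcast a3
    mulps   xmm0,xmm4               ; a0 * row 0 of M2real4
    mulps   xmm1,xmm5               ; a1 * row 1
    mulps   xmm2,xmm6               ; a2 * row 2
    mulps   xmm3,xmm7               ; a3 * row 3
    addps   xmm2,xmm0
    addps   xmm3,xmm1
    addps   xmm3,xmm2               ; sum of the 4 products = result row
    movaps  [edi+i],xmm3            ; store result row
    i = i + 16                      ; next row (4 elements * 4 bytes)
    ENDM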


; 4x4 Matrix Multiply int32
; result = M1 * M2

    mov     eax,offset M2

    movaps  xmm4,[eax]
    movaps  xmm5,[eax+16]
    movaps  xmm6,[eax+32]
    movaps  xmm7,[eax+48]

    mov     eax,offset M1
    mov     edi,offset Mresult

    i = 0
    REPT 4
    movaps  xmm0,[eax+i]
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    movaps  xmm3,xmm0
    shufps  xmm0,xmm0,00000000b
    shufps  xmm1,xmm1,01010101b
    shufps  xmm2,xmm2,10101010b
    shufps  xmm3,xmm3,11111111b
    pmulld  xmm0,xmm4           ; pmulld requires SSE4.1; keeps the low 32 bits of each product
    pmulld  xmm1,xmm5
    pmulld  xmm2,xmm6
    pmulld  xmm3,xmm7
    paddd   xmm2,xmm0
    paddd   xmm3,xmm1
    paddd   xmm3,xmm2
    movaps  [edi+i],xmm3
    i = i + 16
    ENDM
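
To check either routine against the expected result listed in the data section, a plain scalar version is handy. The sketch below is a hypothetical reference in C, not part of the routines above; it follows the same order as the REPT loop (one row of the first matrix at a time, each element multiplied by a whole row of the second matrix) and assumes the products and sums fit in 32 bits. The name mat4_mul_i32 is an assumption.

```c
#include <stdint.h>

/* Hypothetical reference routine: result = a * b for 4x4 int32 matrices
   stored row by row as 16 consecutive values, like the dd tables. */
void mat4_mul_i32(const int32_t *a, const int32_t *b, int32_t *result)
{
    for (int i = 0; i < 4; i++) {         /* one REPT iteration per row of a */
        for (int j = 0; j < 4; j++) {     /* the 4 lanes of an xmm register */
            int32_t sum = 0;
            for (int k = 0; k < 4; k++) {
                /* a[i*4 + k] is the broadcast element,
                   b[k*4 + j] is lane j of row k of b */
                sum += a[i * 4 + k] * b[k * 4 + j];
            }
            result[i * 4 + j] = sum;
        }
    }
}
```

Called with the values 1 to 16 for both inputs, it fills the result with 90, 100, 110, 120 in the first row and 426, 484, 542, 600 in the last, matching the table in the comments.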

Title: Re: Whats wrong with these SSE instructions
Post by: RuiLoureiro on July 28, 2018, 01:06:58 AM

Hi Siekmanski,
I don't want to use a special case for integers. The example I gave seems to show that we cannot mix integers with real4 (unless we use cvtdq2ps). So when we are working with real4, all elements must be written in the real4 format.

The MASM Forum
