What's wrong with these SSE instructions?

Started by RuiLoureiro, July 27, 2018, 07:51:51 AM

RuiLoureiro

Hi all

We have 4 cases for X4x4 * Y4x4:

        X4x4        Y4x4        Result
    1)  INTEGER  *  INTEGER  -> all 0
    2)  INTEGER  *  REAL4    -> all integers
    3)  REAL4    *  INTEGER  -> all integers
    4)  REAL4    *  REAL4    -> REAL4 (correct)

Why do we get 0 in case 1) and integers in cases 2) and 3)?
Use multiplyX4x4_Y4x4_v1 (.asm/.exe)

Siekmanski

Look for the SIMD integer addition and integer multiplication instructions:
paddd, pmuludq etc. (I can't check it, I'm not at home.)
Creative coders use backward thinking techniques as a strategy.

RuiLoureiro

Hi Siekmanski,
OK. It seems that, for example, mulps xmm4,xmm0 produces integers if one of xmm4 or xmm0 holds integers and the other holds real4 values, but produces zeros if both hold integers. Why is this? Is it a rule?

Siekmanski

Hi Rui,

You cannot multiply int32 values by real4 values directly.
But you can convert them:

    cvtdq2ps    xmm0,xmm0   ; convert 4 packed int32 to 4 packed real4
    cvtps2dq    xmm0,xmm0   ; convert 4 packed real4 to 4 packed int32
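
A note on why the raw multiply behaves this way: mulps does no conversion at all. It simply treats the 32 bits in each lane as a real4. A small integer n read as a real4 is the denormal n * 2^-149, so multiplying it by a real4 value r gives (n*r) * 2^-149, whose bit pattern reads back as the integer n*r. The product of two such denormals is far too small for real4 and underflows to 0. A minimal sketch, assuming the default MXCSR settings (no flush-to-zero, no denormals-are-zero); IntBits and TwoReal are hypothetical names used only for this illustration:

```asm
.data
align 16
IntBits     dd 3, 3, 3, 3               ; read as real4: 3 * 2^-149 (denormal)
TwoReal     real4 2.0, 2.0, 2.0, 2.0

.code
    mov     eax,offset IntBits
    movaps  xmm0,[eax]                  ; raw int32 bits, no conversion
    movaps  xmm1,xmm0
    mov     eax,offset TwoReal
    movaps  xmm2,[eax]
    mulps   xmm0,xmm2                   ; int bits * real4: 6 * 2^-149, bit pattern 00000006h
    mulps   xmm1,xmm1                   ; int bits * int bits: 9 * 2^-298 underflows to 0.0
```

So the "integers" in the mixed cases are really denormal real4 values that happen to look like integers when displayed as int32.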

Here are 2 examples that multiply two 4x4 matrices:
one routine for real4 values and one routine for int32 values.

.data
align 16
M1real4         real4  1.0,  2.0,  3.0,  4.0
                real4  5.0,  6.0,  7.0,  8.0
                real4  9.0, 10.0, 11.0, 12.0
                real4 13.0, 14.0, 15.0, 16.0

M2real4         real4  1.0,  2.0,  3.0,  4.0
                real4  5.0,  6.0,  7.0,  8.0
                real4  9.0, 10.0, 11.0, 12.0
                real4 13.0, 14.0, 15.0, 16.0

MresultReal4    real4 16 dup (0.0)   

M1              dd  1,  2,  3,  4
                dd  5,  6,  7,  8
                dd  9, 10, 11, 12
                dd 13, 14, 15, 16

M2              dd  1,  2,  3,  4
                dd  5,  6,  7,  8
                dd  9, 10, 11, 12
                dd 13, 14, 15, 16

Mresult         dd 16 dup (0)   
   
; result after M1 * M2
; 90   100  110  120
; 202  228  254  280
; 314  356  398  440
; 426  484  542  600

     
.code

; 4x4 Matrix Multiply real4
; result = M1real4 * M2real4

    mov     eax,offset M2real4

    movaps  xmm4,[eax]
    movaps  xmm5,[eax+16]
    movaps  xmm6,[eax+32]
    movaps  xmm7,[eax+48]

    mov     eax,offset M1real4
    mov     edi,offset MresultReal4

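    i = 0
    REPT 4                          ; one pass per row of M1, unrolled at assembly time
    movaps  xmm0,[eax+i]            ; load row i of M1
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    movaps  xmm3,xmm0
    shufps  xmm0,xmm0,00000000b     ; broadcast element 0 of the row
    shufps  xmm1,xmm1,01010101b     ; broadcast element 1
    shufps  xmm2,xmm2,10101010b     ; broadcast element 2
    shufps  xmm3,xmm3,11111111b     ; broadcast element 3
    mulps   xmm0,xmm4               ; element 0 * row 0 of M2
    mulps   xmm1,xmm5               ; element 1 * row 1 of M2
    mulps   xmm2,xmm6               ; element 2 * row 2 of M2
    mulps   xmm3,xmm7               ; element 3 * row 3 of M2
    addps   xmm2,xmm0               ; sum the four partial products
    addps   xmm3,xmm1
    addps   xmm3,xmm2
    movaps  [edi+i],xmm3            ; store row i of the result
    i = i + 16
    ENDM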


; 4x4 Matrix Multiply int32
; result = M1 * M2

    mov     eax,offset M2

    movaps  xmm4,[eax]
    movaps  xmm5,[eax+16]
    movaps  xmm6,[eax+32]
    movaps  xmm7,[eax+48]

    mov     eax,offset M1
    mov     edi,offset Mresult

    i = 0
    REPT 4
    movaps  xmm0,[eax+i]
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    movaps  xmm3,xmm0
    shufps  xmm0,xmm0,00000000b
    shufps  xmm1,xmm1,01010101b
    shufps  xmm2,xmm2,10101010b
    shufps  xmm3,xmm3,11111111b
    pmulld  xmm0,xmm4           ; pmulld is sse_4.1
    pmulld  xmm1,xmm5
    pmulld  xmm2,xmm6
    pmulld  xmm3,xmm7
    paddd   xmm2,xmm0
    paddd   xmm3,xmm1
    paddd   xmm3,xmm2
    movaps  [edi+i],xmm3
    i = i + 16
    ENDM   
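
Since pmulld needs SSE4.1, here is a sketch of an SSE2-only substitute built on pmuludq, which multiplies only the even lanes (0 and 2) into 64-bit products. The macro name pmulld_sse2 and its two scratch-register parameters are hypothetical. Note that the routine above already uses all eight XMM registers available in 32-bit code, so it would have to be rearranged to free two scratch registers.

```asm
; dst = low 32 bits of dst*src in each of the 4 lanes (SSE2 only)
; tmp1 and tmp2 are scratch XMM registers and are destroyed
pmulld_sse2 MACRO dst, src, tmp1, tmp2
    movdqa    tmp1,dst
    movdqa    tmp2,src
    psrlq     tmp1,32               ; move lanes 1 and 3 into the even positions
    psrlq     tmp2,32
    pmuludq   dst,src               ; 64-bit products of lanes 0 and 2
    pmuludq   tmp1,tmp2             ; 64-bit products of lanes 1 and 3
    pshufd    dst,dst,00001000b     ; low dwords of products 0 and 2 -> lanes 0,1
    pshufd    tmp1,tmp1,00001000b   ; low dwords of products 1 and 3 -> lanes 0,1
    punpckldq dst,tmp1              ; interleave -> products 0,1,2,3
ENDM
```

It would be used as, for example, pmulld_sse2 xmm0,xmm4,xmm1,xmm2. The low 32 bits of the product are the same for signed and unsigned operands, so the unsigned pmuludq is fine here.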

RuiLoureiro

Hi Siekmanski,
I don't want to use a special case for integers. The example I gave seems to show that we cannot mix integers with real4 values (unless we use cvtdq2ps). So when we are working with real4, all elements must be stored in real4 format.
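
For the mixed case, one option is to convert the int32 rows on the fly and then run the real4 routine unchanged. A minimal sketch, assuming the M1 (int32), M2real4 and MresultReal4 buffers from Siekmanski's example above, and assuming M1 is 16-byte aligned:

```asm
; 4x4 Matrix Multiply, mixed types
; result (real4) = M1 (int32) * M2real4

    mov     eax,offset M2real4

    movaps  xmm4,[eax]
    movaps  xmm5,[eax+16]
    movaps  xmm6,[eax+32]
    movaps  xmm7,[eax+48]

    mov     eax,offset M1
    mov     edi,offset MresultReal4

    i = 0
    REPT 4
    movdqa  xmm0,[eax+i]            ; load 4 packed int32 (row i of M1)
    cvtdq2ps xmm0,xmm0              ; convert them to 4 packed real4
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    movaps  xmm3,xmm0
    shufps  xmm0,xmm0,00000000b
    shufps  xmm1,xmm1,01010101b
    shufps  xmm2,xmm2,10101010b
    shufps  xmm3,xmm3,11111111b
    mulps   xmm0,xmm4
    mulps   xmm1,xmm5
    mulps   xmm2,xmm6
    mulps   xmm3,xmm7
    addps   xmm2,xmm0
    addps   xmm3,xmm1
    addps   xmm3,xmm2
    movaps  [edi+i],xmm3            ; store row i of the real4 result
    i = i + 16
    ENDM
```

The only change from the real4 routine is the load and the cvtdq2ps that follows it, so the result stays in real4 format throughout.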