Hi all
We have 4 cases:
1) X4x4 (INTEGER) * Y4x4 (INTEGER) -> all 0
2) X4x4 (INTEGER) * Y4x4 (REAL4) -> all integers
3) X4x4 (REAL4) * Y4x4 (INTEGER) -> all integers
4) X4x4 (REAL4) * Y4x4 (REAL4) -> REAL4, correct
Why do we get 0 in case 1) and integers in cases 2) and 3)?
Test program: multiplyX4x4_Y4x4_v1 (.asm/.exe)
Look for the SIMD integer addition and multiplication instructions: paddd, pmuludq, etc. (I can't check it right now; I'm not at home.)
Hi Siekmanski,
OK. It seems that, for example, mulps xmm4,xmm0 gives integers if one of xmm4 and xmm0 holds integers and the other holds real4 values, but gives zero if both hold integers. Why is this? Is it a rule?
Hi Rui,
You cannot multiply int32 values by real4 values directly.
But you can convert them:
cvtdq2ps xmm0,xmm0 ; convert 4 packed int32 to 4 packed real4
cvtps2dq xmm0,xmm0 ; convert 4 packed real4 to 4 packed int32
Here are two examples that multiply two 4x4 matrices, one routine for real4 values and one for int32 values:
.data
align 16
M1real4 real4 1.0, 2.0, 3.0, 4.0
real4 5.0, 6.0, 7.0, 8.0
real4 9.0, 10.0, 11.0, 12.0
real4 13.0, 14.0, 15.0, 16.0
M2real4 real4 1.0, 2.0, 3.0, 4.0
real4 5.0, 6.0, 7.0, 8.0
real4 9.0, 10.0, 11.0, 12.0
real4 13.0, 14.0, 15.0, 16.0
MresultReal4 real4 16 dup (0.0)
M1 dd 1, 2, 3, 4
dd 5, 6, 7, 8
dd 9, 10, 11, 12
dd 13, 14, 15, 16
M2 dd 1, 2, 3, 4
dd 5, 6, 7, 8
dd 9, 10, 11, 12
dd 13, 14, 15, 16
Mresult dd 16 dup (0)
; result after M1 * M2
; 90 100 110 120
; 202 228 254 280
; 314 356 398 440
; 426 484 542 600
.code
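; 4x4 Matrix Multiply real4
; result = M1real4 * M2real4
mov eax,offset M2real4
movaps xmm4,[eax]          ; row 0 of M2
movaps xmm5,[eax+16]       ; row 1 of M2
movaps xmm6,[eax+32]       ; row 2 of M2
movaps xmm7,[eax+48]       ; row 3 of M2
mov eax,offset M1real4
mov edi,offset MresultReal4
i = 0
REPT 4                     ; one iteration per row of M1
movaps xmm0,[eax+i]        ; row i of M1
movaps xmm1,xmm0
movaps xmm2,xmm0
movaps xmm3,xmm0
shufps xmm0,xmm0,00000000b ; broadcast M1[i][0] to all 4 lanes
shufps xmm1,xmm1,01010101b ; broadcast M1[i][1]
shufps xmm2,xmm2,10101010b ; broadcast M1[i][2]
shufps xmm3,xmm3,11111111b ; broadcast M1[i][3]
mulps xmm0,xmm4            ; M1[i][0] * row 0 of M2
mulps xmm1,xmm5            ; M1[i][1] * row 1 of M2
mulps xmm2,xmm6            ; M1[i][2] * row 2 of M2
mulps xmm3,xmm7            ; M1[i][3] * row 3 of M2
addps xmm2,xmm0
addps xmm3,xmm1
addps xmm3,xmm2            ; row i of the result
movaps [edi+i],xmm3
i = i + 16
ENDM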
; 4x4 Matrix Multiply int32
; result = M1 * M2
mov eax,offset M2
movdqa xmm4,[eax]          ; row 0 of M2 (movdqa: integer-domain load)
movdqa xmm5,[eax+16]       ; row 1 of M2
movdqa xmm6,[eax+32]       ; row 2 of M2
movdqa xmm7,[eax+48]       ; row 3 of M2
mov eax,offset M1
mov edi,offset Mresult
i = 0
REPT 4                     ; one iteration per row of M1
movdqa xmm0,[eax+i]        ; row i of M1
pshufd xmm1,xmm0,01010101b ; broadcast M1[i][1] to all 4 lanes
pshufd xmm2,xmm0,10101010b ; broadcast M1[i][2]
pshufd xmm3,xmm0,11111111b ; broadcast M1[i][3]
pshufd xmm0,xmm0,00000000b ; broadcast M1[i][0] (last, since xmm0 is the source)
pmulld xmm0,xmm4           ; pmulld is SSE4.1
pmulld xmm1,xmm5
pmulld xmm2,xmm6
pmulld xmm3,xmm7
paddd xmm2,xmm0
paddd xmm3,xmm1
paddd xmm3,xmm2            ; row i of the result
movdqa [edi+i],xmm3
i = i + 16
ENDM
Hi Siekmanski,
I don't want to write a special case for integers. The example I gave seems to show that we cannot mix integers with real4 values (unless we convert them first with cvtdq2ps). So when we work with real4, all elements must be stored in real4 format.