SIMD module

Started by Camper, June 14, 2017, 03:57:22 AM


Camper

Hi,

I just can't seem to find the right information on how to create a SIMD module, or even an elementary "hello" function.

I get things like this:

movaps 0xffffffe8(%ebp),%xmm0
  8f: 0f 58 45 d8          addps  0xffffffd8(%ebp),%xmm0
  93: 0f 29 45 c8          movaps %xmm0,0xffffffc8(%ebp)


But couldn't I just use a named variable here instead?

movaps xmm1,var
addps xmm1, xmm7
movaps var, xmm1


Also, what is the right way to create a module? Is a stack frame needed, or is it just a matter of preloading the registers?
Can I do this?

proc_name:
addps xmm1,xmm2
ret

movaps xmm2,var
call    proc_name



Regards,

Siekmanski

If you use movaps, make sure the data is 16-byte aligned; otherwise use movups.
I'm not sure what you mean by a SIMD module.

If you mean procedures that use SIMD instructions, you can write them like this:

.data
align 16
var  real4 1.0,2.0,3.0,4.0
var2 real4 10.0,2.0,7.0,6.0

.code

Addfunction proc
    movaps xmm1,var
    movaps xmm7,var2
    addps  xmm1,xmm7
    movaps var,xmm1
    ret
Addfunction endp   

    call Addfunction

;or like this:

Addfunction2 proc
    movaps xmm1,var
    addps  xmm1,var2
    movaps var,xmm1
    ret
Addfunction2 endp

    call Addfunction2

; or this way

Addfunction3 proc uses esi edi Item1:DWORD,Item2:DWORD
    mov     esi,Item1
    mov     edi,Item2
    movaps  xmm0,oword ptr[esi]
    addps   xmm0,oword ptr[edi]
    movaps  oword ptr[esi],xmm0
    ret
Addfunction3 endp   

    invoke  Addfunction3,addr var,addr var2
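For anyone following along in C rather than MASM, here is a rough sketch of the same pointer-based add using SSE intrinsics, assuming an x86/x86-64 C11 compiler (GCC, Clang or MSVC). The names add4_aligned and add4_unaligned are hypothetical; the intrinsics themselves are the standard ones from <xmmintrin.h>. The compiler picks the exact instructions, so the movaps/addps/movups comments are only approximate correspondences.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Same data as the .data section above, aligned like "align 16". */
_Alignas(16) float var[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
_Alignas(16) float var2[4] = {10.0f, 2.0f, 7.0f, 6.0f};

/* Rough C counterpart of Addfunction3: a[0..3] += b[0..3].
   The aligned load/store forms require 16-byte-aligned pointers. */
static void add4_aligned(float *a, const float *b)
{
    assert(((uintptr_t)a & 15) == 0 && ((uintptr_t)b & 15) == 0);
    __m128 x = _mm_load_ps(a);          /* ~ movaps xmm0,oword ptr[esi] */
    x = _mm_add_ps(x, _mm_load_ps(b));  /* ~ addps  xmm0,oword ptr[edi] */
    _mm_store_ps(a, x);                 /* ~ movaps oword ptr[esi],xmm0 */
}

/* Same operation for data of unknown alignment (movups-style loads/stores). */
static void add4_unaligned(float *a, const float *b)
{
    __m128 x = _mm_loadu_ps(a);
    x = _mm_add_ps(x, _mm_loadu_ps(b));
    _mm_storeu_ps(a, x);
}
```

Calling add4_aligned(var, var2) leaves var holding 11.0, 4.0, 10.0, 10.0, which is what Addfunction3 does to var.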



Creative coders use backward thinking techniques as a strategy.

Camper

#2
So this creates the stack frame behind the scenes for "Addfunction3"? And it works through pointers held in non-SIMD (general-purpose) registers?
Addfunction3 proc uses esi edi Item1:DWORD,Item2:DWORD
    mov     esi,Item1 
    mov     edi,Item2
    movaps  xmm0,oword ptr[esi]
    addps   xmm0,oword ptr[edi]
    movaps  oword ptr[esi],xmm0
    ret
Addfunction3 endp   


Or can I load the pointers into esi and edi at any time before calling the proc, like this?


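Addfunction3 proc               ; no parameters: expects esi and edi to be loaded by the caller
    movaps  xmm0,oword ptr[esi]
    addps   xmm0,oword ptr[edi]
    movaps  oword ptr[esi],xmm0
    ret
Addfunction3 endp

    mov     esi,offset var      ; load the pointers just before the call
    mov     edi,offset var2
    call    Addfunction3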





Camper

#3
Great  8)

Camper

Back at it,

So it took me a while to notice that the integers each take 4 bytes, which works together with the align 16 directive ("dividing the xmm registers into 16 bytes"): one 16-byte xmm register holds four of them.

Which made me wonder: what will a movaps do with integer data that is only 4-byte aligned?
Will it load:     int32, int32, int32, int32
Or only:          int32, -, -, -


Regards,

Siekmanski

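movaps requires the data to be 16-byte aligned; on a misaligned address it faults rather than loading fewer values. It always moves all 16 bytes (four int32 values).
You can use movups if the data is unaligned, or movss to load a single int32/real4 into the low lane (the upper three lanes are zeroed).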
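A small C sketch of the two cases, assuming an x86-64 compiler where SSE2 is available by default; load_four and load_one are hypothetical helper names. It uses the integer forms: _mm_loadu_si128 (movdqu) moves the same 16 bytes movups would, and _mm_cvtsi32_si128 (movd) is the integer counterpart of movss, putting 32 bits in the low lane and zeroing the other three.

```c
#include <emmintrin.h>  /* SSE2: __m128i integer intrinsics */

/* Load four int32 values from a possibly unaligned address and copy the lanes out. */
static void load_four(const int *src, int out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);  /* unaligned 16-byte load */
    _mm_storeu_si128((__m128i *)out, v);
}

/* Load a single int32 into the low lane; lanes 1..3 become 0. */
static void load_one(const int *src, int out[4])
{
    __m128i v = _mm_cvtsi32_si128(*src);  /* movd: low 32 bits set, rest zeroed */
    _mm_storeu_si128((__m128i *)out, v);
}
```

With a 16-byte-aligned int buf[8] = {0, 1, ..., 7}, load_four(buf + 1, out) gives 1, 2, 3, 4 even though buf + 1 is not 16-byte aligned, while load_one(buf + 1, out) gives 1, 0, 0, 0.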

Camper

#6
For me it would be movss, then.

I was browsing the forum to rediscover the topic that hosted this link, and had to dig through my bookmarks to find it. I think it's a brilliant resource for studying SIMD:
http://softpixel.com/~cwright/programming/simd/sse.php


Camper

#7
Actually, my problem is more complicated: I have to copy my value into the adjacent variables.

So, if I declare "Var real8 1.0, 2.0", does the movapd instruction alone put both values into an xmm register as two 64-bit halves (64 | 64)?

Or do you have to do it with extra instructions, like so:

movsd  xmm2,int64   ; load the 64-bit value into the low half of xmm2 (high half zeroed)
shufpd xmm1,xmm2,0  ; xmm1.high = xmm2.low, xmm1.low unchanged
movsd  xmm1,xmm2    ; xmm1.low = xmm2.low, so both halves of xmm1 now hold the value
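On the movapd part: loading "Var real8 1.0, 2.0" with movapd puts 1.0 in the low half and 2.0 in the high half in one instruction. For copying one value into both halves, here is a hedged C intrinsics sketch (x86 with SSE2 assumed; broadcast_3step and broadcast_shuf are hypothetical names) that mirrors the three-instruction sequence above, plus a shorter equivalent.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_sd, _mm_shuffle_pd, _mm_move_sd */

/* The three-step sequence above, translated instruction by instruction.
   Returns *p in both 64-bit halves. */
static __m128d broadcast_3step(const double *p, __m128d xmm1)
{
    __m128d xmm2 = _mm_load_sd(p);         /* movsd xmm2,[p]: low = *p, high = 0 */
    xmm1 = _mm_shuffle_pd(xmm1, xmm2, 0);  /* shufpd xmm1,xmm2,0: high = xmm2.low */
    xmm1 = _mm_move_sd(xmm1, xmm2);        /* movsd xmm1,xmm2: low = xmm2.low */
    return xmm1;
}

/* Shorter: load once, then shuffle the register with itself. */
static __m128d broadcast_shuf(const double *p)
{
    __m128d x = _mm_load_sd(p);
    return _mm_shuffle_pd(x, x, 0);        /* both halves = x.low */
}
```

On CPUs with SSE3, movddup (_mm_loaddup_pd) does the load and the duplication in a single instruction.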

Siekmanski

something like this?

paddb   xmm0,xmm1 ; A+B 16*8bit
paddsb   xmm0,xmm1 ; A+B 16*8bit with saturation  {-128...127}
paddusb   xmm0,xmm1 ; A+B 16*8bit with saturation  {0...255}
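The three additions only differ when a lane overflows. A C sketch with SSE2 intrinsics (x86 assumed; add_bytes is a hypothetical name) that runs all three on the same sixteen byte lanes:

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Add sixteen byte lanes with wrap-around, signed saturation and unsigned saturation. */
static void add_bytes(const uint8_t a[16], const uint8_t b[16],
                      uint8_t wrap[16], int8_t sat_signed[16], uint8_t sat_unsigned[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)wrap,         _mm_add_epi8(va, vb));   /* paddb   */
    _mm_storeu_si128((__m128i *)sat_signed,   _mm_adds_epi8(va, vb));  /* paddsb  */
    _mm_storeu_si128((__m128i *)sat_unsigned, _mm_adds_epu8(va, vb));  /* paddusb */
}
```

For lanes 100 + 100 and 200 + 100, paddb gives 200 and 44 (wrap-around), paddsb gives 127 and 44 (200 is -56 as a signed byte), and paddusb gives 200 and 255.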

I'm not exactly sure what you're calculating here.
Maybe if you explain in plain arithmetic what you want to achieve, we can try to solve it with SIMD.

Camper

I changed my question a bit; I hope it's clearer now.
I have a little VB.NET console app compiler going, but I'm not really in a position to test this yet.

Siekmanski

.data
align 16
Real8A real8 1.0,2.0
Real8B real8 0.0,0.0

.code
movapd xmm0,Real8A ; 1.0,2.0
shufpd xmm0,xmm0,01b ; exchange low 64bit with high 64bit
movapd Real8B,xmm0 ; 2.0,1.0
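The same swap in C, as a sketch assuming an x86 compiler with SSE2 and C11 _Alignas; swap_halves is a hypothetical name, while Real8A and Real8B mirror the data above.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_pd, _mm_shuffle_pd, _mm_store_pd */

_Alignas(16) double Real8A[2] = {1.0, 2.0};
_Alignas(16) double Real8B[2] = {0.0, 0.0};

/* Swap the two doubles of a 16-byte-aligned pair. */
static void swap_halves(const double *src, double *dst)
{
    __m128d x = _mm_load_pd(src);  /* movapd: src must be 16-byte aligned */
    x = _mm_shuffle_pd(x, x, 1);   /* imm bit0 = 1: low = x.high; bit1 = 0: high = x.low */
    _mm_store_pd(dst, x);          /* movapd: dst must be 16-byte aligned */
}
```

With an immediate of 0 instead, shufpd xmm0,xmm0,0 copies the low double into both halves (the broadcast asked about earlier), and 3 copies the high double into both.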


Camper