Swap Doubles in SSE2

Started by guga, February 16, 2022, 03:36:33 AM


guga

Hi Guys

How do I swap the positions of doubles (Real8 qwords) held in 2 different SSE registers using SSE2?

Say i have this:

[DataShuffle1: R$ 1, 0]
[DataShuffle2: R$ 3, 2]

movupd xmm0 X$DataShuffle1
movupd xmm1 X$DataShuffle2

So, the positions of the Real8 in the xmm registers will become:

xmm0    0    1
xmm1    2    3


How do I switch the values of 1 and 2, so the new positions will turn into:

xmm0    0    2
xmm1    1    3

I tried to use pshufd, pxor etc., but couldn't succeed in making it switch positions from the lower quadword in xmm0 to the higher quadword in xmm1 and vice-versa. Btw, the swap should use only those 2 registers, xmm0 / xmm1.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

PSHUFD is the right instruction. A swap with two xmm regs is possible only if you use memory.


guga

Hi Guys

Tks, JJ and Nidud. I guess it worked :thumbsup:

But, as JJ said, I needed to add an extra register (or memory pointer). What I did was:


    ; Y   ------ (a^2-b^2)*Y*Atan2_FactorA9 <--- xmm0    -9.05387052965142591e-4     0.000321690515963660872
    ; Y^4 ------            c^2             <--- xmm1    6.719502e-13                          2.803170505524641028443

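    movhlps xmm2 xmm1                ; xmm2 low  <- xmm1 high
    movlhps xmm2 xmm0                ; xmm2 high <- xmm0 low
    movsd xmm0 xmm2                  ; xmm0 low  <- xmm2 low (old xmm1 high)
    pshufd xmm2 xmm2 SSE_SWAP_QWORDS ; SSE_SWAP_QWORDS = 78, swaps the two qwords of xmm2
    movlhps xmm1 xmm2                ; xmm1 high <- xmm2 low (old xmm0 low)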


I thought it could be done with fewer instructions, but it took 5. At least it is working. I'll try to finish the routine that uses it (the one from atan2) before doing further tests to see if it is working.
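
For anyone who wants to check the lane movement without stepping through a debugger, here is a minimal sketch in C with SSE2 intrinsics that mirrors the five instructions above one for one. The function name swap_five and the pointer-based interface are assumptions made for illustration; only the instruction sequence comes from the post. The immediate 78 is 0x4E, i.e. 01 00 11 10 in binary, which makes pshufd pick source dwords 2, 3, 0, 1 and so exchanges the two qwords.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (pulls in the SSE ones too) */

/* Hypothetical helper: exchanges the low qword of *x0 with the high
   qword of *x1, using t in the role of the scratch register xmm2. */
static void swap_five(__m128d *x0, __m128d *x1)
{
    __m128 t = _mm_setzero_ps();                    /* xmm2, initial value does not matter */

    t   = _mm_movehl_ps(t, _mm_castpd_ps(*x1));     /* movhlps xmm2 xmm1 : t.lo  = x1.hi */
    t   = _mm_movelh_ps(t, _mm_castpd_ps(*x0));     /* movlhps xmm2 xmm0 : t.hi  = x0.lo */
    *x0 = _mm_move_sd(*x0, _mm_castps_pd(t));       /* movsd xmm0 xmm2   : x0.lo = t.lo  */
    t   = _mm_castsi128_ps(                         /* pshufd xmm2 xmm2 78 : swap qwords */
              _mm_shuffle_epi32(_mm_castps_si128(t), 0x4E));
    *x1 = _mm_castps_pd(                            /* movlhps xmm1 xmm2 : x1.hi = t.lo  */
              _mm_movelh_ps(_mm_castpd_ps(*x1), t));
}
```

Calling it with x0 = _mm_set_pd(0.0, 1.0) and x1 = _mm_set_pd(2.0, 3.0) (high value first in _mm_set_pd) should leave x0 holding high 0, low 2 and x1 holding high 1, low 3, which is the layout asked for in the first post.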

nidud

#4
deleted

jj2007

.686p
.model flat, stdcall
.xmm

.code
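Null1 OWORD 22222222222222223333333333333333h   ; high qword = 2s, low qword = 3s
Two3 OWORD 00000000000000001111111111111111h    ; high qword = 0s, low qword = 1s
start:
  movups xmm1, Null1   ; xmm1 = 2 3
  movups xmm0, Two3    ; xmm0 = 0 1
  movaps xmm2, xmm0    ; scratch copy of xmm0
  int 3                ; breakpoint: inspect the registers in a debugger
  movhlps xmm0, xmm1   ; 0 2
  movlhps xmm1, xmm2   ; 1 3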
  ret
end start
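
The three-instruction version can be checked the same way from C. This is only a sketch with SSE2 intrinsics: the names swap_lo0_hi1 and swap_hi0_lo1 and the pointer interface are made up for illustration, and the second function is an assumed mirror image (movlhps first, then movhlps) that exchanges the high qword of xmm0 with the low qword of xmm1 instead.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (pulls in the SSE ones too) */

/* Hypothetical helper: low qword of *x0 <-> high qword of *x1. */
static void swap_lo0_hi1(__m128d *x0, __m128d *x1)
{
    __m128 a = _mm_castpd_ps(*x0);
    __m128 b = _mm_castpd_ps(*x1);
    __m128 t = a;                 /* movaps  xmm2, xmm0                      */
    a = _mm_movehl_ps(a, b);      /* movhlps xmm0, xmm1 : x0.lo = x1.hi      */
    b = _mm_movelh_ps(b, t);      /* movlhps xmm1, xmm2 : x1.hi = old x0.lo  */
    *x0 = _mm_castps_pd(a);
    *x1 = _mm_castps_pd(b);
}

/* Hypothetical mirror image: high qword of *x0 <-> low qword of *x1. */
static void swap_hi0_lo1(__m128d *x0, __m128d *x1)
{
    __m128 a = _mm_castpd_ps(*x0);
    __m128 b = _mm_castpd_ps(*x1);
    __m128 t = a;                 /* movaps  xmm2, xmm0                      */
    a = _mm_movelh_ps(a, b);      /* movlhps xmm0, xmm1 : x0.hi = x1.lo      */
    b = _mm_movehl_ps(b, t);      /* movhlps xmm1, xmm2 : x1.lo = old x0.hi  */
    *x0 = _mm_castps_pd(a);
    *x1 = _mm_castps_pd(b);
}
```

Both directions need the scratch copy, because each of movhlps and movlhps overwrites one half of its destination before the other register has received that half.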

guga

Great !!!  :thumbsup: :thumbsup: :thumbsup: :thumbsup:

Tks, JJ and Nidud. This version works fine and also works both ways: Hi to Low and Low to Hi swap :azn: :azn:

I'll check it in the atan2 routine and review the math before going further and testing for speed again. I hope that, with the changes, it can be sped up a bit more, to a timing of less than 5000 clocks. :smiley:

I'm using a variation of JJ's binary search, and it only performs around 6-8 iterations before finding the correct pointer into the table. That seems faster than the previous version, but I haven't tested it yet. I'm amazed that the changes I made to ucrtbase could give a speed gain of more than 12 times. The organization of atan2 was a true mess. I simply don't understand how M$ didn't see that when releasing a version of atan2 that was presumed to be fast. From what I saw, it seems to be a very old routine that was adapted to work on Windows.
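
As a rough illustration of where a figure like 6-8 iterations can come from: a halving search over a sorted table of 64 to 256 entries needs 6 to 8 steps. The sketch below is a generic binary search in C, not JJ's routine and not the actual atan2 table; find_slot, the table contents and the step counter are all assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the last entry <= x in an
   ascending table of n >= 1 entries (0 if x is below the first entry).
   *steps receives the number of loop iterations performed. */
static size_t find_slot(const double *table, size_t n, double x, unsigned *steps)
{
    size_t lo = 0, hi = n;              /* candidate window is [lo, hi) */
    *steps = 0;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] <= x)
            lo = mid;                   /* answer is at mid or above */
        else
            hi = mid;                   /* answer is below mid */
        ++*steps;
    }
    return lo;
}
```

With a power-of-two table size the window halves exactly on every pass, so a 64-entry table takes 6 iterations and a 256-entry table takes 8.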