Author Topic: Swap Doubles in SSE2

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Swap Doubles in SSE2
« on: February 16, 2022, 03:36:33 AM »
Hi Guys

How do I swap the positions of quadwords (Real8 values) held in two different SSE registers using SSE2?

Say i have this:

[DataShuffle1: R$ 1, 0]
[DataShuffle2: R$ 3, 2]

movupd xmm0 X$DataShuffle1
movupd xmm1 X$DataShuffle2

So the positions of the Real8 values in the xmm registers will be (high quadword first, low quadword second):
xmm0: 0 1
xmm1: 2 3


How do I switch the values 1 and 2, so that the new positions become:
xmm0: 0 2
xmm1: 1 3

I tried pshufd, pxor, etc., but couldn't manage to move the lower quadword of xmm0 into the higher quadword of xmm1 and vice versa. By the way, I want to do the swap using only those two registers, xmm0 and xmm1.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Swap Doubles in SSE2
« Reply #1 on: February 16, 2022, 04:27:55 AM »
PSHUFD is the right instruction. A swap using only two xmm registers is possible only if you also use memory (or a third register) for temporary storage.
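
To make the memory route concrete, here is a minimal sketch in C with SSE2 intrinsics, so the lane movement can be compiled and checked. It is not code from this thread: the function name `swap_via_memory` and the local `slot` are hypothetical, and a compiler may pick different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: swap the low double of *x0 with the high double
   of *x1, using one 8-byte memory slot instead of a third register. */
static void swap_via_memory(__m128d *x0, __m128d *x1)
{
    double slot;                      /* scratch storage (assumption) */

    _mm_storeh_pd(&slot, *x1);        /* like movhpd [slot], xmm1 : save high(x1)        */
    *x1 = _mm_unpacklo_pd(*x1, *x0);  /* like unpcklpd xmm1, xmm0 : high(x1) = low(x0)   */
    *x0 = _mm_loadl_pd(*x0, &slot);   /* like movlpd xmm0, [slot] : low(x0) = saved value */
}
```

With xmm0 = 0 1 and xmm1 = 2 3 (high quadword first), this leaves xmm0 = 0 2 and xmm1 = 1 3.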

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: Swap Doubles in SSE2
« Reply #2 on: February 16, 2022, 04:50:12 AM »

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Swap Doubles in SSE2
« Reply #3 on: February 16, 2022, 07:12:37 AM »
Hi Guys

Tks, JJ and Nidud. I guess it worked :thumbsup:

But, as JJ said, I needed to add an extra register (or memory pointer). What I did was:

Code:
    ; Y   ------ (a^2-b^2)*Y*Atan2_FactorA9 <--- xmm0    -9.05387052965142591e-4     0.000321690515963660872
    ; Y^4 ------            c^2             <--- xmm1    6.719502e-13                          2.803170505524641028443

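    movhlps xmm2 xmm1 ; xmm2.low = xmm1.high
    movlhps xmm2 xmm0 ; xmm2.high = xmm0.low
    movsd xmm0 xmm2 ; xmm0.low = xmm2.low (the old xmm1.high)
    pshufd xmm2 xmm2 SSE_SWAP_QWORDS ; SSE_SWAP_QWORDS = 78 (04Eh): swap the two quadwords of xmm2
    movlhps xmm1 xmm2 ; xmm1.high = xmm2.low (the old xmm0.low)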

I thought it could be done with fewer instructions, but it took five. At least it is working. I'll try to finish the routine that uses this (the one for atan2) before doing further tests to confirm that it works.
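
The five steps can be traced lane by lane with a small model in C intrinsics. This is only a sketch for checking the data movement: the function name `swap_five_step` is hypothetical, the scratch value is zeroed here because C requires an initial value (both of its halves are overwritten anyway), and the compiler is free to emit other instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical model of the five-instruction sequence above.
   Swaps the low double of *x0 with the high double of *x1. */
static void swap_five_step(__m128d *x0, __m128d *x1)
{
    __m128 a = _mm_castpd_ps(*x0);
    __m128 b = _mm_castpd_ps(*x1);
    __m128 t = _mm_setzero_ps();     /* stands in for xmm2 */

    t = _mm_movehl_ps(t, b);         /* movhlps xmm2 xmm1 : t.low  = x1.high */
    t = _mm_movelh_ps(t, a);         /* movlhps xmm2 xmm0 : t.high = x0.low  */

    __m128d td = _mm_castps_pd(t);
    *x0 = _mm_move_sd(*x0, td);      /* movsd xmm0 xmm2   : x0.low = t.low   */

    /* pshufd xmm2 xmm2 78 : exchange the two quadwords of the scratch value */
    td = _mm_castsi128_pd(_mm_shuffle_epi32(_mm_castpd_si128(td), 0x4E));

    /* movlhps xmm1 xmm2 : x1.high = t.low (the old x0.low) */
    *x1 = _mm_castps_pd(_mm_movelh_ps(b, _mm_castpd_ps(td)));
}
```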

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: Swap Doubles in SSE2
« Reply #4 on: February 16, 2022, 08:23:20 AM »
deleted
« Last Edit: February 26, 2022, 05:39:17 AM by nidud »

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Swap Doubles in SSE2
« Reply #5 on: February 16, 2022, 09:23:20 AM »
Code:
.686p
.model flat, stdcall
.xmm

.code
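Null1 OWORD 22222222222222223333333333333333h   ; high qword "2", low qword "3"
Two3 OWORD 00000000000000001111111111111111h    ; high qword "0", low qword "1"
start:
  movups xmm1, Null1   ; xmm1 = 2 3
  movups xmm0, Two3    ; xmm0 = 0 1
  movaps xmm2, xmm0    ; scratch copy of xmm0
  int 3                ; breakpoint: inspect the registers in a debugger
  movhlps xmm0, xmm1   ; xmm0 = 0 2 (low qword of xmm0 = high qword of xmm1)
  movlhps xmm1, xmm2   ; xmm1 = 1 3 (high qword of xmm1 = low qword of the saved copy)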
  ret
end start
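
The same copy-then-two-moves idea can be written with C intrinsics, which makes it easy to verify outside a debugger. This is a sketch under assumptions, not jj2007's code: the names `swap_lo0_hi1` and `swap_lo0_hi1_unpack` are hypothetical. The second function expresses the same result with the double-precision unpack intrinsics; which instructions the compiler emits for it is up to the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper mirroring the sequence above:
   swap the low double of *x0 with the high double of *x1. */
static void swap_lo0_hi1(__m128d *x0, __m128d *x1)
{
    __m128 a   = _mm_castpd_ps(*x0);
    __m128 b   = _mm_castpd_ps(*x1);
    __m128 tmp = a;                                /* movaps xmm2, xmm0  */

    *x0 = _mm_castps_pd(_mm_movehl_ps(a, b));      /* movhlps xmm0, xmm1 */
    *x1 = _mm_castps_pd(_mm_movelh_ps(b, tmp));    /* movlhps xmm1, xmm2 */
}

/* Same result using the pd unpack intrinsics (an alternative formulation). */
static void swap_lo0_hi1_unpack(__m128d *x0, __m128d *x1)
{
    __m128d a = *x0, b = *x1;

    *x0 = _mm_unpackhi_pd(b, a);   /* low = high(b), high = high(a) */
    *x1 = _mm_unpacklo_pd(b, a);   /* low = low(b),  high = low(a)  */
}
```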

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Swap Doubles in SSE2
« Reply #6 on: February 16, 2022, 10:53:41 AM »
Great !!!  :thumbsup: :thumbsup: :thumbsup: :thumbsup:

Tks, JJ and Nidud. This version works fine, and it also works both ways: high-to-low and low-to-high swaps :azn: :azn:
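
One reading of "both ways" is that the mirrored swap (high quadword of xmm0 with low quadword of xmm1) needs only the two moves exchanged. A minimal sketch of that mirrored case in C intrinsics follows; the function name `swap_hi0_lo1` is hypothetical and the instruction names in the comments show the intended mapping, not guaranteed compiler output.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical mirrored helper: swap the high double of *x0
   with the low double of *x1, again with one scratch copy. */
static void swap_hi0_lo1(__m128d *x0, __m128d *x1)
{
    __m128 a   = _mm_castpd_ps(*x0);
    __m128 b   = _mm_castpd_ps(*x1);
    __m128 tmp = a;                                /* movaps xmm2, xmm0  */

    *x0 = _mm_castps_pd(_mm_movelh_ps(a, b));      /* movlhps xmm0, xmm1 : x0.high = x1.low  */
    *x1 = _mm_castps_pd(_mm_movehl_ps(b, tmp));    /* movhlps xmm1, xmm2 : x1.low  = old x0.high */
}
```

Starting from xmm0 = 0 1 and xmm1 = 2 3 (high quadword first), this gives xmm0 = 3 1 and xmm1 = 2 0.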

I'll check it in the atan2 routine and review the math before going further and testing for speed again. I hope that with these changes it can be sped up a bit more, to a timing of less than 5000 clocks. :smiley:

I'm using a variation of JJ's binary search, and it performs only around 6-8 iterations before finding the correct pointer into the table, which seems faster than the previous version, but I haven't tested it yet. I'm amazed that the changes I made to ucrtbase could result in a speed gain of more than 12 times. The organization of atan2 was a true mess. I simply don't understand how M$ didn't see that when releasing a version of atan2 that was presumed to be fast. From what I saw, it seems to be a very old routine that was adapted to work on Windows.