### Topic: is there ways to shuffle this data?

#### daydreamer

• Member
• Posts: 640
• its time for a SIMD sphere
##### is there ways to shuffle this data?
« on: July 08, 2018, 05:18:25 PM »
Hi,
Is there a way to shuffle this data, or should a SIMD mask be used?
First I am going to shuffle x into all four floats (x is symbolized by 3.0 below); after that I need to mix 1.0's and x's.

Code:
```asm
            ; pattern for mulps, to achieve x^3, x^5, x^7, x^9 (x is symbolized by 3.0)
x3x5x7x9    real4 3.0,3.0,3.0,3.0 ; x*x*x = mulps, mulps = x^3
            real4 1.0,3.0,3.0,3.0
            real4 1.0,1.0,3.0,3.0
            real4 1.0,1.0,1.0,3.0
```
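On the mask question: instead of shuffling, each multiplier row could be built with the usual bitwise blend idiom, (mask AND x) OR (NOT mask AND 1.0), which is what andps/andnps/orps do per 32-bit lane. The sketch below emulates this in portable C on the raw float bits. The helper name `blend_row` and the row index k (number of leading 1.0 lanes, lane 0 at the lowest address as in the real4 rows) are assumptions for illustration, not code from this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build multiplier row k (k = 0..3) of the x3x5x7x9
   pattern. Lanes 0..k-1 get 1.0f, the remaining lanes get x, mimicking
   (mask AND x) OR (NOT mask AND one) done with andps/andnps/orps. */
static void blend_row(float x, int k, float out[4])
{
    assert(k >= 0 && k <= 3);
    float one = 1.0f;
    uint32_t xbits, onebits;
    memcpy(&xbits, &x, sizeof xbits);        /* reinterpret the float bits */
    memcpy(&onebits, &one, sizeof onebits);
    for (int lane = 0; lane < 4; ++lane) {
        /* all-ones mask selects x, all-zeros mask selects 1.0 */
        uint32_t mask = (lane >= k) ? 0xFFFFFFFFu : 0u;
        uint32_t bits = (xbits & mask) | (onebits & ~mask);
        memcpy(&out[lane], &bits, sizeof bits);
    }
}
```

In SSE code the four masks would be constants, so each row costs three logical instructions and no shuffle.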
Quote from Flashdance
Nick: When you give up your dream, you die.
*wears a flameproof asbestos suit*
I have no idea how to compare beauty of two real8 women with SSE
If you switch to C++, x86 means 086h, which is wrong cpu
So don't switch to C++   p:

#### Siekmanski

• Member
• Posts: 1749
##### Re: is there ways to shuffle this data?
« Reply #1 on: July 09, 2018, 07:05:35 AM »
Hi Magnus,

It's not clear what needs to be shuffled in what order.
Do you have an example from start positions to end positions?
And which need to be multiplied by 3 5 7 or 9?
Creative coders use backward thinking techniques as a strategy.

#### daydreamer

• Member
• Posts: 640
• its time for a SIMD sphere
##### Re: is there ways to shuffle this data?
« Reply #2 on: July 09, 2018, 08:57:17 PM »
> Hi Magnus,
>
> It's not clear what needs to be shuffled in what order.
> Do you have an example from start positions to end positions?
> And which need to be multiplied by 3 5 7 or 9?
Sorry, I also shortened the comment; it should be x^3, x^5, x^7, x^9. The first step is to shuffle one x into four x's, then mulps until I have x^3,x^3,x^3,x^3 in one xmm register, then change the multiplier to 1.0,x,x,x and keep multiplying.
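The first step, broadcasting one x into all four lanes, is a single shuffle with immediate 0 (for example shufps xmm0,xmm0,0, or pshufd with the same register as source and destination). The immediate packs four 2-bit source-lane indices, lowest destination lane first. A C emulation of that selection rule; `pshufd_emul` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the lane selection of pshufd dst, src, imm8: destination lane i
   receives source lane (imm8 >> (2*i)) & 3. shufps with the same register
   as both operands selects the same way. imm8 = 0 copies lane 0 into all
   four lanes, i.e. broadcasts x. */
static void pshufd_emul(const uint32_t src[4], uint8_t imm8, uint32_t dst[4])
{
    uint32_t tmp[4];   /* allows dst == src, like pshufd xmm1,xmm1,imm8 */
    for (int i = 0; i < 4; ++i)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; ++i)
        dst[i] = tmp[i];
}
```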
What matters is getting the result described by the data section posted above.
I also thought of movups from a .data section laid out as 1.0,1.0,1.0,1.0,x,x,x,x (reading at successive 4-byte offsets), but would that be a slow alternative?
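The movups idea works as a sliding window: with a table 1.0,1.0,1.0,1.0,x,x,x,x, an unaligned 16-byte load starting k floats before the first x yields a row with k leading 1.0's. A portable C sketch with hypothetical helper names `make_table` and `window_row`; pointer arithmetic stands in for movups.

```c
#include <assert.h>

/* Fill the 8-float table 1,1,1,1,x,x,x,x for a given x. */
static void make_table(float x, float table[8])
{
    for (int i = 0; i < 4; ++i) {
        table[i] = 1.0f;
        table[i + 4] = x;
    }
}

/* Reading four consecutive floats starting at index 4 - k (what an
   unaligned movups from table + 16 - 4*k bytes would load) gives a row
   with k leading 1.0s. */
static void window_row(const float table[8], int k, float out[4])
{
    assert(k >= 0 && k <= 3);
    const float *src = &table[4 - k];
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = src[lane];
}
```

Whether this is slow depends on the CPU. A load that straddles a cache line costs extra, and if x,x,x,x has just been stored into the table, an unaligned load that only partly overlaps that store may not benefit from store forwarding on some processors.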
Or an SSE2 shift?

#### Siekmanski

• Member
• Posts: 1749
##### Re: is there ways to shuffle this data?
« Reply #3 on: July 09, 2018, 10:20:09 PM »
Something like this?

Code:
```asm
.const
Multipliers real4 8.0, 8.0, 8.0, 8.0    ; 2^3
ExpPlus2    dd 4 dup (01000000h)        ; 2 shl 23: adds 2 to each real4 exponent (x4)
.data
YourData    ???????
.code
    movaps      xmm0,oword ptr Multipliers
    movaps      xmm1,oword ptr YourData
    pshufd      xmm2,xmm1,????          ; shuffle your data into place
    mulps       xmm2,xmm0               ; result
    paddd       xmm0,oword ptr ExpPlus2 ; update multipliers from 2^3 to 2^5 (pslld xmm0,2 would shift the float bits, not multiply by 4)
    ; repeat steps with next set of data
```
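Why the multiplier update cannot be a plain integer shift: pslld shifts the raw bits of each real4, which scrambles sign, exponent and mantissa instead of multiplying by 4. For a normal, finite real4, multiplying by 2^n is the same as adding n to the 8-bit exponent field at bits 23..30, which is what paddd with n shl 23 does per lane. A scalar C sketch of both operations; the helper names are hypothetical and there is no handling of zero, denormals or overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply a normal, finite float by 2^n by adding n to its exponent
   field (bits 23..30), like paddd with (n << 23) in one lane. */
static float scale_pow2_bits(float f, unsigned n)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += (uint32_t)n << 23;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* What pslld by s does to the same bit pattern, for comparison. */
static float shift_bits_left(float f, unsigned s)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits <<= s;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```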

#### daydreamer

• Member
• Posts: 640
• its time for a SIMD sphere
##### Re: is there ways to shuffle this data?
« Reply #4 on: July 10, 2018, 03:25:08 AM »
Many years ago I saw someone make a fast integer sqrt; maybe the opposite should be possible?
What about a 32-bit shift combined with an OR of 1.0,0,0,0?
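The shift-and-OR idea does produce the next row: shifting the whole register left by 4 bytes (the SSE2 instruction pslldq xmm,4) moves every lane one position up and zero-fills lane 0, and ORing in the bit pattern of 1.0 in lane 0 turns that zero into 1.0. Applied repeatedly to x,x,x,x it gives 1,x,x,x, then 1,1,x,x, then 1,1,1,x. A C emulation on raw 32-bit lanes; the function name `shift_or_one` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates pslldq xmm,4 (one 32-bit lane toward the high end, zero into
   lane 0) followed by orps with the constant 1.0,0,0,0.
   Lane 0 is the lowest address. */
static void shift_or_one(const float in[4], float out[4])
{
    uint32_t lanes[4], res[4], onebits;
    float one = 1.0f;
    memcpy(lanes, in, sizeof lanes);
    memcpy(&onebits, &one, sizeof onebits);
    res[0] = 0;            /* zero shifted in */
    res[1] = lanes[0];
    res[2] = lanes[1];
    res[3] = lanes[2];
    res[0] |= onebits;     /* 0 OR bits(1.0) == bits(1.0) */
    memcpy(out, res, sizeof res);
}
```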
I almost never use shuffles.
This is how far I have come; I am making a sine Taylor series.
Remember that x is in radians.
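Code:
```asm
.code
start:
    lea     ebx,fconstant
    add     ebx,16
    lea     edx,x3x5x7x9

    movaps  xmm0,x            ; x,x,x,x (four copies, 16-byte aligned)
    movaps  xmm7,[edx]        ; multiplier row x,x,x,x
    movaps  xmm6,[ebx]        ; -1/3!, +1/5!, -1/7!, +1/9!
    mulps   xmm0,xmm7         ; x^2
    mulps   xmm0,xmm7         ; x^3
    add     edx,16
    mulps   xmm0,[edx]        ; x^4 in 3 lanes
    mulps   xmm0,[edx]        ; x^5 in 3 lanes
    add     edx,16
    mulps   xmm0,[edx]        ; x^6 in 2 lanes
    mulps   xmm0,[edx]        ; x^7 in 2 lanes
    add     edx,16
    mulps   xmm0,[edx]        ; x^8 in 1 lane
    mulps   xmm0,[edx]        ; x^9 in 1 lane
    mulps   xmm0,xmm6         ; times the reciprocals of 3!,5!,7!,9!, with the right - or + signs, to prepare for haddps
    haddps  xmm0,xmm0         ; SSE3: pairwise sums
    haddps  xmm0,xmm0         ; sum of all 4 terms in every lane
    addss   xmm0,real4 ptr x  ; + x, the first Taylor term
    movss   sinex,xmm0
```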
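To check the lane bookkeeping, here is a portable C sketch of the same scheme for one x: the four multiplier rows are each applied twice, the lanes are scaled by -1/3!, +1/5!, -1/7!, +1/9!, summed (the job of the two haddps) and x is added as the first Taylor term. The function name `taylor_sin4` is hypothetical, and plain loops stand in for the SSE instructions.

```c
#include <assert.h>

static float taylor_sin4(float x)
{
    /* multiplier rows, as in the x3x5x7x9 table (3.0 replaced by x) */
    const float rows[4][4] = {
        { x,    x,    x,    x },
        { 1.0f, x,    x,    x },
        { 1.0f, 1.0f, x,    x },
        { 1.0f, 1.0f, 1.0f, x },
    };
    /* -1/3!, +1/5!, -1/7!, +1/9! */
    const float coef[4] = { -1.0f / 6.0f, 1.0f / 120.0f,
                            -1.0f / 5040.0f, 1.0f / 362880.0f };
    float acc[4] = { x, x, x, x };             /* movaps xmm0, x */
    for (int r = 0; r < 4; ++r)                /* each row is applied twice */
        for (int rep = 0; rep < 2; ++rep)
            for (int lane = 0; lane < 4; ++lane)
                acc[lane] *= rows[r][lane];    /* mulps */
    /* acc now holds x^3, x^5, x^7, x^9 */
    float sum = 0.0f;
    for (int lane = 0; lane < 4; ++lane)
        sum += acc[lane] * coef[lane];         /* mulps + horizontal add */
    return x + sum;                            /* + first Taylor term */
}
```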

#### Siekmanski

• Member
• Posts: 1749
##### Re: is there ways to shuffle this data?
« Reply #5 on: July 10, 2018, 04:43:14 AM »
We did some trig testing routines a while back on the forum.
A 9th-degree Chebyshev-Remez polynomial approximation came out as the most accurate (depending, of course, on the number of coefficients).
Four optimized constants give a maximum error of about 3.3381e-9 over -π/2 to +π/2.

Code:
```c
double fastsin2(double x)
{
    const double a3 = -1.666665709650470145824129400050267289858e-1;
    const double a5 = 8.333017291562218127986291618761571373087e-3;
    const double a7 = -1.980661520135080504411629636078917643846e-4;
    const double a9 = 2.600054767890361277123254766503271638682e-6;
    /* Horner form in x*x: x + x^3*(a3 + x^2*(a5 + x^2*(a7 + x^2*a9))) */
    return x + x*x*x * (a3 + x*x * (a5 + x*x * (a7 + x*x * a9)));
}
```
A routine to calculate 4 real4 sines at once (it could be rewritten to do 2 sines and 2 cosines at once),
and a routine to calculate 2 real8 sines at once:

http://masm32.com/board/index.php?topic=4118.msg49276#msg49276
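A portable C sketch of the four-real4-at-once idea, using the fastsin2 coefficients rounded to float and the same Horner scheme in every lane. It is not the routine behind the link, only an illustration of the lane-parallel structure; the name `fastsin4` and the input range -π/2..+π/2 are assumptions.

```c
#include <assert.h>

/* Every lane runs the same multiply/add sequence, as mulps/addps would. */
static void fastsin4(const float x[4], float out[4])
{
    const float a3 = -1.666665709650470145824129400050267289858e-1f;
    const float a5 = 8.333017291562218127986291618761571373087e-3f;
    const float a7 = -1.980661520135080504411629636078917643846e-4f;
    const float a9 = 2.600054767890361277123254766503271638682e-6f;
    for (int lane = 0; lane < 4; ++lane) {
        float x2 = x[lane] * x[lane];             /* x^2 */
        float p = a9;
        p = p * x2 + a7;                          /* Horner steps in x^2 */
        p = p * x2 + a5;
        p = p * x2 + a3;
        out[lane] = x[lane] + x[lane] * x2 * p;   /* x + x^3 * p */
    }
}
```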

#### daydreamer

• Member
• Posts: 640
• its time for a SIMD sphere
##### Re: is there ways to shuffle this data?
« Reply #6 on: July 11, 2018, 02:38:06 PM »
Check the asmc large integers and floats thread, about real16's:
http://masm32.com/board/index.php?topic=6454.15
« Last Edit: July 13, 2018, 03:27:34 AM by daydreamer »

#### daydreamer

• Member
• Posts: 640
• its time for a SIMD sphere
##### Re: is there ways to shuffle this data?
« Reply #7 on: July 14, 2018, 03:38:19 AM »
Wouldn't this be a good candidate for calculating pi, using 6*arcsin(0.5)?
With fixed point, getting the powers of 0.5 (0.5, 0.25, etc.) would be a simple shift instruction.
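The idea holds up as a sketch: the Maclaurin series arcsin x = Σ (2n)! / (4^n (n!)^2 (2n+1)) x^(2n+1) at x = 0.5 gives π/6, and in fixed point the extra factor 0.5^2 per term is a right shift by 2. The C sketch below uses unsigned Q2.62 fixed point in a uint64_t; the format, the name `pi_from_arcsin_half` and stopping when the term underflows to zero are choices made for illustration. Each term still needs one multiply and two divisions, so the shift only replaces the power computation.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 62   /* Q2.62: 2 integer bits, 62 fraction bits */

static double pi_from_arcsin_half(void)
{
    /* p = (2n)!/(4^n (n!)^2) * 0.5^(2n+1); for n = 0 that is 0.5 */
    uint64_t p = (uint64_t)1 << (FRAC_BITS - 1);
    uint64_t sum = 0;
    for (uint64_t n = 0; p != 0; ++n) {
        sum += p / (2 * n + 1);
        /* next term: times (2n+1)/(2n+2), and >> 2 for the extra 0.5^2 */
        p = (p * (2 * n + 1) / (2 * n + 2)) >> 2;
    }
    /* 6 * sum is about 3.14 * 2^62, which still fits in 64 bits */
    return (double)(6 * sum) / (double)((uint64_t)1 << FRAC_BITS);
}
```

With 62 fraction bits the loop stops after about 31 terms, since each term shrinks by at least a factor of 4.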
