### Topic: is there ways to shuffle this data?

#### daydreamer

• Member
• Posts: 651
• watch Chebyshev on the backside of the Moon
##### is there ways to shuffle this data?
« on: July 08, 2018, 05:18:25 PM »
Hi,
Is there a way to shuffle this data, or should a SIMD mask be used?
First I am going to shuffle x into all four floats (symbolized by 3.0 below); after that I need to mix 1.0s and x's.

Code:
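; multiplier rows for mulps to build x^3, x^5, x^7, x^9 (3.0 stands in for x)
x3x5x7x9    real4 3.0,3.0,3.0,3.0 ; x*x*x: two mulps with this row give x^3 in every lane
            real4 1.0,3.0,3.0,3.0 ; two mulps: lanes 1-3 -> x^5
            real4 1.0,1.0,3.0,3.0 ; two mulps: lanes 2-3 -> x^7
            real4 1.0,1.0,1.0,3.0 ; two mulps: lane 3 -> x^9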
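To make the intent of the table concrete, here is a small portable C sketch that models one xmm register as a plain four-float array (no SSE) and applies each multiplier row twice, starting from {x,x,x,x}. The helper name `odd_powers` is hypothetical.

```c
/* Hypothetical helper: models the four-lane mulps sequence with plain
   arrays. Each row is applied twice, starting from {x,x,x,x}. Row 0 is
   all x; later rows put 1.0 in the leading lanes so those lanes stop
   growing. Result: out = {x^3, x^5, x^7, x^9}. */
void odd_powers(float x, float out[4])
{
    const float rows[4][4] = {
        { x,    x,    x,    x },
        { 1.0f, x,    x,    x },
        { 1.0f, 1.0f, x,    x },
        { 1.0f, 1.0f, 1.0f, x },
    };

    for (int lane = 0; lane < 4; lane++)
        out[lane] = x;                          /* broadcast x to all lanes */

    for (int r = 0; r < 4; r++)
        for (int rep = 0; rep < 2; rep++)       /* each row is used twice */
            for (int lane = 0; lane < 4; lane++)
                out[lane] *= rows[r][lane];     /* one "mulps" */
}
```

For example, `odd_powers(2.0f, p)` leaves {8, 32, 128, 512} in `p`.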

Quote from Flashdance
Nick  :  When you give up your dream, you die.
*wears a flameproof asbestos suit*
I have no idea how to compare beauty of two real8 women with SSE
If you switch to C++, x86 means 086h, which is wrong cpu
So don't switch to C++   p:

#### Siekmanski

• Member
• Posts: 1754
##### Re: is there ways to shuffle this data?
« Reply #1 on: July 09, 2018, 07:05:35 AM »
Hi Magnus,

It's not clear what needs to be shuffled in what order.
Do you have an example from start positions to end positions?
And which need to be multiplied by 3, 5, 7 or 9?
Creative coders use backward thinking techniques as a strategy.

#### daydreamer

##### Re: is there ways to shuffle this data?
« Reply #2 on: July 09, 2018, 08:57:17 PM »
Quote from Siekmanski
Hi Magnus,

It's not clear what needs to be shuffled in what order.
Do you have an example from start positions to end positions?
And which need to be multiplied by 3 5 7 or 9?

Sorry, I shortened the comment too much. It should be x^3, x^5, x^7, x^9. The first step is to shuffle one x into 4 x's, then mulps until I have x^3,x^3,x^3,x^3 in one xmm register, then change the multiplier to 1.0,x,x,x and keep multiplying.
What is important is to get the result shown in the data section posted above.
I thought of a movups from a .data section holding 1.0,1.0,1.0,1.0,x,x,x,x; would that be a slow alternative?
Or an SSE2 shift?
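The movups idea works as a sliding window: with the eight floats {1,1,1,1,x,x,x,x} in memory, unaligned 16-byte loads starting 4, 3, 2 and 1 floats in give {x,x,x,x}, {1,x,x,x}, {1,1,x,x} and {1,1,1,x}. A minimal sketch with SSE intrinsics, assuming an x86 compiler that provides `<xmmintrin.h>`; the function name `load_rows` is hypothetical. As a general x86 caveat (not measured here), the main cost is a possible store-forwarding stall when the wide load overlaps x that was only just stored.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_set1_ps, _mm_storeu_ps */

/* Hypothetical helper: builds the four multiplier rows for x with unaligned
   loads from the window {1,1,1,1,x,x,x,x}. rows[r] receives 4 floats. */
void load_rows(float x, float rows[4][4])
{
    float window[8] = { 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f };

    /* fill the upper half with x: window = {1,1,1,1,x,x,x,x} */
    _mm_storeu_ps(window + 4, _mm_set1_ps(x));

    for (int r = 0; r < 4; r++) {
        /* start 4, 3, 2, 1 floats in: {x,x,x,x}, {1,x,x,x}, {1,1,x,x}, {1,1,1,x} */
        __m128 row = _mm_loadu_ps(window + 4 - r);
        _mm_storeu_ps(rows[r], row);
    }
}
```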

#### Siekmanski

##### Re: is there ways to shuffle this data?
« Reply #3 on: July 09, 2018, 10:20:09 PM »
Something like this?

Code:
.const
Multipliers real4 8.0, 8.0, 8.0, 8.0 ; ^3

.data
YourData   ???????

.code
movaps      xmm0,oword ptr Multipliers

movaps      xmm1,oword ptr YourData
pshufd      xmm2,xmm1,???? ; shuffle your data into place
mulps       xmm2,xmm0      ; result

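addps       xmm0,xmm0      ; ^5 ( update multipliers from ^3 to ^5: 8.0 -> 32.0 )
addps       xmm0,xmm0      ; doubling twice = times 4; pslld would shift the raw float bits, not scale the value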
; repeat steps with next set of data
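Two details of the sketch above can be checked in C with SSE2 intrinsics. If the goal of the shuffle is to copy lane 0 into every lane (an assumption; the immediate is left open above), an immediate of 0 does it. And because pslld shifts the raw IEEE-754 bits, scaling float multipliers by 4 needs either float arithmetic or adding 2 to the exponent field with paddd. The names `broadcast_lane0`, `times4_via_exponent` and `shift_bits_left2` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 (includes the SSE header for _mm_shuffle_ps) */

/* Hypothetical helper: copy lane 0 into all four lanes
   (same effect as pshufd xmm, xmm, 0 or shufps xmm, xmm, 0). */
__m128 broadcast_lane0(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}

/* Hypothetical helper: multiply four normal floats by 4.0 by adding 2 to
   the 8-bit exponent field (bits 23..30). Only valid while the result is
   a normal number; zeros, denormals, inf and NaN are not handled. */
__m128 times4_via_exponent(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_add_epi32(bits, _mm_set1_epi32(2 << 23));  /* paddd */
    return _mm_castsi128_ps(bits);
}

/* For comparison: what pslld xmm, 2 does to the same float bits. */
__m128 shift_bits_left2(__m128 v)
{
    return _mm_castsi128_ps(_mm_slli_epi32(_mm_castps_si128(v), 2));
}
```

Applied to 8.0, `times4_via_exponent` gives 32.0, while `shift_bits_left2` turns the bit pattern 0x41000000 into 0x04000000, which is 2^-119 (about 1.5e-36).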

#### daydreamer

##### Re: is there ways to shuffle this data?
« Reply #4 on: July 10, 2018, 03:25:08 AM »
Many years ago I saw someone make a fast integer sqrt; maybe the opposite should be possible?
What about a 32-bit SHIFT combined with OR 1.0,0,0,0?
I almost never use shuffles.
This is what I have come up with so far. I am making a sine Taylor series.
Remember to use radians for x.
Code:
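.code
start:

lea ebx,fconstant       ; assumed: real4 -1/3!, +1/5!, -1/7!, +1/9! (signs included)
lea edx,x3x5x7x9        ; four 16-byte multiplier rows

movaps xmm0,x           ; {x,x,x,x}
movaps xmm7,[edx]       ; row 0 {x,x,x,x}
movaps xmm6,[ebx]
mulps xmm0,xmm7         ; x^2 in all lanes
mulps xmm0,xmm7         ; x^3 in all lanes
mulps xmm0,[edx+16]     ; row 1 {1,x,x,x}: lanes 1-3 -> x^4
mulps xmm0,[edx+16]     ; lanes 1-3 -> x^5
mulps xmm0,[edx+32]     ; row 2 {1,1,x,x}: lanes 2-3 -> x^6
mulps xmm0,[edx+32]     ; lanes 2-3 -> x^7
mulps xmm0,[edx+48]     ; row 3 {1,1,1,x}: lane 3 -> x^8
mulps xmm0,[edx+48]     ; xmm0 = {x^3, x^5, x^7, x^9}
mulps xmm0,xmm6         ; times signed reciprocals of 3!, 5!, 7!, 9!
haddps xmm0,xmm0        ; SSE3 horizontal add: pairwise sums
haddps xmm0,xmm0        ; lane 0 = sum of the four terms
addss xmm0,real4 ptr x  ; + x (first Taylor term)
movss sinex,xmm0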
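The SHIFT-plus-OR idea can build each row from the previous one without a table: pslldq by 4 bytes moves every float up one lane and zeroes lane 0, and OR-ing in the bit pattern {1.0, 0, 0, 0} puts 1.0 there. Below is a sketch of the whole Taylor step with SSE2 intrinsics, assuming x is in radians and roughly within -pi/2..+pi/2. It sums the four terms through memory instead of haddps so only SSE2 is needed. `taylor_sin9` is a hypothetical name, and the coefficients are the plain Taylor ones.

```c
#include <emmintrin.h>  /* SSE2: _mm_slli_si128 (pslldq), casts, float ops */

/* Hypothetical sketch: sin(x) ~ x - x^3/3! + x^5/5! - x^7/7! + x^9/9!,
   with the four odd powers built in one register. */
float taylor_sin9(float x)
{
    const __m128 one_lo = _mm_setr_ps(1.0f, 0.0f, 0.0f, 0.0f);
    const __m128 coef   = _mm_setr_ps(-1.0f / 6.0f, 1.0f / 120.0f,
                                      -1.0f / 5040.0f, 1.0f / 362880.0f);

    __m128 row = _mm_set1_ps(x);        /* {x, x, x, x} */
    __m128 acc = row;

    acc = _mm_mul_ps(acc, row);
    acc = _mm_mul_ps(acc, row);         /* {x^3, x^3, x^3, x^3} */

    for (int i = 0; i < 3; i++) {
        /* shift up one float (pslldq 4), then OR 1.0 into lane 0:
           {x,x,x,x} -> {1,x,x,x} -> {1,1,x,x} -> {1,1,1,x} */
        row = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(row), 4));
        row = _mm_or_ps(row, one_lo);
        acc = _mm_mul_ps(acc, row);
        acc = _mm_mul_ps(acc, row);
    }                                   /* acc = {x^3, x^5, x^7, x^9} */

    acc = _mm_mul_ps(acc, coef);        /* apply the signed reciprocals */

    float t[4];
    _mm_storeu_ps(t, acc);
    return x + ((t[0] + t[1]) + (t[2] + t[3]));
}
```

Near the ends of the range the truncated series itself limits accuracy: the first omitted term, x^11/11!, is about 2e-6 at x = 1.5.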


#### Siekmanski

##### Re: is there ways to shuffle this data?
« Reply #5 on: July 10, 2018, 04:43:14 AM »
We did some trig testing routines on the forum a while back.
A 9th-degree polynomial fitted with the Chebyshev/Remez method came out as the most accurate (depending on the number of coefficients, of course).
With 4 optimized constants it gives a maximum error of about 3.3381e-9 over -pi/2 to +pi/2.

Code:
double fastsin2(double x)
{
const double a3 = -1.666665709650470145824129400050267289858e-1;
const double a5 = 8.333017291562218127986291618761571373087e-3;
const double a7 = -1.980661520135080504411629636078917643846e-4;
const double a9 = 2.600054767890361277123254766503271638682e-6;

return x + x*x*x * (a3 + x*x * (a5 + x*x * (a7 + x*x * a9)));
}
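One way to sanity-check the quoted error bound is to sample fastsin2 across -pi/2..+pi/2 against a reference sine. This is a hedged sketch: the reference is a long Taylor series (accurate to about double precision on that interval, so no libm is needed), the sample count is arbitrary, and sampling only approximates the true maximum. `max_abs_error` and `ref_sin` are hypothetical names; fastsin2 is copied from above with balanced parentheses.

```c
static double fastsin2(double x)
{
    const double a3 = -1.666665709650470145824129400050267289858e-1;
    const double a5 =  8.333017291562218127986291618761571373087e-3;
    const double a7 = -1.980661520135080504411629636078917643846e-4;
    const double a9 =  2.600054767890361277123254766503271638682e-6;

    double x2 = x * x;
    return x + x * x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * a9)));
}

/* Reference: 20 Taylor terms; term n = term n-1 * -x^2 / ((2n)(2n+1)). */
static double ref_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 20; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Hypothetical helper: largest |fastsin2(x) - ref_sin(x)| over
   samples + 1 evenly spaced points in [-pi/2, +pi/2]. */
double max_abs_error(int samples)
{
    const double half_pi = 1.57079632679489661923;
    double worst = 0.0;
    for (int i = 0; i <= samples; i++) {
        double x = -half_pi + 2.0 * half_pi * i / samples;
        double err = fastsin2(x) - ref_sin(x);
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

If the posted constants and bound are right, `max_abs_error(20000)` should report something close to 3.3381e-9.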

A routine to calculate 4 real4 sines at once (it could be rewritten to do 2 sines and 2 cosines at once),
and a routine to calculate 2 real8 sines at once:

http://masm32.com/board/index.php?topic=4118.msg49276#msg49276
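This is not the routine from that thread, only a minimal sketch of the same idea: the four-coefficient polynomial evaluated on four real4 lanes with SSE intrinsics, in Horner form in x^2. It assumes every input is already reduced to -pi/2..+pi/2, and it rounds the fastsin2 constants to float, so single precision (not the 3.3381e-9 polynomial error) limits accuracy. `sin4_ps` is a hypothetical name.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_mul_ps, _mm_add_ps */

/* Hypothetical sketch: four sines at once, x in [-pi/2, +pi/2] per lane. */
__m128 sin4_ps(__m128 x)
{
    const __m128 a3 = _mm_set1_ps(-1.666665709650470145824129400050267289858e-1f);
    const __m128 a5 = _mm_set1_ps( 8.333017291562218127986291618761571373087e-3f);
    const __m128 a7 = _mm_set1_ps(-1.980661520135080504411629636078917643846e-4f);
    const __m128 a9 = _mm_set1_ps( 2.600054767890361277123254766503271638682e-6f);

    __m128 x2 = _mm_mul_ps(x, x);
    __m128 p  = _mm_add_ps(a7, _mm_mul_ps(x2, a9));  /* a7 + x2*a9 */
    p = _mm_add_ps(a5, _mm_mul_ps(x2, p));            /* a5 + x2*(...) */
    p = _mm_add_ps(a3, _mm_mul_ps(x2, p));            /* a3 + x2*(...) */
    p = _mm_mul_ps(_mm_mul_ps(x, x2), p);             /* x^3 * (...) */
    return _mm_add_ps(x, p);                          /* x + x^3*(...) */
}
```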

#### daydreamer

##### Re: is there ways to shuffle this data?
« Reply #6 on: July 11, 2018, 02:38:06 PM »
Check the asmc large integers and floats thread, about real16s:
http://masm32.com/board/index.php?topic=6454.15
« Last Edit: July 13, 2018, 03:27:34 AM by daydreamer »

#### daydreamer

##### Re: is there ways to shuffle this data?
« Reply #7 on: July 14, 2018, 03:38:19 AM »
Wouldn't it be a good candidate for calculating pi, using pi = 6*arcsin(0.5)?
With fixed point, getting the powers of 0.5 (0.5, 0.25, etc.) would be a simple shift instruction, wouldn't it?
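A hedged sketch of that idea in C, using the Maclaurin series arcsin(x) = sum over n >= 0 of c_n * x^(2n+1) / (2n+1), with c_n = (2n)! / (4^n (n!)^2). With x = 0.5, each new power is the previous one times 0.25, which in binary floating point only changes the exponent (the counterpart of the fixed-point shift by 2). Terms shrink by roughly a factor of 4, so about 25 to 30 terms reach double precision. `pi_from_arcsin_half` is a hypothetical name.

```c
/* Hypothetical sketch: pi = 6 * arcsin(1/2) via the Maclaurin series
   arcsin(x) = sum_{n>=0} c_n * x^(2n+1) / (2n+1),
   c_n = (2n)! / (4^n (n!)^2), computed as c_n = c_{n-1} * (2n-1) / (2n). */
double pi_from_arcsin_half(int terms)
{
    double c = 1.0;      /* c_0 */
    double xpow = 0.5;   /* x^1 with x = 1/2 */
    double sum = 0.0;

    for (int n = 0; n < terms; n++) {
        if (n > 0) {
            c *= (2.0 * n - 1.0) / (2.0 * n);
            xpow *= 0.25;  /* x^(2n+1) = 2^-(2n+1): exact, exponent-only change */
        }
        sum += c * xpow / (2.0 * n + 1.0);
    }
    return 6.0 * sum;
}
```

With one term it returns exactly 3.0; with 30 terms it agrees with pi to roughly double precision. A fixed-point version would replace `xpow *= 0.25` with a right shift by 2.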
