is there ways to shuffle this data?

Started by daydreamer, July 08, 2018, 05:18:25 PM

daydreamer

Hi,
Is there a way to shuffle this data, or should a SIMD mask be used instead?
First I am going to shuffle x into all four floats (symbolized here by 3.0); after that I need to mix 1.0's and x's.

            ; multiplier pattern for mulps to reach x^3,x^5,x^7,x^9 (x shown as 3.0)
x3x5x7x9    real4 3.0,3.0,3.0,3.0 ; x*x*x: mulps,mulps = x^3
            real4 1.0,3.0,3.0,3.0
            real4 1.0,1.0,3.0,3.0
            real4 1.0,1.0,1.0,3.0


my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

Siekmanski

Hi Magnus,

It's not clear what needs to be shuffled in what order.
Do you have an example from start positions to end positions?
And which need to be multiplied by 3 5 7 or 9?
Creative coders use backward thinking techniques as a strategy.

daydreamer

Quote from: Siekmanski on July 09, 2018, 07:05:35 AM
Hi Magnus,

It's not clear what needs to be shuffled in what order.
Do you have an example from start positions to end positions?
And which need to be multiplied by 3 5 7 or 9?
Sorry, I shortened the comment too much. It should be x^3, x^5, x^7, x^9. The first step is to shuffle one x into four x's, then mulps until I have x^3, x^3, x^3, x^3 in one xmm register, then change the multiplier to 1.0, x, x, x and keep multiplying.
What is important is to end up with the result in the pattern of the data section posted above.
I thought of using movups on a .data section holding 1.0, 1.0, 1.0, 1.0, x, x, x, x, but maybe that would be a slow alternative?
Or an SSE2 shift?
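
To make the intended sequence concrete, here is a minimal scalar C model of the multiply pattern described above. It is not SSE code: each `vec4` stands in for one xmm register and `mul4` stands in for mulps. The names `vec4`, `mul4`, `mask_row` and `odd_powers` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of one xmm register: four real4 lanes, lane 0 first. */
typedef struct { float lane[4]; } vec4;

/* Models mulps: lane-wise multiply. */
static vec4 mul4(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Multiplier row k: the first k lanes are 1.0, the rest are x.
   k = 0 gives x,x,x,x and k = 3 gives 1,1,1,x (the rows of x3x5x7x9). */
static vec4 mask_row(float x, size_t k) {
    assert(k < 4);
    vec4 r;
    for (size_t i = 0; i < 4; ++i) r.lane[i] = (i < k) ? 1.0f : x;
    return r;
}

/* Returns x^3, x^5, x^7, x^9 in lanes 0..3. */
vec4 odd_powers(float x) {
    vec4 acc = mask_row(x, 0);      /* start with x in every lane */
    for (size_t k = 0; k < 4; ++k) {
        vec4 m = mask_row(x, k);
        acc = mul4(acc, m);         /* two multiplies per row, */
        acc = mul4(acc, m);         /* so masked-in lanes gain x^2 */
    }
    return acc;
}
```

Calling `odd_powers(2.0f)` leaves 8, 32, 128, 512 in the four lanes. In real SSE code the rows could probably be built in registers rather than loaded, for example by copying 1.0 into the low lane with the register form of movss and widening the run of 1.0's with shufps, but that part is only a suggestion and is not modeled here.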

Siekmanski

Something like this?


.const
Multipliers real4 8.0, 8.0, 8.0, 8.0 ; ^3

.data
YourData   ???????

.code
    movaps      xmm0,oword ptr Multipliers

    movaps      xmm1,oword ptr YourData
    pshufd      xmm2,xmm1,???? ; shuffle your data into place
    mulps       xmm2,xmm0      ; result

    addps       xmm0,xmm0
    addps       xmm0,xmm0  ; ^5   ( update multipliers from ^3 to ^5: 8.0 * 4 = 32.0; pslld would scramble real4 bit patterns )
    ; repeat steps with next set of data
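
A note on integer shifts applied to real4 data: shifting the raw bits of a float does not multiply it by a power of two, because the exponent and mantissa fields move together. Adding to the exponent field does. The C sketch below shows both on a single lane; `shift_bits_left` and `scale_by_pow2` are hypothetical helper names, and `scale_by_pow2` assumes a normal, finite, non-zero input whose result does not overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What an integer left shift does to one real4 lane: it shifts the raw bits. */
float shift_bits_left(float v, unsigned n) {
    assert(n < 32);
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);   /* reinterpret without aliasing problems */
    bits <<= n;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Multiplies v by 2^n by adding n to the biased exponent (bits 23..30).
   Assumption: v is normal, finite and non-zero, and the result stays in range. */
float scale_by_pow2(float v, unsigned n) {
    assert(n < 32);
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits += (uint32_t)n << 23;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

With 8.0 (bit pattern 41000000h), `scale_by_pow2(8.0f, 2)` gives 32.0, while `shift_bits_left(8.0f, 2)` gives the pattern 04000000h, a tiny positive number. The SIMD counterpart of the exponent trick would be a paddd with n shl 23 in each lane.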

daydreamer

Many years ago I saw someone make a fast integer sqrt; maybe the opposite should be possible?
What about shifting the register by 32 bits, combined with an OR of 1.0,0,0,0?
I almost never use shuffles.
This is what I have come up with so far. I am making a sine Taylor series.
Remember that x is in radians.

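.code
    start:

    lea ebx,fconstant
    add ebx,16
    lea edx,x3x5x7x9

    movaps xmm0,x           ; x,x,x,x
    movaps xmm7,[edx]       ; multiplier row x,x,x,x
    movaps xmm6,[ebx]       ; signed reciprocals of 3!,5!,7!,9!
    mulps xmm0,xmm7         ; x^2 in all 4 lanes
    mulps xmm0,xmm7         ; x^3 in all 4 lanes
    add edx,16              ; next row: 1.0,x,x,x
    mulps xmm0,[edx]        ; x^4 in 3 lanes
    mulps xmm0,[edx]        ; x^5 in 3 lanes
    add edx,16              ; next row: 1.0,1.0,x,x
    mulps xmm0,[edx]        ; x^6 in 2 lanes
    mulps xmm0,[edx]        ; x^7 in 2 lanes
    add edx,16              ; next row: 1.0,1.0,1.0,x
    mulps xmm0,[edx]        ; x^8 in 1 lane
    mulps xmm0,[edx]        ; x^9 in 1 lane
    mulps xmm0,xmm6         ; times the reciprocals of 3!,5!,7!,9!, with the right - or + signs, to prepare for haddps
    haddps xmm0,xmm0        ; SSE3: pairwise sums
    haddps xmm0,xmm0        ; sum of all 4 terms in every lane
    movss sinex,xmm0        ; the leading x term of the series still has to be added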
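
For checking the assembly against known values, the same Taylor sum can be sketched in scalar C. This is only a reference model under stated assumptions: `taylor_sin9` and `signed_recip` are names invented here, the array stands in for the fconstant row, and the input is assumed to lie roughly within -pi/2 to +pi/2, where a series cut off at x^9 is still accurate.

```c
#include <assert.h>

/* Signed reciprocals of 3!, 5!, 7!, 9!: the role the fconstant row plays above. */
static const float signed_recip[4] = {
    -1.0f / 6.0f, 1.0f / 120.0f, -1.0f / 5040.0f, 1.0f / 362880.0f
};

/* sin(x) ~ x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, x in radians. */
float taylor_sin9(float x) {
    assert(x >= -1.6f && x <= 1.6f);   /* assumed working range */
    float x2 = x * x;
    float term[4];
    term[0] = x * x2;                  /* x^3 */
    term[1] = term[0] * x2;            /* x^5 */
    term[2] = term[1] * x2;            /* x^7 */
    term[3] = term[2] * x2;            /* x^9 */
    for (int i = 0; i < 4; ++i)
        term[i] *= signed_recip[i];    /* the final mulps */
    float lo = term[0] + term[1];      /* first horizontal add */
    float hi = term[2] + term[3];
    return x + (lo + hi);              /* second horizontal add, plus the linear term */
}
```

For example, `taylor_sin9(0.5f)` should come out close to 0.4794255, the value of sin(0.5).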



Siekmanski

We did some trig testing routines a while back on the forum.
The Chebyshev/Remez approximation with a 9th-degree polynomial came out as the most accurate (this depends on the number of coefficients, of course).
4 optimized constants give a maximum error of about 3.3381e-9 over -pi/2 to +pi/2.


double fastsin2(double x)
{
    const double a3 = -1.666665709650470145824129400050267289858e-1;
    const double a5 = 8.333017291562218127986291618761571373087e-3;
    const double a7 = -1.980661520135080504411629636078917643846e-4;
    const double a9 = 2.600054767890361277123254766503271638682e-6;

    return x + x*x*x * (a3 + x*x * (a5 + x*x * (a7 + x*x * a9)));
}


There is a routine that calculates 4 real4 sines at once (it could be rewritten to do 2 sines and 2 cosines at once), and a routine that calculates 2 real8 values at once:

http://masm32.com/board/index.php?topic=4118.msg49276#msg49276
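
The assembly routines themselves are in the linked thread. As a rough C sketch of the same idea (not the code from that thread), the polynomial from fastsin2 can be applied to a small array of real4 values in a loop that a compiler is free to vectorize. `fastsin_array` is a hypothetical name, the coefficients are the ones from fastsin2 rounded to float, and each input is assumed to lie within -pi/2 to +pi/2.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients copied from fastsin2 above, rounded to float. */
static const float A3 = -1.666665709650470e-1f;
static const float A5 =  8.333017291562218e-3f;
static const float A7 = -1.980661520135080e-4f;
static const float A9 =  2.600054767890361e-6f;

/* Evaluates the 9th-degree polynomial for n inputs (radians).
   Assumption: every in[i] lies within -pi/2 .. +pi/2. */
void fastsin_array(const float *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float x  = in[i];
        float x2 = x * x;
        /* Horner form: one multiply and one add per coefficient */
        float p  = A3 + x2 * (A5 + x2 * (A7 + x2 * A9));
        out[i]   = x + x * x2 * p;
    }
}
```

With n = 4 this matches the shape of a 4-wide SSE routine: every step of the Horner chain is the same operation on all four values, so it maps directly onto mulps and addps.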

daydreamer

Thanks for the link, Marinus.
Check the asmc thread on large integers and floats, about real16's:
http://masm32.com/board/index.php?topic=6454.15

daydreamer

Wouldn't this be a good candidate for calculating pi, using 6*arcsin(0.5)?
With fixed point it would be a simple shift instruction to get the powers of 0.5, 0.25, etc.
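
A minimal C sketch of that idea, assuming the usual power series arcsin(x) = x + (1/2)x^3/3 + (1*3)/(2*4)x^5/5 + ... and a 64-bit fixed-point format with 60 fraction bits. Since x = 0.5, each step multiplies by x^2 = 0.25, which is a right shift by two bits. `pi_from_arcsin_half` and `FRAC_BITS` are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 60   /* fixed-point format: 4 integer bits, 60 fraction bits */

/* pi = 6 * arcsin(0.5), summing the first `terms` terms of the series. */
double pi_from_arcsin_half(int terms) {
    assert(terms > 0 && terms <= 40);
    /* p holds c_n * x^(2n+1); for n = 0 that is 0.5 */
    uint64_t p = (uint64_t)1 << (FRAC_BITS - 1);
    uint64_t sum = p;                               /* n = 0 term: 0.5 / 1 */
    for (int n = 1; n < terms; ++n) {
        p >>= 2;                                    /* times x^2 = 0.25: just a shift */
        p = p * (uint64_t)(2 * n - 1) / (uint64_t)(2 * n);  /* c_n = c_(n-1)*(2n-1)/(2n) */
        sum += p / (uint64_t)(2 * n + 1);           /* add c_n * x^(2n+1) / (2n+1) */
    }
    /* convert back from fixed point and scale by 6 */
    return 6.0 * (double)sum / (double)((uint64_t)1 << FRAC_BITS);
}
```

Each term is roughly a quarter of the previous one, so about 30 terms already exhaust the 60 fraction bits. The divisions by 2n and 2n+1 are still real divisions; only the powers of 0.5 reduce to shifts.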
