Equivalence angle conversion in SSE2

guga · July 19, 2023, 05:15:28 PM

Hi guys

I'm trying to create a function to convert any angle to its equivalent angle within one turn. For example, 870.41 degrees is the same as 150.41º.

I'm using only SSE2 for this. For small numbers my routine works, but for large numbers it fails miserably.

Code Select

; RosAsm syntax

; variables
[Float_Two_Pi_INV: R$ (1/6.2831853071795864769252867665590057683943387987502116419)] ; R$ = Real8 number
[Float_Two_Pi: R$ 6.2831853071795864769252867665590057683943387987502116419]

    movupd xmm1 xmm0 ; number in xmm0
    movupd xmm2 X$Float_Two_Pi_INV ; multiply by 1/(2*pi). This is the same as dividing the angle by 360º, but in radians, since the value passed in xmm0 is in radians.
    mulpd xmm1 xmm2

    ; calculate the floor of the number in xmm1. xmm1 keeps the leftover (the fractional part) and xmm2 holds the integer part. I actually created a macro for this, but these are the instructions it expands to
    CVTTPD2DQ XMM2 XMM1
    CVTDQ2PD XMM2 XMM2
    SUBSD XMM1 XMM2

    movupd xmm1 X$Float_Two_Pi ; multiply the integer part by 360º (2*pi)
    mulpd xmm2 xmm1
    subpd xmm0 xmm2 ; the converted number is stored in xmm0

The math is simple: divide by 360º, take the floor of the result, subtract it, and multiply the remaining fraction by 360º.
Ex:

Degrees = 870.41 is equivalent to 150.41º
Step1) 870.41/360 = 2.4178055555555555555555555555555555555555555555555555555555555555
Step2) 2.4178055555555555... - 2 (its floor) => 0.4178055555555555555....
Step3) 0.4178055555555555555 * 360 = 150.41

Other example:
Degrees = 33e17 is equivalent to 240º
Step1) 33e17/360 = 9.16666666666666666666666666e15 = 9166666666666666.66666666666666666
Step2) 9166666666666666.66666666666 - 9166666666666666 (its floor) => 0.666666666666666666666
Step3) 0.666666666666666666666 * 360 = 240

So for a small number like 870.41 the routine works, but for the larger number it fails badly.
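
The three steps above can be sketched in a higher-level language to see where they break down. The Python below is illustrative only (Python floats are IEEE 754 binary64, the same format as a Real8), and the name `naive_reduce_degrees` is made up for this example. It compares the divide/floor/multiply recipe with `math.fmod`, which computes the remainder exactly.

```python
import math

def naive_reduce_degrees(x):
    """Divide by 360, drop the integer part, multiply back (the three steps above)."""
    q = x / 360.0              # step 1: number of full turns
    frac = q - math.floor(q)   # step 2: keep only the fractional turn
    return frac * 360.0        # step 3: back to degrees

small = naive_reduce_degrees(870.41)
large = naive_reduce_degrees(33e17)

print(small)                    # close to 150.41
print(large)                    # 0.0: the quotient has no fraction bits left
print(math.fmod(33e17, 360.0))  # 240.0: fmod works on the exact remainder
```

For 33e17 the quotient is about 9.17e15, which is above 2^53, so a Real8 can only hold it as an even integer and step 2 has nothing left to extract.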

I ported a modf routine from msvcrt that also uses SSE2, but it also fails miserably.

What am I doing wrong?

I tested both values in WolframAlpha to make sure the results are right, but I can't make it work for larger numbers such as 33e17, 12.56e100, etc.

daydreamer · July 19, 2023, 08:15:11 PM

Quote from: guga on July 19, 2023, 05:15:28 PM
I tested both values in WolframAlpha to make sure the results are right, but I can't make it work for larger numbers such as 33e17, 12.56e100, etc.

Does it overflow with such large numbers? Are the numbers too big for REAL8s?

jj2007 · July 19, 2023, 08:25:33 PM

Quote from: daydreamer on July 19, 2023, 08:15:11 PM
Does it overflow with such large numbers? Are the numbers too big for REAL8s?

Works fine with the FPU, btw:

Code Select

include \masm32\MasmBasic\MasmBasic.inc
  Init
  push 360
  FpuSet MbDown64
  Let esi="33.0e17"
  .While 1
	Let esi=Input$("Your value: ", esi)
	.Break .if Len(esi)==0	; quit if the string is empty
	MovVal ST(0), esi
	Print Str$("%5e = ", ST(0))
	fidiv stack		; /360
	fld st
	frndint			; x.123-x
	fsub
	fimul stack		; *360
	Print Str$("%4f\n", ST(0)v)	; print & pop ST
  .Endw
  pop edx
EndOfCode

Your value: 33.0e17
3.3000e+18 = 239.8
Your value: 870.41
8.7041e+02 = 150.4

HSE · July 19, 2023, 11:03:23 PM

Hi Guga!

Quote from: guga on July 19, 2023, 05:15:28 PM
I'm using only SSE2 for this. For small numbers my routine works, but for large numbers it fails miserably.

SSE only works with REAL8 at most. As JJ says, the FPU is better here, because internally it works with REAL10.
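
To put numbers on the REAL8 versus REAL10 difference: for 33e17 the quotient x/360 is about 9.17e15, which lies between 2^53 and 2^54. The Python sketch below (illustrative; `spacing` is a hypothetical helper, not part of any library) computes the gap between neighbouring representable values at that magnitude for a 53-bit significand (REAL8) and a 64-bit significand (REAL10).

```python
import math

def spacing(value, significand_bits):
    """Gap between adjacent representable numbers near `value`
    for a binary format with the given significand width."""
    _, exponent = math.frexp(value)   # value = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - significand_bits)

quotient = 33e17 / 360.0              # about 9.17e15 full turns

real8_gap = spacing(quotient, 53)     # REAL8: 53-bit significand
real10_gap = spacing(quotient, 64)    # REAL10: 64-bit significand

print(real8_gap)           # 2.0 turns: the fraction is lost entirely
print(real10_gap)          # 0.0009765625 turns
print(real10_gap * 360.0)  # 0.3515625 degrees of granularity
```

A granularity of roughly 0.35º is consistent with the FPU run above printing 239.8 rather than exactly 240.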

You can probably gain precision by using double-double arithmetic. I have only tested that with the FPU, but it should work with SSE too (with less precision, obviously).

I will try that later. I don't know much about SSE, but perhaps it is possible to obtain a disassembly of these few operations.

jj2007 · July 19, 2023, 11:24:35 PM

Attached the SSE version of my test program. It requires SSE 4.1.

Caché GB · July 20, 2023, 01:35:40 AM

JJ is on the ball.

Code Select

Hi guga
Is this what the modf routine that you ported from msvcrt, that also uses SSE2, looks like?

ModulusDouble proc

         movapd  xmm2, xmm0
          divsd  xmm2, xmm1
      cvttsd2si  ecx, xmm2
           movd  eax, xmm2
            shr  eax, 31
            sub  ecx, eax
       cvtsi2sd  xmm2, ecx
          mulsd  xmm1, xmm2
          subsd  xmm0, xmm1
            ret

ModulusDouble endp

;#############################################################################################################

JJs_on_the_ball proc
 
       ; movlps  xmm0, Res8
          mulsd  xmm0, FLT8(0.002777777777777777777777778)
         movaps  xmm1, xmm0
        roundsd  xmm1, xmm0, 17              ; if const > 17 = fail
          subsd  xmm0, xmm1
          mulsd  xmm0, FLT8(360.0)
            ret

JJs_on_the_ball endp

;#############################################################################################################

Test_The_Thing proc

      local  ModulusJJ:real8
      local  Modulus:real8

          movsd  xmm1, FLT8(360.0)

        ; movsd  xmm0, FLT8(8.7041e+9)    ; good
          movsd  xmm0, FLT8(1.2345e+11)    ; good

        ; movsd  xmm0, FLT8(1.2345e+12)    ; fail
        ; movsd  xmm0, FLT8(3.3000e+18)    ; fail
         invoke  ModulusDouble
          movsd  Modulus, xmm0            ; Modulus = 240.00000000000000 double

            nop

       ; movlps  xmm0, FLT8(1.2345e10)  ; good
         movlps  xmm0, FLT8(1.2345e11)  ; good
       ; movlps  xmm0, FLT8(1.2345e18)  ; fail
         invoke  JJs_on_the_ball
          movsd  ModulusJJ, xmm0        ; ModulusJJ = 240.00000715255737 double

            ret

Test_The_Thing endp

Quote: https://softpixel.com/~cwright/programming/simd/sse.php

SSE — MXCSR
The MXCSR register is a 32-bit register containing flags for control and status information regarding SSE instructions.
As of SSE3, only bits 0-15 have been defined.

I can't find this MXCSR register to see what is what. Maybe daydreamer is right with the Overflow. Who knows.

OE - bit 3 - Overflow Flag
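
Another possible explanation for the good/fail boundary in the test values above, offered as a sketch and not as a diagnosis: `cvttsd2si` with a 32-bit destination can only produce values in the int32 range, and returns 80000000h (the "integer indefinite" value) for anything outside it. With a divisor of 360 that limit is reached at 2^31 × 360, about 7.73e11, which sits between the 1.2345e11 (good) and 1.2345e12 (fail) cases. The Python emulation below is hypothetical: `cvttsd2si32` and `modulus_double` are made-up names, and the sign adjustment of the original routine is left out, so it only covers non-negative inputs.

```python
import math

INT32_MIN = -2**31

def cvttsd2si32(x):
    """Emulate cvttsd2si with a 32-bit destination: truncate toward zero,
    or return the 'integer indefinite' value 0x80000000 when out of range."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    n = math.trunc(x)
    return n if -2**31 <= n < 2**31 else INT32_MIN

def modulus_double(x, y):
    """x - trunc(x / y) * y, with the quotient squeezed through an int32."""
    n = cvttsd2si32(x / y)
    return x - float(n) * y

print(2**31 * 360)                       # 773094113280: inputs at or above this overflow the quotient
print(modulus_double(1.2345e11, 360.0))  # 240.0
print(modulus_double(1.2345e12, 360.0))  # far outside 0..360
```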

HSE · July 20, 2023, 02:42:40 AM

Hi Guga,

I can easily make the double double precision math with FPU...

Code Select

    fSlvRR dd1 = 870.41

    fSlvRR dd1 = dd1/rr360

    fSlvRR dd1 = dd1 - trunc(dd1)

    fSlvRR dd1 = dd1 * rr360

Code Select

  dd1  rr  <8.70409999999999970e+002, 3.18323145620524910e-014>

  first step  =  2.4178055555555555555555555555555

  second step =  0.41780555555555555555555555555557

  third step  =  150.41000000000000000000000000000

Press any key to continue...

but apparently I never tested the parser with exponents of more than one digit, so 33.0e17 is not loaded.

Without the parser this is not going to be useful for now. I have to look into that.

Quote from: guga on July 19, 2023, 05:15:28 PM
I tested both values in WolframAlpha to make sure the results are right

If you have Win10, just use the Calculator (which has quadruple precision in scientific mode).

Regards, HSE.

guga · July 20, 2023, 04:33:20 AM

OK, guys, thanks.

But there is a problem. The number is already loaded in the xmm0 register. If I use JJ's method (FPU), how do I get the number back out as it was input? I mean, in SSE2, how can I store xmm0 to a Real8 variable that fld can then load?

To make it work, I modified JJ's code as follows:

Code Select


; JJ's code as an FPU-only routine. I mean, the input is a Real8 loaded directly into the FPU

[GugaStartVal: R$ 33e17]
[GugaTmpValue: T$ 0] ; To store it back onto a TenByte data
[Float_360: D$ 360]

___________________________

        finit
        fld R$GugaStartVal | fstp T$GugaTmpValue | fld T$GugaTmpValue
        fidiv D$Float_360
        fld ST0
        frndint
        fsubp ST1 ST0
        fld1
        faddp ST1 ST0
        fimul D$Float_360
___________________________

Btw, a question about your code: why use fidiv rather than a multiplication by the inverse, like fmul R$(1/360)? I tried with fmul, but the resulting value was incorrect, while with fidiv it was OK.
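
On the fidiv versus fmul question, one likely factor (stated here as an assumption, not something verified against the code above) is that 360 is exactly representable while 1/360 is not. A Real8 constant for 1/360 carries a relative rounding error of up to 2^-53, and multiplying it by an input whose quotient is itself around 2^53 can turn that into an error of up to about one whole turn. The Python sketch below measures the error with exact rational arithmetic.

```python
from fractions import Fraction

inv_360 = 1.0 / 360.0   # what a Real8 constant for 1/360 holds

# Exact relative error of the stored reciprocal (never zero: 1/360 is not a binary fraction)
relative_error = abs(Fraction(inv_360) * 360 - 1)

# Error this introduces in the number of turns for x = 33e17, before any other rounding
x = Fraction(33 * 10**17)
turns_error = x * abs(Fraction(inv_360) - Fraction(1, 360))

print(float(relative_error))   # at most 2**-53, about 1.1e-16
print(float(turns_error))      # error measured in whole turns
```

Dividing by the exact integer 360 avoids this source of error; only the rounding of the quotient itself remains.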

The problem is that GugaStartVal is already loaded in the xmm0 register. The code is part of a routine I'm creating for a fast and precise tangent, but taking more parameters as input, such as a pointer to a value and a flag identifying the type of value (integer, float, Real8, Quadword).

It works like this:

Code Select

__________________________
; Parameters flag equates
[SSE_TRIG_INT 1
 SSE_TRIG_FLOAT 2
 SSE_TRIG_REAL8 4
 SSE_TRIG_QWORD 8]
__________________________
; variable to convert degree to radian
[Float_DegreetoRadian: R$ (3.1415926535897932384626433832795/180)]

__________________________
; Small macro to extract the integer and fractional part of a xmm register
[SSE_XTRACT_INTEGER | cvttpd2dq #1 #2 | cvtdq2pd #1 #1 | subsd #2 #1]
__________________________
    mov eax D@IsDegreeFlag ; If the input is represented in degrees, convert it to radian
    .Test_If eax &TRUE

        ; 1st check the format of the input and place it in xmm0
        mov eax D@Flag
        Test_if eax SSE_TRIG_INT
            cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double. @pNumber = pointer to the number stored in an input variable
        Test_Else_if eax SSE_TRIG_FLOAT
            cvtss2sd xmm0 X@pNumber ; converts a single precision float to double
        Test_Else_if eax SSE_TRIG_REAL8
            mov eax D@pNumber; | movsd XMM0 X$eax
            movupd XMM0 X$eax
        Test_Else_if eax SSE_TRIG_QWORD
            mov eax D@pNumber | movq XMM0 X$eax
        Test_Else
            xor eax eax | ExitP ; return 0 Invalid parameter
        Test_End
        movsd XMM1 X$Float_DegreetoRadian ; convert degrees to radians
        mulsd xmm0 xmm1
        movsd X@pConvertedNumberDis xmm0
        lea eax D@pConvertedNumberDis
        mov D@pNumber eax

    .Test_End

    ; Added now. This ensures the angle is always between 0 and 360º. It converts any huge angle to its equivalent within the limits of 360º
    mov eax D@pNumber | movupd xmm0 X$eax
    .SSE_D_If_Or xmm0 > X$Float_Two_Pi, xmm0 < X$Float_Minus_Two_Pi ; SSE comparison macros, similar to the IF macro but using COMISD. Here we check whether the value in xmm0 is outside the limits of one turn (360º)
        ; Angle is bigger than 360º (2*pi radians)
        movupd xmm1 xmm0 ; number in xmm0 - expressed in radians as we previously converted
        movupd xmm2 X$Float_Two_Pi_INV | mulpd xmm1 xmm2
        SSE_XTRACT_INTEGER xmm2, xmm1; calculate the floor of the number in xmm1. xmm1 keeps the leftover (the fractional part) and xmm2 holds the integer part
        SSE_D_If xmm1 >s X$Float_Zero

                ; <---------------------------- JJ´s routine must be here.
            movupd X$GugaStartVal XMM0 ; <----------- This is not converting back properly
            finit
            fld R$GugaStartVal | fstp T$GugaTmpValue | fld T$GugaTmpValue
            fidiv D$Float_360
            fld ST0
            frndint
            fsubp ST1 ST0
            fld1
            faddp ST1 ST0
            fimul D$Float_360
            fstp R$GugaTmpValue2
            movupd XMM0 X$GugaTmpValue2 ; put it back to xmm0 register

        SSE_D_Else
            movupd xmm1 X$Float_Two_Pi | mulpd xmm2 xmm1
            subpd xmm0 xmm2
        SSE_D_End_If
        movsd X@pConvertedNumberDis xmm0
        lea eax D@pConvertedNumberDis
        mov D@pNumber eax
    .SSE_D_End_If

The problem happens when I try to copy the content of xmm0 to the FPU variable, as in
movupd X$GugaStartVal XMM0 ; --- the resulting value is wrong when I do this.

It only works if we use fld R$GugaStartVal directly, that is, without passing the value through xmm0.

How do I convert it back from xmm0 to GugaStartVal?
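
Two things stand out in the routine above, both offered tentatively. First, movupd stores all 16 bytes of the register, so it also overwrites the 8 bytes that follow an 8-byte Real8 variable; the low 8 bytes are still the value from the low lane, so an 8-byte store (movsd or movq) should be enough to hand the number to fld. Second, by the time the FPU block runs, xmm0 appears to hold radians already, yet the block divides by 360, so radians would be reduced with a degree period. The Python sketch below emulates the two store widths on a byte buffer; the buffer layout and helper names are made up for illustration.

```python
import struct

def store_movupd(mem, offset, low, high):
    """16-byte store: both lanes of the register go to memory."""
    struct.pack_into("<2d", mem, offset, low, high)

def store_movsd(mem, offset, low):
    """8-byte store: only the low lane goes to memory."""
    struct.pack_into("<d", mem, offset, low)

def load_real8(mem, offset):
    """Read one little-endian Real8 from the buffer."""
    return struct.unpack_from("<d", mem, offset)[0]

# Hypothetical layout: a Real8 at offset 0, an unrelated Real8 at offset 8
mem = bytearray(16)
store_movsd(mem, 8, 123.0)            # neighbouring variable

store_movsd(mem, 0, 33e17)
print(load_real8(mem, 0), load_real8(mem, 8))   # 3.3e+18 123.0

store_movupd(mem, 0, 33e17, 0.0)      # high lane happens to hold 0.0
print(load_real8(mem, 0), load_real8(mem, 8))   # 3.3e+18 0.0 (neighbour clobbered)
```

The low lane reads back correctly in both cases, which suggests the store itself is not what changes the value.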

guga · July 20, 2023, 04:41:52 AM

Quote from: Caché GB on July 20, 2023, 01:35:40 AM
JJ is on the ball.

Code Select

Hi guga
Is this what the modf routine that you ported from msvcrt, that also uses SSE2, looks like?

ModulusDouble proc

         movapd  xmm2, xmm0
          divsd  xmm2, xmm1
      cvttsd2si  ecx, xmm2
           movd  eax, xmm2
            shr  eax, 31
            sub  ecx, eax
       cvtsi2sd  xmm2, ecx
          mulsd  xmm1, xmm2
          subsd  xmm0, xmm1
            ret

ModulusDouble endp

Not exactly. The modf code in msvcrt also fails miserably. It returns 0 in ST0 instead of the proper fraction when given 33.0e17 as input.

This is the modf routine from msvcrt that I started porting to RosAsm (I quit after seeing that it doesn't work either):

Code Select


[<16 SSE_MODF_BNS1: Q$ 0433, 0433] ;R$ 5.3112056927934002e-321, R$ 5.3112056927934002e-321]
[<16 SSE_MODF_Sign: Q$ 08000000000000000, 08000000000000000]
[<16 SSE_MODF_Mantissa: Q$ 0FFFFFFFFFFFFF, 0FFFFFFFFFFFFF];R$ 2.22507385850720082e-308, R$ 2.22507385850720082e-308]
[<16 SSE_MODF_Zero: R$ 0, R$ 0]

; https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2010/bk4c380c(v=vs.100)
[ModfReturnedPart: R$ 0]

Proc SSE2_modf:
    Arguments @pNumber
    Structure @TempStorage 16, @pConvertedNumberDis 0

    mov eax D@pNumber | mov ebx eax
    movq XMM0 X$eax;@pNumber
    movapd XMM2 X$SSE_MODF_BNS1
    movapd XMM3 XMM0
    movapd XMM1 XMM0
    movapd XMM4 XMM0
    movapd XMM6 XMM0
    psllq XMM0 01
    psrlq XMM0 035
    psrlq XMM3 034
    andpd XMM4 X$SSE_MODF_Sign
    movd eax XMM0
    psubd XMM2 XMM0
    ;mov ecx ModfReturnedPart;D$esp+0C
    psrlq XMM1 XMM2
    psllq XMM1 XMM2
    movd edx XMM3
    cmp eax 03FF | jl O6>  ; Code010054452
    cmp eax 0432 | jg P5>  ; Code01005445B
    movq X$ModfReturnedPart  XMM1
    subsd XMM6 XMM1
    orpd XMM6 XMM4
    movq X@pConvertedNumberDis XMM6
    fld R@pConvertedNumberDis
    ExitP
    ;ret


@Code010054452: O6:
    movq X$ModfReturnedPart  XMM4
    fld R@pConvertedNumberDis;$esp+04
    ExitP
    ;ret


@Code01005445B: P5:
    movq XMM0 X$ebx;@pNumber
    .If eax <> 07FF
        movq X$ModfReturnedPart  XMM0
        fldz
        If edx =>s 0800
            fchs
        End_If
    .Else
        ; ret_inf_nan
        movapd XMM1 XMM0
        addsd XMM0 XMM0
        movq X$ModfReturnedPart  XMM0
        andpd XMM0 X$SSE_MODF_Mantissa
        cmpneqpd XMM0 X$SSE_MODF_Zero
        pextrw eax XMM0 0
        andpd XMM0 XMM1
        orpd XMM0 XMM4
        mov edx 03EF
        If eax = 0
            movq X@pConvertedNumberDis XMM0
            fld R@pConvertedNumberDis
        Else
            ; calibration error
            xor eax eax
            movlpd X@pConvertedNumberDis XMM0
            fld R@pConvertedNumberDis
        End_If

    .End_If


EndP

HSE · July 20, 2023, 05:29:45 AM

Skipping the parser problem...

With more precision, the second number has a solution:

Code Select

  dd1 rr <3.30000000000000000e+017, 0.00000000000000000e+000>
  dd1      = 330000000000000000.00000000000000

  first step  =  916666666666666.66666666666666666

  second step =  0.66666666666666666435370203203092

  third step  =  240.00000000000000099999999999999

Press any key to continue...

but for the third number there is no way, because the result can hardly have a fractional part at this precision:

Code Select

  dd1 rr <1.25599999999999990e+101, 6.23951297965124220e+084>
  dd1      = 12560000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0

  first step  =  34888888888888888888888888888888000000000000000000000000000000000000000000000000000000000000000000.0

  second step = 0.0

  third step  = 0.0

Press any key to continue...

Thinking about it a little, perhaps this last one can only be solved with arbitrary precision. You define how many fractional places you want, and then the precision is automatic (and, I presume, very very slow).
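
For what it is worth, the exact remainder of the stored double can be obtained without extended precision, because a Real8 of this magnitude is an integer and an fmod-style reduction is exact. What cannot be recovered is the remainder of the decimal number that was typed in, since the nearest Real8 to 12.56e100 already differs from it by far more than 360. The Python sketch below is illustrative and uses Python's built-in arbitrary-precision integers as the reference.

```python
import math

x = 12.56e100                      # nearest Real8 to the decimal literal

stored_exact = int(x)              # the integer the Real8 actually holds
typed_exact = 1256 * 10**98        # the integer that was typed

print(stored_exact == typed_exact)            # False: the literal is not representable
print(abs(stored_exact - typed_exact) > 360)  # True: off by many full turns

print(typed_exact % 360)           # 320: remainder of the typed number
print(stored_exact % 360)          # remainder of the stored Real8
print(math.fmod(x, 360.0))         # same value, computed in floating point
```

So any routine that receives the value as a Real8 can at best reduce the stored number; getting 320 requires keeping the input in a wider or arbitrary-precision format from the start.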

NoCforMe · July 20, 2023, 10:46:35 AM

Parsing problem? Did you say parsing problem?
Is there something I can do here to help solve that? Lay it on me, baby. I'll be happy to come up with a parsing solution. (Sorry, can't help you w/the math here.)

Siekmanski · July 20, 2023, 11:16:43 AM

Hi guga,

Don't know if this is what you are looking for?
This is the code I use for radians.
You can convert from radians to degrees and vice versa with one multiplication.

Code Select

4 single precision values at once

align 16
OneDivPi    real4 4 dup (0.31830988618379067153776752674)
Pi          real4 4 dup (3.14159265358979323846264338327)

    mulps       xmm0,oword ptr OneDivPi     ; 1/pi to get a 1 pi range
    cvtps2dq    xmm3,xmm0                   ; (4 packed spfp to 4 packed int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2ps    xmm1,xmm3                   ; (4 packed int32 to 4 packed spfp) save the  integral parts
    subps       xmm0,xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3,31                     ; put sign-bits in position, to place values in the right hemispheres
    xorps       xmm0,xmm3                   ; set sign-bits
    mulps       xmm0,oword ptr Pi           ; restore ranges between -1/2 pi to +1/2 pi

And 2 double precision values at once

align 16
OneDivPiDP  real8 2 dup (0.31830988618379067153776752674)
PiDP        real8 2 dup (3.14159265358979323846264338327)

    mulpd       xmm0,oword ptr OneDivPiDP   ; 1/pi to get a 1 pi range
    cvtpd2dq    xmm3,xmm0                   ; (2 packed dpfp to 2 int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2pd    xmm1,xmm3                   ; (2 packed int32 to 2 dpfp) save the  integral parts
    subpd       xmm0,xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3,31                     ; put sign-bits in position, to place values in the right hemispheres
    pshufd      xmm3,xmm3,Shuffle(1,3,0,2)  ; shuffle the sign-bits into place         
    xorpd       xmm0,xmm3                   ; set sign-bits
    mulpd       xmm0,oword ptr PiDP         ; restore ranges between -1/2 pi to +1/2 pi

jj2007 · July 20, 2023, 11:43:20 AM

Quote from: guga on July 20, 2023, 04:41:52 AM
Quote from: Caché GB on July 20, 2023, 01:35:40 AM
JJ is on the ball.

Not exactly. The modf code in msvcrt also fails miserably. It returns 0 in ST0 instead of the proper fraction when given 33.0e17 as input.

If you use the second version of my proggie (which is SIMD, not FPU), you will see that 33.0e17 as input will not work. However, 33.0e10 will indeed work. As HSE rightly noted, the issue is precision.

P.S.: In the source, change the third roundsd operator to 11. Both 9 and 11 work, but the result will be different for negative inputs. See the RoundSD thread for explanations.
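
For reference, the two immediates select different rounding modes for roundsd: 9 rounds toward minus infinity (floor) and 11 truncates toward zero, both with the precision exception suppressed. A small Python sketch of the effect on a negative angle (the function names are made up for this example):

```python
import math

def reduce_floor(deg):
    """Like roundsd with imm8 = 9: round the quotient toward minus infinity."""
    q = deg / 360.0
    return (q - math.floor(q)) * 360.0

def reduce_trunc(deg):
    """Like roundsd with imm8 = 11: round the quotient toward zero."""
    q = deg / 360.0
    return (q - math.trunc(q)) * 360.0

print(reduce_floor(870.41), reduce_trunc(870.41))     # both close to 150.41
print(reduce_floor(-870.41), reduce_trunc(-870.41))   # close to 209.59 and -150.41
```

Floor always lands in 0..360, while truncation keeps the sign of the input.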

HSE · July 20, 2023, 01:07:36 PM

Quote from: NoCforMe on July 20, 2023, 10:46:35 AM
Parsing problem? Did you say parsing problem?

No problem (I think). It's more a case of incomplete testing. But Guga's test is going to help. If I fail... you will know. Thanks.

guga · July 20, 2023, 02:25:04 PM

Quote from: Siekmanski on July 20, 2023, 11:16:43 AM
Hi guga,

Don't know if this is what you are looking for?
This is the code I use for radians.
You can convert from radians to degrees and vice versa with one multiplication.


Hi Siekmanski

It's close to the one I did for SSE2, but I'm not sure I ported it correctly, because the resulting value is wrong. I'm temporarily using JJ's solution for smaller numbers (smaller than 1e15, I believe), but the problem is precision, as he said.

I have some questions about yours. The input number in xmm0 is in radians, right? Also, what number does your Shuffle macro produce? I don't know if I ported your version correctly.

Here:
pshufd xmm3,xmm3,Shuffle(1,3,0,2)
what does Shuffle(1,3,0,2) evaluate to?

In my version it results in PSHUFD XMM3 XMM3 072 (114 in decimal). The macro I'm using to recreate your version is: [SHUFFLE | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Siekmanski

I ported your version, but it doesn't seem to work for big numbers (neither does mine). I'm testing your version that converts 2 doubles at once first.

Code Select


[SHUFFLE | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Siekmanski macro for shuffle. 

[GugaVal: R$ 33e17, 0] ; let's assume the 2nd real is just 0 for now, only to test the algo.
[Float_DegreetoRadian: R$ (3.1415926535897932384626433832795/180)]

; siekmanski variables
[<16 OneDivPiDP: R$ 0.31830988618379067153776752674, 0.31830988618379067153776752674]
[<16 PiDP: R$ 3.14159265358979323846264338327, 3.14159265358979323846264338327]


    movupd XMM0 X$GugaVal ; In degrees
    movsd XMM1 X$Float_DegreetoRadian ; convert to radians
    mulsd xmm0 xmm1 ; to be used in Siekmanski routine

    mulpd       xmm0 X$OneDivPiDP   ; 1/pi to get a 1 pi range
    cvtpd2dq    xmm3 xmm0                   ; (2 packed dpfp to 2 int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2pd    xmm1 xmm3                   ; (2 packed int32 to 2 dpfp) save the  integral parts
    subpd       xmm0 xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3 31                     ; put sign-bits in position, to place values in the right hemispheres
    pshufd      xmm3 xmm3 {SHUFFLE 1,3,0,2};,Shuffle(1,3,0,2)  ; shuffle the sign-bits into place         
    xorpd       xmm0 xmm3                   ; set sign-bits
    mulpd       xmm0 X$PiDP         ; restore ranges between -1/2 pi to +1/2 pi

xmm0 ends up holding a huge number, 5.795....e16, when the input is a number such as 33e17,

and xmm3 ends up as 0 after pslld xmm3 31.
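
Both observations are consistent with the packed conversion overflowing, although this is a reconstruction and not a debugger trace. cvtpd2dq produces 32-bit integers; for a quotient near 1.8e16 it returns 80000000h, cvtdq2pd turns that into -2147483648.0, and pslld by 31 shifts the only set bit out, leaving 0. The Python sketch below walks one lane through the same steps; the helper names are made up.

```python
import math

def cvtpd2dq_lane(x):
    """Round to nearest-even int32; 0x80000000 ('integer indefinite') if out of range."""
    if math.isnan(x) or math.isinf(x):
        return -2**31
    n = round(x)                       # Python rounds half to even, like the default MXCSR mode
    return n if -2**31 <= n < 2**31 else -2**31

def pslld(n, count):
    """32-bit logical left shift of a signed int32 value."""
    return ((n & 0xFFFFFFFF) << count) & 0xFFFFFFFF

degrees = 33e17
t = degrees * (math.pi / 180.0) * (1.0 / math.pi)   # radians, then units of pi

k = cvtpd2dq_lane(t)          # -2147483648: the quotient does not fit in int32
frac = t - float(k)           # about 1.83e16, not a fraction at all
sign_bits = pslld(k, 31)      # 0: the set bit is shifted out
result = frac * math.pi       # about 5.76e16

print(k, sign_bits, result)
```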

The difference between yours and mine is that you preserve the sign of the angle, and yours also seems to be limited to 180º. But in both cases we produce incorrect results for huge values.

Try using a bigger number as input, such as 33e17. It won't convert to the equivalent angle.

With yours, when I input 870.41º (or 15.192 radians), the result is incorrect. It comes out as 0.5164... radians (about 29º). With mine the result is correct: it gives the proper equivalent angle (2.625 radians = 150.4 degrees), as shown by WolframAlpha:
https://www.wolframalpha.com/input?i=870.41+degrees
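
The 0.5164 rad result may not be a porting error. Dividing by pi, rounding to the nearest integer k and flipping the sign when k is odd yields an angle in [-pi/2, pi/2] that has the same sine as the input, which is a typical first step for a sine approximation, and not the coterminal angle in [0, 2*pi). For 870.41º that is 900º - 870.41º = 29.59º, and sin(29.59º) = sin(150.41º). A Python sketch of that reading of the routine (the function name is hypothetical):

```python
import math

def reduce_for_sine(x):
    """Map x (radians) to [-pi/2, pi/2] so that sin() is unchanged."""
    t = x / math.pi
    k = round(t)              # nearest integer number of half turns
    f = t - k                 # in [-0.5, 0.5]
    if k % 2:                 # odd k: sin(x) = -sin(x - k*pi), so flip the sign
        f = -f
    return f * math.pi

x = math.radians(870.41)
r = reduce_for_sine(x)

print(r, math.degrees(r))          # about 0.5164 rad, 29.59 degrees
print(math.sin(r), math.sin(x))    # the two sines agree
```

If the goal is the coterminal angle itself, as for a tangent routine with period pi or a general 0..360º normalization, the sign flip step would have to be dropped or adapted.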

The MASM Forum
