The MASM Forum

General => The Laboratory => Topic started by: guga on July 19, 2023, 05:15:28 PM

Title: Equivalence angle conversion in SSE2
Post by: guga on July 19, 2023, 05:15:28 PM
Hi guys

I'm trying to create a function that converts any angle to its equivalent angle. For example, 870.41 degrees is the same as 150.41º.

I'm using only SSE2 for this. For small numbers my routine works, but for large numbers it fails miserably.


; RosAsm syntax

; variables
[Float_Two_Pi_INV: R$ (1/6.2831853071795864769252867665590057683943387987502116419)] ; R$ = Real8 number
[Float_Two_Pi: R$ 6.2831853071795864769252867665590057683943387987502116419]

    movupd xmm1 xmm0 ; number in xmm0
    movupd xmm2 X$Float_Two_Pi_INV ; multiply by 1/(2*pi). It's the same as dividing the angle by 360º, but in radians, since the input value in xmm0 is in radians.
    mulpd xmm1 xmm2

    ; calculate the floor of the number in xmm1. xmm1 keeps the leftover (the fractional part) and xmm2 is the integer part. I actually created a macro for this, but these are the real instructions
    CVTTPD2DQ XMM2 XMM1
    CVTDQ2PD XMM2 XMM2
    SUBSD XMM1 XMM2

    movupd xmm1 X$Float_Two_Pi ; multiply the integer part by 360º (2*pi)
    mulpd xmm2 xmm1
    subpd xmm0 xmm2 ; the converted number is stored in xmm0

The math is simply: divide by 360º, take the floor of the result and subtract it from the result, then multiply the remaining fraction by 360º.
Ex:

Degrees = 870.41 is equivalent to 150.41º
Step1) 870.41/360 = 2.4178055555555555555555555555555555555555555555555555555555555555
Step2) 2.4178055555555555... - 2 (its floor) => 0.4178055555555555555....
Step3) 0.4178055555555555555 * 360 = 150.41

Other example:
Degrees = 33e17 is equivalent to 240º
Step1) 33e17/360 = 9.16666666666666666666666666e15 = 9166666666666666.66666666666666666
Step2) 9166666666666666.66666666666 - 9166666666666666 (its floor) => 0.666666666666666666666
Step3) 0.666666666666666666666 * 360 = 240

So for a small number like 870.41 the routine works, but for the larger number it fails badly.

I ported a modf routine from msvcrt that also uses SSE2, but it also fails miserably.

What am I doing wrong?

I tested both values in WolframAlpha to make sure the results are OK, but I can't make it work for larger numbers such as 33e17, 12.56e100, etc.
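
The three steps above can be sketched in Python, whose floats are IEEE doubles (the same format as REAL8). This is only an illustration, not the RosAsm routine; `reduce_naive` is a hypothetical name. For 33e17 the quotient 9166666666666666.67 lies above 2^53, where adjacent doubles are 2 apart, so it is stored as a whole number and step 2 has nothing left to extract. `math.fmod`, by contrast, computes the remainder of two doubles exactly.

```python
import math

def reduce_naive(deg: float) -> float:
    """Steps 1-3 from the post, carried out in REAL8 (Python floats are IEEE doubles)."""
    q = deg / 360.0               # step 1: number of full turns, rounded to a double
    frac = q - math.floor(q)      # step 2: fractional part of the rounded quotient
    return frac * 360.0           # step 3: back to degrees

small = reduce_naive(870.41)      # close to 150.41
large = reduce_naive(33e17)       # the quotient has no fractional bits left
exact = math.fmod(33e17, 360.0)   # remainder computed without forming the quotient

print(small, large, exact)
```

So the failure for large inputs does not have to be an overflow: the quotient simply cannot hold a fraction at that magnitude, while a remainder operation that avoids the rounded quotient still can.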
Title: Re: Equivalence angle conversion in SSE2
Post by: daydreamer on July 19, 2023, 08:15:11 PM
Quote from: guga on July 19, 2023, 05:15:28 PM
I tested both values in WolframAlpha to make sure the results are OK, but I can't make it work for larger numbers such as 33e17, 12.56e100, etc.

Does it overflow with such high numbers? Are these numbers too big for REAL8s?


Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 19, 2023, 08:25:33 PM
Quote from: daydreamer on July 19, 2023, 08:15:11 PM
Does it overflow with such high numbers? Are these numbers too big for REAL8s?

(https://i.stack.imgur.com/ogSad.png)

Works fine with the FPU, btw:

include \masm32\MasmBasic\MasmBasic.inc
  Init
  push 360
  FpuSet MbDown64
  Let esi="33.0e17"
  .While 1
Let esi=Input$("Your value: ", esi)
.Break .if Len(esi)==0 ; quit if the string is empty
MovVal ST(0), esi
Print Str$("%5e = ", ST(0))
fidiv stack ; /360
fld st
frndint ; x.123-x
fsub
fimul stack ; *360
Print Str$("%4f\n", ST(0)v) ; print & pop ST
  .Endw
  pop edx
EndOfCode

Your value: 33.0e17
3.3000e+18 = 239.8
Your value: 870.41
8.7041e+02 = 150.4
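
The 239.8 printed for 33.0e17 can be reproduced with exact rationals. The sketch below is not MasmBasic code; it assumes that `MbDown64` selects round-down with the full 64-bit significand (as the name suggests), and `round_down` / `reduce_deg` are hypothetical helper names. With 64 bits the quotient keeps about ten fractional bits, with 53 bits (REAL8) it keeps none.

```python
from fractions import Fraction
import math

def round_down(v: Fraction, bits: int) -> Fraction:
    """Round a positive value down to a significand of `bits` bits."""
    e = v.numerator.bit_length() - v.denominator.bit_length()
    if Fraction(2) ** e > v:
        e -= 1                                   # now 2**e <= v < 2**(e+1)
    ulp = Fraction(2) ** (e - bits + 1)          # spacing of representable values near v
    return math.floor(v / ulp) * ulp

def reduce_deg(deg: float, bits: int) -> Fraction:
    q = round_down(Fraction(deg) / 360, bits)    # fidiv: quotient rounded to the working precision
    frac = q - math.floor(q)                     # fld st / frndint / fsub
    return frac * 360                            # fimul

x87 = reduce_deg(33e17, 64)   # REAL10-sized significand
sse = reduce_deg(33e17, 53)   # REAL8-sized significand
print(float(x87), float(sse))
```

Under these assumptions the 64-bit result is 239.765625, which prints as 239.8 with four significant digits, and the 53-bit result is 0.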
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 19, 2023, 11:03:23 PM
Hi Guga!

Quote from: guga on July 19, 2023, 05:15:28 PM
I'm using only SSE2 for this. For small numbers my routine works, but for large numbers it fails miserably.

SSE works with REAL8 at most. As JJ says, the FPU is better here, because internally it works with REAL10.

You can probably gain precision using DoubleDouble Precision (https://masm32.com/board/index.php?topic=10678.0). I have only tested that with the FPU, but it should also work with SSE (with less precision, obviously).

I will try that later. I don't know much about SSE, but perhaps it is possible to obtain a disassembly of these few operations :biggrin:
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 19, 2023, 11:24:35 PM
Attached the SSE version of my test program. It requires SSE 4.1.
Title: Re: Equivalence angle conversion in SSE2
Post by: Caché GB on July 20, 2023, 01:35:40 AM
JJ is on the ball.

Hi guga
Is this what the modf routine that you ported from msvcrt, that also uses SSE2, looks like?

ModulusDouble proc

         movapd  xmm2, xmm0
          divsd  xmm2, xmm1
      cvttsd2si  ecx, xmm2
           movd  eax, xmm2
            shr  eax, 31
            sub  ecx, eax
       cvtsi2sd  xmm2, ecx
          mulsd  xmm1, xmm2
          subsd  xmm0, xmm1
            ret

ModulusDouble endp

;#############################################################################################################

JJs_on_the_ball proc
 
       ; movlps  xmm0, Res8
          mulsd  xmm0, FLT8(0.002777777777777777777777778)
         movaps  xmm1, xmm0
        roundsd  xmm1, xmm0, 17              ; if const > 17 = fail
          subsd  xmm0, xmm1
          mulsd  xmm0, FLT8(360.0)
            ret

JJs_on_the_ball endp

;#############################################################################################################

Test_The_Thing proc

      local  ModulusJJ:real8
      local  Modulus:real8

          movsd  xmm1, FLT8(360.0)

        ; movsd  xmm0, FLT8(8.7041e+9)    ; good
          movsd  xmm0, FLT8(1.2345e+11)    ; good

        ; movsd  xmm0, FLT8(1.2345e+12)    ; fail
        ; movsd  xmm0, FLT8(3.3000e+18)    ; fail
         invoke  ModulusDouble
          movsd  Modulus, xmm0            ; Modulus = 240.00000000000000 double

            nop

       ; movlps  xmm0, FLT8(1.2345e10)  ; good
         movlps  xmm0, FLT8(1.2345e11)  ; good
       ; movlps  xmm0, FLT8(1.2345e18)  ; fail
         invoke  JJs_on_the_ball
          movsd  ModulusJJ, xmm0        ; ModulusJJ = 240.00000715255737 double

            ret

Test_The_Thing endp
Quote: https://softpixel.com/~cwright/programming/simd/sse.php

SSE — MXCSR
The MXCSR register is a 32-bit register containing flags for control and status information regarding SSE instructions.
As of SSE3, only bits 0-15 have been defined.

I can't find this MXCSR register to see what is what. Maybe daydreamer is right with the Overflow. Who knows.

OE - bit 3 - Overflow Flag
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 02:42:40 AM
Hi Guga,

I can easily make the double double precision math with FPU...

    fSlvRR dd1 = 870.41

    fSlvRR dd1 = dd1/rr360

    fSlvRR dd1 = dd1 - trunc(dd1)

    fSlvRR dd1 = dd1 * rr360

  dd1  rr  <8.70409999999999970e+002, 3.18323145620524910e-014>

  first step  =  2.4178055555555555555555555555555

  second step =  0.41780555555555555555555555555557

  third step  =  150.41000000000000000000000000000

Press any key to continue...

but apparently I never tested the parser with exponents of more than one digit  :biggrin:

So 33.0e17 is not loaded  :sad:

Without the parser this is not going to be useful for now. I have to look into that.

Quote from: guga on July 19, 2023, 05:15:28 PM
I tested both values in WolframAlpha to make sure the results are OK

If you have Win10, just use the calculator (which has quadruple precision in scientific mode).

Regards, HSE.


Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 04:33:20 AM
OK, guys, thanks.

But there is a problem. The number is already preloaded in the xmm0 register. If I use JJ's method (FPU), how can I convert the number back as it was input? I mean, in SSE2, how can I convert xmm0 to a Real8 value to be used as a variable for the fld?

To make it work, I modified JJ's code as follows:


; JJ's code as an FPU-only routine. I mean, the input is a Real8 loaded directly into the FPU

[GugaStartVal: R$ 33e17]
[GugaTmpValue: T$ 0] ; To store it back into a TenByte variable
[Float_360: D$ 360]

___________________________

        finit
        fld R$GugaStartVal | fstp T$GugaTmpValue | fld T$GugaTmpValue
        fidiv D$Float_360
        fld ST0
        frndint
        fsubp ST1 ST0
        fld1
        faddp ST1 ST0
        fimul D$Float_360
___________________________

Btw, a question on your code: why use fidiv rather than a multiplication by the inverted value, like fmul R$(1/360)? I tried with fmul, but the resulting value was incorrect, while with fidiv it was OK.
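
One likely reason for the difference, sketched in Python with exact rationals: 360 is stored exactly, but 1/360 is not representable in binary, so a REAL8 constant for it is already rounded. That rounding error is multiplied by the huge input before the fraction is extracted. The snippet assumes the constant `R$(1/360)` is the double closest to 1/360; it measures only the error caused by the constant, not any further rounding in the FPU.

```python
from fractions import Fraction

x = Fraction(33e17)                  # 33e17 is exactly representable as a double
inv = Fraction(1 / 360.0)            # the REAL8 constant closest to 1/360, as an exact rational
err_turns = abs(x * inv - x / 360)   # quotient error (in full turns) caused by the rounded constant

bound = x * Fraction(1, 2**62)       # half an ulp of 1/360, scaled by the input
print(float(err_turns), float(bound))
```

The error is never zero, and for this input it can be as large as roughly 0.7 of a turn, which is enough to make the extracted fraction meaningless. For small inputs such as 870.41 the same error is far below anything visible.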

The problem is that GugaStartVal is already loaded in the xmm0 register. The code is part of a routine I'm creating for a fast and precise tangent, but taking more parameters as input, such as a pointer to a value and a flag identifying the type of value (integer, float, Real8, Quadword).

It works like this:
__________________________
; Parameters flag equates
[SSE_TRIG_INT 1
 SSE_TRIG_FLOAT 2
 SSE_TRIG_REAL8 4
 SSE_TRIG_QWORD 8]
__________________________
; variable to convert degree to radian
[Float_DegreetoRadian: R$ (3.1415926535897932384626433832795/180)]

__________________________
; Small macro to extract the integer and fractional part of a xmm register
[SSE_XTRACT_INTEGER | cvttpd2dq #1 #2 | cvtdq2pd #1 #1 | subsd #2 #1]
__________________________
    mov eax D@IsDegreeFlag ; If the input is represented in degrees, convert it to radian
    .Test_If eax &TRUE

        ; first check the format of the input and place it in xmm0
        mov eax D@Flag
        Test_if eax SSE_TRIG_INT
            cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double. @pNumber = pointer to the number stored in an input variable
        Test_Else_if eax SSE_TRIG_FLOAT
            cvtss2sd xmm0 X@pNumber ; converts a single precision float to double
        Test_Else_if eax SSE_TRIG_REAL8
            mov eax D@pNumber; | movsd XMM0 X$eax
            movupd XMM0 X$eax
        Test_Else_if eax SSE_TRIG_QWORD
            mov eax D@pNumber | movq XMM0 X$eax
        Test_Else
            xor eax eax | ExitP ; return 0 Invalid parameter
        Test_End
        movsd XMM1 X$Float_DegreetoRadian ; convert degrees to radians
        mulsd xmm0 xmm1
        movsd X@pConvertedNumberDis xmm0
        lea eax D@pConvertedNumberDis
        mov D@pNumber eax

    .Test_End

    ; Added now: this ensures the angle is always between 0 and 360º. It converts any huge angle to its equivalent inside the limits of 360º
    mov eax D@pNumber | movupd xmm0 X$eax
    .SSE_D_If_Or xmm0 > X$Float_Two_Pi, xmm0 < X$Float_Minus_Two_Pi ; SSE comparison macros, similar to the IF macro but using COMISD instead. Here we check whether the value in xmm0 is outside the limits of an angle (360º)
        ; Angle is bigger than 360º (2*pi radians)
        movupd xmm1 xmm0 ; number in xmm0 - expressed in radians as we previously converted
        movupd xmm2 X$Float_Two_Pi_INV | mulpd xmm1 xmm2
        SSE_XTRACT_INTEGER xmm2, xmm1; calculate floor of the number in xmm1. xmm1 is used as a leftover (The fraction part) and xmm2 is our integer
        SSE_D_If xmm1 >s X$Float_Zero

                ; <---------------------------- JJ´s routine must be here.
            movupd X$GugaStartVal XMM0 ; <----------- This is not converting back properly
            finit
            fld R$GugaStartVal | fstp T$GugaTmpValue | fld T$GugaTmpValue
            fidiv D$Float_360
            fld ST0
            frndint
            fsubp ST1 ST0
            fld1
            faddp ST1 ST0
            fimul D$Float_360
            fstp R$GugaTmpValue2
            movupd XMM0 X$GugaTmpValue2 ; put it back to xmm0 register

        SSE_D_Else
            movupd xmm1 X$Float_Two_Pi | mulpd xmm2 xmm1
            subpd xmm0 xmm2
        SSE_D_End_If
        movsd X@pConvertedNumberDis xmm0
        lea eax D@pConvertedNumberDis
        mov D@pNumber eax
    .SSE_D_End_If



The problem happens when I try to copy the content of xmm0 to an FPU variable, such as
movupd X$GugaStartVal XMM0 ; --- the value resulting from this does not work at all.

It only works when we use fld R$GugaStartVal directly, i.e. without passing the value through xmm0.

How do I convert it back from xmm0 to GugaStartVal?
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 04:41:52 AM
Quote from: Caché GB on July 20, 2023, 01:35:40 AM
JJ is on the ball.

Hi guga
Is this what the modf routine that you ported from msvcrt, that also uses SSE2, looks like?

Not exactly. The modf code in msvcrt also fails miserably. It returns 0 in ST0 instead of the proper fraction when using 33.0e17 as input.


This is the modf routine from msvcrt that I started porting to RosAsm (but I quit after seeing that it also doesn't work):


[<16 SSE_MODF_BNS1: Q$ 0433, 0433] ;R$ 5.3112056927934002e-321, R$ 5.3112056927934002e-321]
[<16 SSE_MODF_Sign: Q$ 08000000000000000, 08000000000000000]
[<16 SSE_MODF_Mantissa: Q$ 0FFFFFFFFFFFFF, 0FFFFFFFFFFFFF];R$ 2.22507385850720082e-308, R$ 2.22507385850720082e-308]
[<16 SSE_MODF_Zero: R$ 0, R$ 0]

; https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2010/bk4c380c(v=vs.100)
[ModfReturnedPart: R$ 0]

Proc SSE2_modf:
    Arguments @pNumber
    Structure @TempStorage 16, @pConvertedNumberDis 0

    mov eax D@pNumber | mov ebx eax
    movq XMM0 X$eax;@pNumber
    movapd XMM2 X$SSE_MODF_BNS1
    movapd XMM3 XMM0
    movapd XMM1 XMM0
    movapd XMM4 XMM0
    movapd XMM6 XMM0
    psllq XMM0 01
    psrlq XMM0 035
    psrlq XMM3 034
    andpd XMM4 X$SSE_MODF_Sign
    movd eax XMM0
    psubd XMM2 XMM0
    ;mov ecx ModfReturnedPart;D$esp+0C
    psrlq XMM1 XMM2
    psllq XMM1 XMM2
    movd edx XMM3
    cmp eax 03FF | jl O6>  ; Code010054452
    cmp eax 0432 | jg P5>  ; Code01005445B
    movq X$ModfReturnedPart  XMM1
    subsd XMM6 XMM1
    orpd XMM6 XMM4
    movq X@pConvertedNumberDis XMM6
    fld R@pConvertedNumberDis
    ExitP
    ;ret


@Code010054452: O6:
    movq X$ModfReturnedPart  XMM4
    fld R@pConvertedNumberDis;$esp+04
    ExitP
    ;ret


@Code01005445B: P5:
    movq XMM0 X$ebx;@pNumber
    .If eax <> 07FF
        movq X$ModfReturnedPart  XMM0
        fldz
        If edx =>s 0800
            fchs
        End_If
    .Else
        ; ret_inf_nan
        movapd XMM1 XMM0
        addsd XMM0 XMM0
        movq X$ModfReturnedPart  XMM0
        andpd XMM0 X$SSE_MODF_Mantissa
        cmpneqpd XMM0 X$SSE_MODF_Zero
        pextrw eax XMM0 0
        andpd XMM0 XMM1
        orpd XMM0 XMM4
        mov edx 03EF
        If eax = 0
            movq X@pConvertedNumberDis XMM0
            fld R@pConvertedNumberDis
        Else
            ; calibration error
            xor eax eax
            movlpd X@pConvertedNumberDis XMM0
            fld R@pConvertedNumberDis
        End_If

    .End_If


EndP
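
A short Python check (Python's `math.modf` follows the C library function of the same name) suggests that a zero fraction for 33.0e17 is the expected answer rather than a bug in modf: every REAL8 value of magnitude 2^52 or more is a whole number, so there is no fractional part to return. The same holds for the quotient 33e17/360 once it has been rounded to a double.

```python
import math

frac, whole = math.modf(33e17)       # returns (fractional part, integer part)
print(frac, whole)                   # 0.0 3.3e+18

print((33e17).is_integer())          # True: the input itself is a whole number
print((33e17 / 360.0).is_integer())  # True: so is the rounded quotient
```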
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 05:29:45 AM
Skipping parser problem...

with more precision the second number has a solution:
  dd1 rr <3.30000000000000000e+017, 0.00000000000000000e+000>
  dd1      = 330000000000000000.00000000000000

  first step  =  916666666666666.66666666666666666

  second step =  0.66666666666666666435370203203092

  third step  =  240.00000000000000099999999999999

Press any key to continue...

but with the third number there is no way, because the result can hardly have a fractional part at this precision:
  dd1 rr <1.25599999999999990e+101, 6.23951297965124220e+084>
  dd1      = 12560000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0

  first step  =  34888888888888888888888888888888000000000000000000000000000000000000000000000000000000000000000000.0

  second step = 0.0

  third step  = 0.0

Press any key to continue...



Thinking about it a little, perhaps this last one could only be solved with arbitrary precision. You can define how many fractional places you want, then the precision is automatic (and, I presume, very very slow)  :thumbsup:
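
As a rough illustration of the arbitrary-precision idea, Python's `fractions.Fraction` can parse the decimal text directly and reduce it with exact rational arithmetic. This is only a sketch of the principle (the helper name `equivalent_angle` is made up here), and it only helps when the angle is available as text or as exact integers, not once it has already been rounded into a REAL8.

```python
from fractions import Fraction

def equivalent_angle(decimal_text: str) -> Fraction:
    """Reduce an angle given as decimal text to [0, 360) using exact rational arithmetic."""
    return Fraction(decimal_text) % 360

print(equivalent_angle("870.41"))      # 15041/100, i.e. 150.41
print(equivalent_angle("33e17"))       # 240
print(equivalent_angle("12.56e100"))   # 320
```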
Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 20, 2023, 10:46:35 AM
Parsing problem? Did you say parsing problem?
Is there something I can do here to help solve that? Lay it on me, baby. I'll be happy to come up with a parsing solution. (Sorry, can't help you w/the math here.)
Title: Re: Equivalence angle conversion in SSE2
Post by: Siekmanski on July 20, 2023, 11:16:43 AM
Hi guga,

Don't know if this is what you are looking for?
This is the code I use for radians.
You can convert from radians to degrees and vice versa with one multiplication.

4 single precision values at once

align 16
OneDivPi    real4 4 dup (0.31830988618379067153776752674)
Pi          real4 4 dup (3.14159265358979323846264338327)

    mulps       xmm0,oword ptr OneDivPi     ; 1/pi to get a 1 pi range
    cvtps2dq    xmm3,xmm0                   ; (4 packed spfp to 4 packed int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2ps    xmm1,xmm3                   ; (4 packed int32 to 4 packed spfp) save the  integral parts
    subps       xmm0,xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3,31                     ; put sign-bits in position, to place values in the right hemispheres
    xorps       xmm0,xmm3                   ; set sign-bits
    mulps       xmm0,oword ptr Pi           ; restore ranges between -1/2 pi to +1/2 pi

And 2 double precision values at once

align 16
OneDivPiDP  real8 2 dup (0.31830988618379067153776752674)
PiDP        real8 2 dup (3.14159265358979323846264338327)

    mulpd       xmm0,oword ptr OneDivPiDP   ; 1/pi to get a 1 pi range
    cvtpd2dq    xmm3,xmm0                   ; (2 packed dpfp to 2 int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2pd    xmm1,xmm3                   ; (2 packed int32 to 2 dpfp) save the  integral parts
    subpd       xmm0,xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3,31                     ; put sign-bits in position, to place values in the right hemispheres
    pshufd      xmm3,xmm3,Shuffle(1,3,0,2)  ; shuffle the sign-bits into place         
    xorpd       xmm0,xmm3                   ; set sign-bits
    mulpd       xmm0,oword ptr PiDP         ; restore ranges between -1/2 pi to +1/2 pi
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 20, 2023, 11:43:20 AM
Quote from: guga on July 20, 2023, 04:41:52 AM
Quote from: Caché GB on July 20, 2023, 01:35:40 AM
JJ is on the ball.

Not exactly. The modf code in msvcrt also fails miserably. It returns 0 in ST0 instead of the proper fraction when using 33.0e17 as input.

If you use the second version of my proggie (https://masm32.com/board/index.php?action=dlattach;attach=15835) (which is SIMD, not FPU), you will see that 33.0e17 as input will not work. However, 33.0e10 will indeed work. As HSE rightly noted, the issue is precision.

P.S.: In the source, change the third roundsd operator to 11. Both 9 and 11 work, but the result will be different for negative inputs. See the RoundSD thread for explanations. (https://masm32.com/board/index.php?topic=10992.0)
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 01:07:36 PM
Quote from: NoCforMe on July 20, 2023, 10:46:35 AM
Parsing problem? Did you say parsing problem?

:biggrin: No problem (I think). It's more an incomplete test. But this test of Guga's is going to help. If I fail... you will know. Thanks  :thumbsup:
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 02:25:04 PM
Quote from: Siekmanski on July 20, 2023, 11:16:43 AM
Hi guga,

Don't know if this is what you are looking for?
This is the code I use for radians.
You can convert from radians to degrees and vice versa with one multiplication.


Hi Siekmanski

It's close to the one I did for SSE2, but I'm not sure I ported it correctly, because the resulting value is wrong. I'm temporarily using JJ's solution for smaller numbers (smaller than 1e15, I believe), but the problem is precision, as he said.


About yours, I have some questions. The input number in xmm0 is in radians, right? Also, what value does your shuffle macro produce? I don't know if I ported your version correctly.

Here:
pshufd      xmm3,xmm3,Shuffle(1,3,0,2)
what does Shuffle(1,3,0,2) evaluate to?

In my version it results in PSHUFD XMM3 XMM3 072 (114 in decimal). The macro I'm using to recreate your version is: [SHUFFLE | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Siekmanski working

I ported your version, but it doesn't seem to work for big numbers (same as my version). I'm testing your version that converts 2 doubles at once first.



[SHUFFLE | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Siekmanski macro for shuffle.

[GugaVal: R$ 33e17, 0] ; let's assume the 2nd real is just 0 for now, to test the algo.
[Float_DegreetoRadian: R$ (3.1415926535897932384626433832795/180)]

; siekmanski variables
[<16 OneDivPiDP: R$ 0.31830988618379067153776752674, 0.31830988618379067153776752674]
[<16 PiDP: R$ 3.14159265358979323846264338327, 3.14159265358979323846264338327]


    movupd XMM0 X$GugaVal ; In degrees
    movsd XMM1 X$Float_DegreetoRadian ; convert to radians
    mulsd xmm0 xmm1 ; to be used in Siekmanski routine

    mulpd       xmm0 X$OneDivPiDP   ; 1/pi to get a 1 pi range
    cvtpd2dq    xmm3 xmm0                   ; (2 packed dpfp to 2 int32) lose the fractional parts and keep it in xmm3 to save the signs
    cvtdq2pd    xmm1 xmm3                   ; (2 packed int32 to 2 dpfp) save the  integral parts
    subpd       xmm0 xmm1                   ; now it's inside the range, results are values between -0.5 to 0.4999999
    pslld       xmm3 31                     ; put sign-bits in position, to place values in the right hemispheres
    pshufd      xmm3 xmm3 {SHUFFLE 1,3,0,2};,Shuffle(1,3,0,2)  ; shuffle the sign-bits into place         
    xorpd       xmm0 xmm3                   ; set sign-bits
    mulpd       xmm0 X$PiDP         ; restore ranges between -1/2 pi to +1/2 pi

xmm0 ends up with a huge number, 5.795....e16, when the input is a number such as 33e17

and xmm3 ends up as 0 after pslld       xmm3 31


The difference between yours and mine is that you preserve the sign of the angle, and yours also seems to be limited to 180º. But in both cases we produce incorrect results for huge values.

Try using a bigger input number, such as 33e17. It won't convert to the equivalent angle.

With yours, when I input 870.41º (or 15.192 radians), the result is incorrect: it comes out as 0.5164... radians (around 29º). With mine the result is correct: it gives the proper equivalent angle (2.625 radians = 150.4 degrees), as described in WolframAlpha
https://www.wolframalpha.com/input?i=870.41+degrees
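
The 0.5164 radians reported above is what one would expect if Siekmanski's routine is a range reduction for sine rather than an equivalent-angle conversion; that reading is an assumption based on the code comments, not something stated in the thread. The Python sketch below mirrors the same steps for one value: divide by pi, round to the nearest integer, and flip the sign of the remainder for odd multiples. The result is a different angle with the same sine.

```python
import math

def reduce_for_sine(x: float) -> float:
    """Fold x (radians) into [-pi/2, +pi/2] while preserving sin(x)."""
    t = x / math.pi              # mulpd by 1/pi
    n = round(t)                 # cvtpd2dq: round to nearest under the default MXCSR mode
    r = t - n                    # subpd: remainder in [-0.5, +0.5]
    if n & 1:                    # pslld 31 + xorpd: odd multiple of pi flips the sign
        r = -r
    return r * math.pi           # mulpd by pi

x = math.radians(870.41)
y = reduce_for_sine(x)
print(y, math.degrees(y))        # about 0.5164 rad, about 29.59 degrees
print(math.sin(x), math.sin(y))  # the two sines agree
```

So 29.59º (which is 180º - 150.41º) is not the equivalent angle, but it is a valid argument for computing the sine of 870.41º. A tangent routine would need a different fold.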
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 09:47:51 PM
Hi Guga!

Quote from: guga on July 19, 2023, 05:15:28 PM
I can't make it work for larger numbers such as 33e17, 12.56e100, etc.

Where do you get these unreal angles? It's possible that the preceding process is not so smart.
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 10:03:08 PM
Siekmanski

I guess I found the limit for our SSE routines.

Whenever the input number is bigger than 8e11, it fails. The same goes for values close to 8e11, such as 7.999999999e11. So it seems 7e11 is a safe limit for calculating this way.

I'm trying another solution for really huge numbers, such as 1.4564457e100 or even 1.44654e900, etc. I'll try to do it with tables after calculating the log10 of each one of them. It won't be pretty and may slow the algo down a bit, but perhaps it will work with reasonably good precision. I'll try to adapt the log routine for SSE we did a long time ago and see what happens.
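
A possible explanation for a limit between 7e11 and 8e11 (an observation, not something confirmed in the thread): CVTTPD2DQ converts to a signed 32-bit integer and returns the "integer indefinite" value 80000000h when the number does not fit. The number of full turns exceeds 2^31 at 2^31 * 360 = 773094113280 degrees, which sits right in that gap. The Python sketch below emulates one lane of the instruction; `cvttpd2dq_lane` and `reduce_int32` are hypothetical names, and the quotient is formed by division instead of the reciprocal multiply used in the routine.

```python
import math

INT32_MIN = -2**31                      # 80000000h, the "integer indefinite" result

def cvttpd2dq_lane(x: float) -> int:
    """Emulate one lane of CVTTPD2DQ: truncate toward zero, or return 80000000h if out of range."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    t = math.trunc(x)
    if t < -2**31 or t > 2**31 - 1:
        return INT32_MIN
    return t

def reduce_int32(deg: float) -> float:
    n = cvttpd2dq_lane(deg / 360.0)     # whole turns, squeezed through a 32-bit integer
    return deg - n * 360.0              # subtract the whole turns from the input

LIMIT_DEG = 2**31 * 360.0               # first input whose turn count no longer fits

print(LIMIT_DEG)
print(reduce_int32(7e11))               # still a valid angle
print(reduce_int32(8e11))               # far outside 0..360
```

If this is the cause, the same ceiling would apply to any variant that goes through a 32-bit conversion, such as the cvttsd2si ecx in ModulusDouble, whose test comments also switch from "good" to "fail" between 1.2345e+11 and 1.2345e+12.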
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 10:07:49 PM
Quote from: HSE on July 20, 2023, 09:47:51 PM
Hi Guga!

Quote from: guga on July 19, 2023, 05:15:28 PM
I can't make it work for larger numbers such as 33e17, 12.56e100, etc.

Where do you get these unreal angles? It's possible that the preceding process is not so smart.


Hi HSE

Those numbers are for testing only. I created an SSE version of tangent and realized that someone might input a really high value such as the ones I posted (angles of 33e17, 12.56e100, etc.).

I wanted to know whether there is a way to calculate the tangent of angles beyond the limits of 360º. It works fine according to WolframAlpha, but it does have some limitations in both my and Siekmanski's versions. (The limit seems to be around 8e11, so anything below 7e11 would be safe to calculate using SSE2.) In JJ's version this limit is a bit higher because it uses TenByte (80 bits), but it also loses precision and ends up with the same limitation as the SSE2 version.

In WolframAlpha, all it does to calculate the tangent of huge numbers is:
1st - convert the value to the equivalent angle inside 360º
2nd - calculate the tangent of this equivalent angle

I'm trying to do the same thing, but I'm facing some limitations using SSE2. Perhaps I've found a solution, as I mentioned to Siekmanski, but I'm still working on it to see what happens.

For example, in WolframAlpha, if you input 33e17 degrees, it will convert it to the equivalent angle, which is around 240
https://www.wolframalpha.com/input?i=33e17+degrees
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 10:20:37 PM
Quote from: guga on July 20, 2023, 10:07:49 PM
input

The input to these calculations is wrong. Something is turning around billions and billions of times  :biggrin:

Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 20, 2023, 10:22:01 PM
Hi Guga,

Can you explain the practical relevance of translating an angle of 33e17 to its 0...360° equivalent? Do you have a specific project that requires this?

Using the FPU:
8.704100e+02 = 150.4
1.234560e+06 = 120.0
1.234560e+08 = 120.0
1.234560e+10 = 120.0
1.234560e+12 = 120.0
1.234560e+14 = 120.0

Using SIMD instructions:
8.704100e+02 = 150.4
1.234560e+06 = 120.0
1.234560e+08 = 120.1
1.234560e+10 = 129.9
1.234560e+12 = 27.65
1.234560e+14 = 244.8

I put everything in a macro now, see attached source, so that I can switch easily between SIMD and FPU mode.
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 20, 2023, 11:29:16 PM
Hi JJ

I'm creating a variation of libm_sse2_tan_precise from msvcrt that is both fast and precise. It works fine for small numbers (i.e. inside the range of 360º, whether the input is a negative or a positive angle). The problem is that some angles (positive or negative) beyond that range may also need to be calculated, such as angles outside the limits like 870.41º, or any other weird values such as 33e11, 5.589e17, 5.9587e150, etc.

I realize it is unlikely that anyone uses numbers higher than 360º, but since WolframAlpha succeeds in calculating the tangent of very weird values with very good precision, I'm trying to do the same while maintaining speed and precision.

I'll test your new file, but I'm trying to do this with SSE2 only, since I haven't implemented SSE4 in RosAsm yet and there are still people whose processors don't handle SSE4, so I can't yet create a variation that handles both.

Btw, I'm doing these tests first for positive numbers beyond the limits, and then I'll see whether the same works for negative angles. Siekmanski's test works fine, but not for negative angles, where it doesn't seem to calculate the equivalent angle properly. The same goes for my original version, which also fails to handle the equivalent angle when the input is negative. That's why I'm trying positive numbers first and negative numbers later, i.e. also finding the equivalent angle when the input is negative (as WolframAlpha does).
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 11:42:27 PM

If you know what precision you need in the resulting angle, you can work out the maximum valid number.

Raymond must know the exact value. But, for example, if you want integer angles, any REAL8 number bigger than 1.0e17 doesn't contain the final angle.

Quote from: guga on July 20, 2023, 11:29:16 PM
5.9587e150
If this number is a REAL8, it doesn't contain any information about the angle you need.
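
The point can be made concrete by looking at the spacing between adjacent REAL8 values at each magnitude, which Python exposes as `math.ulp` (Python 3.9 or later). Once that spacing approaches or exceeds a degree, the stored number no longer pins down the angle the user had in mind.

```python
import math

# Distance from each value to the next representable REAL8 above it, in degrees.
for x in (870.41, 1.0e17, 33e17, 5.9587e150):
    print(x, math.ulp(x))
```

Near 1.0e17 neighbouring doubles are 16 apart, near 33e17 they are 512 apart, and near 5.9587e150 they are 2^448 apart, i.e. astronomically many full turns between one representable value and the next.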
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 20, 2023, 11:54:45 PM
Quote from: jj2007 on July 20, 2023, 10:22:01 PM
1.234560e+08 = 120.0
1.234560e+10 = 120.0
1.234560e+12 = 120.0
1.234560e+14 = 120.0

These numbers don't contain the angle. You need to fill in all the fractional positions (unless you know that all the missing positions are zero).

Quote from: jj2007 on July 20, 2023, 10:22:01 PM
frndint

By default the FPU rounds to nearest, not to floor. You have to change that. For example, Biterider's macro:
; ——————————————————————————————————————————————————————————————————————————————————————————————————
; Macro:      fInt
; Purpose:    Calculate the integer part of the content of st0.
; Arguments:  None.
; Return:     Nothing.

fInt macro
  sub xsp, @WordSize                                    ;;Reserve stack place for one word
  fstcw WORD ptr [xsp]                                  ;;Store FPU control word
  push XWORD ptr [xsp]                                  ;;Duplicate value
  BitSet WORD ptr [xsp], (BIT10 or BIT11)               ;;Modify the control word, int(x) = Truncate (toward 0)
  fldcw WORD ptr [xsp]                                  ;;Restore modified FPU control word
  frndint                                               ;;Round with the modified mode (truncate toward 0)
  fldcw WORD ptr[xsp + @WordSize]                       ;;Restore previous FPU control word
  add xsp, 2*@WordSize                                  ;;Restore stack. Don't use pop xax. We'll not destroy it.
endm
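
The difference between the rounding modes only shows up for some inputs, so a quick comparison may help. This is a Python sketch, not FPU code, and `rounding_modes` is a made-up helper name: `round` models the default round-to-nearest-even, `math.floor` models rounding down, and `math.trunc` models the truncation that the macro above selects.

```python
import math

def rounding_modes(x):
    """Return the integer that each rounding mode would produce for x."""
    return {
        "nearest": round(x),         # FPU default (ties go to even)
        "floor": math.floor(x),      # toward minus infinity
        "truncate": math.trunc(x),   # toward zero (RC bits 10 and 11 set)
    }

print(rounding_modes(2.6))    # nearest differs from floor/truncate
print(rounding_modes(-2.4))   # floor differs from truncate
```

For positive quotients floor and truncate agree, so either works there; for negative angles only floor keeps the remainder in the 0 to 360 range.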
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 21, 2023, 01:36:52 AM
Quote from: guga on July 20, 2023, 11:29:16 PMBtw. I´m doing these tests for positive numbers 1st that extrapolates the limits, and then see if it is ok to do the same for negative angles.

Quote from: jj2007 on July 20, 2023, 11:43:20 AMP.S.: In the source, change the third roundsd operator to 11. Both 9 and 11 work, but the result will be different for negative inputs. See the RoundSD thread for explanations. (https://masm32.com/board/index.php?topic=10992.0)

Quote from: HSE on July 20, 2023, 11:54:45 PM
Quote from: jj2007 on July 20, 2023, 10:22:01 PMfrndint

By default FPU round to nearest, not to floor. You have to modify that.

  Init
  Read i$()
  Cls 3
  push 360
  FpuSet MbDown64 (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1191)
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 21, 2023, 02:21:46 AM
Quote from: jj2007 on July 21, 2023, 01:36:52 AMFpuSet MbDown64

 :thumbsup:
Title: Re: Equivalence angle conversion in SSE2
Post by: raymond on July 21, 2023, 06:13:31 AM
Precision, precision, precision, ......

The precision available with the equipment using "floating point" in currently available computers has a specific limit. For REAL4, it is the equivalent to 7 significant digits. For example, even if you know the square root of 2.0 would be equal to

1.41421 35623 73095 04880 16887 24209 69807 85696 71875 37694 80731 76679 73799 07324 78462 10703 88503 87534 32764 15727

a REAL4 would only provide it at best as 1.4142136 even if you tried to feed it manually with the 100 decimal digits.

Before wasting time trying to handle astronomical numbers of angles with the SSE, one should examine the necessity of converting such numbers having more than 7 digits in the integer portion, i.e. numbers in degrees expressed in scientific notation as >9.0e7.

Where and when would such numbers be generated???
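
The REAL4 limit is easy to see by forcing a value through a 4-byte float and back. A minimal Python sketch (the helper `to_real4` is hypothetical, built on the standard `struct` module):

```python
import math
import struct

def to_real4(x):
    """Round a double to the nearest REAL4 and return it as a double again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_real4(math.sqrt(2.0)))   # only about 7 digits agree with sqrt(2)
print(to_real4(123456789.0))      # a 9-digit angle does not survive REAL4
```

The second line is the relevant one for angles: the low digits of a large integer angle are already gone before any modulo is taken.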
Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 21, 2023, 06:27:07 AM
But why limit yourself to 7 digits of precision when you have REAL10s available?
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 21, 2023, 07:26:14 AM
This is a strange use case, and I am still waiting for a convincing answer from Guga why he needs that.

Generally speaking, the Windows x64 ABI does allow the use of the FPU. From what I saw in The Laboratory in the past, the FPU is a) much more precise but b) not slower than SIMD code. The only convincing reason to not use it is if you have lots of data that you can process in parallel. That is the only area where SIMD shines.
Title: Re: Equivalence angle conversion in SSE2
Post by: six_L on July 21, 2023, 06:08:51 PM
Hi,all
interesting...
If the laser weapon is to hit the missile, what precision is needed for the setup in the following image?
R=500km
α=α0+0.000000000000001°
β=β0-0.000000000000001°
Title: Re: Equivalence angle conversion in SSE2
Post by: mineiro on July 21, 2023, 09:42:46 PM
This is one example where you may need higher precision:

Below is a really "big" string:
m   i   n   e   i   r   o
6dh,69h,6eh,65h,69h,72h,6fh
109,105,110,101,105,114,111

1/x + uint = 1/x +uint = 1/x + uint = ...

Example for the letters "mi": the first letter has 1 added to it, the following ones don't.
1/(1/(109+1) + 105) = 0,009522985

Getting the "encoded" values back, LIFO (last in, first out):
1/0,009522985 = 105,009091162 - 105 = 0,009091162
1/0,009091162 = 109,996939885 - 109 = ...

In linux I'm using gnu multiple precision arithmetic library (bignum). Limits are computer memory.
https://gmplib.org/
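
A rough Python sketch of that scheme, using exact rationals from the standard `fractions` module in place of GMP. The function names `encode` and `decode` are made up for this illustration, and the scheme is reconstructed from the "mi" example above, so treat it as an assumption about what was meant:

```python
from fractions import Fraction

def encode(codes):
    """Fold character codes into one fraction: v = 1/(v + code)."""
    v = Fraction(1, codes[0] + 1)      # only the first code gets the +1
    for c in codes[1:]:
        v = 1 / (v + c)
    return v

def decode(v, count):
    """Undo encode(); the codes come back in LIFO order."""
    out = []
    for _ in range(count - 1):
        inv = 1 / v
        c = inv.numerator // inv.denominator   # integer part
        out.append(c)
        v = inv - c                            # fractional part
    out.append(int(1 / v) - 1)                 # remove the initial +1
    return out[::-1]

name = list(b"mineiro")
print(float(encode(name[:2])))                 # the "mi" value from above
print(bytes(decode(encode(name), len(name))))
```

With exact fractions the round trip works for any length. With REAL8 the denominators outgrow the 53-bit mantissa quickly, probably after only a few letters, which is the point of the example.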
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 22, 2023, 10:02:01 AM
It seems harder than I thought. Using roundsd seems easier, but there is no similar opcode in SSE2. Can it be emulated?

I gave it a test on (arghhh) Bard, and it showed me code that could emulate roundsd, but made for Visual Studio.
int round_pd_sse2(double x) {
  // Isolate the fractional part of the floating-point value.
  double f = x - (int)x;

  // Round the fractional part to the nearest integer.
  int i = (int)f + 0.5;

  // If the fractional part is exactly 0.5, round to the nearest even integer.
  if (f == 0.5) {
    i = (i % 2 == 0) ? i : i - 1;
  }

  return i;
}

I doubt it would work, but can someone test or port it? The corresponding disassembled code (according to godbolt) is:

__real@3fe0000000000000 DQ 03fe0000000000000r   ; 0.5

_f$ = -16                                         ; size = 8
tv75 = -8                                         ; size = 4
_i$ = -4                                                ; size = 4
_x$ = 8                                       ; size = 8
int round_pd_sse2(double) PROC                   ; round_pd_sse2
        push    ebp
        mov     ebp, esp
        sub     esp, 16                             ; 00000010H
        cvttsd2si eax, QWORD PTR _x$[ebp]
        cvtsi2sd xmm0, eax
        movsd   xmm1, QWORD PTR _x$[ebp]
        subsd   xmm1, xmm0
        movsd   QWORD PTR _f$[ebp], xmm1
        cvttsd2si ecx, QWORD PTR _f$[ebp]
        cvtsi2sd xmm0, ecx
        addsd   xmm0, QWORD PTR __real@3fe0000000000000
        cvttsd2si edx, xmm0
        mov     DWORD PTR _i$[ebp], edx
        movsd   xmm0, QWORD PTR _f$[ebp]
        ucomisd xmm0, QWORD PTR __real@3fe0000000000000
        lahf
        test    ah, 68                              ; 00000044H
        jp      SHORT $LN2@round_pd_s
        mov     eax, DWORD PTR _i$[ebp]
        and     eax, -2147483647              ; 80000001H
        jns     SHORT $LN6@round_pd_s
        dec     eax
        or      eax, -2                           ; fffffffeH
        inc     eax
$LN6@round_pd_s:
        test    eax, eax
        jne     SHORT $LN4@round_pd_s
        mov     ecx, DWORD PTR _i$[ebp]
        mov     DWORD PTR tv75[ebp], ecx
        jmp     SHORT $LN5@round_pd_s
$LN4@round_pd_s:
        mov     edx, DWORD PTR _i$[ebp]
        sub     edx, 1
        mov     DWORD PTR tv75[ebp], edx
$LN5@round_pd_s:
        mov     eax, DWORD PTR tv75[ebp]
        mov     DWORD PTR _i$[ebp], eax
$LN2@round_pd_s:
        mov     eax, DWORD PTR _i$[ebp]
        mov     esp, ebp
        pop     ebp
        ret     0
int round_pd_sse2(double) ENDP                   ; round_pd_sse2

Is this emulation correct ? Can it be possible to recreate the same functionality of  roundsd using SSE2 instructions ?
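
For what it is worth, the C routine above does not look like a correct emulation: it rounds the fractional part instead of the value and returns an int. A commonly used alternative, sketched here in Python rather than assembly, is the 2^52 trick: adding and then subtracting 2^52 forces the hardware's round-to-nearest-even to drop the fraction. Each step maps onto SSE2 instructions (andpd for the absolute value, addsd, subsd, cmpsd), but that mapping and the names `round_nearest_even` and `floor_sse2_style` are assumptions of this sketch, not tested RosAsm code.

```python
import math

MAGIC = 2.0 ** 52   # above this every double is already an integer

def round_nearest_even(x):
    """Model of roundsd mode 0 using only add/sub on doubles."""
    if not (abs(x) < MAGIC):          # huge values (and NaN) pass through
        return x
    r = (abs(x) + MAGIC) - MAGIC      # fraction is rounded away here
    return math.copysign(r, x)        # restore the sign

def floor_sse2_style(x):
    """Model of roundsd mode 1 (floor): fix up when rounding went upward."""
    r = round_nearest_even(x)
    return r - 1.0 if r > x else r

print(floor_sse2_style(870.41 / 360.0))   # number of full turns
```

The fix-up step is a compare plus a masked subtract in SIMD terms, so no branch is needed there.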
Title: Re: Equivalence angle conversion in SSE2
Post by: daydreamer on July 22, 2023, 01:48:19 PM
Guga
With SSE you often have to write the code for a missing math instruction yourself.
Advanced way: you can write a roundsd macro and, in MASM, use option nokeyword:<roundsd> so that the roundsd mnemonic can be redefined as a macro
.data
Round real8 0.5,0.5
.code
Addpd xmm0,round
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 22, 2023, 03:06:51 PM
Hi Daydreamer

Ok, but will it work for huge values when trying to create an equivalent angle for weird angles such as 4.845978e11, 4.567972e45 etc etc ?

I'm trying to see if there is a way to avoid the normal limitations and retrieve whatever fractional part results from dividing a number by 360, for example. So, no matter if the number is 1.235468e45, whenever it is divided by another value, it will always (or should, at least) result in an integer and a fractional part (the latter is the one I'm trying to retrieve to calculate the equivalent angle).

I know it is really, really unusual for someone to use such values for calculating tangent, arctan, cos etc., but it could be good to guard against strange situations when using such functions in color conversion routines.

I'm rebuilding an old routine I made to convert RGB to CIELCh that, in fact, has almost no limitations, but it uses tan and atan functions. I succeeded in porting both to SSE, but in very rare situations some calculations may end up at those weird numbers and, in order to avoid limit checks inside the function (such as jumping if above or below some limit), I'm trying to make the tangent function (or other trigonometric functions) handle those situations properly.

The old conversion routine I made for RGB to CIELCh has a design error, because I ended up creating a function that handles luminance, hue and chroma as packets limited to the same levels of gray.

The result was not bad, because it ends up categorizing pixels by a relation between their hue and luma, but it killed the initial functionality of the conversion itself.

The problem of the converter (I suppose) is that even if I can attach a luma level (gray) to a certain hue level, the same thing doesn't seem to work for chroma, which results in an image that is too blue, or green, or pink, etc. if I change the variables I'm using. Perhaps, if I remove the limitation on chroma, the algorithm would work as expected instead of bluing the image whenever I decrease the luma.

Making better tan/atan routines can not only speed up the code a lot, but also help me fix those problems and find out what exactly is going on with the LCh relations that are causing the algo to fail.
Title: Re: Equivalence angle conversion in SSE2
Post by: TimoVJL on July 22, 2023, 10:11:50 PM
Is x64 at a dead end, as the most accurate calculations move to the GPU?
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 22, 2023, 10:32:02 PM
Quote from: jj2007 on July 21, 2023, 07:26:14 AMGenerally speaking, the Windows x64 ABI does allow the use of the FPU. From what I saw in The Laboratory in the past, the FPU is a) much more precise but b) not slower than SIMD code. The only convincing reason to not use it is if you have lots of data that you can process in parallel. That is the only area where SIMD shines.

Quote from: TimoVJL on July 22, 2023, 10:11:50 PMIs x64 in dead end, as most most accurate calculations moves to GPU ?

So you are adding a third case, the GPU. Do you have a link with an example? I've heard the GPU can do lots of useful things, but "accuracy" was not the prime concern there.
Title: Re: Equivalence angle conversion in SSE2
Post by: TimoVJL on July 22, 2023, 10:57:08 PM
No, but I read that the science world started using GPUs some years ago, as CPUs can't help them enough.
Just check something about Nvidia and why China is on the blacklist.

Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 23, 2023, 11:08:20 AM
Ok, guys

Done the hardest part. Now I'm trying to find exceptions or cases where weird values have to be handled in other situations. So far, I succeeded in finding equivalent angles of weird values such as:

870.41º = 150.41º
33e17º = 240º
2.45481458182487182147878e304º = 68.1458182487170915º
17e200º = 79.9999999999997157º
365º = 5º
1000º = 280º
900º = 180º
2.1754545e21º = 154.54499999999º

At the moment, this is for positive angles only. I'll review the code and math to check for errors and later try to do the same for negative angles as well.

Once it's done, I'll post here the mathematical equation I made for this weird thing, the full code, and a small DLL so we can benchmark it.

Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 23, 2023, 07:37:46 PM
Quote from: guga on July 22, 2023, 10:02:01 AMIt seems harder then i thought. Using roundsd seems more easier, but it don´t have any similar opcode for SSE2. Can it be emulated ?

Dear Guga,

this is a useless fight. You will have difficulties finding a machine with a cpu that doesn't support SSE4.1, and thus cannot use roundsd.

Now if you find that very, very old machine, just use the fpu. I would do that anyway, because it's perfectly suited to make these calculations. If your client complains that it is too slow (relevant only if parallel processing is possible), tell him to buy a new computer. It will be ten times faster than his old machine.

Remember zedd's problems with my SSE 4.2 Instr() implementation? He had that problem because his machine supports "only" SSE4.1.

A machine that does not support roundsd is over 15 years old (->Penryn (https://en.wikipedia.org/wiki/Penryn_(microarchitecture)), Nehalem (https://en.wikipedia.org/wiki/Nehalem_(microarchitecture))).
Title: Re: Equivalence angle conversion in SSE2
Post by: InfiniteLoop on July 24, 2023, 01:11:34 AM
By "equivalence" do you mean finding the angle X % 2*Pi ?
This has already been done.

Y = X % 2Pi
==>
c = Floor(X/2Pi)
==>
Y = X - c*2Pi_Big - c*2Pi_Small

2Pi_Big = 2*Pi - 2*Pi_Small
2Pi_Small =Fractional(2Pi_Infinite_Precision << MANTISSA_BITS) * 2^-MANTISSA_BITS
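
A minimal Python sketch of that split-constant reduction (often called Cody-Waite), for radians. The choice of 27 fractional bits for the "big" part and the names `TWO_PI_BIG`, `TWO_PI_SMALL` and `reduce_two_pi` are assumptions of this sketch. The constant 2.4492935982947064e-16 is the part of 2*Pi that does not fit into the REAL8 value of 2*Pi.

```python
import math

TWO_PI = 2.0 * math.pi                                  # REAL8 nearest to 2*Pi
TWO_PI_BIG = math.floor(TWO_PI * 2.0**27) / 2.0**27     # leading ~30 bits only
TWO_PI_SMALL = (TWO_PI - TWO_PI_BIG) + 2.4492935982947064e-16   # the rest

def reduce_two_pi(x):
    """Y = X - c*2Pi_Big - c*2Pi_Small with c = Floor(X / 2Pi)."""
    c = math.floor(x / TWO_PI)
    # c * TWO_PI_BIG is exact while c stays below about 2**23,
    # because TWO_PI_BIG has only ~30 significant bits.
    return (x - c * TWO_PI_BIG) - c * TWO_PI_SMALL

print(reduce_two_pi(math.radians(870.41)), math.radians(150.41))
```

This keeps the result accurate for moderately large X, where a single multiply by a rounded 2*Pi would lose bits. It does not help once X is so large that its own spacing exceeds 2*Pi, and for inputs very close to a multiple of 2*Pi the computed c can be off by one.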
Title: Re: Equivalence angle conversion in SSE2
Post by: raymond on July 24, 2023, 02:02:38 AM
Quote from: guga on July 23, 2023, 11:08:20 AMSo far, i suceeded to find equivalent angles of weird values such as:
2.45481458182487182147878e304º = 68.1458182487170915º

I know that I'm probably wasting my time, BUT

Unless you are definitely certain that the exact value of the given 2.45481458182487182147878 would continue with another 282 integer 0's, there is no way you can get the modulo 360º without using huge number procedures.

On the other hand, if any of those additional 282 integer digits could be anything else but 0's, stating that its modulo 360º is any specific value is mathematically WRONG. It could be anything between 0 and 359.

The same reasoning would apply to those other huge numbers.
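
This can be checked directly, since Python can print the exact integer that a REAL8 holds. A small sketch (the helper name `stored_angle_mod_360` is made up; `math.nextafter` and `math.ulp` need Python 3.9 or later):

```python
import math

def stored_angle_mod_360(x):
    """Exact modulo 360 of the integer actually stored in the double x."""
    return int(x) % 360

x = 2.45481458182487182147878e304
up = math.nextafter(x, math.inf)       # the very next REAL8 above x

print(int(x) == 245481458182487182147878 * 10**281)   # False: not the typed value
print(stored_angle_mod_360(x), stored_angle_mod_360(up))
```

The typed decimal is not what gets stored, and the next representable REAL8 is 2^959 further on, which shifts the result by 248 degrees. So any single "equivalent angle" for such an input is an artefact of the rounding.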
Title: Re: Equivalence angle conversion in SSE2
Post by: daydreamer on July 24, 2023, 06:35:46 PM
If you convert to a 65536-step circle instead of 360, and ebx,0ffffh is faster than modulo
And can use ebx pointing to atan lut
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 12:10:57 AM
Quote from: raymond on July 24, 2023, 02:02:38 AM
Quote from: guga on July 23, 2023, 11:08:20 AMSo far, i suceeded to find equivalent angles of weird values such as:
2.45481458182487182147878e304º = 68.1458182487170915º

I know that I'm probably wasting my time, BUT

Unless you are definitely certain than the exact value of the given 2.45481458182487182147878 would continue with another 282 integer 0's, there is no way you can get the modulo 360º without using huge number procedures.

On the other hand, if any of those additional 282 integer digits could be anything else but 0's, stating that its modulo 360º is any specific value is mathematically WRONG. It could be anything between 0 and 359.

The same reasoning would apply to those other huge numbers.


Hi Raymond

We may be able to get the fractional part of any angle using magic numbers. Dividing a number by 360 is the same as using the magic number for 3 or 9, since 360 is divisible by those numbers.

I found an equation that can retrieve the remainder of such divisions, to be used to calculate the equivalent angle. I don't know yet how to make it work with SSE with the same functionality as we have for regular x86 asm, as in here: https://masm32.com/board/index.php?topic=1906.15

The math equation to calculate the remainder of such divisions is described here:
https://codegolf.stackexchange.com/questions/243840/find-the-magic-numbers-to-divide-a-number-without-division?newreg=131c0f9bcfbb4d0ebacf0ef7f4c9c626
(https://i.postimg.cc/MnqFWbPT/Screenshot-2023-07-24-at-10-41-29-Find-the-magic-numbers-to-divide-a-number-without-division.png) (https://postimg.cc/MnqFWbPT)

Any angle can be divided using a magic number for 3 or 9; all we need to do first is divide the original angle by 40 if we are using a magic number for 9 to get the remainder (since 360/40 = 9), or divide by 120 if we are using a magic number for 3 (360/120 = 3).

So we compute HugeNumber/magic divider (for a division by either 3 or 9) and then find the remainder.

Even for huge numbers, no matter what the number is, it will always have a measurable remainder. Of course, we need to take into account the limitations of the Real8 values being stored, because if we input values such as 1.234e19 we are, in fact, inputting only the integer 1234e16, and so on. But even if we consider that after the 14th (or 17th) digit all the other digits are 0, we can still calculate their values, but in their truncated form. So, we can do things like 9.45445481112182e16 = 94544548111218.2e3, where the last "2" is the actual remainder. But for bigger values, it doesn't mean we don't have a remainder; it only means that, because of the 64-bit limitation, all digits beyond a certain limit will be filled with 0. So, 9.45445481112182e38 = 9454454811121820000000....... Truncating at the 37th digit, all the others are naturally 0.

Nevertheless, the same is true for the encoding. When we store a huge number in a 64-bit register or variable, all digits exceeding a limit are discarded. Ex: we can't encode things like 1445454545487512714581781474847847841784178. They will be truncated at a certain digit (17th or 14th etc.).

But all of this doesn't mean that we cannot calculate equivalent angles of huge numbers (even with their limitations).

For an equivalent SSE algorithm, I would need an equivalent of things like shl eax 1 to multiply by 2, shl eax 2 to multiply by 4 and so on, or also shr eax 1 to divide by 2 etc., but I can't make it work with psrldq xmm0 1, psrad xmm0 1 etc.

    xorpd xmm1 xmm1
    cvtpd2dq xmm1 xmm1

    ; emulate shl eax 1
    mov eax 1
    CVTSI2SD xmm1 eax ; convert the integer in eax to a double in xmm1
    cvtpd2dq xmm1 xmm1 ; now we have 00__0000_0000_0000_0000_0000_0000_0000_0001 in the xmm register
    PSLLD xmm1 1  ; shift it left by 1  ; now we have 00__0000_0000_0000_0000_0000_0000_0000_0010 in the xmm register


    CVTPS2PD xmm1 xmm1 ; ??? How to convert back from binary to int (or double) representation ?


For instance, the magic number of a division by 3 in regular x86 is calculated as:
Magic Number = 31 - log(x)/log(2) , where x= divisor

for 64 bits it is:
Magic Number = 63 - log(x)/log(2) , where x= divisor


The equation is described as:

Given an integer d, find a valid tuple (m, s, f) such that the following formula computes the correct value of q for all x:

y = mx >> 32
q = y + ((x - y) >> 1 & -f) >> s

The formula works on 32-bit unsigned values. First x is multiplied by m and only the upper 32 bits of the 64-bit product are kept; this is y, roughly x*m/2^32. If f is 1, half of the difference (x - y) is added to y; this correction is needed for divisors whose multiplier would otherwise not fit in 32 bits. If f is 0, nothing is added, because the mask -f is all zeros. Finally the sum is shifted right by s bits to give the quotient q.

The value of f is either 0 or 1. It only selects whether the correction term is used; it has nothing to do with rounding up or down. For a valid tuple the result is always the floor of x/d.

The value of s is a shift count between 0 and 31 (or 0 and 63 for 64 bits). Together with m it defines the approximation of 1/d: with f = 0, m / 2^(32+s) is slightly above 1/d.

The value of m is an integer multiplier that fits in 32 bits (or 64 bits for the 64-bit version). For a valid tuple the quotient is exact for every x, not an approximation.

A valid tuple (m, s, f) for a given value of d can be found with a brute-force search. For f = 0 it can also be computed directly as m = ceil(2^(32+s) / d) for the smallest s that gives the correct q for all x.
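
A Python sketch of the formula, to show it in action. The function name `magic_divide` and the table `MAGIC` are made up for this illustration; the tuples for 3, 7 and 9 are the commonly published 32-bit constants, not values taken from this thread.

```python
def magic_divide(x, m, s, f):
    """q = (y + (((x - y) >> 1) & -f)) >> s, with y = (m * x) >> 32."""
    y = (x * m) >> 32                       # upper 32 bits of the product
    return (y + (((x - y) >> 1) & -f)) >> s

# divisor: (m, s, f); assumed well-known constants for unsigned 32-bit x
MAGIC = {
    3: (0xAAAAAAAB, 1, 0),
    7: (0x24924925, 2, 1),   # needs the correction term, so f = 1
    9: (0x38E38E39, 1, 0),
}

def magic_remainder(x, d):
    """Remainder of x / d without a division instruction."""
    return x - magic_divide(x, *MAGIC[d]) * d

print(magic_divide(870, *MAGIC[3]), magic_remainder(870, 9))
```

Note that this is integer arithmetic on 32-bit values. It gives exact quotients and remainders for integers up to 2^32 - 1, but it says nothing about a REAL8 whose low digits were already rounded away.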



Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PMIf you use convert to 65536 instead of 360 ,and ebx,0fffh is faster than modulo
And can use ebx pointing to atan lut


Hi daydreamer, for normal x86 (integers), it can be possible, but can we do the same for SSE 2 ? I mean, using magic number division as i explained above ?
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 12:40:03 AM
Quote from: guga on July 25, 2023, 12:10:57 AMBut all of this, don´t means that we cannot calculaate equivalent angles of huge numbers

How do you know that the number has zeros after the represented part? Because in big numbers the angle depends on the part that is not there.


Quote from: raymond on July 24, 2023, 02:02:38 AMI know that I'm probably wasting my time, BUT

:thumbsup:
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 01:27:18 AM
Quote from: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PMIf you use convert to 65536 instead of 360 ,and ebx,0fffh is faster than modulo
And can use ebx pointing to atan lut


Hi daydreamer, for normal x86 (integers), it can be possible, but can we do the same for SSE 2 ? I mean, using magic number division as i explained above ?

Almost everything can be done with SIMD instructions, but it's better to test it ;-)

Using the FPU:
8.704100e+02 = 150.4
1.234560e+06 = 120.0
1.234560e+07 = 120.0
1.234560e+08 = 120.0
1.234560e+09 = 120.0
1.234560e+10 = 120.0

Using SIMD instructions, mode 1 (roundsd):
8.704100e+02 = 150.4
1.234560e+06 = 120.0
1.234560e+07 = 120.0
1.234560e+08 = 120.1
1.234560e+09 = 121.0
1.234560e+10 = 129.9

Using SIMD instructions, mode 2 (and 65536):
8.704100e+02 = 150.4
1.234560e+06 = 120.0
1.234560e+07 = 0
1.234560e+08 = 0
1.234560e+09 = 0
1.234560e+10 = 0
Title: Re: Equivalence angle conversion in SSE2
Post by: daydreamer on July 25, 2023, 01:33:20 AM
Quote from: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PMIf you use convert to 65536 instead of 360 ,and ebx,0fffh is faster than modulo
And can use ebx pointing to atan lut


Hi daydreamer, for normal x86 (integers), it can be possible, but can we do the same for SSE 2 ? I mean, using magic number division as i explained above ?
Hi Guga
64bit shifts =1 bit resolution,xmm 128bit shifts 1byte resolution is the problem
best is create full precision reciprocal numbers before and mulpd 2 angles each time

packed AND also possible with 0ffffh,but if its going to be used afterwards in LUT,it also cost extra opcodes MOVD reg,xmm I doubt there you gain any speed
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 02:27:11 AM
The problem is the resolution, as shown in reply #44 above (https://masm32.com/board/index.php?msg=121804). The and eax, 0FFFFh does not provide a good resolution.

You can switch to 64-bit code and rax:

include \Masm32\MasmBasic\Res\JBasic.inc    ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
Init        ; OPT_64 1    ; put 0 for 32 bit, 1 for 64 bit assembly; click here for an example with procs
  PrintLine Chr$(13, 10, 10, "This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  movlps xmm7, FP8(12345.6e7)            ; e.g. 870.41
  Print Str$("Original:  %9f\n", xmm7)
  mulsd xmm7, FP8(11930464.711111111111111111)    ; 2^32/360
  cvtsd2si rax, xmm7            ; double to 64-bit integer in rax
  mov eax, eax                  ; keep the low 32 bits (upper half of rax is zeroed)
  cvtsi2sd xmm7, rax            ; 32-bit value in rax to Scalar Double
  mulsd xmm7, FP8(0.00000008381903171539306640625)        ; rescale to degrees
  Inkey Str$("Converted: %f", xmm7)
EndOfCode

Resolution is a lot better, but still very far from what the FPU can offer:
This program was assembled with UAsm64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

This program was assembled with ml64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014
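
The same scaling idea can be modelled in a few lines of Python to see where the 0.000014 comes from. This is a sketch of the arithmetic only, not of the MasmBasic code; `reduce_binary_angle` is a made-up name, and `round` stands in for cvtsd2si with the default round-to-nearest mode.

```python
SCALE = 2.0**32 / 360.0          # one full turn becomes 2^32 units

def reduce_binary_angle(deg):
    """Scale to a 2^32-step circle, keep the low 32 bits, scale back."""
    units = round(deg * SCALE) & 0xFFFFFFFF   # models cvtsd2si + mov eax, eax
    return units * (360.0 / 2.0**32)

print(reduce_binary_angle(870.41))
print(reduce_binary_angle(12345.6e7))
```

The product deg * SCALE is about 1.5e18 for the second input, where neighbouring doubles are 256 units apart, so the low bits that carry the angle are already coarse before the mask is applied.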
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 05:04:17 AM
Quote from: HSE on July 25, 2023, 12:40:03 AM
Quote from: guga on July 25, 2023, 12:10:57 AMBut all of this, don´t means that we cannot calculaate equivalent angles of huge numbers

How do you know that the number have zeros after represented part? Because in big numbers the angle depends of the part it's not there.


Quote from: raymond on July 24, 2023, 02:02:38 AMI know that I'm probably wasting my time, BUT

:thumbsup:


We don't. But perhaps we can extend the equation I mentioned before and store the number in an array of integer data. So, if we have 100 dwords in the array, we can use, for example, the first 80 to represent the integer part and the remaining 20 to represent the fraction. In the end we will have a huge number to feed into this math equation, something as big as 1.4456454....e100 (integer part) + 14155452e-100
Or something like that.

At least there is a formula with which we can calculate something, even if it is limited for now.
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 05:05:18 AM
Quote from: jj2007 on July 25, 2023, 02:27:11 AMThe problem is the resolution, as shown in reply #44 above (https://masm32.com/board/index.php?msg=121804). The and eax, 0FFFFh does not provide a good resolution.

You can switch to 64-bit code and rax:

include \Masm32\MasmBasic\Res\JBasic.inc    ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
Init        ; OPT_64 1    ; put 0 for 32 bit, 1 for 64 bit assembly; click here for an example with procs
  PrintLine Chr$(13, 10, 10, "This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  movlps xmm7, FP8(12345.6e7)            ; e.g. 870.41
  Print Str$("Original:  %9f\n", xmm7)
  mulsd xmm7, FP8(11930464.711111111111111111)    ; 2^32/360
  cvtsd2si rax, xmm7            ; double to reg32
  mov eax, eax
  cvtsi2sd xmm7, rax            ; reg32 to Scalar Double
  mulsd xmm7, FP8(0.00000008381903171539306640625)        ; rescale to degrees
  Inkey Str$("Converted: %f", xmm7)
EndOfCode

Resolution is a lot better, but still very far from what the FPU can offer:
This program was assembled with UAsm64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

This program was assembled with ml64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

Great work, JJ

I'll take a look at it, and will try to implement the necessary SSE4 opcode in RosAsm so I can test it better
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 05:10:57 AM
Quote from: daydreamer on July 25, 2023, 01:33:20 AM
Quote from: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PMIf you use convert to 65536 instead of 360 ,and ebx,0fffh is faster than modulo
And can use ebx pointing to atan lut


Hi daydreamer, for normal x86 (integers), it can be possible, but can we do the same for SSE 2 ? I mean, using magic number division as i explained above ?
Hi Guga
64bit shifts =1 bit resolution,xmm 128bit shifts 1byte resolution is the problem
best is create full precision reciprocal numbers before and mulpd 2 angles each time

packed AND also possible with 0ffffh,but if its going to be used afterwards in LUT,it also cost extra opcodes MOVD reg,xmm I doubt there you gain any speed


Can it be used with the magic numbers in SSE ?
Title: Re: Equivalence angle conversion in SSE2
Post by: raymond on July 25, 2023, 05:23:41 AM
One last try to resolve this misunderstanding.

The original aim of this thread was to attempt to convert large numerical angles to the range of 0-360. In essence, it meant obtaining the remainder of dividing by 360, using whatever means available.

For example, let's assume we want to obtain the remainder of a division by 9. One age-old trick is to add the digits of the number to be divided (and repeatedly add the digits of the sum) to eventually obtain the effective remainder.

Thus the remainder of dividing by 9 a number such as 12345679013 (sum of digits 41) would be 5.
But, trying the same on a number approximated to 1.2345679013e25 would be indeterminate (i.e. TOTALLY MEANINGLESS) because it would depend on which digits would have made up the truncated approximate number.

The same conclusion can be applied to any divisor of approximate huge numbers. It can be totally MEANINGLESS depending on the level of precision required.

Let's see what happens with jj's numbers when we fill some of the truncated digits by only a 1 and do a modulo on a supposedly 'more' exact number using grade-school arithmetic.

12345600000 mod 360 120
12345600001         121
12345600010         130
12345600100         220
12345601000          40
12345610000          40

 :eusa_naughty:
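
Both points can be reproduced with exact integers. A Python sketch (the helper name `digit_sum_mod9` is made up for this illustration):

```python
def digit_sum_mod9(n):
    """Remainder of n / 9 by repeatedly adding the decimal digits."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n % 9

print(digit_sum_mod9(12345679013), 12345679013 % 9)

base = 12345600000
for extra in (0, 1, 10, 100, 1000, 10000):
    print(base + extra, (base + extra) % 360)
```

Changing a single "missing" digit moves the result anywhere in the 0 to 359 range, which is why the digits that REAL8 cannot hold matter.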
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 05:36:26 AM
Quote from: guga on July 25, 2023, 05:04:17 AMBut perhaps, we can extend the equation i said before and store the number in a array of Integer data.

:thumbsup: That is to use a Big Number library.
Can't be solved with 64 bits numbers, like we are saying from the beginning (but you don't want to define the problem  :biggrin: ).
Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 25, 2023, 06:59:57 AM
OK; when I read this topic all I really see is a huuuge cloud of dust that obscures whatever the original point of the thread was supposed to be. Sum-of-digits remainders, "magic numbers", blah blah blah.

Let me ask some somewhat dumbass questions, because it seems to me that these issues are important but haven't been properly addressed here. A lot of this has to do with what Guga's intentions and requirements are in this project, which are not clear at all:

1. Seems to me that it's been shown that the best precision here would come from using the FPU as opposed to any of those more newfangled methods (SSE, XMM, etc.). Is there any reason why you can't use the FPU? or why you prefer not to use it? Is speed an issue here?

2. Why do you have to worry about such ridiculously large angles that you want to reduce to the range 0-360º? Is this just a theoretical possibility that bothers you and you'd like to cover, or would you actually be dealing with numbers of that magnitude?

3. What, exactly, is your application here, if you don't mind explaining it?

Inquiring minds want to know.
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 07:58:44 AM
Quote from: raymond on July 25, 2023, 05:23:41 AMLet's see what happens with jj's numbers when we fill some of the truncated digits by only a 1 and do a modulo on a supposedly 'more' exact number using grade-school arithmetic.
12345600000 mod 360 120
12345600001         121
12345600010         130
12345600100         220
12345601000          40
12345610000          40

Sorry, Ray, I don't get the point here. These are the results that I get with the FPU, but with one exception (a sign change, 220->-140) also with the SIMD method in 64-bit code:
This program was assembled with ml64 in 64-bit format.
12345600000       120
12345600001       121
12345600010       130
12345600100      -140
12345601000        40
12345610000        40

What exactly do you want to point out?
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 08:08:30 AM
:biggrin:
Quote from: jj2007 on July 25, 2023, 07:58:44 AMI don't get the point here.

One of your numbers was 1.234560e+10. It's just an example of how incomplete numbers are represented, as you can see.


Quote from: jj2007 on July 25, 2023, 07:58:44 AMbut with one exception (a sign change

Apparently the SSE rounding mode must be changed, like in the FPU:
STMXCSR dword ptr [r8]
mov eax, dword ptr [r8]
or eax, 6000h
mov dword ptr [r8], eax
LDMXCSR dword ptr [r8]

I have to test that!
Later: the value must be stored in memory.
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 08:23:09 AM
Quote from: HSE on July 25, 2023, 08:08:30 AM:biggrin:
Quote from: jj2007 on July 25, 2023, 07:58:44 AMI don't get the point here.

One of your numbers was 1.234560e+10. Just an example of to represent imcomplete numbers, you can see.

Sorry, 1.234560e+10=12345600000, divided by 360, the fraction is 0.33... *360=120, correct. What's the problem?
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 09:05:26 AM
Quote from: HSE on July 25, 2023, 08:08:30 AM
I have to test that!

JJ, I'm failing, JBasic installation is from today.
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 09:31:02 AM
Quote from: NoCforMe on July 25, 2023, 06:59:57 AM
OK; when I read this topic all I really see is a huuuge cloud of dust that obscures whatever the original point of the thread was supposed to be. Sum-of-digits remainders, "magic numbers", blah blah blah.

Let me ask some somewhat dumbass questions, because it seems to me that these issues are important but haven't been properly addressed here. A lot of this has to do with what Guga's intentions and requirements are in this project, which are not clear at all:

1. Seems to me that it's been shown that the best precision here would come from using the FPU as opposed to any of those more newfangled methods (SSE, XMM, etc.). Is there any reason why you can't use the FPU? or why you prefer not to use it? Is speed an issue here?

2. Why do you have to worry about such ridiculously large angles that you want to reduce to the range 0-360º? Is this just a theoretical possibility that bothers you and you'd like to cover, or would you actually be dealing with numbers of that magnitude?

3. What, exactly, is your application here, if you don't mind explaining it?

Inquiring minds want to know.

I explained earlier. JJ and Siekmanski already found the solution for the particular problem of identifying the equivalent angles within the limits of a Real8 (as Raymond and others pointed out).

The goal was to avoid extra checks in an RGB to CieLCH function I'm writing, which uses tangent functions and may end up with weird values if I remove some limits. Those problems were fixed with JJ's and Siekmanski's solutions.

The other thing is the theoretical possibility of finding the equivalent angle of whatever angle is input, no matter how big it is, i.e. without the limitations of Real8 data. To do that, we only need to find the fractional part of angle/360 and multiply it by 360. The problem is how to compute this remainder easily for really huge values (fully known or even truncated).

As Raymond said, there's a limit to what we can do with 64-bit numbers. Everything beyond the 17th or 19th digit is truncated, so the result can be any equivalent angle: once digits are truncated, we no longer know exactly how many revolutions the angle contains.

So, we have only 2 possibilities:

1) Stay with the truncated value. If we calculate 1.4545648e34, whatever lies beyond the last given digit is simply taken as 0 (as if there were no other digit after the last "8"), i.e. 14545648000000000000000000000000000. This gives a huge integer that we divide by 360 to find the equivalent angle, assuming we are OK with the truncation.

2) Assume we know those missing digits and fill them in somehow to calculate the equivalent angle properly, for example by creating tables to hold the huge integer value.

Either way, this leads to a theoretical question: can we calculate the remainder in these really weird situations, and thus retrieve the equivalent angle, in a faster way?

The answer seems to be the equation I found earlier at

https://codegolf.stackexchange.com/questions/243840/find-the-magic-numbers-to-divide-a-number-without-division?newreg=131c0f9bcfbb4d0ebacf0ef7f4c9c626
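
For either possibility, once the digits are written out, the remainder can be found without any big-number division. The following Python sketch is not the magic-number method from the linked page. It is plain schoolbook (Horner) reduction, processing one decimal digit at a time, and `expand` is a hypothetical helper that pads a truncated mantissa with zeros as in possibility 1.

```python
def mod360_decimal(digits: str) -> int:
    """Remainder mod 360 of a non-negative integer given as a decimal string."""
    r = 0
    for ch in digits:
        # r is always below 360, so r*10 + 9 stays below 3600 (fits in 16 bits)
        r = (r * 10 + int(ch)) % 360
    return r

def expand(mantissa: str, exponent: int) -> str:
    """Hypothetical helper: write 'd.ddd' x 10^exponent as a digit string,
    filling the unknown trailing digits with zeros."""
    whole, _, frac = mantissa.partition(".")
    if exponent < len(frac):
        raise ValueError("value is not an integer")
    return whole + frac + "0" * (exponent - len(frac))

s = expand("1.4545648", 34)
print(s, "mod 360 =", mod360_decimal(s))
print("33e17 mod 360 =", mod360_decimal(expand("33", 17)))
```

Because the running remainder never exceeds 3599, the same loop would need only small integer registers in assembly, at the cost of one step per digit.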

Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 25, 2023, 09:34:51 AM
OK, thanks for answer. But you didn't address one thing:

The FPU uses 80-bit numbers; why are you restricting yourself to 64 bits?
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 10:10:40 AM
Quote from: HSE on July 25, 2023, 08:08:30 AM
Apparently SSE rounding mode must be changed like in FPU:
LDMXCSR dword ptr [r8]
mov eax, dword ptr [r8]
or eax, 6000h
mov dword ptr [r8], eax
STMXCSR dword ptr [r8]

I checked that in x64dbg; MXCSR is shown further down, just above the xmm regs. So far no success - I can set that register, but results don't change :sad:
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 10:16:25 AM
Quote from: NoCforMe on July 25, 2023, 09:34:51 AM
OK, thanks for answer. But you didn't address one thing:

The FPU uses 80-bit numbers; why are you restricting yourself to 64 bits?

JJ's FPU solution is elegant, but it seems to lose some precision at some point. Using the SSE4 opcode seems to fix this particular problem as well and seems to be faster, but I haven't implemented it yet to check.
Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 10:20:01 AM
Quote from: jj2007 on July 25, 2023, 10:10:40 AM
I can set that register, but results don't change

:biggrin: I can't set the register, but it happens that the result is 220  :rolleyes:

JBasic error is with https://masm32.com/board/index.php?msg=121818

Later: to set the register, it was the other way round: ST first, then LD :biggrin:

File for Masm64 SDK, just in case
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 25, 2023, 10:41:18 AM
Quote from: guga on July 25, 2023, 10:16:25 AM
JJ´s solution on FPU is elegant, but, seems to lead to some lack of precision at some point.

Where?
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 11:32:11 AM
Hi JJ

I haven't fully tested with the FPU yet, but wasn't that the precision issue you talked about in answer #12?

Quote
If you use the second version of my proggie (which is SIMD, not FPU), you will see that 33.0e17 as input will not work. However, 33.0e10 will indeed work. As HSE rightly noted, the issue is precision.

I didn't test after that to see whether the precision you were talking about was OK.
Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 25, 2023, 01:36:27 PM
Quote
... (which is SIMD, not FPU) ...
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 25, 2023, 02:54:02 PM
Oops, thanks for the clarification, NoCforMe. Sorry, JJ, I misread it.  :azn:
Title: Re: Equivalence angle conversion in SSE2
Post by: daydreamer on July 25, 2023, 05:54:14 PM
It doesn't need to be AND 0fffh; you can use any 2^x-1 mask that works for your problem.
I used AND 1023 myself with UV coordinates for drawing a 1024*1024 tiled water texture.
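
The masking trick only works when the period is a power of two, so for angles it implies measuring a full turn in 2^x units instead of 360 degrees. The following Python sketch assumes 4096 units per turn. That constant and the function names are arbitrary choices for illustration, not something from this thread.

```python
UNITS_PER_TURN = 4096          # hypothetical choice: 2**12 units per full turn

def wrap_pow2(x: int, bits: int) -> int:
    """x mod 2**bits using a single AND with the mask 2**bits - 1."""
    return x & ((1 << bits) - 1)

def degrees_to_units(deg: float) -> int:
    """Convert degrees to the nearest integer angle unit."""
    return round(deg * UNITS_PER_TURN / 360.0)

def units_to_degrees(u: int) -> float:
    """Convert integer angle units back to degrees."""
    return u * 360.0 / UNITS_PER_TURN

u = wrap_pow2(degrees_to_units(870.41), 12)
print(u, units_to_degrees(u))   # close to 150.41, within half a unit (about 0.044 degrees)
```

The wrap itself is exact and needs no division. The price is the quantisation when converting from degrees, which is why this suits texture coordinates and table lookups better than high-precision angle reduction.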

Title: Re: Equivalence angle conversion in SSE2
Post by: HSE on July 25, 2023, 10:11:11 PM
Quote from: guga on July 25, 2023, 09:31:02 AM
to avoid extra checking on a RGB to CieLCH function

Then that is where the error is, and where it must be corrected. Trying to manage the error later, inside a simple modulus calculation, is just foolish.

(Somebody must remind me of this from time to time  :biggrin: )
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 26, 2023, 06:41:00 PM
I've added a new macro, MbMod(number, divisor) to MasmBasic (http://masm32.com/board/index.php?topic=94.0):

include \masm32\MasmBasic\MasmBasic.inc
TestData    REAL8 870.41, 33.0e17, 1.11116e15, 2.2222e16, 3.3333e17, 4.4444e18,
                  5.5555e19, 6.6666e20, 7.7777e21, 8.8888e22
  Init
  mov esi, offset TestData
  PrintLine CrLf$, "Source        mod 360"
  .Repeat
    Print Str$("%9f", REAL8 ptr [esi]), Str$("\t%4f\n", MbMod(REAL8 ptr [esi], 360))
    add esi, REAL8
  .Until esi>=offset TestData+sizeof TestData
  Inkey
EndOfCode

The first two values, 870.41 and 33.0e17 (marked in red in the original post), are those proposed by Guga in the first post.

Output:
Source          mod 360
870.410000      150.4
3.30000000e+18  240.0
1.11116000e+15  200.0
2.22220000e+16  280.0
3.33330000e+17  240.0
4.44440000e+18  200.0
5.55550000e+19  160.0
6.66660000e+20  120.0
7.77770000e+21  1.489e+12
8.88880000e+22  4.517e+11

Note the limit for correct results is about 4.72e21 - you can use the Windows Calculator to verify that, it uses a BigNum lib (kind of...).
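
The table can be cross-checked without a calculator. The following Python sketch relies on `math.fmod`, which returns the exact remainder of the stored double, and compares it with big-integer arithmetic. One assumption to keep in mind: it checks the value actually stored in the REAL8, which for 7.7777e21 and 8.8888e22 is no longer identical to the decimal literal.

```python
import math

TEST_DATA = [870.41, 33.0e17, 1.11116e15, 2.2222e16, 3.3333e17, 4.4444e18,
             5.5555e19, 6.6666e20, 7.7777e21, 8.8888e22]

def exact_mod360(x: float) -> float:
    """Exact remainder of the double x divided by 360 (no rounding error)."""
    return math.fmod(x, 360.0)

for x in TEST_DATA:
    line = f"{x:.8e}  {exact_mod360(x):8.4f}"
    if x >= 2.0 ** 53:
        # Above 2**53 every double is a whole number, so int(x) is exact
        line += f"  (big-integer check: {int(x) % 360})"
    print(line)
```

The limit of about 4.72e21 is very close to 2^72. One plausible explanation, offered here only as an assumption since MbMod's internals are not shown in this thread, is that a single FPREM reduces the exponent by at most 63 bits and has to be repeated while the C2 flag is set.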
Title: Re: Equivalence angle conversion in SSE2
Post by: raymond on July 27, 2023, 04:16:22 AM
Sorry but this is getting TOTALLY ABSURD from a MATHEMATICAL point of view. Might as well use the following as an example:

1.234560e16 mod 360 = 120 for ALL the integers between 12345600000000000 and 12345601000000000.

With REAL4, the maximum number of significant digits available is 7. The above example source could easily be generated as a REAL4; in fact it can generate numbers up to 3.4e38, but still with only a maximum of 7 significant digits.

The only way a modulo of 360 will make any sense is if the integer portion of the source contains all significant digits. Fractional digits (when the number is displayed without exponents) would have no importance if a precision of +/-1 degree is sufficient.

Furthermore, loading a number generated as a REAL4 onto the FPU to be used for REAL8 or REAL10 computations will NOT improve the precision of results even though those are capable of better accuracy when fed with more accurate data.

Considering REAL8 data, this can be accurate to 15 significant digits AS LONG AS all input data would have been also accurate to at least 15 significant digits. Although REAL8 can handle numbers up to 1.79e308, using it to get a modulo 360 with an accuracy of +/-1 degree would still be limited to 1.0e16 (or less depending on its own accuracy) for the source.
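
These limits follow from the gap between neighbouring representable values at a given magnitude. The following Python sketch estimates that gap from the number of mantissa bits, assuming 24, 53 and 64 bits for REAL4, REAL8 and REAL10 and ignoring denormals. The function name is made up for illustration.

```python
import math

def spacing(x: float, mantissa_bits: int) -> float:
    """Gap between neighbouring representable values near x for a binary
    format with the given mantissa width (including the leading 1 bit)."""
    _, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - mantissa_bits)

FORMATS = {"REAL4": 24, "REAL8": 53, "REAL10": 64}

for name, bits in FORMATS.items():
    print(f"{name:6}  gap near 1.23456e16: {spacing(1.23456e16, bits):.6g}"
          f"   every integer representable up to 2^{bits} = {2.0 ** bits:.4g}")
```

Near 1.23456e16 a REAL4 steps by 2^30 (about 1.07e9), so a billion consecutive integers collapse onto one value, whereas a REAL8 steps by 2 and a REAL10 still resolves fractions. A result good to +/-1 degree needs a gap of at most 1, which is where the bounds of roughly 1.0e16 for REAL8 and 1.0e19 for REAL10 come from.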
Title: Re: Equivalence angle conversion in SSE2
Post by: jj2007 on July 27, 2023, 06:40:50 AM
Dear Ray,

I stand corrected, you are right, of course. Here is an example that respects the precision limits:

include \masm32\MasmBasic\MasmBasic.inc    ; thread
  SetGlobals fct:REAL8
  Init
  PrintLine CrLf$, "Source              mod 360"
  For_ fct=1234567890000350.0 To 1234567890000366.0 Step 1.0
    Print Str$("%Hf", fct), Str$("\t%4f\n", MbMod(fct, 360))
  Next
  Inkey
EndOfCode

Source                mod 360
1234567890000350.0      350.0
1234567890000351.0      351.0
1234567890000352.0      352.0
1234567890000353.0      353.0
1234567890000354.0      354.0
1234567890000355.0      355.0
1234567890000356.0      356.0
1234567890000357.0      357.0
1234567890000358.0      358.0
1234567890000359.0      359.0
1234567890000360.0      0.0
1234567890000361.0      1.0
1234567890000362.0      2.0
1234567890000363.0      3.0
1234567890000364.0      4.0
1234567890000365.0      5.0
1234567890000366.0      6.0
Title: Re: Equivalence angle conversion in SSE2
Post by: raymond on July 27, 2023, 10:11:47 AM
I would agree conditionally with all those results.

(i) All those middle 0's must be considered as significant digits and you now have 16 significant digits for an integer lower than 1.0e19 which can be processed as a REAL10.

(ii) The next question would be which components were used to generate those numbers. Did they all have the equivalent 16 significant digits of accuracy?
      (A multiplication by any integer would be considered as having an infinite number of significant digits, even if it only has a single digit. Addition/subtraction of integers is treated in the same category. A division by any number would be allowed as long as there would be no remainder.)

(iii) Would any component appear to be an integer after being truncated by some other process. (Must remember that this thread was started to process data generated as REAL4!!!!)


Otherwise, I'm glad someone finally saw the light.
Title: Re: Equivalence angle conversion in SSE2
Post by: NoCforMe on July 27, 2023, 12:49:42 PM
One question to the OP, Guga, and I think this touches on the issues that Raymond has raised here:

You never answered this: Is there any reason you don't want to use the FPU (which gives you more precision) rather than other methods? I'm still puzzled by this.
Title: Re: Equivalence angle conversion in SSE2
Post by: guga on July 28, 2023, 11:53:49 PM
Quote from: raymond on July 27, 2023, 10:11:47 AM
(iii) Would any component appear to be an integer after being truncated by some other process. (Must remember that this thread was started to process data generated as REAL4!!!!)


Otherwise, I'm glad someone finally saw the light.

One correction only. It started as Real8 ("R$ = Real8 number"), but it can work for Real4 or TenByte as well. Your comments are valid and helped me work out the proper fixes and better ways to represent the equivalent angles (assuming they are valid ones, of course).

Hi NoCforMe

I'll use the FPU TenByte format as well, but first I'll check the speed to see which is faster for TenByte data (SSE or the regular FPU). I haven't had time to finish the checks yet, so I'm using the FPU for Real10 until I finish them.