Equivalence angle conversion in SSE2

daydreamer · July 25, 2023, 01:33:20 AM

Quote from: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PM
If you convert to 65536 units per turn instead of 360, and ebx,0ffffh is faster than a modulo.
And ebx can then point into an atan LUT.

Hi daydreamer, for plain x86 integer code that is possible, but can we do the same in SSE2? I mean, using magic-number division as I explained above?

Hi Guga
The problem is that 64-bit shifts have 1-bit resolution, while shifts of the full 128-bit XMM register only have 1-byte resolution.
Best is to create full-precision reciprocals beforehand and use mulpd on 2 angles at a time.

A packed AND with 0ffffh is also possible, but if the result is used afterwards as a LUT index, it costs an extra MOVD reg,xmm, so I doubt you gain any speed there.

jj2007 · July 25, 2023, 02:27:11 AM

The problem is the resolution, as shown in reply #44 above. The and eax, 0FFFFh does not provide a good resolution.
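To make the resolution point concrete, here is a minimal Python sketch (not code from the thread) of the 65536-units-per-turn idea: the angle is scaled so that one full turn is 65536 units, and the wrap to a single turn becomes a bitwise AND with 0FFFFh. The helper name `wrap_degrees_16` is made up for illustration.

```python
UNITS_PER_TURN = 1 << 16          # 65536 units per full turn (assumed 16-bit scheme)
MASK = UNITS_PER_TURN - 1         # 0xFFFF, the mask used instead of a modulo

def wrap_degrees_16(deg: float) -> float:
    """Reduce an angle to [0, 360) via a 16-bit binary angle (hypothetical helper)."""
    units = int(deg * UNITS_PER_TURN / 360.0)   # degrees -> binary angle units (truncated)
    units &= MASK                                # wrap to one turn, no division needed
    return units * 360.0 / UNITS_PER_TURN       # back to degrees

step = 360.0 / UNITS_PER_TURN     # smallest representable angle step, in degrees
print(step)
print(wrap_degrees_16(870.41))
```

With 16 bits the step is about 0.0055 degrees, which is the coarse resolution being objected to here; a wider mask shrinks the step accordingly.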

You can switch to 64-bit code and rax:

Code Select

include \Masm32\MasmBasic\Res\JBasic.inc    ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
Init        ; OPT_64 1    ; put 0 for 32 bit, 1 for 64 bit assembly; click here for an example with procs
  PrintLine Chr$(13, 10, 10, "This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  movlps xmm7, FP8(12345.6e7)            ; e.g. 870.41
  Print Str$("Original:  %9f\n", xmm7)
  mulsd xmm7, FP8(11930464.711111111111111111)    ; 2^32/360
  cvtsd2si rax, xmm7            ; double to 64-bit integer (rounded per MXCSR)
  mov eax, eax                  ; keep the low 32 bits, i.e. the value mod 2^32
  cvtsi2sd xmm7, rax            ; integer back to scalar double
  mulsd xmm7, FP8(0.00000008381903171539306640625)        ; rescale to degrees
  Inkey Str$("Converted: %f", xmm7)
EndOfCode

Resolution is a lot better, but still very far from what the FPU can offer:

Code Select

This program was assembled with UAsm64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

This program was assembled with ml64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014
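As a rough cross-check of where the residual error comes from, the same steps can be mimicked in Python. This is only a sketch: `wrap_degrees_32` is a hypothetical helper, `round` stands in for cvtsd2si under the default rounding mode, and `math.fmod` stands in for an exact remainder such as the FPU can deliver. It is only meaningful while the scaled value fits in 64 bits.

```python
import math

SCALE = 2**32 / 360.0        # 11930464.711..., the 2^32/360 factor from the listing
RESCALE = 360.0 / 2**32      # 8.381903171539307e-08, back to degrees

def wrap_degrees_32(deg: float) -> float:
    """Mimic mulsd / cvtsd2si / mov eax,eax / cvtsi2sd / mulsd (hypothetical helper)."""
    units = round(deg * SCALE)          # stand-in for cvtsd2si (round to nearest)
    units &= 0xFFFFFFFF                 # mov eax, eax keeps the low 32 bits
    return units * RESCALE

x = 123456000000.0
approx = wrap_degrees_32(x)
exact = math.fmod(x, 360.0)             # fmod is exact for doubles
spacing = math.ulp(x * SCALE)           # gap between doubles at the scaled magnitude
print(approx, exact, spacing)
```

At roughly 1.47e18 the scaled product sits where neighbouring doubles are 256 units apart, and 256 units of 360/2^32 degrees is about 0.00002 degrees, which matches the size of the deviation in the output above.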

guga · July 25, 2023, 05:04:17 AM

Quote from: HSE on July 25, 2023, 12:40:03 AM
Quote from: guga on July 25, 2023, 12:10:57 AM
But all of this doesn't mean that we cannot calculate equivalent angles of huge numbers

How do you know that the number has zeros after the represented part? In big numbers the angle depends on the part that is not there.

Quote from: raymond on July 24, 2023, 02:02:38 AM
I know that I'm probably wasting my time, BUT

We don't. But perhaps we can extend the equation I mentioned before and store the number in an array of integer data. So, if we have 100 dwords in the array, we can use, for example, the first 80 to represent the integer part and the remaining 20 to represent the fraction. In the end we will have a huge number to feed into this equation, something as big as 1.4456454....e100 (integer part) + 14155452e-100.
Or something like that.

At least there is a formula to calculate something, even if it is limited for now.
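A minimal sketch of the array idea, assuming the integer part is stored as 32-bit limbs with the most significant limb first (the layout and the names `mod360_limbs` and `to_limbs` are assumptions, not something from the thread). The remainder is carried from limb to limb, so each step only needs a small 64-bit operation:

```python
def mod360_limbs(limbs):
    """Remainder mod 360 of a big integer given as 32-bit limbs, most significant first."""
    r = 0
    for limb in limbs:
        # Horner step: r stays below 360, so r * 2^32 + limb fits easily in 64 bits
        r = ((r << 32) | limb) % 360
    return r

def to_limbs(n):
    """Split a non-negative Python int into 32-bit limbs (helper for the demo only)."""
    limbs = []
    while True:
        limbs.append(n & 0xFFFFFFFF)
        n >>= 32
        if n == 0:
            break
    return limbs[::-1]

n = 14456454 * 10**93            # an arbitrary 101-digit integer, for illustration
print(mod360_limbs(to_limbs(n)), n % 360)
```

Only the integer limbs influence the whole-degree remainder; a fractional part stored in further limbs would simply be appended to the result.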

guga · July 25, 2023, 05:05:18 AM

Quote from: jj2007 on July 25, 2023, 02:27:11 AM
The problem is the resolution, as shown in reply #44 above. The and eax, 0FFFFh does not provide a good resolution.

You can switch to 64-bit code and rax:

Code Select

include \Masm32\MasmBasic\Res\JBasic.inc    ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
Init        ; OPT_64 1    ; put 0 for 32 bit, 1 for 64 bit assembly; click here for an example with procs
  PrintLine Chr$(13, 10, 10, "This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  movlps xmm7, FP8(12345.6e7)            ; e.g. 870.41
  Print Str$("Original:  %9f\n", xmm7)
  mulsd xmm7, FP8(11930464.711111111111111111)    ; 2^32/360
  cvtsd2si rax, xmm7            ; double to reg32
  mov eax, eax
  cvtsi2sd xmm7, rax            ; reg32 to Scalar Double
  mulsd xmm7, FP8(0.00000008381903171539306640625)        ; rescale to degrees
  Inkey Str$("Converted: %f", xmm7)
EndOfCode

Resolution is a lot better, but still very far from what the FPU can offer:

Code Select

This program was assembled with UAsm64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

This program was assembled with ml64 in 64-bit format.
Original:  123456000000.000000
Converted: 120.000014

Great work, JJ.

I'll take a look at it, and I'll try to implement the necessary SSE4 opcode in RosAsm so I can test it better.

guga · July 25, 2023, 05:10:57 AM

Quote from: daydreamer on July 25, 2023, 01:33:20 AM
Quote from: guga on July 25, 2023, 12:28:15 AM
Quote from: daydreamer on July 24, 2023, 06:35:46 PM
If you convert to 65536 units per turn instead of 360, and ebx,0ffffh is faster than a modulo.
And ebx can then point into an atan LUT.

Hi daydreamer, for plain x86 integer code that is possible, but can we do the same in SSE2? I mean, using magic-number division as I explained above?
Hi Guga
The problem is that 64-bit shifts have 1-bit resolution, while shifts of the full 128-bit XMM register only have 1-byte resolution.
Best is to create full-precision reciprocals beforehand and use mulpd on 2 angles at a time.

A packed AND with 0ffffh is also possible, but if the result is used afterwards as a LUT index, it costs an extra MOVD reg,xmm, so I doubt you gain any speed there.

Can it be used with the magic numbers in SSE?

raymond · July 25, 2023, 05:23:41 AM

One last try to resolve this misunderstanding.

The original aim of this thread was to attempt to convert large numerical angles to the range of 0-360. In essence, it meant obtaining the remainder of dividing by 360, using whatever means available.

For example, let's assume we want to obtain the remainder of a division by 9. One age-old trick is to add the digits of the number to be divided (and repeatedly add the digits of the sum) to eventually obtain the effective remainder.

Thus the remainder of dividing by 9 a number such as 12345679013 (sum of digits 41) would be 5.
But, trying the same on a number approximated to 1.2345679013e25 would be indeterminate (i.e. TOTALLY MEANINGLESS) because it would depend on which digits would have made up the truncated approximate number.

The same conclusion can be applied to any divisor of approximate huge numbers. It can be totally MEANINGLESS depending on the level of precision required.
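The digit-sum trick can be written down in a few lines. This is a small Python sketch of casting out nines, with the made-up name `digital_root_mod9`; it only illustrates the arithmetic described above:

```python
def digital_root_mod9(n: int) -> int:
    """Remainder of n / 9 by repeatedly summing decimal digits (casting out nines)."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return 0 if n == 9 else n     # a final digit of 9 means the remainder is 0

print(digital_root_mod9(12345679013), 12345679013 % 9)
```

The method needs every digit of the number, which is exactly what an approximated value such as 1.2345679013e25 no longer provides.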

Let's see what happens with jj's numbers when we fill some of the truncated digits by only a 1 and do a modulo on a supposedly 'more' exact number using grade-school arithmetic.

Number         mod 360
12345600000    120
12345600001    121
12345600010    130
12345600100    220
12345601000    40
12345610000    40
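The table, and the reason the same experiment becomes meaningless for a value like 1.2345679013e25, can be checked with a short Python sketch. It assumes Python 3.9 or later for `math.ulp`, which reports the gap between a double and its neighbour:

```python
import math

base = 12345600000
remainders = [(base + delta) % 360 for delta in (0, 1, 10, 100, 1000, 10000)]
print(remainders)

big = 1.2345679013e25
gap = math.ulp(big)        # spacing between neighbouring doubles at this magnitude
print(gap)
```

At that magnitude two adjacent doubles are 2^31 apart, so the stored value cannot distinguish integers that differ by millions of full turns, let alone by a few degrees.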

HSE · July 25, 2023, 05:36:26 AM

Quote from: guga on July 25, 2023, 05:04:17 AM
But perhaps, we can extend the equation i said before and store the number in a array of Integer data.

That means using a big-number library.
It can't be solved with 64-bit numbers, as we have been saying from the beginning (but you don't want to define the problem).

NoCforMe · July 25, 2023, 06:59:57 AM

OK; when I read this topic all I really see is a huuuge cloud of dust that obscures whatever the original point of the thread was supposed to be. Sum-of-digits remainders, "magic numbers", blah blah blah.

Let me ask some somewhat dumbass questions, because it seems to me that these issues are important but haven't been properly addressed here. A lot of this has to do with what Guga's intentions and requirements are in this project, which are not clear at all:

1. Seems to me that it's been shown that the best precision here would come from using the FPU as opposed to any of those more newfangled methods (SSE, XMM, etc.). Is there any reason why you can't use the FPU? or why you prefer not to use it? Is speed an issue here?

2. Why do you have to worry about such ridiculously large angles that you want to reduce to the range 0-360°? Is this just a theoretical possibility that bothers you and you'd like to cover, or would you actually be dealing with numbers of that magnitude?

3. What, exactly, is your application here, if you don't mind explaining it?

Inquiring minds want to know.

jj2007 · July 25, 2023, 07:58:44 AM

Quote from: raymond on July 25, 2023, 05:23:41 AM
Let's see what happens with jj's numbers when we fill some of the truncated digits by only a 1 and do a modulo on a supposedly 'more' exact number using grade-school arithmetic.

Number         mod 360
12345600000    120
12345600001    121
12345600010    130
12345600100    220
12345601000    40
12345610000    40

Sorry, Ray, I don't get the point here. These are the results that I get with the FPU and, with one exception (a sign change, 220 -> -140), also with the SIMD method in 64-bit code:

Code Select

This program was assembled with ml64 in 64-bit format.
12345600000       120
12345600001       121
12345600010       130
12345600100      -140
12345601000        40
12345610000        40

What exactly do you want to point out?
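One possible reading of the 220 -> -140 difference, not confirmed anywhere in the thread, is that the 32-bit binary angle gets interpreted as a signed integer at some point: every angle above 180 degrees maps to a value of 2^31 or more, which is negative in two's complement. A small Python sketch of that arithmetic (the helper `to_signed32` is hypothetical):

```python
def to_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    return u - (1 << 32) if u & 0x80000000 else u

units = round(220.0 * 2**32 / 360.0) & 0xFFFFFFFF   # 220 degrees as a 32-bit binary angle
as_unsigned = units * 360.0 / 2**32
as_signed = to_signed32(units) * 360.0 / 2**32
print(as_unsigned, as_signed)
```

Both results describe the same direction, since -140 and 220 differ by exactly one turn; adding 360 to a negative result would bring it back into the 0-360 range.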

HSE · July 25, 2023, 08:08:30 AM

Quote from: jj2007 on July 25, 2023, 07:58:44 AM
I don't get the point here.

One of your numbers was 1.234560e+10. It is just an example of how incomplete numbers get represented, as you can see.

Quote from: jj2007 on July 25, 2023, 07:58:44 AM
but with one exception (a sign change

Apparently the SSE rounding mode must be changed, as with the FPU:

Code Select

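STMXCSR dword ptr [r8]      ; store the current MXCSR to memory (r8 must point to a writable dword)
mov eax, dword ptr [r8]
or eax, 6000h               ; set rounding-control bits 13-14 to 11b (round toward zero)
mov dword ptr [r8], eax
LDMXCSR dword ptr [r8]      ; load the modified value back into MXCSR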

I have to test that!
Later: the value must be stored in memory.
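For reference, the bit layout involved can be sketched in Python, assuming the intent of the snippet is to OR 6000h into MXCSR. The rounding-control field occupies bits 13-14, and 1F80h is the usual default value of the register; the names below are made up for illustration.

```python
MXCSR_DEFAULT = 0x1F80            # usual default: all exceptions masked, RC = 00b
RC_SHIFT, RC_MASK = 13, 0b11      # rounding control occupies bits 13-14

RC_NAMES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

def rounding_mode(mxcsr: int) -> str:
    """Decode the rounding-control field of an MXCSR value."""
    return RC_NAMES[(mxcsr >> RC_SHIFT) & RC_MASK]

modified = MXCSR_DEFAULT | 0x6000
print(hex(modified), rounding_mode(MXCSR_DEFAULT), "->", rounding_mode(modified))
```

The rounding mode changes a float-to-integer conversion by at most one unit, so on its own it would not be expected to turn 220 into -140.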

jj2007 · July 25, 2023, 08:23:09 AM

Quote from: HSE on July 25, 2023, 08:08:30 AM
Quote from: jj2007 on July 25, 2023, 07:58:44 AM
I don't get the point here.

One of your numbers was 1.234560e+10. It is just an example of how incomplete numbers get represented, as you can see.

Sorry, 1.234560e+10=12345600000, divided by 360, the fraction is 0.33... *360=120, correct. What's the problem?

HSE · July 25, 2023, 09:05:26 AM

Quote from: HSE on July 25, 2023, 08:08:30 AM
I have to test that!

JJ, I'm failing, JBasic installation is from today.

guga · July 25, 2023, 09:31:02 AM

Quote from: NoCforMe on July 25, 2023, 06:59:57 AM
OK; when I read this topic all I really see is a huuuge cloud of dust that obscures whatever the original point of the thread was supposed to be. Sum-of-digits remainders, "magic numbers", blah blah blah.

Let me ask some somewhat dumbass questions, because it seems to me that these issues are important but haven't been properly addressed here. A lot of this has to do with what Guga's intentions and requirements are in this project, which are not clear at all:

1. Seems to me that it's been shown that the best precision here would come from using the FPU as opposed to any of those more newfangled methods (SSE, XMM, etc.). Is there any reason why you can't use the FPU? or why you prefer not to use it? Is speed an issue here?

2. Why do you have to worry about such ridiculously large angles that you want to reduce to the range 0-360°? Is this just a theoretical possibility that bothers you and you'd like to cover, or would you actually be dealing with numbers of that magnitude?

3. What, exactly, is your application here, if you don't mind explaining it?

Inquiring minds want to know.

I explained that earlier. JJ and Siekmanski already found the solution for the particular problem of identifying the equivalent angles within the limits of a Real8 (as Raymond and others said).

The goal was to avoid extra checks in an RGB to CieLCH function I'm writing that uses tangent functions and may end up with weird values if I remove some limits. Those problems were fixed with JJ's and Siekmanski's solutions.

The other thing is the theoretical possibility of finding the equivalent angle of whatever angle is input, no matter how big it is, that is, without the limitations of a Real8. To do that, we only need the fractional part of the angle divided by 360, multiplied by 360. The problem is how to calculate this remainder more easily for really huge values (known or even truncated).

As Raymond said, there's a limit to what we can do with 64-bit numbers. Everything beyond the 17th or 19th digit is truncated, which can lead to any equivalent angle whatsoever, because once we truncate we no longer know exactly the total number of revolutions of that angle.

So, we have only 2 possibilities:

1) Stay limited to the truncated value. If we calculate 1.4545648e34, whatever lies beyond the last written digit is simply 0 (as if there were no other digit after the last "8"), i.e. 14545648000000000000000000000000000. This ends up as a huge integer that we divide by 360 to find the equivalent angle, assuming we are OK with the truncation.

2) Assume we know those missing digits and fill them in somehow to properly calculate the equivalent angle, for example by creating tables to hold the huge integer value to be calculated.

So, there are 2 ways to find the equivalent angle of such huge values, and this led me to a theoretical question: can we calculate the remainder in these weird situations, and thus retrieve the equivalent angle, in a faster way?
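Possibility 1 can be sketched with exact arithmetic in Python. The helper `mod360_decimal` is hypothetical; it takes the number as text, treats every digit beyond the written mantissa as zero, and reduces the resulting integer exactly:

```python
from decimal import Decimal

def mod360_decimal(text: str) -> int:
    """Treat every digit beyond the written mantissa as zero and reduce exactly."""
    value = Decimal(text)
    if value != value.to_integral_value():
        raise ValueError("sketch handles integer-valued inputs only")
    return int(value) % 360

exact_int = int(Decimal("1.4545648e34"))
print(mod360_decimal("1.4545648e34"))
print(int(1.4545648e34) % 360)    # the nearest double is a different integer
```

The second print is there because a Real8 holding 1.4545648e34 does not store that decimal value exactly, so the two remainders need not agree.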

The answer to that seems to be the equation I found earlier at

https://codegolf.stackexchange.com/questions/243840/find-the-magic-numbers-to-divide-a-number-without-division?newreg=131c0f9bcfbb4d0ebacf0ef7f4c9c626
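For readers unfamiliar with the idea, here is a generic multiply-and-shift sketch in Python for dividing a 32-bit unsigned value by 360. It follows the standard construction (magic = floor(2^(N+s)/d) + 1 with s = ceil(log2 d)) and is not necessarily the exact formula from the linked page; the 32-bit operand width and all names are assumptions.

```python
N = 32                                   # operand width in bits (assumption)
D = 360
SHIFT = N + (D - 1).bit_length()         # 32 + ceil(log2(360)) = 41
MAGIC = (1 << SHIFT) // D + 1            # 6108397933, needs 33 bits

def divmod360(n: int):
    """Quotient and remainder of n / 360 for 0 <= n < 2**32, without a divide."""
    q = (n * MAGIC) >> SHIFT             # the product needs up to 65 bits
    return q, n - q * D

print(divmod360(4000000220))
```

Python integers hide the fact that the product exceeds 64 bits; real code would have to handle that, for example by shifting out the factor 8 first and using a magic number for 45. The construction also only covers inputs of the stated width, so it does not remove the precision limit discussed in this thread.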

NoCforMe · July 25, 2023, 09:34:51 AM

OK, thanks for the answer. But you didn't address one thing:

The FPU uses 80-bit numbers; why are you restricting yourself to 64 bits?

jj2007 · July 25, 2023, 10:10:40 AM

Quote from: HSE on July 25, 2023, 08:08:30 AM
Apparently the SSE rounding mode must be changed, as with the FPU:

Code Select

LDMXCSR dword ptr [r8]
mov eax, dword ptr [r8]
or eax, 6000h
mov dword ptr [r8], eax
STMXCSR dword ptr [r8]

I checked that in x64dbg; it's further down, just above the XMM regs. So far no success: I can set that register, but the results don't change.

The MASM Forum
