Author Topic: Large integers and floats  (Read 3404 times)

nidud

  • Member
  • *****
  • Posts: 1614
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #15 on: July 04, 2018, 09:12:24 PM »
Quote: "So wouldn't this be possible to use for faster computation of PI with loads of decimals, compared to using some almost unlimited array of numbers?"

Yes, it allows computation directly, which simplifies declarations. The internal precision must be higher than the largest storage type (REAL10 in MASM), so it is now extended from 96-bit to 128-bit to handle REAL16.

The values based on PI may thus be declared using macros as above or directly:
Code:
PI    equ 3.141592653589793238462643383279502884197169399375105820974945
...
PI_2  equ PI / 2.0
PI_4  equ PI / 4.0
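
As a rough cross-check of what such a constant looks like once it is folded to REAL16, here is a minimal Python sketch (not Asmc code). It assumes REAL16 follows the IEEE 754 binary128 layout (1 sign bit, 15 exponent bits, 112 stored mantissa bits) and that rounding is to nearest; the helper name to_binary128 is made up for illustration and handles only positive normal values.

```python
from fractions import Fraction

# The PI literal from the declaration above, kept as an exact rational.
PI = Fraction("3.141592653589793238462643383279502884197169399375105820974945")

def to_binary128(value):
    """Round a positive rational to a binary128 bit pattern (normal range only)."""
    value = Fraction(value)
    if value <= 0:
        raise ValueError("sketch handles positive values only")
    # Find e with 2**e <= value < 2**(e + 1).
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1
    # Scale to [1, 2) and keep 112 fraction bits, rounding to nearest even.
    significand = round(value / Fraction(2) ** e * 2 ** 112)
    if significand == 2 ** 113:      # rounding carried into the next binade
        significand >>= 1
        e += 1
    return ((e + 16383) << 112) | (significand - 2 ** 112)

for name, value in (("PI", PI), ("PI_2", PI / 2), ("PI_4", PI / 4)):
    print(f"{name:5} {to_binary128(value):032X}")
```

Dividing by 2.0 or 4.0 only lowers the exponent field, so all three constants share the same mantissa bits.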

Quote: "Right, Taylor series, but I have to do some research first, maybe arcsine where sine pi is.
Also a REAL16 sine would be useful, or have you already made that, nidud?"

The assembler allows Add (+), Subtract (-), Multiply (*), and Divide (/) to be used directly in expressions, which are evaluated as REAL16 internally, so it should be possible to construct static functions using macros based on this.

I'm currently implementing a DirectXMath library to test the vectorcall implementation, which includes passing arguments, assigning values and return codes using vectors, so this may clarify how REAL16 values can be used.

The only compiler using REAL16 internally (as far as I know) is the Intel Fortran Compiler, in addition to a few open source libraries, so this is more about how Asmc handles floats internally than any practical use, at least for now. This basically simplifies the xmm versus pointers/regs handling when using INVOKE, given that all REALx values are passed in xmm registers.
Code:
VECTOR TYPEDEF REAL16

daydreamer

  • Member
  • ****
  • Posts: 557
  • reach for the stars
« Reply #16 on: July 05, 2018, 02:11:04 AM »
I have looked at Taylor series for trig functions.
I tested arcsin on a scientific calculator and I think 2*arcsin(1.0) would work for calculating pi.
Sine is a series that starts with x and then includes x^3/3!, changing sign every term while the power and the factorial both increase by two (x^5/5!, x^7/7!, etc.).
x! increases very fast to large integers; I think 10! is the number of orderings you can put the numbers 1-10 in.
I hope raymond or anyone skilled in math would like to explain the best way to calculate this function.
I wonder how many terms of the sine Taylor series are needed for REAL16 precision.

Quote from Flashdance
Nick  :  When you give up your dream, you die.
*wears a flameproof asbestos suit*

daydreamer

« Reply #17 on: July 06, 2018, 03:52:57 AM »
I think the sine Taylor series calculation should use the constants 1/(3!), 1/(5!), 1/(7!), 1/(9!)... until there are enough for REAL16 precision, so that many divisions are replaced by faster multiplications, and it should cache the previous power of x so it can be reused for x^5, x^7, x^9... It may also be possible to write parallel code that multiplies 4 or more of the above constants simultaneously, to take advantage of CPUs that can execute many mulpd in parallel.
I am actually working on coding a sine function, to get it right first and then compare it to the built-in sine function for precision versus speed.
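
The idea of precomputed reciprocal factorials plus a cached power of x could look roughly like the following Python sketch. It is only an illustration, not the planned assembly: it uses the decimal module at 40 digits as a stand-in for REAL16 arithmetic, stops at 1/31! (an assumption about how many terms are enough), and the names INV_FACT and sin_taylor are made up here.

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 40  # a few guard digits beyond the ~34 digits of REAL16

# PI literal as declared earlier in the thread.
PI = Decimal("3.141592653589793238462643383279502884197169399375105820974945")

# 1/1!, 1/3!, ..., 1/31! computed once, so the loop below only multiplies.
INV_FACT = [Decimal(1) / Decimal(math.factorial(n)) for n in range(1, 32, 2)]

def sin_taylor(x):
    """Sine by Taylor series; intended for small x such as 0..pi/4."""
    x = Decimal(x)
    x2 = x * x          # cached once and reused for every higher power
    power = x           # holds x^1, x^3, x^5, ... in turn
    total = Decimal(0)
    sign = 1
    for coeff in INV_FACT:
        total += sign * power * coeff
        power *= x2     # one multiply takes x^n to x^(n+2)
        sign = -sign
    return total

print(sin_taylor(PI / 4))
print(math.sin(math.pi / 4))  # built-in double precision result for comparison
```

Each term costs two multiplications and no division, which is the saving described above.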

Now i know what a real16 is:
Real10 is a supermodel, a 10 on a 1-10 scale
Real16 is the wife of a man who says "my wife is the most beautiful woman in the world" :biggrin:
« Last Edit: July 07, 2018, 01:53:15 AM by daydreamer »

nidud

« Reply #18 on: September 14, 2018, 11:37:01 PM »
Added REAL2 binary16 support.

http://masm32.com/board/index.php?topic=7394.msg81065#msg81065

This works for data declarations, as with REAL8/10/16, but invoke as vector is not implemented yet. Arguments are loaded in 16-bit registers, but this may change. The following logic applies to size:

    mov ax,1.0          ; 3C00
    mov ax,-65504.0     ; FBFF  Min
    mov ax,65504.0      ; 7BFF  Max
    mov ax,0.0          ; 0000  zero
    mov ax,-0.0         ; 8000 -zero
    mov ax,1.0/0.0      ; 7C00  Inf  1/zero
    mov ax,-1.0/0.0     ; FC00 -Inf  -1/zero
    mov ax,0.0/0.0      ; FFFF  NaN  zero/zero
    mov ax,0.3333333333 ; 3555  1/3
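
The encodings listed above can be cross-checked outside the assembler. This is a small Python sketch, assuming REAL2 is IEEE 754 binary16 as stated; it relies on the "e" (half precision) format of the standard struct module, and the helper names half_bits and half_value are made up for illustration.

```python
import math
import struct

def half_bits(value):
    """Return the 16-bit binary16 pattern for a Python float."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits

def half_value(bits):
    """Return the Python float encoded by a 16-bit binary16 pattern."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value

samples = [
    ("1.0", 1.0),
    ("-65504.0", -65504.0),
    ("65504.0", 65504.0),
    ("0.0", 0.0),
    ("-0.0", -0.0),
    ("Inf", math.inf),
    ("-Inf", -math.inf),
    ("0.3333333333", 0.3333333333),
]
for text, value in samples:
    print(f"{text:>14} -> {half_bits(value):04X}")

# Any pattern with all exponent bits set and a non-zero mantissa is a NaN,
# so the FFFF shown above decodes as NaN even though other encoders may
# pick a different NaN pattern.
print(math.isnan(half_value(0xFFFF)))
```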

raymond

  • Member
  • **
  • Posts: 218
    • Raymond's page
« Reply #19 on: September 15, 2018, 04:27:37 AM »
Quote: "I hope raymond or anyone skilled in math would like to explain the best way to calculate this function.
I wonder how many terms of the sine Taylor series are needed for REAL16 precision."

As for the initial question, the first thing to do is to reduce the input angle to the 1st quadrant and adjust the result later for angles in the other three quadrants. Furthermore, in order to minimize computations, the sine (or cosine) function should only be evaluated for angles in the 0-45 deg range. For angles in the 45-90 deg range, the cosine (or sine) function should be used with the complementary angle.
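
The reduction steps just described can be sketched in Python as follows. This is only an illustration under stated assumptions: angles are in degrees, math.sin and math.cos stand in for the series kernels that would only ever see 0-45 deg, and the function names reduce_for_sine and sin_deg are made up here.

```python
import math

def reduce_for_sine(angle_deg):
    """Map any angle to (sign, kernel, reduced angle in 0..45 degrees)."""
    a = angle_deg % 360.0        # 0 <= a < 360
    sign = 1
    if a >= 180.0:               # quadrants 3 and 4: sin(a) = -sin(a - 180)
        a -= 180.0
        sign = -1
    if a > 90.0:                 # quadrant 2: sin(a) = sin(180 - a)
        a = 180.0 - a
    if a > 45.0:                 # 45..90: sin(a) = cos(90 - a)
        return sign, "cos", 90.0 - a
    return sign, "sin", a

def sin_deg(angle_deg):
    """Sine of an angle in degrees using only 0..45 degree kernel calls."""
    sign, kernel, reduced = reduce_for_sine(angle_deg)
    rad = math.radians(reduced)
    return sign * (math.cos(rad) if kernel == "cos" else math.sin(rad))

for deg in (30, 60, 135, 210, 300, -30):
    print(deg, reduce_for_sine(deg), sin_deg(deg))
```

Keeping the kernel argument below 45 deg (about 0.785 rad) is what makes the series terms shrink quickly.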

As for the latter question, the accuracy of a computation depends on the accuracy of the least accurate of the elements used for the computation. Assuming the 128-bit real16 is made up similarly to a real10 (i.e. 1 bit for the sign, 15 bits for the biased exponent and 112 bits for the mantissa), its equivalent accuracy in base 10 would be about 33 to 34 digits. The last term expected to be of significance in the series would thus have a factorial of roughly that magnitude (i.e. around 31!, which is about 8.2e33). With the factorial argument going up by 2 with each term of the series, some 15 or 16 terms would be required for full accuracy (again assuming the source angle would also have such accuracy).
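
The term count can also be estimated numerically. The Python sketch below counts how many terms of x - x^3/3! + x^5/5! - ... are still at least 2^-113 in size relative to the first term x. The cut-off of 113 bits (112 stored bits plus an implicit leading bit) and the "relative to x" criterion are assumptions made for this illustration, and terms_needed is a made-up name.

```python
from fractions import Fraction
import math

def terms_needed(x, bits=113):
    """Count series terms whose size relative to x is at least 2**-bits."""
    x = Fraction(x)
    limit = Fraction(1, 2 ** bits)
    k = 0
    while True:
        n = 2 * k + 1                              # 1, 3, 5, ...
        ratio = x ** (n - 1) / math.factorial(n)   # |term| / x, exact
        if ratio < limit:
            return k
        k += 1

quarter_pi = Fraction("3.14159265358979323846") / 4
print(terms_needed(quarter_pi))  # largest angle after reduction to 0..45 deg
print(terms_needed(1))           # x = 1, where only the factorial shrinks the terms
```

The count depends on the largest angle the kernel has to handle, which is one more reason to reduce the input to the 0-45 deg range first.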
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com/

nidud

« Reply #20 on: September 15, 2018, 09:23:22 PM »
Added INVOKE(REAL2) as float for syscall, vectorcall and fastcall.

The definition of HALF and test case:

DirectXPackedVector.inc

 - XMConvertHalfToFloat(HALF)
 - XMConvertFloatToHalf(float)
 * PackedVector.s

nidud

« Reply #21 on: September 16, 2018, 12:38:56 AM »
Added the streamed versions of the conversion.
Code:
include DirectXPackedVector.inc

    .code

XMConvertHalfToFloatStream proc uses rsi rdi rbx pOutputStream:ptr float, OutputStride:size_t,
        pInputStream:ptr HALF, InputStride:size_t, HalfCount:size_t

    .assert(pOutputStream)
    .assert(pInputStream)
    .assert(InputStride >= sizeof(HALF))
    .assert(OutputStride >= sizeof(float))

    .for (rsi = r8, rdi = rcx, ebx = 0: rbx < HalfCount: ebx++)

        XMConvertHalfToFloat([rsi])
        mov [rdi],eax
        add rsi,InputStride
        add rdi,OutputStride
    .endf

    mov rax,pOutputStream
    ret

XMConvertHalfToFloatStream endp

    end
So, mostly about storage.
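
For readers who want to try the strided conversion without assembling anything, here is a rough Python equivalent of the loop above. It assumes little-endian data, uses the "e" and "f" formats of the standard struct module in place of XMConvertHalfToFloat, and the function name is a made-up snake_case stand-in.

```python
import struct

def convert_half_to_float_stream(out_buf, out_stride, in_buf, in_stride, half_count):
    """Copy half_count binary16 values from in_buf into out_buf as binary32."""
    assert in_stride >= struct.calcsize("<e")    # sizeof(HALF) == 2
    assert out_stride >= struct.calcsize("<f")   # sizeof(float) == 4
    for i in range(half_count):
        (value,) = struct.unpack_from("<e", in_buf, i * in_stride)
        struct.pack_into("<f", out_buf, i * out_stride, value)
    return out_buf

# Three halves, each followed by two bytes of unrelated data (stride 4).
source = struct.pack("<eHeHeH", 1.0, 0xAAAA, -2.0, 0xAAAA, 0.5, 0xAAAA)
target = bytearray(3 * 8)                        # floats written with stride 8
convert_half_to_float_stream(target, 8, source, 4, 3)
print([struct.unpack_from("<f", target, i * 8)[0] for i in range(3)])
```

The strides let the halves and floats live inside larger records, which fits the remark that this is mostly about storage.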

nidud

« Reply #22 on: September 19, 2018, 10:17:03 PM »
Added support for invoke(__int128) in syscall. OWORD will span registers and REAL16 (and floats) will use xmm.

    __int128    typedef oword
    __float128  typedef real16

Three 128-bit register arguments may be used:

p1  proc syscall a1:__int128, a2:__int128, a3:__int128

    mov rax,a1 ; RDI: high64 in RSI
    mov rax,a2 ; RDX: high64 in RCX
    mov rax,a3 ; R8:  high64 in R9
    ret
p1  endp


In addition to six 128-bit xmm arguments:
p2  proc syscall a1:__int128, a2:__int128, a3:__int128, a4:__float128, a5:__float128, a6:__float128

    p1(a1, a2, a3) ; no params set
    movaps xmm0,a4 ; xmm0
    movaps xmm0,a5 ; xmm1
    movaps xmm0,a6 ; xmm2
    ret
p2  endp


Spanning of arguments, constants and memory. Negative values extend into HIGH64 if a sign is used and the value is not zero:

    .data
    x __int128 0
...
    p1( 0, 1, 2 )
*    mov r8d, 2
*    xor r9d, r9d
*    mov edx, 1
*    xor ecx, ecx
*    xor edi, edi
*    xor esi, esi
    p1( 0x0000000000000000FFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF0000000000000000,
        0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF )
*    mov r8, low64 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
*    mov r9, high64 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
*    xor edx, edx
*    mov rcx, high64 0xFFFFFFFFFFFFFFFF0000000000000000
*    mov rdi, low64 0x0000000000000000FFFFFFFFFFFFFFFF
*    xor esi, esi
    p1( -0, -1, x )
*    mov r8, qword ptr x
*    mov r9, qword ptr x[8]
*    mov rdx, low64 -1
*    mov rcx, -1
*    xor edi, edi
*    xor esi, esi
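
The low64/high64 split in the listing above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, assuming two's complement sign extension for negative constants; split128 is a made-up helper name and says nothing about how Asmc implements it.

```python
MASK64 = (1 << 64) - 1

def split128(value):
    """Return (low64, high64) for a signed or unsigned 128-bit integer."""
    if not -(1 << 127) <= value < (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    value &= (1 << 128) - 1      # two's complement wrap sign-extends negatives
    return value & MASK64, value >> 64

for constant in (0, 1, 2, -1,
                 0x0000000000000000FFFFFFFFFFFFFFFF,
                 0xFFFFFFFFFFFFFFFF0000000000000000):
    low, high = split128(constant)
    print(f"low64={low:016X} high64={high:016X}")
```

A half that comes out as zero corresponds to the registers cleared with xor in the listing, and -1 fills both halves with ones, while -0 is simply zero.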

nidud

« Reply #23 on: September 24, 2018, 06:48:47 AM »
Added support for double::register arguments in 64-bit.

syscall:

p1  proto syscall :oword, :oword, :oword

p2  proc syscall a1:oword, a2:oword, a3:oword
    p1(rsi::rdi, rcx::rdx, r9::r8)  ; no params set
    p1(rsi::rax, rcx::rdx, r9::r8)  ; rdi
    p1(rax::rax, rax::rax, rax::rax); all
    ret
p2  endp

   0:   55                      push   rbp
   1:   48 8b ec                mov    rbp,rsp
   4:   e8 21 00 00 00          call   0x2a
   9:   48 8b f8                mov    rdi,rax
   c:   e8 19 00 00 00          call   0x2a
  11:   4c 8b c0                mov    r8,rax
  14:   4c 8b c8                mov    r9,rax
  17:   48 8b d0                mov    rdx,rax
  1a:   48 8b c8                mov    rcx,rax
  1d:   48 8b f8                mov    rdi,rax
  20:   48 8b f0                mov    rsi,rax
  23:   e8 02 00 00 00          call   0x2a
  28:   c9                      leave 
  29:   c3                      ret   

fastcall:

p1  proto a1:dword
p2  proto a1:oword, a2:oword

p3  proc a1:oword, a2:oword
    p2(rdx::rcx, r9::r8)    ; no params set
    p2(p1(0), r9::r8)       ; rcx
    p2(rax::rcx, r9::r8)    ; rdx
    p2(rax::rax, r11::r10)  ; all
    ret
p3  endp

   0:   55                      push   rbp
   1:   48 8b ec                mov    rbp,rsp
   4:   48 83 ec 20             sub    rsp,0x20
   8:   e8 34 00 00 00          call   0x41
   d:   33 c9                   xor    ecx,ecx
   f:   e8 23 00 00 00          call   0x37
  14:   48 8b c8                mov    rcx,rax
  17:   e8 25 00 00 00          call   0x41
  1c:   48 8b d0                mov    rdx,rax
  1f:   e8 1d 00 00 00          call   0x41
  24:   48 8b c8                mov    rcx,rax
  27:   48 8b d0                mov    rdx,rax
  2a:   4d 8b c2                mov    r8,r10
  2d:   4d 8b cb                mov    r9,r11
  30:   e8 0c 00 00 00          call   0x41
  35:   c9                      leave 
  36:   c3                      ret