### Author Topic: REAL16 for statistics and Gamma function  (Read 839 times)

• Regular Member
• Posts: 17
##### REAL16 for statistics and Gamma function
« on: June 18, 2018, 02:44:37 PM »
I was wondering how to do this in MASM.  It is hard to do this in any language because you have to mask it and do a lot of manipulation.
I also think that it is even harder to convert from BINARY FLOATING POINT to decimal notation as it means converting it to a string.

I have an algorithm for calculating Gamma that is currently written in a high level language.  It does not have an accurate answer in REAL8 (double) because of rounding error.  I think that calculating it in REAL16 will reduce the round off error, because I could later convert the result to REAL8 dropping the inaccurate bits.  Not really sure but I know that this will significantly slow down the calculation.

This Gamma function keeps multiplying until the iteration stopped changing the delta or 100 iterations.
Code: [Select]
`        public static double Gamma(double z)        {            double factor = 1.0;            double delta = 1.0;            for (int n = 1; n <= 100 && delta > 0.0; n++)            {                double previous = factor;                factor *= (1.0 / (1.0 + (z / n))) * Math.Pow((1.0 + (1.0 / n)), z);                delta = Math.Abs(factor - previous);            }            return factor / z;        }`
This Gamma function starts from 0 increments by some small number and stops at a fake value of infinity called approxInfinity.
Code: [Select]
`        public static double GammaIntegral(double z, double approxInfinity, double increment)        {            double series = 0.0;            double delta = 1.0;            for (double x = 0; x <= approxInfinity; x += increment)            {                series += (Math.Pow(x, z - 1) * Math.Exp(-x))*increment;            }            return series;        }`
I've passed a few numbers for a single value of z.  Up to the first few decimal points they match, but different parameters yield different results.
I believe that the round off error because of not enough bits in the mantissa is part of the problem.

#### jj2007

• Member
• Posts: 9087
• Assembler is fun ;-)
##### Re: REAL16 for statistics and Gamma function
« Reply #1 on: June 18, 2018, 05:06:43 PM »
If REAL8 is not enough, try REAL10 - that is the native precision of the FPU. Ten bits of precision more.

We have a dozen threads about REAL16 precision, though. Forum search is here. The available REAL16 libraries are much slower than the FPU, of course.

#### nidud

• Member
• Posts: 1618
##### Re: REAL16 for statistics and Gamma function
« Reply #2 on: June 19, 2018, 02:16:18 AM »
The available REAL16 libraries are much slower than the FPU, of course.

Is there any test done to verify this?

I did a simple test using the FMUL function which suggest there may be some overhang using the FPU.

The FPU function used:
Code: [Select]
`    fld   tbyte ptr [rdx]    fld   tbyte ptr [r8]    fmul  st(0),st(1)    fstp  tbyte ptr [rcx]    ret`
The REAL16 function:
Code: [Select]
`    .x64    .model flat    .codeQ_SIGBITS   equ 113Q_EXPBITS   equ 15Q_EXPMASK equ (1 shl Q_EXPBITS) - 1Q_EXPBIAS equ Q_EXPMASK shr 1Q_EXPMAX equ Q_EXPMASK - Q_EXPBIAS    push rsi    push rdi    push rbx    push r12    mov r12,rcx    ;    ; The args are REAL10, so this is converted to REAL16 below.    ;    ; void test( result:ptr, a:ptr REAL10, b:ptr REAL10 );    ;    ;mov rbx,[rdx]    ;shl rbx,16    xor rbx,rbx    ;mov rcx,[rdx+6]    mov rcx,[rdx]    shl rcx,1    ;mov si,[rdx+14]    ; shift out bits 0..112    mov si,[rdx+8]    and esi,Q_EXPMASK   ; if not zero    neg esi             ; - set high bit    ;mov si,[rdx+14]    mov si,[rdx+8]    ;rcr rcx,1    ;rcr rbx,1    shl esi,16    ;mov rax,[r12]    ;shl rax,16    xor rax,rax    ;mov si,[r12+14]    mov si,[r12+8]    and si,Q_EXPMASK    neg si    ;mov si,[r12+14]    mov si,[r12+8]    ;mov rdx,[r12+6]    mov rdx,[r12]    shl rdx,1    ;rcr rdx,1    ;rcr rax,1    .repeat        add si,1            ; add 1 to exponent        jc  er_NaN_A        ; quit if NaN        jo  er_NaN_A        ; ...        add esi,0xFFFF      ; readjust low exponent and inc high word        jc  er_NaN_B        ; quit if NaN        jo  er_NaN_B        ; ...        sub esi,0x10000     ; readjust high exponent        mov rdi,rax         ; A is 0 ?        or  rdi,rdx        .ifz                ; exit if A is 0            add si,si       ; place sign in carry            .break .ifz     ; return 0            rcr si,1        ; restore sign of A        .endif        mov rdi,rbx         ; exit if B is 0        or  rdi,rcx        .ifz            .if !(esi & 0x7FFF0000)                jmp return_0            .endif        .endif        mov edi,esi         ; exponent and sign of A into EDI        rol edi,16          ; shift to top        sar edi,16          ; duplicate sign        sar esi,16          ; ...        and edi,0x80007FFF  ; isolate signs and exponent        and esi,0x80007FFF  ; ...        add esi,edi         ; determine exponent of result and sign        sub si,0x3FFE       ; remove extra bias        .ifnc               ; exponent negative ?            cmp si,0x7FFF   ; overflow ?            ja  overflow        .endif        .ifs si < -64       ; exponent too small ?            dec si            jmp return_si0        .endif        mov r10,rbx        mov r11,rcx        mov r8,rax        mov r9,rcx        mov rcx,rdx        .if !rdx && !r11            mul     r10            xor     r10,r10        .else            mul     r10            mov     rbx,rdx            mov     rdi,rax            mov     rax,rcx            mul     r11            mov     r11,rdx            xchg    r10,rax            mov     rdx,rcx            mul     rdx            add     rbx,rax            adc     r10,rdx            adc     r11,0            mov     rax,r8            mov     rdx,r9            mul     rdx            add     rbx,rax            adc     r10,rdx            adc     r11,0            mov     rdx,rbx            mov     rax,rdi        .endif        mov rdi,rdx        mov rax,r10        mov rdx,r11        test rdx,rdx        .ifns ; if not normalized            shl rdi,1            rcl rax,1            rcl rdx,1            dec si        .endif        shl rdi,1        .ifb            .ifz                .if edi                    stc                .else                    bt eax,0                .endif            .endif            adc rax,0            adc rdx,0            .ifb                rcr rdx,1                rcr rax,1                inc si                cmp si,0x7FFF                jz  overflow            .endif        .endif        or si,si        .ifng            .ifz                and si,0xFF00                inc si            .else                neg si            .endif            movzx ecx,si            shrd rax,rdx,cl            shr rdx,cl            xor si,si        .endif        add esi,esi        rcr si,1    .until 1done:    shl rax,1    rcl rdx,1    shr rax,16    mov [r12],rax    mov [r12+6],rdx    mov [r12+14],si    mov rax,r12    pop r12    pop rbx    pop rdi    pop rsi    retoverflow:return_si0:return_0:return_m0:er_NaN_B:   ; B is a NaN or infinityreturn_B:er_NaN_A:   ; A is a NaN or infinity    jmp done    end`
Well, there could be something wrong with the test I guess but the result is strange.

Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz (AVX2)
----------------------------------------------
-- proc(1)
2903388 cycles, rep(5000), code( 10) 0.asm: mul FPU
98390 cycles, rep(5000), code(442) 1.asm: mul REAL16
-- proc(2)
3469981 cycles, rep(5000), code( 10) 0.asm: mul FPU
83059 cycles, rep(5000), code(442) 1.asm: mul REAL16
-- proc(3)
3023280 cycles, rep(5000), code( 10) 0.asm: mul FPU
75807 cycles, rep(5000), code(442) 1.asm: mul REAL16

total [1 .. 3], 1++
257256 cycles 1.asm: mul REAL16
9396649 cycles 0.asm: mul FPU
hit any key to continue...

#### nidud

• Member
• Posts: 1618
##### Re: REAL16 for statistics and Gamma function
« Reply #3 on: June 19, 2018, 02:37:09 AM »
Well, this will overflow the FPU stack on repeated calls so adding FFREE seems to shift the result a bit
Code: [Select]
`    fld   tbyte ptr [rdx]    fld   tbyte ptr [r8]    fmul  st(0),st(1)    fstp  tbyte ptr [rcx]    ffree st(0)    ffree st(1)    ret`
Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz (AVX2)
----------------------------------------------
-- proc(1)
48684 cycles, rep(5000), code( 14) 0.asm: mul FPU
76272 cycles, rep(5000), code(442) 1.asm: mul REAL16
-- proc(2)
36313 cycles, rep(5000), code( 14) 0.asm: mul FPU
76393 cycles, rep(5000), code(442) 1.asm: mul REAL16
-- proc(3)
36403 cycles, rep(5000), code( 14) 0.asm: mul FPU
81168 cycles, rep(5000), code(442) 1.asm: mul REAL16

total [1 .. 3], 1++
121400 cycles 0.asm: mul FPU
233833 cycles 1.asm: mul REAL16
hit any key to continue...

#### jj2007

• Member
• Posts: 9087
• Assembler is fun ;-)
##### Re: REAL16 for statistics and Gamma function
« Reply #4 on: June 19, 2018, 03:18:49 AM »
Check again:
Code: [Select]
`    fld   tbyte ptr [rdx]    fld   tbyte ptr [r8]    fmul  ; NO: st(0),st(1)    fstp  tbyte ptr [rcx]`
No need for ffree.

#### nidud

• Member
• Posts: 1618
##### Re: REAL16 for statistics and Gamma function
« Reply #5 on: June 19, 2018, 04:14:13 AM »
So what happens then?
fmul clear out st(1) and fstp st(0) ?

The code will most likely use SSE for REAL8 and a similar function will then look something like this.
Code: [Select]
`    movq   xmm0,[rdx]    movq   xmm1,[r8]    mulpd  xmm0,xmm1    movq   [rcx],xmm0    ret`
The cycle count will then ruffly double for each size.
Code: [Select]
`    52738 cycles 2.asm: mul REAL8 (SSE)   134291 cycles 0.asm: mul REAL10 (FPU)   253657 cycles 1.asm: mul REAL16`

#### felipe

• Member
• Posts: 1092
• Eagles are just great!
##### Re: REAL16 for statistics and Gamma function
« Reply #6 on: June 19, 2018, 05:32:19 AM »
So what happens then?
fmul clear out st(1) and fstp st(0) ?

This is from raymond's tutorial on the fpu:
Quote
Note: FMUL without operands can also be used with the MASM assembler but such an instruction is coded as FMULP ST(1),ST(0).

Felipe.

#### felipe

• Member
• Posts: 1092
• Eagles are just great!
##### Re: REAL16 for statistics and Gamma function
« Reply #7 on: June 19, 2018, 05:41:20 AM »
I also think that it is even harder to convert from BINARY FLOATING POINT to decimal notation as it means converting it to a string.

Don't know if you are going to use the real16 or the real10 types, but for real10 this convertion isn't so difficult and there are already functions (binary fp to ascii and viceversa) in the fpulib for this,already available. You can use it as is or you can check the source and make an custom adaption.
Felipe.

#### jj2007

• Member
• Posts: 9087
• Assembler is fun ;-)
##### Re: REAL16 for statistics and Gamma function
« Reply #8 on: June 19, 2018, 08:54:15 AM »
So what happens then?
fmul clear out st(1) and fstp st(0) ?

Yes, more or less. Raymond's tutorial is a great source for that.
In short: fmul stands for "multiply ST(0) with ST(1), discard ST(1) and put the result in ST(0)".

Code: [Select]
`    ; The args are REAL10, so this is converted to REAL16 below.    ;    ; void test( result:ptr, a:ptr REAL10, b:ptr REAL10 );`
result:ptr is what? ptr to REAL16? How do you print that? No criticism, just curious...

#### nidud

• Member
• Posts: 1618
##### Re: REAL16 for statistics and Gamma function
« Reply #9 on: June 19, 2018, 09:50:13 AM »
result:ptr is what? ptr to REAL16?

The test uses the same arguments, so yes, result just points to a 16 byte buffer large enough for both of the values.

arg_1   REAL16 0.0
arg_2   REAL10 2.0
arg_3   REAL10 4.0

Quote
How do you print that?

You have to link with the LIBC library and use the stdio functions. There's a test case in test/libc/math/quadfloat directory for REAL16.

The format strings for REAL8/10/16:
Code: [Select]
`include stdio.incpi equ <3.141592653589793238462643383279502884197169399375105820974945>.datad REAL8  pil REAL10 piq REAL16 pi.codemain proc    printf("%g - %%g\n", d)    printf("%Lg - %%Lg\n", l)    printf("%llg - %%llg\n", q)    printf("%f - %%f\n", d)    printf("%Lf - %%Lf\n", l)    printf("%llf - %%llf\n", q)    printf("%e - %%e\n", d)    printf("%Le - %%Le\n", l)    printf("%lle - %%lle\n", q)    printf("%.20f - %%.20f\n", d)    printf("%.24Lf - %%.24Lf\n", l)    printf("%.34llf - %%.34llf\n", q)    xor eax,eax    retmain endp    end`

#### jj2007

• Member
• Posts: 9087
• Assembler is fun ;-)
##### Re: REAL16 for statistics and Gamma function
« Reply #10 on: June 19, 2018, 05:39:42 PM »
You have to link with the LIBC library and use the stdio functions.
I downloaded the required inc files, it builds fine but
Code: [Select]
`POLINK: error: Unresolved external symbol 'printf'.POLINK: fatal error: 1 unresolved external(s).`
Is there an installer for AsmC that takes care of such problems?

#### AW

• Member
• Posts: 1720
• Let's Make ASM Great Again!
##### Re: REAL16 for statistics and Gamma function
« Reply #11 on: June 19, 2018, 08:20:49 PM »
Of course, people can have a lot of fun arguing whether REAL10 is better than REAL8 or whether REAL16 from software is fast enough.

But for serious stuff, people should use an arbitrary precision library, in this case MPIR is the best alternative for Windows and can be easily used from ASM as I have shown here http://masm32.com/board/index.php?topic=6438.0
All right, it was done for UASM but can be easily converted to MASM.

I made a little change to that code and put it calculating Gamma(3000) and Gamma (1/2)
For people that has no idea, Gamma (3000) is the same as 2999!, a really huge number. And Gamma(1/2) is the same as square root of PI.

Let's see a picture, yes does not fit it all in the capture, we are using 10000 decimal digits precision for the calculation of Gamma (3000) . I selected only 100 decimal digits precision for Gamma (1/2).

MPIR has a function to directly calculate Gamma, that's what I have used, alternatively we could have made the calculation using one of the many existing formulas.
« Last Edit: June 20, 2018, 01:17:38 AM by AW »

#### nidud

• Member
• Posts: 1618
##### Re: REAL16 for statistics and Gamma function
« Reply #12 on: June 20, 2018, 12:50:19 AM »
You have to link with the LIBC library and use the stdio functions.
I downloaded the required inc files, it builds fine but
Code: [Select]
`POLINK: error: Unresolved external symbol 'printf'.POLINK: fatal error: 1 unresolved external(s).`

LIBC is a static library supplied by a specific tool chain so you have to use one with support for real10 and real16. Microsoft don't support this and I don't think Pelles C do either.

Quote
Is there an installer for AsmC that takes care of such problems?

No.

#### jj2007

• Member
• Posts: 9087
• Assembler is fun ;-)
##### Re: REAL16 for statistics and Gamma function
« Reply #13 on: June 20, 2018, 02:31:44 AM »
LIBC is a static library supplied by a specific tool chain

Sure. Is there a download link for libc.lib, or do I have to fight some hours to build a source with another "specific tool chain"?