Print Page - 128-bit integer arithmetic using FP

Title: 128-bit integer arithmetic using FP
Post by: jack on March 27, 2025, 12:29:22 AM

good day all :)
for a long time I had wished that my favorite Basic Compiler would have 128-bit integers with the associated arithmetic and was wondering about an efficient cross-platform implementation, either in FreeBasic or in C
you could use inline assembler but then you would have to deal with each different architecture and operating system, so a high level approach would be easier.
integer 64-bit division is quite slow and was wondering whether using floating-point would be faster, I was thinking of using triple-double floating point as described in the crlibm GitHub repository (https://github.com/taschini/crlibm) and in the pdf (https://ens-lyon.hal.science/ensl-01529804v1/document)
what are your thoughts?

one troublesome detail is the conversion to/from int128 and triple-double
the pdf file doesn't address division but the file triple-double.h has a reciprocal function which looks quite complex
<edit> forgot to mention that it needs to work in 32 and 64-bits

Title: Re: 128-bit integer arithmetic using FP
Post by: jj2007 on March 27, 2025, 01:54:53 AM

See QuadMath (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1453). Requires libquadmath.a

Title: Re: 128-bit integer arithmetic using FP
Post by: TimoVJL on March 27, 2025, 02:18:18 AM

https://www.freebasic.net/forum/viewtopic.php?t=19208 (https://www.freebasic.net/forum/viewtopic.php?t=19208)

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 27, 2025, 02:22:50 AM

jj, I thought about that but as far as I can tell QuadMath is FP only, I am interested in 128-bit integer only, even if implemented using FP
QuadMath would be able to hold an integer of about 112-bits but with a ton of unwanted luggage
here is a possible solution UInt128-Division-Modulus (https://www.codeproject.com/Tips/785014/UInt128-Division-Modulus)
even though the license is permissive, the ideal license would be public domain

Title: Re: 128-bit integer arithmetic using FP
Post by: jj2007 on March 27, 2025, 03:56:08 AM

Quote from: jack on March 27, 2025, 02:22:50 AMas far as I can tell QuadMath is FP only

GccInt.pdf (https://gcc.gnu.org/onlinedocs/gcc-10.5.0/gccint.pdf)

4.1 Routines for integer arithmetic
The integer arithmetic routines are used on platforms that don't provide hardware support
for arithmetic operations on some modes.
4.1.1 Arithmetic functions
[Runtime Function]
int __ashlsi3 (int a, int b)
[Runtime Function]
long __ashldi3 (long a, int b)
[Runtime Function]
long long __ashlti3 (long long a, int b)
These functions return the result of shifting a left by b bits.
[Runtime Function]
int __ashrsi3 (int a, int b)
[Runtime Function]
long __ashrdi3 (long a, int b)
[Runtime Function]
long long __ashrti3 (long long a, int b)
These functions return the result of arithmetically shifting a right by b bits.
[Runtime Function]
int __divsi3 (int a, int b)
[Runtime Function]
long __divdi3 (long a, long b)
[Runtime Function]
long long __divti3 (long long a, long long b)
These functions return the quotient of the signed division of a and b.
[Runtime Function]
int __lshrsi3 (int a, int b)
[Runtime Function]
long __lshrdi3 (long a, int b)
[Runtime Function]
long long __lshrti3 (long long a, int b)
These functions return the result of logically shifting a right by b bits.
10
GNU Compiler Collection (GCC) Internals
[Runtime Function]
int __modsi3 (int a, int b)
[Runtime Function]
long __moddi3 (long a, long b)
[Runtime Function]
long long __modti3 (long long a, long long b)
These functions return the remainder of the signed division of a and b.
[Runtime Function]
int __mulsi3 (int a, int b)
[Runtime Function]
long __muldi3 (long a, long b)
[Runtime Function]
long long __multi3 (long long a, long long b)
These functions return the product of a and b.
[Runtime Function]
long __negdi2 (long a)
[Runtime Function]
long long __negti2 (long long a)
These functions return the negation of a.
[Runtime Function]
unsigned int __udivsi3 (unsigned int a, unsigned int b)
[Runtime Function]
unsigned long __udivdi3 (unsigned long a, unsigned long b)
[Runtime Function]
unsigned long long __udivti3 (unsigned long long a,
unsigned long long b)
These functions return the quotient of the unsigned division of a and b.
[Runtime Function]
unsigned long __udivmoddi4 (unsigned long a, unsigned long
b, unsigned long *c)
[Runtime Function]
unsigned long long __udivmodti4 (unsigned long long a,
unsigned long long b, unsigned long long *c)
These functions calculate both the quotient and remainder of the unsigned division
of a and b. The return value is the quotient, and the remainder is placed in variable
pointed to by c.
[Runtime Function]
unsigned int __umodsi3 (unsigned int a, unsigned int b)
[Runtime Function]
unsigned long __umoddi3 (unsigned long a, unsigned long b)
[Runtime Function]
unsigned long long __umodti3 (unsigned long long a,
unsigned long long b)
These functions return the remainder of the unsigned division of a and b.

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 27, 2025, 05:22:46 AM

jj, none of those functions deal with 128-bit integers, so it's irrelevant

Title: Re: 128-bit integer arithmetic using FP
Post by: Biterider on March 27, 2025, 10:28:40 AM

Hi Jack
Check this link https://masm32.com/board/index.php?topic=11801.msg128168#msg128168 (https://masm32.com/board/index.php?topic=11801.msg128168#msg128168)
Maybe this can help you.

Regards, Biterider

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 27, 2025, 07:56:23 PM

thanks Biterider :smiley:
I will have a look as soon as I wake up, haven't had my coffee yet ☕

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 27, 2025, 11:56:15 PM

hello Biterider
as a preliminary test, I adapted your uoooDiv procedure to FreeBasic and it runs OK, but a lot more testing needs to be done
I sent you my mod in a PM

Code Select

	dim as uint128 N = (&hFFFFFFFFFFFFFFFF, &h2222222222222222)
	dim as uint128 D = (&h0000000000000FF1, &h1111111111111111)
	dim as uint128 Q
	uoooDiv(Q, N, D)
	print128("Dividend (N)", N)
	print128("Divisor (D)", D)
	print128("Quotient (Q)", Q)
'output
'Dividend (N): 0xFFFFFFFFFFFFFFFF2222222222222222
'Divisor (D): 0x0000000000000FF11111111111111111
'Quotient (Q): 0x000000000000000000100EFCEC0F85F4

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 28, 2025, 12:38:16 AM

I ran that division in a loop of 1000000 and it took 0.031 seconds, the C++ code compiled with -O3 took .08 seconds, that's about 2.58 times slower
the C++ code returns the remainder along with the quotient so that will slow it down some

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 29, 2025, 01:37:12 AM

I managed to adapt Biterider's oword assembler multiply and divide routines to FreeBasic
an unexpected result in timing the high-level multiply vs the asm version, with gcc backend and with optimization level of -O3

Code Select

multiply n times d 100000000 times in a loop
a sum of the product is kept and printed out
in order to prevent gcc from eliminating the loop
and thereby making the timing meaningless

assembler multiply
n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
elapsed time seconds  0.296251199994003

high-level multiply
n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
elapsed time seconds  0.0651214999961667

here's the whole enchilada if you want to try run it
the assembler routines are by Biterider (https://masm32.com/board/index.php?topic=11801.msg128168#msg128168)

Title: Re: 128-bit integer arithmetic using FP
Post by: zedd on March 29, 2025, 01:42:41 AM

Quote from: jack on March 29, 2025, 01:37:12 AMhere's the whole enchilada if you want to try run it

Doh! :undecided: My bad...

I expected that there was a Windows executable in the attachment to test. :biggrin:
I wanted to see how my i7 performed on this.

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 29, 2025, 02:39:36 AM

I try to avoid binaries but here it is

sorry, I forgot to zero the sum tor the second test, please re-download

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 29, 2025, 02:43:29 AM

but I suspect that gcc optimized the multiplications somehow, I mean beyond function optimization, am thinking that it probably evaluated the multiplication only once

Title: Re: 128-bit integer arithmetic using FP
Post by: zedd on March 29, 2025, 02:58:19 AM

Quote from: jack on March 29, 2025, 02:39:36 AMI try to avoid binaries but here it is

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 3.60 GHz

Code Select

multiply n times d 100000000 times in a loop
a sum of the product is kept and printed out
in order to prevent gcc from eliminating the loop
and thereby making the timing meaningless

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152

assembler version
elapsed time seconds  0.337129999999888

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 226854911283331164773060472524885474304

high level version
elapsed time seconds  0.072414799942635

:smiley:

Yes, the C optimisations add a great improvement in performance.
How does the unoptimsed C version fare compared to the FreeBasic version?

But wait, something is amiss.
'sum of product = 226854911283331164773060472524885474304' - for the high level version in my test?
double what it should be.

and 'sum of product = 113427455641665582386530236262442737152' - for the high level version in your test.

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 29, 2025, 03:01:54 AM

un-optimized times

Code Select

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
assembler version
elapsed time seconds  0.642031399998814

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
high level version
elapsed time seconds  2.85223180000321

Title: Re: 128-bit integer arithmetic using FP
Post by: jack on March 29, 2025, 03:06:22 AM

Quote from: zedd151 on March 29, 2025, 02:58:19 AMBut wait, something is amiss.
'sum of product = 226854911283331164773060472524885474304' - for the high level version in my test?

and 'sum of product = 113427455641665582386530236262442737152' - for the high level version in your test.

I had forgotten to zero out the sum for the second test, so it just kept adding, it's fixed now if you want to download the binary

Title: Re: 128-bit integer arithmetic using FP
Post by: zedd on March 29, 2025, 03:09:03 AM

Quote from: jack on March 29, 2025, 03:06:22 AMI had forgotten to zero out the sum for the second test, so it just kept adding, it's fixed now if you want to download the binary

:biggrin:

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 3.60 GHz

Code Select

multiply n times d 100000000 times in a loop
a sum of the product is kept and printed out
in order to prevent gcc from eliminating the loop
and thereby making the timing meaningless

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
assembler version
elapsed time seconds  0.338004100020044

n = 340282366920938463447387429234553266722
d = 18446744073709551615
product = 45370982256125128475310893311622766046
sum of product = 113427455641665582386530236262442737152
high level version
elapsed time seconds  0.0723988000536337

press return to end

perfect. :azn:
Side note: A humongous difference between unoptimised C version and optimized C version. Biterider might have to tweak his code (or you may need to tweak your conversion) to be a contender. :tongue:

These huge numbers remind me of my experiments with my 'ascii_adder' routine. It added two humongous ascii decimal strings of any length together, outputting an ascii string on return. May be good for working with the U.S. national debt? :biggrin:

The MASM Forum

Miscellaneous => The Orphanage => Topic started by: jack on March 27, 2025, 12:29:22 AM