Historical background of the high-accuracy floating-point technique

Started by Gunther, July 25, 2012, 10:43:38 AM


Gunther

Yesterday I started a topic: http://masm32.com/board/index.php?topic=487.msg3698#msg3698. Here is some technical and historical background on that research field.

First of all: the basic idea behind the software is an extra-long dot-product accumulator, which accumulates partial results without the catastrophic cancellation that can quickly render scientific computations completely inaccurate. The thread above shows one such example, and also how to avoid the error. The image below illustrates the principle with five-bit numbers.
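To make the principle concrete, here is a minimal sketch in Python - my own illustration, not the original library code; Python's unbounded integers merely stand in for the very wide fixed-point register. The accumulator is one huge fixed-point number, wide enough to hold any product of two doubles exactly, so no partial sum ever loses a bit and rounding happens exactly once, at the very end:

Code:
import math

# The long accumulator as one big integer, scaled by 2**BIAS.
# BIAS covers the smallest possible product of two doubles
# (down to 2**-1074 * 2**-1074), plus a few guard bits.
BIAS = 2300

def split(x):
    # Decompose a finite double exactly as m * 2**e with integer m.
    m, e = math.frexp(x)                    # x == m * 2**e, 0.5 <= |m| < 1
    return int(math.ldexp(m, 53)), e - 53

def dot_exact(xs, ys):
    acc = 0                                 # the long accumulator
    for x, y in zip(xs, ys):
        mx, ex = split(x)
        my, ey = split(y)
        acc += (mx * my) << (BIAS + ex + ey)   # exact shift-and-add
    return acc / (1 << BIAS)                # one single rounding at the end

# Catastrophic cancellation: the naive sum loses the 1.0 completely,
# the long accumulator keeps it.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(sum(x * y for x, y in zip(xs, ys)))   # prints 0.0
print(dot_exact(xs, ys))                    # prints 1.0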



But do we really need such techniques? The only possible answer is yes. There is, for example, the ASCI program; ASCI stands for Accelerated Strategic Computing Initiative, with the following goal:
QuoteIt aims to replace physical nuclear weapons testing with computer simulations.
There is a critical report by John Gustafson (U.S. Department of Energy) with the title "Computational Verifiability and Feasibility of the ASCI Program". The author explains that misplaced confidence in conventional calculations, combined with error-prone software, could lead to a nuclear disaster. Moreover, he writes:
QuoteThis forces us to a very different style of computing, one with which very few people have much experience, where results must come with guarantees. This is a new kind of science ....
That's for sure: one single wrong operation can ruin the entire calculation.

Since 2002 there have been discussions about revising the floating-point standard. I have a copy of a letter (from September 2004) to Bob Davis, Chairman of the IEEE Microprocessor Standards Committee; its authors include, among others, Ulrich Kulisch and William Kahan, the father of floating-point arithmetic (https://en.wikipedia.org/wiki/William_Kahan). Here is one quote:
Quote
We think that the tremendous progress in computer technology and the great increase in computer speed should be accompanied by extensions of the mathematical capacity of the computer. Beyond what has already been done by IEEE754R, IFIP WG 2.5 expresses its desire that the following two requirements are included in the future arithmetic standard.

  • For the data format double precision, interval arithmetic should be made available at the speed of simple floating-point arithmetic. Most processors on the market are equipped with arithmetic for multimedia applications. On these processors we believe that it is likely that only 0.1% more silicon in the arithmetic circuitry would suffice to realise this capability.
  • High speed arithmetic for dynamic precision should be made available for real and for interval data. The basic tool to achieve high speed dynamic precision arithmetic for real and interval data is an exact multiply and accumulate operation for the data format double precision.
Both authors are nearly 80 years old, but they see the entire question very clearly. I'm not sure whether the final version of the new floating-point standard will contain these two requirements, but I have my doubts. The next question is: will the processor manufacturers follow the standard? The floating-point hardware we have today is far from ideal. What should we think of Intel's and AMD's design decision to drop BCD arithmetic in 64-bit long mode?
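To illustrate the first requirement above: interval arithmetic computes with pairs [lo, hi] that are guaranteed to enclose the true result. Hardware would use directed rounding; pure Python doesn't expose the rounding mode, so this little sketch (again my own illustration, not from the letter) widens each endpoint outward by one ulp with math.nextafter as a conservative substitute:

Code:
import math

def iadd(a, b):
    # Interval addition, endpoints widened outward by one ulp.
    return (math.nextafter(a[0] + b[0], -math.inf),
            math.nextafter(a[1] + b[1], math.inf))

def imul(a, b):
    # Interval multiplication: min/max over the four endpoint products.
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (math.nextafter(min(p), -math.inf),
            math.nextafter(max(p), math.inf))

s, x = (0.0, 0.0), (0.1, 0.1)
for _ in range(10):
    s = iadd(s, x)
print(s)   # an enclosure guaranteed to contain the exact sum of the ten addends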

Anyway, the idea of an extra-long accumulator is not very new. I first came in touch with it during my time as a student at the Technical University Dresden. We had an IBM/370 mainframe with the IBM high-accuracy library installed. That library covered real and complex arithmetic and a lot of other fine stuff, and it used the long accumulator.



In 1983 I got hold of a book by Kulisch and others with the title "PASCAL-XSC", which stands for PASCAL for Extended Scientific Computing. It was impressive - they used the long accumulator, too. The only computer I had at the time was a Z80 machine (8 bits wide!) with 64 KB RAM (4 KB for the operating system) and a BASIC interpreter. I had neither an assembler nor a compiler (too expensive for me), but in one of Berlin's public libraries I found a book with the Z80 opcodes. That was enough. I was the assembler and linker in one person: I wrote the instructions down with pencil and paper and poked the bytes via DATA lines into memory inside a BASIC array. After six weeks and countless system crashes I had my first working long accumulator for REAL4 values (the REAL8 format was unthinkable in those years, because my interpreter didn't support it). Those were wild times!

What else? There's also an English version of Kulisch's book.



And this one is for Alex: a Russian version is also available. The third Russian edition came out in 2006, and the compiler is still in use there.



So, that was a bit of background information; I hope other forum members find these questions interesting.

By the way, I found the books above, together with the Z80 code, on my bookshelf in the cellar.

Gunther
You have to know the facts before you can distort them.

Antariy

Quote from: Gunther on July 25, 2012, 10:43:38 AM
[...] I was the assembler and linker in one person: I wrote the instructions down with pencil and paper and poked the bytes via DATA lines into memory inside a BASIC array. After six weeks and countless system crashes I had my first working long accumulator for REAL4 values (the REAL8 format was unthinkable in those years, because my interpreter didn't support it). Those were wild times!

Gunther, the story you have just told us is... awful! :eusa_clap:

raymond

QuoteThe only computer that I had during this time was a Z80 machine (8 bit wide!) with 64 KB RAM (4 KB for the operating system) with a basic interpreter

TRS-80. That's also how I got started. Bought second-hand at a garage sale, complete with 4 disk drives (single-sided), a TTX printer with its stand (upper-case letters only), and a slotted table for the computer, monitor and keyboard; all that for $500 in 1986. I later found a book about "machine language" for the Z-80 ($4) in a book store. Never looked back at other programming languages.

Although I never got involved with floating-point computation at that time (I wasn't even aware it existed), I did manage to write a program that extracted the square root of any number to a precision of 10,000 digits on that machine and sent the result to the printer. I really enjoyed that first computer.
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com

Gunther

Alex,

Quote from: Antariy on July 25, 2012, 11:47:35 AM
Gunther, the narration you have just told us is... awful! :eusa_clap:

yes, it is. It was pure machine-language programming, but I had a lot of fun. I wrote my first fractal generator in a similar way: the row and column loops were in BASIC, but the inner iteration loop was in machine language. It took a while to write and "debug" the program; the graphics resolution was only 320x200 in two colours (black and white), and the computer (at 1 MHz) calculated about 8 hours for one image - but it was the Mandelbrot set.
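For the curious, the structure was roughly this - a sketch in today's Python rather than BASIC plus Z80 machine code, purely to show the split between the slow outer loops and the hot inner loop:

Code:
def mandel(cx, cy, max_iter=100):
    # The inner iteration loop - this was the hand-poked machine code:
    # iterate z -> z*z + c until |z| > 2 or the iteration budget runs out.
    x = y = 0.0
    for _ in range(max_iter):
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
        if x * x + y * y > 4.0:
            return False                 # escaped: not in the set (white)
    return True                          # (probably) in the set (black)

# The row and column loops - these were BASIC. 320x200 in two colours.
for row in range(200):
    for col in range(320):
        black = mandel(-2.5 + col * 3.5 / 320, -1.2 + row * 2.4 / 200)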

Raymond,

Quote from: raymond on July 25, 2012, 12:45:21 PM
Never looked back at other programming languages.

yes, that's the point.  :icon14:

Quote from: raymond on July 25, 2012, 12:45:21 PM
Although I never got involved with floating point computation at that time (I wasn't even aware it existed), I did manage to write a program to extract the square root of any number with a precision of 10,000 digits on that machine and send the result to the printer. I really enjoyed that first computer.

Well done. Did you use Heron's algorithm?
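For those who don't know it, Heron's (Babylonian) method just averages a guess with n divided by that guess; a quick Python sketch of mine, assuming n > 0:

Code:
def heron(n, iterations=30):
    # x -> (x + n/x) / 2 converges quadratically to sqrt(n) for n > 0.
    x = n if n >= 1.0 else 1.0
    for _ in range(iterations):
        x = 0.5 * (x + n / x)
    return x

print(heron(2.0))   # 1.4142135623730951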

Gunther
You have to know the facts before you can distort them.

Rockphorr

Quote from: Gunther on July 25, 2012, 10:43:38 AM
Yesterday I started a topic: http://masm32.com/board/index.php?topic=487.msg3698#msg3698. Here is some technical and historical background on that research field. [...]

Hi, thanks for this info.
I can't unpack your books - WinRAR says the archive is damaged or has an invalid format.

Regards

Gunther

Hi Rockphorr,

Quote from: Rockphorr on July 26, 2012, 01:43:27 AM
Hi, thanks for this info.
I can't unpack your books - WinRAR says the archive is damaged or has an invalid format.

Regards

that's to be expected: the attached zip files contain only the images for the post above. It was my fault for not saying so in the original post. Excuse me.

Gunther
You have to know the facts before you can distort them.

raymond

QuoteWell done. Did you use Heron's algorithm?

I was not familiar with that algo at the time.

This came about when my son told me that one of his friends had been struggling for weeks to write a program to extract a square root with a precision of something like 16 significant digits on a PC-XT which was not equipped with a co-processor (i.e. an FPU). When he asked me how long it would take me, I bluntly answered ... about 4 hours, and with a lot more than 16 digits. What he didn't know was that I had learned how to extract a square root with a mechanical calculator, according to a procedure probably developed a long time ago for use with an abacus. I simply applied that procedure using BCDs.

With the 64 KB of memory, I could have gone to some 25,000 digits of precision, but I was too lazy to write the code to check that the requested number of digits stayed below that limit. It was a lot easier to check that the number of digits in the input did not exceed 4 (i.e. 9,999 max)!!!
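For anyone curious, the longhand procedure looks roughly like this - a Python sketch of the digit-by-digit method, not my original BCD code: bring the digits down in pairs, and at each step take the largest digit d such that (20*root + d)*d still fits in the remainder:

Code:
def sqrt_digits(n, frac_digits):
    # Digit-by-digit square root of integer n, the abacus/longhand way.
    s = str(n)
    if len(s) % 2:
        s = "0" + s
    pairs = [int(s[i:i + 2]) for i in range(0, len(s), 2)]
    pairs += [0] * frac_digits            # extra 00-pairs -> fractional digits
    root, rem, digits = 0, 0, []
    for p in pairs:
        rem = rem * 100 + p               # bring down the next pair
        d = 9
        while (20 * root + d) * d > rem:  # largest digit that still fits
            d -= 1
        rem -= (20 * root + d) * d
        root = root * 10 + d
        digits.append(d)
    return digits                         # integer digits first, then fraction

print(sqrt_digits(2, 10))   # [1, 4, 1, 4, 2, 1, 3, 5, 6, 2, 3]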
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com

Gunther

Raymond,

Quote from: raymond on July 26, 2012, 09:34:12 AM
What he didn't know was that I had learned how to extract a square root with a mechanical calculator, according to a procedure probably developed a long time ago for use with an abacus. I simply applied that procedure using BCDs.

With the 64 KB of memory, I could have gone to some 25,000 digits of precision, but I was too lazy to write the code to check that the requested number of digits stayed below that limit. It was a lot easier to check that the number of digits in the input did not exceed 4 (i.e. 9,999 max)!!!

very clever. :t A lot of our current algorithms come from the old analogue calculating techniques.

Gunther
You have to know the facts before you can distort them.

Daydreamer

The thing I did back in the '80s was write a mul routine that simultaneously multiplied X and Y with a size factor.
In an experimental project during my electronics engineering education I amazed others by rendering a full circle on an oscilloscope in real time with a super-slow 8088 experimental board computer, despite its mere 0.5 MHz clock: size*cosine and size*sine were output through two D/A converters to the X and Y inputs of the oscilloscope.
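In today's terms it was roughly this - a Python sketch with an assumed 8.8 fixed-point format; the original was 8088 assembly, of course:

Code:
import math

SCALE = 256     # assumed 8.8 fixed point: 256 represents 1.0
SIN = [round(SCALE * math.sin(2 * math.pi * i / 256)) for i in range(256)]
COS = [round(SCALE * math.cos(2 * math.pi * i / 256)) for i in range(256)]

def circle(size):
    # One revolution: per sample just an integer multiply and a shift
    # per axis - the values that went to the two D/A converters.
    for i in range(256):
        x = (size * COS[i]) >> 8
        y = (size * SIN[i]) >> 8
        yield x, y

for x, y in circle(100):
    pass    # here: send x to the X input, y to the Y input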


Gunther

Hi daydreamer,

Quote from: daydreamer on July 27, 2012, 05:59:22 AM
The thing I did back in the '80s was write a mul routine that simultaneously multiplied X and Y with a size factor.
In an experimental project during my electronics engineering education I amazed others by rendering a full circle on an oscilloscope in real time with a super-slow 8088 experimental board computer, despite its mere 0.5 MHz clock: size*cosine and size*sine were output through two D/A converters to the X and Y inputs of the oscilloscope.

I assume you used integer arithmetic, didn't you?

Gunther
You have to know the facts before you can distort them.



Daydreamer

Quote from: Gunther on July 27, 2012, 07:18:17 AM
I assume you used integer arithmetic, didn't you?

Kind of - fixed point.
