The MASM Forum
Miscellaneous => 16 bit DOS Programming => Topic started by: Gunther on July 25, 2012, 10:43:38 AM

Yesterday I started a topic: http://masm32.com/board/index.php?topic=487.msg3698#msg3698. Here is some technical and historical background on that research field.
First of all, the basic idea behind the software is an extra-long dot product accumulator. It accumulates partial results exactly, so it avoids the catastrophic cancellation that can quickly make scientific computations completely inaccurate. The thread above shows one such example, and also how to avoid the error. The image below shows the principle with five-bit numbers.
(http://masm32.com/board/index.php?action=dlattach;topic=493.0;attach=326)
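To make the idea concrete, here is a minimal Python sketch of an exact dot product in the spirit of the long accumulator. It is an illustration under stated assumptions, not Kulisch's hardware design and not the code from the linked thread. Every finite double is an integer multiple of 2^-1074 (the smallest subnormal), so every product of two doubles is an integer multiple of 2^-2148; scaling by 2^2148 turns each product into an exact integer, and Python's unbounded integers play the role of the wide fixed-point register. Only the final result is rounded, once. The function names are hypothetical.

```python
# Hypothetical names; a sketch of the long-accumulator idea, not a real library API.
# Any finite double is an integer times a power of two no smaller than 2**-1074,
# so any product of two doubles is an integer multiple of 2**-2148.
SHIFT = 2 * 1074


def exact_dot(xs: list[float], ys: list[float]) -> float:
    """Dot product accumulated exactly, rounded once at the end (finite inputs only)."""
    acc = 0  # plays the role of the long fixed-point accumulator
    for x, y in zip(xs, ys):
        # as_integer_ratio() is exact: x == xn / xd with xd a power of two
        xn, xd = x.as_integer_ratio()
        yn, yd = y.as_integer_ratio()
        # xd * yd divides 2**SHIFT, so the product is added with no rounding at all
        acc += xn * yn * ((1 << SHIFT) // (xd * yd))
    # int / int true division is correctly rounded: this is the single rounding step
    return acc / (1 << SHIFT)


def naive_dot(xs: list[float], ys: list[float]) -> float:
    """Ordinary floating-point accumulation: rounds after every addition."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


xs = [1e20, 1.0, -1e20]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))  # 0.0: the 1.0 is absorbed by 1e20, then everything cancels
print(exact_dot(xs, ys))  # 1.0
```

A real long accumulator does the same thing with a fixed-width register of a few thousand bits and shifts each product into place by its exponent; Python integers simply hide that bookkeeping.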
But do we really need such techniques? The only possible answer is: yes. Take, for example, the ASCI program (ASCI stands for Accelerated Strategic Computing Initiative), which has the following goal:
It aims to replace physical nuclear weapons testing with computer simulations.
There is a critical report by John Gustafson, US Department of Energy, titled "Computational Verifiability and Feasibility of the ASCI Program". The author explains that confidence in ordinary calculations, combined with error-prone software, could lead to a nuclear disaster. He also writes:
This forces us to a very different style of computing, one with which very few people have much experience, where results must come with guarantees. This is a new kind of science ....
That's for sure. A single wrong operation can spoil the entire calculation.
Since 2002 there have been discussions about revising the floating-point standard. I have a copy of a letter (from September 2004) to Bob Davis, Chairman of the IEEE Microprocessor Standards Committee; its authors include Ulrich Kulisch and William Kahan (the father of floating-point arithmetic, https://en.wikipedia.org/wiki/William_Kahan). Here is one quote:
We think that the tremendous progress in computer technology and the great increase in computer speed should be accompanied by extensions of the mathematical capacity of the computer. Beyond what has already been done by IEEE 754R, IFIP WG 2.5 expresses its desire that the following two requirements are included in the future arithmetic standard.
- For the data format double precision, interval arithmetic should be made available at the speed of simple floating-point arithmetic. Most processors on the market are equipped with arithmetic for multimedia applications. On these processors we believe that it is likely that only 0.1% more silicon in the arithmetic circuitry would suffice to realise this capability.
- High speed arithmetic for dynamic precision should be made available for real and for interval data. The basic tool to achieve high speed dynamic precision arithmetic for real and interval data is an exact multiply and accumulate operation for the data format double precision.
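For readers new to interval arithmetic, here is a minimal Python sketch of the idea behind the first requirement: each value is carried as a pair [lo, hi] that is guaranteed to contain the true result. Real implementations switch the FPU to directed rounding (round down for lo, round up for hi), and that is exactly what the letter asks to have at full hardware speed. Python cannot change the rounding mode, so this sketch assumes the default round-to-nearest and widens every computed bound outward by one ulp with `math.nextafter` (Python 3.9+). That is safe but slightly looser than true directed rounding. The `Interval` class is hypothetical and handles finite bounds only.

```python
import math
from dataclasses import dataclass


def _down(x: float) -> float:
    # Round-to-nearest is off by at most half an ulp, so one step down is a safe lower bound.
    return math.nextafter(x, -math.inf)


def _up(x: float) -> float:
    # Likewise, one step up is a safe upper bound.
    return math.nextafter(x, math.inf)


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] that encloses the exact real result (finite bounds only)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo - other.hi), _up(self.hi - other.lo))

    def __mul__(self, other: "Interval") -> "Interval":
        # The extremes of a product of intervals are among the four endpoint products.
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(_down(min(products)), _up(max(products)))


tenth = Interval(0.1, 0.1)  # the double 0.1 is itself not exactly 1/10
s = tenth + tenth + tenth
# True: 3 x (double 0.1) lies strictly between the doubles 0.3 and 0.30000000000000004,
# and the computed interval brackets it.
print(s.lo <= 0.3 and s.hi >= 0.30000000000000004)
```

The one-ulp widening costs two extra operations per bound here; with hardware directed rounding the enclosure would come for free, which is the point of the letter's first requirement.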
Both authors are nearly 80 years old, but they see the whole issue very clearly. I'm not sure whether the final version of the new floating-point standard will contain these two requirements; I have my doubts. The next question is: will the processor manufacturers follow the standard? The floating-point hardware we have today is far from ideal. And what should we think of Intel's and AMD's design decision to drop the BCD adjust instructions (DAA, DAS, AAA, AAS, AAM, AAD) in 64-bit long mode?
Anyway, the idea of an extra-long accumulator is not new. I first came into contact with it as a student at the Technical University of Dresden. We had an IBM/370 mainframe with the IBM high-accuracy library installed. That library covered real and complex arithmetic and a lot of other fine stuff, and it used the long accumulator.
(http://masm32.com/board/index.php?action=dlattach;topic=493.0;attach=327)
In 1983 I got hold of a book by Kulisch and others titled "PASCAL-XSC", which stands for PASCAL for Extended Scientific Computing. It was impressive: they used the long accumulator, too. The only computer I had at the time was a Z80 machine (8 bits wide!) with 64 KB of RAM (4 KB of it for the operating system) and a BASIC interpreter. I had no assembler or compiler (too expensive for me), but I found a book with the Z80 opcodes in one of Berlin's public libraries. That was enough. I was assembler and linker in one person: I wrote the instructions down with pencil and paper and poked the bytes, via DATA lines, into memory in a BASIC array. After 6 weeks and countless system crashes I had my first working long accumulator for REAL4 values (the REAL8 format was unthinkable in those years, because my interpreter didn't support it). Those were wild times!
What else? There's also an English version of Kulisch's book.
(http://masm32.com/board/index.php?action=dlattach;topic=493.0;attach=328)
And this one is for Alex: a Russian version is also available. The 3rd Russian edition came out in 2006, and the compiler is still in use there.
(http://masm32.com/board/index.php?action=dlattach;topic=493.0;attach=329)
So, that was a bit of background information; I hope other forum members find these questions interesting.
By the way, I found the books above, together with the Z80 code, on my bookshelf in the cellar.
Gunther

In 1983 I got hold of a book by Kulisch and others titled "PASCAL-XSC", which stands for PASCAL for Extended Scientific Computing. It was impressive: they used the long accumulator, too. The only computer I had at the time was a Z80 machine (8 bits wide!) with 64 KB of RAM (4 KB of it for the operating system) and a BASIC interpreter. I had no assembler or compiler (too expensive for me), but I found a book with the Z80 opcodes in one of Berlin's public libraries. That was enough. I was assembler and linker in one person: I wrote the instructions down with pencil and paper and poked the bytes, via DATA lines, into memory in a BASIC array. After 6 weeks and countless system crashes I had my first working long accumulator for REAL4 values (the REAL8 format was unthinkable in those years, because my interpreter didn't support it). Those were wild times!
Gunther, the narration you have just told us is... awful! :eusa_clap:

The only computer I had at the time was a Z80 machine (8 bits wide!) with 64 KB of RAM (4 KB of it for the operating system) and a BASIC interpreter.
TRS-80. That's also how I got started. I bought it second-hand at a garage sale, complete with 4 disk drives (single-sided), a TTX printer with its stand (upper-case letters only), and a slotted table for the computer, monitor and keyboard; all that for $500 in 1986. I later found a book about "machine language" for the Z80 ($4) in a bookstore. Never looked back at other programming languages.
Although I never got involved with floating-point computation at that time (I wasn't even aware it existed), I did manage to write a program on that machine to extract the square root of any number to a precision of 10,000 digits and send the result to the printer. I really enjoyed that first computer.

Alex,
Gunther, the narration you have just told us is... awful! :eusa_clap:
Yes, it is. It was pure machine-language programming, but I had a lot of fun. I wrote my first fractal generator in a similar way, except that the row and column loops were in BASIC and only the inner iteration loop was in machine language. It took a while to write and "debug" the program; the graphics resolution was only 320x200 in 2 colours (black and white), and the computer (at 1 MHz) needed about 8 hours for one image. But it was the Mandelbrot set.
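The split described above, slow outer loops over rows and columns around a tight inner iteration loop, is the usual escape-time algorithm for the Mandelbrot set. A minimal Python sketch, assuming the standard bailout test |z| > 2, an arbitrary viewing window and an iteration limit of 50 (none of these values come from the original Z80 program), rendering two "colours" as text:

```python
def escape_time(cr: float, ci: float, max_iter: int = 50) -> int:
    """Iterate z <- z*z + c from z = 0; return the step at which |z| > 2, or max_iter."""
    zr = zi = 0.0
    for n in range(max_iter):
        # |z|^2 > 4 means the orbit is certain to escape to infinity
        if zr * zr + zi * zi > 4.0:
            return n
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return max_iter


def render(width: int = 64, height: int = 24, max_iter: int = 50) -> str:
    """Outer row/column loops (the BASIC part in the story); '#' marks points in the set."""
    rows = []
    for row in range(height):
        ci = -1.2 + 2.4 * row / (height - 1)
        line = ""
        for col in range(width):
            cr = -2.1 + 2.7 * col / (width - 1)
            line += "#" if escape_time(cr, ci, max_iter) == max_iter else " "
        rows.append(line)
    return "\n".join(rows)


print(render())
```

Almost all of the run time is spent inside `escape_time`, which is why moving only that loop into machine code paid off.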
Raymond,
Never looked back at other programming languages.
yes, that's the point. :icon14:
Although I never got involved with floating-point computation at that time (I wasn't even aware it existed), I did manage to write a program on that machine to extract the square root of any number to a precision of 10,000 digits and send the result to the printer. I really enjoyed that first computer.
Well done. Did you use Heron's algorithm?
Gunther
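For readers who haven't met it: Heron's method refines a guess x for the square root of a by replacing it with the average (x + a/x)/2. A minimal integer-only Python sketch, which suits machines without an FPU; the function name and the choice of starting guess are assumptions of this sketch:

```python
def heron_isqrt(a: int) -> int:
    """Floor of the square root of a non-negative integer, by Heron's (Newton's) iteration."""
    if a < 0:
        raise ValueError("square root of a negative number")
    if a < 2:
        return a
    # Start above the root: a < 2**bits, so sqrt(a) < 2**ceil(bits / 2).
    x = 1 << ((a.bit_length() + 1) // 2)
    while True:
        y = (x + a // x) // 2  # Heron step, truncated to an integer
        if y >= x:             # the guesses stop decreasing at floor(sqrt(a))
            return x
        x = y


# Scale by 10**(2*k) to get k decimal places: sqrt(2) to 5 places
print(heron_isqrt(2 * 10**10))  # 141421, i.e. 1.41421
```

Each step roughly doubles the number of correct digits, but every step needs a full multi-digit division.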

Hi, thanks for this info.
I can't unpack your books: WinRAR says the file is damaged or in an invalid format.
Regards

Hi Rockphorr,
Hi, thanks for this info.
I can't unpack your books: WinRAR says the file is damaged or in an invalid format.
Regards
That's expected, because the attached zip files contain only the images for the post above, not the books themselves. It was my fault that I didn't say so in the original post. Sorry about that.
Gunther

It was pure machine language programming, but I had a lot of fun.
No doubt about that :biggrin:

Well done. Did you use Heron's algorithm?
I was not familiar with that algo at the time.
This came about when my son told me that one of his friends had been struggling for weeks to write a program that extracts a square root to something like 16 significant digits on a PC-XT which was not equipped with a coprocessor (i.e. FPU). When he asked me how long it would take me, I bluntly answered: about 4 hours, and with a lot more than 16 digits. What he didn't know was that I had learned how to extract a square root with a mechanical calculator, according to a procedure probably developed a long time ago to be used with an abacus. I simply applied that procedure using BCDs.
With 64 KB of memory, I could have gone to some 25,000 digits of precision, but I was too lazy to write the code to check that the requested number of digits was below that limit. It was a lot easier to check that the input for the number of digits did not exceed 4 digits (i.e. 9,999 max)!!!
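The exact procedure isn't spelled out above, so the following is an assumption: one classic mechanical-calculator method extracts a square root digit by digit using only subtraction of successive odd numbers, because (10p + d)^2 - (10p)^2 = 20pd + d^2, which is exactly the sum of the d odd numbers 20p + 1, 20p + 3, ..., 20p + 2d - 1. The following Python sketch uses ordinary integers in place of the BCD digits, and the function name is hypothetical:

```python
def sqrt_by_odd_subtraction(n: int, frac_digits: int) -> str:
    """Square root of a non-negative integer n, truncated to frac_digits decimals,
    using only subtraction of odd numbers (hypothetical helper name)."""
    if n < 0:
        raise ValueError("square root of a negative number")
    s = str(n)
    if len(s) % 2:
        s = "0" + s                       # digits are taken in pairs from the left
    int_pairs = len(s) // 2
    pairs = [int(s[i:i + 2]) for i in range(0, len(s), 2)] + [0] * frac_digits
    root = 0                              # partial root p found so far
    rem = 0                               # running remainder
    digits = []
    for pair in pairs:
        rem = rem * 100 + pair            # bring down the next pair of digits
        odd = 20 * root + 1               # first odd number for this digit
        d = 0
        while rem >= odd:                 # count how many odd numbers fit
            rem -= odd
            odd += 2
            d += 1
        root = root * 10 + d              # d is always in 0..9
        digits.append(str(d))
    int_part = "".join(digits[:int_pairs]).lstrip("0") or "0"
    frac_part = "".join(digits[int_pairs:])
    return int_part + ("." + frac_part if frac_digits else "")


print(sqrt_by_odd_subtraction(2, 20))  # 1.41421356237309504880
```

Memory use grows linearly with the number of digits (root and remainder each need about as many digits as the result), which is consistent with a limit in the tens of thousands of digits on a 64 KB machine.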

Raymond,
What he didn't know was that I had learned how to extract a square root with a mechanical calculator, according to a procedure probably developed a long time ago to be used with an abacus. I simply applied that procedure using BCDs.
With 64 KB of memory, I could have gone to some 25,000 digits of precision, but I was too lazy to write the code to check that the requested number of digits was below that limit. It was a lot easier to check that the input for the number of digits did not exceed 4 digits (i.e. 9,999 max)!!!
Very clever. :t Many of our current algorithms come from older, pre-electronic calculation techniques.
Gunther

What I did back in the '80s was write a mul routine that multiplied X and Y by the size simultaneously.
In an experimental project during my electronics engineering education, I amazed the others by drawing a full circle on an oscilloscope in real time with a super-slow 8088 experimental board computer, even though it had only a 0.5 MHz clock: size*cosine and sine*size were output through two D/A converters to the X and Y inputs of the oscilloscope.

Hi daydreamer,
What I did back in the '80s was write a mul routine that multiplied X and Y by the size simultaneously.
In an experimental project during my electronics engineering education, I amazed the others by drawing a full circle on an oscilloscope in real time with a super-slow 8088 experimental board computer, even though it had only a 0.5 MHz clock: size*cosine and sine*size were output through two D/A converters to the X and Y inputs of the oscilloscope.
I assume that you did use integer arithmetic, didn't you?
Gunther

I can make a circle on an oscilloscope with a few transistors :biggrin:
http://upload.wikimedia.org/wikipedia/commons/b/b0/Lissajous_figures_on_oscilloscope_%2890_degrees_phase_shift%29.gif

Dave,
I can make a circle on an oscilloscope with a few transistors :biggrin:
http://upload.wikimedia.org/wikipedia/commons/b/b0/Lissajous_figures_on_oscilloscope_%2890_degrees_phase_shift%29.gif
You're one of the cheating kind. :icon_rolleyes:
Gunther

Hi daydreamer,
What I did back in the '80s was write a mul routine that multiplied X and Y by the size simultaneously.
In an experimental project during my electronics engineering education, I amazed the others by drawing a full circle on an oscilloscope in real time with a super-slow 8088 experimental board computer, even though it had only a 0.5 MHz clock: size*cosine and sine*size were output through two D/A converters to the X and Y inputs of the oscilloscope.
I assume that you did use integer arithmetic, didn't you?
Gunther
Kind of fixed point.

Hi daydreamer,
Kind of fixed point.
That's in fact very similar. Good approach.
Gunther
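The "kind of fixed point" approach can be sketched as follows. This is a guess at the general technique, not daydreamer's 8088 code: a sine table scaled by 2^14 (Q14 format), one integer multiply by the size per axis, an arithmetic right shift instead of a division, and 8-bit D/A values centred on 128. The table length, scale factor and DAC range are all assumptions.

```python
import math

TABLE_SIZE = 256  # assumed: one full turn split into 256 steps
FRAC_BITS = 14    # assumed Q14 format: 1.0 is stored as 16384
SIN = [round(math.sin(2 * math.pi * i / TABLE_SIZE) * (1 << FRAC_BITS))
       for i in range(TABLE_SIZE)]


def circle_points(size: int, centre: int = 128) -> list[tuple[int, int]]:
    """(X, Y) values for two 8-bit D/A converters tracing a circle of radius `size`."""
    points = []
    for i in range(TABLE_SIZE):
        sin_q = SIN[i]
        cos_q = SIN[(i + TABLE_SIZE // 4) % TABLE_SIZE]  # cos(t) = sin(t + 90 degrees)
        # Integer multiply, then shift right; >> on a negative int floors, like x86 SAR.
        x = centre + ((size * cos_q) >> FRAC_BITS)
        y = centre + ((size * sin_q) >> FRAC_BITS)
        points.append((x, y))
    return points


pts = circle_points(100)
print(pts[0], pts[64], pts[128], pts[192])  # (228, 128) (128, 228) (28, 128) (128, 28)
```

On a CPU without an FPU, the table lookup plus one MUL and one shift per axis is cheap enough to refresh the beam continuously, which is what makes a real-time circle feasible even at a low clock rate.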