Dot Product under DOS with SSE2

Started by Gunther, May 16, 2022, 03:58:44 AM


Gunther

The dot product is one of the most frequently used routines in the BLAS (Basic Linear Algebra Subprograms). BLAS is a de facto standard application programming interface for
libraries that perform linear algebra operations such as vector and matrix multiplication. These libraries are heavily used in high-performance computing.

The program computes the scalar product of two REAL4 vectors whose elements are random numbers. The computation is done in PowerBASIC for DOS, in classical FPU code,
and in SSE2 code, and the results and timings are compared with each other.
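For readers who want the arithmetic spelled out, here is a minimal sketch of a single-precision dot product in C. It is not the program's code (that is PowerBASIC and assembly). The function names are made up for illustration, and the four-way unrolling in the second function is only an assumption about what an "advanced" variant might do, since the post does not describe it.

```c
#include <stddef.h>

/* Plain dot product: one accumulator, one multiply and one add per element. */
float dot_simple(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by four with independent partial sums.
   ASSUMPTION: this is one common optimization, not necessarily
   what the posted "advanced" routines do. */
float dot_unrolled(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Handle the remaining 0..3 elements. */
    for (; i < n; ++i)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the two functions can differ in the last digits for large random vectors; for small integer-valued inputs they agree exactly.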

The current version of DOSBox does not allow SSE2 instructions. Therefore only the two FPU functions are called. Here's the result on my machine:

We're using 1000 iterations for the time measure loop.
That'll take a while. Please be patient...

PowerBASIC Code   =  517.19
Elapsed Time      =  75.8 Seconds

Simple FPU Code   =  517.19
Elapsed Time      =  3.46 Seconds

Advanced FPU Code =  517.19
Elapsed Time      =  2.75 Seconds

SSE2 instructions not supported in your configuration.

Please, press any key to end the application...
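The "SSE2 instructions not supported" message implies some kind of feature check before the XMM routines are called. How the program does this is not shown; the usual approach is CPUID leaf 1, where bit 26 of EDX reports SSE2. A minimal sketch in C, assuming GCC or Clang on x86 for the `<cpuid.h>` part (the function names are hypothetical):

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 26 reports SSE2 support. */
#define CPUID_EDX_SSE2 (UINT32_C(1) << 26)

/* Pure helper: decode the SSE2 bit from an EDX value. */
int edx_reports_sse2(uint32_t edx)
{
    return (edx & CPUID_EDX_SSE2) != 0;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Query the CPU we are running on (GCC/Clang on x86 only). */
int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return edx_reports_sse2(edx);
}
#else
/* Other compilers or architectures: report "not supported". */
int cpu_has_sse2(void)
{
    return 0;
}
#endif
```

The CPUID bit only says that the processor implements SSE2. The instructions also fault unless CR4.OSFXSR is set, which under plain DOS is something the program has to take care of itself; the post does not say how it handles that.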

The PB 3.5 code is slower by a factor of more than 20 (75.8 seconds versus 3.46 seconds for the simple FPU code). Where have all the cycles gone? There are several reasons for this.
Because of that, the application lets the user decide whether the PB code is executed at all.

I installed FreeDOS on a USB stick and ran the program from there. Any other DOS works as well. Plain native DOS allows the use of SSE2; here is the result, without the slow
BASIC code:

We're using 10000000 iterations for the time measure loop.
That'll take a while. Please be patient...

Simple FPU Code   =  506.45
Elapsed Time      =  16.9 Seconds

Advanced FPU Code =  506.45
Elapsed Time      =  8.95 Seconds

Simple XMM Code   =  506.45
Elapsed Time      =  4.94 Seconds

Advanced XMM Code =  506.45
Elapsed Time      =  2.97 Seconds

Please, press any key to end the application...
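The XMM routines themselves are not shown in the post. As a rough illustration of why packed code is faster, here is a sketch of a dot product that processes four REAL4 values per step, written with C intrinsics instead of assembly. The function name is hypothetical, and the intrinsics used (packed single-precision load, multiply, add) come from the original SSE header; whether the posted code uses additional SSE2-only instructions is not stated. On compilers without SSE the sketch falls back to a scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Dot product using one XMM register as four parallel accumulators. */
float dot_packed(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;

#if HAVE_SSE
    __m128 acc = _mm_setzero_ps();

    /* Four multiplies and four adds per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads, so the  */
        __m128 vb = _mm_loadu_ps(b + i);   /* arrays need no alignment */
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif

    /* Scalar tail (or the whole vector when SSE is unavailable). */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

The summation order differs from a plain scalar loop, so for random REAL4 data the last digits of the result may differ slightly between the FPU and XMM versions.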


To my surprise, the program also runs with XP Mode under Win7. Considering that it's an emulation, the times are quite acceptable:

We're using 10000000 iterations for the time measure loop.
That'll take a while. Please be patient...

Simple FPU Code   =  513.93
Elapsed Time      =  21.7 Seconds

Advanced FPU Code =  513.93
Elapsed Time      =  11.4 Seconds

Simple XMM Code   =  513.93
Elapsed Time      =  6.21 Seconds

Advanced XMM Code =  513.93
Elapsed Time      =  3.68 Seconds

Please, press any key to end the application...


Unfortunately, XP Mode is no longer available as of Win10. That is all the more regrettable because both VirtualBox and VMware Player fail with SSE2: the FPU part runs fine, but
the virtual machine crashes at the first SSE2 function. That's not caused by the program, but by the buggy emulations.

Whether it runs under QEMU still needs to be checked. In any case, the program should be tested in as many environments as possible, and I need help with that.

Thank you in advance.
You have to know the facts before you can distort them.

_japheth


It runs under QEMU 5.2 with MS-DOS.
Screenshot attached.
Stupidity paired with audacity - a terrible power.

Gunther

Quote from: _japheth on May 16, 2022, 04:46:06 AM
It runs under qemu 5.2 and ms-dos

Thank you, Andreas. QEMU seems pretty solid after all. I am thinking about installing it.