The MASM Forum

64 bit assembler => 64 bit assembler. Conceptual Issues => Topic started by: Gunther on October 15, 2012, 04:29:42 AM

Title: Win64 command line programs with AVX instructions
Post by: Gunther on October 15, 2012, 04:29:42 AM
I've added 2 archives to this message: features.zip and floatsum.zip. Please read the readme.txt file first (it's included in every archive). The applications should run under Win64, SP1 (native or VM).

The program features.exe checks at run time which instruction sets the underlying machine supports. Many of the tests are not really necessary under Win64, but my goal was to develop a technique that is also usable under Win32 (with some minor changes, of course).
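
The detection code itself lives in features.zip and is not reproduced in this thread. As a rough illustration only, the AVX part of such a run-time check could look like the C sketch below. The names avx_usable, read_xcr0 and avx_usable_on_this_machine are hypothetical and not taken from the archive; the bit positions are the documented CPUID leaf 1 and XCR0 bits. The key point is that the AVX flag alone is not enough: the operating system must also have enabled saving of the YMM state.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPUID leaf 1, ECX register: documented feature bits. */
#define ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV        */
#define ECX_AVX     (1u << 28)  /* CPU implements the AVX instructions */

/* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state. */
#define XCR0_SSE_AVX 0x6u

/* Hypothetical helper: decide from raw register values whether AVX
   may be used. Pure logic, so it can be exercised on any machine. */
static bool avx_usable(uint32_t ecx, uint64_t xcr0)
{
    if ((ecx & ECX_AVX) == 0 || (ecx & ECX_OSXSAVE) == 0)
        return false;
    return (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read extended control register 0 with XGETBV (ECX = 0). */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Query the real hardware (gcc-compatible compilers on x86 only). */
bool avx_usable_on_this_machine(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if ((ecx & ECX_OSXSAVE) == 0)
        return false;           /* XGETBV would fault without OSXSAVE */
    return avx_usable(ecx, read_xcr0());
}
#endif
```

Splitting the decision (avx_usable) from the hardware access keeps the logic testable with made-up register values, which is also convenient when porting the check between Win64, Win32 and other systems.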

The program floatsum.exe sums up an array of float (REAL4) numbers in C and assembly language (with SSE2 instructions and the new AVX instructions). The differences are tremendous. Here is the application's output on my machine: Intel Core i7-3770, 3.4 GHz with Win7 (64 bit) and SP1:

Supported by Processor and installed Operating System:
------------------------------------------------------

     Pentium 4 Instruction Set,
     + FPU (floating point unit) on chip,
     + support of FXSAVE and FXRSTOR,
     +  57 MMX Instructions,
     +  70 SSE (Katmai) Instructions,
     + 144 SSE2 (Willamette) Instructions,
     +  13 SSE3 (Prescott) Instructions,
     +  47 SSE4.1 (Penryn) Instructions,
     +   7 SSE4.2 (Nehalem) Instructions,
     + AVX (Advanced Vector Extensions).

Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 12.68 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 4.29 Seconds
Performance Boost = 296%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.08 Seconds
Performance Boost = 1178%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 0.55 Seconds
Performance Boost = 2323%
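
The sources are in floatsum.zip and are not shown in this thread. To make the difference between the first two variants concrete, here is a minimal C sketch of a plain summation loop and a four-accumulator version. The function names sum_simple and sum_4acc are hypothetical, and the code is an illustration of the general technique, not the code from the archive.

```c
#include <stddef.h>

/* Plain loop: every addition has to wait for the previous one. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four dependency chains can
   overlap in the pipeline, and the layout maps directly onto the
   lanes of an XMM register. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements, if n % 4 != 0 */
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

Note that the two functions add the elements in a different order, so for general float data the results can differ in the last digits. If the array holds 1.0 to 4096.0 (only a guess, chosen because it matches the printed sum 8390656), both return exactly the same value, since every partial sum is an integer below 2^24 and therefore exactly representable as a float.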


For the C sources I used gcc 4.7.2 for Windows; with some minimal changes (especially to the data alignment) they should work with VC or Pelles C, too, but that's not tested. The assembly language sources are assembled with yasm 1.2.0 for Windows, but nasm will do the same job (that's tested).
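
The alignment change mentioned above usually comes down to the compiler-specific syntax for declaring an aligned array. A possible way to hide that difference is sketched below. The macro name ALIGN32, the array name samples and the helper is_aligned32 are hypothetical; the VC branch uses the documented __declspec(align(n)) syntax, and the fallback branch assumes a C11 compiler. Like the original post says, this is untested with VC and Pelles C.

```c
#include <stdint.h>

/* Pick the alignment syntax of the compiler in use. */
#if defined(_MSC_VER)
#  define ALIGN32 __declspec(align(32))
#elif defined(__GNUC__)
#  define ALIGN32 __attribute__((aligned(32)))
#else
#  define ALIGN32 _Alignas(32)          /* assumes C11 support */
#endif

/* 32-byte alignment suits aligned YMM loads (vmovaps);
   16 bytes would be enough for XMM loads (movaps). */
ALIGN32 static float samples[4096];

/* Returns nonzero if p sits on a 32-byte boundary. */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p & 31u) == 0;
}
```

A quick call such as is_aligned32(samples) at program start is a cheap safeguard before handing the array to assembly code that uses aligned loads.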

In the next few days I'll upload the same example working under Linux and BSD.

The software isn't in its final stage yet. Hints and proposals for improvements are welcome, as well as any feedback.

Gunther

Title: Re: Win64 command line programs with AVX instructions
Post by: Gunther on March 04, 2013, 05:23:38 AM
I've updated the application that sums up an array of REAL4 (float) numbers. The new instruction detection procedure is included, and a procedure with FPU code, too.

A little bit of feedback would be welcome. The Linux version is coming soon.

Gunther

Title: Re: Win64 command line programs with AVX instructions
Post by: dedndave on March 04, 2013, 05:35:38 AM
hi Gunther
my "little bit of feedback".....

your instruction-set ID code...
the first instruction is PUSH RBX
yet, you attempt to ID...
;                   0 = 8086
;                   1 = 80186
;                   2 = 80286
;                   3 = 80386
;                   4 = 80486

probably not much use in identifying those, as windows 2k+ won't run on any of them
but, the PUSH RBX would crash if it did   :P
the first logical step might be to identify processor width

Title: Re: Win64 command line programs with AVX instructions
Post by: Gunther on March 04, 2013, 05:49:51 AM
Hi Dave,

Quote from: dedndave on March 04, 2013, 05:35:38 AM
hi Gunther
my "little bit of feedback".....

your instruction-set ID code...
the first instruction is PUSH RBX
yet, you attempt to ID...
;                   0 = 8086
;                   1 = 80186
;                   2 = 80286
;                   3 = 80386
;                   4 = 80486

probably not much use in identifying those, as windows 2k+ won't run on any of them
but, the PUSH RBX would crash if it did   :P
the first logical step might be to identify processor width

that's right and not right. The numbers up to 7 are only for the 32 bit version and especially for the 16 bit version. The 32 bit version is finished and you've posted in: http://masm32.com/board/index.php?topic=1418.0. The 16 bit version isn't ready yet, but the numbers are already there.

Gunther

Title: Re: Win64 command line programs with AVX instructions
Post by: dedndave on March 04, 2013, 06:17:33 AM
ok   :t
i saw those in there and thought i'd mention it

Title: Re: Win64 command line programs with AVX instructions
Post by: Gunther on March 04, 2013, 07:58:03 AM
Hi Dave,

Quote from: dedndave on March 04, 2013, 06:17:33 AM
ok   :t
i saw those in there and thought i'd mention it

never mind. Enjoy.  :t

Gunther