This is the application corresponding to that thread: http://masm32.com/board/index.php?topic=795.0. The program has the same functionality, and the timings are very similar because it ran on the same machine: an Intel Core i7-3770 at 3.4 GHz.

Supported by Processor and installed Operating System:
------------------------------------------------------
MMX, CMOV and FCOMI, SSE, SSE2, SSE3, SSSE3, SSE4.1,
POPCNT, SSE4.2, AVX, PCLMUL and AES

Calculating the sum of a float array with different methods.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1 = 8390656.00
Elapsed Time = 15.74 Seconds

FPU code with 4 accumulators:
-----------------------------
sum2 = 8390656.00
Elapsed Time = 7.02 Seconds
Performance Boost = 224%

C implementation with 4 accumulators:
-------------------------------------
sum3 = 8390656.00
Elapsed Time = 5.34 Seconds
Performance Boost = 295%

SSE2 code with 4 accumulators:
------------------------------
sum4 = 8390656.00
Elapsed Time = 1.34 Seconds
Performance Boost = 1175%

AVX code with 4 accumulators:
-----------------------------
sum5 = 8390656.00
Elapsed Time = 0.69 Seconds
Performance Boost = 2281%

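
For anyone wondering what the "4 accumulators" variants do compared with the simple loop, here is a minimal C sketch. It is not the code from the program itself. The function names sum_simple, sum_4acc and boost_percent are made up for illustration, and the boost formula (baseline time divided by method time, times 100) is only inferred from the printed figures, e.g. 15.74 / 7.02 is about 2.24, shown as 224%.

```c
#include <stddef.h>

/* Simple loop: every addition has to wait for the previous one,
   so the loop runs at the latency of a single floating-point add. */
float sum_simple(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four independent accumulators: the four additions in the loop body
   do not depend on each other, so the CPU can overlap them. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}

/* Assumed formula behind "Performance Boost":
   baseline time over method time, as a percentage. */
double boost_percent(double baseline_s, double method_s)
{
    return 100.0 * baseline_s / method_s;
}
```

Float addition is not associative, so a compiler will normally not do this reordering by itself unless it is allowed to (for example with -ffast-math in GCC), which is why writing the accumulators by hand can pay off. Filling a 4096-element array with 1.0 to 4096.0 gives 8390656.00 from both functions, which matches the sums above, although the post does not say what the real test array contains.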
Feedback is welcome.
Gunther