AVX for 32-bit Windows applications

Started by Gunther, May 27, 2014, 04:08:26 AM


hutch--

I only got this far with my old Core2 quad.


Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 16.44 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 9.61 Seconds
Performance Boost = 171%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.42 Seconds
Performance Boost = 1156%

Your current CPU doesn't support the AVX instruction set.
The application terminates now.

Gunther

Thank you Hutch. It's clear that the Core2 doesn't support AVX. But the other timings are interesting.

Sinsi, you're using VMware and that allows AVX. Interesting point. Thank you, too.

Gunther
You have to know the facts before you can distort them.
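For readers wondering how a program can decide at run time whether to take the AVX path or stop with a message like the one above, here is a minimal sketch in C. It is not the check used in fsum.exe; it assumes a GCC or Clang compiler on x86 and relies on their `__builtin_cpu_supports` builtin. The function name `cpu_has_avx` is made up for illustration. Keep in mind that AVX is usable only when the CPU reports it and the operating system saves the YMM registers, which is why a guest OS in a virtual machine can report AVX even though an older physical machine cannot.

```c
/* Hypothetical helper, not taken from fsum.exe.
 * Returns 1 if the AVX code path may be used, 0 otherwise.
 * Assumes GCC or Clang on x86; on any other compiler or
 * architecture it conservatively reports "no AVX". */
int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                       /* initialise the CPU feature data */
    return __builtin_cpu_supports("avx") != 0;  /* normalise the result to 0 or 1 */
#else
    return 0;
#endif
}
```

A caller would test `cpu_has_avx()` once at startup and skip the YMM routine when it returns 0, which is the behaviour Hutch's Core2 run shows.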

Gunther

Yesterday I installed VMware Player with Windows 7 64-bit as host and Windows 7 32-bit as guest. Here are the results for fsum.exe from post #43:

Quote
Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 12.96 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 6.46 Seconds
Performance Boost = 201%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.09 Seconds
Performance Boost = 1187%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 0.75 Seconds
Performance Boost = 1733%
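The gap between the "simple" and the "4 accumulators" C timings comes from breaking the dependency chain: with a single running sum every addition has to wait for the previous one, while four independent sums can overlap in the pipeline. The actual source of fsum.exe is not shown in this thread, so the following is only a sketch of the idea; the function names `sum_simple` and `sum_4acc` are invented for illustration.

```c
#include <stddef.h>

/* Straightforward loop: each add depends on the result of the previous add. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four adds in the loop body do not
 * depend on each other, so the CPU can keep several in flight at once. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

The XMM and YMM assembly versions apply the same trick with wider registers, 4 floats per XMM accumulator and 8 per YMM accumulator. Note that reordering float additions can change the rounding in general, so the two functions are only guaranteed to agree when all partial sums are exactly representable.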



habran

I have found an error in JWasm's handling of vsqrtps and vsqrtpd.

File instravx.h, line 30, was:

avxins (SQRTPD,   vsqrtpd,       P_AVX, VX_L ) /* L, s */
avxins (SQRTPS,   vsqrtps,       P_AVX, VX_L ) /* L, s */

change to:

avxins (SQRTPD,   vsqrtpd,       P_AVX, VX_L|VX_NND ) /* L, ns */
avxins (SQRTPS,   vsqrtps,       P_AVX, VX_L|VX_NND ) /* L, ns */
Cod-Father
