The MASM Forum

General => The Laboratory => Topic started by: Gunther on May 27, 2014, 04:08:26 AM

Title: AVX for 32-bit Windows applications
Post by: Gunther on May 27, 2014, 04:08:26 AM
The attached archive is ArrayCopy.zip. The C program is only the frame. The real work is done by the assembly language procedures.

The application checks if the AVX instruction set is available (since Sandy Bridge and Bulldozer) and if the Operating System supports it. If so, it copies an array of doubles with AVX instructions. If not, it uses SSE2 instructions.

The strange thing is: Under Windows 7, SP1 (32-bit), the program output is this:
Quote

AVX support not available:
==========================

X[0] = 1.00     Y[0] = 1.00
X[1] = 2.00     Y[1] = 2.00
X[2] = 3.00     Y[2] = 3.00
X[3] = 4.00     Y[3] = 4.00

Please, press enter to end the application ...


The same program produces this output under Windows 7, SP1 (64-bit) in compatibility mode:
Quote

AVX support available:
======================

X[0] = 1.00     Y[0] = 1.00     Z[0] = 1.00
X[1] = 2.00     Y[1] = 2.00     Z[1] = 2.00
X[2] = 3.00     Y[2] = 3.00     Z[2] = 3.00
X[3] = 4.00     Y[3] = 4.00     Z[3] = 4.00

Please, press enter to end the application ...


Compatibility mode seems to be the only way to use the AVX, AVX2, etc. instruction sets. But I can test it only in a virtual machine. What about a native Win7 (32-bit) with SP1? That must be tested. The C source should compile with VC after a few minor changes, too. But that must also be tested. Your help and comments are very welcome.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: jj2007 on May 27, 2014, 04:18:41 AM
Hi Gunther,

One question I ask myself (and I can't test it) is what exactly does "supported by the OS" mean?

Can the OS tell the CPU "throw an exception if you hit an AVX instruction", even if the CPU supports it?

Or does it only mean that the AVX registers are not saved during a context switch?
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on May 27, 2014, 04:30:46 AM
the OS can disable many of the newer features via the control registers (ring 0)
because AVX is only supported for 64-bit OS's, i suspect they have been disabled under win7-32
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on May 27, 2014, 07:03:47 AM
Jochen,

One question I ask myself (and I can't test it) is what exactly does "supported by the OS" mean?

Can the OS tell the CPU "throw an exception if you hit an AVX instruction", even if the CPU supports it?

Or does it only mean that the AVX registers are not saved during a context switch?

The Intel® Advanced Vector Extensions Programming Reference says that (p. 2-2):
Quote
Prior to using AVX, the application must identify that the operating system supports the XGETBV instruction, the YMM register state, in addition to processor’s support for YMM state management using XSAVE/XRSTOR and AVX instructions. The following simplified sequence accomplishes both and is strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use)
2) Issue XGETBV and verify that XFEATURE_ENABLED_MASK[2:1] = ‘11b’ (XMM state and YMM state are enabled by OS).
3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2)

and that (p. 2-3):
Quote
Note: It is unwise for an application to rely exclusively on CPUID.1:ECX.AVX[bit 28] or at all on CPUID.1:ECX.XSAVE[bit 26]: These indicate hardware support but not operating system support. If YMM state management is not enabled by an operating systems, AVX instructions will #UD regardless of CPUID.1:ECX.AVX[bit 28]. “CPUID.1:ECX.XSAVE[bit 26] = 1” does not guarantee the OS actually uses the XSAVE process for state management.
These steps above also apply to enhanced 128-bit SIMD floating-pointing instructions in AVX (using VEX prefix-encoding) that operate on the YMM states.
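
The three-step sequence quoted above boils down to two bit tests. Here is a minimal C sketch of just that logic, assuming the caller has already obtained CPUID.1:ECX and the XCR0 value (via CPUID and XGETBV); the function name `avx_usable` and the macro names are illustrative and not taken from Intel's reference:

```c
#include <stdint.h>

/* Hypothetical helper: decide whether AVX may be used, given the raw
   values the caller read with CPUID (leaf 1, ECX) and XGETBV (XCR0). */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV   */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX            */
#define XCR0_XMM_AND_YMM   0x6u        /* bit 1 (XMM) and bit 2 (YMM)   */

int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;

    /* Steps 1 and 3: both CPUID bits must be set. */
    if ((cpuid1_ecx & need) != need)
        return 0;

    /* Step 2: the OS must have enabled XMM and YMM state in XCR0. */
    return (xcr0 & XCR0_XMM_AND_YMM) == XCR0_XMM_AND_YMM;
}
```

XGETBV itself may only be executed when the OSXSAVE bit is set, so a real caller would pass 0 for `xcr0` when that bit is clear; the function then returns 0 either way.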

Dave,

the OS can disable many of the newer features via the control registers (ring 0)
because AVX is only supported for 64-bit OS's, i suspect they have been disabled under win7-32

but that's not very logical. Under a 64-bit OS, a 32-bit client can use the AVX instructions in compatibility mode (no recompiling necessary). Compatibility mode is nearly the same as a native 32-bit OS. Why these restrictions?

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on May 28, 2014, 04:01:19 AM
Jochen,

here's another excerpt from the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture, p. 13-10.
Quote
Intel AVX instructions comprises of 256-bit and 128-bit instructions that operates on 256-bit YMM registers. The XSAVE feature set allows software to save and restore the state of these registers. See Chapter 13 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1. System software support requirements for 256-bit YMM states are described next:
For processors that support YMM states, the YMM state exists in all operating modes. However, the available instruction interfaces to access YMM states may vary in different modes. Operating systems must use the XSAVE feature set for YMM state management. The XSAVE feature set also provides flexible and efficient interface to manage XMM/MXCSR states and x87 FPU states in conjunction with newer processor extended states like YMM states. Operating systems may need to be aware of the following when supporting AVX.
• Saving/Restoring AVX state in non-compacted format without SSE state will also save/restore MXCSR even though MXCSR is not part of AVX state. This does not happen when compacted format is used.
• Few AVX instructions such as VZEROUPPER/VZEROALL may operate on future expansion of YMM registers.
An operating system must enable its YMM state management to support AVX and any 256-bit extensions that operate on YMM registers. Otherwise, an attempt to execute an instruction in AVX extensions (including an enhanced 128-bit SIMD instructions using VEX encoding) will cause a #UD exception.
AVX instructions may generate SIMD floating-point exceptions. An OS must enable SIMD floating-point exception support by setting CR4.OSXMMEXCPT[bit 10]=1.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: jj2007 on May 28, 2014, 04:13:03 AM
Thanks :t
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on May 28, 2014, 06:04:45 AM
Jochen,

Thanks :t

you're welcome.

On the other hand, it could be that there is a chance to use AVX and later instruction sets under plain DOS. That would be a joke.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 28, 2014, 05:49:41 AM
this code was written by someone who doesn't write ASM very often
but, we should be able to make it work

https://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled

Code:
.686p
.xmm
.model FLAT

; CPUID Win32
.code

; int isAvxSupported();
_isAvxSupported proc
    push ebx                ; cpuid clobbers ebx, which a C caller expects to be preserved
    xor eax, eax
    cpuid
    cmp eax, 1              ; does CPUID support eax = 1?
    jb not_supported
    mov eax, 1
    cpuid
    and ecx, 018000000h     ; isolate bit 27 (OS uses XSAVE/XRSTOR)
    cmp ecx, 018000000h     ; and bit 28 (AVX supported by CPU)
    jne not_supported
    xor ecx, ecx            ; XFEATURE_ENABLED_MASK/XCR0 register number = 0
    xgetbv                  ; XFEATURE_ENABLED_MASK register is in edx:eax
    and eax, 110b
    cmp eax, 110b           ; are XMM and YMM state both enabled by the OS?
    jne not_supported
    mov eax, 1
    pop ebx
    ret
not_supported:
    xor eax, eax
    pop ebx
    ret
_isAvxSupported endp
END
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 28, 2014, 06:00:35 AM
ok - that looks similar to Gunther's check

this page is really confusing me, though

http://msdn.microsoft.com/en-us/library/windows/desktop/ff919571%28v=vs.85%29.aspx

they start out, talking about enabling the features under win7-32 SP1
and, the farther i read, the more confused i am - lol
by the end of it, i go away with the impression that AVX is for debuggers    :dazzled:
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 28, 2014, 06:47:33 PM
Dave,

ok - that looks similar to Gunther's check

this page is really confusing me, though

http://msdn.microsoft.com/en-us/library/windows/desktop/ff919571%28v=vs.85%29.aspx

you can trust me. There's no chance for AVX and better under a 32-bit Operating System.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 28, 2014, 10:04:22 PM
it's not that i don't trust you, Gunther   :biggrin:

but, i keep seeing documentation that says otherwise
Title: Re: AVX for 32-bit Windows applications
Post by: anta40 on June 29, 2014, 02:15:02 AM
Hi Gunther,

I'm running 32-bit Win 7. CPU-Z says that my CPU supports AVX.
Your program seems to run correctly as expected.

(http://i244.photobucket.com/albums/gg6/segmentationfault3/avx_7_zps917ed676.jpg)
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 29, 2014, 02:41:09 AM
anta40 - have you installed service pack 1 ?
i am noticing that it is the Ultimate edition

perhaps the edition and SP level have something to do with it
Title: Re: AVX for 32-bit Windows applications
Post by: anta40 on June 29, 2014, 02:53:03 AM
Yes dave you are right.
32-bit Win 7 with SP 1.
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on June 29, 2014, 03:39:05 AM
... so it was the mysterious VM that does not support AVX ::)


BTW: OllyDbg 2.01 does decode AVX instructions.
BTW2: For the 32- and 16-bit modes the VEX prefix is encoded as an invalid form of the LDS/LES instructions (LDS/LES reg,reg). However, only ymm0-7 can be used in these modes.
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 29, 2014, 05:44:20 AM
i was reading something about ymm0-7 etc the other day
i think it mentioned something about 64-bit OS, being able to use the others
i may have mis-read because it was not what i was after and i skimmed that part quickly

as for VM's....

according to some documentation i have seen,
some VM's return their own vendor ID for CPUID
Code:
;'KVMKVMKVMKVM' KVM
;'Microsoft Hv' Microsoft Hyper-V or Windows Virtual PC
;'VMwareVMware' VMware
;'XenVMMXenVMM' Xen HVM

that might make it difficult for software to determine the level of support for extensions   ::)
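
A small C sketch of how software could recognise the four signatures listed above, assuming the 12 signature bytes have already been read with CPUID (how they were obtained is left to the caller; the function name `hypervisor_from_signature` is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a 12-character CPUID vendor signature to a readable name.
   Returns NULL when the signature is not one of the four listed above. */
const char *hypervisor_from_signature(const char sig[12])
{
    static const struct { const char *sig; const char *name; } known[] = {
        { "KVMKVMKVMKVM", "KVM" },
        { "Microsoft Hv", "Microsoft Hyper-V or Windows Virtual PC" },
        { "VMwareVMware", "VMware" },
        { "XenVMMXenVMM", "Xen HVM" },
    };
    size_t i;

    assert(sig != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        /* Compare exactly 12 bytes; the signature is not NUL-terminated. */
        if (memcmp(sig, known[i].sig, 12) == 0)
            return known[i].name;
    }
    return NULL;
}
```

Knowing the hypervisor does not by itself say whether AVX is passed through to the guest; the CPUID and XGETBV checks still have to be run inside the VM.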
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 29, 2014, 11:56:08 PM
Hi Gunther,

I'm running 32-bit Win 7. CPU-Z says that my CPU supports AVX.
Your program seems to run correctly as expected.

That's a big surprise for me. But Windows 7-32, Professional, SP 1 doesn't support it, at least as VM.

qWord,
However, only ymm0-7 can be used in these modes.

That's clear. If so, it would be similar to the usage of XMM registers. Furthermore, we had that interesting discussion (http://masm32.com/board/index.php?topic=3171.msg32938#msg32938). The groundwork was that link to an article (https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions) by Chris Lomont in the Intel Developer Zone:
Quote
The new instructions are encoded using what Intel calls a VEX prefix, which is a two- or three-byte prefix designed to clean up the complexity of current and future x86/x64 instruction encoding. The two new VEX prefixes are formed from two obsolete 32-bit instructions-Load Pointer Using DS (LDS-0xC4, 3-byte form) and Load Pointer Using ES (LES-0xC5, two-byte form)-which load the DS and ES segment registers in 32-bit mode. In 64-bit mode, opcodes LDS and LES generate an invalid-opcode exception, but under Intel® AVX, these opcodes are repurposed for encoding new instruction prefixes. As a result, the VEX instructions can only be used when running in 64-bit mode. The prefixes allow encoding more registers than previous x86 instructions and are required for accessing the new 256-bit SIMD registers or using the three- and four-operand syntax. As a user, you do not need to worry about this (unless you're writing assemblers or disassemblers).

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on June 30, 2014, 12:06:20 AM
qWord,
However, only ymm0-7 can be used in these modes.

That's clear. If so, it would be similar to the usage of XMM registers. Furthermore, we had that interesting discussion (http://masm32.com/board/index.php?topic=3171.msg32938#msg32938). The groundwork was that link to an article (https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions) by Chris Lomont in the Intel Developer Zone:
Quote
The new instructions are encoded using what Intel calls a VEX prefix, which is a two- or three-byte prefix designed to clean up the complexity of current and future x86/x64 instruction encoding. The two new VEX prefixes are formed from two obsolete 32-bit instructions-Load Pointer Using DS (LDS-0xC4, 3-byte form) and Load Pointer Using ES (LES-0xC5, two-byte form)-which load the DS and ES segment registers in 32-bit mode. In 64-bit mode, opcodes LDS and LES generate an invalid-opcode exception, but under Intel® AVX, these opcodes are repurposed for encoding new instruction prefixes. As a result, the VEX instructions can only be used when running in 64-bit mode. The prefixes allow encoding more registers than previous x86 instructions and are required for accessing the new 256-bit SIMD registers or using the three- and four-operand syntax. As a user, you do not need to worry about this (unless you're writing assemblers or disassemblers).
What should this say to other readers?
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 30, 2014, 12:14:35 AM
qWord,

What should this say to other readers?

no offense. It wasn't my quote. It's a statement by Chris Lomont inside the Intel Developer Network. The other side of the coin is the result by anta40. So, what is with Windows 8?

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 02:32:40 AM
perhaps not all AVX instructions are VEX encoded ?
it's a lot to absorb   :(
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 30, 2014, 03:08:12 AM
Dave,

perhaps not all AVX instructions are VEX encoded ?

But
Code: [Select]
        vaddps
is VEX encoded.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 03:35:52 AM
your test program didn't use any VEX-encoded instructions, though - right ?
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 30, 2014, 03:38:17 AM
your test program didn't use any VEX-encoded instructions, though - right ?

It uses VEX encoding. That makes the entire thing a bit strange.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 03:48:11 AM
ok - it's nice to know that i'm not the only person that's confused   :biggrin:
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 04:48:47 AM
i was looking for info and found this document
it's all about AVX512

however, they also give a complete treatment to CPUID at the beginning of section 2   :biggrin:

https://software.intel.com/en-us/file/319433-019pdf
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on June 30, 2014, 04:56:25 AM
What is so confusing? That a VM does not support AVX?

AVX512 does not currently exist in hardware, and the linked manual shows that these instructions will get their own new (4-byte) prefix: EVEX.



Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 05:23:47 AM
i can't speak for Gunther, but here's what's confusing me.....

the AVX instructions are VEX-encoded
i.e., they use the opcode space of the old LDS and LES instructions

so, let's say you have a 64-bit processor that is AVX-capable (i7, for example)
and you have Windows 7 32-bit installed

it seems to support AVX, at least to some degree
but, some documentation states that VEX may only be decoded in long mode   :redface:
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 05:41:25 AM
to clarify....

here is a link to Intel info, stating that VEX instructions may only be decoded in long mode
at the end of the page, instructions are listed, nearly all seem to be VEX encoded

https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

here is a link to Intel info, showing how to test for AVX support - 32-bit and 64-bit code provided

https://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled

according to Gunther, his test program uses at least one VEX encoded instruction
and anta40 ran the test program successfully under Windows 7-32, SP1
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on June 30, 2014, 05:54:07 AM
The VEX prefix is encoded in such a way that its first two bytes form an invalid form of LES or LDS, respectively. These instructions take one register as destination and one memory operand as source. The VEX prefixes (in 16- or 32-bit modes) encode the illegal form with two register arguments (ModR/M: mod=11y). The limitation to ymm0-7 has to do with the 2-byte VEX prefix (=>LDS), where bit 6 of the second prefix byte (= the low bit of the mod field of LDS) is used to encode a register number (ymmX). The register number is stored in 1's complement, thus this bit is 1 for ymm0-7. In 64-bit mode this bit can also be 0, because LDS does not exist there, but in 32- and 16-bit modes this bit must be 1 to get an illegal form of LDS (mod=11y).

You can read up on this in the latest manuals (I've used the "AllInOne" pdf).

BTW: does the Author of the PDF (https://software.intel.com/sites/default/files/m/d/4/1/d/8/Intro_to_Intel_AVX.pdf) work for Intel? I didn't think so.

EDIT: bit 7 of the second prefix byte is also used for register encoding and must be 1 in 32 bit modes.
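
The rule described in this post can be written down as a rough C sketch. It is not a decoder, only the test on the first two bytes as it applies in 16/32-bit code; both function names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In 16/32-bit code: do the two bytes at `code` start a VEX prefix
   rather than a legal LES (C4) or LDS (C5) instruction? */
int is_vex_in_32bit_mode(const uint8_t *code)
{
    assert(code != NULL);
    if (code[0] != 0xC4 && code[0] != 0xC5)
        return 0;                    /* neither the LES nor the LDS opcode */

    /* LES/LDS require a memory operand, so ModR/M.mod == 11b is illegal
       for them; exactly that bit pattern is reused for VEX. */
    return (code[1] & 0xC0) == 0xC0;
}

/* For the 2-byte form (C5), bits 6..3 of the second byte hold a register
   number stored inverted (1's complement). */
unsigned vex2_vvvv_register(uint8_t second_byte)
{
    return ~(unsigned)(second_byte >> 3) & 0x0Fu;
}
```

For example, `vaddps ymm0,ymm0,ymm0` assembles to `C5 FC 58 C0`: the second byte `FC` has its top two bits set, so it is a VEX prefix in 32-bit code, and the inverted field decodes to register 0. Because bit 6 has to be 1 in these modes, the decoded number can never exceed 7.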
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on June 30, 2014, 06:03:59 AM
Hi qWord,

thank you for the explanation. I'm sitting here, writing some test programs resulting in strange results. So, I'll need some time.

BTW: does the Author of the PDF (https://software.intel.com/sites/default/files/m/d/4/1/d/8/Intro_to_Intel_AVX.pdf) work for Intel? I didn't think so.

On the other hand, Chris Lomont isn't Mr. Nobody.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on June 30, 2014, 07:12:37 AM
....and, Intel does publish the article    :redface:
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on June 30, 2014, 07:18:27 AM
....and, Intel does publish the article    :redface:
Indeed strange, but the official documentation has the final word.
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 01, 2014, 02:06:50 AM
Hi qWord,

Indeed strange, but the official documentation has the final word.

no doubt about that. But wait a little bit. I'll post my new test program here. It'll come to very strange results.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on July 01, 2014, 11:24:04 PM
But wait a little bit. I'll post my new test program here. It'll come to very strange results.
No strange result here (requires MASM 10+):
Code:
include \masm32\include\masm32rt.inc
.686
.mmx
.xmm

IF @Version GE 1000

print_ps8 macro m256:req
xor esi,esi
.while esi < 8*REAL4
movss xmm0,REAL4 ptr m256[esi]
sub esp,8
cvtss2sd xmm0,xmm0
movsd REAL8 ptr [esp],xmm0
.if esi != 7*REAL4
push chr$("%3.2G, ")
.else
push chr$("%3.2G")
.endif
call crt_printf
add esp,12
add esi,REAL4
.endw
endm

.const
align 16
vpsVector REAL4 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
vpdVector REAL8 2.0, 2.0, 2.0, 2.0
.data?
vpsResult0 YMMWORD ?
vpsResult1 YMMWORD ?
.code

; return: eax => AVX?, edx => FMA?
supports_AVX_FMA proc uses ebx

.repeat
mov eax, 1
cpuid
push ecx
and ecx,18000000h
cmp ecx,18000000h
.break .if !ZERO?
xor ecx,ecx
xgetbv
and eax,6h
cmp eax,6h
.break .if !ZERO?
pop ecx
xor edx,edx
mov eax,1
test ecx,1000h
cmovnz edx,eax
ret
.until 1
pop ecx
xor eax,eax
ret

supports_AVX_FMA endp

main proc
LOCAL bAVX:BOOL

fnx bAVX = supports_AVX_FMA

.if bAVX
print "AVX supported:",13,10

vmovups ymm0,YMMWORD ptr vpsVector
vaddps ymm0,ymm0,ymm0
vmovups vpsResult0,ymm0
vsqrtps ymm0,ymm0
vmovups vpsResult1,ymm0

fnc crt_printf,   "ymm0            = { "
print_ps8 vpsVector
fnc crt_printf,"}\nymm0+ymm0       = { "
print_ps8 vpsResult0
fnc crt_printf,"}\n"
fnc crt_printf,   "sqrt(ymm0+ymm0) = { "
print_ps8 vpsResult1
fnc crt_printf,"}\n"
.else
print "AVX not supported",13,10
.endif

inkey
exit

main endp
ELSE
.err <MASM version 10+ required>
externdef main:proc
ENDIF
end main


Just an interesting side note: with AVX2 a new type of memory addressing (VSIB) has been introduced that allows ymm registers to be used as the index register (scaled by 1, 2, 4 or 8) in SIB addressing.
An example using MASM v11+:
Code:
vgatherdps ymm0,[esi+ymm1*4],ymm2   ; ymm1 holds 8 DWORD indices which are used to load up to 8 REAL4 values.
These V[P]GATHERxxx instructions are really powerful, because they allow vectorized addressing, and individual accesses can be masked through the third operand.
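
To make the semantics concrete, here is a scalar C model of what an 8-lane gather of REAL4 values with DWORD indices does. It is only a sketch of the behaviour, not the instruction itself; the function name `gather_ps8_model` and the array-based signature are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an 8-lane, DWORD-indexed gather of floats.
   dst   ~ destination register (ymm0 in the example above)
   base  ~ base address         (esi)
   index ~ index register       (ymm1, scaled by 4 = sizeof(float))
   mask  ~ mask register        (ymm2) */
void gather_ps8_model(float dst[8], const float *base,
                      const int32_t index[8], uint32_t mask[8])
{
    size_t lane;

    assert(dst != NULL && base != NULL && index != NULL && mask != NULL);
    for (lane = 0; lane < 8; lane++) {
        if (mask[lane] & 0x80000000u)       /* lane enabled: sign bit set */
            dst[lane] = base[index[lane]];  /* C indexing applies the *4  */
        /* a disabled lane keeps its previous dst value */
        mask[lane] = 0;                     /* the mask is cleared afterwards */
    }
}
```

The model shows why the mask is useful: lanes whose index would point outside the table can simply be switched off instead of being loaded.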

BTW: jWasm's current AVX implementation seems to be buggy...
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 02, 2014, 01:01:23 AM
Hi qWord,

here is the result of your application under Windows 7-64:
Code:
AVX supported:
ymm0            = {   1,   2,   3,   4,   5,   6,   7,   8}
ymm0+ymm0       = {   2,   4,   6,   8,  10,  12,  14,  16}
sqrt(ymm0+ymm0) = { 1.4,   2, 2.4, 2.8, 3.2, 3.5, 3.7,   4}
Press any key to continue ...

Could you post the code via attachment, please?

BTW: jWasm's current AVX implementation seems to be buggy...
So, what is your recommendation instead?

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on July 02, 2014, 04:24:46 AM
Could you post the code via attachment, please?
should I really support your laziness? :biggrin:

So, what is your recommendation instead?
MASM 10+ of course.

Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on July 02, 2014, 04:31:48 AM
should I really support your laziness? :biggrin:

yes   :biggrin:
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 02, 2014, 07:58:55 PM
Hi qWord,

should I really support your laziness? :biggrin:

Special thanks for that.  :t

So, what is your recommendation instead?
MASM 10+ of course.

Okay. Is that part of the current MASM32 package?

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: qWord on July 03, 2014, 12:06:47 AM
So, what is your recommendation instead?
MASM 10+ of course.

Okay. Is that part of the current MASM32 package?
No, as usual it comes with Visual Studio (2010 or later).
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 03, 2014, 01:43:42 AM
No, as usual it comes with Visual Studio (2010 or later).

Thank you for the information. I'll download and install it as soon as possible.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 03, 2014, 04:49:59 AM
Hi qWord,

is that part of the Express Edition, too? If not, is there another legal download possible?

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: dedndave on July 03, 2014, 08:35:16 AM
sent you a PM, Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 04, 2014, 04:11:04 AM
sent you a PM, Gunther

Received.  :t

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 15, 2014, 09:35:05 AM
I've attached the archive fsum.zip to this post. It contains a test program for AVX instructions under 32-bit Windows. It sums up an array of float values and measures the calculation time. This is the program's output under Windows 7-32, SP 1 running as a virtual machine under VirtualBox:
Quote

Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 13.55 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 6.89 Seconds
Performance Boost = 197%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.15 Seconds
Performance Boost = 1176%

Your current CPU doesn't support the AVX instruction set.
The application terminates now.

No AVX support is available with that configuration. But that's tricky. CPU-Z indicates AVX:
(http://ibunker.us/photos/20140715140538231172934.png)
In compatibility mode under Windows 7-64, SP 1, the same application gives this output:
Quote

Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 13.04 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 6.52 Seconds
Performance Boost = 200%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.11 Seconds
Performance Boost = 1173%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 0.77 Seconds
Performance Boost = 1701%

The frame is written in C and compiled with gcc. With a few minor changes it should compile with MSVC, too. The dirty work is done by the assembly language procedures. Those are assembled with jWasm, but ml should work, too. I've provided the full source.
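
For readers without the archive, the difference between the "simple" and the "4 accumulators" variants can be sketched in C as follows. This is not the code from fsum.zip, only a minimal illustration of the technique; the names `sum_simple` and `sum_4acc` are made up:

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every addition depends on the previous one. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four additions per iteration do not
   depend on each other, so the CPU can overlap them. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    assert(a != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

The XMM and YMM versions follow the same idea with 4 or 8 floats per register. Note that the two functions add the elements in a different order, so for general float data the results may differ in the last digits.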

Some test results by other members under different environments would be fine.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: sinsi on July 15, 2014, 10:14:22 AM
Windows 8.1 32-bit VMware guest on Windows 8.1 64-bit host

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 13.05 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 6.53 Seconds
Performance Boost = 200%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.11 Seconds
Performance Boost = 1175%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 0.78 Seconds
Performance Boost = 1670%

Title: Re: AVX for 32-bit Windows applications
Post by: hutch-- on July 15, 2014, 11:19:52 AM
I only got this far with my old Core2 quad.

Code:
Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 16.44 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 9.61 Seconds
Performance Boost = 171%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.42 Seconds
Performance Boost = 1156%

Your current CPU doesn't support the AVX instruction set.
The application terminates now.
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on July 15, 2014, 11:29:39 AM
Thank you Hutch. It's clear that the Core2 doesn't support AVX. But the other timings are interesting.

Sinsi, you're using VMware and that allows AVX. Interesting point. Thank you, too.

Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on August 23, 2014, 08:31:35 PM
Yesterday I installed VMware Player with Windows 7-64 as host and Windows 7-32 as guest. Here are the results for fsum.exe, which is attached to post #43 (http://masm32.com/board/index.php?topic=3227.msg35958#msg35958):

Quote

Calculating the sum of a float array in different ways.
That'll take a little while. Please be patient ...

Simple C implementation:
------------------------
sum1              = 8390656.00
Elapsed Time      = 12.96 Seconds

C implementation with 4 accumulators:
-------------------------------------
sum2              = 8390656.00
Elapsed Time      = 6.46 Seconds
Performance Boost = 201%

Assembly Language with 4 XMM accumulators:
------------------------------------------
sum3              = 8390656.00
Elapsed Time      = 1.09 Seconds
Performance Boost = 1187%

Assembly Language with 4 YMM accumulators:
------------------------------------------
sum4              = 8390656.00
Elapsed Time      = 0.75 Seconds
Performance Boost = 1733%


Gunther
Title: Re: AVX for 32-bit Windows applications
Post by: habran on November 02, 2014, 02:41:58 PM
I have found an error for vsqrtps and vsqrtpd in JWasm

file instravx.h

line 30
was:
Code:
avxins (SQRTPD,   vsqrtpd,       P_AVX, VX_L ) /* L, s */
avxins (SQRTPS,   vsqrtps,       P_AVX, VX_L ) /* L, s */
change to:
Code:
avxins (SQRTPD,   vsqrtpd,       P_AVX, VX_L|VX_NND ) /* L, ns */
avxins (SQRTPS,   vsqrtps,       P_AVX, VX_L|VX_NND ) /* L, ns */
Title: Re: AVX for 32-bit Windows applications
Post by: Gunther on November 03, 2014, 02:04:09 AM
Good catch, habran.  :t

Gunther

Title: Re: AVX for 32-bit Windows applications
Post by: habran on November 03, 2014, 05:40:05 AM
Thanks mate  :biggrin: