The MASM Forum

General => The Campus => Topic started by: markallyn on November 02, 2017, 03:00:53 AM

Title: problem with avx instruction
Post by: markallyn on November 02, 2017, 03:00:53 AM
Hello,

I've been trying to play with AVX instructions and instantly ran into a problem I don't understand at all. The following code simply attempts to load 4 REAL8 variables into a ymm register. For whatever reason, the vmovapd instruction assembles and links but doesn't execute. The program aborts when it hits that line.

Quote

include \masm32\include64\masm64rt.inc

printf   PROTO   :QWORD, :VARARG

.DATA


frmt1   BYTE "%f",13,10,0
frmt2   BYTE "%d",13,10,0
ALIGN 16
v1   REAL8   1.1, 2.2, 3.3, 4.4
v2   REAL8   5.5, 6.6, 7.7, 8.8

.CONST
sz   EQU   SIZEOF v1
tp   EQU   TYPE   v1
ln   EQU   LENGTHOF v1

.CODE
main   PROC

lea   rax, v1
lea     rdx, v2
invoke   printf, ADDR frmt2, sz
invoke   printf, ADDR frmt2, tp
invoke   printf, ADDR frmt2, ln
vmovapd   ymm0, v1
vmovapd   ymm1, v2

ret
main   ENDP
END

My machine is Win7 Pro and has avx technology support. 

Thanks,
Mark Allyn
Title: Re: problem with avx instruction
Post by: aw27 on November 02, 2017, 03:18:08 AM
I would suggest align 32 to prevent the exception.  :biggrin:


Since the default .data segment is PARA (16-byte) aligned, it does not support align 32.
You need another section:

data32 segment align(32) ".data"
anydata dword ?
align 32 ; now works
v1   REAL8   1.1, 2.2, 3.3, 4.4
v2   REAL8   5.5, 6.6, 7.7, 8.8
data32 ends
Title: Re: problem with avx instruction
Post by: markallyn on November 02, 2017, 03:41:03 AM
Hi aw27,

Thanks for getting back to me.  I should have added in my previous post that running x64dbg on the exe file shows that an "exception access error" occurs at the relevant instruction.

I guess what you're telling me is that because I'm using the 256-bit registers, the alignment has to be on 32-byte boundaries, not 16?

Thanks much.  I haven't seen any documentation on this, but it makes sense.

Mark
Title: Re: problem with avx instruction
Post by: markallyn on November 02, 2017, 03:51:45 AM
aw27,

Yup, that fixed the problem. Can you point me to any documentation that covers this issue? I work off a book by Daniel Kusswurm which is pretty comprehensive on 32- and 64-bit x86 code, but I haven't seen anything on this point, and he covers AVX pretty thoroughly. I'll look again, however.

Thanks again,
Mark
Title: Re: problem with avx instruction
Post by: aw27 on November 02, 2017, 03:58:57 AM
I am not going to search Google for you, but the rule is that anything that is 128/256/512 bits wide needs to be aligned to a 128/256/512-bit (16/32/64-byte) boundary to be loaded from or stored to memory, unless the instruction says otherwise.
Is it clear?
Title: Re: problem with avx instruction
Post by: markallyn on November 02, 2017, 04:15:34 AM
aw27.

Clear.

Mark
Title: Re: problem with avx instruction
Post by: LiaoMi on November 02, 2017, 11:09:22 PM
Hi markallyn,

maybe it will be useful for you

https://ibb.co/c73Dfw
https://ibb.co/i3moDG
https://ibb.co/cUeVSb
https://ibb.co/cNiqSb

:t
Title: Re: problem with avx instruction
Post by: jj2007 on November 04, 2017, 02:43:47 PM
Does anybody know why vpshufd ymm0, ymm1, 0 crashes?

include \masm32\MasmBasic\MasmBasic.inc

MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
  Init
  mov esi, offset MyArray
  vmovdqa ymm0, YMMWORD ptr [esi]
  vmovdqa ymm1, YMMWORD ptr [esi+32] ; OK
  deb 4, "Lower XMMWORDs", x:xmm0, x:xmm1
  vpshufd ymm0, ymm1, 0
  PrintLine "it crashed, you won't see this line"
EndOfCode


Output:
Lower XMMWORDs
x:xmm0          44444444 33333333 22222222 11111111
x:xmm1          CCCCCCCC BBBBBBBB AAAAAAAA 99999999
Title: Re: problem with avx instruction
Post by: Siekmanski on November 04, 2017, 04:05:12 PM
Does your computer handle AVX-512 instructions?

Else you could use the AVX-256 Permute Operations instructions: vperm2f128 or vperm2i128

vmovaps    ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps    ymmword ptr[edi],ymm0           ; Save 8 values in reversed order.
Title: Re: problem with avx instruction
Post by: aw27 on November 04, 2017, 05:11:02 PM
It appears that AVX2 is enough.



includelib \masm32\lib64\msvcrt.lib
printf proto :ptr, :vararg
includelib \masm32\lib64\kernel32.lib
ExitProcess proto :dword

data32 segment align(32) ".data" alias(".data")
fmt db "I am here",10,0
align 32
MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
data32 ends

.code

main proc
sub rsp,28h
vmovdqa ymm0, YMMWORD PTR MyArray
vmovdqa ymm1, YMMWORD PTR [MyArray+32]
;vpshufd ymm0, YMMWORD PTR [MyArray+32], 0h
vpshufd ymm0, ymm1, 0
lea rcx, fmt
call printf
mov rcx,0
call ExitProcess

main endp

end
Title: Re: problem with avx instruction
Post by: jj2007 on November 04, 2017, 07:56:55 PM
Quote from: Siekmanski on November 04, 2017, 04:05:12 PM
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.

They both work, thanks :t

@José: You seem to use some specific commandline options and/or includes:
vmovaps    ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps    ymmword ptr[edi],ymm0           ; Save 8 values in reversed order.
Title: Re: problem with avx instruction
Post by: aw27 on November 04, 2017, 08:17:49 PM
JJ, that is not my code, it is from Siekmanski.  :icon_eek:
Title: Re: problem with avx instruction
Post by: jj2007 on November 04, 2017, 10:16:30 PM
Quote from: aw27 on November 04, 2017, 08:17:49 PM
JJ, that is not my code, it is from Siekmanski.  :icon_eek:

Oops, you are right. But your code threw the error messages.

And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
Title: Re: problem with avx instruction
Post by: aw27 on November 04, 2017, 10:27:16 PM
Quote from: jj2007 on November 04, 2017, 10:16:30 PM
And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
Yeap, my condolences. But you can try the Intel® Software Development Emulator (https://software.intel.com/en-us/articles/intel-software-development-emulator/) and tell us how it fares. I never did, just curious.

Title: Re: problem with avx instruction
Post by: jj2007 on November 04, 2017, 11:02:33 PM
Quote from: aw27 on November 04, 2017, 10:27:16 PM
you can try the Intel® Software Development Emulator (https://software.intel.com/en-us/articles/intel-software-development-emulator/) and tell us how it fares. I never did, just curious.

You should try it, it's only a tiny 20MB download. And it promises a thrilling trial-and-error experience:
C:\IntelEmulator\sde-external-8.12.0-2017-10-23-win>sde -- C:\Masm32\MasmBasic\Misc\WinSock\vmovdqa.exe
A: Source\pin\vm_w\syscall_dispatcher_windows.cpp: LEVEL_VM::WIN_SYSCALL_DISPATCHER::InterruptSyscallByException: 949: assertion failed: retAddr == m_gateFallThroughStub

NO STACK TRACE AVAILABLE
Detach Service Count: 24796
Pin: pin-3.5-97483-e4b3cd5
Copyright (c) 2003-2017, Intel Corporation. All rights reserved.
Title: Re: problem with avx instruction
Post by: aw27 on November 04, 2017, 11:19:35 PM
I just tried it. I was not expecting it to be so easy, and it simply worked on my Sandy Bridge, which does not support AVX2.

sde -- myTest.exe
Title: Re: problem with avx instruction
Post by: Siekmanski on November 04, 2017, 11:57:44 PM
Hi Jochen,

vpshufd = 512 bit.
vshufpd = 256 bit.
Title: Re: problem with avx instruction
Post by: aw27 on November 05, 2017, 12:19:35 AM
Quote from: Siekmanski on November 04, 2017, 11:57:44 PM
Hi Jochen,

vpshufd = 512 bit.
vshufpd = 256 bit.

vpshufd with ymm registers is an AVX2 instruction, not an AVX-512 instruction. AVX-512 added BW, VL, F extensions.
Title: Re: problem with avx instruction
Post by: jj2007 on November 05, 2017, 12:53:19 AM
I got SDE running, but it seems a bit buggy. Olly stops in the middle of nowhere, etc. However, the simple demo manages to go beyond the "illegal instruction" line when e.g. -skl is specified.

But -slm chokes with SDE-ERROR: Executed instruction not valid for specified chip (SILVERMONT): 0x401201: vmovdqa
Title: Re: problem with avx instruction
Post by: Siekmanski on November 05, 2017, 12:59:10 AM
aw27, you are right, vpshufd is avx2 and not a 512 bit instruction.  :icon_redface: I'm awake now.
Title: Re: problem with avx instruction
Post by: aw27 on November 05, 2017, 01:44:47 AM
Quote from: jj2007 on November 05, 2017, 12:53:19 AM
I got SDE running, but it seems a bit buggy. Olly stops in the middle of nowhere, etc. However, the simple demo manages to go beyond the "illegal instruction" line when e.g. -skl is specified.

But -slm chokes with SDE-ERROR: Executed instruction not valid for specified chip (SILVERMONT): 0x401201: vmovdqa

I have not explored much, but it is good to know we have something when we have nothing else.
However, I believe it is not compatible with debuggers, because emulators set the processor into single step mode.

"The Silvermont supports the SSE4.2 instruction set, but not AVX and AVX2" - Agner Fog
Title: Re: problem with avx instruction
Post by: aw27 on November 05, 2017, 01:46:40 AM
Quote from: Siekmanski on November 05, 2017, 12:59:10 AM
I'm awake now.

Good morning.  :t
Title: Re: problem with avx instruction
Post by: jj2007 on November 05, 2017, 02:32:49 AM
Quote from: aw27 on November 05, 2017, 01:44:47 AM
I believe it is not compatible with debuggers, because emulators set the processor into single step mode.

Once I got it running with Olly, but after a small change somewhere it stopped working. I see your logic but not sure what it means in practice.
Title: Re: problem with avx instruction
Post by: aw27 on November 05, 2017, 03:28:30 AM
In practice it may mean that debugging emulated code depends on facilities provided by the emulator.
I don't know if you have already read the Help or the Manual. :icon_cool: