Hello,
I've been trying to play with AVX instructions and immediately ran into a problem I don't understand at all. The following code simply attempts to load 4 REAL8 variables into a ymm register. For whatever reason, the vmovapd instruction assembles and links but doesn't execute. The program aborts when it hits that line.
Quote
include \masm32\include64\masm64rt.inc
printf PROTO :QWORD, :VARARG
.DATA
frmt1 BYTE "%f",13,10,0
frmt2 BYTE "%d",13,10,0
ALIGN 16
v1 REAL8 1.1, 2.2, 3.3, 4.4
v2 REAL8 5.5, 6.6, 7.7, 8.8
.CONST
sz EQU SIZEOF v1
tp EQU TYPE v1
ln EQU LENGTHOF v1
.CODE
main PROC
lea rax, v1
lea rdx, v2
invoke printf, ADDR frmt2, sz
invoke printf, ADDR frmt2, tp
invoke printf, ADDR frmt2, ln
vmovapd ymm0, v1
vmovapd ymm1, v2
ret
main ENDP
END
My machine runs Win7 Pro and the CPU supports AVX.
Thanks,
Mark Allyn
I would suggest align 32 to prevent the exception. :biggrin:
Since the default .data segment is PARA (16-byte) aligned, it does not support align 32.
You need another segment:
data32 segment align(32) ".data"
anydata dword ?
align 32 ; now works
v1 REAL8 1.1, 2.2, 3.3, 4.4
v2 REAL8 5.5, 6.6, 7.7, 8.8
data32 ends
Hi aw27,
Thanks for getting back to me. I should have added in my previous post that running x64dbg on the exe file shows that an "exception access error" occurs at the relevant instruction.
I guess what you're telling me is that because I'm using the 256-bit registers, the alignment has to be on 32-byte boundaries, not 16-byte?
Thanks much. I haven't seen any documentation on this, but it makes sense.
Mark
aw27,
Yup, that fixed the problem. Can you point me to any documentation that covers this issue? I work off a book by Daniel Kusswurm which is pretty comprehensive on 32- and 64-bit x86 code, but I haven't seen anything on this point, and he covers AVX pretty thoroughly. I'll look again, however.
Thanks again,
Mark
I am not going to search Google for you, but the rule is that anything that is 128/256/512 bits wide needs to be aligned to a 128/256/512-bit (16/32/64-byte) boundary to be loaded from or stored to memory, unless the instruction says there is no need (vmovupd, for example, accepts unaligned addresses).
Is it clear?
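To make the alignment rule concrete, here is a minimal C sketch (not MASM, and not from the thread). The helper names is_aligned and padding_needed are made up for illustration, and alignas(32) stands in for what align 32 does in the custom segment. It only checks addresses; it does not execute any AVX instruction.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when p lies on a `boundary`-byte boundary.
   `boundary` must be a power of two (16 for xmm, 32 for ymm, 64 for zmm). */
int is_aligned(const void *p, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((uintptr_t)p & (boundary - 1)) == 0;
}

/* Hypothetical helper: bytes to skip from `addr` to reach the next
   `boundary` multiple (0 if addr is already aligned). */
uintptr_t padding_needed(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (boundary - (addr & (boundary - 1))) & (boundary - 1);
}

/* Four doubles = 32 bytes = one ymm register.
   alignas(32) plays the role of "align 32" in the data32 segment. */
alignas(32) double v1[4] = {1.1, 2.2, 3.3, 4.4};
```

An address that is only 16-byte aligned, such as 48, passes the 16-byte check but fails the 32-byte one, which is the situation the original ALIGN 16 could produce for vmovapd with a ymm operand.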
aw27.
Clear.
Mark
Hi markallyn,
maybe it will be useful for you
https://ibb.co/c73Dfw
https://ibb.co/i3moDG
https://ibb.co/cUeVSb
https://ibb.co/cNiqSb
:t
Does anybody know why vpshufd ymm0, ymm1, 0 crashes?
include \masm32\MasmBasic\MasmBasic.inc
MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
Init
mov esi, offset MyArray
vmovdqa ymm0, YMMWORD ptr [esi]
vmovdqa ymm1, YMMWORD ptr [esi+32] ; OK
deb 4, "Lower XMMWORDs", x:xmm0, x:xmm1
vpshufd ymm0, ymm1, 0
PrintLine "it crashed, you won't see this line"
EndOfCode
Output:
Lower XMMWORDs
x:xmm0 44444444 33333333 22222222 11111111
x:xmm1 CCCCCCCC BBBBBBBB AAAAAAAA 99999999
Does your computer handle AVX-512 instructions?
Otherwise you could use the 256-bit permute instructions: vperm2f128 (AVX) or vperm2i128 (AVX2)
vmovaps ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1 ; Swap upper and lower 128-bit lanes.
vshufps ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps ymmword ptr[edi],ymm0 ; Save 8 values in reversed order.
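For readers who want to see what the two instructions above do to the data, here is a scalar C model (a sketch, not the poster's code). It assumes that the Shuffle(0,1,2,3) macro expands to the immediate 1Bh, which matches the "reverse" comment but is not confirmed in the thread. The function names are made up for illustration.

```c
#include <string.h>

/* Scalar model of "vperm2f128 dst, src, src, 1":
   the two 128-bit lanes (4 floats each) trade places. */
void swap_lanes(float dst[8], const float src[8])
{
    float tmp[8];
    memcpy(tmp, src + 4, 4 * sizeof(float)); /* high lane -> low lane */
    memcpy(tmp + 4, src, 4 * sizeof(float)); /* low lane -> high lane */
    memcpy(dst, tmp, sizeof tmp);
}

/* Scalar model of "vshufps dst, src, src, imm8" with both sources equal:
   inside each lane, element i is picked by bits [2i+1:2i] of imm8.
   The shuffle never crosses the 128-bit lane boundary. */
void shuffle_in_lanes(float dst[8], const float src[8], unsigned imm8)
{
    float tmp[8];
    for (int lane = 0; lane < 8; lane += 4)
        for (int i = 0; i < 4; ++i)
            tmp[lane + i] = src[lane + ((imm8 >> (2 * i)) & 3u)];
    memcpy(dst, tmp, sizeof tmp);
}

/* Lane swap followed by an in-lane reversal (imm8 = 1Bh, assumed)
   reverses all 8 values. */
void reverse8(float dst[8], const float src[8])
{
    float t[8];
    swap_lanes(t, src);
    shuffle_in_lanes(dst, t, 0x1Bu);
}
```

The in-lane shuffle alone cannot reverse a whole ymm register because it works on each 128-bit lane separately, which is why the lane swap comes first.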
It appears that AVX2 is enough.
includelib \masm32\lib64\msvcrt.lib
printf proto :ptr, :vararg
includelib \masm32\lib64\kernel32.lib
ExitProcess proto :dword
data32 segment align(32) ".data" alias(".data")
fmt db "I am here",10,0
align 32
MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
data32 ends
.code
main proc
sub rsp,28h
vmovdqa ymm0, YMMWORD PTR MyArray
vmovdqa ymm1, YMMWORD PTR [MyArray+32]
;vpshufd ymm0, YMMWORD PTR [MyArray+32], 0h
vpshufd ymm0, ymm1, 0
lea rcx, fmt
call printf
mov rcx,0
call ExitProcess
main endp
end
Quote from: Siekmanski on November 04, 2017, 04:05:12 PM
vperm2f128 ymm0,ymm0,ymm0,1 ; Swap upper and lower 128-bit lanes.
vshufps ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
They both work, thanks :t
@José: You seem to use some specific commandline options and/or includes:
vmovaps ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1 ; Swap upper and lower 128-bit lanes.
vshufps ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps ymmword ptr[edi],ymm0 ; Save 8 values in reversed order.
JJ, that is not my code, it is from Siekmanski. :icon_eek:
Quote from: aw27 on November 04, 2017, 08:17:49 PM
JJ, that is not my code, it is from Siekmanski. :icon_eek:
Oops, you are right. But your code threw the error messages.
And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
Quote from: jj2007 on November 04, 2017, 10:16:30 PM
And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
Yeap, my condolences. But you can try the Intel® Software Development Emulator (https://software.intel.com/en-us/articles/intel-software-development-emulator/) and tell us how it fares. I never did, just curious.
Quote from: aw27 on November 04, 2017, 10:27:16 PM
you can try the Intel® Software Development Emulator (https://software.intel.com/en-us/articles/intel-software-development-emulator/) and tell us how it fares. I never did, just curious.
You should try it, it's only a tiny 20MB download. And it promises a thrilling trial-and-error experience:
C:\IntelEmulator\sde-external-8.12.0-2017-10-23-win>sde -- C:\Masm32\MasmBasic\Misc\WinSock\vmovdqa.exe
A: Source\pin\vm_w\syscall_dispatcher_windows.cpp: LEVEL_VM::WIN_SYSCALL_DISPATCHER::InterruptSyscallByException: 949: assertion failed: retAddr == m_gateFallThroughStub
NO STACK TRACE AVAILABLE
Detach Service Count: 24796
Pin: pin-3.5-97483-e4b3cd5
Copyright (c) 2003-2017, Intel Corporation. All rights reserved.
I just tried it. I was not expecting it to be so easy, and it simply worked on my Sandy Bridge, which does not support AVX2.
sde -- myTest.exe
Hi Jochen,
vpshufd = 512 bit.
vshufpd = 256 bit.
Quote from: Siekmanski on November 04, 2017, 11:57:44 PM
Hi Jochen,
vpshufd = 512 bit.
vshufpd = 256 bit.
vpshufd with ymm registers is an AVX2 instruction, not an AVX-512 instruction. AVX-512 added the BW, VL, and F extensions.
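Whether a given CPU can run an AVX2 instruction such as the ymm form of vpshufd can be read from CPUID before executing it. Below is a small C sketch of the bit tests only. The function names are made up, and the register values are passed in as parameters on the assumption that the caller has already executed CPUID leaf 1 (ECX) and leaf 7, subleaf 0 (EBX). A complete check would also read XCR0 with XGETBV to confirm that the OS saves the ymm state, which is not shown here.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* AVX supported by the CPU */

/* CPUID leaf 7, subleaf 0, EBX */
#define CPUID7_EBX_AVX2    (1u << 5)  /* AVX2: 256-bit integer ops, e.g. vpshufd ymm */
#define CPUID7_EBX_AVX512F (1u << 16) /* AVX-512 Foundation */

/* Hypothetical helper: 1 if the leaf-1 ECX value reports AVX and OSXSAVE. */
int has_avx(uint32_t leaf1_ecx)
{
    uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    return (leaf1_ecx & need) == need;
}

/* Hypothetical helper: AVX2 requires AVX plus the leaf-7 EBX bit 5. */
int has_avx2(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    return has_avx(leaf1_ecx) && (leaf7_ebx & CPUID7_EBX_AVX2) != 0;
}

/* Hypothetical helper: AVX-512F is reported in leaf-7 EBX bit 16. */
int has_avx512f(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    return has_avx(leaf1_ecx) && (leaf7_ebx & CPUID7_EBX_AVX512F) != 0;
}
```

A CPU that reports AVX but not AVX2 will run vmovdqa ymm0, [mem] and then raise an illegal-instruction exception on vpshufd ymm0, ymm1, 0, which matches the behaviour described in this thread.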
I got SDE running, but it seems a bit buggy. Olly stops in the middle of nowhere, etc. However, the simple demo manages to go beyond the "illegal instruction" line when e.g. -skl is specified.
But -slm chokes with SDE-ERROR: Executed instruction not valid for specified chip (SILVERMONT): 0x401201: vmovdqa
aw27, you are right, vpshufd is an AVX2 instruction and not a 512-bit one. :icon_redface: I'm awake now.
Quote from: jj2007 on November 05, 2017, 12:53:19 AM
I got SDE running, but it seems a bit buggy. Olly stops in the middle of nowhere, etc. However, the simple demo manages to go beyond the "illegal instruction" line when e.g. -skl is specified.
But -slm chokes with SDE-ERROR: Executed instruction not valid for specified chip (SILVERMONT): 0x401201: vmovdqa
I have not explored it much, but it is good to know we have something when we have nothing else.
However, I believe it is not compatible with debuggers, because emulators set the processor into single step mode.
"The Silvermont supports the SSE4.2 instruction set, but not AVX and AVX2" - Agner Fog
Quote from: aw27 on November 05, 2017, 01:44:47 AM
I believe it is not compatible with debuggers, because emulators set the processor into single step mode.
Once I got it running with Olly, but after a small change somewhere it stopped working. I see your logic, but I'm not sure what it means in practice.
In practice it may mean that debugging emulated code depends on facilities provided by the emulator.
I don't know whether you have already read the Help or the Manual. :icon_cool: