problem with avx instruction

Started by markallyn, November 02, 2017, 03:00:53 AM


markallyn

Hello,

I've been trying to play with AVX instructions and immediately ran into a problem I don't understand at all. The following code simply attempts to load 4 REAL8 values into a ymm register. For whatever reason, the vmovapd instruction assembles and links but doesn't execute: the program aborts when it hits that line.

Quote

include \masm32\include64\masm64rt.inc

printf   PROTO   :QWORD, :VARARG

.DATA


frmt1   BYTE "%f",13,10,0
frmt2   BYTE "%d",13,10,0
ALIGN 16
v1   REAL8   1.1, 2.2, 3.3, 4.4
v2   REAL8   5.5, 6.6, 7.7, 8.8

.CONST
sz   EQU   SIZEOF v1
tp   EQU   TYPE   v1
ln   EQU   LENGTHOF v1

.CODE
main   PROC

lea   rax, v1
lea     rdx, v2
invoke   printf, ADDR frmt2, sz
invoke   printf, ADDR frmt2, tp
invoke   printf, ADDR frmt2, ln
vmovapd   ymm0, v1
vmovapd   ymm1, v2

ret
main   ENDP
END

My machine is Win7 Pro and has avx technology support. 

Thanks,
Mark Allyn

aw27

I would suggest align 32 to prevent the exception.  :biggrin:


Since the default .data segment is PARA (16-byte) aligned, it does not support align 32.
You need another section:

data32 segment align(32) ".data"
anydata dword ?
align 32 ; now works
v1   REAL8   1.1, 2.2, 3.3, 4.4
v2   REAL8   5.5, 6.6, 7.7, 8.8
data32 ends
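
One way to confirm at run time that the data really landed on a 32-byte boundary is to test the low five bits of the address. The following is a minimal sketch for 64-bit MASM (ml64); IsAligned32 is a hypothetical helper name that does not appear in the thread, and the address is assumed to arrive in rcx as in the Win64 calling convention.

```asm
; Sketch only: IsAligned32 is a hypothetical helper, not part of the MASM32 SDK.
; In:  rcx = address to check
; Out: eax = 1 if the address is a multiple of 32, otherwise 0
IsAligned32 PROC
    xor   eax, eax      ; clear the whole return register first
    test  cl, 31        ; low five bits are all zero only on a 32-byte boundary
    setz  al            ; al = 1 when ZF is set, i.e. the address is aligned
    ret
IsAligned32 ENDP
```

A caller would load the address with lea rcx, v1 and then call IsAligned32; a result of 0 means an aligned move such as vmovapd on that address would fault.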

markallyn

Hi aw27,

Thanks for getting back to me.  I should have added in my previous post that running x64dbg on the exe file shows that an "exception access error" occurs at the relevant instruction.

I guess what you're telling me is that because I'm using the 256-bit registers, the alignment has to be on 32-byte boundaries, not 16-byte ones?

Thanks much.  I haven't seen any documentation on this, but it makes sense.

Mark

markallyn

aw27,

Yup, that fixed the problem. Can you point me to any documentation that covers this issue? I work from a book by Daniel Kusswurm which is pretty comprehensive on 32- and 64-bit x86 code, but I haven't seen anything on this point, and he covers AVX pretty thoroughly. I'll look again, however.

Thanks again,
Mark

aw27

I am not going to search Google for you, but the rule is that anything that is 128/256/512 bits wide needs to be aligned to a 128/256/512-bit (16/32/64-byte) boundary to be loaded from or stored to memory, unless the instruction says there is no need.
Is it clear?
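
As an illustration of the "unless the instruction says there is no need" case: vmovupd is the unaligned counterpart of vmovapd and accepts any address. A minimal sketch, assuming v1 and v2 are the REAL8 arrays from the first post, left in the default 16-byte aligned .data section:

```asm
; Sketch only: v1 and v2 are the REAL8 arrays from the first post.
; vmovupd has no alignment requirement, so these loads do not fault
; even when the data is not on a 32-byte boundary.
vmovupd ymm0, YMMWORD PTR v1
vmovupd ymm1, YMMWORD PTR v2
```

Unaligned loads can be slower on some CPUs when the data straddles a cache line, so aligning the data and keeping vmovapd remains a reasonable choice.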

markallyn



jj2007

Does anybody know why vpshufd ymm0, ymm1, 0 crashes?

include \masm32\MasmBasic\MasmBasic.inc

MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
  Init
  mov esi, offset MyArray
  vmovdqa ymm0, YMMWORD ptr [esi]
  vmovdqa ymm1, YMMWORD ptr [esi+32] ; OK
  deb 4, "Lower XMMWORDs", x:xmm0, x:xmm1
  vpshufd ymm0, ymm1, 0
  PrintLine "it crashed, you won't see this line"
EndOfCode


Output:
Lower XMMWORDs
x:xmm0          44444444 33333333 22222222 11111111
x:xmm1          CCCCCCCC BBBBBBBB AAAAAAAA 99999999

Siekmanski

Does your computer handle AVX-512 instructions?

Otherwise you could use the 256-bit permute instructions: vperm2f128 (AVX) or vperm2i128 (AVX2)

vmovaps    ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps    ymmword ptr[edi],ymm0           ; Save 8 values in reversed order.

aw27

It appears that AVX2 is enough.



includelib \masm32\lib64\msvcrt.lib
printf proto :ptr, :vararg
includelib \masm32\lib64\kernel32.lib
ExitProcess proto :dword

data32 segment align(32) ".data" alias(".data")
fmt db "I am here",10,0
align 32
MyArray dd 11111111h, 22222222h, 33333333h, 44444444h
dd 55555555h, 66666666h, 77777777h, 88888888h
dd 99999999h, 0aaaaaaaah, 0bbbbbbbbh, 0cccccccch
dd 0ddddddddh, 0eeeeeeeeh, 0ffffffffh, 12345678h
data32 ends

.code

main proc
sub rsp,28h
vmovdqa ymm0, YMMWORD PTR MyArray
vmovdqa ymm1, YMMWORD PTR [MyArray+32]
;vpshufd ymm0, YMMWORD PTR [MyArray+32], 0h
vpshufd ymm0, ymm1, 0
lea rcx, fmt
call printf
mov rcx,0
call ExitProcess

main endp

end

jj2007

Quote from: Siekmanski on November 04, 2017, 04:05:12 PM
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.

They both work, thanks :t

@José: You seem to use some specific commandline options and/or includes:
vmovaps    ymm0,ymmword ptr[esi]
vperm2f128 ymm0,ymm0,ymm0,1                ; Swap upper and lower 128-bit lanes.
vshufps    ymm0,ymm0,ymm0,Shuffle(0,1,2,3) ; Reverse values in both 128-bit lanes.
vmovaps    ymmword ptr[edi],ymm0           ; Save 8 values in reversed order.

aw27

JJ, that is not my code, it is from Siekmanski.  :icon_eek:

jj2007

Quote from: aw27 on November 04, 2017, 08:17:49 PM
JJ, that is not my code, it is from Siekmanski.  :icon_eek:

Oops, you are right. But your code threw the error messages.

And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
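
A crash like this can be avoided by asking the CPU before using 256-bit integer instructions: AVX2 support is reported in CPUID leaf 7, subleaf 0, EBX bit 5. The following is a minimal sketch for 64-bit MASM (ml64); HasAvx2 is a hypothetical name, and the routine checks only the CPU feature flag, not whether the operating system has enabled YMM state (that needs an additional OSXSAVE/XGETBV check).

```asm
; Sketch only: HasAvx2 is a hypothetical helper, not part of any SDK.
; Out: eax = 1 if CPUID reports AVX2, otherwise 0.
; Does NOT check OS support for YMM state (OSXSAVE/XGETBV).
HasAvx2 PROC
    mov   r8, rbx       ; cpuid overwrites rbx, which is non-volatile on Win64
    xor   eax, eax
    cpuid               ; leaf 0: eax = highest supported standard leaf
    cmp   eax, 7
    jb    NoAvx2        ; leaf 7 not available, so no AVX2
    mov   eax, 7
    xor   ecx, ecx      ; subleaf 0
    cpuid               ; leaf 7, subleaf 0: structured extended feature flags
    bt    ebx, 5        ; EBX bit 5 = AVX2
    jnc   NoAvx2
    mov   rbx, r8       ; restore rbx
    mov   eax, 1
    ret
NoAvx2:
    mov   rbx, r8       ; restore rbx
    xor   eax, eax
    ret
HasAvx2 ENDP
```

Saving rbx in r8 instead of pushing it keeps the routine free of stack adjustments, so it needs no prologue or unwind information.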

aw27

Quote from: jj2007 on November 04, 2017, 10:16:30 PM
And it seems that my vpshufd crashed simply because it's an illegal instruction for my Core i5 :(
Yeap, my condolences. But you can try the Intel® Software Development Emulator and tell us how it fares. I never did, just curious.


jj2007

Quote from: aw27 on November 04, 2017, 10:27:16 PM
you can try the Intel® Software Development Emulator and tell us how it fares. I never did, just curious.

You should try it, it's only a tiny 20MB download. And it promises a thrilling trial-and-error experience:
C:\IntelEmulator\sde-external-8.12.0-2017-10-23-win>sde -- C:\Masm32\MasmBasic\Misc\WinSock\vmovdqa.exe
A: Source\pin\vm_w\syscall_dispatcher_windows.cpp: LEVEL_VM::WIN_SYSCALL_DISPATCHER::InterruptSyscallByException: 949: assertion failed: retAddr == m_gateFallThroughStub

NO STACK TRACE AVAILABLE
Detach Service Count: 24796
Pin: pin-3.5-97483-e4b3cd5
Copyright (c) 2003-2017, Intel Corporation. All rights reserved.