Print Page - Save MMX, SSE, AVX registers to memory

Title: Save MMX, SSE, AVX registers to memory
Post by: JK on July 26, 2020, 06:02:01 AM

According to Sysinternals "coreinfo" utility my CPU supports AVX, the debugger shows YMM0 to YMM7 (32 bit) and YMM0 to Ymm15 (64 bit) are available. Why does the following code fail at saving YMM0 to memory? The code assembles, links and runs in 32 and 64 bit, but i keep getting a GPF when saving YMM0 (exception_illegal_instruction).

Code Select



ifdef _WIN64
  cax equ <rax>
  cbx equ <rbx>
  ccx equ <rcx>
  cdx equ <rdx>
  csi equ <rsi>
  cdi equ <rdi>
  csp equ <rsp>
  cbp equ <rbp>
  
else
  cax equ <eax>
  cbx equ <ebx>
  ccx equ <ecx>
  cdx equ <edx>
  csi equ <esi>
  cdi equ <edi>
  csp equ <esp>
  cbp equ <ebp>
endif


ifdef _WIN64
  option win64:15
  OPTION STACKBASE : RBP  
else
  .686P
  .model flat, stdcall                      ; 32 bit memory model
  .xmm
  OPTION STACKBASE : EBP  
endif



;assemble console 32 
;assemble console 64


include <windows.inc>
includelib kernel32.lib


.data
  mem __m256 <>

.code


start proc
;*************************************************************************************
; mmx, sse, avx storage test
;*************************************************************************************
local ttt:__m256


  lea cax, mem
;  lea cax, ttt                                        ;doesn´t work either

  movq [cax], mm0                                     ;mmx

  movdqu [cax], xmm0                                  ;xmm

  vmovdqu [cax], ymm0                                 ;ymm -> gpf (exception_illegal_instruction)

;  vmovdqu32 [cax], ymm0                               ;ymm -> same here


  invoke ExitProcess, 0
  ret 

start endp


end start

Thanks

JK

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 26, 2020, 08:18:55 AM

I can't really help you with UASM but the obvious would be to try each technique separately to see if it works correctly then test the macro to see if the problem is there.

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: JK on July 26, 2020, 07:13:14 PM

I have never used AVX, but i want to learn about it. So lesson #1 for me: how to save and load YMM0. The posted code obviously doesn´t make much sense, but it demonstrates my problem: i can save MM0 (MMx Registers), i can save XMM0 (XMMx Registers), but i cannot save YMM0 (YMMx registers).

Running the code fails at "vmovdqu [rax], ymm0" with a GPF (exception_illegal_instruction).

So, what is wrong?

- reading the Intel docs the instruction (vmovdqu) seems to be correct
- the assembler (UASM in this case, but should be the same for MASM) accepts it
- the debugger disassemly shows, what i coded (vmovdqu [rax], ymm0)
- the exception (exception_illegal_instruction) means, my CPU doesn´t know this instruction (why?)
- on the other hand "coreinfo" (Sysinternals proved to be a reliable source for useful tools) tells me, that my CPU supports AVX
- the debugger (x86dbg) shows YMMx registers in the registers area

Questions:

- is this code "vmovdqu [rax], ymm0" for saving YMM0 to a memory location pointed to by RAX correct? (MASM or UASM shouldn´t be different)
- did i miss assembler (MASM, UASM) or linker (MS link.exe) switches, which are necessary for AVX to work on Windows 10?
- could it be, that my laptop CPU (Intel(R) Pentium(R) CPU 4417U @ 2.30GHz) doesn´t support AXV (despite of what i said above) and how else could i test it?

JK

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: jj2007 on July 26, 2020, 08:31:02 PM

Works just fine with UAsm and AsmC, in 32- and 64-bit mode, but ML64 chokes:

include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm and AsmC
.DATA
dd 123
MyMem OWORD 1234567890ABCDEF1234567890ABCDEFh
OWORD 1234567890ABCDEF1234567890ABCDEFh
OWORD 1234567890ABCDEF1234567890ABCDEFh
OWORD 1234567890ABCDEF1234567890ABCDEFh
.CODE
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
mov rax, offset MyMem
int 3
vmovdqu ymm0, [rax]
movq [rax], mm0 ;mmx
movdqu [rax], xmm0 ;xmm
vmovdqu [rax], ymm0 ;ymm -> gpf (exception_illegal_instruction)
Inkey "all is fine"
EndOfCode

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: Siekmanski on July 26, 2020, 08:37:06 PM

vmovdqu YMMWORD ptr[rax], ymm0

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: jj2007 on July 26, 2020, 08:40:20 PM

Try this with ML and UAsm/AsmC:

Code Select

  mov rax, offset MyMem
  vmovdqu YMMWORD ptr ymm0, [rax]	; Masm doesn't like it
  vmovdqu YMMWORD ptr [rax], ymm0

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 26, 2020, 09:13:28 PM

:biggrin:

So you have managed to choke MASM again ? :tongue:

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: JK on July 26, 2020, 09:21:16 PM

Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size - it doesn´t do any harm on the other side.

@Jochen

running your exe makes the debugger pop up, just like with my code. So we can take two items from the list. The code is ok, it must be something else, probably my CPU or my system don´t support AVX, despite of other information.

Windows 10 does support AVX, so how could i know for sure, if my CPU supports it?

JK

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: Siekmanski on July 26, 2020, 09:40:54 PM

Quote from: JK on July 26, 2020, 09:21:16 PM
Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size - it doesn´t do any harm on the other side.
JK

ML64.exe disagrees with you, it wants to know the data type for the destination operand. :biggrin:

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: JK on July 26, 2020, 09:44:55 PM

Ok- good to know, thanks!

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: jj2007 on July 26, 2020, 10:29:07 PM

Quote from: hutch-- on July 26, 2020, 09:13:28 PM
:biggrin:

So you have managed to choke MASM again ? :tongue:

Does vmovdqu YMMWORD ptr ymm0, [rax] work with your version ol ML64.exe? If yes, which version do you have?

Quote from: JK on July 26, 2020, 09:21:16 PM
Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size

Agreed. Indeed, UAsm and AsmC don't need the redundant size info.

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 27, 2020, 01:12:57 AM

:biggrin:

vmovdqu ymm0, YMMWORD PTR [rax]

You have to remember, MASM is an assembler, its not trying to be a compiler. :tongue:

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: jj2007 on July 27, 2020, 03:32:46 AM

Quote from: hutch-- on July 27, 2020, 01:12:57 AM
:biggrin:

vmovdqu ymm0, YMMWORD PTR [rax]

You have to remember, MASM is an assembler, its not trying to be a compiler. :tongue:

Congrats, you got it :biggrin:

Perhaps you can help me with another problem:

Code Select

.DATA
align 64
buffer	LABEL byte
	ORG $+30000-1
	db ?
.CODE
  mov rax, offset buffer
  fxsave YMMWORD ptr [rax]  ; save FPU and xmm regs (but not ymm)

That works like a charm with UAsm and AsmC but ML says "error A2189:invalid combination with segment alignment : 64" :sad:

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: Siekmanski on July 27, 2020, 03:51:55 AM

fxsave

I think it must be a 64 bit pointer, pitty YMM is not in the list.

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 27, 2020, 11:32:31 AM

:biggrin:

Don't stretch it too much, I am already suffering from hardware diagnostic burnout followed by win10 configuration burnout and I have just finished dragging through sending a defective mother board back to Shenzhen in China to a vendor who did their best to make it difficult to do.

RE : The original post, I would set up an aligned procedure, allocate locals of the right size and copy the data to them.

If you do these things the right way, you won't have to keep finding ways to choke MASM, its so "friendly" it will tell you when it chokes itself on 0BADCODEh.

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 27, 2020, 03:06:49 PM

Here is a quick toy before I have to go back to configuring Win10 on the new box.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include64\masm64rt.inc

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

USING r12
LOCAL psrc :QWORD
LOCAL pdst :QWORD
LOCAL asrc :QWORD
LOCAL adst :QWORD
LOCAL tcnt :QWORD

SaveRegs

memsize equ <1024*1024*1024*4>

HighPriority

mov psrc, alloc(memsize+1024) ; 4 gig + 1k
alignup rax, 512 ; align the memory
mov asrc, rax ; save address in ptr

conout " ptr aligned src ",str$(asrc),lf ; display address

mov pdst, alloc(memsize+1024)
alignup rax, 512
mov adst, rax

conout " ptr aligned dst ",str$(adst),lf

rcall GetTickCount
mov r12, rax

; |||||||||||||||||||||||||||||||||||||||||

rcall aligned_data_copy,asrc,adst,memsize ; call block copy proc

; |||||||||||||||||||||||||||||||||||||||||

rcall GetTickCount
sub rax, r12
mov r12, rax

conout " -----------------------",lf
conout " 4 gig copy in ",str$(r12)," ms",lf ; show milliseconds
conout " -----------------------",lf,lf

mfree psrc ; free memory
mfree pdst

NormalPriority

waitkey
RestoreRegs
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

YMMSTACK

aligned_data_copy proc ;;;; src:QWORD,dst:QWORD,bcnt:QWORD

shr r8, 8 ; div by 256

@@:
vmovdqa ymm0, YMMWORD PTR [rcx]
vmovdqa ymm1, YMMWORD PTR [rcx+32]
vmovdqa ymm2, YMMWORD PTR [rcx+64]
vmovdqa ymm3, YMMWORD PTR [rcx+96]

vmovdqa ymm4, YMMWORD PTR [rcx+128]
vmovdqa ymm5, YMMWORD PTR [rcx+160]
vmovdqa ymm6, YMMWORD PTR [rcx+192]
vmovdqa ymm7, YMMWORD PTR [rcx+224]

vmovdqa YMMWORD PTR [rdx], ymm0
vmovdqa YMMWORD PTR [rdx+32], ymm1
vmovdqa YMMWORD PTR [rdx+64], ymm2
vmovdqa YMMWORD PTR [rdx+96], ymm3

vmovdqa YMMWORD PTR [rdx+128], ymm4
vmovdqa YMMWORD PTR [rdx+160], ymm5
vmovdqa YMMWORD PTR [rdx+192], ymm6
vmovdqa YMMWORD PTR [rdx+224], ymm7

add rcx, 256
add rdx, 256

sub r8, 1
jnz @B

ret

aligned_data_copy endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

Result on my Haswell.

ptr aligned src 5368799744
ptr aligned dst 9663869440
-----------------------
4 gig copy in 2125 ms
-----------------------

Press any key to continue...

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: hutch-- on July 27, 2020, 06:08:05 PM

As I expected, the unroll did not make it any faster but a different instruction choice did.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

YMMSTACK

aligned_data_copy proc

; src = rcx
; dst = rdx
; cnt = r8

shr r8, 5 ; div by 32

@@:
vmovntdqa ymm0, YMMWORD PTR [rcx]
vmovntdq YMMWORD PTR [rdx], ymm0
add rcx, 32
add rdx, 32
sub r8, 1
jnz @B

ret

aligned_data_copy endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: jj2007 on July 27, 2020, 06:50:16 PM

A propos: has anybody tried xsave/xrstor (https://studfile.net/preview/1583057/page:15/)?

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: nidud on July 27, 2020, 11:35:08 PM

deleted

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: JK on July 28, 2020, 09:07:48 PM

Thanks for all your input!

QuoteYou may consider this for the registers.

Sometimes i want to have code, which can be assembled for 32 and 64 bit. And i want to be able to see at first glance, that this is such code. "RAX" obviously must be 64 bit. "EAX" could be both (32 and 64 bit) or 32 bit only - it´s ambiguous in this respect. If i name it "CAX" (as i did) i can tell at once, it is common code (C for common) - just a personal preference.

JK

Title: Re: Save MMX, SSE, AVX registers to memory
Post by: HSE on July 29, 2020, 12:30:33 AM

In ObjAsm (http://masm32.com/board/index.php?board=43.0), a dual 32/64 framework, is used xax, xcx, xsi, xdi, etc

The MASM Forum

General => The Campus => Topic started by: JK on July 26, 2020, 06:02:01 AM