News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Save MMX, SSE, AVX registers to memory

Started by JK, July 26, 2020, 06:02:01 AM

Previous topic - Next topic

JK

According to Sysinternals "coreinfo" utility my CPU supports AVX, the debugger shows YMM0 to YMM7 (32 bit) and YMM0 to Ymm15 (64 bit) are available. Why does the following code fail at saving YMM0 to memory? The code assembles, links and runs in 32 and 64 bit, but i keep getting a GPF when saving YMM0 (exception_illegal_instruction).



ifdef _WIN64
  cax equ <rax>
  cbx equ <rbx>
  ccx equ <rcx>
  cdx equ <rdx>
  csi equ <rsi>
  cdi equ <rdi>
  csp equ <rsp>
  cbp equ <rbp>
 
else
  cax equ <eax>
  cbx equ <ebx>
  ccx equ <ecx>
  cdx equ <edx>
  csi equ <esi>
  cdi equ <edi>
  csp equ <esp>
  cbp equ <ebp>
endif


ifdef _WIN64
  option win64:15
  OPTION STACKBASE : RBP 
else
  .686P
  .model flat, stdcall                      ; 32 bit memory model
  .xmm
  OPTION STACKBASE : EBP 
endif



;assemble console 32
;assemble console 64


include <windows.inc>
includelib kernel32.lib


.data
  mem __m256 <>

.code


start proc
;*************************************************************************************
; mmx, sse, avx storage test
;*************************************************************************************
local ttt:__m256


  lea cax, mem
;  lea cax, ttt                                        ;doesn´t work either

  movq [cax], mm0                                     ;mmx

  movdqu [cax], xmm0                                  ;xmm

  vmovdqu [cax], ymm0                                 ;ymm -> gpf (exception_illegal_instruction)

;  vmovdqu32 [cax], ymm0                               ;ymm -> same here


  invoke ExitProcess, 0
  ret

start endp


end start



Thanks


JK

hutch--

I can't really help you with UASM but the obvious would be to try each technique separately to see if it works correctly then test the macro to see if the problem is there.

JK

I have never used AVX, but i want to learn about it. So lesson #1 for me: how to save and load YMM0. The posted code obviously doesn´t make much sense, but it demonstrates my problem: i can save MM0 (MMx Registers), i can save XMM0 (XMMx Registers), but i cannot save YMM0 (YMMx registers).

Running the code fails at "vmovdqu [rax], ymm0" with a GPF (exception_illegal_instruction).

So, what is wrong?

- reading the Intel docs the instruction (vmovdqu) seems to be correct
- the assembler (UASM in this case, but should be the same for MASM) accepts it
- the debugger disassemly shows, what i coded (vmovdqu [rax], ymm0)
- the exception (exception_illegal_instruction) means, my CPU doesn´t know this instruction (why?)
- on the other hand "coreinfo" (Sysinternals proved to be a reliable source for useful tools) tells me, that my CPU supports AVX
- the debugger (x86dbg) shows YMMx registers in the registers area

Questions:

- is this code "vmovdqu [rax], ymm0" for saving YMM0 to a memory location pointed to by RAX correct? (MASM or UASM shouldn´t be different)
- did i miss assembler (MASM, UASM) or linker (MS link.exe) switches, which are necessary for AVX to work on Windows 10?
- could it be, that my laptop CPU (Intel(R) Pentium(R) CPU 4417U @ 2.30GHz)  doesn´t support AXV (despite of what i said above) and how else could i test it?


JK

jj2007

Works just fine with UAsm and AsmC, in 32- and 64-bit mode, but ML64 chokes:

include \Masm32\MasmBasic\Res\JBasic.inc        ; ## console demo, builds in 32- or 64-bit mode with UAsm and AsmC
.DATA
        dd 123
MyMem   OWORD 1234567890ABCDEF1234567890ABCDEFh
        OWORD 1234567890ABCDEF1234567890ABCDEFh
        OWORD 1234567890ABCDEF1234567890ABCDEFh
        OWORD 1234567890ABCDEF1234567890ABCDEFh
.CODE
Init           ; OPT_64 1      ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  mov rax, offset MyMem
  int 3
  vmovdqu ymm0, [rax]
  movq [rax], mm0                                    ;mmx
  movdqu [rax], xmm0                                  ;xmm
  vmovdqu [rax], ymm0                                ;ymm -> gpf (exception_illegal_instruction)
  Inkey "all is fine"
EndOfCode

Siekmanski

Creative coders use backward thinking techniques as a strategy.

jj2007

Try this with ML and UAsm/AsmC:
  mov rax, offset MyMem
  vmovdqu YMMWORD ptr ymm0, [rax] ; Masm doesn't like it
  vmovdqu YMMWORD ptr [rax], ymm0

hutch--

 :biggrin:

So you have managed to choke MASM again ?  :tongue:

JK

Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size - it doesn´t do any harm on the other side.

@Jochen

running your exe makes the debugger pop up, just like with my code. So we can take two items from the list. The code is ok, it must be something else, probably my CPU or my system don´t support AVX, despite of other information.

Windows 10 does support AVX, so how could i know for sure, if my CPU supports it? 


JK

Siekmanski

Quote from: JK on July 26, 2020, 09:21:16 PM
Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size - it doesn´t do any harm on the other side.
JK

ML64.exe disagrees with you, it wants to know the data type for the destination operand.  :biggrin:
Creative coders use backward thinking techniques as a strategy.

JK


jj2007

Quote from: hutch-- on July 26, 2020, 09:13:28 PM
:biggrin:

So you have managed to choke MASM again ?  :tongue:

Does vmovdqu YMMWORD ptr ymm0, [rax] work with your version ol ML64.exe? If yes, which version do you have?

Quote from: JK on July 26, 2020, 09:21:16 PM
Thanks for your help! "Ymmword ptr" shouldn´t be necessary, because the 2. operand (YMM0) is a register of known size

Agreed. Indeed, UAsm and AsmC don't need the redundant size info.

hutch--

 :biggrin:

    vmovdqu ymm0, YMMWORD PTR [rax]

You have to remember, MASM is an assembler, its not trying to be a compiler.  :tongue:

jj2007

Quote from: hutch-- on July 27, 2020, 01:12:57 AM
:biggrin:

    vmovdqu ymm0, YMMWORD PTR [rax]

You have to remember, MASM is an assembler, its not trying to be a compiler.  :tongue:

Congrats, you got it :biggrin:

Perhaps you can help me with another problem:
.DATA
align 64
buffer LABEL byte
ORG $+30000-1
db ?
.CODE
  mov rax, offset buffer
  fxsave YMMWORD ptr [rax]  ; save FPU and xmm regs (but not ymm)


That works like a charm with UAsm and AsmC but ML says "error A2189:invalid combination with segment alignment : 64" :sad:

Siekmanski

fxsave

I think it must be a 64 bit pointer, pitty YMM is not in the list.
Creative coders use backward thinking techniques as a strategy.

hutch--

 :biggrin:

Don't stretch it too much, I am already suffering from hardware diagnostic burnout followed by win10 configuration burnout and I have just finished dragging through sending a defective mother board back to Shenzhen in China to a vendor who did their best to make it difficult to do.

RE : The original post, I would set up an aligned procedure, allocate locals of the right size and copy the data to them.

If you do these things the right way, you won't have to keep finding ways to choke MASM, its so "friendly" it will tell you when it chokes itself on 0BADCODEh.