Guys,
Seems I am having an issue with GetThreadContext returning error code 0x3E6:
ERROR_NOACCESS 998 (0x3E6)
Invalid access to memory location.
I can't seem to figure out why this is being caused:
1) I am retrieving the thread ID.
2) Then I am opening a handle to it using (eax is the thread identifier):
invoke OpenThread, THREAD_SET_CONTEXT | THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION | THREAD_SUSPEND_RESUME, NULL, eax
mov [MainThreadIdHandle], eax
This call succeeds.
3) I then suspend the thread using:
invoke SuspendThread, [MainThreadIdHandle]
4) I then try to retrieve the context of the thread using:
mov [ctx.ContextFlags], CONTEXT_AMD64 | CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_FLOATING_POINT
invoke GetThreadContext, [MainThreadIdHandle], offset ctx
It always returns with error code 3E6 on all of my running x64 processes.
Any ideas?
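For reference, here is a small sketch of what the flag combination in step 4 works out to. The X_-prefixed macros are local stand-ins that mirror the x64 values in winnt.h; they are not the SDK names, and requested_context_flags is a hypothetical helper. Each CONTEXT_* selector already carries the architecture bit, so OR-ing CONTEXT_AMD64 in again is harmless but redundant.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the x64 CONTEXT_* values from winnt.h. The X_ prefix is
   an assumption of this sketch so the names cannot clash with the SDK. */
#define X_CONTEXT_AMD64          0x00100000u
#define X_CONTEXT_CONTROL        (X_CONTEXT_AMD64 | 0x01u)
#define X_CONTEXT_INTEGER        (X_CONTEXT_AMD64 | 0x02u)
#define X_CONTEXT_FLOATING_POINT (X_CONTEXT_AMD64 | 0x08u)

/* Every selector already includes the architecture bit. */
static_assert((X_CONTEXT_CONTROL & X_CONTEXT_AMD64) == X_CONTEXT_AMD64,
              "architecture bit is part of CONTEXT_CONTROL");

/* The combination requested in step 4 of the post. */
static uint32_t requested_context_flags(void)
{
    return X_CONTEXT_AMD64 | X_CONTEXT_CONTROL |
           X_CONTEXT_INTEGER | X_CONTEXT_FLOATING_POINT;
}
```

The result is 0x0010000B, the same value as control, integer and floating point OR-ed together without the explicit architecture bit, so the flags themselves look fine.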
HANDLEs are 64-bit in x64, so I think you should store rax rather than eax in MainThreadHandle. Otherwise the high dword of it will be some random garbage when you pass it to GetThreadContext.
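A minimal sketch of the difference between the two stores, in portable C rather than assembly. The helper names are hypothetical and the 64-bit slot stands in for the handle variable. If the slot's high dword already happens to be zero, both stores leave the same value, which may be why this change alone does not always alter the outcome.

```c
#include <assert.h>
#include <stdint.h>

/* Models `mov [var], eax`: only the low 32 bits of the 64-bit slot change. */
static void store_low_dword(uint64_t *slot, uint32_t low)
{
    *slot = (*slot & 0xFFFFFFFF00000000ull) | low;
}

/* Models `mov [var], rax`: the whole 64-bit slot is overwritten. */
static void store_qword(uint64_t *slot, uint64_t value)
{
    *slot = value;
}

/* Self-check: a slot holding stale data keeps its stale high dword
   after a dword store, and is fully replaced by a qword store. */
static void demo_truncated_store(void)
{
    uint64_t slot = 0xDEADBEEF00000000ull;   /* stale contents */

    store_low_dword(&slot, 0x000001A4u);
    assert(slot == 0xDEADBEEF000001A4ull);   /* high dword is still stale */

    store_qword(&slot, 0x000001A4u);
    assert(slot == 0x000001A4u);             /* whole value replaced */
}
```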
I've changed the code as you suggested, saving rax (64 bit handle) instead of just the lower dword of it.
The end result is still the same, a very weird situation.
Quote from: FlySky on March 09, 2015, 02:24:23 AM
Seems I am having an issue with GetThreadContext returning error code 0x3E6
The return type is BOOL.
Quote from: msdn
If the function succeeds, the return value is nonzero.
Sorry guys,
I expressed myself wrong.
The return value is 0 meaning failure and calling GetLastError shows me the error code 3E6.
Maybe it's a problem with my header file where it tries to read more bytes from the CONTEXT than it is allowed and
maybe that causes the Invalid access to memory location.
Will keep you guys informed.
It looks like the CONTEXT structure must start at a 16-byte boundary, otherwise the call fails.
What does that mean, Yuri, in relation to Donkey's header file?
The context structure is defined in winnt.h:
CONTEXT STRUCT
//
// Register parameter home addresses.
//
// N.B. These fields are for convience - they could be used to extend the
// context record in the future.
//
P1Home DQ
P2Home DQ
P3Home DQ
P4Home DQ
P5Home DQ
P6Home DQ
//
// Control flags.
//
ContextFlags DD
MxCsr DD
//
// Segment Registers and processor flags.
//
SegCs DW
SegDs DW
SegEs DW
SegFs DW
SegGs DW
SegSs DW
EFlags DD
//
// Debug registers
//
Dr0 DQ
Dr1 DQ
Dr2 DQ
Dr3 DQ
Dr6 DQ
Dr7 DQ
//
// Integer registers.
//
Rax DQ
Rcx DQ
Rdx DQ
Rbx DQ
Rsp DQ
Rbp DQ
Rsi DQ
Rdi DQ
R8 DQ
R9 DQ
R10 DQ
R11 DQ
R12 DQ
R13 DQ
R14 DQ
R15 DQ
//
// Program counter.
//
Rip DQ
//
// Floating point state.
//
UNION
FltSave XMM_SAVE_AREA32
STRUCT
Header DB 16*2 DUP ; M128A
Legacy DB 16*8 DUP ; M128A
Xmm0 M128A
Xmm1 M128A
Xmm2 M128A
Xmm3 M128A
Xmm4 M128A
Xmm5 M128A
Xmm6 M128A
Xmm7 M128A
Xmm8 M128A
Xmm9 M128A
Xmm10 M128A
Xmm11 M128A
Xmm12 M128A
Xmm13 M128A
Xmm14 M128A
Xmm15 M128A
ENDS
ENDUNION
//
// Vector registers.
//
VectorRegister DB 16*26 DUP ; M128A
VectorControl DQ
//
// Special debug control registers.
//
DebugControl DQ
LastBranchToRip DQ
LastBranchFromRip DQ
LastExceptionToRip DQ
LastExceptionFromRip DQ
ENDS
i don't know what the syntax is for GoAsm
but, for Masm....
ALIGN 16
ctxt CONTEXT <>
by the way, nice catch, Yuri :t
Thanks, Dave. :icon_cool:
The syntax for GoAsm is the same. There is nothing to change in the headers, FlySky; just align the structure's declaration (the ctx variable itself) in your source code.
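For anyone who wants to check the alignment at run time, or who gets the buffer from somewhere that does not guarantee it, here is a small portable C sketch of the same idea. The function names are hypothetical, and the 15 spare bytes are an assumption the caller has to honour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p sits on a 16-byte boundary (low four address bits are zero). */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Rounds p up to the next 16-byte boundary. The caller must have reserved
   at least 15 spare bytes beyond the size of the structure. */
static void *align_up_16(void *p)
{
    uintptr_t addr = ((uintptr_t)p + 15u) & ~(uintptr_t)15u;
    assert(addr - (uintptr_t)p < 16u);   /* never skips more than 15 bytes */
    return (void *)addr;
}
```

With a raw buffer of 1232 + 15 bytes, align_up_16 always leaves room for one x64 CONTEXT. A static declaration preceded by ALIGN 16 avoids the extra bytes altogether.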
Not sure if it's relevant (source (http://en.wikipedia.org/wiki/WoW64#Application_compatibility)):
Quote
A bug in the translation layer of the x64 version of WoW64[1][2] also renders all 32-bit applications that rely on the Windows API function GetThreadContext incompatible. Such applications include application debuggers, call stack tracers (e.g. IDEs displaying call stack) and applications that use garbage collection (GC) engines
Thanks for all the replies.
You seem to have nailed it perfectly, Yuri :t.
Aligning the structure definition fixed the issue!
Quote from: jj2007 on March 12, 2015, 04:10:11 AM
Not sure if it's relevant (source (http://en.wikipedia.org/wiki/WoW64#Application_compatibility)):
Quote
A bug in the translation layer of the x64 version of WoW64[1][2] also renders all 32-bit applications that rely on the Windows API function GetThreadContext incompatible. Such applications include application debuggers, call stack tracers (e.g. IDEs displaying call stack) and applications that use garbage collection (GC) engines
It's not relevant here (there, the call to GetThreadContext returns wrong contents rather than failing, as described in the references cited on Wikipedia), but it's very useful info. Thank you for pointing that out, Jochen :t
Btw, one of the blogs referenced on Wikipedia has a post about memcopy, http://zachsaw.blogspot.com/2010/11/fast-memcpy-for-large-blocks.html. That post refers to another link for the code, but it gives "no such file", so the code isn't available.
Thank you, Alex
That's an interesting and helpful link !
Quote from: Antariy on March 12, 2015, 06:28:49 AM
...
Btw, one of the blogs referenced on Wikipedia has a post about memcopy, http://zachsaw.blogspot.com/2010/11/fast-memcpy-for-large-blocks.html. That post refers to another link for the code, but it gives "no such file", so the code isn't available.
Here it is:
/*
Copyright(C) 2006, William Chan
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1) Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2) Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3) Redistributions of source code must be provided at free of charge.
4) Redistributions in binary forms must be provided at free of charge.
5) Redistributions of source code within another distribution must be
provided at free of charge including the distribution which is
redistributing the source code. Also, the distribution which is
redistributing the source code must have its source code
redistributed as well.
6) Redistribution of binary forms within another distribution must be
provided at free of charge.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
void X_aligned_memcpy_sse2(void* dest, const void* src, const unsigned long size_t)
{
__asm
{
mov esi, src; //src pointer
mov edi, dest; //dest pointer
mov ebx, size_t; //ebx is our counter
shr ebx, 7; //divide by 128 (8 * 128bit registers)
loop_copy:
prefetchnta 128[ESI]; //SSE2 prefetch
prefetchnta 160[ESI];
prefetchnta 192[ESI];
prefetchnta 224[ESI];
movdqa xmm0, 0[ESI]; //move data from src to registers
movdqa xmm1, 16[ESI];
movdqa xmm2, 32[ESI];
movdqa xmm3, 48[ESI];
movdqa xmm4, 64[ESI];
movdqa xmm5, 80[ESI];
movdqa xmm6, 96[ESI];
movdqa xmm7, 112[ESI];
movntdq 0[EDI], xmm0; //move data from registers to dest
movntdq 16[EDI], xmm1;
movntdq 32[EDI], xmm2;
movntdq 48[EDI], xmm3;
movntdq 64[EDI], xmm4;
movntdq 80[EDI], xmm5;
movntdq 96[EDI], xmm6;
movntdq 112[EDI], xmm7;
add esi, 128;
add edi, 128;
dec ebx;
jnz loop_copy; //loop please
loop_copy_end:
}
}
I'll be darned - this is exactly the idea we've been beating to death in the laboratory (http://masm32.com/board/index.php?topic=4067.0)! I'd say William Chan stole it from me, except he's 9 years prior, so it would be a hard sell. But this algo has major drawbacks,
Quote from: Zach Saw
Note though, that you'll need to give it 16-byte aligned memory and it copies in 128-byte blocks.
Also prefetchnta seems useless, and movntdq worse-than-useless on my modern machine. Admittedly only tried them once but also saw ref's saying the same thing, that modern processors don't get much from them. (Of course u can't trust ref's)
I find that incrementing edi and esi midway through the list of mov's is better. Keeps the max offset down to 30h, no reason it should make a difference, but seems to help. And of course u should dec ebx long b4 the jnz branch, maximizes processor's ability to predict branch correctly in advance. Minor points, of course; see laboratory thread for a couple dozen more if interested
Dunno what this is doing here, would be more relevant over in the laboratory, but it was such a surprise to see it I had to comment.
BTW Yuri, dedndave's right re GetThreadContext 16-byte alignment, nice catch
The structure (external) alignment is only one of the potential problems. For the compiled structure to work correctly its internal layout must be precisely as the Microsoft compilers would lay it out. No problem for a good compiler, but...
Below is the output for this source:
#include <windows.h>
#include <stdio.h>
#include <stddef.h>
int __cdecl main(void)
{
CONTEXT context;
printf("sizeof(CONTEXT) \t%I64d\n", sizeof(CONTEXT));
printf("__alignof(context) \t%I64d\n\n", __alignof(context));
printf("offsetof(p1Home) \t%I64d\n", offsetof(CONTEXT,P1Home));
printf("offsetof(p2Home) \t%I64d\n", offsetof(CONTEXT,P2Home));
printf("offsetof(p3Home) \t%I64d\n", offsetof(CONTEXT,P3Home));
printf("offsetof(p4Home) \t%I64d\n", offsetof(CONTEXT,P4Home));
printf("offsetof(p5Home) \t%I64d\n", offsetof(CONTEXT,P5Home));
printf("offsetof(p6Home) \t%I64d\n", offsetof(CONTEXT,P6Home));
printf("offsetof(p6Home) \t%I64d\n", offsetof(CONTEXT,P6Home));
printf("offsetof(ContextFlags) \t%I64d\n", offsetof(CONTEXT,ContextFlags));
printf("offsetof(MxCsr) \t%I64d\n", offsetof(CONTEXT,MxCsr));
printf("offsetof(SegCs) \t%I64d\n", offsetof(CONTEXT,SegCs));
printf("offsetof(SegDs) \t%I64d\n", offsetof(CONTEXT,SegDs));
printf("offsetof(SegEs) \t%I64d\n", offsetof(CONTEXT,SegEs));
printf("offsetof(SegFs) \t%I64d\n", offsetof(CONTEXT,SegFs));
printf("offsetof(SegGs) \t%I64d\n", offsetof(CONTEXT,SegGs));
printf("offsetof(SegSs) \t%I64d\n", offsetof(CONTEXT,SegSs));
printf("offsetof(EFlag)s \t%I64d\n", offsetof(CONTEXT,EFlags));
printf("offsetof(Dr0) \t%I64d\n", offsetof(CONTEXT,Dr0));
printf("offsetof(Dr1) \t%I64d\n", offsetof(CONTEXT,Dr1));
printf("offsetof(Dr2) \t%I64d\n", offsetof(CONTEXT,Dr2));
printf("offsetof(Dr3) \t%I64d\n", offsetof(CONTEXT,Dr3));
printf("offsetof(Dr6) \t%I64d\n", offsetof(CONTEXT,Dr6));
printf("offsetof(Dr7) \t%I64d\n", offsetof(CONTEXT,Dr7));
printf("offsetof(Rax) \t%I64d\n", offsetof(CONTEXT,Rax));
printf("offsetof(Rcx) \t%I64d\n", offsetof(CONTEXT,Rcx));
printf("offsetof(Rdx) \t%I64d\n", offsetof(CONTEXT,Rdx));
printf("offsetof(Rbx) \t%I64d\n", offsetof(CONTEXT,Rbx));
printf("offsetof(Rsp) \t%I64d\n", offsetof(CONTEXT,Rsp));
printf("offsetof(Rbp) \t%I64d\n", offsetof(CONTEXT,Rbp));
printf("offsetof(Rsi) \t%I64d\n", offsetof(CONTEXT,Rsi));
printf("offsetof(Rdi) \t%I64d\n", offsetof(CONTEXT,Rdi));
printf("offsetof(R8) \t%I64d\n", offsetof(CONTEXT,R8));
printf("offsetof(R9) \t%I64d\n", offsetof(CONTEXT,R9));
printf("offsetof(R10) \t%I64d\n", offsetof(CONTEXT,R10));
printf("offsetof(R11) \t%I64d\n", offsetof(CONTEXT,R11));
printf("offsetof(R12) \t%I64d\n", offsetof(CONTEXT,R12));
printf("offsetof(R13) \t%I64d\n", offsetof(CONTEXT,R13));
printf("offsetof(R14) \t%I64d\n", offsetof(CONTEXT,R14));
printf("offsetof(R15) \t%I64d\n", offsetof(CONTEXT,R15));
printf("offsetof(Rip) \t%I64d\n\n", offsetof(CONTEXT,Rip));
printf("sizeof(M128A) \t%I64d\n", sizeof(M128A));
printf("__alignof(FltSave) \t%I64d\n", __alignof(context.FltSave));
printf("sizeof(FltSave) \t%I64d\n", sizeof(context.FltSave));
printf("sizeof(DUMMYSTRUCTNAME) \t%I64d\n\n", sizeof(context.Header) +
sizeof(context.Legacy) +
sizeof(context.Xmm0) +
sizeof(context.Xmm1) +
sizeof(context.Xmm2) +
sizeof(context.Xmm3) +
sizeof(context.Xmm4) +
sizeof(context.Xmm5) +
sizeof(context.Xmm6) +
sizeof(context.Xmm7) +
sizeof(context.Xmm8) +
sizeof(context.Xmm9) +
sizeof(context.Xmm10) +
sizeof(context.Xmm11) +
sizeof(context.Xmm12) +
sizeof(context.Xmm13) +
sizeof(context.Xmm14) +
sizeof(context.Xmm15));
printf("offsetof(FltSave) \t%I64d\n", offsetof(CONTEXT,FltSave));
printf("offsetof(Header[2]) \t%I64d\n", offsetof(CONTEXT,Header));
printf("offsetof(Legacy[8]) \t%I64d\n", offsetof(CONTEXT,Legacy));
printf("offsetof(Xmm0) \t%I64d\n", offsetof(CONTEXT,Xmm0));
printf("offsetof(Xmm1) \t%I64d\n", offsetof(CONTEXT,Xmm1));
printf("offsetof(Xmm2) \t%I64d\n", offsetof(CONTEXT,Xmm2));
printf("...\n");
printf("offsetof(Xmm15) \t%I64d\n", offsetof(CONTEXT,Xmm15));
printf("offsetof(VectorRegister) \t%I64d\n", offsetof(CONTEXT,VectorRegister));
printf("offsetof(VectorControl) \t%I64d\n", offsetof(CONTEXT,VectorControl));
printf("...\n");
printf("offsetof(LastExceptionFromRip) \t%I64d\n\n", offsetof(CONTEXT,LastExceptionFromRip));
return 0;
}
/* THESE FROM WINNT.H:
typedef struct DECLSPEC_ALIGN(16) _M128A {
ULONGLONG Low;
LONGLONG High;
} M128A, *PM128A;
typedef struct DECLSPEC_ALIGN(16) _XSAVE_FORMAT {
WORD ControlWord;
WORD StatusWord;
BYTE TagWord;
BYTE Reserved1;
WORD ErrorOpcode;
DWORD ErrorOffset;
WORD ErrorSelector;
WORD Reserved2;
DWORD DataOffset;
WORD DataSelector;
WORD Reserved3;
DWORD MxCsr;
DWORD MxCsr_Mask;
M128A FloatRegisters[8];
#ifdef _WIN64
M128A XmmRegisters[16];
BYTE Reserved4[96];
#else
M128A XmmRegisters[8];
BYTE Reserved4[192];
DWORD StackControl[7];
DWORD Cr0NpxState;
#endif
} XSAVE_FORMAT, *PXSAVE_FORMAT;
typedef struct DECLSPEC_ALIGN (16) _CONTEXT {
DWORD64 P1Home;
DWORD64 P2Home;
DWORD64 P3Home;
DWORD64 P4Home;
DWORD64 P5Home;
DWORD64 P6Home;
DWORD ContextFlags;
DWORD MxCsr;
WORD SegCs;
WORD SegDs;
WORD SegEs;
WORD SegFs;
WORD SegGs;
WORD SegSs;
DWORD EFlags;
DWORD64 Dr0;
DWORD64 Dr1;
DWORD64 Dr2;
DWORD64 Dr3;
DWORD64 Dr6;
DWORD64 Dr7;
DWORD64 Rax;
DWORD64 Rcx;
DWORD64 Rdx;
DWORD64 Rbx;
DWORD64 Rsp;
DWORD64 Rbp;
DWORD64 Rsi;
DWORD64 Rdi;
DWORD64 R8;
DWORD64 R9;
DWORD64 R10;
DWORD64 R11;
DWORD64 R12;
DWORD64 R13;
DWORD64 R14;
DWORD64 R15;
DWORD64 Rip;
union {
XMM_SAVE_AREA32 FltSave;
struct {
M128A Header[2];
M128A Legacy[8];
M128A Xmm0;
M128A Xmm1;
M128A Xmm2;
M128A Xmm3;
M128A Xmm4;
M128A Xmm5;
M128A Xmm6;
M128A Xmm7;
M128A Xmm8;
M128A Xmm9;
M128A Xmm10;
M128A Xmm11;
M128A Xmm12;
M128A Xmm13;
M128A Xmm14;
M128A Xmm15;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
M128A VectorRegister[26];
DWORD64 VectorControl;
DWORD64 DebugControl;
DWORD64 LastBranchToRip;
DWORD64 LastBranchFromRip;
DWORD64 LastExceptionToRip;
DWORD64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;
*/
Compiled to a 64-bit app with Pelles C Version 8.00.33 Release Candidate #7 (Win64):
sizeof(CONTEXT) 1232
__alignof(context) 16
offsetof(p1Home) 0
offsetof(p2Home) 8
offsetof(p3Home) 16
offsetof(p4Home) 24
offsetof(p5Home) 32
offsetof(p6Home) 40
offsetof(p6Home) 40
offsetof(ContextFlags) 48
offsetof(MxCsr) 52
offsetof(SegCs) 56
offsetof(SegDs) 58
offsetof(SegEs) 60
offsetof(SegFs) 62
offsetof(SegGs) 64
offsetof(SegSs) 66
offsetof(EFlag)s 68
offsetof(Dr0) 72
offsetof(Dr1) 80
offsetof(Dr2) 88
offsetof(Dr3) 96
offsetof(Dr6) 104
offsetof(Dr7) 112
offsetof(Rax) 120
offsetof(Rcx) 128
offsetof(Rdx) 136
offsetof(Rbx) 144
offsetof(Rsp) 152
offsetof(Rbp) 160
offsetof(Rsi) 168
offsetof(Rdi) 176
offsetof(R8) 184
offsetof(R9) 192
offsetof(R10) 200
offsetof(R11) 208
offsetof(R12) 216
offsetof(R13) 224
offsetof(R14) 232
offsetof(R15) 240
offsetof(Rip) 248
sizeof(M128A) 16
__alignof(FltSave) 16
sizeof(FltSave) 512
sizeof(DUMMYSTRUCTNAME) 416
offsetof(FltSave) 256
offsetof(Header[2]) 256
offsetof(Legacy[8]) 288
offsetof(Xmm0) 416
offsetof(Xmm1) 432
offsetof(Xmm2) 448
...
offsetof(Xmm15) 656
offsetof(VectorRegister) 768
offsetof(VectorControl) 1184
...
offsetof(LastExceptionFromRip) 1224
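The same layout can also be pinned down at compile time, so a header translation that drifts is caught by the compiler. The sketch below is an assumption-laden stand-in, not the SDK definition: M128A_mirror and ContextMirror are hypothetical names, related fields are grouped into arrays, and the expected numbers are taken from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for M128A: 16 bytes, 16-byte aligned. */
typedef struct {
    _Alignas(16) uint64_t Low;
    int64_t High;
} M128A_mirror;

/* Hypothetical mirror of the x64 CONTEXT layout, grouped into arrays. */
typedef struct {
    uint64_t PHome[6];                 /* P1Home..P6Home        */
    uint32_t ContextFlags;
    uint32_t MxCsr;
    uint16_t Seg[6];                   /* SegCs..SegSs          */
    uint32_t EFlags;
    uint64_t Dr[6];                    /* Dr0..Dr3, Dr6, Dr7    */
    uint64_t Gpr[16];                  /* Rax..R15              */
    uint64_t Rip;
    M128A_mirror FltSave[32];          /* 512-byte save area    */
    M128A_mirror VectorRegister[26];
    uint64_t VectorControl;
    uint64_t DebugTail[5];             /* DebugControl..LastExceptionFromRip */
} ContextMirror;

/* Expected values are the ones printed by the program above. */
static_assert(sizeof(M128A_mirror) == 16, "M128A size");
static_assert(offsetof(ContextMirror, ContextFlags) == 48, "ContextFlags");
static_assert(offsetof(ContextMirror, EFlags) == 68, "EFlags");
static_assert(offsetof(ContextMirror, Gpr) == 120, "Rax");
static_assert(offsetof(ContextMirror, Rip) == 248, "Rip");
static_assert(offsetof(ContextMirror, FltSave) == 256, "FltSave");
static_assert(offsetof(ContextMirror, VectorRegister) == 768, "VectorRegister");
static_assert(offsetof(ContextMirror, VectorControl) == 1184, "VectorControl");
static_assert(offsetof(ContextMirror, DebugTail) + 4 * sizeof(uint64_t) == 1224,
              "LastExceptionFromRip");
static_assert(sizeof(ContextMirror) == 1232, "CONTEXT size");
static_assert(_Alignof(ContextMirror) == 16, "CONTEXT alignment");
```

If any assertion fails, the build stops with the name of the field that moved, which is easier to act on than a run-time ERROR_NOACCESS.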
Quote from: vertograd on March 12, 2015, 06:53:16 AM
Thank you, Alex
That's an interesting and helpful link !
Here it is:
:t
And thank you for finding the source! :biggrin:
The description in the blog was intriguing, so I thought it might be some unique technique and wanted to see the source, but it was unavailable from the link given in the blog. Now we can see that it takes a pretty straightforward, slightly "oversimplified" approach. Seeing the code is a good thing: now we know exactly WHAT that code is. Without it one could only wonder "which algo was it? maybe it is some revolutionary thing?", but now we can see that the techniques used are more or less well known to some of the members. Still, this code may be useful to anyone, as it is simple and straightforward, not entangled with advanced techniques.
Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
I'll be darned - this is exactly the idea we've been beating to death in the laboratory (http://masm32.com/board/index.php?topic=4067.0)! I'd say William Chan stole it from me, except he's 9 years prior, so it would be a hard sell. But this algo has major drawbacks,
Quote from: Zach Saw
Note though, that you'll need to give it 16-byte aligned memory and it copies in 128-byte blocks.
Also prefetchnta seems useless, and movntdq worse-than-useless on my modern machine. Admittedly only tried them once but also saw ref's saying the same thing, that modern processors don't get much from them. (Of course u can't trust ref's)
Yes, actually it was not as useful as claimed (you know, the "loud words" about "technology advances" are usually mostly just words with a little truth in them), even on not very modern hardware.
Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
I find that incrementing edi and esi midway through the list of mov's is better. Keeps the max offset down to 30h, no reason it should make a difference, but seems to help. And of course u should dec ebx long b4 the jnz branch, maximizes processor's ability to predict branch correctly in advance. Minor points, of course; see laboratory thread for a couple dozen more if interested
A smaller offset does influence the timing, yes. There is no "obvious reason" for it, but it does.
A much bigger point: the code doesn't support "precise copying". It copies only in 128-byte blocks and does not copy a tail shorter than 128 bytes. Very simple code.
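To make the tail point concrete, here is a rough portable C sketch of a block loop followed by tail handling. It is not William Chan's routine: memcpy stands in for the eight movdqa/movntdq pairs, and copy_blocks_with_tail is a hypothetical name. Note also that the quoted loop tests its counter at the bottom, so with a size below 128 the body still runs once and dec ebx wraps the counter; the sketch tests first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK = 128 };   /* bytes moved per loop iteration in the quoted routine */

/* Copies n bytes: whole 128-byte blocks first, then the remaining tail.
   memcpy stands in for the SSE2 loads and stores of the original. */
static void copy_blocks_with_tail(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t blocks = n / BLOCK;          /* same as the original's shr ebx, 7 */

    while (blocks != 0) {               /* test first, so n < 128 is safe */
        memcpy(d, s, BLOCK);
        d += BLOCK;
        s += BLOCK;
        --blocks;
    }

    size_t tail = n % BLOCK;            /* 0..127 bytes the block loop skipped */
    assert(tail < BLOCK);
    if (tail != 0) {
        memcpy(d, s, tail);
    }
}
```

A real SSE2 version would still need the 16-byte alignment precondition for the block loop; only the tail can be copied without it.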
Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
Dunno what this is doing here, would be more relevant over in the laboratory, but it was such a surprise to see it I had to comment.
It was on the blog cited as a reference in the Wikipedia article that Jochen pointed to; that should be clear from the posts above. And, being a "Real Lazy Coder" (TM), I did not bother to post that link in the memcopy thread, as it was not open in my browser. I had read that thread earlier (I did not post there, as I tend to agree with Hutch's and Jochen's point of view on the subject), so I knew the topic was being discussed on the forum at the time. That's why I posted the link to some "unknown memcopy" algo here.
Something you must do with some values when you are using /LARGEADDRESSAWARE is to write the value to a 64 bit register then write the register to the 64 bit variable. It is not an assembler issue but part of the Win64 ABI.
Quote from: hutch-- on December 31, 2018, 06:05:20 PM
Something you must do with some values when you are using /LARGEADDRESSAWARE is to write the value to a 64 bit register then write the register to the 64 bit variable. It is not an assembler issue but part of the Win64 ABI.
Reply #16 on: March 13, 2015, 05:59:04 AM » :biggrin:
:biggrin:
Strangely enough I read ordinary English with no problems. :P