The MASM Forum

Projects => Rarely Used Projects => GoAsm => Topic started by: FlySky on March 09, 2015, 02:24:23 AM

Title: GetThreadContext (64 bit)
Post by: FlySky on March 09, 2015, 02:24:23 AM
Guys,

Seems I am having an issue with GetThreadContext returning error code 0x3E6:

ERROR_NOACCESS 998 (0x3E6)
Invalid access to memory location.

I can't seem to figure out why this is being caused:

1) I retrieve the thread ID.

2) Then I open a handle to it (eax holds the thread identifier):
invoke OpenThread, THREAD_SET_CONTEXT | THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION | THREAD_SUSPEND_RESUME, NULL, eax
mov [MainThreadIdHandle], eax
This call succeeds.

3) Then I suspend the thread using:
invoke SuspendThread, [MainThreadIdHandle]

4) Then I try to retrieve the thread's context using:
mov [ctx.ContextFlags], CONTEXT_AMD64 | CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_FLOATING_POINT
invoke GetThreadContext, [MainThreadIdHandle], offset ctx

It always returns error code 0x3E6, for every x64 process I try.

Any ideas?


Title: Re: GetThreadContext (64 bit)
Post by: Yuri on March 09, 2015, 03:14:34 AM
HANDLEs are 64-bit on x64, so I think you should store rax rather than eax in MainThreadIdHandle. Otherwise the high dword of the variable may hold random garbage when you pass it to GetThreadContext.
Title: Re: GetThreadContext (64 bit)
Post by: FlySky on March 09, 2015, 04:23:58 AM
I've changed the code as you suggested, saving rax (64 bit handle) instead of just the lower dword of it.
The end result is still the same, a very weird situation.
Title: Re: GetThreadContext (64 bit)
Post by: qWord on March 09, 2015, 04:47:34 AM
Quote from: FlySky on March 09, 2015, 02:24:23 AM
Seems I am having an issue with GetThreadContext returning error code 0x3E6
The return type is BOOL.
Quote from: MSDN
If the function succeeds, the return value is nonzero.
Title: Re: GetThreadContext (64 bit)
Post by: FlySky on March 10, 2015, 04:21:19 AM
Sorry guys,

I expressed myself wrong.
The return value is 0, meaning failure, and calling GetLastError gives me error code 0x3E6.
Maybe it's a problem with my header file: perhaps the CONTEXT definition makes the call read more bytes than it is allowed to,
and that causes the "Invalid access to memory location" error.
Will keep you guys informed.
Title: Re: GetThreadContext (64 bit)
Post by: Yuri on March 10, 2015, 05:53:48 PM
It looks like the CONTEXT structure must start on a 16-byte boundary; otherwise the call fails.
Title: Re: GetThreadContext (64 bit)
Post by: FlySky on March 11, 2015, 04:18:33 AM
What does that mean, Yuri, in relation to Donkey's header file?

The context structure is defined in winnt.h:

CONTEXT STRUCT

    //
    // Register parameter home addresses.
    //
    // N.B. These fields are for convenience - they could be used to extend the
    //      context record in the future.
    //

   P1Home DQ
   P2Home DQ
   P3Home DQ
   P4Home DQ
   P5Home DQ
   P6Home DQ

    //
    // Control flags.
    //

   ContextFlags DD
   MxCsr DD

    //
    // Segment Registers and processor flags.
    //

   SegCs DW
   SegDs DW
   SegEs DW
   SegFs DW
   SegGs DW
   SegSs DW
   EFlags DD

    //
    // Debug registers
    //

   Dr0 DQ
   Dr1 DQ
   Dr2 DQ
   Dr3 DQ
   Dr6 DQ
   Dr7 DQ

    //
    // Integer registers.
    //

   Rax DQ
   Rcx DQ
   Rdx DQ
   Rbx DQ
   Rsp DQ
   Rbp DQ
   Rsi DQ
   Rdi DQ
   R8 DQ
   R9 DQ
   R10 DQ
   R11 DQ
   R12 DQ
   R13 DQ
   R14 DQ
   R15 DQ

    //
    // Program counter.
    //

   Rip DQ

    //
    // Floating point state.
    //

   UNION
      FltSave XMM_SAVE_AREA32
      STRUCT
         Header DB 16*2 DUP ; M128A
         Legacy DB 16*8 DUP ; M128A
         Xmm0 M128A
         Xmm1 M128A
         Xmm2 M128A
         Xmm3 M128A
         Xmm4 M128A
         Xmm5 M128A
         Xmm6 M128A
         Xmm7 M128A
         Xmm8 M128A
         Xmm9 M128A
         Xmm10 M128A
         Xmm11 M128A
         Xmm12 M128A
         Xmm13 M128A
         Xmm14 M128A
         Xmm15 M128A
      ENDS
   ENDUNION

    //
    // Vector registers.
    //

   VectorRegister DB 16*26 DUP ; M128A
   VectorControl DQ

    //
    // Special debug control registers.
    //

   DebugControl DQ
   LastBranchToRip DQ
   LastBranchFromRip DQ
   LastExceptionToRip DQ
   LastExceptionFromRip DQ
ENDS
Title: Re: GetThreadContext (64 bit)
Post by: dedndave on March 11, 2015, 05:59:01 AM
I don't know what the syntax is for GoAsm,
but for MASM:

    ALIGN   16

ctxt CONTEXT <>


by the way, nice catch, Yuri   :t
Title: Re: GetThreadContext (64 bit)
Post by: Yuri on March 11, 2015, 02:05:24 PM
Thanks, Dave. :icon_cool:

The syntax for GoAsm is the same. There is nothing to change in the headers, FlySky; just align the declaration of your CONTEXT variable in your source code.
Title: Re: GetThreadContext (64 bit)
Post by: jj2007 on March 12, 2015, 04:10:11 AM
Not sure if it's relevant (source (http://en.wikipedia.org/wiki/WoW64#Application_compatibility)):

Quote
A bug in the translation layer of the x64 version of WoW64[1][2] also renders all 32-bit applications that rely on the Windows API function GetThreadContext incompatible. Such applications include application debuggers, call stack tracers (e.g. IDEs displaying call stack) and applications that use garbage collection (GC) engines
Title: Re: GetThreadContext (64 bit)
Post by: FlySky on March 12, 2015, 04:35:31 AM
Thanks for all the replies.
You seem to have nailed it perfectly Yuri :t.
Aligning the CONTEXT variable fixed the issue!

Title: Re: GetThreadContext (64 bit)
Post by: Antariy on March 12, 2015, 06:28:49 AM
Quote from: jj2007 on March 12, 2015, 04:10:11 AM
Not sure if it's relevant (source (http://en.wikipedia.org/wiki/WoW64#Application_compatibility)):

Quote
A bug in the translation layer of the x64 version of WoW64[1][2] also renders all 32-bit applications that rely on the Windows API function GetThreadContext incompatible. Such applications include application debuggers, call stack tracers (e.g. IDEs displaying call stack) and applications that use garbage collection (GC) engines

It's not relevant here (in that case the call to GetThreadContext returns wrong contents rather than failing, as described in the further references linked from Wikipedia), but it's very useful info. Thank you for pointing that out, Jochen :t
Btw, one of the blogs referenced in that Wikipedia article has a post on a fast memcpy, http://zachsaw.blogspot.com/2010/11/fast-memcpy-for-large-blocks.html; that post links to the code elsewhere, but the link returns "no such file", so the code isn't available.
Title: Re: GetThreadContext (64 bit)
Post by: GoneFishing on March 12, 2015, 06:53:16 AM
Thank you, Alex
That's an interesting and helpful link!

Quote from: Antariy on March 12, 2015, 06:28:49 AM
...
Btw, one of the blogs referenced in that Wikipedia article has a post on a fast memcpy, http://zachsaw.blogspot.com/2010/11/fast-memcpy-for-large-blocks.html; that post links to the code elsewhere, but the link returns "no such file", so the code isn't available.
Here it is:



/*
  Copyright(C) 2006, William Chan
  All rights reserved.

      Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are met:
   
      1) Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.
      2) Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in the
        documentation and/or other materials provided with the distribution.
      3) Redistributions of source code must be provided at free of charge.
      4) Redistributions in binary forms must be provided at free of charge.
      5) Redistributions of source code within another distribution must be
        provided at free of charge including the distribution which is
        redistributing the source code. Also, the distribution which is
        redistributing the source code must have its source code
        redistributed as well.
      6) Redistribution of binary forms within another distribution must be
        provided at free of charge.

      THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
    IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
    THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
    BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
    CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
    SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
    INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
    CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
    ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.
*/

void X_aligned_memcpy_sse2(void* dest, const void* src, const unsigned long nbytes)
{
  /* 32-bit MSVC inline assembly (x86 only). dest and src must be 16-byte
     aligned; only the nbytes / 128 whole blocks are copied, any tail is ignored. */
  __asm
  {
    mov esi, src;    //src pointer
    mov edi, dest;   //dest pointer

    mov ebx, nbytes; //ebx is our counter
    shr ebx, 7;      //divide by 128 (8 * 128bit registers)
    jz loop_copy_end; //fewer than 128 bytes: skip, otherwise dec ebx would wrap to 0xFFFFFFFF

    loop_copy:
      prefetchnta 128[ESI]; //SSE prefetch with non-temporal hint
      prefetchnta 160[ESI];
      prefetchnta 192[ESI];
      prefetchnta 224[ESI];

      movdqa xmm0, 0[ESI]; //move data from src to registers
      movdqa xmm1, 16[ESI];
      movdqa xmm2, 32[ESI];
      movdqa xmm3, 48[ESI];
      movdqa xmm4, 64[ESI];
      movdqa xmm5, 80[ESI];
      movdqa xmm6, 96[ESI];
      movdqa xmm7, 112[ESI];

      movntdq 0[EDI], xmm0; //move data from registers to dest
      movntdq 16[EDI], xmm1;
      movntdq 32[EDI], xmm2;
      movntdq 48[EDI], xmm3;
      movntdq 64[EDI], xmm4;
      movntdq 80[EDI], xmm5;
      movntdq 96[EDI], xmm6;
      movntdq 112[EDI], xmm7;

      add esi, 128;
      add edi, 128;
      dec ebx;

      jnz loop_copy; //loop please
    loop_copy_end:
      sfence;        //order the non-temporal movntdq stores before returning
  }
}

Title: Re: GetThreadContext (64 bit)
Post by: rrr314159 on March 12, 2015, 01:13:22 PM
I'll be darned - this is exactly the idea we've been beating to death in the laboratory (http://masm32.com/board/index.php?topic=4067.0)! I'd say William Chan stole it from me, except he's 9 years prior, so it would be a hard sell. But this algo has major drawbacks,

Quote from: Zach Saw
Note though, that you'll need to give it 16-byte aligned memory and it copies in 128-byte blocks.

Also prefetchnta seems useless, and movntdq worse than useless, on my modern machine. Admittedly I only tried them once, but I also saw refs saying the same thing: modern processors don't get much from them. (Of course you can't trust refs.)

I find that incrementing edi and esi midway through the list of movs is better. It keeps the max offset down to 30h; there's no obvious reason it should make a difference, but it seems to help. And of course you should dec ebx well before the jnz branch, which maximizes the processor's ability to predict the branch correctly in advance. Minor points, of course; see the laboratory thread for a couple dozen more if interested.

Dunno what this is doing here, would be more relevant over in the laboratory, but it was such a surprise to see it I had to comment.

BTW Yuri, dedndave's right re the GetThreadContext 16-byte alignment, nice catch
Title: Re: GetThreadContext (64 bit)
Post by: MichaelW on March 12, 2015, 03:44:32 PM
The structure (external) alignment is only one of the potential problems. For the compiled structure to work correctly its internal layout must be precisely as the Microsoft compilers would lay it out. No problem for a good compiler, but...

Below is the source, followed by the output it produces:

#include <windows.h>
#include <stdio.h>
#include <stddef.h>

int __cdecl main(void)
{
    CONTEXT context;
    printf("sizeof(CONTEXT)                 \t%I64d\n", sizeof(CONTEXT));
    printf("__alignof(context)              \t%I64d\n\n", __alignof(context));
    printf("offsetof(p1Home)                \t%I64d\n", offsetof(CONTEXT,P1Home));
    printf("offsetof(p2Home)                \t%I64d\n", offsetof(CONTEXT,P2Home));
    printf("offsetof(p3Home)                \t%I64d\n", offsetof(CONTEXT,P3Home));
    printf("offsetof(p4Home)                \t%I64d\n", offsetof(CONTEXT,P4Home));
    printf("offsetof(p5Home)                \t%I64d\n", offsetof(CONTEXT,P5Home));
    printf("offsetof(p6Home)                \t%I64d\n", offsetof(CONTEXT,P6Home));   
    printf("offsetof(p6Home)                \t%I64d\n", offsetof(CONTEXT,P6Home));
    printf("offsetof(ContextFlags)          \t%I64d\n", offsetof(CONTEXT,ContextFlags));   
    printf("offsetof(MxCsr)                 \t%I64d\n", offsetof(CONTEXT,MxCsr));   
    printf("offsetof(SegCs)                 \t%I64d\n", offsetof(CONTEXT,SegCs));     
    printf("offsetof(SegDs)                 \t%I64d\n", offsetof(CONTEXT,SegDs));   
    printf("offsetof(SegEs)                 \t%I64d\n", offsetof(CONTEXT,SegEs));   
    printf("offsetof(SegFs)                 \t%I64d\n", offsetof(CONTEXT,SegFs));   
    printf("offsetof(SegGs)                 \t%I64d\n", offsetof(CONTEXT,SegGs));   
    printf("offsetof(SegSs)                 \t%I64d\n", offsetof(CONTEXT,SegSs));   
    printf("offsetof(EFlags)                \t%I64d\n", offsetof(CONTEXT,EFlags));   
    printf("offsetof(Dr0)                   \t%I64d\n", offsetof(CONTEXT,Dr0));   
    printf("offsetof(Dr1)                   \t%I64d\n", offsetof(CONTEXT,Dr1));   
    printf("offsetof(Dr2)                   \t%I64d\n", offsetof(CONTEXT,Dr2));   
    printf("offsetof(Dr3)                   \t%I64d\n", offsetof(CONTEXT,Dr3));   
    printf("offsetof(Dr6)                   \t%I64d\n", offsetof(CONTEXT,Dr6));   
    printf("offsetof(Dr7)                   \t%I64d\n", offsetof(CONTEXT,Dr7));   
    printf("offsetof(Rax)                   \t%I64d\n", offsetof(CONTEXT,Rax));   
    printf("offsetof(Rcx)                   \t%I64d\n", offsetof(CONTEXT,Rcx));   
    printf("offsetof(Rdx)                   \t%I64d\n", offsetof(CONTEXT,Rdx));   
    printf("offsetof(Rbx)                   \t%I64d\n", offsetof(CONTEXT,Rbx));   
    printf("offsetof(Rsp)                   \t%I64d\n", offsetof(CONTEXT,Rsp));   
    printf("offsetof(Rbp)                   \t%I64d\n", offsetof(CONTEXT,Rbp));   
    printf("offsetof(Rsi)                   \t%I64d\n", offsetof(CONTEXT,Rsi));   
    printf("offsetof(Rdi)                   \t%I64d\n", offsetof(CONTEXT,Rdi));   
    printf("offsetof(R8)                    \t%I64d\n", offsetof(CONTEXT,R8));   
    printf("offsetof(R9)                    \t%I64d\n", offsetof(CONTEXT,R9));   
    printf("offsetof(R10)                   \t%I64d\n", offsetof(CONTEXT,R10));   
    printf("offsetof(R11)                   \t%I64d\n", offsetof(CONTEXT,R11));   
    printf("offsetof(R12)                   \t%I64d\n", offsetof(CONTEXT,R12));   
    printf("offsetof(R13)                   \t%I64d\n", offsetof(CONTEXT,R13));   
    printf("offsetof(R14)                   \t%I64d\n", offsetof(CONTEXT,R14));   
    printf("offsetof(R15)                   \t%I64d\n", offsetof(CONTEXT,R15));   
    printf("offsetof(Rip)                   \t%I64d\n\n", offsetof(CONTEXT,Rip));   
    printf("sizeof(M128A)                   \t%I64d\n", sizeof(M128A));   
    printf("__alignof(FltSave)              \t%I64d\n", __alignof(context.FltSave));   
    printf("sizeof(FltSave)                 \t%I64d\n", sizeof(context.FltSave));
    printf("sizeof(DUMMYSTRUCTNAME)         \t%I64d\n\n", sizeof(context.Header) +
                                                          sizeof(context.Legacy) + 
                                                          sizeof(context.Xmm0) +
                                                          sizeof(context.Xmm1) +
                                                          sizeof(context.Xmm2) +
                                                          sizeof(context.Xmm3) +
                                                          sizeof(context.Xmm4) + 
                                                          sizeof(context.Xmm5) +
                                                          sizeof(context.Xmm6) +
                                                          sizeof(context.Xmm7) +
                                                          sizeof(context.Xmm8) +
                                                          sizeof(context.Xmm9) + 
                                                          sizeof(context.Xmm10) +
                                                          sizeof(context.Xmm11) +
                                                          sizeof(context.Xmm12) +
                                                          sizeof(context.Xmm13) +
                                                          sizeof(context.Xmm14) + 
                                                          sizeof(context.Xmm15));
    printf("offsetof(FltSave)               \t%I64d\n", offsetof(CONTEXT,FltSave));   
    printf("offsetof(Header[2])             \t%I64d\n", offsetof(CONTEXT,Header));   
    printf("offsetof(Legacy[8])             \t%I64d\n", offsetof(CONTEXT,Legacy));   
    printf("offsetof(Xmm0)                  \t%I64d\n", offsetof(CONTEXT,Xmm0));   
    printf("offsetof(Xmm1)                  \t%I64d\n", offsetof(CONTEXT,Xmm1));   
    printf("offsetof(Xmm2)                  \t%I64d\n", offsetof(CONTEXT,Xmm2));   
    printf("...\n");   
    printf("offsetof(Xmm15)                 \t%I64d\n", offsetof(CONTEXT,Xmm15));   
    printf("offsetof(VectorRegister)        \t%I64d\n", offsetof(CONTEXT,VectorRegister));   
    printf("offsetof(VectorControl)         \t%I64d\n", offsetof(CONTEXT,VectorControl));   
    printf("...\n");   
    printf("offsetof(LastExceptionFromRip)  \t%I64d\n\n", offsetof(CONTEXT,LastExceptionFromRip));   
    return 0;
}

/* THESE FROM WINNT.H:

typedef struct DECLSPEC_ALIGN(16) _M128A {
    ULONGLONG Low;
    LONGLONG High;
} M128A, *PM128A;

typedef struct DECLSPEC_ALIGN(16) _XSAVE_FORMAT {
    WORD ControlWord;
    WORD StatusWord;
    BYTE TagWord;
    BYTE Reserved1;
    WORD ErrorOpcode;
    DWORD ErrorOffset;
    WORD ErrorSelector;
    WORD Reserved2;
    DWORD DataOffset;
    WORD DataSelector;
    WORD Reserved3;
    DWORD MxCsr;
    DWORD MxCsr_Mask;
    M128A FloatRegisters[8];
#ifdef _WIN64
    M128A XmmRegisters[16];
    BYTE Reserved4[96];
#else
    M128A XmmRegisters[8];
    BYTE Reserved4[192];
    DWORD StackControl[7];
    DWORD Cr0NpxState;
#endif
} XSAVE_FORMAT, *PXSAVE_FORMAT;

typedef struct DECLSPEC_ALIGN (16) _CONTEXT {
    DWORD64 P1Home;
    DWORD64 P2Home;
    DWORD64 P3Home;
    DWORD64 P4Home;
    DWORD64 P5Home;
    DWORD64 P6Home;
    DWORD ContextFlags;
    DWORD MxCsr;
    WORD SegCs;
    WORD SegDs;
    WORD SegEs;
    WORD SegFs;
    WORD SegGs;
    WORD SegSs;
    DWORD EFlags;
    DWORD64 Dr0;
    DWORD64 Dr1;
    DWORD64 Dr2;
    DWORD64 Dr3;
    DWORD64 Dr6;
    DWORD64 Dr7;
    DWORD64 Rax;
    DWORD64 Rcx;
    DWORD64 Rdx;
    DWORD64 Rbx;
    DWORD64 Rsp;
    DWORD64 Rbp;
    DWORD64 Rsi;
    DWORD64 Rdi;
    DWORD64 R8;
    DWORD64 R9;
    DWORD64 R10;
    DWORD64 R11;
    DWORD64 R12;
    DWORD64 R13;
    DWORD64 R14;
    DWORD64 R15;
    DWORD64 Rip;
    union {
        XMM_SAVE_AREA32 FltSave;
        struct {
            M128A Header[2];
            M128A Legacy[8];
            M128A Xmm0;
            M128A Xmm1;
            M128A Xmm2;
            M128A Xmm3;
            M128A Xmm4;
            M128A Xmm5;
            M128A Xmm6;
            M128A Xmm7;
            M128A Xmm8;
            M128A Xmm9;
            M128A Xmm10;
            M128A Xmm11;
            M128A Xmm12;
            M128A Xmm13;
            M128A Xmm14;
            M128A Xmm15;
        } DUMMYSTRUCTNAME;
    } DUMMYUNIONNAME;
    M128A VectorRegister[26];
    DWORD64 VectorControl;
    DWORD64 DebugControl;
    DWORD64 LastBranchToRip;
    DWORD64 LastBranchFromRip;
    DWORD64 LastExceptionToRip;
    DWORD64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;
*/

Compiled to a 64-bit app with Pelles C Version 8.00.33 Release Candidate #7 (Win64):

sizeof(CONTEXT)                         1232
__alignof(context)                      16

offsetof(p1Home)                        0
offsetof(p2Home)                        8
offsetof(p3Home)                        16
offsetof(p4Home)                        24
offsetof(p5Home)                        32
offsetof(p6Home)                        40
offsetof(p6Home)                        40
offsetof(ContextFlags)                  48
offsetof(MxCsr)                         52
offsetof(SegCs)                         56
offsetof(SegDs)                         58
offsetof(SegEs)                         60
offsetof(SegFs)                         62
offsetof(SegGs)                         64
offsetof(SegSs)                         66
offsetof(EFlags)                        68
offsetof(Dr0)                           72
offsetof(Dr1)                           80
offsetof(Dr2)                           88
offsetof(Dr3)                           96
offsetof(Dr6)                           104
offsetof(Dr7)                           112
offsetof(Rax)                           120
offsetof(Rcx)                           128
offsetof(Rdx)                           136
offsetof(Rbx)                           144
offsetof(Rsp)                           152
offsetof(Rbp)                           160
offsetof(Rsi)                           168
offsetof(Rdi)                           176
offsetof(R8)                            184
offsetof(R9)                            192
offsetof(R10)                           200
offsetof(R11)                           208
offsetof(R12)                           216
offsetof(R13)                           224
offsetof(R14)                           232
offsetof(R15)                           240
offsetof(Rip)                           248

sizeof(M128A)                           16
__alignof(FltSave)                      16
sizeof(FltSave)                         512
sizeof(DUMMYSTRUCTNAME)                 416

offsetof(FltSave)                       256
offsetof(Header[2])                     256
offsetof(Legacy[8])                     288
offsetof(Xmm0)                          416
offsetof(Xmm1)                          432
offsetof(Xmm2)                          448
...
offsetof(Xmm15)                         656
offsetof(VectorRegister)                768
offsetof(VectorControl)                 1184
...
offsetof(LastExceptionFromRip)          1224



Title: Re: GetThreadContext (64 bit)
Post by: Antariy on March 13, 2015, 05:54:16 AM
Quote from: vertograd on March 12, 2015, 06:53:16 AM
Thank you, Alex
That's an  interesting and helpful link !

Here it is:

:t

And thank you for finding the source! :biggrin:
The description in the blog was intriguing, so I thought it might be some unique technique and was curious to see the source, but it was unavailable from the link in the blog. Now we can see that it uses a pretty straightforward, somewhat "oversimplified" approach. Still, seeing the code is a good thing: now we know exactly WHAT that code is. Without it, one could wonder "which algo was it? maybe it is some revolutionary thing?", but now we see that the techniques used are more or less well known to some of the members. Even so, this code may be useful to anyone, as it is simple and straightforward, not entangled with advanced techniques etc.
Title: Re: GetThreadContext (64 bit)
Post by: Antariy on March 13, 2015, 05:59:04 AM
Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
I'll be darned - this is exactly the idea we've been beating to death in the laboratory (http://masm32.com/board/index.php?topic=4067.0)! I'd say William Chan stole it from me, except he's 9 years prior, so it would be a hard sell. But this algo has major drawbacks,

Quote from: Zach Saw
Note though, that you'll need to give it 16-byte aligned memory and it copies in 128-byte blocks.

Also prefetchnta seems useless, and movntdq worse than useless, on my modern machine. Admittedly I only tried them once, but I also saw refs saying the same thing: modern processors don't get much from them. (Of course you can't trust refs.)

Yes, actually it was never as useful as claimed, even on not-very-modern hardware (you know, the "loud words" about "technology advances" are usually mostly just words with a little truth).

Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
I find that incrementing edi and esi midway through the list of movs is better. It keeps the max offset down to 30h; there's no obvious reason it should make a difference, but it seems to help. And of course you should dec ebx well before the jnz branch, which maximizes the processor's ability to predict the branch correctly in advance. Minor points, of course; see the laboratory thread for a couple dozen more if interested.

Yes, keeping the offsets small does influence timing; there is no obvious reason for it, but it does.

A much bigger point: the code doesn't support "precise copying". It copies only whole 128-byte blocks and doesn't handle a tail of fewer than 128 bytes. Very simple code.

Quote from: rrr314159 on March 12, 2015, 01:13:22 PM
Dunno what this is doing here, would be more relevant over in the laboratory, but it was such a surprise to see it I had to comment.

It was on the blog cited as a reference in the Wikipedia article Jochen pointed to; that should be clear from the posts above. And, being a "Real Lazy Coder" (TM), I did not bother to post the link in the memcopy thread, as it was not open in my browser. I had read that thread earlier (I did not post there, as I tend to agree with Hutch's and Jochen's point of view on the subject), so I knew the topic was ongoing on the forum, and that's why I posted the link to some "unknown memcopy" algo here.
Title: Re: GetThreadContext (64 bit)
Post by: hutch-- on December 31, 2018, 06:05:20 PM
Something you must do with some values when you are using /LARGEADDRESSAWARE is to write the value to a 64 bit register then write the register to the 64 bit variable. It is not an assembler issue but part of the Win64 ABI.
Title: Re: GetThreadContext (64 bit)
Post by: LiaoMi on December 31, 2018, 08:30:38 PM
Quote from: hutch-- on December 31, 2018, 06:05:20 PM
Something you must do with some values when you are using /LARGEADDRESSAWARE is to write the value to a 64 bit register then write the register to the 64 bit variable. It is not an assembler issue but part of the Win64 ABI.

Reply #16 on: March 13, 2015, 05:59:04 AM » :biggrin:
Title: Re: GetThreadContext (64 bit)
Post by: hutch-- on January 04, 2019, 01:43:14 AM
 :biggrin:

Strangely enough I read ordinary English with no problems.  :P