The MASM Forum

Miscellaneous => The Orphanage => Topic started by: Magnum on January 12, 2015, 04:37:25 PM

Title: Self-modifying code example
Post by: Magnum on January 12, 2015, 04:37:25 PM
/*
   Code was written and tested using Windows XP, SP2 running inside VirtualBox.
   Visual C++ 6.0
*/

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

// Beginning of printfFunction
void printfFunction(char *szText){
    printf("%s\n", szText);
}
// Marks the end of printfFunction
void printfFunctionStub(){}

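// XOR every byte in [dwAddress, dwAddress + dwSize) with 0x5A.
// Running it twice restores the original bytes.
void enc(DWORD dwAddress, DWORD dwSize){
    __asm{
        mov ecx,dwAddress
        add ecx,dwSize          // ecx = one past the last byte
        mov eax,dwAddress
        C_loop:
        xor byte ptr ds:[eax],0x5A
        inc eax
        cmp eax,ecx
        jb C_loop               // unsigned compare: addresses are not signed
    }
}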

int main(){
    DWORD dwPrintFunctionSize = 0, dwOldProtect;
    char *fA = NULL, *fB = NULL;

    // Obtain the addresses for the functions so we can calculate size.
    fA = (char *)&printfFunction;
    fB = (char *)&printfFunctionStub;

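    // Get total function size. This assumes the compiler placed
    // printfFunctionStub directly after printfFunction, which is not guaranteed.
    dwPrintFunctionSize = (fB - fA);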
     
    // Test the function
    printfFunction("Hello A!\n");
     
    // We need write access to the code; keep execute so the page stays runnable
    VirtualProtect(fA, dwPrintFunctionSize, PAGE_EXECUTE_READWRITE, &dwOldProtect);

    enc((DWORD)fA, dwPrintFunctionSize); // XOR encrypt the function
    enc((DWORD)fA, dwPrintFunctionSize); // XOR decrypt the function

    // Restore the old protection (the last argument must not be NULL)
    VirtualProtect(fA, dwPrintFunctionSize, dwOldProtect, &dwOldProtect);

    // Test the function
    printfFunction("Hello C!\n");

    return 0;
}

Title: Re: Self-modifying code example
Post by: sinsi on January 12, 2015, 05:56:25 PM
That's not SMC.
Wow, that's some optimized __asm right there yep.
Title: Re: Self-modifying code example
Post by: Vortex on January 13, 2015, 06:10:59 AM
What's your reason for writing self-modifying code?  ::)
Title: Re: Self-modifying code example
Post by: dedndave on January 13, 2015, 07:41:04 AM
self-modifying code had a purpose - back when you were limited to 640 KB
you could do more with less code
tricky, because you had to make sure the modified code wasn't executed out of the pre-fetch queue

in modern windows code, i don't think it really has a place
Title: Re: Self-modifying code example
Post by: rrr314159 on January 13, 2015, 10:46:03 AM
The point here is not speed or functionality (doing more with less). He's encrypting the code. Presumably you can distribute it encrypted, decrypt only when it needs to run; makes it harder for bad guys to reverse engineer. That's applicable no matter how much RAM we have, and necessitates "self-modifying code" - altho actually the routine can only modify some other routine this way, not itself.

[edit] Sorry Andy / Magnum / Anasazi, didn't mean to be rude talking about u w/o introducing myself. Nice 2 meet u!
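
A minimal portable sketch of the round trip described above, for readers without VC6 at hand. It works on an ordinary byte buffer rather than on live code, and the names xor_bytes and xor_round_trip_ok are made up for illustration; only the 0x5A key idea comes from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR every byte of buf with key. Applying the same call twice restores
   the input, so one routine serves as both "encrypt" and "decrypt". */
void xor_bytes(unsigned char *buf, size_t len, unsigned char key)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; ++i)
        buf[i] ^= key;
}

/* Hypothetical helper: returns 1 if encoding and then decoding a copy of
   src gives back the original bytes, 0 otherwise (or if len > 64). */
int xor_round_trip_ok(const unsigned char *src, size_t len, unsigned char key)
{
    unsigned char tmp[64];
    if (len > sizeof tmp)
        return 0;
    memcpy(tmp, src, len);
    xor_bytes(tmp, len, key);   /* "encrypt" */
    xor_bytes(tmp, len, key);   /* "decrypt" */
    return memcmp(tmp, src, len) == 0;
}
```

In a real protector the buffer would be the function's code bytes, made writable first, and the first XOR pass would normally happen before distribution rather than at run time.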
Title: Re: Self-modifying code example
Post by: jj2007 on January 13, 2015, 10:53:33 AM
Quote from: Vortex on January 13, 2015, 06:10:59 AM
What's your reason for writing self-modifying code?  ::)

The author is Anasazi, not Andy. See Exodus 20:15 (http://www.kingjamesbibleonline.org/Exodus-20-15/)
Title: Re: Self-modifying code example
Post by: FORTRANS on January 14, 2015, 12:53:35 AM
Hi,

   I have written self modifying code, though if I remember correctly,
never intentionally.  Once debugged, it would not self modify and
crash the system.

Sigh.

Steve
Title: Re: Self-modifying code example
Post by: dedndave on January 14, 2015, 02:34:16 AM
i used to write that way a lot - 16-bit DOS days - lol
i often modified the operands of instructions
sometimes, modified the opcode part, too

it could make for some very small and fast code, actually
but - very difficult to maintain or understand

a few examples....

        MOV     DH,0          ;best distance
CBDIST  label   byte

now, you can change the operand by writing to "byte ptr CBDIST-1"

in this example, i hard-coded the opcode
;;      ADD     BX,RSADJ0     ;read seg adjust
        db      81h,0C3h
RSADJ0  dw      ?

notice that ADD BX,immed is faster than ADD BX,[direct] - and it saves 2 bytes, of course

looking back, you can see how hard it is to maintain - lol
;;      JMP SHO $+8
;;or    MOV     BUFWRD+CURENT+6,CX
DDVAR0  dw      8EBh      ;0E89h = MOV [aaaa],CX
        dw      OFS BUFWRD+CURENT+6
        MOV WPT DDVAR0,8EBh   ;JMP SHO $+8


(i used to use abbreviations: OFS = offset, SHO = short, WPT = word ptr, etc)
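
The byte-level effect of those patches can be sketched in C on a plain array. This is only an illustration under assumptions: nothing here is executed as code, the function names are hypothetical, and the encodings used (B6 ib for MOV DH,imm8, and 81 C3 iw for ADD BX,imm16) are the standard 8086 ones matching the db/dw lines above.

```c
#include <assert.h>
#include <stddef.h>

/* Store a 16-bit value low byte first, as an 8086 word write would. */
void put_word_le(unsigned char *p, unsigned value)
{
    assert(p != NULL);
    p[0] = (unsigned char)(value & 0xFFu);
    p[1] = (unsigned char)((value >> 8) & 0xFFu);
}

/* "MOV DH,imm8" is the two bytes B6 ib. The label in the post sits just
   past the instruction, so the operand is the byte at label-1. */
void patch_mov_dh(unsigned char *label_after_insn, unsigned char imm8)
{
    assert(label_after_insn != NULL);
    label_after_insn[-1] = imm8;
}

/* "ADD BX,imm16" is 81 C3 iw; the immediate word follows the two
   hard-coded opcode bytes, which is where RSADJ0 points. */
void patch_add_bx(unsigned char *insn, unsigned imm16)
{
    assert(insn[0] == 0x81 && insn[1] == 0xC3);
    put_word_le(insn + 2, imm16);
}
```

Writing the word 08EBh with put_word_le leaves the bytes EB 08 (a short jump), and writing 0E89h leaves 89 0E (MOV [aaaa],CX), which is how the DDVAR0 example swaps one instruction for another with a single word store.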
Title: Re: Self-modifying code example
Post by: dedndave on January 14, 2015, 02:50:08 AM
i remember the 8088 had a very predictable pre-fetch queue, 4 bytes in size (6 bytes on the 8086)
a fairly simple algorithm was used
you could read through each line of code and know what the queue status was

with each new processor, the pre-fetch queue got larger and the algorithm more complex   :biggrin:

but, generally, a reverse jump, call, or return would flush the queue and start new

you had to make sure that the code you modified was not already pre-fetched if you expected it to work   :lol:
Title: Re: Self-modifying code example
Post by: anunitu on January 14, 2015, 03:03:39 AM
Damn thread sent me off surfing about the 6502 and C64 programming...Lost about 3 hours....My head hurts from remembering a lot of "Stuff"
Title: Re: Self-modifying code example
Post by: FORTRANS on January 14, 2015, 04:36:52 AM
Quote from: dedndave on January 14, 2015, 02:34:16 AM
i used to write that way a lot - 16-bit DOS days - lol
i often modified the operands of instructions
sometimes, modified the opcode part, too

it could make for some very small and fast code, actually
but - very difficult to maintain or understand

Hi Dave,

   While I knew of that kind of coding, I don't think I ever tried it
in useful programs.  I just wrote programs that would overwrite
themselves until I debugged them.  I looked at a lot of code snippets
that would self modify to encrypt themselves or tried to prevent
being looked at in a debugger.  The code that just worked on itself
to make it faster or more useful was rarer.  I think the only code
of that type I ever used was to determine if it was running on an
8088, 8086, 80186, or 80286.

   I did try out the fancy jumps that would use different parts of a
set of bytes to execute different opcodes.  But that was too difficult
to be useful in any event.

   Enjoyed seeing your examples.

Regards,

Steve N.
Title: Re: Self-modifying code example
Post by: dedndave on January 14, 2015, 06:29:15 AM
i am actually surprised how well my old stuff works under NTVDM
only 1 program gives me troubles - and i'm not sure of the cause

but, i wrote DOS device drivers that used those techniques and they all work fine
Title: Re: Self-modifying code example
Post by: Antariy on January 14, 2015, 08:51:19 AM
Quote from: dedndave on January 13, 2015, 07:41:04 AM
you could do more with less code

Exactly - for instance, to make one code path process different datatypes.

Quote from: dedndave on January 13, 2015, 07:41:04 AM
tricky, because you had to make sure the modified code wasn't executed out of the pre-fetch queue

Actually I haven't seen any problems with that on modern hardware. Is that really true, or a belief? How can a more or less modern x86 machine with a unified memory architecture (code and data in one memory) modify the code but "not know" about it? Even with complex caching mechanisms there obviously should be "watchdogs", and it makes no difference what is modified, the data or the code: the CPU knows what was modified and, depending on that, just refills the required part of the cache - the 1st level data cache, or the code micro-op cache. That is the reason why self-modifying code is very slow, and when the modified code is in the same page as the executing (modifying) code, the modifying code executes with incredible stalls. This shows that the CPU just re-decodes and refills the code from this page.

Is there any example where the code crashed because it was patched, but the CPU prefetcher retained the micro-ops unchanged? Examples where the code is patched in a multithreaded environment without atomic patching do not count - that is the problem of a partial memory patch, when another thread may reach this place in the code and execute garbage.

Quote from: dedndave on January 13, 2015, 07:41:04 AM
in modern windows code, i don't think it really has a place

MemInfoMicro, Dave :biggrin: That's an app for fun, yes, but it works, and it uses some small self-modification so one algo can process two datatypes. But code which has to both execute and change frequently is very slow on modern CPUs.
Title: Re: Self-modifying code example
Post by: Magnum on January 14, 2015, 11:34:51 AM
Quote from: rrr314159 on January 13, 2015, 10:46:03 AM
The point here is not speed or functionality (doing more with less). He's encrypting the code. Presumably you can distribute it encrypted, decrypt only when it needs to run; makes it harder for bad guys to reverse engineer. That's applicable no matter how much RAM we have, and necessitates "self-modifying code" - altho actually the routine can only modify some other routine this way, not itself.

[edit] Sorry Andy / Magnum / Anasazi, didn't mean to be rude talking about u w/o introducing myself. Nice 2 meet u!

rrr314159,

No offense taken.

Welcome to the group. :-)

Title: Re: Self-modifying code example
Post by: dedndave on January 15, 2015, 04:32:15 AM
to be honest, Alex, i don't know enough about code caches on modern CPU's to answer that
i can understand that data cache hits and misses are accounted for
but, the processors may not be designed for code that changes on the fly - lol

i can tell you that it was a very real problem with 8088, 80286, and probably 80386
when the 80486 came out - it became a much more complicated prediction
Title: Re: Self-modifying code example
Post by: Antariy on January 15, 2015, 06:57:39 AM
Quote from: dedndave on January 15, 2015, 04:32:15 AM
to be honest, Alex, i don't know enough about code caches on modern CPU's to answer that
i can understand that data cache hits and misses are accounted for
but, the processors may not be designed for code that changes on the fly - lol

From my experience and analysis of this - you may freely rely on the x86 machine updating the prefetch queue :t
Actually I'm pretty sure that the API function which flushes the code cache after a patch, and which is "strongly recommended" after a patch, was introduced at the time when the WinNT line was cross-platform and intended to run on multiple types of CPUs, not only x86, so this API was "general advice" for the sake of "portability" ::) I.e., maybe some CPUs supported by early WinNTs were not "advanced" enough to detect code changes, so as a "generic habit" the API was to be used on every supported platform. It became a popular belief - "patch the code, then flush the code cache with the API" - but I guess that for every x86 CPU with caching (i486+), and for sure for P6+, the CPU does this itself. And this is not a "guess" - this is the rule; it just does not line up with the popular belief/habit of using the API.


Quote from: dedndave on January 15, 2015, 04:32:15 AM
i can tell you that it was a very real problem with 8088, 80286, and probably 80386
when the 80486 came out - it became a much more complicated prediction

Yes, I think the i386 had that problem too, and things changed after the release of CPUs with a "real" caching mechanism. No doubts about the early CPUs; I was asking about the belief that a code patch requires flushing the cache on modern CPUs. Probably that became a "tradition" whose roots are in the early WinNT architecture.

The real problem with patching a PAGE (not even the CODE) on modern CPUs is that if the patched page contains currently running code, that code will run with big stalls after every write hit on the page (it does not matter where the patching code is located or which thread runs it).
Maybe there are side effects, like code in the page next to the patched one running with slowdowns too - depending on how the CPU does the caching, i.e. how far ahead it goes in grabbing and decoding instructions. That is the reason why Intel recommends putting data after an unconditional jump, if the data has to be placed in the code right after it - so the CPU is just redirected past the data and does not try to interpret it (if it tries at all). Maybe such data could be patched without stalls, but I'm not sure, and didn't check that.

But if the code needs to be patched (debugging, on-the-fly fixing, optimization for the data etc), and that is not a frequent operation on frequently called time-critical code, it's usable under Windows and is legal.
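
For reference, the "patch, then flush" habit discussed above can be sketched as below. The Win32 call is FlushInstructionCache; the non-Windows branch uses the GCC/Clang builtin __builtin___clear_cache purely so the sketch compiles elsewhere. The helper name patch_and_flush is hypothetical, and the sketch assumes the caller has already made the destination writable (for example with VirtualProtect).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Copy len bytes of new code over dst, then tell the system that the
   range changed. Returns 1 on success, 0 if the flush call failed.
   Assumption: dst is already writable; page protection is not handled. */
int patch_and_flush(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
#ifdef _WIN32
    return FlushInstructionCache(GetCurrentProcess(), dst, len) ? 1 : 0;
#elif defined(__GNUC__)
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return 1;
#else
    return 1;   /* no known flush primitive on this toolchain */
#endif
}
```

On x86 the flush is expected to be cheap or a no-op, which fits the argument above; keeping the call costs little and keeps the same source usable on CPUs that do not keep instruction fetch coherent with data writes.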
Title: Re: Self-modifying code example
Post by: FORTRANS on January 15, 2015, 09:29:38 AM
Hi,

   I just briefly checked the book "The Undocumented PC", and
he was trying to detect the size of the prefetch queue for a
number of CPU's.  His method using REP STOSB worked up to
the Pentium.  For the Pentium he had to use a quirk of the A20
address line as well to get the size of the queue.  The Pentium
Pro did not allow that either, so he thought that the queue would
be flushed after a write on any newer CPU.  Of course the Pentium
Pro was the newest processor at that time.

Cheers,

Steve N.
Title: Re: Self-modifying code example
Post by: Antariy on January 15, 2015, 09:57:22 AM
Hi Steve :t

Quote from: FORTRANS on January 15, 2015, 09:29:38 AM
Hi,

   I just briefly checked the book "The Undocumented PC", and
he was trying to detect the size of the prefetch queue for a
number of CPU's.  His method using REP STOSB worked up to
the Pentium.  For the Pentium he had to use a quirk of the A20
address line as well to get the size of the queue.  The Pentium
Pro did not allow that either, so he thought that the queue would
be flushed after a write on any newer CPU.  Of course the Pentium
Pro was the newest processor at that time.

Cheers,

Steve N.

Thank you for the info!

So it's the P6+ which flushes the prefetcher for sure. But the other interesting question is: from what is written, it seems the tests were run in real mode? How else could the author have manipulated the A20 line?

Maybe the P6+ is the first family which supports in hardware the watching of what is changed, even in real mode (that sounds plausible because the P6 was the first real OoO CPU and the first CPU with prediction, so it obviously had architectural advances in tracing the memory-cache system). But the other question is: if similar tests are run on a P5 in protected mode with paging, will they show the same, or will the P5, or an even earlier CPU, refill the prefetch queue too? Different addressing and operating modes may have been a reason for different behaviour. Perhaps at some point it was not too important for the designers to check what was changed: in real mode, where there is no protection, the watching system might have looked superfluous and slow, so in early models it was turned off in real mode for speed (this is just a guess). With the release of the P6 there was a global change in the cache-memory subsystem, so prefetcher refilling worked in real mode too.
Title: Re: Self-modifying code example
Post by: FORTRANS on January 15, 2015, 11:49:39 PM
Quote from: Antariy on January 15, 2015, 09:57:22 AM
Hi Steve

Hi Alex,

Quote

Thank you for the info!

So it's the P6+ which flushes the prefetcher for sure. But the other interesting question is: from what is written, it seems the tests were run in real mode? How else could the author have manipulated the A20 line?

   Yes, it seems to be all real mode MS-DOS programs.  But I
don't see why it could not be protected code if he (someone)
wrote his own boot loader.

Quote
Maybe the P6+ is the first family which supports in hardware the watching of what is changed, even in real mode (that sounds plausible because the P6 was the first real OoO CPU and the first CPU with prediction, so it obviously had architectural advances in tracing the memory-cache system).

   He says the Pentium Pro (P6) manual states that the A20 control
logic "is provided inside the CPU".  And it is external to the Pentium
(P5) on the system he used.

Quote
But the other question is: if similar tests are run on a P5 in protected mode with paging, will they show the same, or will the P5, or an even earlier CPU, refill the prefetch queue too?

   Interesting question.  I have both P5 and P6 systems if that
helps any.  I can boot them into DOS for real mode tests.  But
van Gilluwe already did that in "The Undocumented PC".

   If I boot Windows 2000 or OS/2, I know that it is in a protected
mode, even for DOS program execution.  But I do not know what
PM features are being used.  And I assume the same for DOS
programs under Windows 98.  So a test may work or not.  But
is that all that can be inferred?

   So I can run some limited tests if you want.  Though it may take
a little time to set a system back up to use.

Cheers,

Steve N.
Title: Re: Self-modifying code example
Post by: FORTRANS on January 16, 2015, 01:59:04 AM
Quote from: dedndave on January 14, 2015, 06:29:15 AM
i am actually surprised how well my old stuff works under NTVDM
only 1 program gives me troubles - and i'm not sure of the cause

Hi Dave,

   What are the problems?  Checking my memory (ha!) I have seen
the following problems.  Unsupported BIOS functions, timer related
Int15H, fn 86H, wait doesn't work when I wanted a delay.  Some
others w.r.t. memory?  I thought command line parsing was broken,
so I wrote a work around, but I can't reproduce that now.  And I tried
to stuff the keyboard buffer when asking for user input, and failed
there.  HLT doesn't work as expected.  And as you pointed out,
calling the BIOS keyboard functions caused 100% CPU usage.

   There are some other DOS versus VDM and NTVDM emulation
foibles, but I don't remember them well enough to enumerate
correctly.

Cheers,

Steve N.
Title: Re: Self-modifying code example
Post by: dedndave on January 16, 2015, 02:32:17 AM
a number of my old programs revector interrupts - work fine   :P
Title: Re: Self-modifying code example
Post by: Antariy on January 16, 2015, 06:58:35 AM
Hi Steve,

Quote from: FORTRANS on January 15, 2015, 11:49:39 PM
Quote

Thank you for the info!

So it's the P6+ which flushes the prefetcher for sure. But the other interesting question is: from what is written, it seems the tests were run in real mode? How else could the author have manipulated the A20 line?

   Yes, it seems to be all real mode MS-DOS programs.  But I
don't see why it could not be protected code if he (someone)
wrote his own boot loader.

Writing a minimalistic OS just to run the test code in protected mode with paging enabled is probably too inefficient.


Quote from: FORTRANS on January 15, 2015, 11:49:39 PM
   Interesting question.  I have both P5 and P6 systems if that
helps any.  I can boot them into DOS for real mode tests.  But
van Gilluwe already did that in "The Undocumented PC".

   If I boot Windows 2000 or OS/2, I know that it is in a protected
mode, even for DOS program execution.  But I do not know what
PM features are being used.  And I assume the same for DOS
programs under Windows 98.  So a test may work or not.  But
is that all that can be inferred?

   So I can run some limited tests if you want.  Though it may take
a little time to set a system back up to use.

Yes, I've prepared two tests, and if you would like - when you fire your systems up, run the tests: http://masm32.com/board/index.php?topic=3960
I've described what each of them displays in the thread.