Self-modifying code example

Magnum · January 12, 2015, 04:37:25 PM

/*
   Code was written and tested using Windows XP, SP2 running inside VirtualBox.
   Visual C++ 6.0
*/
 
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
 
// Beginning of printfFunction
void printfFunction(char *szText){
    printf("%s\n", szText);
}
// Marks the end of printfFunction
void printfFunctionStub(){}
 
void enc(DWORD dwAddress, DWORD dwSize){
    __asm{
        mov ecx,dwAddress
        add ecx,dwSize          ; ecx = one past the last byte
        mov eax,dwAddress       ; eax = current byte
        C_loop:
        xor byte ptr ds:[eax],0x5A
        inc eax
        cmp eax,ecx
        jb C_loop               ; unsigned compare, since these are addresses
    }
}
 
int main(){
    DWORD dwPrintFunctionSize = 0, dwOldProtect;
    char *fA = NULL, *fB = NULL;
 
    // Obtain the addresses for the functions so we can calculate size.
    fA = (char *)&printfFunction;
    fB = (char *)&printfFunctionStub;
 
    // Get total function size
    dwPrintFunctionSize = (fB - fA);
     
    // Test the function
    printfFunction("Hello A!\n");
     
    // We need to give ourselves access to modify the code at the given address.
    // PAGE_EXECUTE_READWRITE keeps the page executable while we write to it.
    VirtualProtect(fA, dwPrintFunctionSize, PAGE_EXECUTE_READWRITE, &dwOldProtect);
     
    enc((DWORD)fA, dwPrintFunctionSize); // XOR encrypt the function
    enc((DWORD)fA, dwPrintFunctionSize); // XOR decrypt the function
     
    // Restore the old protection (the last argument must not be NULL, or the call fails)
    VirtualProtect(fA, dwPrintFunctionSize, dwOldProtect, &dwOldProtect);
 
    // Test the function
    printfFunction("Hello C!\n");
 
    return 0;
}
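
A portable sketch of the same XOR idea, for anyone without VC6 inline asm at hand. It is not Magnum's routine: `xor_bytes` is a hypothetical name, and it works on an ordinary data buffer rather than a code page, so no VirtualProtect call is involved. It only illustrates why running the routine twice with the same key gives the original bytes back.

```c
#include <stddef.h>

/* Hypothetical portable counterpart of enc(): XOR every byte in
   [buf, buf + len) with the same 0x5A key used in the post. */
static void xor_bytes(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i)
        buf[i] ^= 0x5A;
}
```

Calling it twice on the same buffer restores the contents, because (b ^ 0x5A) ^ 0x5A == b. One difference from the asm loop: this `for` loop does nothing when `len` is 0, while the asm version tests its pointer only after the first XOR.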

sinsi · January 12, 2015, 05:56:25 PM

That's not SMC.
Wow, that's some optimized __asm right there yep.

Vortex · January 13, 2015, 06:10:59 AM

What's your reason for writing self-modifying code? ::)

dedndave · January 13, 2015, 07:41:04 AM

self-modifying code had a purpose - back when you were limited to 640 KB
you could do more with less code
tricky, because you had to make sure the modified code wasn't executed out of the pre-fetch queue

in modern windows code, i don't think it really has a place

rrr314159 · January 13, 2015, 10:46:03 AM

The point here is not speed or functionality (doing more with less). He's encrypting the code. Presumably you can distribute it encrypted, decrypt only when it needs to run; makes it harder for bad guys to reverse engineer. That's applicable no matter how much RAM we have, and necessitates "self-modifying code" - altho actually the routine can only modify some other routine this way, not itself.

[edit] Sorry Andy / Magnum / Anasazi, didn't mean to be rude talking about u w/o introducing myself. Nice 2 meet u!

jj2007 · January 13, 2015, 10:53:33 AM

Quote from: Vortex on January 13, 2015, 06:10:59 AM
What's your reason for writing self-modifying code? ::)

The author is Anasazi, not Andy. See Exodus 20:15

FORTRANS · January 14, 2015, 12:53:35 AM

Hi,

I have written self modifying code, though if I remember correctly,
never intentionally. Once debugged, it would not self modify and
crash the system.

Sigh.

Steve

dedndave · January 14, 2015, 02:34:16 AM

i used to write that way a lot - 16-bit DOS days - lol
i often modified the operands of instructions
sometimes, modified the opcode part, too

it could make for some very small and fast code, actually
but - very difficult to maintain or understand

a few examples....

Code:

        MOV     DH,0          ;best distance
CBDIST  label   byte

now, you can change the operand by writing to "byte ptr CBDIST-1"
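
A rough C model of that trick, operating on a byte array instead of live code. It assumes the usual encoding of MOV DH,imm8 (opcode B6h followed by the immediate byte); `set_best_distance` is a hypothetical helper name, and the `CBDIST` constant here stands in for the label's offset right after the instruction.

```c
/* Offset of the label that directly follows the 2-byte "MOV DH,0"
   instruction (bytes B6 00), mirroring CBDIST in the listing. */
enum { CBDIST = 2 };

/* Overwrite the imm8 operand, i.e. the byte just before the label.
   Equivalent in spirit to a write to "byte ptr CBDIST-1". */
static void set_best_distance(unsigned char *image, unsigned char value)
{
    image[CBDIST - 1] = value;
}
```

After `set_best_distance(image, 0x2A)` on an image holding B6 00, the bytes read B6 2A, which would decode as MOV DH,2Ah. The opcode byte is left alone; only the operand changes.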

in this example, i hard-coded the opcode

Code:

;;      ADD     BX,RSADJ0     ;read seg adjust
        db      81h,0C3h
RSADJ0  dw      ?

notice that ADD BX,immed is faster than ADD BX,[direct] - and it saves 2 bytes, of course

looking back, you can see how hard it is to maintain - lol

Code:

;;      JMP SHO $+8
;;or    MOV     BUFWRD+CURENT+6,CX
DDVAR0  dw      8EBh      ;0E89h = MOV [aaaa],CX
        dw      OFS BUFWRD+CURENT+6
        MOV WPT DDVAR0,8EBh   ;JMP SHO $+8

(i used to use abbreviations: OFS = offset, SHO = short, WPT = word ptr, etc)
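
To see why the word values look "backwards", here is a small C sketch of how those two constants land in memory on a little-endian machine. It works on a plain byte array, not executable code; `store_word_le` and the macro names are made up for illustration.

```c
/* The two values written to DDVAR0 in the listing above. */
#define OP_JMP_SHORT_8 0x08EBu /* bytes EB 08: JMP SHORT with displacement 8 */
#define OP_MOV_MEM_CX  0x0E89u /* bytes 89 0E: MOV [disp16],CX               */

/* Store a 16-bit word low byte first, the way the 8088 does. */
static void store_word_le(unsigned char *p, unsigned int w)
{
    p[0] = (unsigned char)(w & 0xFFu);
    p[1] = (unsigned char)((w >> 8) & 0xFFu);
}
```

Storing `OP_JMP_SHORT_8` puts EB 08 at the start of the slot, so the CPU sees a short jump; storing `OP_MOV_MEM_CX` puts 89 0E there, and the following address word becomes the operand of the MOV. The address word itself is never touched by the patch.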

dedndave · January 14, 2015, 02:50:08 AM

i remember the 8088 had a very predictable pre-fetch queue, 6 bytes in size
a fairly simple algorithm was used
you could read through each line of code and know what the queue status was

with each new processor, the pre-fetch queue got larger and the algorithm more complex

but, generally, a reverse jump, call, or return would flush the queue and start new

you had to make sure that the code you modified was not already pre-fetched if you expected it to work :lol:

anunitu · January 14, 2015, 03:03:39 AM

Damn thread sent me off surfing about the 6502 and C64 programming...Lost about 3 hours....My head hurts from remembering a lot of "Stuff"

FORTRANS · January 14, 2015, 04:36:52 AM

Quote from: dedndave on January 14, 2015, 02:34:16 AM
i used to write that way a lot - 16-bit DOS days - lol
i often modified the operands of instructions
sometimes, modified the opcode part, too

it could make for some very small and fast code, actually
but - very difficult to maintain or understand

Hi Dave,

While I knew of that kind of coding, I don't think I ever tried it
in useful programs. I just wrote programs that would overwrite
themselves until I debugged them. I looked at a lot of code snippets
that would self modify to encrypt themselves or tried to prevent
being looked at in a debugger. The code that just worked on itself
to make it faster or more useful was rarer. I think the only code
of that type I ever used was to determine if it was running on an
8088, 8086, 80186, or 80286.

I did try out the fancy jumps that would use different parts of a
set of bytes to execute different opcodes. But that was too difficult
to be useful in any event.

Enjoyed seeing your examples.

Regards,

Steve N.

dedndave · January 14, 2015, 06:29:15 AM

i am actually surprised how well my old stuff works under NTVDM
only 1 program gives me troubles - and i'm not sure of the cause

but, i wrote DOS device drivers that used those techniques and they all work fine

Antariy · January 14, 2015, 08:51:19 AM

Quote from: dedndave on January 13, 2015, 07:41:04 AM
you could do more with less code

Exactly - for instance, to make one code path process different datatypes.

Quote from: dedndave on January 13, 2015, 07:41:04 AM
tricky, because you had to make sure the modified code wasn't executed out of the pre-fetch queue

Actually I haven't seen any problems with that on modern hardware. Is that really true, or just a belief? How can a more or less modern x86 machine with a unified memory architecture (code and data in one memory) modify the code but "not know" about it? Even with complex caching mechanisms there obviously should be "watchdogs", and it makes no difference what is modified, data or code: the CPU knows what was modified and, depending on that, just refills the required part of the cache, either the 1st level data cache or the code micro-instruction cache. That is the reason why self-modifying code is very slow, and when the modified code is in the same page as the executing (modifying) code, the modifying code executes with incredible stalls. This shows that the CPU just re-decodes and refills the code from this page.

Is there any example where the code crashed because it was patched but the CPU prefetcher retained the unchanged micro-ops? Examples where code is patched in a multithreaded environment without atomic patching do not count: that is a problem of a partial memory patch, where another thread may reach this place in the code and execute garbage.

Quote from: dedndave on January 13, 2015, 07:41:04 AM
in modern windows code, i don't think it really has a place

MemInfoMicro, Dave

That app is for fun, yes, but it works, and it uses a small self-modification so that one algo processes two datatypes. But code that has to execute and change frequently is very slow on modern CPUs.

Magnum · January 14, 2015, 11:34:51 AM

Quote from: rrr314159 on January 13, 2015, 10:46:03 AM
The point here is not speed or functionality (doing more with less). He's encrypting the code. Presumably you can distribute it encrypted, decrypt only when it needs to run; makes it harder for bad guys to reverse engineer. That's applicable no matter how much RAM we have, and necessitates "self-modifying code" - altho actually the routine can only modify some other routine this way, not itself.

[edit] Sorry Andy / Magnum / Anasazi, didn't mean to be rude talking about u w/o introducing myself. Nice 2 meet u!

rrr314159,

No offense taken.

Welcome to the group. :-)

dedndave · January 15, 2015, 04:32:15 AM

to be honest, Alex, i don't know enough about code caches on modern CPU's to answer that
i can understand that data cache hits and misses are accounted for
but, the processors may not be designed for code that changes on the fly - lol

i can tell you that it was a very real problem with 8088, 80286, and probably 80386
when the 80486 came out - it became a much more complicated prediction

The MASM Forum
