Topic: Self modifying code

Siekmanski

Self modifying code
« on: February 27, 2016, 10:18:07 AM »
Is it legal to post and discuss self-modifying code here at The Masm Forum?

- To improve performance by unrolling loops into L1/L2/L3 code cache memory (skipping additions/subtractions of memory data pointers, changing immediate values, etc.)
- Compressing code to be decompressed and executed at runtime (minimizes executable size)

Marinus

Magnum

Re: Self modifying code
« Reply #1 on: February 27, 2016, 10:39:36 AM »
I see no mention of it in the rules.

Take care,
                   Andy

ToutEnMasm

Re: Self modifying code
« Reply #2 on: February 27, 2016, 08:33:06 PM »
Pirate techniques are not allowed in this forum; that is a clear rule.
It is better to wait for Hutch's permission, but I don't think he will agree to that.
That said, there are some samples in the forum.

jj2007

Re: Self modifying code
« Reply #3 on: February 27, 2016, 08:49:09 PM »
Greetings from Italy: Our logic here is "everything that is not explicitly forbidden is allowed".

Which is in sharp contrast to German logic: "everything that is not explicitly allowed is forbidden" 8)

Anyway, that is for Hutch to decide (does SMF have hidden members-only sub-forums?).

Your ideas look interesting, although from my experience self-modifying code is a no-no for performance, as the code cache gets invalidated. But you seem to be aware of that...

Siekmanski

Re: Self modifying code
« Reply #4 on: February 28, 2016, 12:26:27 AM »
These ideas are not intended for pirate techniques.  :eusa_naughty:

My idea is to reserve process memory, then unroll and self-modify the code loops there once.
Not on the fly at each execution, so there is no repeated code cache invalidation.
Then prefetch the code into the cache you want it in (preventing cache miss penalties).
So, you can take advantage of the different L1/L2/L3 cache sizes of different machines and skip the opcodes you don't need anymore.

This should boost up performance.  8)
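
As a rough illustration of the "build once, run many times" idea, here is a minimal sketch. It is a Python analogy only, not machine code written into reserved process memory, and `make_unrolled_dot` is a hypothetical helper name invented for this example.

```python
def make_unrolled_dot(n):
    """Generate, once, a dot-product function whose loop is fully unrolled for length n."""
    # Build the straight-line body: a[0] * b[0] + a[1] * b[1] + ...
    terms = " + ".join(f"a[{i}] * b[{i}]" for i in range(n)) or "0"
    source = f"def dot(a, b):\n    return {terms}\n"
    namespace = {}
    exec(source, namespace)  # the one-time "code generation" step
    return namespace["dot"]


# Built once at start-up, then called as often as needed.
dot4 = make_unrolled_dot(4)
print(dot4([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

The generation cost is paid once; every later call runs the straight-line body with no loop counter and no loop branch.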

@Hutch, do you allow this kind of discussion?

caballero

Re: Self modifying code
« Reply #5 on: February 29, 2016, 09:02:34 PM »
It seems that Hutch has chosen "the German method" to say "no"  :biggrin:

I don't know the first thing about this subject, but I find it interesting, so I'd be glad to hear about it.

Regards
En un lugar de la Mancha de cuyo nombre no quiero acordarme

FORTRANS

Re: Self modifying code
« Reply #6 on: March 01, 2016, 12:27:37 AM »
Hi,

Quote
Is it legal to post and discuss self-modifying code here at The Masm Forum?

- To improve performance by unrolling loops into L1/L2/L3 code cache memory (skipping additions/subtractions of memory data pointers, changing immediate values, etc.)

   I confess I do not see how you would improve performance with
self-modifying code.  What information would you make use of at
run time that was not available at compile time?  Even given that I
am not being imaginative enough about such conditions, it would
seem difficult to improve performance by copying things around to
unroll loops or the like.

Quote
- Compressing code to be decompressed and executed at runtime (minimizes executable size)

   I have done this using the EXEPACK option of the linker, admittedly
for 16-bit code.  It reduced the executable's size from 63 KB to 25 KB.
This was for the HP 200LX palmtop, where storage can be limited.  So
it has a memory footprint similar to the uncompressed program, but
uses less storage space.

Regards,

Steve N.

Raistlin

Re: Self modifying code
« Reply #7 on: March 01, 2016, 12:49:59 AM »
Quote
- To improve performance by unrolling loops into L1/L2/L3 code cache memory (skipping additions/subtractions of memory data pointers, changing immediate values, etc.)
I understand this statement and am interested in the theory for the specific reason it was suggested,
as I am working on a project that does just that: it enumerates the underlying hardware dynamically.
My own path was to look at cache granularity, with L1 code preservation and L1/L2/L3 data optimizations.
This possible enhancement would be useful if it proves practical; I for one would like to know more.
If there's a way we can chat about this and keep the riff-raff out of it, I'm game. :t



jimg

Re: Self modifying code
« Reply #8 on: March 01, 2016, 01:18:26 AM »
I used self-modifying code in my sort routines to save space and reduce CPU ticks.  Coding for the various possibilities in each type of sort would take many compares and internal jumps to the correct section for the situation, and those take time.  The alternative of duplicating the code for each type without the jumps would make the routine many times its size.  Since the routine loops internally millions of times on a large chunk of data to sort, a simple precompiler at the start that changes several compare-type instructions and result handlers makes for a smaller and faster general-purpose routine.

Besides, it's a lot of fun :)   
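
The "decide once before the hot loop" pattern described above can be sketched as follows. This is only a Python analogy: a closure stands in for the patched compare instruction, and `make_insertion_sort` is a hypothetical name, not the routine from the post.

```python
import operator


def make_insertion_sort(descending=False):
    """Return a sort routine specialised once for the requested order."""
    # One-time "patch": pick the comparison before the hot loop starts,
    # instead of testing `descending` on every iteration.
    out_of_order = operator.lt if descending else operator.gt

    def insertion_sort(items):
        result = list(items)
        for i in range(1, len(result)):
            value = result[i]
            j = i - 1
            # Shift elements that must come after `value` one slot to the right.
            while j >= 0 and out_of_order(result[j], value):
                result[j + 1] = result[j]
                j -= 1
            result[j + 1] = value
        return result

    return insertion_sort


sort_up = make_insertion_sort()
sort_down = make_insertion_sort(descending=True)
print(sort_up([3, 1, 2]), sort_down([3, 1, 2]))  # [1, 2, 3] [3, 2, 1]
```

The inner loop contains a single comparison call and no per-element test of the sort type, which is the saving the patched assembly version aims for.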

Siekmanski

Re: Self modifying code
« Reply #9 on: March 01, 2016, 01:49:37 AM »
Hi Steve,

The advantage could be to prepare a routine on the fly ONCE that runs as fast as possible on each different architecture.
You only need to know the available instruction sets, cores and cache sizes to construct the "fastest" code for the specific architecture it runs on.
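
A minimal sketch of selecting a routine once from the detected hardware, under loud assumptions: the feature names, the three `fft_*` placeholders and `choose_variant` are all hypothetical, and the feature set is passed in by hand rather than read from CPUID.

```python
def choose_variant(cpu_features, variants):
    """Return the first routine whose required features are all available."""
    for required, routine in variants:
        if required <= cpu_features:  # subset test on sets
            return routine
    raise LookupError("no variant matches the detected features")


# Hypothetical placeholder routines; real ones would do the actual work.
def fft_avx2():
    return "AVX2 path"


def fft_sse2():
    return "SSE2 path"


def fft_plain():
    return "plain x86 path"


# Ordered from most to least specialised.
VARIANTS = [
    ({"avx2", "sse2"}, fft_avx2),
    ({"sse2"}, fft_sse2),
    (set(), fft_plain),
]

# Pretend the CPU reported SSE2 only; the choice is made once at start-up.
fft = choose_variant({"sse2"}, VARIANTS)
print(fft())  # SSE2 path
```

Generating the code in memory, as proposed in this thread, would replace the fixed table with a builder that emits one routine for the detected combination.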

Hi Raistlin,

This idea came up when I picked up my FFT routines again and wanted them to be as fast as possible.
BTW, I found a way to construct the time domain decomposition table without the need for bit-reversing.....
There are special cases for the first 2 Log2 loops and the last Log2 loop.
There is a special case if the imaginary data is zero or not available, if you need Real-Imag output.
I still have to write a Real FFT (where only real data is processed).

So, you now know how many different routines are needed to accomplish the same thing...

Still no answer from Hutch on whether this may be discussed here.....

Otherwise, PM me your email address to discuss this further.

caballero

Re: Self modifying code
« Reply #10 on: March 01, 2016, 02:35:53 AM »
Here is some sample code. As we don't know yet whether we may talk about it, I won't post the executable.
Code: [Select]
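org 100h
use16
start:
mov al,13h           ; video mode 13h (320x200, 256 colours); assumes AH=0 on entry
int 10h
mov bl,3             ; BX=3, assuming BH=0 on entry
mov si,0a0a0h
mov ds,si            ; DS=A0A0h: VGA memory, 0A00h bytes (8 rows) past A000:0000
again: mov cx,0c8bh  ; first pass only: CH=0Ch; the operand bytes 8Bh,0Ch double as "mov cx,[si]"
xor ch,[bx+si]       ; CH = CH xor the byte at [si+3]
mov [si+0fec2h],ch   ; store at si-13Eh (the offset wraps around at 64K)
dec si
jnz again+1          ; re-enter one byte into the mov above until SI reaches 0
int 16h ;ah=00h      ; wait for a keypress
xchg ax,bx
int 10h ;ax=0003h    ; back to 80x25 text mode
ret                  ; return to DOS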
If you look closer, you will see "jnz again+1". Jumping one byte past the "again" label makes the loop decode differently:
Code: [Select]
010C 8B0C             MOV   CX,[SI]
010E 3228             XOR   CH,[BX+SI]
0110 88ACC2FE         MOV   [FEC2+SI],CH
0114 4E               DEC   SI
0115 75F5             JNZ    010C
If you compile and run the code (up to you) you will see a Sierpinski Triangle fractal, in just 29 bytes.
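
To see why entering at "again+1" changes the instruction, the five bytes starting at 010B can be decoded from both entry points. The sketch below is a toy decoder in Python that knows only the three opcodes occurring in this fragment; `decode_at` is a hypothetical helper, and the leading B9h byte is the standard encoding of "mov cx,imm16" rather than something shown in the listing.

```python
# Bytes at 010B..010F: "mov cx,0c8bh" followed by "xor ch,[bx+si]".
code = bytes([0xB9, 0x8B, 0x0C, 0x32, 0x28])


def decode_at(buf, offset):
    """Decode one instruction; covers only the opcodes used in this fragment."""
    opcode = buf[offset]
    if opcode == 0xB9:  # MOV CX, imm16 (immediate stored little-endian)
        imm = buf[offset + 1] | (buf[offset + 2] << 8)
        return f"MOV CX,{imm:04X}h", 3
    if opcode == 0x8B and buf[offset + 1] == 0x0C:
        return "MOV CX,[SI]", 2
    if opcode == 0x32 and buf[offset + 1] == 0x28:
        return "XOR CH,[BX+SI]", 2
    raise ValueError(f"opcode {opcode:02X}h not covered by this sketch")


for start in (0, 1):  # "again" and "again+1"
    text, length = decode_at(code, start)
    print(f"entry at +{start}: {text}, next instruction at +{start + length}")
```

Both entry points rejoin at the same XOR instruction, so the immediate operand of the first pass serves as the opcode of every later pass.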

nidud

Re: Self modifying code
« Reply #11 on: March 01, 2016, 03:48:58 AM »
The conventional way will be hardware specific DLL-files where the application loads the appropriate ones on start-up.

If you think about it, having multiple versions of the same function and selecting one based on available hardware at runtime will create a larger EXE (at least in memory) than loading a DLL.

With regards to self-modifying code this forum is loaded with samples using VirtualProtect() to gain write access to the code segment so I wouldn’t worry too much about that  :P

Siekmanski

Re: Self modifying code
« Reply #12 on: March 01, 2016, 07:27:10 AM »
Quote
The conventional way will be hardware specific DLL-files where the application loads the appropriate ones on start-up.

If you think about it, having multiple versions of the same function and selecting one based on available hardware at runtime will create a larger EXE (at least in memory) than loading a DLL.

With regards to self-modifying code this forum is loaded with samples using VirtualProtect() to gain write access to the code segment so I wouldn’t worry too much about that  :P

 :biggrin: I just wanted to be sure out of respect for this great forum.


You need a lot of DLL files for all the possible combinations of caches and instruction sets.

http://www.sandpile.org/x86/cpuid.htm#level_0000_0002h

This is why I want to create streamlined versions from the basic routines in memory.

jj2007

Re: Self modifying code
« Reply #13 on: March 01, 2016, 02:17:19 PM »
Quote
You need a lot of DLL files for all the possible combinations of caches and instruction sets.

One should expect, for example, that gdi32.dll and gdiplus.dll look different for major families of CPUs, such as AMD / Intel i? / Intel Celeron etc. ::)

Raistlin

Re: Self modifying code
« Reply #14 on: March 01, 2016, 05:19:13 PM »
Quote
BTW, I found a way to construct the time domain decomposition table without the need for bit-reversing....
I probably shouldn't waste a post like this, BUT AWESOME! I'm all ears.
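
The thread does not say how the table is built. For reference, one well-known construction that never reverses the bits of an index is the doubling scheme sketched below; it is not claimed to be Siekmanski's method, and `bit_reversal_table` is a hypothetical name.

```python
def bit_reversal_table(n):
    """Return the bit-reversed ordering of 0..n-1, where n is a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    table = [0]
    while len(table) < n:
        # Doubling step: the doubled entries first, then the same entries plus one.
        table = [2 * x for x in table] + [2 * x + 1 for x in table]
    return table


print(bit_reversal_table(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each pass doubles the table length using only a multiply by two and an add, so a table of n entries takes log2(n) passes.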