Self-modifying code example

Started by Magnum, January 12, 2015, 04:37:25 PM


Antariy

Quote from: dedndave on January 15, 2015, 04:32:15 AM
to be honest, Alex, i don't know enough about code caches on modern CPU's to answer that
i can understand that data cache hits and misses are accounted for
but, the processors may not be designed for code that changes on the fly - lol

From my experience and analysis, you can freely rely on an x86 machine updating the prefetch queue by itself.
Actually, I'm pretty sure that the API function which flushes the code cache after a patch, and which is "strongly recommended" after every patch, was introduced at the time when the WinNT line was cross-platform and intended to run on several types of CPU, not only x86. So this API was "general advice" for the sake of "portability". That is, maybe some CPUs supported by early WinNT were not "advanced" enough to detect code changes, so using the API was enforced as a generic habit on every supported platform. It became a popular belief ("patch the code, then flush the code cache with the API"), but I think that for every x86 CPU with caching (i486+), and for sure for P6+, the CPU does this itself. And this is not really a "guess": it is the rule, it just isn't in line with the popular belief/habit of using the API.


Quote from: dedndave on January 15, 2015, 04:32:15 AM
i can tell you that it was a very real problem with 8088, 80286, and probably 80386
when the 80486 came out - it became a much more complicated prediction

Yes, I think the i386 had that problem too, and things changed after the release of a CPU with a "real" caching mechanism. No doubt about the early CPUs; I was asking about the belief that a code patch requires flushing the cache on modern CPUs. Probably that became a "tradition" whose roots are in the early WinNT architecture.

The real problem with patching a PAGE (not even the CODE) on modern CPUs is that if the patched page contains currently running code, that code will run with big stalls after every write that hits the page (no matter where the patching code is located or which thread runs it).
Maybe there are side effects too, such as code in the page next to the patched one running slower, depending on how the CPU does its caching, i.e. how far ahead it goes in fetching and decoding instructions. That is the reason why Intel recommends putting data after an unconditional jump if the data has to be placed in the code right after it: the CPU is simply redirected past the data and does not try to interpret it (if it tries at all). Maybe that kind of data could be patched without stalls, but I'm not sure and haven't checked.

But if the code needs to be patched (debugging, on-the-fly fixing, optimizing for the data, etc.), and that is not a frequent operation on frequently called time-critical code, it is usable under Windows and is legal.
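
For anyone who wants to try the "patch, then flush" sequence discussed here, below is a minimal sketch in C. It is not code from this thread. It assumes an x86 or x86-64 machine, a POSIX system that permits a writable and executable anonymous mapping, and GCC or Clang for __builtin___clear_cache. The helper name patch_demo and the six-byte stub are made up for illustration. On Windows the corresponding calls would be VirtualAlloc or VirtualProtect plus FlushInstructionCache.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* x86 / x86-64 machine code for:  mov eax, 1 ; ret
   (assumption: the demo only makes sense on an x86-family CPU) */
static const unsigned char stub[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };

typedef int (*stub_fn)(void);

/* Hypothetical helper: runs the stub, patches its immediate operand,
   then runs it again. Returns 0 on success, -1 if no page could be mapped. */
int patch_demo(int *before, int *after)
{
    const size_t len = 4096;
    unsigned char *code = mmap(NULL, len,
                               PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    assert(sizeof stub <= len);
    memcpy(code, stub, sizeof stub);
    /* The "recommended" flush after writing code. */
    __builtin___clear_cache((char *)code, (char *)code + sizeof stub);

    stub_fn fn = (stub_fn)code;
    *before = fn();                       /* executes mov eax, 1 */

    /* Patch the 32-bit immediate that follows the 0xB8 opcode byte. */
    int32_t patched = 2;
    memcpy(code + 1, &patched, sizeof patched);
    __builtin___clear_cache((char *)code, (char *)code + sizeof stub);

    *after = fn();                        /* executes mov eax, 2 */

    munmap(code, len);
    return 0;
}
```

Calling patch_demo(&b, &a) should leave b == 1 and a == 2. On x86 compilers the flush builtin typically expands to no instructions at all, which fits the point that the CPU notices the write by itself; on other architectures the same call does real work.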

FORTRANS

Hi,

   I just briefly checked the book "The Undocumented PC", and
he was trying to detect the size of the prefetch queue for a
number of CPU's.  His method using REP STOSB worked up to
the Pentium.  For the Pentium he had to use a quirk of the A20
address line as well to get the size of the queue.  The Pentium
Pro did not allow that either, so he thought that the queue would
be flushed after a write on any newer CPU.  Of course the Pentium
Pro was the newest processor at that time.

Cheers,

Steve N.

Antariy

Hi Steve

Quote from: FORTRANS on January 15, 2015, 09:29:38 AM
Hi,

   I just briefly checked the book "The Undocumented PC", and
he was trying to detect the size of the prefetch queue for a
number of CPU's.  His method using REP STOSB worked up to
the Pentium.  For the Pentium he had to use a quirk of the A20
address line as well to get the size of the queue.  The Pentium
Pro did not allow that either, so he thought that the queue would
be flushed after a write on any newer CPU.  Of course the Pentium
Pro was the newest processor at that time.

Cheers,

Steve N.

Thank you for the info!

So it's the P6+ that flushes the prefetcher for sure. But another interesting question: from what is written, it seems the tests were run in real mode? How else could the author manipulate the A20 line?

Maybe the P6+ is the first family that supports, in hardware, watching for what has changed, even in real mode (that sounds plausible because the P6 was the first real OoO CPU and the first CPU with prediction, so it obviously had architectural advances in tracking the memory/cache system). But the other question is: if similar tests are run on a P5 in protected mode with paging, will they show the same result, or will the P5, or even an earlier CPU, refill the prefetch queue too? Different addressing and operating modes may have been a reason for different behaviour. Perhaps at some point it was not too important for the designers to check what had changed: in real mode, where there is no protection, the watching logic might have looked superfluous and slow, so on early models it was turned off in real mode for speed (this is just a guess). Then with the release of the P6 there was a global change in the cache/memory subsystem, so prefetcher refilling worked in real mode too.

FORTRANS

Quote from: Antariy on January 15, 2015, 09:57:22 AM
Hi Steve

Hi Alex,

Quote

Thank you for the info!

So it's the P6+ that flushes the prefetcher for sure. But another interesting question: from what is written, it seems the tests were run in real mode? How else could the author manipulate the A20 line?

   Yes, it seems to be all real mode MS-DOS programs.  But I
don't see why it could not be protected code if he (someone)
wrote his own boot loader.

Quote
Maybe the P6+ is the first family that supports, in hardware, watching for what has changed, even in real mode (that sounds plausible because the P6 was the first real OoO CPU and the first CPU with prediction, so it obviously had architectural advances in tracking the memory/cache system).

   He says the Pentium Pro (P6) manual states that the A20 control
logic "is provided inside the CPU".  And it is external to the Pentium
(P5) on the system he used.

Quote
But the other question is: if similar tests are run on a P5 in protected mode with paging, will they show the same result, or will the P5, or even an earlier CPU, refill the prefetch queue too?

   Interesting question.  I have both P5 and P6 systems if that
helps any.  I can boot them into DOS for real mode tests.  But
van Gilluwe already did that in "The Undocumented PC".

   If I boot Windows 2000 or OS/2, I know that it is in a protected
mode, even for DOS program execution.  But I do not know what
PM features are being used.  And I assume the same for DOS
programs under Windows 98.  So a test may work or not.  But
is that all that can be inferred?

   So I can run some limited tests if you want.  Though it may take
a little time to set a system back up to use.

Cheers,

Steve N.

FORTRANS

Quote from: dedndave on January 14, 2015, 06:29:15 AM
i am actually surprised how well my old stuff works under NTVDM
only 1 program gives me troubles - and i'm not sure of the cause

Hi Dave,

   What are the problems?  Checking my memory (ha!) I have seen
the following problems.  Unsupported BIOS functions, timer related
Int15H, fn 86H, wait doesn't work when I wanted a delay.  Some
others w.r.t. memory?  I thought command line parsing was broken,
so I wrote a work around, but I can't reproduce that now.  And I tried
to stuff the keyboard buffer when asking for user input, and failed
there.  HLT doesn't work as expected.  And as you pointed out,
calling the BIOS keyboard functions caused 100% CPU usage.

   There are some other DOS versus VDM and NTVDM emulation
foibles, but I don't remember them well enough to enumerate
correctly.

Cheers,

Steve N.

dedndave

a number of my old programs revector interrupts - work fine   :P

Antariy

Hi Steve,

Quote from: FORTRANS on January 15, 2015, 11:49:39 PM
Quote

Thank you for the info!

So it's the P6+ that flushes the prefetcher for sure. But another interesting question: from what is written, it seems the tests were run in real mode? How else could the author manipulate the A20 line?

   Yes, it seems to be all real mode MS-DOS programs.  But I
don't see why it could not be protected code if he (someone)
wrote his own boot loader.

Writing a minimalistic OS just to run the test code in protected mode with paging enabled would probably be too inefficient.


Quote from: FORTRANS on January 15, 2015, 11:49:39 PM
   Interesting question.  I have both P5 and P6 systems if that
helps any.  I can boot them into DOS for real mode tests.  But
van Gilluwe already did that in "The Undocumented PC".

   If I boot Windows 2000 or OS/2, I know that it is in a protected
mode, even for DOS program execution.  But I do not know what
PM features are being used.  And I assume the same for DOS
programs under Windows 98.  So a test may work or not.  But
is that all that can be inferred?

   So I can run some limited tests if you want.  Though it may take
a little time to set a system back up to use.

Yes, I've prepared two tests; if you'd like, run them when you fire your systems up: http://masm32.com/board/index.php?topic=3960
I've described in that thread what each of them displays.