Prefetch stalls and queue length tests

Started by Antariy, January 16, 2015, 06:47:37 AM

FORTRANS

#15
Hi Alex,

   Things were set up better than I thought.  So it took less time
than I thought.  Tried to run on four systems.  Did not work well
on Windows 8.1.  Either Windows UAC or the antivirus hated it.
Cut and paste results from run.bat in the error message boxes.
Ended up locking up the Command Prompt once.  And the second
program did not seem to run.

P-III, Windows 2000

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 222648314


Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 124562973
Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 232035935
Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 124891490

P-MMX, Windows 98

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 309385257

Test with the code from the same page but patching the data in the other section

Press a key after number appears
cycles for SEC1: 116441193

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 242751843

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 120369630

Pentium(R) M, Windows XP

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 239035281

Test with the code from the same page but patching the data in the other section

Press a key after number appears
cycles for SEC1: 129453196

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 226025484

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 128404990

i3, Windows 8.1 (Results may be bad)

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 157071075

Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 138518480

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 140490952

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 138764775


HTH,

Steve N.

Edit:  Fixed label for P-MMX to Windows 98.

SRN

jj2007

Tests on i5:
cycles for SEC1: 140379273
cycles for SEC1: 109848725
cycles for SEC2: 138419402
cycles for SEC2: 134287359

Thanks to Dave for revealing the secret of the hanging application :bgrin:

Antariy

Hi Steve!

Quote from: FORTRANS on January 16, 2015, 09:19:54 AM
   Things were set up better than I thought.  So it took less time
than I thought.  Tried to run on four systems.  Did not work well
on Windows 8.1.  Either Windows UAC or the antivirus hated it.
Cut and paste results from run.bat in the error message boxes.
Ended up locking up the Command Prompt once.  And the second
program did not seem to run.

Thank you for the comprehensive tests :t As usual, it is interesting how differently the CPUs behave.

PIII is obviously affected by a patch near the running code, but the second program did not determine the queue depth in any of the tests, so either the queue is too short or, more likely, the CPU simply knows that the patched part isn't the code being run.
PMMX (you mean Win98 there?) is also affected by a patch near the running code, and the stall is higher than on the PIII, but still only ~3 cycles per patched dword. That is probably due to short prefetching rather than advanced prefetching/decoding.
Pentium M is affected too, and both it and the PIII show the effect whenever an executable page is changed, even when the patching code is not near the patched place, so the stalls probably come mostly from the CPU checking what was actually patched rather than from a prefetcher refill. Is this model of Pentium M based on PIII code?
i7 is affected too, but only very slightly; more or less, the CPU probably just knows that the patched place isn't the code being run.

As a side note: it seems that the PIV, particularly the Prescotts, are the slowest CPUs, with TOOOOO deep pipelines, and the prefetching logic was really crude for such long pipelines: the CPU doesn't actually know what was patched, it just brutally refills the queue using a fixed-size length check, and the refill is very slow, more than 50 cycles for one patch.
Also, the second app probably needs to be rewritten a bit, so that it patches the code that is actually executing, to show the stalls more precisely on every CPU.
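
To put numbers on the comparison above, the cost of patching a code page can be expressed as the ratio of the code-patch timing to the data-patch timing in each section. The following is only a rough Python sketch built from the cycle counts posted in this thread; the dictionary layout and the helper name `overhead` are illustrative, and the i5 entry assumes that the four posted lines are in the same order as the four tests.

```python
# Cycle counts as posted in this thread, per CPU:
# (SEC1 code patch, SEC1 data patch, SEC2 code patch, SEC2 data patch)
results = {
    "P-III, Windows 2000":   (222648314, 124562973, 232035935, 124891490),
    "P-MMX, Windows 98":     (309385257, 116441193, 242751843, 120369630),
    "Pentium M, Windows XP": (239035281, 129453196, 226025484, 128404990),
    "i3, Windows 8.1":       (157071075, 138518480, 140490952, 138764775),
    # Assumption: jj2007's four lines follow the same test order.
    "i5":                    (140379273, 109848725, 138419402, 134287359),
}


def overhead(code_cycles, data_cycles):
    """Ratio of the code-patching run to the data-patching run."""
    return code_cycles / data_cycles


for cpu, (s1_code, s1_data, s2_code, s2_data) in results.items():
    sec1 = overhead(s1_code, s1_data)
    sec2 = overhead(s2_code, s2_data)
    print(f"{cpu:22s} SEC1 x{sec1:.2f}  SEC2 x{sec2:.2f}")
```

On these figures the P-III, P-MMX and Pentium M all come out above 1.7x in both sections, while the i3 and i5 stay below 1.3x (keeping in mind that the i3 run was flagged as possibly bad).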

Antariy

Hi Jochen!

Quote from: jj2007 on January 16, 2015, 04:59:10 PM
Tests on i5:
cycles for SEC1: 140379273
cycles for SEC1: 109848725
cycles for SEC2: 138419402
cycles for SEC2: 134287359

Thank you for the test! :t

And probably no differences in timings dump with the second program, as with other modern CPUs?

jj2007

#19
Quote from: Antariy on January 17, 2015, 06:56:57 AM
And probably no differences in timings dump with the second program, as with other modern CPUs?

Hi Alex,

Attached i5 and Celeron M results for the other program.

FORTRANS

Hi Alex,

Quote from: Antariy on January 17, 2015, 06:54:16 AM
Thank you for the comprehensive tests :t As usual, it is interesting how differently the CPUs behave.

   You're welcome.  Glad you got some useful information.

Quote
PMMX (you mean Win98 there?)

   Yes.  P-MMX is running Windows 98.  Thanks for pointing out
the error.

Quote
This model of Pentium M based on PIII code?

   I believe both are family 6 processors.  Here are some trimmed
results for the first three reported above.  That program won't
run on Win 8.1 64-bit.

[Pentium III]

This processor: GenuineIntel
Processor Signature: 00000683h
Family Data: 006h
Model Data : 08h
Stepping : 3h

Maximum CPUID Standard and Extended Functions:
CPUID.(EAX=00h):EAX: 02h
CPUID.(EAX=80000000h):EAX: 03020101h

[Pentium M]

This processor: GenuineIntel
Brand String: Intel(R) Pentium(R) M processor 1.70GHz
Processor Signature: 000006D6h
Family Data: 006h
Model Data : 0Dh
Stepping : 6h

Maximum CPUID Standard and Extended Functions:
CPUID.(EAX=00h):EAX: 02h
CPUID.(EAX=80000000h):EAX: 80000004h

[Pentium MMX]

This processor: GenuineIntel Pentium(R)
Processor Signature: 00000543h
Family Data: 005h
Model Data : 04h
Stepping : 3h

Maximum CPUID Standard and Extended Functions:
CPUID.(EAX=00h):EAX: 01h
CPUID.(EAX=80000000h):EAX: 00000000h
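
For reference, the Family/Model/Stepping values in the listings above can be recovered from the processor signature with a few shifts and masks. The sketch below is a minimal Python illustration, not the tool that produced the listings; the function names are made up for this example, and the bit layout is the usual Intel one for the EAX value returned by CPUID leaf 1. The second helper treats any maximum extended leaf below 80000004h as "no brand string", which is an assumption that happens to match the three listings.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family = base_family
    model = base_model
    # The extended family is only added when the base family is 0Fh.
    if base_family == 0xF:
        family += ext_family
    # The extended model only applies to base families 06h and 0Fh.
    if base_family in (0x6, 0xF):
        model |= ext_model << 4
    return family, model, stepping


def has_brand_string(max_ext_leaf):
    """The brand string needs extended leaves 80000002h..80000004h.

    Assumption: a value below 80000004h means it is not available.
    """
    return max_ext_leaf >= 0x80000004


# (signature, EAX returned for CPUID.(EAX=80000000h)) from the listings
cpus = {
    "Pentium III": (0x00000683, 0x03020101),
    "Pentium M":   (0x000006D6, 0x80000004),
    "Pentium MMX": (0x00000543, 0x00000000),
}

for name, (signature, max_ext) in cpus.items():
    family, model, stepping = decode_signature(signature)
    print(f"{name}: family {family:03X}h, model {model:02X}h, "
          f"stepping {stepping:X}h, brand string: {has_brand_string(max_ext)}")
```

Only the Pentium M reports 80000004h as its maximum extended function, which is consistent with it being the only one of the three listings that shows a Brand String line.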


Regards,

Steve N.

Antariy

Quote from: jj2007 on January 17, 2015, 08:36:14 AM
Quote from: Antariy on January 17, 2015, 06:56:57 AM
And probably no differences in timings dump with the second program, as with other modern CPUs?
Attached i5 and Celeron M results for the other program.

Thank you, Jochen! :t Yes, the second program shows no difference in timings depending on where the patch is placed. The Celeron M, though, seems to dislike code patching a bit, regardless of where the patch is applied.

Antariy

Hi Steve!

Quote from: FORTRANS on January 17, 2015, 09:28:07 AM
Quote
This model of Pentium M based on PIII code?

   I believe both are family 6 processors.  Here are some trimmed
results for the first three reported above.  That program won't
run on Win 8.1 64-bit.

I mistyped the word "code": I meant PIII core, but I think you understood that correctly :t Yes, I thought that this Pentium M model was based on the PIII core simply because the two CPUs behave very similarly. The Pentium M (and Celeron M) were derivatives of the desktop cores (different families), further optimized for performance and power saving, and this similarity gives some hints about the core.