Author Topic: Prefetch stalls and queue length tests  (Read 5444 times)

FORTRANS

  • Member
  • ****
  • Posts: 945
Re: Prefetch stalls and queue length tests
« Reply #15 on: January 16, 2015, 09:19:54 AM »
Hi Alex,

   Things were set up better than I thought.  So it took less time
than I thought.  Tried to run on four systems.  Did not work well
on Windows 8.1.  Either Windows UAC or the antivirus hated it.
Cut and paste results from run.bat in the error message boxes.
Ended up locking up the Command Prompt once.  And the second
program did not seem to run.

Code:
P-III, Windows 2000

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 222648314

Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 124562973

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 232035935

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 124891490

P-MMX, Windows 98

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 309385257

Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 116441193

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 242751843

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 120369630

Pentium(R) M, Windows XP

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 239035281

Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 129453196

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 226025484

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 128404990

i3, Windows 8.1 (Results may be bad)

Test with code self-patch the page where it runs
Press a key after number appears
cycles for SEC1: 157071075

Test with the code from the same page but patching the data in the other section
Press a key after number appears
cycles for SEC1: 138518480

Test with call to the external page which patches the other page with the code
Press a key after number appears
cycles for SEC2: 140490952

Test with call to the external page which patches the data from the other page
Press a key after number appears to EXIT
cycles for SEC2: 138764775

HTH,

Steve N.

Edit:  Fixed label for P-MMX to Windows 98.

SRN
« Last Edit: January 17, 2015, 09:11:13 AM by FORTRANS »

jj2007

  • Member
  • *****
  • Posts: 7552
  • Assembler is fun ;-)
    • MasmBasic
Re: Prefetch stalls and queue length tests
« Reply #16 on: January 16, 2015, 04:59:10 PM »
Tests on i5:
cycles for SEC1: 140379273
cycles for SEC1: 109848725
cycles for SEC2: 138419402
cycles for SEC2: 134287359

Thanks to Dave for revealing the secret of the hanging application :bgrin:

Antariy

  • Member
  • ****
  • Posts: 541
Re: Prefetch stalls and queue length tests
« Reply #17 on: January 17, 2015, 06:54:16 AM »
Hi Steve!

Quote
   Things were set up better than I thought.  So it took less time
than I thought.  Tried to run on four systems.  Did not work well
on Windows 8.1.  Either Windows UAC or the antivirus hated it.
Cut and paste results from run.bat in the error message boxes.
Ended up locking up the Command Prompt once.  And the second
program did not seem to run.

Thank you for the comprehensive tests :t As usual, it is interesting how differently the CPUs behave.

PIII clearly shows the effect of a patch near the running code. The second program, though, did not detect any queue depth in any test, so either the queue is too short or, more likely, the CPU simply knows that the patched location is not the code currently running.
PMMX (you mean Win98 there?) also shows the effect of a patch near the running code, and the stall is larger than on the PIII, but still only ~3 cycles per patched dword. That is probably due to a short prefetch queue rather than to advanced prefetching/decoding.
Pentium M shows the effect too. Both it and the PIII are also affected whenever an executable page is modified, even when the patching code is not near the patched location, so the stalls probably come mostly from the CPU checking what was actually patched rather than from a prefetcher refill. This model of Pentium M based on PIII code?
i3 shows the effect too, but it is very small; more or less modern CPUs probably just know that the patched location is not the code currently running.

As a side note: it seems that the PIV, particularly the Prescotts, are the slowest CPUs here, with TOOOOO deep pipelines, and their prefetch logic was really crude for such long pipelines. The CPU does not actually know what was patched; it just brutally refills the queue after a fixed-length check, and the refill is very slow, more than 50 cycles for one patch.
Also, the second app probably needs to be rewritten a bit so that it patches the code that is actually executing, to show the stalls more precisely on every CPU.
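
For readers who want to compare the posted numbers directly, here is a minimal Python sketch that computes, for each CPU, how many extra cycles the code-patching run took over the matching data-patching run, and the ratio between them. The cycle counts are copied from the results posted in this thread; the function and variable names are made up for illustration, and the i5 figures are assumed to be listed in the same test order as the labelled runs. The number of patches per run is not stated in the thread, so the sketch does not try to derive a per-patch cost.

```python
# Cycle counts copied from the results posted in this thread.
# Tuple order: (SEC1 code patch, SEC1 data patch, SEC2 code patch, SEC2 data patch)
# Assumption: the unlabelled i5 lines follow the same test order as the others.
RESULTS = {
    "P-III": (222648314, 124562973, 232035935, 124891490),
    "P-MMX": (309385257, 116441193, 242751843, 120369630),
    "Pentium M": (239035281, 129453196, 226025484, 128404990),
    "i3": (157071075, 138518480, 140490952, 138764775),
    "i5": (140379273, 109848725, 138419402, 134287359),
}


def patch_penalty(code_cycles, data_cycles):
    """Return (extra cycles, ratio) of a code-patching run over its data-patching twin."""
    return code_cycles - data_cycles, code_cycles / data_cycles


def summarize(results):
    """Map each CPU name to the SEC1 and SEC2 penalties."""
    summary = {}
    for cpu, (s1_code, s1_data, s2_code, s2_data) in results.items():
        summary[cpu] = {
            "SEC1": patch_penalty(s1_code, s1_data),
            "SEC2": patch_penalty(s2_code, s2_data),
        }
    return summary


if __name__ == "__main__":
    for cpu, rows in summarize(RESULTS).items():
        for section, (extra, ratio) in rows.items():
            print(f"{cpu:10s} {section}: +{extra:>11,d} cycles, x{ratio:.2f}")
```

The ratio is the more portable figure, since the machines run at different clock speeds and the absolute cycle counts are not directly comparable across them.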

Antariy

  • Member
  • ****
  • Posts: 541
Re: Prefetch stalls and queue length tests
« Reply #18 on: January 17, 2015, 06:56:57 AM »
Hi Jochen!

Quote
Tests on i5:
cycles for SEC1: 140379273
cycles for SEC1: 109848725
cycles for SEC2: 138419402
cycles for SEC2: 134287359

Thank you for the test! :t

And there are probably no differences in the timings dump from the second program, as with other modern CPUs?

jj2007

  • Member
  • *****
  • Posts: 7552
  • Assembler is fun ;-)
    • MasmBasic
Re: Prefetch stalls and queue length tests
« Reply #19 on: January 17, 2015, 08:36:14 AM »
Quote
And there are probably no differences in the timings dump from the second program, as with other modern CPUs?

Hi Alex,

Attached i5 and Celeron M results for the other program.
« Last Edit: January 17, 2015, 10:06:40 AM by jj2007 »

FORTRANS

  • Member
  • ****
  • Posts: 945
Re: Prefetch stalls and queue length tests
« Reply #20 on: January 17, 2015, 09:28:07 AM »
Hi Alex,

Quote
Thank you for the comprehensive tests :t As usual, it is interesting how differently the CPUs behave.

   You're welcome.  Glad you got some useful information.

Quote
PMMX (you mean Win98 there?)

   Yes.  P-MMX is running Windows 98.  Thanks for pointing out
the error.

Quote
This model of Pentium M based on PIII code?

   I believe both are family 6 processors.  Here are some trimmed
results for the first three reported above.  That program won't
run on Win 8.1 64-bit.

Code:
[Pentium III]

This processor: GenuineIntel
Processor Signature: 00000683h
 Family Data: 006h
 Model Data : 08h
 Stepping : 3h

Maximum CPUID Standard and Extended Functions:
 CPUID.(EAX=00h):EAX: 02h
 CPUID.(EAX=80000000h):EAX: 03020101h

[Pentium M]

This processor: GenuineIntel
 Brand String: Intel(R) Pentium(R) M processor 1.70GHz
Processor Signature: 000006D6h
 Family Data: 006h
 Model Data : 0Dh
 Stepping : 6h

Maximum CPUID Standard and Extended Functions:
 CPUID.(EAX=00h):EAX: 02h
 CPUID.(EAX=80000000h):EAX: 80000004h

[Pentium MMX]

This processor: GenuineIntel Pentium(R)
Processor Signature: 00000543h
 Family Data: 005h
 Model Data : 04h
 Stepping : 3h

Maximum CPUID Standard and Extended Functions:
 CPUID.(EAX=00h):EAX: 01h
 CPUID.(EAX=80000000h):EAX: 00000000h

Regards,

Steve N.
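
The family, model and stepping lines in the dumps above can be recovered from the Processor Signature value itself. Below is a minimal Python sketch of that decoding, following the bit layout Intel documents for CPUID leaf 1 EAX (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20). The function name is made up for illustration, and the three signature values are the ones posted above.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0Fh.
    display_family = family + ext_family if family == 0xF else family
    # The extended model only applies to families 06h and 0Fh.
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return display_family, display_model, stepping


if __name__ == "__main__":
    # Signature values copied from the dumps posted above.
    for name, sig in (("Pentium III", 0x00000683),
                      ("Pentium M", 0x000006D6),
                      ("Pentium MMX", 0x00000543)):
        fam, mod, step = decode_signature(sig)
        print(f"{name:12s} family {fam:02X}h  model {mod:02X}h  stepping {step:X}h")
```

Both the Pentium III and the Pentium M decode to family 6, while the Pentium MMX decodes to family 5, which matches the Family Data lines in the dumps.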

Antariy

  • Member
  • ****
  • Posts: 541
Re: Prefetch stalls and queue length tests
« Reply #21 on: January 18, 2015, 07:18:56 AM »
Quote
And there are probably no differences in the timings dump from the second program, as with other modern CPUs?
Attached i5 and Celeron M results for the other program.

Thank you, Jochen! :t Yes, with the second program the timings do not depend on where the patch is made. The Celeron M, though, seems to dislike code patching a bit, regardless of where the patch is.

Antariy

  • Member
  • ****
  • Posts: 541
Re: Prefetch stalls and queue length tests
« Reply #22 on: January 18, 2015, 07:27:48 AM »
Hi Steve!

Quote
This model of Pentium M based on PIII code?

Quote
   I believe both are family 6 processors.  Here are some trimmed
results for the first three reported above.  That program won't
run on Win 8.1 64-bit.

I mistyped the word "code"; I meant the PIII core, but I think you understood that correctly :t Yes, I guessed that this Pentium M model is based on the PIII core simply because the two CPUs behave so similarly. The Pentium M (and Celeron M) were derivatives of the desktop cores (from different families), further optimized for performance and power saving, so this similarity hints at which core it is.