Code location sensitivity of timings

nidud · July 12, 2014, 09:15:44 PM

From Wayback Machine, this thread from before nidud's mass deletion of all of his posts
niduds deleted posts from this thread
Unfortunately the zip files were not archived (by Wayback Machine) and will not work.

dedndave · July 12, 2014, 10:06:49 PM

i suppose you could use VirtualProtect to allow writes in the .CODE section
then, copy the code under test into a common code address space before executing it

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898%28v=vs.85%29.aspx

to help speed testing up a little....
you could use different counter_begin loop count values for the short and long tests
i try to select a loop count that yields about 0.5 seconds per pass

dedndave · July 12, 2014, 10:11:45 PM

prescott w/htt - xp sp3
unaligned

Code Select

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
4925905 cycles - 2048..4096  (164) A memcpy SSE 16
4933332 cycles - 2048..4096  (164) A memcpy SSE 16
4953203 cycles - 2048..4096  (164) A memcpy SSE 16
4963909 cycles - 2048..4096  (164) A memcpy SSE 16
4923198 cycles - 2048..4096  (164) A memcpy SSE 16
4941277 cycles - 2048..4096  (164) A memcpy SSE 16

11502669        cycles - 2048..4096  (164) U memcpy SSE 16
11487135        cycles - 2048..4096  (164) U memcpy SSE 16
11564951        cycles - 2048..4096  (164) U memcpy SSE 16
11570118        cycles - 2048..4096  (164) U memcpy SSE 16
11497558        cycles - 2048..4096  (164) U memcpy SSE 16
11526087        cycles - 2048..4096  (164) U memcpy SSE 16

aligned

Code Select

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
4935114 cycles - 2048..4096  (164) A memcpy SSE 16
4934727 cycles - 2048..4096  (164) A memcpy SSE 16
4942523 cycles - 2048..4096  (164) A memcpy SSE 16
4924574 cycles - 2048..4096  (164) A memcpy SSE 16
4924658 cycles - 2048..4096  (164) A memcpy SSE 16
4937763 cycles - 2048..4096  (164) A memcpy SSE 16

11490869        cycles - 2048..4096  (164) U memcpy SSE 16
11481780        cycles - 2048..4096  (164) U memcpy SSE 16
11616596        cycles - 2048..4096  (164) U memcpy SSE 16
11530420        cycles - 2048..4096  (164) U memcpy SSE 16
11488318        cycles - 2048..4096  (164) U memcpy SSE 16
11504392        cycles - 2048..4096  (164) U memcpy SSE 16

nidud · July 12, 2014, 11:04:51 PM

deleted

dedndave · July 12, 2014, 11:21:46 PM

hmmm - that seems wrong
i thought you had to be in a code section to assemble instructions
guess i've never tried it - lol

but - there is nothing to prevent you from putting a proc in the code section and copying it to another address
so long as you allow writes into the affected pages with VirtualProtect and PAGE_EXECUTE_READWRITE

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786%28v=vs.85%29.aspx

nidud · July 13, 2014, 02:28:37 AM

deleted

dedndave · July 13, 2014, 03:03:15 AM

i hadn't thought about that
but - most win32 CALL's are near relative
so, you'd have to translate the target addresses

but - for testing algorithm code that doesn't make any calls, like loops, etc,
the branch addresses are relative, but the target moves with the code :P

if you wanted to use calls or invokes in movable code,
you could store the branch address and use CALL DWORD PTR lpfnFunction
or MOV EAX,Function and CALL EAX

nidud · July 13, 2014, 04:10:22 AM

deleted

nidud · July 14, 2014, 09:39:03 PM

deleted

nidud · July 14, 2014, 10:09:22 PM

deleted

dedndave · July 15, 2014, 02:10:35 AM

hard to test individual instructions
so much relies on the surrounding code

nidud · July 15, 2014, 05:46:29 AM

deleted

nidud · July 16, 2014, 08:15:11 AM

deleted

nidud · July 20, 2014, 11:43:43 PM

deleted

dedndave · July 21, 2014, 04:40:44 AM

those tests run way too fast to get reliable readings

for best results:
1) bind to a single core
2) wait about 750 mS before performing any tests - this allows the system to settle
3) adjust individual loop counts so that each test pass takes about 0.5 seconds

Code Select

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
STRCHR-------------------------------------------
170724  cycles - (  0) 0: crt_strchr
172046  cycles - ( 29) 1: x
70540   cycles - (119) 2: 'c'
62934   cycles - (107) 3: 'cccc'

170343  cycles - (  0) 0: crt_strchr
167063  cycles - ( 29) 1: x
89640   cycles - (119) 2: 'c'
82154   cycles - (107) 3: 'cccc'

240068  cycles - (  0) 0: crt_strchr
87783   cycles - ( 29) 1: x
25549   cycles - (119) 2: 'c'
23995   cycles - (107) 3: 'cccc'
--- ok ---

H:\nidudString\string\strchr => strchr

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
STRCHR-------------------------------------------
209073  cycles - (  0) 0: crt_strchr
221060  cycles - ( 29) 1: x
80107   cycles - (119) 2: 'c'
83854   cycles - (107) 3: 'cccc'

198648  cycles - (  0) 0: crt_strchr
211263  cycles - ( 29) 1: x
106992  cycles - (119) 2: 'c'
96168   cycles - (107) 3: 'cccc'

253182  cycles - (  0) 0: crt_strchr
84552   cycles - ( 29) 1: x
26531   cycles - (119) 2: 'c'
36056   cycles - (107) 3: 'cccc'

they're all over the place :P

The MASM Forum

News:

Code location sensitivity of timings

nidud

dedndave

dedndave

nidud

dedndave

nidud

dedndave

nidud

nidud

nidud

dedndave

nidud

nidud

nidud

dedndave