Author Topic: Code location sensitivity of timings  (Read 41608 times)

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Code location sensitivity of timings
« on: July 12, 2014, 09:15:44 PM »
deleted
« Last Edit: February 25, 2022, 08:23:35 AM by nidud »

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: Code location sensitivity of timings
« Reply #1 on: July 12, 2014, 10:06:49 PM »
i suppose you could use VirtualProtect to allow writes in the .CODE section
then, copy the code under test into a common code address space before executing it

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898%28v=vs.85%29.aspx

to help speed testing up a little....
you could use different counter_begin loop count values for the short and long tests
i try to select a loop count that yields about 0.5 seconds per pass
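
A rough sketch of that loop-count tuning in C, not the actual counter_begin macro: time a trial pass, then scale the count so a full pass lands near the 0.5 second target. The names short_test, time_pass, scale_loops and calibrate_loops are made up for illustration, and clock() only stands in for whatever timer the real harness uses.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void (*test_fn)(void);

/* Hypothetical stand-in for a very short test body. */
volatile uint32_t sink;
void short_test(void) { sink += 1; }

/* Seconds taken by `loops` back-to-back calls of fn.
   clock() is an assumption; a real harness would likely use RDTSC or
   QueryPerformanceCounter instead. */
double time_pass(test_fn fn, uint64_t loops)
{
    clock_t start = clock();
    for (uint64_t i = 0; i < loops; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Scale a measured loop count so one pass takes about target_seconds.
   Returns the count unchanged if the trial was too short to measure. */
uint64_t scale_loops(uint64_t loops, double elapsed, double target_seconds)
{
    if (elapsed <= 0.0)
        return loops;
    uint64_t scaled = (uint64_t)((double)loops * target_seconds / elapsed);
    return scaled ? scaled : 1;
}

/* Double the count until a trial pass is long enough to time, then scale. */
uint64_t calibrate_loops(test_fn fn, double target_seconds)
{
    assert(target_seconds > 0.0);
    uint64_t loops = 1;
    double elapsed = time_pass(fn, loops);
    while (elapsed < target_seconds / 10.0 && loops < (UINT64_C(1) << 40)) {
        loops *= 2;
        elapsed = time_pass(fn, loops);
    }
    return scale_loops(loops, elapsed, target_seconds);
}
```

Calling calibrate_loops(fn, 0.5) once per test gives the short and long tests separate counts, so neither finishes too quickly to measure nor drags the whole run out.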

dedndave

Re: Code location sensitivity of timings
« Reply #2 on: July 12, 2014, 10:11:45 PM »
prescott w/htt - xp sp3
unaligned
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
4925905 cycles - 2048..4096  (164) A memcpy SSE 16
4933332 cycles - 2048..4096  (164) A memcpy SSE 16
4953203 cycles - 2048..4096  (164) A memcpy SSE 16
4963909 cycles - 2048..4096  (164) A memcpy SSE 16
4923198 cycles - 2048..4096  (164) A memcpy SSE 16
4941277 cycles - 2048..4096  (164) A memcpy SSE 16

11502669 cycles - 2048..4096  (164) U memcpy SSE 16
11487135 cycles - 2048..4096  (164) U memcpy SSE 16
11564951 cycles - 2048..4096  (164) U memcpy SSE 16
11570118 cycles - 2048..4096  (164) U memcpy SSE 16
11497558 cycles - 2048..4096  (164) U memcpy SSE 16
11526087 cycles - 2048..4096  (164) U memcpy SSE 16

aligned
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
4935114 cycles - 2048..4096  (164) A memcpy SSE 16
4934727 cycles - 2048..4096  (164) A memcpy SSE 16
4942523 cycles - 2048..4096  (164) A memcpy SSE 16
4924574 cycles - 2048..4096  (164) A memcpy SSE 16
4924658 cycles - 2048..4096  (164) A memcpy SSE 16
4937763 cycles - 2048..4096  (164) A memcpy SSE 16

11490869 cycles - 2048..4096  (164) U memcpy SSE 16
11481780 cycles - 2048..4096  (164) U memcpy SSE 16
11616596 cycles - 2048..4096  (164) U memcpy SSE 16
11530420 cycles - 2048..4096  (164) U memcpy SSE 16
11488318 cycles - 2048..4096  (164) U memcpy SSE 16
11504392 cycles - 2048..4096  (164) U memcpy SSE 16

nidud

Re: Code location sensitivity of timings
« Reply #3 on: July 12, 2014, 11:04:51 PM »
deleted
« Last Edit: February 25, 2022, 08:23:51 AM by nidud »

dedndave

Re: Code location sensitivity of timings
« Reply #4 on: July 12, 2014, 11:21:46 PM »
hmmm - that seems wrong
i thought you had to be in a code section to assemble instructions
guess i've never tried it - lol

but - there is nothing to prevent you from putting a proc in the code section and copying it to another address
so long as you allow writes into the affected pages with VirtualProtect and PAGE_EXECUTE_READWRITE

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786%28v=vs.85%29.aspx

nidud

Re: Code location sensitivity of timings
« Reply #5 on: July 13, 2014, 02:28:37 AM »
deleted
« Last Edit: February 25, 2022, 08:24:03 AM by nidud »

dedndave

Re: Code location sensitivity of timings
« Reply #6 on: July 13, 2014, 03:03:15 AM »
i hadn't thought about that
but - most win32 CALLs are near relative
so, you'd have to translate the target addresses

but - for testing algorithm code that doesn't make any calls, like loops, etc,
the branch addresses are relative, but the target moves with the code   :P

if you wanted to use calls or invokes in movable code,
you could store the branch address and use CALL DWORD PTR lpfnFunction
or MOV EAX,Function and CALL EAX
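
To make the "translate the target addresses" part concrete, here is a small C sketch of the arithmetic for a near relative CALL (opcode E8 followed by a 32-bit displacement measured from the end of the instruction). It only patches a byte buffer and assumes you already know the offset of each CALL in the copied code; an E8 byte can also turn up inside other instructions or data, so scanning for it blindly is not safe. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value, the way x86 stores the rel32 field. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian byte order. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Absolute target of the E8 CALL at code[offset], given that code[0]
   is located at 32-bit address `base`. The CALL is 5 bytes long. */
uint32_t call_target(const uint8_t *code, size_t offset, uint32_t base)
{
    assert(code[offset] == 0xE8);
    return base + (uint32_t)offset + 5u + load_le32(code + offset + 1);
}

/* After copying code from old_base to new_base, adjust one CALL so it
   still reaches the same absolute target. Returns 0 if the byte at
   `offset` is not an E8 opcode. Arithmetic wraps modulo 2^32. */
int fix_call_rel32(uint8_t *copy, size_t offset,
                   uint32_t old_base, uint32_t new_base)
{
    if (copy[offset] != 0xE8)
        return 0;
    uint32_t rel = load_le32(copy + offset + 1);
    store_le32(copy + offset + 1, rel + (old_base - new_base));
    return 1;
}
```

Branches whose targets lie inside the copied block need no such fix, since source and target move by the same amount. The indirect forms (CALL DWORD PTR or CALL EAX) avoid the fixup entirely because they carry an absolute address.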

nidud

Re: Code location sensitivity of timings
« Reply #7 on: July 13, 2014, 04:10:22 AM »
deleted
« Last Edit: February 25, 2022, 08:24:17 AM by nidud »

nidud

Re: Code location sensitivity of timings
« Reply #8 on: July 14, 2014, 09:39:03 PM »
deleted
« Last Edit: February 25, 2022, 08:24:31 AM by nidud »

nidud

Re: Code location sensitivity of timings
« Reply #9 on: July 14, 2014, 10:09:22 PM »
deleted
« Last Edit: February 25, 2022, 08:24:46 AM by nidud »

dedndave

Re: Code location sensitivity of timings
« Reply #10 on: July 15, 2014, 02:10:35 AM »
hard to test individual instructions
so much relies on the surrounding code

nidud

Re: Code location sensitivity of timings
« Reply #11 on: July 15, 2014, 05:46:29 AM »
deleted
« Last Edit: February 25, 2022, 08:25:00 AM by nidud »

nidud

Re: Code location sensitivity of timings
« Reply #12 on: July 16, 2014, 08:15:11 AM »
deleted
« Last Edit: February 25, 2022, 08:25:15 AM by nidud »

nidud

Re: Code location sensitivity of timings
« Reply #13 on: July 20, 2014, 11:43:43 PM »
deleted
« Last Edit: February 25, 2022, 08:25:30 AM by nidud »

dedndave

Re: Code location sensitivity of timings
« Reply #14 on: July 21, 2014, 04:40:44 AM »
those tests run way too fast to get reliable readings

for best results:
1) bind to a single core
2) wait about 750 ms before performing any tests - this allows the system to settle
3) adjust individual loop counts so that each test pass takes about 0.5 seconds

Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
STRCHR-------------------------------------------
170724  cycles - (  0) 0: crt_strchr
172046  cycles - ( 29) 1: x
70540   cycles - (119) 2: 'c'
62934   cycles - (107) 3: 'cccc'

170343  cycles - (  0) 0: crt_strchr
167063  cycles - ( 29) 1: x
89640   cycles - (119) 2: 'c'
82154   cycles - (107) 3: 'cccc'

240068  cycles - (  0) 0: crt_strchr
87783   cycles - ( 29) 1: x
25549   cycles - (119) 2: 'c'
23995   cycles - (107) 3: 'cccc'
--- ok ---

H:\nidudString\string\strchr => strchr

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
STRCHR-------------------------------------------
209073  cycles - (  0) 0: crt_strchr
221060  cycles - ( 29) 1: x
80107   cycles - (119) 2: 'c'
83854   cycles - (107) 3: 'cccc'

198648  cycles - (  0) 0: crt_strchr
211263  cycles - ( 29) 1: x
106992  cycles - (119) 2: 'c'
96168   cycles - (107) 3: 'cccc'

253182  cycles - (  0) 0: crt_strchr
84552   cycles - ( 29) 1: x
26531   cycles - (119) 2: 'c'
36056   cycles - (107) 3: 'cccc'

they're all over the place   :P