Optimizing some code

Started by RuiLoureiro, June 10, 2014, 06:54:45 PM

Gunther

Hi nidud,

results for strlen4:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
18212   cycles - 0: standard (scasb)
1871    cycles - 1: AgnerFog
1962    cycles - 2: AgnerFog (unaligned)
2789    cycles - 3: Dave

11414   cycles - 0: standard (scasb)
4372    cycles - 1: AgnerFog
4257    cycles - 2: AgnerFog (unaligned)
6608    cycles - 3: Dave

17074   cycles - 0: standard (scasb)
4358    cycles - 1: AgnerFog
4204    cycles - 2: AgnerFog (unaligned)
6618    cycles - 3: Dave

--- ok ---


Gunther
You have to know the facts before you can distort them.
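
For readers who do not have the test attachment at hand: the gap between the scasb line and the SSE lines in listings like the one above is what one would expect when a routine examines 16 bytes per step instead of one. The following is only a minimal C sketch of that general idea using SSE2 intrinsics. It is not the code of any routine timed in this thread; the names strlen_bytewise, lowest_set_bit and strlen_sse2_sketch are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time reference: the same result a repne scasb scan produces. */
size_t strlen_bytewise(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* Index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(unsigned mask)
{
    unsigned n = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++n;
    }
    return n;
}

/* Illustrative sketch (assumption, not a routine from the thread):
   scan 16 bytes per step using aligned loads only. An aligned 16-byte
   load never crosses a page boundary, which is why reading a few bytes
   outside the string does not fault in practice, although it is outside
   what the C standard guarantees. */
size_t strlen_sse2_sketch(const char *s)
{
    assert(s != NULL);
    const __m128i zero = _mm_setzero_si128();
    const unsigned offset = (unsigned)((uintptr_t)s & 15u);
    const char *block = s - offset;          /* round down to 16 bytes */

    /* First block: compare all 16 bytes against zero, then shift out
       the bits that belong to bytes located before s. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));
    mask >>= offset;
    if (mask != 0)
        return lowest_set_bit(mask);

    /* Remaining blocks are fully inside the string or contain its end. */
    for (;;) {
        block += 16;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));
        if (mask != 0)
            return (size_t)(block - s) + lowest_set_bit(mask);
    }
}
```

Both functions return the same length for any valid string; only the number of bytes examined per step differs. The bit scan is written as a plain loop to stay compiler-neutral; an assembly version would normally use bsf instead.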

FORTRANS

Hi,

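   The first time the program is run, the first test (standard scasb) is
   very slow. Later passes, and the whole second run, settle at roughly
   11,700 to 11,800 cycles.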

Regards,

Steve N.

First run.

pre-P4 (SSE1)
------------------------------------------------------
315620  cycles - 0: standard (scasb)
5001    cycles - 1: AgnerFog
4967    cycles - 2: AgnerFog (unaligned)
6741    cycles - 3: Dave

11665   cycles - 0: standard (scasb)
4997    cycles - 1: AgnerFog
4975    cycles - 2: AgnerFog (unaligned)
6764    cycles - 3: Dave

11684   cycles - 0: standard (scasb)
4993    cycles - 1: AgnerFog
4967    cycles - 2: AgnerFog (unaligned)
6780    cycles - 3: Dave

--- ok ---

Second run

pre-P4 (SSE1)
------------------------------------------------------
11813   cycles - 0: standard (scasb)
4992    cycles - 1: AgnerFog
4968    cycles - 2: AgnerFog (unaligned)
6755    cycles - 3: Dave

11679   cycles - 0: standard (scasb)
4985    cycles - 1: AgnerFog
4966    cycles - 2: AgnerFog (unaligned)
6768    cycles - 3: Dave

11696   cycles - 0: standard (scasb)
4993    cycles - 1: AgnerFog
4977    cycles - 2: AgnerFog (unaligned)
6758    cycles - 3: Dave

--- ok ---


First run.

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
------------------------------------------------------
296692  cycles - 0: standard (scasb)
4193    cycles - 1: AgnerFog
4078    cycles - 2: AgnerFog (unaligned)
6334    cycles - 3: Dave

11671   cycles - 0: standard (scasb)
4148    cycles - 1: AgnerFog
4064    cycles - 2: AgnerFog (unaligned)
6268    cycles - 3: Dave

11675   cycles - 0: standard (scasb)
4209    cycles - 1: AgnerFog
4092    cycles - 2: AgnerFog (unaligned)
6357    cycles - 3: Dave

--- ok ---

Second run

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
------------------------------------------------------
11809   cycles - 0: standard (scasb)
4187    cycles - 1: AgnerFog
4093    cycles - 2: AgnerFog (unaligned)
6219    cycles - 3: Dave

11793   cycles - 0: standard (scasb)
4147    cycles - 1: AgnerFog
4074    cycles - 2: AgnerFog (unaligned)
6100    cycles - 3: Dave

11810   cycles - 0: standard (scasb)
4211    cycles - 1: AgnerFog
4092    cycles - 2: AgnerFog (unaligned)
6225    cycles - 3: Dave

--- ok ---
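
A slow first measurement like the one reported above may come from one-off costs paid by the very first call (for example page faults or cold caches); the listings alone do not prove the cause. A common way to keep such costs out of the numbers is one untimed warm-up call before the timed loop. The following is a minimal C sketch of that idea, not the timing code used by the test program. The names count_bytes and time_after_warmup are hypothetical, and clock() is used for portability instead of a cycle counter.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef size_t (*strlen_fn)(const char *);

/* Simple routine to be timed (illustrative stand-in). */
size_t count_bytes(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* Hypothetical helper: calls fn once without timing it, then times
   `iterations` further calls. Returns the elapsed clock() ticks and
   stores the last computed length in *len_out (if not NULL). */
clock_t time_after_warmup(strlen_fn fn, const char *s,
                          int iterations, size_t *len_out)
{
    assert(fn != NULL && s != NULL && iterations > 0);

    /* Warm-up call: pays any one-off cost outside the timed region.
       The volatile sink keeps the result stores from being removed;
       a real harness must also make sure the compiler does not hoist
       the call itself out of the loop. */
    volatile size_t sink = fn(s);

    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        sink = fn(s);
    clock_t stop = clock();

    if (len_out != NULL)
        *len_out = sink;
    return stop - start;
}
```

A call such as time_after_warmup(count_bytes, text, 1000, &len) would then report only the steady-state cost, which corresponds to the second and third passes in the listings rather than the first.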

LarryC


Intel(R) Core(TM) i7 CPU         960  @ 3.20GHz (SSE4)
------------------------------------------------------
10025   cycles - 0: standard (scasb)
6427    cycles - 1: AgnerFog
6444    cycles - 2: AgnerFog (unaligned)
11482   cycles - 3: Dave

15319   cycles - 0: standard (scasb)
7873    cycles - 1: AgnerFog
6176    cycles - 2: AgnerFog (unaligned)
11058   cycles - 3: Dave

14978   cycles - 0: standard (scasb)
7816    cycles - 1: AgnerFog
6251    cycles - 2: AgnerFog (unaligned)
10995   cycles - 3: Dave

--- ok ---


Gunther

Hi nidud,

strlen5:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
22200   cycles - 0: standard (scasb)
10776   cycles - 3: Dave
10271   cycles - 5: MB - len()
7120    cycles - 1: AgnerFog
7264    cycles - 2: AgnerFog (unaligned)
3007    cycles - 6: MB - Len() SSE
2280    cycles - 4: unaligned SSE2

21590   cycles - 0: standard (scasb)
10616   cycles - 3: Dave
10136   cycles - 5: MB - len()
7059    cycles - 1: AgnerFog
17323   cycles - 2: AgnerFog (unaligned)
7226    cycles - 6: MB - Len() SSE
5413    cycles - 4: unaligned SSE2

52253   cycles - 0: standard (scasb)
25722   cycles - 3: Dave
24451   cycles - 5: MB - len()
17339   cycles - 1: AgnerFog
17349   cycles - 2: AgnerFog (unaligned)
7205    cycles - 6: MB - Len() SSE
6116    cycles - 4: unaligned SSE2

--- ok ---




dedndave

strlen5 results, Prescott with Hyper-Threading (HTT):
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
86037   cycles - 0: standard (scasb)
31180   cycles - 3: Dave
33575   cycles - 5: MB - len()
23079   cycles - 1: AgnerFog
25595   cycles - 2: AgnerFog (unaligned)
21374   cycles - 6: MB - Len() SSE
18166   cycles - 4: unaligned SSE2

49577   cycles - 0: standard (scasb)
31080   cycles - 3: Dave
32727   cycles - 5: MB - len()
23139   cycles - 1: AgnerFog
25405   cycles - 2: AgnerFog (unaligned)
21643   cycles - 6: MB - Len() SSE
18152   cycles - 4: unaligned SSE2

49638   cycles - 0: standard (scasb)
31000   cycles - 3: Dave
32762   cycles - 5: MB - len()
23151   cycles - 1: AgnerFog
31292   cycles - 2: AgnerFog (unaligned)
21172   cycles - 6: MB - Len() SSE
18204   cycles - 4: unaligned SSE2