Topic: Code location sensitivity of timings

Gunther

  • Member
  • *****
  • Posts: 4052
  • Forgive your enemies, but never forget their names
Re: Code location sensitivity of timings
« Reply #60 on: August 12, 2014, 01:58:59 AM »
Hi nidud,

Is this possible: to have SSE4.1 but not SSE3?

Note: SSE and SSE2 are pre-set since the program will exit if SSE2 is not present, so this bit must be set by the test.

I think not. Did you try that? It should show you the available instruction sets.
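For reference, here is a minimal C sketch of decoding the CPUID leaf 1 feature bits that a test like this checks. The bit positions follow Intel's CPUID documentation. The function and flag names are hypothetical. The raw ECX/EDX values are passed in rather than read, so the decoding can be checked on its own. With GCC/Clang they can be obtained via `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.

```c
#include <stdint.h>

/* Hypothetical feature flags for this sketch only. */
enum {
    FEAT_MMX    = 1u << 0,
    FEAT_CMOV   = 1u << 1,
    FEAT_SSE    = 1u << 2,
    FEAT_SSE2   = 1u << 3,
    FEAT_SSE3   = 1u << 4,
    FEAT_SSSE3  = 1u << 5,
    FEAT_SSE41  = 1u << 6,
    FEAT_SSE42  = 1u << 7,
    FEAT_POPCNT = 1u << 8
};

/* Decode CPUID leaf 1 (EAX=1) ECX/EDX values into the flags above.
   Each test uses a mask (1u << bit), not the bare bit number. */
unsigned decode_cpuid1(uint32_t ecx, uint32_t edx)
{
    unsigned f = 0;
    if (edx & (1u << 23)) f |= FEAT_MMX;
    if (edx & (1u << 15)) f |= FEAT_CMOV;
    if (edx & (1u << 25)) f |= FEAT_SSE;
    if (edx & (1u << 26)) f |= FEAT_SSE2;
    if (ecx & (1u << 0))  f |= FEAT_SSE3;
    if (ecx & (1u << 9))  f |= FEAT_SSSE3;
    if (ecx & (1u << 19)) f |= FEAT_SSE41;
    if (ecx & (1u << 20)) f |= FEAT_SSE42;
    if (ecx & (1u << 23)) f |= FEAT_POPCNT;
    return f;
}
```

One easy way to get an impossible-looking combination such as SSE4.1 without SSE3 is to mix up bit numbers and masks. An example is testing `ecx & 19` instead of `ecx & (1u << 19)`.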

Gunther
Get your facts first, and then you can distort them.

nidud

  • Member
  • *****
  • Posts: 2390
    • https://github.com/nidud/asmc
« Reply #61 on: August 12, 2014, 03:01:17 AM »
deleted
« Last Edit: February 25, 2022, 08:31:38 AM by nidud »

Gunther

« Reply #62 on: August 12, 2014, 08:49:00 AM »
Hi nidud,

You can trust my instruction set detection application. Your laptop supports SSE3 and SSSE3 in any case, and it also supports AVX. The tool will show AVX only if you have at least Windows 7 SP1 installed, because the operating system must support AVX as well. The glitch must be in your code. Are you testing the right bits?
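This is why the Windows version matters. The CPUID AVX bit only says the processor implements AVX. The OS must also enable saving of the YMM registers, which is reported through the OSXSAVE bit and the XCR0 register. Below is a minimal C sketch of that check, with the register values passed in. The function name is hypothetical. XCR0 would come from XGETBV, for example via the `_xgetbv(0)` intrinsic where the compiler provides it.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bits relevant to AVX. */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */

/* XCR0 bits: 1 = SSE (XMM) state, 2 = AVX (upper YMM) state. */
#define XCR0_SSE_STATE (1u << 1)
#define XCR0_AVX_STATE (1u << 2)

/* AVX is usable only if the CPU has it AND the OS saves YMM state.
   xcr0 is the low 32 bits of XGETBV with ECX=0; it may only be read
   when OSXSAVE is set, otherwise XGETBV raises #UD. */
bool avx_usable(uint32_t cpuid1_ecx, uint32_t xcr0)
{
    const uint32_t need_cpu = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint32_t need_os  = XCR0_SSE_STATE | XCR0_AVX_STATE;
    if ((cpuid1_ecx & need_cpu) != need_cpu)
        return false;
    return (xcr0 & need_os) == need_os;
}
```

On an OS that does not enable YMM state (such as Windows 7 before SP1), a check like this reports AVX as unavailable even though the CPU implements it.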

Gunther

nidud

« Reply #63 on: August 12, 2014, 09:27:19 AM »
deleted
« Last Edit: February 25, 2022, 08:31:52 AM by nidud »

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1196
« Reply #64 on: August 12, 2014, 10:00:55 AM »
Hi Gunther,

My Pentium G3220 does not support AVX; here are the results from your instruction set detection tool:
Code:
Supported by Processor and installed Operating System:
------------------------------------------------------

     MMX, CMOV and FCOMI, SSE, SSE2, SSE3, SSSE3, SSE4.1,
     POPCNT, SSE4.2

     featurenumber = 13

They appear to match the Intel specs:

http://ark.intel.com/products/77773
Well Microsoft, here’s another nice mess you’ve gotten us into.

Gunther

« Reply #65 on: August 12, 2014, 11:02:35 AM »
Hi Michael,

I hope so. I've written the procedure using the Intel documents as a basis.

Gunther

nidud

« Reply #66 on: August 18, 2014, 09:39:19 PM »
deleted
« Last Edit: February 25, 2022, 08:32:09 AM by nidud »

jj2007

  • Member
  • *****
  • Posts: 12481
  • Assembler is fun ;-)
    • MasmBasic
« Reply #67 on: August 18, 2014, 10:36:29 PM »
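Code:
  movzx eax, byte ptr [esp+8]   ; eax = 000000cc, cc = search char (presumably the 2nd stack arg)
  if 1
    imul eax, eax, 01010101h    ; eax = cccccccc; 3 bytes shorter (6 vs. 9), faster
  else
    mov ah, al                  ; eax = 0000cccc
    mov ecx, eax                ; keep a copy of the low word
    shl eax, 16                 ; eax = cccc0000
    add eax, ecx                ; eax = cccccccc
  endif
  movd xmm0, eax                ; low dword of xmm0 = cccccccc
  pshufd xmm0, xmm0, 0          ; populate char: all 4 dwords, i.e. 16 copies of the byte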
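Both branches build four copies of the search byte in EAX before `movd`/`pshufd` spread them over all 16 bytes of XMM0. Below is a C sketch of the two scalar tricks, useful for checking that they agree. The function names are hypothetical. In C with SSE2 intrinsics, `_mm_set1_epi8(c)` produces the full 16-byte broadcast directly.

```c
#include <stdint.h>

/* imul eax, eax, 01010101h: c*1 + c*0x100 + c*0x10000 + c*0x1000000.
   Each copy lands in its own byte (c <= 0xFF), so there are no carries. */
uint32_t broadcast_mul(uint8_t c)
{
    return (uint32_t)c * 0x01010101u;
}

/* The shift/add sequence from the "else" branch. */
uint32_t broadcast_shift(uint8_t c)
{
    uint32_t eax = c;           /* movzx: 000000cc              */
    eax |= eax << 8;            /* mov ah, al (ah was 0): 0000cccc */
    uint32_t ecx = eax;         /* mov ecx, eax                 */
    eax <<= 16;                 /* shl eax, 16: cccc0000        */
    eax += ecx;                 /* add eax, ecx: cccccccc       */
    return eax;
}
```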

nidud

« Reply #68 on: August 18, 2014, 11:40:39 PM »
deleted
« Last Edit: February 25, 2022, 08:32:26 AM by nidud »

jj2007

« Reply #69 on: August 19, 2014, 01:40:54 AM »
Variants of memchr:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

43778   cycles for 100 * memchr scasb
4474    cycles for 100 * memchr SSE2a
5608    cycles for 100 * memchr SSE2b

43994   cycles for 100 * memchr scasb
4497    cycles for 100 * memchr SSE2a
5602    cycles for 100 * memchr SSE2b

44044   cycles for 100 * memchr scasb
4474    cycles for 100 * memchr SSE2a
5598    cycles for 100 * memchr SSE2b

36      bytes for memchr scasb
88      bytes for memchr SSE2a
92      bytes for memchr SSE2b


Results could look very different on other CPUs: the movlps/movhps loads in SSE2a speed it up a lot on my CPU.
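The roughly tenfold gap over `repne scasb` comes from testing 16 bytes per step. The steps are: broadcast the byte, compare with `pcmpeqb`, collect a bitmask with `pmovmskb`, and locate the first hit with `bsf`. Below is a minimal C sketch with SSE2 intrinsics. It assumes an x86 target and GCC/Clang (for `__builtin_ctz`). It is not the SSE2a/SSE2b code from this thread, and `memchr_sse2` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Return a pointer to the first byte equal to (unsigned char)c in the
   first n bytes of s, or NULL if there is none. */
const void *memchr_sse2(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    const __m128i needle = _mm_set1_epi8((char)c);   /* 16 copies of c */

    /* Main loop: 16 bytes per iteration with unaligned loads (movdqu),
       so it never reads past p + n. */
    while (n >= 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)p);
        /* pcmpeqb: 0xFF where bytes match; pmovmskb: one bit per byte */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        if (mask != 0)
            return p + __builtin_ctz((unsigned)mask);  /* like bsf */
        p += 16;
        n -= 16;
    }
    /* Tail: fewer than 16 bytes left, check them one at a time. */
    for (; n != 0; ++p, --n)
        if (*p == (unsigned char)c)
            return p;
    return NULL;
}
```

The scalar tail loop avoids reading beyond the buffer. Production versions often use aligned loads instead, so that a 16-byte read can never cross into an unmapped page.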

MichaelW

« Reply #70 on: August 19, 2014, 02:27:49 AM »
My processor is near (or at) the bottom end of today's lineup (retail box, $79).
Code:
Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

24909   cycles for 100 * memchr scasb
2864    cycles for 100 * memchr SSE2a
2399    cycles for 100 * memchr SSE2b

24934   cycles for 100 * memchr scasb
2882    cycles for 100 * memchr SSE2a
2366    cycles for 100 * memchr SSE2b

24923   cycles for 100 * memchr scasb
2886    cycles for 100 * memchr SSE2a
2418    cycles for 100 * memchr SSE2b

36      bytes for memchr scasb
88      bytes for memchr SSE2a
92      bytes for memchr SSE2b

96      = eax memchr scasb
96      = eax memchr SSE2a
96      = eax memchr SSE2b
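Each figure is the total for 100 calls. Dividing by 100 gives cycles per call, and the ratio of two variants gives the speedup. Below is a trivial C helper for that arithmetic; the names are hypothetical.

```c
/* Each reported figure is the cycle count for 100 calls. */
double cycles_per_call(double cycles_for_100)
{
    return cycles_for_100 / 100.0;
}

/* How many times faster variant b is than variant a. */
double speedup(double cycles_a, double cycles_b)
{
    return cycles_a / cycles_b;
}
```

Applied to the first G3220 round above, scasb takes about 249 cycles per call and SSE2b about 24, a speedup of roughly 10.4. The Celeron M's first round gives roughly 9.8 for SSE2a over scasb.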

nidud

« Reply #71 on: August 19, 2014, 03:12:24 AM »
deleted
« Last Edit: February 25, 2022, 08:32:41 AM by nidud »

jj2007

« Reply #72 on: August 19, 2014, 03:39:31 AM »
Thanks. As I suspected, the movlps/movhps pair is good only for my trusty Celeron :(
Here is one more, with movups instead:
Code:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
43821   cycles for 100 * memchr scasb
4477    cycles for 100 * memchr SSE2 lps/hps
5556    cycles for 100 * memchr SSE2 nidud
5205    cycles for 100 * memchr SSE2 ups

43778   cycles for 100 * memchr scasb
4476    cycles for 100 * memchr SSE2 lps/hps
5606    cycles for 100 * memchr SSE2 nidud
5206    cycles for 100 * memchr SSE2 ups

43762   cycles for 100 * memchr scasb
4482    cycles for 100 * memchr SSE2 lps/hps
5607    cycles for 100 * memchr SSE2 nidud
5200    cycles for 100 * memchr SSE2 ups

36      bytes for memchr scasb
88      bytes for memchr SSE2 lps/hps
92      bytes for memchr SSE2 nidud
84      bytes for memchr SSE2 ups
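The lps/hps and ups variants differ mainly in how the 16 bytes are loaded. Below is a C sketch of the two loads with SSE intrinsics. The function names are hypothetical, and the code is x86 only. The compiler chooses the exact instructions, so treat the movlps/movhps/movups names in the comments as typical, not guaranteed.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadl_pi, _mm_loadh_pi, _mm_loadu_ps */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_castps_si128 */

/* Two 8-byte halves: typically movlps + movhps. */
__m128i load16_halves(const unsigned char *p)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low 8 bytes  */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 8));  /* high 8 bytes */
    return _mm_castps_si128(v);
}

/* One unaligned 16-byte load: typically movups. */
__m128i load16_unaligned(const unsigned char *p)
{
    return _mm_castps_si128(_mm_loadu_ps((const float *)p));
}
```

Both return the same 16 bytes. Which one is faster depends on the microarchitecture, as the Celeron M and G3220 results in this thread show.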

MichaelW

« Reply #73 on: August 19, 2014, 04:07:35 AM »
Code:
Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

24916   cycles for 100 * memchr scasb
2889    cycles for 100 * memchr SSE2 lps/hps
2422    cycles for 100 * memchr SSE2 nidud
2351    cycles for 100 * memchr SSE2 ups

24927   cycles for 100 * memchr scasb
2890    cycles for 100 * memchr SSE2 lps/hps
2469    cycles for 100 * memchr SSE2 nidud
2342    cycles for 100 * memchr SSE2 ups

24921   cycles for 100 * memchr scasb
2885    cycles for 100 * memchr SSE2 lps/hps
2405    cycles for 100 * memchr SSE2 nidud
2351    cycles for 100 * memchr SSE2 ups

36      bytes for memchr scasb
88      bytes for memchr SSE2 lps/hps
92      bytes for memchr SSE2 nidud
84      bytes for memchr SSE2 ups

nidud

« Reply #74 on: August 19, 2014, 05:06:01 AM »
deleted
« Last Edit: February 25, 2022, 08:32:54 AM by nidud »