News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Code location sensitivity of timings

Started by nidud, July 12, 2014, 09:15:44 PM

Previous topic - Next topic

Gunther

Hi nidud,

Quote from: nidud on August 11, 2014, 08:46:40 PM
Is this possible ? to have SSE4.1 and not SSE3 ?

Note: SSE and SSE2 are pre-set since the program will exit if SSE2 is not present, so this bit must be set by the test.

I think not. Did you try that? It should show you the available instruction sets.

Gunther
You have to know the facts before you can distort them.

nidud

#61
deleted

Gunther

Hi nidud,

you can trust my instruction detecting application. Your laptop supports in any case SSE3 and SSSE3 and it supports AVX. You can test that with that tool, if you've at least Windows 7 with SP1 installed. The glitch must be in your code. Do you test the right bits?

Gunther
You have to know the facts before you can distort them.

nidud

#63
deleted

MichaelW

Hi Gunther,

My Core-i3 G3220 does not support AVX, but the results for your instruction set detection tool:

Supported by Processor and installed Operating System:
------------------------------------------------------

     MMX, CMOV and FCOMI, SSE, SSE2, SSE3, SSSE3, SSE4.1,
     POPCNT, SSE4.2

     featurenumber = 13


Appear to match the Intel specs:

http://ark.intel.com/products/77773
Well Microsoft, here's another nice mess you've gotten us into.

Gunther

Hi Michael,

Quote from: MichaelW on August 12, 2014, 10:00:55 AM
Appear to match the Intel specs:

http://ark.intel.com/products/77773

I hope so. I've written the procedure using the Intel documents as a basis.

Gunther
You have to know the facts before you can distort them.

nidud

#66
deleted

jj2007

  movzx eax, byte ptr [esp+8]
  if 1
imul eax, eax, 01010101h ; 4 bytes shorter, faster
  else
mov ah, al
mov ecx, eax
shl eax, 16
add eax, ecx
  endif
  movd xmm0, eax
  pshufd xmm0, xmm0, 0 ; populate char

nidud

#68
deleted

jj2007

Variants of memchr:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

43778   cycles for 100 * memchr scasb
4474    cycles for 100 * memchr SSE2a
5608    cycles for 100 * memchr SSE2b

43994   cycles for 100 * memchr scasb
4497    cycles for 100 * memchr SSE2a
5602    cycles for 100 * memchr SSE2b

44044   cycles for 100 * memchr scasb
4474    cycles for 100 * memchr SSE2a
5598    cycles for 100 * memchr SSE2b

36      bytes for memchr scasb
88      bytes for memchr SSE2a
92      bytes for memchr SSE2b


Could look much different on other CPUs, as movlps speeds it up a lot on my CPU.

MichaelW

My processor is near (or at) bottom-end today (retail box, $79).

Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

24909   cycles for 100 * memchr scasb
2864    cycles for 100 * memchr SSE2a
2399    cycles for 100 * memchr SSE2b

24934   cycles for 100 * memchr scasb
2882    cycles for 100 * memchr SSE2a
2366    cycles for 100 * memchr SSE2b

24923   cycles for 100 * memchr scasb
2886    cycles for 100 * memchr SSE2a
2418    cycles for 100 * memchr SSE2b

36      bytes for memchr scasb
88      bytes for memchr SSE2a
92      bytes for memchr SSE2b

96      = eax memchr scasb
96      = eax memchr SSE2a
96      = eax memchr SSE2b

Well Microsoft, here's another nice mess you've gotten us into.

nidud

#71
deleted

jj2007

Thanks. As I suspected, the movlps/movhps pair is good only for my trusty Celeron  :(
Here is one more, with movups instead:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
43821   cycles for 100 * memchr scasb
4477    cycles for 100 * memchr SSE2 lps/hps
5556    cycles for 100 * memchr SSE2 nidud
5205    cycles for 100 * memchr SSE2 ups

43778   cycles for 100 * memchr scasb
4476    cycles for 100 * memchr SSE2 lps/hps
5606    cycles for 100 * memchr SSE2 nidud
5206    cycles for 100 * memchr SSE2 ups

43762   cycles for 100 * memchr scasb
4482    cycles for 100 * memchr SSE2 lps/hps
5607    cycles for 100 * memchr SSE2 nidud
5200    cycles for 100 * memchr SSE2 ups

36      bytes for memchr scasb
88      bytes for memchr SSE2 lps/hps
92      bytes for memchr SSE2 nidud
84      bytes for memchr SSE2 ups

MichaelW


Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

24916   cycles for 100 * memchr scasb
2889    cycles for 100 * memchr SSE2 lps/hps
2422    cycles for 100 * memchr SSE2 nidud
2351    cycles for 100 * memchr SSE2 ups

24927   cycles for 100 * memchr scasb
2890    cycles for 100 * memchr SSE2 lps/hps
2469    cycles for 100 * memchr SSE2 nidud
2342    cycles for 100 * memchr SSE2 ups

24921   cycles for 100 * memchr scasb
2885    cycles for 100 * memchr SSE2 lps/hps
2405    cycles for 100 * memchr SSE2 nidud
2351    cycles for 100 * memchr SSE2 ups

36      bytes for memchr scasb
88      bytes for memchr SSE2 lps/hps
92      bytes for memchr SSE2 nidud
84      bytes for memchr SSE2 ups
Well Microsoft, here's another nice mess you've gotten us into.

nidud

#74
deleted