
Instruction clock cycles, latency, throughput..

Started by K_F, May 26, 2015, 04:57:38 PM


K_F

Does anyone know if there's a summarised version of Intel's instruction cycle counts?
I've seen the latency and throughput documentation, and it's a 'mess'.

I'm just looking for a table of instructions and their cycle times on the latest CPUs... this seems to be missing lately.
Thanks
'Sire, Sire!... the peasants are Revolting !!!'
'Yes, they are.. aren't they....'

rrr314159

Agner Fog is the classic optimizer. Unfortunately his manuals haven't been updated since 2007; no doubt there are newer / better summaries of instruction times available somewhere. When you find them, let me know.

Here is his instruction_tables manual online (pdf).

At this page you can download the instruction_tables manual and the other four optimization manuals, which are also very good.
I am NaN ;)

hutch--

Van,

Simple clock cycle counts went out the door along with the 486, once CPUs had more than one pipeline. The action is in "scheduling", using the preferred instruction set (back door RISC), and knowing the occasional trickery needed if you have to use one of the old instructions that was dumped into microcode. You can often "schedule" an old slow instruction in the shadow of a previous instruction that leaves a hole (time-wise) and get most of its duration for free.

K_F

Ja, I realise that... but I wanted to check the actual cycle counts without any optimisations, etc.

Thanks rrr314159
8)

hutch--

If you can suffer it, the real deal is to learn which instructions are optimal in terms of speed and which have been dropped into microcode. Of late the SSE series of instructions is getting into the fastest silicon, and the simpler range of integer instructions lives in the fast lane, but most of the older complex instructions are sliding backwards in performance terms. I have not touched the very late AVX range as my i7 is too old, but the SSE range that I have available seems to run at one speed and there is little optimisation available; usually choosing the best instruction for the task is the best you can do. The simpler integer instructions still respond to pairing, but from practice, techniques like unrolling often do not yield a speed increase on late hardware. As usual, the best technique is determined by the clock: write it, time it and see which is fastest.