Question about performance of LOOP sets and MOVS/LODS sets

Started by enzechen, July 27, 2015, 05:59:09 AM

enzechen

I am learning how to code in MASM these days. When I read the help chm file "Introduction to Assembler", I learnt that I should avoid the LOOP family and the MOVS/LODS family of instructions because of their low performance.

I understood the reason for the LOOP instructions, but I didn't understand why I should avoid the MOVS/LODS/CMPS/SCAS instructions. Could anyone explain more about this?

Thanks in advance.

dedndave

i don't totally avoid either one
but - you have to take into consideration where the code is used

for example, if i am parsing the command line, performance isn't really an issue
1) it only happens once in each session of the program
2) command line strings aren't usually very long, so there aren't many iterations

but, if i want to process strings, thousands of times, i take more care in how the code is written
and - sometimes, REP MOVS or REP STOS isn't that bad, performance-wise
individual MOVS, LODS, STOS, etc (without REP) don't perform well on many processors
the same is true for LOOP, et al
better to DEC ECX and JNZ
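
As a rough illustration of that last point, here is a minimal MASM sketch of the same counted loop written both ways. The procedure names SumLoop and SumDecJnz, their parameters, and the 32-bit flat-model setup are assumptions made for this example and do not come from the thread. Which version is faster depends on the processor, so time both on your own hardware.

```asm
.686
.model flat, stdcall
option casemap:none

.code

; Hypothetical helper: sums nItems DWORDs starting at pArr, using LOOP.
SumLoop proc uses esi pArr:DWORD, nItems:DWORD
    xor   eax, eax          ; running sum
    mov   ecx, nItems
    jecxz sl_done           ; LOOP with ECX = 0 would run 2^32 iterations
    mov   esi, pArr
sl_next:
    add   eax, [esi]
    add   esi, 4
    loop  sl_next           ; ECX = ECX - 1, jump if ECX != 0 (flags untouched)
sl_done:
    ret
SumLoop endp

; Same routine with the DEC ECX / JNZ pair suggested above.
SumDecJnz proc uses esi pArr:DWORD, nItems:DWORD
    xor   eax, eax
    mov   ecx, nItems
    test  ecx, ecx
    jz    sd_done           ; nothing to add
    mov   esi, pArr
sd_next:
    add   eax, [esi]
    add   esi, 4
    dec   ecx               ; sets ZF when the counter reaches zero
    jnz   sd_next
sd_done:
    ret
SumDecJnz endp

end
```

Both procedures return the same sum in EAX. One difference to keep in mind: LOOP leaves the flags alone, while DEC changes them (all except CF).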

hutch--

Dave is right here. The old string instructions have not been good performers in their own right since the early P1 processors, but because they are such common mnemonics, Intel built special-purpose circuitry into the processors for the case where they are used with the REP/REPE prefixes. On their own, the old string instructions are handled in what is called microcode, which is the slow lane that exists to maintain compatibility with earlier x86 hardware. With a REP/REPE prefix they are handled by different circuitry that is much faster once the read/write operation is over about 500 bytes.

It is worth knowing how they work but also worth knowing when not to use them.
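
To make the REP case concrete, here is a minimal MASM sketch of a block copy done with REP MOVSD plus a REP MOVSB tail, next to a plain MOV loop. The names CopyRep and CopyMov and their parameters are hypothetical, and both routines assume the source and destination buffers do not overlap. The sketch only shows the two coding styles; it does not measure where the break-even point lies.

```asm
.686
.model flat, stdcall
option casemap:none

.code

; Hypothetical helper: copies nBytes from pSrc to pDst with REP-prefixed
; string instructions (DWORDs first, then the 0..3 leftover bytes).
CopyRep proc uses esi edi pDst:DWORD, pSrc:DWORD, nBytes:DWORD
    cld                     ; make sure ESI/EDI count upwards
    mov   esi, pSrc
    mov   edi, pDst
    mov   ecx, nBytes
    shr   ecx, 2            ; number of whole DWORDs
    rep   movsd             ; does nothing when ECX = 0
    mov   ecx, nBytes
    and   ecx, 3            ; remaining bytes
    rep   movsb
    ret
CopyRep endp

; The same copy as a plain byte loop with MOV and DEC/JNZ, no string
; instructions at all.
CopyMov proc uses esi edi pDst:DWORD, pSrc:DWORD, nBytes:DWORD
    mov   esi, pSrc
    mov   edi, pDst
    mov   ecx, nBytes
    test  ecx, ecx
    jz    cm_done           ; nothing to copy
cm_next:
    mov   al, [esi]
    mov   [edi], al
    inc   esi
    inc   edi
    dec   ecx
    jnz   cm_next
cm_done:
    ret
CopyMov endp

end
```

For short copies the setup cost of the REP form may dominate, which is presumably why the size of the block matters so much in the comparison above.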

AssemblyChallenge

Hi.

Correct me if I'm wrong, but wasn't this LOOP slowness a thing of the P4 era?

Or at least, I remember reading something on the topic stating that recent CPUs "fixed" the problem and that LOOP and the string instructions are now very fast.

dedndave

not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use

Gunther

Quote from: dedndave on July 29, 2015, 03:25:09 AM
not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use

Yes, they are often slow. But there are also a lot of other critical instructions, which are described in Agner Fog's manuals. Furthermore, Mark Larson's site is also a good source.

Gunther
You have to know the facts before you can distort them.

Mikl__

Hi, Gunther!
The latest entry on Mark Larson's site is dated 09 Jul 2004. Has really nothing changed in 10 years?

Gunther

Hi Micha,

Quote from: Mikl__ on July 29, 2015, 10:20:55 PM
The latest entry on Mark Larson's site is dated 09 Jul 2004. Has really nothing changed in 10 years?

That's right, the site hasn't been updated. Mark probably has other things to do, but the content is still correct.

jj2007

Mark dropped in occasionally until about 4 years ago. His site is still mostly valid, but bear in mind that modern processors may behave differently. In case of doubt, visit the Laboratory...