I am learning how to code in MASM these days. When I read the help chm file "Introduction to Assembler", I learnt that I should avoid the LOOP family and the MOVS/LODS family of instructions because of their low performance.
I understood the reason for the LOOP family. But I didn't understand why I should avoid the MOVS/LODS/CMPS/SCAS family. Could anyone explain more on this topic?
Thanks in advance.
i don't totally avoid either one
but - you have to take into consideration where the code is used
for example, if i am parsing the command line, performance isn't really an issue
1) it only happens once in each session of the program
2) command line strings aren't usually very long, so there aren't many iterations
but, if i want to process strings, thousands of times, i take more care in how the code is written
and - sometimes, REP MOVS or REP STOS isn't that bad, performance wise
individual MOVS, LODS, STOS, etc (without REP) don't perform well on many processors
the same is true for LOOP, et al
better to DEC ECX and JNZ
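A minimal sketch of the two styles, assuming a 32-bit MASM build. The procedure names SumBytesLoop and SumBytesPlain are made up for illustration, and both assume cnt is non-zero. Which version is faster depends on the processor, so time both on your target before picking one.

```asm
.686
.model flat, stdcall
option casemap:none

.code

; Hypothetical helper: sums cnt bytes at pSrc using LODSB and LOOP.
; Assumes cnt is non-zero.
SumBytesLoop proc uses esi pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   ecx, cnt
    xor   edx, edx              ; running total
    xor   eax, eax              ; clear upper bits, LODSB only writes AL
    cld                         ; make ESI step upward
@@:
    lodsb                       ; AL = [ESI], ESI = ESI + 1
    add   edx, eax
    loop  @B                    ; ECX = ECX - 1, jump if ECX is not zero
    mov   eax, edx              ; return the total in EAX
    ret
SumBytesLoop endp

; Same result with plain instructions: MOVZX + INC instead of LODSB,
; DEC + JNZ instead of LOOP. Assumes cnt is non-zero.
SumBytesPlain proc uses esi pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   ecx, cnt
    xor   eax, eax              ; running total
@@:
    movzx edx, byte ptr [esi]   ; load one byte, zero-extended
    inc   esi
    add   eax, edx
    dec   ecx
    jnz   @B                    ; repeat until ECX reaches zero
    ret
SumBytesPlain endp

end
```

Both would be called the same way, for example `invoke SumBytesPlain, addr buffer, 16` with a hypothetical 16-byte buffer.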
Dave,
Thanks a lot.
Best regards,
Enze
Dave is right here. The old string instructions have not been good performers in their own right since the early P1 processors, but because they are very common mnemonics, Intel built special-purpose circuitry into the processors for when the string instructions are used with the REP/REPE prefixes. The old string instructions are handled in what is called microcode, which is the slow lane kept to maintain compatibility with earlier x86 hardware. When you use them with the REP/REPE prefix, they are handled by different circuitry that is much faster once the read/write operation is over about 500 bytes.
It is worth knowing how they work but also worth knowing when not to use them.
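As an illustration of the REP-prefixed case, here is a rough sketch of a block copy, assuming a 32-bit MASM build and buffers that do not overlap. The name CopyBlock is made up for this example. Whether it beats a hand-written MOV loop depends on the processor and the block size, so treat the 500-byte figure above as a rule of thumb and measure.

```asm
.686
.model flat, stdcall
option casemap:none

.code

; Hypothetical helper: copies cnt bytes from pSrc to pDst.
; Assumes the two buffers do not overlap.
CopyBlock proc uses esi edi pDst:DWORD, pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   edi, pDst
    cld                         ; copy upward in memory
    mov   ecx, cnt
    shr   ecx, 2                ; number of whole DWORDs
    rep   movsd                 ; bulk copy, 4 bytes per step
    mov   ecx, cnt
    and   ecx, 3                ; 0 to 3 bytes left over
    rep   movsb                 ; copy the tail, does nothing if ECX = 0
    ret
CopyBlock endp

end
```

For short copies of a few bytes, a plain MOV sequence is usually the simpler choice, since the REP setup cost is not paid back.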
Hi.
Correct me if I'm wrong, but wasn't this LOOP slowness a thing of the P4 era?
Or at least, I remember reading something on the topic stating that recent CPUs "fixed" the problem and that LOOP and the string instructions are now very fast.
not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use
Quote from: dedndave on July 29, 2015, 03:25:09 AM
not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use
Yes, they are often slow. But there are also a lot of other critical instructions, which are described in Agner Fog's manuals (http://www.agner.org/optimize/). Furthermore, Mark Larson's site (http://www.mark.masmcode.com/) is also a good source.
Gunther
Hi, Gunther!
The last date on Mark Larson's site is 09 Jul 2004. Has nothing really changed in 10 years?
Hi Micha,
Quote from: Mikl__ on July 29, 2015, 10:20:55 PM
The last date on Mark Larson's site is 09 Jul 2004. Has nothing really changed in 10 years?
That's right. Mark probably has other things to do, but the content is still correct.
Gunther
Mark dropped in occasionally until about 4 years ago. His site is still mostly valid, but bear in mind that modern processors may behave differently. In case of doubt, visit the Laboratory...