The MASM Forum

General => The Campus => Topic started by: enzechen on July 27, 2015, 05:59:09 AM

Title: Question about performance of LOOP sets and MOVS/LODS sets
Post by: enzechen on July 27, 2015, 05:59:09 AM
I am learning how to code in MASM these days. When I read the help chm file "Introduction to Assembler", I learnt that I should avoid LOOP sets and MOVS/LODS sets due to their low performance.

I understood the reason for the "LOOP" sets, but I didn't understand why I should avoid the MOVS/LODS/CMPS/SCAS sets. Could anyone explain more on this topic?

Thanks in advance.
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: dedndave on July 27, 2015, 06:09:31 AM
i don't totally avoid either one
but - you have to take into consideration where the code is used

for example, if i am parsing the command line, performance isn't really an issue
1) it only happens once in each session of the program
2) command line strings aren't usually very long, so there aren't many iterations

but, if i want to process strings, thousands of times, i take more care in how the code is written
and - sometimes, REP MOVS or REP STOS isn't that bad, performance wise
individual MOVS, LODS, STOS, etc (without REP) don't perform well on many processors
the same is true for LOOP, et al
better to DEC ECX and JNZ
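
a minimal sketch of the two styles side by side, in 32-bit MASM
the procedure names and the byte-summing task are made up for illustration, they are not from any library
which one is actually faster depends on the processor, so time it on the machine you care about

```asm
; Hypothetical example: sum cnt bytes starting at pSrc, result in EAX.
.686
.model flat, stdcall
option casemap:none

.code

; Version A: LODSB + LOOP (compact, but often slow on many processors)
SumBytesLoop proc uses esi pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   ecx, cnt
    xor   eax, eax              ; clear EAX so LODSB only ever changes AL
    xor   edx, edx              ; running total
    jecxz done                  ; nothing to do for cnt = 0
    cld                         ; make LODSB step forward
  @@:
    lodsb                       ; AL = [ESI], ESI = ESI + 1
    add   edx, eax
    loop  @B                    ; ECX = ECX - 1, jump while ECX <> 0
  done:
    mov   eax, edx
    ret
SumBytesLoop endp

; Version B: plain MOVZX/ADD with DEC ECX + JNZ
SumBytesDec proc uses esi pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   ecx, cnt
    xor   edx, edx              ; running total
    test  ecx, ecx
    jz    done                  ; nothing to do for cnt = 0
  @@:
    movzx eax, byte ptr [esi]   ; replaces LODSB
    add   esi, 1
    add   edx, eax
    dec   ecx                   ; replaces LOOP (note: DEC changes flags, LOOP does not)
    jnz   @B
  done:
    mov   eax, edx
    ret
SumBytesDec endp

end
```

both versions return the same sum, only the instruction mix differs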
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: enzechen on July 27, 2015, 06:15:48 AM
Dave,

Thanks a lot.

Best regards,
Enze
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: hutch-- on July 27, 2015, 05:56:43 PM
Dave is right here. The old string instructions have not been good performers in their own right since the early P1 processors, but because they are very common mnemonics, Intel built special-purpose circuitry into the processors for the case where the string instructions are used with the REP/REPE prefixes. Without a prefix, the old string instructions are handled in what is called microcode, which is the slow lane kept to maintain compatibility with earlier x86 hardware. When you use them with the REP/REPE prefix they are handled by different circuitry that is much faster once the read/write operation is over about 500 bytes.

It is worth knowing how they work but also worth knowing when not to use them.
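
As a rough illustration of the REP-prefixed form, here is a minimal block-copy sketch in 32-bit MASM. The procedure name CopyBytes and its parameters are hypothetical, the buffers are assumed not to overlap, and the size at which REP MOVSD starts to pay off differs between processors, so treat the 500 byte figure as a rule of thumb and measure.

```asm
; Hypothetical example: copy cnt bytes from pSrc to pDst with REP MOVSD/MOVSB.
; Assumes the source and destination buffers do not overlap.
.686
.model flat, stdcall
option casemap:none

.code

CopyBytes proc uses esi edi pDst:DWORD, pSrc:DWORD, cnt:DWORD
    mov   esi, pSrc
    mov   edi, pDst
    mov   ecx, cnt
    cld                     ; copy forward
    mov   eax, ecx          ; keep the full byte count
    shr   ecx, 2            ; number of whole DWORDs
    rep   movsd             ; bulk copy, 4 bytes per iteration
    mov   ecx, eax
    and   ecx, 3            ; 0 to 3 leftover bytes
    rep   movsb             ; copy the tail
    ret
CopyBytes endp

end
```

For short copies of a few bytes, a plain MOV sequence is usually the simpler choice.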
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: AssemblyChallenge on July 29, 2015, 03:02:23 AM
Hi.

Correct me if I'm wrong, but wasn't this LOOP slowness a thing of the P4 era?

Or at least, I remember reading something on the topic stating that recent CPUs "fixed" the problem and that LOOPs/strings are now very fast.
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: dedndave on July 29, 2015, 03:25:09 AM
not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: Gunther on July 29, 2015, 10:00:32 PM
Quote from: dedndave on July 29, 2015, 03:25:09 AM
not just P4's, but many processors that followed
in other words, they are slow on many processors that are still in wide use

Yes, they are often slow. But there are also a lot of other critical instructions, which are described in Agner Fog's manuals (http://www.agner.org/optimize/). Furthermore, Mark Larson's site (http://www.mark.masmcode.com/) is also a good source.

Gunther
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: Mikl__ on July 29, 2015, 10:20:55 PM
Hi, Gunther!
On Mark Larson's site the last update is dated 09 Jul 2004. Has nothing really changed in 10 years?
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: Gunther on July 29, 2015, 10:22:56 PM
Hi Micha,

Quote from: Mikl__ on July 29, 2015, 10:20:55 PM
On Mark Larson's site the last update is dated 09 Jul 2004. Has nothing really changed in 10 years?

That's right. Mark probably has other things to do, but the content is still correct.

Gunther
Title: Re: Question about performance of LOOP sets and MOVS/LODS sets
Post by: jj2007 on July 29, 2015, 11:30:45 PM
Mark dropped in occasionally until about 4 years ago. His site is still mostly valid, but bear in mind that modern processors may behave differently. In case of doubt, visit the Laboratory...