Mov and Movq

Started by Farabi, December 21, 2012, 08:19:41 PM

Farabi

Has anyone ever tested which is faster, MOV or MOVQ?

Adamanteus

Free comment: I think the idea of racing individual instructions against each other has been obvious from the start. Optimisation has to look at the algorithm as a whole. One of the best examples is Hutch's StrLen function, which shows that DWORD operations can be faster than a single SCASB instruction. Even in that case, my opinion is that we should push the processor developers to make that instruction work through data a machine word at a time to speed it up. These days your measurements will differ between processor models, so chasing the fastest instruction for each one is an unmeasurable amount of work. So let it go!
About your question: MMX algos are three times faster than ALU ones, and SSE is a little faster again than MMX.
So if an algo is better suited to MMX, use it, or even invent a vector processor specially for it.
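
The DWORD-versus-SCASB point can be sketched in C. This is not Hutch's StrLen, only a minimal illustration of the general word-at-a-time technique, and the names `strlen_bytes` and `strlen_dwords` are made up for the example. It assumes that reading a few bytes past the terminator inside an aligned DWORD is harmless, which holds on x86 but is outside strict C unless the buffer is padded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time length: one byte per step, the way a SCASB scan proceeds. */
size_t strlen_bytes(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* DWORD-at-a-time length (hypothetical sketch, not Hutch's code).
   ASSUMPTION: the bytes up to the end of the aligned DWORD that holds
   the terminator are readable, e.g. because the buffer is padded. */
size_t strlen_dwords(const char *s)
{
    const char *p = s;

    /* Step bytewise until p is 4-byte aligned, so the aligned loads
       below never straddle a page boundary. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);              /* one 32-bit load */
        /* Nonzero exactly when at least one byte of w is zero. */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += 4;
    }

    /* The terminator is somewhere in this DWORD; find it bytewise. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Which of the two wins depends on the processor and on the string length, so it has to be measured on the target machine.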

dedndave

i can't say which is faster for small amounts of data - probably SSE
but, i seem to recall some comparison testing of SSE vs MOVSD on large amounts of data
that might be found in the old forum
from what i remember, it was a toss-up
i think you are hitting hardware limitations rather than the limitations of specific CPU instructions

we all know SSE is fast
but that doesn't mean the CPU manufacturers stopped optimizing performance of the other instructions   :P
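
A comparison of that kind could be set up roughly as below. This is only a sketch in C: `time_copy`, `copy_dwords` and `copy_libc` are hypothetical names, the DWORD loop merely stands in for a MOVSD-style copy, and `clock()` is a coarse timer, so the buffer and the repeat count need to be large for the numbers to mean anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-in for a MOVSD-style copy: one 32-bit load and store per step. */
void copy_dwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    for (; i < n; i++)                 /* leftover 0..3 bytes */
        d[i] = s[i];
}

/* Whatever the C library's memcpy does on this machine (often SIMD). */
void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Time `reps` copies of `n` bytes with `fn`.
   Returns elapsed CPU seconds, or -1.0 if allocation fails
   or the destination does not match the source afterwards. */
double time_copy(copy_fn fn, size_t n, int reps)
{
    unsigned char *src = (unsigned char *)malloc(n);
    unsigned char *dst = (unsigned char *)malloc(n);
    double secs = -1.0;

    if (src != NULL && dst != NULL) {
        size_t i;
        int r;
        clock_t t0, t1;

        for (i = 0; i < n; i++)
            src[i] = (unsigned char)(i * 31u + 7u);
        memset(dst, 0, n);             /* also touches the pages before timing */

        t0 = clock();
        for (r = 0; r < reps; r++)
            fn(dst, src, n);
        t1 = clock();

        if (memcmp(dst, src, n) == 0)  /* verify the copy */
            secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    }
    free(src);
    free(dst);
    return secs;
}
```

Calling `time_copy(copy_dwords, 64u << 20, 10)` and `time_copy(copy_libc, 64u << 20, 10)` on a buffer much larger than the cache is one way to see whether the result really is a toss-up on a given machine.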

hutch--

Onan,

It really depends on what you are doing. If you are mixing data movement instructions with other 32-bit integer instructions, then MOV is probably faster, but if you are streaming large blocks of data, the MMX and XMM instructions are generally faster. It also depends on the hardware. On an older 32-bit processor, a PIV or earlier, the MMX and early XMM instructions were barely faster, if at all, because the internal architecture was the limiting factor.

With later 64-bit processors, Core2 and later, the native data size is larger, so data movements got a lot faster. MMX and XMM instructions often don't mix all that well with the older integer instructions, so it is very much a case of what you want to do with the data movements. If it's streaming, use MMX or, better still, XMM; if not, use the normal 32-bit MOV.
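
As a rough sketch of the two styles being compared, here is a copy loop in C using 32-bit moves next to one using 64-bit moves through an XMM register. It assumes an x86 compiler with SSE2, where `_mm_loadl_epi64` and `_mm_storel_epi64` normally compile to MOVQ (on other targets the sketch falls back to plain 64-bit integer moves). The function names are hypothetical, and an optimising compiler is free to rewrite either loop, so check the generated assembly before timing anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>               /* SSE2 intrinsics */
#define USE_SSE2 1
#else
#define USE_SSE2 0                   /* fallback: plain 64-bit integer moves */
#endif

/* Copy n bytes in 4-byte steps, the C analogue of
   MOV eax,[esi] / MOV [edi],eax. */
void copy_mov32(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);     /* 32-bit load  */
        memcpy(d + i, &w, sizeof w);     /* 32-bit store */
    }
    for (; i < n; i++)                   /* leftover 0..3 bytes */
        d[i] = s[i];
}

/* Copy n bytes in 8-byte steps through the low half of an XMM register. */
void copy_movq64(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
#if USE_SSE2
        __m128i q = _mm_loadl_epi64((const __m128i *)(s + i)); /* MOVQ xmm, m64 */
        _mm_storel_epi64((__m128i *)(d + i), q);               /* MOVQ m64, xmm */
#else
        uint64_t q;
        memcpy(&q, s + i, sizeof q);
        memcpy(d + i, &q, sizeof q);
#endif
    }
    for (; i < n; i++)                   /* leftover 0..7 bytes */
        d[i] = s[i];
}
```

Both routines produce the same bytes; only the width of each move differs, which is the variable the thread is asking about.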