Avoiding dependencies

Started by felipe, December 19, 2017, 10:27:51 AM

felipe

When you want to avoid register dependencies (to optimize code speed) so that instructions can execute in parallel (32-bit integer arithmetic instructions), but for some reason you can't do it in some places: would it be a good idea to interleave some nop instructions in those places? If not (for some reason), what other instruction would be a good choice to place there? As an example:

mov     eax,memlocation1
mov     ebx,memlocation2
add     eax,ebx
xor     edx,edx
mul     ebx

Can be written (to avoid register dependencies):

mov     eax,memlocation1
mov     ebx,memlocation2
xor     edx,edx
add     eax,ebx
mul     ebx

But you still can't avoid all the dependencies. Of course, for the purpose of the question you have to assume that no other registers are available (or that it would be difficult to keep track of them all when pushing and popping them on the stack). So would it be a good idea to do something like this (using nops)?

mov     eax,memlocation1
mov     ebx,memlocation2
xor     edx,edx
add     eax,ebx
nop
mul     ebx


:idea:

felipe

Or am I absolutely wrong about this, and also not very up to date?  :redface:

felipe

Maybe this question should go in the campus.  :redface:  :P

LordAdef

Hey Felipe,

Interesting question!
My gut feeling is that "nop" won't do the job. But let's wait for the masters.

aw27

CPUs will notice that and reorder the instructions for us. Except with vector instructions.
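
A minimal sketch of why a nop is unlikely to help here: mul ebx reads the eax that add eax,ebx has just written, and that true (read-after-write) dependency stays in place whatever is put between the two instructions. The sample values 5 and 7 are assumptions for illustration, and the program assumes the MASM32 SDK's masm32rt.inc with its print, str$ and exit macros. The xor edx,edx is left out because one-operand mul overwrites edx with the high half of the product anyway.

```
include \masm32\include\masm32rt.inc

.data
memlocation1 dd 5               ; sample value, assumed for illustration
memlocation2 dd 7               ; sample value, assumed for illustration

.code
start:
    ; version without nop
    mov     eax,memlocation1
    mov     ebx,memlocation2
    add     eax,ebx             ; writes eax
    mul     ebx                 ; reads eax (true dependency), writes edx:eax
    mov     esi,eax             ; keep the low dword of the product

    ; version with nop between the dependent pair
    mov     eax,memlocation1
    mov     ebx,memlocation2
    add     eax,ebx
    nop                         ; does no work, mul still has to wait for add
    mul     ebx
    mov     edi,eax

    print str$(esi),13,10       ; result without nop
    print str$(edi),13,10       ; result with nop
    exit
end start
```

Both lines should print 84, that is (5 + 7) * 7. The nop changes neither the result nor the add-to-mul dependency.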

hutch--

The general drift is to kick the guts out of a single-thread algo, then work out how to run the algo in parallel with multithreading. It requires a multicore processor, but they have been around for years now so it should not be a problem. It always helps to write the control integer instructions in an efficient manner, which in modern terms means "instruction scheduling" that avoids old-style junk that only lives in microcode for backwards-compatibility reasons, but the gains here are small alongside what you can do with AVX and, to a lesser degree, SSE instructions. Get the later instructions right and multithread them where real speed matters, and you will see big gains in performance.
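
As a hedged illustration of what scheduling integer code around dependencies can look like, the sketch below sums an array with two accumulators, so the two add chains do not depend on each other and an out-of-order core is free to overlap them. The array name values and its contents are hypothetical, the program assumes the MASM32 SDK's masm32rt.inc macros, and whether this is actually faster on a given CPU is something to measure, not assume.

```
include \masm32\include\masm32rt.inc

.data
values dd 1,2,3,4,5,6,7,8       ; hypothetical sample data, 8 dwords

.code
start:
    xor     eax,eax             ; accumulator for elements 0,2,4,6
    xor     edx,edx             ; accumulator for elements 1,3,5,7
    mov     esi,OFFSET values
    mov     ecx,4               ; 8 elements, two per iteration
@@:
    add     eax,[esi]           ; chain 1
    add     edx,[esi+4]         ; chain 2, independent of chain 1
    add     esi,8
    dec     ecx
    jnz     @B

    add     eax,edx             ; combine the two partial sums
    print str$(eax),13,10       ; total of the array
    exit
end start
```

With the sample data above the program should print 36.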

felipe

Clear as clean water, thanks.  :icon14:
I will peek at the newest instructions and Intel manuals as soon as I can.  :greensml: