Following Dave's suggestion, try this before timing each algo in each separate test piece. Set the priority class high enough to avoid the wandering and see if this helps to stabilise the results.
cpuid ; serialising instruction for wider separation
pause ; spinlock delay instruction
invoke SleepEx,10,0 ; sleep 10 ms, non-alertable
cpuid ; serialising instruction for wider separation
pause ; spinlock delay instruction
I have found that some algos are much more sensitive to code location than others, usually those doing intensive BYTE operations; working in larger data types reduces the variation.
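For anyone who wants the priority class step spelled out, here is a rough sketch of how it could sit around the sequence above. It is only one way to do it and it assumes a few things that are not in the post: a default MASM32 install under \masm32, HIGH_PRIORITY_CLASS as the "high enough" setting, and a placeholder comment where the timed algo would go. The push/pop of EBX and the xor eax, eax before each cpuid are my additions, since cpuid overwrites EAX, EBX, ECX and EDX and its leaf is selected by EAX.

```asm
; Sketch only. Assumes MASM32 installed in \masm32 (default layout).
.686p
.mmx
.xmm
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code
start:
    ; raise the priority of this process (assumed choice: HIGH_PRIORITY_CLASS)
    invoke GetCurrentProcess            ; pseudo handle comes back in EAX
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    ; settle-down sequence from the post, run before each timed algo
    push ebx                            ; cpuid overwrites EAX, EBX, ECX, EDX
    xor eax, eax                        ; select cpuid leaf 0
    cpuid                               ; serialising instruction
    pause                               ; spinlock delay instruction
    invoke SleepEx, 10, 0               ; sleep 10 ms, non-alertable
    xor eax, eax
    cpuid
    pause
    pop ebx

    ; ... timed algo goes here (placeholder, not part of the post) ...

    ; drop back to normal priority once the timing is finished
    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS
    invoke ExitProcess, 0
end start
```

I would stay away from REALTIME_PRIORITY_CLASS for this, as a long running test piece at that level can starve the rest of the system.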