Hmmmm... I am getting similar results with NOTHING and a single NOP, using the code from hutch's last posting here.
Back to the drawing board....
2418 nothing
2714 1 single nop
2481 nothing
2589 1 single nop
2528 nothing
2823 1 single nop
2153 nothing
2293 1 single nop
2075 nothing
2231 1 single nop
2168 nothing
2418 1 single nop
2496 nothing
2699 1 single nop
2746 nothing
3074 1 single nop
Results
2383 nothing average
2605 single nop average
Press any key to continue...
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc

    call testproc
    waitkey
    invoke ExitProcess,0
    ret

entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
iterations equ <1000000000>

testproc proc

    LOCAL mem1  :QWORD
    LOCAL mem2  :QWORD
    LOCAL mem3  :QWORD
    LOCAL mem4  :QWORD
    LOCAL mem5  :QWORD
    LOCAL mem6  :QWORD
    LOCAL mem7  :QWORD
    LOCAL cnt1  :QWORD                  ; inner loop counter (kept in memory)
    LOCAL cnt2  :QWORD                  ; outer pass counter
    LOCAL time  :QWORD                  ; tick count at the start of a test
    LOCAL rslt1 :QWORD                  ; running total, empty loop
    LOCAL rslt2 :QWORD                  ; running total, single nop loop

    USING rsi, rdi, rbx, r12, r13, r14, r15
    SaveRegs

    HighPriority

    mov rslt1, 0
    mov rslt2, 0
    mov cnt2, 8                         ; 8 passes, averaged at the end

  loopstart:

  ; ------------------------------------
  ; test 1 : empty loop

    cpuid                               ; serialise before reading the timer
    call GetTickCount
    mov time, rax

    mov cnt1, iterations
  @@:
    sub cnt1, 1
    jnz @B

    call GetTickCount
    sub rax, time                       ; elapsed milliseconds
    add rslt1, rax
    conout " ",str$(rax)," nothing",lf

  ; ------------------------------------
  ; test 2 : same loop with one nop added

    cpuid
    call GetTickCount
    mov time, rax

    mov cnt1, iterations
  @@:
    nop
    sub cnt1, 1
    jnz @B

    call GetTickCount
    sub rax, time
    add rslt2, rax
    conout " ",str$(rax)," 1 single nop",lf

  ; ------------------------------------

    sub cnt2, 1
    jnz loopstart

    shr rslt1, 3                        ; total / 8 = average per pass
    shr rslt2, 3

    conout lf," Results",lf,lf
    conout " ",str$(rslt1)," nothing average",lf
    conout " ",str$(rslt2)," single nop average",lf,lf

    NormalPriority

    RestoreRegs
    ret

testproc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
It seems the timing code is taking up most of the time, or the reg/mem moves are as fast as nothing or a single nop.
Later I tried 100 NOPs, and the 100 NOPs were faster than nothing. No extra alignment, same code otherwise.
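One way to check the "timing code is taking most of the time" guess, as a rough sketch only and not something I have run: in the test above the loop counter cnt1 lives in memory, so every iteration does a load, subtract and store on the same stack slot, and that may well cost more than the nop being measured. The version below is the same empty loop with the counter held in r12 instead. The name regloop and the choice of r12 are my own assumptions for illustration; everything else (the include, USING, SaveRegs, conout, str$) is copied from the listing above.

```asm
; sketch: empty timing loop with the counter in a register instead of a LOCAL
include \masm32\include64\masm64rt.inc

.code

entry_point proc

    call regloop
    waitkey
    invoke ExitProcess,0
    ret

entry_point endp

iterations equ <1000000000>

regloop proc                            ; hypothetical name, not from the original

    LOCAL time :QWORD                   ; tick count at the start of the test

    USING rsi, rdi, rbx, r12, r13, r14, r15
    SaveRegs

    call GetTickCount
    mov time, rax

    mov r12, iterations                 ; r12 is non-volatile, so the API calls leave it alone
  @@:
    sub r12, 1                          ; counter never touches memory inside the loop
    jnz @B

    call GetTickCount
    sub rax, time                       ; elapsed milliseconds
    conout " ",str$(rax)," register counter",lf

    RestoreRegs
    ret

regloop endp

end
```

If this comes out much faster than the "nothing" figures above, the memory counter is probably what the original test is really timing, and a single nop would be lost in that overhead.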