Just found a new book, Assembly Language Succinctly by Chris Rose (http://www.syncfusion.com/Content/downloads/ebook/Assembly_Language_Succinctly.pdf). It looks OK for C/C++ coders. What raised my curiosity is a snippet called FindSmallest that uses cmovx, so I put together a little testbed comparing cmov timings against jump timings. Results:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
7048 cycles for 100 * max cmov
9905 cycles for 100 * max jmp
6179 cycles for 100 * max cmov lods
10048 cycles for 100 * max jmp lods
7019 cycles for 100 * max cmov
9781 cycles for 100 * max jmp
6179 cycles for 100 * max cmov lods
10026 cycles for 100 * max jmp lods
6990 cycles for 100 * max cmov
9809 cycles for 100 * max jmp
6224 cycles for 100 * max cmov lods
10060 cycles for 100 * max jmp lods
24 bytes for max cmov
25 bytes for max jmp
23 bytes for max cmov lods
23 bytes for max jmp lods
12345678 = eax max cmov
12345678 = eax max jmp
12345678 = eax max cmov lods
12345678 = eax max jmp lods
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
4847 cycles for 100 * max cmov
5345 cycles for 100 * max jmp
4843 cycles for 100 * max cmov lods
5747 cycles for 100 * max jmp lods
4843 cycles for 100 * max cmov
5350 cycles for 100 * max jmp
4845 cycles for 100 * max cmov lods
5726 cycles for 100 * max jmp lods
4845 cycles for 100 * max cmov
5344 cycles for 100 * max jmp
4841 cycles for 100 * max cmov lods
5725 cycles for 100 * max jmp lods
24 bytes for max cmov
25 bytes for max jmp
23 bytes for max cmov lods
23 bytes for max jmp lods
12345678 = eax max cmov
12345678 = eax max jmp
12345678 = eax max cmov lods
12345678 = eax max jmp lods
--- ok ---
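For readers without the testbed at hand, here is a rough C sketch of the two loop shapes being timed. It is not the testbed source (which is assembly and not shown in this thread); the names max_jmp, max_select and max_selfcheck are made up for illustration. Whether a compiler actually emits a conditional jump for the first and a cmov for the second depends on the compiler and its options, so treat it as a picture of the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "jmp" shape: compare, then conditionally skip the update. */
uint32_t max_jmp(const uint32_t *a, size_t n)
{
    uint32_t best = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > best)
            best = a[i];
    }
    return best;
}

/* "cmov" shape: compare, then always assign one of two values.
   A compiler may (or may not) turn this into cmp + cmova. */
uint32_t max_select(const uint32_t *a, size_t n)
{
    uint32_t best = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = a[i];
        best = (v > best) ? v : best;
    }
    return best;
}

/* Hypothetical usage: both shapes must agree on the result. */
void max_selfcheck(void)
{
    const uint32_t v[] = { 3, 12345678, 7, 42 };
    assert(max_jmp(v, 4) == max_select(v, 4));
}
```

Both functions return 0 for an empty array, which is an arbitrary choice made for this sketch.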
AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
10522 cycles for 100 * max cmov
8365 cycles for 100 * max jmp
11082 cycles for 100 * max cmov lods
9343 cycles for 100 * max jmp lods
6343 cycles for 100 * max cmov
4870 cycles for 100 * max jmp
6716 cycles for 100 * max cmov lods
10113 cycles for 100 * max jmp lods
4429 cycles for 100 * max cmov
4043 cycles for 100 * max jmp
5848 cycles for 100 * max cmov lods
5475 cycles for 100 * max jmp lods
24 bytes for max cmov
25 bytes for max jmp
23 bytes for max cmov lods
23 bytes for max jmp lods
12345678 = eax max cmov
12345678 = eax max jmp
12345678 = eax max cmov lods
12345678 = eax max jmp lods
--- ok ---
Appears to depend on the CPU. JJ, the difference between your cmov and cmov lods timings seems to come from the cmov version updating ESI manually, while lods increments it as part of its own execution. My AMD appears to like short jumps. Go figure.
Dave.
I have seen various comparisons done over time and none of them have ever been conclusive. It varies with whether the jump is taken or not and varies with the hardware. I rarely ever use CMOV instructions because they don't offer any advantage over a conventional cmp/test followed by a jump. For the little that it's worth, the main speed differences are related to memory access, and if you can reduce this to a minimum you can usually forget about jumps.
Here's a gotcha for using CMOV
sub eax,eax
cmovnz eax,[eax]
EAX is zero, so ZF is set and the move should not happen, but CMOVNZ reads the memory operand [eax] regardless of the condition: instant access violation.
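The usual way around this gotcha is to select between two addresses that are both safe to read and then do one unconditional load, instead of making the load itself conditional. A minimal C sketch of that idea, assuming a dummy fallback variable (fallback_zero and load_or_zero are names invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Always-readable location used when the real pointer is NULL. */
static const int fallback_zero = 0;

/* Pick an address first, then load once. Both candidate addresses
   are valid, so the load can never fault on a NULL pointer. */
int load_or_zero(const int *p)
{
    const int *src = (p != NULL) ? p : &fallback_zero;
    return *src;
}

/* Hypothetical usage. */
void load_selfcheck(void)
{
    int x = 42;
    assert(load_or_zero(&x) == 42);
    assert(load_or_zero(NULL) == 0);
}
```

In assembly terms this corresponds to doing the cmov between two registers holding addresses and dereferencing afterwards, rather than using cmov with a memory source.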
About that book (which is a good read)....
They will email you, call you, email again, call again, call, call, call...
They want you to buy some high end ($$thousands) development software.
Quote from: KeepingRealBusy on February 28, 2014, 02:42:08 PM
JJ, your cmov vs lods time seems to be that cmov manually updates ESI while lods increments as part of the lods execution. My AMD appears to like short jumps. Go figure.
Mine behaves differently:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
6238 cycles for 100 * max cmov
8718 cycles for 100 * max jmp
8123 cycles for 100 * max cmov lods
9936 cycles for 100 * max jmp lods
6246 cycles for 100 * max cmov
8715 cycles for 100 * max jmp
8135 cycles for 100 * max cmov lods
9922 cycles for 100 * max jmp lods
@sinsi: Nice find ;-)
@satpro: no registration required here...
Hi,
Looks like it might have been more important with older
processors.
pre-P4 (SSE1)
7597 cycles for 100 * max cmov
12030 cycles for 100 * max jmp
7907 cycles for 100 * max cmov lods
11726 cycles for 100 * max jmp lods
7603 cycles for 100 * max cmov
12064 cycles for 100 * max jmp
7896 cycles for 100 * max cmov lods
11706 cycles for 100 * max jmp lods
7592 cycles for 100 * max cmov
12007 cycles for 100 * max jmp
7906 cycles for 100 * max cmov lods
11732 cycles for 100 * max jmp lods
24 bytes for max cmov
25 bytes for max jmp
23 bytes for max cmov lods
23 bytes for max jmp lods
12345678 = eax max cmov
12345678 = eax max jmp
12345678 = eax max cmov lods
12345678 = eax max jmp lods
--- ok ---
Cheers,
Steve N.
i'm with Hutch on this one, but for slightly different reasoning
CMOV isn't supported on all pentiums
so, you have to test for it - then provide fall-back code or exit if it's not supported
all that's a pain in the ass for too little advantage, if any - lol
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
14264 cycles for 100 * max cmov
22800 cycles for 100 * max jmp
25437 cycles for 100 * max cmov lods
23358 cycles for 100 * max jmp lods
14243 cycles for 100 * max cmov
23859 cycles for 100 * max jmp
25468 cycles for 100 * max cmov lods
23253 cycles for 100 * max jmp lods
14217 cycles for 100 * max cmov
23216 cycles for 100 * max jmp
25374 cycles for 100 * max cmov lods
23278 cycles for 100 * max jmp lods
EDIT: my mistake - i was thinking of the CMPXCHG instruction not being supported :P
Jochen,
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
4215 cycles for 100 * max cmov
4650 cycles for 100 * max jmp
4226 cycles for 100 * max cmov lods
5537 cycles for 100 * max jmp lods
4215 cycles for 100 * max cmov
4671 cycles for 100 * max jmp
4824 cycles for 100 * max cmov lods
5000 cycles for 100 * max jmp lods
4514 cycles for 100 * max cmov
4642 cycles for 100 * max jmp
4216 cycles for 100 * max cmov lods
5647 cycles for 100 * max jmp lods
24 bytes for max cmov
25 bytes for max jmp
23 bytes for max cmov lods
23 bytes for max jmp lods
12345678 = eax max cmov
12345678 = eax max jmp
12345678 = eax max cmov lods
12345678 = eax max jmp lods
--- ok ---
Gunther
Quote from: dedndave on March 01, 2014, 12:44:46 AM
i'm with Hutch on this one, but for slightly different reasoning
CMOV isn't supported on all pentiums
so, you have to test for it - then provide fall-back code or exit if it's not supported
all that's a pain in the ass for too little advantage, if any - lol
...
EDIT: my mistake - i was thinking of the CMPXCHG instruction not being supported :P
Hi Dave,
No. You were correct. The program bombs on my P-MMX.
CMOV was not supported until the Pentium Pro. (IIRC, could be
the P-II.) Given that, I probably wouldn't worry about it too much.
<g>
Cheers,
Steve N.
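For anyone who does want to test for it: CMOV support is reported by CPUID leaf 1, bit 15 of EDX. A small C sketch, assuming GCC or Clang on x86 (it relies on their <cpuid.h> helper __get_cpuid; the function name has_cmov and the -1 return for non-x86 builds are choices made for this example only):

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

/* Returns 1 if CMOV is reported, 0 if not,
   and -1 when not built for x86 (no CPUID to ask). */
int has_cmov(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return (int)((edx >> 15) & 1u);
#else
    return -1;
#endif
}

/* Hypothetical usage: the result is always one of the three codes. */
void cmov_selfcheck(void)
{
    int r = has_cmov();
    assert(r == 1 || r == 0 || r == -1);
}
```

On a CPU old enough to lack the CPUID instruction itself, __get_cpuid is expected to return 0, so the function reports no CMOV there.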
It's an ARM instruction, isn't it? I think ARM used it this way.
Hi Farabi,
Quote from: Farabi on March 02, 2014, 03:36:44 PM
It's an ARM instruction, isn't it? I think ARM used it this way.
no, the CMOV instruction was introduced with the P6 (Pentium Pro) for compiler optimization.
Gunther
I do use "CMOVE", but only through a macro. It is slightly more difficult to wrap my mind around than conditional jumps. :P
; *********************************************************
; CMOV SDWORD/DWORD,Arg1,"Operator",Arg2,opt Arg3,opt Arg4
; Syntax:
; CMOV DWORD,eax,"!=",ebx 'cmp eax, ebx:cmovne eax, ebx
; CMOV DWORD,eax,"<>",ebx,edx 'cmp eax, ebx:cmovne eax, edx
; CMOV DWORD,eax,"!=",ebx,edx,ecx 'cmp eax, ebx:cmovne edx, ecx
; *********************************************************
CMOV MACRO Sign:REQ,Arg1:REQ,Operator:REQ,Arg2:REQ,Arg3,Arg4
    LOCAL L_Operator,m1,m2
    cmp Arg1, Arg2
    L_Operator TEXTEQU @CatStr(Operator)
    IF @SizeStr(Arg4)
        m1 TEXTEQU <Arg3>
        m2 TEXTEQU <Arg4>
    ELSEIF @SizeStr(Arg3)
        m1 TEXTEQU <Arg1>
        m2 TEXTEQU <Arg3>
    ELSE
        m1 TEXTEQU <Arg1>
        m2 TEXTEQU <Arg2>
    ENDIF
    IF Sign EQ dword
        IFIDN L_Operator,<"!>">
            cmova m1, m2
        ELSEIFIDN L_Operator,<"!<">
            cmovb m1, m2
        ELSEIFIDN L_Operator,<"=">
            cmove m1, m2
        ELSEIFIDN L_Operator,<"==">
            cmove m1, m2
        ELSEIFIDN L_Operator,<"!<!>">
            cmovne m1, m2
        ELSEIFIDN L_Operator,<"!!=">
            cmovne m1, m2
        ELSEIFIDN L_Operator,<"!>=">
            cmovae m1, m2
        ELSEIFIDN L_Operator,<"=!>">
            cmovae m1, m2
        ELSEIFIDN L_Operator,<"!<=">
            cmovbe m1, m2
        ELSEIFIDN L_Operator,<"=!<">
            cmovbe m1, m2
        ELSE
            echo The Operator operator is not valid
        ENDIF
    ELSEIF Sign EQ sdword
        IFIDN L_Operator,<"!>">
            cmovg m1, m2
        ELSEIFIDN L_Operator,<"!<">
            cmovl m1, m2
        ELSEIFIDN L_Operator,<"=">
            cmove m1, m2
        ELSEIFIDN L_Operator,<"==">
            cmove m1, m2
        ELSEIFIDN L_Operator,<"!<!>">
            cmovne m1, m2
        ELSEIFIDN L_Operator,<"!!=">
            cmovne m1, m2
        ELSEIFIDN L_Operator,<"!>=">
            cmovge m1, m2
        ELSEIFIDN L_Operator,<"=!>">
            cmovge m1, m2
        ELSEIFIDN L_Operator,<"!<=">
            cmovle m1, m2
        ELSEIFIDN L_Operator,<"=!<">
            cmovle m1, m2
        ELSE
            echo The Operator operator is not valid
        ENDIF
    ELSE
        echo The first parameter has to be "DWORD" or "SDWORD" !
    ENDIF
ENDM
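Why the macro needs separate DWORD and SDWORD branches: the same 32 bits compare differently depending on whether they are read as unsigned (cmova/cmovb and friends) or signed (cmovg/cmovl and friends). A tiny C sketch of the distinction, with function names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned compare: the asm counterpart would be cmp + cmova. */
uint32_t max_unsigned(uint32_t a, uint32_t b)
{
    return (b > a) ? b : a;
}

/* Signed compare: the asm counterpart would be cmp + cmovg. */
int32_t max_signed(int32_t a, int32_t b)
{
    return (b > a) ? b : a;
}

/* Hypothetical usage: 0xFFFFFFFF is the largest unsigned value,
   but the same bit pattern read as signed is -1. */
void sign_selfcheck(void)
{
    assert(max_unsigned(1u, 0xFFFFFFFFu) == 0xFFFFFFFFu);
    assert(max_signed(1, -1) == 1);
}
```

So picking the wrong first parameter for the macro gives wrong results only for values with the top bit set, which makes the mistake easy to miss in testing.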
https://dmytrish.net/lib/asm-x86/Assembly_Language_Succinctly.pdf