Topic: CMOVx faster?

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
CMOVx faster?
« on: February 28, 2014, 11:13:34 AM »
Just found a new book, Assembly Language Succinctly by Chris Rose. It looks OK for C/C++ coders. What raised my curiosity is a snippet called FindSmallest using cmovx - so I put up a little testbed comparing cmov timings against jump timings. Results:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

7048    cycles for 100 * max cmov
9905    cycles for 100 * max jmp
6179    cycles for 100 * max cmov lods
10048   cycles for 100 * max jmp lods

7019    cycles for 100 * max cmov
9781    cycles for 100 * max jmp
6179    cycles for 100 * max cmov lods
10026   cycles for 100 * max jmp lods

6990    cycles for 100 * max cmov
9809    cycles for 100 * max jmp
6224    cycles for 100 * max cmov lods
10060   cycles for 100 * max jmp lods

24      bytes for max cmov
25      bytes for max jmp
23      bytes for max cmov lods
23      bytes for max jmp lods

12345678        = eax max cmov
12345678        = eax max jmp
12345678        = eax max cmov lods
12345678        = eax max jmp lods
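
For readers without the testbed at hand, here is a rough sketch in C (not MASM) of the two kinds of loop being compared. The testbed source is not shown in this thread, so everything here is an assumption: the names max_jmp and max_cmov are made up, the compare is assumed to be signed, and a C compiler is free to emit a jump or a cmov for either function. It illustrates the idea only and does not reproduce the timed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-style maximum: the update is skipped by a conditional jump
   whenever the current element is not larger (the "max jmp" idea). */
int32_t max_jmp(const int32_t *a, size_t n)
{
    assert(n > 0);                      /* an empty array has no maximum */
    int32_t best = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > best)
            best = a[i];
    }
    return best;
}

/* Select-style maximum: the assignment always happens and only the
   value is chosen by the comparison (the "max cmov" idea). Compilers
   often lower this to cmp + cmovg, but that is not guaranteed. */
int32_t max_cmov(const int32_t *a, size_t n)
{
    assert(n > 0);
    int32_t best = a[0];
    for (size_t i = 1; i < n; i++) {
        int32_t v = a[i];
        best = (v > best) ? v : best;
    }
    return best;
}
```

Both functions return the same value for the same input; only the control flow differs, which is what the cycle counts above are measuring.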

Siekmanski

  • Member
  • *****
  • Posts: 1089
Re: CMOVx faster?
« Reply #1 on: February 28, 2014, 01:47:43 PM »
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

4847    cycles for 100 * max cmov
5345    cycles for 100 * max jmp
4843    cycles for 100 * max cmov lods
5747    cycles for 100 * max jmp lods

4843    cycles for 100 * max cmov
5350    cycles for 100 * max jmp
4845    cycles for 100 * max cmov lods
5726    cycles for 100 * max jmp lods

4845    cycles for 100 * max cmov
5344    cycles for 100 * max jmp
4841    cycles for 100 * max cmov lods
5725    cycles for 100 * max jmp lods

24      bytes for max cmov
25      bytes for max jmp
23      bytes for max cmov lods
23      bytes for max jmp lods

12345678        = eax max cmov
12345678        = eax max jmp
12345678        = eax max cmov lods
12345678        = eax max jmp lods

--- ok ---

KeepingRealBusy

  • Member
  • ***
  • Posts: 426
Re: CMOVx faster?
« Reply #2 on: February 28, 2014, 02:42:08 PM »
AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)

10522   cycles for 100 * max cmov
8365    cycles for 100 * max jmp
11082   cycles for 100 * max cmov lods
9343    cycles for 100 * max jmp lods

6343    cycles for 100 * max cmov
4870    cycles for 100 * max jmp
6716    cycles for 100 * max cmov lods
10113   cycles for 100 * max jmp lods

4429    cycles for 100 * max cmov
4043    cycles for 100 * max jmp
5848    cycles for 100 * max cmov lods
5475    cycles for 100 * max jmp lods

24      bytes for max cmov
25      bytes for max jmp
23      bytes for max cmov lods
23      bytes for max jmp lods

12345678        = eax max cmov
12345678        = eax max jmp
12345678        = eax max cmov lods
12345678        = eax max jmp lods

--- ok ---

Appears to depend on the CPU.  JJ, your cmov vs. cmov lods timing difference seems to come from the plain cmov version updating ESI manually, while lods increments it as part of its own execution. My AMD appears to like short jumps. Go figure.

Dave.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4807
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CMOVx faster?
« Reply #3 on: February 28, 2014, 03:29:33 PM »
I have seen various comparisons done over time and none of them have ever been conclusive. It varies with whether the jump is taken or not, and it varies with the hardware. I rarely ever use them because they don't offer any advantage over a conventional cmp/test and jump. For what little it's worth, the main speed differences are related to memory access, and if you can reduce that to a minimum you can usually forget about jumps.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

sinsi

  • Member
  • ****
  • Posts: 996
Re: CMOVx faster?
« Reply #4 on: February 28, 2014, 03:52:42 PM »
Here's a gotcha for using CMOV
Code:
    sub eax,eax
    cmovnz eax,[eax]
EAX is zero, so ZF is set and the move is not performed, but CMOVNZ still reads [eax] regardless of the condition: instant access violation.
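
One way around this gotcha is to let the condition choose an address that is always valid and then do the load unconditionally. A minimal sketch in C, assuming a made-up helper name load_or_default; whether the compiler really emits a cmov for the select is up to the compiler:

```c
#include <stddef.h>

/* Hypothetical helper: returns *p, or fallback when p is NULL.
   The condition selects between two ADDRESSES that are both valid
   to read, and the load itself is unconditional. Even if the select
   becomes a cmov, a null address is never dereferenced. */
int load_or_default(const int *p, int fallback)
{
    const int *src = (p != NULL) ? p : &fallback;
    return *src;
}
```

The same idea in assembler terms would be a register-to-register cmov on the pointer followed by an ordinary mov from memory, instead of a cmov with a memory operand.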
I can walk on water but stagger on beer.

satpro

  • Member
  • **
  • Posts: 116
Re: CMOVx faster?
« Reply #5 on: February 28, 2014, 07:56:08 PM »
About that book (which is a good read)....

They will email you, call you, email again, call again, call, call, call...
They want you to buy some high end ($$thousands) development software.

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
Re: CMOVx faster?
« Reply #6 on: February 28, 2014, 09:47:58 PM »
  JJ, your cmov vs. cmov lods timing difference seems to come from the plain cmov version updating ESI manually, while lods increments it as part of its own execution. My AMD appears to like short jumps. Go figure.

Mine behaves differently:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

6238    cycles for 100 * max cmov
8718    cycles for 100 * max jmp
8123    cycles for 100 * max cmov lods
9936    cycles for 100 * max jmp lods

6246    cycles for 100 * max cmov
8715    cycles for 100 * max jmp
8135    cycles for 100 * max cmov lods
9922    cycles for 100 * max jmp lods


@sinsi: Nice find ;-)
@satpro: no registration required here...

FORTRANS

  • Member
  • ****
  • Posts: 944
Re: CMOVx faster?
« Reply #7 on: March 01, 2014, 12:44:24 AM »
Hi,

   Looks like it might have been more important with older
processors.

pre-P4 (SSE1)

7597    cycles for 100 * max cmov
12030   cycles for 100 * max jmp
7907    cycles for 100 * max cmov lods
11726   cycles for 100 * max jmp lods

7603    cycles for 100 * max cmov
12064   cycles for 100 * max jmp
7896    cycles for 100 * max cmov lods
11706   cycles for 100 * max jmp lods

7592    cycles for 100 * max cmov
12007   cycles for 100 * max jmp
7906    cycles for 100 * max cmov lods
11732   cycles for 100 * max jmp lods

24      bytes for max cmov
25      bytes for max jmp
23      bytes for max cmov lods
23      bytes for max jmp lods

12345678        = eax max cmov
12345678        = eax max jmp
12345678        = eax max cmov lods
12345678        = eax max jmp lods

--- ok ---


Cheers,

Steve N.

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: CMOVx faster?
« Reply #8 on: March 01, 2014, 12:44:46 AM »
i'm with Hutch on this one, but for slightly different reasoning
CMOV isn't supported on all pentiums
so, you have to test for it - then provide fall-back code or exit if it's not supported
all that's a pain in the ass for too little advantage, if any - lol

prescott w/htt
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

14264   cycles for 100 * max cmov
22800   cycles for 100 * max jmp
25437   cycles for 100 * max cmov lods
23358   cycles for 100 * max jmp lods

14243   cycles for 100 * max cmov
23859   cycles for 100 * max jmp
25468   cycles for 100 * max cmov lods
23253   cycles for 100 * max jmp lods

14217   cycles for 100 * max cmov
23216   cycles for 100 * max jmp
25374   cycles for 100 * max cmov lods
23278   cycles for 100 * max jmp lods

EDIT: my mistake - i was thinking of the CMPXCHG instruction not being supported   :P

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: CMOVx faster?
« Reply #9 on: March 01, 2014, 02:28:22 AM »
Jochen,

Code:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

4215    cycles for 100 * max cmov
4650    cycles for 100 * max jmp
4226    cycles for 100 * max cmov lods
5537    cycles for 100 * max jmp lods

4215    cycles for 100 * max cmov
4671    cycles for 100 * max jmp
4824    cycles for 100 * max cmov lods
5000    cycles for 100 * max jmp lods

4514    cycles for 100 * max cmov
4642    cycles for 100 * max jmp
4216    cycles for 100 * max cmov lods
5647    cycles for 100 * max jmp lods

24      bytes for max cmov
25      bytes for max jmp
23      bytes for max cmov lods
23      bytes for max jmp lods

12345678        = eax max cmov
12345678        = eax max jmp
12345678        = eax max cmov lods
12345678        = eax max jmp lods

--- ok ---

Gunther
Get your facts first, and then you can distort them.

FORTRANS

  • Member
  • ****
  • Posts: 944
Re: CMOVx faster?
« Reply #10 on: March 01, 2014, 05:55:47 AM »
i'm with Hutch on this one, but for slightly different reasoning
CMOV isn't supported on all pentiums
so, you have to test for it - then provide fall-back code or exit if it's not supported
all that's a pain in the ass for too little advantage, if any - lol

...

EDIT: my mistake - i was thinking of the CMPXCHG instruction not being supported   :P

Hi Dave,

   No.  You were correct.  The program bombs on my P-MMX.
CMOV was not supported until the Pentium Pro.  (IIRC, could be
the P-II.)  Given that, I probably wouldn't worry about it too much.
<g>

Cheers,

Steve N.
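
If a program does have to run on pre-P6 hardware, the test Dave mentions is a CPUID check: leaf 1 reports CMOV support in bit 15 of EDX. A minimal sketch in C, assuming GCC or Clang on x86 (it relies on their cpuid.h header and its __get_cpuid helper); the names edx_has_cmov and cpu_has_cmov are made up for this example:

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang helper header, assumed available */
#endif

#define CPUID_EDX_CMOV (1u << 15)   /* CPUID leaf 1, EDX bit 15 = CMOV */

/* Decode the CMOV feature bit from a leaf-1 EDX value. */
bool edx_has_cmov(unsigned int edx)
{
    return (edx & CPUID_EDX_CMOV) != 0;
}

/* Returns true if the CPU reports CMOV support. On other compilers or
   non-x86 targets this sketch simply reports false. */
bool cpu_has_cmov(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;           /* CPUID leaf 1 not available */
    return edx_has_cmov(edx);
#else
    return false;
#endif
}
```

A caller would check cpu_has_cmov() once at startup and pick either the cmov path or a jump-based fallback.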

Farabi

  • Member
  • ****
  • Posts: 970
  • Neuroscience Fans
Re: CMOVx faster?
« Reply #11 on: March 02, 2014, 03:36:44 PM »
It's an ARM instruction, isn't it? I think ARM used it this way.
http://farabidatacenter.url.ph/MySoftware/
My 3D Game Engine Demo.

Contact me at Whatsapp: 6283818314165

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: CMOVx faster?
« Reply #12 on: March 02, 2014, 11:10:13 PM »
Hi Farabi,

It's an ARM instruction, isn't it? I think ARM used it this way.

no, the CMOV instruction was introduced with the P6 (Pentium Pro) for compiler optimization.

Gunther

Ficko

  • Regular Member
  • *
  • Posts: 38
Re: CMOVx faster?
« Reply #13 on: March 06, 2014, 01:30:40 AM »
I do use "CMOVE", but only through a macro. It is slightly more difficult to wrap my mind around than conditional jumps. :P

Code:
; *********************************************************
; CMOV SDWORD/DWORD,Arg1,"Operator",Arg2,opt Arg3,opt Arg4
; Syntax:
; CMOV DWORD,eax,"!=",ebx 'cmp eax, ebx:cmovne eax, ebx
; CMOV DWORD,eax,"<>",ebx,edx 'cmp eax, ebx:cmovne eax, edx
; CMOV DWORD,eax,"!=",ebx,edx,ecx 'cmp eax, ebx:cmovne edx, ecx
; *********************************************************
CMOV MACRO Sign:REQ,Arg1:REQ,Operator:REQ,Arg2:REQ,Arg3,Arg4
LOCAL L_Operator,m1,m2
cmp Arg1, Arg2
L_Operator TEXTEQU @CatStr(Operator)
IF @SizeStr(Arg4)
m1 TEXTEQU <Arg3>
m2 TEXTEQU <Arg4>
ELSEIF @SizeStr(Arg3)
m1 TEXTEQU <Arg1>
m2 TEXTEQU <Arg3>
ELSE
m1 TEXTEQU <Arg1>
m2 TEXTEQU <Arg2>
ENDIF
IF Sign EQ dword
IFIDN L_Operator,<"!>">
cmova m1, m2
ELSEIFIDN L_Operator,<"!<">
cmovb m1, m2
ELSEIFIDN L_Operator,<"=">
cmove m1, m2
ELSEIFIDN L_Operator,<"==">
cmove m1, m2
ELSEIFIDN L_Operator,<"!<!>">
cmovne m1, m2
ELSEIFIDN L_Operator,<"!!=">
cmovne m1, m2
ELSEIFIDN L_Operator,<"!>=">
cmovae m1, m2
ELSEIFIDN L_Operator,<"=!>">
cmovae m1, m2
ELSEIFIDN L_Operator,<"!<=">
cmovbe m1, m2
ELSEIFIDN L_Operator,<"=!<">
cmovbe m1, m2
ELSE
echo The Operator operator is not valid
ENDIF
ELSEIF Sign EQ sdword
IFIDN L_Operator,<"!>">
cmovg m1, m2
ELSEIFIDN L_Operator,<"!<">
cmovl m1, m2
ELSEIFIDN L_Operator,<"=">
cmove m1, m2
ELSEIFIDN L_Operator,<"==">
cmove m1, m2
ELSEIFIDN L_Operator,<"!<!>">
cmovne m1, m2
ELSEIFIDN L_Operator,<"!!=">
cmovne m1, m2
ELSEIFIDN L_Operator,<"!>="> 
cmovge m1, m2
ELSEIFIDN L_Operator,<"=!>"> 
cmovge m1, m2
ELSEIFIDN L_Operator,<"!<=">
cmovle m1, m2
ELSEIFIDN L_Operator,<"=!<">
cmovle m1, m2
ELSE
echo The Operator operator is not valid  
ENDIF
ELSE
echo The first parameter has to be "DWORD" or "SDWORD" !
ENDIF
ENDM

alloy

  • Regular Member
  • *
  • Posts: 12