Topic: Branch Misprediction

AW

  • Member
  • *****
  • Posts: 1562
  • Let's Make ASM Great Again!
Branch Misprediction
« on: November 09, 2018, 08:55:52 PM »
This is a spin-off of this thread and I reused the structure of its code (thank you Alex, Marinus and the Masm32 Library).

According to Intel:
"When tuning, note that all Intel 64 and IA-32 processors usually have very high branch
prediction rates. Consistently mispredicted branches are generally rare."

Given that, I am testing to what extent eliminating mispredicted branches produces a performance boost. For unpredictability I am using the rdrand instruction to keep the code small; you can use your favorite random algorithm in its place. Branch elimination can be done with SETCC or CMOV instructions, and I am testing both. There appears to be a slight improvement from eliminating branches in my setup, but results may differ on other systems or setups. I am showing the esi values, but they will only be statistically close between runs, not identical.

Code:
Unpredicatable Branching Performance Test begins:

Test A (Branching):  5310.804483     esi: 25001015
Test B (No Branching/Using SETCC):  5154.628838     esi: 25000277
Test C (No Branching/Using CMOV):  5102.413384     esi: 24996680

Press any key to continue ...
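
The test source is not shown in this thread, so as a rough illustration only, here is a minimal C sketch of the three strategies being compared. The function names, the 16-bit input array and the `limit` comparison are all assumptions made for the example, not the code behind the numbers above. Also note that an optimizing compiler may turn any of these into either a jump or a SETcc/CMOVcc sequence, so the generated assembly should be checked before timing anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Test A style (assumed shape): a conditional jump decides whether
   the counter is incremented. */
uint32_t count_below_branch(const uint16_t *v, size_t n, uint16_t limit)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < limit)
            count++;
    }
    return count;
}

/* Test B style (assumed shape): the comparison result (0 or 1) is
   added directly, which maps naturally onto SETcc followed by ADD. */
uint32_t count_below_setcc(const uint16_t *v, size_t n, uint16_t limit)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (uint32_t)(v[i] < limit);
    return count;
}

/* Test C style (assumed shape): both candidate values are computed
   and one is selected, which maps naturally onto CMOVcc. */
uint32_t count_below_cmov(const uint16_t *v, size_t n, uint16_t limit)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t incremented = count + 1;
        count = (v[i] < limit) ? incremented : count;
    }
    return count;
}
```

All three functions return the same count for the same input; only the way the decision is expressed differs.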

six_L

  • Member
  • **
  • Posts: 144
Re: Branch Misprediction
« Reply #1 on: November 09, 2018, 09:00:55 PM »
Quote
Unpredicatable Branching Performance Test begins:

Test A (Branching):  6147.469874     esi: 25006397

Test B (No Branching/Using SETCC):  5842.821538     esi: 24998669

Test C (No Branching/Using CMOV):  5737.716296     esi: 25005668

Press any key to continue ...

johnsa

  • Member
  • ****
  • Posts: 710
    • Uasm
Re: Branch Misprediction
« Reply #2 on: November 09, 2018, 09:14:49 PM »
Code:
Unpredicatable Branching Performance Test begins:

Test A (Branching):  17905.004104     esi: 24999384

Test B (No Branching/Using SETCC):  17769.385169     esi: 25000978

Test C (No Branching/Using CMOV):  17706.415355     esi: 24995715

Press any key to continue ...

cpu: AMD Threadripper 1950X

I wonder why these values are so large compared to yours ?

Siekmanski

  • Member
  • *****
  • Posts: 1684
Re: Branch Misprediction
« Reply #3 on: November 09, 2018, 09:35:28 PM »
Win 8.1 i7-4930K

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4847.646489     esi: 24995192

Test B (No Branching/Using SETCC):  4715.992466     esi: 24998433

Test C (No Branching/Using CMOV):  4792.383529     esi: 25007685
Creative coders use backward thinking techniques as a strategy.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 5897
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Branch Misprediction
« Reply #4 on: November 09, 2018, 09:46:22 PM »
Doesn't look like there is enough difference to worry about.

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4714.086700     esi: 24999738

Test B (No Branching/Using SETCC):  4347.706700     esi: 24992511

Test C (No Branching/Using CMOV):  4286.387600     esi: 25000474

Press any key to continue ...
hutch at movsd dot com
http://www.masm32.com

AW

Re: Branch Misprediction
« Reply #5 on: November 09, 2018, 09:47:54 PM »
Quote
cpu: AMD Threadripper 1950X
I wonder why these values are so large compared to yours ?

The reason could be the rdrand instruction. It is a slow instruction, and possibly much slower on the AMD. You can gauge how slow it is by commenting out:
         ;rdrand ax
         ;jnc @b

I also believe that rdrand is slower on my system than on Siekmanski's and Hutch's, because my system is generally faster yet my timings here are higher.

AW

Re: Branch Misprediction
« Reply #6 on: November 09, 2018, 11:06:13 PM »
Since rdrand is so slow, it accounts for most of the measured time, which is not what we wanted to measure. So I replaced it with xorshift32 and now use 4 times as many iterations.

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1079.229572     esi: 99999910

Test B (No Branching/Using SETCC):  560.419929     esi: 99992311

Test C (No Branching/Using CMOV):  508.874674     esi: 100000643

Press any key to continue ...

Now we see a clear difference.
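
For reference, a minimal C sketch of an xorshift32 step follows. It uses Marsaglia's common (13, 17, 5) shift triple, which is an assumption: the thread does not say which shifts the test program uses. The `count_top_bit_set` helper is hypothetical and only shows how a cheap generator like this can feed an unpredictable 50/50 condition without the cost of rdrand.

```c
#include <stdint.h>

/* One step of xorshift32 with the (13, 17, 5) shift triple.
   The state must be non-zero, otherwise it stays zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: counts how often the top bit of the generated
   value is set, adding the bit directly instead of branching on it. */
uint32_t count_top_bit_set(uint32_t seed, uint32_t iterations)
{
    uint32_t state = seed;
    uint32_t count = 0;
    for (uint32_t i = 0; i < iterations; i++)
        count += xorshift32(&state) >> 31;
    return count;
}
```

The generator is only a handful of shifts and XORs per value, so the loop time is no longer dominated by producing the random number.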

six_L

Re: Branch Misprediction
« Reply #7 on: November 10, 2018, 12:25:42 AM »
Quote
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1707.725816     esi: 100003249

Test B (No Branching/Using SETCC):  814.433475     esi: 99999228

Test C (No Branching/Using CMOV):  815.539814     esi: 100012874

Press any key to continue ...

hutch--

Re: Branch Misprediction
« Reply #8 on: November 10, 2018, 12:44:47 AM »
Haswell E/EP

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1390.336600     esi: 99996111

Test B (No Branching/Using SETCC):  753.261600     esi: 100000008

Test C (No Branching/Using CMOV):  717.176900     esi: 100000849

Press any key to continue ...

hutch--

Re: Branch Misprediction
« Reply #9 on: November 10, 2018, 12:46:43 AM »
Try more than 1 run with a SleepEx,100,0 followed by a CPUID instruction. Isolates one test from the other. May help with a change in priority class.

jj2007

  • Member
  • *****
  • Posts: 8826
  • Assembler is fun ;-)
    • MasmBasic
Re: Branch Misprediction
« Reply #10 on: November 10, 2018, 01:16:21 AM »
Core i5, roughly a factor of 2 for branching vs CMOV:
Code:
Test A (Branching):  1839.689614     esi: 99994305

Test B (No Branching/Using SETCC):  1014.606289     esi: 100011883

Test C (No Branching/Using CMOV):  916.680932     esi: 100005125

Siekmanski

Re: Branch Misprediction
« Reply #11 on: November 10, 2018, 01:49:53 AM »
Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1474.770048     esi: 99998649

Test B (No Branching/Using SETCC):  776.358238     esi: 99996812

Test C (No Branching/Using CMOV):  740.065497     esi: 100008227

Press any key to continue ...

FORTRANS

  • Member
  • *****
  • Posts: 1033
Re: Branch Misprediction
« Reply #12 on: November 10, 2018, 02:33:26 AM »
Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  7489.981548     esi: 99994224

Test B (No Branching/Using SETCC):  6022.567114     esi: 100000788

Test C (No Branching/Using CMOV):  6892.027059     esi: 100007146

Press any key to continue ...

AW

Re: Branch Misprediction
« Reply #13 on: November 10, 2018, 03:36:23 AM »
Quote
Try more than 1 run with a SleepEx,100,0 followed by a CPUID instruction. Isolates one test from the other. May help with a change in priority class.

I usually take the RDTSC route. Today I used the timing method inherited from the other thread, which in my opinion also has some advantages.
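
As a side note on the "more than 1 run" idea, a portable C sketch of a repeat-and-keep-the-fastest harness is shown below. It is deliberately not either of the methods discussed here: it uses the standard `clock()` function instead of RDTSC, and it has no SleepEx or CPUID serialization between runs. The names `best_time_ms`, `test_fn` and `sample_workload` are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Signature assumed for a test routine: takes an iteration count and
   returns its counter so the work cannot be optimized away. */
typedef uint32_t (*test_fn)(uint32_t iterations);

/* Runs fn `runs` times and returns the fastest run in milliseconds.
   Taking the minimum filters out runs disturbed by other processes.
   Returns -1.0 if runs is zero or negative. */
double best_time_ms(test_fn fn, uint32_t iterations, int runs,
                    uint32_t *result)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        uint32_t value = fn(iterations);
        clock_t stop = clock();
        double ms = 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || ms < best)
            best = ms;
        if (result)
            *result = value;   /* keep the counter from the last run */
    }
    return best;
}

/* Hypothetical workload used only to exercise the harness. */
uint32_t sample_workload(uint32_t iterations)
{
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < iterations; i++)
        acc += i & 1u;
    return acc;
}
```

`clock()` is coarse compared with a cycle counter, so this only makes sense with iteration counts large enough for each run to take many milliseconds.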

LiaoMi

  • Member
  • ***
  • Posts: 324
Re: Branch Misprediction
« Reply #14 on: November 10, 2018, 05:00:21 AM »
i7-4810mq

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1348.180566     esi: 100011227

Test B (No Branching/Using SETCC):  718.488105     esi: 100001925

Test C (No Branching/Using CMOV):  706.041224     esi: 100002620

Press any key to continue ...

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1346.779943     esi: 100012082

Test B (No Branching/Using SETCC):  681.084780     esi: 99999300

Test C (No Branching/Using CMOV):  683.320791     esi: 100016696

Press any key to continue ...

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1354.122122     esi: 100000595

Test B (No Branching/Using SETCC):  682.919775     esi: 99994066

Test C (No Branching/Using CMOV):  686.241900     esi: 99991064

Press any key to continue ...