The MASM Forum

General => The Laboratory => Topic started by: AW on November 09, 2018, 08:55:52 PM

Title: Branch Misprediction
Post by: AW on November 09, 2018, 08:55:52 PM
This is a spin-off of this thread (http://masm32.com/board/index.php?topic=7509.0) and I reused the structure of its code (thank you Alex, Marinus and the Masm32 Library).

According to Intel:
"When tuning, note that all Intel 64 and IA-32 processors usually have very high branch
prediction rates. Consistently mispredicted branches are generally rare."

Given that, I am testing to what extent eliminating mispredicted branches produces a performance boost. For unpredictability I am using the rdrand instruction to keep the code small; you can use your favorite random algorithm in its place   :t. Branch elimination can be done with SETCC or CMOV instructions, and I am testing both. There appears to be a slight improvement from eliminating branches in my setup, but results may differ on other systems or setups. I am showing the esi values, but they will only be statistically close between tests and runs, not identical.
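
For readers without the attachment, here is a minimal C sketch of the three variants being compared. It is not the test's actual code: the function names, the 16-bit input values and the threshold compare are all assumptions made for illustration. A C compiler is also free to emit a jump, SETcc or CMOVcc for any of these, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Test A style: a conditional jump decides whether the counter is bumped.
   With random input this branch is hard to predict. */
uint32_t count_branchy(const uint16_t *v, size_t n, uint16_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            count++;
    }
    return count;
}

/* Test B style: the comparison result (0 or 1) is added directly,
   which is the pattern SETcc followed by ADD implements. */
uint32_t count_setcc_style(const uint16_t *v, size_t n, uint16_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (uint32_t)(v[i] >= threshold);
    return count;
}

/* Test C style: both candidate values are computed and one is selected,
   which is the pattern CMOVcc implements. */
uint32_t count_cmov_style(const uint16_t *v, size_t n, uint16_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t bumped = count + 1;
        count = (v[i] >= threshold) ? bumped : count;
    }
    return count;
}
```

All three return the same count for the same input; only the way the decision is taken differs, which is what the timings below are meant to expose.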

Code:
Unpredicatable Branching Performance Test begins:

Test A (Branching):  5310.804483     esi: 25001015
Test B (No Branching/Using SETCC):  5154.628838     esi: 25000277
Test C (No Branching/Using CMOV):  5102.413384     esi: 24996680

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: six_L on November 09, 2018, 09:00:55 PM
Quote
Unpredicatable Branching Performance Test begins:

Test A (Branching):  6147.469874     esi: 25006397

Test B (No Branching/Using SETCC):  5842.821538     esi: 24998669

Test C (No Branching/Using CMOV):  5737.716296     esi: 25005668

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: johnsa on November 09, 2018, 09:14:49 PM
Code:
Unpredicatable Branching Performance Test begins:

Test A (Branching):  17905.004104     esi: 24999384

Test B (No Branching/Using SETCC):  17769.385169     esi: 25000978

Test C (No Branching/Using CMOV):  17706.415355     esi: 24995715

Press any key to continue ...

cpu: AMD Threadripper 1950X

I wonder why these values are so large compared to yours?
Title: Re: Branch Misprediction
Post by: Siekmanski on November 09, 2018, 09:35:28 PM
Win 8.1 i7-4930K

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4847.646489     esi: 24995192

Test B (No Branching/Using SETCC):  4715.992466     esi: 24998433

Test C (No Branching/Using CMOV):  4792.383529     esi: 25007685
Title: Re: Branch Misprediction
Post by: hutch-- on November 09, 2018, 09:46:22 PM
Doesn't look like there is enough difference to worry about.

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4714.086700     esi: 24999738

Test B (No Branching/Using SETCC):  4347.706700     esi: 24992511

Test C (No Branching/Using CMOV):  4286.387600     esi: 25000474

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: AW on November 09, 2018, 09:47:54 PM
Quote from: johnsa
cpu: AMD Threadripper 1950X
I wonder why these values are so large compared to yours?

The reason could be the rdrand instruction. It is a slow instruction and possibly much slower on AMD. You can infer how slow it is by commenting out:
         ;rdrand ax
         ;jnc @b   

I also believe that rdrand is slower on my system than on Siekmanski's and Hutch's, because my system is generally faster yet my timings here are higher.
Title: Re: Branch Misprediction
Post by: AW on November 09, 2018, 11:06:13 PM
Since rdrand is so slow, it accounts for most of the measured time, which is not what we were trying to measure. So I decided to replace it with xorshift32 and to use 4 times as many iterations.
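
For reference, a minimal C sketch of a xorshift32 step. The thread does not show which shift constants were used, so the 13/17/5 triple below (the commonly quoted one from Marsaglia's xorshift paper) is an assumption, as is the function signature.

```c
#include <stdint.h>

/* One xorshift32 step using the common 13/17/5 shift triple (assumed here).
   The state must be seeded with a non-zero value; zero stays zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;   /* the new state is also the returned random value */
    return x;
}
```

The whole generator is three shifts and three XORs on a register, so unlike rdrand it adds very little to the loop being timed.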

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1079.229572     esi: 99999910

Test B (No Branching/Using SETCC):  560.419929     esi: 99992311

Test C (No Branching/Using CMOV):  508.874674     esi: 100000643

Press any key to continue ...

Now we see a clear difference.  :t
Title: Re: Branch Misprediction
Post by: six_L on November 10, 2018, 12:25:42 AM
Quote
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1707.725816     esi: 100003249

Test B (No Branching/Using SETCC):  814.433475     esi: 99999228

Test C (No Branching/Using CMOV):  815.539814     esi: 100012874

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: hutch-- on November 10, 2018, 12:44:47 AM
Haswell E/EP

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1390.336600     esi: 99996111

Test B (No Branching/Using SETCC):  753.261600     esi: 100000008

Test C (No Branching/Using CMOV):  717.176900     esi: 100000849

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: hutch-- on November 10, 2018, 12:46:43 AM
Try more than one run, with a SleepEx,100,0 followed by a CPUID instruction between tests. That isolates one test from the other. A change in priority class may also help.
Title: Re: Branch Misprediction
Post by: jj2007 on November 10, 2018, 01:16:21 AM
Core i5, roughly a factor of 2 for branching vs CMOV:
Code:
Test A (Branching):  1839.689614     esi: 99994305

Test B (No Branching/Using SETCC):  1014.606289     esi: 100011883

Test C (No Branching/Using CMOV):  916.680932     esi: 100005125
Title: Re: Branch Misprediction
Post by: Siekmanski on November 10, 2018, 01:49:53 AM
Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1474.770048     esi: 99998649

Test B (No Branching/Using SETCC):  776.358238     esi: 99996812

Test C (No Branching/Using CMOV):  740.065497     esi: 100008227

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: FORTRANS on November 10, 2018, 02:33:26 AM
Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  7489.981548     esi: 99994224

Test B (No Branching/Using SETCC):  6022.567114     esi: 100000788

Test C (No Branching/Using CMOV):  6892.027059     esi: 100007146

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: AW on November 10, 2018, 03:36:23 AM
Quote from: hutch--
Try more than 1 run with a SleepEx,100,0 followed by a CPUID instruction. Isolates one test from the other. May help with a change in priority class.

I usually follow the RDTSC route; today I followed the method inherited from the other thread, which also has some advantages, in my opinion.
Title: Re: Branch Misprediction
Post by: LiaoMi on November 10, 2018, 05:00:21 AM
i7-4810mq

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1348.180566     esi: 100011227

Test B (No Branching/Using SETCC):  718.488105     esi: 100001925

Test C (No Branching/Using CMOV):  706.041224     esi: 100002620

Press any key to continue ...

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1346.779943     esi: 100012082

Test B (No Branching/Using SETCC):  681.084780     esi: 99999300

Test C (No Branching/Using CMOV):  683.320791     esi: 100016696

Press any key to continue ...

Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1354.122122     esi: 100000595

Test B (No Branching/Using SETCC):  682.919775     esi: 99994066

Test C (No Branching/Using CMOV):  686.241900     esi: 99991064

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: Siekmanski on November 10, 2018, 05:13:53 AM
Quote from: AW
I usually follow the RDTSC route, today I followed the method inherited from the other thread which has also some advantages, in my opinion.
:t

The timers are not meant for measuring routine execution times.
They are meant to run simultaneously in real time to control multimedia events in games or demos.

For example:

- timer 1 controls the time to switch to the next scene.
- timer 2 controls when a flock of birds flies over.
- timer 3 controls the duration of a bullet salvo.
- timer 4 controls when a UFO enters Earth orbit.

etc.
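
A rough C sketch of the idea above, several event timers evaluated against one shared clock. This is not the timer code from the other thread; the struct, field names and millisecond units are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event timer: active from start_ms for length_ms,
   both measured on the same demo clock. */
typedef struct {
    const char *name;      /* label for the event, e.g. "birds" */
    uint32_t    start_ms;  /* when the event begins */
    uint32_t    length_ms; /* how long the event lasts */
} EventTimer;

/* Returns 1 if the event is running at time now_ms, else 0. */
int timer_active(const EventTimer *t, uint32_t now_ms)
{
    return now_ms >= t->start_ms && (now_ms - t->start_ms) < t->length_ms;
}

/* Counts how many events are running at now_ms. Every timer is checked
   against the same clock value, so the events stay in sync with each other. */
size_t active_count(const EventTimer *timers, size_t n, uint32_t now_ms)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)timer_active(&timers[i], now_ms);
    return count;
}
```

A demo loop would read the clock once per frame and call timer_active for each event, which is a different job from timing how long a routine takes to execute.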
Title: Re: Branch Misprediction
Post by: LordAdef on November 10, 2018, 09:37:49 AM
Hi Jose, thanks for carrying the torch!
Code:
Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1452.756573     esi: 100007857

Test B (No Branching/Using SETCC):  679.610210     esi: 99997780

Test C (No Branching/Using CMOV):  588.437494     esi: 100011413

Press any key to continue ...
Title: Re: Branch Misprediction
Post by: daydreamer on January 12, 2019, 07:58:07 PM

Quote from: Siekmanski
The timers are not meant for measuring routine execution times.
They are meant to run simultaneously in real time to control multimedia events in games or demos.

For example:

- timer 1 controls the time to switch to the next scene.
- timer 2 controls when a flock of birds flies over.
- timer 3 controls the duration of a bullet salvo.
- timer 4 controls when a UFO enters Earth orbit.

etc.
I look forward to testing timers for synchronizing several threads working together.
Maybe the LOCK prefix needs to be used?
It would be interesting to test rdrand against the simplest random generator used in Perlin noise.
I think we should add a Test D (no branching, using SIMD comparisons), so we know how much is gained or lost and whether it is worth spending the time to get branchless code right.