Branch Misprediction

Started by aw27, November 09, 2018, 08:55:52 PM

aw27

This is a spin-off of this thread and I reused the structure of its code (thank you Alex, Marinus and the Masm32 Library).

According to Intel:
"When tuning, note that all Intel 64 and IA-32 processors usually have very high branch
prediction rates. Consistently mispredicted branches are generally rare."

Given that, I am testing how much of a performance boost eliminating mispredicted branches produces. For unpredictability I am using the rdrand instruction to keep the code small; you can use your favorite random algorithm in its place. Branch elimination can be done with SETCC or CMOV instructions, and I am testing both. There appears to be a slight improvement from eliminating branches in my setup, but results may differ on other systems or setups. I am showing the esi values, but they will only be statistically close, not the same.
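For readers who think in C rather than assembly, the three variants being compared can be sketched roughly as below. This is not the test code from this thread: the function names, the threshold comparison, and the array input are assumptions made for illustration. An optimizing compiler is free to emit a branch, SETcc, CMOVcc, or vectorized code for any of the three, so check the generated assembly before drawing timing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Test A analog (hypothetical): the counter is updated behind a
   conditional branch that depends on unpredictable data. */
uint32_t count_branching(const uint32_t *v, size_t n, uint32_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < threshold)
            count++;
    }
    return count;
}

/* Test B analog (hypothetical): the comparison result is used as a
   0/1 value and added, which is the pattern SETcc implements. */
uint32_t count_setcc_style(const uint32_t *v, size_t n, uint32_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (uint32_t)(v[i] < threshold);
    return count;
}

/* Test C analog (hypothetical): both candidate values are computed and
   one is selected, which is the pattern CMOVcc implements. */
uint32_t count_cmov_style(const uint32_t *v, size_t n, uint32_t threshold)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t incremented = count + 1;
        count = (v[i] < threshold) ? incremented : count;
    }
    return count;
}
```

All three functions return the same count for the same input; only the way the condition feeds the counter differs.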


Unpredicatable Branching Performance Test begins:

Test A (Branching):  5310.804483     esi: 25001015
Test B (No Branching/Using SETCC):  5154.628838     esi: 25000277
Test C (No Branching/Using CMOV):  5102.413384     esi: 24996680

Press any key to continue ...


six_L

Unpredicatable Branching Performance Test begins:

Test A (Branching):  6147.469874     esi: 25006397

Test B (No Branching/Using SETCC):  5842.821538     esi: 24998669

Test C (No Branching/Using CMOV):  5737.716296     esi: 25005668

Press any key to continue ...
Say you, Say me, Say the codes together for ever.

johnsa


Unpredicatable Branching Performance Test begins:

Test A (Branching):  17905.004104     esi: 24999384

Test B (No Branching/Using SETCC):  17769.385169     esi: 25000978

Test C (No Branching/Using CMOV):  17706.415355     esi: 24995715

Press any key to continue ...


cpu: AMD Threadripper 1950X

I wonder why these values are so large compared to yours?

Siekmanski

Win 8.1 i7-4930K

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4847.646489     esi: 24995192

Test B (No Branching/Using SETCC):  4715.992466     esi: 24998433

Test C (No Branching/Using CMOV):  4792.383529     esi: 25007685
Creative coders use backward thinking techniques as a strategy.

hutch--

Doesn't look like there is enough difference to worry about.

Unpredicatable Branching Performance Test begins:

Test A (Branching):  4714.086700     esi: 24999738

Test B (No Branching/Using SETCC):  4347.706700     esi: 24992511

Test C (No Branching/Using CMOV):  4286.387600     esi: 25000474

Press any key to continue ...

aw27

Quote from: johnsa on November 09, 2018, 09:14:49 PM
cpu: AMD Threadripper 1950X
I wonder why these values are so large compared to yours ?

The reason could be the rdrand instruction. It is a slow instruction, and possibly much slower on AMD. You can get an idea of how slow it is by commenting out:
         ;rdrand ax
         ;jnc @b   

I also believe that rdrand is slower on my system than on Siekmanski's and Hutch's, because my system is generally faster yet my timings here are higher.
            

aw27

Since rdrand is so slow, it accounts for most of the measured time, which is not what we wanted to measure. So I decided to replace it with xorshift32 and to use 4 times as many iterations.
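A minimal sketch of a xorshift32 generator in C, for anyone wanting to try the same substitution. The shift triple 13/17/5 is the commonly cited one from Marsaglia's xorshift generators and is an assumption here; the post does not say which constants were used. The function name and the pointer-to-state interface are also illustrative only.

```c
#include <stdint.h>

/* One step of a 32-bit xorshift generator (assumed shifts: 13, 17, 5).
   The state must be seeded with a non-zero value, because a zero
   state maps to zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;   /* store the new state for the next call */
    return x;
}
```

Each call needs only three shifts and three XORs on a register-sized state, so its cost stays small next to the branch being measured.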


Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1079.229572     esi: 99999910

Test B (No Branching/Using SETCC):  560.419929     esi: 99992311

Test C (No Branching/Using CMOV):  508.874674     esi: 100000643

Press any key to continue ...


Now we see a clear difference.

six_L

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1707.725816     esi: 100003249

Test B (No Branching/Using SETCC):  814.433475     esi: 99999228

Test C (No Branching/Using CMOV):  815.539814     esi: 100012874

Press any key to continue ...

hutch--

Haswell E/EP

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1390.336600     esi: 99996111

Test B (No Branching/Using SETCC):  753.261600     esi: 100000008

Test C (No Branching/Using CMOV):  717.176900     esi: 100000849

Press any key to continue ...

hutch--

Try more than one run, with a SleepEx,100,0 followed by a CPUID instruction between tests; this isolates one test from the other. Changing the priority class may also help.

jj2007

Core i5, roughly a factor of 2 for branching vs CMOV:
Test A (Branching):  1839.689614     esi: 99994305

Test B (No Branching/Using SETCC):  1014.606289     esi: 100011883

Test C (No Branching/Using CMOV):  916.680932     esi: 100005125

Siekmanski

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1474.770048     esi: 99998649

Test B (No Branching/Using SETCC):  776.358238     esi: 99996812

Test C (No Branching/Using CMOV):  740.065497     esi: 100008227

Press any key to continue ...

FORTRANS

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  7489.981548     esi: 99994224

Test B (No Branching/Using SETCC):  6022.567114     esi: 100000788

Test C (No Branching/Using CMOV):  6892.027059     esi: 100007146

Press any key to continue ...

aw27

Quote from: hutch-- on November 10, 2018, 12:46:43 AM
Try more than 1 run with a SleepEx,100,0 followed by a CPUID instruction. Isolates one test from the other. May help with a change in priority class.

I usually follow the RDTSC route; today I followed the method inherited from the other thread, which also has some advantages, in my opinion.

LiaoMi

i7-4810mq

Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1348.180566     esi: 100011227

Test B (No Branching/Using SETCC):  718.488105     esi: 100001925

Test C (No Branching/Using CMOV):  706.041224     esi: 100002620

Press any key to continue ...


Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1346.779943     esi: 100012082

Test B (No Branching/Using SETCC):  681.084780     esi: 99999300

Test C (No Branching/Using CMOV):  683.320791     esi: 100016696

Press any key to continue ...


Unpredictable Branching Performance Test using xorshift32 begins:

Test A (Branching):  1354.122122     esi: 100000595

Test B (No Branching/Using SETCC):  682.919775     esi: 99994066

Test C (No Branching/Using CMOV):  686.241900     esi: 99991064

Press any key to continue ...