What is the fastest C/C++ compiler

Started by aw27, March 17, 2019, 05:14:59 AM


aw27

This is a comparative 64-bit performance test of 5 common C/C++ compilers.
I am using the latest versions of the compilers.
All tests, except the CLang one, will run on Windows XP 64-bit and above.
All tests were built with Fast (i.e. not Precise) floating point and use the default instruction set for 64-bit, which is SSE2. I optimized for the fastest possible speed, but since I don't know some of the compilers well, the results can surely be improved.
As we can see, there are no large differences in performance under the specified conditions. Still, the clear winner is CLang and the clear loser is the Intel compiler. Of course, testing is always complicated and results will differ under other circumstances.
Now, we all know that in ASM we can get better performance than this. However, this test was published on CodeProject 2 years ago and so far nobody has produced such an ASM jewel. I have already confessed my ineptitude, but the others who downvoted the article could not do it either. Strange.


Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
C compiler: Microsoft Visual C/C++ Version: 1916 Toolset: 191627027
Elapsed time: 7670.614100 miliseconds

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
C compiler: Intel, Version 1900 Build Date: 20190206
Elapsed time: 8395.283300 miliseconds

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
C compiler: LLVM CLang 7.0 patchlevel 1
Elapsed time: 7327.724200 miliseconds

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
C compiler: Pelles, Version 900
Elapsed time: 7674.496300 miliseconds

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
C compiler: MingW64 6.0
Elapsed time: 7698.716100 miliseconds

hutch--

With the difference being between 7 and 8 seconds for the whole lot, it's pretty much a case of who cares. What I would be interested in seeing is which C compiler produced the best code, and best here means fastest. I would trade performance over compile time any time.

aw27

Quote from: hutch-- on March 17, 2019, 07:15:58 AM
With the difference being between 7 and 8 seconds for the whole lot, it's pretty much a case of who cares. What I would be interested in seeing is which C compiler produced the best code, and best here means fastest. I would trade performance over compile time any time.

:biggrin:
The tests are about performance of the code not compile time of the compilers. I know I was not clear with the title and my English sometimes leaves margin for confusion. Actually, I never bothered with the time compilers or assemblers take, they are always fast enough for me.

daydreamer

Quote from: AW on March 17, 2019, 07:23:49 AM
Quote from: hutch-- on March 17, 2019, 07:15:58 AM
With the difference being between 7 and 8 seconds for the whole lot, it's pretty much a case of who cares. What I would be interested in seeing is which C compiler produced the best code, and best here means fastest. I would trade performance over compile time any time.

:biggrin:
The tests are about performance of the code not compile time of the compilers. I know I was not clear with the title and my English sometimes leaves margin for confusion. Actually, I never bothered with the time compilers or assemblers take, they are always fast enough for me.
Isn't that why we code directly in assembly, in a philosophical sense acting as the "compiler" ourselves, when we break up, for example, a complex math formula into lots of assembly code? :biggrin:

I don't know how much time a JIT compiler takes before it has compiled all the bytecode into machine code, but maybe that is the slowest compiler, because it runs alongside the bytecode until everything has been compiled?
It first analyzes the most frequently called code and compiles that first.
Yes, I know it isn't a C compiler, but maybe that is the fastest way to get machine code that performs best on a wide range of CPUs: whether it is a P4 or an AMD, it uses the right opcodes to run fastest on it.

Manos

Go to: http://masm32.com/board/index.php?topic=7762.0

Manos.

jj2007

The output looks different because of rand() and xorshift32 in Helper.c. If I understand the source correctly, this benchmark basically measures multiplication performance.
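
For readers who have not met it, xorshift32 is a tiny pseudo-random generator. The sketch below uses Marsaglia's common 13/17/5 shift triple; whether Helper.c uses the same triple, the same seed, or the same mapping to float is an assumption, and the names `xorshift32_next` and `xorshift32_unit` are made up for illustration. It only shows why runs fed from different random streams (rand() vs xorshift32, or different seeds) end up with different determinants.

```c
#include <assert.h>
#include <stdint.h>

/* One step of xorshift32 (Marsaglia's 13/17/5 variant).
   The state must never be zero, otherwise the sequence stays at zero. */
uint32_t xorshift32_next(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: map the top 24 bits to a float in [0, 1).
   24 bits fit exactly into a float mantissa, so no rounding up to 1.0f. */
float xorshift32_unit(uint32_t *state)
{
    return (float)(xorshift32_next(state) >> 8) * (1.0f / 16777216.0f);
}
```

Two programs that seed this generator identically produce the same matrix; programs that call the C library's rand() instead will generally not, because each runtime ships its own rand() implementation.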

Determinant of Matrix using C/C++ is -652805124668261794840576.00.

       Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
C compiler: Microsoft Visual C/C++ Version: 1916 Toolset: 191627027
Elapsed time: 13349.706968 miliseconds

Determinant of Matrix using C/C++ is -3135872705554328936513536.00.

       Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
C compiler: LLVM CLang 7.0 patchlevel 1
Elapsed time: 13640.585769 miliseconds

Determinant of Matrix using C/C++ is 254821845453545970000000.00.

       Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
C compiler: MingW64 6.0
Elapsed time: 13188.805571 miliseconds

Determinant of Matrix using C/C++ is 1860171740789390634745300000000.00.

       Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
C compiler: Pelles, Version 900
Elapsed time: 14703.056519 miliseconds

Determinant of Matrix using C/C++ is -413661966842255395258368.00.

       Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
C compiler: Intel, Version 1900 Build Date: 20190206
Elapsed time: 15497.578726 miliseconds

Caché GB

OK the results on my "mobile gaming desktop".


Determinant of Matrix using C/C++ is -45013464764769225408512.00.

Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
C compiler: LLVM CLang 7.0 patchlevel 1
Elapsed time: 7802.865251 miliseconds

----------------------------------------------------------------------------->

Determinant of Matrix using C/C++ is 27506122033733258182656.00.

Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
C compiler: Intel, Version 1900 Build Date: 20190206
Elapsed time: 8932.999451 miliseconds

----------------------------------------------------------------------------->

Determinant of Matrix using C/C++ is -318059265722058350000000.00.

Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
C compiler: MingW64 6.0
Elapsed time: 8200.381124 miliseconds

----------------------------------------------------------------------------->

Determinant of Matrix using C/C++ is -172211128538387334561792.00.

Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
C compiler: Microsoft Visual C/C++ Version: 1916 Toolset: 191627027
Elapsed time: 8425.579861 miliseconds

----------------------------------------------------------------------------->

Determinant of Matrix using C/C++ is 569480412259785342030200000000.00.

Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
C compiler: Pelles, Version 900
Elapsed time: 8309.385132 miliseconds

aw27

Quote from: jj2007 on March 17, 2019, 10:07:42 AM
If I understand the source correctly, this benchmark basically measures multiplication performance.
It also tests a few other things, like loops, and compilers usually make a difference there.
It is interesting that MingW64 (here using gcc 8.1) outperforms CLang on your computer.

Quote from: daydreamer on March 17, 2019, 07:49:25 AM
I don't know how much time a JIT compiler takes before it has compiled all the bytecode into machine code, but maybe that is the slowest compiler, because it runs alongside the bytecode until everything has been compiled?
I am not going to make any performance tests for bytecode-based compilers :biggrin:

TimoVJL

With old AMD:
AMD Athlon(tm) II X2 220 Processor
C compiler: LLVM CLang 7.0 patchlevel 1
Elapsed time: 23105.062680 miliseconds

AMD Athlon(tm) II X2 220 Processor
C compiler: Intel, Version 1900 Build Date: 20190206
Elapsed time: 23266.848760 miliseconds

AMD Athlon(tm) II X2 220 Processor
C compiler: Microsoft Visual C/C++ Version: 1916 Toolset: 191627027
Elapsed time: 24822.899920 miliseconds

AMD Athlon(tm) II X2 220 Processor
C compiler: MingW64 6.0
Elapsed time: 25953.812400 miliseconds

AMD Athlon(tm) II X2 220 Processor
C compiler: Pelles, Version 900
Elapsed time: 29015.721800 miliseconds

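Support routine in C:
#include <intrin.h>

/* Fills s with the CPU brand string; s must point to at least 48 bytes. */
char *getcpuBrandString(char *s)
{
    /* Leaf 0x80000000 returns the highest supported extended leaf in EAX. */
    _cpuid((int*)s, 0x80000000);
    if (*(unsigned int*)s >= 0x80000004) {
        /* Leaves 0x80000002..0x80000004 each return 16 bytes of the string. */
        _cpuid((int*)s, 0x80000002);
        _cpuid((int*)(s+16), 0x80000003);
        _cpuid((int*)(s+32), 0x80000004);
    }
    return s;
}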
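
A slightly more defensive variant of the same idea, as a sketch only: it zero-fills the buffer so the result is always a terminated string, and it wraps the compiler-specific intrinsic in one helper. The names `cpuid_leaf` and `cpu_brand_string` are invented here; the sketch assumes `__cpuid` from `<intrin.h>` on MSVC-compatible compilers and the `__cpuid` macro from `<cpuid.h>` on GCC/Clang, and it returns an empty string on anything that is not x86.

```c
#include <assert.h>
#include <string.h>

#if defined(_MSC_VER)
#include <intrin.h>
/* MSVC-style intrinsic: fills EAX, EBX, ECX, EDX for the given leaf. */
static void cpuid_leaf(unsigned leaf, unsigned regs[4])
{
    int r[4];
    __cpuid(r, (int)leaf);
    memcpy(regs, r, sizeof r);
}
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* GCC/Clang macro from <cpuid.h>. */
static void cpuid_leaf(unsigned leaf, unsigned regs[4])
{
    __cpuid(leaf, regs[0], regs[1], regs[2], regs[3]);
}
#else
/* Not an x86 target: report "nothing supported". */
static void cpuid_leaf(unsigned leaf, unsigned regs[4])
{
    (void)leaf;
    memset(regs, 0, 4 * sizeof regs[0]);
}
#endif

/* Writes the brand string into out (at least 49 bytes) and returns out.
   The string is empty if the extended brand leaves are not available. */
char *cpu_brand_string(char *out)
{
    unsigned regs[4];
    unsigned i;

    assert(out != NULL);
    memset(out, 0, 49);                      /* guarantees termination */
    cpuid_leaf(0x80000000u, regs);           /* EAX = highest extended leaf */
    if (regs[0] >= 0x80000004u) {
        for (i = 0; i < 3; ++i) {
            cpuid_leaf(0x80000002u + i, regs);
            memcpy(out + 16 * i, regs, 16);  /* 16 bytes per leaf */
        }
    }
    return out;
}
```

Called with a `char buf[49]`, the result can be passed straight to `printf("%s\n", ...)`. Some CPUs pad the brand string with leading spaces, so a caller may want to trim it.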

jj2007

Quote from: AW on March 17, 2019, 05:49:47 PM
It also tests a few other things, like loops, and compilers usually make a difference there.
It is interesting that MingW64 (here using gcc 8.1) outperforms CLang on your computer.

The "other things" apparently do not have such a big influence on the performance, which is not surprising. Here is a look under the hood:
57*mul msvc
46*mul mingw
48*mul PellesC
45*mul clang
213*mul intel

These muls are almost exclusively mulss instructions. It would be interesting to see the 32-bit results, too - with -arch:SSE or similar, of course.

aw27

Quote from: jj2007 on March 17, 2019, 09:40:28 PM
These muls are almost exclusively mulss instructions. It would be interesting to see the 32-bit results, too - with -arch:SSE or similar, of course.

Up to line 44 it is mostly SSE instructions, namely mulss, addss and subss. Things are different after that.
The same applies in 32-bit when built for SSE or SSE2. In 32-bit(*) it runs over 20% slower when we should expect it to be faster, or am I wrong?  :lol: .

(*) I have none built now but will do it if you are curious and don't want to try to build it yourself.

jj2007

I am curious to see that, of course. Can you post the exes? Unfortunately, your Main.c throws plenty of errors. I wish C were compatible with itself :(

Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

Tmp.cpp
c:\program files (x86)\microsoft visual studio 10.0\vc\include\codeanalysis\sourceannotations.h(78): warning C4467: usage of ATL attributes is deprecated
Tmp.cpp(40): error C3861: '_alloca': identifier not found

TimoVJL

alloca is in malloc.h in older msvc headers ?

aw27

Quote from: jj2007 on March 17, 2019, 10:28:24 PM
I am curious to see that, of course. Can you post the exes? Unfortunately, your Main.c throws plenty of errors. I wish C were compatible with itself :(

Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

Tmp.cpp
c:\program files (x86)\microsoft visual studio 10.0\vc\include\codeanalysis\sourceannotations.h(78): warning C4467: usage of ATL attributes is deprecated
Tmp.cpp(40): error C3861: '_alloca': identifier not found


I am posting the 32-bit tests. Note that the C source code compiled without any change in all compilers. Even better now, because I have replaced the ASM with Timo's suggestion.

jj2007

compiler   64-bit (s)   32-bit (s)   gain
Mingw      12.99        14.85        14%
msvc       13.32        22.74        71%
clang      13.4         16.74        25%
Intel      15.25        16.69        9.4%
PellesC    14.61        17.31        19%


How much of that gain is attributable to 64-bit instructions is debatable. For example, the number of mul instructions in the msvc code (the one with the highest gain) is 213 in the 64-bit version but only 46 in the 32-bit exe.
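
The "gain" column above appears to be the extra time the 32-bit build needs relative to the 64-bit build, i.e. (t32 / t64 - 1) * 100. That reading is inferred from the numbers, not stated in the post. A small sketch with a made-up function name:

```c
#include <assert.h>

/* Percentage by which the 32-bit time exceeds the 64-bit time.
   Both times must use the same unit; t64 must be positive. */
double slowdown_percent(double t64, double t32)
{
    assert(t64 > 0.0);
    return (t32 / t64 - 1.0) * 100.0;
}
```

With the Mingw row (12.99 and 14.85) this gives roughly 14%, and with the msvc row (13.32 and 22.74) roughly 71%, which matches the table.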

Again, it's a pity that C/C++ is such a mess - cryptic error messages all over the place...
C:\TDM-GCC-32\bin\gcc.exe  -O3 -s -o tmp.exe "Tmp.cpp" -lquadmath
____________________
Tmp.cpp:6:33: error: conflicting declaration of 'char* getcpuBrandString(char*)' with 'C' linkage
  char *getcpuBrandString(char *s);
                                 ^
In file included from Tmp.cpp:1:0:
commonHeader_1.h:11:7: note: previous declaration with 'C++' linkage
char *getcpuBrandString(char *s);
       ^
Tmp.cpp: In function 'float determinant(float*, int)':
Tmp.cpp:40:67: error: '_alloca' was not declared in this scope
   minorMat = (float*)_alloca((rows - 1)*(rows - 1) * sizeof(float));
                                                                   ^
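
Both errors above have conventional fixes, sketched below under some assumptions. `_alloca` is declared in `<malloc.h>` on MSVC and MinGW, as Timo suggested earlier; the `STACK_ALLOC` macro is a made-up wrapper so that the same file also builds where only `alloca` from `<alloca.h>` exists. The linkage conflict goes away when every declaration of `getcpuBrandString` sits inside the same `extern "C"` guard. The `determinant` body is not the original source: it is a plain cofactor expansion along the first row, written only to match the signature and the `minorMat` allocation visible in the error output (with `const` added to the pointer).

```c
#include <assert.h>
#include <stddef.h>

#if defined(_WIN32)
#include <malloc.h>              /* declares _alloca on MSVC and MinGW */
#define STACK_ALLOC(n) _alloca(n)
#else
#include <alloca.h>              /* glibc and most Unix-like systems */
#define STACK_ALLOC(n) alloca(n)
#endif

#ifdef __cplusplus
extern "C" {                     /* one linkage for every declaration */
#endif

char *getcpuBrandString(char *s);
float determinant(const float *mat, int rows);

#ifdef __cplusplus
}
#endif

/* Determinant of a rows x rows matrix stored row-major, by recursive
   cofactor expansion along row 0. Cost grows factorially with rows. */
float determinant(const float *mat, int rows)
{
    float *minorMat;
    float det = 0.0f, sign = 1.0f;
    int col, r, c, k;

    assert(mat != NULL && rows >= 1);
    if (rows == 1) return mat[0];
    if (rows == 2) return mat[0] * mat[3] - mat[1] * mat[2];

    /* Scratch space on the stack, released when this call returns. */
    minorMat = (float *)STACK_ALLOC((size_t)(rows - 1) * (rows - 1) * sizeof(float));

    for (col = 0; col < rows; ++col) {
        /* Build the minor by deleting row 0 and column col. */
        k = 0;
        for (r = 1; r < rows; ++r)
            for (c = 0; c < rows; ++c)
                if (c != col) minorMat[k++] = mat[r * rows + c];
        det += sign * mat[col] * determinant(minorMat, rows - 1);
        sign = -sign;
    }
    return det;
}
```

The inner work is one multiply and one add or subtract per cofactor, which fits the observation in this thread that the generated code is dominated by mulss, addss and subss.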