C/C++ vs Assembler

Started by Manos, May 04, 2013, 04:11:50 AM

Manos

Hi all.

It is true that code written in assembly is faster than code
written in any high-level language.
Some time ago, I read in this forum that some source
was faster when written in C/C++ than in assembly.
Why?
The answer is below.

Recently I read the following on the Web:

..........................
I wrote this function in C++, assembly (in-line) and assembly (MASM).

Here is the C++ Code:

    char cppToUpper(char c)
    {
        if (c > 122 || c < 97)
            return c;
        else
            return c - 32;
    }

Here is the inline assembly Code:

    char cToUpper(int c)
    {
        //
        //cout << cLowerLimit;
        _asm
        {
            // Copy the character into EAX, which also holds the return value
            mov eax, c;
            // Test the upper limit
            cmp eax, 122; // Compare the character to 122
            ja End; // Jump to the end if above -- the character is too high to be a lower case letter
            // Test the lower limit
            cmp eax, 97 // Compare the character to 97
            jb End; // Jump to the end if below -- the character is too low to be a lower case letter
            // Now the operation begins
            sub eax, 32; // Subtract 32 from the character in the register
        End:
            // mov result, al; // Move the character in the register into the result variable
        }
        // No return statement: the result is left in AL/EAX by the asm block
    }

And here is the function in pure assembly language:

    .686
    .model flat, stdcall
    option casemap :none
    .code
    cUpperCase2 proc cValue:DWORD
    mov eax, cValue
    cmp eax, 122
    ja TEnd
    cmp eax, 97
    jb TEnd
    sub eax, 32
    TEnd:
    ret
    cUpperCase2 endp
    end

Now, here is what the C++ function disassembles to:

    char cppToUpper(char c)
    {
    01271680 push ebp
    01271681 mov ebp,esp
    01271683 sub esp,0C0h
    01271689 push ebx
    0127168A push esi
    0127168B push edi
    0127168C lea edi,[ebp-0C0h]
    01271692 mov ecx,30h
    01271697 mov eax,0CCCCCCCCh
    0127169C rep stos dword ptr es:[edi]
    if (c > 122 || c < 97 )
    0127169E movsx eax,byte ptr [c]
    012716A2 cmp eax,7Ah
    012716A5 jg cppToUpper+30h (12716B0h)
    012716A7 movsx eax,byte ptr [c]
    012716AB cmp eax,61h
    012716AE jge cppToUpper+37h (12716B7h)
    return c;
    012716B0 mov al,byte ptr [c]
    012716B3 jmp cppToUpper+3Eh (12716BEh)
    012716B5 jmp cppToUpper+3Eh (12716BEh)
    else return c - 32;
    012716B7 movsx eax,byte ptr [c]
    012716BB sub eax,20h
    }
    012716BE pop edi
    012716BF pop esi
    012716C0 pop ebx
    012716C1 mov esp,ebp
    012716C3 pop ebp
    012716C4 ret

ALRIGHT HERE'S the question. Why is the C++ code considerably faster even though it compiles to far more instructions than my assembly language code uses? 48 "ticks" expire when executing the pure assembly language function 10,000,000 times (I'll put this stuff at the very bottom); 0 ticks when executing it in C++, and 16 when using inline assembly?

I am impressed that I was even able to get it to work in assembly but perplexed at the performance results. I'll put the main() function below along with the efficiency timing stuff for your reference.
Any ideas? I am just trying to learn a little assembly because I am curious about how computers actually work.

    #include "stdafx.h"
    #include <iostream>
    #include <string>
    #include "windows.h"
    #include "time.h"
    using namespace std;

    // External MASM procedure shown above. cppToUpper and cToUpper are the
    // C++ and inline-assembly functions shown above.
    extern "C" int _stdcall cUpperCase2(char c);

    // Prints the elapsed clock() ticks when it goes out of scope.
    class stopwatch
    {
    public:
        stopwatch() : start(clock()) {} // start counting time
        ~stopwatch();
    private:
        clock_t start;
    };

    stopwatch::~stopwatch()
    {
        clock_t total = clock() - start; // get elapsed time
        cout << "total of ticks for this activity: " << total << endl;
        // Convert before dividing: total/CLK_TCK alone is an integer division
        cout << "in seconds: " << double(total) / CLK_TCK << endl;
    }

    int main()
    {
        bool bAgain = true;
        while (bAgain)
        {
            // unsigned long lTimeNow = t_time;
            char c = 'a';
            char d = '!';
            char e;
            //cout << "A lowercase character will be converted to Uppercase:" << endl;
            //cin >> c;
            {
                stopwatch watch;
                for (int i = 0; i < 10000000; i++)
                {
                    e = cUpperCase2(c);
                    e = cUpperCase2(d);
                }
            }
            cout << "That was the external function written in assembler." << endl;
            {
                stopwatch watch;
                for (int i = 0; i < 10000000; i++)
                {
                    // cout << cppToUpper(c);
                    //cout << cppToUpper(d);
                    e = cppToUpper(c);
                    e = cppToUpper(d);
                }
            }
            cout << "That was C++\n";
            {
                stopwatch watch;
                for (int i = 0; i < 10000000; i++)
                {
                    e = cToUpper(c);
                    e = cToUpper(d);
                }
            }
            cout << "That was in line assembler\n";
            cout << "Enter a letter and hit enter to exit (a will repeat) . . ." << endl;
            cin >> c;
            if (c != 'a')
                bAgain = false;
        }
        return 0;
    }


My answer is that the compiled C/C++ program never executes the above loop that calls the cppToUpper function.
This is because the C/C++ compiler knows the result at compile time and emits the result without executing the loop.
This is called optimization.
But if, in the above loop, you put the following below the call e= cToUpper(d);
if(i > 10000000)
break;

the compiled C/C++ program will execute the loop and the result is different.
The conclusion is that sometimes C/C++ is faster.
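
A minimal C++ sketch of the effect described above. The function names are made up for illustration, and whether a given compiler actually removes the first loop depends on the compiler and its optimization settings; the sketch only shows why it is allowed to.

```cpp
#include <cassert>
#include <cstdint>

// Same logic as cppToUpper in the quoted post.
static char to_upper(char c)
{
    if (c > 122 || c < 97)
        return c;
    return static_cast<char>(c - 32);
}

// Hypothetical name. The inputs are constants and every result except the
// last is overwritten, so an optimizing compiler is allowed to reduce the
// whole loop to "return '!'". Timing this loop may measure nothing.
char loop_with_constant_inputs(int iterations)
{
    assert(iterations > 0);
    char e = 0;
    for (int i = 0; i < iterations; ++i) {
        e = to_upper('a');
        e = to_upper('!');
    }
    return e;
}

// Hypothetical name. Reading the inputs through volatile objects forces a
// real load on every iteration, and summing the results makes every call
// contribute to the return value, so the work cannot be dropped as unused.
std::uint32_t loop_with_volatile_inputs(int iterations)
{
    assert(iterations > 0);
    volatile char in1 = 'a';
    volatile char in2 = '!';
    std::uint32_t sum = 0;
    for (int i = 0; i < iterations; ++i) {
        sum += static_cast<unsigned char>(to_upper(in1));
        sum += static_cast<unsigned char>(to_upper(in2));
    }
    return sum;
}
```

Timing the second function instead of the first gives the C++ version real work to do, which makes it comparable with the external assembly procedure.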

Manos.


qWord

For fairness, please use a release build, turn on all optimizations and use a realistic function/algorithm with real-world data. If the function's input depends on some runtime input (e.g. command line or user input), the compiler can't remove the function as it did in your test. Also note that the compiler may inline your code.
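
A rough C++ sketch of the runtime-input idea, assuming the text comes from the command line or from user input. The name upper_checksum is hypothetical. The compiler may still inline to_upper, but it cannot precompute the result, because the characters are unknown at compile time.

```cpp
#include <cassert>
#include <string>

// Same logic as cppToUpper in the quoted post.
static char to_upper(char c)
{
    if (c > 122 || c < 97)
        return c;
    return static_cast<char>(c - 32);
}

// Hypothetical helper: upper-cases every character of a run-time string
// 'passes' times and returns a checksum, so each call feeds the result.
unsigned long upper_checksum(const std::string& text, int passes)
{
    assert(passes >= 0);
    unsigned long sum = 0;
    for (int p = 0; p < passes; ++p) {
        for (char ch : text) {
            sum += static_cast<unsigned char>(to_upper(ch));
        }
    }
    return sum;
}
```

In a test program the string would be taken from argv[1] or read with std::getline, and the checksum printed after the timed section so that the result is actually used.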
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

Quote from: Manos on May 04, 2013, 04:11:50 AM
Recently I read the following on the Web:

In the DaniWeb? Rarely seen so many confused people in one thread :P

When comparing reasonable C++ and assembly code, they are often equally fast.

If the C++ code is more than 5% faster, go and check if it does not eliminate some steps by guessing that two constants can be condensed into one (compilers can be clever 8))
No problem - do the same in assembler, and you are back on par.
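
One well-known example of folding two constants into one is the range check itself: the two comparisons against 97 and 122 can be replaced by a single unsigned comparison. The C++ sketch below is an illustration with made-up function names, not a claim about what any particular compiler emitted in the test above.

```cpp
#include <cassert>

// Two comparisons, as in the functions discussed in this thread.
int to_upper_two_compares(int c)
{
    if (c > 122 || c < 97)
        return c;
    return c - 32;
}

// One comparison: for c below 97 the unsigned subtraction wraps around to a
// very large value, so a single "<= 25" test covers both limits.
int to_upper_one_compare(int c)
{
    unsigned offset = static_cast<unsigned>(c) - 97u;
    return (offset <= 25u) ? c - 32 : c;
}

// Checks that both variants agree for every signed and unsigned char value.
void check_variants_agree()
{
    for (int c = -128; c <= 255; ++c) {
        assert(to_upper_two_compares(c) == to_upper_one_compare(c));
    }
}
```

The same trick carries over to assembly as a subtract followed by one unsigned compare and branch, which is the sense in which doing the same in assembler puts you back on par.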

If, on the other hand, you believe the C++ code is not fast enough, then disassemble the innermost loop and trim it with hand-made assembly. Depending on the quality of the initial code, improvements between 10% and a factor 10 are always possible. Search the Laboratory for the word timings to see what's feasible. Many good algos have been written "against" the C Runtime Library, which is probably one of the libraries that have been "beaten to death" by M$ programmers to tickle the last bit of performance out of Windows. And voilà, assembly is still often a factor 2 or 3 faster. Ask Lingo if you can find him ;)

Adamanteus

The premise of this topic is not entirely correct, because what you are really discussing is C code, shown with its prologue, epilogue and so on, which assembly discards. The correct view is that experience proves assembly is more efficient than C, but C++ compilers often have a higher level of optimisation, and the underscore before the asm keyword shows that the results depend on the system, not on the language.

anunitu

Hoping this isn't going to turn into a flame war about different languages, seen way too many of those. :dazzled:

hutch--

You tend to do comparisons like this by writing the task in both languages and comparing them in a benchmark. Combining the two formats in a compiler does justice to neither: the inline assembler messes up the compiler optimisation, and the C compiler generally uses registers in non-standard ways, which increases the overhead of calling an inline assembler routine.

Microsoft have long had the solution: write your C/C++ in a C compiler and write your assembler code with an assembler, then LINK them together and you get the best of both worlds, not the worst.

The next factor is that it's easy to write both lousy C and lousy assembler. If you are going to make comparisons, you need to benchmark both to locally optimise each one. THEN do the comparison.
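
As a small sketch of what benchmarking each version separately could look like in C++, the helper below times a callable several times and keeps the best run. The name best_seconds and the use of std::chrono::steady_clock are assumptions made for illustration; the thread itself uses clock() and a cycle-counter include file instead.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs 'work' the given number of times and returns
// the shortest wall-clock time of a single run, in seconds. Taking the best
// of several runs reduces noise from other activity on the machine.
template <typename Fn>
double best_seconds(Fn&& work, int repeats)
{
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

Each candidate (the C function, the linked assembly procedure) would be wrapped in its own callable and timed with the same helper, after each one has been tuned on its own.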

dedndave

i suspect that, for some rare cases, you can implement some things faster in assembler
also, in C, some things are easier, like COM and .NET, etc - and C is more maintainable
otherwise - it's a good design/bad design thing, as Hutch says

use the right tool for the job

now, in my case, i am not very proficient in C
and - i don't write code for a living, either
i prefer assembler and write in assembler

MichaelW

This code eliminates the C++ stuff, adds some function test code (currently commented out), adds a test of the CRT function, adds a naked function, and removes the prologue and epilogue from the external assembly procedure.

//=============================================================================
#include <windows.h>
#include <conio.h>
#include <stdio.h>
#include <ctype.h>   // toupper
#include <stdlib.h>  // rand (used by the commented-out test code)
#include "counter_c.c"
//=============================================================================
// These for safety on single-core systems.
#define PP HIGH_PRIORITY_CLASS
#define TP THREAD_PRIORITY_NORMAL
// These for multi-core systems.
//#define PP REALTIME_PRIORITY_CLASS
//#define TP THREAD_PRIORITY_TIME_CRITICAL

#define LOOPS 10000000
//=============================================================================

int c_toupper(int c)
{
  if (c > 122 || c < 97)
    return c;
  else
    return c - 32;
}

char ia_toupper(int c)
{
  __asm
  {
    mov eax, c
    cmp eax, 122
    ja  end
    cmp eax, 97
    jb  end
    sub eax, 32
  end:
  }
}

__declspec(naked) int nk_toupper(int c)
{
  __asm
  {
    mov eax, [esp+4]
    cmp eax, 122
    ja  end
    cmp eax, 97
    jb  end
    sub eax, 32
  end:
    ret
  }
}

//----------------------------------------------------------------------------
// This is necessary to prevent the optimizer from breaking the counter code.
//----------------------------------------------------------------------------

#pragma optimize("",off)

int asm_toupper(int c);

void main(void)
{
  int i, c;
  /*
  for(i=0;i<200;i++)
  {
    c = rand() >> 8;
    printf("%c",toupper(c));
    printf("%c",c_toupper(c));
    printf("%c",ia_toupper(c));
    printf("%c",nk_toupper(c));
    printf("%c",asm_toupper(c));
    printf("\n");
  }
  */

  SetProcessAffinityMask(GetCurrentProcess(),1);

  Sleep(5000);

  for(i=0;i<4;i++)
  {
    counter_begin(1,LOOPS,PP,TP);
    counter_end(1)
    printf( "%d cycles, empty\n", counter_cycles );
    counter_begin(2,LOOPS,PP,TP);
      c = toupper(95);
      c = toupper(110);
      c = toupper(125);
    counter_end(2)
    printf( "%d cycles, toupper\n", counter_cycles );
    counter_begin(3,LOOPS,PP,TP);
      c = c_toupper(95);
      c = c_toupper(110);
      c = c_toupper(125);
    counter_end(3)
    printf( "%d cycles, c_toupper\n", counter_cycles );
    counter_begin(4,LOOPS,PP,TP);
      c = ia_toupper(95);
      c = ia_toupper(110);
      c = ia_toupper(125);
    counter_end(4)
    printf( "%d cycles, ia_toupper\n", counter_cycles );
    counter_begin(5,LOOPS,PP,TP);
      c = nk_toupper(95);
      c = nk_toupper(110);
      c = nk_toupper(125);
    counter_end(5)
    printf( "%d cycles, nk_toupper\n", counter_cycles );
    counter_begin(6,LOOPS,PP,TP);
      c = asm_toupper(95);
      c = asm_toupper(110);
      c = asm_toupper(125);
    counter_end(6)
    printf( "%d cycles, asm_toupper\n\n", counter_cycles );
  }
  getch();
}

#pragma optimize("",on)


Results on my P3:

0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
26 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper

0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper

0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper

0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper


Well Microsoft, here's another nice mess you've gotten us into.

hutch--

This is the result on my Core2 quad. (3 gig)



0 cycles, empty
31 cycles, toupper
52 cycles, c_toupper
63 cycles, ia_toupper
39 cycles, nk_toupper
30 cycles, asm_toupper

0 cycles, empty
31 cycles, toupper
44 cycles, c_toupper
63 cycles, ia_toupper
32 cycles, nk_toupper
31 cycles, asm_toupper

0 cycles, empty
31 cycles, toupper
48 cycles, c_toupper
63 cycles, ia_toupper
41 cycles, nk_toupper
29 cycles, asm_toupper

0 cycles, empty
31 cycles, toupper
57 cycles, c_toupper
61 cycles, ia_toupper
41 cycles, nk_toupper
30 cycles, asm_toupper

Manos

Some people have not understood the spirit of my words.

Assembly is the best for small programs and for writing APIs, libraries and drivers.
But if you attempt to write a big program like my IDE, you will spend a ton of time.
And a question: Why are MASM, JWASM, POASM, NASM and Windows written in the C language?

In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.


And a few words to jj2007:
You have no right to call Microsoft M$.
If you don't like Microsoft, turn to Linux.

Manos.


Gunther

Hi Manos,

Quote from: Manos on May 04, 2013, 06:27:15 PM
In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.


and is it right, because Microsoft wrote it? Do you really believe that?

Quote from: Manos on May 04, 2013, 06:27:15 PM
And a question: Why are MASM, JWASM, POASM, NASM and Windows written in the C language?

The answer is easy: they are written in C for better maintenance, but that has nothing to do with performance. There are on the other hand good assemblers (FASM, SolAsm) which are written in assembly language. Furthermore, there are compilers written in assembly language, too. You should, for example, have a look into that thread, especially reply #6.

Gunther
You have to know the facts before you can distort them.

qWord

Quote from: Gunther on May 04, 2013, 07:30:26 PM
Quote from: Manos on May 04, 2013, 06:27:15 PM
In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.


and is it right, because Microsoft wrote it? Do you really believe that?
So, it's wrong because MS wrote it?
I would accept it with a few modifications:
Quote:
The C language is a general-purpose programming language known for its efficiency(?), economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.

Quote from: Manos on May 04, 2013, 06:27:15 PM
And a few words to jj2007:
You have no right to call Microsoft M$.

of course he has the right!

dedndave

the assembler program is probably smaller than the hll one   :P

Gunther

Hi qWord,

Quote from: qWord on May 04, 2013, 09:48:56 PM
So, it's wrong because MS wrote it?

I would say: yes, because you made a lot of changes to the original statement to accept it, didn't you? 8)

Quote from: qWord on May 04, 2013, 09:48:56 PM
of course he has the right!

Without any doubt.  :t

Gunther

anunitu

You do know Microsoft is not a religion, right? Really, one could say anything about MS and not be burned alive on a stack of motherboards.