Topic: Need for Speed - C++ versus Assembly Language

aw27

  • Member
  • Posts: 149
Re: Need for Speed - C++ versus Assembly Language
« Reply #45 on: April 21, 2017, 09:39:09 PM »
As with the x64 fastcall calling convention, many algorithms on 32-bit take fewer than 3 arguments and can be run as FASTCALL.
Except for very small functions, FASTCALL will end up making the code slower.
The reason is that you have to save the register contents somewhere inside the function.
Before the call you have to load the registers with data, and inside the function you have to save their contents somewhere because you need the registers for other things. That is a waste of cycles; it's like putting the car keys in your pocket to cross the room and then placing them on another table.
The same applies to x64. Although it has more registers to play with, the convention is not called FASTCALL anymore because there is no other one. Ah yes, there is Vectorcall, but it has the same problem.
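A minimal C sketch of the situation described above, assuming an x86-64 target where the first integer arguments arrive in registers; `helper` and `use_args` are hypothetical names. Because `b`, `c` and `d` are still needed after the call to `helper`, the compiler has to keep them somewhere the callee cannot clobber. Whether that costs anything measurable depends on the compiler and optimisation level (if `helper` gets inlined, the effect disappears), so the disassembly is the thing to check.

```c
#include <assert.h>

/* Hypothetical callee. In a real test it would live in another source
   file so that the compiler cannot inline it. */
long long helper(long long x)
{
    return x * 3;
}

/* a, b, c and d typically arrive in registers (RCX, RDX, R8, R9 on Win64;
   RDI, RSI, RDX, RCX on System V x86-64). b, c and d are still needed
   after helper() returns, but helper() is free to overwrite those
   argument registers, so the compiler has to park them first: either in
   nonvolatile registers (which it must itself save on the stack) or
   directly in stack memory. */
long long use_args(long long a, long long b, long long c, long long d)
{
    long long t = helper(a);
    return t + b + c + d;
}
```

For example, `use_args(1, 2, 3, 4)` returns 3 + 2 + 3 + 4 = 12; the interesting part is not the result but the save/restore code the compiler emits around the call.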

hutch--

  • Administrator
  • Member
  • Posts: 4316
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
« Reply #46 on: April 22, 2017, 12:55:49 AM »
This is indeed an unusual comment. If you don't use register passing à la the Microsoft Application Binary Interface ("rcx rdx r8 r9"), you are left with passing data by globals or the old slower style STDCALL stack passing with pushes and pops. Of course, nothing stops you from pre-loading a number of AVX registers and calling a procedure that uses them, but a procedure has to get its arguments somehow; it does not happen by magic. In 32-bit you used the Intel Application Binary Interface, which was a standard PUSH/CALL technique, and you can emulate FASTCALL with up to 3 registers to keep the stack overhead down if it is a short leaf procedure.
hutch at movsd dot com
http://www.masm32.com

aw27

« Reply #47 on: April 22, 2017, 02:02:34 AM »
Quote from hutch--: "This is indeed an unusual comment"
It's not unusual.
I did a quick search on Google and found someone with the same idea:
"You don't gain anything by passing in registers if the called function immediately needs to spill everything out into memory for its own calculations."
And another one:
"How fast is this calling convention, comparing to __cdecl and __stdcall? Find out for yourselves. Set the compiler option /Gr, and compare the execution time. I didn't find __fastcall to be any faster than other calling conventons, but you may come to different conclusions."

Quote from hutch--: "old slower style STDCALL stack passing with pushes and pops."
STDCALL is not slow anymore; it is as fast as CDECL. In my opinion, in real life (not classroom examples) both are faster than FASTCALL. It sounds weird, but this is the reason FASTCALL is not widespread.

hutch--

« Reply #48 on: April 22, 2017, 03:38:05 AM »
This tends to be why assembler programmers benchmark techniques rather than search the internet for quotations.

In 64-bit, FASTCALL is the specification.

  mov rcx, handle
  mov rdx, wmsg
  mov r8,  wparam
  mov r9,  lparam
  call SendMessage
  mov retval, rax


In 32-bit, STDCALL is the specification.

  push lparam
  push wparam
  push wmsg
  push handle
  call SendMessage
  mov retval, eax


It's a simple fact that registers are a lot faster than memory, and much of the design of 64-bit FASTCALL was aimed at reducing the call overhead for the vast majority of function calls, which use 4 or fewer arguments. When you don't need to twiddle the stack, you reduce overhead and pick up speed. The other factor, of course, is "does it matter?" when you are calling high-level code in either libraries or DLL system functions.
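A minimal C sketch for checking the four-argument boundary in a disassembler; `sum4` and `sum5` are hypothetical names. It assumes the Win64 convention, where the first four integer or pointer arguments travel in RCX, RDX, R8 and R9 and any further ones go on the stack; System V x86-64 (Linux) uses six argument registers instead. Compiled with optimisation for Win64, `sum4` can typically be done entirely in registers, while `sum5` has to load its fifth argument from memory.

```c
#include <assert.h>

/* long long is used so the values are 64-bit on both Win64 (where long
   is 32-bit) and Linux. */

/* Four arguments: on Win64 they all arrive in RCX, RDX, R8 and R9, so an
   optimising compiler can add them without touching the stack. */
long long sum4(long long a, long long b, long long c, long long d)
{
    return a + b + c + d;
}

/* Five arguments: on Win64 the fifth one (e) is passed on the stack,
   above the caller's 32-byte shadow space, so it must be read from
   memory. On System V x86-64 it would still arrive in a register (R8). */
long long sum5(long long a, long long b, long long c, long long d,
               long long e)
{
    return a + b + c + d + e;
}
```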

Being able to save a few picoseconds making a MessageBox() call seems to be the achievement of much modern high-level code design, whereas the benchmarking approach puts the effort where it matters: in high-level code you pursue clarity and maintainability, while in low-level code you design and benchmark to get the speed up. You need to do more than just twiddle compiler options; a disassembler does not tell lies.

nidud

  • Member
  • Posts: 1120
    • https://github.com/nidud/asmc
« Reply #49 on: April 22, 2017, 04:06:41 AM »

If you plan on using these arguments, which is often the case, then you either use the stack or use nonvolatile registers, which have to be saved by pushing them onto the stack, so you end up using the stack anyway.

It's also misleading to call the 64-bit calling convention fastcall, given that in addition to passing arguments in registers you also have to allocate stack space for those arguments (the 32-byte shadow space on Win64) in case you plan on using them, which is often the case ...
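A small C sketch of when that reserved space actually gets used, assuming a Win64 target; `bump` and `bump_arg` are hypothetical names. Taking the address of a parameter that arrived in a register forces the compiler to store it in memory, and on Win64 the natural place is the 32-byte home (shadow) area the caller reserved. MSVC debug builds (/Od) typically spill every register argument there on entry; optimised builds only do so when they need to.

```c
#include <assert.h>

/* Hypothetical helper that needs a pointer to its argument. */
static void bump(long long *p)
{
    *p += 1;
}

/* x arrives in a register (RCX on Win64), but &x needs a memory address,
   so the compiler must store x somewhere first. On Win64 compilers
   typically use the caller-reserved home area above the return address;
   System V x86-64 has no such area, so the callee uses its own frame.
   If bump() is inlined, an optimiser may avoid the store entirely. */
long long bump_arg(long long x)
{
    bump(&x);
    return x;
}
```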

hutch--

« Reply #50 on: April 22, 2017, 04:36:58 AM »
This tends to be why you have a variety of techniques: stack frames for high-level code that uses many arguments and local variables, and no stack frame for low argument counts, with direct register passing to cut overhead and improve speed.

nidud

« Reply #51 on: April 22, 2017, 06:17:28 AM »
Here's a simple test case.

Code:
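.x64
.model flat, pascal        ; calling convention under test
.code

entry proc a1:ptr, a2:ptr, a3:ptr, a4:ptr
local l1:ptr, l2:ptr, l3:ptr, l4:ptr

; copy the four arguments into locals
mov rax,a1
mov l1,rax
mov rax,a2
mov l2,rax
mov rax,a3
mov l3,rax
mov rax,a4
mov l4,rax

; reload the locals into the x64 argument registers
mov rcx,l1
mov rdx,l2
mov r8,l3
mov r9,l4

; return the sum of the four values in rax
mov rax,rcx
add rax,rdx
add rax,r8
add rax,r9

ret
entry endp

END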

The results seem to be more or less the same:
Code:
total [1 .. 3], 1++
 12342242 cycles 1.asm: stdcall
 12342272 cycles 3.asm: pascal
 12342401 cycles 0.asm: fastcall
 12342405 cycles 2.asm: c
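A rough C counterpart of the test above, offered as a sketch rather than a reproduction: the `sum_*` functions, `run_benchmark` and the `CC_*` macros are hypothetical names, timing uses `clock()` instead of a cycle counter, and the `pascal` variant is left out because mainstream C compilers such as current MSVC no longer support it. The convention keywords only change code generation for 32-bit MSVC; x64 MSVC accepts and ignores them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Calling-convention keywords only matter for 32-bit MSVC (_M_IX86).
   Elsewhere (x64, GCC/Clang on Linux) the macros expand to nothing. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CC_CDECL    __cdecl
#  define CC_STDCALL  __stdcall
#  define CC_FASTCALL __fastcall
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

/* C version of the test procedure: copy the four arguments into locals
   (volatile forces real stores, like "mov l1,rax") and return the sum.
   intptr_t is pointer-sized, so 32-bit __fastcall can still pass the
   first two arguments in ECX and EDX. */
#define SUM4_BODY                                                      \
    {                                                                  \
        volatile intptr_t l1 = a1, l2 = a2, l3 = a3, l4 = a4;          \
        return l1 + l2 + l3 + l4;                                      \
    }

intptr_t CC_CDECL    sum_cdecl(intptr_t a1, intptr_t a2,
                               intptr_t a3, intptr_t a4) SUM4_BODY
intptr_t CC_STDCALL  sum_stdcall(intptr_t a1, intptr_t a2,
                                 intptr_t a3, intptr_t a4) SUM4_BODY
intptr_t CC_FASTCALL sum_fastcall(intptr_t a1, intptr_t a2,
                                  intptr_t a3, intptr_t a4) SUM4_BODY

/* Calls one variant n times and prints the elapsed time. A macro is used
   so each call keeps its own calling convention; it relies on the
   variables n and total from the enclosing function. */
#define TIME_VARIANT(fn, label)                                        \
    do {                                                               \
        clock_t t0 = clock();                                          \
        for (long long i = 0; i < n; i++)                              \
            total += fn((intptr_t)i, 1, 2, 3);                         \
        printf("%-9s %.3f s\n", label,                                 \
               (double)(clock() - t0) / CLOCKS_PER_SEC);               \
    } while (0)

/* Runs all three variants and returns the sum of every result, so the
   calls cannot be discarded as dead code. */
long long run_benchmark(long long n)
{
    long long total = 0;
    assert(n > 0);
    TIME_VARIANT(sum_cdecl, "cdecl");
    TIME_VARIANT(sum_stdcall, "stdcall");
    TIME_VARIANT(sum_fastcall, "fastcall");
    return total;
}
```

Before trusting the numbers, put the `sum_*` functions in a separate source file (or disable inlining); otherwise the optimiser may inline them and the calling convention drops out of the measurement entirely.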

felipe

  • Member
  • Posts: 55
  • Why to be low?...To growth up of course.
« Reply #52 on: April 25, 2017, 12:05:08 PM »

A helicopter was flying around above Seattle when an electrical malfunction disabled all of the aircraft's electronic navigation and communications equipment. Due to the clouds and haze, the pilot could not determine the helicopter's position and course to fly to the airport. The pilot saw a tall building, flew toward it, circled, drew a handwritten sign, and held it in the helicopter's window. The pilot's sign said "WHERE AM I?" in large letters. People in the tall building quickly responded to the aircraft, drew a large sign, and held it in a building window. Their sign read "YOU ARE IN A HELICOPTER." The pilot smiled, waved, looked at his map, determined the course to steer to SEATAC airport, and landed safely. After they were on the ground, the co-pilot asked the pilot how the "YOU ARE IN A HELICOPTER" sign helped determine their position. The pilot responded "I knew that had to be the Microsoft building because they gave me a technically correct, but completely useless answer."

HAHAHA! This is really funny.
; An assembly researcher.