Author Topic: x86 Machine Code Statistics  (Read 1715 times)

LiaoMi

  • Member
  • ****
  • Posts: 582
x86 Machine Code Statistics
« on: February 20, 2019, 10:45:16 PM »
Quote
Which instruction is the most common one in your code? In this test, three popular open-source applications were disassembled and analysed with a Basic script:

7-Zip archiver (version 2.30 beta 28, file 7za.exe),
LAME encoder (version 3.92 MMX, file lame.exe), and
NSIS installer (version 2.0, file makensis.exe).
All programs were developed with Microsoft Visual C++ 6.0.

Most frequent instructions (chart omitted)

https://www.strchr.com/x86_machine_code_statistics
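
For anyone who wants to try this kind of count on their own listings, here is a minimal Python sketch. It is not the Basic script from the article; it assumes a plain listing with one instruction per line, optional `;` comments, and labels on lines of their own. The names `count_mnemonics` and `SAMPLE` are made up for illustration, and real disassembler output (addresses, hex bytes, prefixes) would need extra parsing.

```python
from collections import Counter


def count_mnemonics(listing: str) -> Counter:
    """Count the first token of every instruction line in a listing."""
    counts = Counter()
    for line in listing.splitlines():
        code = line.split(";", 1)[0].strip()  # drop a trailing comment
        if not code or code.endswith(":"):    # skip blank lines and bare labels
            continue
        counts[code.split()[0].lower()] += 1
    return counts


# Hypothetical sample, not taken from any of the three programs
SAMPLE = """\
start:
    push ebp
    mov  ebp, esp
    push 0              ; argument
    call MessageBoxA
    mov  eax, 1
    leave
    ret
"""

if __name__ == "__main__":
    for mnemonic, n in count_mnemonics(SAMPLE).most_common():
        print(f"{n:5d} {mnemonic}")
```

Taking only the first token keeps the sketch short, but it would count a prefix such as `rep` or `lock` instead of the instruction behind it.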

jj2007

  • Member
  • *****
  • Posts: 9697
  • Assembler is fun ;-)
    • MasmBasic
Re: x86 Machine Code Statistics
« Reply #1 on: February 20, 2019, 10:48:45 PM »
Interesting. I see a slight dominance of push over pop :bgrin:

It's invoke someproc, arg1, ... of course. More intriguing is the imbalance of call and ret. And good ol' fpu is still present: fld+fstp=2%

Mikl__

  • Member
  • ****
  • Posts: 762
Re: x86 Machine Code Statistics
« Reply #2 on: February 20, 2019, 11:08:42 PM »
Quote
I see a slight dominance of push over pop
Ciao, jj2007!
When students learn assembler, they are given the rule: “There must be a push for every pop” -- "На каждую пушу есть своя попа", and it turns out to have a double meaning. In Russian it sounds funny because of the play on words: the word "popa" sounds like the word for "ass".

jj2007

  • Member
  • *****
  • Posts: 9697
  • Assembler is fun ;-)
    • MasmBasic
Re: x86 Machine Code Statistics
« Reply #3 on: February 21, 2019, 01:08:04 AM »
Hmmm... 120x .if without .endif :lol:
Code:
2908 mov
2796 push
1258 pop
1254 .if
1134 .endif
961 invoke
838 call
660 inc
637 add
567 ret
484 test
430 dec
362 sub
249 je
238 fld
222 .Repeat ... .Until
122 m2m
112 fstp
106 movlps
99 jne
87 movups
74 SendMessage
51 js
37 .While ... .Endw
20 pcmpeq?
16 movaps
13 jns
12 jnc
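
The imbalances can be read straight off the listing. A small Python sketch, assuming the "count name" layout shown above (the helper names `parse_counts` and `imbalance` are invented here, and only the six relevant rows are copied):

```python
def parse_counts(text):
    """Parse lines of the form '<count> <name>' into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1].strip()] = int(parts[0])
    return counts


def imbalance(counts, opener, closer):
    """How many more openers than closers were counted."""
    return counts[opener] - counts[closer]


# Rows copied from the statistics above
LISTING = """\
2796 push
1258 pop
1254 .if
1134 .endif
838 call
567 ret
"""

if __name__ == "__main__":
    counts = parse_counts(LISTING)
    for opener, closer in ((".if", ".endif"), ("push", "pop"), ("call", "ret")):
        print(opener, closer, imbalance(counts, opener, closer))
```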

HSE

  • Member
  • *****
  • Posts: 1110
  • <AMD>< 7-32>
Re: x86 Machine Code Statistics
« Reply #4 on: February 21, 2019, 02:17:38 AM »
m2m?
The script is guessing what the code was. Really bad for .if/.endif. Perhaps it fails because of .else.

felipe

  • Member
  • *****
  • Posts: 1249
  • Eagles are just great!
Re: x86 Machine Code Statistics
« Reply #5 on: February 21, 2019, 02:44:26 AM »
More intriguing is the imbalance of call and ret.

Maybe the compiler jumps back to return to the calling function... (I mean using jmp, of course) :idea:. Maybe the compiler calls some functions and jumps to another before the first one finishes, just to make the code a little more difficult to understand (obfuscated code)... :idea:

Anyway, where are the statistics of the use of instructions from the users of this forum!  :bgrin:
Felipe.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6676
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: x86 Machine Code Statistics
« Reply #6 on: February 21, 2019, 03:04:31 AM »
Anything built with Microsoft Visual C++ 6.0 would be old code. The best optimisation I saw from 32 bit Microsoft C compilers was VC2003 which produced some very good if obscure optimisations.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

TimoVJL

  • Member
  • ***
  • Posts: 452
Re: x86 Machine Code Statistics
« Reply #7 on: February 21, 2019, 03:08:52 AM »
It was from disassembled code, so there is no exact correlation between them.
There can be several calls to the same subroutine, and a subroutine can have several rets.
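
To make that concrete, here is a small Python sketch with a made-up listing: one subroutine with two exit paths, called from three places, plus a call to an API that never returns. The listing, the labels and the function name `call_ret_stats` are all hypothetical.

```python
from collections import Counter

# Hypothetical listing, for illustration only
LISTING = """\
main:
    call  is_zero
    call  is_zero
    call  is_zero
    push  0
    call  ExitProcess   ; never returns, so main has no ret
is_zero:
    test  eax, eax
    jne   not_zero
    mov   eax, 1
    ret
not_zero:
    xor   eax, eax
    ret
"""


def call_ret_stats(listing):
    """Return (call sites per target, total number of ret instructions)."""
    targets = Counter()
    rets = 0
    for line in listing.splitlines():
        tokens = line.split(";", 1)[0].split()
        if not tokens:
            continue
        op = tokens[0].lower()
        if op == "call" and len(tokens) > 1:
            targets[tokens[1]] += 1
        elif op in ("ret", "retn"):
            rets += 1
    return targets, rets


if __name__ == "__main__":
    targets, rets = call_ret_stats(LISTING)
    print("calls:", sum(targets.values()), dict(targets))
    print("rets: ", rets)
```

Four calls against two rets in a dozen lines, and nothing is wrong with the code.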

BTW: to see what an optimizing C/C++ compiler does, just give it a try:
x86-64 gcc 8.2 -O2 -mavx2 -ffast-math:
https://gcc.godbolt.org/
Code:
float scalarproduct(float * array1, float * array2, int length) {
  float sum = 0.0f;
  for (int i = 0; i < length; ++i) {
    sum += array1[i] * array2[i];
  }
  return sum;
}
May the source be with you

felipe

  • Member
  • *****
  • Posts: 1249
  • Eagles are just great!
Re: x86 Machine Code Statistics
« Reply #8 on: February 21, 2019, 04:27:32 AM »
There can be several calls to the same subroutine, and a subroutine can have several rets.

Of course! Well thought. :t
Felipe.

Raistlin

  • Member
  • ****
  • Posts: 501
Re: x86 Machine Code Statistics
« Reply #9 on: February 21, 2019, 04:29:34 AM »
What I found extremely frightening was Intel's own published code for CPU identification, inclusive of features. It uses mov ax serially, 4 to 6 times in a row... yes, I might not get out much. So it is potentially not so strange to find these kinds of interesting pseudo-anomalies.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

jj2007

  • Member
  • *****
  • Posts: 9697
  • Assembler is fun ;-)
    • MasmBasic
Re: x86 Machine Code Statistics
« Reply #10 on: February 21, 2019, 04:48:33 AM »
Really bad for .if/.endif. Perhaps it fails because of .else.

The statistics are for one of my sources. I've investigated a bit, the discrepancy has three causes:
1. .Break .if eax
2. nop ; there is .if in the comments
3. conditional assembly
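
The first two causes are easy to show with a toy example. This Python sketch compares a naive text search with a count that strips comments and only accepts `.if` as the first token on a line. The source fragment and the names `naive_count` and `block_if_count` are invented for illustration; conditional assembly (cause 3) is not modelled, and a `;` inside a string literal would fool the comment stripping.

```python
import re

# Hypothetical MASM-style fragment
SOURCE = """\
.if eax
    inc ecx
.endif
.Repeat
    dec edx
    .Break .if eax      ; loop exit, nothing to close
.Until Zero?
nop                     ; there is .if in the comments
"""


def naive_count(source, keyword):
    """Count every textual occurrence, comments and '.Break .if' included."""
    pattern = re.escape(keyword) + r"\b"
    return len(re.findall(pattern, source, flags=re.IGNORECASE))


def block_if_count(source):
    """Count only lines whose first token, after stripping comments, is '.if'."""
    n = 0
    for line in source.splitlines():
        tokens = line.split(";", 1)[0].split()
        if tokens and tokens[0].lower() == ".if":
            n += 1
    return n


if __name__ == "__main__":
    print("naive .if:   ", naive_count(SOURCE, ".if"))
    print("naive .endif:", naive_count(SOURCE, ".endif"))
    print("block .if:   ", block_if_count(SOURCE))
```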

HSE

  • Member
  • *****
  • Posts: 1110
  • <AMD>< 7-32>
Re: x86 Machine Code Statistics
« Reply #11 on: February 21, 2019, 12:10:29 PM »
The statistics are for one of my sources. I've investigated a bit, the discrepancy has three causes:
1. .Break .if eax
2. nop ; there is .if in the comments
3. conditional assembly
interesting, you never use .else, and I never used .break  :biggrin:

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6676
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: x86 Machine Code Statistics
« Reply #12 on: February 21, 2019, 01:43:02 PM »
Doing instruction analysis on compiler code is little better than a lesson in the history of optimisation theory. Look at the date of the compiler, then look at the optimisation theory for the ten years or so before that date and the characteristics of the then prevailing hardware, and you have answered your question.

jj2007

  • Member
  • *****
  • Posts: 9697
  • Assembler is fun ;-)
    • MasmBasic
Re: x86 Machine Code Statistics
« Reply #13 on: February 21, 2019, 02:29:26 PM »
interesting, you never use .else, and I never used .break  :biggrin:
No, I just forgot to count them:
Code:
205 .else
101 .elseif

Plus else in conditional assembly (of course, a disassembly can't analyse these, you need the source):
Code:
1117 else (without dot)
189 .err *)
172 ife
109 elseifidni
74 elseifidn
13 elseifdifi
5 elseifdif

*) of which 149 are specific error messages like the ones that compilers generate, e.g.
Code:
if @InStr(sc+1, <arg>, <eax>) or @InStr(sc+1, <arg>, <edx>)
tmp$ CATSTR <## line >, %@Line, <: Insert *$(n)=>, src$, < won't work here. Use a non-volatile register ##>
% echo tmp$
.err
exitm
endif

For example, a line like Insert info$(infoCt)=eax would generate this message:
Code:
## line 604: Insert *$(n)=eax won't work here. Use a non-volatile register ##

AW

  • Member
  • *****
  • Posts: 2338
  • Let's Make ASM Great Again!
Re: x86 Machine Code Statistics
« Reply #14 on: February 21, 2019, 02:40:06 PM »
Microsoft compilers push ebp at the beginning but don't pop ebp at the end; they do a leave instead.
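
Since leave does the same job as mov esp, ebp followed by pop ebp, a mnemonic count sees a push with no matching pop. A toy Python model of esp and ebp shows both epilogues ending in the same state. The starting register values and the tiny instruction subset handled by `run` are assumptions made for this sketch only.

```python
def run(instructions, esp=0x1000, ebp=0xAAAA):
    """Track esp/ebp through a tiny subset of 32-bit x86 (illustration only)."""
    memory = {}
    for ins in instructions:
        if ins == "push ebp":
            esp -= 4
            memory[esp] = ebp
        elif ins == "mov ebp, esp":
            ebp = esp
        elif ins.startswith("sub esp,"):
            esp -= int(ins.split(",", 1)[1].strip(), 0)
        elif ins == "mov esp, ebp":
            esp = ebp
        elif ins == "pop ebp":
            ebp = memory[esp]
            esp += 4
        elif ins == "leave":
            # leave: restore esp from ebp, then pop the saved ebp
            esp = ebp
            ebp = memory[esp]
            esp += 4
        else:
            raise ValueError(f"unsupported instruction: {ins}")
    return esp, ebp


PROLOGUE = ["push ebp", "mov ebp, esp", "sub esp, 16"]
with_leave = run(PROLOGUE + ["leave"])
with_pop = run(PROLOGUE + ["mov esp, ebp", "pop ebp"])

if __name__ == "__main__":
    print(with_leave == with_pop)
```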