The MASM Forum

General => The Colosseum => Topic started by: LiaoMi on February 20, 2019, 10:45:16 PM

Title: x86 Machine Code Statistics
Post by: LiaoMi on February 20, 2019, 10:45:16 PM
Quote
Which instruction is the most common one in your code? In this test, three popular open-source applications were disassembled and analysed with a Basic script:

7-Zip archiver (version 2.30 beta 28, file 7za.exe),
LAME encoder (version 3.92 MMX, file lame.exe), and
NSIS installer (version 2.0, file makensis.exe).
All programs were developed with Microsoft Visual C++ 6.0.

Most frequent instructions

(https://www.strchr.com/media/top20_instructions_x86.png)

https://www.strchr.com/x86_machine_code_statistics
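
For anyone who wants to repeat the count on their own disassembly, here is a rough sketch in C. It is not the Basic script from the article. It assumes a hypothetical listing format with one instruction per line and the mnemonic as the first token (no address or hex-byte columns), so prefixes such as rep would be counted as mnemonics. The names `tally_mnemonics` and `count_of` are made up for this example.

```c
#include <ctype.h>
#include <string.h>

#define MAX_MNEMONICS 64   /* arbitrary table size for this sketch */
#define MAX_NAME 16

struct MnemonicCount {
    char name[MAX_NAME];
    int  count;
};

/* Tally the first token of every line in `listing` (case-insensitive).
   Returns the number of distinct mnemonics stored in `table`. */
static int tally_mnemonics(const char *listing,
                           struct MnemonicCount *table, int capacity)
{
    int used = 0;
    const char *p = listing;

    while (*p) {
        char name[MAX_NAME];
        int len = 0;
        int i;

        /* skip leading blanks on the line */
        while (*p == ' ' || *p == '\t') p++;

        /* copy the first token, lower-cased and truncated if too long */
        while (*p && !isspace((unsigned char)*p)) {
            if (len < MAX_NAME - 1)
                name[len++] = (char)tolower((unsigned char)*p);
            p++;
        }
        name[len] = '\0';

        if (len > 0) {
            for (i = 0; i < used; i++)
                if (strcmp(table[i].name, name) == 0) break;
            if (i < used) {
                table[i].count++;
            } else if (used < capacity) {
                strcpy(table[used].name, name);
                table[used].count = 1;
                used++;
            }
        }

        /* advance to the start of the next line */
        while (*p && *p != '\n') p++;
        if (*p == '\n') p++;
    }
    return used;
}

/* Look up one mnemonic; returns 0 if it never occurred. */
static int count_of(const struct MnemonicCount *table, int used,
                    const char *name)
{
    int i;
    for (i = 0; i < used; i++)
        if (strcmp(table[i].name, name) == 0) return table[i].count;
    return 0;
}
```

Feeding it a small prologue/epilogue listing (push ebp, mov ebp, esp, push ecx, int3, mov esp, ebp, pop ebp, retn) gives two pushes against one pop, which is the kind of imbalance the chart shows.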
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 20, 2019, 10:48:45 PM
Interesting. I see a slight dominance of push over pop :bgrin:

It's invoke someproc, arg1, ... of course. More intriguing is the imbalance of call and ret. And good ol' fpu is still present: fld+fstp=2%
Title: Re: x86 Machine Code Statistics
Post by: Mikl__ on February 20, 2019, 11:08:42 PM
Quote
I see a slight dominance of push over pop
Ciao, jj2007!
When students learn assembler, they are given the rule: “There must be a push for every pop” -- "На каждую пушу есть своя попа", and it turns out to have a double meaning. In Russian it sounds funny because of the play on words: the word "popa" sounds like the word for "ass".
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 01:08:04 AM
Hmmm... 120x .if without .endif :lol:
Code: [Select]
2908 mov
2796 push
1258 pop
1254 .if
1134 .endif
961 invoke
838 call
660 inc
637 add
567 ret
484 test
430 dec
362 sub
249 je
238 fld
222 .Repeat ... .Until
122 m2m
112 fstp
106 movlps
99 jne
87 movups
74 SendMessage
51 js
37 .While ... .Endw
20 pcmpeq?
16 movaps
13 jns
12 jnc
Title: Re: x86 Machine Code Statistics
Post by: HSE on February 21, 2019, 02:17:38 AM
m2m?
The script is guessing what the code was. Really bad for .if/.endif. Perhaps it fails because of .else.
Title: Re: x86 Machine Code Statistics
Post by: felipe on February 21, 2019, 02:44:26 AM
Quote
More intriguing is the imbalance of call and ret.

Maybe the compiler jumps back to return to the calling function (I mean using jmp, of course) :idea:. Maybe the compiler calls some functions and jumps to another before the first one finishes, just to make the code a little more difficult to understand (obfuscated code)... :idea:

Anyway, where are the statistics of the use of instructions from the users of this forum!  :bgrin:
Title: Re: x86 Machine Code Statistics
Post by: hutch-- on February 21, 2019, 03:04:31 AM
Anything built with Microsoft Visual C++ 6.0 would be old code. The best optimisation I saw from 32 bit Microsoft C compilers was VC2003 which produced some very good if obscure optimisations.
Title: Re: x86 Machine Code Statistics
Post by: TimoVJL on February 21, 2019, 03:08:52 AM
It was from disassembled code, so there is no exact correlation between them.
There can be several calls to the same subroutine, and a subroutine can have several rets.

BTW: to see what an optimizing C/C++ compiler does, just give it a try:
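
A small C illustration of that point, with made-up function names: one subroutine with three return statements and three call sites elsewhere. Whether a given compiler really emits three ret instructions, merges them into one, or inlines the function and emits no call at all depends on the compiler and its options, so treat the counts as something to check in a disassembly rather than a rule.

```c
/* One subroutine, three return statements. */
static int sign_of(int x)
{
    if (x < 0) return -1;
    if (x > 0) return 1;
    return 0;
}

/* First call site, executed once per array element. */
static int count_positive(const int *values, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (sign_of(values[i]) > 0)
            ++count;
    return count;
}

/* Second and third call sites in a single expression. */
static int same_sign(int a, int b)
{
    return sign_of(a) == sign_of(b);
}
```

In the source this is three calls to one function with three returns, so counting call and ret in the binary cannot be expected to balance.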
x86-64 gcc 8.2 -O2 -mavx2 -ffast-math:
https://gcc.godbolt.org/
Code: [Select]
float scalarproduct(float * array1, float * array2, int length) {
  float sum = 0.0f;
  for (int i = 0; i < length; ++i) {
    sum += array1[i] * array2[i];
  }
  return sum;
}
Title: Re: x86 Machine Code Statistics
Post by: felipe on February 21, 2019, 04:27:32 AM
Quote
There can be several calls to the same subroutine, and a subroutine can have several rets.

Of course! well thought   :t
Title: Re: x86 Machine Code Statistics
Post by: Raistlin on February 21, 2019, 04:29:34 AM
What I found extremely frightening was Intel's own published code for CPU identification, inclusive of features. The use of mov ax serially executed 4 to 6 times... yes, I might not get out much. So potentially it is not so strange to find these kinds of interesting pseudo anomalies.
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 04:48:33 AM
Quote
Really bad for .if .endif. Perhaps fail because .else

The statistics are for one of my sources. I've investigated a bit, the discrepancy has three causes:
1. .Break .if eax
2. nop ; there is .if in the comments
3. conditional assembly
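
The first two causes can be shown with a small C sketch. I don't know how the counts above were actually produced; this only contrasts a naive "does the line mention .if" check with a stricter "does the line start with .if" check. Both function names are invented for the example. The third cause, conditional assembly, cannot be settled by looking at single lines, so it is not handled here.

```c
#include <ctype.h>

/* Case-insensitive test: does `s` begin with `prefix`? */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Naive check: ".if" appears anywhere on the line,
   including after .Break or inside a comment. */
static int mentions_dot_if(const char *line)
{
    for (; *line; line++)
        if (starts_with_ci(line, ".if"))
            return 1;
    return 0;
}

/* Stricter check: the first token on the line is exactly ".if",
   so only lines that need a matching .endif are counted. */
static int opens_if_block(const char *line)
{
    unsigned char next;

    while (*line == ' ' || *line == '\t') line++;
    if (!starts_with_ci(line, ".if"))
        return 0;
    next = (unsigned char)line[3];   /* character right after ".if" */
    return next == '\0' || isspace(next);
}
```

With these, `.Break .if eax` and `nop ; there is .if in the comments` are both hits for the naive check but not for the strict one, while a plain `.if eax` is a hit for both.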
Title: Re: x86 Machine Code Statistics
Post by: HSE on February 21, 2019, 12:10:29 PM
Quote
The statistics are for one of my sources. I've investigated a bit, the discrepancy has three causes:
1. .Break .if eax
2. nop ; there is .if in the comments
3. conditional assembly
interesting, you never use .else, and I never used .break  :biggrin:
Title: Re: x86 Machine Code Statistics
Post by: hutch-- on February 21, 2019, 01:43:02 PM
Doing instruction analysis on compiler code is little better than a lesson in the history of optimisation theory. Look at the date of the compiler, then look at the optimisation theory for the ten years or so before that date and the characteristics of the then prevailing hardware, and you have answered your question.
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 02:29:26 PM
Quote
interesting, you never use .else, and I never used .break  :biggrin:
No, I just forgot to count them:
Code: [Select]
205 .else
101 .elseif

Plus else in conditional assembly (of course, a disassembly can't analyse these, you need the source):
Code: [Select]
1117 else (without dot)
189 .err *)
172 ife
109 elseifidni
74 elseifidn
13 elseifdifi
5 elseifdif

*) of which 149 are specific error messages like the ones that compilers generate, e.g.
Code: [Select]
if @InStr(sc+1, <arg>, <eax>) or @InStr(sc+1, <arg>, <edx>)
tmp$ CATSTR <## line >, %@Line, <: Insert *$(n)=>, src$, < won't work here. Use a non-volatile register ##>
% echo tmp$
.err
exitm
endif

For example, a line like Insert (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1134) info$(infoCt)=eax would generate this message:
Code: [Select]
## line 604: Insert *$(n)=eax won't work here. Use a non-volatile register ##
Title: Re: x86 Machine Code Statistics
Post by: AW on February 21, 2019, 02:40:06 PM
Microsoft compilers push ebp at the beginning but don't pop ebp at the end; they do a leave.
Title: Re: x86 Machine Code Statistics
Post by: hutch-- on February 21, 2019, 03:10:09 PM
> Microsoft compilers push ebp in the beginning but dont pop ebp in the end, do a leave.  :t

That is a characteristic of 32 bit MASM as well.
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 03:12:58 PM
Quote
Microsoft compilers push ebp at the beginning but don't pop ebp at the end; they do a leave.

Not all of them, apparently:

Code: [Select]
char* somefunc(char* x, char* y) {
  char *instr;
  _asm int 3;
  instr=strstr(x, y);
  printf("in the func: %s\n", instr);
  return strstr(x, y);
}
Code: [Select]
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86
Code: [Select]
00F7B490      ┌$  55               push ebp                      ; Tmp.00F7B490(guessed Arg1,Arg2)
00F7B491      │.  8BEC             mov ebp, esp
00F7B493      │.  51               push ecx
00F7B494      │.  CC               int3
...
00F7B4C9      │.  8BE5             mov esp, ebp
00F7B4CB      │.  5D               pop ebp
00F7B4CC      └.  C3               retn

Quote
That is a characteristic of 32 bit MASM as well.

Masm32 returns indeed with leave & ret. Test it:
Code: [Select]
include \masm32\include\masm32rt.inc

.code
start:
  int 3
  inkey str$(find$(1, "txTest", "Test"))
  exit

end start
Title: Re: x86 Machine Code Statistics
Post by: AW on February 21, 2019, 05:06:23 PM
It is true, they are not doing leave anymore in C/C++, probably because they want to insert security checks in the epilogue. I have no VC++ 6.0 installed to see how things were in those times. I would need to buy a floppy drive and install Visual C++ 6.0 on an XP machine to figure it out. A lot of trouble.  :(
Title: Re: x86 Machine Code Statistics
Post by: TimoVJL on February 21, 2019, 06:16:03 PM
With option -O2
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8804 for 80x86
Code: [Select]
_somefunc:
00000000  56                       push esi
00000001  57                       push edi
00000002  CC                       int3
...
00000027  5F                       pop edi
00000028  5E                       pop esi
00000029  C3                       ret
Code: [Select]
_somefunc:
00000000  55                       push ebp
00000001  8BEC                     mov ebp, esp
00000003  51                       push ecx
00000004  CC                       int3
...
00000039  8BE5                     mov esp, ebp
0000003B  5D                       pop ebp
0000003C  C3                       ret
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 11.00.7022 for 80x86
Code: [Select]
00000000 55                   push ebp
00000001 8bec                 mov ebp, esp
00000003 51                   push ecx
00000004 53                   push ebx
00000005 56                   push esi
00000006 57                   push edi
00000007 cc                   int3
...
0000003c 5f                   pop edi
0000003d 5e                   pop esi
0000003e 5b                   pop ebx
0000003f 8be5                 mov esp, ebp
00000041 5d                   pop ebp
00000042 c3                   ret
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 9.00 for 80x86
Microsoft (R) 32-bit C/C++ Standard Compiler Version 10.00.6002 for 80x86
Code: [Select]
00000000 55                   push ebp
00000001 8bec                 mov ebp, esp
00000003 83ec04               sub esp, 4h
00000006 53                   push ebx
00000007 56                   push esi
00000008 57                   push edi
00000009 cc                   int3
...
00000043 5f                   pop edi
00000044 5e                   pop esi
00000045 5b                   pop ebx
00000046 c9                   leave
00000047 c3                   ret
Compiler versions 12.00, 11.00, 10.00 and 9.00 correspond to Visual C++ 6, 5, 4.2/4 and 2.
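
To sort a pile of functions into "leave" and "mov+pop" epilogues without reading every listing, the trailing bytes can be compared directly. The sketch below is only that, a sketch: `classify_epilogue` and the enum are hypothetical names, and it recognises just the byte patterns visible in the listings in this thread (C9 C3 and 8B E5 5D C3). It deliberately ignores the alternative 89 EC encoding of mov esp, ebp and the C2 form of ret with an immediate, so a real tool would need more cases.

```c
#include <stddef.h>

enum Epilogue {
    EPILOGUE_OTHER,        /* anything not matched below            */
    EPILOGUE_LEAVE_RET,    /* C9 C3        leave / ret              */
    EPILOGUE_MOV_POP_RET   /* 8B E5 5D C3  mov esp,ebp / pop ebp / ret */
};

/* Classify a function by the last bytes of its machine code.
   `code` points at the function body, `len` is its length in bytes. */
static enum Epilogue classify_epilogue(const unsigned char *code, size_t len)
{
    if (len >= 2 && code[len - 2] == 0xC9 && code[len - 1] == 0xC3)
        return EPILOGUE_LEAVE_RET;

    if (len >= 4 &&
        code[len - 4] == 0x8B && code[len - 3] == 0xE5 &&
        code[len - 2] == 0x5D && code[len - 1] == 0xC3)
        return EPILOGUE_MOV_POP_RET;

    return EPILOGUE_OTHER;
}
```

On the tails shown above, the version 9/10 listing (5F 5E 5B C9 C3) comes out as leave+ret, the version 11, 12 and 19 listings with a frame come out as mov+pop+ret, and the -O2 listing without a frame (5F 5E C3) falls into "other".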
Title: Re: x86 Machine Code Statistics
Post by: hutch-- on February 21, 2019, 06:36:18 PM
Timo's 2nd example is how 32 bit CL.EXE has usually constructed a stack frame, using LEAVE is usually a trait of MASM code added into a C executable. You used to see it in some 32 bit OS code at very low levels where a reasonably obvious piece of code was written in MASM and linked into the application.

Don't laugh but I see this stuff like working on a T model Ford, cute, interesting but why bother.  :P
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 07:30:49 PM
Apparently, speed-wise there is no difference:
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

576     cycles for 100 * mov+pop
570     cycles for 100 * leave

565     cycles for 100 * mov+pop
565     cycles for 100 * leave

26      bytes for mov+pop
24      bytes for leave
Title: Re: x86 Machine Code Statistics
Post by: sinsi on February 21, 2019, 07:47:06 PM
The Intel 80386, part 9: Stack frame instructions (https://blogs.msdn.microsoft.com/oldnewthing/20190131-00/?p=100835)

Quote
Modern compilers avoid the ENTER instruction but keep the LEAVE instruction.
Title: Re: x86 Machine Code Statistics
Post by: hutch-- on February 21, 2019, 08:54:30 PM
Thanks, that is a good article. In 64 bit I started with ENTER as it was simple. I later did the alternatives, which all work OK, but to be honest it was hard to tell the difference. Almost any high level code like system API code is so much slower than mnemonic code that the creation of a stack frame is trivial in comparison. In Win64 you have many more options: FASTCALL with registers for 4 or fewer arguments, no stack frame procedures, stack frame procedures and, if you need it, aligned procedures with no stack frame.

As always, put your effort into things that matter, with pure assembler algorithms, write them as no stack frame if possible and absolutely kick the guts out of the code to get it up to speed. The rest is folklore.
Title: Re: x86 Machine Code Statistics
Post by: jj2007 on February 21, 2019, 09:24:20 PM
Quote
The Intel 80386, part 9: Stack frame instructions (https://blogs.msdn.microsoft.com/oldnewthing/20190131-00/?p=100835)

A little test for enter 20h, 10h inspired by the bonus chatter:
Code: [Select]
include \masm32\include\masm32rt.inc
.code

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

EnterTest proc _arg1, _arg2
  push esi
  enter 20h, 10h ; <<<<<<< there it is
  pop esi
  ret 2*4
EnterTest endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
  or edx, -1
  .Repeat
    push edx
    sub edx, 11111111h
  .Until Zero?
  int 3
  invoke EnterTest, 12345678h, 87654321h
  exit
end start
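
To get a feeling for how much stack `enter 20h, 10h` eats before stepping through it in the debugger, here is a back-of-the-envelope model in C. It follows my reading of the ENTER pseudocode in the Intel manual for 32-bit code (nesting level taken mod 32, one dword pushed for ebp, level-1 outer frame pointers copied, then the new frame pointer pushed, then the local size subtracted). `enter_stack_bytes` is a name invented for this sketch, and it only counts bytes, it does not model what values end up on the stack.

```c
#include <stdint.h>

/* Bytes of stack consumed by a 32-bit `enter size, level`. */
static uint32_t enter_stack_bytes(uint16_t size, uint8_t level)
{
    uint32_t used = 0;
    unsigned nesting = level % 32u;      /* the CPU uses imm8 mod 32 */

    used += 4;                           /* push ebp */
    if (nesting > 0) {
        used += 4u * (nesting - 1u);     /* copy nesting-1 outer frame pointers */
        used += 4;                       /* push the new frame pointer */
    }
    used += size;                        /* sub esp, size */
    return used;
}
```

Under these assumptions `enter 20h, 10h` moves esp down by 100 bytes (4 + 15*4 + 4 + 32), while `enter 4, 0` uses 8 bytes, the same as the push ebp / mov ebp, esp / sub esp, 4 prologue in the older compiler listing above.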
Title: Re: x86 Machine Code Statistics
Post by: AW on February 22, 2019, 03:04:25 AM
Quote
It was from disassembled code, so there is no exact correlation between them.
There can be several calls to the same subroutine, and a subroutine can have several rets.

BTW: to see what an optimizing C/C++ compiler does, just give it a try:
x86-64 gcc 8.2 -O2 -mavx2 -ffast-math:
https://gcc.godbolt.org/
Code: [Select]
float scalarproduct(float * array1, float * array2, int length /* from the caller: 20 */) {
  float sum = 0.0f;
  for (int i = 0; i < length; ++i) {
    sum += array1[i] * array2[i];
  }
  return sum;
}

x86-64 gcc 8.2 -O2 -mavx2 -ffast-math:
Code: [Select]
scalarproduct proc
        test    edx, edx
        jle     .L4
        lea     ecx, [rdx-1]
        xor     eax, eax
        vxorps  xmm0, xmm0, xmm0
        jmp     .L3
.L5:
        mov     rax, rdx
.L3:
        vmovss  xmm1, DWORD PTR [rdi+rax*4]
        vmulss  xmm1, xmm1, DWORD PTR [rsi+rax*4]
        lea     rdx, [rax+1]
        vaddss  xmm0, xmm0, xmm1
        cmp     rcx, rax
        jne     .L5
        ret
.L4:
        vxorps  xmm0, xmm0, xmm0
        ret
scalarproduct  endp

VS2017 avx2 optimized for size (yeah it snooped the true value of length, which is 20)
Code: [Select]
scalarproduct PROC
vxorps xmm0, xmm0, xmm0
sub rcx, rdx
mov eax, 20
$LL8@scalarprod:
vmovss xmm1, DWORD PTR [rcx+rdx]
vmulss xmm2, xmm1, DWORD PTR [rdx]
lea rdx, QWORD PTR [rdx+4]
vaddss xmm0, xmm0, xmm2
sub rax, 1
jne SHORT $LL8@scalarprod
ret 0
scalarproduct ENDP

VS2017 AVX2 using Fast Floating Point, optimized for speed (again it snooped the value of length):
Code: [Select]
scalarproduct  PROC
vmovups ymm1, YMMWORD PTR [rcx]
vmulps ymm3, ymm1, YMMWORD PTR [rdx]
vmovups ymm1, YMMWORD PTR [rcx+32]
vmulps ymm1, ymm1, YMMWORD PTR [rdx+32]
vaddps ymm0, ymm1, ymm3
vhaddps ymm1, ymm0, ymm0
vhaddps ymm2, ymm1, ymm1
vextractf128 xmm0, ymm2, 1
vaddps xmm4, xmm0, xmm2
vmovss xmm0, DWORD PTR [rdx+68]
vmulss xmm3, xmm0, DWORD PTR [rcx+68]
vmovss xmm2, DWORD PTR [rcx+64]
vmovss xmm0, DWORD PTR [rcx+72]
vmulss xmm1, xmm0, DWORD PTR [rdx+72]
vfmadd132ss xmm2, xmm4, DWORD PTR [rdx+64]
vaddss xmm2, xmm3, xmm2
vaddss xmm3, xmm2, xmm1
vmovss xmm2, DWORD PTR [rcx+76]
vmulss xmm0, xmm2, DWORD PTR [rdx+76]
vaddss xmm0, xmm3, xmm0
vzeroupper
ret 0
scalarproduct  ENDP


Anyway, the first is built for Linux, the others for Windows. No direct comparison.
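
For readers who don't want to decode the AVX2 listing: as far as I can tell, the speed-optimized version multiplies two blocks of 8 floats with ymm registers, adds the lanes together, and handles the remaining 4 elements with scalar instructions. A plain-C approximation of that shape is below. It is not what the compiler generated, `scalarproduct_lanes` is a made-up name, and it uses no intrinsics. It also shows why fast floating point is needed for this: the additions happen in a different order than in the original loop, so results can differ in the last bits for general inputs.

```c
#define LANES 8   /* one ymm register holds 8 floats */

static float scalarproduct_lanes(const float *a, const float *b, int length)
{
    float lane[LANES] = { 0.0f };   /* one partial sum per vector lane */
    float sum = 0.0f;
    int i = 0;

    /* full blocks of 8: for length 20 this covers elements 0..15 */
    for (; i + LANES <= length; i += LANES)
        for (int k = 0; k < LANES; ++k)
            lane[k] += a[i + k] * b[i + k];

    /* horizontal add of the lanes (the vhaddps/vextractf128 part) */
    for (int k = 0; k < LANES; ++k)
        sum += lane[k];

    /* scalar tail: for length 20 this covers elements 16..19 */
    for (; i < length; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

With small integer-valued inputs (for example 1..20 against all ones) the result is exact and equals the simple loop; with arbitrary floats the two orderings may round differently.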