trying to write my own disassembler

Started by Grincheux, January 15, 2016, 03:59:21 PM


Grincheux

The file to disassemble is loaded into memory. I don't use a mapped file because Windows writes to it. The file is simply read into memory like any ordinary file. When the file is selected it is copied into the "Work" folder, and from then on I work with the copy. Once the file is in memory it is closed.

I accept *.exe and *.dll files. They must be 32-bit or 64-bit.
I read the IMPORT, RESOURCES and IAT directories.
If the file does not have an IMPORT directory, it is treated as an error.
Then I read all the sections and keep those which are flagged as CODE or DATA (IMAGE_SCN_CNT_CODE or IMAGE_SCN_CNT_UNINITIALIZED_DATA or IMAGE_SCN_CNT_INITIALIZED_DATA).
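
For readers following along, the loading steps above could be sketched roughly as follows. Python is used only to keep the example short; it is not the MASM code attached to this post. The function name `parse_pe` and the shape of its return value are assumptions made for illustration. The header offsets, directory indices and flag values are those of the PE/COFF format.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
KEEP_MASK = (IMAGE_SCN_CNT_CODE | IMAGE_SCN_CNT_INITIALIZED_DATA
             | IMAGE_SCN_CNT_UNINITIALIZED_DATA)
# Data directory indices defined by the PE format.
DIRECTORIES = {"import": 1, "resource": 2, "iat": 12}
# Offset of the data directory table inside the optional header.
DIR_TABLE_OFFSET = {0x10B: 96, 0x20B: 112}      # PE32, PE32+


def parse_pe(image: bytes) -> dict:
    """Hypothetical helper: bitness, three directories, code/data sections."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", image, 0x3C)        # e_lfanew
    if image[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", image, pe_off + 4)                       # COFF header
    if machine == IMAGE_FILE_MACHINE_I386:
        bits = 32
    elif machine == IMAGE_FILE_MACHINE_AMD64:
        bits = 64
    else:
        raise ValueError("unsupported machine type 0x%04X" % machine)

    opt_off = pe_off + 24                                    # optional header
    (magic,) = struct.unpack_from("<H", image, opt_off)
    if magic not in DIR_TABLE_OFFSET:
        raise ValueError("unknown optional header magic 0x%04X" % magic)
    dir_off = opt_off + DIR_TABLE_OFFSET[magic]
    (n_dirs,) = struct.unpack_from("<I", image, dir_off - 4)

    dirs = {}
    for name, index in DIRECTORIES.items():
        if index < n_dirs:
            dirs[name] = struct.unpack_from("<II", image, dir_off + 8 * index)
        else:
            dirs[name] = (0, 0)                              # (RVA, size)
    if dirs["import"] == (0, 0):
        raise ValueError("no IMPORT directory")              # rule from the post

    sections = []
    sec_off = opt_off + opt_size                             # section table
    for i in range(n_sections):
        name, _, rva, raw_size, raw_ptr, _, _, _, _, flags = struct.unpack_from(
            "<8sIIIIIIHHI", image, sec_off + 40 * i)
        if flags & KEEP_MASK:                                # CODE or DATA only
            sections.append({
                "name": name.rstrip(b"\0").decode("ascii", "replace"),
                "rva": rva, "raw_ptr": raw_ptr, "raw_size": raw_size,
                "flags": flags,
            })
    return {"bits": bits, "directories": dirs, "sections": sections}
```

A caller would read the copied file once, for example with `open(path, "rb").read()`, close it, and pass the bytes to `parse_pe`. The sketch does no bounds checking on a truncated file, which a real loader would need.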

I have not built the IMPORT list yet. It will be created at the end of the disassembly, when I process labels and procs.

The source code for this part of the work is attached.

Now I can begin the disassembly.
The data I have found:

Quote
;Three byte VEX escape prefix
;The layout is as follows, starting with a byte with value 0xC4:
;  7                           0       7                           0       7                           0
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
;| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |   |~R |~X |~B |    map_select     |   |W/E|     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
;A VEX instruction whose values for certain fields are VEX.~X == 1, VEX.~B == 1,
;VEX.W/E == 0 and map_select == b00001 may be encoded using the two byte VEX escape prefix.

;Three byte XOP escape prefix
;The layout is the same as the three-byte VEX escape prefix, but with initial byte value 0x8F:
;  7                           0       7                           0       7                           0
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
;| 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |   |~R |~X |~B |    map_select     |   |W/E|     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+

;Two byte VEX escape prefix
;A VEX instruction whose values for certain fields are VEX.~X == 1, VEX.~B == 1,
;VEX.W/E == 0 and map_select == b00001 may be encoded using the two byte VEX escape prefix.
;The layout is as follows:
;  7                           0       7                           0
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
;| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |   |~R |     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
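
The three layouts above could be decoded along these lines. This is a minimal sketch in Python, used only for illustration; the name `decode_vex` and the returned dictionary are assumptions, not part of any manual or of the posted code. Fields marked with ~ in the diagrams are stored inverted, so the sketch flips them back. It also assumes the caller already knows the bytes are a VEX/XOP prefix: in 32-bit code C4h and C5h are LES/LDS unless the top two bits of the following byte are 11b, and 8Fh is POP when map_select is below 8.

```python
def decode_vex(data: bytes) -> dict:
    """Decode a two- or three-byte VEX/XOP escape prefix at data[0]."""
    lead = data[0]
    if lead == 0xC5:                          # two-byte VEX
        b1 = data[1]
        return {
            "kind": "VEX2", "length": 2,
            "R": ((b1 >> 7) & 1) ^ 1,         # ~R is stored inverted
            "X": 0, "B": 0,                   # implied: ~X == 1, ~B == 1
            "W": 0,                           # implied: W/E == 0
            "map_select": 1,                  # implied: b00001
            "vvvv": ~(b1 >> 3) & 0xF,         # ~vvvv is stored inverted
            "L": (b1 >> 2) & 1,
            "pp": b1 & 3,
        }
    if lead in (0xC4, 0x8F):                  # three-byte VEX or XOP
        b1, b2 = data[1], data[2]
        return {
            "kind": "VEX3" if lead == 0xC4 else "XOP", "length": 3,
            "R": ((b1 >> 7) & 1) ^ 1,
            "X": ((b1 >> 6) & 1) ^ 1,
            "B": ((b1 >> 5) & 1) ^ 1,
            "map_select": b1 & 0x1F,
            "W": (b2 >> 7) & 1,
            "vvvv": ~(b2 >> 3) & 0xF,
            "L": (b2 >> 2) & 1,
            "pp": b2 & 3,
        }
    raise ValueError("not a VEX/XOP escape byte: %02Xh" % lead)
```

For example, `decode_vex(bytes([0xC5, 0xF8, 0x77]))` and `decode_vex(bytes([0xC4, 0xE1, 0x78, 0x77]))` are the two-byte and three-byte encodings of the same instruction (VZEROUPPER), and they decode to the same field values apart from `kind` and `length`.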

;Table 3-2. Addressable General Purpose Registers

;Register Type          Without REX                               With REX
;Byte Registers         AL,  BL,  CL,  DL,  AH,  BH,  CH,  DH     AL,  BL,  CL,  DL,  DIL, SIL, BPL, SPL, R8L - R15L
;Word Registers         AX,  BX,  CX,  DX,  DI,  SI,  BP,  SP     AX,  BX,  CX,  DX,  DI,  SI,  BP,  SP,  R8W - R15W
;Doubleword Registers   EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP    EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D
;Quadword Registers     N.A.                                      RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8  - R15

;   Operand-Size Prefix 66H
;   Address-Size Prefix 67H

;Table 3-3. Effective Operand- and Address-Size Attributes

;D Flag in Code Segment Descriptor  0   0   0   0   1   1   1   1
;Operand-Size Prefix 66H            N   N   Y   Y   N   N   Y   Y
;Address-Size Prefix 67H            N   Y   N   Y   N   Y   N   Y
;Effective Operand Size             16  16  32  32  32  32  16  16
;Effective Address Size             16  32  16  32  32  16  32  16

;Table 3-4. Effective Operand- and Address-Size Attributes in 64-Bit Mode

;L Flag in Code Segment Descriptor  1   1   1   1   1   1   1   1
;REX.W Prefix                       0   0   0   0   1   1   1   1
;Operand-Size Prefix 66H            N   N   Y   Y   N   N   Y   Y
;Address-Size Prefix 67H            N   Y   N   Y   N   Y   N   Y
;Effective Operand Size             32  32  16  16  64  64  64  64
;Effective Address Size             64  32  64  32  64  32  64  32
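
Tables 3-3 and 3-4 reduce to a few conditionals. The sketch below (Python, for illustration only) takes the mode and the prefixes seen so far and returns the effective sizes. The function name `effective_sizes` and its parameters are assumptions. `mode_bits` of 16 or 32 stands for D = 0 or D = 1 in Table 3-3, and 64 stands for 64-bit mode. These are only the general defaults from the tables; instructions with a fixed or default 64-bit operand size need their own handling.

```python
def effective_sizes(mode_bits, has_66=False, has_67=False, rex_w=False):
    """Return (operand_size, address_size) in bits for the given prefixes."""
    if mode_bits == 64:
        # Table 3-4: REX.W wins over 66H; 67H selects 32-bit addressing.
        operand = 64 if rex_w else (16 if has_66 else 32)
        address = 32 if has_67 else 64
    elif mode_bits == 32:
        # Table 3-3, D flag = 1: each prefix switches to the 16-bit size.
        operand = 16 if has_66 else 32
        address = 16 if has_67 else 32
    elif mode_bits == 16:
        # Table 3-3, D flag = 0: each prefix switches to the 32-bit size.
        operand = 32 if has_66 else 16
        address = 32 if has_67 else 16
    else:
        raise ValueError("mode_bits must be 16, 32 or 64")
    return operand, address
```

A decoder would call this once per instruction, after the prefix bytes have been collected and before the operands are sized.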

What I have to do before writing any code:

Opcodes fall into many categories/groups. I must make the list.
What should I do when an opcode is found immediately after the previous one?
Example: 66h 66h 66h. Is it data in the code segment?
The exception is 0F followed by 0F, which is 3DNow!
Or is lea eax,[eax+eax] a NOP?
What should I do if a NOP has a parameter (which is allowed)?
Suppose I find code such as 90,37,90 (NOP, AAA, NOP). The problem is AAA, which is forbidden in 64-bit mode. What do I do?
I must make the list of the segment overrides.
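
One possible policy for the questions above, sketched in Python for illustration: treat a run of legacy prefixes as part of the next instruction as long as the total stays within the 15-byte instruction limit, and emit any byte that cannot be decoded in the current mode as a `db` line. Everything named here (`scan`, `db`, `TOY_OPCODES`) is hypothetical, the opcode table holds only the two opcodes from the example, and `INVALID_IN_64` is a partial list. Other policies, such as stopping the code flow at the invalid opcode, are equally defensible.

```python
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3,                    # lock, repne, rep
                   0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # segment overrides
                   0x66, 0x67}                          # operand/address size
# Partial list of one-byte opcodes that are invalid in 64-bit mode.
INVALID_IN_64 = {0x06, 0x07, 0x0E, 0x16, 0x17, 0x1E, 0x1F, 0x27, 0x2F,
                 0x37, 0x3F, 0x60, 0x61, 0x9A, 0xCE, 0xD4, 0xD5, 0xEA}
MAX_INSN_LEN = 15
# Toy opcode table: just enough for the 90,37,90 example.
TOY_OPCODES = {0x90: "nop", 0x37: "aaa"}


def db(byte: int) -> str:
    """Format one byte as a MASM data line (leading 0 before A-F)."""
    text = "%02X" % byte
    return "db " + ("0" + text if text[0].isalpha() else text) + "h"


def scan(code: bytes, bits: int) -> list:
    """Return one text line per decoded instruction or data byte."""
    lines, pos = [], 0
    while pos < len(code):
        start = pos
        while pos < len(code) and code[pos] in LEGACY_PREFIXES:
            pos += 1                          # collect the prefix run
        n_prefixes = pos - start
        opcode = code[pos] if pos < len(code) else None
        name = TOY_OPCODES.get(opcode)
        valid = (name is not None
                 and n_prefixes < MAX_INSN_LEN
                 and not (bits == 64 and opcode in INVALID_IN_64))
        if not valid:
            # Undecodable here: show prefixes and opcode as raw data bytes.
            end = min(pos + 1, len(code))
            lines += [db(b) for b in code[start:end]]
        elif n_prefixes:
            lines.append("%s ; %d prefix byte(s)" % (name, n_prefixes))
        else:
            lines.append(name)
        pos += 1
    return lines
```

With this policy, `scan(bytes([0x90, 0x37, 0x90]), 64)` keeps both NOPs and shows the AAA byte as `db 37h`, while the same bytes in a 32-bit file decode as three instructions.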

The operands of AND, XOR and NOT will be displayed in binary form.
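
As a small illustration of the binary display, an immediate operand could be turned into a MASM-style binary literal like this (Python, for illustration; `masm_binary` is a hypothetical helper, and the operand width is assumed to come from the decoder):

```python
def masm_binary(value: int, width_bits: int) -> str:
    """Format an immediate as a fixed-width MASM binary literal."""
    value &= (1 << width_bits) - 1            # wrap negative values
    return format(value, "0%db" % width_bits) + "b"
```

The fixed width keeps the bit positions lined up, so `masm_binary(0xF0, 8)` yields `11110000b`, ready to print in a line such as `and al,11110000b`.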
That's all for today.
I have a lot of work to do. I thought that it would be more difficult to walk through the PE file.
Kenavo (Bye)
----------------------
Help me if you can, I'm feeling down...

TouEnMasm

Writing a disassembler is not an easy task.
objconv, with its source code in C++, could give you an idea of the work needed to reach the goal.
http://www.agner.org/optimize/
Fa is a musical note to play with CL

ragdog

Quote
writing a disassembler is not an easy task.

Right :t

The best way is to work from an open-source disassembler like Olly Disasm/BeaEngine or diStorm.
Disasm/BeaEngine is very good and supports 64-bit.

shankle

Hi Grincheux,
I am trying to do the same thing.
The only reference that I have found so far is the "programmer's reference manual".
Chapter 17 goes into some detail. Also I have found an OPcode list.
I have not progressed as far as you have. Looks like a daunting task.
May I ask where you found your information?

Grincheux

https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx
http://www.agner.org/optimize/
http://www.sunshine2k.de/reversing/tuts/tut_rvait.htm
http://www.csn.ul.ie/~caolan/pub/winresdump/winresdump/doc/pefile.html
https://msdn.microsoft.com/en-us/windows/hardware/gg463180.aspx
http://x86.renejeschke.de/
http://www.sandpile.org/
http://ref.x86asm.net/index.html
http://wiki.osdev.org/X86-64_Instruction_Encoding
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html --- ABSOLUTELY DOWNLOAD
https://www.onlinedisassembler.com/odaweb/
http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf --- ABSOLUTELY DOWNLOAD
http://support.amd.com/TechDocs/24592.pdf --- ABSOLUTELY DOWNLOAD
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2008/10/24594_APM_v3.pdf --- ABSOLUTELY DOWNLOAD

Perhaps there are others, but I think I have listed the ones I use frequently.

Write a dummy program with instruction bytes like "db xxh,xxh,xxh" in the code section and pass it to a debugger.
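
To save typing when preparing such a dummy program, the `db` line can be generated from the bytes to test. A minimal sketch in Python, for illustration only; `masm_db` is a hypothetical helper, and the only rule it encodes is that a MASM hexadecimal literal must start with a decimal digit:

```python
def masm_db(data: bytes) -> str:
    """Build a MASM 'db' line from raw instruction bytes."""
    items = []
    for byte in data:
        text = "%02X" % byte
        if text[0].isalpha():
            text = "0" + text                 # 0C5h, not C5h
        items.append(text + "h")
    return "db " + ",".join(items)
```

For example, `masm_db(bytes([0xC5, 0xF8, 0x77]))` returns `db 0C5h,0F8h,77h`, which can be pasted into the code section and inspected in the debugger.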

Links on my site: http://www.phrio.biz/AsmForFun/links/
At http://www.phrio.biz/mediawiki/Current_project I describe what I am doing in this project.

Grincheux

I suppose you use MASM32 or JWASM.
Is it a 32-bit or a 64-bit application?
Which OS is this program for? Windows, Linux...
Will you produce 32-bit decoded instructions, 64-bit ones, or both?
I will produce 32-bit output if the program is a 32-bit one (IMAGE_FILE_MACHINE_I386) or 64-bit output if it is a 64-bit one (IMAGE_FILE_MACHINE_AMD64).
I will try to locate the function arguments automatically. In a list of pushes before the call I can find the right arguments. For example, CreateWindow has 11 arguments => 11 pushes, so it is easy to identify them and name each pushed address after the corresponding argument.

HWND WINAPI CreateWindow(
  _In_opt_ LPCTSTR   lpClassName,
  _In_opt_ LPCTSTR   lpWindowName,
  _In_     DWORD     dwStyle,
  _In_     int       x,
  _In_     int       y,
  _In_     int       nWidth,
  _In_     int       nHeight,
  _In_opt_ HWND      hWndParent,
  _In_opt_ HMENU     hMenu,
  _In_opt_ HINSTANCE hInstance,
  _In_opt_ LPVOID    lpParam
);

The tenth parameter, if I am lucky, is very often a global variable.
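
The push-counting idea could look roughly like this. It is a deliberately naive sketch in Python, for illustration only: `name_pushed_arguments` is a hypothetical helper, instructions are assumed to be already decoded into (mnemonic, operand) pairs, and the pushes are assumed to sit directly before the call with nothing in between. With stdcall, arguments are pushed right to left, so the last push before the call is the first parameter. Note that in a compiled binary the import is usually CreateWindowExA or CreateWindowExW, which takes dwExStyle as an extra leading parameter.

```python
CREATEWINDOW_PARAMS = [
    "lpClassName", "lpWindowName", "dwStyle", "x", "y", "nWidth",
    "nHeight", "hWndParent", "hMenu", "hInstance", "lpParam",
]


def name_pushed_arguments(instructions, call_index, param_names):
    """Pair each parameter name with the operand pushed for it.

    instructions: list of (mnemonic, operand) tuples.
    Returns a list of (param_name, operand) in parameter order, or None
    if the pushes are not all found directly before the call.
    """
    named = []
    for k, param in enumerate(param_names):
        i = call_index - 1 - k                # k-th instruction before the call
        if i < 0 or instructions[i][0] != "push":
            return None                       # pattern broken: give up
        named.append((param, instructions[i][1]))
    return named
```

When the function returns None, the disassembler would simply leave the pushes unnamed rather than guess.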
Send me a PM and we can discuss this further.

Bon courage !

dedndave

i often have hInstance in a register (EBX, ESI, or EDI)
at least, when i create the main window
that is because i previously used it to register a window class   :biggrin:

;------------------------------

;initialize common controls

        INVOKE  InitCommonControlsEx,offset icc

;------------------------------

;register the window class

        xor     edi,edi                                 ;EDI = 0
        mov     esi,offset wc                           ;ESI = offset wc
        INVOKE  GetModuleHandle,edi
        mov     [esi].WNDCLASSEX.hInstance,eax
        xchg    eax,ebx                                 ;EBX = wc.hInstance
        INVOKE  LoadIcon,ebx,IDI_ICON
        mov     [esi].WNDCLASSEX.hIcon,eax
        mov     [esi].WNDCLASSEX.hIconSm,eax
        INVOKE  LoadCursor,edi,IDC_ARROW
        mov     [esi].WNDCLASSEX.hCursor,eax
        INVOKE  RegisterClassEx,esi

;------------------------------

;create the window

        call    CreateMainMenu
        INVOKE  CreateWindowEx,edi,offset szClassName,offset szAppName,
                WS_OVERLAPPEDWINDOW or WS_VISIBLE or WS_CLIPCHILDREN,
                CW_USEDEFAULT,SW_SHOWNORMAL,MAIN_WIDTH,MAIN_HEIGHT,edi,eax,ebx,edi
        INVOKE  UpdateWindow,eax

;------------------------------


other times, i create multiple child windows, from tables, in a loop like this one

;***********************************************************************************************

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

TableCreate PROC   lphWndParent:LPHANDLE,lpTable:LPVOID

        push    esi
        push    edi
        mov     esi,[esp+16]      ;ESI = lpTable
        push    ebx
        push    ebp
        lodsd
        mov     edx,hInstance     ;EDX = hInstance
        xchg    eax,edi           ;EDI = lpHandleList
        lodsd
        mov     ebx,[esp+20]      ;EBX = lphWndParent
        xchg    eax,ebp           ;EBP = table entry count

Creat0: lodsd             ;EAX = lParam
        push    edx       ;save hInstance
        push    eax                         ;CreateWindowEx:lParam
        lodsd             ;EAX = hMenu
        push    edx                         ;CreateWindowEx:hInstance
        push    eax                         ;CreateWindowEx:hMenu
        mov     ecx,[ebx] ;ECX = hWndParent
        lodsd             ;EAX = nHeight
        push    ecx                         ;CreateWindowEx:hWndParent
        push    eax                         ;CreateWindowEx:nHeight
        lodsd             ;EAX = nWidth
        push    eax                         ;CreateWindowEx:nWidth
        lodsd             ;EAX = y (position)
        push    eax                         ;CreateWindowEx:y
        lodsd             ;EAX = x (position)
        push    eax                         ;CreateWindowEx:x
        lodsd             ;EAX = dwStyle
        push    eax                         ;CreateWindowEx:dwStyle
        lodsd             ;EAX = lpWindowName
        push    eax                         ;CreateWindowEx:lpWindowName
        lodsd             ;EAX = lpClassName
        push    eax                         ;CreateWindowEx:lpClassName
        lodsd             ;EAX = dwExStyle
        push    eax                         ;CreateWindowEx:dwExStyle
        CALL    CreateWindowEx
        or      edi,edi
        jz      Creat1

        stosd             ;store the handle

Creat1: or      eax,eax
        jz      Creat3

        mov     ecx,[esi-8]
        cmp     ecx,offset szStBarClass
        jz      Creat2

        cmp     ecx,offset szBtnClass
        jz      Creat2

        cmp     ecx,offset szStcClass
        jnz     Creat3

Creat2: push    eax
        INVOKE  SendMessage,eax,WM_SETFONT,hFont,TRUE
        pop     eax

Creat3: dec     ebp
        pop     edx       ;EDX = hInstance
        jnz     Creat0

        pop     ebp
        pop     ebx
        pop     edi
        pop     esi
        ret     8

TableCreate ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;***********************************************************************************************

dedndave

another thing that you may run into (especially in a masm forum)

you might see cases where one argument is pushed,
then a call is made to acquire the next argument

example
(assume that EAX holds the address of a RECT structure)

    push    eax
    INVOKE  GetWindowHandle ;(or something)
    push    eax
    call    GetWindowRect


both push'es are arguments for GetWindowRect
and - you may see several lines of unrelated code in between - lol

i often do something similar for WM_PAINT
where i push a saved object handle, then use it later as an argument to restore the DC
so - trying to keep track of what's on the stack is not always a simple thing
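
A disassembler could cope with the GetWindowRect case above by simulating the stack instead of looking only at the instructions directly before a call. The sketch below is in Python, for illustration only. `assign_pushes` is a hypothetical helper, instructions are (mnemonic, operand) pairs, and `arg_counts` is an assumed table giving the number of stdcall arguments for each known callee. It handles straight-line code only: no branches, no direct ESP arithmetic, and an unknown callee makes it lose track.

```python
def assign_pushes(instructions, arg_counts):
    """Map each call to the push instructions that supply its arguments.

    Returns {call_index: [push_index, ...]} with the first parameter
    first, or {call_index: None} where tracking was lost.
    """
    stack = []                                # indices of pending pushes
    result = {}
    for i, (mnemonic, operand) in enumerate(instructions):
        if mnemonic == "push":
            stack.append(i)
        elif mnemonic == "pop":
            if stack:
                stack.pop()
        elif mnemonic == "call":
            n = arg_counts.get(operand)
            if n is None or n > len(stack):
                stack.clear()                 # unknown callee: start over
                result[i] = None
                continue
            # A stdcall callee removes its own arguments; top of stack is
            # the first parameter.
            result[i] = [stack.pop() for _ in range(n)]
    return result
```

For the four-line example above, with `{"GetWindowHandle": 0, "GetWindowRect": 2}` as the table, both pushes end up attached to the GetWindowRect call even though another call sits between them. The sketch does not address the WM_PAINT trick of parking a handle on the stack for later use beyond this same bookkeeping.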

Grincheux

I have never done it with a table, and did not know about it. Curious.


mov eax,__hInstance
mov [edi].WNDCLASS.hInstance,eax

INVOKE GetStockObject,BLACK_BRUSH

mov [edi].WNDCLASS.hbrBackground,eax
mov [edi].WNDCLASS.lpszMenuName,IDM_MENU
mov [edi].WNDCLASS.lpszClassName,OFFSET szClass

INVOKE RegisterClass,edi
----------------------------------------------------------
push OFFSET szPgmDirectory
push OFFSET @F
push OFFSET szPgmFileName
push OFFSET szPgmDirectory
push OFFSET PathFindFileName
push MAX_PATH
push OFFSET szPgmFileName
push eax
push OFFSET lstrcpy
jmp GetModuleFileName
---------------------------------------------------------------
@WmPaint :

lea eax,_Ps
push eax
push __hWnd
push eax
push __hWnd
call BeginPaint
push edi
mov edi,eax
INVOKE CreateCompatibleDC,eax
push esi
mov esi,eax
push eax ; _hDCMem
INVOKE LoadImage,hInstance,IDB_BITMAP_01,IMAGE_BITMAP,0,0,LR_VGACOLOR
push eax ; _hBitmap
push ebx
mov ebx,eax
INVOKE GetObject,ebx,SIZEOF BITMAP,ADDR _Bitmap
INVOKE SelectObject,esi,ebx
mov ebx,eax
xchg ebx,DWord ptr [esp]
INVOKE BitBlt,edi,0,0,_Bitmap.bmWidth,_Bitmap.bmHeight,esi,0,0,SRCCOPY
push esi
call SelectObject
call DeleteObject
call DeleteDC
call EndPaint
pop esi
pop edi
INVOKE SetFocus,hListBox
xor eax,eax
ret


Your code and mine will cause headaches.

dedndave

here's a program i wrote a few years ago
it uses TableCreate - and demonstrates the tables

the program can open a BMP image, then pan/zoom
i also wanted to play with different StretchBlt modes, so they are in the View menu

guga

Here's dave's file, disassembled and reassembled with RosAsm in one click. The file works.
1 - Open the original file in RosAsm
2 - Select the disasm choices that appear (enable "Convert Direct Api calls to Indirect" and "Fix Direct Api convertion")
3 - After it has disassembled the file, just press "Run" from the menu and voilà ;)

Get ready, phillipe. You will have tons of headaches with the disassembler  :icon_mrgreen:

But...it is fun to develop  !!!

It is easier to test on small files. You know you are succeeding when you are able to disassemble and reassemble those files.

Once you succeed in disassembling a simple file (try it with small "messagebox" files first), you can start on bigger ones, stage by stage, fixing the errors that show up.

If you try to disassemble big files first, you will have tons of problems. First of all you need to test on tiny ones and develop slowly, piece by piece.

Keep in mind that there is no such thing as a perfect automatic disassembler. The decision about what is code and what is data sometimes has to be made by hand, and until you reach a development stage where more than 95% of disassembly listings succeed, it will be a long way to go. Get ready ;)

And absolutely avoid testing on packed files. Leave that stage of development for the very last. Since packed files make it hard to decide on and isolate code/data, it is better to try with real files (normal ones, not packed), and eventually you will end up improving the disassembler to handle those complex ones.

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Grincheux

Thank you for your files and your encouragement.

The kind of analysis I do: http://www.phrio.biz/mediawiki/Current_project

shankle

Thanks for responding Grincheux.
Answers to your questions:
   GoAsm 64-bit
   64-bit application
   windows 7pro 64-bit
   64-bit code

I will spend a lot of time reading your list before attempting any code.


Grincheux

I need you.
Could you download my program here for a test?
Launch it and click on the menu item "Open". That's all.
Send me a screenshot of the box that opens after you select the file.
Send me the file "Grumpy.phr" (< 200 bytes).
This file is in the folder from which the program was launched.

The file you send me contains the results (EAX, EBX, ECX and EDX) of CPUID for functions 0, 1, 7, 80000001h, 80000002h, 80000003h, 80000004h and 80000008h.
The box that appears should select the right CPU and check off what has been detected.
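
For anyone checking such results by hand, two of the CPUID leaves listed above hold text. Leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order), and leaves 80000002h to 80000004h return the 48-byte brand string in EAX, EBX, ECX, EDX. The sketch below (Python, for illustration) rebuilds both from register values. The helper names are assumptions, and the layout of Grumpy.phr is not described in this thread, so the sketch does not try to read that file.

```python
import struct


def vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Vendor id from CPUID leaf 0; the register order is EBX, EDX, ECX."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def brand_string(leaves) -> str:
    """Brand string from leaves 80000002h..80000004h.

    leaves: three (eax, ebx, ecx, edx) tuples, in leaf order.
    """
    raw = b"".join(struct.pack("<IIII", *regs) for regs in leaves)
    return raw.split(b"\0", 1)[0].decode("ascii").strip()
```

For instance, EBX = 756E6547h, EDX = 49656E69h and ECX = 6C65746Eh from leaf 0 spell out "GenuineIntel".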

Thank You in advance

hutch--

This is what I get from your download.