The file to disassemble is loaded into memory. I don't use a memory-mapped file because Windows writes to it. The file is simply read into memory, the way you would read any normal file. When the file is selected, it is copied into the "Work" folder, and from then on I work with the copy. Once the file is in memory, it is closed.
I accept *.exe and *.dll files. They must be 32-bit or 64-bit.
I read the IMPORT, RESOURCE and IAT directories.
If the file does not have an IMPORT directory, it is treated as an error.
Then I read all the sections and keep those flagged as code or data (IMAGE_SCN_CNT_CODE, IMAGE_SCN_CNT_INITIALIZED_DATA or IMAGE_SCN_CNT_UNINITIALIZED_DATA).
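The checks described above can be sketched in a few lines of Python. This is only an illustration, not the author's code (which is written in assembler): the function name parse_pe and the returned dictionary are hypothetical, and the header offsets are taken from the PE/COFF specification. The sketch assumes the whole file is already in memory as a bytes object and does not validate NumberOfRvaAndSizes.

```python
import struct

# Constants from the PE/COFF specification.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_DIRECTORY_ENTRY_IMPORT = 1
KEEP_MASK = (IMAGE_SCN_CNT_CODE
             | IMAGE_SCN_CNT_INITIALIZED_DATA
             | IMAGE_SCN_CNT_UNINITIALIZED_DATA)


def parse_pe(image):
    """Hypothetical helper: validate a PE image held in memory and
    return its bitness plus the code/data sections worth keeping."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_off,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # COFF file header (20 bytes) follows the 4-byte signature.
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", image, pe_off + 4)
    if machine == IMAGE_FILE_MACHINE_I386:
        bits = 32
    elif machine == IMAGE_FILE_MACHINE_AMD64:
        bits = 64
    else:
        raise ValueError("only 32-bit and 64-bit x86 images are accepted")

    # Data directories start at offset 96 (PE32) or 112 (PE32+).
    opt_off = pe_off + 24
    (magic,) = struct.unpack_from("<H", image, opt_off)
    if magic == 0x10B:
        dir_off = opt_off + 96
    elif magic == 0x20B:
        dir_off = opt_off + 112
    else:
        raise ValueError("unknown optional header magic")
    import_rva, _ = struct.unpack_from(
        "<II", image, dir_off + 8 * IMAGE_DIRECTORY_ENTRY_IMPORT)
    if import_rva == 0:
        raise ValueError("no IMPORT directory")

    # Section headers (40 bytes each) follow the optional header.
    sections = []
    sec_off = opt_off + opt_size
    for i in range(n_sections):
        name, vsize, rva, raw_size, raw_ptr, _, _, _, _, flags = \
            struct.unpack_from("<8sIIIIIIHHI", image, sec_off + 40 * i)
        if flags & KEEP_MASK:
            sections.append({
                "name": name.rstrip(b"\0").decode("ascii", "replace"),
                "rva": rva,
                "virtual_size": vsize,
                "raw_ptr": raw_ptr,
                "raw_size": raw_size,
                "is_code": bool(flags & IMAGE_SCN_CNT_CODE),
            })
    return {"bits": bits, "import_rva": import_rva, "sections": sections}
```

Calling parse_pe(open(path, "rb").read()) on the copy in the "Work" folder would give the list of sections to disassemble; the file can be closed as soon as the read returns.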
I have not built the IMPORT list yet; it will be created later, at the end of the disassembly, when I process labels and procs.
The source code for this part of the work is attached here.
Now I can begin the disassembly.
The data I have found:
Quote
;Three byte VEX escape prefix
;The layout is as follows, starting with a byte with value 0xC4:
;  7                           0     7                           0     7                           0
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;| 1   1   0   0   0   1   0   0 | |~R |~X |~B |    map_select     | |W/E|     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;A VEX instruction whose values for certain fields are VEX.~X == 1, VEX.~B == 1,
;VEX.W/E == 0 and map_select == b00001 may be encoded using the two byte VEX escape prefix.
;Three byte XOP escape prefix
;The layout is the same as the three-byte VEX escape prefix, but with initial byte value 0x8F:
;  7                           0     7                           0     7                           0
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;| 1   0   0   0   1   1   1   1 | |~R |~X |~B |    map_select     | |W/E|     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;Two byte VEX escape prefix
;A VEX instruction whose values for certain fields are VEX.~X == 1, VEX.~B == 1,
;VEX.W/E == 0 and map_select == b00001 may be encoded using the two byte VEX escape prefix.
;The layout is as follows:
;  7                           0     7                           0
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;| 1   1   0   0   0   1   0   1 | |~R |     ~vvvv     | L |  pp   |
;+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
;Table 3-2. Addressable General Purpose Registers
;Register Type          Without REX                              With REX
;Byte Registers         AL, BL, CL, DL, AH, BH, CH, DH           AL, BL, CL, DL, DIL, SIL, BPL, SPL, R8L - R15L
;Word Registers         AX, BX, CX, DX, DI, SI, BP, SP           AX, BX, CX, DX, DI, SI, BP, SP, R8W - R15W
;Doubleword Registers   EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP   EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D
;Quadword Registers     N.A.                                     RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8 - R15
; Operand-Size Prefix 66H
; Address-Size Prefix 67H
;Table 3-3. Effective Operand- and Address-Size Attributes
;D Flag in Code Segment Descriptor   0   0   0   0   1   1   1   1
;Operand-Size Prefix 66H             N   N   Y   Y   N   N   Y   Y
;Address-Size Prefix 67H             N   Y   N   Y   N   Y   N   Y
;Effective Operand Size             16  16  32  32  32  32  16  16
;Effective Address Size             16  32  16  32  32  16  32  16
;Table 3-4. Effective Operand- and Address-Size Attributes in 64-Bit Mode
;L Flag in Code Segment Descriptor   1   1   1   1   1   1   1   1
;REX.W Prefix                        0   0   0   0   1   1   1   1
;Operand-Size Prefix 66H             N   N   Y   Y   N   N   Y   Y
;Address-Size Prefix 67H             N   Y   N   Y   N   Y   N   Y
;Effective Operand Size             32  32  16  16  64  64  64  64
;Effective Address Size             64  32  64  32  64  32  64  32
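To make the VEX/XOP layouts quoted above concrete, here is a small Python sketch that splits the prefix bytes into their fields. It is an assumption-laden illustration: the function name decode_vex and the dictionary keys are made up, and the inverted fields (~R, ~X, ~B, ~vvvv) are returned already un-inverted. It does not handle the ambiguity with older instructions: outside 64-bit mode C4h/C5h can also be LES/LDS, and 8Fh can be POP, so a real decoder has to look at the following byte before accepting the prefix.

```python
def decode_vex(code):
    """Hypothetical helper: decode a VEX (C4h/C5h) or XOP (8Fh) prefix
    at the start of `code`. Returns a dict of fields, or None if the
    first byte is not one of the three escape bytes."""
    first = code[0]
    if first == 0xC5:
        # Two-byte form: ~R, ~vvvv, L, pp. X, B, W are implied 0 and
        # map_select is implied 00001b.
        b1 = code[1]
        return {
            "length": 2,
            "R": ((b1 >> 7) & 1) ^ 1,
            "X": 0,
            "B": 0,
            "map_select": 1,
            "W": 0,
            "vvvv": ((b1 >> 3) & 0xF) ^ 0xF,
            "L": (b1 >> 2) & 1,
            "pp": b1 & 3,
        }
    if first in (0xC4, 0x8F):
        # Three-byte form: ~R ~X ~B map_select, then W/E ~vvvv L pp.
        b1, b2 = code[1], code[2]
        return {
            "length": 3,
            "R": ((b1 >> 7) & 1) ^ 1,
            "X": ((b1 >> 6) & 1) ^ 1,
            "B": ((b1 >> 5) & 1) ^ 1,
            "map_select": b1 & 0x1F,
            "W": (b2 >> 7) & 1,
            "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,
            "L": (b2 >> 2) & 1,
            "pp": b2 & 3,
        }
    return None
```

For example, decode_vex(bytes([0xC5, 0xF8, 0x77])) reports a two-byte prefix with L = 0 and pp = 0 in map 1, which is how VZEROUPPER is encoded.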
What I have to do before writing any code:
Opcodes fall into many categories/groups. I must make the list.
What should I do when an opcode byte is found immediately after an identical one?
Example: 66h 66h 66h. Is it data in the code segment?
Except for 0F followed by 0F which is 3DNow!
Or lea eax,[eax+eax] = NOP ?
What to do if a NOP has a parameter (allowed)?
If I find code such as 90,37,90 (NOP, AAA, NOP), the problem is AAA, which is invalid in 64-bit mode. What do I do?
I must make the list of the segment overrides.
For AND, XOR and NOT, the operand will be displayed in binary form.
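Some of the questions above (repeated 66h bytes, the list of segment overrides, AAA in 64-bit mode) can be explored with a small prefix scanner. The Python sketch below is only one possible approach, not the author's design: the names scan_prefixes, effective_sizes_64 and is_invalid_in_64bit are hypothetical. It assumes repeated legacy prefixes are simply collected (a real decoder would also enforce the 15-byte instruction length limit), applies Table 3-4 for the effective sizes, and lists only the four BCD-adjust opcodes as invalid in 64-bit mode, which is a deliberately partial list.

```python
# Legacy prefixes, including the six segment overrides.
LEGACY_PREFIXES = {
    0xF0: "LOCK", 0xF2: "REPNE", 0xF3: "REP",
    0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x64: "FS", 0x65: "GS",
    0x66: "OPSIZE", 0x67: "ADDRSIZE",
}

# Partial list (assumption: only the BCD-adjust group is shown here).
INVALID_IN_64BIT = {0x27: "DAA", 0x2F: "DAS", 0x37: "AAA", 0x3F: "AAS"}


def scan_prefixes(code, mode64=True):
    """Collect legacy prefixes, then an optional REX byte (40h-4Fh,
    64-bit mode only). Returns (prefix_names, rex_or_None, next_pos)."""
    pos = 0
    names = []
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[pos]])
        pos += 1
    rex = None
    if mode64 and pos < len(code) and 0x40 <= code[pos] <= 0x4F:
        rex = code[pos]
        pos += 1
    return names, rex, pos


def effective_sizes_64(names, rex):
    """Apply Table 3-4: returns (operand_size, address_size) in bits."""
    if rex is not None and rex & 0x08:      # REX.W overrides 66h
        operand = 64
    elif "OPSIZE" in names:
        operand = 16
    else:
        operand = 32
    address = 32 if "ADDRSIZE" in names else 64
    return operand, address


def is_invalid_in_64bit(opcode):
    """True if this one-byte opcode is in the partial invalid list."""
    return opcode in INVALID_IN_64BIT
```

With this, 66h 66h 66h 90h scans as three operand-size prefixes followed by NOP rather than as data, and 37h (AAA) in a 64-bit image can be flagged so the caller decides whether to emit a "db 37h" line or stop treating the range as code.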
That's all for today.
I have a lot of work to do. I thought it would be more difficult to walk through the PE file.
writing a disassembler is not an easy task.
objconv, with its source code in C++, could give you an idea of the work needed to reach the goal.
http://www.agner.org/optimize/
Quote: writing a disassembler is not an easy task.
Right.
The best way is to work on an open-source disassembler like Olly Disasm/BeaEngine or diStorm.
Disasm/BeaEngine is very good and supports 64-bit.
Hi Grincheux,
I am trying to do the same thing.
The only reference that I have found so far is the "programmer's reference manual".
Chapter 17 goes into some detail. Also I have found an OPcode list.
I have not progressed as far as you have. Looks like a daunting task.
May I ask where you found your information?
https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx
http://www.agner.org/optimize/
http://www.sunshine2k.de/reversing/tuts/tut_rvait.htm
http://www.csn.ul.ie/~caolan/pub/winresdump/winresdump/doc/pefile.html
https://msdn.microsoft.com/en-us/windows/hardware/gg463180.aspx
http://x86.renejeschke.de/
http://www.sandpile.org/
http://ref.x86asm.net/index.html
http://wiki.osdev.org/X86-64_Instruction_Encoding
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html --- ABSOLUTELY DOWNLOAD
https://www.onlinedisassembler.com/odaweb/
http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf --- ABSOLUTELY DOWNLOAD
http://support.amd.com/TechDocs/24592.pdf --- ABSOLUTELY DOWNLOAD
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2008/10/24594_APM_v3.pdf --- ABSOLUTELY DOWNLOAD
Perhaps there are others, but I think I have listed the ones I use frequently.
Write a dummy program with instruction bytes like "db xxh,xxh,xxh" in the code section and run it through a debugger.
Links on my site : http://www.phrio.biz/AsmForFun/links/
Here, http://www.phrio.biz/mediawiki/Current_project, I describe what I do in this project.
I suppose you use MASM32 or JWASM.
Is it a 32 bits or a 64 bits application?
Which OS is this program for? Windows, Linux...
Will you produce 32, 64 bits decoded instructions or both?
I will produce 32-bit output if the program is a 32-bit one (IMAGE_FILE_MACHINE_I386) or 64-bit output if it is a 64-bit one (IMAGE_FILE_MACHINE_AMD64).
I will try to locate the function arguments automatically. In the list of pushes before the call I can find the right arguments. For example, CreateWindow has 11 arguments => 11 pushes, so it is easy to identify them and name each pushed address after the corresponding argument.
HWND WINAPI CreateWindow(
_In_opt_ LPCTSTR lpClassName,
_In_opt_ LPCTSTR lpWindowName,
_In_ DWORD dwStyle,
_In_ int x,
_In_ int y,
_In_ int nWidth,
_In_ int nHeight,
_In_opt_ HWND hWndParent,
_In_opt_ HMENU hMenu,
_In_opt_ HINSTANCE hInstance,
_In_opt_ LPVOID lpParam
);
The tenth parameter (hInstance), if I am lucky, is very often a global variable.
Send me a PM and we will discuss this at more length.
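The push-counting idea can be sketched as follows in Python. This is a naive illustration under strong assumptions, not the author's algorithm: instructions are represented as hypothetical (mnemonic, operand) tuples, the function name name_pushed_args is made up, the callee is assumed to use stdcall (arguments pushed right to left, so the last push before the call is the first parameter), and the scan gives up as soon as anything other than a push appears between the arguments.

```python
CREATEWINDOW_PARAMS = [
    "lpClassName", "lpWindowName", "dwStyle", "x", "y", "nWidth",
    "nHeight", "hWndParent", "hMenu", "hInstance", "lpParam",
]


def name_pushed_args(instructions, call_index, params):
    """Walk backwards from the call at `call_index`, collecting one
    push per parameter. Returns {param_name: operand}, or None when
    the pushes are not contiguous or there are too few of them."""
    pushed = []
    i = call_index - 1
    while i >= 0 and len(pushed) < len(params):
        mnemonic, operand = instructions[i]
        if mnemonic != "push":
            break                      # unrelated code: give up
        pushed.append(operand)
        i -= 1
    if len(pushed) < len(params):
        return None
    # pushed[0] is the push nearest the call, i.e. the first parameter.
    return dict(zip(params, pushed))
```

When the function returns a mapping, the operand found for hInstance or lpClassName can be renamed after the parameter in the listing. When it returns None, the call needs a smarter stack tracker or has to be labelled by hand.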
Bon courage !
i often have hInstance in a register (EBX, ESI, or EDI)
at least, when i create the main window
that is because i previously used it to register a window class
;------------------------------
;initialize common controls
INVOKE InitCommonControlsEx,offset icc
;------------------------------
;register the window class
xor edi,edi ;EDI = 0
mov esi,offset wc ;ESI = offset wc
INVOKE GetModuleHandle,edi
mov [esi].WNDCLASSEX.hInstance,eax
xchg eax,ebx ;EBX = wc.hInstance
INVOKE LoadIcon,ebx,IDI_ICON
mov [esi].WNDCLASSEX.hIcon,eax
mov [esi].WNDCLASSEX.hIconSm,eax
INVOKE LoadCursor,edi,IDC_ARROW
mov [esi].WNDCLASSEX.hCursor,eax
INVOKE RegisterClassEx,esi
;------------------------------
;create the window
call CreateMainMenu
INVOKE CreateWindowEx,edi,offset szClassName,offset szAppName,
WS_OVERLAPPEDWINDOW or WS_VISIBLE or WS_CLIPCHILDREN,
CW_USEDEFAULT,SW_SHOWNORMAL,MAIN_WIDTH,MAIN_HEIGHT,edi,eax,ebx,edi
INVOKE UpdateWindow,eax
;------------------------------
other times, i create multiple child windows, from tables, in a loop like this one
;***********************************************************************************************
OPTION PROLOGUE:None
OPTION EPILOGUE:None
TableCreate PROC lphWndParent:LPHANDLE,lpTable:LPVOID
push esi
push edi
mov esi,[esp+16] ;ESI = lpTable
push ebx
push ebp
lodsd
mov edx,hInstance ;EDX = hInstance
xchg eax,edi ;EDI = lpHandleList
lodsd
mov ebx,[esp+20] ;EBX = hWndParent
xchg eax,ebp ;EBP = table entry count
Creat0: lodsd ;EAX = lParam
push edx ;save hInstance
push eax ;CreateWindowEx:lParam
lodsd ;EAX = hMenu
push edx ;CreateWindowEx:hInstance
push eax ;CreateWindowEx:hMenu
mov ecx,[ebx] ;ECX = hWndParent
lodsd ;EAX = nHeight
push ecx ;CreateWindowEx:hWndParent
push eax ;CreateWindowEx:nHeight
lodsd ;EAX = nWidth
push eax ;CreateWindowEx:nWidth
lodsd ;EAX = y (position)
push eax ;CreateWindowEx:y
lodsd ;EAX = x (position)
push eax ;CreateWindowEx:x
lodsd ;EAX = dwStyle
push eax ;CreateWindowEx:dwStyle
lodsd ;EAX = lpWindowName
push eax ;CreateWindowEx:lpWindowName
lodsd ;EAX = lpClassName
push eax ;CreateWindowEx:lpClassName
lodsd ;EAX = dwExStyle
push eax ;CreateWindowEx:dwExStyle
CALL CreateWindowEx
or edi,edi
jz Creat1
stosd ;store the handle
Creat1: or eax,eax
jz Creat3
mov ecx,[esi-8]
cmp ecx,offset szStBarClass
jz Creat2
cmp ecx,offset szBtnClass
jz Creat2
cmp ecx,offset szStcClass
jnz Creat3
Creat2: push eax
INVOKE SendMessage,eax,WM_SETFONT,hFont,TRUE
pop eax
Creat3: dec ebp
pop edx ;EDX = hInstance
jnz Creat0
pop ebp
pop ebx
pop edi
pop esi
ret 8
TableCreate ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;***********************************************************************************************
another thing that you may run into (especially in a masm forum)
you might see cases where one argument is pushed,
then a call is made to acquire the next argument
example
(assume that EAX holds the address of a RECT structure)
push eax
INVOKE GetWindowHandle ;(or something)
push eax
call GetWindowRect
both push'es are arguments for GetWindowRect
and - you may see several lines of unrelated code in between - lol
i often do something similar for WM_PAINT
where i push a saved object handle, then use it later as an argument to restore the DC
so - trying to keep track of what's on the stack is not always a simple thing
I have never done it with a table, and did not know about it. Curious.
mov eax,__hInstance
mov [edi].WNDCLASS.hInstance,eax
INVOKE GetStockObject,BLACK_BRUSH
mov [edi].WNDCLASS.hbrBackground,eax
mov [edi].WNDCLASS.lpszMenuName,IDM_MENU
mov [edi].WNDCLASS.lpszClassName,OFFSET szClass
INVOKE RegisterClass,edi
----------------------------------------------------------
push OFFSET szPgmDirectory
push OFFSET @F
push OFFSET szPgmFileName
push OFFSET szPgmDirectory
push OFFSET PathFindFileName
push MAX_PATH
push OFFSET szPgmFileName
push eax
push OFFSET lstrcpy
jmp GetModuleFileName
---------------------------------------------------------------
@WmPaint :
lea eax,_Ps
push eax
push __hWnd
push eax
push __hWnd
call BeginPaint
push edi
mov edi,eax
INVOKE CreateCompatibleDC,eax
push esi
mov esi,eax
push eax ; _hDCMem
INVOKE LoadImage,hInstance,IDB_BITMAP_01,IMAGE_BITMAP,0,0,LR_VGACOLOR
push eax ; _hBitmap
push ebx
mov ebx,eax
INVOKE GetObject,ebx,SIZEOF BITMAP,ADDR _Bitmap
INVOKE SelectObject,esi,ebx
mov ebx,eax
xchg ebx,DWord ptr [esp]
INVOKE BitBlt,edi,0,0,_Bitmap.bmWidth,_Bitmap.bmHeight,esi,0,0,SRCCOPY
push esi
call SelectObject
call DeleteObject
call DeleteDC
call EndPaint
pop esi
pop edi
INVOKE SetFocus,hListBox
xor eax,eax
ret
Your code and mine will cause headaches.
here's a program i wrote a few years ago
it uses TableCreate - and demonstrates the tables
the program can open a BMP image, then pan/zoom
i also wanted to play with different StretchBlt modes, so they are in the View menu
Here's Dave's file, disassembled and reassembled with RosAsm in one click. The file works.
1 - Open the original file in RosAsm
2 - In the disassembly options that appear, enable "Convert Direct Api calls to Indirect" and "Fix Direct Api convertion"
3 - After it has been disassembled, just press "Run" from the menu and voilà ;)
Get ready, Philippe. You will have tons of headaches with the disassembler.
But... it is fun to develop!
It is easier to test on small files. You know you are succeeding when you are able to disassemble and reassemble those files.
Once you succeed in disassembling a simple file (try small "messagebox" files first), you can move on to bigger ones, stage by stage, fixing the errors as they show up.
If you try to disassemble big files first, you will have tons of problems. First of all you need to run the tests on tiny ones and develop slowly, piece by piece.
Keep in mind that there is no such thing as a perfect automatic disassembler. The decision about what is code and what is data sometimes has to be made by hand, and until you reach a stage where more than 95% of your disassembly listings succeed, there is a long way to go. Get ready ;)
And absolutely avoid testing on packed files. Leave that stage of development for the very last. Since it is complex to identify and isolate code/data in packed files, it is better to work with real files (normal ones, not packed); eventually you will end up improving the disassembler to handle those complex ones.
Thank you for your files and your encouragement.
The kind of analysis: http://www.phrio.biz/mediawiki/Current_project
Thanks for responding Grincheux.
Answers to your questions:
GoAsm 64-bit
64-bit application
windows 7pro 64-bit
64-bit code
I will spend a lot of time reading your list before attempting any code.
I need you.
Could you download my program here (http://www.phrio.biz/ASD/ASD.exe) and test it?
Launch it and click on the menu item "Open". That's all.
Send me a screenshot of the box that opens after you select the file.
Send me the file "Grumpy.phr" (< 200 bytes).
This file is in the folder from which the program was launched.
The file you send me contains the results (EAX, EBX, ECX and EDX) of CPUID for functions 0, 1, 7, 80000001h, 80000002h, 80000003h, 80000004h and 80000008h.
The box that appears should select the right CPU and check what has been detected.
Thank You in advance
This is what I get from your download.
So now that Hutch has replied, should I still reply?
maybe this will help...
http://users.atw.hu/instlatx64/
Thanks, but the data is incomplete: there is no data for function 7.
I will analyze Hutch's data, because he took the time to send what I requested, and I have no Pentium, only AMD.
This is the only file I got.
Folder "work" is empty.
jack,
When you run Phillipe's program, open an assembler binary and you will get a result in the work directory.
here is the other one.
Quote from: Grincheux on January 18, 2016, 09:17:28 AM
Thanks, but the datas are incomplete, no data for function 7.
if you look at the later intel data dumps, they do provide function 7 information
(if you looked at earlier processors, they do not)
i did not upload data because i am using an older pentium IV processor
You may want to take a look at this online disassembler
https://www.onlinedisassembler.com/odaweb/Fsu7h0S4/0
Instruction set
I asked myself why add, sub... had opcodes both below 80h and from 80h upward. I found the answer.
I thought that the opcode gave the name of the instruction.
That's wrong.
I already knew about the last two bits of the opcode, which give direction and width.
In fact opcodes are composed of
xxx = Instruction (0 ) special, Or, And, cmp, sub, add,mov 1, mov 2
___xx = registers (AX, CX, DX, BX)
_____xDW
It is a bit complicated, but with these articles it is easy (my problem is understanding the true sense of English words).
https://courses.engr.illinois.edu/ece390/books/artofasm/CH03/CH03-3.html
and take a look at this one
http://www.c-jump.com/CIS77/CPU/x86/lecture.html
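As one possible reading of the "before 80h and after 80h" remark, here is a short Python sketch of how the eight basic arithmetic/logic instructions are encoded on real x86. It is an illustration with made-up names (decode_alu_opcode, ALU_OPS), not a full decoder. Below 40h the operation sits in bits 5-3 of the opcode and the low bits carry direction and width; for opcodes 80h-83h the opcode only says "operation with an immediate" and the operation itself comes from the reg field (bits 5-3) of the ModRM byte.

```python
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def decode_alu_opcode(opcode, modrm=None):
    """Return (mnemonic, operand_form, width) for the basic ALU
    encodings, or None for any other opcode. `width` is "byte" or
    "full" (16/32/64 bits depending on the effective operand size)."""
    if opcode < 0x40 and (opcode & 7) < 6:
        mnemonic = ALU_OPS[(opcode >> 3) & 7]
        low = opcode & 7
        width = "full" if low & 1 else "byte"      # w bit
        if low < 4:
            # d bit: 0 = r/m is the destination, 1 = register is
            form = "r, r/m" if low & 2 else "r/m, r"
        else:
            form = "acc, imm"                      # AL/eAX, immediate
        return mnemonic, form, width
    if 0x80 <= opcode <= 0x83:
        if modrm is None:
            raise ValueError("opcodes 80h-83h need the ModRM byte")
        mnemonic = ALU_OPS[(modrm >> 3) & 7]       # ModRM.reg selects
        width = "full" if opcode & 1 else "byte"
        if opcode == 0x83:
            form = "r/m, imm8 (sign-extended)"
        elif opcode == 0x81:
            form = "r/m, imm"
        else:
            form = "r/m, imm8"
        return mnemonic, form, width
    return None
```

So the bytes 83h E8h 01h decode as SUB with a sign-extended 8-bit immediate (sub eax,1), while 29h and 2Bh are the register forms of the same SUB.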
I need to ask a question please.
My knowledge of this subject is absolute zero.
I am not interested in disassembling 16-bit or 32-bit code.
I am interested in disassembling 64-bit code.
From my reading so far there should be a large change
from the 32-bit architecture to the 64-bit architecture.
I have found nothing so far that refers to the 64-bit architecture.
Can someone straighten me out on this?
http://www.phrio.biz/mediawiki/Current_project
http://x86.renejeschke.de/
http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
The minimum required to disassemble 64-bit code is to know 32-bit code.
************* Can someone straighten me out on this? ***************
dumpbin /DISASM does that for both 32-bit and 64-bit files.
This tool is part of the SDK.
Here is a sample in 64 bits