Quote:
0040102B: 67 81 06 11 01 78 56 34 12 add dword ptr ds:[0000h],12345678h
I don't understand this encoding: when I change bytes 3 and 4, I always get the same result!
It looks like this form:
Quote:
00401001 81 05 00 00 00 00 78 56 34 12 add dword ptr ds:[0],12345678h
0040100B 81 05 FE CA 00 00 78 56 34 12 add dword ptr ds:[0CAFEh],12345678h
Look at : Here (http://www.phrio.biz/mediawiki/Strange_Codings)
perhaps you've found a bug in either the assembler or disassembler
Quote:
0040102B: 67 81 06 11 01 78 56 34 12 add dword ptr ds:[0000h],12345678h
I believe the 67h is an address-size override prefix, which is out of place here,
as though you were assembling 32-bit code in a 16-bit segment.
If we throw that away, I get:
81 06 11 01 78 56 add dword ptr ds:[esi],56780111h ;(DS segment override implied)
34 12 xor al,12h
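As a cross-check of the two readings above, here is a minimal Python sketch that decodes only this one byte pattern (opcode 81 with ModRM byte 06), with and without the 67h prefix. The function name decode_add_imm32 is made up for illustration, and this is not how any of the disassemblers mentioned in this thread work internally. It assumes 32-bit code, where 67h switches the ModRM byte to 16-bit addressing, so 06 means a bare 16-bit displacement instead of [esi].

```python
import struct

def decode_add_imm32(code: bytes):
    """Decode opcode 81 /0 (add r/m32, imm32) for ModRM byte 06 only.

    Returns (text, length). Assumes 32-bit code.
    """
    pos = 0
    addr16 = code[pos] == 0x67      # address-size override prefix present?
    if addr16:
        pos += 1
    if code[pos:pos + 2] != b"\x81\x06":
        raise ValueError("this sketch only handles the byte pair 81 06")
    pos += 2
    if addr16:
        # 16-bit addressing: mod=00, r/m=110 is a bare 16-bit displacement
        (disp,) = struct.unpack_from("<H", code, pos)
        pos += 2
        operand = f"ds:[{disp:04X}h]"
    else:
        # 32-bit addressing: mod=00, r/m=110 is [esi], no displacement
        operand = "ds:[esi]"
    (imm,) = struct.unpack_from("<I", code, pos)   # little-endian imm32
    pos += 4
    return f"add dword ptr {operand},{imm:08X}h", pos

with_prefix = bytes.fromhex("67 81 06 11 01 78 56 34 12")
print(decode_add_imm32(with_prefix))      # ('add dword ptr ds:[0111h],12345678h', 9)
print(decode_add_imm32(with_prefix[1:]))  # ('add dword ptr ds:[esi],56780111h', 6)
```

Under those rules the prefixed form reads the two bytes after the ModRM byte (11 01) as the displacement, so a listing that always shows ds:[0000h] would seem to point at the disassembler rather than the assembler. Without the prefix the instruction is only 6 bytes long, which leaves 34 12 over as xor al,12h.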
Exactly.
There is only 1 way for it to be a valid instruction (packuswb): when it is used after 0F (or with the other escape prefix, 066).
http://www.phrio.biz/mediawiki/Strange_Codings
I list here all the codes that seem strange
That could be a way for a terrorist to pass a message
That could serve to install a protection...
Strange codings updated for codes 81h and 82h (http://www.phrio.biz/mediawiki/Strange_Codings)
I made three disassemblies:
- Visual Studio 2015
- Borg
- dumppe (Masm)
The strange codes also show up with Borg and DumpPe :eusa_clap: , not only with Visual Studio.
I ask myself: why? :dazzled: MS must know its disassembly is bad! :eusa_naughty:
VS and dumppe are far from being usable disassemblers. At most you can use them to get a basic idea of small chunks of code, but they are not tools for regular daily use.
Borg is extremely old. The last time I used it was more than 15 years ago :greensml: But why are you using such tools? There are much better ones to start with.
Two of them are free and open source (RosAsm and Olly; the disasm engine, I mean). The other is commercial and extremely expensive (IdaPro - but...you can find it ;) )
If you want to write your own disassembler, I strongly suggest you read the RosAsm source code. It is much easier than it seems :icon_mrgreen:
I don't know how to use the RosAsm Disassembler :eusa_snooty:
IDA 5 free (from the maker) (https://www.hex-rays.com/products/ida/support/download_freeware.shtml)
Use the disassembler or its source???
To use it, all you have to do is open a PE file in it...It will disassemble it automatically.
As for the source code, you need to study the syntax, but it is not hard to follow. Look at small examples first (Iczelion's, Test Department, etc.)
First use the disassembler
Quote:
[Data04265CC: D$ 01000000, 03000200, 0400, 0BBCCDD05, 0600AA, 080007, 0A0009
0C000B, 0CCDD0D00, 0E00AABB, 010000F00, 012001100, 014001300
0DD150000, 0AABBCC, 0170016, 0190018, 01B001A, 01D00001C
0AABBCCDD, 01F001E00]
Code0426620: A0:
push esi
push ebx
fxsave X$Virtual0463020
xchg eax ebx
xor edx edx
cmp ebx 0B | jne C8> ; Code042663C
dec ebx
Code0426632: B8:
call Code042664B
dec ebx | jns B8< ; Code0426632
jmp D3> ; Code042664
RosAsm DisAssembly!
:t
To see how accurate it was, I would need to look at the file to see the rest of the code but...
[Data04265CC: D$ 01000000, 03000200, 0400, 0BBCCDD05, 0600AA, 080007, 0A0009
0C000B, 0CCDD0D00, 0E00AABB, 010000F00, 012001100, 014001300
0DD150000, 0AABBCC, 0170016, 0190018, 01B001A, 01D00001C
0AABBCCDD, 01F001E00]
This is decoded as data. Everything between the brackets "[" and "]" belongs to the data section. In this case, the data is an array of DWORDs (D$).
Code0426620: A0:
push esi
push ebx
fxsave X$Virtual0463020
xchg eax ebx
xor edx edx
cmp ebx 0B | jne C8> ; Code042663C
dec ebx
Code0426632: B8:
call Code042664B
dec ebx | jns B8< ; Code0426632
jmp D3> ; Code042664
"X$" is a data type that can have any size. Since the target is a memory location and the opcode stores a large block (512 bytes in this case), the data type used is "X$", meaning a size that is not "conventional". By conventional I mean dword, qword, word, etc.
jne C8> ... performs a jump to a location below that line. C8 is an address label in short form. The ">" token gives the direction of the jump: here the target is below that line of code (go down). If the target were before it, the sign would be "<". Same as forward/backward (or down/up).
Virtual0463020 is an address in the virtual data section of the PE.
Code0426632: B8: The disassembler uses readable labels. "Code" means that the address belongs to the code section and is, in fact, code and not data. The number after it is the address. The label "Data" (as in Data04265CC) means that the address is, in fact, data and belongs to the data section. "Virtual" is the same concept, but for a virtual address. The goal of any disassembler is basically to distinguish what is code from what is data, so addresses are labeled according to what they are.
The next token, "B8", is just the short form of that address, useful for jumps to that location. (More readable than having tons of "je CodeXXXX" and "jne CodeYYYYY" all over the source.) Nevertheless, the full address is referenced in a comment on the same line, as in "jns B8< ; Code0426632" (it will jump to Code0426632, which is also labeled "B8"). That address is written as "Code0426632: B8:".
The ":" sign means the name is a label (in this case, a code label).
Basically, all the values in the disassembled data are in hexadecimal form (0 in front of the value, as in 01F001E00). There are a few cases where it disassembles as decimal, but in RosAsm the number syntax is trivial: 0 in front for hex (0A, 0B, 0FFFF, etc.) and no leading zero for decimal (9, 1, 125256, 777, etc.). Binary uses double zeroes followed by a "_" sign, "00_" (00__0001, 00__0000_0001__0000_0000, etc.).
Also, for hex the "h" char at the end is acceptable (but it still needs the 0 in front): 0FFFFFEFFh, for example.
call Code042664B is a call to a function labeled "Code042664B", meaning that there is a function at address 042664B.
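The number syntax described above can be sketched in a few lines of Python. This is only my reading of the rules in this post (leading 0 for hex, "00_" for binary, otherwise decimal, optional trailing "h" on hex), not RosAsm's own parser; parse_rosasm_number and dwords_to_bytes are hypothetical helper names.

```python
import struct

def parse_rosasm_number(text: str) -> int:
    """Convert one literal written in the syntax described in this post."""
    t = text.strip()
    if t.startswith("00_"):
        # Binary: "00_" marker, further underscores are just separators
        return int(t[2:].replace("_", ""), 2)
    if t.startswith("0"):
        # Hex: leading 0, optional trailing "h"
        if t[-1] in "hH":
            t = t[:-1]
        return int(t, 16)
    return int(t, 10)  # no leading zero: decimal

def dwords_to_bytes(tokens):
    """Pack a D$ list into the little-endian bytes it stands for."""
    return b"".join(struct.pack("<I", parse_rosasm_number(t)) for t in tokens)

print(hex(parse_rosasm_number("01F001E00")))                # 0x1f001e00
print(parse_rosasm_number("00__0000_0001__0000_0000"))      # 256
print(dwords_to_bytes(["01000000", "03000200"]).hex(" "))   # 00 00 00 01 00 02 00 03
```

Read this way, "push 036260" in the RosAsm listing and "push 36260h" in the IdaPro listing are the same value.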
Here is the whole project with the RosAsm source file.
Use 7zip to decompress it:
http://www.7-zip.org/download.html
Like I said...it is data, correctly interpreted as such:
RosAsm listing:
Main:
Code0426580: A0:
push ebp
mov ebp esp
sub esp 08
push 00
push 080
push 02
push 00
push 00
push 040000000
push Data0462260
call 'kernel32.CreateFileA'
mov D$ebp-04 eax
push 00
lea eax D$ebp-08
push eax
push 036260
push Data042C000
push D$ebp-04
call 'kernel32.WriteFile'
push D$ebp-04
call 'kernel32.CloseHandle'
push 00
call 'kernel32.ExitProcess'
[Data04265CC: D$ 01000000, 03000200, 0400, 0BBCCDD05, 0600AA, 080007, 0A0009
0C000B, 0CCDD0D00, 0E00AABB, 010000F00, 012001100, 014001300
0DD150000, 0AABBCC, 0170016, 0190018, 01B001A, 01D00001C
0AABBCCDD, 01F001E00]
Code0426620: A0:
push esi
push ebx
fxsave X$Virtual0463020
xchg eax ebx
xor edx edx
cmp ebx 0B | jne C8> ; Code042663C
dec ebx
Code0426632: B8:
call Code042664B
dec ebx | jns B8< ; Code0426632
jmp D3> ; Code0426641
Code042663C: C8:
call Code042664B
Code0426641: D3:
fxrstor X$Virtual0463020
pop ebx
pop esi
ret
IdaPro listing
; =============== S U B R O U T I N E =======================================
; Attributes: noreturn bp-based frame
public start
start proc near
NumberOfBytesWritten= dword ptr -8
hFile = dword ptr -4
push ebp
mov ebp, esp
sub esp, 8
push 0 ; hTemplateFile
push 80h ; dwFlagsAndAttributes
push 2 ; dwCreationDisposition
push 0 ; lpSecurityAttributes
push 0 ; dwShareMode
push 40000000h ; dwDesiredAccess
push offset asc_462260 ; "C:\\Users\\Grincheux\\Documents\\PrjJwA"...
call CreateFileA
mov [ebp+hFile], eax
push 0 ; lpOverlapped
lea eax, [ebp+NumberOfBytesWritten]
push eax ; lpNumberOfBytesWritten
push 36260h ; nNumberOfBytesToWrite
push offset unk_42C000 ; lpBuffer
push [ebp+hFile] ; hFile
call WriteFile
push [ebp+hFile] ; hObject
call CloseHandle
push 0 ; uExitCode
call ExitProcess
start endp
; ---------------------------------------------------------------------------
dd 1000000h, 3000200h, 400h, 0BBCCDD05h, 600AAh, 80007h
dd 0A0009h, 0C000Bh, 0CCDD0D00h, 0E00AABBh, 10000F00h
dd 12001100h, 14001300h, 0DD150000h, 0AABBCCh, 170016h
dd 190018h, 1B001Ah, 1D00001Ch, 0AABBCCDDh, 1F001E00h
; =============== S U B R O U T I N E =======================================
sub_426620 proc near ; CODE XREF: sub_426DC8+9AFp
; sub_427EB7+18Bp ...
push esi
push ebx
fxsave ds:dword_463020
xchg eax, ebx
xor edx, edx
cmp ebx, 0Bh
jnz short loc_42663C
dec ebx
loc_426632: ; CODE XREF: sub_426620+18j
call sub_42664B
dec ebx
jns short loc_426632
jmp short loc_426641
; ---------------------------------------------------------------------------
loc_42663C: ; CODE XREF: sub_426620+Fj
call sub_42664B
loc_426641: ; CODE XREF: sub_426620+1Aj
fxrstor ds:dword_463020
pop ebx
pop esi
retn
sub_426620 endp
The only difference is that I haven't yet implemented macro and API recognition, or the DIS system (Digital DNA, similar to flair). But the raw interpretation of what is code and what is data is exactly the same.
It found only 3 errors: one with an XMM1 data size (I'll fix that later: movd XMM1 D$esp+014 instead of movd XMM1 W$esp+014) and the others with Jochen's library here:
Code04295F7: I7:
test cl 05 | je L9> ; Code0429617
or B$ebp-034 04
lea eax D$edx*4+Data0429914 <----- This address does not exist! It is a simple value. Did something go wrong in the linker? Masm (or JWasm) should either use that address or emit a proper error. IdaPro produces the same result. Olly too.
Code0429E28: K4:
lea eax D$edi*4+Data0429E6D <----- Same as above. This address does not exist. How did the linker assemble it?
call eax
About the 2 address problems above: this is a difficult decision. If we simply forbid the disassembler from interpreting any unreferenced address as data/code, we easily introduce errors in the rest of the code, because this value can be either (an address or an immediate). So it is more a matter of interpretation than an error per se. It can be enhanced, I guess, to handle what some linkers produce (I saw some of those things in Watcom files too; fortunately it rarely happens), but I'll think about a solution later.
But for your project of writing a disassembler, keep in mind that the very first thing you need to do is analyze the contents of the PE sections. The best technique is to use maps (as I explained earlier in another post).
On the sections map you flag what is a resource (that is data), IAT (also data), data section (also data), virtual section (as the name says...virtual data), the PE header itself (data....but unused most of the time), the MZ header (idem), etc. For that, it is best to check the characteristics of the section, regardless of the name it was given (.text, .data, .idata, .potato, .orange, whatever :icon_mrgreen: ). So you must first flag everything that you know is data.
All that is left can be either code or data, and this is where the disassembler does its work, trying to separate out data in the middle of code.
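To make the map idea above concrete, here is a minimal Python sketch: one flag byte per byte of the image, known data ranges marked first, and whatever stays unflagged is what the disassembler still has to work on. The names (build_map, unknown_runs) and the sample ranges are made up for illustration; this is not RosAsm's actual map format.

```python
UNKNOWN, DATA, CODE = 0, 1, 2   # hypothetical flag values

def build_map(image_size, known_data):
    """Return one flag per image byte, with known data ranges marked."""
    flags = bytearray(image_size)            # everything starts as UNKNOWN
    for start, size in known_data:
        end = min(start + size, image_size)  # never grow past the image
        for i in range(start, end):
            flags[i] = DATA
    return flags

def unknown_runs(flags):
    """List (start, size) of the ranges that still need analysis."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag == UNKNOWN and start is None:
            start = i
        elif flag != UNKNOWN and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(flags) - start))
    return runs

# Made-up layout: headers at 0..0x0F and a small table at 0x20..0x27
flags = build_map(0x40, [(0x00, 0x10), (0x20, 0x08)])
print([(hex(s), hex(n)) for s, n in unknown_runs(flags)])
# [('0x10', '0x10'), ('0x28', '0x18')]
```

A real map would take its ranges from the PE headers and directories rather than from a hand-written list, and the CODE flag would be filled in as the disassembler confirms instructions.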
Quote from: guga on January 14, 2016, 10:42:20 AM
It found only 3 errors ... with Jochen´s library here:
Code04295F7: I7:
test cl 05 | je L9> ; Code0429617
or B$ebp-034 04
lea eax D$edx*4+Data0429914 <----- This address does not exist! It is a simple value. Did something go wrong in the linker? Masm (or JWasm) should either use that address or emit a proper error. IdaPro produces the same result. Olly too.
Code0429E28: K4:
lea eax D$edi*4+Data0429E6D <----- Same as above. This address does not exist. How did the linker assemble it?
call eax
Nice find, Gustavo :t
(the code works like a charm, of course. This is Float2Asc, tested a thousand times...)
#1:
test cl, 4+1 ; MbXmmR or MbXmmI
.if !Zero?
or byte ptr f2sInt, 4 ; prevent %u correction below
lea eax, [MovXmmStr+4*edx-80]
lea edx, f2sTmp64
call eax
test cl, 1 ; odd or even?
.if Zero?
fld REAL8 ptr [edx]
.else
fild QWORD ptr [edx]
.endif
.endif
Olly:
00406F48 │. F6C1 05 │test cl, 05
00406F4B │. 74 1B │jz short 00406F68
00406F4D │. 804D CC 04 │or byte ptr [ebp-34], 04
00406F51 │. 8D0495 647240 │lea eax, [edx*4+407264]
00406F58 │. 8D55 F8 │lea edx, [ebp-8]
00406F5B │. FFD0 │call eax
MovXmmStr:
movlps qword ptr [edx], xmm0 ; 4 bytes incl. ret
retn
movlps qword ptr [edx], xmm1
retn
Olly:
004072B4 ┌. 0F1302 movlps [edx], xmm0
004072B7 └. C3 retn
004072B8 ┌. 0F130A movlps [edx], xmm1
004072BB └. C3 retn
#2:
test dl, 4+1 ; MbXmmR or MbXmmI
.if !Zero?
test dl, 1
.if Zero?
fstp QWORD ptr [ebx]
.else
fistp REAL8 ptr [ebx] ; use integer for r format
.endif
lea eax, [MovXmm+4*edi-80] ; 7 bytes
call eax
.endif
Olly:
0040776D │. F6C2 05 test dl, 05
00407770 │. 74 14 jz short 00407786
00407772 │. F6C2 01 test dl, 01
00407775 │. 75 04 jnz short 0040777B
00407777 │. DD1B fstp qword ptr [ebx]
00407779 │. EB 02 jmp short 0040777D
0040777B │> DF3B fistp qword ptr [ebx]
0040777D │> 8D04BD C27740 lea eax, [edi*4+4077C2]
00407784 │. FFD0 call eax
MovXmm:
movlps xmm0, qword ptr [ebx] ; 4 bytes each incl. ret
retn
Olly:
00407812 ┌. 0F1203 movlps xmm0, [ebx]
00407815 └. C3 retn
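For anyone following the arithmetic, here is a small Python check using only the addresses from the Olly listings above. The helper names are made up. The constant in each lea is the table label minus 80, so it points 80 bytes before the first stub and does not coincide with any label, which would explain why a disassembler sees an address that "does not exist". That the index register starts at 20 for xmm0 is only my inference from 4*20 = 80; the thread does not show what the register holds at run time.

```python
def lea_constant(table_label: int, bias: int = 80) -> int:
    """Constant the assembler folds into lea eax, [label + 4*reg - bias]."""
    return table_label - bias

def stub_address(constant: int, index: int, scale: int = 4) -> int:
    """Address that 'call eax' reaches for a given index register value."""
    return constant + scale * index

MOVXMMSTR = 0x4072B4   # first stub of MovXmmStr in the Olly listing
MOVXMM = 0x407812      # first stub of MovXmm in the Olly listing

print(hex(lea_constant(MOVXMMSTR)))                    # 0x407264
print(hex(lea_constant(MOVXMM)))                       # 0x4077c2
print(hex(stub_address(lea_constant(MOVXMMSTR), 20)))  # 0x4072b4
print(hex(stub_address(lea_constant(MOVXMMSTR), 21)))  # 0x4072b8
```

The two constants match the operands Olly shows (edx*4+407264 and edi*4+4077C2), and consecutive indexes land on consecutive 4-byte stubs.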
Thank you sinsi for the link. I downloaded the file but when I run it it crashes!
What do you want to check here ?
Quote:
Directory Name VirtAddr VirtSize
-------------------------------------- -------- --------
Export 00000000 00000000
Import 00002000 00000534
Resource 00004000 00034D90
Exception 00000000 00000000
Security 00000000 00000000
Base Relocation 00000000 00000000
Debug 00000000 00000000
Decription/Architecture 00000000 00000000
Machine Value (MIPS GP) 00000000 00000000
Thread Storage 00000000 00000000
Load Configuration 00000000 00000000
Bound Import 00000000 00000000
Import Address Table 00000000 00000000
Delay Import 00000000 00000000
COM Runtime Descriptor 00000000 00000000
(reserved) 00000000 00000000
No IAT, only:
Quote:
Import 00002000 00000534
So I must look for the data sections (initialized and uninitialized) and the code section.
Quote:
Size of Code 00000C00
Size of Initialized Data 00000A00
Size of Uninitialized Data 00000000
Address of Entry Point 00001880
Base of Code 00001000
Base of Data 00002000
Image Base 00400000
Here are the command lines that created the file analyzed by dumppe:
Quote:
C:\JWasm\Bin\JWASM.EXE -9 -Fl -c -zlf -zlp -zls -W3 -coff -Cp -nologo /I"C:\JWasm\Include" "ASD.asm"
C:\JWasm\Bin\JWlink.EXE FORMAT WINDOWS PE LIBPATH C:\JWasm\Lib OPTION SHOWDEAD OPTION NXCOMPAT OPTION NORELOCS OPTION ELIMINATE OPTION CHECKSUM RESOURCE ASD.res RUNTIME WINDOWS NAME ASD.exe FILE ASD.obj
You must first analyze what is data and what is code. The easiest way to do that is to look at the contents. For example, you start at the very first byte (MZ), which belongs to the IMAGE_DOS_HEADER structure. Since you know it is all data, you flag it as such. Then you follow the pointers in its members. In this case, the next pointer leads to the PE header.
You go there and do the same: flag the whole structure as data.
Then do the same as before: check the pointers in its members.
If they point to virtual data, you flag it as such in the previously created map file.
If they point to data, you do the same.
How do you know which members point to data only? Check the contents of _IMAGE_DATA_DIRECTORY. All those members are pointers to data (in the form of structures).
The next thing is to analyze the contents of the IMAGE_SECTION_HEADER.
You start by checking whether the EntryPoint is in that section. If it is, it _may_ be a code section. To make sure, you check the characteristics of that section (IMAGE_SCN_MEM_EXECUTE or IMAGE_SCN_CNT_CODE).
If it has neither flag, the section consists of data only. You flag it as such.
Go to the next section and see if the EP is there. OK....EP found there...then this section is the one you must target with the disassembler, since it may contain code + data.
Do the same for the remaining sections (perhaps you already flagged them when you checked IMAGE_DATA_DIRECTORY).
See? You need to follow the contents of the PE structures....discard everything that may be data first, and then what is left is all you need to analyze. Much, much faster than doing a byte-by-byte scan of everything starting from the very first byte ('MZ').
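The section check described above could look roughly like this in Python. It is a sketch under assumptions: the Section class, the helper names and the sample section table are invented (the sizes only echo the header values quoted earlier, such as Base of Code 00001000 and Entry Point 00001880); the two flag values are the documented IMAGE_SCN_CNT_CODE and IMAGE_SCN_MEM_EXECUTE constants.

```python
from dataclasses import dataclass

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000

@dataclass
class Section:                      # hypothetical, already-parsed header
    name: str
    virtual_address: int
    virtual_size: int
    characteristics: int

def contains(section: Section, rva: int) -> bool:
    start = section.virtual_address
    return start <= rva < start + section.virtual_size

def classify(section: Section) -> str:
    """'code+data' if the section may hold code, otherwise 'data'."""
    mask = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE
    return "code+data" if section.characteristics & mask else "data"

def entry_section(sections, entry_rva: int):
    """Section holding the entry point, or None (e.g. entry point 0)."""
    return next((s for s in sections if contains(s, entry_rva)), None)

# Invented section table for illustration only
sections = [
    Section(".text", 0x1000, 0x0C00, 0x60000020),
    Section(".data", 0x2000, 0x0A00, 0xC0000040),
    Section(".rsrc", 0x4000, 0x34D90, 0x40000040),
]
for s in sections:
    print(s.name, classify(s))
print(entry_section(sections, 0x1880).name)   # .text
```

Only the sections that come back as "code+data" need to go through the disassembler; the rest can be flagged as data straight away.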
Try to download IDA somewhere; it will really help you understand what needs to be done (since you are having difficulties with Olly and RosAsm).
From the OptionalHeader I got the code and uninitialized data sections:
mov eax,lpNtHeader
INVOKE ImageRvaToSection,lpNtHeader,NULL,[eax].IMAGE_NT_HEADERS.OptionalHeader.BaseOfCode
mov lpSectionCode,eax
mov eax,lpNtHeader
INVOKE ImageRvaToSection,lpNtHeader,NULL,[eax].IMAGE_NT_HEADERS.OptionalHeader.BaseOfData
mov lpSectionUData,eax
Quote:
0x0513C180 41 55 54 4f 00 00 00 00 12 5b 00 00 00 10 00 00 00 5c 00 00 00 02 00 00 00 00 00 AUTO.....[.......\.........
0x0513C19B 00 00 00 00 00 00 00 00 00 20 00 00 60 2e 72 64 61 74 61 00 00 bd 10 00 00 00 70 ......... ..`.rdata.......p
0x0513C1B6 00 00 00 12 00 00 00 5e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 44 .......^..............@..@D
0x0513C1D1 47 52 4f 55 50 00 00 64 34 ab 00 00 90 00 00 00 06 00 00 00 70 00 00 00 00 00 00 GROUP..d4«..........p......
0x0513C1EC 00 00 00 00 00 00 00 00 40 00 00 c0 2e 72 73 72 63 00 00 00 c4 e1 17 00 00 d0 ab ........@..À.rsrc...Äá...Ы
0x0513C207 00 00 e2 17 00 00 76 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 60 8b ..â...v..............@..@`.
0x0513C222 54 24 2c 0f b6 0a 8b 7a 01 69 c9 01 01 01 01 66 0f 6e c1 66 0f 70 c0 00 8b c7 83 T$,.¶..z.iÉ....f.nÁf.pÀ..ǃ
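The first 40 bytes of the dump above are one IMAGE_SECTION_HEADER, and a short Python sketch can unpack them. The field layout is the standard 40-byte PE section header; the function name and the dictionary keys are just my choice for illustration.

```python
import struct

# First 40 bytes of the dump (0x0513C180 onward), copied as-is
RAW = bytes.fromhex(
    "41 55 54 4f 00 00 00 00 12 5b 00 00 00 10 00 00 00 5c 00 00 "
    "00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 60"
)

def parse_section_header(raw: bytes) -> dict:
    """Unpack one little-endian 40-byte IMAGE_SECTION_HEADER."""
    (name, vsize, vaddr, raw_size, raw_ptr,
     _reloc_ptr, _line_ptr, _n_reloc, _n_line, flags) = struct.unpack(
        "<8sIIIIIIHHI", raw[:40])
    return {
        "name": name.rstrip(b"\0").decode("ascii"),
        "virtual_size": vsize,
        "virtual_address": vaddr,
        "size_of_raw_data": raw_size,
        "pointer_to_raw_data": raw_ptr,
        "characteristics": flags,
    }

header = parse_section_header(RAW)
print(header["name"], hex(header["virtual_address"]),
      hex(header["virtual_size"]), hex(header["characteristics"]))
# AUTO 0x1000 0x5b12 0x60000020
```

The characteristics value 60000020h combines IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_EXECUTE and IMAGE_SCN_MEM_READ, so AUTO is the code section. Reading the following headers the same way, the name field of the third one is "DGROUP"; the "@" shown just before it in the text column is the last byte (40h) of the previous header's characteristics.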
I cannot rely on the EntryPoint, because for a DLL it can be 0; it is not always DllMain.
The only address I still have not got is the one for the initialized data.
I will go and see what "DGROUP" and ".rdata" are.
The pupil did a good job.
Everything is loaded in memory.
The section addresses are well known and flagged as DATA or CODE.
I attach the part of the code that does this part of the job :eusa_clap: :eusa_dance: :badgrin: