RosAsm PDB Dumper v1.0

Started by guga, July 24, 2014, 04:17:28 PM


guga

While you were answering, I found something that seems to be the right path. I'll read it now, many thanks... this may help.

What I found so far is that...
The "AA" comes from a null byte ("?$AA" is the byte 0x00, the high byte of each 16-bit char), followed by the proper ASCII char. So, this ??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@ ... is in fact the Unicode null-terminated string "Cfm".

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@ = Cfm
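A minimal C sketch of this observation. It reads "?$" plus two letters as one byte in hexadecimal, with "A" to "P" as the digits (the same A=0 ... P=15 scheme as the LiteralAsciiBaseEx function later in this thread), and assumes each 16-bit char is written high byte first. Under those assumptions "?$AA" is one null byte and "?$AAC" is U+0043 'C'. decode_wide_basic is a hypothetical name, and other "?" tokens are deliberately rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one "A".."P" digit (A = 0 ... P = 15), or -1 if c is outside that range. */
static int ap_digit(char c)
{
    return (c >= 'A' && c <= 'P') ? c - 'A' : -1;
}

/* Decode a wide-string literal body such as "?$AAC?$AAf?$AAm?$AA?$AA@".
   Only two byte forms are handled: "?$XY" (one byte, X and Y are A..P hex
   digits) and a plain letter/digit/underscore standing for itself.
   Bytes are paired high byte first into 16-bit chars; decoding stops at '@'
   or at the 0x0000 char. Returns the char count (terminator excluded) or -1.
   out must have room for at least one element (cap >= 1). */
static int decode_wide_basic(const char *p, uint16_t *out, size_t cap)
{
    uint8_t pair[2];
    int have = 0;
    size_t n = 0;

    while (*p != '\0' && *p != '@') {
        uint8_t b;
        if (p[0] == '?' && p[1] == '$') {
            int hi = ap_digit(p[2]);
            int lo = (hi < 0) ? -1 : ap_digit(p[3]);
            if (lo < 0) return -1;
            b = (uint8_t)(hi * 16 + lo);
            p += 4;
        } else if (p[0] == '?') {
            return -1;  /* other "?" tokens are not handled in this sketch */
        } else {
            b = (uint8_t)*p++;
        }
        pair[have++] = b;
        if (have == 2) {
            uint16_t ch = (uint16_t)((pair[0] << 8) | pair[1]);
            have = 0;
            if (ch == 0) break;          /* "?$AA?$AA" is the terminating null */
            if (n + 1 >= cap) return -1; /* keep room for our own terminator */
            out[n++] = ch;
        }
    }
    out[n] = 0;
    return (int)n;
}
```

For example, decode_wide_basic("?$AAC?$AAf?$AAm?$AA?$AA@", buf, 8) returns 3, with buf holding 'C', 'f', 'm' and a 0 terminator.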

The problem now seems to be handling non-'AA' tokens, such as 'CF', for example... But perhaps all the others fall under the situation described here:
http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm

Example:
??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@ ==> 5 'CF' ?? s

??_C@_1CK@LECDOEAE@?$AAC?$AAV?$AA_?$AAC?$AAA?$AAL?$AAL?$AA_?$AAR?$AAE?$AAS?$AAE?$AAR?$AAV?$AAE?$AAD?$AA?5?$AA?5?$AA?5?$AA?5?$AA?$AA@ = CV_CALL_RESERVED5555


Btw... according to the source I'm using as a reference, the token '?5' stands for a space, so the string above is really "CV_CALL_RESERVED" followed by four spaces.
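The two examples above can be checked with a slightly larger C sketch. It assumes the Geoff Chappell table linked above: "?" plus a digit picks one of ten punctuation bytes (?5 is the space; ?6 and ?7 are taken here as newline and tab), and "?$CF" is the byte 0x25, i.e. '%'. With those rules the first name decodes to " %s" and the second to "CV_CALL_RESERVED" plus four spaces. The function names are hypothetical, and ?a..?z / ?A..?Z are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes selected by "?0".."?9" (assumed table; ?5 = space). */
static const char special_chars[10] = { ',', '/', '\\', ':', '.', ' ', '\n', '\t', '\'', '-' };

/* Read one encoded byte at *pp and advance past it.
   Returns 0..255, or -1 for a token this sketch does not handle. */
static int next_byte(const char **pp)
{
    const char *p = *pp;
    if (p[0] != '?') {                              /* plain char stands for itself */
        *pp = p + 1;
        return (unsigned char)p[0];
    }
    if (p[1] >= '0' && p[1] <= '9') {               /* "?5" -> ' ' etc. */
        *pp = p + 2;
        return (unsigned char)special_chars[p[1] - '0'];
    }
    if (p[1] == '$' && p[2] >= 'A' && p[2] <= 'P' && p[3] >= 'A' && p[3] <= 'P') {
        *pp = p + 4;                                /* "?$CF" -> 0x25 '%' */
        return (p[2] - 'A') * 16 + (p[3] - 'A');
    }
    return -1;
}

/* Decode a wide-string body up to '@', high byte first.
   Returns the char count or -1; cap must be at least 1. */
static int decode_wide(const char *p, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (*p != '\0' && *p != '@') {
        int hi = next_byte(&p);
        if (hi < 0 || *p == '\0' || *p == '@') return -1;  /* need a full pair */
        int lo = next_byte(&p);
        if (lo < 0) return -1;
        uint16_t ch = (uint16_t)((hi << 8) | lo);
        if (ch == 0) break;                 /* terminating null */
        if (n + 1 >= cap) return -1;
        out[n++] = ch;
    }
    out[n] = 0;
    return (int)n;
}
```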
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Gunther

Hi Gustavo,

there seem to be a lot of different PDB file formats. For example:

  • Palm Pilot documents
  • Graphics Format by Adobe PhotoDeluxe
  • Protein Data Bank format
  • C++ Data Base format
  • Accelrys Insight II file
  • Euro Sistemi Pegasus Multimedia Data Base
  • Asta Powerproject file (Project Management Software)
  • PowerBASIC debugger

Which format do you use?

Gunther
You have to know the facts before you can distort them.

guga

I'm using the Microsoft debugger ones: the same PDB files that Visual Studio creates when you compile an application.

sinsi

I've tried .pdb files from the MS symbol server, the sample guga posted and one of my programs (via link /debug).

PublicSymbol|0:0|0x0000518C|4:0x0000018C|4|23:__imp__SendMessageA@16|23:__imp__SendMessageA@16|1
PublicSymbol|0:0|0x00005140|4:0x00000140|4|21:__imp__ExitProcess@4|21:__imp__ExitProcess@4|2
PublicSymbol|0:0|0x00005174|4:0x00000174|4|22:__imp__GetMessageA@16|22:__imp__GetMessageA@16|3
PublicSymbol|0:0|0x00005188|4:0x00000188|4|26:__imp__RegisterClassExA@4|26:__imp__RegisterClassExA@4|4
PublicSymbol|1:0|0x00001190|1:0x00000190|6|20:_CreateWindowExA@48|20:_CreateWindowExA@48|5
PublicSymbol|0:0|0x00005110|4:0x00000110|4|24:__imp__GetStockObject@4|24:__imp__GetStockObject@4|6
PublicSymbol|1:0|0x000011C6|1:0x000001C6|6|19:_PostQuitMessage@4|19:_PostQuitMessage@4|7
PublicSymbol|0:0|0x00005114|4:0x00000114|4|23:GDI32_NULL_THUNK_DATA|23:GDI32_NULL_THUNK_DATA|8
PublicSymbol|1:0|0x000011D8|1:0x000001D8|6|20:_TranslateMessage@4|20:_TranslateMessage@4|9
PublicSymbol|1:0|0x000011B4|1:0x000001B4|6|18:_GetStockObject@4|18:_GetStockObject@4|10
PublicSymbol|0:0|0x00005178|4:0x00000178|4|26:__imp__DispatchMessageA@4|26:__imp__DispatchMessageA@4|11
PublicSymbol|0:0|0x00005148|4:0x00000148|4|26:KERNEL32_NULL_THUNK_DATA|26:KERNEL32_NULL_THUNK_DATA|12
PublicSymbol|0:0|0x00005144|4:0x00000144|4|26:__imp__GetModuleHandleA@4|26:__imp__GetModuleHandleA@4|13
PublicSymbol|1:0|0x000011BA|1:0x000001BA|6|15:_LoadCursorA@8|15:_LoadCursorA@8|14
PublicSymbol|0:0|0x0000519C|4:0x0000019C|0|24:USER32_NULL_THUNK_DATA|24:USER32_NULL_THUNK_DATA|15
PublicSymbol|1:0|0x00001196|1:0x00000196|6|19:_DefWindowProcA@16|19:_DefWindowProcA@16|16
PublicSymbol|1:0|0x0000119C|1:0x0000019C|6|20:_DispatchMessageA@4|20:_DispatchMessageA@4|17
PublicSymbol|0:0|0x000010B8|1:0x000000B8|136|7:_start|7:_start|18
PublicSymbol|0:0|0x00005180|4:0x00000180|4|19:__imp__LoadIconA@8|19:__imp__LoadIconA@8|19
PublicSymbol|1:0|0x000011AE|1:0x000001AE|6|20:_GetModuleHandleA@4|20:_GetModuleHandleA@4|20
PublicSymbol|0:0|0x00005190|4:0x00000190|4|26:__imp__TranslateMessage@4|26:__imp__TranslateMessage@4|21
PublicSymbol|0:0|0x00005000|4:0x00000000|20|27:__IMPORT_DESCRIPTOR_USER32|27:__IMPORT_DESCRIPTOR_USER32|22
PublicSymbol|0:0|0x00005194|4:0x00000194|4|21:__imp__LoadCursorA@8|21:__imp__LoadCursorA@8|23
PublicSymbol|1:0|0x000011A8|1:0x000001A8|6|16:_GetMessageA@16|16:_GetMessageA@16|24
PublicSymbol|0:0|0x00005198|4:0x00000198|4|26:__imp__CreateWindowExA@48|26:__imp__CreateWindowExA@48|25
PublicSymbol|0:0|0x0000517C|4:0x0000017C|4|25:__imp__DefWindowProcA@16|25:__imp__DefWindowProcA@16|26
PublicSymbol|0:0|0x00005184|4:0x00000184|4|25:__imp__PostQuitMessage@4|25:__imp__PostQuitMessage@4|27
PublicSymbol|1:0|0x000011CC|1:0x000001CC|6|20:_RegisterClassExA@4|20:_RegisterClassExA@4|28
PublicSymbol|0:0|0x00005014|4:0x00000014|20|29:__IMPORT_DESCRIPTOR_KERNEL32|29:__IMPORT_DESCRIPTOR_KERNEL32|29
PublicSymbol|1:0|0x000011A2|1:0x000001A2|6|15:_ExitProcess@4|15:_ExitProcess@4|30
PublicSymbol|0:0|0x00005028|4:0x00000028|20|26:__IMPORT_DESCRIPTOR_GDI32|26:__IMPORT_DESCRIPTOR_GDI32|31
PublicSymbol|1:0|0x000011C0|1:0x000001C0|6|13:_LoadIconA@8|13:_LoadIconA@8|32
PublicSymbol|1:0|0x000011D2|1:0x000001D2|6|17:_SendMessageA@16|17:_SendMessageA@16|33
PublicSymbol|0:0|0x0000503C|4:0x0000003C|20|25:__NULL_IMPORT_DESCRIPTOR|25:__NULL_IMPORT_DESCRIPTOR|34

guga

OK, I'm getting closer now :)

The literal string parser displays things like:
PublicSymbol|0:0|0x00010BD4|2:0x00006BD4|8|42:??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@|4:Cfm|1067
PublicSymbol|0:0|0x0000EBF0|2:0x00004BF0|8|46:??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@|4: Os|1068
PublicSymbol|0:0|0x00010628|2:0x00006628|14|58:??_C@_1O@DLDHEOAC@?$AAI?$AAn?$AAt?$AAR?$AA9?$AA0?$AA?$AA@|7:IntR90|1069
PublicSymbol|0:0|0x0000F67C|2:0x0000567C|12|53:??_C@_1M@MBAMLEOM@?$AAA?$AAR?$AA1?$AA1?$AA6?$AA?$AA@|6:AR116|1070
PublicSymbol|0:0|0x00011EC0|2:0x00007EC0|14|58:??_C@_1O@EKFLKENN@?$AAx?$AAm?$AAm?$AA9?$AA_?$AA1?$AA?$AA@|7:xmm9_1|1071
PublicSymbol|0:0|0x00010A38|2:0x00006A38|14|58:??_C@_1O@GGCOEOPD@?$AAI?$AAn?$AAt?$AAR?$AA2?$AA5?$AA?$AA@|7:IntR25|1072
PublicSymbol|0:0|0x0000C1C4|2:0x000021C4|42|133:??_C@_1CK@LECDOEAE@?$AAC?$AAV?$AA_?$AAC?$AAA?$AAL?$AAL?$AA_?$AAR?$AAE?$AAS?$AAE?$AAR?$AAV?$AAE?$AAD?$AA?5?$AA?5?$AA?5?$AA?5?$AA?$AA@|21:CV_CALL_RESERVED    |1073
PublicSymbol|1:0|0x00004280|1:0x00003280|208|48:?DumpSymbolWithRVA@@YA_NPAUIDiaSession@@KPB_W@Z|83:bool __cdecl DumpSymbolWithRVA(struct IDiaSession *,unsigned long,wchar_t const *)|1074

Vortex

Hello,

You can check Agner Fog's manual, it's a good one : Calling conventions for different C++ compilers and operating systems

http://agner.org/optimize/calling_conventions.pdf

http://agner.org/optimize/


Gunther

Quote from: Vortex on July 26, 2014, 06:11:35 AM
You can check Agner Fog's manual, it's a good one : Calling conventions for different C++ compilers and operating systems

Yes, that's a good and reliable source.  :t

Gunther

guga

Hi guys, many thanks for the tips.

I read Agner Fog's article, but there is little in it that could help here. Mainly, it helped me with parsing numbers.

The good news is... I succeeded in translating almost all ASCII strings. These literal strings use some sort of table to encode and decode a wchar_t const. It is mainly based on the token "?$AA". But there are exceptions which I'm struggling to understand. For instance, I found a "HM" token that seems to be part of another table related to the vertical bar "|" sign, but I have not managed to check it yet.
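If the "?$XY" reading holds (two letters from "A" to "P" giving one byte in hexadecimal), "HM" is not a separate table: H = 7 and M = 12 give the byte 0x7C, which is the vertical bar itself, just as "?$CF" gives 0x25 ('%'). A tiny C helper (hypothetical name) to check single tokens:

```c
/* Decode one "?$XY" escape: X and Y are hex digits written as 'A'..'P'
   (A = 0 ... P = 15). Returns the byte value, or -1 if malformed. */
static int decode_hex_escape(const char *tok)
{
    if (tok[0] != '?' || tok[1] != '$') return -1;
    if (tok[2] < 'A' || tok[2] > 'P') return -1;
    if (tok[3] < 'A' || tok[3] > 'P') return -1;
    return (tok[2] - 'A') * 16 + (tok[3] - 'A');
}
```

decode_hex_escape("?$HM") gives 7 * 16 + 12 = 124, the code of '|'.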

Also, I found some damn encoding of Unicode strings with escape prefixes, like this:

const TCHAR * g_pszAltFontName001 = L"AR P\x30da\x30f3\x6977\x66f8\x4f53L"

which is encoded as

??_C@_1BG@MLNGONLK@?$AAA?$AAR?$AA?5?$AAP0?Z0?siwf?xOS?$AAL?$AA?$AA@

Btw... the encoding works as follows:
"??_C@_" = predefined signature for literal strings (string namespace)
1 = identification of ANSI or Unicode. ANSI char type = 0, Unicode char = 1 {wchar_t (non negative integer)}
BG = the len of the string in bytes, including the null terminator, written as a hexadecimal number with the letters "A" to "P" as digits (BG = 0x16 = 22 bytes = 11 wchar_t)
@ = separator (CRC of string starts here)
MLNGONLK = up to 8 chars for the CRC. This value is also encoded with the letters "A" to "P"
@ = End of CRC
.... String Data
@ = End of string
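The layout above can be sketched as a small C header parser. This is an illustration under assumptions drawn from the examples in this thread, not the dumper's code: the len field counts bytes including the null terminator, a single digit N stands for N + 1 bytes (the "_17" of "Cfm" is 4 wchar_t = 8 bytes), a multi-letter len and the CRC both end with '@', and the CRC may be shorter than 8 letters (CJME in the "H:mm:ss" example later on). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int         is_wide;   /* 0 = char, 1 = wchar_t */
    uint32_t    byte_len;  /* encoded len in bytes, including the terminator */
    uint32_t    crc;       /* CRC field decoded to a dword */
    const char *data;      /* first char of the encoded string data */
} literal_header;

/* Read an A..P number terminated by '@' (at most 8 digits); advances *pp past '@'. */
static int read_ap_number(const char **pp, uint32_t *value)
{
    const char *p = *pp;
    uint32_t v = 0;
    int digits = 0;
    while (*p >= 'A' && *p <= 'P') {
        if (++digits > 8) return 0;
        v = v * 16 + (uint32_t)(*p - 'A');
        p++;
    }
    if (digits == 0 || *p != '@') return 0;
    *value = v;
    *pp = p + 1;
    return 1;
}

/* Parse "??_C@_" <type> <len> <crc>@ and stop at the string data. */
static int parse_literal_header(const char *name, literal_header *h)
{
    const char *p = name;
    if (strncmp(p, "??_C@_", 6) != 0) return 0;   /* the whole signature must match */
    p += 6;
    if (*p != '0' && *p != '1') return 0;         /* unknown char type */
    h->is_wide = (*p == '1');
    p++;
    if (*p >= '0' && *p <= '9') {                 /* single digit N means N + 1 bytes, no '@' */
        h->byte_len = (uint32_t)(*p - '0') + 1;
        p++;
    } else if (!read_ap_number(&p, &h->byte_len)) {
        return 0;
    }
    if (!read_ap_number(&p, &h->crc)) return 0;
    h->data = p;
    return 1;
}
```

With the name above this gives is_wide = 1, byte_len = 22 and crc = 0xCBD6EDBA, and data points at "?$AAA?$AAR...".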


The parser can identify ASCII (and I'm working on the exceptions, like this "HM" stuff).

But... I'm still trying to understand and figure out how the Unicode is formed. As far as I saw, it is mainly the hexadecimal values with some tokens in between.
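The L"AR P\x30da\x30f3\x6977\x66f8\x4f53L" example above suggests what the "tokens in between" are. Each 16-bit char is written high byte first, and each byte on its own: letters, digits and '_' as themselves, ?0..?9 for ten punctuation bytes, "?$XY" for a hex byte, and (inferred from this one example, where 0xDA shows up as ?Z, 0xF3 as ?s and 0xF8 as ?x) ?A..?Z for the bytes 0xC1..0xDA and ?a..?z for 0xE1..0xFA. So \x30da becomes '0' followed by ?Z. A C decoder sketch under those assumptions (names hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes selected by "?0".."?9" (assumed table; ?5 = space). */
static const char special_chars[10] = { ',', '/', '\\', ':', '.', ' ', '\n', '\t', '\'', '-' };

/* Read one encoded byte at *pp and advance; returns 0..255 or -1 if malformed. */
static int next_byte(const char **pp)
{
    const char *p = *pp;
    if (p[0] != '?') { *pp = p + 1; return (unsigned char)p[0]; }       /* plain char */
    if (p[1] >= '0' && p[1] <= '9') {                                    /* punctuation */
        *pp = p + 2;
        return (unsigned char)special_chars[p[1] - '0'];
    }
    if (p[1] >= 'a' && p[1] <= 'z') { *pp = p + 2; return 0xE1 + (p[1] - 'a'); }
    if (p[1] >= 'A' && p[1] <= 'Z') { *pp = p + 2; return 0xC1 + (p[1] - 'A'); }
    if (p[1] == '$' && p[2] >= 'A' && p[2] <= 'P' && p[3] >= 'A' && p[3] <= 'P') {
        *pp = p + 4;                                                     /* hex byte */
        return (p[2] - 'A') * 16 + (p[3] - 'A');
    }
    return -1;
}

/* Decode a wide-string body up to '@', pairing bytes high byte first.
   Returns the char count (terminator excluded) or -1; cap must be >= 1. */
static int decode_wide(const char *p, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (*p != '\0' && *p != '@') {
        int hi = next_byte(&p);
        if (hi < 0 || *p == '\0' || *p == '@') return -1;
        int lo = next_byte(&p);
        if (lo < 0) return -1;
        uint16_t ch = (uint16_t)((hi << 8) | lo);
        if (ch == 0) break;                 /* "?$AA?$AA" ends the string */
        if (n + 1 >= cap) return -1;
        out[n++] = ch;
    }
    out[n] = 0;
    return (int)n;
}
```

Fed the string data of that name, it yields 'A', 'R', ' ', 'P', 0x30DA, 0x30F3, 0x6977, 0x66F8, 0x4F53, 'L': 10 chars, i.e. the 22 bytes given by BG once the terminator is counted.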

The literal string information can also be found (barely) here


So far, I have extended the table described there to:

[LiteralSpecialChars:
LiteralSpecialChars.Comma: B$ ","                  ; index 0
LiteralSpecialChars.ForwardSlash: B$ "/"           ; index 1
LiteralSpecialChars.BackSlash: B$ "\"              ; index 2
LiteralSpecialChars.Colon: B$ ":"                  ; index 3
LiteralSpecialChars.Period: B$ "."                 ; index 4
LiteralSpecialChars.Space: B$ " "                  ; index 5
LiteralSpecialChars.NewLine: B$ "n"                ; index 6
LiteralSpecialChars.Tab: B$ "t"                    ; index 7
LiteralSpecialChars.SingleQuote: B$ "'"            ; index 8
LiteralSpecialChars.Hyphen: B$ "-"                 ; index 9
LiteralSpecialChars.Asterix: B$ "*"                ; index 10
LiteralSpecialChars.OpenBracket: B$ "["            ; index 11
LiteralSpecialChars.OpenAngleBracket: B$ "<"       ; index 12
LiteralSpecialChars.CloseBracket: B$ "]"           ; index 13
LiteralSpecialChars.CloseAngleBracket: B$ ">"      ; index 14
LiteralSpecialChars.QuestionMark: B$ "?"]          ; index 15



The new test version is here

guga

I'm nearly retrieving all the necessary info. For now, I developed a function to convert the literal string numerical system (same idea as hex to int, etc.). It uses the letters "A" to "P" as digits to encode the CRC or the len of the string.

The function is:


String = The string to be converted
StringLen = The len of the string (not including the null terminating byte)
Base = The base to be converted. Possible equates used can be:
BASE_HEX = 16
BASE_DEC = 10
BASE_OCT = 8
BASE_FOUR = 4
BASE_BIN = 2
pErr = Pointer to an int value that receives the error status: TRUE if the function succeeds, FALSE otherwise.

Return Value in eax.
Remarks: If the function succeeds, the value at pErr is TRUE and eax holds the converted value. If a char outside "A" to "P" is found, the value at pErr is set to FALSE and eax is 0.

Proc LiteralAsciiBaseEx:
    Arguments @String, @StringLen, @Base, @pErr
    Uses esi, ebx, ecx, edi

    mov esi D@String, eax 0, ebx 0
    mov ecx D@StringLen
    mov edi D@pErr | mov D$edi &TRUE

    While ecx <> 0
        mov eax ebx | mul D@Base | mov ebx eax, eax 0
        mov al B$esi
        If_And al >= 'A', al <= 'J'       ; 'A'..'J' stand for the digits 0..9
            sub al 'A' | add al '0'
        Else_if_And al >= 'K', al <= 'P'  ; 'K'..'P' stand for the digits A..F
            sub al 'K' | add al 'A'
        Else
            xor eax eax
            mov D$edi &FALSE | ExitP
        End_If

        sub al '0'
          ; Cases of Hexa Notation:
        On al > 9, sub al 7
        add ebx eax | inc esi
        dec ecx
    End_While

    mov eax ebx

EndP


Examples of usage:
Example a)

[BASE_HEX 16, BASE_DEC 10, BASE_OCT 8, BASE_FOUR 4, BASE_BIN 2]

[Text: B$ "CHHF", 0]
[IsError: D$ 0]
    call LiteralAsciiBaseEx Text, 4, BASE_HEX, IsError

In eax the returned value is 02775 (in hex)


Example b)


[BASE_HEX 16, BASE_DEC 10, BASE_OCT 8, BASE_FOUR 4, BASE_BIN 2]

[Text: B$ "P", 0]
[IsError: D$ 0]
    call LiteralAsciiBaseEx Text, 1, BASE_HEX, IsError

In eax the returned value is 0F (in hex)
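For readers following along in C, a minimal port of the same "A" to "P" conversion (literal_ascii_base is a hypothetical name). Like the RosAsm original, it does not reject a digit that is too large for the chosen base.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of LiteralAsciiBaseEx: each letter 'A'..'P' is a digit 0..15
   in the given base. *ok is set to 1 on success, or to 0 (returning 0) when
   a char outside 'A'..'P' is found. */
static uint32_t literal_ascii_base(const char *s, size_t len, uint32_t base, int *ok)
{
    uint32_t value = 0;
    *ok = 1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 'A' || s[i] > 'P') {
            *ok = 0;
            return 0;
        }
        value = value * base + (uint32_t)(s[i] - 'A');
    }
    return value;
}
```

Subtracting 'A' gives the digit value 0..15 directly, which is the same value the RosAsm version reaches through its '0'/'A' detour. With the examples above, "CHHF" in base 16 gives 0x2775 and "P" gives 0xF.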


guga

Ok guys... I guess I succeeded in decoding almost all of the string literal tokens. VS encodes the literal string texts with some tables, for both the wchar and char types (L and R). I needed to rebuild my functions from scratch, because the older functions were wrong. I had to create an app in VS containing 65536 Unicode strings and analyse their sequences to understand how the tables were made.

The decoding of the CRC is OK now: it decodes the correct CRC value of the string. (I didn't write any routine to check the CRC against the string; it only decodes the M$ char tokens into the full dword that is the CRC value. The CRC seems to be Adler32, but I'm not sure about that yet.)

An example of decoding a char type is:
PublicSymbol|0:0|0x0002D38C|2:0x0000188C|8|28:??_C@_07CJME@H?3mm?3ss?$AA@|8:const char* Sz_StringId2236 = "H:mm:ss"|2236
PublicSymbol|0:0|0x00093F20|1:0x00092F20|32|95:?Change_State_WaitForStart@CXaSuperiorConn_Starting_State@@QAEXPAVCXaSuperiorConn_Starting@@@Z|116:const char* Sz_StringId11781 = "public: void __thiscall CXaSuperiorConn_Starting_State::Change_State_WaitForStart(class CXaSuperiorConn_Starting *)"|11781

And the wchar type is:
PublicSymbol|0:0|0x00013620|2:0x00000620|10|48:??_C@_19FLGNNEPJ@?$AAA?$AAT?$AAL?$AA?3?$AA?$AA@|5:const wchar_t* Sz_StringWId4569 = "ATL:"|4569
PublicSymbol|0:0|0x00005404|1:0x00004404|18|68:??_C@_1BC@PAPC@?$AA?$CK?$AAn?$AAE?$AAC?$AA1?$AAF?$AA0?$AA0?$AA?$AA@|9:const wchar_t* Sz_StringWId5 = "*nEC1F00"|5
PublicSymbol|0:0|0x00264B38|3:0x00132B38|36|123:??_C@_1CE@NKACHKJK@?$AAM?$AAe?$AAr?$AAd?$AAa?$AA3?$AA7?$AA0?$AA2?$AA6?$AA?$CI?$AA9?$AA0?$AAA?$AA2?$AA?$CJ?$JA?$KC?$AA?$AA@|23:const wchar_t* Sz_StringWId66214 = "Merda37026(90A2)\x903f"|66214
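For the char (type 0) case, the same byte tokens seem to apply without the pairing: each decoded byte is one char, and "?$AA" alone ends the string ("?3" is ':' in "H?3mm?3ss"). A C sketch under the same assumptions about the byte tokens (the ?a..?z / ?A..?Z ranges are inferred from the wide-string examples; names hypothetical):

```c
#include <stddef.h>

/* Bytes selected by "?0".."?9" (assumed table; ?3 = ':', ?5 = space). */
static const char special_chars[10] = { ',', '/', '\\', ':', '.', ' ', '\n', '\t', '\'', '-' };

/* Read one encoded byte at *pp and advance; returns 0..255 or -1 if malformed. */
static int next_byte(const char **pp)
{
    const char *p = *pp;
    if (p[0] != '?') { *pp = p + 1; return (unsigned char)p[0]; }
    if (p[1] >= '0' && p[1] <= '9') { *pp = p + 2; return (unsigned char)special_chars[p[1] - '0']; }
    if (p[1] >= 'a' && p[1] <= 'z') { *pp = p + 2; return 0xE1 + (p[1] - 'a'); }
    if (p[1] >= 'A' && p[1] <= 'Z') { *pp = p + 2; return 0xC1 + (p[1] - 'A'); }
    if (p[1] == '$' && p[2] >= 'A' && p[2] <= 'P' && p[3] >= 'A' && p[3] <= 'P') {
        *pp = p + 4;
        return (p[2] - 'A') * 16 + (p[3] - 'A');
    }
    return -1;
}

/* Decode a char (type 0) literal body up to '@' into out.
   Returns the string length or -1; cap must be at least 1. */
static int decode_narrow(const char *p, char *out, size_t cap)
{
    size_t n = 0;
    while (*p != '\0' && *p != '@') {
        int b = next_byte(&p);
        if (b < 0) return -1;
        if (b == 0) break;               /* "?$AA" is the terminating null */
        if (n + 1 >= cap) return -1;
        out[n++] = (char)b;
    }
    out[n] = '\0';
    return (int)n;
}
```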

Please, can someone test it to see if it hangs, etc.?

Apparently it is missing one table for mshtml.pdb, which causes the app to loop infinitely, but so far it is almost done.
The mshtml contains strings without the proper tokens, and those are causing the loop. Such as:
??_C@_1BK@DALFAAEA@?$AAG?$AAu?$AAg?$AAa?$AA?5?$AAP0?Z0?siwf?xOS?$AAL?$AA?$AA@
The above literal string is a Unicode font of type "ARP " followed by the Unicode hexadecimal string values. The problem here is the absence of the tokens. It seems to be in another table that I'm currently looking for.
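Assuming the high-byte rule that the L"AR P..." example earlier suggests (bytes 0xC1..0xDA written as ?A..?Z, 0xE1..0xFA as ?a..?z), this name may not be missing any tokens: "P0?Z0?siwf?xOS" would be 'P' followed by 0x30DA, 0x30F3, 0x6977, 0x66F8 and 0x4F53, written byte by byte, high byte first, so the whole string reads "Guga P" plus those five chars and "L". One way to test that hypothesis is to go the other way. The C encoder sketch below (hypothetical names; treating only letters, digits and '_' as plain chars is also an assumption) regenerates the data part of that name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes written as "?0".."?9" (assumed table; ?5 = space). */
static const char special_chars[10] = { ',', '/', '\\', ':', '.', ' ', '\n', '\t', '\'', '-' };

/* Write the encoding of one byte to out (at most 4 chars); returns chars written. */
static size_t encode_byte(uint8_t b, char *out)
{
    if ((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9') || b == '_') {
        out[0] = (char)b;                                  /* plain char */
        return 1;
    }
    for (int i = 0; i < 10; i++) {
        if ((uint8_t)special_chars[i] == b) {              /* punctuation table */
            out[0] = '?';
            out[1] = (char)('0' + i);
            return 2;
        }
    }
    if (b >= 0xC1 && b <= 0xDA) { out[0] = '?'; out[1] = (char)('A' + (b - 0xC1)); return 2; }
    if (b >= 0xE1 && b <= 0xFA) { out[0] = '?'; out[1] = (char)('a' + (b - 0xE1)); return 2; }
    out[0] = '?';                                          /* "?$XY" hex byte */
    out[1] = '$';
    out[2] = (char)('A' + (b >> 4));
    out[3] = (char)('A' + (b & 0x0F));
    return 4;
}

/* Encode a null-terminated UTF-16 string (terminator included, high byte
   first) and close it with '@'. out needs 8 chars per wchar_t plus 2. */
static void encode_wide(const uint16_t *s, char *out)
{
    size_t n = 0;
    for (;;) {
        n += encode_byte((uint8_t)(*s >> 8), out + n);
        n += encode_byte((uint8_t)(*s & 0xFF), out + n);
        if (*s++ == 0) break;
    }
    out[n++] = '@';
    out[n] = '\0';
}
```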

guga

Btw... if someone wants to see the source code, open the file in RosAsm and search for the function "GetFullUndecorate". It is located under the TITLE tab "PUBLICSYMBOLPARSING".


Proc GetFullUndecorate:
    Arguments @ptrSymbol, @pDecoratedName, @pOutput, @LiteralStrType
    Local @OriginalStringLen, @pCRCValue
    Uses edi, esi, ecx, edx, ebx

    mov esi D@pOutput
    mov edi D@pDecoratedName
    ..If_And D$edi = '??_C', W$edi+4 = '@_' ; <--------- Parser of literal strings symbols. (This is the signature for the literal symbol)
        lea ebx D@OriginalStringLen | mov D$ebx 0
        lea ecx D@pCRCValue | mov D$ecx 0
        call LiteralSringUndecorate edi, Sz_Symbol_Temp, ebx, ecx, D@LiteralStrType
        mov D$esi+CDiaBSTR.SzSymbolNameDis Sz_Symbol_Temp
        mov D$esi+CDiaBSTR.SymbolLenDis eax
    ..Else ; <--------- Parser of regular symbols with MsDia Api
        call GetUndecoratedSymbolName esi, D@ptrSymbol
        mov ecx D$UndecoratedName.Len
        mov edx D$UndecoratedName.Name
    ..End_If
EndP


The function below is my decoder of literal strings (char and wchar types). I omitted the rest of the functions here, because there are several routines. See the embedded source by opening the file in RosAsm to analyse it.

;;
??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
??_C@_ = String token
1 = char type: 0 = char, 1 = wchar_t (non negative integer)
7 = string len: a single digit N means N+1 bytes, including the null termination (8 bytes here)
CKCOCGAB@ = CRC
?$AAC = C (0x0043)
?$AAf = f (0x0066)
?$AAm = m (0x006D)
?$AA?$AA = null termination (0x0000)
@ = end of the string
;;

; Returns the len of the undecorated string in eax. If 0, the function failed.
;;
    Refs:
    http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm?tx=14,16
    calling_conventions.pdf
    http://agner.org/optimize/calling_conventions.pdf
   
;;

Proc LiteralSringUndecorate:
    Arguments @pInString, @pOutput, @pOriginalStringLen, @pCRC, @LiteralStrType
    Local @IsUnicode, @StringLiteralFormErr
    Uses esi, edi, ecx, ebx


    mov esi D@pInString
    mov edi D@pOutput
    mov D@IsUnicode &FALSE

    xor eax eax
    If_Or D$esi <> '??_C', W$esi+4 <> '@_' ; exit if any part of the signature does not match
        ExitP
    End_If

    ;  bypass literal token "??_C@_"
    mov ebx D@LiteralStrType
    add esi 6
    If B$esi = '1'
        mov D$ebx LITERAL_STRING_WCHAR
        mov D@IsUnicode &TRUE
    Else_If B$esi <> '0' ; The literal string format is unknown. Return 0 if failure
        mov D$ebx LITERAL_STRING_CHAR
        mov eax 0 | ExitP
    Else
        mov D$ebx LITERAL_STRING_CHAR
    End_If

    inc esi ; bypass unicode/ascii check

    ; calculate original len of the string (in bytes, including the null terminator)
    mov ebx D@pOriginalStringLen
    ..If_And B$esi >= '0', B$esi <= '9'
        movzx eax B$esi
        sub al '0' | inc eax | mov D$ebx eax ; a single digit N encodes a len of N+1 bytes
        inc esi ; bypass the len, and go to start of CRC. (In this case, the CRC is not preceded by a '@' char)
    ..Else
        ;1st we check the length of the "len" seeking for the next '@' char
        mov edx esi
        xor ecx ecx
        While B$edx+ecx <> '@'
            inc ecx
            On ecx > 8, jmp L1> ; for safety, if the len is bigger than 8, exit loop
        End_While
        L1:
        .If_Or ecx = 0, ecx > 8 ; No len token '@'. Error
            xor eax eax | ExitP
        .Else
            lea edx D@StringLiteralFormErr | mov D$edx 0
            call LiteralAsciiBaseEx esi, ecx, BASE_HEX, edx
            If D@StringLiteralFormErr = &FALSE ; The format of the len is incorrect. Exit
                xor eax eax | ExitP
            Else
                mov D$ebx eax
            End_If
        .End_If

        add esi ecx ; go to the end of the "len" string
        inc esi ; bypass '@' char and go to the start of CRC
    ..End_If

    ; Now we compute the CRC.

    ; 1st we check the len of the string related to the CRC
    mov edx esi
    xor ecx ecx
    While B$edx+ecx <> '@'
        inc ecx
        On ecx > 8, jmp L1> ; for safety, if the len is bigger than 8, exit loop
    End_While
    L1:

    .If_Or ecx = 0, ecx > 8 ; No ending CRC token '@'. Error
        xor eax eax | ExitP
    .Else
        mov ebx D@pCRC
        lea edx D@StringLiteralFormErr | mov D$edx 0
        call LiteralAsciiBaseEx esi, ecx, BASE_HEX, edx
        If D@StringLiteralFormErr = &FALSE ; The format of CRC is incorrect. Exit
            xor eax eax | ExitP
        Else
            mov D$ebx eax
        End_If
    .End_If

    add esi ecx ; go to the end of the "CRC" string
    inc esi ; bypass '@' char and go to the start of data

    If D@IsUnicode = &TRUE ; const wchar_t

        call DemangleLiteralUnicodeString esi, edi

    Else ; const char

        call DemangleLiteralCharString esi, edi

    End_If

EndP


Decoder of the "A" to "P" table (a replacement for the regular hex-to-int conversions)

; used for tables "A" to "P", when the len is bigger than 9 (and always for the CRC)
;;

Dec     Hex     Literal
0       0           A
1       1           B
2       2           C
3       3           D
4       4           E
5       5           F
6       6           G
7       7           H
8       8           I
9       9           J
10      A           K
11      B           L
12      C           M
13      D           N
14      E           O
15      F           P

;;

; StringLen does not include the null termination byte
Proc LiteralAsciiBaseEx:
    Arguments @String, @StringLen, @Base, @pErr
    Uses esi, ebx, ecx, edi

    mov esi D@String, eax 0, ebx 0
    mov ecx D@StringLen
    mov edi D@pErr | mov D$edi &TRUE

    While ecx <> 0
        mov eax ebx | mul D@Base | mov ebx eax, eax 0
        mov al B$esi
        If_And al >= 'A', al <= 'J'
            sub al 'A' | add al '0'
        Else_if_And al >= 'K', al <= 'P'
            sub al 'K' | add al 'A'
        Else
            xor eax eax
            mov D$edi &FALSE | ExitP
        End_If

        sub al '0'
          ; Cases of Hexa Notation:
        On al > 9, sub al 7
        add ebx eax | inc esi
        dec ecx
    End_While

    mov eax ebx

EndP


The function below retrieves the undecorated names of normal symbols through MSDia

Proc GetUndecoratedSymbolName:
    Arguments @pCDiaBSTR, @ptrSymbol
    Local @bstrName
    Uses esi, ecx, ebx, edx, edi

    mov edi D@pCDiaBSTR
    mov D$edi+CDiaBSTR.SzSymbolNameDis 0
    mov D$edi+CDiaBSTR.SymbolLenDis 0
    mov D@bstrName 0
    lea ebx D@bstrName
    icall DIA_SYMBOL_GET_UNDECORATEDNAMEEX D@ptrSymbol, &UNDNAME_COMPLETE, ebx
    .If eax = &S_OK
        call WriteTempSymbol D@pCDiaBSTR, D@bstrName
        mov eax &TRUE
    .Else
        xor eax eax
    .End_If

EndP

guga

OK, the executable is working fine. I'm preparing the dll and the tutorials right now :)

Gunther

You're a hard working man, Gustavo.  :t

Gunther

guga

Tks Gunther  :t :t

I'm doing my best to make the library as correct as possible, so other programmers can write their parsers based on MSDia as easily as possible.

With the dll I'm building, it will be possible to write a PDB-to-inc converter (to be used in C, MASM, RosAsm, FASM, NASM, etc.), and the same dll can also be used as a starting point for a symbolic parser for debuggers.

Since JJ helped with a fix to the memory allocation function, I can trace other errors on the huge files I have here. Today I loaded a 150 MB PDB file (from Windows 8) and the app did not crash; it simply showed me that the file has no content that msdia120 is able to parse.

Also, on another file I'm testing (23 MB), I found a small problem with wsprintfA: it is unable to produce a string bigger than 1024 bytes. For this particular problem, I'm porting StringCbPrintf to RosAsm and will also add it to the dll as an exported function :)
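wsprintfA is documented by Microsoft as limited to a 1,024-byte output buffer, so that limit is expected behaviour. For comparison, the usual C-runtime way around a fixed limit is to ask vsnprintf for the required size first and then allocate. A sketch (format_alloc is a hypothetical name; this is not the StringCbPrintf port mentioned above):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer of exactly the needed size.
   Returns NULL on error; the caller frees the result. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int needed = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (needed < 0) return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL) return NULL;

    va_start(ap, fmt);                          /* second pass: write */
    vsnprintf(buf, (size_t)needed + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

The two-pass design trades a second formatting call for never truncating; StringCbPrintf instead takes a caller-sized buffer and truncates safely.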

guga

A bit more stable version. There are still some minor errors with the Unicode string chars. When it encounters a \xABCD, it may identify it correctly, but in some cases the result is not accurate. Nevertheless, I'm uploading the latest version here and finishing the tutorial before building the dll version.

The output can now identify public symbols (literal strings) as what they are.

So, when a public symbol is identified as ASCII, the text token starts with "AsciiString". When it is identified as Unicode, it is "UnicodeString". In all other cases, it simply identifies the public symbol as "PublicSymbol".
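The labelling described above boils down to checking the char-type digit right after the "??_C@_" signature. A minimal C sketch (symbol_kind is a hypothetical name; the dumper may do more checks before picking a label):

```c
#include <string.h>

/* Label a decorated public-symbol name like the first column of the dump. */
static const char *symbol_kind(const char *decorated)
{
    if (strncmp(decorated, "??_C@_0", 7) == 0) return "AsciiString";   /* const char */
    if (strncmp(decorated, "??_C@_1", 7) == 0) return "UnicodeString"; /* const wchar_t */
    return "PublicSymbol";
}
```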

Note: A Unicode string may be represented as Latin Unicode (with the common ASCII table) or with foreign chars (Asian, etc.). For non-Latin chars, the identification is always "\xABCD", where ABCD is the hexadecimal value of the char, exactly as when you use it in C.
Examples:

AsciiString|0xEBE74A0E|0x0000318C|2:0x0000018C|14|39:??_C@_0O@OLOHEKAO@The?5answer?5is?$AA@|14:const char* Sz_StringId1469 = 'The answer is'|1469


UnicodeString|0x470AF02B|0x000031B0|2:0x000001B0|96|187:??_C@_1GA@EHAKPACL@?$AAf?$AA?3?$AA?2?$AAd?$AAd?$AA?2?$AAv?$AAc?$AAt?$AAo?$AAo?$AAl?$AAs?$AA?2?$AAc?$AAr?$AAt?$AA_?$AAb?$AAl?$AAd?$AA?2?$AAs?$AAe?$AAl?$AAf?$AA_?$AAx?$AA8?$AA6?$AA?2?$AAc@|32:const wchar_t* Sz_StringWId1466 = 'f:\dd\vctools\crt_bld\self_x86\c'|1466


UnicodeString|0xDA027A9A|0x00264B38|3:0x00132B38|36|123:??_C@_1CE@NKACHKJK@?$AAM?$AAe?$AAr?$AAd?$AAa?$AA3?$AA7?$AA0?$AA2?$AA6?$AA?$CI?$AA9?$AA0?$AAA?$AA2?$AA?$CJ?$JA?$KC?$AA?$AA@|23:const wchar_t* Sz_StringWId66214 = 'Merda37026(90A2)\x90a2'|66214
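A sketch of how text like the line above can be produced from decoded UTF-16 chars: printable ASCII is copied, anything else becomes "\x" plus four lowercase hex digits. format_wide and the 0x20..0x7E cut-off are assumptions, not taken from the dumper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write s (null-terminated UTF-16) as text: printable ASCII as-is, anything
   else as \xabcd. Returns chars written, or -1 if out (size cap, at least 1)
   is too small. */
static int format_wide(const uint16_t *s, char *out, size_t cap)
{
    size_t n = 0;
    for (; *s; s++) {
        if (*s >= 0x20 && *s < 0x7F) {
            if (n + 1 >= cap) return -1;
            out[n++] = (char)*s;
        } else {
            if (n + 6 >= cap) return -1;          /* "\x" + 4 hex digits, plus the null */
            snprintf(out + n, cap - n, "\\x%04x", (unsigned)*s);
            n += 6;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

As in C itself, this form is ambiguous when a hex digit follows the escape (a C compiler would read "\x90a2B" as one longer escape), and a raw backslash passes through unchanged, as in the f:\dd\... example, so anything that re-parses this text may want to escape both.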


PublicSymbol|1:0|0x00002090|1:0x00001090|95|20:__ValidateImageBase|20:__ValidateImageBase|1487

PublicSymbol|1:0|0x00001150|1:0x00000150|93|31:?StringCbPrintfA@@YAJPADIPBDZZ|67:long __cdecl StringCbPrintfA(char *,unsigned int,char const *,...)|1378