The MASM Forum

Projects => Rarely Used Projects => RosAsm => Topic started by: guga on July 24, 2014, 04:17:28 PM

Title: RosAsm PDB Dumper v1.0
Post by: guga on July 24, 2014, 04:17:28 PM
A small app that parses information from pdb files.

This version is still in executable form for testing purposes. If someone can test it on other OSes, I'll appreciate it. It works on XP; I haven't tested it on WinNT, Windows Vista, Windows 7, 8, etc.

I'm currently writing a tutorial about the exported functions, since the app will be turned into a DLL and you will need documentation for the parsed information.

After that it will be easier for anyone to build a pdb2inc, for example.

Title: Re: RosAsm PDB Dumper v1.0
Post by: sinsi on July 24, 2014, 04:48:20 PM
Windows 8.1 Pro x64, nothing happens with any pdb I choose, even running as admin.
Also, the title "C/C++ Comment Remover" is a mistake, yes?
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 24, 2014, 05:51:07 PM
Yeah, "Comment Remover" is just the title of an older app I built this one on. I forgot to fix the title.

About the PDBs from Windows 8: I have some of those here (and some from Windows 7). It seems that since Windows 7 all the PDBs contain is the public symbols themselves (i.e. the variable names), compiland information, module names and the FPO data. They do not seem to contain enumeration values, typedefs or UDTs.

I'll add parsing of the public symbols later, although that won't be helpful for building a pdb2inc.

Try loading an XP symbol file, or any other you may have created in VS2008/2010, and see if the information is there. Or even better, can you please check whether the parser works on Windows 8 with the symbol files I'm uploading? Here it correctly parses all the symbols that contain enums, typedefs and UDTs.

simpleDBGTest.zip - dbg on Windows XP
1394bus.zip - result of compiling the CV SDK. I tested this on a Windows 7 PDB and there are no enums, UDTs or typedefs
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 24, 2014, 06:14:16 PM
Btw, this newer version should display the proper error message.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 01:09:25 AM
New version. A small test to display the public symbols. I'm tired, so it won't do much except display them and their indexes. No undecorating or displaying of their offsets yet. I'll do that tonight.
Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on July 25, 2014, 04:08:29 AM
Hi Gustavo,

your application needs ROSMEM.dll to run properly, doesn't it?

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: Vortex on July 25, 2014, 05:48:45 AM
True. The application needs that DLL.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 08:18:34 AM
Yes, it is included in the 1st post. The other posts (the smaller zips) contain only updates of the main executable.
The zip file in the 1st post contains:
Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on July 25, 2014, 08:21:27 AM
Thank you Erol and Gustavo. I'll test it out.

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 03:57:00 PM
OK, guys, I'm succeeding in parsing the public symbols and I'm building the proper tokens for them.

One question: how do you undecorate a string literal name?

I mean, the UnDecorateSymbolName API (or __unDNameEx from msvcrt and others) is fine for undecorating functions, structures, etc., but for strings (Unicode and ASCII) it fails badly. All it outputs is "string". I want the whole string back.

I tried the API DsCrackUnquotedMangledRdn and it also fails.

The only place where I found some valid information about Microsoft string literal mangling is the function MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL, raw_ostream &Out) here (https://llvm.org/svn/llvm-project/cfe/trunk/lib/AST/MicrosoftMangle.cpp). Some information on demangling can also be found here (http://www.opensource.apple.com/source/libcppabi/libcppabi-26/src/cxa_demangle.cpp).

This function describes exactly how a string is encoded to produce names like these:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@

The question is: how do I decode it?

The encoding algorithm from the LLVM project (link above) for Microsoft string literal mangling is this:
Code: [Select]
void MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL,
                                                     raw_ostream &Out) {
  // <char-type> ::= 0   # char
  //             ::= 1   # wchar_t
  //             ::= ??? # char16_t/char32_t will need a mangling too...
  //
  // <literal-length> ::= <non-negative integer>  # the length of the literal
  //
  // <encoded-crc>    ::= <hex digit>+ @          # crc of the literal including
  //                                              # null-terminator
  //
  // <encoded-string> ::= <simple character>           # uninteresting character
  //                  ::= '?$' <hex digit> <hex digit> # these two nibbles
  //                                                   # encode the byte for the
  //                                                   # character
  //                  ::= '?' [a-z]                    # \xe1 - \xfa
  //                  ::= '?' [A-Z]                    # \xc1 - \xda
  //                  ::= '?' [0-9]                    # [,/\:. \n\t'-]
  //
  // <literal> ::= '??_C@_' <char-type> <literal-length> <encoded-crc>
  //               <encoded-string> '@'
  MicrosoftCXXNameMangler Mangler(*this, Out);
  Mangler.getStream() << "\01??_C@_";

  // <char-type>: The "kind" of string literal is encoded into the mangled name.
  // TODO: This needs to be updated when MSVC gains support for unicode
  // literals.
  if (SL->isAscii())
    Mangler.getStream() << '0';
  else if (SL->isWide())
    Mangler.getStream() << '1';
  else
    llvm_unreachable("unexpected string literal kind!");

  // <literal-length>: The next part of the mangled name consists of the length
  // of the string.
  // The StringLiteral does not consider the NUL terminator byte(s) but the
  // mangling does.
  // N.B. The length is in terms of bytes, not characters.
  Mangler.mangleNumber(SL->getByteLength() + SL->getCharByteWidth());

  // We will use the "Rocksoft^tm Model CRC Algorithm" to describe the
  // properties of our CRC:
  //   Width  : 32
  //   Poly   : 04C11DB7
  //   Init   : FFFFFFFF
  //   RefIn  : True
  //   RefOut : True
  //   XorOut : 00000000
  //   Check  : 340BC6D9
  uint32_t CRC = 0xFFFFFFFFU;

  auto UpdateCRC = [&CRC](char Byte) {
    for (unsigned i = 0; i < 8; ++i) {
      bool Bit = CRC & 0x80000000U;
      if (Byte & (1U << i))
        Bit = !Bit;
      CRC <<= 1;
      if (Bit)
        CRC ^= 0x04C11DB7U;
    }
  };

  auto GetLittleEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = Index % CharByteWidth;
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  auto GetBigEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = (CharByteWidth - 1) - (Index % CharByteWidth);
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  // CRC all the bytes of the StringLiteral.
  for (unsigned I = 0, E = SL->getByteLength(); I != E; ++I)
    UpdateCRC(GetLittleEndianByte(I));

  // The NUL terminator byte(s) were not present earlier,
  // we need to manually process those bytes into the CRC.
  for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
       ++NullTerminator)
    UpdateCRC('\x00');

  // The literature refers to the process of reversing the bits in the final CRC
  // output as "reflection".
  CRC = llvm::reverseBits(CRC);

  // <encoded-crc>: The CRC is encoded utilizing the standard number mangling
  // scheme.
  Mangler.mangleNumber(CRC);

  // <encoded-string>: The mangled name also contains the first 32 _characters_
  // (including null-terminator bytes) of the StringLiteral.
  // Each character is encoded by splitting them into bytes and then encoding
  // the constituent bytes.
  auto MangleByte = [&Mangler](char Byte) {
    // There are five different manglings for characters:
    // - [a-zA-Z0-9_$]: A one-to-one mapping.
    // - ?[a-z]: The range from \xe1 to \xfa.
    // - ?[A-Z]: The range from \xc1 to \xda.
    // - ?[0-9]: The set of [,/\:. \n\t'-].
    // - ?$XX: A fallback which maps nibbles.
    if (isIdentifierBody(Byte, /*AllowDollar=*/true)) {
      Mangler.getStream() << Byte;
    } else if (isLetter(Byte & 0x7f)) {
      Mangler.getStream() << '?' << static_cast<char>(Byte & 0x7f);
    } else {
      switch (Byte) {
        case ',':
          Mangler.getStream() << "?0";
          break;
        case '/':
          Mangler.getStream() << "?1";
          break;
        case '\\':
          Mangler.getStream() << "?2";
          break;
        case ':':
          Mangler.getStream() << "?3";
          break;
        case '.':
          Mangler.getStream() << "?4";
          break;
        case ' ':
          Mangler.getStream() << "?5";
          break;
        case '\n':
          Mangler.getStream() << "?6";
          break;
        case '\t':
          Mangler.getStream() << "?7";
          break;
        case '\'':
          Mangler.getStream() << "?8";
          break;
        case '-':
          Mangler.getStream() << "?9";
          break;
        default:
          Mangler.getStream() << "?$";
          Mangler.getStream() << static_cast<char>('A' + ((Byte >> 4) & 0xf));
          Mangler.getStream() << static_cast<char>('A' + (Byte & 0xf));
          break;
      }
    }
  };

  // Enforce our 32 character max.
  unsigned NumCharsToMangle = std::min(32U, SL->getLength());
  for (unsigned I = 0, E = NumCharsToMangle * SL->getCharByteWidth(); I != E;
       ++I)
    MangleByte(GetBigEndianByte(I));

  // Encode the NUL terminator if there is room.
  if (NumCharsToMangle < 32)
    for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
         ++NullTerminator)
      MangleByte(0);

  Mangler.getStream() << '@';
}


But I failed to understand how to decode this stuff.

I also found something for gcc here (http://stackoverflow.com/questions/281818/unmangling-the-result-of-stdtype-infoname), but I have no clue how to use or port it.
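
For anyone following the CRC part of the quoted function: going by its comments (poly 04C11DB7, init FFFFFFFF, reflected input and output, no final XOR), this looks like the ordinary CRC-32 without the final inversion. Below is a minimal Python sketch of the same bit loop. The names `literal_crc` and `reverse_bits32` are made up for illustration, and the sketch has not been checked here against the CRC fields of the mangled names in this thread.

```python
POLY = 0x04C11DB7


def reverse_bits32(value):
    """Reverse the bit order of a 32-bit integer."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def literal_crc(data):
    """Bit-by-bit port of the UpdateCRC lambda from the quoted C++."""
    crc = 0xFFFFFFFF
    for byte in data:
        for i in range(8):              # bits are fed least significant first
            bit = bool(crc & 0x80000000)
            if byte & (1 << i):
                bit = not bit
            crc = (crc << 1) & 0xFFFFFFFF
            if bit:
                crc ^= POLY
    return reverse_bits32(crc)          # the final "reflection" step


if __name__ == "__main__":
    # Input of the "Check" line in the quoted comment block.
    print(hex(literal_crc(b"123456789")))
```

For the test input `123456789` this reproduces the "Check" value 340BC6D9 from the comment block, which is also what `zlib.crc32(data) ^ 0xFFFFFFFF` returns.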
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 04:59:16 PM
Ok, i found something easier that seems valid

http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm?tx=14,16
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 06:01:55 PM
Newer update with undecorated symbols. I'm currently working on undecorating the literal strings. Once I finish, I can also finish the tutorial on how to use it. Don't forget to also download the necessary DLLs from the 1st post.

Also, it seems I need to release the allocated strings in DIA_SYMBOL_GET_NAME.

About the public symbols: a brief description of each record is:

Token | Code/Data | Target RVA | TargetSection:TargetOffset | Symbol Length | Symbol Name Len | Symbol Name Decorated | Undecorated Sym Len | Symbol Name Undecorated | SymIndexID
Later, once I finish this, I can write the proper tutorial.

Btw, I'm only retrieving the offset, RVA and section info so that I can compare the contents at those addresses with the info gathered from the PDB. That way I can try to correctly determine whether a string is wchar, wchar_t, ASCII, Pascal, etc.
Title: Re: RosAsm PDB Dumper v1.0
Post by: sinsi on July 25, 2014, 07:36:31 PM
OK, 7e6 is working, here's an example from comctl32.pdb
Code: [Select]
PublicSymbol|1:0|0x0006DADF|1:0x0005704C|229|39:?OnEnableGroupView@CListView@@QAEJ_N@Z|59:public: long __thiscall CListView::OnEnableGroupView(bool)|1
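
A record in this layout can be read back mechanically. Here is a minimal Python sketch, assuming the fields are separated by '|' and that each name is stored as `<count>:<text>`, where the count includes one extra position for the terminating NUL (that is how the sample above adds up: 38 characters, count 39). The function and key names are hypothetical and not part of the dumper.

```python
def take_counted(text):
    """Split '<count>:<name><rest>' into (name, rest).

    Assumption: count covers the name plus one NUL terminator.
    """
    count, _, tail = text.partition(":")
    size = int(count) - 1
    return tail[:size], tail[size:]


def parse_public_symbol(line):
    """Parse one PublicSymbol record into a dict."""
    token, kind, rva, location, symbol_len, rest = line.split("|", 5)
    section, offset = location.split(":")
    decorated, rest = take_counted(rest)
    undecorated, rest = take_counted(rest[1:])   # skip the '|' separator
    return {
        "token": token,
        "code_data": kind,
        "rva": int(rva, 16),
        "section": int(section),
        "offset": int(offset, 16),
        "symbol_length": int(symbol_len),
        "decorated": decorated,
        "undecorated": undecorated,
        "index": int(rest[1:]),
    }


sample = ("PublicSymbol|1:0|0x0006DADF|1:0x0005704C|229|"
          "39:?OnEnableGroupView@CListView@@QAEJ_N@Z|"
          "59:public: long __thiscall CListView::OnEnableGroupView(bool)|1")
record = parse_public_symbol(sample)
print(record["undecorated"], hex(record["rva"]))
```

Slicing the names by their count rather than splitting on '|' keeps the parse intact if a name ever contains a vertical bar.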
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 08:33:11 PM
Thanks.

Btw, if you or others have any idea how to decode the string literals, let me know. I read the info at http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm and the code I quoted from https://llvm.org/svn/llvm-project/cfe/trunk/lib/AST/MicrosoftMangle.cpp, but I'm clueless.

I have no idea how to decode those literal strings.
All I barely understood is that it uses some sort of CRC when encoding them. So, things like this:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
are interpreted as:

??_C@_ = string token. This token identifies the symbol as a string
1 = char type: 0 = ANSI char, 1 = wchar_t
7 = encoded string length, including the null terminator (a non-negative integer)
CKCOCGAB@ = CRC (the trailing @ is just a terminator; all we need are the first 8 characters, which represent the CRC)
?$AAC = which encoded char is this? What does AAC represent?
?$AAf = same as above: what is AAf?
?$AAm = same as above
?$AA = same as above
?$AA@ = same as above (the last @ just marks the end of the encoded string). So what does AA represent?

"?$" marks an encoded char, so these two characters are not part of the value.

In the damn example above, all I understood is that it is a Unicode string with a length field of 7 (null terminator included).
Title: Re: RosAsm PDB Dumper v1.0
Post by: sinsi on July 25, 2014, 09:01:34 PM
Have a look at http://blogs.msdn.com/b/oldnewthing/. Search for "decorated", there's a fair bit of information.
One thing led to another and I ended up on MSDN - UnDecorateSymbolName (http://msdn.microsoft.com/en-us/library/windows/desktop/ms681400(v=vs.85).aspx). Maybe it will help.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 09:29:46 PM
While you were answering, I found something that seems to be the right path. I'll read it now, many thanks, this may help.

What I found so far is this:
Each "?$AA" is a zero byte, and it is followed by the proper ASCII char. So ??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@ is in fact the null-terminated Unicode string "Cfm".

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@ = Cfm

The problem now seems to be handling non-'AA' tokens, such as 'CF' for example. But perhaps all the others fall under the cases described here:
http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm
 
Example:
??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@ ==> 5 'CF' ?? s

??_C@_1CK@LECDOEAE@?$AAC?$AAV?$AA_?$AAC?$AAA?$AAL?$AAL?$AA_?$AAR?$AAE?$AAS?$AAE?$AAR?$AAV?$AAE?$AAD?$AA?5?$AA?5?$AA?5?$AA?5?$AA?$AA@ = CV_CALL_RESERVED5555


Btw, according to the source I'm using as a reference, the '5' (as in "?5") stands for a space.
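
Going by the grammar comments in the LLVM source quoted earlier, the payload can be decoded byte by byte: a plain character stands for itself, `?$XY` is one byte written as two letters from A to P (one per nibble), and `?0` to `?9` index the list `,/\:. \n\t'-`. For wide strings the bytes come in big-endian pairs. A minimal Python sketch under those assumptions follows; the function names are hypothetical, and the `?a`-`?z` / `?A`-`?Z` forms are deliberately left out.

```python
SPECIAL = ",/\\:. \n\t'-"      # characters behind ?0 .. ?9


def decode_payload(payload):
    """Decode the <encoded-string> part (without the closing '@') to bytes."""
    out = bytearray()
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch != "?":
            out.append(ord(ch))                  # plain character
            i += 1
        elif payload[i + 1] == "$":
            high = ord(payload[i + 2]) - ord("A")
            low = ord(payload[i + 3]) - ord("A")
            out.append((high << 4) | low)        # ?$XY -> one byte
            i += 4
        elif payload[i + 1] in "0123456789":
            out.append(ord(SPECIAL[int(payload[i + 1])]))
            i += 2
        else:
            raise ValueError("escape not handled in this sketch: " + payload[i:i + 2])
    return bytes(out)


def decode_wide(payload):
    """Decode a wchar_t payload and drop the trailing NUL, if present."""
    return decode_payload(payload).decode("utf-16-be").rstrip("\0")


print(repr(decode_wide("?$AAC?$AAf?$AAm?$AA?$AA")))
print(repr(decode_wide("?$AA?5?$AA?$CF?$AAs?$AA?$AA")))
```

Under these rules the second example comes out as " %s" (`?$CF` is the byte 0x25, '%'), and each `?5` in the CV_CALL_RESERVED name is a trailing space.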
Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on July 25, 2014, 09:39:10 PM
Hi Gustavo,

there seem to be a lot of different PDB files. For example:

Which format do you use?

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 25, 2014, 09:42:28 PM
I'm using the ones from the Microsoft debugger. The same ones that Visual Studio creates when you compile an application.
Title: Re: RosAsm PDB Dumper v1.0
Post by: sinsi on July 25, 2014, 10:12:08 PM
I've tried .pdb files from the MS symbol server, the sample guga posted and one of my programs (via link /debug).
Code: [Select]
PublicSymbol|0:0|0x0000518C|4:0x0000018C|4|23:__imp__SendMessageA@16|23:__imp__SendMessageA@16|1
PublicSymbol|0:0|0x00005140|4:0x00000140|4|21:__imp__ExitProcess@4|21:__imp__ExitProcess@4|2
PublicSymbol|0:0|0x00005174|4:0x00000174|4|22:__imp__GetMessageA@16|22:__imp__GetMessageA@16|3
PublicSymbol|0:0|0x00005188|4:0x00000188|4|26:__imp__RegisterClassExA@4|26:__imp__RegisterClassExA@4|4
PublicSymbol|1:0|0x00001190|1:0x00000190|6|20:_CreateWindowExA@48|20:_CreateWindowExA@48|5
PublicSymbol|0:0|0x00005110|4:0x00000110|4|24:__imp__GetStockObject@4|24:__imp__GetStockObject@4|6
PublicSymbol|1:0|0x000011C6|1:0x000001C6|6|19:_PostQuitMessage@4|19:_PostQuitMessage@4|7
PublicSymbol|0:0|0x00005114|4:0x00000114|4|23:GDI32_NULL_THUNK_DATA|23:GDI32_NULL_THUNK_DATA|8
PublicSymbol|1:0|0x000011D8|1:0x000001D8|6|20:_TranslateMessage@4|20:_TranslateMessage@4|9
PublicSymbol|1:0|0x000011B4|1:0x000001B4|6|18:_GetStockObject@4|18:_GetStockObject@4|10
PublicSymbol|0:0|0x00005178|4:0x00000178|4|26:__imp__DispatchMessageA@4|26:__imp__DispatchMessageA@4|11
PublicSymbol|0:0|0x00005148|4:0x00000148|4|26:KERNEL32_NULL_THUNK_DATA|26:KERNEL32_NULL_THUNK_DATA|12
PublicSymbol|0:0|0x00005144|4:0x00000144|4|26:__imp__GetModuleHandleA@4|26:__imp__GetModuleHandleA@4|13
PublicSymbol|1:0|0x000011BA|1:0x000001BA|6|15:_LoadCursorA@8|15:_LoadCursorA@8|14
PublicSymbol|0:0|0x0000519C|4:0x0000019C|0|24:USER32_NULL_THUNK_DATA|24:USER32_NULL_THUNK_DATA|15
PublicSymbol|1:0|0x00001196|1:0x00000196|6|19:_DefWindowProcA@16|19:_DefWindowProcA@16|16
PublicSymbol|1:0|0x0000119C|1:0x0000019C|6|20:_DispatchMessageA@4|20:_DispatchMessageA@4|17
PublicSymbol|0:0|0x000010B8|1:0x000000B8|136|7:_start|7:_start|18
PublicSymbol|0:0|0x00005180|4:0x00000180|4|19:__imp__LoadIconA@8|19:__imp__LoadIconA@8|19
PublicSymbol|1:0|0x000011AE|1:0x000001AE|6|20:_GetModuleHandleA@4|20:_GetModuleHandleA@4|20
PublicSymbol|0:0|0x00005190|4:0x00000190|4|26:__imp__TranslateMessage@4|26:__imp__TranslateMessage@4|21
PublicSymbol|0:0|0x00005000|4:0x00000000|20|27:__IMPORT_DESCRIPTOR_USER32|27:__IMPORT_DESCRIPTOR_USER32|22
PublicSymbol|0:0|0x00005194|4:0x00000194|4|21:__imp__LoadCursorA@8|21:__imp__LoadCursorA@8|23
PublicSymbol|1:0|0x000011A8|1:0x000001A8|6|16:_GetMessageA@16|16:_GetMessageA@16|24
PublicSymbol|0:0|0x00005198|4:0x00000198|4|26:__imp__CreateWindowExA@48|26:__imp__CreateWindowExA@48|25
PublicSymbol|0:0|0x0000517C|4:0x0000017C|4|25:__imp__DefWindowProcA@16|25:__imp__DefWindowProcA@16|26
PublicSymbol|0:0|0x00005184|4:0x00000184|4|25:__imp__PostQuitMessage@4|25:__imp__PostQuitMessage@4|27
PublicSymbol|1:0|0x000011CC|1:0x000001CC|6|20:_RegisterClassExA@4|20:_RegisterClassExA@4|28
PublicSymbol|0:0|0x00005014|4:0x00000014|20|29:__IMPORT_DESCRIPTOR_KERNEL32|29:__IMPORT_DESCRIPTOR_KERNEL32|29
PublicSymbol|1:0|0x000011A2|1:0x000001A2|6|15:_ExitProcess@4|15:_ExitProcess@4|30
PublicSymbol|0:0|0x00005028|4:0x00000028|20|26:__IMPORT_DESCRIPTOR_GDI32|26:__IMPORT_DESCRIPTOR_GDI32|31
PublicSymbol|1:0|0x000011C0|1:0x000001C0|6|13:_LoadIconA@8|13:_LoadIconA@8|32
PublicSymbol|1:0|0x000011D2|1:0x000001D2|6|17:_SendMessageA@16|17:_SendMessageA@16|33
PublicSymbol|0:0|0x0000503C|4:0x0000003C|20|25:__NULL_IMPORT_DESCRIPTOR|25:__NULL_IMPORT_DESCRIPTOR|34
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 26, 2014, 03:05:45 AM
OK, I'm getting closer now :)

The literal string parser displays things like:
Quote
PublicSymbol|0:0|0x00010BD4|2:0x00006BD4|8|42:??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@|4:Cfm|1067
PublicSymbol|0:0|0x0000EBF0|2:0x00004BF0|8|46:??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@|4: Os|1068
PublicSymbol|0:0|0x00010628|2:0x00006628|14|58:??_C@_1O@DLDHEOAC@?$AAI?$AAn?$AAt?$AAR?$AA9?$AA0?$AA?$AA@|7:IntR90|1069
PublicSymbol|0:0|0x0000F67C|2:0x0000567C|12|53:??_C@_1M@MBAMLEOM@?$AAA?$AAR?$AA1?$AA1?$AA6?$AA?$AA@|6:AR116|1070
PublicSymbol|0:0|0x00011EC0|2:0x00007EC0|14|58:??_C@_1O@EKFLKENN@?$AAx?$AAm?$AAm?$AA9?$AA_?$AA1?$AA?$AA@|7:xmm9_1|1071
PublicSymbol|0:0|0x00010A38|2:0x00006A38|14|58:??_C@_1O@GGCOEOPD@?$AAI?$AAn?$AAt?$AAR?$AA2?$AA5?$AA?$AA@|7:IntR25|1072
PublicSymbol|0:0|0x0000C1C4|2:0x000021C4|42|133:??_C@_1CK@LECDOEAE@?$AAC?$AAV?$AA_?$AAC?$AAA?$AAL?$AAL?$AA_?$AAR?$AAE?$AAS?$AAE?$AAR?$AAV?$AAE?$AAD?$AA?5?$AA?5?$AA?5?$AA?5?$AA?$AA@|21:CV_CALL_RESERVED    |1073
PublicSymbol|1:0|0x00004280|1:0x00003280|208|48:?DumpSymbolWithRVA@@YA_NPAUIDiaSession@@KPB_W@Z|83:bool __cdecl DumpSymbolWithRVA(struct IDiaSession *,unsigned long,wchar_t const *)|1074
Title: Re: RosAsm PDB Dumper v1.0
Post by: Vortex on July 26, 2014, 06:11:35 AM
Hello,

You can check Agner Fog's manual, it's a good one : Calling conventions for different C++ compilers and operating systems

http://agner.org/optimize/calling_conventions.pdf

http://agner.org/optimize/

Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on July 26, 2014, 08:42:54 AM
You can check Agner Fog's manual, it's a good one : Calling conventions for different C++ compilers and operating systems

Yes, that's a good and reliable source.

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 27, 2014, 01:34:14 AM
Hi guys, many thanks for the tips.

I read Agner Fog's article, but there is little in it that helps here. Mainly it helped me with parsing numbers.

The good news is that I succeeded in translating almost all ASCII strings. These literal strings use some sort of table to encode and decode a wchar_t const. It is mainly based on the token "?$AA". But there are exceptions which I'm struggling to understand. For instance, I found an "HM" token that seems to be part of another table related to vertical bar "|" signs, but I haven't managed to check it yet.

Also, I found some damn encoding of Unicode strings with escape prefixes, like this:

const TCHAR * g_pszAltFontName001 = L"AR P\x30da\x30f3\x6977\x66f8\x4f53L"

which is encoded as

??_C@_1BG@MLNGONLK@?$AAA?$AAR?$AA?5?$AAP0?Z0?siwf?xOS?$AAL?$AA?$AA@

Btw, the encoding works as follows:
"??_C@_" = predefined signature for literal strings (string namespace)
1 = identifies ANSI or Unicode: ANSI char type = 0, Unicode (wchar_t) = 1
BG = the length of the string (an encoded non-negative integer)
@ = separator (the CRC of the string starts here)
MLNGONLK = 8 chars for the CRC. This value is also encoded
@ = End of CRC
.... String Data
@ = End of string
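
The layout above can be split mechanically. Here is a minimal Python sketch, assuming both the length and the CRC use the number encoding of the LLVM mangler: a single decimal digit d stands for d+1, and anything else is a run of letters A to P read as hex digits and closed by '@'. That assumption fits the output earlier in the thread, where the name with length digit 7 is listed with a symbol length of 8. `decode_number` and `split_literal` are hypothetical names.

```python
PREFIX = "??_C@_"


def decode_number(text, pos):
    """Decode one mangled number at text[pos]; return (value, next_pos)."""
    if text[pos] in "0123456789":
        return int(text[pos]) + 1, pos + 1       # digits 0-9 stand for 1-10
    end = text.index("@", pos)
    value = 0
    for ch in text[pos:end]:
        if not "A" <= ch <= "P":
            raise ValueError("unexpected character in number: " + ch)
        value = value * 16 + ord(ch) - ord("A")  # A..P are hex digits 0..15
    return value, end + 1


def split_literal(name):
    """Split a ??_C@_ name into char type, byte length, CRC and payload."""
    if not name.startswith(PREFIX):
        raise ValueError("not a string literal symbol")
    pos = len(PREFIX)
    if name[pos] not in "01":
        raise ValueError("unknown char type: " + name[pos])
    wide = name[pos] == "1"
    byte_length, pos = decode_number(name, pos + 1)
    crc, pos = decode_number(name, pos)
    payload = name[pos:-1] if name.endswith("@") else name[pos:]
    return wide, byte_length, crc, payload


name = "??_C@_1BG@MLNGONLK@?$AAA?$AAR?$AA?5?$AAP0?Z0?siwf?xOS?$AAL?$AA?$AA@"
wide, byte_length, crc, payload = split_literal(name)
print(wide, byte_length, hex(crc))
```

For the name above, "BG" decodes to 22 bytes, which matches ten wide characters plus the terminator.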


The parser can identify ASCII strings (and I'm working on the exceptions, such as this "HM" stuff).

But I'm still trying to understand and figure out how the Unicode strings are formed. As far as I can see, they are mainly the hexadecimal values with some tokens in between.

The literal string information can also be found (barely) here (http://msdn.microsoft.com/en-us/library/69ze775t.aspx)

So far, I have extended the table described here (http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm) to:
Code: [Select]
[LiteralSpecialChars:
 LiteralSpecialChars.Comma: B$ ","                  ; index 0
 LiteralSpecialChars.ForwardSlash: B$ "/"           ; index 1
 LiteralSpecialChars.BackSlash: B$ "\"              ; index 2
 LiteralSpecialChars.Colon: B$ ":"                  ; index 3
 LiteralSpecialChars.Period: B$ "."                 ; index 4
 LiteralSpecialChars.Space: B$ " "                  ; index 5
 LiteralSpecialChars.NewLine: B$ "n"                ; index 6
 LiteralSpecialChars.Tab: B$ "t"                    ; index 7
 LiteralSpecialChars.SingleQuote: B$ "'"            ; index 8
 LiteralSpecialChars.Hyphen: B$ "-"                 ; index 9
 LiteralSpecialChars.Asterix: B$ "*"                ; index 10
 LiteralSpecialChars.OpenBracket: B$ "["            ; index 11
 LiteralSpecialChars.OpenAngleBracket: B$ "<"       ; index 12
 LiteralSpecialChars.CloseBracket: B$ "]"           ; index 13
 LiteralSpecialChars.CloseAngleBracket: B$ ">"      ; index 14
 LiteralSpecialChars.QuestionMark: B$ "?"]          ; index 15


The new test version is here
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 28, 2014, 04:55:16 AM
I'm gradually retrieving the necessary info. For now, I developed a function to convert the literal strings' numerical system (the same idea as hex to int, etc.). It is formed by the letters "A" to "P", and is used for the CRC and the length of the string.

The function is:

Code: [Select]
String = The string to be converted
StringLen = The length of the string (not including the null terminator)
Base = The base to convert from. Possible equates are:
BASE_HEX = 16
BASE_DEC = 10
BASE_OCT = 8
BASE_FOUR = 4
BASE_BIN = 2
pErr = Error flag. Pointer to an int value that is set to TRUE if the function succeeds, FALSE otherwise

Return value in eax.
Remarks: If the function succeeds, pErr = TRUE. If it fails, pErr = FALSE and eax = 0.

Proc LiteralAsciiBaseEx:
    Arguments @String, @StringLen, @Base, @pErr
    Uses esi, ebx, ecx, edi

    mov esi D@String, eax 0, ebx 0
    mov ecx D@StringLen
    mov edi D@pErr | mov D$edi &TRUE

    While ecx <> 0
        mov eax ebx | mul D@Base | mov ebx eax, eax 0
        mov al B$esi
        If_And al >= 'A', al <= 'J'
            sub al 'A' | add al '0'
        Else_if_And al >= 'K', al <= 'P'
            sub al 'K' | add al 'A'
        Else
            xor eax eax
            mov D$edi &FALSE | ExitP
        End_If

        sub al '0'
          ; Cases of Hexa Notation:
        On al > 9, sub al 7
        add ebx eax | inc esi
        dec ecx
    End_While

    mov eax ebx

EndP

Examples of usage:
Example a)
Code: [Select]
[BASE_HEX 16, BASE_DEC 10, BASE_OCT 8, BASE_FOUR 4, BASE_BIN 2]

[Text: B$ "CHHF", 0]
[IsError: D$ 0]
    call LiteralAsciiBaseEx Text, 4, BASE_HEX, IsError

In eax the returned value is 02775 (in hex)

Example b)
Code: [Select]

[BASE_HEX 16, BASE_DEC 10, BASE_OCT 8, BASE_FOUR 4, BASE_BIN 2]

[Text: B$ "P", 0]
[IsError: D$ 0]
    call LiteralAsciiBaseEx Text, 1, BASE_HEX, IsError

In eax the returned value is 0F (in hex)
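
For comparison, the same conversion fits in a few lines of Python. This is only a sketch of the idea (letters A to P as digits 0 to 15), not a translation of the RosAsm routine. `literal_ascii_base` is a made-up name and, like the original, it does not check that a digit is valid for bases below 16.

```python
def literal_ascii_base(text, base=16):
    """Convert a string of letters 'A'..'P' to an integer.

    Returns (value, ok); ok is False if a character is outside 'A'..'P'.
    """
    value = 0
    for ch in text:
        if not "A" <= ch <= "P":
            return 0, False
        value = value * base + (ord(ch) - ord("A"))
    return value, True


value, ok = literal_ascii_base("CHHF")   # input of example a)
print(hex(value), ok)
value, ok = literal_ascii_base("P")      # input of example b)
print(hex(value), ok)
```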

Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 31, 2014, 06:58:59 PM
Ok guys, I think I have succeeded in decoding almost all of the string literal tokens. VS uses some tables to encode the literal string text, for both the wchar and char types (L and R). I had to rebuild my functions from scratch, because the older functions were wrong. I had to create an app in VS containing 65536 Unicode strings and analyse their sequences to understand how the tables were built.

Decoding of the CRC field is OK now: it decodes the Microsoft-encoded characters into the full dword that is the CRC value. (I haven't written any routine to recompute or check the CRC itself, though. The CRC seems to be Adler32, but I'm not sure about that yet.)

An example of decoding a char type is:
PublicSymbol|0:0|0x0002D38C|2:0x0000188C|8|28:??_C@_07CJME@H?3mm?3ss?$AA@|8:const char* Sz_StringId2236 = "H:mm:ss"|2236
PublicSymbol|0:0|0x00093F20|1:0x00092F20|32|95:?Change_State_WaitForStart@CXaSuperiorConn_Starting_State@@QAEXPAVCXaSuperiorConn_Starting@@@Z|116:const char* Sz_StringId11781 = "public: void __thiscall CXaSuperiorConn_Starting_State::Change_State_WaitForStart(class CXaSuperiorConn_Starting *)"|11781

And for the wchar type:
PublicSymbol|0:0|0x00013620|2:0x00000620|10|48:??_C@_19FLGNNEPJ@?$AAA?$AAT?$AAL?$AA?3?$AA?$AA@|5:const wchar_t* Sz_StringWId4569 = "ATL:"|4569
PublicSymbol|0:0|0x00005404|1:0x00004404|18|68:??_C@_1BC@PAPC@?$AA?$CK?$AAn?$AAE?$AAC?$AA1?$AAF?$AA0?$AA0?$AA?$AA@|9:const wchar_t* Sz_StringWId5 = "*nEC1F00"|5
PublicSymbol|0:0|0x00264B38|3:0x00132B38|36|123:??_C@_1CE@NKACHKJK@?$AAM?$AAe?$AAr?$AAd?$AAa?$AA3?$AA7?$AA0?$AA2?$AA6?$AA?$CI?$AA9?$AA0?$AAA?$AA2?$AA?$CJ?$JA?$KC?$AA?$AA@|23:const wchar_t* Sz_StringWId66214 = "Merda37026(90A2)\x903f"|66214

Please, can someone test it to see whether it hangs, etc.?

Apparently it is missing one table for mshtml.pdb, which causes the app to loop infinitely, but otherwise it is almost done.
mshtml contains strings without the proper tokens, and those are causing the loop. For example:
??_C@_1BK@DALFAAEA@?$AAG?$AAu?$AAg?$AAa?$AA?5?$AAP0?Z0?siwf?xOS?$AAL?$AA?$AA@
The literal string above is a Unicode font name of the form "ARP " followed by the hexadecimal values of the Unicode characters. The problem here is the absence of the tokens. They seem to be in another table that I'm currently looking for.
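
One possible reading, based on the grammar comments in the LLVM source quoted earlier in the thread: these strings are not missing any tokens. A byte in the range 0xC1-0xDA or 0xE1-0xFA is written as '?' plus the letter obtained by clearing its top bit, and a byte that is itself a letter, digit, '_' or '$' is written as-is. Under that assumption the fragment `0?Z0?siwf?xOS` decodes as sketched below in Python. The helper is hypothetical and handles only these two forms, not `?$XY` or `?0`-`?9`.

```python
def decode_letter_escapes(fragment):
    """Decode plain characters and '?'+letter escapes into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        nxt = fragment[i + 1:i + 2]
        if ch == "?" and ("A" <= nxt <= "Z" or "a" <= nxt <= "z"):
            out.append(ord(nxt) | 0x80)   # restore the cleared top bit
            i += 2
        elif ch == "?":
            raise ValueError("escape not covered by this sketch")
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


raw = decode_letter_escapes("0?Z0?siwf?xOS")
print(raw.hex())   # each pair of bytes is one big-endian UTF-16 unit
```

The resulting bytes, 30da 30f3 6977 66f8 4f53, are the five escaped characters of the `g_pszAltFontName001` literal shown earlier.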
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on July 31, 2014, 08:39:10 PM
Btw, if someone wants to see the source code, open the file in RosAsm and search for the function "GetFullUndecorate". It is located under the TITLE tab "PUBLICSYMBOLPARSING".

Code: [Select]
Proc GetFullUndecorate:
    Arguments @ptrSymbol, @pDecoratedName, @pOutput, @LiteralStrType
    Local @OriginalStringLen, @pCRCValue
    Uses edi, esi, ecx, edx, ebx

    mov esi D@pOutput
    mov edi D@pDecoratedName
    ..If_And D$edi = '??_C', W$edi+4 = '@_' ; <--------- Parser of literal strings symbols. (This is the signature for the literal symbol)
        lea ebx D@OriginalStringLen | mov D$ebx 0
        lea ecx D@pCRCValue | mov D$ecx 0
        call LiteralSringUndecorate edi, Sz_Symbol_Temp, ebx, ecx, D@LiteralStrType
        mov D$esi+CDiaBSTR.SzSymbolNameDis Sz_Symbol_Temp
        mov D$esi+CDiaBSTR.SymbolLenDis eax
    ..Else ; <--------- Parser of regular symbols with MsDia Api
        call GetUndecoratedSymbolName esi, D@ptrSymbol
        mov ecx D$UndecoratedName.Len
        mov edx D$UndecoratedName.Name
    ..End_If
EndP

The function below is my decoder for literal strings (char and wchar types). I omitted the rest of the functions here, because there are several routines. Open the file in RosAsm to analyse the embedded source.
Code: [Select]
;;
??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
??_C@_ = String token
1 = char type 0 = char, 1 = wchar_t (non negative integer)
7 = string len including the null termination byte
CKCOCGAB@ = CRC
?$AAC = s
?$AAf = t
?$AAm = (
?$AA = 4 065 04
?$AA@ = )
;;

; Returns the len of the undecorated string in eax. If 0, the function failed.
;;
    Refs:
    http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm?tx=14,16
    calling_conventions.pdf
    http://agner.org/optimize/calling_conventions.pdf
   
;;

Proc LiteralSringUndecorate:
    Arguments @pInString, @pOutput, @pOriginalStringLen, @pCRC, @LiteralStrType
    Local @IsUnicode, @StringLiteralFormErr
    Uses esi, edi, ecx, ebx


    mov esi D@pInString
    mov edi D@pOutput
    mov D@IsUnicode &FALSE

    xor eax eax
    If_Or D$esi <> '??_C', W$esi+4 <> '@_' ; exit if either half of the signature does not match
        ExitP
    End_If

    ;  bypass literal token "??_C@_"
    mov ebx D@LiteralStrType
    add esi 6
    If B$esi = '1'
        mov D$ebx LITERAL_STRING_WCHAR
        mov D@IsUnicode &TRUE
    Else_If B$esi <> '0' ; The literal string format is unknown. Return 0 if failure
        mov D$ebx LITERAL_STRING_CHAR
        mov eax 0 | ExitP
    Else
        mov D$ebx LITERAL_STRING_CHAR
    End_If

    inc esi ; bypass unicode/ascii check

    ; calculate original len of the string
    mov ebx D@pOriginalStringLen
    ..If_And B$esi >= '0', B$esi <= '9'
        movzx eax B$esi
        sub al '0' | inc eax | mov D$ebx eax ; a single digit d encodes the value d+1 (digits 0-9 stand for 1-10)
        inc esi ; bypass the len and go to the start of the CRC. (In this case, the CRC is not preceded by a '@' char)
    ..Else
        ; 1st we measure the length of the "len" field, seeking the next '@' char
        mov edx esi
        xor ecx ecx
        While B$edx+ecx <> '@'
            inc ecx
            On ecx > 8, jmp L1> ; for safety, if the len is bigger than 8, exit the loop
        End_While
        L1:
        .If_Or ecx = 0, ecx > 8 ; No len token '@'. Error
            xor eax eax | ExitP
        .Else
            lea edx D@StringLiteralFormErr | mov D$edx 0
            call LiteralAsciiBaseEx esi, ecx, BASE_HEX, edx
            If D@StringLiteralFormErr = &FALSE ; The format of the len is incorrect. Exit
                xor eax eax | ExitP
            Else
                mov D$ebx eax
            End_If
        .End_If

        add esi ecx ; go to the end of the "len" string
        inc esi ; bypass '@' char and go to the start of CRC
    ..End_If

    ; Now we compute the CRC.

    ; 1st we check the len of the string related to the CRC
    mov edx esi
    xor ecx ecx
    While B$edx+ecx <> '@'
        inc ecx
        On ecx > 8, jmp L1> ; for safety, if the len is bigger than 8, exit the loop
    End_While
    L1:

    .If_Or ecx = 0, ecx > 8 ; No ending CRC token '@'. Error
        xor eax eax | ExitP
    .Else
        mov ebx D@pCRC
        lea edx D@StringLiteralFormErr | mov D$edx 0
        call LiteralAsciiBaseEx esi, ecx, BASE_HEX, edx
        If D@StringLiteralFormErr = &FALSE ; The format of CRC is incorrect. Exit
            xor eax eax | ExitP
        Else
            mov D$ebx eax
        End_If
    .End_If

    add esi ecx ; go to the end of the "CRC" string
    inc esi ; bypass '@' char and go to the start of data

    If D@IsUnicode = &TRUE ; const wchar_t

        call DemangleLiteralUnicodeString esi, edi

    Else ; const char

        call DemangleLiteralCharString esi, edi

    End_If

EndP

Decoder for the "A" to "P" table (a replacement for the regular hex string to integer conversion):
Code: [Select]
; used for tables "A" to "P". Len is bigger than 9
;;

Dec     Hex     Literal
0       0           A
1       1           B
2       2           C
3       3           D
4       4           E
5       5           F
6       6           G
7       7           H
8       8           I
9       9           J
10      A           K
11      B           L
12      C           M
13      D           N
14      E           O
15      F           P

;;

; this function does not include the null termination byte
Proc LiteralAsciiBaseEx:
    Arguments @String, @StringLen, @Base, @pErr
    Uses esi, ebx, ecx, edi

    mov esi D@String, eax 0, ebx 0
    mov ecx D@StringLen
    mov edi D@pErr | mov D$edi &TRUE

    While ecx <> 0
        mov eax ebx | mul D@Base | mov ebx eax, eax 0
        mov al B$esi
        If_And al >= 'A', al <= 'J'
            sub al 'A' | add al '0'
        Else_if_And al >= 'K', al <= 'P'
            sub al 'K' | add al 'A'
        Else
            xor eax eax
            mov D$edi &FALSE | ExitP
        End_If

        sub al '0'
          ; Cases of Hexa Notation:
        On al > 9, sub al 7
        add ebx eax | inc esi
        dec ecx
    End_While

    mov eax ebx

EndP
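
For readers who don't follow RosAsm, the same "A" to "P" decoding can be sketched in Python. This is only an illustration of the table above, not a port of the routine: the name `decode_ap` is hypothetical, and it raises an exception where LiteralAsciiBaseEx reports failure through its @pErr flag. The sample tokens are taken from the decorated names quoted later in this thread.

```python
def decode_ap(text: str) -> int:
    """Decode a string of 'A'..'P' letters as hexadecimal digits (A=0 ... P=15)."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("A")
        if not 0 <= digit <= 15:
            # Hypothetical error handling; the RosAsm routine clears a flag instead.
            raise ValueError(f"not an A-P digit: {ch!r}")
        value = value * 16 + digit
    return value


if __name__ == "__main__":
    print(hex(decode_ap("OLOHEKAO")))  # CRC field of ??_C@_0O@OLOHEKAO@...
    print(decode_ap("GA"))             # length field of ??_C@_1GA@...
```

Each letter is simply its distance from 'A', so no separate branches for "A" to "J" and "K" to "P" are needed once the value is treated as a number rather than as an ASCII hex digit.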

The function below gets the undecorated names of normal symbols through MSDia:
Code: [Select]
Proc GetUndecoratedSymbolName:
    Arguments @pCDiaBSTR, @ptrSymbol
    Local @bstrName
    Uses esi, ecx, ebx, edx, edi

    mov edi D@pCDiaBSTR
    mov D$edi+CDiaBSTR.SzSymbolNameDis 0
    mov D$edi+CDiaBSTR.SymbolLenDis 0
    mov D@bstrName 0
    lea ebx D@bstrName
    icall DIA_SYMBOL_GET_UNDECORATEDNAMEEX D@ptrSymbol, &UNDNAME_COMPLETE, ebx
    .If eax = &S_OK
        call WriteTempSymbol D@pCDiaBSTR, D@bstrName
        mov eax &TRUE
    .Else
        xor eax eax
    .End_If

EndP
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 02, 2014, 12:15:48 PM
OK, executable working fine. I´m preparing the dll and the tutorials right now :)
Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on August 03, 2014, 10:39:41 AM
You're a hard working man, Gustavo.  :t

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 03, 2014, 02:06:18 PM
Tks Gunther  :t :t

I'm doing my best to make the library as correct as possible, so that other programmers can write their own MSDia-based parsers as easily as possible.

With the dll i´m building it will be possible to write a converter for pdb to inc (to be used in C, masm, rosasm, Fasm, Nasm etc), and also the same dll can be used as a starting point for a symbolic parser for debuggers.

Since JJ helped with a fix to the memory allocation function, I can trace other errors on the huge files I have here. Today I loaded a 150 MB pdb file (from Windows 8) and the app did not crash; it simply reported that the file has no content that msdia120 is able to parse.

Also, on another file I'm testing (23 MB), I found a small bug in ws_printA: it is unable to handle a string bigger than 1024 bytes. To solve this particular problem, I'm porting StringCbPrintf (http://msdn.microsoft.com/en-us/library/windows/desktop/ms647510%28v=vs.85%29.aspx) to RosAsm and will also include it in the dll as an exported function :)
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 10, 2014, 01:10:06 AM
A somewhat more stable version. There are still some minor errors with Unicode string chars. When it encounters a \xABCD it usually identifies it correctly but, in some cases, the result is not accurate. Nevertheless, I'm uploading the latest version here and finishing the tutorial before building the dll version.

The output can now identify public symbols (literal strings) as what they are.

So, when a public symbol is identified as Ascii, the text token starts with "AsciiString". When it is identified as Unicode, it starts with "UnicodeString". In any other case, it is simply identified as "PublicSymbol".

Note: A Unicode string may be represented as Latin Unicode (with the common Ascii table) or with foreign chars (Asian etc). Non-Latin chars are always written as "\xABCD", where ABCD is the hexadecimal value of the char, exactly as you would write it in C.
Examples:
Code: [Select]
AsciiString|0xEBE74A0E|0x0000318C|2:0x0000018C|14|39:??_C@_0O@OLOHEKAO@The?5answer?5is?$AA@|14:const char* Sz_StringId1469 = 'The answer is'|1469

Code: [Select]
UnicodeString|0x470AF02B|0x000031B0|2:0x000001B0|96|187:??_C@_1GA@EHAKPACL@?$AAf?$AA?3?$AA?2?$AAd?$AAd?$AA?2?$AAv?$AAc?$AAt?$AAo?$AAo?$AAl?$AAs?$AA?2?$AAc?$AAr?$AAt?$AA_?$AAb?$AAl?$AAd?$AA?2?$AAs?$AAe?$AAl?$AAf?$AA_?$AAx?$AA8?$AA6?$AA?2?$AAc@|32:const wchar_t* Sz_StringWId1466 = 'f:\dd\vctools\crt_bld\self_x86\c'|1466

Code: [Select]
UnicodeString|0xDA027A9A|0x00264B38|3:0x00132B38|36|123:??_C@_1CE@NKACHKJK@?$AAM?$AAe?$AAr?$AAd?$AAa?$AA3?$AA7?$AA0?$AA2?$AA6?$AA?$CI?$AA9?$AA0?$AAA?$AA2?$AA?$CJ?$JA?$KC?$AA?$AA@|23:const wchar_t* Sz_StringWId66214 = 'Merda37026(90A2)\x90a2'|66214

Code: [Select]
PublicSymbol|1:0|0x00002090|1:0x00001090|95|20:__ValidateImageBase|20:__ValidateImageBase|1487
Code: [Select]
PublicSymbol|1:0|0x00001150|1:0x00000150|93|31:?StringCbPrintfA@@YAJPADIPBDZZ|67:long __cdecl StringCbPrintfA(char *,unsigned int,char const *,...)|1378
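
The header of the literal-string names above (flag, length, CRC) can be pulled apart as in the following sketch. It is Python, not the RosAsm routine, and it only assumes what the examples show: `_0` marks a char string and `_1` a wchar_t string, a length of 0 to 9 is a single digit with no '@' after it, and bigger lengths and the CRC are "A" to "P" digits closed by '@'. The names `decode_ap`, `read_ap_field` and `parse_literal_header` are made up for the illustration.

```python
PREFIX = "??_C@_"


def decode_ap(text):
    """Decode 'A'..'P' letters as hexadecimal digits (A=0 ... P=15)."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("A")
        if not 0 <= digit <= 15:
            raise ValueError(f"not an A-P digit: {ch!r}")
        value = value * 16 + digit
    return value


def read_ap_field(name, pos):
    """Read an A-P number terminated by '@'; return (value, position after '@')."""
    end = name.index("@", pos)      # raises ValueError if the '@' is missing
    if not 1 <= end - pos <= 8:     # same 1..8 digit limit as the RosAsm code
        raise ValueError("bad field length")
    return decode_ap(name[pos:end]), end + 1


def parse_literal_header(name):
    """Return (is_unicode, length, crc, encoded_data) for a ??_C@_ name."""
    if not name.startswith(PREFIX):
        raise ValueError("not a string literal symbol")
    pos = len(PREFIX)
    flag = name[pos:pos + 1]
    if flag not in ("0", "1"):
        raise ValueError("bad char/wchar_t flag")
    pos += 1
    first = name[pos:pos + 1]
    if first and first in "0123456789":   # short length: one digit, no '@'
        length, pos = int(first), pos + 1
    else:
        length, pos = read_ap_field(name, pos)
    crc, pos = read_ap_field(name, pos)
    return flag == "1", length, crc, name[pos:]


if __name__ == "__main__":
    sample = "??_C@_0O@OLOHEKAO@The?5answer?5is?$AA@"
    is_unicode, length, crc, data = parse_literal_header(sample)
    print(is_unicode, length, hex(crc), data)
```

For the first AsciiString example this gives a length of 14 and a CRC of 0xEBE74A0E, the same values that appear in the dumper's output line. The encoded data that follows the header is left untouched here.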
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 14, 2014, 09:48:19 AM
A preview of the help file. I plan to release it in chm format rather than pdf, to make searching inside it easier.

Also, I'm currently finishing the help for the public symbols. I added 16 or so more and plan to add a few others I found here and there, including static float and double data, static GUIDs, pch (precompiled headers), IIDs, CLSIDs and import descriptors.

For the rest of the public symbols, I'm also planning to add a routine that checks whether an undecorated symbol is really code or data (when identification in the "normal" way failed).
Public symbols also matter for disassemblers, since they can be used as guidance for correctly identifying code/data chunks.

I would like to know what you guys think.... If the help file is readable, if it is easy to understand how it works etc.
Title: Re: RosAsm PDB Dumper v1.0
Post by: Gunther on August 14, 2014, 10:17:33 AM
Hi Gustavo,

the PDF file looks good. I think your explanations are very detailed. That's good for interested coders.  :t

Gunther
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 14, 2014, 03:17:08 PM
Hi Gunther. Many thanks.  :biggrin:

I'm struggling with the pdb parser because I want it to be as accurate as possible, so anyone can build whatever tool is needed: a pdb to inc converter, better pdb support in a disassembler or debugger (RosAsm, IdaPro, Olly etc), automated and faster generation of class wrappers (possible once I manage to properly parse all the class info and build the related structures or equate enumerations), or even, it seems, tiny lib files written from information gathered from the pdb. The possible uses of the information inside a pdb seem huge.

Too bad M$ made such a mess of it. Although MsDia is a handy tool, it is a bit complicated to work with, because the data is not always easy to enumerate and retrieve in the proper manner.

Once I succeed in finishing the public symbols the way they need to be, and finish the help file, I'll build the dll. For the next versions I'm planning to store those tokens as binary data, because in binary form it should be even easier to grab more info and the export file may be much smaller. (Plus, with the binary data, I can finally try finishing a Flirt system I have been planning for RosAsm for years.)
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on August 20, 2014, 02:06:52 PM
OK, I'm delaying the release because I'm fine-tuning the results. Public symbol identification is the most painful thing ever, but it provides valuable information. So far I have achieved an error margin of about 3% unidentified symbols, taking shell32.pdb as an example. Out of 24736 existing symbols (code, data, structures, linker info, delay-load helper routines, IAT data etc), it only missed 751. In fact, it didn't really "miss" them; it simply couldn't identify them, either by their unmangled name or by chunk size (the unidentified info is mainly data). I'm trying to get closer to 99% correct identification before release.

Also, I enabled the ability to download symbols from the MS server based on the loaded executable/dll. For this, I'll later add a match check between symbol and executable, and maybe force a match even when they seem different (this is possible when the symbols and the executable have different ages but the same signature).

I won't upload the latest version now, because I need to change one routine for the temporary download of the pdb files. For the executable demo, I'll probably add a browse-for-folder dialog to choose where to save the pdb when you parse an executable whose pdb has to be downloaded.


Btw... one question, to better guarantee correctness. Does someone have a list of all existing GUIDs (I mean, their names and values)? I tried to use Japeth's comview tool here to create a text file that may help the parsing identification, but ComView is not opening here (probably because my OS is too heavy right now).
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 04, 2014, 03:17:02 PM
Can someone please test this to see if it works on systems other than WinXP?

Added more Public Symbol structures and a progress bar showing the parsing position (it was necessary for huge pdbs).

Also... how do I use multithreading? I mean, I plan to place the main function inside a thread with CreateThread, but since it now uses a progress bar, does that mean I have to put the progress bar function in another thread too?


Also... how do I pause/resume/stop the progress bar?
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 04, 2014, 06:43:36 PM
Damn, i hate threads ! :icon_mrgreen:

I'm testing another version but I can't make it work. It keeps crashing all the time.

What's the point of threads after all? Do they release memory for other applications? If so, does someone have a working example/tutorial on how to use them (also for multithreading)?
Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 05, 2014, 01:04:13 AM

dbghelp.dll offers a set of functions to browse the pdb.
Perhaps it could help; there is no need for a thread.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 05, 2014, 06:46:52 AM
dbghelp is not the best way to retrieve the pdb info, unfortunately.

It is mainly a kind of wrapper for MsDia (the older version on your system, v 1.09, I presume) and does not have all the functions MsDia has.
This is why I'm putting so much effort into creating a parser using msdia 1.20 and not dbghelp.

One question. If there's no need for threads, how can I keep the app free to use while it is parsing? I mean, try to parse a "huge" pdb (for example, one of 12 MB or 22 MB) and then try to move its window or the progress bar. You won't be able to, since it is parsing the "huge" database. How can I handle this? I mean, how do I keep the rest of the app available while the pdb is being parsed, regardless of how big the pdb is?

Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 05, 2014, 06:59:21 PM

I have the msdia100.dll provided by C++ Express, version 10.00.40219.01.
dumpbin shows no exported functions for this dll, so it must not be easy to use.
dbghelp.dll is used by windbg, and I don't see how msdia can be better.

If you are in trouble with the thread, write your code without the thread and add it at the end.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 06, 2014, 04:52:42 AM
Hi

From what I read, msdia100 is buggy. They fixed some problems in msdia120. DbgHelp is mainly a wrapper. I'm not sure if the latest version of dbghelp (for VS 2013) uses msdia120. What I know is that dbghelp does not use all the functions in msdia (at least the version that does not wrap msdia120). I'm not saying dbghelp is useless. It just doesn't use all the functionality of msdia.

About threads, I tried adding one at the end, but the app still has to wait for the parser to finish and is not released before the job ends. For small pdbs this is not a problem, but for big files you have to wait until the parse finishes.

Try parsing the file "wmp_notestroot.pdb" (23 MB, for Windows 8) to see what I mean. It has 76227 symbols. Maybe I'm placing the thread in the wrong place? The main parsing engine is at GetPDBInfo.

If I place the thread there, it won't make any difference (also, it crashed whenever I tried, or kept showing the error message in ScanHeaderFile, because the thread ended before it parsed).
Code: [Select]
Proc GetPDBInfo::
    Arguments @lpFileName, @Flags
    Local @InputFileLen, @hDiaLib
    Uses esi, edi, ecx, ebx, edx

    lea ebx D@hDiaLib | mov D$ebx 0
    call GetMSDiaLibrary D@lpFileName, ebx
    ..If eax = &TRUE
        call StartPDBProgressBar &NULL
        mov D$PdbInfoMem 0
        call CreatePDBinMemory D@lpFileName, PdbInfoMem
        .If eax <> 0
            mov D@InputFileLen eax

            mov esi D$PdbInfoMem
            Test_If D@Flags PDB_PARSE_ENUM
                call DumpAllEnums D$g_pDiaSymbol, esi | mov esi eax
            Test_End

            Test_If D@Flags PDB_PARSE_TYPEDEF
                call DumpAllTypeDefs D$g_pDiaSymbol, esi | mov esi eax
            Test_End

            Test_If D@Flags PDB_PARSE_UDT
                call DumpAllUDTs D$g_pDiaSymbol, esi | mov esi eax
            Test_End

            Test_If D@Flags PDB_PARSE_PUBLIC_SYMBOL
                call DumpAllPublicSymbols D$g_pDiaSymbol, esi | mov esi eax
            Test_End

            call ClosePDBProgressBar

            If D$g_pDiaSymbol <> 0
                call ReleaseInterface D$g_pDiaSymbol
                mov D$g_pDiaSymbol 0
            End_If

            If D$g_pDiaSession <> 0
                call ReleaseInterface D$g_pDiaSession
                mov D$g_pDiaSession 0
            End_If

            If D$g_pDiaDataSource <> 0 ; release the data source last
                call ReleaseInterface D$g_pDiaDataSource
                mov D$g_pDiaDataSource 0
            End_If

            mov ecx esi | sub ecx D$PdbInfoMem
            If ecx > D@InputFileLen
                Align_On MEM_ALIGNMENT ecx
                call ReAllocateMemory D$PdbInfoMem, ecx
                mov D$PdbInfoMem 0
            Else_If ecx = 0 ; Nothing was parsed
                call 'RosMem.Heap_Operator_Delete' D$PdbInfoMem
                xor eax eax
            Else
                mov eax D$PdbInfoMem
            End_If

        .End_If
    ..End_If

    mov esi eax

    ; release msdia library before exit
    If D@hDiaLib <> 0
        call 'KERNEL32.FreeLibrary' D@hDiaLib
    End_If

    mov eax esi

EndP

The routine that calls it is:

Code: [Select]
Proc ScanHeaderFile:
    Arguments @Adressee, @Message, @wParam, @lParam
    Local @hRtfEdit

     pushad

    ...If D@Message = &WM_COMMAND                  ; User action

        ..If D@wParam = IDC_EXIT
            call HeaderScanCleanUp D@Adressee
            call 'USER32.EndDialog' D@Adressee &NULL
        ..Else_If D@wParam = IDC_CLOSE

            call HeaderScanCleanUp D@Adressee

        ..Else_If D@wParam = IDC_OPEN

                mov B$HeaderSaveFilter 0
                move D$ofn.hwndOwner D@Adressee
                move D$ofn.hInstance D$hInstance
                mov D$ofn.lpstrFilter HeaderFileFilter
                call 'comdlg32.GetOpenFileNameW' ofn

            .If eax = &TRUE
                call HeaderScanCleanUp D@Adressee
                call GetPDBInfo HeaderSaveFilter, (PDB_PARSE_TYPEDEF+PDB_PARSE_ENUM+PDB_PARSE_UDT+PDB_PARSE_PUBLIC_SYMBOL);, PDBFlag ; 8 = 1st check only enum, UDT = 010, typedef = 4
                If eax <> 0
                    mov D$PdbMem eax
                    call 'user32.GetDlgItem' D@Adressee, IDC_TEXT | mov D@hRtfEdit eax
                    call 'USER32.SendMessageA' eax, &EM_SETLIMITTEXT, 0, 0 ; extends the text limit of the edit control to 0-1
                    call 'USER32.SendMessageA' D@hRtfEdit, &EM_SETSEL, 0-2, 0-2

                    ;mov esi D$PdbMem Development routine to Test to see if everything went fine. Not needed anylonger
                    ;While B$esi <> 0
                     ;   inc esi
                    ;End_While

                    call 'USER32.SendMessageA' D@hRtfEdit, &EM_REPLACESEL, 0, D$PdbMem

                    ;call 'USER32.SetDlgItemTextA' D@Adressee, IDC_TEXT, D$PdbMem

                Else
                    call 'User32.MessageBoxA' &NULL, Sz_PDBError, {B$ "Error !!!", 0}, &MB_ICONERROR__&MB_ICONWARNING__&MB_TOPMOST
                End_If
            .End_If

        ..End_If


    ...Else_If D@Message = &WM_INITDIALOG

    ...Else_If D@Message = &WM_CLOSE

         call HeaderScanCleanUp D@Adressee
         call 'USER32.EndDialog' D@Adressee &NULL

    ...Else
        popad | mov eax &FALSE | ExitP

    ...End_If

L9: popad | mov eax &TRUE

EndP

Do I need to create the thread in the D@wParam = IDC_OPEN branch?
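
A common way to keep a dialog responsive during a long parse is to run the parse in a worker thread and let it report progress to the UI thread, which only updates the progress bar. In Win32 terms that usually means CreateThread for the worker and PostMessage back to the dialog. The sketch below shows just the shape of that arrangement in Python, with a queue standing in for the message loop. Everything in it (`parse_symbols`, `run_with_progress`, the fake symbol list) is hypothetical and is not taken from the PDB Dumper source.

```python
import queue
import threading


def parse_symbols(symbols, progress):
    """Worker: 'parse' every symbol and report progress through the queue."""
    parsed = []
    for index, name in enumerate(symbols, start=1):
        parsed.append(name.upper())                 # stand-in for the real parsing
        progress.put(("progress", index, len(symbols)))
    progress.put(("done", parsed))                  # the last message carries the result


def run_with_progress(symbols):
    """UI side: start the worker, then only consume messages until it is done."""
    progress = queue.Queue()
    worker = threading.Thread(target=parse_symbols, args=(symbols, progress))
    worker.start()
    steps = []
    while True:
        message = progress.get()                    # a real UI would poll from its message loop
        if message[0] == "done":
            worker.join()
            return message[1], steps
        steps.append(message[1:])                   # here the progress bar would be updated


if __name__ == "__main__":
    result, steps = run_with_progress(["alpha", "beta", "gamma"])
    print(result)
    print(steps)
```

The point of the "done" message is that the result is handed over by the worker itself, so the caller never reads the output before the parse has actually finished.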
Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 08, 2014, 04:08:02 AM
To simplify your problem, you can choose the normal way to get a pointer to IDiaDataSource.
That is:
make a dynamic link to DllRegisterServer and DllUnregisterServer.
They are just stdcall PROTOs with no parameters.

init:  invoke DllRegisterServer
This loads the clsid and the interfaces into the registry.

Code: [Select]
invoke CoCreateInstance,addr CLSID_DiaSource, NULL,CLSCTX_INPROC_SERVER,addr IID_IDiaDataSource,addr ppvIDiaDataSource ;ppv out
;eax = S_OK on success
.if ppvIDiaDataSource != 0
    IDiaDataSource loadDataFromPdb,addr pdbfile ;OK
    .if eax == S_OK
        IDiaDataSource loadDataForExe,addr exefile,NULL,NULL ;catastrophic failure, needs more experimenting
        .if eax != S_OK
            ;invoke GetLastError
            invoke LireEr_Com,eax
            .if eax != 0
                invoke MessageBox,NULL,edx,ecx,MB_OK
            .else
                invoke MessageBox,NULL,TXT("IDiaDataSource loadDataForExe"),TXT("ERROR not FOUND"),MB_OK
            .endif
        .endif
        IDiaDataSource Release
    .endif
.endif


end: invoke DllUnregisterServer
Take care: some interfaces use a virtual stdcall that needs special handling to avoid a crash.

Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 08, 2014, 03:24:56 PM
Actually there is no need to use DllRegisterServer or CLSCTX_INPROC_SERVER

MSdia can be loaded "unregistered". The routine i used for this is:

Code: [Select]
Proc GetMSDiaLibrary:
    Arguments @SzOutput, @phDiaLib
    Uses esi, edi, ecx, ebx, edx

    call InitializeDiaInterface {U$ "msdia120.dll" 0}, CLSID_DiaSourceAlt, IID_IDiaDataSource, g_pDiaDataSource, D@phDiaLib
    If eax <> &S_OK
        xor eax eax
        ExitP
    End_If


    icall DIA_LOADDATA_FROMPDB D$g_pDiaDataSource, D@SzOutput
    .If eax <> &S_OK
        call TryLoadingSymbolForExe D@SzOutput
        On eax = &FALSE, ExitP
    .End_If

    icall DIA_OPENSESSION D$g_pDiaDataSource, g_pDiaSession
    If eax <> &S_OK
        call ReportWinError {'DiaDataSource: OpenSession' 0} ; show cause of failure
        xor eax eax
        ExitP
    End_If

    icall DIA_SESSION_GET_GLOBALSCOPE D$g_pDiaSession, g_pDiaSymbol
    If eax <> &S_OK
        call ReportWinError {'DiaSession: GlobalScope' 0} ; show cause of failure
        xor eax eax
    Else
        mov eax &TRUE
    End_If

EndP

Code: [Select]


; Initialize an object not registered
; http://msdn.microsoft.com/en-us/library/windows/desktop/ms680760%28v=vs.85%29.aspx

Proc InitializeDiaInterface:
    Arguments @lpLibFileName, @pClsid, @pIID, @pOut, @phDiaLib
    Local @hpfDllGetClassObject, @ppv
    Uses esi, edi, ecx, ebx, edx

    call 'KERNEL32.LoadLibraryExW' D@lpLibFileName, 0, &LOAD_WITH_ALTERED_SEARCH_PATH
    .If eax = 0
        call ReportWinError {'LoadLibraryExW' 0} ; show cause of failure
        mov eax &E_FAIL
        ExitP
    .End_If

    mov edi D@phDiaLib | mov D$edi eax

    call 'KERNEL32.GetProcAddress' eax, {B$ "DllGetClassObject", 0}
    .If eax = 0
        call ReportWinError {'msdia120.dll' 0} ; show cause of failure
        mov eax &E_FAIL
        ExitP
    .End_If
    mov D@hpfDllGetClassObject eax

    lea ecx D@ppv | mov D$ecx 0
    call D@hpfDllGetClassObject D@pClsid, IID_IClassFactory, ecx
    mov esi eax
    .If eax = &S_OK ; points to CreateInstance
        icall ICLASS_FACTORY_CREATE_INSTANCE D@ppv, 0, D@pIID, D@pOut
        mov esi eax
        call ReleaseInterface D@ppv
    .End_If

    mov eax esi

EndP

The function is called like this:
Code: [Select]
[HeaderSaveFilter: U$ 0 #&MAX_PATH] ; The path and name of the pdb file (In unicode string)
[hDiaLib: D$ 0] ; A variable that is used to store the msdia120.dll handle

    call GetMSDiaLibrary HeaderSaveFilter, hDiaLib


MSdia exports the function DllGetClassObject, which is used to load unregistered classes:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680760%28v=vs.85%29.aspx

So, if a dll exports this function, you can simply use the routines I wrote to load it without registering the interface. I'm not sure which other dlls have this export but, from what I saw, whenever a dll has it (and uses classes) you can use it, and therefore you can put the dll in whatever folder you like and avoid relying on the system folder. This is particularly good if you have 2 dlls containing the same (or similar) interfaces but want to use the most recent one from your own directory and not the one provided in the system32 directory.

For loading the symbols from an exe/dll, your example may not work if you don't set the proper servers, like this:

Code: [Select]
Proc TryLoadingSymbolForExe:
    Arguments @SzOutput
    Local @callback

    icall DIA_LOADDATA_FOREXE D$g_pDiaDataSource, D@SzOutput, &NULL, &NULL
    If eax = &S_OK
        mov eax &TRUE
        ExitP
    End_If

    icall DIA_LOADDATA_FOREXE D$g_pDiaDataSource, D@SzOutput, {U$ "srv*C:\temp*http://srv.symbolsource.org/pdb/Public;srv*C:\temp*http://symbols.mozilla.org/firefox;srv*C:\temp*http://referencesource.microsoft.com/symbols;srv*C:\temp*http://msdl.microsoft.com/download/symbols", 0}, &NULL

    ..If eax <> &S_OK
        call ReportWinError {'DiaDataSource: LoadDataForExe' 0} ; show cause of failure
        xor eax eax
        ExitP
    ..End_If
    mov eax &TRUE

EndP

The 3rd parameter of the loadDataForExe function (the 2nd one in your function) tells it where to look for the pdb. If it is NULL, the function tries to load the pdb from the same directory as your main file. If it can't find it, it returns something other than S_OK. Then you can simply check whether the proper servers have a pdb to load. (I presume you did that in LireEr_Com?)

Since I'm still testing loadDataForExe, the downloaded pdbs are saved in "C:\temp". Later I'll write some routines to let the user choose the directory for the downloaded files, and also some routines to check whether the loaded pdb matches the exe/dll (through its age and signature). If it doesn't match, there is a way to force a match by simply resetting its age, as long as the signature is the same.

Also, it is important that symsrv.dll is in the same directory as your msdia120.dll. Otherwise it may not work as expected.

For the syntax used to load the pdb/dbg from the servers (MS or other), the correct form is:
srv*C:\temp*http://msdl.microsoft.com/download/symbols

that means:
srv = the token to activate the download
* = a separator. Must be used
c:\temp = the directory for the downloaded files
* = a separator. Must be used
http://msdl.microsoft.com/download/symbols = the server from where you want to dl the pdb

Also, to search and download from more than one server, you must separate the entries with a ";" token, like this:
srv*C:\temp*http://referencesource.microsoft.com/symbols;srv*C:\temp*http://msdl.microsoft.com/download/symbols

And the whole string MUST be in Unicode format. This is why I used the "U$" datatype and not "B$" in icall DIA_LOADDATA_FOREXE. In RosAsm, the "U$" token denotes a Unicode string.
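
The search-path syntax described above is easy to generate instead of typing by hand. A minimal sketch in Python, assuming one cache directory shared by all servers; `build_symbol_path` and `to_wide_string` are hypothetical helper names, not part of msdia or of the dumper:

```python
def build_symbol_path(cache_dir, servers):
    """Join one 'srv*<cache>*<server>' entry per server with ';'."""
    if not servers:
        raise ValueError("at least one server is required")
    return ";".join(f"srv*{cache_dir}*{url}" for url in servers)


def to_wide_string(text):
    """Encode as a null-terminated UTF-16LE buffer, the form a wide-char Windows API expects."""
    return (text + "\0").encode("utf-16-le")


if __name__ == "__main__":
    path = build_symbol_path(
        r"C:\temp",
        [
            "http://referencesource.microsoft.com/symbols",
            "http://msdl.microsoft.com/download/symbols",
        ],
    )
    print(path)
    print(len(to_wide_string(path)), "bytes as a wide string")
```

Every entry gets its own `srv*` token and cache directory, and the ';' only appears between entries.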
Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 08, 2014, 04:04:45 PM

Quote
Actually there is no need to use DllRegisterServer or CLSCTX_INPROC_SERVER

A beautiful assertion without proof.
Microsoft uses CLSCTX_INPROC_SERVER in all of its samples.
To assert that, you need to know what the DllRegisterServer function does.
And if that is not enough, you can look into why your program had so much trouble working.
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 08, 2014, 04:09:45 PM
I´m not following M$ example.
See here
http://stackoverflow.com/questions/2466138/windows-c-how-to-use-a-com-dll-which-is-not-registered
http://masm32.com/board/index.php?topic=3441.0
Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 08, 2014, 04:22:22 PM

Those two links do not guarantee that it works perfectly.
The answer is inside the dll itself, in its functions.
Quote
msdia100.dll
   DllCanUnloadNow,DllGetClassObject,DllRegisterServer,DllUnregisterServer,VSDllRegisterServer
   VSDllUnregisterServer

Find out how to use VSDllRegisterServer and VSDllUnregisterServer and you will surely have an answer.

Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 08, 2014, 04:34:33 PM
So far it is working correctly. The symbols are being parsed, and I'm using the M$ sample as guidance for retrieving other symbols.

With IdaPro, take a look at DllRegisterServer and DllGetClassObject (called through call D@hpfDllGetClassObject) to see what I mean. DllRegisterServer mainly writes to the registry classes like DiaDataSourceCLSID, DiaAltSourceCLSID and DiaStackWalkerCLSID, which are exactly the classes used/retrieved in DllGetClassObject: CLSID_DiaSource, CLSID_DiaStackWalker, CLSID_DiaSourceAlt.

DllGetClassObject retrieves all the classes the library needs. Registering them is redundant.

Part of the problem is also that I'm not using msdia100! I'm using msdia120 (I provided the dll and the link in the 1st post). From what I read, msdia100 is buggy, while msdia120 has some of those problems fixed. This is why I'm using the newer version and not the one inside my system32 folder.
Title: Re: RosAsm PDB Dumper v1.0
Post by: ToutEnMasm on September 08, 2014, 04:40:11 PM

Are you sure that all the initializations are made by the DllRegisterServer function, and only by it?
There is also VSDllRegisterServer.
Using my search tool, I found how to use those functions:
http://bbs.pediy.com/showthread.php?t=103147
Title: Re: RosAsm PDB Dumper v1.0
Post by: guga on September 08, 2014, 05:02:31 PM
Thanks, I'll download it for later use.

But... I really don't understand why one would use DllRegisterServer if DllGetClassObject (the one in msdia120) uses the exact same classes without the need to register them.

I have IdaPro open and I'm analyzing both functions; the same classes that DllRegisterServer writes to the registry are also used in DllGetClassObject. This seems to be what DllGetClassObject is for: it retrieves all the classes needed to use msdia. Since the symbols are being parsed properly, the same way as in the M$ example, using DllRegisterServer seems pointless. Well, at least for msdia. I'm not talking about other dlls, because I didn't analyze them.

About the problems in the older version of msdia, I didn't bookmark them to show you. But here is one example:
https://groups.google.com/forum/#!topic/google-breakpad-discuss/XixkXpEaS-I

Also, it seems that the server technique does not work on Windows 7 with msdia100:
http://www.ask-coder.com/389202/windowsc-how-to-use-a-com-dll-which-is-not-registered

So it is better to use msdia120, which is the latest version of msdia, and avoid using DllRegisterServer.