RosAsm PDB Dumper v1.0

Started by guga, July 24, 2014, 04:17:28 PM

guga

A small app that parses information from pdb files.

This version is still an executable only, for testing purposes. If someone can test it on other OSes I'll appreciate it. It works on XP; I haven't tested it on WinNT, Windows Vista, Windows 7, 8, etc.

I'm currently writing a tutorial about the exported functions, since the app will be turned into a DLL and you will need documentation for the parsed information.

After that it will be easier for anyone to build a pdb2inc, for example.

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

sinsi

Windows 8.1 Pro x64, nothing happens with any pdb I choose, even running as admin.
Also, the title "C/C++ Comment Remover" is a mistake, yes?

guga

Yeah. The comment remover is just the title from an older app I built this on. I forgot to fix the title.

About PDBs from Windows 8: I have some of those here (and from Windows 7). It seems that since Windows 7 all the PDBs contain is the public symbols themselves (i.e. the variable names), compiland information, module names and the FPO data. They do not seem to contain enumeration values, typedefs or UDTs.

I'll include parsing of the public symbols later, although it won't be helpful information for building a pdb2inc.

Try loading an XP symbol file, or any other you created in VS2008/2010, and see if the data is there. Or even better, can you please check whether the parser works on Windows 8 with the symbols I'm uploading? Here it correctly parses all the symbols that contain enums, typedefs and UDTs.

simpleDBGTest.zip - dbg on Windows XP
1394bus.zip - result of the CV SDK compiled. I tested this on a Windows 7 pdb and there are no enums, UDTs or typedefs

guga

Btw, this newer version should display the proper error message.

guga

New version. A small test to display the public symbols. I'm tired, so it won't do much except display them and their indexes. No undecorating or displaying of their offsets yet. I'll do it tonight.

Gunther

Hi Gustavo,

your application needs ROSMEM.dll to run properly, doesn't it?

Gunther
You have to know the facts before you can distort them.

Vortex

True. The application needs that DLL.

guga

Yes, it is in the zip attached to the 1st post. The other posts (the smaller zips) contain only updates of the main executable.
The zip file in the 1st post contains:

  • DiaDump7b.exe (The older version)
  • msdia120.dll
  • RosMem.dll

Gunther

Thank you Erol and Gustavo. I'll test it out.

Gunther

guga

OK, guys, I'm succeeding in parsing the public symbols and I'm making the proper tokens for that.

One question: how do you undecorate a string literal name?

I mean, the UnDecorateSymbolName API (or __unDNameEx from msvcrt and others) is fine for undecorating functions, structures etc., but for strings (Unicode and ASCII) it fails badly. All it returns is "string". I want the whole string back.

I tried the API DsCrackUnquotedMangledRdn and it also fails.

The only place I found some valid info about Microsoft string literal mangling is here, in the function
void MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL, raw_ostream &Out). Also, some info on demangling is found here

This function describes exactly how a string is encoded to produce things like this:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@

The question is: how to decode it?

The encoding algorithm from the LLVM project (link above) for Microsoft string literal mangling is this:

void MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL,
                                                     raw_ostream &Out) {
  // <char-type> ::= 0   # char
  //             ::= 1   # wchar_t
  //             ::= ??? # char16_t/char32_t will need a mangling too...
  //
  // <literal-length> ::= <non-negative integer>  # the length of the literal
  //
  // <encoded-crc>    ::= <hex digit>+ @          # crc of the literal including
  //                                              # null-terminator
  //
  // <encoded-string> ::= <simple character>           # uninteresting character
  //                  ::= '?$' <hex digit> <hex digit> # these two nibbles
  //                                                   # encode the byte for the
  //                                                   # character
  //                  ::= '?' [a-z]                    # \xe1 - \xfa
  //                  ::= '?' [A-Z]                    # \xc1 - \xda
  //                  ::= '?' [0-9]                    # [,/\:. \n\t'-]
  //
  // <literal> ::= '??_C@_' <char-type> <literal-length> <encoded-crc>
  //               <encoded-string> '@'
  MicrosoftCXXNameMangler Mangler(*this, Out);
  Mangler.getStream() << "\01??_C@_";

  // <char-type>: The "kind" of string literal is encoded into the mangled name.
  // TODO: This needs to be updated when MSVC gains support for unicode
  // literals.
  if (SL->isAscii())
    Mangler.getStream() << '0';
  else if (SL->isWide())
    Mangler.getStream() << '1';
  else
    llvm_unreachable("unexpected string literal kind!");

  // <literal-length>: The next part of the mangled name consists of the length
  // of the string.
  // The StringLiteral does not consider the NUL terminator byte(s) but the
  // mangling does.
  // N.B. The length is in terms of bytes, not characters.
  Mangler.mangleNumber(SL->getByteLength() + SL->getCharByteWidth());

  // We will use the "Rocksoft^tm Model CRC Algorithm" to describe the
  // properties of our CRC:
  //   Width  : 32
  //   Poly   : 04C11DB7
  //   Init   : FFFFFFFF
  //   RefIn  : True
  //   RefOut : True
  //   XorOut : 00000000
  //   Check  : 340BC6D9
  uint32_t CRC = 0xFFFFFFFFU;

  auto UpdateCRC = [&CRC](char Byte) {
    for (unsigned i = 0; i < 8; ++i) {
      bool Bit = CRC & 0x80000000U;
      if (Byte & (1U << i))
        Bit = !Bit;
      CRC <<= 1;
      if (Bit)
        CRC ^= 0x04C11DB7U;
    }
  };

  auto GetLittleEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = Index % CharByteWidth;
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  auto GetBigEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = (CharByteWidth - 1) - (Index % CharByteWidth);
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  // CRC all the bytes of the StringLiteral.
  for (unsigned I = 0, E = SL->getByteLength(); I != E; ++I)
    UpdateCRC(GetLittleEndianByte(I));

  // The NUL terminator byte(s) were not present earlier,
  // we need to manually process those bytes into the CRC.
  for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
       ++NullTerminator)
    UpdateCRC('\x00');

  // The literature refers to the process of reversing the bits in the final CRC
  // output as "reflection".
  CRC = llvm::reverseBits(CRC);

  // <encoded-crc>: The CRC is encoded utilizing the standard number mangling
  // scheme.
  Mangler.mangleNumber(CRC);

  // <encoded-string>: The mangled name also contains the first 32 _characters_
  // (including null-terminator bytes) of the StringLiteral.
  // Each character is encoded by splitting them into bytes and then encoding
  // the constituent bytes.
  auto MangleByte = [&Mangler](char Byte) {
    // There are five different manglings for characters:
    // - [a-zA-Z0-9_$]: A one-to-one mapping.
    // - ?[a-z]: The range from \xe1 to \xfa.
    // - ?[A-Z]: The range from \xc1 to \xda.
    // - ?[0-9]: The set of [,/\:. \n\t'-].
    // - ?$XX: A fallback which maps nibbles.
    if (isIdentifierBody(Byte, /*AllowDollar=*/true)) {
      Mangler.getStream() << Byte;
    } else if (isLetter(Byte & 0x7f)) {
      Mangler.getStream() << '?' << static_cast<char>(Byte & 0x7f);
    } else {
      switch (Byte) {
        case ',':
          Mangler.getStream() << "?0";
          break;
        case '/':
          Mangler.getStream() << "?1";
          break;
        case '\\':
          Mangler.getStream() << "?2";
          break;
        case ':':
          Mangler.getStream() << "?3";
          break;
        case '.':
          Mangler.getStream() << "?4";
          break;
        case ' ':
          Mangler.getStream() << "?5";
          break;
        case '\n':
          Mangler.getStream() << "?6";
          break;
        case '\t':
          Mangler.getStream() << "?7";
          break;
        case '\'':
          Mangler.getStream() << "?8";
          break;
        case '-':
          Mangler.getStream() << "?9";
          break;
        default:
          Mangler.getStream() << "?$";
          Mangler.getStream() << static_cast<char>('A' + ((Byte >> 4) & 0xf));
          Mangler.getStream() << static_cast<char>('A' + (Byte & 0xf));
          break;
      }
    }
  };

  // Enforce our 32 character max.
  unsigned NumCharsToMangle = std::min(32U, SL->getLength());
  for (unsigned I = 0, E = NumCharsToMangle * SL->getCharByteWidth(); I != E;
       ++I)
    MangleByte(GetBigEndianByte(I));

  // Encode the NUL terminator if there is room.
  if (NumCharsToMangle < 32)
    for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
         ++NullTerminator)
      MangleByte(0);

  Mangler.getStream() << '@';
}
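
For reference, the CRC in the listing above is only a checksum of the literal's bytes and plays no part in recovering the text. A minimal standalone sketch of the same parameters (Width 32, Poly 04C11DB7, Init FFFFFFFF, reflected in and out, no final XOR), written in the common reflected form with the bit-reversed polynomial 0xEDB88320. The names literal_crc and crc_demo are made up for this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 using the parameters from the quoted comment.
// 0xEDB88320 is 0x04C11DB7 with its bits reversed, so processing each byte
// least-significant bit first gives an already "reflected" result.
std::uint32_t literal_crc(const std::string& bytes) {
    std::uint32_t crc = 0xFFFFFFFFu;              // Init
    for (unsigned char b : bytes) {
        crc ^= b;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc;                                   // XorOut is 0, so no final inversion
}

// The "Check" value in the comment is the CRC of the ASCII digits 1..9.
void crc_demo() {
    assert(literal_crc("123456789") == 0x340BC6D9u);
}
```

For a wide literal the bytes would be passed low byte first, followed by the two NUL terminator bytes, matching GetLittleEndianByte in the listing.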



But I failed to understand how to decode this stuff.

I also found something for gcc here, but I have no clue how to use or port it.
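
Going by the quoted grammar, decoding the character part is just MangleByte run backwards, with no decryption involved. A minimal sketch, assuming the input is only the <encoded-string> portion (the text after the CRC's '@' and before the final '@'). The names decode_literal_bytes, to_utf16 and decode_demo are hypothetical, not part of LLVM or any Microsoft API:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Turn the escaped characters back into raw bytes (inverse of MangleByte).
std::string decode_literal_bytes(const std::string& enc) {
    // ?0 .. ?9 map to these ten characters, in this order.
    static const char special[] = {',', '/', '\\', ':', '.', ' ', '\n', '\t', '\'', '-'};
    std::string out;
    for (std::size_t i = 0; i < enc.size(); ++i) {
        char c = enc[i];
        if (c != '?') { out.push_back(c); continue; }       // one-to-one character
        if (++i >= enc.size()) throw std::invalid_argument("dangling '?'");
        c = enc[i];
        if (c == '$') {                                      // ?$XY: two nibbles, 'A'..'P'
            if (i + 2 >= enc.size()) throw std::invalid_argument("short ?$ escape");
            int hi = enc[i + 1] - 'A', lo = enc[i + 2] - 'A';
            if (hi < 0 || hi > 15 || lo < 0 || lo > 15)
                throw std::invalid_argument("bad nibble");
            out.push_back(static_cast<char>((hi << 4) | lo));
            i += 2;
        } else if (c >= '0' && c <= '9') {
            out.push_back(special[c - '0']);
        } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            out.push_back(static_cast<char>(c | 0x80));      // ?a..?z -> E1..FA, ?A..?Z -> C1..DA
        } else {
            throw std::invalid_argument("unknown escape");
        }
    }
    return out;
}

// For wchar_t literals the bytes come out high byte first (GetBigEndianByte),
// so pair them up in that order.
std::u16string to_utf16(const std::string& be) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < be.size(); i += 2)
        out.push_back(static_cast<char16_t>(
            (static_cast<unsigned char>(be[i]) << 8) | static_cast<unsigned char>(be[i + 1])));
    return out;
}

void decode_demo() {
    // Bytes 00 43 00 66 00 6D 00 00, i.e. L"Cfm" plus its terminator.
    std::u16string s = to_utf16(decode_literal_bytes("?$AAC?$AAf?$AAm?$AA?$AA"));
    assert(s.size() == 4 && s[0] == u'C' && s[1] == u'f' && s[2] == u'm' && s[3] == 0);
}
```

Since the mangler emits at most the first 32 characters, longer literals cannot be fully recovered from the name alone.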

guga

Ok, I found something easier that seems valid:

http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm?tx=14,16

guga

Newer update, with undecorated symbols. I'm currently working on undecorating the string literals. Once I finish I can also finish the tutorial on how to use it. Don't forget to also download the necessary DLLs from the 1st post.

Also, it seems I need to release the strings allocated in DIA_SYMBOL_GET_NAME.

About the public symbols: a brief description of each record is:

Token | Code/Data | target RVA | targetsection:targetoffset | SymbolLength | Symbol Name Len | Symbol Name Decorated | Undecorated Sym Len | Symbol Name Undecorated | SymIndexID

Later, once I finish this, I can write the proper tutorial.

Btw, I'm only retrieving the offset, RVA and section info so I can compare the contents at those addresses with the info gathered from the PDB. That way I can try to correctly identify whether a string is a wchar, wchar_t, ASCII, Pascal, etc.

sinsi

OK, 7e6 is working, here's an example from comctl32.pdb
PublicSymbol|1:0|0x0006DADF|1:0x0005704C|229|39:?OnEnableGroupView@CListView@@QAEJ_N@Z|59:public: long __thiscall CListView::OnEnableGroupView(bool)|1
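
A record like the one above can be split back into its fields. A minimal sketch, assuming each name is written as <len>:<text> and that <len> counts one trailing NUL (the sample's 39 and 59 are each one more than the visible text, but that is inferred from this single line). The struct and function names are made up here, and the meaning of the "1:0" field is not interpreted:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

struct PublicSymbolRecord {          // hypothetical layout, all fields kept as text
    std::string token, code_data, rva, section_offset, symbol_length;
    std::string decorated, undecorated, sym_index_id;
};

// Read up to the next '|' and move pos past it.
static std::string take_field(const std::string& s, std::size_t& pos) {
    std::size_t bar = s.find('|', pos);
    if (bar == std::string::npos) throw std::invalid_argument("missing '|'");
    std::string field = s.substr(pos, bar - pos);
    pos = bar + 1;
    return field;
}

// Read "<len>:<text>|", where len is assumed to include a terminating NUL.
static std::string take_counted(const std::string& s, std::size_t& pos) {
    std::size_t colon = s.find(':', pos);
    if (colon == std::string::npos) throw std::invalid_argument("missing ':'");
    std::size_t len = std::stoul(s.substr(pos, colon - pos));
    if (len == 0 || colon + len >= s.size() || s[colon + len] != '|')
        throw std::invalid_argument("length prefix does not match the text");
    std::string text = s.substr(colon + 1, len - 1);
    pos = colon + len + 1;
    return text;
}

PublicSymbolRecord parse_public_symbol(const std::string& line) {
    PublicSymbolRecord r;
    std::size_t pos = 0;
    r.token          = take_field(line, pos);
    r.code_data      = take_field(line, pos);
    r.rva            = take_field(line, pos);
    r.section_offset = take_field(line, pos);
    r.symbol_length  = take_field(line, pos);
    r.decorated      = take_counted(line, pos);
    r.undecorated    = take_counted(line, pos);
    r.sym_index_id   = line.substr(pos);      // whatever remains after the last '|'
    return r;
}

void record_demo() {
    PublicSymbolRecord r = parse_public_symbol(
        "PublicSymbol|1:0|0x0006DADF|1:0x0005704C|229|"
        "39:?OnEnableGroupView@CListView@@QAEJ_N@Z|"
        "59:public: long __thiscall CListView::OnEnableGroupView(bool)|1");
    assert(r.decorated == "?OnEnableGroupView@CListView@@QAEJ_N@Z");
    assert(r.sym_index_id == "1");
}
```

Reading the names by their length prefix rather than splitting on '|' keeps undecorated names such as operator| intact.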

guga

Thanks.

Btw, if you or others have any idea how to decode the string literals, let me know. I read the info at http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm and the other code I provided from https://llvm.org/svn/llvm-project/cfe/trunk/lib/AST/MicrosoftMangle.cpp but I'm clueless.

I have no idea how to decode those string literals.
All I barely understood is that it uses some sort of CRC to encode them. So, things like this:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
are interpreted as:

??_C@_ = string token. With this token we can identify that it is a string
1 = char type: 0 = ANSI char, 1 = wchar_t (non-negative integer)
7 = string length including the null terminator
CKCOCGAB@ = CRC (the ending @ is just a terminator; all we need is the first 8 digits, which represent the CRC)
?$AAC = which encoded char is that? What does AAC represent?
?$AAf = same as above. What is AAf?
?$AAm = same as above
?$AA = same as above
?$AA@ = same as above (the last @ is just the encoded end of the string). So what does AA represent?

"?$" identifies an encoded char, so those two characters are not part of the value.

In the damn example above, all I understood is that it is a Unicode string with a length of 7 words (including the null terminator).
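
The pieces in front of the characters can be picked apart mechanically. A minimal sketch, assuming the number encoding used by the LLVM mangler's mangleNumber (a single digit '0'..'9' means 1..10, 'A@' means 0, anything else is hex written with the letters 'A'..'P' and ended by '@'). LiteralHeader, take_number, parse_string_literal and header_demo are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

struct LiteralHeader {
    bool wide;                  // char type: '0' = char, '1' = wchar_t
    std::uint64_t byte_length;  // length in bytes, terminator included
    std::uint32_t crc;          // checksum of the literal's bytes
    std::string encoded;        // the still-escaped characters
};

// Decode one mangled non-negative integer starting at pos.
static std::uint64_t take_number(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) throw std::invalid_argument("missing number");
    if (s[pos] >= '0' && s[pos] <= '9')           // '0'..'9' stand for 1..10
        return static_cast<std::uint64_t>(s[pos++] - '0') + 1;
    std::uint64_t value = 0;
    for (; pos < s.size() && s[pos] != '@'; ++pos) {
        if (s[pos] < 'A' || s[pos] > 'P') throw std::invalid_argument("bad hex digit");
        value = (value << 4) | static_cast<std::uint64_t>(s[pos] - 'A');
    }
    if (pos >= s.size()) throw std::invalid_argument("unterminated number");
    ++pos;                                        // skip the closing '@'
    return value;
}

LiteralHeader parse_string_literal(const std::string& name) {
    const std::string prefix = "??_C@_";
    if (name.size() < prefix.size() + 2 ||
        name.compare(0, prefix.size(), prefix) != 0 || name.back() != '@')
        throw std::invalid_argument("not a string literal symbol");
    std::size_t pos = prefix.size();
    LiteralHeader h;
    char kind = name[pos++];
    if (kind != '0' && kind != '1') throw std::invalid_argument("unknown char type");
    h.wide = (kind == '1');
    h.byte_length = take_number(name, pos);
    h.crc = static_cast<std::uint32_t>(take_number(name, pos));
    if (pos >= name.size()) throw std::invalid_argument("missing encoded string");
    h.encoded = name.substr(pos, name.size() - 1 - pos);   // drop the final '@'
    return h;
}

void header_demo() {
    LiteralHeader h = parse_string_literal("??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@");
    assert(h.wide && h.byte_length == 8 && h.crc == 0x2A2E2601u);
    assert(h.encoded == "?$AAC?$AAf?$AAm?$AA?$AA");
}
```

Under that reading the example is a wchar_t literal of 8 bytes (the '7' encodes 8, i.e. three characters plus the terminator), with CRC 0x2A2E2601. Each ?$XY escape is one byte whose two nibbles are the letters' offsets from 'A', so ?$AA is 0x00 and ?$AAC is the byte pair 00 43, the wide character 'C'.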

sinsi

Have a look at http://blogs.msdn.com/b/oldnewthing/. Search for "decorated", there's a fair bit of information.
One thing led to another and I ended up on MSDN - UnDecorateSymbolName. Maybe it will help.