Topic: RosAsm PDB Dumper v1.0 (Read 12739 times)

guga

  • Moderator
  • Member
  • *****
  • Posts: 826
  • Assembly is a state of art.
    • RosAsm
RosAsm PDB Dumper v1.0
« on: July 24, 2014, 04:17:28 PM »
A small app that parses information from PDB files.

This version is still in executable form for testing purposes. If someone can test it on other OSes, I'd appreciate it. It works on XP; I haven't tested it on WinNT, Windows Vista, Windows 7, 8, etc.

I'm currently writing a tutorial about the exported functions, since the app will be turned into a DLL and you will need a tutorial about the parsed information.

After that it will be easier for anyone to build a pdb2inc, for example  :t

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

sinsi

  • Member
  • *****
  • Posts: 1006
Re: RosAsm PDB Dumper v1.0
« Reply #1 on: July 24, 2014, 04:48:20 PM »
Windows 8.1 Pro x64, nothing happens with any pdb I choose, even running as admin.
Also, the title "C/C++ Comment Remover" is a mistake, yes?
I can walk on water but stagger on beer.

guga

Re: RosAsm PDB Dumper v1.0
« Reply #2 on: July 24, 2014, 05:51:07 PM »
Yeah. The comment remover title is just left over from an older app I built this one on. I forgot to fix it.

About PDBs from Windows 8: I have some of those here (and from Windows 7). It seems that since Windows 7 the PDBs only contain the public symbols themselves (i.e. the variable names), compiland information, module names and the FPO data. They do not seem to contain enumeration values, typedefs or UDTs.

I'll include parsing of the public symbols later, although that information won't be helpful for building a pdb2inc.

Try loading an XP symbol file, or any other one you may have created with VS2008/2010, and see if the information is there. Or even better, can you please check whether the parser works on Windows 8 with the symbol files I'm uploading? Here it correctly parses all the symbols that contain enums, typedefs and UDTs.

simpleDBGTest.zip - DBG file from Windows XP
1394bus.zip - output built with the CV SDK. I also tried a Windows 7 PDB, and it has no enums, UDTs or typedefs.

guga

Re: RosAsm PDB Dumper v1.0
« Reply #3 on: July 24, 2014, 06:14:16 PM »
Btw... this newer version should display the proper error message.

guga

Re: RosAsm PDB Dumper v1.0
« Reply #4 on: July 25, 2014, 01:09:25 AM »
New version: a small test that displays the public symbols. I'm tired, so it won't do much except display them and their indexes. No undecorating or offsets yet; I'll do that tonight.

Gunther

  • Member
  • *****
  • Posts: 3517
  • Forgive your enemies, but never forget their names
Re: RosAsm PDB Dumper v1.0
« Reply #5 on: July 25, 2014, 04:08:29 AM »
Hi Gustavo,

your application needs ROSMEM.dll to run properly, doesn't it?

Gunther
Get your facts first, and then you can distort them.

Vortex

  • Member
  • *****
  • Posts: 1733
Re: RosAsm PDB Dumper v1.0
« Reply #6 on: July 25, 2014, 05:48:45 AM »
True. The application needs that DLL.

guga

Re: RosAsm PDB Dumper v1.0
« Reply #7 on: July 25, 2014, 08:18:34 AM »
Yes. It is in the first post. These other posts (the smaller zips) only contain updates of the main executable.
The zip file in the first post contains:
  • DiaDump7b.exe (The older version)
  • msdia120.dll
  • RosMem.dll

Gunther

Re: RosAsm PDB Dumper v1.0
« Reply #8 on: July 25, 2014, 08:21:27 AM »
Thank you Erol and Gustavo. I'll test it out.

Gunther

guga

Re: RosAsm PDB Dumper v1.0
« Reply #9 on: July 25, 2014, 03:57:00 PM »
OK, guys, I'm succeeding in parsing the public symbols, and I'm making the proper tokens for them.

One question: how do I undecorate a string literal name?

I mean... the UnDecorateSymbolName API (or __unDNameEx from msvcrt, and others) is fine for undecorating functions, structures, etc., but for strings (Unicode and ASCII) it fails badly. All it returns is "string". I want the whole string back.

I also tried the DsCrackUnquotedMangledRdn API, and it fails too.

The only place I found valid info about Microsoft string literal mangling is this function:
void MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL, raw_ostream &Out). Also, some info about demangling is found here

This function describes exactly how a string is encoded to produce names like these:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
??_C@_17ENEANJDH@?$AA?5?$AA?$CF?$AAs?$AA?$AA@

The question is... how do I decode it?

The mangling algorithm for MS string literals from the LLVM project (link above) is this:
Code:
void MicrosoftMangleContextImpl::mangleStringLiteral(const StringLiteral *SL,
                                                     raw_ostream &Out) {
  // <char-type> ::= 0   # char
  //             ::= 1   # wchar_t
  //             ::= ??? # char16_t/char32_t will need a mangling too...
  //
  // <literal-length> ::= <non-negative integer>  # the length of the literal
  //
  // <encoded-crc>    ::= <hex digit>+ @          # crc of the literal including
  //                                              # null-terminator
  //
  // <encoded-string> ::= <simple character>           # uninteresting character
  //                  ::= '?$' <hex digit> <hex digit> # these two nibbles
  //                                                   # encode the byte for the
  //                                                   # character
  //                  ::= '?' [a-z]                    # \xe1 - \xfa
  //                  ::= '?' [A-Z]                    # \xc1 - \xda
  //                  ::= '?' [0-9]                    # [,/\:. \n\t'-]
  //
  // <literal> ::= '??_C@_' <char-type> <literal-length> <encoded-crc>
  //               <encoded-string> '@'
  MicrosoftCXXNameMangler Mangler(*this, Out);
  Mangler.getStream() << "\01??_C@_";

  // <char-type>: The "kind" of string literal is encoded into the mangled name.
  // TODO: This needs to be updated when MSVC gains support for unicode
  // literals.
  if (SL->isAscii())
    Mangler.getStream() << '0';
  else if (SL->isWide())
    Mangler.getStream() << '1';
  else
    llvm_unreachable("unexpected string literal kind!");

  // <literal-length>: The next part of the mangled name consists of the length
  // of the string.
  // The StringLiteral does not consider the NUL terminator byte(s) but the
  // mangling does.
  // N.B. The length is in terms of bytes, not characters.
  Mangler.mangleNumber(SL->getByteLength() + SL->getCharByteWidth());

  // We will use the "Rocksoft^tm Model CRC Algorithm" to describe the
  // properties of our CRC:
  //   Width  : 32
  //   Poly   : 04C11DB7
  //   Init   : FFFFFFFF
  //   RefIn  : True
  //   RefOut : True
  //   XorOut : 00000000
  //   Check  : 340BC6D9
  uint32_t CRC = 0xFFFFFFFFU;

  auto UpdateCRC = [&CRC](char Byte) {
    for (unsigned i = 0; i < 8; ++i) {
      bool Bit = CRC & 0x80000000U;
      if (Byte & (1U << i))
        Bit = !Bit;
      CRC <<= 1;
      if (Bit)
        CRC ^= 0x04C11DB7U;
    }
  };

  auto GetLittleEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = Index % CharByteWidth;
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  auto GetBigEndianByte = [&Mangler, &SL](unsigned Index) {
    unsigned CharByteWidth = SL->getCharByteWidth();
    uint32_t CodeUnit = SL->getCodeUnit(Index / CharByteWidth);
    unsigned OffsetInCodeUnit = (CharByteWidth - 1) - (Index % CharByteWidth);
    return static_cast<char>((CodeUnit >> (8 * OffsetInCodeUnit)) & 0xff);
  };

  // CRC all the bytes of the StringLiteral.
  for (unsigned I = 0, E = SL->getByteLength(); I != E; ++I)
    UpdateCRC(GetLittleEndianByte(I));

  // The NUL terminator byte(s) were not present earlier,
  // we need to manually process those bytes into the CRC.
  for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
       ++NullTerminator)
    UpdateCRC('\x00');

  // The literature refers to the process of reversing the bits in the final CRC
  // output as "reflection".
  CRC = llvm::reverseBits(CRC);

  // <encoded-crc>: The CRC is encoded utilizing the standard number mangling
  // scheme.
  Mangler.mangleNumber(CRC);

  // <encoded-string>: The mangled name also contains the first 32 _characters_
  // (including null-terminator bytes) of the StringLiteral.
  // Each character is encoded by splitting them into bytes and then encoding
  // the constituent bytes.
  auto MangleByte = [&Mangler](char Byte) {
    // There are five different manglings for characters:
    // - [a-zA-Z0-9_$]: A one-to-one mapping.
    // - ?[a-z]: The range from \xe1 to \xfa.
    // - ?[A-Z]: The range from \xc1 to \xda.
    // - ?[0-9]: The set of [,/\:. \n\t'-].
    // - ?$XX: A fallback which maps nibbles.
    if (isIdentifierBody(Byte, /*AllowDollar=*/true)) {
      Mangler.getStream() << Byte;
    } else if (isLetter(Byte & 0x7f)) {
      Mangler.getStream() << '?' << static_cast<char>(Byte & 0x7f);
    } else {
      switch (Byte) {
        case ',':
          Mangler.getStream() << "?0";
          break;
        case '/':
          Mangler.getStream() << "?1";
          break;
        case '\\':
          Mangler.getStream() << "?2";
          break;
        case ':':
          Mangler.getStream() << "?3";
          break;
        case '.':
          Mangler.getStream() << "?4";
          break;
        case ' ':
          Mangler.getStream() << "?5";
          break;
        case '\n':
          Mangler.getStream() << "?6";
          break;
        case '\t':
          Mangler.getStream() << "?7";
          break;
        case '\'':
          Mangler.getStream() << "?8";
          break;
        case '-':
          Mangler.getStream() << "?9";
          break;
        default:
          Mangler.getStream() << "?$";
          Mangler.getStream() << static_cast<char>('A' + ((Byte >> 4) & 0xf));
          Mangler.getStream() << static_cast<char>('A' + (Byte & 0xf));
          break;
      }
    }
  };

  // Enforce our 32 character max.
  unsigned NumCharsToMangle = std::min(32U, SL->getLength());
  for (unsigned I = 0, E = NumCharsToMangle * SL->getCharByteWidth(); I != E;
       ++I)
    MangleByte(GetBigEndianByte(I));

  // Encode the NUL terminator if there is room.
  if (NumCharsToMangle < 32)
    for (unsigned NullTerminator = 0; NullTerminator < SL->getCharByteWidth();
         ++NullTerminator)
      MangleByte(0);

  Mangler.getStream() << '@';
}
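The CRC step can be ported on its own. Below is a minimal C++ transcription of the `UpdateCRC` lambda and the final `llvm::reverseBits` call from the code above; `reverseBits32` and `literalCrc` are hypothetical helper names. It assumes the caller passes the literal's bytes the way LLVM feeds them: each code unit little-endian, followed by the null terminator byte(s). Note that a CRC is a checksum, not an encryption, so it cannot be inverted to get the text back; the string itself has to come from the `<encoded-string>` part, which only holds the first 32 characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reverse the bit order of a 32-bit value (stand-in for llvm::reverseBits).
std::uint32_t reverseBits32(std::uint32_t v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);  // move the lowest remaining bit of v into r
        v >>= 1;
    }
    return r;
}

// CRC mirroring the UpdateCRC lambda above: width 32, poly 0x04C11DB7,
// init 0xFFFFFFFF, input bits taken LSB first, result bit-reversed, no final XOR.
// For a string literal, `bytes` should already include the null terminator byte(s).
std::uint32_t literalCrc(const std::string& bytes) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : bytes) {
        for (unsigned i = 0; i < 8; ++i) {
            bool bit = (crc & 0x80000000u) != 0;
            if (byte & (1u << i))
                bit = !bit;
            crc <<= 1;
            if (bit)
                crc ^= 0x04C11DB7u;
        }
    }
    return reverseBits32(crc);
}
```

As a sanity check, `literalCrc("123456789")` should give the `Check : 340BC6D9` value listed in the comment above. For the wide literal L"Cfm", the input would be `std::string("C\0f\0m\0\0\0", 8)`; comparing that result with the decoded CRC field is one way to test a decoder.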


But... I failed to understand how to decode this stuff.

I also found something for GCC here, but I have no clue how to use or port it.

guga

Re: RosAsm PDB Dumper v1.0
« Reply #10 on: July 25, 2014, 04:59:16 PM »

guga

Re: RosAsm PDB Dumper v1.0
« Reply #11 on: July 25, 2014, 06:01:55 PM »
A newer update with undecorated symbols. I'm currently working on undecorating the string literals. Once I finish, I can also finish the tutorial on how to use it. Don't forget to also download the necessary DLLs from the first post.

Also, it seems I need to release the allocated strings in DIA_SYMBOL_GET_NAME.

About the public symbols... a brief description of each output line:

Token | Code/Data | target RVA | targetsection:targetoffset | SymbolLength | Symbol Name Len | Symbol Name Decorated | Undecorated Sym Len | Symbol Name Undecorated | SymIndexID

Once I finish this, I can write the proper tutorial.

Btw... I'm only retrieving the offset, RVA and section info so I can compare the contents at those addresses with the info gathered from the PDB. That way I can try to correctly tell whether a string is a WCHAR, wchar_t, ASCII, Pascal string, etc.

sinsi

Re: RosAsm PDB Dumper v1.0
« Reply #12 on: July 25, 2014, 07:36:31 PM »
OK, 7e6 is working. Here's an example from comctl32.pdb:
Code:
PublicSymbol|1:0|0x0006DADF|1:0x0005704C|229|39:?OnEnableGroupView@CListView@@QAEJ_N@Z|59:public: long __thiscall CListView::OnEnableGroupView(bool)|1
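A minimal C++ sketch of reading one of these lines back, for anyone building on the dumper's output. It assumes the field order from the format description earlier in the thread, hex values for the RVA and offset as in the sample, and that the `N:` prefix before each name counts the name plus a terminating NUL (the sample's 38-character decorated name carries 39). That last point is inferred from this single line, not documented. `PublicSymbolRecord` and `parsePublicSymbol` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// One public-symbol line; the Code/Data flags are kept as raw text.
struct PublicSymbolRecord {
    std::string token;            // e.g. "PublicSymbol"
    std::string codeData;         // e.g. "1:0"
    std::uint32_t rva = 0;        // target RVA
    std::uint32_t section = 0;    // target section
    std::uint32_t offset = 0;     // target offset inside the section
    std::uint64_t length = 0;     // symbol length
    std::string decorated;        // decorated name
    std::string undecorated;      // undecorated name
    std::uint32_t symIndexId = 0; // SymIndexID
};

// Return the text up to the next '|' (or the end) and move pos past the '|'.
static std::string nextField(const std::string& s, std::size_t& pos) {
    std::size_t bar = s.find('|', pos);
    if (bar == std::string::npos) bar = s.size();
    std::string field = s.substr(pos, bar - pos);
    pos = (bar < s.size()) ? bar + 1 : bar;
    return field;
}

// Read an "N:text" field. ASSUMPTION: N counts the text plus a terminating NUL.
static std::string nextCountedField(const std::string& s, std::size_t& pos) {
    std::size_t colon = s.find(':', pos);
    if (colon == std::string::npos) throw std::invalid_argument("missing length prefix");
    std::size_t n = std::stoul(s.substr(pos, colon - pos));
    if (n == 0 || colon + n > s.size()) throw std::invalid_argument("bad length prefix");
    std::string text = s.substr(colon + 1, n - 1);
    pos = colon + n;  // first character after the text
    if (pos < s.size()) {
        if (s[pos] != '|') throw std::invalid_argument("expected '|'");
        ++pos;
    }
    return text;
}

PublicSymbolRecord parsePublicSymbol(const std::string& line) {
    PublicSymbolRecord r;
    std::size_t pos = 0;
    r.token = nextField(line, pos);
    r.codeData = nextField(line, pos);
    r.rva = static_cast<std::uint32_t>(std::stoul(nextField(line, pos), nullptr, 16));
    std::string secOff = nextField(line, pos);  // "section:0xOFFSET"
    std::size_t colon = secOff.find(':');
    if (colon == std::string::npos) throw std::invalid_argument("expected section:offset");
    r.section = static_cast<std::uint32_t>(std::stoul(secOff.substr(0, colon)));
    r.offset = static_cast<std::uint32_t>(std::stoul(secOff.substr(colon + 1), nullptr, 16));
    r.length = std::stoull(nextField(line, pos));
    r.decorated = nextCountedField(line, pos);
    r.undecorated = nextCountedField(line, pos);
    r.symIndexId = static_cast<std::uint32_t>(std::stoul(nextField(line, pos)));
    return r;
}
```

Reading the names by their length prefix rather than splitting on every `|` is a deliberate choice: undecorated names such as `operator|` can contain the delimiter.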

guga

Re: RosAsm PDB Dumper v1.0
« Reply #13 on: July 25, 2014, 08:33:11 PM »
Thanks.

Btw... if you or others have any idea how to decode the string literals, let me know. I read the info at http://www.geoffchappell.com/studies/msvc/language/decoration/strings.htm and the other code I provided from https://llvm.org/svn/llvm-project/cfe/trunk/lib/AST/MicrosoftMangle.cpp, but I'm clueless.

I have no idea how to decode these string literals.
All I barely understood is that the encoding includes some sort of CRC. So, something like this:

??_C@_17CKCOCGAB@?$AAC?$AAf?$AAm?$AA?$AA@
is interpreted as:

??_C@_ = string token. This prefix identifies the symbol as a string literal.
1 = character type: 0 = ANSI char, 1 = wchar_t
7 = length of the literal in bytes (not characters), including the null terminator. It uses MSVC's number encoding, where the digits 0-9 stand for 1-10, so 7 means 8 bytes.
CKCOCGAB@ = CRC, written as hex digits with the letters A-P standing for 0-F. The trailing @ terminates the number, so a CRC can have fewer than 8 digits.
?$AAC = what encoded char is that? What does AAC represent?
?$AAf = same as above. What is AAf?
?$AAm = same as above
?$AA = same as above
?$AA@ = same as above (the last @ just marks the end of the string)... So what does AA represent?
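The length and CRC fields both use MSVC's general number encoding. In LLVM this is `MicrosoftCXXNameMangler::mangleNumber`, which is called but not quoted above; as I understand its rules, there is an optional `?` for negative values, then either a single digit 0-9 for the values 1-10, or hex nibbles written as the letters A-P (A = 0 ... P = 15) terminated by `@`. A minimal decoder under that assumption (`decodeMangledNumber` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode one MSVC-mangled number starting at s[pos] and move pos past it.
// Assumed grammar (mirrors LLVM's mangleNumber):
//   [?] <digit>       digit 0-9 stands for the value 1-10
//   [?] <A-P>* '@'    hex nibbles, A = 0 ... P = 15; "A@" is zero
std::int64_t decodeMangledNumber(const std::string& s, std::size_t& pos) {
    bool negative = false;
    if (pos < s.size() && s[pos] == '?') {
        negative = true;
        ++pos;
    }
    if (pos >= s.size()) throw std::invalid_argument("truncated number");
    std::int64_t value = 0;
    if (s[pos] >= '0' && s[pos] <= '9') {
        value = (s[pos] - '0') + 1;  // single digit: 0 -> 1, ..., 9 -> 10
        ++pos;
    } else {
        while (pos < s.size() && s[pos] != '@') {
            char nibble = s[pos++];
            if (nibble < 'A' || nibble > 'P') throw std::invalid_argument("bad hex nibble");
            value = (value << 4) | (nibble - 'A');
        }
        if (pos >= s.size()) throw std::invalid_argument("missing '@'");
        ++pos;  // consume the terminating '@'
    }
    return negative ? -value : value;
}
```

Applied to the first example, starting right after `??_C@_1`, this yields 8 for the length (4 UTF-16 code units, including the terminator) and 0x2A2E2601 for the CRC field, leaving the position on the first `?$` of the encoded string.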

"?$" is only the escape marker for an encoded character, so it is not part of the string data itself.

In the damn example above, all I understood is that it is a Unicode (wchar_t) string of 8 bytes (the length digit 7 encodes 8), i.e. 4 UTF-16 code units including the null terminator.
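For the `<encoded-string>` part, here is a sketch that inverts the `MangleByte` lambda quoted earlier: plain identifier characters stand for themselves, `?0`-`?9` map to `,/\:. \n\t'-`, `?a`-`?z` and `?A`-`?Z` are the letter with the high bit set (0xE1-0xFA, 0xC1-0xDA), and `?$XY` is one byte whose two nibbles are written as letters A-P. For wchar_t literals each character's bytes are mangled high byte first (`GetBigEndianByte`), so the pairs are recombined that way. `decodeEncodedString` and `toWide` are hypothetical names, and the decoder assumes `pos` already points just past the CRC's `@`. Only the first 32 characters are stored, so longer literals come back truncated and without the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode the <encoded-string> part of a "??_C@_" name into raw bytes.
// pos must point just past the CRC's '@'; on return it is past the closing '@'.
std::vector<unsigned char> decodeEncodedString(const std::string& s, std::size_t& pos) {
    static const char kPunct[] = ",/\\:. \n\t'-";  // targets of ?0 ... ?9
    std::vector<unsigned char> bytes;
    while (pos < s.size() && s[pos] != '@') {
        char c = s[pos++];
        if (c != '?') {  // [a-zA-Z0-9_$] maps to itself
            bytes.push_back(static_cast<unsigned char>(c));
            continue;
        }
        if (pos >= s.size()) throw std::invalid_argument("dangling '?'");
        char e = s[pos++];
        if (e == '$') {  // ?$XY: two nibbles written as 'A'..'P'
            if (pos + 2 > s.size()) throw std::invalid_argument("truncated ?$ escape");
            char hi = s[pos], lo = s[pos + 1];
            if (hi < 'A' || hi > 'P' || lo < 'A' || lo > 'P')
                throw std::invalid_argument("bad ?$ escape");
            bytes.push_back(static_cast<unsigned char>(((hi - 'A') << 4) | (lo - 'A')));
            pos += 2;
        } else if ((e >= 'a' && e <= 'z') || (e >= 'A' && e <= 'Z')) {
            bytes.push_back(static_cast<unsigned char>(e | 0x80));  // \xe1-\xfa, \xc1-\xda
        } else if (e >= '0' && e <= '9') {
            bytes.push_back(static_cast<unsigned char>(kPunct[e - '0']));
        } else {
            throw std::invalid_argument("unknown escape");
        }
    }
    if (pos >= s.size()) throw std::invalid_argument("missing terminating '@'");
    ++pos;  // consume the closing '@'
    return bytes;
}

// For wchar_t literals: recombine byte pairs high byte first and drop one trailing NUL.
std::u16string toWide(const std::vector<unsigned char>& bytes) {
    std::u16string text;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
        text.push_back(static_cast<char16_t>((bytes[i] << 8) | bytes[i + 1]));
    if (!text.empty() && text.back() == u'\0')
        text.pop_back();
    return text;
}
```

With this, the bytes of the first example are 00 43 00 66 00 6D 00 00, i.e. L"Cfm", and the second example (`?5` = space, `?$CF` = 0x25) decodes to L" %s". For ANSI literals (character type 0) the returned bytes are already the characters.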

sinsi

Re: RosAsm PDB Dumper v1.0
« Reply #14 on: July 25, 2014, 09:01:34 PM »
Have a look at http://blogs.msdn.com/b/oldnewthing/. Search for "decorated", there's a fair bit of information.
One thing led to another and I ended up on MSDN - UnDecorateSymbolName. Maybe it will help.