Structure conversion tool version beta 3

Started by hutch--, July 29, 2016, 11:53:51 PM

hutch--

This version is a lot closer to being a release version. It handles embedded unions with further embedded structures reasonably well but still has a problem with named embedded unions and structures. It seems to handle anonymous unions and structures correctly in most instances. The results are indented to make them more human readable and easier to edit if there is a problem. The tool is designed to produce MASM-compatible structures, so they should work OK with JWASM and its later forks.

This is an example of a structure that does not fully convert properly.

typedef struct _FILE_REMOTE_PROTOCOL_INFO
{
    // Structure Version
    USHORT StructureVersion;     // 1
    USHORT StructureSize;        // sizeof(FILE_REMOTE_PROTOCOL_INFO)
   
    DWORD  Protocol;             // Protocol (WNNC_NET_*) defined in wnnc.h or ntifs.h.
   
    // Protocol Version & Type
    USHORT ProtocolMajorVersion;
    USHORT ProtocolMinorVersion;
    USHORT ProtocolRevision;
   
    USHORT Reserved;
   
    // Protocol-Generic Information
    DWORD  Flags;
   
    struct {
        DWORD Reserved[8];
    } GenericReserved;

    // Protocol specific information
   
    struct {
        DWORD Reserved[16];
    } ProtocolSpecificReserved;
   
} FILE_REMOTE_PROTOCOL_INFO, *PFILE_REMOTE_PROTOCOL_INFO;

When converted it looks like this.

  FILE_REMOTE_PROTOCOL_INFO STRUCT QWORD
    StructureVersion dw ?
    StructureSize dw ?
    Protocol dd ?
    ProtocolMajorVersion dw ?
    ProtocolMinorVersion dw ?
    ProtocolRevision dw ?
    Reserved dw ?
    Flags dd ?
      STRUCT
      Reserved dd 8 dup (?)
      ENDS ; GenericReserved
      STRUCT
      Reserved dd 16 dup (?)
      ENDS ; ProtocolSpecificReserved
  FILE_REMOTE_PROTOCOL_INFO ENDS

The problem with named internal unions and structures is that the name is in the wrong place: it should come after the STRUCT or UNION statement, not after the ENDS statement. With the indenting it should be reasonably intuitive to see where the name needs to be moved to.

This is what it should look like once it has been edited.

  FILE_REMOTE_PROTOCOL_INFO STRUCT QWORD
    StructureVersion dw ?
    StructureSize dw ?
    Protocol dd ?
    ProtocolMajorVersion dw ?
    ProtocolMinorVersion dw ?
    ProtocolRevision dw ?
    Reserved dw ?
    Flags dd ?
      STRUCT GenericReserved
        Reserved dd 8 dup (?)
      ENDS
      STRUCT ProtocolSpecificReserved
        Reserved dd 16 dup (?)
      ENDS
  FILE_REMOTE_PROTOCOL_INFO ENDS
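The manual edit above is mechanical enough to script. Below is a minimal C sketch of such a post-pass; it is not part of hutch's tool. It assumes the converter output is held as an array of lines (256 characters max, an arbitrary limit), pushes each bare nested STRUCT/UNION line onto a stack, and when it meets an "ENDS ; name" line it moves the name up to the matching opener. The function names are hypothetical, and it does not re-indent the members.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE  256   /* assumed maximum line length */
#define MAX_DEPTH 32    /* assumed maximum nesting depth */

/* Skip leading blanks. */
static const char *skip_ws(const char *s) {
    while (*s == ' ' || *s == '\t') s++;
    return s;
}

/* Remove trailing blanks in place. */
static void rtrim(char *s) {
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t')) s[--n] = '\0';
}

/* Nonzero if the line holds nothing but the keyword kw. */
static int is_bare(const char *line, const char *kw) {
    const char *p = skip_ws(line);
    size_t n = strlen(kw);
    return strncmp(p, kw, n) == 0 && *skip_ws(p + n) == '\0';
}

/* Nonzero if the line starts with ENDS (so "Name ENDS" does not match). */
static int starts_ends(const char *line) {
    const char *p = skip_ws(line);
    return strncmp(p, "ENDS", 4) == 0 &&
           (p[4] == '\0' || p[4] == ' ' || p[4] == '\t' || p[4] == ';');
}

/* Move each "ENDS ; name" comment onto its matching bare STRUCT/UNION line. */
static void fix_named_nested(char lines[][MAX_LINE], int count) {
    int stack[MAX_DEPTH];   /* indices of open nested STRUCT/UNION lines */
    int depth = 0;
    for (int i = 0; i < count; i++) {
        if (is_bare(lines[i], "STRUCT") || is_bare(lines[i], "UNION")) {
            assert(depth < MAX_DEPTH);
            stack[depth++] = i;
        } else if (starts_ends(lines[i]) && depth > 0) {
            int open = stack[--depth];
            char *semi = strchr(lines[i], ';');
            if (semi != NULL) {               /* named block: "ENDS ; name" */
                size_t len = strlen(lines[open]);
                snprintf(lines[open] + len, MAX_LINE - len, " %s",
                         skip_ws(semi + 1));
                rtrim(lines[open]);
                *semi = '\0';                 /* drop the comment ... */
                rtrim(lines[i]);              /* ... and the blank before it */
            }
            /* a bare ENDS just closes an anonymous block */
        }
    }
}
```

Anonymous nested blocks that end in a bare ENDS are popped without renaming, so the sketch should leave the structures hutch says already convert correctly untouched.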

A very wide range of structures will not need to be edited, but there are still potential problems with missing data types, as the Microsoft headers often cook up their own on the fly with #define statements. If a data type is not recognised, the default is to write it as a structure with a trailing "<>", so an unknown data type will produce an error when the file is assembled.
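A minimal C sketch of that fallback rule, assuming a tiny hypothetical type table (a real one would need hundreds of entries): known types become dw/dd/dq members in the same style as the converted output above, and anything else is written as "name TYPE <>", so a missing type shows up as an assembly error instead of a silently wrong size. emit_member and the table contents are illustrative only, not the tool's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately tiny C-type -> MASM directive table. */
struct type_map { const char *c_type; const char *directive; };

static const struct type_map known[] = {
    {"CHAR", "db"},  {"BYTE", "db"},  {"USHORT", "dw"}, {"WORD", "dw"},
    {"DWORD", "dd"}, {"ULONG", "dd"}, {"ULONGLONG", "dq"},
};

/* Write one structure member line into out; count > 1 means a C array.
   An unrecognised type is emitted with a "<>" structure initialiser, which
   will not assemble unless a structure type of that name is defined, so
   the assembler reports the missing type. */
static void emit_member(char *out, size_t size, const char *c_type,
                        const char *name, int count) {
    assert(out != NULL && size > 0 && count >= 1);
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(known[i].c_type, c_type) == 0) {
            if (count > 1)
                snprintf(out, size, "%s %s %d dup (?)", name,
                         known[i].directive, count);
            else
                snprintf(out, size, "%s %s ?", name, known[i].directive);
            return;
        }
    }
    /* fallback: treat the unknown type as a structure */
    if (count > 1)
        snprintf(out, size, "%s %s %d dup (<>)", name, c_type, count);
    else
        snprintf(out, size, "%s %s <>", name, c_type);
}
```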

Tools of this type are tedious bastards of things to get working, as a C compiler reads structures in a linear scan backed up by large data structures that hold equates and earlier #define data, which means you would effectively have to write a C compiler front end.

habran

Quote from: hutch-- on July 29, 2016, 11:53:51 PM
Tools of this type are tedious bastards of things to get working, as a C compiler reads structures in a linear scan backed up by large data structures that hold equates and earlier #define data, which means you would effectively have to write a C compiler front end.
This is why I am interested in precompiled headers. It would be great if we had a compiler front end so that we could use C headers with the .h extension directly. If I knew the structure of precompiled headers, I would be able to make HJWasm work with them. That would remove the hassle of translating headers into include files.
That would make assembler more acceptable to programmers. I've been thinking of using Visual Studio to create the precompiled headers first and then use them with HJWasm in the same program.

hutch--

habran,

I wonder if open source code like GCC would have the data you need to read C header files. While I have no doubt you can write your own, getting the basic logic of what is done would make the task a lot easier. To get a wide range of structures, I take the list of include files from "windows.h" and copy them together in the order they are listed, so I have one single file to work on. I added a couple at the end, the common control and common dialog headers, and by scanning the entire file I get about 1250 structures in their original C format, minus the comments and blank lines.
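The scanning step could be sketched like this in C, assuming the headers have already been concatenated into one string. It counts "typedef struct" and "typedef union" occurrences that are followed by an optional tag name and an opening brace, which skips forward declarations such as "typedef struct _X *PX;". The brace check is my own heuristic and the function names are hypothetical; it will miss structures declared without a typedef and gives a rough count, not a parse.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Nonzero if p (just past the keyword) is an optional tag name followed
   by '{', i.e. a real definition rather than a forward typedef. */
static int opens_body(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    while (isalnum((unsigned char)*p) || *p == '_') p++;   /* optional tag */
    while (isspace((unsigned char)*p)) p++;
    return *p == '{';
}

/* Count definitions introduced by the keyword sequence kw. */
static size_t count_kind(const char *text, const char *kw) {
    size_t n = 0, len = strlen(kw);
    assert(len > 0);
    for (const char *p = strstr(text, kw); p != NULL; p = strstr(p + len, kw))
        if (opens_body(p + len)) n++;
    return n;
}

/* Rough count of struct/union definitions in one concatenated header. */
static size_t count_definitions(const char *text) {
    return count_kind(text, "typedef struct") +
           count_kind(text, "typedef union");
}
```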

If I have the order right, to read the C header files you need to do the conditional parts (#ifdef/#ifndef) first and clean out any junk you don't need (#ifndef __MAC etc.), then load the #define statements into a data structure, then the structures, unions and enums.
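A minimal C sketch of the first pass in that order. It is a toy under stated assumptions, not a C preprocessor: it only understands #ifdef, #ifndef, #else, #endif and simple "#define NAME VALUE" lines, treats a plain #if as true, ignores #elif, drops every other directive (#include, #pragma and so on), and records the defines in a fixed-size table. The surviving lines are what a later structure/union/enum pass would scan. All names and limits here are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64
#define MAX_DEFS 128
#define MAX_NEST 32

struct define { char name[NAME_LEN]; char value[NAME_LEN]; };
struct define_table { struct define d[MAX_DEFS]; int count; };

/* Return the recorded value of name, or NULL if it was never #defined. */
static const char *lookup(const struct define_table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->d[i].name, name) == 0) return t->d[i].value;
    return NULL;
}

/* Resolve #ifdef/#ifndef/#else/#endif, record #define NAME VALUE pairs,
   and copy the surviving non-directive lines to out. */
static void preprocess(const char *text, struct define_table *t,
                       char *out, size_t outsize) {
    int active[MAX_NEST] = {1};   /* active[d]: is the branch at depth d live? */
    int depth = 0;
    size_t used = 0;
    assert(outsize > 0);
    out[0] = '\0';
    while (*text) {
        char line[256], dir[16] = "", arg[NAME_LEN] = "", val[NAME_LEN] = "";
        size_t len = strcspn(text, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, n);
        line[n] = '\0';
        text += len;
        if (*text == '\n') text++;

        int k = sscanf(line, " #%15s %63s %63[^\n]", dir, arg, val);
        int live = active[depth];
        if (k < 1) {                          /* ordinary source line */
            size_t ln = strlen(line);
            if (live && used + ln + 2 <= outsize) {
                memcpy(out + used, line, ln);
                used += ln;
                out[used++] = '\n';
                out[used] = '\0';
            }
        } else if (strcmp(dir, "ifdef") == 0 || strcmp(dir, "ifndef") == 0 ||
                   strcmp(dir, "if") == 0) {
            int cond = 1;                     /* plain #if: assumed true */
            if (dir[2] == 'd') cond = lookup(t, arg) != NULL;
            if (dir[2] == 'n') cond = lookup(t, arg) == NULL;
            assert(depth + 1 < MAX_NEST);
            active[++depth] = live && cond;
        } else if (strcmp(dir, "else") == 0 && depth > 0) {
            active[depth] = active[depth - 1] && !active[depth];
        } else if (strcmp(dir, "endif") == 0 && depth > 0) {
            depth--;
        } else if (strcmp(dir, "define") == 0 && live && k >= 2 &&
                   t->count < MAX_DEFS) {
            struct define *d = &t->d[t->count++];
            snprintf(d->name, sizeof d->name, "%s", arg);
            snprintf(d->value, sizeof d->value, "%s", val);
        }
        /* any other directive is dropped */
    }
}
```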

jj2007

Hutch,

Great work :t

I wonder what the best and clearest way would be to make them usable for both 64- and 32-bit code. In the other thread, there is a list of the 300+ types that a C compiler uses. For assembler, how many would we really need? 4 or 5??

Size in bytes, 32-bit/64-bit:
1/1    BYTE or CHAR
2/2    SHORT, USHORT
4/4    LONG, ULONG
8/8    LONGLONG, ULONGLONG
4/8    HANDLE
4/8    POINTER
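jj2007's table maps naturally onto a small lookup, sketched here in C under the assumption that the type list is exactly the one above (POINTER is jj's generic label for any pointer type, not a Windows typedef). type_size answers "how many bytes on a 32-bit or 64-bit target", and fixed_type turns that size back into one of the fixed MASM data sizes (BYTE, WORD, DWORD, QWORD, XMMWORD). Both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* jj2007's table as data: size in bytes on a 32-bit and a 64-bit target. */
struct c_type_size { const char *name; int size32; int size64; };

static const struct c_type_size sizes[] = {
    {"BYTE", 1, 1},     {"CHAR", 1, 1},
    {"SHORT", 2, 2},    {"USHORT", 2, 2},
    {"LONG", 4, 4},     {"ULONG", 4, 4},
    {"LONGLONG", 8, 8}, {"ULONGLONG", 8, 8},
    {"HANDLE", 4, 8},   {"POINTER", 4, 8},
};

/* Size of a type for bits = 32 or 64, or -1 if the type is unknown. */
static int type_size(const char *name, int bits) {
    assert(bits == 32 || bits == 64);
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        if (strcmp(sizes[i].name, name) == 0)
            return bits == 64 ? sizes[i].size64 : sizes[i].size32;
    return -1;
}

/* Map a byte size to a fixed MASM data size, or NULL for odd sizes. */
static const char *fixed_type(int size) {
    switch (size) {
    case 1:  return "BYTE";
    case 2:  return "WORD";
    case 4:  return "DWORD";
    case 8:  return "QWORD";
    case 16: return "XMMWORD";
    default: return NULL;
    }
}
```

Only the HANDLE and POINTER rows differ between targets, which is why a fixed-size scheme has to pick the width at conversion time.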

hutch--

After working on these things for years, you may understand why I go for the fixed data sizes.

    BYTE
    WORD
    DWORD
    QWORD
    XMMWORD

I know what you are after but it's a nightmare to get it going.

I use a one-pass hash table word replace for the data types, so it's reasonably clean to just add word pairs for anything you want changed from C/C++ data types to generic ASM data sizes.
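A one-pass word replace of that kind could look like this minimal C sketch. The hash function (FNV-1a), the open-addressing table and the names add_pair, find and replace_words are my own assumptions, not hutch's implementation. Only whole identifiers are replaced, so USHORTX is left alone while USHORT becomes WORD.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64   /* power of two; must exceed the number of pairs */

struct pair { const char *from; const char *to; };
static struct pair table[TABLE_SIZE];
static int pair_count = 0;

/* FNV-1a hash over the first len bytes of s. */
static uint32_t hash(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= (unsigned char)s[i]; h *= 16777619u; }
    return h;
}

/* Add (or update) a word pair using linear probing. */
static void add_pair(const char *from, const char *to) {
    uint32_t i = hash(from, strlen(from)) & (TABLE_SIZE - 1);
    assert(pair_count < TABLE_SIZE - 1);   /* keep a free slot so probes end */
    while (table[i].from != NULL && strcmp(table[i].from, from) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);
    if (table[i].from == NULL) pair_count++;
    table[i].from = from;
    table[i].to = to;
}

/* Replacement for the word s[0..len), or NULL if it is not in the table. */
static const char *find(const char *s, size_t len) {
    uint32_t i = hash(s, len) & (TABLE_SIZE - 1);
    while (table[i].from != NULL) {
        if (strlen(table[i].from) == len && memcmp(table[i].from, s, len) == 0)
            return table[i].to;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}

/* Single pass: copy in to out, swapping every whole word found in the table. */
static void replace_words(const char *in, char *out, size_t outsize) {
    size_t used = 0;
    assert(outsize > 0);
    while (*in && used + 1 < outsize) {
        if (isalpha((unsigned char)*in) || *in == '_') {
            size_t len = 0;
            while (isalnum((unsigned char)in[len]) || in[len] == '_') len++;
            const char *to = find(in, len);
            const char *src = to ? to : in;
            size_t n = to ? strlen(to) : len;
            if (used + n >= outsize) break;   /* no room: stop cleanly */
            memcpy(out + used, src, n);
            used += n;
            in += len;
        } else {
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
}
```

Adding a new mapping is one add_pair call, which matches the "just add word pairs" workflow described above.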

LiaoMi

Hello,

This is what EditMasm (http://luce.yves.pagesperso-orange.fr/Editmasm.htm) generates for the same structure:

FILE_REMOTE_PROTOCOL_INFO STRUCT DEFALIGNMASM
StructureVersion WORD ? ; 1
StructureSize WORD ? ; sizeof(FILE_REMOTE_PROTOCOL_INFO)
Protocol DWORD ? ; Protocol (WNNC_NET_*) defined in wnnc.h or ntifs.h.
ProtocolMajorVersion WORD ?
ProtocolMinorVersion WORD ?
ProtocolRevision WORD ?
Reserved WORD ?
Flags DWORD ?
STRUCT GenericReserved
Reserved DWORD 8 dup (?)
ENDS
STRUCT ProtocolSpecificReserved
Reserved DWORD 16 dup (?)
ENDS
FILE_REMOTE_PROTOCOL_INFO ENDS


I have used this converter from ToutEnMasm many times and so far there have been no errors :t

Quote from: habran on July 30, 2016, 08:15:47 AM
This is why I am interested in precompiled headers. It would be great if we had a compiler front end so that we could use C headers with the .h extension directly. If I knew the structure of precompiled headers, I would be able to make HJWasm work with them. That would remove the hassle of translating headers into include files.
That would make assembler more acceptable to programmers. I've been thinking of using Visual Studio to create the precompiled headers first and then use them with HJWasm in the same program.

Creating Precompiled Header Files - https://docs.microsoft.com/en-us/cpp/build/reference/creating-precompiled-header-files
PreCompiled header Tool (Pct) - automatically generates precompiled header (stdafx.h) files, powered by Boost Wave. Pct aims to be a bag of tools to help reduce and analyse C/C++ compilation times; there is only one tool for now, extractheaders. - https://github.com/g-h-c/pct and https://github.com/g-h-c/pct/releases
Precompiled Header (PCH) issues and recommendations 2017 - https://blogs.msdn.microsoft.com/vcblog/2017/07/13/precompiled-header-pch-issues-and-recommendations/

Quote
There are very few decent C++ refactoring tools because parsing C++ code is hard (and therefore also slow). You'll probably have to write such a tool yourself, possibly with some assistance from GCC-XML.

Precompiled Header Refactoring Tool - Optimizes precompiled headers for a C++ projects.
The extension analyzes files included in your project's source files and runs a configurable set of rules on that data. It will then produce a list of recommended headers which you can choose to add to your (new or existing) precompiled header (PCH). In addition, the extension modifies compiler settings in your project to ensure they are set up correctly to use your PCH.

https://web.archive.org/web/20150912033755/https://visualstudiogallery.msdn.microsoft.com/e6ca02bb-1c89-4ccc-bada-fb772cf122bd
Download v0.8: https://web.archive.org/web/20150912033755/https://visualstudiogallery.msdn.microsoft.com/e6ca02bb-1c89-4ccc-bada-fb772cf122bd/file/154184/2/PCHRecommendationPackage.vsix
Precompiled Header Refactoring Tool (cmd) https://www.file-upload.net/download-12670121/pchVersion3.1.7z.html

From this overview, simply converting the structures is easier to do and makes the code easier to process later. Does it make sense to pursue precompiled headers when even Microsoft itself does not support them universally across all versions of Visual Studio?

hutch--

I am much of the view that the guys writing assemblers need to design a C compiler front end that simply uses the C/C++ header files as they are for data like structures and equates, as this would solve the messy problem of accurately converting C/C++ data into a form that an assembler can use. Over time I have used numerous tools, always self written, to extract as much information as possible from the M$ header files, but the headers get harder and harder to parse, which I suggest is an intentional act by Microsoft to make the data difficult to access by anything apart from their own C/C++ compilers.

There is an option in CL.EXE (/E, which sends the preprocessor output to stdout) that lets you output all of the header file data by redirection to a text file, and this strips all of the junk out of the structure definitions, but you still must deal with the size of the data types to get a clean conversion. This is where an assembler that could directly read the C/C++ header files would have a great advantage.

nidud

deleted

hutch--

 :biggrin:

Different approach.
Quote
You skipped the most relevant type thought:

    PTR

I suggested typedefs instead of hard-coded types for the conversion of the MASM32 headers. This way you wouldn't need a set of new header files.

A pointer is set to the native data size of the OS version, 32-bit or 64-bit. It is exactly the quagmire of phony data types that I have always avoided, and after years of converting C/C++ header files, I know exactly why. I don't need to convert the MASM32 headers; they already work fine. The 64-bit versions I am using are different: I started with Vasily's, rewrote a vast amount of them, and they produce a complete, reliable set of include files for API functions and have most of the structures and equates that are useful.

I have no desire to emulate a C compiler, something many have tried to push assembler coding into, but a processor only understands how big a piece of data is, and I stick to what the processor requires, not a phony naming convention.