Converting HLL to assembler

Started by prino, July 30, 2014, 12:50:12 AM

prino

Some time ago, well, actually more than 20 years ago (the time-stamp of the oldest files that I still have is 1994-03-26), I wrote a tiny Turbo Pascal (V3.01) program to help me process some of my hitchhike statistics. The program was tiny, just a 49kb .COM file, and did what it had to do. In the years that followed it slowly grew, because I added more and more tables (Driver: "Have you ever calculated this or that?" Me: "No, but it might be interesting..."), and the last (60th) Turbo Pascal V3.01 version, with files dated 1996-08-05, came in at a hefty 54kb. :)

From version 46 (1995-07-30) the code contained lots of inline (the old type inline, with hexadecimal codes) code, and from version 51 (1995-10-13) yours truly started adding 0x66 (386) prefixes to it.

Then I got hold of Turbo Pascal 6.0, and because my carefully byte-counted inline code sometimes crossed actual Pascal code, the program no longer ran, and I had to convert all inline code back to normal Pascal. Ouch, ouch, ouch...

The last (53rd) Turbo Pascal V6 version saw the light on 2008-10-06. The reason for abandoning Turbo Pascal was that it could no longer cope with the amount of data: at the time I had hitched 2,206 rides and the input file had reached a size of 502kb, too much to handle in the 16-bit DOS environment, even using a special CONFIG.SYS menu and AUTOEXEC.BAT that left most of memory free, and a unit that extended the TP heap into unused UMBs. (I do have BP 7, but never considered using it.)

I had to change again, and having been a lifelong user of Pascal, I chose to go for Virtual Pascal, a 32-bit clone of Turbo/Borland Pascal. I did think about FreePascal at the time, and I actually gave it a try, but in the end (don't ask me why, back then) it didn't cut it, and having tried it again several times in the past few years, it still doesn't. The problem *seems* to be that FP uses a different philosophy with regard to alignment of data, which even causes the "pure" Pascal (see below) version to generate code that eventually results in a GPF.

Obviously, having recovered from the TP3 inline() fiasco, I had made sure that nothing of this nature would ever occur again, by keeping two parallel versions of the TP6 source, a "pure" Pascal one, and one with inline assembler, and other than having to change a few registers in one not entirely "pure" Pascal routine, some integers to longints and, non-trivial, all Borland 6-byte reals to IEEE doubles, the first Virtual Pascal version was nearly identical to the last TP6 version.

Besides the fact that VP is a 32-bit compiler, which means I will never run out of space again, it's a real clone of the Borland software, which means that the generated code isn't, to put it politely, much better than the TP6 (AD 1990) code. Unlike the Borland compilers, however, it actually produces an assembler listing that shows the generated code, and as a result I've spent quite a bit of time over the last few years converting code from Pascal to inline assembler. I let the compiler do the heavy lifting when it comes to new code, but once the Pascal code is working, the routine is duplicated between conditional compilation tags and, using the original assembler listing as a starting point, converted to inline assembler, which in many cases now only vaguely resembles the original code. I try to keep data in registers and avoid the use of the VP RTL where possible, and if ebx, esi and edi, the registers that must be preserved across procedure calls, aren't used in the (usually no deeper than two-level) calling chain, I remove the saving and restoring of them. The result of all of this fiddling? A program that is around 35% faster and nearly 25% smaller than the pure Pascal version... In figures, with the running time averaged over 15 runs:

Inline assembler version:  78,336 bytes, running time 0:00:00.507 sec
Pure Pascal version     : 103,424 bytes, running time 0:00:00.787 sec

Yes, all this work for a program that runs in less than a second...
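
As a quick check on the figures above, the percentages can be recomputed directly. A minimal Python sketch; the variable and function names are illustrative and not taken from the program:

```python
# Sizes and timings as quoted in the post; names are illustrative only.
asm_bytes, pascal_bytes = 78_336, 103_424
asm_secs, pascal_secs = 0.507, 0.787


def percent_reduction(new: float, old: float) -> float:
    """Return how much smaller `new` is than `old`, in percent."""
    return (old - new) / old * 100.0


size_saving = percent_reduction(asm_bytes, pascal_bytes)
time_saving = percent_reduction(asm_secs, pascal_secs)
speedup = pascal_secs / asm_secs  # how many times faster the asm version runs

print(f"size reduction : {size_saving:.1f}%")
print(f"time reduction : {time_saving:.1f}%")
print(f"speed-up factor: {speedup:.2f}x")
```

This gives roughly 24.3% for size and 35.6% for running time, in line with the "nearly 25%" and "around 35%" quoted above. Note that 35% less running time corresponds to a speed-up factor of about 1.55.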

I would now like to take things one step further, and go for a full assembler version, and one way of doing that would be to just take the code out of the VP RTL and paste it into my source as in-line assembler, but that still leaves me with all the start-up and finishing code that is so handily supplied by the compiler.

Any pointers from someone who's done a similar conversion in the past, and would like to share the potential pitfalls?

And please don't tell me it's an exercise in futility. I already know it is, but I'm just curious to see how fast and/or small such a version can be made, as, to name but one area ready for improvement, there are quite a few places in the code where the parallelism of SSE could be used.

For what it's worth, a link to the code can be found via the link on my page on Hitchwiki; it's on Yahoo! Groups.
Robert AH Prins
robert.ah.prins @ the.15+Gb.Google thingy
No programming here :)

jj2007

Hi Robert,

Welcome to the Forum (first things first)

Conversions at that low level are always tricky. An intelligent parser can do miracles, but nonetheless you will spend a lot of time in front of the debugger watching it crash...

Consider rewriting the whole proggie. Masm32 has powerful library functions for file I/O and many other things, qWord has a great numeric library, and MasmBasic resembles Pascal a little bit (and it has a lot of SSE2 built in...). Much of what your code did in the past could be a lot easier nowadays.

I suggest you post some example code. Keep it short, 100 lines or so; that increases the chances of getting a useful answer.

dedndave

frankly, it's a lot more work to convert a 16-bit program to a 32-bit program,
than it is to start fresh and write a new 32-bit program

show us what the program does - maybe some screen-shots would be helpful
show us how the data file is organized

and we can give you tips and pointers - maybe even get you started


jj2007

Robert,

Re Data format used by Prino's current program: Is that still csv, or something else? Can you zip up a typical input file and post it here (->Attachments and other options)?

dedndave

looks like a good place for a rich listview, Jochen ?

i was thinking you could add a map with lines to plot the journey legs   ;)

prino

I'm dreadfully sorry for this very, very, very long overdue reply, but the past year has been rather hard in all kinds of ways, and other than actually removing a few more bugs, I've not done much about the whole project.  :(

As for the data, the Virtual Pascal programs, and the assembler files generated by the compiler for the "pure" Pascal and "inline-assemblerized" (that's not a word...) versions: they are, for those who might be interested, available on my ftp site, ftp://prino.selfip.org

Log in as anonymous and enter whatever you want as password.

The three files are:


  • lift32bit.rar - the VP sources and supporting VP files, plus the resulting executables
  • liftdat.rar - the input data
  • assembler.rar - the VP generated assembler listings for the "pure" Pascal and "inline-assemblerized" versions, only comparable if you cut out specific procedures and remove all comments.
Please note that the ftp site may appear and disappear if and when my PC goes into hibernation, and from 2015-04-230 it will be gone for at least two weeks.

The programs are pure command line ones, so not much to see in the way of screenshots.

As for the input data, it is pure text (in the ancient IBM 437 codepage) in a CSV format, but it contains non-CSV meta-data and frankly it's a right-royal mess due to adding ever more requirements over the years. Using ".", ":" and "x" in time values to indicate certain conditions isn't really a good thing to do.

Using XML or JSON? Why do you think I've rewritten most of the programs using in-line assembler? Even in this day and age of multi-terabyte disks and PCs with RAM that is measured in GB (16 GB in my PC, 24 GB in my notebook), I like to keep things tiny.

Gunther

Hi Robert,

I'm not sure if this is helpful, but gcc (the GNU Compiler Collection) can compile Pascal. If that's done, you can simply use the -S switch and voilà, your assembly language source is ready to use.

Gunther
You have to know the facts before you can distort them.

prino

I don't want to touch GNU Pascal with a barge pole, and it's for all intents and purposes dead. Virtual Pascal is a near perfect clone of Turbo Pascal, and suffers from the same problems, it's not an optimizing compiler. It gives me a great assembler listing in TASM format, and that's how I converted the rather less than optimal (let's stay friendly) code it generated into far more optimal inline assembler.

As mentioned in the initial post, I've also tried FreePascal. It also produces assembler listings, is supposed to be more optimizing, but I've given up on it as it's not even able to compile the pure Pascal version of the program due to "funny" variable alignment rules.

I've got about two dozen RTL routines left over, most of them having to do with I/O. Theoretically I can get rid of most of these and move my output directly into the file buffers, as these are way bigger than the actual output files. The bigger problems are the float-to-string routine, the routines that parse the command line, the memory management (most of the data is held in linked lists and dynamic arrays on the heap), and the real I/O that actually writes the data out to disk.
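
To illustrate what replacing the float-to-string routine might involve: when the number of decimals is known in advance, the conversion can be reduced to scaling to an integer and peeling off digits with repeated division by 10, which maps fairly directly onto assembler. A minimal sketch in Python under those assumptions; `fixed_to_str` is a hypothetical helper and not the Virtual Pascal RTL algorithm, and it ignores very large values, NaN and infinities:

```python
def fixed_to_str(value: float, decimals: int) -> str:
    """Format `value` with a fixed number of decimals.

    Hypothetical helper (not the VP RTL routine): scale to an integer,
    then extract digits from least to most significant.
    """
    # Scale the magnitude to an integer, rounding half up.
    scaled = int(abs(value) * 10 ** decimals + 0.5)
    # Only show a minus sign if something non-zero survives rounding.
    negative = value < 0 and scaled != 0

    out = []       # characters in reverse order
    produced = 0   # number of digits emitted so far
    while True:
        scaled, digit = divmod(scaled, 10)
        out.append(chr(ord("0") + digit))
        produced += 1
        if produced == decimals:
            out.append(".")  # fractional part is complete
        if scaled == 0 and produced > decimals:
            break            # at least one integer digit has been emitted

    if negative:
        out.append("-")
    return "".join(reversed(out))


print(fixed_to_str(3.14159, 2))
print(fixed_to_str(12.0, 3))
print(fixed_to_str(-2.5, 1))
```

In an assembler version the characters could be written backwards from the end of a small buffer, which avoids the final reversal.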


Gunther

Robert,

Quote from: prino on April 27, 2015, 07:41:00 AM
I don't want to touch GNU Pascal with a barge pole, and it's for all intents and purposes dead. Virtual Pascal is a near perfect clone of Turbo Pascal, and suffers from the same problems, it's not an optimizing compiler. It gives me a great assembler listing in TASM format, and that's how I converted the rather less than optimal (let's stay friendly) code it generated into far more optimal inline assembler.

Sure. Pascal is no longer in vogue. The fashionable languages are C++, C# or even Java. But if you have assembler output from whatever compiler, what more do you want?
