No project is created in isolation. This project is certainly no exception.
Assembly language lectures at my university tend to focus on the history and generally conclude with a discussion of the 32 bit mode. Students are introduced to the concepts of 16 bit CPUs with segment registers allowing access to 1 megabyte of internal memory. This is an unnecessary focus on the past. Therefore, I started a course this semester on 32 and 64 bit assembly language programming.
I found the Intel and AMD manuals to be an invaluable resource. They provide details on all the instructions of the CPU. Unfortunately, the documents cover 16 bit, 32 bit and 64 bit instructions together, which, along with the huge number of instructions, makes it difficult to learn assembly programming from these manuals. For the 32 bit world I've used Paul Carter's book "PC Assembly Language". It's a free PDF file downloadable from his web site; it covers the basics of assembly language and is a great start for 32 bit assembly language programming.
The 64 bit world is more complicated. It's nearly impossible to handle Windows, Linux, BSD and MacOS together with a single frame program. On the one hand there's the Unix world with the LP64 data model and a lean and clean ABI. On the other hand we have the LLP64 data model under Windows with another ABI. For more details, please check that resource. Moreover, there are new data types which aren't congruent in every case between both worlds.
Therefore I made the Windows frame first (similar to Paul Carter), but I used a stand-alone assembly language solution. The students must be able to print different values in an easy way, and to dump the contents of the processor's registers, the XMM registers etc. That's exactly what my first example (ex1.asm) does:
That is a C string (zero terminated).
32 bit unsigned integer value = 4294967295
32 bit integer value = -2147483648
64 bit unsigned integer value = 858993459234567
64 bit integer value = -858993459234567
REAL4 (float) value = 178.125000
REAL8 (double) value = 3.1415926535897931
CPU register dump:
------------------
RAX = 1122334455667788 RBX = 2233445566778899 RCX = 33445566778899AA
RDX = 000007FEFE2A2AB0 RDI = 0000000000000000 RSI = 0000000000000000
RBP = 000000000012FEB0 R8 = 0000000000403002 R9 = 0000000000000000
R10 = 0000000000000200 R11 = 000007FEFE210000 R12 = 0000000000000000
R13 = 0000000000000000 R14 = 0000000000000000 R15 = 445566778899AABB
RSP = 000000000012FE90 Flags = 0000000000000206
XMM register dump:
------------------
XMM0 = 00000000000000001122334455667788
XMM1 = 22334455667788992233445566778899
XMM2 = 000000000000000033445566778899AA
XMM3 = 445566778899AABB445566778899AABB
XMM4 = 00000000000000005566778899AABBCC
XMM5 = 66778899AABBCCDD66778899AABBCCDD
XMM6 = 0000000000000000778899AABBCCDDEE
XMM7 = 8899AABBCCDDEEFF8899AABBCCDDEEFF
XMM8 = 000000000000000099AABBCCDDEEFF11
XMM9 = AABBCCDDEEFF1122AABBCCDDEEFF1122
XMM10 = 0000000000000000BBCCDDEEFF112233
XMM11 = CCDDEEFF11223344CCDDEEFF11223344
XMM12 = 0000000000000000DDEEFF1122334455
XMM13 = EEFF112233445566EEFF112233445566
XMM14 = 0000000000000000FF11223344556677
XMM15 = 11223344556677881122334455667788
YMM register dump:
------------------
YMM0 = 0000000000000000000000000000000000000000000000001122334455667788
YMM1 = 2233445566778899223344556677889922334455667788992233445566778899
YMM2 = 00000000000000000000000000000000000000000000000033445566778899AA
YMM3 = 445566778899AABB445566778899AABB445566778899AABB445566778899AABB
YMM4 = 0000000000000000000000000000000000000000000000005566778899AABBCC
YMM5 = 66778899AABBCCDD66778899AABBCCDD66778899AABBCCDD66778899AABBCCDD
YMM6 = 000000000000000000000000000000000000000000000000778899AABBCCDDEE
YMM7 = 8899AABBCCDDEEFF8899AABBCCDDEEFF8899AABBCCDDEEFF8899AABBCCDDEEFF
YMM8 = 00000000000000000000000000000000000000000000000099AABBCCDDEEFF11
YMM9 = AABBCCDDEEFF1122AABBCCDDEEFF1122AABBCCDDEEFF1122AABBCCDDEEFF1122
YMM10 = 000000000000000000000000000000000000000000000000BBCCDDEEFF112233
YMM11 = CCDDEEFF11223344CCDDEEFF11223344CCDDEEFF11223344CCDDEEFF11223344
YMM12 = 000000000000000000000000000000000000000000000000DDEEFF1122334455
YMM13 = EEFF112233445566EEFF112233445566EEFF112233445566EEFF112233445566
YMM14 = 000000000000000000000000000000000000000000000000FF11223344556677
YMM15 = 1122334455667788112233445566778811223344556677881122334455667788
The program is meant for teaching my students, but it could also be of interest to other programmers who would like to have a look into the new and fascinating 64 bit world. It works under 64 bit Windows and is tested with Windows 7 and Windows 8. It won't crash if, for example, AVX isn't available, because it checks the usable instruction sets at run time.
I'll update and extend the package with new examples from time to time. Here is my to-do list:
- The assembly language source is made for YASM/NASM. At the moment, I'm writing the variant for jWasm and ml64. The goal is to support a broad spectrum of available assemblers.
- I've used GoLink to build the running EXE and that works fine. I'm sure that PoLink or MS link could do the same job, but I have no clue how to do that. Could someone help out?
- Writing the variant for Linux, BSD and MacOS X.
- Adding the dump for the old FPU and a dump for memory regions.
- Adding more examples for the logical and arithmetic instructions, classic FPU programming, programming the multimedia registers, interfacing with HLL etc.
I will announce the updates inside this thread and upload the new and updated files to the first post of this thread. Suggestions and ideas for improvements are very welcome. Have fun.
Gunther