64 bit assembly language step by step

Gunther · May 09, 2013, 10:15:34 PM

No project is created in isolation. This project is certainly no exception.

Assembly language lectures at my university tend to focus on the history and generally conclude with a discussion of 32 bit mode. Students are introduced to 16 bit CPUs whose segment registers allow access to 1 megabyte of memory. This is an unnecessary focus on the past. Therefore, this semester I started a course on 32 and 64 bit assembly language programming.

I found the Intel and AMD manuals to be an invaluable resource. They provide details on all the instructions of the CPU. Unfortunately the documents cover 16 bit, 32 bit and 64 bit instructions together which, along with the huge number of instructions, makes it difficult to learn assembly programming from these manuals. For the 32 bit world I've used Paul Carter's book PC Assembly Language. It's a free PDF, downloadable from his web site, which covers the basics of assembly language and is a great start for 32 bit programming.

The 64 bit world is more complicated. It's nearly impossible to handle Windows, Linux, BSD and MacOS together with one frame program. On the one hand, there's the Unix world with the LP64 data model and a lean and clean ABI. On the other hand, we have the LLP64 data model under Windows with a different ABI. For more details, please check that resource. Moreover, there are new data types which are not congruent in every case between the two worlds.

Therefore I made the Windows frame first (similar to Paul Carter's), but as a standalone assembly language solution. The students must be able to print different values easily, dump the contents of the processor's registers, dump the contents of the XMM registers, etc. That's exactly what my first example (ex1.asm) does:

That is a C string (zero terminated).
32 bit unsigned integer value =  4294967295
32 bit integer value          = -2147483648
64 bit unsigned integer value =  858993459234567
64 bit integer value          = -858993459234567
REAL4 (float) value           =  178.125000
REAL8 (double) value          =  3.1415926535897931

CPU register dump:
------------------

RAX   = 1122334455667788   RBX   = 2233445566778899   RCX = 33445566778899AA
RDX   = 000007FEFE2A2AB0   RDI   = 0000000000000000   RSI = 0000000000000000
RBP   = 000000000012FEB0   R8    = 0000000000403002   R9  = 0000000000000000
R10   = 0000000000000200   R11   = 000007FEFE210000   R12 = 0000000000000000
R13   = 0000000000000000   R14   = 0000000000000000   R15 = 445566778899AABB
RSP   = 000000000012FE90   Flags = 0000000000000206

XMM register dump:
------------------

XMM0  = 00000000000000001122334455667788
XMM1  = 22334455667788992233445566778899
XMM2  = 000000000000000033445566778899AA
XMM3  = 445566778899AABB445566778899AABB
XMM4  = 00000000000000005566778899AABBCC
XMM5  = 66778899AABBCCDD66778899AABBCCDD
XMM6  = 0000000000000000778899AABBCCDDEE
XMM7  = 8899AABBCCDDEEFF8899AABBCCDDEEFF
XMM8  = 000000000000000099AABBCCDDEEFF11
XMM9  = AABBCCDDEEFF1122AABBCCDDEEFF1122
XMM10 = 0000000000000000BBCCDDEEFF112233
XMM11 = CCDDEEFF11223344CCDDEEFF11223344
XMM12 = 0000000000000000DDEEFF1122334455
XMM13 = EEFF112233445566EEFF112233445566
XMM14 = 0000000000000000FF11223344556677
XMM15 = 11223344556677881122334455667788

YMM register dump:
------------------

YMM0  = 0000000000000000000000000000000000000000000000001122334455667788
YMM1  = 2233445566778899223344556677889922334455667788992233445566778899
YMM2  = 00000000000000000000000000000000000000000000000033445566778899AA
YMM3  = 445566778899AABB445566778899AABB445566778899AABB445566778899AABB
YMM4  = 0000000000000000000000000000000000000000000000005566778899AABBCC
YMM5  = 66778899AABBCCDD66778899AABBCCDD66778899AABBCCDD66778899AABBCCDD
YMM6  = 000000000000000000000000000000000000000000000000778899AABBCCDDEE
YMM7  = 8899AABBCCDDEEFF8899AABBCCDDEEFF8899AABBCCDDEEFF8899AABBCCDDEEFF
YMM8  = 00000000000000000000000000000000000000000000000099AABBCCDDEEFF11
YMM9  = AABBCCDDEEFF1122AABBCCDDEEFF1122AABBCCDDEEFF1122AABBCCDDEEFF1122
YMM10 = 000000000000000000000000000000000000000000000000BBCCDDEEFF112233
YMM11 = CCDDEEFF11223344CCDDEEFF11223344CCDDEEFF11223344CCDDEEFF11223344
YMM12 = 000000000000000000000000000000000000000000000000DDEEFF1122334455
YMM13 = EEFF112233445566EEFF112233445566EEFF112233445566EEFF112233445566
YMM14 = 000000000000000000000000000000000000000000000000FF11223344556677
YMM15 = 1122334455667788112233445566778811223344556677881122334455667788

The program is for teaching purposes for my students, but it could be of interest to other programmers too, who would like to have a look into the new and fascinating 64 bit world. It works under 64 bit Windows and is tested with Windows 7 and Windows 8. It won't crash if, for example, AVX isn't available, because it checks the usable instruction sets at run time.

I'll update and extend the package with new examples from time to time. Here is my to do list:

The assembly language source is written for YASM/NASM. At the moment, I'm writing the variant for jWasm and ml64. The goal is for the package to work with a broad range of available assemblers.
I've used GoLink to build the running EXE and that works fine. I'm sure that PoLink or MS link could do the same job, but I have no clue how to do that. Could someone help out?
Writing the variant for Linux, BSD and MacOS X.
Adding the dump for the old FPU and a dump for memory regions.
Adding more examples for the logical and arithmetic instructions, classic FPU programming, programming the multi media registers, interfacing with HLL etc.

I will announce the updates inside this thread and upload the new and updated files to the first post of this thread. Suggestions and ideas for improvements are very welcome. Have fun.

Gunther

Antariy · May 09, 2013, 11:02:13 PM

Very good intent, Gunther, thank you for sharing it! It clearly is a very useful project.

sinsi · May 09, 2013, 11:32:04 PM

Prints out YMM15, then crashes

Faulting application name: ex1.exe, version: 0.0.0.0, time stamp: 0x51883147
Faulting module name: ntdll.dll, version: 6.2.9200.16420, time stamp: 0x505ab405
Exception code: 0xc0000005
Fault offset: 0x00000000000115d0

Windows 8 Pro x64, i7 3770K

qWord · May 10, 2013, 12:25:32 AM

The problem is that the stack is not properly adjusted when calling exit().

Gunther · May 10, 2013, 12:34:24 AM

Hi Alex,

Quote from: Antariy on May 09, 2013, 11:02:13 PM
Very good intent, Gunther, thank you for sharing it! It clearly is a very useful project.

thank you.

Gunther

Gunther · May 10, 2013, 12:38:10 AM

Hi sinsi and qWord,

I see the problem and I've fixed it. The fixed version is now available in the first post of the thread. There was a stack alignment problem. Thank you qWord. :t and thank you for reporting it, sinsi. :t

Gunther

qWord · May 10, 2013, 12:50:49 AM

I think this "fix" should be reworked quickly

Gunther · May 10, 2013, 01:36:12 AM

Hi qWord,

Quote from: qWord on May 10, 2013, 12:50:49 AM
I think this "fix" should be reworked quickly

the problem was in main.asm (stack alignment) and it's over. What else?

Gunther

qWord · May 10, 2013, 01:54:19 AM

Quote from: Gunther on May 10, 2013, 01:36:12 AM
the problem was in main.asm (stack alignment) and it's over. What else?

currently the code does this:

push registers
sub rsp,40
pop registers

Gunther · May 10, 2013, 02:31:41 AM

qWord,

Quote from: qWord on May 10, 2013, 01:54:19 AM
Quote from: Gunther on May 10, 2013, 01:36:12 AM
the problem was in main.asm (stack alignment) and it's over. What else?
currently the code does this:
push registers
sub rsp,40
pop registers

absolutely right, I've simply overlooked that. Thank you. :t

Gunther

Gunther · May 11, 2013, 03:26:18 AM

I've updated the Windows version (64 bit); the new archive for download is now Win64U1.zip. It contains the following:

The folder Win64\jWasm and ml64\ contains the version for jWasm; it should assemble with ml64, too, but that's not tested. Could someone try it?

The directory Win64\NASM and YASM\ contains the NASM and YASM version; it assembles with both assemblers.

Full source and binaries are included. I have the following remarks:

I've used GoLink for both packages to generate the running EXE. If one would prefer another linker, one has only to change the appropriate line inside the batch file.
The programs are tested under Windows 7 and 8 (64 bit).
At the moment I'm writing the Unix version.
With a few minor changes I could write a Sol Asm version. Is that necessary?

During the work I've found the following quirk:

         movq        xmm0, rax

should write 8 bytes (1 qword) from RAX into the lower half of XMM0. So far so good. That instruction won't assemble with jWasm and probably not with ml64 either. Instead I had to write:

         movd        xmm0, rax

Of course, the generated machine code (66 48 0F 6E C0) is the same. But the MASM/jWasm mnemonic seems to me a bit bewildering. Anyway it works.

Gunther

qWord · May 11, 2013, 03:46:55 AM

Quote from: Gunther on May 11, 2013, 03:26:18 AM
But the MASM/jWasm mnemonic seems to me a bit bewildering.

they use what AMD and Intel brought in: MOVQ does not allow GPRs.

Gunther · May 11, 2013, 08:30:20 AM

Hi qWord,

Quote from: qWord on May 11, 2013, 03:46:55 AM
they use what AMD and Intel brought in: MOVQ does not allow GPRs.

here is a quote from the Intel Manual, Vol. 2B, p. 4-46:

Quote
MOVQ (when destination operand is XMM register)
DEST[63:0] ← SRC[63:0];
DEST[127:64] ← 0000000000000000H;
DEST[VLMAX-1:128] (Unmodified)
MOVQ (when destination operand is r/m64)
DEST[63:0] ← SRC[63:0];
MOVQ (when source operand is XMM register or r/m64)
DEST ← SRC[63:0];

Gunther

qWord · May 11, 2013, 08:59:48 AM

hi,
with r64 they mean MMX registers:

Quote from: Intel
Copies a quadword from the source operand (second operand) to the destination operand (first operand). The source and destination operands can be MMX technology registers, XMM registers, or 64-bit memory locations. This instruction can be used to move a quadword between two MMX technology registers or between an MMX technology register and a 64-bit memory location, or to move data between two XMM registers or between an XMM register and a 64-bit memory location. The instruction cannot be used to transfer data between memory locations.

Gunther · May 11, 2013, 09:09:17 PM

Hi qWord,

Quote from: qWord on May 11, 2013, 08:59:48 AM
hi,
with r64 they mean MMX registers:

no, in the Intel manual 253667.pdf (Volume 2B: Instruction Set Reference, M-Z) the terminology is as follows: r is a general purpose register, mm is an MMX register, xmm is an XMM register and m64 a 64 bit memory reference.

Anyway, Intel covers movd and movq in the same section of the manual. I couldn't find your quote there, but I found this in the instruction summary, p. 4-45:

MOVQ xmm, r/m64   Move quadword from r/m64 to xmm.
66 0F 7E /r

But we should be very careful. It seems to me that Intel has different sets of manuals. I downloaded a fresh set 3 months ago. By the way, the Intel compiler accepts only movq for its inline assembler.

Gunther
