The background for this post is this old thread. The program was inspired by Dr. Paul Carter's book. It was written, assembled and linked under Ubuntu 16. I needed a framework so that students who are currently learning assembler can inspect the processor registers and the contents of the floating point unit from within their programs. So far so good.
You can of course do all the output through system calls. Then, however, routines must be written for every possible data type (strings, different integer types, different floating point numbers etc.). That works fine, but I was too lazy for that. That's why I liked Carter's idea of linking the assembly language code against libc and letting printf, for example, do the job. A binary built this way still runs fine under newer Ubuntu or Debian releases. The tricky part is the build: under Ubuntu 17, for example, you can still assemble the sources, but you can no longer link the object modules against libc.
That has to do with position independent code and data. When writing 64-bit Linux assembly code, a symbol that is referenced directly must be "local" (i.e. not global); otherwise the linker will complain, roughly like this:
relocation R_X86_64_PC32 against symbol 'hello' can not be used when making a shared object; recompile with -fPIC
final link failed: Bad value
That's pretty tricky. The idea behind it is simple: the code section of the application can be made truly position independent, in the sense that it can be mapped at different memory addresses without changing a single bit. This is the default behavior in many 64-bit Linux distributions (Ubuntu, Debian, Gentoo, Fedora), especially since the advent of the Gold Linker. It was written by Ian Lance Taylor, and he is a huge fan of this technique.
But the main reason for this is that Position Independent Executables (PIE) are an output of the hardened package build process. A PIE binary and all of its dependencies are loaded at random locations within virtual memory each time the application is executed. This makes Return Oriented Programming (ROP) attacks much more difficult to execute reliably. This is address space layout randomization (ASLR). I will not discuss the various attack techniques (buffer overflow, return into libc, stack buffer overflow etc.) in detail here, because that's forbidden by the forum rules. I just want to mention that all this effort is made to complicate such attacks by script kiddies. So there is a very serious background here.
What does this mean for a programmer who wants to link his assembly language code against libc or libm? Here's what I learned about it in the last few days. I found some important pieces of the puzzle, but others are still missing. My search process is far from finished.
In a gross generalization, I think there are two basic types of indirect addressing: by PC-relative offset (i.e. by an offset from the current value of the instruction pointer), or through a pointer. Two ELF constructs assist in locating the run-time addresses of symbols (function addresses as well): the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). I'm sure you've already heard of them. In the Unix world, every dynamically linked program gets a GOT and a PLT. While almost all position-independent 32-bit code will reference the symbol _GLOBAL_OFFSET_TABLE_ (__GLOBAL_OFFSET_TABLE_ with double underscore under BSD) at some point, 64-bit code can use its RIP-relative relocation types to address values in the GOT, so you typically won't see the symbol used as frequently in assembly language code. The offset from the code to the GOT is fixed at link time, so the code in the read-only segment can reach the GOT without being modified; the GOT itself is data that the dynamic linker fills in.
The operation of the PLT is one of the better-documented areas of an ELF binary. Suffice it to say that on a lazily bound function's first invocation, the PLT entry ends up in the dynamic linker, which resolves the symbol, stores the real address of the function in the GOT slot belonging to that PLT entry, and then jumps to the target function (all the jumping, rather than calling, matters because the target function's eventual ret instruction must still find the original return address on the stack). Subsequent invocations no longer detour via the dynamic linker; the PLT entry jumps through its GOT slot straight to the target function. So the PLT is a kind of trampoline.
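As a rough sketch of the call path just described, here is a minimal GAS program in AT&T syntax that calls a libc function through the PLT. The file name and the label msg are made up for illustration, and the sketch assumes that the gcc driver is used for assembling and linking, so that libc is linked in and main is called by the C startup code.

```asm
# hello_plt.s (hypothetical file name)
# Assumed build command: gcc -o hello_plt hello_plt.s

        .section .rodata
msg:    .string "Hello via the PLT"     # zero-terminated C string, local symbol

        .text
        .globl  main
main:
        subq    $8, %rsp                # realign the stack to 16 bytes before the call
        leaq    msg(%rip), %rdi         # 1st argument: address of msg, RIP-relative
        call    puts@PLT                # libc function, reached through its PLT entry
        xorl    %eax, %eax              # return 0 from main
        addq    $8, %rsp                # restore the stack pointer
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

Whether the first call to puts is really bound lazily depends on how the distribution configures the linker; with immediate binding the GOT slot is already filled in at program start.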
All this now means the following. You can't write, for example:
mov rax, [myVar]
That no longer works. Instead, the variables must be addressed via the GOT and the functions must be called via the PLT. That's the difficulty.
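To make this more concrete, here is a small sketch in GAS (AT&T syntax) of the two kinds of data access. It assumes a build with the gcc driver; the file name and the labels myVar and fmt are only illustrative. The local variable is read RIP-relative, while libc's stdout, which is defined in another module, is reached through its GOT entry, and fprintf is called via the PLT.

```asm
# pic_data.s (hypothetical file name)
# Assumed build command: gcc -o pic_data pic_data.s

        .data
myVar:  .quad   0x1122334455667788      # local symbol (no .globl directive)

        .section .rodata
fmt:    .string "myVar = %016lX\n"      # format string for fprintf

        .text
        .globl  main
main:
        subq    $8, %rsp                        # realign the stack to 16 bytes
        movq    stdout@GOTPCREL(%rip), %rax     # GOT entry: address of libc's stdout variable
        movq    (%rax), %rdi                    # 1st argument: the FILE pointer stored there
        leaq    fmt(%rip), %rsi                 # 2nd argument: format string, RIP-relative
        movq    myVar(%rip), %rdx               # 3rd argument: value of the local variable
        xorl    %eax, %eax                      # variadic call: no vector registers used
        call    fprintf@PLT                     # libc function, called via the PLT
        xorl    %eax, %eax                      # return 0 from main
        addq    $8, %rsp                        # restore the stack pointer
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

The plain RIP-relative form is enough here only because myVar is local to the module; a symbol shared across modules would be addressed with @GOTPCREL, in the same way as stdout.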
My first attempts were with YASM. The manual describes the WRT operator together with the PIC relocation types:
..gotpcrel, ..got, ..plt, ..sym
But it does not work. I could link the object module against libc, but the program crashed with a segmentation fault. In addition, YASM is apparently no longer maintained; the last release is from 2014.
Next try with NASM. The manual knows a bit about GOT but nothing about
..gotpcrel
Bad luck.
The last salvation was the GNU assembler (GAS). GAS is tough, relentless and forgives no mistake. But it is well maintained, up to date and does a good job in the end. Here is the result. The attached ZIP file contains the sources and 64-bit binaries of the application. There are 3 sources:
- main.s is the frame of the application.
- asmio.s calls the different libc functions with the appropriate parameters.
- example.s prints out some values and makes a dump of the CPU registers.
Here is the output of the program:
That's a C string (zero terminated).
32 bit unsigned integer value = 4294967295
32 bit integer value = -2147483648
64 bit unsigned integer value = 858993459234567
64 bit integer value = -858993459234567
REAL4 (float) value = 1.730000
REAL8 (double) value = 3.1415926535897931
CPU register dump:
------------------
RAX = 1122334455667788 RBX = 0000000000000000 RCX = 2233445566778899
RDX = 33445566778899AA RDI = 00007F70DB3D2720 RSI = 0000000000000000
RBP = 00007FFC994EB5E8 R8 = 445566778899AABB R9 = 5566778899AABBCC
R10 = 66778899AABBCCDD R11 = 778899AABBCCDDEE R12 = 000055791CEFE540
R13 = 00007FFC994EB730 R14 = 0000000000000000 R15 = 8899AABBCCDDEEFF
RSP = 00007FFC994EB5C0 Flags = 0000000000000206
However, it takes strong nerves to read the source code. You will not find any clever HLL constructs there, just mnemonics. Moreover, parts of the source are written in Intel syntax, but other parts in the archaic AT&T syntax. The reason is simple. I have no idea how to write something like this:
movq msg1@GOTPCREL(%rip), %rsi # get C string address
call print_Cstring@PLT # print_Cstring
in Intel syntax.
I have written to the maintainers of binutils (GAS is a part of it) and asked how to write that in Intel syntax. That would be a big step forward, but so far there is no answer. I also wrote to Frank Kotler (NASM developer) and asked for information; no answer so far either. I will contact Hans Peter Anvin (chief maintainer of NASM), maybe he can help.
I hope that Branislav and johnsa know a way to do that with UASM. The big advantage of JWASM was that it works on different operating systems (DOS, Windows, OS/2, Unix, Linux, BSD, etc.). This advantage should not be given up. So it should be possible to access the GOT and the PLT with UASM, right? For me that would be the cleanest option.
Some more comments at the end. Of course I could have done the register dump via the stack as well. But I wanted to show how to access arbitrary variables in the data or bss section. The program still lacks some parts: the FPU dump and the dumps of the XMM, YMM and ZMM registers. But that's just a matter of diligent work.
I hope I didn't scare anyone.
Gunther