Some Relocations Required (not for the faint of heart)

Started by Gunther, January 08, 2018, 09:04:20 AM


Gunther

The background for this post is this old thread.

The program was inspired by Dr. Paul Carter's book. It was written, assembled and linked under Ubuntu 16. I needed a framework so that students who are currently learning assembler can look at the registers in the processor and the contents of the floating point unit in their programs. So far so good.

You can of course do all the output through system calls. Then, however, routines must be written to output every possible data type (strings, different integer types, different floating point numbers etc.). That works fine, but I was too lazy for that. That's why I liked Carter's idea of linking the assembly language code against libc and getting printf, for example, to do the job. This works even under fairly recent releases of Ubuntu or Debian, and the application runs fine. The tricky problem is that under Ubuntu 17, for example, you can still assemble but can no longer link the object modules against libc.

That has to do with position independent code and data. The precondition when writing 64-bit Linux assembly code is that the symbol must be "local" (i.e. not global), otherwise the linker will complain, a bit like this:

relocation R_X86_64_PC32 against symbol 'hello' can not be used when making a shared object; recompile with -fPIC
final link failed: Bad value

That's pretty tricky. The simple idea behind this is that the code section of the application can be made truly position independent, in the sense that it can be mapped at different memory addresses without needing to change a single bit. This is the default behavior on many 64-bit Linux distributions (Ubuntu, Debian, Gentoo, Fedora), especially since the advent of the Gold linker. Gold was written by Ian Lance Taylor, who is a huge fan of the technique described.

But the main reason for this is that Position Independent Executables (PIE) are an output of the hardened package build process. A PIE binary and all of its dependencies are loaded into random locations within virtual memory each time the application is executed. This makes Return Oriented Programming (ROP) attacks much more difficult to execute reliably. This is the address space layout randomization (ASLR). I will not discuss the various attack techniques (Buffer overflow, Return into libc, Stack buffer overflow etc.) in detail here, because that's forbidden by the forum rules. I just want to mention that all the effort is being made to complicate such attacks by Script Kiddies. We have a very serious background here.
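The effect of PIE plus ASLR is easy to observe. The following is only a minimal sketch in GAS (AT&T syntax); the file name, the label fmt and the build line are my assumptions, not part of the attached sources. It prints the run-time address of main:

```asm
# where.s - assumed build: gcc -pie where.s -o where
        .globl  main

        .section .rodata
fmt:    .string "main is loaded at %p\n"

        .text
main:
        subq    $8, %rsp                # realign the stack to 16 bytes for the call
        leaq    fmt(%rip), %rdi         # 1st argument: format string (RIP-relative)
        leaq    main(%rip), %rsi        # 2nd argument: run-time address of main
        xorl    %eax, %eax              # variadic call: no vector registers used
        call    printf@PLT              # libc function, reached through the PLT
        xorl    %eax, %eax              # return 0 from main
        addq    $8, %rsp
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

With ASLR enabled, consecutive runs should print different addresses. Linked with -no-pie instead, the address should stay the same from run to run.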

What does this mean for a programmer who wants to link his assembly language code against libc or libm? Here's what I learned about it in the last few days. I found some important pieces of the puzzle, but others are still missing. My search process is far from finished.

In a gross generalization, I think there are two basic types of indirect addressing: by PC-relative offset (i.e. by an offset from the current address of the instruction pointer), or through a pointer. There are two ELF constructs which assist in locating the run time addresses of symbols (function addresses as well): the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). I'm sure you've already heard of them. Every newly started process gets a GOT and a PLT in the Unix world. While almost all position-independent 32-bit code will reference the symbol _GLOBAL_OFFSET_TABLE_ (__GLOBAL_OFFSET_TABLE_ with double underscore under BSD) at some point, 64-bit code can use its relocation types to address values in the GOT, so you typically won't see the symbol being used in assembly language code as frequently. The address of the GOT is within the read-only segment.

The operation of the PLT is one of the better-documented areas of an ELF binary. Suffice it to say that on a lazy-linked function's initial invocation, the instruction pointer jumps via an address in the PLT to the dynamic linker, which changes the contents at that PLT address to be the real address of the function, and then jumps again to that target function (all the jumping is important, as the target function's eventual ret instruction will still find the original return address on the stack). Subsequent invocations no longer detour via the dynamic linker but jump from the PLT directly to the target function. So the PLT is a kind of a trampoline.
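To make the indirection concrete, here is a minimal sketch (GAS, AT&T syntax) that skips the PLT stub and calls puts through its GOT slot, which is roughly the indirect jump the stub performs once the slot has been resolved. The file name, the label msg and the build line are assumptions for illustration only.

```asm
# got_call.s - assumed build: gcc -pie got_call.s -o got_call
        .globl  main

        .section .rodata
msg:    .string "called through the GOT slot"

        .text
main:
        subq    $8, %rsp                # realign the stack to 16 bytes for the call
        leaq    msg(%rip), %rdi         # 1st argument: address of the local string
        call    *puts@GOTPCREL(%rip)    # indirect call through the GOT entry of puts
        xorl    %eax, %eax              # return 0 from main
        addq    $8, %rsp
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

A GOT slot used this way is filled in by the dynamic linker at load time, so there is no lazy first-call detour for this particular call.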

All this now means the following. You can't write, for example:

mov        rax, [myVar]

That won't work any longer. Instead, variables must be addressed via the GOT and functions must be called via the PLT. That's the difficulty.
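As a minimal sketch of what that looks like in GAS (AT&T syntax): the address of myVar is first fetched from the GOT, then the value is loaded through it, and printf is called through the PLT. The initial value 42, the label fmt, the file name and the build line are assumptions for illustration, not taken from the attached sources.

```asm
# got_plt.s - assumed build: gcc -pie got_plt.s -o got_plt
        .globl  main
        .globl  myVar                           # exported symbol, addressed via the GOT

        .section .rodata
fmt:    .string "myVar = %ld\n"

        .data
        .balign 8
myVar:  .quad   42

        .text
main:
        subq    $8, %rsp                        # realign the stack to 16 bytes
        movq    myVar@GOTPCREL(%rip), %rax      # rax = address of myVar, read from the GOT
        movq    (%rax), %rsi                    # rsi = value of myVar (2nd argument)
        leaq    fmt(%rip), %rdi                 # rdi = format string (1st argument)
        xorl    %eax, %eax                      # variadic call: no vector registers used
        call    printf@PLT                      # call libc through the PLT
        xorl    %eax, %eax                      # return 0 from main
        addq    $8, %rsp
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

If the toolchain behaves as assumed, the program prints myVar = 42.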

My first attempts were with YASM. The manual describes the WRT operator together with the PIC relocation types:

..gotpcrel, ..got, ..plt, ..sym

But it does not work. I could link the object module against libc, but there was a segmentation fault. In addition, YASM is apparently no longer maintained. The last release is from 2014. Yasm is not up to date anymore.

Next try with NASM. The manual knows a bit about the GOT but nothing about ..gotpcrel. Bad luck.

The last salvation was the GNU assembler (GAS). GAS is tough, relentless and forgives no mistake. But it is well maintained, up to date and does a good job in the end. Here is the result. The attached ZIP file contains the sources and 64-bit binaries of the application. There are 3 sources:

  • main.s is the frame of the application.
  • asmio.s calls the different libc functions with the appropriate parameters.
  • example.s prints out some values and makes a dump of the CPU registers.

Here is the output of the program:

That's a C string (zero terminated).
32 bit unsigned integer value =  4294967295
32 bit integer value          = -2147483648
64 bit unsigned integer value =  858993459234567
64 bit integer value          = -858993459234567
REAL4 (float) value           =  1.730000
REAL8 (double) value          =  3.1415926535897931

CPU register dump:
------------------
RAX   = 1122334455667788   RBX   = 0000000000000000   RCX = 2233445566778899
RDX   = 33445566778899AA   RDI   = 00007F70DB3D2720   RSI = 0000000000000000
RBP   = 00007FFC994EB5E8   R8    = 445566778899AABB   R9  = 5566778899AABBCC
R10   = 66778899AABBCCDD   R11   = 778899AABBCCDDEE   R12 = 000055791CEFE540
R13   = 00007FFC994EB730   R14   = 0000000000000000   R15 = 8899AABBCCDDEEFF
RSP   = 00007FFC994EB5C0   Flags = 0000000000000206

However, it takes strong nerves to read the source code. You will not find any clever HLL constructs there, just mnemonics. Moreover, parts of the source are written in Intel syntax, but other parts in the archaic AT&T syntax. The reason is simple. I have no idea how to write something like this:

       movq       msg1@GOTPCREL(%rip), %rsi               # get C string address
       call       print_Cstring@PLT                       # print_Cstring

in Intel syntax.

I have written to the maintainers of binutils (GAS is a part of it) to ask how one can write that in Intel syntax. That would be a big step forward, but so far there is no answer. I also wrote to Frank Kotler (NASM developer) and asked for information; no answer so far. I will contact Hans Peter Anvin (chief maintainer of NASM), maybe he can help.
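For what it's worth, GCC's -masm=intel output spells these operands with the same @GOTPCREL and @PLT suffixes, so GAS's Intel mode appears to accept them. The following is only a sketch under that assumption. The print_Cstring here is a hypothetical stand-in that takes the string address in rsi and forwards it to puts, not the routine from asmio.s, and the file name and build line are assumed as well.

```asm
# intel_got.s - assumed build: gcc -pie intel_got.s -o intel_got
        .intel_syntax noprefix
        .globl  main
        .globl  msg1
        .globl  print_Cstring

        .data
msg1:   .string "That's a C string (zero terminated)."

        .text
print_Cstring:                                  # hypothetical stand-in: rsi = string address
        sub     rsp, 8                          # realign the stack to 16 bytes
        mov     rdi, rsi                        # puts expects the string in rdi
        call    puts@PLT
        add     rsp, 8
        ret

main:
        sub     rsp, 8                          # realign the stack to 16 bytes
        mov     rsi, QWORD PTR msg1@GOTPCREL[rip]   # get C string address from the GOT
        call    print_Cstring@PLT               # call through the PLT
        xor     eax, eax                        # return 0 from main
        add     rsp, 8
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```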

I hope that Branislav and johnsa know a way to do that with UASM. The big advantage of JWASM was that it works on different operating systems (DOS, Windows, OS/2, Unix, Linux, BSD, etc.). That advantage should not be given up. So it should be possible to access GOT and PLT with UASM, right? For me that would be the cleanest option.

Some more comments at the end. Of course I could have done the register dump over the stack as well. But I wanted to show how to access any variables in data or bss section. The program still lacks parts: FPU dump, the dumps of the XMM, YMM and ZMM registers. But that's just a little hard work.

I hope I haven't scared anyone.

Gunther
You have to know the facts before you can distort them.

hutch--

Could you use this instead of "mov        rax, [myVar]"

    lea rax, myVar

I don't know enough about Unix based systems for any of the higher level code but mnemonics like this should work if the random location code can still use LEA.

Gunther

Steve,

thank you for your fast reply.

Quote from: hutch-- on January 08, 2018, 10:48:40 AM
Could you use this instead of "mov        rax, [myVar]"

    lea rax, myVar

That was my first idea, but it doesn't work. Here is a short explanation from the NASM manual. Sure, the manual is incomplete, partly unclear and only covers 32-bit code for shared libraries. But these principles have now been fully adopted for all applications in the 64-bit world. There is no other way to access variables; it is only possible via the GOT (with several variants). Here is a very detailed explanation of the whole process by Ulrich Drepper, libc maintainer. Of course, he is a complicated personality, but what he writes is well-founded.

All things considered, the whole thing is very vexing.

Gunther

johnsa

I've been following the PIE saga for a bit now.. along with meltdown and spectre .. i'm almost of the opinion we shouldn't bother trying to code anymore..
they're complicating and slowing it down so significantly to try and secure the un-securable. No matter how many ways they come up with to make hacking more difficult, the hackers will overcome them and the only ones who suffer are legitimate users and developers.

The whole arrangement with GOT and PLTs is a total shambles.. I really hope the people who came up with such BS along with the people who designed ACPI and all go shoot themselves..
*rant*

The impact on performance is not negligible, let alone that these people never seem to think beyond C ... they construct mechanisms which are so utterly filthy no one coming at them from an assembler point of view could stomach it. A classic case in point under Linux/Unix based systems is trying to extract a location out of libc for error messages.. which is non standard.. poor old win32 just provides something like GetLastError...

johnsa

The reason that even LEA won't work anymore is that, although lea rax,[var] is technically position independent, the distance between rip and var is known at compile time. With ASLR and PIE that distance is subject to change at any moment depending on the dynamic linker. So there really is no quick fix apart from using the GOT and PLT. It's filthy... horrible.. and has been hacked repeatedly already.. so their attempts at hardening and securing flaws have already failed miserably.. and we're left to suffer their incompetence.. *another rant* :)


hutch--

John,

It's good to hear some sense in this area. I have been looking around trying to find HOW it will infect any specific computer and all I hear at the moment is a mountain of waffling bullsh*t with no detail. If I start on the premise that an isolated computer can never be infected, then for it to get infected, it must be transferred to the computer by some means, either by connecting it to the internet OR something like data on a USB stick.

If this is the case and it's not done by magic, then it sounds like "just another exploit" and the old rule has always been, let garbage into your computer and you can kiss your arse goodbye in terms of security. There have long been nasty and destructive exploits that will stop a computer stone dead and the only solution has always been to keep garbage out of a computer. Don't download rubbish, never ever let an email client run attachments and keep a boot disk image that can overwrite a damaged boot partition.

Gunther

John,

I can understand your anger and share it in part. Using the GOT and PLT for shared libraries (SOs are pretty similar to DLLs) is one thing. But using the concept for all applications is a completely different matter. Here we agree 100%. It makes the code inefficient and increases the register pressure. But as I said, it was not our decision.

What I'm describing now is just the view of an assembly programmer. In the Windows world MASM is the unchallenged top dog. It can only be used there, but you can use it to create all sorts of binary code (EXE, DLL, drivers, whatever you want). Behind MASM are decades of continuous development and a lot of manpower. That should be noted.

JWASM, now UASM, is 99% compatible with MASM. It has its own merits. It can run on different operating systems: Windows, Linux, OS/2, MacOS. Even DOS is still possible with an older version, although the DOS version is no longer maintained. These are undeniable advantages over MASM, and that makes UASM, of course, attractive. I can only hope that these benefits will continue in the future.

For now, however, you can not create a shared library with UASM, either under 32 or under 64 bit. Shared libraries are a concept in Linux, BSD or Solaris for many years if not decades. MacOS also uses this concept; Mach-O shared libraries have the file type MH_DYLIB and the .dylib (dynamic library) suffix. Unfortunately that is not possible at the moment with UASM.

However, it's also about using an assembler under an operating system to create all kinds of binary code. That's the main job of an assembler. From this perspective, it does not look very good at the moment. YASM had support for GOT and PLT, but has not been maintained for 4 years. NASM's support is rudimentary and seems to be flawed. It seems that FASM allows access to the GOT and PLT so that shared objects can be created. By the way, the link from the FASM forum is from 2003. I have not yet checked how the situation is now. Besides, I have not worked with FASM yet. But maybe I will have no other choice in the future, who knows?

So only GAS remains. The GNU assembler is even more stubborn than MASM, but it does the job. That was my luck. You can create any sort of binary code with it; that's proven. Moreover, GAS is available on a wide range of architectures and operating systems. I am not happy with this situation. I ask myself the following question: Will UASM continue to support various operating systems in the sense that every conceivable type of binary code can be generated? That would be very desirable. I realize that this means a lot of work. How can I help with my modest means?

Gunther

aw27

Quote
For now, however, you can not create a shared library with UASM, either under 32 or under 64 bit. ... Unfortunately that is not possible at the moment with UASM.
Actually, it is possible and there is also a sample in there since the JWASM times.  :lol: (approximately 10 years old)

hutch--

Ok,

Yet another suggestion. If, as I understand it, the code can be located at random addresses without build time addresses available, I wonder if you can set a reference point at the location where the code is called, within the relocatable code itself, that can be used as a reference for LEA so that you have something like virtual addresses from that reference point?

I don't know how ELF file formats work but I know the normal executable files in the Windows environment using COFF format files have a starting address, and offsets based on its location are used, so any data available to the code is located in reference to the start address. Now of course if, when the code is called, an argument is passed to it that gives the actual start address of the caller and thus its details become accessible, what would stop that from working?

It would mean that while the code can be located anywhere in application accessible memory, it could still access any data and code that the main app has available.
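For data that lives in the same module, something close to this reference-point idea can be sketched in GAS (AT&T syntax): one RIP-relative LEA establishes the run-time address of a label, and everything else is reached at a fixed offset from it. The labels anchor, value and fmt, the file name and the build line are hypothetical names chosen for illustration.

```asm
# anchor.s - assumed build: gcc -pie anchor.s -o anchor
        .globl  main

        .section .rodata
fmt:    .string "value = %ld\n"

        .data
        .balign 8
anchor: .quad   0                       # hypothetical reference point in .data
value:  .quad   1234                    # sits at a fixed offset from anchor

        .text
main:
        pushq   %rbx                    # save callee-saved rbx (also realigns the stack)
        leaq    anchor(%rip), %rbx      # rbx = run-time address of the reference point
        movq    value-anchor(%rbx), %rsi    # load value at a constant offset from rbx
        leaq    fmt(%rip), %rdi         # 1st argument: format string
        xorl    %eax, %eax              # variadic call: no vector registers used
        call    printf@PLT              # libc is still reached through the PLT
        xorl    %eax, %eax              # return 0 from main
        popq    %rbx
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

The offset value-anchor is resolved by the assembler because both labels are in the same section, so it does not depend on where the module ends up in memory. Symbols that live in libc or another shared object are not covered by this trick.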

Guess work and unverified speculation from a Unix amateur.  :P

Gunther

Quote from: aw27 on January 09, 2018, 05:06:10 PM
Actually, it is possible and there is also a sample in there since the JWASM times.  :lol: (approximately 10 years old)

So, try to link it with a new distribution and see what happens.

Gunther

aw27

Quote from: Gunther on January 09, 2018, 05:54:24 PM
Quote from: aw27 on January 09, 2018, 05:06:10 PM
Actually, it is possible and there is also a sample in there since the JWASM times.  :lol: (approximately 10 years old)

So, try to link it with a new distribution and see what happens.

Gunther

All right, let's bet $100.00 on that.
I have Ubuntu 64-bit 16.04.3 LTS. I will make an equivalent sample for 64-bit (even more difficult) and will make it work. Deal?

johnsa


I'm also on 16.04 ubuntu at present, but I believe the issue has become more apparent in 17.x where PIE has been made mandatory and default.
But I think it's definitely worth us trying to create some shared objects (.so) and see where we get to between aw, myself and Gunther across the different nix versions.
I know some other distros have made PIE mandatory going back a bit further.

In terms of planning for the future and how we get over this.. as much as my inner self really just wants to say "screw" linux then if they're going to come up with such nonsense .. I guess my obligation as maintainer of UASM is to do something about it :)

So here are my initial suggestions, and I think it would be valuable if Nidud joined in this debate. Given that in this particular area asmc and uasm should still be identical it would make sense that whatever solutions we come up with can be shared between the two projects, a collaboration of sorts to get it done as there is quite a bit involved.

My thinking thus far based on what I know of ELF/PIE:

1) Exports need to be declared as to whether they are data or function. Other assemblers provide a specific syntax for this however I don't think we need to given our implementation of PROC and invoke. If we rely on that we know which items are data and which are PROC, so this can be done seamlessly behind the scenes.

2) As far as I can tell Func@PLT type notation should only be required when calling a function which is NOT in the same compilation unit/module even under the PIE 17.x model, so if that is true then in theory we should be aware of this scenario by the very existence of a PROTO which has no PROC defined for it in the current module or an EXTERN PROC. If that is true then once again in theory we could make that transparent and avoid ugly syntax with all the @PLT stuff and implement that behind the scenes and make sure we add the relevant relocation type and PLT entry in the object.

3) movq var@GOTPCREL(%rip),%rdi should translate into a single instruction as far as I am aware, and much the same as with point 2 this should only be required to access the address of a variable in another compilation unit. So once again, we would know this as it would be marked as EXTERN and should be able to convert any memory address lookup to an item marked as EXTERN into the relevant GOT form.
This may require an additional check that prevents the use of LEA reg,VAR when VAR is extern as I figure that will need a different instruction.
In addition this would also change the implementation of ADDR when used in INVOKE and OFFSET would be prohibited for any inter-module reference too.

So in summary, from what I know..
1) I believe it should be possible to make this happen without ANY syntax modifications and filthy special operators to deal with GOT, PCREL, WRT etc.
2) We would probably want a command line switch -PIE to enable all of this.
3) The next step is to test the situation as it currently stands for .SO and DYLIB on ELF64 and MACHO64 and see where we get to.
4) Following on we need to verify the assumptions I've laid out above, Gunther perhaps you could add some tests into your GAS sample that references local functions and variables inside the same module without the gotpcrel/plt to test it?
5) Then we need to spec out the changes and "just do it"(tm).
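A minimal sketch of the kind of test meant in point 4, written for GAS in AT&T syntax: a local variable and a local function are referenced without GOTPCREL or @PLT, and only the external printf goes through the PLT. The names counter, bump and fmt, the file name and the build line are assumptions and not part of Gunther's sample.

```asm
# local_refs.s - assumed build: gcc -pie local_refs.s -o local_refs
        .globl  main

        .section .rodata
fmt:    .string "counter = %ld\n"

        .data
        .balign 8
counter: .quad  7                       # no .globl: local to this module

        .text
bump:                                   # local function, no .globl
        incq    counter(%rip)           # direct RIP-relative access, no GOT
        ret

main:
        subq    $8, %rsp                # realign the stack to 16 bytes
        call    bump                    # direct call, no @PLT
        movq    counter(%rip), %rsi     # direct RIP-relative load, no GOT
        leaq    fmt(%rip), %rdi         # 1st argument: format string
        xorl    %eax, %eax              # variadic call: no vector registers used
        call    printf@PLT              # external function: through the PLT
        xorl    %eax, %eax              # return 0 from main
        addq    $8, %rsp
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

If this links as a PIE and prints counter = 8, the assumption in points 2 and 3 holds for that toolchain: only inter-module references need the GOT and PLT forms.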



johnsa

One more point is that obviously we have to make sure the changes can be applied to both ELF64 and MACHO64, which of course work slightly differently.. of course they do.. why would there be a standard...

aw27

All shared objects (.so libraries) have been compiled as PIE for a long time. What Japheth has done with his sample is a trick, much in line with Hutch's thoughts. I tested it a long time ago and know it works. I am pretty sure it will work as 64-bit too, so I placed a challenge for that (I know Gunther is intelligent enough not to accept it).
Of course, nothing like doing the things according to the best rules and Johnsa has all my support.

johnsa

I believe you are correct, you could implement a workaround in a similar fashion to how libraries were loaded on the Amiga: there is a fixed call which loads the library and returns its base address, and from there all accesses to the library are relative to that base, something along the lines of:

jsr LoadLibrary
move.l d0,a6
jsr _LVOLibFunction(a6)

roughly..

So it would be possible to have each module provide some form of hook to establish a base, but I guess it could get quite ugly having to remember to add that base to all the relevant address calculations.