COPY Relocations

2016-04-17

As part of my work on reducing Go binary size, I ran into the concept of linker copy relocations. This is a relatively obscure, underdocumented concept, so I want to scribble down some notes.

Some background: A relocation is a task created by the compiler and performed by the linker. The typical relocation is "put the address of symbol X inside symbol Y at offset O". There are many reasons a compiler can't do this itself. The clearest reason is that it may not have a copy of symbol X. Many programming languages support compiling programs piecemeal and referring to other symbols by forward declaration. In C it is as easy as:

$ cat symY.c
void x();

void y() {
    x();
}
$ cc -c symY.c
$ readelf -r symY.o

Relocation section '.rela.text' at offset 0x518 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000900000002 R_X86_64_PC32     0000000000000000 x - 4
$

…to get a relocation for the address of a symbol x inside symbol y, in an object file that knows nothing about x. When the linker runs, it is given symY.o and the moral equivalent of a symX.o object file, lays out the symbols, and resolves the relocations. In an introduction to linking, this is the end of the story. But not today.
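To make the arithmetic behind an R_X86_64_PC32 entry like the one above concrete, here is a toy sketch in C of what a linker does when it applies one. The struct and function names are made up for illustration and are not taken from any real linker; the formula S + A - P is the one the x86-64 psABI gives for this relocation type.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of one relocation entry. Field names are illustrative only. */
struct toy_reloc {
    uint64_t offset; /* where in the section to patch (the "offset O") */
    int64_t  addend; /* constant added to the symbol address (A) */
};

/* Apply a PC-relative 32-bit relocation: value = S + A - P, where
   S is the final address of the target symbol and
   P is the final address of the bytes being patched. */
static void apply_pc32(unsigned char *section, uint64_t section_addr,
                       const struct toy_reloc *r, uint64_t sym_addr)
{
    uint64_t place = section_addr + r->offset;
    int32_t value = (int32_t)(sym_addr + (uint64_t)r->addend - place);
    memcpy(section + r->offset, &value, sizeof value);
}
```

With invented addresses, say the section placed at 0x400000, the relocation at offset 0xa with addend -4, and x at 0x400100, the patched value is 0xf2: the distance from the end of the 4-byte field to x, which is what a call instruction wants.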

Dynamic linking is an extra link phase that happens after the executable is linked. A long time after. It is performed by the operating system when an executable is run. Here is a traditional example building on symY.c:

$ cat main.c
void y();

void x() {}

int main(void) {
    y();
    return 0;
}

$ cc -shared -o libsymY.so symY.o
$ cc -fpic -c main.c
$ cc -g main.o -L . -lsymY

What's happening here is we turn symY.o into a shared library, compile main.c (which needs a symbol y, defined in our shared library), and then link it looking for the library (-lsymY) in the current directory (-L .). The result is a binary with a relocation:

$ readelf -r a.out
…
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 y + 0

When you execute this program, the OS loader finds the .so file, maps it into memory, and does the job of the linker, resolving relocations.
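One way to picture a jump-slot relocation is as a function pointer that the loader fills in before (or the first time) the call is made. The sketch below is a toy model under that assumption: every name in it is hypothetical, and a real loader also deals with lazy binding, symbol lookup order, and the PLT stubs, none of which is modeled here.

```c
/* Toy model of a jump slot. All names are hypothetical. */

static int y_calls;                 /* lets us observe that y really ran */

/* Stands in for y as defined inside the shared library. */
static void real_y(void) { y_calls++; }

/* Stands in for the pointer-sized slot that the relocation refers to. */
static void (*jump_slot_y)(void);

/* What the loader does for this relocation: store the address of the
   resolved symbol into the slot. */
static void toy_resolve_jump_slot(void (**slot)(void), void (*sym)(void))
{
    *slot = sym;
}

/* What the call site in the executable effectively does: jump through
   the slot rather than to a fixed address. */
static void call_y(void) { jump_slot_y(); }
```

Calling toy_resolve_jump_slot(&jump_slot_y, real_y) and then call_y() reaches real_y, even though call_y was written without knowing where real_y would live.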

At this point it is really tempting to walk away, pretend shared libraries don't exist and go back to a sensible world where linkers link. For the purpose of everyday programming, please do. But this machinery is widely used and becoming more common. The increasingly popular ASLR security technique (Address Space Layout Randomization) maps the binary into a random location in memory when the program starts. To put the data and program text in different relative locations requires relocations that can only be resolved at load time. The result is PIE binaries (Position Independent Executable) that look like a shared object with a main function.

Back to COPY relocations.

A COPY relocation is a special kind of dynamic relocation that instructs the loader to copy a symbol to a particular location. It is used to enable what in a world of PIE binaries looks like a half measure: position-dependent main executables that use a shared library. The position-dependent code needs to be fully linked, that is, the traditional linker needs the address of a symbol that won't be known until we reach the dynamic linker. To make this work, it leaves space for a symbol at a known address, writes the main executable to expect the symbol to be there, and leaves a COPY relocation for the dynamic linker, asking it to move the symbol into place.
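If you want to see one of these in the wild, a small sketch like the following may be enough. It assumes an x86-64 Linux system with glibc and a GCC that accepts -fno-pie and -no-pie; the function name is mine, and I am describing what I would expect to see, not a guarantee for every toolchain.

```c
#include <stdio.h>

/* stdout is a data symbol defined in the C library's shared object.
   Position-dependent code refers to it at a fixed address, so the
   static linker has to reserve space for it in the executable.

   Assumed build, with a main that calls print_greeting():
       cc -fno-pie -no-pie demo.c
       readelf -r a.out
   On such a system I would expect the relocation list to include a
   COPY entry for stdout. */
int print_greeting(void)
{
    /* fputs returns a non-negative value on success, EOF on error. */
    return fputs("hello\n", stdout);
}
```

Building the same file as a PIE may or may not show the same entry, depending on the compiler and linker versions involved.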

The job of a dynamic linker when faced with a COPY relocation is to move the symbol out of the memory region allocated for the shared object and into the region of the main executable. It then needs to resolve all relocations in the shared object to point to the symbol location in the main executable. This is possible because the shared object is position-independent.
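The two steps described here can be sketched as a toy model in C. Everything below is hypothetical: the variables stand in for regions of two loaded objects, and the pointer stands in for the indirection that position-independent library code uses to reach its own data.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for two loaded objects. All names are hypothetical. */
static int lib_data_counter = 42; /* initialized definition inside the shared object */
static int exe_bss_counter;       /* space the static linker reserved in the executable */

/* The library reaches the variable through a pointer, which is what
   makes it possible to redirect it after the fact. */
static int *lib_got_counter = &lib_data_counter;

/* What the dynamic linker does for a COPY relocation, in this model. */
static void toy_apply_copy_reloc(void *dst, const void *src, size_t size,
                                 int **lib_got_entry)
{
    memcpy(dst, src, size); /* step 1: copy the initial value into the executable */
    *lib_got_entry = dst;   /* step 2: point the library at the executable's copy */
}

/* Library code that modifies the variable via its pointer. */
static void lib_increment(void) { (*lib_got_counter)++; }
```

After toy_apply_copy_reloc(&exe_bss_counter, &lib_data_counter, sizeof exe_bss_counter, &lib_got_counter), a call to lib_increment() changes exe_bss_counter, and the original lib_data_counter is left behind as dead storage. That leftover is exactly the kind of thing that surprises you if you assumed the symbol stayed where the library put it.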

The result of these COPY relocations is another surprising way the memory of your program can end up laid out by linkers. You can produce two objects with linkers and expect the loader to load each somewhere. You know the loader can patch up references in each to point at the other (typically word-sized pointers). Well, now it can take chunks out of one piece and put them in another piece. I got burned by this with the layout games I played in https://golang.org/cl/21285. I worked on the assumption that all the symbols I neatly laid out in a section would be in that section. Instead ld.bfd generated (incorrectly in this case, https://sourceware.org/bugzilla/show_bug.cgi?id=19962) an R_ARM_COPY for some of my symbols.

golang.org/issue/6853

github.com/crawshaw
twitter.com/davidcrawshaw
david@zentus.com