Sunday, January 23, 2011

Ruminations on 'ld', ELF and entry point to a C program

Q. Can a C program start at a function other than main()?

To answer this question in depth, one needs to know the C
run-time environment, i.e., at least the basic difference between a
  • Hosted environment (the full C standard library is available; the program starts at main()). E.g. GNU/Linux, Windows.
  • Freestanding environment (only a minimal set of headers is guaranteed; how the program starts and loads is up to the implementation). E.g. embedded systems.
(For this discussion, let's consider only GNU/Linux and the GCC compiler tools.)

Now, when we do a
$ gcc <file>.c

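it goes through all the stages of compilation and linking to yield an executable (in ELF format). In that process the default entry point of the program is set. Strictly speaking, that entry point is the _start symbol from the C start-up files (crt0.o/crt1.o etc., which the GNU linker [ld] links in), and that start-up code in turn calls main(). So from the programmer's point of view, the program starts at main().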
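To see which address the linker actually recorded as the entry point, we can read the e_entry field of the ELF header ourselves. The following is only a minimal sketch: read_elf_entry is a hypothetical helper name (not part of any library), and it assumes a 64-bit ELF file with the same byte order as the machine running the code.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the entry-point address (e_entry) from a
 * 64-bit ELF file. Returns 0 on success, -1 if the file cannot be read
 * or is not a 64-bit ELF object. Assumes native byte order. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    Elf64_Ehdr hdr;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t got = fread(&hdr, 1, sizeof hdr, fp);
    fclose(fp);
    if (got != sizeof hdr)
        return -1;

    /* Check the magic bytes and that this is a 64-bit object. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = hdr.e_entry;
    return 0;
}
```

Calling read_elf_entry("a.out", &addr) and printing addr in hex should give the same value that readelf -h a.out reports as "Entry point address".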

If we want to establish a different entry point, we have to use the linker (ld) option -e. To mimic a truly freestanding implementation we would need to write a lot of functions ourselves, so for this simple exercise let's keep using stdio from libc and just change the entry point to start().

$ cat tmp.c
#include <stdio.h>
#include <stdlib.h>   /* exit() */

int start()
{
    printf("Hello World.\n");
    exit(0);   /* no C run-time to return to, so exit explicitly */
}

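Note: we use exit() instead of return. We can't return, because we don't link with the C run-time start-up code, which is what normally calls main() and passes its return value to exit(); here there is no caller to return to. printf() is declared in stdio.h and exit() in stdlib.h; both are implemented in libc.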

Let's compile it:
$ gcc -c tmp.c

We now have tmp.o. To link it and get a.out, we need the path to the run-time dynamic linker (ld-linux). On my RHEL machine it happens to be:
/lib64/ld-linux-x86-64.so.2
(since I have a 64-bit AMD; to find out yours, run gcc -v while building any program and look at the link stage output)

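Note: ld is the GNU linker; ld-linux is the dynamic linker (or loader). When a dynamically linked executable is run, the kernel loads ld-linux alongside it, and ld-linux is then responsible for loading all the required shared libraries and resolving symbols before handing control to the program's entry point.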

Link:
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -e start tmp.o -lc

Here, -lc links libc. Once this runs, we get a.out, and we're done. We can examine a.out using ldd.

$ ldd a.out
libc.so.6 => /lib64/tls/libc.so.6 (0x000000328b700000)
/lib64/ld-linux-x86-64.so.2 (0x000000328b300000)

If we don't pass the -dynamic-linker option to ld, it picks a default (/lib/ld64.so.1 for me).



At this point, we are done with our agenda of changing the entry point, but let's go a bit deeper on ld-linux, the dynamic linker (part of the OS) which does the run-time loading.
Why did we have to give the path to ld-linux in the link step? Because the dynamic linker is a separate binary and, perhaps for modularity reasons, is not part of the kernel. When we give ld the path to the dynamic linker, it is stored in a section called .interp in the ELF file (referenced by the PT_INTERP program header). When the ELF binary is run, the kernel looks up that path, loads ld-linux first, and then hands the program over to it (we can see the ELF headers with the objdump or readelf utilities).
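For the curious, the interpreter path can also be pulled out of an ELF file by hand, by walking the program headers until a PT_INTERP entry turns up. This is only a rough sketch under some assumptions: read_elf_interp is a made-up helper name, and the code handles only 64-bit ELF files in the machine's native byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the interpreter path named by the
 * PT_INTERP program header of a 64-bit ELF file into buf.
 * Returns 1 if found, 0 if the file has no PT_INTERP (e.g. a static
 * executable), -1 on error. Assumes native byte order. */
int read_elf_interp(const char *path, char *buf, size_t buflen)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    int result = -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    if (fread(&eh, 1, sizeof eh, fp) != sizeof eh ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof ph)
        goto out;

    result = 0;  /* valid ELF64; no interpreter found so far */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(fp, off, SEEK_SET) != 0 ||
            fread(&ph, 1, sizeof ph, fp) != sizeof ph) {
            result = -1;
            goto out;
        }
        if (ph.p_type != PT_INTERP)
            continue;

        /* p_filesz counts the terminating NUL of the path string. */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen ||
            fseek(fp, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, fp) != ph.p_filesz) {
            result = -1;
            goto out;
        }
        buf[ph.p_filesz - 1] = '\0';
        result = 1;
        break;
    }
out:
    fclose(fp);
    return result;
}
```

Run against the a.out we linked earlier, this should report the same path we passed to -dynamic-linker; readelf -l a.out shows the same information as "Requesting program interpreter".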

This is similar to shebang lines in script executables, i.e., if the first line of a shell (bash) script is:
#!/bin/bash
and we call it hello.sh, chmod +x it and execute it, the loader does something similar to:
/bin/bash hello.sh
i.e., it loads the shell and hands the script over to it. We can try the same thing on any dynamic executable (one that depends on shared libraries):

$ /lib64/ld-linux-x86-64.so.2 /bin/echo hello
hello
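Coming back to the shebang comparison, the lookup itself is simple enough to sketch in user space. The function below is a hypothetical illustration (read_shebang is my own name for it) and not how the kernel really implements this; it just reads the first line of a file and, if it starts with "#!", extracts the interpreter path that follows.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if the file starts with "#!", copies the
 * interpreter path that follows into buf and returns 1.
 * Returns 0 if there is no shebang line, -1 on error. */
int read_shebang(const char *path, char *buf, size_t buflen)
{
    char line[256];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, fp);
    fclose(fp);
    if (ok == NULL || strncmp(line, "#!", 2) != 0)
        return 0;

    /* Skip optional blanks after "#!", then take everything up to the
     * first blank or newline as the interpreter path. */
    const char *start = line + 2 + strspn(line + 2, " \t");
    size_t len = strcspn(start, " \t\r\n");
    if (len == 0 || len >= buflen)
        return -1;

    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}
```

For the hello.sh script above, read_shebang("hello.sh", buf, sizeof buf) would fill buf with /bin/bash, which plays the same role for a script that the .interp path plays for an ELF binary.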

