Code Rants: Ruminations on 'ld', ELF and entry point to a C program

Q. Can a C program start at a function other than main()?

To answer this question in depth, one needs to know the C run-time environment, i.e., at least the basic difference between a hosted and a freestanding environment:

Hosted environment (the full C standard library is available, program starts at main()). E.g. GNU/Linux, Windows.
Freestanding environment (only a minimal subset of the standard library is guaranteed, how to start/load is up to the environment). E.g. Embedded systems.

(For this discussion, let's consider only GNU/Linux and the GCC compiler tools.)

Now, when we do a
$ gcc <file>.c

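it goes through all the stages of compilation and linking to yield an executable (in ELF format). In that process, the default entry point of the program is defined. Strictly speaking, the ELF entry point is the symbol _start, which comes from the C run-time startup files (crt0.o/crt1.o etc., which the GNU linker [ld] links in); that startup code sets things up and then calls main(), so from the C program's point of view execution starts at main().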

If we want to establish a different entry point, we have to pass the -e option to the linker (ld), followed by the name of the entry symbol.

And if we had to mimic a completely freestanding implementation, we would need to write a lot of functions ourselves just to try out this simple exercise. Instead, let's use stdio from libc and change the entry point to start().

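$ cat tmp.c
#include <stdio.h>   /* printf() */
#include <stdlib.h>  /* exit() */

/* Custom entry point, selected at link time with "ld -e start" */
void start(void)
{
    printf("Hello World.\n");

    /* Nothing called us, so there is nowhere to return to */
    exit(0);
}

Note: we use exit() instead of return. We can't return, because start() has no caller to return to: we don't link the C run-time startup code, which normally calls main() and passes its return value to exit(). We need stdio.h for printf() and stdlib.h for exit(); both functions live in libc.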

Let's compile it:
$ gcc -c tmp.c

We now have tmp.o. To link it and get a.out, we need the path to the run-time dynamic linker (ld-linux). On my RHEL box, it happens to be:
/lib64/ld-linux-x86-64.so.2
(since I have a 64-bit AMD machine; to find out yours, run gcc -v while compiling any program and look at the link stage output)

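Note: ld is the GNU linker, which runs at build time. ld-linux is the dynamic linker (or loader), which runs at program start: the kernel loads it along with the executable, and it is responsible for loading all the required shared libraries and resolving symbols before control reaches the program's entry point.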

Link:
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -e start tmp.o -lc

Here, -lc links against libc. Once this runs, we get a.out, and we're done. We can examine a.out using ldd.

$ ldd a.out
libc.so.6 => /lib64/tls/libc.so.6 (0x000000328b700000)
/lib64/ld-linux-x86-64.so.2 (0x000000328b300000)

If we don't pass the -dynamic-linker option to ld, it picks a built-in default (/lib/ld64.so.1 for me). If that path does not exist on the system, the resulting a.out will not run.

At this point, we are done with our agenda of changing the entry point, but let's go a bit deeper on ld-linux, the dynamic linker (part of the OS) which actually gets the executable ready to run.
Why did we have to give the path to ld-linux in the link step? Because the dynamic linker is a separate binary and, perhaps for modularity reasons, is not part of the kernel. When we give the path of the dynamic linker to ld, it is stored in a section called .interp in the ELF file (referenced by a PT_INTERP program header). When the ELF binary is run, the kernel looks up the path to ld-linux there, loads it first, and then hands the program over to ld-linux. (We can see the ELF headers with the objdump or readelf utilities.)

This is similar to shebang lines in executable scripts, i.e., if the first line of a shell (bash) script is:
#!/bin/bash
and we call it hello.sh, chmod +x it and execute it, the kernel does something similar to:
/bin/bash hello.sh
i.e., it loads the shell and hands the script over to it. We can try the same on any dynamic executable (one that depends on shared libraries):

$ /lib64/ld-linux-x86-64.so.2 /bin/echo hello
hello

Sunday, January 23, 2011
