UnderstandingDSOLinkChange

From FedoraProject

Revision as of 19:18, 26 November 2009 by Rgrunber (Talk | contribs)

Understanding the (Proposed) Change to DSO Linking

Basics

By default, ld does not require you to list a shared object on the link line if it is already a dependency of another object you link against; symbols from it are resolved implicitly. This is dangerous: if that other object is ever changed so that it no longer pulls in the object on which your program depends, your program breaks without any change to your code.

What's the difference?

For example (courtesy Roland McGrath):

 ==> foo1.c <==
 #include <stdio.h>
 extern int foo ();
 int
 main ()
 {
   printf ("%d\n", foo ());
 }
 ==> foo2.c <==
 extern int foo ();
 int bar () { return foo (); }
 ==> foo3.c <==
 int foo () { return 0; }


gcc -g -fPIC -c foo1.c foo2.c foo3.c

gcc -shared -o foo3.so foo3.o

gcc -shared -o foo2.so foo2.o foo3.so


(This Succeeds)

gcc -o foo1 foo1.o foo2.so -Wl,--rpath-link=.


(This Fails)

gcc -Wl,--no-add-needed -o foo1 foo1.o foo2.so -Wl,--rpath-link=.

/usr/bin/ld: �: invalid DSO for symbol `foo' definition

./foo3.so: could not read symbols: Bad value

collect2: ld returned 1 exit status

[Exit 1]


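The DSO name in the first line of that message came out garbled. What it meant to say (here using an ld in /tmp/, selected with -B/tmp/) was: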

gcc -Wl,--no-add-needed -o foo1 foo1.o foo2.so -Wl,--rpath-link=. -B/tmp/

/tmp/ld: ./foo3.so: invalid DSO for symbol `foo' definition

./foo3.so: could not read symbols: Bad value

collect2: ld returned 1 exit status

[Exit 1]


So, the difference is whether you can refer to a symbol that lives in a DSO you didn't list explicitly on your link line, but that is a DT_NEEDED dependency of one you did list (or, I think, a recursive dependency of one you did list).

I find that error message not very explanatory, but it's what it says. Giving a generic "undefined symbol" error (which usually comes with source line info for the reference) would be less strange but also perhaps too generic for this specially weird case.


New result, with the proposed change in place (the link line that succeeded above now fails):

gcc -o foo1 foo1.o foo2.so -Wl,--rpath-link=.

/usr/bin/ld: foo1.o: undefined reference to symbol 'foo'

/usr/bin/ld: note: 'foo' is defined in DSO ./foo3.so so try adding it to the linker command line


The big difference is that, with the proposed change in place, ld will no longer let you leave needed libraries off the link line by default. Under the current default, you can skip linking with a library if it is listed as NEEDED by another library that the program uses. In abstract terms, if libA is needed by libB and your program uses symbols from both libA and libB, your program can get away with linking only to libB. If a later version of libB no longer lists libA as a needed library, rebuilding your program will mysteriously fail at link time.

A concrete example from Roland McGrath:

libxml2.so has:

 NEEDED            Shared library: [libdl.so.2]
 NEEDED            Shared library: [libz.so.1]

In this case, a program that links with libxml2 and uses dlopen might not link with libdl explicitly, and a program that links with libxml2 and uses gzopen might not link with libz explicitly. While these programs currently build and work, they are at risk of failure if libxml2 is ever changed to drop its dependency on libdl or libz.
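
One way to see which libraries a DSO pulls in is to read the NEEDED entries of its dynamic section, for example from `readelf -d` output. The following is a minimal sketch, not part of the original proposal: the helper name `list_needed` is hypothetical, and it assumes the `Shared library: [name]` format shown above.

```shell
#!/bin/sh
# list_needed (hypothetical helper): read `readelf -d` style output on stdin
# and print the name of each NEEDED shared library, one per line.
# Assumes each entry looks like "... NEEDED ... Shared library: [name]".
list_needed() {
    sed -n 's/.*NEEDED.*Shared library: \[\([^]]*\)].*/\1/p'
}

# Example (the path is a placeholder, adjust for your system):
#   readelf -d /path/to/libxml2.so | list_needed
```

For the libxml2 entries quoted above, the output should include libdl.so.2 and libz.so.1. Any library whose symbols your program uses directly belongs on your own link line, whether or not it shows up in such a list.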

What do I do?

The error message prompts you to link explicitly to the DSO that defines the symbol. In the foo example, adding foo3.so to the link line gets rid of the error:

gcc -o foo1 foo1.o foo2.so foo3.so -Wl,--rpath-link=.
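
If you want to confirm which DSO defines a symbol before adding it to the link line, you can inspect each candidate's dynamic symbol table. This is a rough sketch under stated assumptions: the helper name `defines_symbol` is hypothetical, and it expects `nm -D --defined-only` style output, where each line ends with a type letter followed by the symbol name.

```shell
#!/bin/sh
# defines_symbol SYMBOL (hypothetical helper): read `nm -D` style output on
# stdin and succeed (exit 0) if SYMBOL is listed as a defined symbol.
defines_symbol() {
    awk -v sym="$1" '
        NF >= 2 {
            name = $NF
            sub(/@.*/, "", name)          # drop any @VERSION suffix
            # Type "U" marks an undefined reference, not a definition.
            if (name == sym && $(NF-1) != "U") found = 1
        }
        END { exit !found }
    '
}

# Example, using the files from the foo example above:
#   nm -D --defined-only ./foo3.so | defines_symbol foo && echo "add foo3.so"
```

The check is deliberately simple: it only reports whether a definition is present, and leaves the choice of what to add to the link line to you.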