Esoteric Errors: 'hidden symbol is referenced by DSO'

Author: icyveins7

A new series of posts

I recently started work on a new codebase - one that is very large and has multiple moving components, from a frontend desktop UI (not my business) to the hardware interface (also not my business) and the backend processing (this one's my business).

The standard build process had been set up on a remote server, and most people were ok with connecting to it despite the noticeable latency, but I was not. So my stubborn ass decided to figure out how to build it myself on a local machine. In the process, I encountered a ridiculous number of build errors; they were using custom Makefiles, but these had been generated by some other tool (possibly on Windows, even though the build ran on Linux, since there was a Windows VS solution as well).

This will be the beginning of a series where I will document these errors for myself (and hopefully for others!).

Topic for today

Alright enough backstory. The error being discussed here goes something like this:

hidden symbol X in Y is referenced by DSO

GCC (or rather, the linker) spat this out at me when it was trying to link in a Boost static library.

As a note, DSO stands for Dynamic Shared Object, which is just Linux's name for shared libraries.

It's not an undefined reference

You'll be forgiven for thinking so, because that's what came to my mind at first. But no, it's in the description; the symbol was there in the .a archive (nm and objdump both confirmed this).

This was a different kind of error, and one that I think resulted from the choice to build multiple parts of the code into separate shared library objects (.so files).

If you want to go ahead and read the stackoverflow links that helped me, go ahead and click here.

It's also not a link order issue

You'll again be forgiven for thinking this might have been the cause. This was what I tried at first to reproduce the problem in a simple scenario.

For those who don't know, ld resolves symbol references in a very specific order:

  1. A compilation unit needs a particular symbol.
  2. Libraries specified after that compilation unit provide this symbol.
  3. Linker does its thing.

This is why specifying libraries in the right order matters: if you provide the library 'before' you make the 'request', ld will not go back and get it for you (but I think MSVC will, so 1 point to Microsoft I guess?).

I'll leave this excellent writeup on the topic here, as I think it explains the topic far better than I ever could (he even talks about the circular dependencies!)

A little more in-depth: what does the linker actually pull in?

What I tried to do to reproduce the hidden symbol issue at first was to create the following scenario:

  1. Compile two separate source files separately: we'll call them static.cpp and static_exclude.cpp for reasons that will become apparent. They will share the same header, static.h, and we will place them all into the same static library.
// static.h
int addstatic(int a, int b);
int notinshared(int a, int b);
float exclude(float a, float b);

// static.cpp
#include "static.h"

int addstatic(int a, int b){
  return a+b;
}

int notinshared(int a, int b){
  return a-b;
}

// static_exclude.cpp
#include "static.h"

float exclude(float a, float b){
  return a/b;
}

Honestly, I don't know how to write raw Makefiles (I'm more of a CMake dude), so I had ChatGPT spit one out for me:

STATIC_LIB = libstatic.a

# Compile static.cpp to a static library (.a)
$(STATIC_LIB): $(STATIC_SRC)
	$(CXX) $(CXXFLAGS) -c static.cpp -o static.o
	$(CXX) $(CXXFLAGS) -c static_exclude.cpp -o static_exclude.o
	ar rcs $@ static.o static_exclude.o

So we made our libstatic.a.

Fun fact: doing this taught me that Makefiles must use tabs in the recipe, not spaces!

  2. Now we make a shared library from shared.cpp. This will use some functions from the static library we just made.
// shared.h
int sharedaddscale(int a, int b, int c);

// shared.cpp
#include "shared.h"
#include "static.h"

int sharedaddscale(int a, int b, int c){
  return addstatic(a,b)*c;
}

And another excerpt from ChatGPT:

SHARED_OBJ = shared.o
SHARED_LIB = libshared.so

# Compile shared.cpp to an object file
$(SHARED_OBJ): $(SHARED_SRC)
	$(CXX) $(CXXFLAGS) -c $< -o $@

# Link shared.o and static library to create the shared object (.so)
$(SHARED_LIB): $(SHARED_OBJ) $(STATIC_LIB)
	$(CXX) -shared -o $@ $(SHARED_OBJ) -L. -lstatic

This makes libshared.so.

  3. Finally, we create our main executable file e.cpp.
// e.cpp
#include "static.h"
#include "shared.h"
#include <cstdio>  // printf

int main(){

  printf("%d\n", sharedaddscale(2,3,4));
  printf("%d\n", notinshared(5,3));
  // exclude() returns a float, so it needs %f rather than %d
  printf("%f\n", exclude(5.0f,3.0f));

  return 0;
}

And we link our shared library to it:

E_SRC = e.cpp
E_OBJ = e.o
EXEC = final_executable

# Compile e.cpp to an object file
$(E_OBJ): $(E_SRC)
	$(CXX) $(CXXFLAGS) -c $< -o $@

# Link e.o with the shared library to create the final executable
$(EXEC): $(E_OBJ) $(SHARED_LIB)
	$(CXX) -o $@ $(E_OBJ) -L. -lshared

The question for you now is: does this build? If not, which function calls will cause errors?

You can ponder the code and build paths above before continuing down..

...

...

...

...

...

...

...

...

...

...

...

...

...

Alright that's enough blank space. If you said only exclude() will cause issues, then you would be correct, so congratulations! You're a qualified GCC nerd.

Why does exclude() cause an undefined reference?

When the linker finds a symbol reference, it searches the later inputs to try to find it. Here, we only linked the shared library, so the next question must be: why is exclude() not inside the shared library? Didn't we link the static library when we created it?

The answer is that the linker resolves and includes only symbol definitions that it requires. The shared library never needed exclude(), so it never got included in the final output.

Okay, but the shared library also didn't need notinshared(), so why is it there?

The above answer isn't technically complete. What the linker actually does is find the object file that defines a symbol it requires, and then include that entire object file. It doesn't matter whether that object is a standalone .o or a member of an archive (a static library .a).

In the above example, we did need addstatic() inside shared.cpp. But addstatic() shares the same compilation unit (and hence the same object file) as notinshared() - both come from static.cpp which became static.o - so both of them end up inside the final shared library!

Static libraries are basically just containers of object files to the linker.
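
To make that rule concrete, here is a toy model of it in C++. It is only a sketch of the behaviour described above, not how ld is actually implemented: the ObjectFile, Input and LinkResult types and the link() and link_libshared() functions are all names I made up, and it ignores everything else a real linker does (relocations, visibility, shared libraries as inputs, and so on).

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration; none of this is ld's real data model.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;   // symbols this object provides
    std::set<std::string> needs;     // symbols this object references
};

struct Input {
    bool is_archive;                 // true for a .a, false for a plain .o
    std::vector<ObjectFile> members; // a plain .o has exactly one member
};

struct LinkResult {
    std::set<std::string> defined;   // symbols that ended up in the output
    std::set<std::string> undefined; // references nobody satisfied
};

// Walk the inputs left to right, the way they appear on the command line.
LinkResult link(const std::vector<Input>& inputs) {
    LinkResult r;
    auto take = [&r](const ObjectFile& o) {
        // Taking an object takes ALL of its symbols, wanted or not.
        for (const auto& s : o.defines) {
            r.defined.insert(s);
            r.undefined.erase(s);
        }
        for (const auto& s : o.needs)
            if (!r.defined.count(s)) r.undefined.insert(s);
    };
    for (const auto& in : inputs) {
        if (!in.is_archive) {        // plain .o: always included
            take(in.members.front());
            continue;
        }
        // Archive: only pull members that define something currently
        // undefined, and rescan until a pass pulls nothing new.
        std::set<std::string> taken;
        bool progress = true;
        while (progress) {
            progress = false;
            for (const auto& m : in.members) {
                if (taken.count(m.name)) continue;
                bool wanted = false;
                for (const auto& s : m.defines)
                    if (r.undefined.count(s)) { wanted = true; break; }
                if (wanted) {
                    take(m);
                    taken.insert(m.name);
                    progress = true;
                }
            }
        }
    }
    return r;
}

// The libshared.so link from this post: shared.o followed by libstatic.a.
LinkResult link_libshared() {
    ObjectFile shared_o{"shared.o", {"sharedaddscale"}, {"addstatic"}};
    ObjectFile static_o{"static.o", {"addstatic", "notinshared"}, {}};
    ObjectFile exclude_o{"static_exclude.o", {"exclude"}, {}};
    return link({Input{false, {shared_o}},
                 Input{true, {static_o, exclude_o}}});
}
```

In this model, link_libshared() ends up with notinshared in the defined set (it rode along with addstatic in static.o) while exclude is absent. Putting the archive before shared.o in the input list leaves addstatic undefined, which is the link order behaviour from earlier.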

This is a pretty good post about the above linkage behaviour.

So, can we just link libstatic.a into the final executable as well? Yes, that is indeed the solution here. This way the linker sees the necessary symbol, which is present in libstatic.a. But technically, that's not the only way to fix the problem..

Including everything from the archive

We've already seen that the linker has the freedom to pick and choose the objects it wants to include. But if you're building a shared library, you probably just want to dump everything in; that way, the end-user can just use your library, instead of having to link in the original static library as well.

Well, we can do that too, with -Wl,--whole-archive:

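# Link shared.o and static library to create the shared object (.so)
# --no-whole-archive switches the behaviour back off, so that the libraries
# the compiler driver appends after ours are not pulled in wholesale too
$(SHARED_LIB): $(SHARED_OBJ) $(STATIC_LIB)
	$(CXX) -shared -o $@ $(SHARED_OBJ) -L. -Wl,--whole-archive -lstatic -Wl,--no-whole-archive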

This inserts all object file components from the static library directly into the new shared library, so we can find the symbols again!

Finally we get back to the title question..

Of course, the above isn't the full story. Most libraries nowadays limit the visibility of their functions. For those that want the full story, just read this.

You might be thinking: okay, I don't really care about 'protecting' any part of my code, so should I ever use hidden visibility if I'm writing a .so? The answer is still yes, because it is likely to produce smaller and better-optimised code.
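
For a concrete picture of what limiting visibility looks like in source, here is a minimal sketch using the GCC/Clang visibility attribute. The MYLIB_API and MYLIB_LOCAL macro names and the two functions are invented for this example and are not from the codebase I was building; the idea is that when the file is compiled with -fvisibility=hidden, only the function marked MYLIB_API stays exported.

```cpp
// Hypothetical export macros; the names are made up for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#  define MYLIB_API   __attribute__((visibility("default")))
#  define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
#  define MYLIB_API
#  define MYLIB_LOCAL
#endif

// Internal helper: callable from anywhere inside the same .so or
// executable, but not exported from it.
MYLIB_LOCAL int scale_impl(int value, int factor) {
    return value * factor;
}

// Public entry point: stays exported even under -fvisibility=hidden.
MYLIB_API int scale(int value, int factor) {
    return scale_impl(value, factor);
}
```

If this were built into a shared library, I would expect nm -D on the result to list scale but not scale_impl, since hidden symbols do not appear in the dynamic symbol table.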

We could have built the shared library without linking the static one

Before we go into the details, let's return to the example above, and assume we linked libstatic.a in the link step for the final executable in order to fix the issue. You should have noticed that we are now linking libstatic.a twice: once when creating libshared.so and once for the executable itself. This is not necessary; DSOs are perfectly happy with 'postponing' the lookup of a symbol until a later time, so in this particular scenario we can do the following:

# Build .so, leaving out all libstatic symbols for now
$(SHARED_LIB): $(SHARED_OBJ) $(STATIC_LIB)
	$(CXX) -shared -o $@ $(SHARED_OBJ)

# ...

# Build final executable, linking in libstatic to resolve both the symbol the
# DSO uses (addstatic) and the symbols the executable uses directly
# (notinshared, plus exclude from the object file that was left out earlier)
$(EXEC): $(E_OBJ) $(SHARED_LIB)
	$(CXX) -o $@ $(E_OBJ) -L. -lshared -lstatic

This would have solved all our undefined reference problems, and everything links correctly.

Now we introduce hidden visibility..

Continuing from above, I'm now going to turn on -fvisibility=hidden for both static.cpp and static_exclude.cpp:

# Compile static.cpp to a static library (.a)
$(STATIC_LIB): $(STATIC_SRC)  
	$(CXX) $(CXXFLAGS) -fvisibility=hidden -c static.cpp -o static.o
	$(CXX) $(CXXFLAGS) -fvisibility=hidden -c static_exclude.cpp -o static_exclude.o
	ar rcs $@ static.o static_exclude.o

# Build .so, leaving out all libstatic symbols for now
$(SHARED_LIB): $(SHARED_OBJ) $(STATIC_LIB)
	$(CXX) -shared -o $@ $(SHARED_OBJ)

# Link e.o with the shared library to create the final executable
# Also link static library to solve undefined reference
$(EXEC): $(E_OBJ) $(SHARED_LIB)
	$(CXX) -o $@ $(E_OBJ) -L. -lshared -lstatic

I did not change the internal code whatsoever, just this Makefile. Building again now results in the following error (yes, this is a screenshot from an iPhone, shoutout to the makers of iSH):

[Screenshot: iSH terminal showing the hidden symbol error]

Recap of what we did here and the possible fixes

The order of problems and solutions we just presented to reach this step is as follows:

  1. First build the static library as per normal, then create the shared library using it with a simple -lstatic, and finally create the executable linking only the shared library. This causes an undefined reference, since our executable uses a function in an object that was dropped by the linker when creating the shared library.
  2. We fixed this by linking our static library in the final executable as well. This fixes the undefined reference as we provide the object file containing the function we used.
  3. Since we are linking our static library in 2 steps - the shared library and the executable - we can remove the static library link for the shared library and the executable will still link correctly, so we do this.
  4. We then turned on -fvisibility=hidden for all our initial static library source files. This causes the hidden symbol in DSO error: the static library's functions now live only in the executable and are hidden there, so libshared.so's reference to addstatic cannot be satisfied from outside the DSO.

Now the fix for this is actually quite simple; we simply need to link the static library during the shared library creation again (i.e. put -L. -lstatic back on the libshared.so link line)!

Doing this places our hidden static library symbols into the DSO, which can then be used by the shared library's exported symbol. Dumping the symbols using nm libshared.so I see:

T _Z14sharedaddscaleiii
t _Z15notusedbysharedii
t _Z9addstaticii

where T marks a global (exported) symbol in the text section and t marks a local one, which is what a hidden symbol becomes once it is linked into the DSO (if it's your first time, just look up nm or objdump output). The notusedbyshared entry appears to be the function called notinshared in the listings above. Hence, the shared library now clearly contains the necessary hidden symbol for addstatic, and sharedaddscale can reach it without having to look outside the current DSO; this is exactly what hidden visibility is meant to do.
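
If the mangled names in that nm output are hard to read, they can be turned back into signatures. The sketch below uses abi::__cxa_demangle from <cxxabi.h>, which GCC and Clang provide; the demangle() wrapper is just a helper name I picked for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Turn an Itanium-ABI mangled name into readable form.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer is malloc'ed by __cxa_demangle
    return result;
}
```

For example, demangle("_Z9addstaticii") gives back addstatic(int, int). On the command line, nm -C or piping through c++filt does the same job.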

Some concluding remarks

These recent discoveries have made me think about the importance of understanding not just the shiny algorithms (read: leet code) and computer science things, but also learning about the tools available to us as developers.

It's like a carpenter learning about the optimal angle to hammer a nail but not knowing how to use an electric drill (I'm not a carpenter but I build IKEA things so hopefully that was reasonable).