Minimal ABI Example

Posted at — Jul 27, 2020

In compiled languages such as C++, developers can build and use shared libraries, also known as dynamic libraries. To depend on a shared library (in an executable or another shared library), said library needs to be present at compile time,[1] but symbols from the library (functions, types, global variables, etc.) are actually resolved at runtime (when something is executed) by the dynamic linker.

Shared libraries have many advantages. They can be shared between multiple programs, reducing memory and disk usage. Because they are resolved at runtime, they can be replaced without recompiling the programs that depend on them. This is convenient to deliver updates to users without requiring many tools on their machine (which can be hard to get right). It also means we can wait until program execution to choose the most appropriate library. For instance, a library such as NumPy is compiled against BLAS. BLAS has many implementations, for instance OpenBLAS, ATLAS, and BLIS, that we can choose from. Another, Intel MKL, is built specifically for Intel CPUs, and we can therefore expect it to be more efficient on these machines. But for NumPy, it does not change anything. It only needs to compile against a BLAS library, and it will work with all BLAS libraries.

All of them? What if we took something completely different and called it BLAS?
As you can expect, that will not work. Some information (e.g. function names) has been looked up when compiling against the shared library, and this information needs to be the same in the library used at runtime. This is known as the library interface: the observable boundary of components (functions, classes, etc.) that can be used to interact with the library. In non-compiled languages (e.g. Python), it is the same as the Application Programming Interface (API), i.e. the way we (humans) can write code using a library. For shared libraries, however, there also exists an Application Binary Interface (ABI) that defines the way computers can interact with them (in binary form). Understanding ABI is highly technical. It can change even if the shared library API (or even all of its source code) remains unchanged.[2] The ABI is what makes a shared library compatible with another. The first thing that impacts the ABI is the CPU architecture. If we build a library for a given architecture, say x86-64, we cannot expect it to be interchangeable with one built for another, like ARM64.

If we want to update a library without having to recompile its dependent programs, the ABI has to remain stable. This is why in this post we will build our intuition by studying an example that breaks the ABI.

Should you care about ABI stability in your library?
If you are wondering this, then chances are the answer is no. Libraries that are ABI stable across many versions and implementations are usually cornerstones of our computing ecosystem and require dramatically more work and knowledge than what is in this post. With this article, you will however better understand the struggle of package managers.

libcomplex 0.1.0 - Hello world

Complex numbers are at the foundation of many algorithms and applications, so we have decided to put much of our knowledge into a library, libcomplex, so that other developers can build on it. Watch out for bugs, it is still in beta! 😉

We build a class to represent complex numbers and put it in our include/complex.hpp header. It uses the Cartesian representation, but can compute a complex number's modulus and argument. We also added a stream insertion operator (operator<<) to get a human-readable representation.

#pragma once

#include <ostream>


struct Complexe {
  Complexe(double real, double imaginary) noexcept :
    real(real), imaginary(imaginary) {}

  double modulus() const noexcept;
  double argument() const noexcept;

  friend std::ostream& operator<<(std::ostream& out, const Complexe& c);

private:
  double imaginary;
  double real;
};

And we put the implementation of the methods in the associated src/complex.cpp file

#include <cmath>

#include "complex.hpp"


double Complexe::modulus() const noexcept {
  return std::sqrt(real*real + imaginary*imaginary);
}

double Complexe::argument() const noexcept {
  if(modulus() == 0.){
    return std::nan("");
  } else if(imaginary < 0.){
    return - std::asin(real / modulus());
  }
  return std::asin(real / modulus());
}

std::ostream& operator<<(std::ostream& out, const Complexe& c) {
  out << c.real << " + " << c.imaginary << 'i';
  out << " = " << c.modulus() << "e^(" << c.argument() << " i)";
  return out;
}

Our library is an overnight success, and many projects use it as a dependency! One of them, awesome-app, has the following code

#include <iostream>
#include <string>

#include "complex.hpp"


int main(int argc, char** argv) {
  const Complexe z{std::stod(argv[1]), std::stod(argv[2])};
  std::cout << z << '\n';
}

Because compiling a library can be tricky to get right, we write a short CMake file to do it. For simplicity, we will add awesome-app to the same file, but in practice this would be two separate projects.

# The cmake -B option needs CMake 3.13 or newer
cmake_minimum_required(VERSION 3.13)

project(Complex VERSION 0.1.0 LANGUAGES CXX DESCRIPTION "Complex number library")

add_library(complex SHARED src/complex.cpp)
target_include_directories(complex PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/include")
target_compile_features(complex PUBLIC cxx_std_14)

add_executable(awesome-app src/awesome-app.cpp)
target_link_libraries(awesome-app PRIVATE complex)

And now to compile and run all

cmake -B build
cmake --build build
./build/awesome-app 3 4

Yields

3 + 4i = 5e^(0.643501 i)

libcomplex 0.2.0 - Pardon my French

In our excitement, we named our class Complexe in French instead of the English Complex. We make the change and recompile only our library (i.e. not awesome-app).

cmake --build build --target complex

Now if our dependent awesome-app picks up the updated library without recompiling

./build/awesome-app 3 4

It will get an error similar to the following (this is on macOS)

dyld: Symbol not found: __ZlsRNSt3__113basic_ostreamIcNS_11char_traitsIcEEEERK8Complexe
  Referenced from: ./build/awesome-app
  Expected in: ./build/libcomplex.dylib

This means that the dynamic loader cannot find the (binary) function to print a Complexe (the names in the output are mangled). Indeed, when recompiling the library we replaced the function to print a Complexe with a function to print a Complex. The ABI is no longer compatible. awesome-app needs to be recompiled

cmake --build build --target awesome-app

And this time we have the compilation error

./src/awesome-app.cpp:8:8: error:
      unknown type name 'Complexe'; did you mean 'Complex'?
        const Complexe z{std::stod(argv[1]), std::sto...
              ^~~~~~~~
              Complex

Indeed, the name has changed. The API has also been modified and awesome-app needs to modify its code to replace Complexe with Complex.

An API change like this one, renaming a type that appears in exported symbols, will always change the ABI, so evaluating whether the ABI was stable here was a lost cause anyway.

libcomplex 0.2.1 - Bug fix

In our enthusiasm, we made a mistake in the formula to compute a complex number's argument. We used asin instead of acos. To be fair, there is an equivalent formula using asin, and also one using atan. There is actually a function in the standard library to compute this angle directly: std::atan2.[3] Because it is made just for this, we can expect it to be numerically more accurate. It also reduces the amount of code we have to maintain and test, so we make the switch.

double Complex::argument() const noexcept {
  return std::atan2(imaginary, real);
}

If we recompile only the library

cmake --build build --target complex

And use it directly without recompiling awesome-app,

./build/awesome-app 3 4

We get

3 + 4i = 5e^(0.927295 i)

The ABI remains unchanged so we are able to deliver a bugfix to our users (and the users of awesome-app) by simply swapping in the new library. This could happen without any action from the developers of awesome-app!

libcomplex 0.3.0 - ABI break

While putting some documentation in our code, we notice the following attributes in the Complex class

struct Complex {
  ...
private:
  double imaginary;
  double real;
};

Complex numbers are usually represented with their real part first, so we would like our code to look the same. We swap the attributes

struct Complex {
  ...
private:
  double real;
  double imaginary;
};

These are private attributes. They cannot be accessed outside of the class, so our API has not changed. If we try again to only swap in the new library

cmake --build build --target complex

And use it directly without recompiling awesome-app,

./build/awesome-app 3 4

This time, we get

4 + 3i = 5e^(0.643501 i)

Notice how 3 and 4 (and the phase) have been swapped, just like we swapped the attributes in the class? This happens because when the complex.hpp header gets included in awesome-app.cpp, some code for Complex gets generated in awesome-app. In particular, the compiler lays out the attributes one after the other in declaration order (as if the class were a tuple) and loses the notion of name. The constructor of Complex is defined in the header, so when the compiler compiles awesome-app, it puts the imaginary part first and the real part afterwards. When we use a function from the updated library, the library expects, on the contrary, to find the real attribute first and the imaginary one second, hence the flip.

We made a modification that left the API unchanged but changed the ABI. This is a particularly nasty one, because it does not give any errors. The program is completely well behaved as far as the computer is concerned, but it does not compute what we expect. As before, the solution is to recompile awesome-app.

The topic of ABI stability is extremely complex. For instance, renaming imaginary to any other name would have preserved the ABI. Similarly, if the constructor were only declared in include/complex.hpp

struct Complex {
  Complex(double real, double imaginary) noexcept;
  ...
};

And defined in src/complex.cpp

Complex::Complex(double real, double imaginary) noexcept :
  real(real), imaginary(imaginary) {}

Then, in this limited example, we could swap the attributes without changing the ABI.

I hope this example helped you understand how and when package managers can decide to replace a shared library with another, and hopefully it will help you figure out related bugs in the future. This post was meant to be illustrative and is not an encouragement to develop ABI-stable libraries. In particular, the examples in this post may behave differently in a more complete library.


  1. Link time to be precise.

  2. For instance, GCC changed its C++ standard library ABI in GCC 5. Therefore the same shared library compiled with GCC < 5 and GCC >= 5 may not be ABI compatible.

  3. Ok, there is also a std::complex type, but 🤫.