Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix audiowaveform isolation issues in API dockerfile #4680

Closed
sarayourfriend opened this issue Jul 31, 2024 · 3 comments · Fixed by #4942
Closed

Fix audiowaveform isolation issues in API dockerfile #4680

sarayourfriend opened this issue Jul 31, 2024 · 3 comments · Fixed by #4942
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Copy link
Contributor

Current Situation

Despite realies/audiowaveform building audiowaveform using BUILD_STATIC=1, the output is still a dynamically linked binary. From within the latest image:

/ # ldd /usr/local/bin/audiowaveform
        /lib/ld-musl-x86_64.so.1 (0x7f17b78f9000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7f17b765a000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f17b7636000)
        libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f17b78f9000)

The API Dockerfile copies these four libraries directly from the realies/audiowaveform image and overlays them on the base system. The audiowaveform binary we copy out of that image was built and tested with those libraries. However, as noted in our API Dockerfile, because we run apt update after copying these libraries, audiowaveform may not necessarily be running with the version of those libraries it was built and tested against.

Suggested Improvement

I propose we resolve this once and for all as follows:

Instead of copying the binary as built from realies/audiowaveform, download the Debian Bookworm .deb (the 1.10.1-1-12_<arch>.deb one, 12 indicating the Debian version) from https://github.com/bbc/audiowaveform/releases/tag/1.10.1 and install it alongside other system dependencies in our existing RUN apt [...] line. We should also specify the Bookworm variant of the python -slim tag, even though that is the default. The correct .deb needs to be pulled based on the architecture of the build host.

The main complexity of this is ensuring the correct architecture .deb is downloaded depending on the build host. On the other hand, it would allow us to delete a huge portion of the API Dockerfile dedicated to copying (and making it possible to copy) the audiowaveform binary out of realies/audiowaveform. It also removes an intermediate middle dependency between us and audiowaveform.

Benefit

Resolve this minor edge case in API build stability.

Additional context

I also considered the following alternatives:

  1. Continue copying the binary, but isolate it and its libraries in /opt/audiowaveform, and then set LD_LIBRARY_PATH when invoking audiowaveform. Do this without needing to modify the Python code that calls the binary by creating a shim /usr/local/bin/audiowaveform along the lines of...: LD_LIBRARY_PATH="/opt/audiowaveform/lib:/opt/audiowaveform/usr/lib" /opt/audiowaveform/usr/local/bin/audiowaveform "$@"
  2. Switch from the -slim to -alpine variant for the final Python image. The (partially) "static" build only leaves out standard library dependencies. These are essentially never going to change unless audiowaveform changes to a new language. We could therefore assume that so long as we ensure libstdc++ and libgcc are installed with apk when we update the base system (trivial), we wouldn't need to copy these final linked libraries anymore. libc.musl will already be in the alpine base system.

The first of these alternatives is, in theory, a smaller change, but might take some fiddling (I haven't actually tried it but it should work in theory as long as ld works the way it should!). The second might work, but I don't know whether the rest of our Python application runs on Alpine without issue.

Overall, the approach I chose to propose moving forward with is the simplest both in the immediate way it can be implemented and in that it significantly simplifies our API Dockerfile. It is the safest and cleanest option, all things considered.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Jul 31, 2024
@dhruvkb
Copy link
Member

dhruvkb commented Jul 31, 2024

audiowaveform may not necessarily be running with the version of those libraries it was built and tested against

The comment this links to is outdated. We are no longer using latest for the waveform image but rather a pinned version.

FROM docker.io/realies/audiowaveform:1.10.1 AS awf

Is the build process still unreliable, if we've pinned the version? To clarify I am not opposed to (rather very much in favour of) the idea, I only want to know more about the rationale behind the change.

@sarayourfriend
Copy link
Contributor Author

sarayourfriend commented Aug 1, 2024

It's unreliable if apt update or a base system update (e.g., new Debian version) includes g++ changes incompatible with whatever audiowaveform:1.10.1 was compiled against.

The unreliability isn't from selecting the dependencies or the locations of things, it's from the fact that the binary is dynamically linked to libraries that it was never validated against (and in particular built versions of those libraries from a distribution almost explicitly incompatible with the Debian approach to things).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
2 participants