This definitely comes under the heading of "stuff I'm writing down so if I ever have to do it again myself, I can remember my mistakes and avoid the blind alleys." But maybe there's something in here useful to anyone else with the same idea, so here it is.
So, I recently upgraded my Kubernetes cluster - I was starting to overload it with stuff, and more importantly there were things I wanted to play with that it simply didn't have the power for. So I added one new node, a homebuilt machine running Linux that is about 100 times more powerful than the tiny machines that made up the cluster until now. Obviously it's not ideal having the cluster so unbalanced, but it's not like I'm hosting anything mission-critical here, and I use Kubernetes *taints* to ensure that if that node goes down, the small nodes will only be allocated 'important' workloads (like this website) that they can cope with.
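For the curious, the tainting side looks roughly like this - the node name and taint key here are just illustrative, and the 'important' deployments carry a matching toleration:

```
# Keep general workloads off the small node; only pods that tolerate
# this taint (my 'important' ones) can be scheduled there.
kubectl taint nodes tiny-node1 reserved-for=important:NoSchedule
```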
Anyway, when building this new node I decided the AMD Ryzen 7 3700X[1] offered the best compromise of price, performance and power consumption - and in particular, number of threads, which matters more for my container workloads than raw single-core speed. I didn't want to build a gaming rig, so I wanted a relatively low-profile case, quiet fans and no crazy cooling solutions - but despite that, it's a relatively pokey little machine with fast NVMe storage, great for container hosting.
The only downside of the 3700X is that it doesn't have onboard graphics, so despite the fact that I will never use it as a desktop machine - it's basically a headless server - I had to invest in a graphics card, if only for long enough to get an OS installed. Nothing top of the line, a GeForce GT 710 costing about 40 euro, but nevertheless it seemed sad for it to sit there idle all day when it could be doing something useful.
So, I set about wondering what I could do with it, and settled on using it as a remote rendering and CUDA machine - basically, I want to be able to farm out GPU-accelerated tasks to this machine the same way I already delegate CPU-intensive tasks using Docker & Kubernetes. A Blender[2] render 'farm' (if you can call a single node a farm) seemed like a good first test case[^1].
Broadly, we need two things:
1. We need to get our Docker host (which also happens to be a node in my Kubernetes cluster) to give containers access to the GPU, even though the server itself is headless.
2. We need to deploy Blender as a headless container, and find a way to get the Blender running on my desktop to farm out rendering jobs to the container instead of turning my little Mac Mini into a coffee warmer.
I'll explain the steps for part 1 today, and write up part 2 (which was in many ways much more complicated) tomorrow.
First, you will need Linux installed. My teeny-tiny Kubernetes nodes run RancherOS, but when I set this server up I had the foresight to realise I'd probably want to use it for more exotic jobs, so a more mainstream OS might be a good idea. So, for this machine I am running Ubuntu Server 20.04.1 LTS[3].
I'm assuming you have got Docker installed and working already, but:
*Pro-Tip*: DO NOT accept the Ubuntu installer's offer of installing Docker
from a 'snap' image when you install the OS. The Snap-installed version
of Docker is broken in mysterious ways - it will appear to work, but
you'll have trouble when it comes to getting Kubernetes container
networking running. Install Docker the 'manual' way[4] (sketched below) -
it's incredibly easy and it will actually work. Trust me, you'll save
yourself a lot of hair-tearing.
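For reference, the 'manual' install boils down to something like this at the time of writing - treat it as a sketch and check the linked docs[4] for the current steps:

```
# Add Docker's official apt repository and install the engine from it
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```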
I'm also assuming you can add the node to a Kubernetes cluster yourself, if you want to.
Oh, and one other prerequisite - an NVIDIA[5] graphics card with CUDA[6] support.
Now, assuming your server is set up, you're mostly good to go - except if it's a headless server, you won't have any graphics libraries installed. And even if you did install it with a desktop, the chances are you have the wrong version of the libraries. So we need two things:
1. Current NVIDIA drivers, with CUDA support,
2. NVIDIA's container support for Docker
"OK", you're thinking, I'll just install the drivers from Ubuntu with apt or driver-install... *Stop!*
The current Ubuntu NVIDIA libraries are version 440. They are fine for
desktop graphics, but they only support CUDA version 10. CUDA version 10
is also fine, but it's not the latest (which is version 11). You're likely
to find that Docker images for things that use CUDA are already dependent
on version 11 - you can get round that by manually specifying a
particular version with an image tag (see the example below), or you can
save yourself the bother by making sure you are also compatible with the
latest shiny.
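For example, something like this pins a CUDA 10-era image rather than whatever the default tag happens to point at (the exact tag is illustrative - check Docker Hub for what's actually published):

```
# Explicitly request a CUDA 10.x base image instead of the default tag
docker pull nvidia/cuda:10.2-base
```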
So that means we want to install the version 450[^2] driver from a PPA repository, like so:
```
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install -y nvidia-driver-450
```
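At this point it's worth checking that the driver itself is happy, before involving Docker at all. A reboot may be needed first so the new kernel module gets loaded:

```
# Run directly on the host; the header should report driver 450.xx and CUDA 11.0
nvidia-smi
```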
The next part is actually pretty simple - once you've discovered that the reason things weren't working was the aforementioned CUDA version mismatch. NVIDIA distribute their own container runtime that works with Docker; you install all their goodies like so:
```
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo apt-get install -y nvidia-docker2
```
If you were done at this point, you'd restart the docker service, but there's one more change you want to make. By default, NVIDIA installs a new Docker runtime configuration called "nvidia", meaning if you want a Docker container to have access to the graphics card you have to pass `--runtime nvidia` to each Docker run command.
That's a bit of a chore, so I prefer to make it automatic. You do this by editing the `/etc/docker/daemon.json` file to add a "default-runtime" key; your edited file should look something like this:
{ "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }
Now, you can use `sudo service docker restart` to restart the Docker runtime.
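Before moving on, a quick sanity check - this is just grepping `docker info`, which lists the registered runtimes and the default:

```
# Both matching lines should mention 'nvidia': the registered runtimes
# and the 'Default Runtime' entry
docker info | grep -i runtime
```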
Finally, you want to check it all works. You can do that by running the `nvidia-smi` command from within a suitable container - and NVIDIA provide just such a container. Run the following, and you should see similar output:
```
[Sasha:~] timwa% docker run nvidia/cuda nvidia-smi
Tue Sep  8 16:07:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:08:00.0 N/A |                  N/A |
| 40%   46C    P8    N/A /  N/A |      3MiB /  2000MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
That's it! You can now deploy Docker containers that depend on a GPU/CUDA, and they will work just fine on your headless server.
Actually, you don't need to do anything special with Kubernetes. Assuming you added the node to your cluster, if Kubernetes deploys a pod to that node it will have access to the GPU.
But, if you have a cluster like mine, where only some of the nodes have a GPU, you probably want to be able to tell Kubernetes to only deploy GPU-dependent workloads to the nodes that are capable of running them. You can do this by adding a *label* to the node, indicating its capability. Then, in your pod deployment specifications you can include a corresponding *selector* that will only match nodes with that label.
I added two labels to my server, `cuda.available` and `cuda.version`:
labels": { "beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", "cuda.available": "true", "cuda.version": "11", "kubernetes.io/arch": "amd64", "kubernetes.io/hostname": "hp-node0", "kubernetes.io/os": "linux", "node-role.kubernetes.io/worker": "true" },
So, that's all you need to do to get your GPU working on a Docker host (and Kubernetes node, if you're that way inclined.) Next time[7], I'll go through the steps needed to make it work with Blender as a rendering 'farm'.
[^1]: It so happens I have a 3D render animation I want to produce, so it's actually useful to me at the moment. Longer term though, I also want to play with non-graphics CUDA applications like TensorFlow for machine learning. The instructions in this episode apply equally well to those use cases.
[^2]: As at the time of writing...
1: https://www.amd.com/en/products/cpu/amd-ryzen-7-3700x
2: https://www.blender.org/
3: https://ubuntu.com/download/server
4: https://docs.docker.com/engine/install/ubuntu/
5: https://www.nvidia.com/en-us/
6: https://developer.nvidia.com/about-cuda
7: /posts/2020-09-11-setting-up-a-blender-rendering-node-part-2.gmi