If you pass an old GPU such as a GTX 780 Ti through to a Jellyfin container in TrueNAS SCALE, the deployment fails with the error "0/1 nodes are available: 1 Insufficient nvidia.com/gpu".

Similar errors and discussions can be found here:

  1. https://www.truenas.com/community/threads/plex-nvidia-gpu-passthrough-scale-21-02.91152/page-3
  2. https://github.com/NVIDIA/k8s-device-plugin/issues/33

The problem is that we are using an old GPU model, like my GTX 780 Ti. With a newer one, for example a GTX 1060, there should be no problem at all.

Summing up these two posts, it seems that configuring nvidia-device-plugin with DP_DISABLE_HEALTHCHECKS=xids resolves the problem.
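To make the setting concrete, here is a minimal sketch of what such an entry might look like and how to check a manifest for it. The YAML fragment uses the standard Kubernetes env syntax and is my assumption about the shape, not a copy of any specific manifest; sample_env and has_xids_disabled are hypothetical helper names.

```shell
#!/bin/sh
# Assumed shape of the env entry inside the plugin container spec
# (standard Kubernetes env syntax, not copied from a real manifest).
sample_env() {
  cat <<'EOF'
        env:
        - name: DP_DISABLE_HEALTHCHECKS
          value: "xids"
EOF
}

# has_xids_disabled: hypothetical helper. Reads a manifest on stdin and
# succeeds only if the DP_DISABLE_HEALTHCHECKS entry is followed by a
# value line containing xids.
has_xids_disabled() {
  awk '
    /name:[ ]*DP_DISABLE_HEALTHCHECKS/ { seen = 1; next }
    seen && /value:/ { if ($0 ~ /xids/) found = 1; seen = 0 }
    END { exit !found }
  '
}

if sample_env | has_xids_disabled; then
  echo "XID health checks are disabled in this manifest"
fi
```

On a live system the same check could probably be fed from k3s kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset -o yaml, assuming the DaemonSet lives in the kube-system namespace.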

But the question is, how? The discussion thread does not say much about how to change the .yaml file, and the GitHub issue does not mention TrueNAS SCALE at all.

Well then, if nobody has done it before, I will take the lead. After tinkering around, I did find a solution. Here are the TL;DR steps.

  1. Open the TrueNAS SCALE shell (System Settings -> Shell)
  2. Run docker pull nvcr.io/nvidia/k8s-device-plugin:v0.9.0
  3. Run k3s kubectl apply -f https://raw.githubusercontent.com/thomaslty/k8s-device-plugin/master/nvidia-device-plugin.yml
  4. Done. You can now continue to deploy Jellyfin and assign a GPU to it (here I use TrueCharts to deploy; you can add its catalog in TrueNAS: https://github.com/truecharts/catalog)
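The two commands from the steps above can be wrapped in a small script. This is only a sketch: the run helper and the APPLY switch are hypothetical names I made up so the script prints the commands by default and only executes them when APPLY=1 is set.

```shell
#!/bin/sh
# Image and manifest URL exactly as used in the steps above.
IMAGE="nvcr.io/nvidia/k8s-device-plugin:v0.9.0"
MANIFEST="https://raw.githubusercontent.com/thomaslty/k8s-device-plugin/master/nvidia-device-plugin.yml"

# run: hypothetical helper. Prints the command in dry-run mode (default)
# and executes it only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# deploy_plugin: pull the plugin image, then apply the patched manifest.
deploy_plugin() {
  run docker pull "$IMAGE"
  run k3s kubectl apply -f "$MANIFEST"
}

deploy_plugin
```

Run it once as-is to review the commands, then again with APPLY=1 in the TrueNAS SCALE shell to actually apply them.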

Side notes:

k3s kubectl edit daemonsets/nvidia-device-plugin-daemonset should also work in theory, but I haven't tried it.
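A non-interactive variant of the same idea might use kubectl set env instead of opening an editor. Like the edit approach, this is untested on my side. The kube-system namespace is an assumption taken from the upstream NVIDIA manifest, and run and APPLY are hypothetical names for a dry-run switch.

```shell
#!/bin/sh
# Assumption: the DaemonSet lives in kube-system, as in the upstream manifest.
NAMESPACE="kube-system"
DAEMONSET="daemonsets/nvidia-device-plugin-daemonset"

# run: hypothetical helper. Prints the command in dry-run mode (default)
# and executes it only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# disable_xid_checks: set DP_DISABLE_HEALTHCHECKS=xids on the plugin DaemonSet.
disable_xid_checks() {
  run k3s kubectl -n "$NAMESPACE" set env "$DAEMONSET" DP_DISABLE_HEALTHCHECKS=xids
}

disable_xid_checks
```

If the DaemonSet turns out to live in a different namespace on your system, change NAMESPACE accordingly before running with APPLY=1.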

If you found this article helpful, feel free to follow this blog or my GitHub.