| Line 1: |
Line 1: |
| ==Introduction== | | ==Introduction== |
|
| |
|
| This page is concerned with the actual hosts; more general information about Docker can be found '''[[Linux Docker And GPU Passthrough | here]]'''. We need some '''[[Virtual Machines]]''' to host the containers so that processing is reasonably separated and constrained, with no one container taking all of the CPU or GPU cycles of the entire Proxmox host. Another, possibly more important, consideration is that an upgrade to the OS of Pear should not break a load of containers. The last point in favour of VMs for containers is that Nvidia is fairly well known for occasionally breaking its drivers during upgrades, so it is better to leave a GPU driver update until other people have tested it and to do any Proxmox updates as a separate job.
| | It became apparent that we need to be able to run Docker containers as well as LXCs, so some new '''[[Virtual Machines]]''' are being created with the sole purpose of running containers. With the new Linux drivers for Nvidia 5060 Ti GPUs it is now practical to use GPU passthrough to a Linux VM rather than a Windows 11 host. With the GPU on Linux, it can and should be shared amongst any applications that use a GPU to speed up their operation; obvious candidates are Jellyfin and LLMs. To that end, Linux has been installed on the host Quince and Docker containers can run on it. |
|
| |
|
| ==GPU Host Quince== | | ==The Logical Choice== |
|
| |
|
We are using Quince (192.168.100.75/24) as the host for GPU passthrough; as a consequence it will run the Jellyfin and Ollama Docker containers.
| | Walnut had hosted Jellyfin for a while because the Nvidia GPU drivers were not working well on Linux but did install easily on Windows 11 Pro. Jellyfin needs the GPU for hardware transcoding, so it made sense to pass the GPU through to Walnut, but this was only ever intended as a temporary measure until Nvidia shipped decent Linux drivers. Running Docker on a Windows 11 Pro VM caused problems because it is a '''[[Virtual Machines | Virtual Machine]]''' with another layer of virtualisation on top, so it was only practical to run Jellyfin. Now that the Linux drivers work, the VM Quince uses the GPU and also runs Docker. Jellyfin is set up as a Docker container and, as the GPU is on the host, it can use it for hardware transcoding. Quince also runs Ollama in a different container, so it can use the GPU too. It seems reasonable that other Docker images will be wanted at some point; some of them may benefit from the GPU's processing power while others will not really use it at all, so to keep the load down and under control we will have at least one separate host. A possible second host will run a news archiving suite of containers that does not need the GPU at all but may well move large amounts of data around, and we do not want it to interfere with transcoding or LLM processing on Quince. |
|
| |
|
| ===Quince Specification===
| |
|
| |
|
To store the OS we have a 150GB drive allocated from the SSD Rpool, and to keep all of the media files we have a 3TB hard drive allocated from Pearpool. As a temporary measure the media HD from Walnut has also been added so that the media files can be copied to the new HD. It was impractical to keep the Walnut HD on Quince because it is NTFS and Quince is Linux; it would work, but it is not the preferred option.
| | == Docker Hosts == |
| * Hostname is Quince
| |
| * IP Address is 192.168.100.75/24
| |
* RAM is 32GB
| |
* Processor is type Host and has 1 socket with 10 cores
| |
* BIOS is OVMF (UEFI)
| |
* Machine type is q35
| |
* OS storage is 150GB allocated from Rpool
| |
* Media storage is 3TB allocated from Pearpool
| |
* NIC is on production VLAN
| |
* Display is set to default
| |
* PCI device is 0000:07:00 (the Nvidia 5060 Ti 16GB GPU)
| |
|
| |
|
| ===GPU setup on Quince===
| | '''[[Linux Docker And GPU Passthrough | Quince]]''' will be the host for GPU passthrough, Jellyfin and Ollama. '''[[Data Archive |Blackberry]]''' will be the host for the news archive applications '''[[The Kiwix Archive]]''' and '''[[The Web Archive (ArchiveBox)]]''', while Tayberry will have the '''[[OpenAlex]]''' research tool. |
| | |
We will include the full guide to GPU passthrough to a Linux host, but it should be noted that a lot of the steps were already done while preparing to do the same on Walnut. Before we proceed, all of the GPU passthrough settings on Walnut must be disabled: remove its PCI device and set its display to VirtIO-GPU. If the PCI device is not removed from Walnut the GPU passthrough to Quince will fail, and if the display is not changed Walnut will have no screen to output to.
| |
|
| |
| ====Host Preparation (The Proxmox Level)====
| |
| | |
First the host must be told to "ignore" the GPU so it can be handed over to the VM. Start by enabling IOMMU in GRUB on the Proxmox host:
| |
| nano /etc/default/grub
| |
| The basic line to edit is
| |
| # For Intel CPUs:
| |
| GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
| |
| # For AMD CPUs:
| |
| GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
| |
However, this did not work on the first try, so the line was changed. It is not known whether this change or some other troubleshooting step made it work, so try the line above first and, if it does not work, try
| |
| GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pcie_aspm=off"
| |
After saving and closing the file, apply the change and reboot:
| |
| update-grub
| |
| reboot
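After the reboot it is worth confirming that the kernel actually booted with the IOMMU flag. The following is a minimal sketch, not part of the original procedure: the function names has_iommu_flag and report_iommu are made up for illustration, and the check only looks at the text of the kernel command line, so it does not prove that the hardware IOMMU is active.

```shell
#!/bin/sh
# Succeed if a kernel command line (passed as $1) contains an IOMMU "on"
# flag for either Intel or AMD, as written in the GRUB line above.
has_iommu_flag() {
    case " $1 " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a one-line verdict for the given command line.
report_iommu() {
    if has_iommu_flag "$1"; then
        echo "IOMMU flag present"
    else
        echo "IOMMU flag missing"
    fi
}
```

On the Proxmox host this would be called as report_iommu "$(cat /proc/cmdline)". If the flag is missing after a reboot, the edited GRUB line was probably not the one the host booted from.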
| |
The next step is to load the VFIO modules. Add them to /etc/modules to allow the "hand-off" to the VM:
| |
| nano /etc/modules
| |
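and add the following settings. Please note that the Proxmox 8.x web GUI may work without these settings, but it is better to add them to avoid race conditions between two GPUs and, more importantly, in case a future version of the Proxmox GUI changes how it handles PCI devices. On kernel 6.2 and newer, vfio_virqfd has been merged into the vfio module, so that line can be left out if it produces a module-not-found warning at boot.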
| |
| vfio
| |
| vfio_iommu_type1
| |
| vfio_pci
| |
| vfio_virqfd
| |
Save and close. The last thing to do on the Proxmox host is to blacklist the GPU drivers so that Proxmox does not use the card (and we can pass it through to a guest). Create /etc/modprobe.d/blacklist.conf:
| |
| nano /etc/modprobe.d/blacklist.conf
| |
| and add the following lines
| |
| blacklist nouveau
| |
| blacklist nvidia
| |
| blacklist nvidiafb
| |
| Save and close.
| |
| Tell Proxmox to rebuild the initramfs so it knows to load these modules next time the host boots
| |
| update-initramfs -u -k all
| |
| Please note this may take some time to run
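Once the host has been rebooted with the new initramfs, the driver bound to the card can be read from lspci. The sketch below is an illustration only: driver_in_use is a hypothetical helper name, and it simply extracts the "Kernel driver in use" line from lspci -k style text. The expectation that the line reads vfio-pci is an assumption; on some setups the binding may only appear once the VM has been started, and before that the line may be absent altogether.

```shell
#!/bin/sh
# Read `lspci -k` style text on stdin and print the kernel driver bound to
# the first device that reports one (prints nothing if no driver is bound).
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -n 1
}
```

On Pear this would be used as lspci -nnk -s 07:00 | driver_in_use. Seeing nouveau or nvidia here would suggest the blacklist did not take effect.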
| |
| | |
| ====VM Configuration (The Quince Level)====
| |
| | |
| In the Proxmox GUI, the VM settings are the most delicate part.
| |
| * Machine Type: Set to q35 (essential for PCIe bus support).
| |
| * BIOS: Set to OVMF (UEFI) (required for modern GPUs).
| |
| * PCI Device: Add a "Raw Device" and select the 5060 Ti
| |
** PCI Device (hostpci0) 0000:07:00 (this should be listed as Nvidia Corporation GB206 [GeForce RTX 5060 Ti])
| |
| ** All Functions ticked. This ensures the Audio and Video components of the 5060 Ti are passed as one unit
| |
| ** PCI-Express ticked
| |
** Primary GPU: unticked. As this is a headless server we don't need the GPU to output any screen. If this were the main display output, as on a Win11 VM, it would be ticked.
| |
** ROM-Bar (Read-Only Memory Base Address Register): unticked. This option controls whether the VM can see the GPU's "Video BIOS" (vBIOS). If ticked, the VM tries to read the BIOS directly from the physical chip to "initialize" the card before the driver takes over. If unticked, the VM skips reading the hardware ROM and relies entirely on the NVIDIA driver (which will be installed in Quince) to initialize the Blackwell silicon. Since we are using a headless Linux server and modern UEFI (OVMF), the traditional initialization steps that require the ROM are less critical than they are for a Windows gaming VM, and the NVIDIA drivers for Linux are very good at "talking" to the card without the VM's BIOS seeing the ROM first. 50-series cards and newer UEFI motherboards often hand off the device state in a way that doesn't require the ROM-Bar "shim". In some cases, unticking ROM-Bar actually prevents "Error 43" or initialization loops that happen when a VM tries to read a BIOS that the host has already "partially" claimed. If nvidia-smi stops responding or the card disappears after a VM reboot, try ticking the ROM-Bar box to force a fresh BIOS read.
| |
| | |
| ====Proving the VM is using the GPU ====
| |
| | |
To prove the GPU is being recognised, run the following command inside the Quince VM.
| |
| lspci -v -s $(lspci | grep -i NVIDIA | awk '{print $1}' | head -n 1)
| |
| It should give an output something like
| |
| 01:00.0 VGA compatible controller: NVIDIA Corporation Device 2d04 (rev a1) (prog-if 00 [VGA controller])
| |
| Subsystem: Gigabyte Technology Co., Ltd Device 418f
| |
| Physical Slot: 0
| |
| Flags: bus master, fast devsel, latency 0, IRQ 16
| |
| Memory at 80000000 (32-bit, non-prefetchable) [size=64M]
| |
| Memory at 380000000000 (64-bit, prefetchable) [size=16G]
| |
| Memory at 380400000000 (64-bit, prefetchable) [size=32M]
| |
| I/O ports at 8000 [size=128]
| |
| Capabilities: <access denied>
| |
| Kernel driver in use: nvidia
| |
| Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
| |
| * Expected Device ID: 2d04 (RTX 5060 Ti)
| |
* VRAM Confirmation: Look for the 16G memory block, as this is the Ti version with 16GB VRAM
| |
| * Driver Status: Must show Kernel driver in use: nvidia
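The last two checks in the list above can be scripted so that they do not have to be read off by eye. This is a rough sketch rather than an established tool: gpu_check is a hypothetical name, it takes the expected region size as its argument, and it only inspects the text that lspci -v prints.

```shell
#!/bin/sh
# Read `lspci -v` text on stdin. Print "ok" only if the nvidia driver is in
# use and a 64-bit prefetchable memory region of the wanted size (for
# example 16G) is listed; otherwise print "fail".
gpu_check() {
    awk -v want="$1" '
        /Kernel driver in use: nvidia$/ { driver = 1 }
        /64-bit, prefetchable/ && index($0, "[size=" want "]") { vram = 1 }
        END { if (driver && vram) print "ok"; else print "fail" }
    '
}
```

Inside Quince it would be fed from the command above, for example lspci -v -s "$(lspci | grep -i NVIDIA | awk '{print $1}' | head -n 1)" | gpu_check 16G.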
| |
| | |
To test whether the GPU is running at the full bandwidth of the PCIe slot, the following command can be run
| |
| sudo lspci -vv -s $(lspci | grep -i NVIDIA | awk '{print $1}' | head -n 1) | grep -E "LnkCap:|LnkSta:"
| |
| It should give the following output
| |
| LnkCap: Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 <4us
| |
| LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
| |
| * LnkCap: Width x16: This is the "Capability" of the Motherboard Slot and the Virtual Port. It means the "highway" (the physical slot on your motherboard and the virtual bridge in Proxmox) is built wide enough to handle 16 lanes of traffic.
| |
| ** Port #0 is the PCI virtual slot number.
| |
| ** Speed 32GT/s. GT/s stands for GigaTransfers per second. Unlike "Gigabytes," which measure the actual data, GT/s measures the raw speed of the electrical signals jumping across the wires.
| |
*** 32GT/s is the hallmark of PCIe Gen 5 (the newest generation in consumer hardware at the time of writing)
| |
| *** Gen 3: 8GT/s
| |
| *** Gen 4: 16GT/s
| |
*** Gen 5: 32GT/s (the Blackwell card's specification)
| |
| * LnkSta: Width x8 (downgraded): This is the "Status" of the GPU Silicon. The RTX 5060 Ti is a physical x8 card.
| |
** The reported speed of 2.5GT/s is misleading. Because we are using PCI passthrough, the VM isn't actually "talking" to the physical wires; it's talking to a virtual PCIe bridge created by Proxmox.
| |
This means that to check the speed the GPU is actually using we must look at the Proxmox host, Pear. Open a terminal on Pear and enter
| |
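lspci -vv -s 07:00.0 | grep LnkSta
| |
07:00.0 is the PCI address of the GPU on Pear, the same device (0000:07:00) that was passed through to the VM Quince in the web GUI. Note that a bare 7 after -s would select device 7 on every bus rather than bus 07. The output should look like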
| |
| LnkSta: Speed 16GT/s, Width x16
| |
| LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
| |
This shows a speed of 16GT/s (PCIe Gen 4), which is what it should be.
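Translating the GT/s figure into a PCIe generation by hand is easy to get wrong, so a small helper can do it. This is a sketch under stated assumptions: pcie_gen and link_status are hypothetical names, the Gen 1 (2.5GT/s) and Gen 2 (5GT/s) rates are added to the table above for completeness, and the parser expects the LnkSta line layout shown in the outputs on this page.

```shell
#!/bin/sh
# Map a per-lane PCIe transfer rate (for example "16GT/s") to its generation.
pcie_gen() {
    case "$1" in
        2.5GT/s) echo 1 ;;
        5GT/s)   echo 2 ;;
        8GT/s)   echo 3 ;;
        16GT/s)  echo 4 ;;
        32GT/s)  echo 5 ;;
        *)       echo unknown ;;
    esac
}

# Read `lspci -vv` text on stdin and print "Gen N xW" for the first LnkSta
# line. LnkSta2 lines are ignored because the pattern requires "LnkSta:".
link_status() {
    sed -n 's|.*LnkSta:[[:space:]]*Speed \([0-9.]*GT/s\)[^,]*, Width \(x[0-9]*\).*|\1 \2|p' |
    head -n 1 |
    while read -r speed width; do
        echo "Gen $(pcie_gen "$speed") $width"
    done
}
```

On Pear, piping the lspci -vv output for the GPU into link_status would print Gen 4 x16 for the host output shown above, while the 2.5GT/s line seen inside the VM comes out as Gen 1 x8.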
| |
| | |
| == Docker Applications installed on Quince ==
| |
| | |
| ===Installation Strategy===
| |
| | |
| Once the Blackwell GPU passthrough was verified on the Pear host, we transitioned to the Quince VM to set up the containerized environment. This allows us to run high-performance AI (Ollama) and media (Jellyfin) apps while keeping the base OS clean.
| |
| | |
| ===Docker Engine Installation===
| |
| | |
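We use the official Docker repository to get a current Docker Engine release (v29+ at the time of writing) rather than the older package shipped with the distribution. Support for the Blackwell GPU itself comes from the NVIDIA driver and the NVIDIA Container Toolkit installed below, not from Docker Engine.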
| |
| sudo apt update
| |
| sudo apt install ca-certificates curl gnupg
| |
Then set up the repository
| |
| sudo install -m 0755 -d /etc/apt/keyrings
| |
| curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
| |
| sudo chmod a+r /etc/apt/keyrings/docker.gpg
| |
| echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
| |
Next, install Docker Engine and Compose
| |
| sudo apt update
| |
| sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
| |
| ===NVIDIA Container Toolkit (The "Magic Bridge")===
| |
This toolkit provides the libnvidia-container library, which maps the physical GPU device files (/dev/nvidia0, etc.) into the container's namespace.
| |
| curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
| |
| curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
| |
| sudo apt update
| |
| sudo apt install -y nvidia-container-toolkit
| |
| Finally, configure the NVIDIA Container Toolkit and restart Docker: | |
| sudo nvidia-ctk runtime configure --runtime=docker | |
| sudo systemctl restart docker
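The configure step records an nvidia runtime in Docker's daemon configuration, normally /etc/docker/daemon.json. A quick way to confirm that it did so is sketched below. The helper name nvidia_runtime_state is hypothetical, and the check is a plain text match for a "nvidia": key, not a full JSON parse.

```shell
#!/bin/sh
# Print "registered" if the given Docker daemon config file contains an
# "nvidia" key (as written by `nvidia-ctk runtime configure`), otherwise
# print "missing". An unreadable or absent file also counts as missing.
nvidia_runtime_state() {
    if [ -r "$1" ] && grep -q '"nvidia"[[:space:]]*:' "$1"; then
        echo registered
    else
        echo missing
    fi
}
```

On Quince this would be run as nvidia_runtime_state /etc/docker/daemon.json. If it prints missing, rerun the configure command before restarting Docker.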
| |
| ===Verification & First App: Ollama ===
| |
| We used the "Pull-on-Demand" feature to deploy Ollama. Docker automatically fetched the image from Docker Hub since it wasn't in the local "storage locker."
| |
 docker run -d \
   --gpus all \
   -v ollama:/root/.ollama \
   -p 11434:11434 \
   --name ollama \
   --restart unless-stopped \
   ollama/ollama
| |
| Note on Parameters
| |
| * --gpus all: Crucial. Without this, the container sees a CPU only.
| |
| * -v ollama:/root/.ollama: Preserves model weights (like Qwen or Llama) even if the container is deleted/upgraded.
| |
| * Storage Logic: Docker Hub images are immutable; any changes you want to keep (like downloaded AI models) must be stored in a volume -v.
| |
| * --restart unless-stopped: Ensures your AI is always online after a reboot.
| |
* Docker identifies the "app" by the image name at the very end of the command, in this case ollama/ollama. Docker treats this like a URL: it looks in its local "storage locker" first and, if it doesn't find ollama/ollama there, automatically reaches out to Docker Hub (the global library of apps) to find it. This is the "Pull-on-Demand" feature.
| |
| * Status Check: After running this, use docker exec -it ollama nvidia-smi to prove the container sees the Blackwell card.
| |
| | |
| ===Final Integration Step ===
| |
| | |
With Docker verified, we move from manual docker run commands to Docker Compose (.yaml). This allows for "Infrastructure as Code," where we can define our GPU reservations and storage paths in a single, repeatable file.
| |
| ===The "Blackwell Stack" Compose File for Quince===
| |
| | |
| The Compose file should be created in the home directory
| |
| nano ~/compose.yaml
| |
| The configuration for Quince to use the 16GB VRAM of the 5060 Ti efficiently is as follows.
| |
 services:
   jellyfin:
     image: jellyfin/jellyfin:latest
     container_name: jellyfin
     network_mode: host # Best for DLNA/local discovery
     user: 1000:1000 # Assuming nigel is UID 1000
     volumes:
       - /mnt/jellyfin/docker/jellyfin/config:/config
       - /mnt/jellyfin/docker/jellyfin/cache:/cache
       - /mnt/jellyfin:/media
     deploy:
       resources:
         reservations:
           devices:
             - driver: nvidia
               count: 1
               capabilities: [gpu, video]
     restart: unless-stopped
   ollama:
     image: ollama/ollama:latest
     container_name: ollama
     volumes:
       - /mnt/jellyfin/docker/ollama:/root/.ollama
     ports:
       - "11434:11434"
     deploy:
       resources:
         reservations:
           devices:
             - driver: nvidia
               count: 1
               capabilities: [gpu]
     restart: unless-stopped
   open-webui:
     image: ghcr.io/open-webui/open-webui:main
     container_name: open-webui
     volumes:
       - /mnt/jellyfin/docker/open-webui:/app/data
     environment:
       - 'OLLAMA_BASE_URL=http://ollama:11434'
       - 'WEBUI_SECRET_KEY=change_me_to_a_long_random_string' # Crucial for security
       - 'ENABLE_SIGNUP=true' # Set to false after you create your account
     ports:
       - "3000:8080"
     extra_hosts:
       - "host.docker.internal:host-gateway"
     restart: unless-stopped
| |
| | |
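If the test container is still running, stop and remove it with the command below. Models pulled into the test container's ollama volume are not carried over, because the Compose file stores models under /mnt/jellyfin/docker/ollama instead.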
| |
| docker stop ollama && docker rm ollama
| |
| Launch the stack with
| |
| docker compose up -d
| |
| Verify the three apps are running with
| |
| docker ps
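Rather than scanning the docker ps table by eye, the three container names from the Compose file can be checked directly. The following is a minimal sketch: missing_containers is a hypothetical helper that reads one container name per line, which is the format docker ps --format '{{.Names}}' produces.

```shell
#!/bin/sh
# Read running container names (one per line) on stdin and print each
# expected name that is absent. No output means the whole stack is up.
missing_containers() {
    running=$(cat)
    for name in jellyfin ollama open-webui; do
        printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
    done
}
```

Usage would be docker ps --format '{{.Names}}' | missing_containers. Any name it prints can then be investigated with docker logs followed by that name.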
| |
In compose.yaml, change the secret key to a long random string for security
| |
- 'WEBUI_SECRET_KEY=change_me_to_a_long_random_string' # Crucial for security
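One way to produce a suitable value is to read random bytes from the kernel and print them as hexadecimal. This is only a sketch: gen_secret is a made-up name, and 32 bytes (64 hex characters) is an assumed length, not a requirement stated by Open WebUI.

```shell
#!/bin/sh
# Print a 64-character hexadecimal string built from 32 random bytes.
gen_secret() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

Running gen_secret prints a fresh string each time; paste the result in place of change_me_to_a_long_random_string.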
| |
Once you have created an account and logged in to Open WebUI, set the signup option to false and run docker compose up -d again to apply the change
| |
| - 'ENABLE_SIGNUP=true' # Set to false after you create your account
| |