AI & Jellyfin: Difference between revisions
Wikisailor (talk | contribs) |
Revision as of 11:33, 11 February 2026
Introduction
The objective for Quince is to make it the powerhouse virtual machine for AI workloads, so it needs the Nvidia 5060 Ti GPU passthrough completed. It will also run a media server, Jellyfin, so that the GPU can handle the transcoding. With the AI being serviced by this host, we can use Blackberry for the data harvesting.
Docker Applications installed on Quince
We need to install several applications that will share the GPU. The first will be Dockge, so that any new containers can be managed easily. We also need Ollama so that we can run LLMs easily. Then, to make use of the data archive, we can use AnythingLLM.
Installation Strategy
Once the Blackwell GPU passthrough was verified on the Pear host, we transitioned to the Quince VM to set up the containerized environment. This allows us to run high-performance AI (Ollama) and media (Jellyfin) apps while keeping the base OS clean.
Docker Engine Installation
We use the official Docker repository to ensure access to v29+ rather than an older distribution package. GPU support for the Blackwell card itself comes from the NVIDIA driver and the NVIDIA Container Toolkit installed in the next step.
sudo apt update
sudo apt install ca-certificates curl gnupg
Then set up the repository:
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Next, install Engine and Compose:
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
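To confirm that the installed engine meets the v29+ target, a quick check like the following can help. This is a minimal sketch: docker_major_at_least is a hypothetical helper (not part of Docker), and the docker commands may need sudo if your user is not in the docker group.

```shell
#!/bin/sh
# Succeeds when the major component of version string $1 is >= $2.
# An empty or non-numeric version counts as "not new enough".
docker_major_at_least() {
    major="${1%%.*}"
    [ "$major" -ge "$2" ] 2>/dev/null
}

# Only query the daemon when the docker CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
    server_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if docker_major_at_least "$server_version" 29; then
        echo "Docker Engine $server_version meets the v29+ target"
    else
        echo "Docker Engine version '$server_version' is below v29 or could not be read"
    fi
    # The Compose plugin reports its own version separately.
    docker compose version || true
fi
```

Only the major version is compared, which is enough for a "v29 or newer" check and avoids parsing suffixes in the version string.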
NVIDIA Container Toolkit (The "Magic Bridge")
This toolkit provides the libnvidia-container library, which maps the physical GPU device files (/dev/nvidia0, etc.) into the container's namespace.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
Lastly, we configure the NVIDIA Container Toolkit and restart Docker:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
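Before deploying any GPU containers, it may be worth confirming that the configure step registered the runtime. The sketch below assumes nvidia-ctk wrote its entry to /etc/docker/daemon.json (its usual target for Docker); has_nvidia_runtime is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds when the Docker daemon config at $1 registers a runtime named "nvidia".
has_nvidia_runtime() {
    [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
    echo "nvidia runtime is registered with Docker"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi

# End-to-end check: uncomment on Quince to run nvidia-smi in a throwaway container.
# sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

The commented-out command is the more convincing test, since it exercises the driver, the toolkit and Docker together; the file check only shows that the configuration was written.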
Docker Applications
Dockge Setup
Now that Docker is installed, we can install Dockge. To get Dockge running on Quince, we followed a specific path that keeps it isolated but powerful enough to manage the other stacks (Jellyfin, AnythingLLM, etc.). Since Dockge is a "Docker manager that runs inside Docker," the installation is a bit of an "Inception" move: we create a directory for it, then use a simple script or a compose.yaml to launch it.
- Create the directory structure. On Quince, we created a dedicated space for Dockge and all the future stacks you'll build. Open a terminal and run the following:
mkdir -p /mnt/docker_data/dockge /mnt/docker_data/stacks
cd /mnt/docker_data/dockge
- Download the compose file. We pulled the official configuration. Dockge needs access to the Docker socket (/var/run/docker.sock) so it can "reach out" and control the other containers on Quince.
curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
- Launch Dockge. We started it up in "detached" mode.
docker compose up -d
Now that Dockge is running we have a web interface on Quince:5001 and we can use it to add new services.
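One detail worth checking: to our knowledge the upstream compose.yaml defaults to /opt/stacks for its stacks directory, whereas the directory created above is /mnt/docker_data/stacks. The following is a minimal sketch, assuming the downloaded file still carries that default; retarget_stacks_dir is a hypothetical helper, so inspect the file before and after using it.

```shell
#!/bin/sh
# Rewrites every occurrence of the stacks path $2 in compose file $1 to $3.
retarget_stacks_dir() {
    compose_file="$1"
    old_dir="$2"
    new_dir="$3"
    tmp_file="$(mktemp)" || return 1
    # '#' is the sed delimiter so the slashes in the paths need no escaping.
    sed "s#${old_dir}#${new_dir}#g" "$compose_file" > "$tmp_file" &&
        cat "$tmp_file" > "$compose_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Show which stacks path the compose file in the current directory uses, if any.
if [ -f compose.yaml ]; then
    grep -n 'stacks' compose.yaml || true
fi
```

On Quince this would be called from /mnt/docker_data/dockge as retarget_stacks_dir compose.yaml /opt/stacks /mnt/docker_data/stacks, followed by docker compose up -d to recreate the container with the new path.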
Jellyfin Installation
To give the media server plenty of storage, we mounted a 3TB data drive at /mnt/jellyfin/ to store all of the media files. Another drive at /mnt/docker_data holds all of the configuration data for the containers. We had some difficulty with a leftover Jellyfin container installation, so we called this one jellyfin-new; note that container_name in the file is still jellyfin. The yaml file for "jellyfin-new" is as follows.
services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    user: 1000:1000
    # ADD THIS SECTION:
    runtime: nvidia # Tells Docker to use the NVIDIA Container Toolkit
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, video] # Grants access to NVENC/NVDEC
    ports:
      - 8096:8096
    volumes:
      - /mnt/docker_data/jellyfin/config:/config
      - /mnt/docker_data/jellyfin/cache:/cache
      - /mnt/jellyfin/audiobooks:/data/audiobooks
      - /mnt/jellyfin/oldfilms:/data/movies
      - /mnt/jellyfin/oldseries:/data/tvshows
      - /mnt/jellyfin/dji:/data/dji
    restart: unless-stopped
networks: {}
The web interface for Jellyfin is on port 8096, so http://quince:8096 will bring it up. So that local devices can connect, either with a web browser or the Jellyfin Media Player, the firewall forwards port 8096 to the media server.
- To use the Nvidia Toolkit with jellyfin we set the device driver to nvidia in the yaml file.
- After clicking "Deploy" in Dockge, we verified the container could "talk" to the 5060 Ti with the command
docker exec -it jellyfin nvidia-smi
The output should include a line something like
NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0
Post-Install Playback Optimization
Inside the Jellyfin Web UI (Dashboard > Playback), we have enabled Nvidia NVENC and checked the following critical Blackwell features:
- Hardware Decoding: H264, HEVC, AV1, VP9.
- AV1 Encoding: Allowed (The 5060 Ti is one of the few cards that can do this, significantly saving bandwidth).
- Tone Mapping: Enabled (Necessary for playing 4K HDR DJI drone footage on SDR screens).
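To see which NVENC encoders the container's ffmpeg build actually exposes, the encoder list can be filtered as sketched below. list_nvenc_encoders is a hypothetical helper, and the ffmpeg path in the comment is an assumption about the official Jellyfin image layout; adjust it if your image differs.

```shell
#!/bin/sh
# Reads `ffmpeg -encoders` output on stdin and prints the names of NVENC encoders.
# Each encoder line has the capability flags in column 1 and the name in column 2.
list_nvenc_encoders() {
    awk '$2 ~ /_nvenc$/ { print $2 }'
}

# On Quince (assumed ffmpeg location inside the container):
# docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -encoders | list_nvenc_encoders
```

If av1_nvenc appears in the result, the AV1 encoding option enabled above has an encoder available in that ffmpeg build; whether the GPU accepts it is only proven by an actual transcode.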
AI applications
We will need several inter-connected AI applications.
Ollama installation
We want to be able to run a variety of LLMs, so we will set up Ollama with the container name ollama and a yaml file as follows:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - /mnt/docker_data/ollama:/root/.ollama
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    restart: unless-stopped
networks:
  ai-network:
    external: true
- Note that the device driver is set to nvidia
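Because ai-network is declared with external: true, Compose expects the network to exist already and will not create it. A minimal sketch for creating it once, ahead of deployment, is shown here; ensure_network is a hypothetical helper name, while the docker network subcommands are standard.

```shell
#!/bin/sh
# Creates the Docker network $1 unless it already exists.
ensure_network() {
    if docker network inspect "$1" >/dev/null 2>&1; then
        echo "network $1 already exists"
    else
        docker network create "$1"
    fi
}
```

Running ensure_network ai-network on Quince before clicking "Deploy" should avoid a "network not found" failure, and it is safe to repeat because an existing network is left untouched.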
Loading a model via SSH
Since Ollama is running as a container, we don't run ollama run directly on the Quince host. We run it through Docker:
- To download and start chatting with a model immediately:
docker exec -it ollama ollama run llama3.1:8b
- To just download (pull) a model to the disk without starting a chat:
docker exec -it ollama ollama pull mistral:7b
- What happens next:
  - Ollama will download the model manifest and layers.
  - Because you mapped /mnt/docker_data/ollama, these models are saved to the SSD-backed 100GB config drive, so they load into the 5060 Ti's VRAM quickly.
- To check what is currently on your disk, complete with each model's tag and size, run:
docker exec -it ollama ollama list
Expected Models
| Model | Size | Why run it? |
|---|---|---|
| Llama 3.2 (3B) | ~2.0 GB | Lightning fast. Great for simple summaries and quick tasks. |
| Mistral (7B) | ~4.1 GB | The "Gold Standard" for general purpose local AI. Very reliable. |
| Llama 3.1 (8B) | ~4.7 GB | Excellent reasoning; very "human" in its responses. |
| DeepSeek-Coder (6.7B) | ~4.0 GB | If you want help writing scripts or fixing Nginx configs. |
| Command R (35B) | ~20 GB | The Stretch Goal. At 4-bit quantization, this might fit or spill slightly into system RAM. It’s incredibly smart for document analysis. |
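Since the models share the 100GB config drive, it can be useful to total what is already downloaded. The sketch below assumes the ollama list output has a header row followed by NAME, ID, SIZE and MODIFIED columns, with sizes printed as a number and a unit such as "4.7 GB" in decimal units; total_model_gb is a hypothetical helper.

```shell
#!/bin/sh
# Reads `ollama list` output on stdin and prints the combined model size in GB.
# Column 3 is the numeric size and column 4 its unit (GB or MB assumed).
total_model_gb() {
    awk 'NR > 1 {
             size = $3
             if ($4 == "MB") size = size / 1000
             total += size
         }
         END { printf "%.1f\n", total }'
}
```

On Quince this would be used as docker exec ollama ollama list | total_model_gb (without -it, which is only needed for interactive sessions), and the result can be compared against df -h /mnt/docker_data.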
When you run the command, Ollama tells you exactly what it's doing in the terminal.
- For Mistral: It pulls the 7B version by default so no need to be specific.
docker exec -it ollama ollama run mistral
- For Llama: it pulls the 8B version by default.
docker exec -it ollama ollama run llama3.1
To be 100% specific, you can use "tags" like this:
docker exec -it ollama ollama run mistral:7b
docker exec -it ollama ollama run llama3.1:8b
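When several models are wanted, the explicit tags can be pulled in one pass. This is a minimal sketch: pull_models is a hypothetical helper, it assumes the container is named ollama as in the yaml above, and it drops -it because no interactive terminal is needed for a pull.

```shell
#!/bin/sh
# Pulls each model given as an argument into the ollama container.
# Stops at the first failure so a bad tag is noticed immediately.
pull_models() {
    for model in "$@"; do
        echo "Pulling $model"
        docker exec ollama ollama pull "$model" || return 1
    done
}
```

For example, pull_models mistral:7b llama3.1:8b would fetch both models from the table with their tags pinned, so a later change to the default tag does not alter what gets downloaded.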