AI & Jellyfin: Difference between revisions

From Sea of Fate
Latest revision as of 18:53, 20 April 2026

Introduction

The objective for Quince is to make it the powerhouse virtual machine for the AI, so it needs to have the Nvidia 5060 Ti GPU passthrough completed. At the same time it will also run a media server in the form of Jellyfin so that the GPU can do the transcoding. With the AI being serviced by this host we can use Blackberry for the data harvesting.

Docker Applications installed on Quince

We are going to need to install several applications that will share the GPU. The first will be Dockge so that any new containers can be managed easily. We will also need to install Ollama so that we can run LLMs easily. Then to make use of the data archive we can use AnythingLLM.

Installation Strategy

Once the Blackwell GPU passthrough was verified on the Pear host, we transitioned to the Quince VM to set up the containerized environment. This allows us to run high-performance AI (Ollama) and media (Jellyfin) apps while keeping the base OS clean.

Docker Engine Installation

We use the official Docker repository to get a current Docker Engine (v29+) together with the Buildx and Compose plugins. Support for the Blackwell GPU itself comes from the NVIDIA driver and the NVIDIA Container Toolkit, not from Docker Engine.

sudo apt update
sudo apt install ca-certificates curl gnupg

Then set up the repository:

sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Next, install Engine and Compose:

sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

NVIDIA Container Toolkit (The "Magic Bridge")

This toolkit provides the libnvidia-container library, which maps the physical GPU device files (/dev/nvidia0, etc.) into the virtualized Docker namespace.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

Finally, we configure the NVIDIA Container Toolkit and restart Docker:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
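
Before deploying any GPU containers it can help to confirm that Docker actually registered the NVIDIA runtime. The following is a minimal sketch and not part of the original setup: has_nvidia_runtime is a hypothetical helper name, and it assumes that the plain-text output of docker info contains a "Runtimes:" line listing the installed runtimes.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# only if the "Runtimes:" line lists nvidia as a whole word.
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}
```

A typical call would be: docker info | has_nvidia_runtime && echo "NVIDIA runtime registered"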

Docker Applications

Dockge Setup

Now that Docker is installed we can install Dockge. To get Dockge running on Quince, we followed a path that keeps it isolated but still able to manage the other stacks (Jellyfin, AnythingLLM, etc.). Dockge is a Docker manager that itself runs inside Docker, so the installation is simple: we create a directory for it and then launch it from a compose.yaml.

  • Create the directory structure: on Quince, we created a dedicated space for Dockge and for all the future stacks. Open a terminal and run the following:
mkdir -p /mnt/docker_data/dockge /mnt/docker_data/stacks
cd /mnt/docker_data/dockge
  • Download the Compose file: we pulled the official configuration. Dockge needs access to the Docker socket (/var/run/docker.sock) so that it can control the other containers on Quince.
curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
  • Launch Dockge: we started it in detached mode.
docker compose up -d

Now that Dockge is running we have a web interface at http://quince:5001 and we can use it to add new services.

Jellyfin Installation

To give the media server plenty of storage we have mounted a 3TB data drive at /mnt/jellyfin/ to store all of the media files, and we have another drive at /mnt/docker_data to hold all of the configuration data for the containers. We had some difficulty with a leftover Jellyfin container installation, so we called this container jellyfin-new to start with. The yaml file for "jellyfin-new" is as follows.

services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    user: 1000:1000
    # ADD THIS SECTION:
    runtime: nvidia # Tells Docker to use the NVIDIA Container Toolkit
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, video] # Grants access to NVENC/NVDEC
    ports:
      - 8096:8096
    volumes:
      - /mnt/docker_data/jellyfin/config:/config
      - /mnt/docker_data/jellyfin/cache:/cache
      - /mnt/jellyfin/audiobooks:/data/audiobooks
      - /mnt/jellyfin/oldfilms:/data/movies
      - /mnt/jellyfin/oldseries:/data/tvshows
      - /mnt/jellyfin/dji:/data/dji
    restart: unless-stopped
networks: {}

The web interface for Jellyfin is on port 8096, so http://quince:8096 will bring it up. So that local devices can connect, either with a web browser or with Jellyfin Media Player, the firewall forwards port 8096 to the media server.

  • To use the NVIDIA Container Toolkit with Jellyfin we set the device driver to nvidia in the yaml file.
  • After clicking "Deploy" in Dockge, we verified that the container could "talk" to the 5060 Ti with the command
docker exec -it jellyfin nvidia-smi

The output should include a line something like:

NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0 
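
If this check is ever scripted, the driver version can be pulled out of that header line. The sketch below is an illustration only: driver_version is a hypothetical helper name, and it assumes the header keeps the "Driver Version:" label shown above.

```shell
# Hypothetical helper: reads nvidia-smi output on stdin and prints the
# driver version taken from the "Driver Version:" field of the header.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

It could be used as: docker exec jellyfin nvidia-smi | driver_version. The -it flags are left out on purpose, since no terminal is needed when the output is piped.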

Post-Install Playback Optimization

Inside the Jellyfin Web UI (Dashboard > Playback), we have enabled Nvidia NVENC and checked the following critical Blackwell features:

  • Hardware Decoding: H264, HEVC, AV1, VP9.
  • AV1 Encoding: Allowed (among NVIDIA cards, only the RTX 40 series and newer, including the 5060 Ti, can encode AV1 in hardware; it saves significant bandwidth).
  • Tone Mapping: Enabled (Necessary for playing 4K HDR DJI drone footage on SDR screens).

AI applications

We will need several inter-connected AI applications.

Ollama installation

We want to be able to run a variety of LLMs, so we will set up Ollama with the container name ollama and a yaml file as follows:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - /mnt/docker_data/ollama:/root/.ollama
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    restart: unless-stopped
networks:
  ai-network:
    external: true
  • Note that the device driver is set to nvidia.
  • To verify that the container can see the GPU:
docker exec -it ollama nvidia-smi
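
The compose file above declares ai-network with external: true, which means Compose does not create the network and expects it to exist before the stack is deployed. A minimal sketch of creating it idempotently follows; ensure_network is a hypothetical helper name, and relying on Docker's default bridge driver is an assumption rather than something the original setup states.

```shell
# Hypothetical helper: create a Docker network only if it is missing.
# Uses Docker's default (bridge) driver, which is an assumption here.
ensure_network() {
    docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}
```

Running ensure_network ai-network once on Quince before deploying the stack should be enough; repeated calls do nothing when the network already exists.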

Loading a model via SSH

Since Ollama is running as a container, we don't run ollama run directly on the Quince host. We run it through Docker:

  • To download and start chatting with a model immediately:
docker exec -it ollama ollama run llama3.1:8b
  • To just download (pull) a model to the disk without starting a chat:
docker exec -it ollama ollama pull mistral:7b
  • What happens next
    • Ollama will download the model manifest and layers.
    • Because we mapped /mnt/docker_data/ollama, these models are saved to the SSD-backed 100GB config drive, so they load into the 5060 Ti's VRAM quickly.
  • To check what is currently on the disk, complete with version and size, run:
docker exec -it ollama ollama list

Expected Models

Model | Size | Why run it?
Llama 3.2 (3B) | ~2.0 GB | Lightning fast. Great for simple summaries and quick tasks.
Mistral (7B) | ~4.1 GB | The "Gold Standard" for general purpose local AI. Very reliable.
Llama 3.1 (8B) | ~4.7 GB | Excellent reasoning; very "human" in its responses.
DeepSeek-Coder (6.7B) | ~4.0 GB | If you want help writing scripts or fixing Nginx configs.
Command R (35B) | ~20 GB | The Stretch Goal. At 4-bit quantization it is larger than the 5060 Ti's 16 GB of VRAM, so part of it will spill into system RAM. It’s incredibly smart for document analysis.
Qwen2.5-Coder | ~4.7 GB | Good for coding in Go
GPT-OSS:20B | ~13 GB |
gemma3:12b | ~8.1 GB |

When you run the command, Ollama tells you exactly what it's doing in the terminal.

  • For Mistral: It pulls the 7B version by default so no need to be specific.
docker exec -it ollama ollama run mistral 
  • For Llama: it pulls the 8B version by default.
docker exec -it ollama ollama run llama3.1

To be 100% specific, you can use "tags" like this:

docker exec -it ollama ollama run mistral:7b
docker exec -it ollama ollama run llama3.1:8b
  • How to verify it's in the GPU (The "Acid Test"). This is the most important part for the 5060 Ti. While chatting with a model in one SSH window, open a second SSH window and run:
docker exec -it ollama nvidia-smi

Look at the Memory-Usage and the Processes list at the bottom. If it shows ollama using roughly 4.5 GB to 5.2 GB of VRAM, the model is running on the GPU. If it stays at 4MiB / 16311MiB, the model is running on the CPU instead, and we need to check the drivers.
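
The same acid test can be scripted with nvidia-smi's query mode, which prints used and total memory as one comma-separated line. This is a sketch under assumptions: vram_in_use is a hypothetical helper name, and the 1000 MiB threshold in the usage line is an arbitrary cut-off chosen for illustration, not a value from the original setup.

```shell
# Hypothetical helper: reads one "used, total" line (MiB) on stdin, as
# printed by:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Succeeds when used memory is at least the threshold given as $1.
vram_in_use() {
    IFS=', ' read -r used total || return 2
    [ "$used" -ge "$1" ]
}
```

While a model is loaded, this could be run as: docker exec ollama nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_in_use 1000 && echo "model is on the GPU"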

AnythingLLM

The objective of AnythingLLM is to take data from personal data stores, such as a collection of books, and use an LLM from Ollama to extract information as required. For example, if the data source is concerned with the Go programming language, the user could have the LLM write a Go application while being safe in the knowledge that no outside influence will have tainted the code. The main uses can be summed up as:

  • Retrieval-Augmented Generation (RAG): This is the "killer feature." It allows the user to ask a model (like Mistral) questions about their own files. Instead of the AI guessing, it "reads" the ArchiveBox documents first and provides answers based only on their data.
  • Workspace Isolation: It can use different "Workspaces" (e.g., one for Research, one for Home Server Logs, one for Family History). Each workspace has its own specific set of documents and its own "personality," so the AI doesn't get confused between work and hobbies.
  • Multi-User "ChatGPT" Experience: It provides a polished, web-based interface that feels like ChatGPT but is hosted at http://quince.seaoffate.net (only from within the local network). This allows the use of the AI from any device on the local network (or via Raisin) without needing to use the SSH terminal.
  • AI Agent Hub: Beyond just chatting, AnythingLLM can act as an Agent. It can browse the web, scrape new URLs given to it, even summarize entire folders of transcripts from the new Whisper-WebUI setup.

Because Ollama and AnythingLLM both reside on the ai-network, AnythingLLM can reach the LLM at http://ollama:11434; Docker’s internal DNS resolves the name ollama to the container's internal IP. This internal traffic never touches the main LAN, keeping the AI conversations fast and private from other devices on the local network.


AnythingLLM can be installed with the name "anythingllm" and the yaml file below:

services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    ports:
      - 3001:3001
    cap_add:
      - SYS_ADMIN
    environment:
      - STORAGE_DIR=/app/server/storage
    volumes:
      - /mnt/docker_data/anythingllm:/app/server/storage
      - /mnt/docker_data/anythingllm/.env:/app/server/.env
      # This links the Blackberry NFS mount into AnythingLLM
      - /mnt/archive_data/archivebox:/app/server/storage/documents/archivebox:ro
    networks:
      - ai-network
    restart: unless-stopped
networks:
  ai-network:
    external: true


🌐 The "ai-network" Handshake

Since we defined ai-network as an external network that both Ollama and AnythingLLM share, they can talk to each other using their Container Names instead of IP addresses. Inside the AnythingLLM Settings UI:

  • LLM Provider: Select Ollama.
  • Ollama URL: Use http://ollama:11434 (Not localhost and not the IP).
  • Why? Docker’s internal DNS resolves the name ollama to the correct internal IP automatically because they are on the same network.
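
One way to confirm the handshake from the command line is to query Ollama's /api/tags endpoint, which lists the installed models, from a throwaway container attached to ai-network. This is a sketch under assumptions: the curlimages/curl image is not part of the original setup, and model_names is a hypothetical helper that pulls the name fields out of the JSON with grep and sed rather than with a real JSON parser.

```shell
# Hypothetical helper: reads the JSON body of Ollama's /api/tags on stdin
# and prints one model name per line. Text matching only, not a JSON parser.
model_names() {
    grep -o '"name": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

It could be used as: docker run --rm --network ai-network curlimages/curl -s http://ollama:11434/api/tags | model_names. If the names of the pulled models appear, the container name resolves and Ollama is reachable over the shared network.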

📂 The ArchiveBox "Read-Only" Link

The volume mapping for ArchiveBox mounts the archive read-only:

/mnt/archive_data/archivebox:/app/server/storage/documents/archivebox:ro

We should note:

  • The :ro flag: This is critical. It ensures AnythingLLM can read the archives to "learn" from them, but it can never accidentally delete or modify the ArchiveBox data.
  • Visibility: In the AnythingLLM UI, when we go to "Upload Documents," we will see a folder named archivebox. We can then move those files into a Workspace to start chatting with our archived websites.

🛠️ Environment & Permissions

  • The .env file: We are mounting /mnt/docker_data/anythingllm/.env. Before we run docker compose up, we make sure that file actually exists (even if it's empty) or Docker might create it as a folder by mistake.
    • Quick fix: touch /mnt/docker_data/anythingllm/.env
  • SYS_ADMIN Capability: This is required by AnythingLLM because it uses a technology called Puppeteer to scrape websites. Without this "cap_add," the document scraper will likely fail.
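
The .env precaution above can be made a little more defensive than a bare touch. The following is a minimal sketch: ensure_env_file is a hypothetical helper name, and refusing to continue when a directory already sits at the path is a design choice made for this example.

```shell
# Hypothetical helper: make sure the path given as $1 exists as a regular
# file before Docker bind-mounts it. Refuses to continue if a directory
# is already sitting at that path.
ensure_env_file() {
    path="$1"
    if [ -d "$path" ]; then
        echo "error: $path is a directory; remove it and retry" >&2
        return 1
    fi
    mkdir -p "$(dirname "$path")" && touch "$path"
}
```

For this stack the call would be ensure_env_file /mnt/docker_data/anythingllm/.env, run before docker compose up. An existing file is left untouched apart from its modification time.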

📊 Quince GPU Status (Ollama + AnythingLLM)

Ollama Memory: 4900MiB / 16311MiB

With AnythingLLM now running alongside Ollama on Quince:

  • VRAM: AnythingLLM itself uses almost zero VRAM. It just sends instructions to Ollama.
  • Concurrency: You can have Jellyfin transcoding a movie, Ollama running Mistral, and AnythingLLM scraping a document all at once on that 5060 Ti.