Unified Monitoring Stack

From Sea of Fate
Revision as of 02:05, 23 February 2026

📖Introduction

Mango, located at 192.168.110.133 on the Infra network, is the unified successor to the Prometheus, Grafana, and Victoria triad. It serves as the central hub for the Home Lab's observability. Mango natively scrapes metrics from all Virtual Machines, the Proxmox host (Pear), and their services, stores them in a high-performance VictoriaMetrics time-series database, and provides a Grafana interface for visualization. As an additional step, we can manage the Docker hosts that run Dockge from one master Dockge installation on Mango, and possibly manage any Minecraft servers from the same host.

By consolidating these services, we reduce network overhead and simplify the management of our monitoring infrastructure while maintaining 12-month data retention on a dedicated 500GB storage pool.

🐋Mango as a Dockge Master

We will also be installing Docker and Dockge so that we can manage all of the Docker installations from a single master management interface rather than the three individual Dockge WebGUIs. This transforms Mango from just a monitoring server into a true Management Hub for the entire cluster. Since Mango is running a native VictoriaMetrics install, we need to be careful that the Docker installation doesn't conflict with the existing systemd services.

The Strategy: The Manager's Desk

We aren't moving the actual containers (like Jellyfin or Ollama) to Mango—those stay on Quince, Blackberry, and Tayberry. Instead, we are setting up Dockge on Mango to act as the single interface that logs into the other three using the Dockge Agent or Remote Contexts.

Phase One: Install Docker on Mango

Since Mango is on Debian, we’ll use the official Docker repository to ensure we get the latest Compose features that Dockge relies on. We start by adding Docker's official GPG key:

sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

Then add the repository to the apt sources:

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

Install Docker and Compose:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
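Before Dockge is started next to the native services, it may be worth confirming that the ports involved do not collide. The sketch below is only an illustration: port_in_use is a hypothetical helper name, and 5001 is assumed to be the Dockge port (its usual default, not stated on this page), so adjust it to whatever the compose file actually maps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if something is already listening on TCP port $1.
# It compares the end of the "Local Address:Port" column of `ss -ltnH`.
port_in_use() {
  ss -ltnH | awk -v p=":$1" '
    { n = length($4) - length(p) + 1
      if (n > 0 && substr($4, n) == p) found = 1 }
    END { exit !found }'
}

# Report the state of each port of interest on Mango.
# 8428 = VictoriaMetrics, 3000 = Grafana, 5001 = ASSUMED Dockge port.
check_mango_ports() {
  for port in 8428 3000 5001; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}
```

Running check_mango_ports before Dockge is brought up should show 8428 and 3000 in use (the native services) and the Dockge port free.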

🚦Security & Network Architecture

Mango sits within the Infra network. Because it aggregates data from every host in the lab, it is a high-value target.

  • Web Interfaces: Grafana (Port 3000) and VictoriaMetrics VMUI (Port 8428) are restricted via pfSense to be accessible only from the MGT network (Cinnamon/Lemon).
  • Scraping Flow: Mango acts as the source for all scrape requests. pfSense rules must allow Mango to reach out to Production, VPN, and Terminal networks on specific exporter ports (9100, 9113, 9117, etc.).
  • Storage Pool: Data is stored on a dedicated 500GB virtual disk (PearPool), mounted at /mnt/metrics_data to ensure that metric growth never impacts the OS root partition.

🏛️Environment & Storage Setup

The VM was created using the Debian Gold Master template.

  • Hostname: Mango
  • IP/Gateway: 192.168.110.133 / 192.168.110.1
  • Disk 1 (OS): 32GB
  • Disk 2 (Data): 500GB (Added via Proxmox)

Storage Initialization

To handle the long-term metrics, the 500GB disk was initialized and mounted:

# Identify disk (sdb), format, and mount
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/metrics_data
sudo mount /dev/sdb /mnt/metrics_data
# Ensure persistence: add the following line to /etc/fstab
/dev/sdb  /mnt/metrics_data  ext4  defaults  0  2
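Device names such as /dev/sdb can change if disks are added or reordered in Proxmox, so keying the fstab entry on the filesystem UUID is a common alternative. The following is a minimal sketch, not the method used in the original build; fstab_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an fstab entry keyed on a filesystem UUID.
# $1 = UUID, $2 = mount point
fstab_line() {
  printf 'UUID=%s  %s  ext4  defaults  0  2\n' "$1" "$2"
}

# Example use on Mango (commented out because it needs root and the real disk):
# fstab_line "$(sudo blkid -s UUID -o value /dev/sdb)" /mnt/metrics_data | sudo tee -a /etc/fstab
# sudo mount -a   # re-reads fstab; an error here means the entry needs fixing
```

The mount options are kept identical to the /dev/sdb entry above; only the device identifier changes.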

🔧Installation

⚡VictoriaMetrics Installation

VictoriaMetrics was installed as a native binary (not Docker) to replace both the Prometheus scraper and the Victoria storage VM.

  • User & Directory Setup
sudo useradd --no-create-home --shell /bin/false victoriametrics
sudo mkdir /etc/victoriametrics
sudo chown -R victoriametrics:victoriametrics /etc/victoriametrics /mnt/metrics_data
  • Binary Installation

Binaries were retrieved from the VictoriaMetrics GitHub.

wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.xx.x/victoria-metrics-linux-amd64-v1.xx.x.tar.gz
tar -xvf victoria-metrics-linux-amd64-v1.xx.x.tar.gz
sudo mv victoria-metrics-prod /usr/local/bin/victoriametrics
sudo chown victoriametrics:victoriametrics /usr/local/bin/victoriametrics
  • Service Configuration
sudo nano /etc/systemd/system/victoriametrics.service
[Service]
ExecStart=/usr/local/bin/victoriametrics \
  --storageDataPath=/mnt/metrics_data \
  --retentionPeriod=12 \
  --promscrape.config=/etc/victoriametrics/prometheus.yml \
  --httpListenAddr=0.0.0.0:8428

Note: --retentionPeriod=12 keeps one year of history; a value without a unit suffix is interpreted as months.
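The [Service] section above shows only the start command. A complete unit usually also needs [Unit] and [Install] sections and a service user, so one possible full version is sketched here. The ExecStart line and the victoriametrics user come from this page; the Description, After/Wants, Restart, and WantedBy lines and the helper name render_vm_unit are assumptions and may differ from the file actually deployed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a complete unit file to stdout.
# The heredoc is quoted so the trailing backslashes are written literally.
render_vm_unit() {
  cat <<'EOF'
[Unit]
Description=VictoriaMetrics single-node
After=network-online.target
Wants=network-online.target

[Service]
User=victoriametrics
Group=victoriametrics
ExecStart=/usr/local/bin/victoriametrics \
  --storageDataPath=/mnt/metrics_data \
  --retentionPeriod=12 \
  --promscrape.config=/etc/victoriametrics/prometheus.yml \
  --httpListenAddr=0.0.0.0:8428
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example use on Mango (commented out because it needs root):
# render_vm_unit | sudo tee /etc/systemd/system/victoriametrics.service
# sudo systemctl daemon-reload && sudo systemctl enable --now victoriametrics
```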

🔍Scraping Configuration (prometheus.yml)

VictoriaMetrics uses the standard Prometheus YAML format for its scraper. The file was copied from the older Prometheus host, Pineapple, to the path below and then edited:

sudo nano /etc/victoriametrics/prometheus.yml

Key Change: The evaluation_interval directive was removed as it is not natively supported by the VictoriaMetrics single-binary scraper (it expects vmalert for that).
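The removal described above can also be scripted rather than done by hand in nano. This is a small sketch under the assumption that the directive sits on its own line, as it does in a typical Prometheus global block; strip_eval_interval is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the config in file $1 with any
# evaluation_interval line removed. The input file is not modified.
strip_eval_interval() {
  sed '/^[[:space:]]*evaluation_interval:/d' "$1"
}

# Example use on Mango (commented out; the source filename is a placeholder):
# strip_eval_interval prometheus.yml.from-pineapple | sudo tee /etc/victoriametrics/prometheus.yml
```

Writing to stdout instead of editing in place keeps the copy from Pineapple intact for comparison.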

🧪Target Jobs

The configuration includes the legacy fleet plus the new 2026 additions:

  • Infrastructure: Mango (Self), CTNS1.
  • Production:
    • Reverse proxy (Nginx) Raisin
    • Webservers (Apache) Plum, Satsuma, Fig
    • Database server (MySQL) Mandarin
  • New 2026 Hosts: Blackcurrant (Data & Archive), Quince (AI/Media), Tayberry (OpenAlex).
  • Gaming: Apple & Cherry (Minecraft Servers).
  • Terminals:
    • (NoMachine) Kiwiberry
    • (XRDP) Kapok
    • (Windows, RDP) Wahoo/Walnut

Scrape Interval: Set to 120s to balance data resolution with disk I/O and longevity.
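To put the 120s choice in numbers, the arithmetic below counts how many samples one time series produces per year at a given interval. The 15s figure is included only for comparison and is not a setting used in this lab; samples_per_year is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: samples stored per series per year
# for a scrape interval given in seconds ($1).
samples_per_year() {
  echo $(( 365 * 24 * 3600 / $1 ))
}

samples_per_year 120   # interval used on Mango: 262800
samples_per_year 15    # comparison only: 2102400
```

At 120s each series writes one eighth of the samples it would at 15s, which is where the savings in disk I/O and storage growth come from.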

Adding the Dockge Targets

We had to update /etc/victoriametrics/prometheus.yml to include the Docker containers:

#scrape_configs:
  - job_name: 'docker_containers'
    static_configs:
      - targets:
          - 'quince.seaoffate.net:8080'      # cAdvisor (AI Stack)
          - 'blackberry.seaoffate.net:8080'  # cAdvisor (Data Archive)
          - 'tayberry.seaoffate.net:8080'    # cAdvisor (OpenAlex)
  - job_name: 'gpu_metrics'
    static_configs:
      - targets: ['quince.seaoffate.net:9400'] # DCGM Exporter
  - job_name: 'jellyfin'
    metrics_path: '/metrics' # Crucial: tells VM where to look on port 8096
    static_configs:
      - targets: ['quince.seaoffate.net:8096']

Target Agent Installation (Scrapers)

For Mango to collect data, each target VM must run a specific exporter. Most Linux hosts use the node_exporter for OS metrics, while application-specific exporters are used for Nginx, Apache, and MySQL.

Linux Node Exporter (Standard for all VMs)

Installed on all Linux hosts (Raisin, Plum, Satsuma, Apple, Cherry, etc.) to monitor CPU, RAM, and Disk. Any hosts that don't show on the targets webpage need to have the agent installed.

  • Install via APT
sudo apt update && sudo apt install -y prometheus-node-exporter
  • Enable and Start
sudo systemctl enable --now prometheus-node-exporter
  • Verification (Run on target VM)
curl http://localhost:9100/metrics
  • Firewall Requirement: Target VM must allow Inbound TCP 9100 from Mango (192.168.110.133).

Nginx (Raisin)

Used to monitor active connections and request rates. The Nginx exporter is a standalone binary that talks to Nginx's stub_status module.

  • Enable Nginx Status: On Raisin, edit the Nginx config (e.g., /etc/nginx/sites-available/default) and add this block:
server {
    listen 127.0.0.1:8080;
    location /metrics {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
sudo nginx -t && sudo nginx -s reload
  • Install & Run Exporter
wget https://github.com/nginx/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
tar -xvf nginx-prometheus-exporter_*.tar.gz
sudo mv nginx-prometheus-exporter /usr/local/bin/
  • Create a Systemd Service
sudo nano /etc/systemd/system/nginx-exporter.service

Paste this into the service file

[Unit]
Description=Nginx Prometheus Exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/nginx-prometheus-exporter \
    -nginx.scrape-uri=http://127.0.0.1:8080/metrics
Restart=always
[Install]
WantedBy=multi-user.target

Enable the service with

sudo systemctl enable --now nginx-exporter

MySQL (Mandarin)

Used to monitor query throughput and database health.

  • Database User: Create a mysqld_exporter user in MySQL with PROCESS, REPLICATION CLIENT, SELECT privileges.
  • Configuration: Store credentials in /etc/.mysqld_exporter.cnf.
  • Service: Install prometheus-mysqld-exporter via APT.
  • Port: TCP 9104
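The first two bullets above can be sketched as follows. This assumes the exporter runs on Mandarin itself and connects over localhost; the password is a placeholder supplied by the caller, the connection limit is an optional safeguard rather than something this page specifies, and exporter_sql and exporter_cnf are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that creates the exporter user.
# $1 = password (placeholder; choose your own)
exporter_sql() {
  cat <<EOF
CREATE USER 'mysqld_exporter'@'localhost' IDENTIFIED BY '$1' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'localhost';
EOF
}

# Hypothetical helper: print the credentials file contents.
# $1 = the same password
exporter_cnf() {
  printf '[client]\nuser=mysqld_exporter\npassword=%s\n' "$1"
}

# Example use on Mandarin (commented out because it needs root and a live MySQL):
# exporter_sql 'change-me' | sudo mysql
# exporter_cnf 'change-me' | sudo tee /etc/.mysqld_exporter.cnf > /dev/null
# Then restrict the file so only the exporter's service user can read it.
```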

Apache (Plum, Fig, Satsuma)

  • Enable Mod Status:
sudo a2enmod status
  • Install Exporter:
sudo apt install prometheus-apache-exporter
  • Port: TCP 9117

Windows Exporter (Wahoo & Walnut)

For the Windows 11 desktops, we use the windows_exporter.

  • Download: Latest .msi from the Prometheus Community GitHub.
  • Install: Run the installer; it defaults to port 9182.
  • Firewall: The installer typically adds a "Windows Firewall" exception automatically.

Docker & Container App Exporters

Since we are using Dockge to manage our containers on hosts like Quince (AI), Blackcurrant (Archive), and Tayberry (OpenAlex), we should standardize how metrics are pulled from these environments. The most efficient way to do this is to add cAdvisor to each of our Dockge stacks. This allows Mango to "see" inside the Docker engine of that specific VM and report on the health of every individual container (Ollama, Jellyfin, etc.).

Docker Container Monitoring (The Dockge Layer)

For every VM running Dockge, you need to add a Monitoring Stack or add these services to your existing stacks. cAdvisor is the primary agent here; it scrapes resource usage from the Docker socket.

  • Create a "Monitoring" Stack in Dockge for Blackberry and Tayberry: In the Dockge UI, create a new stack and use the following
    • Container_name: cadvisor
version: "3.8"
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0 # Use a version compatible with your kernel
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
  • Create a "Monitoring" Stack in Dockge for Quince: In the Dockge UI, create a new stack and use the following
    • Container_name: cadvisor
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
  nv-exporter:
    image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.0-ubuntu22.04
    container_name: nvidia_exporter
    restart: unless-stopped
    # Use 'command' to force it to listen on the network interface
    command:
      - -a
      - 0.0.0.0:9400
    ports:
      - 9400:9400 # Map host 9400 to container 9400
    cap_add:
      - SYS_ADMIN
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
networks: {}


Port Summary for Dockge Hosts (these ports must be opened in pfSense so that Mango can scrape the exporters):

  • 8080: cAdvisor (Container CPU/RAM/Network)
  • 9400: NVIDIA Exporter (GPU VRAM/Temp - Quince only)

Jellyfin monitoring

The switch to enable metrics is not exposed in the WebUI of our version of Jellyfin, so we have to enable it in the XML config. Since our Jellyfin config is mapped to /mnt/docker_data/jellyfin/config, the file is easy to locate. Because we are using the official Jellyfin image, system.xml is the main configuration file. On the host (Quince), the file to edit is:

/mnt/docker_data/jellyfin/config/config/system.xml

We could stop the container and edit the XML by hand from the terminal on Quince. Alternatively, sed can flip the false value to true without hunting through the XML manually. Either way, first stop the container (within Dockge) and make a backup:

cd /mnt/docker_data/jellyfin/config/config/
cp system.xml system.xml.bak

Change EnableMetrics from false to true

sed -i 's/<EnableMetrics>false<\/EnableMetrics>/<EnableMetrics>true<\/EnableMetrics>/' system.xml 

Now that metrics are switched on, we can restart the container in Dockge.
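One caveat with the sed approach is that sed exits successfully even when nothing matched, for example if the element is formatted differently. The sketch below wraps the backup, edit, and a verification step together so that a silent no-op is caught. It assumes GNU sed (as shipped on Debian) and uses a hypothetical helper name, enable_jellyfin_metrics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the given system.xml, flip EnableMetrics
# from false to true, and return non-zero if the result is not as expected.
# $1 = path to system.xml
enable_jellyfin_metrics() {
  cfg=$1
  cp "$cfg" "$cfg.bak" || return 1
  sed -i 's|<EnableMetrics>false</EnableMetrics>|<EnableMetrics>true</EnableMetrics>|' "$cfg"
  grep -q '<EnableMetrics>true</EnableMetrics>' "$cfg"
}

# Example use on Quince, with the container stopped (commented out):
# enable_jellyfin_metrics /mnt/docker_data/jellyfin/config/config/system.xml \
#   && echo "metrics enabled" || echo "EnableMetrics not found as expected"
```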

Proxmox Host (Pear)

To monitor the physical hardware and ZFS pools:

  • Node Exporter: Installed directly on the Proxmox Debian host.
  • SMART Metrics: Use the smartctl_exporter_script.sh (as detailed in legacy notes) to pipe drive health into the node_exporter's textfile collector.

Post-Installation validation on Mango

After installing an agent on a target, confirm Mango sees it:

  • Open VMUI: http://mango:8428/targets.
  • Search for the hostname.
  • Status must be "UP". If "Connection Refused," check the service on the target; if "Timeout," check the pfSense rules.
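The same refused-versus-timeout distinction can be checked from Mango's shell by looking at curl's exit code (7 means the connection failed, 28 means the operation timed out). The following is a rough sketch; probe_exporter is a hypothetical helper name, and the 5-second limit and the /metrics path are assumptions that suit the exporters listed on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a target the way the checklist above does.
# $1 = host, $2 = port
probe_exporter() {
  curl -sf -o /dev/null --max-time 5 "http://$1:$2/metrics"
  rc=$?
  case $rc in
    0)  echo "UP" ;;
    7)  echo "REFUSED: check the exporter service on $1" ;;
    28) echo "TIMEOUT: check the pfSense rule for port $2" ;;
    *)  echo "OTHER: curl exit code $rc" ;;
  esac
}

# Example use from Mango (commented out because it needs the lab network):
# probe_exporter tayberry.seaoffate.net 9100
```

The -f flag makes curl treat HTTP errors such as 404 as failures, so a service that answers on the port but does not serve /metrics is reported as OTHER rather than UP.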

📈 Grafana Installation

Grafana was installed on the same Mango host to provide the local visualization layer.

  • Repository & App Setup
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install grafana -y
sudo systemctl enable --now grafana-server

🧩Network & Firewall Rules (pfSense)

To allow the new ports to function, pfSense was updated with a new Alias for Monitoring Ports:

  • 3000: Grafana UI (remains the same from the previous Grafana installation)
  • 8428: VictoriaMetrics UI/API (new port, used to view scrape status as the Prometheus web GUI did previously)
  • 9090: removed the older Prometheus webgui port

Critical Rules

  • MGT -> Mango: Allow ports 3000 & 8428 (Access from Cinnamon or other management console).
  • Mango -> All Networks:
    • Allow port 9100 (Node)
    • Allow port 9113 (Nginx)
    • Allow port 9117 (Apache)
    • Allow port 9104 (MySQL)
    • Allow port 9182 (Windows)
    • Allow port 8080 (Docker, cadvisor)
    • Allow port 9400 (NVIDIA DCGM)
    • Allow port 8096 (Jellyfin)

For clarity, the above ports should all have aliases in pfSense so the rules are easier to read.

🔦Verification Steps

  • Check Dockge: Ensure the cadvisor container shows as "Green/Running" in the Dockge UI.
  • Service Status: (Confirmed Active/Running).
sudo systemctl status victoriametrics

Targets Check: verify all hosts are Green/UP. We should see the new entries for ports 8080, 9400, etc. for the Docker containers.

http://mango:8428/targets

Data Source: In Grafana, added Prometheus data source pointing to

http://localhost:8428.

Disk Write Check: confirms ingestion of samples to the PearPool disk.

du -sh /mnt/metrics_data

We can verify that the various agents are reporting by using curl from Mango. For example, to test that cAdvisor on Tayberry is working (it serves its metrics on port 8080):

curl http://tayberry.seaoffate.net:8080/metrics

The example can be adapted to test other agents by changing the hostname and/or the port number. For example, Tayberry also runs the standard Linux node exporter on 9100, so we could use:

curl http://tayberry.seaoffate.net:9100/metrics

This should also return a wall of text from Tayberry, assuming the node exporter was installed. If a curl command does not work, check that the exporter is installed on the target, that the appropriate pfSense rule is set up and working, and that the port in the curl command is correct. If curl works but the target is missing or down at http://mango:8428/targets, check the YAML file on Mango.

Summary of Legacy Retirement

With Mango fully operational:

  • Pineapple (.130) services stopped.
  • Granadilla (.131) services stopped.
  • Victoria (.132) services stopped.
  • Lychee identified as legacy and marked for rebuild via new Gold Master Template.

Build Complete: February, 2026