Prometheus & Grafana

From Sea of Fate

Introduction

We decided that we needed more detailed knowledge of what is happening on the VMs, and on the host Pear as well. The Proxmox GUI gives an approximation of resource usage on the host and the VMs, but it is vague and often out of date. This lack of detail in Proxmox prompted the installation of Prometheus. As Prometheus on its own does not do justice to the data it collects, Grafana has also been installed, and long-term data storage needs something like VictoriaMetrics. So what started out as a simple data collection issue ended with a suite of VMs to gather, store and view the metrics of the Pear suite.

Prometheus

Prometheus is a powerful, open-source monitoring system and time-series database. Developed originally at SoundCloud, it has become a cornerstone of modern cloud-native observability stacks, renowned for its flexible data model, powerful query language (PromQL), and efficient data collection mechanism. Unlike traditional monitoring systems that often rely on agents pushing data, Prometheus primarily uses a "pull" model, actively scraping metrics from configured endpoints. This pull model simplifies network setup in many scenarios and ensures that Prometheus controls the monitoring cadence.
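To make the pull model concrete, here is a minimal sketch of the target side: a small HTTP endpoint that serves metrics as plain text for a scraper to fetch. It is illustrative only. The metric names, the `room` label, and port 8000 are assumptions, and in practice an existing exporter such as Node Exporter would be used rather than hand-written code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical samples: (metric name, label pairs) -> current value.
SAMPLES = {
    ("demo_temperature_celsius", (("room", "lab"),)): 21.5,
    ("demo_up", ()): 1,
}


def render_metrics(samples):
    """Render samples as plain-text lines of the form name{label="value"} number."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        selector = f"{name}{{{label_text}}}" if label_text else name
        lines.append(f"{selector} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the rendered samples; everything else is 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, waiting for a scraper to pull from this endpoint."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. The target never contacts the monitoring server itself; it only answers when it is scraped, which is what lets Prometheus decide how often data is collected.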

In our home lab environment, Prometheus is strategically positioned to act as the central nervous system for all operational metrics, providing a comprehensive view of the health and performance across various network segments and critical infrastructure.

Architectural Placement and Core Function

Prometheus is installed and running on the host named pineapple, which resides within the dedicated infra network segment. This placement is deliberate, allowing pineapple to securely access and scrape metrics from devices and services across all monitored networks without imposing undue load on production resources. As the central collector, the Prometheus instance on pineapple is responsible for:

  • Metric Collection: Actively reaching out to configured targets to pull metrics at regular intervals. These targets are typically "exporters", small agents that expose metrics in a Prometheus-readable format (plain text over HTTP).
  • Time-Series Database: Storing these collected metrics locally in its optimized time-series database. This allows for efficient storage and retrieval of large volumes of numerical data indexed by timestamps and label sets.
  • PromQL Engine: Providing the PromQL query language, which is used for powerful and flexible data analysis, aggregation, and mathematical operations on the collected time-series data. This enables the creation of custom dashboards and alert conditions.
  • Alerting Rules: Evaluating predefined alerting rules against the collected data and sending notifications when thresholds are breached. While Prometheus evaluates the rules, the actual notification delivery is typically handled by its companion component, Alertmanager.
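The responsibilities above can be exercised from outside through the Prometheus HTTP API. The following is a rough sketch, not our actual alerting setup: it builds an instant query for the built-in `up` metric and lists the targets whose last scrape failed. The base URL `http://pineapple:9090` assumes the default Prometheus port, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response):
    """Return (job, instance) for every series in the result whose value is 0."""
    if response.get("status") != "success":
        raise ValueError("query failed")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # values arrive as strings
        if float(value) == 0:
            labels = series["metric"]
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return down


def fetch(url):
    """Perform the HTTP request and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=10) as reply:
        return json.load(reply)
```

For example, `down_targets(fetch(build_query_url("http://pineapple:9090", "up")))` would return the unreachable targets. A real alerting rule expresses the same condition in PromQL (`up == 0`) and leaves delivery to Alertmanager.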

Monitoring Scope Across the Home Lab

From its vantage point on pineapple, Prometheus is configured to monitor a wide array of systems across different network segments, ensuring holistic observability:

  • Production Network Devices: Monitoring application servers, web services, and databases that host critical services. This involves deploying specific exporters (e.g., Node Exporter for Linux hosts, various application-specific exporters for databases or web servers) on these production machines.
  • Infra Network Devices: Monitoring core infrastructure components like DNS servers, directory services, and network appliances within the infra network itself.
  • Management Network Devices: Keeping an eye on systems dedicated to managing the lab, such as configuration management servers, backup solutions, or other utility services.
  • VPNNet Network: Monitoring VPN gateways and tunnels, ensuring connectivity and performance for remote access. This might involve ping exporters for reachability checks or VPN-specific metrics.
  • Terminals Network: For systems like thin clients or specific workstations, basic uptime and resource utilization can be monitored to ensure availability.
  • Proxmox Host (Pear): Critically, Prometheus monitors the physical Proxmox host named Pear. This typically involves deploying Node Exporter on Pear itself, providing low-level system metrics such as CPU usage, memory consumption, disk I/O, and network traffic for the hypervisor. Proxmox Exporter is an optional dedicated exporter that provides metrics specific to Proxmox VE, such as VM/LXC container states, storage pool usage, and cluster health.

By strategically deploying various "exporters" on target systems across these diverse networks, Prometheus on pineapple aggregates a centralized stream of performance and health data.
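One way to keep such a target list manageable is Prometheus's file-based service discovery, which reads target groups from JSON files. The sketch below generates such a file from a small inventory. The inventory contents, the `segment` label name, and the grouping of hosts are assumptions for illustration, not a record of our configuration; port 9100 is Node Exporter's default.

```python
import json

# Hypothetical inventory: segment name -> hosts running Node Exporter.
INVENTORY = {
    "infra": ["pineapple"],
    "hypervisor": ["pear"],
}
NODE_EXPORTER_PORT = 9100  # Node Exporter's default listen port


def build_target_groups(inventory, port=NODE_EXPORTER_PORT):
    """Turn the inventory into file_sd target groups, one group per segment."""
    groups = []
    for segment in sorted(inventory):
        groups.append({
            "targets": [f"{host}:{port}" for host in inventory[segment]],
            "labels": {"segment": segment},
        })
    return groups


def write_targets(path, inventory):
    """Write the target groups as JSON for a file_sd_configs entry to read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_target_groups(inventory), handle, indent=2)
```

Calling `write_targets("node_targets.json", INVENTORY)` produces a file that a scrape job can reference through `file_sd_configs`. Labelling each group by segment makes it possible to filter dashboards and queries by network later on.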

Data Flow and Integration with Other Tools

Prometheus on pineapple is not a standalone solution for data visualization or long-term storage in our setup. It integrates seamlessly with other specialized tools to form a comprehensive observability stack:

  • Data Presentation to Grafana (granadilla): Prometheus serves as the primary data source for our visualization platform, Grafana, which is running on the host granadilla. When we access a dashboard in Grafana, it queries Prometheus (on pineapple) using PromQL to retrieve the necessary time-series data. This separation allows Grafana to focus purely on presenting compelling dashboards without needing to manage data collection or storage.
  • Data Storage on VictoriaMetrics (victoria): For efficient long-term storage and scalability, Prometheus on pineapple is configured to remotely write all its collected data to the VictoriaMetrics VM named victoria. This offloads the responsibility of high-volume, long-term data retention from pineapple's local storage and Prometheus's internal TSDB. VictoriaMetrics, optimized for large-scale time-series data, acts as a robust, scalable backend, ensuring that historical metrics are readily available for analysis, even years into the future. This architecture allows us to leverage Prometheus's excellent scraping and querying capabilities while benefiting from VictoriaMetrics' superior storage efficiency and scalability.

This integrated approach, with Prometheus as the data collection and querying engine, provides a flexible, powerful, and scalable monitoring solution for our home lab, paving the way for advanced visualization with Grafana and robust long-term data retention with VictoriaMetrics.
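As a rough illustration of the kind of request a dashboard panel issues, the sketch below builds a range query URL. Both Prometheus and VictoriaMetrics answer the same `/api/v1/query_range` path, so the same helper can target either one. The base URLs assume default ports (9090 for Prometheus, 8428 for single-node VictoriaMetrics), and querying victoria directly is shown only as a possibility, not as how our Grafana is configured.

```python
import urllib.parse

# Assumed base URLs using each product's default port.
BACKENDS = {
    "prometheus": "http://pineapple:9090",
    "victoriametrics": "http://victoria:8428",
}


def range_query_url(backend, promql, start, end, step_seconds):
    """Build a query_range URL; start and end are Unix timestamps in seconds."""
    params = urllib.parse.urlencode({
        "query": promql,
        "start": start,
        "end": end,
        "step": step_seconds,
    })
    return f"{BACKENDS[backend]}/api/v1/query_range?{params}"
```

Recent data can be read from either backend, while ranges that reach further back than the local Prometheus retention would only be answered in full by the long-term store.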