Pineapple: Difference between revisions
Wikisailor (talk | contribs) |
Revision as of 10:09, 5 June 2025
Introduction
Pineapple, at x.x.x.130 on the Infra network, hosts the Prometheus application, which gathers metrics from each VM host and from Pear using agents installed on each host. The partner application, Grafana, hosted on Granadilla, is used to view the data collected by Prometheus. An overview of the facilities offered by Prometheus & Grafana can be found here.
Security concerns
The purpose of Prometheus is to gather data concerning all of the hosts on the network, which makes it a good source of information for any hostile actor. Keeping it inside Infra and not publishing its webserver to the Internet are obvious security measures. Making specific aliases & rules on Pfsense for it to access its agents is also required (aliases for these obscure ports make the rules a lot more secure and readable).
Prometheus Installation
The setup of Prometheus will have several separate parts.
- Server software installation
- Server configuration
- Firewall rules setup
- Agent installation
Prometheus Setup
The first step was to create a VM in the Infra network and give it the hostname Pineapple and a matching IP/gateway (x.x.x.130/24). The hostname and IP address can be set with the script, but we must remember to edit the gateway address in /etc/netplan
sudo nano /etc/netplan/some_config_file.yaml
sudo netplan apply
We also need to make sure that the host is listed in DNS by logging on to ctns1 and running add_combined_hostadd.sh. Then we do the ubiquitous
sudo apt update && sudo apt upgrade -y
We will need wget and tar if they are not already installed
sudo apt install -y wget tar
Next we have to make a user "prometheus" for the application to run as
sudo useradd --no-create-home --shell /bin/false prometheus
and make some dirs with the user as owner
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
To download the Prometheus application we use wget, but first we have to locate the up-to-date file, so browse to https://prometheus.io/download/, find the file prometheus-x.x.x.linux-amd64.tar.gz and copy the link address. Once we have the address we can wget it and extract it, as in the following example commands (wget needs the full link address, not just the file name)
wget https://github.com/prometheus/prometheus/releases/download/v3.4.1/prometheus-3.4.1.linux-amd64.tar.gz
tar -xvf prometheus-3.4.1.linux-amd64.tar.gz
cd prometheus-3.4.1.linux-amd64
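Before unpacking, it may be worth checking that the download is intact. The following is a minimal sketch, assuming the expected SHA256 value is copied by hand from the checksum shown next to the file on the download page; the function name verify_sha256 is hypothetical and not part of Prometheus.

```shell
#!/bin/bash
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA256 of FILE matches EXPECTED_HEX, 1 otherwise.
# (Hypothetical helper; EXPECTED_HEX is whatever the download page lists.)
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<hash>  <filename>"; keep only the hash
    actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

It would be called as verify_sha256 prometheus-3.4.1.linux-amd64.tar.gz followed by the checksum, and the tar step skipped if it reports a mismatch.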
Then copy the binaries to the relevant dirs and set permissions
sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
Prometheus Configuration
The application is now installed, so we can configure it to scrape all of the target VMs with a YAML file that we will create in the /etc/prometheus directory made earlier.
sudo nano /etc/prometheus/prometheus.yml
The config file will look something like
global:
  scrape_interval: 15s # How frequently to scrape targets
  evaluation_interval: 15s # How frequently to evaluate rules
scrape_configs:
  # Prometheus monitoring itself (optional, but good for health checks)
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Node Exporters for your infrastructure VMs
  - job_name: 'node_exporter_infra'
    static_configs:
      - targets: ['x.x.x.x:9100', 'x.x.x.x:9100', 'x.x.x.x:9100'] # pineapple (Prometheus) and granadilla (Grafana) and ctns1 (dnsmasq)
  # Node Exporters for your production VMs (Webservers, Reverse Proxy, MySQL server if not using mysqld_exporter)
  - job_name: 'node_exporter_production'
    static_configs:
      - targets:
          - 'x.x.x.x:9100' # raisin Reverse Proxy nginx
          - 'x.x.x.x:9100' # Strawberry (backupserver)
          - 'x.x.x.x:9100' # plum webserver (photo, wiki and www) apache2
          - 'x.x.x.x:9100' # satsuma (samba, photosort)
          - 'x.x.x.x:9100' # fig (nextcloud)
          - 'x.x.x.x:9100' # mandarin (Mysql)
          # Add other production VM IPs here as needed
  # Node Exporters for your VPN servers
  - job_name: 'node_exporter_vpn'
    static_configs:
      - targets:
          - 'x.x.x.x:9100' # Vanilla Wireguard VPN Server
          - 'x.x.x.x:9100' # voavanga OpenVPN VPN server
          # Add other VPN server IPs here as needed
  # Node Exporters for your terminal VMs
  - job_name: 'node_exporter_terminals'
    static_configs:
      - targets:
          - 'x.x.x.x:9182' # Wahoo Win 11 desktop
          - 'x.x.x.x:9182' # Walnut Win 11 desktop (with jellyfin)
          - 'x.x.x.x:9100' # Lychee linux desktop
          # Add other terminal VM IPs here as needed
  # Node Exporters for your mgt network VMs (if any you want to monitor)
  - job_name: 'node_exporter_mgt'
    static_configs:
      - targets:
          - 'x.x.x.x:9100' # Lemon
          # Add other mgt VM IPs here as needed
  # Job for Nginx Exporter on Raisin (192.168.100.9)
  - job_name: 'nginx_reverse_proxy_raisin'
    static_configs:
      - targets: ['x.x.x.x:9113'] # Default port for nginx-exporter
  # Job for MySQL Exporter on Mandarin (192.168.100.8)
  - job_name: 'mysql_server_mandarin'
    static_configs:
      - targets: ['x.x.x.x:9104'] # Default port for mysqld_exporter
  # job for Apache Exporter on webservers
  - job_name: 'apache_webservers'
    static_configs:
      - targets:
          - 'x.x.x.x:9117' # plum webserver (photo, wiki and www) apache2
          - 'x.x.x.x:9117' # satsuma (samba, apache2, photosort)
          - 'x.x.x.x:9117' # fig (nextcloud)
  # Job for Proxmox Host
  - job_name: 'proxmox_host_pear'
    static_configs:
      - targets:
          - 'x.x.x.x:9100' # Replace with your Proxmox host's actual IP
At the end of the file there is a load of comments giving some guidance on how to write the config; it would be better to leave them in for future reference.
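Since YAML is sensitive to indentation, it can help to validate the file before starting the service. promtool, which was copied to /usr/local/bin earlier, has a "check config" subcommand for this. The wrapper below is only a sketch: the function name check_prom_config is hypothetical, and the default path /etc/prometheus/prometheus.yml is an assumption about where the config was saved.

```shell
#!/bin/bash
# check_prom_config [FILE]
# Runs "promtool check config" on FILE.
# The default path is an assumption; pass the real path if it differs.
# Returns 2 if the file or promtool is missing, otherwise promtool's status.
check_prom_config() {
    local cfg="${1:-/etc/prometheus/prometheus.yml}"
    if [ ! -r "$cfg" ]; then
        echo "config not readable: $cfg" >&2
        return 2
    fi
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 2
    fi
    promtool check config "$cfg"
}
```

A non-zero return means the config should be fixed before Prometheus is restarted.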
The Prometheus server application has a webserver component that can be viewed on port 9090, as shown in the scrape_configs: section above. As has been noted, there is a security implication to Prometheus in that it gives detailed information about the state of the whole network, so the Pfsense rule allowing access should be restricted to the MGT network. This makes no difference to Grafana on Granadilla because it is on the same network as Pineapple.
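For the server to start at boot and listen on port 9090 it also needs a systemd unit, in the same way as the exporters described later. The sketch below writes a minimal unit to a path of your choosing so it can be inspected before being copied to /etc/systemd/system/prometheus.service. The function name write_prometheus_unit is hypothetical, and the config path /etc/prometheus/prometheus.yml and data path /var/lib/prometheus are assumptions based on the directories created above.

```shell
#!/bin/bash
# write_prometheus_unit TARGET
# Writes a minimal systemd unit for the Prometheus server to TARGET.
# Assumed paths: config in /etc/prometheus/prometheus.yml,
# data in /var/lib/prometheus, binary in /usr/local/bin.
write_prometheus_unit() {
    local target="$1"
    # Quoted 'EOF' keeps the backslashes and dashes literal
    cat > "$target" <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.listen-address=0.0.0.0:9090

[Install]
WantedBy=multi-user.target
EOF
}
```

After copying the result into place, the usual sudo systemctl daemon-reload, enable, start and status sequence applies.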
Pfsense Rules
Before we can see any data from Prometheus we need to add the exporter agent to each machine, and we also need to add a rule to Pfsense to allow Prometheus to access the host being monitored. Note that the rule has Pineapple (Prometheus) on the Infra network as the source and the host's network as the destination, because it is up to Prometheus to request the data, not the agent to send it. Assuming the above config we will need the following TCP rules
- On the Infra interface allow source Pineapple (any source port) to destination Production, MGT, VPNnet and Terminals port 9100. # This is the basic exporter
- On the Infra interface allow source Pineapple (any source port) to destination Production port 9113 # This is for the Nginx specific exporter
- On the Infra interface allow source Pineapple (any source port) to destination Production port 9117 # This is for the Apache specific exporter
- On the Infra interface allow source Pineapple (any source port) to destination Production port 9104 # This is for the MySQL specific exporter
- On the Infra interface allow source Pineapple (any source port) to destination Terminals port 9182 # This is for the Windows specific exporter
- On the Infra interface allow source Pineapple (any source port) to destination Pear port 9100 # This is specifically to allow Pineapple to access Pear and it will probably need to be on the WAN interface. Note that this rule is passing out of the network and onto the host Pear.
- On the MGT interface allow source MGT (any source port) to destination Pineapple port 9090 # This rule allows Lemon or any host on the MGT network to view the Prometheus webserver on Pineapple port 9090
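Once the rules are in place, each one can be probed from Pineapple without waiting for a scrape. The following is a rough sketch using bash's /dev/tcp pseudo-device; the function names and the short exporter names (node, mysqld, nginx, apache, windows) are hypothetical labels for the ports listed above.

```shell
#!/bin/bash
# exporter_port NAME
# Prints the TCP port used for that exporter in the rules above.
exporter_port() {
    case "$1" in
        node)    echo 9100 ;;
        mysqld)  echo 9104 ;;
        nginx)   echo 9113 ;;
        apache)  echo 9117 ;;
        windows) echo 9182 ;;
        *)       echo "unknown exporter: $1" >&2; return 1 ;;
    esac
}

# probe_exporter HOST NAME
# Attempts a TCP connection from this machine (run it on Pineapple)
# to HOST on the exporter's port, with a 3 second timeout.
probe_exporter() {
    local host="$1" port
    port="$(exporter_port "$2")" || return 1
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port blocked or not listening"
        return 1
    fi
}
```

For example, probe_exporter x.x.x.x nginx would test the 9113 rule towards Raisin. A failure here does not say whether the firewall or the agent is at fault, only that the connection did not open.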
Agent Installation
Once the rules are in place to allow Prometheus to pull the data from its agents, we can start adding the agents to the VMs. We will install the node_exporter on everything, as this is the basic CPU, RAM, network etc. agent; the only exception is the two Windows 11 hosts. The other agents are each geared to a particular application, so they are not required on every host.
Node Exporter
The basic agent to be installed on every Linux host. Start by adding a user to run the agent and a directory for it.
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo mkdir /etc/node_exporter
Then we need to locate the agent binaries for the most up-to-date version, so browse to https://prometheus.io/download/, look for the file named node_exporter-x.x.x.linux-amd64.tar.gz and copy the link address. Then we download it, uncompress it, move the binary to the correct directory and set appropriate permissions.
wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.9.1.linux-amd64.tar.gz
cd node_exporter-1.9.1.linux-amd64/
sudo mv node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
We will need it to be a service so we need a service file
sudo nano /etc/systemd/system/node_exporter.service
and we need to add some boilerplate code to the service config
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address="0.0.0.0:9100"
[Install]
WantedBy=multi-user.target
When we have saved and exited from the config we will need to Reload Systemd
sudo systemctl daemon-reload
Next we should enable, start and check the service
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
To test that it is working we need to go back to pineapple and attempt to extract the data from the agent with the curl command
curl http://x.x.x.x:9100/metrics
We should see a load of data flow on to the screen. If nothing is displayed there is a problem with either the firewall or the agent. The easiest way to isolate the problem to one or the other is to go back to the client where the agent was installed and run
curl http://127.0.0.1:9100/metrics
As this is the local host we should see output from the agent. If we see output, we need to check the firewalls both on the local host and on Pfsense, because clearly the agent is doing its job but Pineapple can't read it. If there is no output we can be reasonably sure that the agent is not working.
As soon as curl on Pineapple can read the agent, Prometheus should be able to scrape it too, and its webserver will show the host as up. Lemon has a desktop and browser installed and the firewall rule allows 9090 from MGT, so from Lemon browse to http://pineapple:9090 and select Status -> Target Health; a listing of all of the endpoints will be displayed, showing the last scrape time and the current state. If the target is showing as unknown and the state is down, try waiting a few seconds, or however long the scrape_interval at the top of the Prometheus config file is set to. Note that the running configuration can be viewed from the same menu, Status -> Configuration.
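The two curl tests above amount to a small decision table, which can be written down as a sketch. The function name diagnose_scrape is hypothetical; its two arguments are the exit statuses of curl run on Pineapple and on the monitored host respectively (for instance from curl -fsS -o /dev/null followed by echo $?).

```shell
#!/bin/bash
# diagnose_scrape REMOTE_RC LOCAL_RC
# REMOTE_RC: exit status of curl run on Pineapple against the agent
# LOCAL_RC:  exit status of curl run on the monitored host against 127.0.0.1
# Prints "ok", "firewall" or "agent" following the reasoning above.
diagnose_scrape() {
    local remote_rc="$1" local_rc="$2"
    if [ "$remote_rc" -eq 0 ]; then
        echo "ok"          # Pineapple can read the agent
    elif [ "$local_rc" -eq 0 ]; then
        echo "firewall"    # agent answers locally, so something blocks Pineapple
    else
        echo "agent"       # agent does not answer even on its own host
    fi
}
```

The same reasoning applies to every exporter in the sections that follow; only the port number changes.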
Mysql Exporter on Mandarin
Mandarin is the MySQL server, so as well as the basic node exporter we will install a MySQL exporter to give metrics specific to MySQL in addition to the normal CPU, RAM, network and similar metrics. To scrape these details we will need to set up another system user, install the agent and set up a MySQL user. First add the user for the agent and create the directory for the agent:
sudo useradd --no-create-home --shell /bin/false mysqld_exporter
sudo mkdir /etc/mysqld_exporter
Next we need to locate and download the binaries, so browse to https://prometheus.io/download/, scroll down to the mysqld_exporter section and copy the link address of mysqld_exporter-x.x.x.linux-amd64.tar.gz, then use wget to download it
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.17.2/mysqld_exporter-0.17.2.linux-amd64.tar.gz
We extract it with
tar -xvf mysqld_exporter-0.17.2.linux-amd64.tar.gz
Then we move the binary to /usr/local/bin and set permissions
cd mysqld_exporter-0.17.2.linux-amd64/
sudo mv mysqld_exporter /usr/local/bin/
sudo chown mysqld_exporter:mysqld_exporter /usr/local/bin/mysqld_exporter
The next step is to create a MySQL user that has access to the metrics, so we will need a new password generated and added to Keepass
sudo mysql -u root -p
When logged in to MySQL we create the user. Note that the host part must be the same in the CREATE USER and GRANT statements, otherwise the privileges go to a different account.
CREATE USER 'mysqld_exporter'@'x.x.x.x' IDENTIFIED BY 'your_secure_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'x.x.x.x';
FLUSH PRIVILEGES;
EXIT;
We now need an exporter configuration
sudo nano /etc/mysqld_exporter/.my.cnf
With the following text
[client]
# Crucial: MySQL binds to this IP
host=x.x.x.x
user=mysqld_exporter
password=YOUR_SECURE_PASSWORD
save and exit
- It should be noted that this user gave a lot of trouble when logging in to MySQL, not because of the password but because of the manner in which mysqld_exporter connects to the database. To make it more confusing, curl does its login differently to the mysqld_exporter service. It would appear that curl logs in to MySQL using localhost as the source, but mysqld_exporter does so by the IP address. This should not normally matter, but MySQL authentication treats mysqld_exporter@x.x.x.x as a different account to mysqld_exporter@localhost. In the MySQL config at /etc/mysql/mysql.conf.d/mysqld.cnf the directive bind-address was set to the IP address of Mandarin (bind-address = x.x.x.x), so when the user mysqld_exporter was defined @localhost, which resolves to 127.0.0.1, MySQL would not accept it because it was at the wrong host IP. It would be possible to set the bind-address to localhost or 127.0.0.1, but that could break some other login, so we will just remember to check the bind-address variable in /etc/mysql/mysql.conf.d/mysqld.cnf if we have to create another user @localhost.
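Because the host part of the account matters so much here, one way to avoid a mismatch is to generate the statements from a single value. This is only a sketch: the function name exporter_user_sql is hypothetical, the password stays a placeholder to be replaced by the one stored in Keepass, and the final SELECT is simply a way to see which user/host pairs actually exist.

```shell
#!/bin/bash
# exporter_user_sql HOST
# Prints SQL that creates the exporter account and grants its privileges
# to the same 'user'@'HOST' pair, then lists the matching accounts.
exporter_user_sql() {
    local host="$1"
    cat <<EOF
CREATE USER 'mysqld_exporter'@'${host}' IDENTIFIED BY 'your_secure_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'${host}';
SELECT user, host FROM mysql.user WHERE user = 'mysqld_exporter';
EOF
}
```

The output can be reviewed on screen and then pasted into the MySQL prompt, with HOST set to the same address that bind-address uses.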
Now that we have the user set we need to set Permissions to be quite restrictive
sudo chown mysqld_exporter:mysqld_exporter /etc/mysqld_exporter/.my.cnf
sudo chmod 600 /etc/mysqld_exporter/.my.cnf
We create a service config with
sudo nano /etc/systemd/system/mysqld_exporter.service
With the following configuration
[Unit]
Description=Prometheus MySQL Exporter
Wants=network-online.target
After=network-online.target mysql.service
[Service]
User=mysqld_exporter
Group=mysqld_exporter
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf=/etc/mysqld_exporter/.my.cnf \
--web.listen-address=0.0.0.0:9104
[Install]
WantedBy=multi-user.target
After we save and exit we reload the systemd and start the service
sudo systemctl daemon-reload
sudo systemctl start mysqld_exporter
sudo systemctl enable mysqld_exporter
sudo systemctl status mysqld_exporter
Assuming the status looks good we need to check that Pineapple can read the data so login to Pineapple and do the curl thing
curl http://x.x.x.x:9104/metrics
There should be a bucket load of metrics returned by curl if everything is working. If there is no data, go back to Mandarin and run the same curl there; if a load of metrics now comes out, the problem is that the firewall rule is not allowing Pineapple to access Mandarin on port 9104. If there is no output on Mandarin either, there is a problem with the mysqld_exporter service or the MySQL login.
Assuming any problems are resolved, as a final check log in to Lemon, browse to http://pineapple:9090, select Status -> Target Health and check that the endpoint in the section mysql_server_mandarin is showing as up. If not, check that the configuration in Status -> Configuration has the correct details for Mandarin.
Nginx Exporter
In the same way that MySQL has certain metrics that are exclusive to MySQL, Nginx has its own set of metrics, and therefore an exporter was created especially for Nginx.
Raisin is the only host with nginx installed, and as it is only a reverse proxy it does not serve any websites directly; instead it forwards all requests to the relevant webserver. What we will do in this case is create a stub server that listens only on 127.0.0.1:8080, so that it cannot be accessed by any host but itself. To do that we need to create the stub with
sudo nano /etc/nginx/conf.d/nginx_stub_status.conf
and add the following configuration
server {
    listen 127.0.0.1:8080; # Internal port for status page
    server_name localhost;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
After save and exit the config must be tested with
sudo nginx -t
If that looks good, reload Nginx with
sudo systemctl reload nginx
sudo systemctl status nginx
Although the -t test and the status would have shown errors if the config was bad, we can still verify locally with
curl http://127.0.0.1:8080/nginx_status
We should see something like
server accepts handled requests
 1001 1001 940
Reading: 0 Writing: 1 Waiting: 0
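The stub output is what the exporter later turns into metrics, and it is simple enough to read by hand. As an illustration, the sketch below pulls the counters out with awk; the function name parse_stub_status is hypothetical, and it assumes the layout shown above, where the three numbers follow the "server accepts handled requests" header line.

```shell
#!/bin/bash
# parse_stub_status
# Reads nginx stub_status text on stdin and prints "name value" pairs.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active " $3 }
        /^server accepts handled requests/ {
            getline                     # the numbers are on the next line
            print "accepts " $1
            print "handled " $2
            print "requests " $3
        }
        /^Reading:/ {
            print "reading " $2
            print "writing " $4
            print "waiting " $6
        }
    '
}
```

On Raisin it could be fed with curl -s http://127.0.0.1:8080/nginx_status piped into parse_stub_status.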
With the stub webserver setup we can create the service user and directory with
sudo useradd --no-create-home --shell /bin/false nginx_exporter
sudo mkdir /etc/nginx_exporter
Now it is time to locate the binaries at https://github.com/nginx/nginx-prometheus-exporter/releases. As before we will need the link to the latest exporter (the icon marked "latest" is a link), and we use the link in the wget
wget https://github.com/nginx/nginx-prometheus-exporter/releases/download/v1.4.2/nginx-prometheus-exporter_1.4.2_linux_amd64.tar.gz
Uncompress with
tar -xvf nginx-prometheus-exporter_1.4.2_linux_amd64.tar.gz
and copy the binary with the correct permissions with
sudo mv nginx-prometheus-exporter /usr/local/bin/
sudo chown nginx_exporter:nginx_exporter /usr/local/bin/nginx-prometheus-exporter
Also as before we create a service file with
sudo nano /etc/systemd/system/nginx_exporter.service
and populate it with
[Unit]
Description=Prometheus Nginx Exporter
Wants=network-online.target
After=network-online.target nginx.service
[Service]
User=nginx_exporter
Group=nginx_exporter
Type=simple
ExecStart=/usr/local/bin/nginx-prometheus-exporter \
--web.listen-address=0.0.0.0:9113 \
--nginx.scrape-uri="http://127.0.0.1:8080/nginx_status"
[Install]
WantedBy=multi-user.target
Save & exit. Reload systemd, start the service and enable the service with
sudo systemctl daemon-reload
sudo systemctl start nginx_exporter
sudo systemctl enable nginx_exporter
sudo systemctl status nginx_exporter
Assuming the status looks ok login to Pineapple and do the curl with Raisin's IP address and Nginx port number 9113
curl http://x.x.x.x:9113/metrics
As noted previously, if Pineapple cannot read the exporter's results, check whether the metrics are returned locally to see if it is the service or the firewall that is stopping it from working. Once the service is being read by curl on Pineapple, it should also be checked in Lemon's web browser at http://pineapple:9090 Status -> Target Health; the section nginx_reverse_proxy_raisin should show the endpoint as up.
Apache Exporter
Although they are both webservers, Apache has a different exporter to Nginx. Apache has a module that reports status and unsurprisingly it is called mod_status; we can make sure that it is enabled with:
sudo a2enmod status
As we did with Nginx we can create a stub with
sudo nano /etc/apache2/sites-available/apache-status.conf
and add the following config
# Add to /etc/apache2/ports.conf
# (Apache does not accept a comment on the same line as a directive)
Listen 127.0.0.1:8081
<VirtualHost 127.0.0.1:8081>
    ServerName localhost
    DocumentRoot /var/www/html
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</VirtualHost>
Save and exit. Note that DocumentRoot must point at a valid directory or Apache will not start; nothing will ever be served from it, but it still must be valid and accessible by Apache. We enable the site with
sudo a2ensite apache-status.conf
then test and reload apache with
sudo apache2ctl configtest
sudo systemctl reload apache2
sudo systemctl status apache2
Again, as with Nginx, any errors should have shown by now, but it is still best to check with
curl http://127.0.0.1:8081/server-status?auto
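The ?auto form of the status page returns plain "Key: value" lines (BusyWorkers and IdleWorkers, for example), which makes it easy to spot-check a single value before installing the exporter. The helper below is a small sketch with a hypothetical name, apache_status_field; it reads the page on stdin and prints the value for one key.

```shell
#!/bin/bash
# apache_status_field KEY
# Reads "Key: value" lines from mod_status (?auto) on stdin and prints
# the value for KEY. Returns 1 if the key is not present.
apache_status_field() {
    local key="$1"
    awk -F': ' -v k="$key" '
        $1 == k { print $2; found = 1; exit }
        END     { exit found ? 0 : 1 }
    '
}
```

When feeding it from curl, put the URL in single quotes so the shell does not treat the ? as a wildcard, for example curl -s 'http://127.0.0.1:8081/server-status?auto' piped into apache_status_field BusyWorkers.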
Assuming the stub is working as expected we can create the system user
sudo useradd --no-create-home --shell /bin/false apache_exporter
sudo mkdir /etc/apache_exporter
As may be expected by now, we need to locate the binaries for the Apache exporter: browse to https://github.com/Lusitaniae/apache_exporter/releases, click the "latest" button and copy the link to the file apache_exporter-X.Y.Z.linux-amd64.tar.gz, then
wget https://github.com/Lusitaniae/apache_exporter/releases/download/v1.0.10/apache_exporter-1.0.10.linux-amd64.tar.gz
tar -xvf apache_exporter-1.0.10.linux-amd64.tar.gz
cd apache_exporter-1.0.10.linux-amd64/
sudo mv apache_exporter /usr/local/bin/
sudo chown apache_exporter:apache_exporter /usr/local/bin/apache_exporter
Now create the service config file
sudo nano /etc/systemd/system/apache_exporter.service
and populate with
[Unit]
Description=Prometheus Apache Exporter
Wants=network-online.target
After=network-online.target apache2.service
[Service]
User=apache_exporter
Group=apache_exporter
Type=simple
ExecStart=/usr/local/bin/apache_exporter \
--web.listen-address=0.0.0.0:9117 \
--scrape_uri=http://127.0.0.1:8081/server-status?auto
[Install]
WantedBy=multi-user.target
Save & exit, then reload systemd, start and enable the service
sudo systemctl daemon-reload
sudo systemctl start apache_exporter
sudo systemctl enable apache_exporter
sudo systemctl status apache_exporter
Assuming the status looks good login to pineapple and do the curl test with
curl http://x.x.x.x:9117/metrics
If that does not produce a result, test on the local host and either adjust the firewall rule or fix the service. When curl is producing results, check that the server is showing as up on the website http://pineapple:9090, menu item Status -> Target Health, under the relevant endpoint in apache_webservers.
If the stub doesn't work we can delete the stub created above and create a different web stub but we must modify the ports config file at
sudo nano /etc/apache2/ports.conf
and add in a new directive that listens to a new port 8081 on 127.0.0.1 so the file should look something like
Listen 80
Listen 127.0.0.1:8081
<IfModule ssl_module>
    Listen 443
</IfModule>
<IfModule mod_gnutls.c>
    Listen 443
</IfModule>
After a save and close we can create a new config in sites-available
sudo nano /etc/apache2/sites-available/apache-status.conf
with the contents
<VirtualHost 127.0.0.1:8081>
    ServerName localhost
    DocumentRoot /var/www/html
    ErrorLog ${APACHE_LOG_DIR}/apache-status_error.log
    CustomLog ${APACHE_LOG_DIR}/apache-status_access.log combined
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</VirtualHost>
Save & exit. Note, as previously stated, the DocumentRoot must be a valid location or Apache will not start.
Enable the site (this links the file into sites-enabled) with
sudo a2ensite apache-status.conf
and test the config with
sudo apache2ctl configtest
before the
sudo systemctl reload apache2
sudo systemctl status apache2
If status is good try
curl http://127.0.0.1:8081/server-status?auto
If this works, carry on with the installation.
Windows 11 Exporter on Walnut and Wahoo
All of the previous exporters have been for Linux servers, but we also have two Windows 11 Pro hosts that we want to monitor. It is unusual to monitor Windows desktops, and the browsers in Windows do give a warning about the exporter being a rare download. It should also be noted that the exporter is a beta test version, although it looks like a release candidate is imminent.
We can download the beta release from https://github.com/prometheus-community/windows_exporter/releases. A folder needs to be created for the service to use, c:\program files\Windows_exporter. Then the downloaded file needs to be moved to the newly created folder and the binary extracted; if the download is not an archive, it is the binary itself and does not need to be extracted. From now on we need to use PowerShell, which can be started by right-clicking the Start menu and selecting Terminal (Admin); it must be run as administrator.
cd "c:\program files\windows_exporter"
If there is an old service still present (for example if this doesn't work first time) it can be removed with the command
Stop-Service -Name windows_exporter -ErrorAction SilentlyContinue; sc.exe delete windows_exporter
we install the service with the command
sc.exe create windows_exporter binPath="C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe --web.listen-address=0.0.0.0:9182 --log.level=info" DisplayName="Prometheus Windows Exporter" start=auto
To break down the command
- sc.exe : This is the Service Control command-line utility in Windows. It's used to communicate with the Service Control Manager (SCM) to create, delete, query, or configure Windows services.
- create : This is the subcommand for sc.exe that tells it to create a new service.
- windows_exporter : This is the ServiceName (or ServiceKeyName). This is the internal, unique name that Windows will use to identify this service. It's typically a short, descriptive name without spaces. You'd use this name in other sc.exe commands (e.g., sc.exe start windows_exporter).
- binPath="C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe --web.listen-address=0.0.0.0:9182 --log.level=info" : This is the Binary Path parameter of sc.exe. It specifies the full path to the executable file that the service will run, along with any command-line arguments that should be passed to that executable when the service starts.
- C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe : This is the absolute path to the windows_exporter executable. This is the actual program that will run as the service.
- --web.listen-address=0.0.0.0:9182 : This is a command-line argument passed to the windows_exporter.exe executable.
- --web.listen-address : A common flag used by Prometheus exporters to specify the network address and port on which they should listen for incoming scrape requests from Prometheus.
- 0.0.0.0:9182 : Means the windows_exporter will listen on all available network interfaces of the Windows machine on port 9182.
- --log.level=info : Another command-line argument passed to the windows_exporter.exe executable
This sc.exe command creates a new Windows service named windows_exporter (internally). This service will be displayed as "Prometheus Windows Exporter" in the Services Manager. When the Windows system boots, this service will automatically start. Upon starting, it will execute the windows_exporter-0.30.7-amd64.exe program, passing it arguments to listen for incoming Prometheus scrape requests on all network interfaces on port 9182, and to log informational messages.
Start the service with
Start-Service windows_exporter
Verify the service with
Get-Service windows_exporter
If this is not the first attempt, run
Stop-Service -Name windows_exporter -ErrorAction SilentlyContinue
sc.exe delete windows_exporter
It will complain that there are two commands, but it should remove any service that was unsuccessfully installed.
We can create a firewall rule for windows firewall with
New-NetFirewallRule -DisplayName "Prometheus Windows Exporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9182 -Profile Private,Public,Domain
It probably does not need to allow Private, Public and Domain; most likely Domain alone would work.
As with all of the Linux hosts we test with curl from Pineapple on port 9182
curl http://x.x.x.x:9182/metrics
If successful, check on http://pineapple:9090 Status -> Target Health in the node_exporter_terminals section.
smartctl_exporter_script.sh (Proxmox Disk SMART Metrics)
Detailed Installation Notes: smartctl_exporter_script.sh (Proxmox Disk SMART Metrics)
- Purpose: To collect detailed S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) data from the physical hard drives and SSDs in the Proxmox host (pear) and expose them as Prometheus metrics via the node_exporter's textfile collector. This allows monitoring of drive health (temperature, read/write errors, power-on hours, etc.).
- Metrics Collected: smartctl_health_status, smartctl_temperature_celsius, smartctl_power_on_hours_total, and potentially others (lbas_read_total, lbas_written_total, reallocated_sectors_total, NVMe-specific stats) if available from the drives.
- Default Port: 9100/TCP (metrics are collected by node_exporter which listens on this port).
- Method: A custom Bash script that runs smartctl, parses its JSON output with jq, formats it into Prometheus textfile format, and writes it to a file (smart.prom) that node_exporter is configured to read. The script is scheduled via cron.
- Install Prerequisites: The script relies on smartctl (from smartmontools) to query drive S.M.A.R.T. data and jq to parse the JSON output from smartctl.
apt update && apt install -y jq smartmontools
- Enable node_exporter Collectors for S.M.A.R.T. and ZFS: The node_exporter needs to be told to activate its zfs collector (for ZFS pool statistics) and its textfile collector (which will read the smart.prom file generated by our script). Edit the node_exporter Systemd service file :
nano /etc/systemd/system/node_exporter.service
Locate the ExecStart line and ensure it includes these flags:
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address="0.0.0.0:9100" \
--collector.zfs \
--collector.textfile.directory=/var/lib/node_exporter/textfile_collector
[Install]
WantedBy=multi-user.target
save & exit. Restart node_exporter service: This applies the new collector settings.
systemctl restart node_exporter
- Create and set permissions for collector directories: The node_exporter user needs write access to the textfile_collector directory for the script to deposit its metrics, and read/write access to the log directory.
mkdir -p /var/lib/node_exporter/textfile_collector
chown node_exporter:node_exporter /var/lib/node_exporter/textfile_collector
mkdir -p /var/log/node_exporter/
chown node_exporter:node_exporter /var/log/node_exporter/
Add node_exporter user to the disk group: This is crucial for node_exporter (and thus the script running as this user) to have the necessary permissions to read raw disk data via smartctl.
usermod -a -G disk node_exporter
Important: For this group change to fully take effect, the node_exporter service (which runs as this user) must be restarted:
systemctl restart node_exporter
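Before moving on, it can be worth confirming that the group change is visible. The following is a minimal sketch, assuming the node_exporter user already exists on the host; has_group is a hypothetical helper name, not part of any package.

```shell
# Hypothetical helper: succeeds if group $1 appears in the
# space-separated group list $2 (as printed by `id -nG USER`).
has_group() {
  for g in $2; do
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

# Only query the real account if it exists on this machine.
if id node_exporter >/dev/null 2>&1; then
  if has_group disk "$(id -nG node_exporter)"; then
    echo "node_exporter is a member of the disk group"
  else
    echo "node_exporter is NOT in the disk group yet"
  fi
fi
```

The helper matches whole group names only, so a group such as diskless would not be mistaken for disk.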
- Create the S.M.A.R.T. Exporter Script (/usr/local/bin/smartctl_exporter_script.sh): This Bash script contains the logic to query smartctl for each drive, parse the output, and format it. Create the file:
nano /usr/local/bin/smartctl_exporter_script.sh
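The file should look like this (lines ending in > appear to have been cut off at the terminal width when the listing was copied, so the full lines must be taken from the script on the host):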
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status
# --- Configuration ---
SMARTCTL_BIN="/usr/sbin/smartctl"
OUTPUT_DIR="/var/lib/node_exporter/textfile_collector"
OUTPUT_FILE="$OUTPUT_DIR/smart.prom"
TEMP_FILE="$OUTPUT_FILE.tmp"
LOG_FILE="/var/log/node_exporter/smartctl_exporter.log" # Log file for the script's o>
# --- Helper Function for Logging ---
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# --- Setup Directories (Ensure permissions are correct) ---
mkdir -p "$OUTPUT_DIR"
chown node_exporter:node_exporter "$OUTPUT_DIR"
mkdir -p "$(dirname "$LOG_FILE")"
chown node_exporter:node_exporter "$(dirname "$LOG_FILE")"
# --- Ensure necessary tools are available ---
if ! command -v smartctl &> /dev/null; then
log_message "ERROR: smartctl not found. Please install smartmontools."
exit 1
fi
if ! command -v jq &> /dev/null; then
log_message "ERROR: jq not found. Please install jq."
exit 1
fi
# --- Start Metric Collection ---
log_message "Starting SMART metric collection."
: > "$TEMP_FILE" # Truncate the temp file so that it starts out empty, with no old metrics
# --- Define Specific Disks to Monitor ---
TARGET_DISKS=(
"/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4TL"
"/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4TM"
"/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4VE"
"/dev/disk/by-id/ata-Lexar_SSD_NQ100_960GB_QCT180R0006120S334"
"/dev/disk/by-id/nvme-CT4000P3PSSD8_2339E879DC47"
"/dev/disk/by-id/nvme-CT4000P3SSD8_2332E8684258"
)
for DISK_ID_PATH in "${TARGET_DISKS[@]}"; do
# Determine device type for smartctl
DEVICE_TYPE=""
if [[ "$DISK_ID_PATH" == *"/dev/disk/by-id/nvme-"* ]]; then
DEVICE_TYPE="nvme"
elif [[ "$DISK_ID_PATH" == *"/dev/disk/by-id/ata-"* ]]; then
DEVICE_TYPE="ata"
else
log_message "WARN: Unknown device type for $DISK_ID_PATH. Skipping."
continue
fi
DEVICE_BASENAME=$(basename "$(readlink -f "$DISK_ID_PATH")")
log_message "Processing disk: $DISK_ID_PATH (type: $DEVICE_TYPE, basename: $DEVICE_>
# Run smartctl and get JSON output
SMART_DATA=$(smartctl -a -j -d "$DEVICE_TYPE" -T permissive "$DISK_ID_PATH" 2>/dev/>
SMARTCTL_EXIT_CODE=$?
if [ $SMARTCTL_EXIT_CODE -ne 0 ] || [ -z "$SMART_DATA" ]; then
log_message "ERROR: smartctl failed for $DISK_ID_PATH (exit code: $SMARTCTL_EXIT_>
echo "# HELP smartctl_exporter_error_running_smartctl Could not run smartctl or p>
echo "# TYPE smartctl_exporter_error_running_smartctl gauge" >> "$TEMP_FILE"
echo "smartctl_exporter_error_running_smartctl{device=\"$DEVICE_BASENAME\",id=\"$>
continue
fi
# Extract common SMART attributes regardless of type
HEALTH_STATUS=$(echo "$SMART_DATA" | jq -r '.smart_status.passed // "null"')
DISK_SERIAL=$(echo "$SMART_DATA" | jq -r '.serial_number // "unknown"')
DISK_MODEL=$(echo "$SMART_DATA" | jq -r '.model_name // "unknown"' | sed 's/ /_/g')
DISK_VENDOR=$(echo "$SMART_DATA" | jq -r '(.vendor // .ata_identify_device.vendor_i>
# Basic overall health (1=passed, 0=failed)
if [ "$HEALTH_STATUS" == "true" ]; then
echo "smartctl_health_status{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\",>
else
echo "smartctl_health_status{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\",>
fi
# --- Conditional JQ parsing based on DEVICE_TYPE ---
# Initialize all values to null for safety before parsing
TEMP_CELSIUS="null"
POWER_ON_HOURS="null"
TOTAL_LBAS_READ="null"
TOTAL_LBAS_WRITTEN="null"
REALLOC_SECTORS="null"
PERCENT_USED="null"
AVAILABLE_SPARE="null"
if [ "$DEVICE_TYPE" == "nvme" ]; then
TEMP_CELSIUS=$(echo "$SMART_DATA" | jq -r '.nvme_smart_health_information_log.t>
POWER_ON_HOURS=$(echo "$SMART_DATA" | jq -r '.nvme_smart_health_information_log>
TOTAL_LBAS_READ=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_l>
TOTAL_LBAS_WRITTEN=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_informatio>
PERCENT_USED=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_log.>
AVAILABLE_SPARE=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_l>
elif [ "$DEVICE_TYPE" == "ata" ]; then
TEMP_CELSIUS=$(echo "$SMART_DATA" | jq -r '(.temperature.current // (.ata_smart>
POWER_ON_HOURS=$(echo "$SMART_DATA" | jq -r '(.power_on_time.hours // (.ata_sma>
TOTAL_LBAS_READ=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[] | >
TOTAL_LBAS_WRITTEN=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[]>
REALLOC_SECTORS=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[] | >
fi
# --- Output extracted metrics (common to write after parsing) ---
if [ "$TEMP_CELSIUS" != "null" ]; then
echo "smartctl_temperature_celsius{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SER>
fi
if [ "$POWER_ON_HOURS" != "null" ]; then
echo "smartctl_power_on_hours_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SE>
fi
if [ "$TOTAL_LBAS_READ" != "null" ]; then
echo "smartctl_lbas_read_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\>
fi
if [ "$TOTAL_LBAS_WRITTEN" != "null" ]; then
echo "smartctl_lbas_written_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERI>
fi
if [ "$REALLOC_SECTORS" != "null" ]; then
echo "smartctl_reallocated_sectors_total{device=\"$DEVICE_BASENAME\",serial=\"$DI>
fi
if [ "$PERCENT_USED" != "null" ]; then
echo "smartctl_nvme_percentage_used{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SE>
fi
if [ "$AVAILABLE_SPARE" != "null" ]; then
echo "smartctl_nvme_available_spare_percent{device=\"$DEVICE_BASENAME\",serial=\">
fi
done
# --- Finalize Output ---
mv "$TEMP_FILE" "$OUTPUT_FILE" || log_message "ERROR: Failed to move temp file to $OU>
log_message "SMART metric collection completed."
exit 0
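As far as we can tell, the Prometheus text format expects at most one HELP and one TYPE line per metric name, so a header written inside a per-disk loop can end up repeated when several disks take the same branch. The sketch below is not the script above. It only illustrates one way to write the header once per metric and one sample line per disk. The helper names (escape_label, emit_header, emit_sample), the help text and the serial numbers are all hypothetical.

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical and not part of the script above.

# Escape backslashes and double quotes in a label value.
# (Newlines are not handled here; SMART strings are assumed to be single-line.)
escape_label() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the HELP and TYPE header for one metric. Call once per metric name.
# Arguments: metric name, help text, metric type.
emit_header() {
  printf '# HELP %s %s\n# TYPE %s %s\n' "$1" "$2" "$1" "$3"
}

# Print one sample line. Arguments: metric name, device, serial, value.
emit_sample() {
  printf '%s{device="%s",serial="%s"} %s\n' \
    "$1" "$(escape_label "$2")" "$(escape_label "$3")" "$4"
}

# Demo with made-up values: one header, then one sample per disk.
emit_header smartctl_temperature_celsius "Drive temperature in Celsius" gauge
emit_sample smartctl_temperature_celsius sda "EXAMPLE-SERIAL-1" 34
emit_sample smartctl_temperature_celsius nvme0n1 "EXAMPLE-SERIAL-2" 41
```

Redirecting these calls into the temporary file and finishing with the existing mv would keep the replacement of smart.prom atomic.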
Make the Script Executable:
chmod +x /usr/local/bin/smartctl_exporter_script.sh
Schedule the script with cron to run every 12 hours. SMART status is unlikely to change quickly, so there is little point in sampling it second by second; twice per day is more than adequate. To open the crontab for the node_exporter user:
crontab -e -u node_exporter
and add
0 */12 * * * /usr/local/bin/smartctl_exporter_script.sh > /dev/null 2>&1
The crontab entry has six fields: Minute (0-59), Hour (0-23, where 0 is midnight), Day of Month (1-31), Month (1-12 or Jan-Dec), Day of Week (0-7, where 0 and 7 are both Sunday) and finally the command to execute. The entry above breaks down as follows:
- 0 (First field: Minute): "at the 0th minute of the hour" (i.e., on the hour).
- */12 (Second field: Hour):
- The * means "every possible value" for that field.
- The / indicates a step value.
- The 12 after the / is the step, so */12 means every 12th hour counting from 0, i.e. 00:00 and 12:00.
- The * in the third field (day of the Month) means every day
- The next * in the fourth field (Month) means every month
- The * in the fifth field (Day of the Week) means every day of the week
- The sixth and final field is the path to the executable file to be run.
- It is always the best practice to have the path as an absolute value and not as a relative path like ~/some_script.sh.
- It is better to have the script handle its own logging.
- Output from cron jobs (to stdout or stderr) is typically emailed to the user who owns the crontab entry. If we don't want emails, we can suppress all output by appending > /dev/null 2>&1 to the command, which redirects everything to the null device. If we wanted the output to go to a log file instead, we would append >> /var/log/smartctl_exporter.log 2>&1
- Ensure the script (smartctl_exporter_script.sh) has execute permissions (chmod +x /usr/local/bin/smartctl_exporter_script.sh).
- Cron jobs run in a very minimal environment. If your script relies on specific environment variables (e.g., HOME, LANG, PATH), you might need to set them explicitly within the script itself or at the top of your crontab file. This matters here because smartctl lives in /usr/sbin, which cron's default PATH may not include.
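The step syntax described above can be checked with a small sketch. It assumes the usual cron behaviour of starting at the lowest value of the field and adding the step until the range is exhausted; expand_step is a hypothetical helper written for this page, not a cron tool.

```shell
# Hypothetical helper: list the values that a cron step expression "*/STEP"
# matches within the inclusive range MIN..MAX.
# Arguments: STEP MIN MAX
expand_step() {
  step="$1"
  v="$2"
  max="$3"
  out=""
  while [ "$v" -le "$max" ]; do
    out="${out:+$out }$v"   # append with a separating space after the first value
    v=$((v + step))
  done
  printf '%s\n' "$out"
}

# Hours matched by */12 in the Hour field (range 0-23)
expand_step 12 0 23
```

With the Minute field fixed at 0, the schedule 0 */12 * * * therefore fires at 00:00 and 12:00 each day.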
Post-Installation Verification Steps
- Manual Test (once script is created):
- Test as node_exporter user:
su -c "bash /usr/local/bin/smartctl_exporter_script.sh" node_exporter
- Check log file (should show successful execution, no errors):
cat /var/log/node_exporter/smartctl_exporter.log
- Check metrics file (should show formatted SMART metrics for all drives):
cat /var/lib/node_exporter/textfile_collector/smart.prom
- Prometheus Checks
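- After the script has run and node_exporter has been restarted, query the node_exporter metrics endpoint on Pear directly for node_zfs_ and smartctl_ metrics, then confirm that the same metrics appear in the Prometheus UI.
curl -s http://x.x.x.x:9100/metrics | grep -E 'zfs|smartctl'
- Confirm that the proxmox_host_pear target is UP in the Prometheus UI at http://pineapple:9090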
- Add Grafana dashboards: import Node - ZFS Stats (ID 7968) and S.M.A.R.T. (ID 22604), with x.x.x.x:9100 (Pear's IP) as the instance.
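The manual check of smart.prom can be backed up by a rough lint. The sketch below counts lines that are neither blank, nor comments, nor of the form name{labels} value. count_bad_lines is a hypothetical helper, and the pattern is deliberately simple: it assumes label values never contain a } character and it ignores optional timestamps.

```shell
# Hypothetical helper: print how many lines of a textfile-collector file
# do not look like a blank line, a comment, or a "name{labels} value" sample.
# Argument: path to the .prom file.
count_bad_lines() {
  grep -Ev '^(#.*|[[:space:]]*|[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})?[[:space:]]+[^[:space:]]+)$' "$1" \
    | wc -l | tr -d ' '
}

# Only inspect the real file if it exists and is readable on this machine.
PROM_FILE="/var/lib/node_exporter/textfile_collector/smart.prom"
if [ -r "$PROM_FILE" ]; then
  echo "Suspicious lines in $PROM_FILE: $(count_bad_lines "$PROM_FILE")"
fi
```

A result of 0 only means the layout looks plausible; node_exporter's own log remains the authority on whether the file was parsed.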
This was a major installation that took a lot of time, both to install and to troubleshoot all of the agents, but it is done now. The next steps would be either to create rules & alerts in the Prometheus Web GUI or to install Grafana on Granadilla. There is also a dedicated Proxmox exporter that will be worth looking at.