Pineapple

From Sea of Fate

Introduction

Pineapple, at x.x.x.130 on the Infra network, hosts the Prometheus application, which gathers metrics from each VM and from Pear using agents installed on each host. The partner application, Grafana, hosted on Granadilla, is used to view the data collected by Prometheus. An overview of the facilities offered by a Prometheus & Grafana partnership can be found here.

Security concerns

The purpose of Prometheus is to gather data concerning all of the hosts on the network, which makes it a good source of information for any hostile actor. Keeping it inside Infra and not publishing its webserver to the Internet are obvious security measures. Making specific aliases & rules on Pfsense for it to access its agents is also required (aliases for these obscure ports make the rules a lot more readable and easier to keep secure).

Prometheus Installation

The setup of Prometheus will have several separate parts.

  • Server software installation
  • Server configuration
  • Firewall rules setup
  • Agent installation

Prometheus Setup

The first step was to create a VM in the Infra network and give it the hostname Pineapple and a matching IP/gateway (x.x.x.130/24). To set the hostname & IP address just use the script, but we must remember to edit the gateway address in /etc/netplan

sudo nano /etc/netplan/some_config_file.yaml
sudo netplan apply

We need to make sure that the host is also listed in DNS by logging on to ctns1 and using add_combined_hostadd.sh. Then we do the ubiquitous

sudo apt update && sudo apt upgrade -y

We will need wget and tar if they are not already installed

sudo apt install -y wget tar

Next we have to make a user "prometheus" for the application to run as

sudo useradd --no-create-home --shell /bin/false prometheus

and make some dirs with the user as owner

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

To download the Prometheus application we use wget, but first we have to locate the up to date file, so browse to https://prometheus.io/download/, find the file prometheus-x.x.x.linux-amd64.tar.gz and copy the link address. Once we have the address we can wget it and extract it with the following command examples

wget https://github.com/prometheus/prometheus/releases/download/v3.4.1/prometheus-3.4.1.linux-amd64.tar.gz
tar -xvf prometheus-3.4.1.linux-amd64.tar.gz
cd prometheus-3.4.1.linux-amd64

Then copy the binaries to the relevant dirs and set permissions

sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool

Prometheus Configuration

The application is now installed, so we can configure it to scrape all of the target VMs with a YAML file that we will create.

sudo nano /etc/prometheus/prometheus.yml

The config file will look something like

 global:
  scrape_interval: 15s # How frequently to scrape targets
  evaluation_interval: 15s # How frequently to evaluate rules

 scrape_configs:
   # Prometheus monitoring itself (optional, but good for health checks)
   - job_name: 'prometheus'
     static_configs:
       - targets: ['localhost:9090']

   # Node Exporters for your infrastructure VMs
   - job_name: 'node_exporter_infra'
     static_configs:
       - targets: ['x.x.x.x:9100', 'x.x.x.x:9100', 'x.x.x.x:9100'] # pineapple (Prometheus) and granadilla (Grafana) and ctns1 (dnsmasq)

   # Node Exporters for your production VMs (Webservers, Reverse Proxy, MySQL server if not using mysqld_exporter)
   - job_name: 'node_exporter_production'
     static_configs:
       - targets:
           - 'x.x.x.x:9100' # raisin Reverse Proxy nginx
           - 'x.x.x.x:9100' # Strawberry (backupserver)
           - 'x.x.x.x:9100' # plum  webserver (photo, wiki and www) apache2
           - 'x.x.x.x:9100' # satsuma (samba, photosort)
           - 'x.x.x.x:9100' # fig (nextcloud)
           - 'x.x.x.x:9100' # mandarin (Mysql)
           # Add other production VM IPs here as needed

   # Node Exporters for your VPN servers
   - job_name: 'node_exporter_vpn'
     static_configs:
       - targets:
           - 'x.x.x.x:9100' # Vanilla Wireguard VPN Server 
           - 'x.x.x.x:9100' # voavanga OpenVPN VPN server
           # Add other VPN server IPs here as needed

   # Node Exporters for your terminal VMs
   - job_name: 'node_exporter_terminals'
     static_configs:
       - targets:
           - 'x.x.x.x:9182'  # Wahoo Win 11 desktop
           - 'x.x.x.x:9182'  # Walnut Win 11 desktop (with jellyfin) 
           - 'x.x.x.x:9100'  # Lychee linux desktop
           # Add other terminal VM IPs here as needed

   # Node Exporters for your mgt network VMs (if any you want to monitor)
   - job_name: 'node_exporter_mgt'
     static_configs:
       - targets:
           - 'x.x.x.x:9100' # Lemon
           # Add other mgt VM IPs here as needed

   # Job for Nginx Exporter on Raisin (192.168.100.9)
   - job_name: 'nginx_reverse_proxy_raisin'
     static_configs:
       - targets: ['x.x.x.x:9113'] # Default port for nginx-exporter  

   # Job for MySQL Exporter on Mandarin (192.168.100.8)
   - job_name: 'mysql_server_mandarin'
     static_configs:
       - targets: ['x.x.x.x:9104'] # Default port for mysqld_exporter

   # job for Apache Exporter on webservers
   - job_name: 'apache_webservers'
     static_configs:
       - targets:
           - 'x.x.x.x:9117' # plum  webserver (photo, wiki and www) apache2
           - 'x.x.x.x:9117' # satsuma (samba, apache2, photosort)
           - 'x.x.x.x:9117' # fig (nextcloud)

   # Job for Proxmox Host
   - job_name: 'proxmox_host_pear'
     static_configs:
       - targets:
           - 'x.x.x.x:9100' # Replace with your Proxmox host's actual IP

At the end of the file there is a load of comments giving some guidance on how to write the config; it is better to leave them in for future reference.

The Prometheus server application has a webserver component that can be viewed on port 9090, as shown in the scrape_configs: section above. As has been noted, there is a security implication to Prometheus in that it gives detailed information about the state of the whole network, so with that in mind the Pfsense rule allowing access should be restricted to the MGT network. It will not make any difference to Grafana on Granadilla because it is on the same network.
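The steps above install the binaries and the scrape configuration but do not show a service definition for Prometheus itself. The following is a minimal sketch of one, modelled on the exporter units used elsewhere on this page. It assumes the configuration is saved as /etc/prometheus/prometheus.yml and that the data directory is /var/lib/prometheus; the function name render_prometheus_unit is hypothetical and only prints the unit text.

```shell
# Print a minimal systemd unit for Prometheus (a sketch, not the page's own unit).
# Assumptions: config at /etc/prometheus/prometheus.yml, data in /var/lib/prometheus.
render_prometheus_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.listen-address=0.0.0.0:9090

[Install]
WantedBy=multi-user.target
EOF
}
```

The output could be written with render_prometheus_unit | sudo tee /etc/systemd/system/prometheus.service, followed by sudo systemctl daemon-reload and sudo systemctl enable --now prometheus. Running promtool check config against the YAML file first is a cheap way to catch indentation mistakes before the service is started.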

Pfsense Rules

Before we can see any data from Prometheus we will need to add the exporter agent to each machine, and we will also need to add rules to Pfsense to allow Prometheus to access the hosts being monitored. Note that each rule has Pineapple (Prometheus) on the Infra network as the source and the host's network as the destination, because it is up to Prometheus to request the data, not the agent to send it. Assuming the above config we will need the following TCP rules

  • On the Infra interface allow source Pineapple (any source port, because the connection leaves Pineapple from an ephemeral port) destination Production, MGT, VPNnet and Terminals port 9100. # This is the basic exporter
  • On the Infra interface allow source Pineapple (any port) destination Production port 9113 # This is for the Nginx specific exporter
  • On the Infra interface allow source Pineapple (any port) destination Production port 9117 # This is for the Apache specific exporter
  • On the Infra interface allow source Pineapple (any port) destination Production port 9104 # This is for the MySQL specific exporter
  • On the Infra interface allow source Pineapple (any port) destination Terminals port 9182 # This is for the Windows specific exporter
  • On the Infra interface allow source Pineapple (any port) destination pear port 9100 # This is specifically to allow Pineapple to access Pear and it will probably need to be on the WAN interface. Note that this rule is passing out of the network and onto the host Pear.
  • On the MGT interface allow source MGT (any port) destination Pineapple port 9090 # This rule is to allow Lemon or any host on the MGT network to view the Prometheus webserver on Pineapple port 9090
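Once the rules are saved and an agent is listening, each host:port pair can be checked from Pineapple before relying on a Prometheus scrape. The sketch below is one possible way to do that using bash's /dev/tcp feature; the function names port_open and check_targets are hypothetical, and the 3 second timeout is an arbitrary choice. A port also reports closed when no agent is installed yet, so this only distinguishes "reachable" from "not reachable".

```shell
# Return success if a TCP connection to host $1 port $2 succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print "host:port open" or "host:port closed" for every argument.
check_targets() {
  for target in "$@"; do
    host=${target%:*}   # text before the last colon
    port=${target##*:}  # text after the last colon
    if port_open "$host" "$port"; then
      echo "$target open"
    else
      echo "$target closed"
    fi
  done
}
```

For example, check_targets x.x.x.x:9100 x.x.x.x:9104 run on Pineapple (with the real addresses substituted) prints one line per target.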

Agent Installation

When the rules are made to allow Prometheus to pull the data from its agents we can start adding them to the VMs. We will install node_exporter on everything, as this is the basic CPU, RAM, network etc. agent; the only exceptions are the two Windows 11 hosts. The other agents are geared to a particular application so are not required on every host.

Node Exporter

The basic agent to be installed on every Linux host. Start by adding a user to run the agent and a directory for it.

sudo useradd --no-create-home --shell /bin/false node_exporter
sudo mkdir /etc/node_exporter

Then we need to locate the agent binaries for the most up to date version, so browse to the download page at https://prometheus.io/download/, look for the file node_exporter-x.x.x.linux-amd64.tar.gz and copy the link address. Then we download it, uncompress it, move it to the correct directory and set appropriate permissions.

wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.9.1.linux-amd64.tar.gz 
cd node_exporter-1.9.1.linux-amd64/
sudo mv node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
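The download links used on this page for node_exporter, mysqld_exporter and apache_exporter all follow the same naming pattern, so the URL can be derived from the GitHub organisation, the exporter name and the version. This is a small sketch of that pattern only; release_url is a hypothetical helper, and the nginx exporter does not follow this naming, so its link still has to be copied by hand.

```shell
# Build a GitHub release URL for exporters that follow the
# <name>-<version>.linux-amd64.tar.gz naming seen on this page.
# Arguments: organisation, exporter name, version (without the leading "v").
release_url() {
  org=$1
  name=$2
  version=$3
  printf 'https://github.com/%s/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$org" "$name" "$version" "$name" "$version"
}
```

Usage would be along the lines of wget "$(release_url prometheus node_exporter 1.9.1)". The version number still has to be looked up on the download page, which also lists a SHA256 checksum for each file that can be compared with the output of sha256sum.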

We will need it to be a service so we need a service file

sudo nano /etc/systemd/system/node_exporter.service

and we need to add some boilerplate code to the service config

 [Unit]
 Description=Prometheus Node Exporter
 Wants=network-online.target
 After=network-online.target

 [Service]
 User=node_exporter
 Group=node_exporter
 Type=simple
 ExecStart=/usr/local/bin/node_exporter \
     --web.listen-address="0.0.0.0:9100"

 [Install]
 WantedBy=multi-user.target

When we have saved and exited from the config we will need to reload systemd

sudo systemctl daemon-reload

Next we should enable, start and check the service

sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

To test that it is working we need to go back to pineapple and attempt to extract the data from the agent with the curl command

 curl http://x.x.x.x:9100/metrics

We should see a load of data flow onto the screen. If nothing is displayed there is a problem with either the firewall or the agent. The easiest way to isolate the problem to one or the other is to go back to the client where the agent was installed and run

 curl http://127.0.0.1:9100/metrics

As this is the local host we should see output from the agent. If we see output we need to check the firewall/s both on the local host and on Pfsense, because clearly the agent is doing its stuff but Pineapple can't read it. If there is no output we can be reasonably sure that the agent is not working.
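The reasoning above reduces to a small decision table on the two curl results. The sketch below only encodes that table; diagnose is a hypothetical helper that takes the exit status of the curl run on Pineapple and the exit status of the curl run on the monitored host.

```shell
# Classify a scrape problem from two curl exit codes.
# $1: exit status of curl run on Pineapple against the agent
# $2: exit status of curl run on the agent's own host against 127.0.0.1
diagnose() {
  remote_rc=$1
  local_rc=$2
  if [ "$remote_rc" -eq 0 ]; then
    echo "ok"          # Pineapple can read the agent
  elif [ "$local_rc" -eq 0 ]; then
    echo "firewall"    # agent answers locally, so something in between blocks it
  else
    echo "agent"       # agent does not even answer locally
  fi
}
```

The exit codes can be captured by running curl -fsS -o /dev/null --max-time 5 http://x.x.x.x:9100/metrics on each side and reading $? afterwards.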

As soon as curl on Pineapple starts returning data from the agent, the webserver part of Prometheus should show the host as up. Lemon has a desktop and browser installed and the firewall rule allows port 9090 from MGT, so from Lemon browse to http://pineapple:9090 and select Status -> Target Health. A listing of all of the endpoints will be displayed showing the last scrape time and the current state. If the target is showing as unknown or down, try waiting a few seconds, or however long the scrape_interval at the top of the Prometheus config file is set to. Note that the running configuration can be viewed from the same menu under Status -> Configuration.
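The same target information is available from the Prometheus HTTP API at /api/v1/targets, which can be handy on a host without a browser. The following is a sketch that filters that JSON with jq (jq is an assumption here and must be installed); unhealthy_targets is a hypothetical name, and it reads the API response on standard input.

```shell
# Read a /api/v1/targets response on stdin and print one line
# ("job scrape-URL health") for every active target that is not "up".
unhealthy_targets() {
  jq -r '.data.activeTargets[]
         | select(.health != "up")
         | "\(.labels.job) \(.scrapeUrl) \(.health)"'
}
```

From a host on the MGT network this could be run as curl -s http://pineapple:9090/api/v1/targets | unhealthy_targets; no output means every active target is up.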

Mysql Exporter on Mandarin

Mandarin is the MySQL server, so as well as the basic node exporter we will have a MySQL exporter installed so that it will give metrics specific to MySQL in addition to the normal CPU, RAM, network and similar metrics. To scrape these details we will need to set up another system user, install the agent and set up a MySQL user. First add the user for the agent and create the directory for the agent:

sudo useradd --no-create-home --shell /bin/false mysqld_exporter
sudo mkdir /etc/mysqld_exporter

Next we will need to locate and download the binaries, so browse to https://prometheus.io/download/, scroll down to the mysqld_exporter section and copy the link address of mysqld_exporter-x.x.x.linux-amd64.tar.gz, then use wget to download it

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.17.2/mysqld_exporter-0.17.2.linux-amd64.tar.gz

we extract it with

tar -xvf mysqld_exporter-0.17.2.linux-amd64.tar.gz

Then we move the binary to /usr/local/bin and set its ownership

cd mysqld_exporter-0.17.2.linux-amd64/
sudo mv mysqld_exporter /usr/local/bin/
sudo chown mysqld_exporter:mysqld_exporter /usr/local/bin/mysqld_exporter

The next step will be to create a MySQL user that has access to the metrics so we will need a new password generated and added to Keepass

sudo mysql -u root -p

When logged in to MySQL we create the user

CREATE USER 'mysqld_exporter'@'x.x.x.x' IDENTIFIED BY 'your_secure_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'x.x.x.x';
FLUSH PRIVILEGES;
EXIT;

We now need an exporter configuration

sudo nano /etc/mysqld_exporter/.my.cnf

With the following text

[client]
# Crucial: MySQL binds to this IP
host=x.x.x.x
user=mysqld_exporter
password=YOUR_SECURE_PASSWORD

save and exit

  • It should be noted that this user gave a lot of trouble logging in to MySQL, not because of the password but because of the manner in which the MySQL exporter logs in to the database. To make it more confusing, curl does its login differently to the mysqld_exporter service: it would appear that curl logs in to MySQL using localhost as the source, whereas mysqld_exporter does so by IP address. This should not normally matter, but MySQL authentication treats mysqld_exporter at an IP address as a different account from mysqld_exporter@localhost. In the MySQL config at /etc/mysql/mysql.conf.d/mysqld.cnf the directive bind-address was set to the IP address of Mandarin (bind-address = x.x.x.x), so when the user mysqld_exporter was defined @localhost, which resolves to 127.0.0.1, MySQL would not accept the login because it came from the wrong host IP. It would be possible to set the bind-address to localhost or 127.0.0.1 but that could break some other login, so we will just remember to check the bind-address variable in /etc/mysql/mysql.conf.d/mysqld.cnf if we have to create another user @localhost.
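Since the fix hinges on knowing what bind-address is set to, a quick way to read it out of the config can save some guessing. This is a sketch only; bind_address is a hypothetical helper that prints the last uncommented bind-address value found in the file it is given.

```shell
# Print the effective bind-address from a MySQL config file.
# Commented-out lines are skipped; if the directive appears more
# than once, the last occurrence is printed.
bind_address() {
  awk -F= '
    /^[ \t]*bind-address[ \t]*=/ {
      gsub(/[ \t]/, "", $2)   # strip spaces and tabs around the value
      value = $2
    }
    END { if (value != "") print value }
  ' "$1"
}
```

On Mandarin this would be called as bind_address /etc/mysql/mysql.conf.d/mysqld.cnf. The host part of the MySQL account and the host= line in .my.cnf should then agree with whatever address it prints.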

Now that we have the user set we need to make the permissions on the credentials file quite restrictive

sudo chown mysqld_exporter:mysqld_exporter /etc/mysqld_exporter/.my.cnf 
sudo chmod 600 /etc/mysqld_exporter/.my.cnf

We create a service config with

sudo nano /etc/systemd/system/mysqld_exporter.service

With the following configuration

 
 [Unit]
 Description=Prometheus MySQL Exporter
 Wants=network-online.target
 After=network-online.target mysql.service

 [Service]
 User=mysqld_exporter
 Group=mysqld_exporter
 Type=simple
 ExecStart=/usr/local/bin/mysqld_exporter \
    --config.my-cnf=/etc/mysqld_exporter/.my.cnf \
    --web.listen-address=0.0.0.0:9104

 [Install]
 WantedBy=multi-user.target

After we save and exit we reload the systemd and start the service

sudo systemctl daemon-reload
sudo systemctl start mysqld_exporter
sudo systemctl enable mysqld_exporter
sudo systemctl status mysqld_exporter

Assuming the status looks good we need to check that Pineapple can read the data so login to Pineapple and do the curl thing

curl http://x.x.x.x:9104/metrics

There should be a bucket load of metrics returned by curl if everything is working. If there is no data, go back to Mandarin and do the same curl. If a load of metrics now comes out, the problem is that the firewall rule is not allowing Pineapple to access Mandarin on port 9104. If there is no output on Mandarin either, there is a problem with the mysqld_exporter service or the MySQL login.

Assuming any problems are resolved, as a final check log in to Lemon, browse to http://pineapple:9090, select Status -> Target Health and check that the endpoint in the section mysql_server_mandarin is showing as up. If not, check that the configuration in Status -> Configuration has the correct details for Mandarin.

Nginx Exporter

In the same way that MySQL has certain metrics that are exclusive to MySQL, so too does Nginx have its own set of metrics, and therefore an exporter was created especially for Nginx.

Raisin is the only host with nginx installed, and as it is only a Reverse Proxy it does not have any websites that it is serving directly; instead it forwards any and all requests to the relevant webserver. What we will do in this case is create a stub server that only listens on 127.0.0.1:8080 so that it cannot be accessed by any host but itself. To do that we need to create the stub with

sudo nano /etc/nginx/conf.d/nginx_stub_status.conf

and add the following configuration

server {
    listen 127.0.0.1:8080; # Internal port for status page
    server_name localhost;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}

After save and exit the config must be tested with

sudo nginx -t

If that looks good reload Nginx with

sudo systemctl reload nginx
sudo systemctl status nginx

If the config was bad the -t test and the status would already have shown errors, but we can still verify locally with

curl http://127.0.0.1:8080/nginx_status

We should see something like

server accepts handled requests
 1001 1001 940 
Reading: 0 Writing: 1 Waiting: 0 
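To make the link between this status page and what Prometheus later scrapes a little clearer, the sketch below converts stub_status text into metric lines using the metric names that the nginx exporter publishes. It is only an illustration of the mapping, not how the exporter itself is implemented, and stub_to_metrics is a hypothetical helper.

```shell
# Convert nginx stub_status text on stdin into Prometheus-style
# "name value" lines, mirroring the nginx exporter's metric names.
stub_to_metrics() {
  awk '
    /^Active connections:/ { print "nginx_connections_active " $3 }
    /^server accepts handled requests/ {
      # The three counters are on the line after this header.
      getline
      print "nginx_connections_accepted " $1
      print "nginx_connections_handled " $2
      print "nginx_http_requests_total " $3
    }
    /^Reading:/ {
      print "nginx_connections_reading " $2
      print "nginx_connections_writing " $4
      print "nginx_connections_waiting " $6
    }
  '
}
```

Running curl -s http://127.0.0.1:8080/nginx_status | stub_to_metrics on Raisin gives a preview of the values that should later appear behind port 9113.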

With the stub webserver setup we can create the service user and directory with

sudo useradd --no-create-home --shell /bin/false nginx_exporter
sudo mkdir /etc/nginx_exporter 

Now it is time to locate the binaries at https://github.com/nginx/nginx-prometheus-exporter/releases. As before we will need the link to the latest exporter (the icon marked "latest" is a link) and we use the link in the wget

wget https://github.com/nginx/nginx-prometheus-exporter/releases/download/v1.4.2/nginx-prometheus-exporter_1.4.2_linux_amd64.tar.gz

Uncompress with

tar -xvf nginx-prometheus-exporter_1.4.2_linux_amd64.tar.gz

and copy the binary with the correct permissions with

sudo mv nginx-prometheus-exporter /usr/local/bin/
sudo chown nginx_exporter:nginx_exporter /usr/local/bin/nginx-prometheus-exporter

Also as before we create a service file with

sudo nano /etc/systemd/system/nginx_exporter.service

and populate it with

 
 [Unit]
 Description=Prometheus Nginx Exporter
 Wants=network-online.target
 After=network-online.target nginx.service

 [Service]
 User=nginx_exporter
 Group=nginx_exporter
 Type=simple
 ExecStart=/usr/local/bin/nginx-prometheus-exporter \
     --web.listen-address=0.0.0.0:9113 \
     --nginx.scrape-uri="http://127.0.0.1:8080/nginx_status"

 [Install]
 WantedBy=multi-user.target

Save & exit. Reload systemd, start the service and enable the service with

sudo systemctl daemon-reload
sudo systemctl start nginx_exporter
sudo systemctl enable nginx_exporter
sudo systemctl status nginx_exporter

Assuming the status looks ok login to Pineapple and do the curl with Raisin's IP address and Nginx port number 9113

curl http://x.x.x.x:9113/metrics

As previous notes have stated, if Pineapple cannot read the exporter's results, check whether the metrics are returned locally to see if it is the service or the firewall that is stopping it from working. Assuming the service is being read by curl on Pineapple, it should also be checked in Lemon's web browser at http://pineapple:9090 Status -> Target Health; the section nginx_reverse_proxy_raisin should have the endpoint as up.

Apache Exporter

Although they are both webservers, Apache has a different exporter to Nginx. Apache has a module that reports status and unsurprisingly it is called mod_status. We can make sure that it is enabled with:

sudo a2enmod status

As we did with Nginx we can create a stub with

sudo nano /etc/apache2/sites-available/apache-status.conf

and add the following config

# The Listen directive may be placed in /etc/apache2/ports.conf instead; it must appear only once
Listen 127.0.0.1:8081
<VirtualHost 127.0.0.1:8081>
    ServerName localhost
    DocumentRoot /var/www/html
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</VirtualHost>

Save and exit. Note that DocumentRoot must point at a valid directory or Apache will not start; the directory will never be read but it still must be valid and accessible by Apache. We enable the site with

sudo a2ensite apache-status.conf

then test and reload apache with

sudo apache2ctl configtest 
sudo systemctl reload apache2
sudo systemctl status apache2

Again, as with Nginx, any errors should have shown by now but it is still best to check with

curl http://127.0.0.1:8081/server-status?auto

Assuming the stub is working as expected we can create the system user

sudo useradd --no-create-home --shell /bin/false apache_exporter
sudo mkdir /etc/apache_exporter

As may be expected by now, we need to locate the binaries for the Apache exporter. Browse to https://github.com/Lusitaniae/apache_exporter/releases and click the "latest" button, then copy the link to the file apache_exporter-X.Y.Z.linux-amd64.tar.gz, then

wget https://github.com/Lusitaniae/apache_exporter/releases/download/v1.0.10/apache_exporter-1.0.10.linux-amd64.tar.gz
tar -xvf apache_exporter-1.0.10.linux-amd64.tar.gz
cd apache_exporter-1.0.10.linux-amd64/
sudo mv apache_exporter /usr/local/bin/
sudo chown apache_exporter:apache_exporter /usr/local/bin/apache_exporter

Now create the service config file

sudo nano /etc/systemd/system/apache_exporter.service

and populate with

 [Unit]
 Description=Prometheus Apache Exporter
 Wants=network-online.target
 After=network-online.target apache2.service

 [Service]
 User=apache_exporter
 Group=apache_exporter
 Type=simple
 ExecStart=/usr/local/bin/apache_exporter \
     --web.listen-address=0.0.0.0:9117 \
     --scrape_uri=http://127.0.0.1:8081/server-status?auto

 [Install]
 WantedBy=multi-user.target

Save & exit and Reload Systemd, start and enable the service

sudo systemctl daemon-reload
sudo systemctl start apache_exporter
sudo systemctl enable apache_exporter
sudo systemctl status apache_exporter

Assuming the status looks good login to pineapple and do the curl test with

 curl http://x.x.x.x:9117/metrics

If that does not produce a result, test on the local host and either adjust the firewall rule or fix the service. When curl is producing results, check that the server is showing as up on the website http://pineapple:9090 under Status -> Target Health in the apache_webservers section.

If the stub doesn't work we can delete the stub created above and create a different web stub but we must modify the ports config file at

sudo nano /etc/apache2/ports.conf

and add in a new directive that listens to a new port 8081 on 127.0.0.1 so the file should look something like

Listen 80
Listen 127.0.0.1:8081
<IfModule ssl_module>
        Listen 443
</IfModule>
<IfModule mod_gnutls.c>
        Listen 443
</IfModule>

After a save and close we can create a new config in sites-available

sudo nano /etc/apache2/sites-available/apache-status.conf

with the contents

<VirtualHost 127.0.0.1:8081>
    ServerName localhost
    DocumentRoot /var/www/html 
    ErrorLog ${APACHE_LOG_DIR}/apache-status_error.log
    CustomLog ${APACHE_LOG_DIR}/apache-status_access.log combined
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1 
    </Location>
</VirtualHost>

Save & exit. Note, as previously stated, that DocumentRoot must be a valid location or Apache will not start.

Enable the site (a2ensite creates a symlink in sites-enabled) with

sudo a2ensite apache-status.conf

and test the config with

sudo apache2ctl configtest 

before the

sudo systemctl reload apache2
sudo systemctl status apache2

If status is good try

curl http://127.0.0.1:8081/server-status?auto 

if this works carry on with the installation

Windows 11 Exporter on Walnut and Wahoo

All of the previous exporters have been for Linux servers, but we also have two Windows 11 Pro hosts that we want to monitor. It is unusual to monitor Windows desktops, and the browsers in Windows do give a warning about the exporter being a rare download. It should also be noted that the exporter is a beta test version, although it looks like a release candidate is imminent.

We can download the beta release from https://github.com/prometheus-community/windows_exporter/releases. A folder needs to be created for the service to use: c:\program files\Windows_exporter. Then the downloaded file needs to be moved to the newly created folder and the binary extracted. If the download is not an archive, it is the binary itself and does not need to be extracted. From now on we need to use PowerShell; it can be started by right-clicking the Start menu and selecting Terminal (Admin). It must be run as administrator.

cd "c:\program files\windows_exporter"

If there is an old service still present (for example if this did not work the first time) it can be removed with the command

Stop-Service -Name windows_exporter -ErrorAction SilentlyContinue; sc.exe delete windows_exporter

we install the service with the command

sc.exe create windows_exporter binPath="C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe --web.listen-address=0.0.0.0:9182 --log.level=info" DisplayName="Prometheus Windows Exporter" start=auto

To break down the command

  • sc.exe : This is the Service Control command-line utility in Windows. It's used to communicate with the Service Control Manager (SCM) to create, delete, query, or configure Windows services.
  • create : This is the subcommand for sc.exe that tells it to create a new service.
  • windows_exporter : This is the ServiceName (or ServiceKeyName). This is the internal, unique name that Windows will use to identify this service. It's typically a short, descriptive name without spaces. You'd use this name in other sc.exe commands (e.g., sc.exe start windows_exporter).
  • binPath="C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe --web.listen-address=0.0.0.0:9182 --log.level=info" : This is the Binary Path parameter of sc.exe. It specifies the full path to the executable file that the service will run, along with any command-line arguments that should be passed to that executable when the service starts.
    • C:\Program Files\windows_exporter\windows_exporter-0.30.7-amd64.exe : This is the absolute path to the windows_exporter executable. This is the actual program that will run as the service.
    • --web.listen-address=0.0.0.0:9182 : This is a command-line argument passed to the windows_exporter.exe executable.
      • --web.listen-address : A common flag used by Prometheus exporters to specify the network address and port on which they should listen for incoming scrape requests from Prometheus.
      • 0.0.0.0:9182 : Means the windows_exporter will listen on all available network interfaces of the Windows machine on port 9182.
    • --log.level=info : Another command-line argument passed to the windows_exporter.exe executable

This sc.exe command creates a new Windows service named windows_exporter (internally). This service will be displayed as "Prometheus Windows Exporter" in the Services Manager. When the Windows system boots, this service will automatically start. Upon starting, it will execute the windows_exporter-0.30.7-amd64.exe program, passing it arguments to listen for incoming Prometheus scrape requests on all network interfaces on port 9182, and to log informational messages.

start the service with

Start-Service windows_exporter

verify the service with

Get-Service windows_exporter

If this is not the first try, run

Stop-Service -Name windows_exporter -ErrorAction SilentlyContinue
sc.exe delete windows_exporter

It will complain that there are two commands but it should remove any service that was unsuccessfully installed

We can create a firewall rule for windows firewall with

New-NetFirewallRule -DisplayName "Prometheus Windows Exporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9182 -Profile Private,Public,Domain

It probably does not need to allow Private, Public and Domain; most likely Domain alone would work

As with all of the Linux hosts we test with curl from Pineapple on port 9182

curl http://x.x.x.x:9182/metrics

If successful, check on http://pineapple:9090 Status -> Target Health in the node_exporter_terminals section

smartctl_exporter_script.sh (Proxmox Disk SMART Metrics)

Detailed Installation Notes: smartctl_exporter_script.sh (Proxmox Disk SMART Metrics)

  • Purpose: To collect detailed S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) data from the physical hard drives and SSDs in the Proxmox host (pear) and expose them as Prometheus metrics via the node_exporter's textfile collector. This allows monitoring of drive health (temperature, read/write errors, power-on hours, etc.).
  • Metrics Collected: smartctl_health_status, smartctl_temperature_celsius, smartctl_power_on_hours_total, and potentially others (lbas_read_total, lbas_written_total, reallocated_sectors_total, NVMe-specific stats) if available from the drives.
  • Default Port: 9100/TCP (metrics are collected by node_exporter which listens on this port).
  • Method: A custom Bash script that runs smartctl, parses its JSON output with jq, formats it into Prometheus textfile format, and writes it to a file (smart.prom) that node_exporter is configured to read. The script is scheduled via cron.
  • Install Prerequisites: The script relies on smartctl (from smartmontools) to query drive S.M.A.R.T. data and jq to parse the JSON output from smartctl.
apt update && apt install -y jq smartmontools
  • Enable node_exporter Collectors for S.M.A.R.T. and ZFS: The node_exporter needs to be told to activate its zfs collector (for ZFS pool statistics) and its textfile collector (which will read the smart.prom file generated by our script). Edit the node_exporter Systemd service file :
nano /etc/systemd/system/node_exporter.service

Locate the ExecStart line and ensure it includes these flags:

 
 [Service]
 User=node_exporter
 Group=node_exporter
 Type=simple
 ExecStart=/usr/local/bin/node_exporter \
     --web.listen-address="0.0.0.0:9100" \
     --collector.zfs \
     --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

 [Install]
 WantedBy=multi-user.target

save & exit. Restart node_exporter service: This applies the new collector settings.

systemctl restart node_exporter
  • Create and set permissions for collector directories: The node_exporter user needs write access to the textfile_collector directory for the script to deposit its metrics, and read/write access to the log directory.
mkdir -p /var/lib/node_exporter/textfile_collector
chown node_exporter:node_exporter /var/lib/node_exporter/textfile_collector
mkdir -p /var/log/node_exporter/
chown node_exporter:node_exporter /var/log/node_exporter/

Add node_exporter user to the disk group: This is crucial for node_exporter (and thus the script running as this user) to have the necessary permissions to read raw disk data via smartctl.

usermod -a -G disk node_exporter

Important: For this group change to fully take effect, the node_exporter service (which runs as this user) must be restarted:

systemctl restart node_exporter
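Before looking at the full script, the textfile mechanism itself can be illustrated in isolation: a metric is just a line of text, and the file should be replaced in one step so that node_exporter never reads a half-written file. The sketch below assumes a label called disk and uses two hypothetical helpers, emit_metric and write_atomically; the metric name is taken from the list above.

```shell
# Print one sample in Prometheus text exposition format.
# Arguments: metric name, disk label value, sample value.
# The "disk" label name is an assumption made for this sketch.
emit_metric() {
  printf '%s{disk="%s"} %s\n' "$1" "$2" "$3"
}

# Read metric lines on stdin and publish them to the target file.
# The data is written to a temporary file in the same directory and then
# renamed, so a reader sees either the old file or the complete new one.
# The temporary name does not end in .prom, so the collector ignores it.
write_atomically() {
  target=$1
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$target"
}
```

As a usage example, emit_metric smartctl_temperature_celsius sda 34 | write_atomically /var/lib/node_exporter/textfile_collector/smart.prom would publish a single temperature sample, which should then show up in the output of curl http://127.0.0.1:9100/metrics on the same host.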
  • Create the S.M.A.R.T. Exporter Script (/usr/local/bin/smartctl_exporter_script.sh): This Bash script contains the logic to query smartctl for each drive, parse the output, and format it. Create the file:
nano /usr/local/bin/smartctl_exporter_script.sh

the file should look like this

 #!/bin/bash
 set -e # Exit immediately if a command exits with a non-zero status

 # --- Configuration ---
 SMARTCTL_BIN="/usr/sbin/smartctl"
 OUTPUT_DIR="/var/lib/node_exporter/textfile_collector"
 OUTPUT_FILE="$OUTPUT_DIR/smart.prom"
 TEMP_FILE="$OUTPUT_FILE.tmp"
 LOG_FILE="/var/log/node_exporter/smartctl_exporter.log" # Log file for the script's o>

 # --- Helper Function for Logging ---
 log_message() {
   echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
 }

 # --- Setup Directories (Ensure permissions are correct) ---
 mkdir -p "$OUTPUT_DIR"
 chown node_exporter:node_exporter "$OUTPUT_DIR"
 mkdir -p "$(dirname "$LOG_FILE")"
 chown node_exporter:node_exporter "$(dirname "$LOG_FILE")"

 # --- Ensure necessary tools are available ---
 if ! command -v smartctl &> /dev/null; then
     log_message "ERROR: smartctl not found. Please install smartmontools."
     exit 1
 fi
 if ! command -v jq &> /dev/null; then
     log_message "ERROR: jq not found. Please install jq."
     exit 1
 fi 
 
 # --- Start Metric Collection ---
 log_message "Starting SMART metric collection."
 : > "$TEMP_FILE" # Truncate the temp file so it starts out empty (no old metrics, no stray blank line)

 # --- Define Specific Disks to Monitor ---
 TARGET_DISKS=(
   "/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4TL"
   "/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4TM"
   "/dev/disk/by-id/ata-ST16000NE000-2RW103_ZL2JK4VE"
   "/dev/disk/by-id/ata-Lexar_SSD_NQ100_960GB_QCT180R0006120S334"
   "/dev/disk/by-id/nvme-CT4000P3PSSD8_2339E879DC47"
   "/dev/disk/by-id/nvme-CT4000P3SSD8_2332E8684258"
 )

 for DISK_ID_PATH in "${TARGET_DISKS[@]}"; do
   # Determine device type for smartctl
   DEVICE_TYPE=""
   if [[ "$DISK_ID_PATH" == *"/dev/disk/by-id/nvme-"* ]]; then
     DEVICE_TYPE="nvme"
   elif [[ "$DISK_ID_PATH" == *"/dev/disk/by-id/ata-"* ]]; then
     DEVICE_TYPE="ata"
   else
     log_message "WARN: Unknown device type for $DISK_ID_PATH. Skipping."
     continue
   fi

   DEVICE_BASENAME=$(basename "$(readlink -f "$DISK_ID_PATH")")

   log_message "Processing disk: $DISK_ID_PATH (type: $DEVICE_TYPE, basename: $DEVICE_>

   # Run smartctl and get JSON output
   SMART_DATA=$(smartctl -a -j -d "$DEVICE_TYPE" -T permissive "$DISK_ID_PATH" 2>/dev/>
   SMARTCTL_EXIT_CODE=$?

   if [ $SMARTCTL_EXIT_CODE -ne 0 ] || [ -z "$SMART_DATA" ]; then
     log_message "ERROR: smartctl failed for $DISK_ID_PATH (exit code: $SMARTCTL_EXIT_>
     echo "# HELP smartctl_exporter_error_running_smartctl Could not run smartctl or p>
     echo "# TYPE smartctl_exporter_error_running_smartctl gauge" >> "$TEMP_FILE"
     echo "smartctl_exporter_error_running_smartctl{device=\"$DEVICE_BASENAME\",id=\"$>
     continue
   fi

   # Extract common SMART attributes regardless of type
   HEALTH_STATUS=$(echo "$SMART_DATA" | jq -r '.smart_status.passed // "null"')
   DISK_SERIAL=$(echo "$SMART_DATA" | jq -r '.serial_number // "unknown"')
   DISK_MODEL=$(echo "$SMART_DATA" | jq -r '.model_name // "unknown"' | sed 's/ /_/g')
   DISK_VENDOR=$(echo "$SMART_DATA" | jq -r '(.vendor // .ata_identify_device.vendor_i>

   # Basic overall health (1=passed, 0=failed)
   if [ "$HEALTH_STATUS" == "true" ]; then
     echo "smartctl_health_status{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\",>
   else
     echo "smartctl_health_status{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\",>
   fi

   # --- Conditional JQ parsing based on DEVICE_TYPE ---
   # Initialize all values to null for safety before parsing
   TEMP_CELSIUS="null"
   POWER_ON_HOURS="null"
   TOTAL_LBAS_READ="null"
   TOTAL_LBAS_WRITTEN="null"
   REALLOC_SECTORS="null"
   PERCENT_USED="null"
   AVAILABLE_SPARE="null"

   if [ "$DEVICE_TYPE" == "nvme" ]; then
       TEMP_CELSIUS=$(echo "$SMART_DATA" | jq -r '.nvme_smart_health_information_log.t>
       POWER_ON_HOURS=$(echo "$SMART_DATA" | jq -r '.nvme_smart_health_information_log>
       TOTAL_LBAS_READ=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_l>
       TOTAL_LBAS_WRITTEN=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_informatio>
       PERCENT_USED=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_log.>
       AVAILABLE_SPARE=$(echo "$SMART_DATA" | jq -r '(.nvme_smart_health_information_l>

   elif [ "$DEVICE_TYPE" == "ata" ]; then
       TEMP_CELSIUS=$(echo "$SMART_DATA" | jq -r '(.temperature.current // (.ata_smart>
       POWER_ON_HOURS=$(echo "$SMART_DATA" | jq -r '(.power_on_time.hours // (.ata_sma>
       TOTAL_LBAS_READ=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[] | >
       TOTAL_LBAS_WRITTEN=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[]>
       REALLOC_SECTORS=$(echo "$SMART_DATA" | jq -r '(.ata_smart_attributes.table[] | >
   fi

   # --- Output extracted metrics (common to write after parsing) ---
   if [ "$TEMP_CELSIUS" != "null" ]; then
     echo "smartctl_temperature_celsius{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SER>
   fi
   if [ "$POWER_ON_HOURS" != "null" ]; then
     echo "smartctl_power_on_hours_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SE>
   fi
   if [ "$TOTAL_LBAS_READ" != "null" ]; then
     echo "smartctl_lbas_read_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERIAL\>
   fi
   if [ "$TOTAL_LBAS_WRITTEN" != "null" ]; then
     echo "smartctl_lbas_written_total{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SERI>
   fi
   if [ "$REALLOC_SECTORS" != "null" ]; then
     echo "smartctl_reallocated_sectors_total{device=\"$DEVICE_BASENAME\",serial=\"$DI>
   fi
   if [ "$PERCENT_USED" != "null" ]; then
     echo "smartctl_nvme_percentage_used{device=\"$DEVICE_BASENAME\",serial=\"$DISK_SE>
   fi
   if [ "$AVAILABLE_SPARE" != "null" ]; then
     echo "smartctl_nvme_available_spare_percent{device=\"$DEVICE_BASENAME\",serial=\">
   fi

 done

 # --- Finalize Output ---
 mv "$TEMP_FILE" "$OUTPUT_FILE" || log_message "ERROR: Failed to move temp file to $OU>

 log_message "SMART metric collection completed."

 exit 0
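
The script leans on two ideas: one sample line per metric in the Prometheus text format, and a write-then-rename so that node_exporter never reads a half-written file. Both can be sketched in isolation. This is a minimal illustration rather than the script above: escape_label, emit_metric and write_metrics_atomically are hypothetical helper names, and the label set is cut down to device and serial.

```shell
#!/bin/bash
# Sketch only: these helper names are hypothetical, not part of the script above.

# Escape a label value for the Prometheus text format:
# backslash, double quote and newline must be backslash-escaped.
escape_label() {
  local v=$1
  v=${v//\\/\\\\}
  v=${v//\"/\\\"}
  v=${v//$'\n'/\\n}
  printf '%s' "$v"
}

# Print one sample line: name{device="...",serial="..."} value
emit_metric() {
  local name=$1 device=$2 serial=$3 value=$4
  printf '%s{device="%s",serial="%s"} %s\n' \
    "$name" "$(escape_label "$device")" "$(escape_label "$serial")" "$value"
}

# Read metrics on stdin, write them to a temp file next to the target,
# then rename it into place. A rename within one filesystem is atomic, and
# the textfile collector only reads files ending in .prom, so the temp
# file is ignored while it is being written.
write_metrics_atomically() {
  local out_file=$1 tmp_file
  tmp_file=$(mktemp "${out_file}.XXXXXX") || return 1
  if ! cat > "$tmp_file"; then
    rm -f "$tmp_file"
    return 1
  fi
  chmod 0644 "$tmp_file"   # mktemp creates the file 0600; make it readable
  mv -f "$tmp_file" "$out_file"
}
```

A call such as emit_metric smartctl_health_status sda SERIAL 1 | write_metrics_atomically /var/lib/node_exporter/textfile_collector/smart.prom would then replace the metrics file in one step.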

Make the Script Executable:

chmod +x /usr/local/bin/smartctl_exporter_script.sh
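
Before scheduling the script it can be worth confirming that the earlier usermod step really did put node_exporter in the disk group. A minimal sketch, assuming the standard id utility; user_in_group is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: succeed if USER is a member of GROUP, based on `id -nG`.
# user_in_group is a hypothetical helper, not part of the exporter script.
user_in_group() {
  local user=$1 group=$2 g
  for g in $(id -nG "$user" 2>/dev/null); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Example (run on the monitored host):
#   user_in_group node_exporter disk || echo "node_exporter is not in the disk group"
```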

Schedule the script with cron to run every 12 hours. SMART data is unlikely to change quickly, so there is little point in collecting it second by second; twice per day is more than adequate. To open the crontab for the node_exporter user:

crontab -e -u node_exporter

and add

0 */12 * * * /usr/local/bin/smartctl_exporter_script.sh > /dev/null 2>&1

The crontab entry has six fields: Minute (0-59), Hour (0-23, where 0 is midnight), Day of Month (1-31), Month (1-12 or Jan-Dec), Day of Week (0-7, where 0 and 7 are both Sunday) and finally the command to execute. The entry above therefore breaks down as follows:

  • 0 (First field: Minute): "at the 0th minute of the hour" (i.e., on the hour).
  • */12 (Second field: Hour):
    • The * means "every possible value" for that field.
    • The / indicates a step value.
    • The 12 after the / is the step, so */12 means every 12th hour counting from 0, i.e. at 00:00 and 12:00.
  • The * in the third field (Day of Month) means every day of the month
  • The * in the fourth field (Month) means every month
  • The * in the fifth field (Day of Week) means every day of the week
  • The sixth and final field is the path to the executable file to be run.
    • It is always the best practice to have the path as an absolute value and not as a relative path like ~/some_script.sh.
    • It is better to have the script handle its own logging.
    • Output from cron jobs (stdout or stderr) is typically emailed to the user who owns the crontab entry. To avoid those emails, append > /dev/null 2>&1 to the command so that all output is redirected to the null device. To keep the output in a log file instead, append >> /var/log/smartctl_exporter.log 2>&1 (the node_exporter user must be able to write to that file).
    • Ensure the script (smartctl_exporter_script.sh) has execute permissions (chmod +x /usr/local/bin/smartctl_exporter_script.sh).
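    • Cron jobs run in a very minimal environment. If the script relies on specific environment variables (e.g., HOME, LANG, PATH), set them explicitly within the script itself or at the top of the crontab file. In particular, cron's default PATH for a non-root user often omits /usr/sbin, which is where smartctl lives on this host, so the script's smartctl check can fail under cron even though it passes in an interactive root shell.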
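
To make the step syntax concrete, the following sketch lists the values a */STEP field matches within a given range. It only illustrates the rule described above and is not how cron itself is implemented; expand_cron_step is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the values a cron "*/STEP" field matches within MIN..MAX.
# expand_cron_step is a hypothetical helper used only for illustration.
expand_cron_step() {
  local min=$1 max=$2 step=$3 v out=""
  for ((v = min; v <= max; v += step)); do
    out+="${out:+ }$v"
  done
  printf '%s\n' "$out"
}

expand_cron_step 0 23 12   # hour field */12
expand_cron_step 0 23 5    # hour field */5, for comparison
```

The first call prints 0 12, matching the two daily runs. The second prints 0 5 10 15 20, which shows that steps restart at the beginning of the range each day rather than wrapping around, so the gap between the 20:00 run and the next 00:00 run is only four hours.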
Post-Installation Verification Steps
  • Manual Test (once script is created):
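    • Test as the node_exporter user (the -s /bin/bash option is needed if that account was created with a non-login shell such as /bin/false):
su -s /bin/bash -c "bash /usr/local/bin/smartctl_exporter_script.sh" node_exporter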
    • Check the log file (it should show successful execution and no errors):
cat /var/log/node_exporter/smartctl_exporter.log
    • Check the metrics file (it should show formatted SMART metrics for all drives):
cat /var/lib/node_exporter/textfile_collector/smart.prom
  • Prometheus Checks
    • After the script has run and node_exporter has been restarted, query the node_exporter metrics endpoint directly for node_zfs_ and smartctl_ metrics:
curl -s http://x.x.x.x:9100/metrics | grep -E 'node_zfs_|smartctl_'
  • Add Grafana Dashboards: import Node - ZFS Stats (ID 7968) and S.M.A.R.T. (ID 22604), with x.x.x.x:9100 (Pear's IP) as the instance.
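
Because the script only runs twice a day, a cron job that has quietly stopped working is easy to miss: node_exporter keeps serving the last smart.prom it found. The textfile collector exposes a node_textfile_mtime_seconds metric that can be used for alerting on this. As a quick manual check, the sketch below tests the age of the file directly. It assumes GNU coreutils (stat -c %Y), and is_stale is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: succeed (exit 0) if FILE is missing or older than MAX_AGE seconds.
# Assumes GNU coreutils `stat -c %Y` (modification time as epoch seconds).
is_stale() {
  local file=$1 max_age=$2 now mtime
  [ -e "$file" ] || return 0
  now=$(date +%s)
  mtime=$(stat -c %Y "$file") || return 0
  (( now - mtime > max_age ))
}

# Example: the cron job runs every 12 hours, so allow 13 hours (46800 s).
#   is_stale /var/lib/node_exporter/textfile_collector/smart.prom 46800 \
#     && echo "smart.prom has not been refreshed recently"
```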

This was a major installation that took a lot of time, both to install and to troubleshoot all of the agents, but it is done now. The next steps would be either to create rules & alerts in the Prometheus Web GUI or to install Grafana on Granadilla. There is also a dedicated Proxmox exporter that will be worth looking at.