Prometheus

UFW Config

sudo ufw allow 9090 comment 'Allow Prometheus UI in'
sudo ufw allow 9093 comment 'Allow Alertmanager UI in'

Docker Config

Create Directories

mkdir -p ~/alerting/prometheus
mkdir -p ~/alerting/alertmanager

Since Prometheus doesn't support direct variable replacement in its .yml configuration, I use a template plus an entrypoint script that runs when the Docker container starts and generates the prometheus.yml file dynamically.

This way the generated config automatically picks up the correct ports from the execution and beacon config files.

Prometheus Entrypoint Script

vim ~/alerting/prometheus/entrypoint.sh
#!/bin/sh

# Load the environment files and export every variable they define.
# "set -a" marks all subsequent assignments for export, and "." is the
# POSIX spelling of "source", so this also works when /bin/sh is not bash.
set -a
. /etc/default/execution-variables.env
. /etc/default/beacon-variables.env
set +a

# Output file
OUTPUT="/tmp/prometheus.yml"

# Start with an empty output file
: > "$OUTPUT"

# Process the template file with awk to replace environment variables
awk '{
    while (match($0, /\$\{[^}]+\}/)) {
        varname = substr($0, RSTART + 2, RLENGTH - 3);
        value = ENVIRON[varname];
        if (value == "") value = "UNDEFINED";
        $0 = substr($0, 1, RSTART - 1) value substr($0, RSTART + RLENGTH);
    }
    print;
}' "/etc/prometheus/prometheus.yml.template" > "$OUTPUT"

# Continue with Prometheus startup, passing through any arguments given to the container
exec /bin/prometheus "$@"
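
To sanity-check the substitution outside the container, the same awk loop can be wrapped in a function and run against a throwaway template. This is only a sketch: the function name render_template and the variables EXAMPLE_METRICS_PORT and EXAMPLE_UNSET_VAR are hypothetical and do not come from the real env files.

```shell
#!/bin/sh
# render_template FILE: print FILE with each ${NAME} placeholder replaced by the
# exported environment variable NAME, or by UNDEFINED when NAME is unset or empty.
render_template() {
    awk '{
        while (match($0, /\$\{[^}]+\}/)) {
            varname = substr($0, RSTART + 2, RLENGTH - 3)
            value = ENVIRON[varname]
            if (value == "") value = "UNDEFINED"
            $0 = substr($0, 1, RSTART - 1) value substr($0, RSTART + RLENGTH)
        }
        print
    }' "$1"
}

# Demo with a throwaway template (both variable names are hypothetical).
demo_template=$(mktemp)
printf '%s\n' 'targets: ["localhost:${EXAMPLE_METRICS_PORT}"]' \
              'label: ${EXAMPLE_UNSET_VAR}' > "$demo_template"
EXAMPLE_METRICS_PORT=6060
export EXAMPLE_METRICS_PORT
unset EXAMPLE_UNSET_VAR
render_template "$demo_template"
rm -f "$demo_template"
```

The demo prints the first line with 6060 in place of the placeholder and the second line with UNDEFINED, which makes a missing variable easy to spot in the generated file. Note that awk only sees exported variables through ENVIRON, which is why the entrypoint script exports everything it loads.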

Create Docker Compose

Prometheus Config

Each client exposes its metrics under a different URL path, and I want Prometheus to scrape a single unified endpoint, so I configure an NGINX server to redirect requests to that one endpoint.

NGINX tries each possible endpoint until it finds the actively running client. If none of them responds, the client is assumed to be down.
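
The fallback idea can be illustrated with a small shell sketch. This is not the NGINX configuration itself, only the same "first endpoint that answers wins" logic written as a script. It assumes curl is installed, and the function names, the PROBE_CMD override, and any URLs passed in are hypothetical.

```shell
#!/bin/sh
# probe URL: succeed when URL answers with a non-error HTTP status within 2 seconds.
probe() {
    curl --silent --fail --max-time 2 --output /dev/null "$1"
}

# first_live_endpoint URL...: print the first URL accepted by the probe.
# Returns 1 when no URL responds, i.e. the client is treated as down.
# PROBE_CMD (hypothetical) swaps in another checker; it defaults to probe.
first_live_endpoint() {
    for url in "$@"; do
        if "${PROBE_CMD:-probe}" "$url"; then
            printf '%s\n' "$url"
            return 0
        fi
    done
    return 1
}
```

A call might look like first_live_endpoint "http://localhost:6060/metrics" "http://localhost:6060/debug/metrics/prometheus", where the port and both paths are placeholders for whatever the installed clients actually expose.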

Install NGINX

NGINX Config Script

Edit this script to add additional client metrics paths.

NGINX Service Config

  • Edit the NGINX service file to run the /etc/default/nginx-config-script.sh script before every start.

NGINX Service - Restart CRON

  • I couldn't get the NGINX service to reliably wait for the EL/BN to start. As a workaround, this script is run by cron and restarts the service whenever NGINX isn't running.

  • Runs every minute.
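
A minimal sketch of such a watchdog, assuming NGINX runs as a systemd service named nginx. The function name ensure_running and the SYSTEMCTL override are hypothetical and exist only to make the logic easy to test.

```shell
#!/bin/sh
# ensure_running SERVICE: restart SERVICE when systemd reports it is not active.
# SYSTEMCTL (hypothetical override) defaults to the real systemctl binary.
ensure_running() {
    if ! "${SYSTEMCTL:-systemctl}" is-active --quiet "$1"; then
        "${SYSTEMCTL:-systemctl}" restart "$1"
    fi
}
```

In a real script the last line would be ensure_running nginx, and a root crontab entry using the schedule * * * * * followed by the script path would run it once a minute. The script path is up to you; none is implied here.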

NGINX Template

Prometheus.yml Template

Alerts Config

Alertmanager Config

  • Edit PAGERDUTY_SERVICE_API_KEY
