Since Prometheus doesn't support environment-variable substitution in its .yml configuration, I'm using a template plus a script that runs when the Docker container starts and generates the prometheus.yml file dynamically.
This way, Prometheus automatically uses the correct ports from the execution and beacon variable files.
Prometheus Entrypoint Script
vim ~/alerting/prometheus/entrypoint.sh
#!/bin/sh

# Source environment variables
# ("." instead of "source": "source" is a bashism that plain /bin/sh may not support)
. /etc/default/execution-variables.env
. /etc/default/beacon-variables.env
export $(cut -d= -f1 /etc/default/execution-variables.env)
export $(cut -d= -f1 /etc/default/beacon-variables.env)

# Output file
OUTPUT="/tmp/prometheus.yml"

# Start with an empty output file
: > "$OUTPUT"

# Process the template file with awk to replace ${VAR} placeholders with environment values
awk '{
  while (match($0, /\$\{[^}]+\}/)) {
    varname = substr($0, RSTART + 2, RLENGTH - 3);
    value = ENVIRON[varname];
    if (value == "") value = "UNDEFINED";
    $0 = substr($0, 1, RSTART - 1) value substr($0, RSTART + RLENGTH);
  }
  print;
}' "/etc/prometheus/prometheus.yml.template" > "$OUTPUT"

# Continue with Prometheus startup
exec /bin/prometheus "$@"
# Make the script executable by everyone, not just the current user
chmod +x ~/alerting/prometheus/entrypoint.sh
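To see what the awk program in the entrypoint does in isolation, here is a small sketch that runs the same substitution loop over an inline template. The template lines, the job names, and the port value 6060 are made up for illustration (the real prometheus.yml.template isn't shown here); the placeholder names are the NGINX proxy port variables used later in this guide.

```shell
#!/bin/sh
# Sketch: the entrypoint's awk substitution applied to a tiny inline template.
# The template lines, job names, and port value below are hypothetical.

render_template() {
  # Replace every ${NAME} with the value of NAME from the environment;
  # unset or empty variables become the literal string UNDEFINED.
  awk '{
    while (match($0, /\$\{[^}]+\}/)) {
      varname = substr($0, RSTART + 2, RLENGTH - 3)
      value = ENVIRON[varname]
      if (value == "") value = "UNDEFINED"
      $0 = substr($0, 1, RSTART - 1) value substr($0, RSTART + RLENGTH)
    }
    print
  }'
}

# awk's ENVIRON only sees exported variables
export NGINX_PROXY_EXECUTION_METRICS_PORT=6060
unset NGINX_PROXY_BEACON_METRICS_PORT

render_template <<'EOF'
  - job_name: execution
    static_configs:
      - targets: ['localhost:${NGINX_PROXY_EXECUTION_METRICS_PORT}']
  - job_name: beacon
    static_configs:
      - targets: ['localhost:${NGINX_PROXY_BEACON_METRICS_PORT}']
EOF
```

The execution target renders as localhost:6060, while the beacon target becomes localhost:UNDEFINED because its variable is not set, which makes a missing variable easy to spot in the generated file.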
Since each client exposes metrics on a different URL path, and I want a single, unified endpoint for Prometheus to scrape, I configure an NGINX server that forwards requests to a single endpoint.
Before NGINX starts, a config script tries each possible metrics path until it finds the client that is actually running (the one answering with HTTP 200); if none responds, that client is assumed to be down.
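The probing idea can also be written as a generic loop that returns the first candidate URL answering HTTP 200. This is a sketch, not the script used in this guide: the helper name first_live_url and the 2-second timeout are my own choices. Note one behavioral difference: the config script below runs its checks one after another, so a later match (Besu) overrides an earlier one (Geth), whereas this loop stops at the first match, so candidates should be passed in order of preference.

```shell
#!/bin/sh
# Sketch: print the first URL that answers with HTTP 200.
# first_live_url is a hypothetical helper, not part of the original setup.

first_live_url() {
  for url in "$@"; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    # --max-time keeps a hung client from blocking startup.
    # curl exits non-zero on connection errors; "|| true" keeps the loop going under set -e.
    code=$(curl -o /dev/null -s -w '%{http_code}' --max-time 2 "$url") || true
    if [ "$code" = "200" ]; then
      echo "$url"
      return 0
    fi
  done
  # No candidate responded: treat the client as down.
  return 1
}
```

For the execution layer, you would pass the Geth path (/debug/metrics/prometheus) and the Besu path (/metrics) on port ${EXECUTION_METRICS_PORT} as arguments and assign the printed URL to EXECUTION_METRICS_FULL_URL.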
Install NGINX
sudo apt-get update
sudo apt-get install -y nginx
NGINX Config Script
Edit this script to add additional client metrics paths.
sudo vim /etc/default/nginx-config-script.sh
#!/bin/bash

source /etc/default/execution-variables.env
source /etc/default/beacon-variables.env

# Left empty if no client responds (the client is treated as down)
EXECUTION_METRICS_FULL_URL=""
BEACON_METRICS_FULL_URL=""

# ******************
# EXECUTION CLIENTS
# ******************

# Check if Geth service is running
status_code=$(curl -o /dev/null -s -w "%{http_code}" http://localhost:${EXECUTION_METRICS_PORT}/debug/metrics/prometheus)
# The spaces around = matter: without them, [ ] sees one non-empty string and is always true
if [ "$status_code" = "200" ]; then
  EXECUTION_METRICS_FULL_URL=http://localhost:${EXECUTION_METRICS_PORT}/debug/metrics/prometheus
fi

# Check if Besu service is running
status_code=$(curl -o /dev/null -s -w "%{http_code}" http://localhost:${EXECUTION_METRICS_PORT}/metrics)
if [ "$status_code" = "200" ]; then
  EXECUTION_METRICS_FULL_URL=http://localhost:${EXECUTION_METRICS_PORT}/metrics
fi

# ***************
# BEACON CLIENTS
# ***************

# Check if LH or Teku service is running (they both use /metrics)
status_code=$(curl -o /dev/null -s -w "%{http_code}" http://localhost:${BEACON_METRICS_PORT}/metrics)
if [ "$status_code" = "200" ]; then
  BEACON_METRICS_FULL_URL=http://localhost:${BEACON_METRICS_PORT}/metrics
fi

export $(cut -d= -f1 /etc/default/execution-variables.env)
export $(cut -d= -f1 /etc/default/beacon-variables.env)
export EXECUTION_METRICS_FULL_URL
export BEACON_METRICS_FULL_URL

# Only the variables listed here are substituted, so NGINX's own $variables in the template stay intact
VARS='${NGINX_PROXY_EXECUTION_METRICS_PORT},${EXECUTION_METRICS_FULL_URL},${NGINX_PROXY_BEACON_METRICS_PORT},${BEACON_METRICS_FULL_URL}'
envsubst "$VARS" < /etc/nginx/sites-available/default.template > /etc/nginx/sites-available/default

echo "Configuration for NGINX has been updated."
sudo chmod +x /etc/default/nginx-config-script.sh
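The export $(cut -d= -f1 FILE) lines exist because envsubst (like awk's ENVIRON in the Prometheus entrypoint) only sees exported variables; sourcing a file alone sets them in the current shell only. A common alternative is set -a, which exports every variable assigned while it is on, and it also avoids cut passing comment lines from the env file to export as invalid names. The sketch below shows the difference with a throwaway env file whose contents are hypothetical.

```shell
#!/bin/sh
# Sketch: sourced variables are invisible to child processes until exported.
# The env file and its DEMO_* variables are hypothetical.

envfile=$(mktemp)
cat > "$envfile" <<'EOF'
# Example env file (comment lines are harmless with set -a)
DEMO_EXECUTION_PORT=6060
DEMO_BEACON_PORT=8008
EOF

# 1) Plain sourcing: the variables exist in this shell only.
. "$envfile"
plain=$(sh -c 'echo "${DEMO_EXECUTION_PORT:-unset}"')

# 2) set -a: every variable assigned while sourcing is exported automatically.
set -a
. "$envfile"
set +a
exported=$(sh -c 'echo "${DEMO_EXECUTION_PORT:-unset}"')

echo "child process sees after plain source: $plain"
echo "child process sees after set -a:       $exported"
rm -f "$envfile"
```

In the config script, the same pattern would be set -a before the two source lines and set +a after them, which would replace the two export $(cut ...) lines.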
NGINX Service Config
Edit the NGINX service file to run the /etc/default/nginx-config-script.sh script before every start.
sudo vim /lib/systemd/system/nginx.service
[Unit]
Description=A high performance web server and a reverse proxy server
Documentation=man:nginx(8)
After=network.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
# ***********
# CHANGE HERE
# ↓↓↓↓↓↓↓↓↓↓↓
ExecStartPre=/usr/bin/sudo /etc/default/nginx-config-script.sh
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed

[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the edited unit file:
sudo systemctl daemon-reload
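Edits to /lib/systemd/system/nginx.service can be overwritten when the nginx package is upgraded. A drop-in override under /etc/systemd/system/nginx.service.d/ (the override.conf file that sudo systemctl edit nginx creates) is the usual way to persist such changes. This is an alternative sketch, not the setup described above: ExecStartPre is a list setting, so an empty ExecStartPre= line first clears the packaged entries, and both commands are then re-added in the required order (generate the config, then test it). Because system units run as root unless User= is set, the script is called without sudo. The helper name nginx_dropin is hypothetical.

```shell
#!/bin/sh
# Sketch: print a systemd drop-in that runs the config script before nginx starts.
# nginx_dropin is a hypothetical helper; override.conf is the name systemctl edit uses.

nginx_dropin() {
  cat <<'EOF'
[Service]
# An empty assignment clears the ExecStartPre list inherited from nginx.service,
# so the entries below run in exactly this order.
ExecStartPre=
ExecStartPre=/etc/default/nginx-config-script.sh
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
EOF
}
```

To install it, create the directory with sudo mkdir -p /etc/systemd/system/nginx.service.d, pipe nginx_dropin into sudo tee /etc/systemd/system/nginx.service.d/override.conf, and run sudo systemctl daemon-reload.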
NGINX Service - Restart CRON
I couldn't get the NGINX service to reliably wait for the EL/BN to start, so as a workaround, run this script from cron every minute; if NGINX isn't running, it clears the failed state and restarts the service.
#!/bin/bash

# Check if NGINX is active (running)
if ! systemctl is-active --quiet nginx; then
  echo "NGINX is not running. Attempting to restart..."

  # Reset the systemd state for NGINX to clear any failure states
  sudo systemctl reset-failed nginx

  # Attempt to restart NGINX
  sudo systemctl restart nginx
  echo "NGINX restart attempted."
fi
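To schedule the watchdog, add a line such as * * * * * /usr/local/bin/nginx-watchdog.sh to root's crontab (sudo crontab -e). The path is a hypothetical location for the script above; adjust it to wherever you saved it. Running from root's crontab also means the sudo calls inside the script don't prompt for a password. If you prefer to script the change, the helper below is a small sketch (merge_cron_line is a hypothetical name) that reads the existing crontab on stdin and appends the line only once.

```shell
#!/bin/sh
# Sketch: idempotently add one line to a crontab read from stdin.
# merge_cron_line and the watchdog path mentioned above are hypothetical.

merge_cron_line() {
  line=$1
  # Copy every existing entry except an exact duplicate of the new line.
  # grep exits 1 when it outputs nothing; "|| true" keeps that from aborting under set -e.
  grep -vxF -- "$line" || true
  # Then append the new line exactly once.
  printf '%s\n' "$line"
}
```

Usage: sudo crontab -l 2>/dev/null | merge_cron_line '* * * * * /usr/local/bin/nginx-watchdog.sh' | sudo crontab -. Check the result with sudo crontab -l, since crontab - replaces the whole table with whatever it receives.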
groups:
  - name: ServiceDownAlerts
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} down"
          description: "{{ $labels.job }} has been down for more than 5 minutes."