Author: Mr. Watson 🦄 Date: 2026-02-08
- Goal
- Quick checks
- 1) Resource limits for AI services
- 2) Service failure notifications (iGotify)
- 3) High-load watchdog notification
- 4) Postfix decommissioned
- 5) eGPU watchdog (Thunderbolt)
- 6) Reverse-proxy consolidation for admin services
- 7) Legacy exposure cleanup (2026-02-08)
- 8) Apache exposure reduced (pgAdmin backend only)
- Verification commands
- Notes
Keep the server stable under heavy workloads and alert on failures before they become incidents.
systemctl status rag-library-ingest whisper-web system-load-guard.timer
systemctl list-timers --all | grep system-load-guard
journalctl -u system-load-guard.service -n 50 --no-pagerDrop-in:
/etc/systemd/system/rag-library-ingest.service.d/safeguards.conf
[Unit]
OnFailure=service-failure-notify@%n.service
[Service]
MemoryHigh=2500M
MemoryMax=3G
CPUQuota=200%
TasksMax=256
Nice=10Drop-in:
/etc/systemd/system/whisper-web.service.d/safeguards.conf
[Unit]
OnFailure=service-failure-notify@%n.service
[Service]
MemoryHigh=3G
MemoryMax=4G
CPUQuota=250%
TasksMax=256
Nice=5Files:
/usr/local/bin/service-failure-notify.sh/etc/systemd/system/service-failure-notify@.service
Template service runs on failure and sends iGotify notification.
Files:
/usr/local/bin/system-load-guard.sh/etc/systemd/system/system-load-guard.service/etc/systemd/system/system-load-guard.timer
Timer cadence:
- every 5 minutes
Alert conditions (with cooldown):
- load average above threshold (
cores * 2.0) - available RAM below threshold (default
700MB)
Cooldown default: 30 minutes to avoid notification spam.
Postfix was previously adjusted to reduce lookup noise, but was later removed from exposure and disabled due to spam-risk concerns.
Applied:
sudo ufw delete allow Postfix
sudo systemctl disable --now postfixCurrent state:
- SMTP port 25 no longer allowed in UFW
postfixservice disabled/inactive
Added automated eGPU monitoring + auto-recovery attempts + iGotify alerts.
Files:
/usr/local/bin/egpu-watchdog.sh/etc/systemd/system/egpu-watchdog.service/etc/systemd/system/egpu-watchdog.timer/etc/egpu-watchdog.env
Behavior:
- checks if expected Thunderbolt enclosure is connected (
boltctl) - checks if NVIDIA device exists in PCI (
lspci) - if missing, attempts auto-recovery:
- restart
bolt echo 1 > /sys/bus/pci/rescan- try
modprobe nvidia*
- restart
- if still missing, sends iGotify alert (with cooldown)
- sends recovery notification when NVIDIA reappears
Current env:
EGPU_NAME=Razer Core X
COOLDOWN_S=1800To reduce direct internet exposure, these admin UIs were moved behind Nginx + TLS:
nodered.beachlab.org->https://127.0.0.1:1880openhab.beachlab.org->http://127.0.0.1:8080pgadmin.beachlab.org->http://127.0.0.1:5050
Nginx files:
/etc/nginx/sites-available/nodered.beachlab.org/etc/nginx/sites-available/openhab.beachlab.org/etc/nginx/sites-available/pgadmin.beachlab.org
Legacy host redirects kept for compatibility:
node.beachlab.org->https://nodered.beachlab.orgpostgres.beachlab.org->https://pgadmin.beachlab.org
Before creating/reloading vhosts, verify cert existence:
sudo certbot certificatesMissing certs were requested with:
sudo certbot certonly --nginx \
-d nodered.beachlab.org \
-d openhab.beachlab.org \
-d pgadmin.beachlab.orgAPI endpoint was also moved to TLS with its own certificate:
sudo certbot certonly --nginx -d api.beachlab.orgapi.beachlab.org now redirects HTTP->HTTPS and serves only over 443.
Current API exposure state:
/air/disabled (pending DB/API setup)/data/disabled (legacy target unknown)- default response is
403until explicit upstreams are defined
After reverse-proxy cutover, direct public access rules were removed for admin/data ports:
sudo ufw delete allow 1880/tcp # Node-RED direct
sudo ufw delete allow 8080/tcp # openHAB direct
sudo ufw delete allow 5050 # pgAdmin direct
sudo ufw delete allow 5432 # PostgreSQL direct
sudo ufw delete allow 3000 # legacy direct API portResult:
- admin UIs are reachable through Nginx/TLS hostnames only
- direct PostgreSQL internet exposure removed
- external consumers should use
api.beachlab.orgendpoints, not raw DB ports
Per current usage, additional legacy/public rules were removed:
sudo ufw delete allow OpenSSH # removes 22/tcp profile rule
sudo ufw delete allow 1935
sudo ufw delete allow 443/udp
sudo ufw delete allow 622/udp
sudo ufw delete allow 1800
sudo ufw delete allow 1803
sudo ufw delete allow 5678/tcp
sudo ufw delete allow 3478
sudo ufw delete allow 49152:65535/tcp
sudo ufw delete allow 49152:65535/udpService/vhost removals:
sudo systemctl disable --now coturn
sudo rm -f /etc/nginx/sites-enabled/<retired-webrtc-vhost>
sudo rm -f /etc/nginx/sites-enabled/n8n
sudo systemctl disable --now n8n
sudo nginx -t && sudo systemctl reload nginxThis retires public obsninja/coturn exposure and n8n public entrypoint for now.
Apache is kept only as a local backend for pgAdmin web mode (*:5050), behind Nginx.
Public exposure removals:
sudo ufw delete allow 'Apache Full'
sudo ufw delete allow 8090
sudo ufw delete allow 10000Apache vhost cleanup:
sudo a2dissite 000-default-le-ssl
sudo a2dissite tileserver_site.conf
sudo sed -i 's/^\s*Listen\s\+8090/# Listen 8090/' /etc/apache2/ports.conf
sudo rm -f /etc/apache2/sites-available/tileserver_site.conf
sudo systemctl reload apache2Result:
- Apache listens on
5050only (internal backend use) - no public Apache SSL (
8090) exposure - no public tileserver Apache exposure (
10000)
# services
systemctl is-active rag-library-ingest whisper-web system-load-guard.timer egpu-watchdog.timer
systemctl is-enabled postfix && systemctl is-active postfix # expected: disabled/inactive
# effective limits
systemctl show rag-library-ingest --property=MemoryMax,MemoryHigh,CPUQuotaPerSecUSec,OnFailure
systemctl show whisper-web --property=MemoryMax,MemoryHigh,CPUQuotaPerSecUSec,OnFailure
# timers
systemctl list-timers --all | grep -E 'system-load-guard|egpu-watchdog'
# run one eGPU check manually
sudo /usr/local/bin/egpu-watchdog.sh --verbose
# check certs
sudo certbot certificates
# verify UFW rules (1880/8080/5050 should be absent)
sudo ufw status numbered
# validate and reload nginx
sudo nginx -t && sudo systemctl reload nginx
# smoke tests
curl -I https://nodered.beachlab.org/
curl -I https://openhab.beachlab.org/
curl -I https://pgadmin.beachlab.org/
# confirm SMTP no longer listening
ss -ltnp | grep ':25 ' || echo 'OK: no smtp listener'- SSH access for
pinkmust never be removed. - SFTP access for RAG uses ACLs on folder paths (not forced via SSH group side effects).