Fleet health
Fleet heat map
Health score per device: green is healthy, red needs attention
Recent activity
Fleet roster · 7-day status
Quick links
| Label | Type | Target | Status | Detail | Checked | |
|---|---|---|---|---|---|---|
| No monitors configured. | ||||||
| Device | Alert | Memory | Swap | CPU load | Disks | |
|---|---|---|---|---|---|---|
| Device | Alert | CPU | Memory | Storage | Temp | Uptime | |
|---|---|---|---|---|---|---|---|
| No SNMP devices yet — enable SNMP on an agentless device's Settings tab. | |||||||
| Script | Device | Group | Status | Last output | Last run | Duration | |
|---|---|---|---|---|---|---|---|
| Click Refresh to load results. | |||||||
| Process | PID | Device | CPU % | Mem % |
|---|---|---|---|---|
| Click Refresh to load. | ||||
| Username | Created | Role | |
|---|---|---|---|
Profile
Two-Factor Authentication (2FA)
Protect your account with an authenticator app (Google Authenticator, Authy, etc.).
SSH preferences
Your default SSH username. Used by the quick-SSH link on the Devices page so you don't retype it each time. Stored per-user, not shared.
My acknowledged alerts
Open alerts you've taken ownership of (acknowledged but not yet resolved).
Getting started
A quick checklist of the essentials for a new RemotePower install. Each step reflects your current state and links straight to where to do it.
Server identity
Display name shown in the page title, push notification subject lines, and webhook payloads. Helps tell instances apart when you have more than one.
Default poll interval
How often newly-enrolled agents heartbeat back to the server. Existing devices are not affected — change theirs from the device detail page.
Online TTL
A device is considered offline if no heartbeat is received within this window. Should be at least 2 × poll_interval to avoid false offline alerts during a single missed poll.
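The 2 × poll_interval rule can be sketched in a few lines. This is only an illustration of the arithmetic, not RemotePower's own code; the function names and the `missed_polls_tolerated` parameter are hypothetical.

```python
def min_online_ttl(poll_interval_s: int, missed_polls_tolerated: int = 1) -> int:
    """Smallest TTL (seconds) that survives the given number of missed heartbeats.

    With the default of one tolerated miss this is 2 x poll_interval.
    """
    return poll_interval_s * (missed_polls_tolerated + 1)


def is_offline(last_heartbeat_ts: float, now_ts: float, online_ttl_s: int) -> bool:
    """A device counts as offline once no heartbeat arrived within the TTL window."""
    return (now_ts - last_heartbeat_ts) > online_ttl_s
```

With a 60 s poll interval the suggested floor is 120 s, so a single dropped heartbeat does not flip the device to offline.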
Monitor check interval
How often monitors auto-check when the Monitor page is open. Also controls server-side offline/patch check frequency.
Wake-on-LAN
Magic packets are sent via UDP broadcast from the server. Adjust if your network uses a directed broadcast address.
Healthchecks.io watchdog
RemotePower pings this URL on a fixed interval so an external watchdog (Healthchecks.io, BetterStack, your own) flips red when the server stops serving requests. Off by default. Cadence defaults to 60 s, minimum 30 s.
Webhook destinations
Fire each event to any number of destinations simultaneously. Each entry has its own format adapter (Discord, Slack, Pushover, Teams, ntfy, generic JSON) and optional per-destination event filter. Use case: Pushover for critical-only push notifications + Discord channel for all events.
Pushover credentials (token + user key) are stored encrypted in config.json and redacted from the backup export.
Legacy webhook URL
The original single-URL field, still honoured for backward compatibility. New setups should use "Webhook destinations" above — it supports the same formats plus Pushover, Teams, per-event filtering, and labeling.
Per-event toggles
Enable or disable each event type. Disabled events are recorded as "disabled" in the webhook log so you can see what was suppressed.
Email notifications (SMTP)
Send the same events via email as a sibling channel to webhooks. Both channels respect maintenance windows. Email is opt-in per event in the table above.
✓ Password is currently being read from RP_SMTP_PASSWORD. The field above is ignored.
Webhook log
| Time | Event | Status | Detail | |
|---|---|---|---|---|
| No webhook deliveries yet. | ||||
Status endpoint
A machine-readable fleet summary at /api/status for external dashboards — Uptime Kuma, Homepage, Grafana. It needs a status token (not a login), so a monitoring tool can poll it, but it is not public. Generate a token to enable it.
Inbound webhooks & syslog
Two token types share this table:
- Alert webhook: POST JSON {severity, title, …} to /api/webhook/in/<token>. Lands in the Alerts inbox. For Grafana, Alertmanager, Authentik, n8n, Home Assistant.
- Syslog ingestion: POST RFC 3164/5424 lines (JSON {lines:[…]} or plain text) to /api/syslog/in/<token>. Lines append to the device's log_watch under unit syslog; existing log_alert rules fire as normal. For rsyslog (omhttp), fluent-bit (HTTP output), or any tool that can POST. Must be pinned to a device.
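A minimal sketch of building an alert-webhook request with the Python standard library. Only the /api/webhook/in/<token> path and the severity and title fields come from the description above; the base URL, the token value, the severity string and the helper name are placeholders, and any further payload fields are not shown.

```python
import json
import urllib.request


def build_alert_request(base_url: str, token: str, severity: str, title: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST for the inbound alert webhook."""
    body = json.dumps({"severity": severity, "title": title}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/webhook/in/{token}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and token, for illustration only.
req = build_alert_request("https://rp.example.com", "TOKEN", "high", "Disk almost full")
# To actually deliver it: urllib.request.urlopen(req, timeout=10)
```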
| Label | Type | Scope | Token | Hits | Last seen | Status | |
|---|---|---|---|---|---|---|---|
| No inbound tokens yet. | |||||||
Relay satellites
Run a satellite (client/remotepower-satellite.py) inside a segmented network — agents there reach this server through it (agent → satellite → server). Each satellite authenticates with its own token, so you can see and revoke relays independently of the agents. Set the minted token as RP_SATELLITE_TOKEN and RP_UPSTREAM on the satellite host, then point that segment's agents at http://<satellite>:8800.
OIDC / OpenID Connect
Sign in via an external identity provider (Authelia, Authentik, Keycloak, Pocket-ID, Google, etc.). Users authenticate against your IdP and arrive with the role mapped from their group membership. Existing local users and LDAP keep working in parallel.
CVE details cache
How long OSV.dev vulnerability details are cached before being re-fetched. Higher = fewer external requests, lower = newer descriptions on existing CVEs.
IP allowlist
Restrict UI/API access to specific source IPs. Loopback (127.0.0.1, ::1) is always allowed so the local MCP sidecar keeps working. Agent paths (heartbeat, enrollment, agent download) are exempt — enabling this never blocks an agent. Off by default.
Change approval (maker-checker)
When enabled, an arbitrary command run (Run command) is parked as a pending approval instead of executing immediately. A second admin approves it on the Confirmations page.
Listening-port & firewall audit
When enabled, the audit watches each host's listening sockets and active firewall ruleset, raising New listening port, World-exposed service and Host firewall changed alerts when a port first appears, first binds to a world-reachable address (0.0.0.0), or the firewall fingerprint drifts. This is noisy on Docker hosts, where docker-proxy publishes every container port to 0.0.0.0 — so it is off by default. The Exposure page still shows every socket regardless, so you keep the visibility without the alert noise.
Muted services — silence new-port / world-exposed alerts for a specific process and/or proto/port (e.g. docker-proxy) without turning the whole audit off. You can also mute straight from the Exposure table.
Audit log forwarding
Mirror every audit entry to an external SIEM (HTTP JSON POST) or a syslog collector (RFC 5424 over UDP/TCP). Best-effort and non-blocking — a forwarding outage never affects local logging.
Session length
How long a session lasts. Short by default; "remember me" extends to long.
LDAP / LDAPS authentication
External authentication source. Local users in users.json are tried first — emergency access never depends on LDAP being reachable. Users authenticated via LDAP are auto-provisioned with the role determined by group membership.
✓ Bind password is currently being read from RP_LDAP_BIND_PASSWORD. The field above is ignored.
Content-Security-Policy reporting
When the browser blocks something the CSP forbids (an injected inline script, a stylesheet from a disallowed origin, …) it POSTs a report to /api/csp-report. Each report becomes one audit-log line tagged csp:…. Disable here if reports get noisy; raise the per-IP throttle if you're investigating an active issue.
Audit log
Records every login, command, settings change, and CSP report. Live entries persist in audit_log.json; entries older than the retention window roll into the gzipped archive next to it. Retention is set in Settings → Advanced.
Transport security (HSTS)
HSTS forces every future visit to use HTTPS — defence against a downgrade attack. Set by nginx, not the application; this panel just reports whether the response on your current page is carrying the header.
add_header Strict-Transport-Security line in server/conf/remotepower.conf (or your in-place nginx config), then sudo nginx -t && sudo systemctl reload nginx. Only do this once you're certain the site is HTTPS-only: HSTS is sticky in the browser for the configured max-age.
Audit log retention
Entries older than this many days are moved to audit_log_archive.jsonl.gz and kept indefinitely (gzipped, append-only). The live audit_log.json only holds recent entries — keeps the loaded-into-memory file small. Default 90 days. Set to 0 to disable age-based eviction (legacy count-only cap still applies).
Scheduled backup
Daily snapshot of /var/lib/remotepower to a tarball. Triggered once per 24h via the heartbeat hook (cheap when not due). Backup state and a "Run now" button are on the Server status page.
Debug logging
When enabled, verbose logs are written to the browser console (F12 → Console) and to /var/lib/remotepower/debug.log on the server. Disable in production — the log file grows unbounded.
Backup & restore
Two kinds of backup. The redacted export is a ZIP with API keys and integration secrets stripped — safe to store or share. The full disaster-recovery backup is a tar.gz of the entire data directory including the encrypted credentials vault and integration secrets — it can fully rebuild this controller, so keep it private. Restore applies a full backup and overwrites current data; a safety snapshot of the existing data is taken first (under restore-snapshots/ in the data dir).
Storage backend
RemotePower stores all data as flat-JSON files by default. For large fleets with many devices and frequent writes, switch to an embedded SQLite database (WAL mode) — a heartbeat becomes a single-row update instead of rewriting a whole file. Switching migrates your data in place after taking a rollback snapshot, verifies it, and only then flips the active backend. It is fully reversible (switch back to JSON the same way).
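To illustrate why a row-per-device store makes a heartbeat cheap, here is a small sketch using Python's built-in sqlite3 module in WAL mode. The table name, columns and helper functions are assumptions made for the example; RemotePower's actual schema is not described here.

```python
import json
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a file-backed database and switch it to WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per device, state kept as a JSON document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS devices (id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
    )
    return conn


def record_heartbeat(conn: sqlite3.Connection, device_id: str, state: dict) -> None:
    """Write one device's state; only that row changes, not the whole data set."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO devices (id, doc) VALUES (?, ?)",
            (device_id, json.dumps(state)),
        )


def load_device(conn: sqlite3.Connection, device_id: str):
    """Return the stored state for one device, or None if it is unknown."""
    row = conn.execute("SELECT doc FROM devices WHERE id = ?", (device_id,)).fetchone()
    return json.loads(row[0]) if row else None
```

With flat JSON the equivalent update would load, modify and rewrite the whole devices file; here each heartbeat touches a single row.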
Data retention & maintenance
Cap how long historical records are kept. Set a value in days; 0 keeps everything (subject only to the per-log count limits). Old entries are pruned automatically once a day, and immediately when you run maintenance below. Open alerts are never purged — only resolved ones past their age limit.
Proxmox VE connection
Connect a single Proxmox VE node. The RemotePower server calls the Proxmox API directly to list QEMU VMs (Virtualization page) and LXC containers (Containers page), and to start / shut them down. Authentication uses a Proxmox API token — create one under Datacenter → Permissions → API Tokens.
Note: the API token secret is stored in the server's config.json (file mode 600). It is not encrypted at rest. Use an API token scoped to only the permissions it needs (VM.PowerMgmt, VM.Audit), not a full-access token.
More secure option: set the token secret in the environment variable RP_PROXMOX_TOKEN_SECRET (in the systemd unit or container env) instead of here. When that variable is set it takes precedence, the secret stays out of config.json, and it is not included in the backup export.
✓ The token secret is currently being read from the RP_PROXMOX_TOKEN_SECRET environment variable. The field below is ignored.
Proxmox ships a self-signed certificate by default — if you haven't installed a trusted cert, set Verify TLS to Off. Prefer installing a real certificate.
Mailbox monitor
Count messages in a mailbox without IMAP/SMTP. Pick a device, give it one or more directory paths, and the agent counts the regular files in each (a Maildir new folder holds one file per unread message). Tick "Show on dashboard" to surface a device's count as a tile on the Home dashboard.
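The counting step is simple enough to sketch. This is an illustration of "count the regular files in a directory", not the agent's actual code; the function name is hypothetical, and returning 0 for a missing directory is an assumption.

```python
from pathlib import Path


def count_messages(directory: str) -> int:
    """Count regular files directly inside a directory (subdirectories are skipped)."""
    path = Path(directory)
    if not path.is_dir():
        return 0  # assumption: a missing or unreadable path reports zero
    return sum(1 for entry in path.iterdir() if entry.is_file())
```

Pointed at a Maildir new folder, the result is the number of unread messages, since that folder holds one file per message.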
AI provider
Wire in an LLM for the AI buttons on the command output, journal, scripts, CVE findings, devices, and notifications. Disabled by default. Cloud providers send the content of the request to a third party; pick a local provider (Ollama or LocalAI) if you don't want data to leave the building.
Privacy — what gets sent to the provider
By default, hostnames and IP addresses are stripped from every request before it leaves the building. Long hex strings, bearer tokens, and AWS access keys are always redacted regardless of these toggles. Cleartext journal content and command output are sent only when you explicitly opt in.
Context awareness (v2.1.7)
A small block of background ("you are an assistant inside RemotePower; the agent polls every 60s…") plus a one-line-per-device fleet snapshot is prepended to every AI request. The model stops giving generic Linux advice and starts giving advice that references your devices and your conventions. Project context is non-sensitive. Fleet context contains hostnames and group names by design — if you're on a cloud provider and don't want hostnames egressing, turn the fleet toggle off (project context will stay on).
Knowledge index (RAG) (v3.4.0)
Indexes your own infrastructure — device state, watched services, CVEs, containers, CMDB metadata & docs, runbooks, recent commands and alerts, plus the product docs — so the assistant answers from your fleet instead of generic knowledge. Lexical (keyword) search works with every provider, including Anthropic. Optional embeddings add semantic search when you run an embedding-capable provider. The credentials vault is never indexed.
Test retrieval
See exactly which indexed chunks a question would pull in — no model call, no tokens spent. Useful for checking coverage before you trust the assistant's answers.
Limits
Bounds to keep cloud-provider costs predictable. The token limit is per response. The daily request cap is per user — set to 0 to disable.
Test connection
Round-trip a one-word "say hi" request against the configured provider. Saves a settings round-trip before you start using the AI buttons in anger.
Prompt customization
Each AI feature uses a system prompt to set tone and constraints. Defaults are tuned for general-purpose models. Edit if your model needs different wording (e.g. DeepSeek reasoning models, smaller local models, format-specific quirks). Clearing a field reverts to the default for that feature.
Brute-force detection
Counts failed login attempts per source IP in a rolling window. Fires a webhook and shows in Needs Attention when the threshold is exceeded.
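A rolling-window counter of this kind can be sketched as follows. The class name, the in-memory storage and the choice that "exceeded" means strictly more failures than the threshold are all assumptions for illustration.

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """Track failed logins per source IP inside a rolling time window."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one failure; return True when the IP is over the threshold."""
        recent = self._failures[ip]
        recent.append(now)
        # Drop failures that have aged out of the window.
        while recent and recent[0] <= now - self.window_s:
            recent.popleft()
        return len(recent) > self.threshold
```

Each IP keeps its own window, so one noisy source never pushes another over the limit.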
Health-score alerts
Fires health_degraded (and a health_recovered follow-up) when a device's fleet health score drops below the threshold. Edge-triggered — one alert per crossing, not every heartbeat. Set to 0 to disable.
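Edge-triggering means the alert depends on the previous state as well as the current score. A minimal sketch, assuming the caller stores one boolean per device; the function name and return shape are hypothetical, while the event names come from the description above.

```python
def health_event(was_degraded: bool, score: float, threshold: float):
    """Return (event_or_None, is_degraded_now) for one heartbeat.

    An event is produced only when the score crosses the threshold,
    never while it stays on the same side of it.
    """
    if threshold <= 0:
        return None, False  # a threshold of 0 disables the check
    degraded = score < threshold
    if degraded and not was_degraded:
        return "health_degraded", True
    if was_degraded and not degraded:
        return "health_recovered", False
    return None, degraded
```

Feeding it scores of 90, 60, 55 and 80 against a threshold of 70 yields one health_degraded (at 60) and one health_recovered (at 80); the heartbeat at 55 stays silent.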
Quiet hours
Hold non-critical webhook/email notifications during a daily window — events still land in the Alerts inbox and Recent Activity, they just don't page you overnight. Anything at or above the chosen severity always goes through. The window may cross midnight (e.g. 22:00–07:00).
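The only subtle part is a window that crosses midnight. The sketch below shows one way to handle it; the severity names and their ordering are hypothetical and stand in for whatever levels the product defines.

```python
from datetime import time

# Hypothetical severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 22:00 to 07:00


def should_hold(now: time, start: time, end: time, severity: str, bypass_at: str) -> bool:
    """Hold a notification during quiet hours unless it meets the bypass severity."""
    if SEVERITY_RANK[severity] >= SEVERITY_RANK[bypass_at]:
        return False
    return in_quiet_hours(now, start, end)
```

For a 22:00 to 07:00 window, 23:30 and 06:59 are quiet, while 07:00 and midday are not.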
After-hours activity detection
Flag selected events that fire outside business hours (a login, a new port, a command at 3am is more suspicious). Surfaces as a Needs Attention item. Applies Mon–Fri within the window below; anything outside it counts as after-hours.
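The rule "Mon–Fri within the window is business hours, everything else is after-hours" can be written directly. A sketch only: the function name is hypothetical, and a business window that does not cross midnight is assumed.

```python
from datetime import datetime, time


def is_after_hours(ts: datetime, start: time, end: time) -> bool:
    """True when `ts` is outside Mon-Fri business hours [start, end)."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Friday is 4
    in_window = start <= ts.time() < end
    return not (is_weekday and in_window)
```

A login at 03:00 on a Wednesday and one at 10:00 on a Saturday both count as after-hours; 10:00 on a Wednesday does not.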
On-call & escalation
An unacknowledged critical/high alert should get louder. Define escalation tiers (re-notify your webhook destinations after N minutes unacked) and an on-call rotation (the named person is included in the escalation message). Escalations re-fire the original alert through its existing channels — no extra setup.
Backup file monitoring
Agent checks the mtime of each configured path on every heartbeat. Fires backup_stale and shows in Needs Attention when a file is older than the threshold.
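A staleness check based on mtime might look like the sketch below. The function name is hypothetical, and treating a missing file as stale is an assumption rather than documented behaviour.

```python
import os
import time


def stale_paths(paths, max_age_s: float, now: float = None):
    """Return the paths whose last modification is older than max_age_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.stat(path).st_mtime
        except OSError:
            stale.append(path)  # assumption: a missing backup counts as stale
            continue
        if age > max_age_s:
            stale.append(path)
    return stale
```

Any path returned here would correspond to a backup_stale condition for that file.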
Log ignore patterns
Regex patterns (case-insensitive). Any log line matching a pattern is silently dropped before storage and alerting — for harmless kernel notes, vendor noise, etc.
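As a rough illustration, dropping lines against case-insensitive patterns takes a few lines of Python. The helper names and sample patterns are made up, and matching anywhere in the line (search rather than a full-line match) is an assumption.

```python
import re


def compile_ignores(patterns):
    """Compile ignore patterns once, case-insensitively."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def filter_lines(lines, ignores):
    """Keep only the lines that match none of the ignore patterns."""
    return [line for line in lines if not any(rx.search(line) for rx in ignores)]


ignores = compile_ignores([r"acpi .* bios bug", r"^vendor-noise:"])
kept = filter_lines(
    ["ACPI Warning: BIOS bug detected", "sshd: Failed password for root", "Vendor-Noise: ping"],
    ignores,
)
# kept == ["sshd: Failed password for root"]
```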
Channel routing
For every alert kind, choose which surfaces it appears on. Needs Attention = the priority cards on Home; Recent Activity = the home feed; Alerts = the alerts inbox; Webhook = external delivery (Slack, Discord, etc.). Unchecking a column for a row silences that kind on just that surface — uncheck whole rows for a kind you never want to hear about.
Ignored items
Items you've hidden using the × button on Needs Attention cards, device rows on the Containers page, or stale container rows. Click Restore on any entry to bring it back into view.
Click Generate IaC to begin.
| Time | Actor | Device | Command |
|---|---|---|---|
| Device | Command | Scheduled for | By | |
|---|---|---|---|---|
| No scheduled jobs. | ||||
What's new — v3.13.0 — bind round four: surface every signal the agent already collects, cap overflowing boxes, plus perf & hardening
This release is a bind-it-together sweep: a lot of data the agent has always reported was stored and even served by the API, but had no home in the UI. v3.13.0 wires those signals into the device drawer and device cards, caps every panel so it scrolls instead of growing without bound, and adds a round of performance and security hardening. Hard-reload once after upgrading (cache remotepower-shell-v3.13.0).
Newly surfaced host signals
- Access — recent logins. The device drawer now lists who logged in and from which distinct source IPs (the same data the new login source alert fires off). The highest-value security signal on a host finally has a table.
- Scheduled jobs / timers. A failed-first table of every systemd timer (with what it activates and its last state), per device.
- Pools / arrays. This host's own ZFS / mdadm / btrfs storage health (state, capacity, scrub) — previously only on the fleet Storage page.
- Listening ports gain Address & Scope. The drawer Ports card now shows the bind address and a world / LAN / local badge for each socket, so you can see that :22 is bound 0.0.0.0 right where you're inspecting the host.
- Firewall ruleset summary. The Firewall card shows the active backend, rule count and fingerprint (the drift baseline the firewall changed alert compares against).
- Brute-force lockouts & pressure pills. Active brute-force sources now show as a badge on the device card, and the drawer adds at-a-glance Disk and Swap pills.
- Named drift profiles. Reusable sets of watched config files, managed on the Drift page and assigned to a device, tag, or group — instead of editing each host's watched-file list by hand. A device's own explicit list still overrides any profile, and the drift detail explains which rule won.
- Network-mount trends. NFS/SMB/CIFS shares now appear as their own line on the Trends chart and feed disk-fill forecasting, alongside local filesystems.
- Controller backup & restore. Settings → Advanced → Backup & restore adds a full disaster-recovery backup (tar.gz of the whole data dir, including the encrypted vault) and a restore that takes a safety snapshot first — in addition to the existing redacted ZIP export.
Every box now fits
- Capped, scrolling panels. Drawer cards and large page tables (Compliance, Pools, Ports, Mounts, Containers, SMART) cap at roughly 15 rows and scroll internally instead of stretching the page. Two latent clip bugs (host-config dump, patch history) where content was cut off with no scrollbar are fixed.
Performance & hardening
- Faster loads. Version-busted static assets are now cached immutably for a year (no more per-load revalidation round-trips), scripts load deferred, and the fleet-risk computation is file-cached for 10s so /api/home and /api/risk share the work.
- Tighter security. Agent-supplied SCAP reports are served under a self-contained sandboxed CSP (stored-XSS can't reach an operator's session even if the upstream policy is loosened); OIDC id_tokens are now checked for expiry, issuer and audience; and the syslog audit-forwarder resolves its target once to close a DNS-rebinding window.
See docs/v3.13.0.md for the full list.
What's new — v3.12.0 — optional SQLite storage backend (switch from flat JSON in Settings → Advanced)
RemotePower has always stored its state as flat-JSON files. That's simple and robust, but every write rewrites a whole file — on a busy fleet devices.json is rewritten on every heartbeat. v3.12.0 adds an optional embedded SQLite backend (WAL mode, stdlib only — no new dependencies) you can switch to under Settings → Advanced → Storage backend. Hard-reload once after upgrading (cache remotepower-shell-v3.12.0).
What you get
- Decomposed storage. Hot, high-cardinality data (devices, alerts, command output, the history / events / metrics logs) is stored row-per-entity instead of as one big document. A device update writes a single row rather than rewriting the whole file, and appends to the event/command logs are O(1).
- In-place, reversible migration. The switch takes a rollback snapshot, migrates every file, verifies a full round-trip, and only then flips the active backend. You can switch back to JSON the same way. A Preview button shows what would move without changing anything.
- Same behaviour, same API. The backend lives behind the existing storage layer, so every feature works identically — only the on-disk representation changes. Flat JSON remains the default; nothing changes until you opt in.
- Self-maintaining. The database checkpoints its write-ahead log hourly and runs a VACUUM + integrity_check weekly. DB size and the last integrity verdict appear on Server Status; a failed integrity check raises a critical Database integrity check failed alert.
Quieter posture alerts
- Listening-port & firewall audit is now opt-in. The New listening port, World-exposed service and Host firewall changed alerts are noisy on Docker hosts (where docker-proxy publishes every container port to 0.0.0.0), so they are off by default. Turn them on under Settings → Security → Listening-port & firewall audit.
- Surgical mutes. Prefer to keep the audit on but silence one noisy service? Use the per-process Mute button on the Exposure page (or manage rules in Settings → Security). Muting resolves matching open alerts in the same click; turning the whole audit off clears the backlog at once.
- Discovery. When alerting is off but world-reachable services exist, the Exposure page shows a banner linking back to the toggle — you keep the visibility either way.
My Account & ticketing
- My Account. A new account menu in the top-right corner (your avatar → My Account / Sign out). The My Account page gathers your personal settings in one place — profile picture, your role & permissions, 2FA and default SSH username (both moved here from Settings → Security), and My acknowledged alerts.
- Acknowledge → ticket webhook. Each webhook destination now has an “Also fire on alert ACK” option. When you acknowledge an alert, the flagged destinations receive the full alert (id, severity, device, payload, who/when) — point a generic / GitHub-issue / PagerDuty destination at your ticket system to open a ticket the instant a human takes ownership.
CLI: tools/migrate_storage.py --to sqlite (and --to json to revert; --dry-run / --verify-only supported). On a network filesystem (NFS/CIFS) WAL is unsafe, so a rollback journal is used automatically — a local disk is strongly recommended. See docs/v3.12.0.md.
What's new — v3.11.0 — fleet posture batch: exposure map, software policy, storage health, access watch, firewall drift, timer failures, posture digest
Seven features that turn already-collected (or cheaply-collectable) agent data into first-class security and operational signals. No new daemons, no new dependencies. Hard-reload once after upgrading (cache remotepower-shell-v3.11.0).
New views & signals
- Exposure (attack surface). The agent kept each listening socket's port but discarded its bind address, the one attribute that says whether a service is reachable from the world. It now classifies a scope (local / lan / world); a service first binding to a world-reachable address raises port_exposed_world. New Exposure page with a World/LAN/Local filter.
- Fleet Software Policy. Rules (banned / required / min version, optionally tag-scoped) are evaluated against the installed-package inventory every host already pushes. Violations are tracked and software_policy_violation fires edge-triggered. New policy editor + violations table.
- Storage / RAID health. New agent probe for ZFS / mdadm / btrfs pool state, capacity and last-scrub. storage_degraded / storage_recovered (auto-resolving) and scrub_overdue. New Storage page, degraded-first.
- Access watch. Successful logins and their source IPs are collected; login_new_source fires on a first-seen source address.
- Host firewall drift. A stable fingerprint of the active ufw/nftables/iptables ruleset rides the heartbeat; firewall_changed fires when it diverges from baseline.
- Scheduled-job failures. Systemd timers are inventoried; timer_failed fires when a timer's backing job fails, catching the silent dead-backup case.
- Scheduled posture digest. An opt-in daily/weekly email summarising offline hosts, pending updates, critical CVEs, policy violations and degraded storage, over the existing SMTP path. "Send test now" in Settings.
All detections are edge-triggered with per-device state, so a steady-state condition doesn't re-alert each heartbeat. See docs/v3.11.0.md.
What's new — v3.10.0 — fleet-wide container restart tracking, registry-SSRF & config-leak fixes, more bound signals
A bind-it-together and security sweep on top of v3.9.0 — agent data that was collected but stuck at zero now flows through, two real SSRF / secret-disclosure gaps are closed, and a couple of alert-label bugs are fixed. No new headline features. Hard-reload once after upgrading (cache remotepower-shell-v3.10.0).
Security
- Container image-registry SSRF closed. The image-update check was the one outbound path that didn't use the connect-time SSRF guard: it followed redirects, re-resolved DNS between the pre-flight and the fetch, and fetched the registry-controlled bearer-token realm URL with no check at all (which could exfiltrate configured registry credentials). Every fetch, manifest and token realm alike, now routes through the SSRF-safe opener (peer-IP re-validation, redirects refused) and the realm is pre-flighted and forced to HTTPS.
- GET /api/config secret-scrub backstop. The endpoint redacted known secrets by name, which meant the AI provider api_key and the per-registry credentials map leaked to any viewer or read-only MCP key. A recursive pass now strips any secret-named field at any depth (keeping every *_set / *_from_env indicator), so a newly-added config secret can't leak before someone remembers to redact it.
- The TCP uptime monitor and the Healthchecks.io ping picked up the same IP-class SSRF checks (and connect-time peer recheck) the HTTP paths already had, so a TCP monitor can no longer be used as a blind internal port scanner. See docs/security-review-3.10.0.md.
Bind it together
- Container restart tracking now works fleet-wide. Docker/Podman containers reported restart_count, started_at and uptime_seconds hardcoded to zero, so the container restarting alert only ever fired for Kubernetes pods and the drawer's container age was blank. The agent now fills them from a single batched docker inspect per heartbeat, so the alert fires everywhere and container age renders.
- ClamAV last-scan time (parsed from the scan summary) and per-interface MAC addresses now show in the device drawer; both were collected/stored but never displayed.
Fixes
- The config-drift alert title read a field the drift events never send, so every one said “? file(s)”. It now names the file that changed (file-integrity drift) or the number of sections that drifted (host-config drift).
- The Devices table view showed a sort arrow on the Hostname column but never reordered (the sort key was missing); it sorts correctly now.
What's new — v3.9.0 — bind-it-together round two, alerting/patch fixes, monitor SSRF, polish
A second bind-it-together and hardening sweep on top of v3.8.0 — more dropped agent data wired into the UI, a few correctness bugs fixed in the alerting and patch-verification paths, an SSRF gap closed in the uptime monitor, and front-end polish. No new headline features. Hard-reload once after upgrading (cache remotepower-shell-v3.9.0).
Security
- HTTP uptime-monitor SSRF closed. The monitor's http/https check used a string-prefix blocklist and a bare fetch, bypassable with an IPv6 loopback ([::1]), an integer-encoded IPv4, or a hostname that rebinds to a metadata/loopback address. It now runs through the same connect-time SSRF guard as the webhook/audit/OIDC channels (the connected peer IP is re-validated, redirects refused). RFC1918 LAN targets stay allowed by design. See docs/security-review-3.9.0.md.
- Inbound-webhook alert links are scheme-validated (http(s) only) at ingest, matching the operator quick-links and CVE reference-links.
Fixes
- “Patched — didn't take” no longer cries wolf. The post-upgrade verification badge wrongly flagged hosts that had nothing to patch (a fleet-wide upgrade touches already-patched hosts) and offline hosts whose command was still queued. Both now read as “nothing to verify” / “pending” instead of a failure.
- A stray return in the metric-threshold engine could skip per-mount disk alerting for a heartbeat when CPU load was easing back through the recovery band. Fixed.
- TLS-expiry alerts now carry the correct severity and title (they read the wrong field, so every cert-expiry alert was labelled high / “expires in ?d”).
Bind it together & polish
- CPU-load history is now plotted on the Trends page (load ÷ cores, as a saturation %), and swap joins the per-device metrics sparkline — both were collected but never charted.
- rkhunter last-run time shows on the AV pill; the systemd alias a watched unit resolved to (e.g. mysql.service → mariadb.service) shows in the Services table; livepatch state shows on the kernel pill when no patch is applied.
- Three tables gained their missing sort wiring (Log Alert per-device & fleet-wide rules, Maintenance suppression log); typographic button glyphs were replaced with Lucide SVG icons; icon-only close buttons gained aria-labels.
- Command Queue: ACME actions now show in the “recently dispatched” log (they used to queue invisibly), plus Clear all pending and Clear log buttons next to the existing per-command cancel and per-device clear.
- Image Updates: a one-click Update button on stale, compose-managed rows runs docker compose pull + up -d to fetch the new image and recreate the container. The agent now recovers the real image name when docker ps shows a bare untagged ID (right after a pull), and rows show the container name, so an image is never just an anonymous sha256:….
Older releases — v3.8.0 and earlier
Per-release notes for the five most recent versions are kept above. The complete, forensic release history — every version, newest first — lives in CHANGELOG.md at the repository root.
Enrolling devices — get a host into RemotePower
Quick (interactive): Click "Enroll device" in the dashboard. You get a 6-digit PIN, valid for 10 minutes. On the target machine: sudo remotepower-agent enroll, paste the PIN.
Automated (Ansible / cloud-init): Use the API to mint a one-time-use enrollment token (Settings → API or POST /api/enrollment-tokens), pass it via $REMOTEPOWER_ENROLL_TOKEN or /etc/remotepower/enroll-token, then run remotepower-agent enroll-token --server https://<host>. The token is consumed atomically, so the same one can't enroll twice. Default expiry is 24h, with a maximum of 7 days.
The token resolution order is: --token arg → environment variable → on-disk file (mode 600, auto-deleted on success).
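The lookup order can be sketched as a simple fall-through. This is not the agent's own code: the function name is hypothetical, and the sketch only reads the token, leaving out the mode-600 check and the delete-on-success step.

```python
import os
from pathlib import Path


def resolve_enroll_token(cli_token, env=None, token_file="/etc/remotepower/enroll-token"):
    """Return the first token found: --token arg, then environment, then file."""
    env = os.environ if env is None else env
    if cli_token:
        return cli_token
    env_token = env.get("REMOTEPOWER_ENROLL_TOKEN", "").strip()
    if env_token:
        return env_token
    path = Path(token_file)
    if path.is_file():
        return path.read_text().strip() or None
    return None
```

An explicit argument always wins, which makes it easy to override a token baked into an image or a cloud-init file.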
Metric alerts — disk / memory / CPU thresholds
RemotePower fires metric_warning, metric_critical, and metric_recovered webhooks when a device's resource crosses its configured threshold. Defaults:
| Metric | Warning | Critical |
|---|---|---|
| Disk usage (per mount) | 80% | 90% |
| Memory | 85% | 95% |
| Swap | 20% | 50% |
| CPU load ratio (loadavg / cpu_count) | 1.5× | 3.0× |
Per-device overrides: Devices page → ⋯ menu → "Metric thresholds". The modal shows current sysinfo values for context, then warn/crit fields per metric (empty = use default). Per-mount disk overrides go in the bottom section — useful when /var fills with logs or /backup is meant to fill.
Hysteresis: a metric must drop 5 points below its warn threshold before "recovered" fires. Without this, oscillation around 80% would generate webhook spam every 60s.
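The hysteresis rule can be sketched as a small state function. This is an illustration under stated assumptions, not the server's code: the function name is hypothetical, the defaults are the disk thresholds from the table, and stepping down from critical to warning inside the margin is a guess at the behavior.

```python
# Disk defaults from the table above; the 5-point margin is from the text.
WARN, CRIT, MARGIN = 80.0, 90.0, 5.0

def next_level(prev: str, value: float, warn: float = WARN,
               crit: float = CRIT, margin: float = MARGIN) -> str:
    """Return 'ok', 'warning' or 'critical' for a new sample."""
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    # Below warn: a metric that was alerting stays in warning until it
    # has dropped the full margin below the warn threshold.
    if prev != "ok" and value > warn - margin:
        return "warning"
    return "ok"
```

With these defaults a disk bouncing between 79% and 81% enters warning once and stays there; "recovered" would only fire when it reaches 75% or lower.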
Trends over time: Monitor page → Device metrics row → "Trend" button. Same chart as the per-device sysinfo modal.
Web terminal — open SSH from the browser
Click "Web terminal" on a device. Modal asks for SSH user, port, password, plus your RemotePower admin password (re-prompted every time, by design).
Architecture: a separate remotepower-webterm daemon handles the WebSocket+SSH bits because RemotePower's CGI can't hold persistent connections. nginx proxies /api/webterm/connect to it. SSH credentials live only in memory for the duration of the session — never persisted.
Setup (one-time): sudo bash packaging/install-webterm.sh. Auto-detects the CGI user (www-data, nginx, http, etc.), installs Python deps, generates the daemon ↔ CGI shared secret, prints the nginx snippet you need to add. Run with --dry-run first if you want a preview.
Recordings: Every session recorded to /var/lib/remotepower/webterm-sessions/<id>.cast in asciinema v2 format. Output-only by default (set RECORD_INPUT=1 in daemon env to capture keystrokes too — only do this if you've thought about who can read the recordings dir). Replay with asciinema play <file>.
Commands — run shell, shutdown, reboot, upgrade
Per-device dropdown menu has the common actions: shutdown, reboot, Wake-on-LAN (if device has a known MAC), agent self-update, upgrade packages, custom command.
Commands queue in cmds.json. The agent picks them up on its next heartbeat (default 60s, configurable per-device). Output comes back on the heartbeat after execution finishes — for apt upgrade that can be a few minutes.
Batch mode: tick checkboxes on multiple devices; the batch bar at the bottom of the page then lets you run any of those actions across all selected devices at once.
Custom commands: "Custom command" in the menu, or save reusable ones to the Library (Admin → Library) and pick from a dropdown when you run them.
Webhooks — get notified when things happen
Settings → Webhooks. RemotePower auto-detects the format from your URL: Discord webhook URLs get embed-style messages; ntfy URLs get priority + emoji tags; everything else gets generic JSON.
Events you can subscribe to: device_offline / device_online, monitor_down / monitor_up, service_down / service_up, patch_alert, cve_found, log_alert, container_stopped / container_restarting, metric_warning / metric_critical / metric_recovered, command_queued / command_executed.
Each event can be toggled independently. The "Send test event" button fires a sample payload so you can verify your endpoint works before relying on it.
External monitors — ping/TCP/HTTP probes
Monitor page → "Add target". The server runs these probes itself, not the agents — useful for checking your ISP gateway is up, a web service is responding, or an SSH port is open.
Probes run on a schedule (monitor_interval, default 300s, minimum 60s). The schedule piggybacks on incoming CGI requests, so as long as any agent is heartbeating or anyone's browsing, monitors run on time. State transitions fire monitor_down / monitor_up webhooks.
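A piggybacked schedule boils down to a "is it due yet?" check at the start of each request. A minimal sketch, assuming a stored last-run timestamp; the function and parameter names are hypothetical, and only the 300 s default and 60 s minimum come from the text.

```python
import time
from typing import Optional

MIN_INTERVAL = 60  # documented minimum, in seconds

def monitors_due(last_run: float, interval: int = 300,
                 now: Optional[float] = None) -> bool:
    """True when enough time has passed since the last probe run."""
    now = time.time() if now is None else now
    # An interval below the minimum is clamped up to it.
    return now - last_run >= max(interval, MIN_INTERVAL)
```

Each incoming request would call this once and run the probes only when it returns True, so an idle server with no requests runs no probes.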
Proxmox virtualization — managing QEMU VMs
RemotePower can connect to one Proxmox VE node and manage its guests. Configure it under Settings → Proxmox: the node's host, the node name, and a Proxmox API token (created in Proxmox under Datacenter → Permissions → API Tokens). Use a scoped token — VM.Audit and VM.PowerMgmt are enough — not a full-access token. There's a "Test connection" button to verify the setup.
Once configured, the Virtualization page lists the node's QEMU virtual machines: status, CPU and memory while running, uptime. Each guest has Start and graceful Shutdown actions. The connection is server-to-API — the RemotePower server talks to the Proxmox REST API directly; no agent runs on the Proxmox node.
This is a single-node integration. The Virtualization nav entry is always visible; if Proxmox isn't configured the page simply tells you so.
Proxmox LXC containers
LXC containers on the Proxmox node appear on the Containers page, in a section below the agent-reported Docker/Podman containers. They carry the same Start and Shutdown actions as QEMU VMs.
The LXC section only appears once Proxmox is configured (Settings → Proxmox). Like the VM list, it's fetched live from the Proxmox API each time you open the page.
Snapshots & rollback — point-in-time guest states
Each Proxmox guest — QEMU on the Virtualization page, LXC on the Containers page — has a Snapshots button. It opens a panel listing the guest's snapshots and lets you create, roll back, and delete them.
Create — give the snapshot a name (a letter followed by letters, digits or underscores) and an optional description. Snapshots are disk-only — the VM's RAM state is not captured. A typical workflow: take a snapshot named before_upgrade immediately before a risky change.
Rollback — returns the guest to a snapshot's state. This is destructive: every change made since the snapshot was taken is discarded. Because of that, RemotePower asks you to type the guest's name to confirm — a deliberate speed bump, not just an OK button. After a rollback the guest is at a crash-consistent disk state (no live memory, since snapshots are disk-only).
Delete — removes a snapshot. Irreversible, but it does not affect the running guest — only the saved point-in-time state is gone.
Snapshot operations run asynchronously on Proxmox. RemotePower sends the request and refreshes the list shortly after; a slow operation may not show on the first refresh — reopen the panel. Every snapshot action is recorded in the fleet event log.
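The naming rule under Create can be checked before a request is sent. A minimal sketch, assuming the rule is exactly as worded above (ASCII letters only, no length limit); the pattern and function name are illustrative, not taken from RemotePower or Proxmox.

```python
import re

# Assumed reading of "a letter followed by letters, digits or underscores".
# Any length limit enforced elsewhere is not modelled here.
SNAPSHOT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_snapshot_name(name: str) -> bool:
    return SNAPSHOT_NAME.fullmatch(name) is not None
```

Under this reading before_upgrade is accepted, while pre-upgrade (hyphen) and 1st_try (leading digit) are rejected.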
Quick SSH from the Devices page
Set a default SSH username under Settings → Security → SSH preferences. It's stored per-user.
On the Devices page, a small SSH icon appears next to each device's hostname. Clicking it builds an ssh:// link to that device — using the device's IP if known, otherwise its hostname — with your default username, and also copies the plain ssh user@host command to your clipboard.
Whether the ssh:// link opens a terminal depends on your own machine having an ssh:// handler registered (the OS, PuTTY, a terminal emulator). If it doesn't, the copied command is the reliable path — paste it into any terminal.
Mailbox monitor — unread-message counts
A lightweight way to see a mailbox's message count without any IMAP/SMTP setup. Configure it under Settings → Mailbox monitor: pick a device, enter one or more absolute directory paths, and Save. The agent counts the regular files directly inside each directory and reports the numbers in its heartbeat — for a Maildir new/ folder that file count is the unread-message count.
Tick "Show this device's mailbox count on the dashboard" to add an "Unread mail" tile to the Home dashboard, alongside the devices / updates / drift / CVE tiles.
Counts refresh on the agent's schedule — roughly every five minutes, not instantly. The agent needs read access to the directory; if it doesn't have it, the count shows the reason (e.g. "permission denied") rather than a number. No email content is ever read — only files counted.
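Counting "regular files directly inside a directory" can be sketched as follows. The function name and the exact error strings are assumptions; the real agent may word failures differently.

```python
import os
from typing import Union

def count_mailbox(directory: str) -> Union[int, str]:
    """Count regular files directly inside `directory` (no recursion).

    Returns a short reason string instead of a number on failure.
    """
    try:
        with os.scandir(directory) as entries:
            # Subdirectories and symlinks are not counted.
            return sum(1 for e in entries if e.is_file(follow_symlinks=False))
    except OSError as exc:
        return (exc.strerror or "error").lower()
```

Only directory entries are inspected; no file is ever opened, which matches the "only files counted" guarantee above.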
Scan packages now — on-demand inventory
The agent submits its full package inventory (used for CVE scanning) and the patch/upgradable count only every few hours. Right after you patch a host, use the "Scan packages now" item in the device action menu to get a fresh report sooner.
It sets a one-shot request; the device sends a fresh package list and patch count within a heartbeat or two — typically a minute or two, not instantly. The agent must be running 2.4.5 or newer to act on it.
Status endpoint — for external dashboards
RemotePower can expose a small machine-readable fleet summary at /api/status for external dashboard tools — Uptime Kuma, Homepage, a Grafana panel.
Generate a status token under Settings → Advanced → Status endpoint. The endpoint then answers at /api/status?token=YOUR_TOKEN — reachable by a polling tool, but not public. It returns a rolled-up health word (ok / warning / critical), device online/offline counts, and attention counts by severity. Rotate or disable the token any time from the same place.
Two-factor authentication (TOTP)
Settings → Security → "Enable two-factor". Scan the QR with any authenticator app (1Password, Authy, Google Authenticator, etc.). After enabling, every login asks for a 6-digit code in addition to your password.
To disable, you need to authenticate with a current TOTP code first — prevents someone with stolen session cookies from removing your second factor.
Tables: filter, sort, density
Every fleet table — Devices, Services, CVE Findings, Containers, Monitor, TLS, Patches, Audit Log, Command History, Schedule, Maintenance, plus admin tables — has a substring filter and clickable column headers.
First click sorts ascending, second descending, third clears. Hold Shift to add a secondary sort key (small superscript shows the priority).
The Devices grid has four density modes: Minimal (table layout, multi-select via checkboxes), Compact, Comfortable, Spacious. All preferences (filter, sort, density) sync per-user across browsers.
Backup & restore
All state lives in /var/lib/remotepower/ as JSON files. To back up:
sudo tar czf rp-backup-$(date +%F).tar.gz /var/lib/remotepower/
To restore: stop nginx briefly so no writes interleave, untar, restart.
v1.12.1+ keeps a rolling .bak next to each file. If a file ever ends up corrupted, load() automatically falls back to the .bak with a warning to the nginx error log. The dashboard keeps working with last-known-good data.
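The fallback behavior can be pictured with a short sketch. This is not the server's load() itself: the function name, the default argument, and the warning text are assumptions, and only the ".bak next to each file" layout comes from the text.

```python
import json
import sys
from pathlib import Path

def load_with_fallback(path: Path, default=None):
    """Read JSON from `path`; if that fails, try `<path>.bak`."""
    backup = path.with_name(path.name + ".bak")
    for candidate in (path, backup):
        try:
            return json.loads(candidate.read_text())
        except (OSError, ValueError):
            # ValueError covers malformed JSON. Under CGI, stderr usually
            # ends up in the web server's error log.
            print(f"warning: could not load {candidate}", file=sys.stderr)
    return default
```

A truncated or half-written file fails JSON parsing, so the last-known-good copy is returned instead of an error.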
Troubleshooting common issues
Devices show "Offline" but agent is running: agent token doesn't match server. Check journalctl -u remotepower-agent for "Credentials rejected" — re-enroll on the device with sudo remotepower-agent enroll.
Web terminal fails with 404: nginx routes /api/webterm/connect to fcgiwrap instead of the daemon. The exact-match location = /api/webterm/connect block must come BEFORE any location ^~ /api/ in the same server block.
Web terminal fails with 502: nginx is correctly trying the daemon but the daemon isn't running. sudo systemctl status remotepower-webterm and journalctl -u remotepower-webterm.
Per-mount disk thresholds aren't taking effect: Agent must be v1.11.10+ to report per-mount data. Push an agent self-update from the toolbar.
Update history is empty for old agents: Pre-v1.11.7 had a bug where command output never reached the server. Push agent self-updates; the next upgrade will populate history.
Full reference: Manual.html.
API access — automation and integrations
Every dashboard action has an underlying API endpoint. Browse the full schema at /swagger.html (also linked from the sidebar as "API Reference").
Auth: two methods. Session tokens (login flow, time-limited) for short-lived scripts. Named API keys (Admin → API Keys → "New key") for CI / cron / Ansible — these don't expire. Pass via X-Token: <token> header.
Common patterns:
# List devices
curl -H "X-Token: $T" https://remote.example.com/api/devices | jq
# Reboot a specific device
curl -X POST -H "X-Token: $T" -H "Content-Type: application/json" \
-d '{"device_id":"abc123"}' \
https://remote.example.com/api/reboot
# Set per-device thresholds
curl -X PATCH -H "X-Token: $T" -H "Content-Type: application/json" \
-d '{"mem_warn_percent": 70, "disk_per_mount": {"/var": {"warn":70,"crit":85}}}' \
https://remote.example.com/api/devices/abc123/metric-thresholds
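The same calls can be made from Python with only the standard library. A minimal sketch: build_request is a hypothetical helper, and the host, device ID, and token are the placeholder values from the curl examples above.

```python
import json
import urllib.request
from typing import Any, Optional

def build_request(base_url: str, path: str, token: str,
                  method: str = "GET",
                  payload: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the X-Token header."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path, data=body, method=method
    )
    request.add_header("X-Token", token)
    if body is not None:
        request.add_header("Content-Type", "application/json")
    return request

# To send it: urllib.request.urlopen(build_request(...), timeout=10)
```

Keeping construction separate from sending makes the helper easy to unit-test without a live server.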
Devices page — your fleet, at a glance
Home page. Every enrolled device shown as a card or table row, depending on density. Each card shows: status (online/offline), name, OS icon, last-seen time, group badge, tag pills, key sysinfo (CPU/RAM/disk sparklines if metrics are flowing), pending updates count, open CVEs count.
Density modes: top-right toggle. Minimal = table, one row per device, sortable, multi-select. Compact = small cards. Comfortable = default. Spacious = roomy cards with bigger metric blocks.
Per-device dropdown (⋯ button) covers the common actions: reboot, shutdown, custom command, agent update, upgrade packages, web terminal, metrics chart, metric thresholds, edit notes, edit tags, change group, delete.
Batch actions: tick checkboxes on multiple devices; the batch bar at the bottom then lets you run any of the above across the selection.
Filter & sort: top-bar substring filter matches name, hostname, group, tags, IP, OS. Click any column header (in minimal mode) to sort; Shift-click for secondary sort. Filter and sort persist per user.
CMDB page — asset metadata + credentials vault
Per-device structured metadata that the agent doesn't auto-discover: asset ID, server function (web / db / nas / firewall…), hypervisor URL, SSH port, free-form Markdown documents.
Multiple documents per asset (v2.0): attach as many titled Markdown docs as you want — runbook, hardware spec, change log, vendor contacts. Each doc has its own title, body, and timestamp. Docs render as expandable cards in the asset view.
Credentials vault: store SSH passwords, BMC credentials, vendor portal logins encrypted at rest. Encryption is AES-GCM with PBKDF2-SHA256-derived keys; the passphrase is shared across admins (set once at vault setup). Reveal events are audit-logged. Each credential row exposes an ssh:// link button if the asset has an SSH user/port set.
Server functions list: editable in Settings, used as autocomplete suggestions when filling the function field on an asset.
Containers page — what's running where
All containers across your fleet, in one table. Auto-detected from each agent: Docker (via socket), Podman (rootless or root), Kubernetes pods (via kubelet, where applicable). Read-only — RemotePower doesn't manipulate containers, just observes them.
Per-row info: device, container name, image, status, restart count, ports, namespace (k8s). Filter by name, image, device, or namespace.
Alerts: three webhook events fire automatically — container_stopped (unexpected stop), container_restarting (restart count climbed), containers_stale (device hasn't reported container info recently — usually means the agent's container detection isn't working).
Network page — topology graph
Manual topology map drawn from each device's connected_to field. RemotePower doesn't auto-discover topology — set the upstream switch / router for each device, and the graph renders.
Switches, APs, and other agentless devices live as full-fledged records in the device list, with their own CMDB / vault / SSH-link just like agented devices. Use Devices → Add agentless device.
The graph is positional — drag nodes around, the layout persists. Useful for finding the "who's downstream of this switch" answer at a glance.
Monitoring page — probes, metrics, ports, scripts, services, logs
The Monitoring sidebar group has six items that each deep-link to a section of this page and smooth-scroll to it on arrival:
- Targets — ICMP ping, TCP port, HTTP HEAD probes the server runs against external targets. State transitions fire monitor_down / monitor_up webhooks.
- Device Metrics (v1.12.0) — fleet-wide memory / swap / CPU / disk view, color-coded by alert level. Filter by name, group, mount path.
- Listening Ports (v2.8.1) — all open ports across monitored devices, grouped by port number with process name and device list. Live filter by port, process, or device.
- Custom Scripts — fleet-wide results table for your bash health-check scripts. Filter by name or status.
- Services — systemd unit health matrix for watched services.
- Logs — fleet log tail with regex pattern-alert rules.
Trend button (v2.0): next to each device in Device Metrics, opens the time-series chart modal for that device.
TLS / DNS page — certificate & record watchlist
Server-side probes that warn before things expire. Add a hostname (or hostname:port for non-443) and the type — TLS for cert expiry, DNS for record-set health. Probes run on schedule alongside the monitor probes.
TLS: connects, fetches the leaf cert, reports expiry days remaining + issuer + SAN list. Alerts at <30 days, critical at <7. Self-signed and otherwise-invalid certs flagged with reason.
DNS: resolves A/AAAA/MX/TXT (whichever exist), reports the TTL of the soonest-expiring record. Useful for catching domain registrar issues before the domain itself lapses.
Both fire alerts via the existing webhook system — no separate config needed.
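The TLS thresholds above map onto a small classification step. A sketch under assumptions: the function names are hypothetical, and the date string uses the notAfter format that Python's ssl module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timezone

def days_remaining(not_after: str, now: datetime) -> int:
    """Whole days until expiry; `not_after` looks like 'Jan 31 00:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days

def tls_level(days: int) -> str:
    # Thresholds from the text: alert under 30 days, critical under 7.
    if days < 7:
        return "critical"
    if days < 30:
        return "warning"
    return "ok"
```

An already-expired certificate yields a negative day count and therefore classifies as critical.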
Patches page — pending updates across the fleet
Aggregates pending-update counts from every agent. Linux agents run apt list --upgradable / dnf check-update / pacman -Qu / apk list --upgradable on a schedule (default every 3 hours) and report the count + package list back.
Per-device row: total upgradable, security-only count (where the package manager exposes that), oldest pending update age. Click into a device for the full list.
Patch alert webhook: Settings → Webhooks → Patch alert. Configurable threshold; fires patch_alert when a device has more than N pending updates.
Run upgrade: per-device dropdown → "Upgrade packages" runs the appropriate package manager non-interactively and returns the output as Update history. Batch select to upgrade many at once.
Custom Scripts — your own bash health checks on devices
Define arbitrary bash scripts server-side and assign them to any set of enrolled devices. The agent runs each assigned script every 5 minutes with a 30-second timeout. Exit code 0 = OK; anything else = FAIL.
Create a script: Custom Scripts → New script. Paste a body directly, or type a description and click AI Generate to have the AI draft one. Assign the script to one or more devices using the device picker. Save.
Results: The Custom Scripts page shows a fleet-wide table — script name, device, status badge, last output snippet, when it last ran, and how long it took. Click any output snippet to see the full stdout/stderr.
Alerts: Status changes fire webhooks — custom_script_fail when a script flips from OK to FAIL (includes the first line of output), custom_script_recover when it returns to OK. Both are edge-triggered: they fire once on the transition, not on every failing run.
Script execution: Scripts run as the agent user (root on most setups), with stdout and stderr merged, capped at 4 KB. The script body is written to a private temp file (chmod 700), executed by /bin/bash, then deleted. Scripts are pushed from the server — no SSH needed.
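The execution steps described above can be sketched like this. It is an illustration, not the agent's implementation: the function name is hypothetical, and treating a timeout as FAIL is an assumption.

```python
import os
import subprocess
import tempfile
from typing import Tuple

OUTPUT_CAP = 4096  # 4 KB cap from the text
TIMEOUT_S = 30     # 30-second timeout from the text

def run_custom_script(body: str) -> Tuple[str, str]:
    """Run `body` with /bin/bash and return (status, merged output)."""
    fd, path = tempfile.mkstemp(suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        os.chmod(path, 0o700)  # private to the owning user
        try:
            proc = subprocess.run(
                ["/bin/bash", path],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr into stdout
                timeout=TIMEOUT_S,
            )
        except subprocess.TimeoutExpired:
            return "FAIL", "timed out"  # assumption: a timeout counts as FAIL
        output = proc.stdout[:OUTPUT_CAP].decode(errors="replace")
        return ("OK" if proc.returncode == 0 else "FAIL"), output
    finally:
        os.unlink(path)  # the temp file is always removed
```

The finally block guarantees the script body does not linger on disk even when the run fails or times out.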
Examples: check a web endpoint returns 200, verify a backup file was recently modified, test a database port is accepting connections, confirm a cron job's sentinel file exists.
Host Configuration — declare and enforce host state from the server
Define the desired state of each Linux host server-side. The agent applies it on the next heartbeat (~60 s) and reports current state every 15 minutes so the server can detect drift.
Open the editor: Devices page → device dropdown → Host Config. A modal opens with one tab per section.
Sections managed:
- Repos — full content of /etc/apt/sources.list or /etc/yum.repos.d/remotepower.repo
- Netplan — written to /etc/netplan/01-remotepower.yaml, then netplan apply
- nmcli — connection file at /etc/NetworkManager/system-connections/remotepower-managed.nmconnection
- resolv.conf — DNS resolver config; resolves symlinks on systemd-resolved hosts
- /etc/hosts — static host entries
- Services — list of systemd units that should be enabled; agent runs systemctl enable --now
- Users — ensure local users exist with correct shell, groups, and SSH authorized_keys (no passwords)
- Groups — ensure local groups exist
- Sudoers — written to /etc/sudoers.d/remotepower, validated with visudo -c before applying
- MOTD — login banner written to /etc/motd
Fetch current: Each tab has a "⬇ Fetch current" button that loads the live state last reported by the agent. Use it to pre-fill a section before editing, or to inspect what's actually running.
Drift detection: The agent reports current state every 15 minutes. The server compares it against the desired config. If any section diverges, an amber banner appears on the modal and a config_drift webhook fires — once on first detection, not on every heartbeat. Drift is audit-only; the agent does not auto-remediate.
Security: Only admins can write host config. Sudoers content is syntax-checked before writing. authorized_keys are written with mode 0600 and correct ownership. No passwords are ever stored — SSH keys only.
CVEs page — known vulnerabilities in installed packages
Cross-references each agent's installed-package list against OSV.dev on a schedule (default daily). Findings are severity-ranked and grouped by device.
Each finding shows: CVE ID, severity (CRITICAL / HIGH / MEDIUM / LOW), affected package + installed version, fixed version (if any), summary, references. Click for the full OSV record.
Ignore list: known false positives or accepted risk can be ignored per CVE-ID + per-package combo. Ignored findings disappear from the active view but stay in audit history. The "Show ignored" toggle reveals them.
cve_found webhook: fires when new CVEs appear that aren't on the ignore list. Useful for plugging into the team's incident channel.
Services page — systemd unit health matrix
Per-device watched-services view. Define which units to watch in Settings → Service watch (or via API), and the agent reports their state on every heartbeat: active, inactive, failed, activating, etc.
Matrix view: rows are devices, columns are services; cell color reflects state. Filter by service name, device, or group; sort by failure count.
Webhooks: service_down when a watched unit transitions to inactive/failed; service_up on recovery. Maintenance windows suppress these without losing the audit trail.
Logs page — log tail across the fleet with pattern matching
Per-device journalctl output for watched units. Agents submit log lines on every heartbeat; server keeps a rolling 6-hour buffer.
Search: regex over the buffered logs. Filter by device, unit, time range. Useful for "did this error happen anywhere in the last hour?"
Pattern alerts: Settings → Log rules. Define regex patterns; matches fire log_alert webhook with the matched line, device, and unit. Useful for kernel oops, OOM kills, fail2ban bans, anything you want a notification on.
Per-device or fleet-global rules: rules can be scoped to a device (only matches there fire) or fleet-global (matches on any device fire).
Schedule page — one-shot & recurring commands
Schedule reboots, shutdowns, package upgrades, or arbitrary commands. Two modes:
One-shot: pick a date+time, RemotePower fires the command once then discards the job.
Recurring: choose a friendly preset (every hour, every N hours, daily at HH:MM, weekly on day+time, monthly on day+time) or enter a custom 5-field cron expression. DOW follows cron convention (0 = Sunday).
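For orientation, this is how such presets translate into standard 5-field cron expressions (minute, hour, day of month, month, day of week). The helper names are hypothetical and the exact strings RemotePower stores are not documented here; the field order and the 0 = Sunday convention are standard cron.

```python
def _check(value: int, low: int, high: int, name: str) -> int:
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value

def every_n_hours(n: int) -> str:
    return f"0 */{_check(n, 1, 23, 'n')} * * *"

def daily_at(hour: int, minute: int) -> str:
    return f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} * * *"

def weekly_on(dow: int, hour: int, minute: int) -> str:
    # dow follows cron convention: 0 = Sunday ... 6 = Saturday
    return f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} * * {_check(dow, 0, 6, 'dow')}"

def monthly_on(day: int, hour: int, minute: int) -> str:
    return f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} {_check(day, 1, 31, 'day')} * *"
```

For example, "weekly on Sunday at 04:00" corresponds to 0 4 * * 0.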
Run a script: select "Run script…" in the command picker to choose any saved script from the Scripts library. The body is resolved at fire time so editing the script updates all pending recurring jobs.
Per-device, per-group, or fleet-global. Two optional checkboxes on create:
- Maintenance window: auto-checked for disruptive commands (reboot, upgrade); suppresses all suppressible alerts for 1 hour around the scheduled time.
- Add to calendar: creates a calendar entry at the next computed occurrence. For recurring schedules (daily, weekly, monthly), the calendar event inherits the same recurrence so all future runs appear in the calendar automatically.
Calendar page — visual timeline of scheduled events
Month-view calendar of upcoming scheduled commands and maintenance windows. Useful for "what's planned this week?" and avoiding accidentally double-booking maintenance.
Create events: click any day cell to create a one-off event. The event modal has a Recurrence dropdown: None, Daily, Weekly, Monthly, or Yearly. Recurring events are stored once on the server and expanded into occurrences within the displayed window (cap: 500). They appear with a refresh icon in the grid.
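Storing a recurring event once and expanding it per view can be sketched as below. Only Daily and Weekly are covered; Monthly and Yearly need calendar-aware stepping and are left out. The function name and data shapes are assumptions; the cap of 500 is from the text.

```python
from datetime import date, timedelta
from typing import List

STEP_DAYS = {"daily": 1, "weekly": 7}
MAX_OCCURRENCES = 500  # cap from the text

def expand(start: date, recurrence: str,
           window_start: date, window_end: date) -> List[date]:
    """Occurrences of an event that fall inside the displayed window."""
    if recurrence == "none":
        return [start] if window_start <= start <= window_end else []
    step = timedelta(days=STEP_DAYS[recurrence])
    current = start
    if current < window_start:
        # Jump straight to the first occurrence on or after the window start.
        gap = (window_start - current).days
        missed = -(-gap // step.days)  # ceiling division
        current += missed * step
    occurrences = []
    while current <= window_end and len(occurrences) < MAX_OCCURRENCES:
        occurrences.append(current)
        current += step
    return occurrences
```

Jumping ahead rather than stepping from the original start date keeps the cost proportional to the window size, not to the age of the event.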
Delete recurring events: opening a recurring event shows a "Delete all occurrences" confirmation so a single click removes the entire series.
Schedule integration: the "Add to calendar" checkbox on the Schedule form creates a recurring event automatically when the schedule itself is recurring — daily/weekly/monthly schedules produce matching calendar recurrence.
Tasks page — operational checklist
Free-form per-fleet task list. Write down what you need to do, check items off, leave notes. Doesn't drive any automation — just a place to keep track of "rebuild X next maintenance window" or "investigate why Y is restarting".
Tasks have title, description, optional due date, status (open / in-progress / done). Filter by status; sort by due date.
Maintenance page — alert-suppression windows
Schedule windows during which webhook alerts are suppressed for specific devices, groups, or the whole fleet. Useful for planned downtime where you don't want to be paged for an expected outage.
Two modes: one-shot (start time + end time) or recurring (cron pattern + duration). Scope: device, group, or all.
Suppressed alerts are still logged — you can see exactly what would have fired during the window via the audit log. So if something unexpected happens during maintenance, you can find it after the fact.
History page — command execution log
Every command run via RemotePower, with actor (which user), target device, command, exit code, output preview, timestamp.
Used together with the Audit log (which covers admin actions like creating users, changing settings) to reconstruct what happened during an incident.
Rolling buffer of last 200 commands. For longer-term retention, configure log forwarding to your central logging system.
Settings page — server-wide configuration
Tabs:
- Account — change password, enable/disable TOTP 2FA, view active sessions, revoke sessions.
- Webhooks — add destination URLs (Discord, ntfy, Slack, generic JSON), per-event toggles, "send test event" buttons.
- SMTP — outbound email settings for the digest endpoint and password resets.
- LDAP — bind URL, search base, attribute mappings; "Test connection" button.
- Service watch — list of systemd units the agents should monitor, applied to all devices unless overridden.
- Log rules — regex patterns for log_alert events.
- Server functions — autocomplete list for CMDB asset function field.
- Backup — one-click ZIP download of all state JSON files.
Users page (Admin) — local accounts
Manage local user accounts. Each user has a username, bcrypt password, role (admin or viewer), optional TOTP secret, optional email for password reset.
Roles: admin can do everything. Viewer is read-only — can browse all pages but every write endpoint returns 403.
LDAP users auto-create on first successful bind; their role defaults to viewer until an admin promotes them.
API Keys page (Admin) — non-expiring tokens for automation
Named API keys that don't expire (unlike session tokens). Use for CI scripts, cron jobs, Ansible, Grafana scraping /api/metrics.
Keys are bound to a user account — when you create a key, it inherits your role. To make a "ci-bot" key with limited access, create a viewer-role user first and create the key while logged in as that user.
Pass via X-Token: <key> header (same as session tokens). Keys can be revoked instantly from this page; revoked keys 401 immediately.
Library page (Admin) — saved command snippets
Reusable shell-command bookmarks. Save common diagnostics ("show disk usage by mount"), restart routines ("kick the failing service"), or chained operations as named entries.
When running a custom command on a device, the modal has a dropdown that auto-populates from this library — no need to remember the exact find incantation.
Audit page (Security) — admin event log
Every administrative action: user created/modified/deleted, password changed, settings changed, API key minted/revoked, enrollment token created, vault unlocked, credentials revealed.
Each entry: timestamp, actor (user), action, detail string, source IP. Filter by actor, action, or substring; sort by any column.
For command execution, see the History page (separate). The Audit log is for "who changed what configuration" rather than "who ran what command". Now in the Security sidebar group alongside TLS/DNS, Patches, CVEs, and Drift.
Links page — shared bookmark dashboard
Card grid grouped by category. Click any card to open the link in a new tab. Group links by category — useful for "Monitoring", "Infrastructure", "Vendor portals", etc.
Scope: mark a link as Internal (LAN-only, behind VPN, etc.) or External. Internal links get an amber dashed border; external links get an accent (blue) solid border — so it's visually obvious at a glance which links require VPN.
Dashboard widget: once at least one link is saved, a "Quick links" card automatically appears on the Home dashboard showing all links as a compact grid. Click "Manage →" to jump to the full Links page.
Display-only — RemotePower doesn't proxy or health-check these URLs.
Navigation: Links is a standalone top-level item in the sidebar, directly below Home — accessible to all users regardless of role.
Scripts — the multi-line script library (v2.1+)
Like the command library but for full bash scripts — multi-line, with `set -euo pipefail`, error handling, the works. Stored in scripts.json; created and edited via the Scripts page.
Save flow: paste a script → editor runs bash -n (syntax check) and a regex-based dangerous-pattern detector (rm -rf /, fork bomb, dd if=/dev/zero of=/dev/sda, curl|bash, chmod 777 /, etc.). Both must pass before save.
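A regex-based detector of this kind might look like the sketch below. The patterns are illustrative guesses covering the examples named above, not RemotePower's actual rule set, and the bash -n syntax check is not shown.

```python
import re
from typing import List

# Illustrative patterns only; a real detector would carry many more.
DANGEROUS = [
    ("rm -rf /", re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+/(\s|$)")),
    ("curl | bash", re.compile(r"\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(ba)?sh\b")),
    ("dd to a block device", re.compile(r"\bdd\b[^\n]*\bof=/dev/(sd|nvme|vd)")),
    ("chmod 777 /", re.compile(r"\bchmod\s+(-R\s+)?777\s+/(\s|$)")),
    ("fork bomb", re.compile(r":\(\)\s*\{\s*:\|:&\s*\};\s*:")),
]

def find_dangerous(script: str) -> List[str]:
    """Return the labels of every pattern that matches the script body."""
    return [label for label, pattern in DANGEROUS if pattern.search(script)]
```

A save would be refused whenever the returned list is non-empty. Pattern matching like this is a speed bump, not a sandbox: an obfuscated command can slip past it.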
Run on a device: Devices page → ⋯ menu → "Run script…", pick from the library, confirm. Output comes back on the next heartbeat after execution finishes.
Batch run: tick checkboxes on multiple devices, batch bar → "Run script", pick from the library. The batch tracker shows per-device progress; output collects under the batch job ID for ~1h.
Tagged runs: instead of ticking devices, target a tag or group via the API: POST /api/exec/batch with tag, group, or device_ids.
Full reference: docs/scripts.md.
AI assistant — five providers, AI buttons across the dashboard (v2.1.3+)
Disabled by default. Enable in Settings → AI assistant. Five providers: Anthropic (Claude), OpenAI (ChatGPT), DeepSeek, Ollama (local), LocalAI (local). Pure stdlib — no extra pip deps. Pick a local provider (Ollama / LocalAI) if you don't want data leaving the building.
Where the AI buttons appear:
- Device dropdown (⋯ menu) → Investigate, Generate runbook.
- Device detail modal → Explain on each command output, Find the problem on the journal panel.
- Services page → service detail → Diagnose on failed/inactive units.
- TLS page → table row → Triage on warning/critical/error certs.
- Patches page → table row → Prioritise on devices with pending updates.
- CVE Findings → row → Triage.
- Scripts editor → Generate from prompt / Explain / Audit for risks.
- Notifications → webhook log row → Explain.
- Help → AI Assistant → standalone chat with model picker and Ollama server stats.
Privacy: hostnames and IPs are redacted before leaving the building unless you toggle "Send hostnames" / "Send IP addresses" in Settings. Bearer tokens, AWS keys, and long hex strings are always redacted regardless. AI-generated scripts go through the same dry-run + dangerous-pattern detection as anything else — no AI-trusted bypass.
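The redaction step can be pictured as a chain of substitutions. A minimal sketch: the patterns, placeholder strings, and the 32-character hex cutoff are assumptions, and hostname redaction (which needs the fleet's hostname list) is omitted.

```python
import re

BEARER = re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+")
AWS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")  # AWS access key ID shape
LONG_HEX = re.compile(r"\b[0-9a-fA-F]{32,}\b")           # cutoff is an assumption
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact(text: str, send_ips: bool = False) -> str:
    # Secrets are always removed, whatever the settings say.
    text = BEARER.sub("Bearer [REDACTED]", text)
    text = AWS_KEY.sub("[REDACTED_AWS_KEY]", text)
    text = LONG_HEX.sub("[REDACTED_HEX]", text)
    # IP addresses are removed unless the operator opted in.
    if not send_ips:
        text = IPV4.sub("[REDACTED_IP]", text)
    return text
```

The secret patterns run unconditionally, so turning on "Send IP addresses" never re-exposes tokens or keys.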
Slow local models (smallthinker, qwq, deepseek-r1) routinely take 60–180 s per response. The HTTP timeout is 5 min on the Python side, but nginx's fastcgi_read_timeout defaults to 60 s and will cut you off first — add a location /api/ai/ block with fastcgi_read_timeout 300s;. See docs/ai.md for the snippet and the full reference.
Rate limits: per-user per-day cap in Settings (default 100; 0 = unlimited). Per-button max_tokens tuned client-side so a short Explain doesn't sit waiting for 4000 tokens to generate.
Device runbooks — AI-generated, per device (v2.1.7)
What it is: a structured operations document for each device, generated by the AI from the device's current state (sysinfo, watched services, containers, recent commands, journal, CVE findings, patch status). Saved per-device; regenerable any time.
How to generate: Devices page → ⋯ menu → Inspect → Generate runbook. The AI gets the live snapshot of the device plus the fleet context (so it can say "this is your mail server" rather than just "this is a Linux box"). Takes 15–90 s depending on provider and model.
What's in it: sections for purpose / role, installed stack, running services, listening ports, scheduled jobs, recent activity, "things to know," and known risks. Format is Markdown — rendered inline on the device detail modal under a Runbook section.
When it's worth re-running: after a major change to the device, after an OS upgrade, before a handover, or when documenting a system for a colleague. The data is current as of when it ran; the timestamp is shown alongside the runbook.
Privacy: runbooks contain hostnames and IPs by design (a redacted runbook is useless). The redaction toggles in Settings → AI assistant apply, but you'll want "Send hostnames" + "Send IP addresses" on for runbook generation specifically. Storing hostnames in a runbook that lives in your own JSON file isn't a privacy concern; what matters is whether they cross over to a cloud provider on generate.
Configuration drift detection — per-device file integrity monitoring (v2.2.0)
What it is: the agent computes SHA-256 hashes of a list of watched config files every few heartbeats and ships them in the heartbeat payload. The server stores baselines (first-seen hash per file per device) and fires a drift_detected webhook when a current hash diverges from the baseline.
Hash-only by design. The contents of /etc/sudoers, /etc/ssh/sshd_config, etc. never cross the wire on routine polling. To see what actually changed, an operator action queues a cat command through the existing exec mechanism (subject to the usual audit + permission checks).
Where to find it: sidebar → Security → Drift. Fleet-wide table shows devices sorted with drifted ones at the top; click Detail for per-file status, drift count, history of the last 20 changes, and the Accept as baseline button.
Default watched list: /etc/ssh/sshd_config, /etc/sudoers, /etc/fstab, /etc/crontab, /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf, /etc/pam.d/sshd. Customisable globally (cfg['drift']['default_watched_files']) and per-device (device.watched_files).
Webhook: drift_detected fires once per change (debounced — not on every poll that reports the same new hash). Route it to a channel you check, especially for after-hours alerts.
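A minimal sketch of the hash-and-compare flow described in this section, assuming plain dictionaries for storage. `sha256_of`, `check_drift`, and the `last_seen` map are hypothetical names used for illustration; the real agent and server code may be organised differently.

```python
import hashlib


def sha256_of(path):
    """Agent side: hash a watched file in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_drift(baselines, last_seen, reported):
    """Server side: return the paths that should fire drift_detected.

    baselines and last_seen map path -> hash and are updated in place;
    reported is the path -> hash map from the current heartbeat.
    """
    fire = []
    for path, digest in reported.items():
        if path not in baselines:
            baselines[path] = digest          # first-seen hash becomes the baseline
        elif digest != baselines[path] and last_seen.get(path) != digest:
            fire.append(path)                 # new divergent hash: alert once
        last_seen[path] = digest              # same hash on later polls is debounced
    return fire
```

Tracking the last reported hash separately from the baseline is one simple way to get the once-per-change behaviour without touching the baseline until an operator accepts it.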
Compliance: covers configuration management controls in SOC 2 (CC6.1, CC6.6), ISO 27001 (A.12.4.3, A.14.2.4), HIPAA (164.312(c)), PCI DSS (11.5), FedRAMP. Baseline-acceptance audit-log entries are designed to be readable as evidence.
Agent version: requires v2.2.0+ for the agent-side hash reporting. Older agents work normally otherwise but show "no drift data" on the Drift page.
Full reference: docs/drift.md.
MCP server — natural-language fleet queries from Claude Desktop, Cursor, VS Code (v2.2.0)
What it is: RemotePower ships an MCP (Model Context Protocol) server at mcp/remotepower-mcp.py. Connect it to an MCP-compatible AI host and you can ask things like "Which devices have pending security updates?" or "Show me the journal for mail01 from the last hour" in plain English.
Architecture: the MCP server runs on the operator's laptop, not on the RemotePower server. The AI host (Claude Desktop, Cursor, etc.) spawns it as a stdio subprocess. The MCP server makes HTTPS calls to RemotePower's REST API on behalf of the AI using an API token you provision. Credentials live in the host's config file and never reach the AI provider.
Read-only, by design. 12 tools: list_devices, get_device, get_journal, get_services, get_containers, get_cves, get_drift, get_recent_commands, get_runbook, get_patches, get_tls, search_devices. No write tools. No run_command, no run_script, no reboot_device. The test suite asserts no write-shaped names slipped in. Write tools land in a future release once a server-side allow-list, per-MCP-token role, and confirmation flags are in place — not before.
Setup (Claude Desktop):
- Settings → API keys → Generate a viewer token; copy the rpk_... value.
- Copy remotepower-mcp.py to your laptop (scp works).
- Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows): add an mcpServers.remotepower entry with the command, args, and env (URL + token).
- Restart Claude Desktop. The tool indicator should show remotepower.
Device-name resolution: exact → prefix → substring → ambiguity error. You can pass web01 and the server finds web01.example.com automatically (as long as there's exactly one match).
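The exact → prefix → substring order can be sketched as follows. `resolve_device` and `AmbiguousDevice` are hypothetical names, and the case-insensitive comparison is an assumption; the text only specifies the matching order and the ambiguity error.

```python
class AmbiguousDevice(Exception):
    """Raised when a query matches more than one device at the same stage."""


def resolve_device(query, names):
    """Try exact, then prefix, then substring; stop at the first stage that hits."""
    q = query.lower()                      # assumption: matching ignores case
    lowered = {name: name.lower() for name in names}
    stages = (
        lambda n: n == q,                  # exact
        lambda n: n.startswith(q),         # prefix
        lambda n: q in n,                  # substring
    )
    for matches in stages:
        hits = [orig for orig, low in lowered.items() if matches(low)]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            raise AmbiguousDevice(f"{query!r} matches {sorted(hits)}")
    raise LookupError(f"no device matches {query!r}")
```

With `["web01.example.com", "web02.example.com"]`, the query `web01` resolves by prefix, while `web` raises the ambiguity error instead of guessing.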
Pure stdlib Python. No pip install. Single 470-line file.
Full reference: docs/mcp.md.
Notification setup — getting alerts to land where you'll see them
RemotePower has webhook (Discord / Slack / ntfy / generic JSON) and SMTP email channels. Pick whichever you actually check.
Recommended baseline (most operators want):
- One webhook for page-me-now events: device_offline, service_down, monitor_down, metric_critical, cve_found. Point at ntfy / Pushover / your phone.
- One webhook for FYI-during-business-hours events: metric_warning, patch_alert, log_alert, container_stopped. Point at a Discord/Slack channel.
- Email digest for the slow stuff (weekly CVE summary, patches pending) once cron_email_digest is enabled.
Per-event toggles: each webhook URL has independent toggles for every event type. The "Send test event" button fires a sample payload so you can verify the integration works before relying on it.
Maintenance windows suppress alerts during planned downtime. Suppressed alerts are still logged — you can see what would have fired in the audit log. So if something unexpected happens during maintenance, you can find it after.
Explain on alerts: each webhook log row has an "Explain" button that asks the AI to rewrite the raw alert into a single short paragraph ("postfix on mail01 stopped at 14:32 after matching connection refused 3 times — likely upstream MX is unreachable; check DNS and try a manual delivery"). Useful during on-call when you're seeing a wall of cryptic alerts at 3am.
Device Drawer — click any device to open
Clicking a device name (Devices page or Dashboard fleet roster) opens a full-screen drawer with two tabs:
Actions & Settings tab — all actions for that host in one grid: Run command · Reboot · Shut down · Wake on LAN · Upgrade packages · Scan packages · Web terminal · Run script · Update agent · Docker compose · Host config · AI Investigate · CMDB · Runbook · Maintenance · Adjust poll · Remove device. Below the grid: inline settings form (group, tags, monitored toggle, poll interval, watched services, log rules, drift files, command allowlist) with a Save button.
Audit tab — 11 collapsible sections, all lazy-loaded on first open: System info (uptime, IPs, mounts, AI "Find the problem") · Listening ports (searchable) · Packages · Logs (unit selector) · Command history (last 5, expand per entry) · Fleet events · Drift state · CVE summary · Containers · Metrics · Host config.
The ⋮ button on the Devices page also opens the drawer directly on the Actions & Settings tab.
Settings → Dashboard — personalise the dashboard
Settings → Dashboard tab controls what appears on the home dashboard for all users:
Brute-force detection — enable/disable, set threshold (default 20 failed attempts from same IP in a 5-minute window), and window length.
Backup file monitoring — add file paths and max-age thresholds. The agent checks mtime every heartbeat; backup_stale webhook fires edge-triggered and a Needs Attention item appears.
Log ignore patterns — regex patterns (case-insensitive) that silently drop matching log lines before storage and alerting. Useful for noisy kernel notes like Note: After setting.*old kernels.
Needs Attention kinds — one toggle per alert type. Toggling a kind off hides it from both the Needs Attention panel and the Recent Activity feed simultaneously.
Recent Activity events — toggle the remaining informational events (device online, command queued/executed) independently.
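The brute-force threshold above (20 failed attempts from the same IP in a 5-minute window by default) amounts to a sliding-window counter. The sketch below is an illustration under that reading; `BruteForceDetector` and its method names are hypothetical.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Count failed logins per source IP inside a sliding time window."""

    def __init__(self, threshold=20, window_s=300):
        self.threshold = threshold
        self.window_s = window_s
        self._events = defaultdict(deque)   # ip -> timestamps of recent failures

    def record_failure(self, ip, ts):
        """Record one failure at time ts (seconds); True once the threshold is hit."""
        events = self._events[ip]
        events.append(ts)
        # Drop failures that have aged out of the window.
        while events and events[0] <= ts - self.window_s:
            events.popleft()
        return len(events) >= self.threshold
```

A deque per IP keeps each update cheap, since old timestamps are only ever removed from the left.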
Listening Ports — Monitor page + per-device Audit
Fleet-wide view: Monitor page → Listening Ports section. Shows all open ports across all online monitored devices, grouped by port number, with process name and which devices are listening. Shows 10 rows by default; expand button for the rest.
Per-device view: Device drawer → Audit tab → Listening Ports section. Searchable table filtered by port number or process name.
Port data is collected by the agent as part of sysinfo (every ~10 minutes) and persisted in devices.json. Each report is compared against the device's port baseline; any port not in the baseline triggers a new_port_detected webhook.
CVE Findings — package vulnerability scanning
RemotePower checks installed packages against OSV.dev. The dashboard tile always shows Critical CVEs first with the count, and high/med/low counts in the subtitle.
Scan all devices — runs OSV lookups for every device that has submitted a package list. Can take a minute for large fleets. Button shows progress and result.
Send list (per-device) — asks the agent to send its full installed package list on the next heartbeat (~60 seconds). The CVE scanner then runs automatically. Use this after installing new packages before the agent's next scheduled scan.
Scan (per-device) — re-runs the OSV lookup using the already-stored package list. No need to wait for the agent.
The dashboard tile shows 0 critical when the only findings are high/medium/low; click it to see the full breakdown per device.
IaC Generator — generate Infrastructure-as-Code for a device
The IaC Generator page (left nav) produces Terraform, Ansible, Pulumi, or Cloud-init code that describes a managed device's current state. Useful for codifying a host you've configured by hand, or for migrating a host to a new platform.
Flow:
- Select a device, the categories of state you care about, and an output format.
- Click Generate IaC (full LLM flow) or Gather RAW JSON (collect only, download masked state).
- On the agent's next heartbeat (~60 seconds), it collects the requested categories and sends raw state back.
- The server calls your configured AI provider (Settings → AI) with the raw state, and returns the resulting code.
Categories (18): OS & identity, installed packages, systemd services (enabled), local users (uid≥1000), groups (gid≥1000), SSH authorized_keys (LLM converts to variables), network configuration, mounts (fstab), containers (Docker/Podman), custom repos, firewall, cron jobs (incl. RemotePower scheduled), TLS certificates (paths only), system environment (non-default), snaps (Ubuntu), kernel modules (persistent), sysctl parameters (non-default), RemotePower-specific (tags, group, custom scripts, host-config desired state).
Output formats: Terraform (HCL), Ansible (YAML), Pulumi (Python), Pulumi (TypeScript), Cloud-init (YAML).
Output tabs: Generated Code | AI Conversation. The Conversation tab shows the full system+user prompt and the raw LLM response, so you can see exactly what the model produced before fence-stripping.
Re-run AI / JSON download: After a successful generation, the Re-run AI button re-prompts the LLM using the already-collected data (no second agent wait). The { } JSON button downloads the masked state as a JSON file.
Extra instructions: A textarea below the format dropdown lets you append custom guidance ("use Terraform 1.5+ syntax", "wrap in a reusable module"). Persisted in localStorage.
Reasoning-model safety net: The prompt instructs the LLM to wrap output between <<<BEGIN_IAC>>> and <<<END_IAC>>> markers. The stripper extracts only what's between them — so DeepSeek-R1-style reasoning prose gets discarded automatically.
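A minimal sketch of marker-based extraction, assuming the first BEGIN marker and the first END marker after it delimit the code. `extract_iac` is a hypothetical name, and returning `None` when a marker is missing is an assumption; the actual stripper may fall back differently.

```python
BEGIN = "<<<BEGIN_IAC>>>"
END = "<<<END_IAC>>>"


def extract_iac(response):
    """Return only the text between the markers, or None if either is missing."""
    start = response.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = response.find(END, start)
    if end == -1:
        return None
    # Everything outside the markers (reasoning prose, sign-offs) is discarded.
    return response[start:end].strip()
```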
Security: Before the payload leaves your server for the LLM, env vars whose name matches PASSWORD|SECRET|TOKEN|KEY|PASS|AUTH|CRED|PRIVATE are masked with <REDACTED_BY_REMOTEPOWER>. TLS certificate paths are sent but never the certificate contents. SSH authorized_keys are sent raw — the LLM is instructed to convert them to variables.
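The name-based masking can be sketched as a recursive walk over the collected JSON. The `mask` function, the depth limit of 32, and case-insensitive matching are assumptions made for illustration; the name pattern and the placeholder string come from the text.

```python
import re

SENSITIVE = re.compile(
    r"PASSWORD|SECRET|TOKEN|KEY|PASS|AUTH|CRED|PRIVATE",
    re.IGNORECASE,                       # assumption: names are matched case-insensitively
)
MASK = "<REDACTED_BY_REMOTEPOWER>"
MAX_DEPTH = 32                           # hypothetical limit for the depth guard


def mask(value, depth=0):
    """Return a copy of value with sensitively named entries replaced by MASK."""
    if depth > MAX_DEPTH:
        return MASK                      # pathological nesting: mask instead of recursing on
    if isinstance(value, dict):
        return {
            key: MASK if SENSITIVE.search(str(key)) else mask(item, depth + 1)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask(item, depth + 1) for item in value]
    return value
```

Bounding the depth means a deliberately nested payload yields a masked value rather than a `RecursionError`.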
Endpoints: POST /api/iac/request · GET /api/iac/status/<id> · POST /api/iac/generate · GET /api/iac/payload/<id> (v3.0.0)
Per-item ignore lists (v3.0.1) — × button on Needs Attention and stale containers
Each Needs Attention card and each stale Containers row now has a × button that hides that specific entry from view. Ignores are per-item (one specific CVE on one specific device, not the whole category).
Restore: Settings → AI Assistant → Ignored items lists every hidden entry with a Restore button. Three categories are tracked separately: Needs Attention, Stale containers, Devices.
Stability: Needs Attention items are keyed by a stable SHA1 of kind+device+summary, so the same alert firing in a later poll stays hidden until you restore it. New variants (e.g. the same CVE but a new package version) get a fresh key and re-appear.
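A sketch of the stable key, assuming the three fields are joined with a separator before hashing. `ignore_key` and the separator are hypothetical; the text only states that the key is a SHA1 of kind+device+summary.

```python
import hashlib


def ignore_key(kind, device, summary):
    """Derive a stable identifier for one Needs Attention item."""
    # A separator (an assumption here) keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x1f".join((kind, device, summary))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()
```

Because the summary is part of the key, a new package version in the summary produces a different key, which is why new variants re-appear.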
Storage: /var/lib/remotepower/ignored_items.json
Endpoints: GET /api/ignored · POST /api/ignored · POST /api/ignored/remove
Log ingestion (v3.0.1) — embedded timestamps + content dedupe
The Logs page ingestion path was rearchitected to handle re-submitted file-based logs (apt.history, nginx access, syslog) correctly.
Previous bug: agents resubmit log files on every poll. The server used to stamp every line with now, so apt.history entries from days ago appeared as "new" and the 6-hour TTL never evicted them. Worse, the resulting bloat pushed nginx and brute-force lines past the 2 MB byte cap (silently dropped).
Fix (v3.0.1):
- Each incoming line is hashed (sha1[:16]) and compared against the buffer. Duplicates are dropped at ingestion.
- The line's own embedded timestamp is used when present. Supported formats: apt.history Start-Date: YYYY-MM-DD HH:MM:SS, nginx access [DD/MMM/YYYY:HH:MM:SS +TZ], syslog Mon DD HH:MM:SS, ISO YYYY-MM-DDTHH:MM:SS.
- The 2 MB byte cap was removed. With dedupe + TTL, the buffer can't grow unboundedly on idle units anymore.
Impact: If you upgraded from a buffer that had 11 000+ apt.history lines, give it one full TTL window (6 hours) for old entries to drop out. Nginx and brute-force events should re-appear immediately after restart since they're no longer being crowded out.
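The dedupe and embedded-timestamp steps of the v3.0.1 fix might look roughly like this. The function names, the dictionary buffer, and the `default_year` argument (syslog lines carry no year) are assumptions for illustration, not the server's actual code.

```python
import hashlib
import re
from datetime import datetime


def line_key(line):
    """Content hash used for dedupe: first 16 hex chars of the SHA-1."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()[:16]


def embedded_timestamp(line, default_year):
    """Parse the line's own timestamp, or return None if no known format matches.

    nginx timestamps carry a UTC offset and come back timezone-aware;
    the other formats are naive.
    """
    m = re.search(r"Start-Date: (\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})", line)
    if m:
        return datetime.strptime(" ".join(m.groups()), "%Y-%m-%d %H:%M:%S")
    m = re.search(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]", line)
    if m:
        return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
    m = re.search(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", line)
    if m:
        return datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")
    m = re.match(r"([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})", line)
    if m:
        stamp = f"{default_year} {m.group(1)} {m.group(2)} {m.group(3)}"
        return datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    return None


def ingest(buffer, lines, now, default_year):
    """Add unseen lines to buffer (key -> (timestamp, line)); skip duplicates."""
    for line in lines:
        key = line_key(line)
        if key in buffer:
            continue                      # re-submitted line: drop at ingestion
        buffer[key] = (embedded_timestamp(line, default_year) or now, line)
```

Stamping with the embedded time rather than `now` is what lets the TTL evict old apt.history entries even though the agent keeps resubmitting them.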
Logwatch severity (v3.0.1) — OK / WARN / CRIT classification
Log alert rules (per-device and fleet-wide) carry a severity field — OK, WARN, or CRIT — modelled after CheckMK's classification.
Semantics:
- WARN (default): fires the log_alert webhook with severity: "WARN".
- CRIT: same as WARN but with severity: "CRIT" in the payload, so your alerting routes accordingly.
- OK: silent. The rule fires (the match is recorded internally) but no webhook is sent. Use OK rules as noise-suppressors that confirm an expected pattern is still present, without alert spam.
Set the severity when creating or editing a rule in the Logs page rule modal.
AI prompt customization & fine-tuning (v3.0.1) — per-feature settings
Each AI feature (IaC Generator, Investigate, Diagnose Service, Runbook, etc.) uses a system prompt to set tone and constraints. Defaults are tuned for general-purpose models but may need adjusting per-model — a DeepSeek-R1 prompt doesn't necessarily fit a small local Llama.
Where: Settings → AI Assistant → Prompt customization. One card per AI feature (15 cards total).
Per card:
- System prompt textarea — edit; click Save prompt to persist; clear and save to revert to default.
- ● customized badge appears when overridden.
- Default button restores the hardcoded default for that feature.
- Fine-tuning (collapsible) — per-call temperature (0.0–2.0), top_p (0.0–1.0), max_tokens (1–16000), num_ctx (512–131072, Ollama/LocalAI only). Empty fields fall back to provider defaults.
- ● tuned badge appears when any fine-tuning value is set.
Storage: config.json under ai_prompt_overrides and ai_param_overrides.
Endpoints: GET /api/ai/prompts · POST /api/ai/prompts · GET /api/ai/params · POST /api/ai/params
Force-upgrade agent (v3.0.1) — re-deploy regardless of version
The Device Drawer's Force-upgrade button pushes the currently-bundled agent binary to a device, even if the agent already reports the same version.
Use cases: recovery after a corrupt or truncated update, pushing a rebuilt binary at the same version, testing the self-update path.
How it works: the server sets force_agent_upgrade: true on the device record. The next heartbeat (within one poll interval, 60 s by default) delivers the flag and the server clears it. The agent calls check_for_update(force=True), which skips the version comparison and re-downloads + replaces its binary. Systemd then respawns the agent with the new binary.
Endpoint: POST /api/devices/<id>/agent/force-upgrade (admin only)
Confirmed in the audit log as agent_force_upgrade.
Update banner snooze 30d (v3.0.1)
The "RemotePower v<X> is available" banner has a Snooze 30d button. Click it to hide the banner for that specific version for 30 days.
Snooze is per-version, so a newer release re-shows the banner. State lives in localStorage under rp_version_snooze_<version>.
Custom monitoring scripts (v2.6.0)
Custom scripts let you ship arbitrary monitoring logic to selected devices, run on a schedule, and surface output in the UI.
Library: Scripts page → New script. Each script has a name, description, runtime (bash or python3), assignment selector (devices/tags/groups), schedule (interval in minutes), and timeout. Server pushes the body and metadata in the heartbeat response; agent stores them locally and runs them on schedule.
Output: per-device → Scripts page → Click a device row, or via the Device Drawer → Custom scripts chip. Each run records rc, output (capped at 8 KB), and ts.
Security: Scripts are pushed by the server, signed implicitly via the device token, and run as the agent user (root by default). Treat the script library as you would a CI/CD pipeline.
Host configuration management (v2.6.0)
Per-device Host Configuration captures a desired state for a small set of system attributes; the agent reconciles on every poll.
Where: Device Drawer → Host config.
Tabs:
- Packages / Sysctl / Units / Modules: structured fields. Each is reconciled by the agent on every poll.
- Logrotate: free-text content for /etc/logrotate.d/remotepower. The agent writes this file verbatim; useful for adding custom log rotation without SSH.
- Cron (root): free-text content for root's crontab (crontab -u root). The agent applies it via a temp file on each poll; useful for device-specific jobs you want managed centrally.
Reconciliation: agent reports the current state alongside the desired state. A diff is visible in the UI. Drift events are surfaced under the device's Drift tab.
Files: server keeps host_config_current/<dev_id>.json for each device's reported state.
Debug logging (v2.7.0+) — client-side observability
The UI ships an in-page debug logger that captures api() calls, toast() events, click handlers, and other instrumented events.
Enable: Settings → Advanced → Enable debug logging. State persists in localStorage under rp_debug, so a fresh page load picks it up immediately.
Output: debounced (250 ms) flush to the server, surfaced in the audit log and (when enabled) a developer console panel.
Use this when something looks broken but the server logs are quiet. Most "the button doesn't fire" bugs show up clearly here.
Security hardening (v2.3.2 / v3.0.0)
Several security improvements accumulated through the 2.x and 3.0 line:
- Password storage: PBKDF2-HMAC-SHA256 at 600 000 iterations with per-user salt. The previous bare-SHA-256 fallback was removed.
- Default admin: the first admin/admin account has must_change_password set. A persistent banner in the UI nags until the password is changed.
- Backup export: the /api/backup export now redacts Proxmox tokens, SMTP passwords, LDAP bind passwords, and similar secrets. The backup is otherwise full-fidelity.
- IaC payload masking: env vars whose name matches PASSWORD|SECRET|TOKEN|KEY|PASS|AUTH|CRED|PRIVATE are masked before being sent to the LLM. TLS cert paths are sent, never the contents.
- Recursion depth guard (v3.0.1) on the IaC mask walker so a pathological JSON payload can't trigger a RecursionError.
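For reference, PBKDF2-HMAC-SHA256 with a per-user salt is available in Python's standard library. The sketch below shows the general technique only; `hash_password`, `verify_password`, the 16-byte salt, and the way salt and digest are returned are assumptions, not RemotePower's storage format.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000                     # iteration count stated in the text


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated per user."""
    if salt is None:
        salt = os.urandom(16)            # assumption: 16-byte salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The high iteration count is the point: each guess costs an attacker the same work as a legitimate login check.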
ACME / Let's Encrypt — manage acme.sh certs across the fleet
Lives under Security → TLS / DNS expiry, in the "ACME certificates" section below the existing TLS watchlist.
How it works: the agent walks ~/.acme.sh/ on each device (root, $HOME, or /etc/acme.sh/ — first match wins), parses every <domain>/<domain>.conf, and reports state to the server. No credentials leave the device — the agent uses whatever's already in ~/.acme.sh/account.conf or env. acme.sh's own cron handles renewals; RemotePower visualises and provides force-renew/revoke/cancel.
DNS-01 only. v1 supports dns_cf (Cloudflare) explicitly with credential-location hints; the provider dropdown lists Route53, DigitalOcean, deSEC, Hetzner, Porkbun, etc. RemotePower never touches nginx/apache/HTTP-01 plumbing.
Wildcards are supported via the Issue wizard's checkbox. Must-staple is intentionally not exposed — Let's Encrypt is sunsetting OCSP (August 2025), so OCSP-stapled certs would fail.
Cancel pending actions: any queued renew/issue/revoke with no rc yet shows a Cancel button. If the action is still in the queue it's removed cleanly; if the agent has already grabbed it, RemotePower stops polling but the agent may still complete the operation (last-write wins on the meta).
Per-domain action logs (captured stdout from acme.sh, 256 KB cap) live in /var/lib/remotepower/acme_logs/.
Mitigation runners — investigate alerts with diagnostic + AI suggestion + confirmed fix
Every Needs Attention card with a supported alert kind (patches, disk, drift, service_down, reboot, brute_force) shows an Investigate button. Three-tab modal:
- Diagnostic: server queues a hardcoded read-only command on the agent (e.g. for disk: df -h + top dirs + files >500MB + journal disk usage). Live-polls every 2s, up to 3 min.
- AI Analysis: when diagnostic completes, the AI runs automatically with a playbook-specific system prompt (5 new keys in Settings → AI Assistant: mitigate_cpu, _memory, _disk, _service, _patches). It outputs a root cause + one specific fix command between BEGIN_FIX / END_FIX markers.
- Apply Fix: choose pre-approved playbook fix, AI-suggested fix, or paste your own. Safety classifier shows red banner for denylisted commands (rm -rf /, dd of=/dev/sd*, fork bombs, etc.; refused outright) or amber for sensitive ones (reboot, kill -9, systemctl stop, apt purge, curl | bash; requires typing RUN).
All investigations and fixes go to the audit log. Output captured per-action in /var/lib/remotepower/mitigate_logs/.
Diagnostic commands are server-defined — user input never flows into the shell. Service unit names go through a strict regex (alphanumeric + ._@-) before any template substitution; shell metachars and path traversal are rejected.
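The unit-name check and the red/amber classification can be illustrated as below. All patterns, function names, and return values here are hypothetical; the real denylist and regex live in the server and are likely more thorough.

```python
import re

UNIT_RE = re.compile(r"[A-Za-z0-9._@-]+")   # alphanumeric plus . _ @ -

# Illustrative patterns only, based on the examples named in the text.
DENYLIST = [re.compile(p) for p in (
    r"\brm\s+-rf\s+/(\s|$)",
    r"\bdd\b.*\bof=/dev/sd",
    r":\(\)\s*\{.*\};\s*:",                  # classic fork bomb shape
)]
SENSITIVE = [re.compile(p) for p in (
    r"\breboot\b",
    r"\bkill\s+-9\b",
    r"\bsystemctl\s+stop\b",
    r"\bapt(-get)?\s+purge\b",
    r"\bcurl\b.*\|\s*(ba)?sh\b",
)]


def safe_unit(name):
    """Accept a unit name only if every character is in the allowed set."""
    if not UNIT_RE.fullmatch(name):
        raise ValueError(f"rejected unit name: {name!r}")
    return name


def classify(command):
    """Return 'denied' (red), 'confirm' (amber, type RUN), or 'ok'."""
    if any(p.search(command) for p in DENYLIST):
        return "denied"
    if any(p.search(command) for p in SENSITIVE):
        return "confirm"
    return "ok"
```

Because `/`, spaces, and shell metacharacters are outside the allowed set, a name like `../etc/passwd` or `nginx; reboot` never reaches template substitution.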
Needs Attention vs Recent Activity — what goes where
Needs Attention = "fix this NOW". Only items whose underlying state is currently broken: a service that's down right now, a probe that's failing right now, pending patches that still need to be applied, etc.
Recent Activity = event log of things that happened in the past — transitions, dispatches, ACK'd alerts. A service that went down at 14:30 and came back up at 14:35 shows in Recent Activity but disappears from Needs Attention once it's healthy again. Click the ✕ Clear button in the Recent Activity header to hide all current entries; this lasts for the browser session (persisted via sessionStorage) and resets when you close the tab.
v3.0.1 audit (attention coverage): these state-derived kinds now appear in NA whenever the condition is active:
- service_down: any watched systemd unit in failed (critical) or inactive/deactivating (warning). Carries the unit name as target so the Investigate button can run systemctl status + journalctl -u <unit> automatically.
- monitor_down: any monitor target whose latest probe came back ok: false.
- custom_script_fail: any custom monitoring script reporting non-zero rc in its latest result.
Already covered: offline, patches, cve, drift, mailbox, brute_force, snapshot, backup, disk, tls, reboot, agent_version.
Log watching — systemd units AND arbitrary file paths
Per-device and fleet-wide log rules live in Settings → AI Assistant → Log alerts. Each rule:
- Source type: systemd unit (via journalctl) or arbitrary file path (agent tails the file directly).
- Pattern: Python regex against each new line.
- Threshold: minimum matches before firing.
- Severity: OK (silent on match — used to suppress noise), WARN, CRIT.
File-path rules: agent tracks inode + byte position in /var/lib/remotepower/file-log-state.json. Rotation (inode change) resets to position 0; truncation also resets. First-time setup skips existing content — only new lines from then on are sent, so a freshly-configured rule doesn't dump the entire historic file.
Submitted as synthetic unit file:<path>. Caps: 200 lines and 256 KB per poll, per file.
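The inode-and-offset tracking for file-path rules can be sketched like this. `read_new_lines` and the in-memory `state` dictionary are hypothetical stand-ins for the agent's logic and its file-log-state.json; only the reset rules and the caps are taken from the text.

```python
import os

MAX_LINES = 200
MAX_BYTES = 256 * 1024


def read_new_lines(path, state):
    """Return lines appended since the last call; state maps path -> inode/pos."""
    st = os.stat(path)
    entry = state.get(path)
    if entry is None:
        # First-time setup: skip existing content, start from the current end.
        state[path] = {"inode": st.st_ino, "pos": st.st_size}
        return []
    if entry["inode"] != st.st_ino or st.st_size < entry["pos"]:
        # Rotation (new inode) or truncation: start again from byte 0.
        entry["inode"], entry["pos"] = st.st_ino, 0
    lines, used = [], 0
    with open(path, "rb") as f:
        f.seek(entry["pos"])
        while len(lines) < MAX_LINES:
            raw = f.readline()
            if not raw.endswith(b"\n"):
                break                      # EOF or a partially written line
            if used + len(raw) > MAX_BYTES:
                if used == 0:
                    entry["pos"] = f.tell()  # skip one oversized line so we cannot stall
                break
            lines.append(raw.decode("utf-8", "replace").rstrip("\n"))
            used += len(raw)
            entry["pos"] = f.tell()
    return lines
```

Lines left over after a cap is hit stay in the file and are picked up on the next poll, since the stored position only advances past lines that were actually returned.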
Force-upgrade agent — push a fresh binary to a single device
Per-device drawer → Force upgrade button. Sets a one-shot flag on the device record. On the next heartbeat (within 60s), the server includes the flag in the response; agent calls check_for_update(force=True), which bypasses the version-compare check and re-downloads the binary regardless of "we're already at version X". Flag is cleared atomically inside the heartbeat lock so it fires exactly once.
v3.0.1 fix: the flag had been silently dropped (copied to the wrong scope inside the heartbeat handler) so the operator got a success toast but nothing happened. Now wired through correctly, with a regression test. Use for re-deploys, recovery from corrupted binaries, or rolling out same-version rebuilds.
Sidebar navigation — collapsible groups and section deep-links
Grouped layout: the sidebar organises all pages into five collapsible sections. Each group opens and closes independently; state persists in localStorage. Fleet and Monitoring are expanded by default; Admin is collapsed.
| Group | Pages |
|---|---|
| Fleet | Devices, CMDB, Containers, Virtualization, Network |
| Monitoring | Targets, Device Metrics, Listening Ports, Custom Scripts, Services, Logs |
| Security | TLS/DNS, Patches, CVEs, Drift, Audit |
| Planning | Schedule, Calendar, Tasks, Maintenance, History |
| Admin | Settings, Users, API Keys, Library, Scripts, IaC Generator, Server Status |
Monitoring deep-links: each item in the Monitoring group navigates to the Monitoring page and smooth-scrolls to its section — so clicking "Listening Ports" skips straight to that section without manual scrolling.
Narrow mode: click ◀ at the top to shrink the sidebar to a 56-px icon strip. Click ▶ to expand. Preference saved in localStorage and applied before first paint. Mobile ignores this — the slide-in drawer is unchanged.
Standalone items: Home and Links sit above the groups as always-visible direct links (no group required to reach them).
Package manager coverage — apt, dnf, yum, pacman
Patch status detection works across all four: apt (apt list --upgradable), dnf (dnf check-update), yum (yum check-update — RHEL 7 / older CentOS, reports as manager: dnf so it shares the CVE path), pacman (pacman -Sy + pacman -Qu).
pacman 7+ sandbox: the new download sandbox user (alpm) fails on hosts where that user isn't usable — CachyOS in particular. v3.0.1 probes pacman --help for --disable-sandbox support and passes it when available, in both the agent's status check and the server's _UPGRADE_CMD. On real failure, upgradable is reported as None (UI shows "unknown") instead of 0 ("fully patched") — the previous silent-0 was the reason CachyOS devices appeared up-to-date when they weren't.
CVE scanning: uses OSV.dev. Mapping: Debian/Ubuntu by codename, Rocky/AlmaLinux/RedHat for rpm-based, no Fedora (no OSV feed). Arch / CachyOS intentionally unsupported — Arch packages aren't in OSV.
Ignored items — hide alerts without deleting the underlying state
Every Needs Attention card and Containers row has an × button. Click it to hide the item; restore from Settings → Ignored items. Three categories: Needs Attention (per-alert), Stale containers (per-container), Devices (whole-device hide on Containers page).
Stored in /var/lib/remotepower/ignored_items.json. Ignores survive backups and aren't redacted in the export.
Multi-webhook destinations — fire every alert to multiple endpoints (v3.0.2)
Up to 20 webhook destinations, each with its own format adapter (Discord, Slack, Pushover, Teams, ntfy, generic JSON). Per-destination filters: limit by event name or minimum priority. Configure in Settings → Notifications → Webhook destinations.
Pushover uses form-encoded POST with token + user + message + priority. Internal critical (priority=2) maps to Pushover priority=1 (high), not 2 (emergency). The emergency tier requires retry and expire parameters and forces acknowledgment, so it must be enabled explicitly by the operator. Credentials are write-once-and-redacted (UI shows ••••• (set)) and redacted from the backup export.
The legacy single webhook_url field still works for backward compatibility. New setups should use the multi-webhook editor.
Full doc at docs/webhooks.md.
Server status — RemotePower watching itself (v3.0.2)
New sidebar entry. Reports server version + memory, /var/lib/remotepower disk usage with the top 20 largest files, fleet device freshness, webhook delivery success rate (24h + 7d), audit log size, and scheduled backup state.
Closes the "who monitors the monitor?" gap — if RemotePower itself starts misbehaving (webhooks failing silently, disk filling, agents stuck), the page surfaces it. GET /api/self/status works for external monitoring (Uptime Kuma, Grafana, Homepage).
Full doc at docs/self-monitoring.md.
Scheduled backup — daily snapshot of /var/lib/remotepower (v3.0.2)
Daily gzipped tarball, 14-day retention (both configurable in Settings → Advanced → Scheduled backup). Triggered via the heartbeat hook with a 24h sentinel and stale-lock recovery.
Manual triggers (Server status page):
- Run backup now: immediately snapshots /var/lib/remotepower regardless of the 24h cadence.
- Export backup: also triggers a snapshot before streaming the ZIP download.
- Clear backup archives: deletes all remotepower_data_*.tar.gz from the backup directory and resets the backup state. Useful before a migration or to reclaim disk space. Requires confirmation.
Excluded from tarball: the backups directory itself, in-flight .tmp.* files, existing .gz archives. Owner/group stripped so restoring on a different host doesn't fail.
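The exclusion and owner-stripping rules map naturally onto the `filter` argument of Python's `tarfile.add`. This is a sketch under assumptions: `snapshot`, the `backups` directory name, and resetting ownership to uid/gid 0 with empty names are illustrative choices, not the server's confirmed behaviour.

```python
import os
import tarfile


def _make_filter(backups_dirname):
    def keep(info):
        name = os.path.basename(info.name)
        parts = info.name.split("/")
        # Skip the backups directory, in-flight temp files, and existing .gz archives.
        if backups_dirname in parts or ".tmp." in name or name.endswith(".gz"):
            return None
        # Strip ownership so extraction on another host does not depend on local users.
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info
    return keep


def snapshot(data_dir, dest, backups_dirname="backups"):
    """Write a gzipped tarball of data_dir to dest, applying the exclusions."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(data_dir, arcname=".", filter=_make_filter(backups_dirname))
```

Returning `None` for a directory also stops `tarfile` from descending into it, so the archive being written inside the backups directory never tries to include itself.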
No in-UI restore — backups are tarballs you extract yourself. Procedure in docs/self-monitoring.md.
Bulk operations — fleet-wide operations on filtered devices (v3.0.2)
From the command palette (/ → "Bulk actions") or Settings → Advanced → Bulk actions. Filter by all monitored / by group / by tag, then pick an action: upgrade packages, reboot, shutdown, force package scan, force ACME rescan.
Destructive actions (reboot, shutdown) require typing RUN to confirm. Every queued command is audit-logged. No undo once a command is queued — operator must reach the agent another way (SSH) to cancel before the next heartbeat picks it up.
Full doc at docs/bulk-operations.md.
Command palette & keyboard shortcuts — /, Ctrl-K, ? (v3.0.2)
Press / or Ctrl-K to open the global search. Indexes all pages, every device (from the cached list), and quick actions. Arrow keys to navigate, Enter to activate.
? shows the cheat sheet. g-prefix shortcuts: press g, then h/d/l/s/c/m/a/v within 1.5 seconds to jump to Home/Devices/Logs/Settings/CVE/Monitor/Audit/serVer-status.
All shortcuts disabled when an input field has focus. Full doc at docs/keyboard-shortcuts.md.
Force ACME rescan — bypass the hourly scan cadence (v3.0.2)
When you've issued or renewed a cert via the acme.sh CLI on the host and don't want to wait up to an hour for RemotePower to catch up: from the ACME table, click Rescan on the device row.
Sets a one-shot force_acme_rescan flag on the device. The next heartbeat (within 60s by default) carries the signal, and the agent re-walks ~/.acme.sh/ regardless of the ACME_CHECK_EVERY cadence. Same flag-on-heartbeat-lock pattern as force-upgrade and force-package-scan.
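A one-shot flag of this kind might look like the following. It is only a sketch: the device record is modelled as a plain dict, the function names are hypothetical, and treating the cadence as a number of seconds (3600 for the hourly scan) is an assumption about ACME_CHECK_EVERY.

```python
def request_acme_rescan(device: dict) -> None:
    """Server side: the Rescan button sets the one-shot flag."""
    device["force_acme_rescan"] = True


def build_heartbeat_response(device: dict) -> dict:
    """Server side: deliver the flag once, then clear it."""
    response = {"ok": True}
    if device.pop("force_acme_rescan", False):
        response["force_acme_rescan"] = True
    return response


def agent_should_scan(response: dict, now: float, last_scan: float,
                      check_every: float = 3600.0) -> bool:
    """Agent side: scan when forced, or when the normal cadence has elapsed."""
    forced = bool(response.get("force_acme_rescan"))
    return forced or (now - last_scan) >= check_every
```

Popping the flag while building the response is what makes it one-shot: a second heartbeat sees no flag and falls back to the regular cadence.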
Per-request load() cache — performance under the hood (v3.0.2)
CGI gives a fresh interpreter per request, so the cache lives only as long as one handler. Within that handler though, CONFIG_FILE was being parsed up to 4× per heartbeat and LONGPOLL_FILE 3× in handle_longpoll_exec. The cache deduplicates redundant reads.
Two safety properties: deep-copy on every load() so caller mutations don't corrupt the cache, and explicit invalidation in _LockedUpdate.__exit__ when the save aborts due to exception — otherwise the next load would return uncommitted in-flight changes.
Conversation history is kept in your browser (localStorage) — not on the server. Clearing the conversation clears only your view.
oscap xccdf eval on the endpoint against its SCAP Security Guide datastream and reports the score, pass/fail tallies and failing rule ids. Requires openscap-scanner plus the SSG content for the host's OS: on RHEL/Fedora that's scap-security-guide; on Debian/Ubuntu it's ssg-debian / ssg-debderived. The profile list below is what your fleet's datastreams actually contain (it fills in after the first scan).
Profiles are OS-specific: Debian/Ubuntu ship the ANSSI BP-028 profiles (anssi_np_nt28_minimal → …_high), which have real rules and produce a meaningful score; the Debian standard profile selects almost no rules (expect 0). CIS / PCI-DSS / STIG / OSPP exist only in the RHEL/Fedora scap-security-guide, not on Debian. A profile that isn't in the host's datastream, or that evaluates no applicable rules, reports "not available" with the reason. The content must match the host's OS release; oscap scores 0 if it doesn't (every rule "not applicable").
Best results per OS:
- Ubuntu: install Canonical's usg (Ubuntu Security Guide); the agent uses it automatically for CIS/STIG profiles and it ships content for the exact release (incl. 24.04, where the distro ssg-ubuntu datastream lags).
- Debian: ssg-debian matching the release, then an ANSSI BP-028 profile.
- RHEL/Fedora: scap-security-guide.
If a scan reports "not available", the reason names exactly what to install. Scans run in the background and report on the next heartbeat. The full oscap / usg HTML report is uploaded with each successful scan; click Report in the results row to open it.
/tmp, /run, /dev/shm, …) are excluded, and a heavily-fluctuating mount shows fluctuating instead of a misleading date. Documentation.
Fleet posture report
Capacity
Agent integrity
Resource anomalies
Software metering
Uptime (SLA)
Scheduled email delivery
Fleet health score
Fleet compliance %
Device resources
New rule
Server-side signing is the convenient mode. The private key lives on this server, so it protects against tampering of the published files at rest (mirror/CDN), but not a full compromise of this server. For the strongest guarantee, sign off-server in CI with tools/sign-agent-release.sh and only publish the public key here.
Distribute the public key to agents
Pin this on each agent host at /etc/remotepower/release.pub. Once present, that agent enforces signatures (fail-closed). No key pinned → agent keeps using sha256-only verification.
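The fail-closed behaviour can be sketched as a small decision function. Everything here is illustrative: verify_release is a hypothetical name, the signature scheme is not specified in this page so verify_signature is an injected callable, and checking the sha256 even when a key is pinned is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def verify_release(
    payload: bytes,
    expected_sha256: str,
    signature: Optional[bytes],
    pinned_key: Path,
    verify_signature: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Return True only if the release may be installed."""
    if hashlib.sha256(payload).hexdigest() != expected_sha256.lower():
        return False                     # checksum mismatch always rejects
    if not pinned_key.exists():
        return True                      # no key pinned: sha256-only mode
    if signature is None:
        return False                     # key pinned but no signature: fail closed
    # verify_signature(public_key_bytes, payload, signature) is supplied by the caller.
    return verify_signature(pinned_key.read_bytes(), payload, signature)
```

On an agent, pinned_key would be Path("/etc/remotepower/release.pub"); once that file exists, an unsigned or badly signed release is rejected rather than silently downgraded to checksum-only.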

| Agent version | — |
| GitHub | github.com/tyxak/remotepower |
| License | MIT |
| Latest release | checking… |
Self-hosted remote device management — shutdown, reboot, Wake-on-LAN, monitoring, scheduled commands, and agent self-update.
No inbound firewall rules on clients. Agents poll the server over HTTPS. Flat JSON storage, Nginx + Python CGI. No Docker, no Node.js.
…
| Name | Role | User | Created | Expires | |
|---|---|---|---|---|---|
| Name | Slug | Devices | Created | |
|---|---|---|---|---|
| Loading… | ||||
| Device | Group | OS | Status | Pkg Manager | Pending | Patch Status | Recent Patch Cmds | |
|---|---|---|---|---|---|---|---|---|
| Click Refresh to load patch report. | ||||||||
Software inventory search
Patch catalog
Install software
exec permission; honours quarantine and change-windows.
| Device | Group | Ecosystem | Critical | High | Medium | Low | Last scan | |
|---|---|---|---|---|---|---|---|---|
| Click Refresh to load findings. | ||||||||
| Device | Group | Watched | Up | Down | Last report | |
|---|---|---|---|---|---|---|
| Click Refresh to load. | ||||||
log_alert webhook.
| Device | Group | Unit | Pattern | Threshold | |
|---|---|---|---|---|---|
| No per-device rules configured. | |||||
* for the unit to match any unit on any device.
| Unit | Pattern | Exclude pattern | Threshold | Created | |
|---|---|---|---|---|---|
| Reason | Scope | Target | When | Events | Status | |
|---|---|---|---|---|---|---|
| Click Refresh. | ||||||
| When | Event | Device | Window | Reason |
|---|---|---|---|---|
| Name | Target | Schedule | Reboot | Enabled | |
|---|---|---|---|---|---|
| Loading… | |||||
| Name | Device | Schedule | Last run | Enabled | |
|---|---|---|---|---|---|
| Loading… | |||||
| Name | Last run | Result | |
|---|---|---|---|
| Loading… | |||
| Name | Asset ID | Function | OS | IP | Hypervisor | Docs | Creds | |
|---|---|---|---|---|---|---|---|---|
| Device | OS | Total | Running | Stopped | Restarting (≥5) | Runtimes | Reported | |
|---|---|---|---|---|---|---|---|---|
| Image | Tag | Hosts | Status | Registry | Last checked | |
|---|---|---|---|---|---|---|
| Stack | Device | Status | Last action | |
|---|---|---|---|---|
connected_to links and tunnels (peer links). Drag nodes to reposition; positions persist across refresh. Add agentless devices on the Devices page.
Reusable named sets of watched files. Assign a profile to a device, tag, or group and every matching host monitors that set. A device's own explicit file list (set in its drawer) still overrides any profile; an unassigned host falls back to the global default.
| Device | Group | Files watched | Drift | Missing | Last check | |
|---|---|---|---|---|---|---|
| Device | Proto/Port | Process | Bind address | Scope | |
|---|---|---|---|---|---|
| Loading… | |||||
| Device | Risk | Level | Top factors |
|---|---|---|---|
| Loading… | |||
| Device | Pool | Type | State | Capacity | Last scrub |
|---|---|---|---|---|---|
| Loading… | |||||
| Device | Rule | Package | Expected | Found |
|---|---|---|---|---|
| Loading… | ||||
Every package installed across the fleet, with the versions in use and how many hosts run each. Type to filter; click a row to see which hosts (and versions) run it in the Patches → inventory search.
| Status | Host | Port | Days left | Expires | Issuer | Last check | |
|---|---|---|---|---|---|---|---|
acme.sh on each device. Server scans
~/.acme.sh/ and shows next renewal, alt names, and the configured DNS
provider. Renewal stays under acme.sh's own cron; this page just visualises and
provides force-renew, revoke, and a wizard to issue new certs (DNS-01 only).
| Device | Domain | Challenge | Provider | Created | Next renewal | Status | Actions |
|---|---|---|---|---|---|---|---|
| Loading… | |||||||
| Severity | Time | Title | Device | Ack by | ||
|---|---|---|---|---|---|---|
| Loading… | ||||||
require_confirmation=true. Each entry shows the originating AI host and the natural-language prompt that led to the action. Approve to run, reject to discard. Pending entries expire after 1 hour.
| Status | Requested | Action | Device | AI host | Prompt | |
|---|---|---|---|---|---|---|
| Loading… | ||||||
| Time | Actor | Action | Detail | Source IP |
|---|---|---|---|---|
| Name | Command | Description | |
|---|---|---|---|
| No snippets yet. | |||
bash -n + dangerous-command detection before they go anywhere. Run on a single device from the device dropdown, or on a batch via the multi-select bar.
| Name | Description | Size | Updated | Flags | |
|---|---|---|---|---|---|