Version 3.13.0 — the long-form reference: architecture, agent and heartbeat, commands, monitoring, CVE scanning, drift detection, ACME, mitigation runners, IaC generator, Proxmox, the MCP server (now with write tools), inbound webhooks, OIDC SSO, alert inbox, channel routing, RAG over your infrastructure, the per-device timeline, fleet health score, posture reports, and the security model.
docs/<feature>.md files are the source of truth for each subsystem. Where a section here is intentionally brief, follow the link to the matching docs/*.md.
login_new_source alert fires off); Scheduled jobs / timers (failed-first systemd timer inventory); Pools / arrays (this host's own ZFS/mdadm/btrfs storage & RAID health, previously fleet-page only); the listening-ports card gains a bind address column and a world / LAN / local scope badge; the firewall card shows the active backend, rule count and fingerprint (the drift baseline the firewall_changed alert compares against); active brute-force lockouts show as a device-card badge; and the drawer adds Disk/Swap pressure pills. Every box fits: drawer cards and large page tables (Compliance, Pools, Ports, Mounts, Containers, SMART) cap at ~15 rows and scroll internally; two latent clip bugs (host-config dump, patch history) are fixed. Performance: version-busted static assets are cached immutably for a year (no more per-load 304 revalidations), front-end scripts load with defer, and _compute_fleet_risk() is file-cached for 10s so /api/home and /api/risk share the work. Hardening: agent-supplied SCAP reports are served under a self-contained sandboxed CSP (default-src 'none'; … sandbox;) so stored XSS can't reach an operator's session even if the upstream CSP is loosened; OIDC id_tokens are checked for expiry, issuer and audience; and the syslog audit-forwarder resolves its target once to close a DNS-rebinding window. Service-worker cache remotepower-shell-v3.13.0. See docs/v3.13.0.md.
devices.json is rewritten on every heartbeat; SQLite (WAL mode, stdlib sqlite3, no new dependencies) stores hot, high-cardinality data row-per-entity (devices, alerts, cmd_output, metrics, and the history/fleet_events/metrics_history logs) so a device update writes a single row and log appends are O(1). The new backend sits behind the existing storage helpers, so every feature behaves identically; only the on-disk representation changes.
Switching is in-place and reversible: the migration takes a rollback snapshot, migrates every file, verifies a full round-trip, and only then flips the active backend (a Preview dry-run shows what would move). Endpoints GET /api/storage-backend/status and POST /api/storage-backend/migrate; CLI tools/migrate_storage.py --to sqlite|json (--dry-run/--verify-only). Flat JSON stays the default; existing installs are unaffected until you opt in. On a network filesystem (NFS/CIFS) WAL is unsafe, so a rollback journal is used automatically — a local disk is recommended. The DB self-maintains (hourly WAL checkpoint, weekly VACUUM + integrity_check; size + last integrity on Server Status, a failed check raises a critical db_integrity_failed alert). Quieter posture alerts: the listening-port & firewall audit (new_port_detected / port_exposed_world / firewall_changed) is now off by default (Settings → Security) because it is noisy on Docker hosts; you can instead keep it on and mute a specific process (e.g. docker-proxy) from the Exposure page or the managed list in Settings — muting resolves matching open alerts, and the Exposure page shows a banner when alerting is off but world-reachable services exist. My Account: a top-right account menu (avatar → My Account / Sign out) and a dedicated page consolidating per-user settings — profile picture (downscaled in-browser, stored under DATA_DIR/avatars/), your role & permissions, 2FA and default SSH username (moved here from Settings → Security), and your acknowledged alerts (GET /api/me, /api/me/avatar). Acknowledge → ticket webhook: each webhook destination gains an “also fire on alert ACK” option (webhook_urls[].on_ack) that POSTs the full alert record to that destination when an operator acknowledges it — point a generic/GitHub-issue/PagerDuty destination at a ticket system to open a ticket on ack. Service-worker cache remotepower-shell-v3.12.0. 
See docs/v3.12.0.md.
local / lan / world; a service first binding to a world-reachable address raises port_exposed_world, and a new Exposure page lists every socket with a World/LAN/Local filter. Fleet Software Policy: banned / required / min version rules (optionally tag-scoped) evaluated against the installed-package inventory every host already pushes; software_policy_violation fires edge-triggered, with a policy editor + violations table. Storage / RAID health: a new probe reports ZFS / mdadm / btrfs pool state, capacity and last-scrub, raising storage_degraded / storage_recovered (auto-resolving) and scrub_overdue, on a new Storage page. Access watch: successful logins and their source IPs are collected; login_new_source fires on a first-seen source. Host firewall drift: a stable fingerprint of the active ufw/nftables/iptables ruleset rides the heartbeat; firewall_changed fires on divergence. Scheduled-job failures: systemd timers are inventoried; timer_failed fires when a timer's backing job fails. Posture digest: an opt-in daily/weekly email summarising offline hosts, pending updates, critical CVEs, policy violations and degraded storage, over the existing SMTP path. All detections are edge-triggered with per-device state. Service-worker cache remotepower-shell-v3.11.0. See docs/v3.11.0.md.
GET /api/config gained a recursive secret-scrub backstop. It used to redact known secrets by name only, so the AI provider api_key and the per-registry credentials map leaked to any viewer or read-only MCP key; a recursive pass now strips any secret-named field at any depth while keeping every *_set / *_from_env indicator. The TCP uptime monitor and the Healthchecks.io ping picked up the same IP-class SSRF checks (and connect-time peer recheck) the HTTP paths already had, so a TCP monitor can't be turned into a blind internal port scanner. See docs/security-review-3.10.0.md.
Bind it together: Docker/Podman containers reported restart_count / started_at / uptime_seconds hardcoded to zero, so the container-restarting alert only ever fired for Kubernetes pods and the drawer's container age was blank. The agent now fills them from a single batched docker inspect per heartbeat, so the alert fires fleet-wide and age renders; ClamAV last-scan time (parsed from the scan summary) and per-interface MAC addresses now show in the device drawer. Fixes: the config-drift alert title read a field the drift events never send, so every one said “? file(s)”; it now names the file that changed or the number of sections that drifted. The Devices table view showed a sort arrow on the Hostname column but never reordered (missing sort key); this is fixed. Service-worker cache remotepower-shell-v3.10.0. See docs/v3.10.0.md.
[::1]), an integer/octal/hex-encoded IPv4, or a hostname that rebinds to a metadata/loopback address after the pre-flight check. It now routes through the same connect-time SSRF guard as the webhook/audit/OIDC channels (the connected peer IP is re-classified, redirects refused, the shared IP classifier replaces the string list); RFC1918 LAN targets stay allowed by design and cloud-metadata/link-local is always blocked. Inbound-webhook alert links are scheme-validated (http(s) only) at ingest. See docs/security-review-3.9.0.md. Fixes: the post-upgrade “Patched — didn't take” badge no longer false-alarms on hosts that had nothing to patch (a fleet-wide upgrade touches already-patched hosts) or on offline hosts whose command is still queued; a stray return in the metric-threshold engine could skip per-mount disk alerting for a heartbeat when CPU load eased back through the recovery band; TLS-expiry alerts now use the correct severity and title (they read a field the event never sends, so every one was labelled high / “expires in ?d”).
Bind it together: CPU-load history is now plotted on the Trends page (load ÷ cores, as a saturation %) and swap joins the per-device metrics sparkline (both were collected but never charted); rkhunter last-run time shows on the AV pill; the systemd alias a watched unit resolved to (e.g. mysql.service→mariadb.service) shows in the Services table; livepatch state shows on the kernel pill when no patch is applied. Polish: three tables gained their missing sort wiring (Log Alert per-device & fleet-wide rules, Maintenance suppression log); typographic button glyphs were replaced with Lucide SVG icons; icon-only close buttons gained aria-labels. Image Updates: a one-click Update button on stale, compose-managed rows runs docker compose pull + up -d to fetch the new image and recreate the container, and the agent recovers the real image name when docker ps shows a bare untagged ID after a pull. Command Queue: ACME certificate actions now show in the recently-dispatched log, with Clear all pending / Clear log controls. Service-worker cache remotepower-shell-v3.9.0. See docs/v3.9.0.md.
CHANGELOG.md at the repository root for the complete release history, newest first.
RP_READ_ONLY=1), multi-doc CMDB.
RemotePower is a self-hosted fleet manager for Linux (and Windows, with limits). Each managed host runs an agent that polls the central server every 60 s over outbound HTTPS only, with no inbound ports. The dashboard runs commands, monitors patches, services, CVEs and containers, alerts on metric thresholds, opens browser-based SSH terminals, and (from v3.0.0 onward) generates Terraform / Ansible / Pulumi / cloud-init from live host inventory.
What it isn't: a configuration-management system. It runs ad-hoc commands and reports back; it doesn't enforce desired state. Pair with Ansible / Puppet / Salt for that. The IaC Generator (§18) bridges the two — it produces IaC from what RemotePower sees, so you can hand off to a CM tool with a real starting point.
The architecture is deliberately boring:
- /var/lib/remotepower/. No database, no Node.js, no Redis. Scales to 50–500 devices comfortably; beyond that look at SQLite.
tar xzf remotepower-3.3.4.tar.gz
cd remotepower-3.3.4
sudo bash deploy-server.sh
The script writes:
- /var/www/remotepower/: CGI scripts and the dashboard HTML
- /var/lib/remotepower/: state files (JSON), owned by your nginx user (www-data on Debian/Ubuntu, nginx on Fedora/RHEL, http on Arch)
- /etc/nginx/sites-available/remotepower: server block
After deploy, browse to https://your-host/ and log in with the default admin password (printed on first run). Change it immediately under Settings.
sudo bash packaging/install-webterm.sh
The script auto-detects your CGI user, installs Python dependencies via apt/dnf/pacman/apk/zypper, creates the rp-webterm daemon user, generates the daemon ↔ CGI shared secret, installs the systemd unit, and prints the nginx snippet you need to add. Run with --dry-run first if you want to see what it'll do.
Then add the printed snippet to your nginx server block above any catch-all location ^~ /api/ rule:
location = /api/webterm/connect {
proxy_pass http://127.0.0.1:8765;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# … (full snippet shown by install-webterm.sh)
}
And ensure your http { … } block has the upgrade map (only needed once globally):
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
Then sudo nginx -t && sudo systemctl reload nginx.
On each managed machine:
sudo install -m 755 client/remotepower-agent /usr/local/bin/
sudo cp packaging/remotepower-agent.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo remotepower-agent enroll # interactive PIN-based
sudo systemctl enable --now remotepower-agent
Two flows: interactive PIN for hand-installs, API tokens for automation.
Click Enroll device in the dashboard. You get a 6-digit PIN, valid for 10 minutes. Run sudo remotepower-agent enroll on the target machine, paste the PIN. Done.
For Ansible / cloud-init / golden-image stamping. Create a token via API:
curl -X POST https://remote.example.com/api/enrollment-tokens \
-H "X-Token: $YOUR_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"label": "ansible-batch-2026-05-07",
"default_group": "prod",
"default_tags": ["linux"],
"expires_in": 3600
}'
# Returns: {"token": "...", "expires": ..., ...}
Tokens are one-time use and expire after the specified duration (default 24h, max 7 days). The full token is shown ONCE in the create response; the list endpoint only returns 8-character prefixes.
On the target machine, the agent reads the token from one of three sources, in order:
1. --token CLI arg: visible in ps, only use for testing
2. $REMOTEPOWER_ENROLL_TOKEN environment variable: good for systemd EnvironmentFile=
3. /etc/remotepower/enroll-token file (must be mode 600): auto-deleted after success, best for golden images
sudo remotepower-agent enroll-token --server https://remote.example.com
# (with token in /etc/remotepower/enroll-token, auto-deleted on success)
If the token has default_group or default_tags set, those are applied to the device at enrollment.
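The lookup order for the three token sources can be sketched as follows. This is a minimal illustration, not the agent's actual code: the function name `resolve_enroll_token` is hypothetical, and treating any mode other than exactly 600 as an error is an assumption based on the "must be mode 600" note.

```python
import os
import stat
from typing import Mapping, Optional

# Path and variable name come from the text; everything else is illustrative.
TOKEN_FILE = "/etc/remotepower/enroll-token"
TOKEN_ENV = "REMOTEPOWER_ENROLL_TOKEN"


def resolve_enroll_token(cli_token: Optional[str],
                         env: Mapping[str, str],
                         token_file: str = TOKEN_FILE) -> Optional[str]:
    """Return the first token found: CLI arg, then env var, then file."""
    if cli_token:
        return cli_token.strip()
    if env.get(TOKEN_ENV):
        return env[TOKEN_ENV].strip()
    try:
        mode = stat.S_IMODE(os.stat(token_file).st_mode)
    except FileNotFoundError:
        return None
    if mode != 0o600:
        # Assumption: a token file readable by others is refused outright.
        raise PermissionError(f"{token_file} must be mode 600, found {oct(mode)}")
    with open(token_file, encoding="utf-8") as fh:
        return fh.read().strip() or None
```

Passing the environment in as a mapping (for example `os.environ`) keeps the function easy to exercise without touching the real process environment.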
To revoke a token before it's used:
curl -X DELETE https://remote.example.com/api/enrollment-tokens/PREFIX \
-H "X-Token: $YOUR_ADMIN_TOKEN"
# PREFIX is the first 8+ chars shown in the GET listing
The agent polls the server every poll_interval seconds (default 60). On each heartbeat it sends:
- device_id, token, OS version, IP, MAC
The server responds with:
- command the agent should execute next (shutdown/reboot/exec/upgrade)
- poll_interval override (the server can ask the agent to poll faster or slower)
For long-running commands (apt upgrade can take minutes), the agent runs the command synchronously, then sends a separate follow-up heartbeat with just the output. Output appears under the device's "Update history" panel.
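How an agent might interpret a heartbeat response can be sketched like this. The JSON keys `command` and `poll_interval` are assumed from the field names in the text, and the function name and the fallback on malformed values are illustrative only.

```python
from typing import Optional, Tuple

DEFAULT_POLL_INTERVAL = 60  # seconds, the documented default


def apply_heartbeat_response(response: dict,
                             current_interval: int = DEFAULT_POLL_INTERVAL
                             ) -> Tuple[Optional[str], int]:
    """Return (command to run next or None, seconds to wait before the next poll)."""
    command = response.get("command")  # e.g. shutdown / reboot / exec / upgrade
    interval = response.get("poll_interval", current_interval)
    # Assumption: a malformed override is ignored rather than trusted.
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        interval = current_interval
    return command, interval
```

For example, `apply_heartbeat_response({"command": "reboot", "poll_interval": 15})` yields the command to run and a faster polling interval for the next cycle.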
From the dashboard you can:
- apt-get -y upgrade / dnf -y upgrade / pacman -Syu --noconfirm depending on what the host has
Commands are queued in /var/lib/remotepower/cmds.json per device. Only one command per device is in flight at a time; the agent acknowledges by sending the output back, which clears the queue slot.
The Monitoring page → Targets section lets you configure ICMP / TCP / HTTP probes that the server runs against external targets — not the same as the agent-side metrics or service watches. Useful for:
From v1.11.8 monitors run on a periodic schedule (monitor_interval, default 300s, minimum 60s) — not just when the page is open. The schedule is piggy-backed on incoming CGI requests, so as long as any agent is heartbeating or anyone's browsing the dashboard, monitors run on time. A truly idle server (no agents, no users) won't run them, but in practice that doesn't happen.
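The piggy-backed schedule amounts to a "has enough time passed?" check on every incoming request. A minimal sketch, assuming the last run time is persisted somewhere and that an interval below the minimum is clamped to 60 s (the text states the minimum but not how it is enforced):

```python
import time
from typing import Optional

MIN_MONITOR_INTERVAL = 60       # documented minimum, seconds
DEFAULT_MONITOR_INTERVAL = 300  # documented default, seconds


def monitors_due(last_run: float,
                 monitor_interval: int = DEFAULT_MONITOR_INTERVAL,
                 now: Optional[float] = None) -> bool:
    """True when this request should also trigger a monitor run."""
    interval = max(MIN_MONITOR_INTERVAL, monitor_interval)  # assumed clamp
    now = time.time() if now is None else now
    return now - last_run >= interval
```

Each CGI request would call this once; only the first request after the interval elapses pays the cost of running the probes.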
State transitions fire the monitor_down and monitor_up webhook events.
Each agent reports the following via sysinfo on every heartbeat (v1.11.10+):
| Metric | What it is |
|---|---|
| Memory % | Virtual memory in use |
| Swap % | Swap in use (0 if no swap) |
| Disk % per mount | Each non-pseudo mount (skipping tmpfs/squashfs/overlay) |
| CPU load ratio | 1-minute load average ÷ logical CPU count |
| Metric | Warning | Critical |
|---|---|---|
| Disk usage (per mount) | 80% | 90% |
| Memory usage | 85% | 95% |
| Swap usage | 20% | 50% |
| CPU load ratio | 1.5× | 3.0× |
Hysteresis: a metric must drop 5 points below its warn threshold before metric_recovered fires. Without this, a metric oscillating around 80% would generate a webhook every ~60s.
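The hysteresis rule can be sketched for percentage metrics as a small state function. This is an illustration under stated assumptions, not the server's threshold engine: the state names and both function names are invented here, and holding a metric at "warning" while it sits inside the 5-point band is one plausible reading of the rule.

```python
from typing import Optional

RECOVERY_BUFFER = 5  # points below warn before recovery, per the text


def next_state(previous: str, value: float, warn: float, crit: float) -> str:
    """Return 'ok', 'warning' or 'critical' with hysteresis on recovery."""
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    # Below warn: stay alerted until the value clears the recovery buffer.
    if previous != "ok" and value > warn - RECOVERY_BUFFER:
        return "warning"
    return "ok"


def event_for(previous: str, current: str) -> Optional[str]:
    """Map a state change to a webhook event name, or None if nothing changed."""
    if current == previous:
        return None
    return {"ok": "metric_recovered",
            "warning": "metric_warning",
            "critical": "metric_critical"}[current]
```

With the default disk thresholds (80/90), a mount that rises to 81% and then hovers at 78% stays in "warning" and emits nothing further until it falls to 75% or lower.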
From the device dropdown menu, click Metric thresholds. The modal shows:
- /var at 70/85% (logs grow fast), /backup at 95/98% (designed to fill).
Validation: warn must be less than crit for every kind. Percentages are 1–99 (0 and 100 don't make sense as alerts). Load ratios are 0.1–100. Out-of-range values are rejected rather than silently clamped.
Setting thresholds clears the device's metric_state, so the next heartbeat re-evaluates under the new values. Otherwise a metric currently at "warning" would silently stay there even after you raised the threshold.
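The validation rules above translate into a short check that rejects bad input instead of clamping it. The function name and the `kind` labels are hypothetical; the ranges are the ones stated in the text.

```python
def validate_pair(kind: str, warn: float, crit: float) -> None:
    """Raise ValueError for an out-of-range or inverted warn/crit pair.

    kind is 'percent' or 'load_ratio' (labels invented for this sketch).
    """
    low, high = {"percent": (1, 99), "load_ratio": (0.1, 100)}[kind]
    for name, value in (("warn", warn), ("crit", crit)):
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside {low}-{high} for {kind}")
    if warn >= crit:
        raise ValueError(f"warn ({warn}) must be less than crit ({crit})")
```

Raising on the first problem mirrors the "rejected rather than silently clamped" behaviour: the caller sees exactly which value was refused.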
The Monitor page now has a Device metrics section below the external probes. It shows every device's current memory/swap/CPU/disk numbers, color-coded by alert level (green/amber/red), with clickable thresholds buttons that jump straight to the editor.
Filter by name, group, tag, or mount path. Sort by any column. Critical-state devices sort to the top when sorting by status ascending.
# Get current overrides + effective values for a device
curl -H "X-Token: $TOKEN" \
https://remote.example.com/api/devices/$ID/metric-thresholds
# Set overrides (any subset of fields; empty fields stay at default)
curl -X PATCH -H "X-Token: $TOKEN" -H "Content-Type: application/json" \
https://remote.example.com/api/devices/$ID/metric-thresholds \
-d '{
"mem_warn_percent": 70,
"mem_crit_percent": 85,
"disk_per_mount": {
"/var": {"warn": 70, "crit": 85}
}
}'
# Reset all overrides
curl -X DELETE -H "X-Token: $TOKEN" \
https://remote.example.com/api/devices/$ID/metric-thresholds
RemotePower's CGI model can't hold persistent WebSocket connections — fcgiwrap is request-response. So the web terminal is a separate remotepower-webterm daemon listening on 127.0.0.1:8765, with nginx proxying /api/webterm/connect to it. The CGI handles auth and audit logging via /api/webterm/auth and /api/webterm/audit; the daemon handles the actual SSH proxy.
Browser ──wss──> nginx :443 ──ws──> remotepower-webterm :8765 ──ssh──> device
│
└─http──> remotepower CGI (existing) for auth + audit
- /api/webterm/auth. CGI re-validates your admin password (every time, by design). On success, issues a 32-byte URL-safe ticket with 60-second TTL and stores it in webterm_tickets.json. On failure: 403 + webterm_auth_failed audit entry.
- wss://host/api/webterm/connect?ticket=…. nginx proxies to the daemon.
- {host, user, port, password, cols, rows}.
- asyncssh. Bytes pump until either side disconnects.
- /api/webterm/audit. Authenticated via shared secret in /etc/remotepower/webterm-secret (matches config.json[webterm_daemon_secret]).
- NoNewPrivileges, ProtectSystem=strict, RestrictNamespaces, dedicated user, ReadWritePaths limited to the recordings dir + ticket file.
- known_hosts management would mean a first-connect prompt for every device. If you reinstall a device, you won't get a "host key changed" warning. If that's a concern for your environment, this is the right discussion to have.
Every session is recorded to /var/lib/remotepower/webterm-sessions/<session_id>.cast in asciinema v2 format. Replayable in any asciinema player; also greppable as plain text JSON Lines.
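For the ticket step of the connection flow (a 32-byte URL-safe ticket with a 60-second TTL), a minimal sketch looks like this. The in-memory dict stands in for webterm_tickets.json, the function names are hypothetical, and single-use redemption is an assumption that the text does not state.

```python
import secrets
import time
from typing import Dict, Optional

TICKET_TTL = 60  # seconds, per the flow described above


def issue_ticket(store: Dict[str, float], now: Optional[float] = None) -> str:
    """Create a 32-byte URL-safe ticket and record its expiry time."""
    now = time.time() if now is None else now
    ticket = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
    store[ticket] = now + TICKET_TTL
    return ticket


def redeem_ticket(store: Dict[str, float], ticket: str,
                  now: Optional[float] = None) -> bool:
    """Consume a ticket: valid once (assumed) and only before it expires."""
    now = time.time() if now is None else now
    expiry = store.pop(ticket, None)
    return expiry is not None and now <= expiry
```

Removing the ticket on the first redemption attempt means a ticket leaked through a proxy log cannot be replayed, even inside its 60-second window.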
Output-only by default. Keystrokes are excluded because they could include sudo SECRET_VALUE and similar. Set RECORD_INPUT=1 in the daemon's environment if you have compliance reasons that require keystroke capture; only do this if you've thought through who can read the recordings directory.
10 MiB cap per recording — at the cap we stop recording but keep proxying bytes. To replay:
asciinema play /var/lib/remotepower/webterm-sessions/<id>.cast
Recordings aren't auto-pruned. Manage retention with cron:
0 3 * * * find /var/lib/remotepower/webterm-sessions -mtime +30 -delete
RemotePower fires webhooks for fleet events. As of v3.0.2 you can configure multiple destinations, each with its own format and filter, in Settings → Notifications. The legacy single webhook_url field is still honoured for backward compatibility but the multi-destination array is the recommended shape.
Each destination is self-contained: URL, format adapter, optional event allowlist, optional minimum priority. Supported formats:
Per-destination filters let you do things like "Pushover only for critical-and-above" while another destination gets everything. Format is auto-detected from the URL when not explicitly set.
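The per-destination filter can be pictured as a predicate applied before each delivery. The priority names and their ordering below are hypothetical placeholders (the real ones are in webhooks.md), as is the function name; only the two filter kinds, event allowlist and minimum priority, come from the text.

```python
from typing import Iterable, Optional

# Hypothetical priority ladder, lowest to highest.
PRIORITIES = ["low", "normal", "high", "critical"]


def should_deliver(event: str, priority: str,
                   allow_events: Optional[Iterable[str]] = None,
                   min_priority: Optional[str] = None) -> bool:
    """Apply one destination's optional allowlist and minimum-priority filter."""
    if allow_events is not None and event not in set(allow_events):
        return False
    if min_priority is not None:
        return PRIORITIES.index(priority) >= PRIORITIES.index(min_priority)
    return True
```

A "Pushover only for critical-and-above" destination would correspond to `min_priority="critical"` with no allowlist, while a catch-all destination leaves both filters unset.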
Full reference and the wire formats: webhooks.md.
| Event | Default | Trigger |
|---|---|---|
| device_offline | on | No heartbeat in 2× poll_interval |
| device_online | on | Heartbeat resumed after offline |
| monitor_down | on | External monitor target down |
| monitor_up | on | External monitor target recovered |
| service_down | on | Watched systemd unit went inactive |
| service_up | on | Watched systemd unit recovered |
| patch_alert | on | Pending updates exceed configured threshold |
| cve_found | on | OSV scan turned up new CVEs |
| log_alert | on | Watched log pattern matched (systemd unit or file path) |
| container_stopped | on | Container/pod disappeared or stopped |
| container_restarting | on | Container restart count climbed |
| containers_stale | on | No container report for >TTL |
| metric_warning | on | Resource crossed warning threshold |
| metric_critical | on | Resource crossed critical threshold |
| metric_recovered | on | Resource dropped below warn − buffer |
| drift_detected | on | Watched config file changed against its baseline |
| mailbox_threshold | on | Mailbox file count crossed its alert threshold |
| proxmox_action | on | VM/LXC start, shutdown or snapshot via the Virtualization page |
| snapshot_old (v2.7.0) | on | Proxmox snapshot first crosses age threshold |
| tls_expiry (v2.6.1) | on | Cert within 30d (warning) or 7d / expired (critical) |
| reboot_required (v2.6.1) | on | /run/reboot-required appears on a host |
| ssh_key_added (v2.8.0) | on | New authorized_keys entry on a host (edge-triggered) |
| new_port_detected (v2.8.0) | on | New listening port on a host |
| brute_force_detected (v2.8.0) | on | 10+ SSH or web login failures from one IP in 5 minutes |
| backup_stale (v2.8.1) | on | Configured backup path exceeded its max age |
| custom_script_fail | on | Custom monitoring script returned non-zero |
| custom_script_recover | on | Custom monitoring script returned 0 after a failure |
| command_queued | off | Command sent to a device |
| command_executed | off | Command completed on a device |
Each destination has a Test button that fires a synthetic event using that destination's config only, without touching others. Pushover credentials are stored write-once and never echoed back from /api/config; they are also excluded from the backup export. Any event carrying a device_id is suppressed for devices marked unmonitored, so an operator who silences a host during a migration is not pinged about its disk usage.
Every agent reports its installed-package inventory (dpkg-query / rpm / pacman / apk). The server cross-checks that inventory against the OSV.dev vulnerability database on a schedule and records the findings per device.
The CVEs page lists findings fleet-wide, ranked by severity. Severity is computed with the real CVSS v3.1 formula, and each finding records where its score came from. Debian Security Tracker urgency — a patching-priority hint, not a CVSS severity — is capped at medium and never promoted to high/critical.
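Once a base score is known, the CVSS v3.1 qualitative rating scale maps it to a severity word. The sketch below shows only that mapping plus the "capped at medium" rule for Debian urgency; it does not compute a score from a vector, and the urgency table and function names are hypothetical.

```python
def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative rating."""
    if score == 0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


# Hypothetical mapping of Debian Security Tracker urgency values. The point
# taken from the text is that nothing here exceeds "medium".
DEBIAN_URGENCY_CAP = {"unimportant": "low", "low": "low",
                      "medium": "medium", "high": "medium"}


def severity_from_debian_urgency(urgency: str) -> str:
    """Urgency is a patching hint, so it never yields high or critical."""
    return DEBIAN_URGENCY_CAP.get(urgency.lower(), "low")
```

Keeping the two functions separate makes it easy to record, per finding, which source produced the severity.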
Findings drill down per device with fixed-version hints. A per-CVE ignore list covers accepted risk: ignore on one device or fleet-wide, with a reason. Ignored findings drop out of the Needs Attention digest. New findings fire the cve_found webhook.
Use Scan packages now in a device's action menu to force a fresh inventory upload on the next heartbeat — handy right after patching a host, rather than waiting for the periodic scan.
RemotePower hashes a watch-list of critical config files on every host — sshd_config, sudoers, and similar — on each heartbeat, and flags any change against an accepted baseline.
For a drifted file you can view a diff, accept the new content as the baseline, or mark the difference as ignored (per file) so it stops raising a red status. Watched-file lists are pushed to the agent in the heartbeat. Drift surfaces on the dashboard and in the fleet event log, and fires the drift_detected webhook.
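Conceptually, drift detection compares a digest per watched file with the accepted baseline. A minimal sketch, assuming SHA-256 as the hash (the text does not name the algorithm) and with hypothetical function names:

```python
import hashlib
from typing import Dict, FrozenSet, List


def file_digest(path: str) -> str:
    """SHA-256 of a file's bytes (the hash algorithm is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def drifted(current: Dict[str, str], baseline: Dict[str, str],
            ignored: FrozenSet[str] = frozenset()) -> List[str]:
    """Paths whose digest differs from the baseline, minus ignored files."""
    return sorted(path for path, digest in current.items()
                  if path not in ignored and baseline.get(path) != digest)
```

Accepting a change as the new baseline is then just `baseline[path] = current[path]`, and marking it ignored adds the path to `ignored`.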
Full reference — the watched-file list, customising it, re-baselining, the compliance angle: drift.md.
Connect one Proxmox VE node under Settings → Proxmox using an API token scoped to the privileges RemotePower needs rather than a full-access one. This is a server-to-API integration: no agent runs on the Proxmox node itself.
The Virtualization page lists the node's QEMU virtual machines with start / shutdown actions and a search box that filters by name or VMID. LXC containers appear on the Containers page with the same actions.
Every Proxmox guest — QEMU and LXC — has a Snapshots panel: create, list, roll back and delete. Snapshots are disk-only (no RAM state). Rollback is destructive and requires typing the guest name to confirm. Guest actions fire the proxmox_action webhook.
Create LXC containers (v3.4.0). The Containers page → LXC section has a Create container button that opens a wizard. It pulls live options from the Proxmox API — OS templates, root-disk storages, network bridges (Linux and Open vSwitch), and the next free VMID — and creates an unprivileged container in one validated request: hostname, template, disk size, cores, memory, swap, network (DHCP or a static CIDR + gateway), and a root password and/or SSH key, with start-now and start-on-boot toggles. Every field is validated server-side before the API call; the action is admin-only and audited, and the root password is passed straight to Proxmox — never logged or stored.
Delete LXC containers (v3.4.0). Each LXC card has a Delete button that opens a type-to-confirm dialog (you type the container's name, or its VMID if unnamed; the destructive button stays disabled until it matches exactly). Deleting is admin-only and audited: a running container is force-stopped and polled until it is down (bounded wait) before the container is removed. No purge — backup and replication entries are left intact. This permanently deletes the container and its disk and cannot be undone.
RemotePower ships an MCP (Model Context Protocol) server that lets an MCP-capable AI client — Claude Desktop, for example — query fleet state through a set of defined read-only tools: device lists, status, patches, CVEs, drift and more.
It is read-oriented by design: the MCP tools report state, they do not queue commands or change configuration. Setup, the Claude Desktop config block, the full tool list and the security model are in mcp.md.
The Home dashboard's Needs Attention panel is a single ranked list, computed server-side, that merges every fleet-wide signal — offline devices, critical/high CVEs, configuration drift, pending-patch pile-ups and mailbox threshold breaches — into one prioritised view. Unmonitored devices are excluded, the same gate the alert pipeline uses.
The mailbox monitor is a lightweight count-only check with no IMAP/SMTP: configure a device with one or more directory paths and the agent counts the regular files in each (for a Maildir new/ folder that is the unread-message count). A monitor can carry an alert threshold; crossing it fires the mailbox_threshold webhook, edge-triggered — it fires once on the crossing and re-arms when the count drops back below.
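Both halves of the mailbox monitor are small: a file count and an edge-triggered comparison. In this sketch the function names are hypothetical, and treating "crossing" as count >= threshold is an assumption.

```python
import os
from typing import Tuple


def count_regular_files(directory: str) -> int:
    """Count regular files directly inside a directory (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries if e.is_file(follow_symlinks=False))


def threshold_event(count: int, threshold: int, armed: bool) -> Tuple[bool, bool]:
    """Edge-triggered check: returns (fire_now, new_armed_state)."""
    if armed and count >= threshold:
        return True, False   # fire once on the crossing, then disarm
    if not armed and count < threshold:
        return False, True   # re-arm once the count drops back below
    return False, armed
```

The `armed` flag is the only per-monitor state that needs to be stored between heartbeats.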
/api/status is a machine-readable fleet summary for external dashboards — Uptime Kuma, Homepage, Grafana. It is authenticated by a dedicated status token (generated in Settings), not a login session, so a monitoring tool can poll it without the endpoint being public. It returns a rolled-up health word, device online/offline counts and attention counts by severity.
RemotePower watches certificate and DNS expiry from two complementary angles:
- host:port targets in tls_targets.json, fetches the leaf cert, records the expiry. 30 d → warning, 7 d / expired → critical. Both thresholds fire tls_expiry once on the crossing (edge-triggered). Details in tls-monitor.md.
- ~/.acme.sh on each host once per hour and reports every certificate it manages: domain, challenge type, provider, dates, status. Surfaces in Security → TLS / DNS expiry. The dashboard can issue / renew / remove certs by sending one-shot flags back through the heartbeat. v3.0.2 adds a Force ACME rescan button per device so you don't have to wait for the hourly cadence after issuing on the CLI. Devices without acme.sh installed are hidden from the table; a discreet count is surfaced above. Full reference: acme.md.
The agent does four edge-triggered audits per heartbeat. All four fire only on the change; baseline state is silent.
- Failed password, Invalid user) and web (POST /wp-login.php, POST /xmlrpc.php) failure patterns are counted per source IP in a 5-minute rolling window. The threshold (default 20, configurable in Settings → Dashboard with an enable toggle) fires brute_force_detected once per source. Web access logs nginx.access / apache2.access are collected incrementally by the agent.
- authorized_keys fires ssh_key_added with the user and key fingerprint. Requires "Collect all current" to have run at least once for the baseline.
- LISTEN entry not in the per-device baseline fires new_port_detected with port + protocol + process. The Monitor page shows a fleet-wide grouped view.
- backup_stale once on the threshold crossing. Also a critical item in Needs Attention.
Every Needs Attention card whose alert has a deterministic remediation gets a Mitigate button (v3.0.1+). Clicking it asks the configured AI provider to generate a playbook for the specific finding, with operator sign-off required before any step runs. Each run is logged to mitigate_logs/<run_id>.json for audit. This is opt-in and disabled unless AI assistant is configured. See mitigation.md.
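For the brute-force audit in the list above, the per-source rolling window can be sketched with a deque of timestamps. The class name is hypothetical, the default threshold of 20 follows this section, and "once per source" is implemented here as never re-alerting for the same IP, which is an assumption about re-arming.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set

WINDOW_SECONDS = 300  # 5-minute rolling window, per the text


class BruteForceDetector:
    def __init__(self, threshold: int = 20) -> None:
        self.threshold = threshold
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.alerted: Set[str] = set()

    def record_failure(self, source_ip: str, timestamp: float) -> bool:
        """Record one failure; True only when this source first hits the threshold."""
        window = self.failures[source_ip]
        window.append(timestamp)
        # Drop failures that have aged out of the rolling window.
        while window and window[0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        if len(window) >= self.threshold and source_ip not in self.alerted:
            self.alerted.add(source_ip)
            return True
        return False
```

Feeding it each parsed failure line in timestamp order gives one `True` per offending source, which is where brute_force_detected would fire.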
The IaC page (v3.0.0+) generates Infrastructure-as-Code for any device on demand. Pick the device, pick categories (18 total: OS & identity, packages, systemd, users, groups, SSH keys, networking, fstab, containers, repos, firewall, cron, TLS paths, env, snaps, kernel modules, sysctl, RemotePower-specific) and the output format (Terraform HCL, Ansible YAML, Pulumi Python, Pulumi TypeScript, cloud-init YAML). The server flags the device, the agent runs the relevant collectors on the next heartbeat, and the configured AI provider renders the result.
Server-side secret masking: any env var whose name matches PASSWORD|SECRET|TOKEN|KEY|PASS|AUTH|CRED|PRIVATE is redacted before the payload leaves the host. The LLM never sees those values.
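The masking rule is a name match, so it can be illustrated in a few lines. The pattern is quoted from the text; case-insensitive matching, the placeholder string and the function name are assumptions.

```python
import re
from typing import Dict

# Pattern quoted from the text; case-insensitive matching is an assumption.
SECRET_NAME = re.compile(r"PASSWORD|SECRET|TOKEN|KEY|PASS|AUTH|CRED|PRIVATE",
                         re.IGNORECASE)


def mask_env(env: Dict[str, str],
             placeholder: str = "***REDACTED***") -> Dict[str, str]:
    """Return a copy of env with values redacted for secret-looking names."""
    return {name: (placeholder if SECRET_NAME.search(name) else value)
            for name, value in env.items()}
```

Because the match is a substring search, it errs on the side of over-redaction: a harmless variable such as KEYBOARD_LAYOUT would be masked too.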
The Server status sidebar entry (v3.0.2+) reports the things you want to know about RemotePower itself — server version and memory, DATA_DIR disk usage with the top 20 largest files, fleet-wide device freshness, webhook delivery rate (24 h and 7 d), audit log entry count plus archive size, and scheduled backup state. GET /api/self/status is the machine-readable form for external monitoring.
A daily gzipped tarball of /var/lib/remotepower/ runs from the heartbeat hook on a 24-hour sentinel with stale-lock recovery. Retention defaults to 14 days; output path defaults to /var/lib/remotepower/backups/. The backup excludes the backups directory itself, in-flight .tmp.* files, and existing .gz archives. Owner / group are stripped so restoring on a different host doesn't fight over UIDs. A manual Run backup now button on the Server status page hits POST /api/self/backup-now.
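The exclusion and ownership-stripping rules map naturally onto a `tarfile` filter. This is a sketch under assumptions rather than the server's backup code: the archive root name `remotepower`, the function names, and matching in-flight files by a `.tmp.` substring are all illustrative choices.

```python
import tarfile
from typing import Optional


def backup_filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """Drop excluded entries and strip ownership from the rest."""
    parts = info.name.split("/")
    base = parts[-1]
    # Skip the backups directory itself, in-flight temp files and .gz archives.
    if parts[1:2] == ["backups"] or ".tmp." in base or base.endswith(".gz"):
        return None
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def make_backup(data_dir: str, out_path: str) -> None:
    """Write a gzipped tarball of data_dir to out_path."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(data_dir, arcname="remotepower", filter=backup_filter)
```

Returning `None` for the backups directory stops `tarfile` from descending into it, so older archives are never nested inside new ones.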
The audit log itself has age-based retention (default 90 days, configurable). Entries older than retention are moved to audit_log_archive.jsonl.gz rather than dropped, and the rolling 200-event fleet log evicts to fleet_events_archive.jsonl.gz on the same principle.
The full posture, threat model and operator hardening checklist live in security.md. This section covers what an operator needs to act on or set.
- CERT_REQUIRED TLS by default; opt out only for known self-signed internal CAs.
- hmac.compare_digest match, shown to the operator only at creation, per-key expiry, capped at 50 per server.
Failed logins escalate per consecutive episode: 10 s → 1 min → 5 min → 30 min → 2 h. Resets on the next successful login. A missing-user verify path runs a dummy hash so the timing doesn't enumerate accounts.
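The escalation ladder for failed logins is a simple lookup. In this sketch the function name is hypothetical and staying at 2 h after the fifth episode is an assumption, since the text does not say what happens beyond the last step.

```python
# Escalating delays in seconds: 10 s, 1 min, 5 min, 30 min, 2 h (from the text).
BACKOFF_STEPS = [10, 60, 300, 1800, 7200]


def lockout_seconds(consecutive_episodes: int) -> int:
    """Delay after the Nth consecutive failed-login episode; 0 means none."""
    if consecutive_episodes <= 0:
        return 0
    # Assumption: the delay stays at the last step once the ladder is exhausted.
    index = min(consecutive_episodes, len(BACKOFF_STEPS)) - 1
    return BACKOFF_STEPS[index]
```

A successful login would reset the episode counter to zero, which brings the delay back to none.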
Fresh installs seed admin / remotepower with the must_change_password flag set. Every API call returns 403 until that password is changed; only POST /api/users/passwd and GET /api/public-info are reachable through the gate. The dashboard catches the 403, surfaces a clear toast and routes the user to Settings → Account; the flag clears the moment the new password is saved.
The gate is per-account — once cleared, never enforced again. API keys are unaffected (a default-password admin can't create them anyway, since the create-key endpoint is in the blocked set).
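A minimal sketch of the gate logic, assuming the user record is a dict carrying the must_change_password flag. The function name and the allow-set representation are hypothetical; only the two reachable endpoints come from the text above.

```python
# The only endpoints reachable while the flag is set.
ALLOWED_WHILE_GATED = {
    ("POST", "/api/users/passwd"),
    ("GET", "/api/public-info"),
}


def gate_status(user: dict, method: str, path: str) -> int:
    """Return the HTTP status the gate would produce: 403 if blocked, else 200."""
    gated = bool(user.get("must_change_password"))
    if gated and (method.upper(), path) not in ALLOWED_WHILE_GATED:
        return 403
    return 200
```

For example, `gate_status({"must_change_password": True}, "GET", "/api/devices")` yields 403, while the same call for an account with the flag cleared yields 200.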
Three secret fields can now live in the environment instead of config.json — env wins over config, the secret stays out of the data directory, and it is excluded from the backup export:
RP_PROXMOX_TOKEN_SECRET (since v2.3.1)
RP_SMTP_PASSWORD (v3.0.3)
RP_LDAP_BIND_PASSWORD (v3.0.3)
# /etc/systemd/system/remotepower.service.d/secrets.conf
[Service]
Environment=RP_PROXMOX_TOKEN_SECRET=…
Environment=RP_SMTP_PASSWORD=…
Environment=RP_LDAP_BIND_PASSWORD=…
When set, the relevant input under Settings is disabled and a green hint reads "✓ Password is currently being read from RP_…". To finish migrating, clear the field once and Save — the plaintext drops from config.json and from the backup export.
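The "env wins over config" rule amounts to a lookup like the one below. The config.json key names on the left of the mapping are hypothetical; only the three environment variable names come from this section.

```python
import os

# config key (hypothetical name) -> environment variable (from this section)
ENV_OVERRIDES = {
    "proxmox_token_secret": "RP_PROXMOX_TOKEN_SECRET",
    "smtp_password": "RP_SMTP_PASSWORD",
    "ldap_bind_password": "RP_LDAP_BIND_PASSWORD",
}


def resolve_secret(config: dict, key: str, environ=None):
    """Return (value, env_name); env_name is None when config.json supplied it."""
    environ = os.environ if environ is None else environ
    env_name = ENV_OVERRIDES[key]
    value = environ.get(env_name)
    if value:
        return value, env_name
    return config.get(key, ""), None
```

Returning the variable name alongside the value lets a settings page show which RP_… variable is in effect without ever echoing the secret.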
Session tokens travel in the custom X-Token header rather than a cookie. Browsers require a CORS preflight for any cross-origin request that carries a custom header, and RemotePower serves no permissive Access-Control-Allow-Origin, so classic CSRF cannot apply. Defence-in-depth: every POST/PUT/PATCH/DELETE additionally passes an Origin / Referer same-origin check before route dispatch (v3.0.2). CLI / agent / API-key clients send no Origin and are unaffected.
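The defence-in-depth check can be sketched as follows. This is an assumed shape, not the server's code: the function name, the header dict and comparing against a single expected host and scheme are all illustrative choices.

```python
from urllib.parse import urlsplit

STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}


def same_origin_ok(method: str, headers: dict, host: str, scheme: str = "https") -> bool:
    """Hypothetical pre-dispatch check for state-changing requests."""
    if method.upper() not in STATE_CHANGING:
        return True
    source = headers.get("Origin") or headers.get("Referer")
    if not source:
        # CLI / agent / API-key clients send no Origin and are unaffected.
        return True
    parts = urlsplit(source)
    return parts.scheme == scheme and parts.netloc.lower() == host.lower()
```

An opaque `Origin: null` (sent by sandboxed frames, for instance) parses to an empty scheme and host, so it fails the comparison.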
Agent state files in /var/lib/remotepower/ (mode 0700) use O_NOFOLLOW on every read and write to defeat symlink attacks from local non-root users; the enrolment credentials file is written atomically with O_WRONLY|O_CREAT|O_EXCL|O_NOFOLLOW at mode 0600. Server-pushed log-watch paths pass through a deny list (/etc/shadow, /root/.ssh/, /proc/, /sys/, /dev/, …) with realpath() resolution so symlinks cannot bypass it.
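A sketch of these three protections using os.open flags (POSIX only). The helper names are hypothetical, and DENY_PREFIXES holds only the entries named above; the real list is longer.

```python
import os

# Only the entries named in the text; the actual deny list has more.
DENY_PREFIXES = ("/etc/shadow", "/root/.ssh/", "/proc/", "/sys/", "/dev/")


def read_state(path: str) -> bytes:
    # O_NOFOLLOW makes open() fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as fh:
        return fh.read()


def write_credentials(path: str, data: bytes) -> None:
    # O_EXCL refuses to reuse an existing file; mode 0600 applies at creation.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)


def log_path_allowed(path: str) -> bool:
    # Resolve symlinks first so a link into a denied tree is caught.
    real = os.path.realpath(path)
    for prefix in DENY_PREFIXES:
        base = prefix.rstrip("/")
        if real == base or real.startswith(base + "/"):
            return False
    return True
```

Checking the resolved path rather than the supplied one is what prevents a symlink in an allowed directory from pointing the log watcher at a denied file.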
The sidebar is organised into six collapsible groups. Each group's open/closed state persists in localStorage. All groups except Admin are expanded by default. Home and Links are standalone items above the groups, always visible.
| Group | Pages / Sub-items |
|---|---|
| Fleet | Devices, CMDB, Agent Containers, Proxmox LXC, Virtualization, Network |
| Monitoring | Targets, Device Metrics, Listening Ports, Custom Scripts |
| Security | TLS/DNS Expiry, ACME Certificates, Patches, CVEs, Drift, Audit |
| Planning | Schedule, Calendar, Tasks, Maintenance, History |
| Admin | Settings, Users, API Keys, Library, Scripts, IaC Generator, Server Status |
| Help | Documentation, AI Assistant, API Reference, About |
Sub-items (Monitoring, Containers under Fleet, TLS under Security) use focused views (v3.1.0): clicking one shows only that panel. Keyboard shortcuts and the command palette navigate to the top-level page and show all panels.
showMonitorSection(sectionId, btn) — navigates to Monitoring and hides all panels except the requested one. showContainerSection(panelId, btn) and showTLSSection(panelId, btn) follow the same pattern.
Clicking a device name or the ⋮ button on Devices / Dashboard opens a full-screen slide-in drawer with two tabs. Actions & Settings has the quick-action grid (run command, reboot, shutdown, WoL, upgrade packages, scan, web terminal, run script, update agent, docker compose, host config, CMDB, runbook, maintenance, adjust poll, delete) plus inline-editable device settings (group, tags, icon, monitored, poll interval, watched services, log rules, drift files, command allowlist) — all behind a single Save button. Audit has 11 collapsible sections, lazy-loaded on first open: system info, listening ports, packages, logs (filterable by unit), command history, fleet events, drift state, CVE summary, containers, metrics, host config.
/ or Ctrl-K opens the global palette. Indexes pages, devices and quick actions. Arrow keys to navigate, Enter to activate, Esc to close. ? opens the keyboard shortcut cheat sheet. g-prefix navigation: g h / d / l / s / c / m / a / v for Home / Devices / Logs / Settings / CVE / Monitoring / Audit / serVer.
Every fleet table — Devices, Services, CVE Findings, Containers, Monitor, TLS, Patches, Audit Log, Command History, Schedule, Maintenance, plus the admin tables — has a substring filter input and clickable column headers.
The Devices grid has four density modes: Minimal (table layout, one device per row, sortable), Compact, Comfortable (default), Spacious. Selection persists per user.
Filter / sort / density preferences are stored under users[username].ui_prefs server-side, so they follow you across browsers and devices. Sanitised: 256-char filter cap, 5-key sort cap, 16 KB total per user.
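The three caps can be illustrated as below. The shape of ui_prefs (one dict per table with `filter`, `sort` and `density` keys) is an assumption, as is rejecting rather than trimming a payload that exceeds the 16 KB total.

```python
import json

MAX_FILTER_CHARS = 256
MAX_SORT_KEYS = 5
MAX_TOTAL_BYTES = 16 * 1024


def sanitise_ui_prefs(prefs: dict) -> dict:
    """Clamp per-table preferences; the dict layout here is hypothetical."""
    clean = {}
    for table, p in prefs.items():
        clean[str(table)] = {
            "filter": str(p.get("filter", ""))[:MAX_FILTER_CHARS],
            "sort": [str(k) for k in list(p.get("sort", []))[:MAX_SORT_KEYS]],
            "density": str(p.get("density", "comfortable")),
        }
    # Enforce the per-user total after clamping (rejecting is an assumption).
    if len(json.dumps(clean).encode("utf-8")) > MAX_TOTAL_BYTES:
        raise ValueError("ui_prefs exceed 16 KB")
    return clean
```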
Fixed in v3.0.3. The fix lands as soon as the new service worker takes over — force one hard reload (Ctrl-Shift-R / Cmd-Shift-R) after upgrading. If it's still missing, the browser has cached the manifest; clear site data for the host and reload. If it's still missing after that, the install criteria aren't met (most often: not served over HTTPS, or the service worker registration failed — check the browser console).
Expected (v3.0.3). The account still has the must_change_password flag set — usually because nobody changed the default admin / remotepower password yet. Go to Settings → Account and set a new one; the flag clears and the rest of the app unlocks immediately.
Expected (v3.0.3). The corresponding env var (RP_SMTP_PASSWORD or RP_LDAP_BIND_PASSWORD) is set in the systemd unit / container env and is taking precedence over config.json. To revert to config-file storage, unset the env var, restart the server, and re-enter the password in the field.
The ^~ /api/ location is winning over the exact-match = /api/webterm/connect. Make sure the exact-match location appears in the file before the prefix-match.
nginx is correctly trying to reach the daemon but the daemon isn't running. Check:
sudo systemctl status remotepower-webterm
sudo journalctl -u remotepower-webterm -n 50
ss -tlnp | grep 8765
Daemon process crashed mid-session. journalctl -u remotepower-webterm will show the traceback. Common causes: SSH server unreachable, password wrong, idle longer than the daemon's keepalive.
Pre-v1.11.7 agents have a bug where command output never reaches the server. Push an agent self-update, either with the Update button in the toolbar or the per-device Agent update menu item. Once agents are on v1.11.7+, the next upgrade you trigger will populate Update history within ~60s of completion.
Pre-v1.11.8 monitors only ran when the page was open. Upgrade to v1.11.8+. From there onwards, monitors run on schedule as long as any CGI request hits the server.
The agent needs to be v1.11.10+ to report per-mount data. Older agents only report a single root-disk percentage. Push an agent self-update.
Symptom of the v1.12.0 concurrent-write corruption (fixed in v1.12.1). One of your JSON files in /var/lib/remotepower/ got corrupted. Because load() returns {} on parse failure, the dashboard shows nothing and heartbeats can't validate device tokens.
v1.12.1+ adds automatic .bak fallback to load(), so this can't happen silently anymore. To clean up files damaged before the upgrade:
sudo -u www-data python3 packaging/recover-corrupted-json.py # dry-run scan
sudo -u www-data python3 packaging/recover-corrupted-json.py --apply # fix
The tool uses json.JSONDecoder.raw_decode() to find the first valid JSON document and discards trailing garbage, making a .broken-<timestamp> backup before overwriting.
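The raw_decode() technique can be sketched as follows. This is not the packaged recovery script, only an illustration of the same idea; the function name and the `apply` parameter are hypothetical, and the backup suffix uses an epoch timestamp as an assumption.

```python
import json
import shutil
import time


def recover_json_file(path: str, apply: bool = False) -> bool:
    """Return True if trailing garbage was found; rewrite only when apply=True."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read().lstrip()  # raw_decode does not skip leading whitespace
    # Parses the first complete JSON document and reports where it ended.
    # Raises json.JSONDecodeError if the file does not start with valid JSON.
    obj, end = json.JSONDecoder().raw_decode(text)
    if not text[end:].strip():
        return False  # nothing after the first document, file is clean
    if apply:
        # Keep the damaged original before overwriting it.
        shutil.copy2(path, f"{path}.broken-{int(time.time())}")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(obj, fh)
    return True
```

Called without `apply`, the function behaves like a dry-run scan: it reports damage but leaves the file untouched.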
Your CSP blocks cdn.jsdelivr.net. Either relax CSP for that origin, or self-host xterm.js — download @xterm/xterm@5.5.0/css/xterm.min.css, @xterm/xterm@5.5.0/lib/xterm.min.js, and @xterm/addon-fit@0.10.0/lib/addon-fit.min.js into /var/www/remotepower/static/, then edit the _loadXtermOnce() function in index.html to point there.