RemotePower Manual

Version 3.13.0 — the long-form reference: architecture, agent and heartbeat, commands, monitoring, CVE scanning, drift detection, ACME, mitigation runners, IaC generator, Proxmox, the MCP server (now with write tools), inbound webhooks, OIDC SSO, alert inbox, channel routing, RAG over your infrastructure, the per-device timeline, fleet health score, posture reports, and the security model.

How this doc fits together. The in-app Documentation page (sidebar → Help → Documentation) covers common tasks with substring search across topics; this Manual.html is the long-form reference; the dedicated docs/<feature>.md files are the source of truth for each subsystem. Where a section here is intentionally brief, follow the link to the matching docs/*.md.

What's new in v3.13.x / v3.12.x / v3.11.x / v3.10.x / v3.9.x (older: see CHANGELOG.md):

v3.13.0 (bind it together, round four): A sweep that surfaces host signals the agent already collected but the UI never showed, caps overflowing panels, and hardens performance and security. Newly surfaced in the device drawer / cards: Access — recent logins (per-user table of recent logins and distinct source IPs, the data the login_new_source alert fires off); Scheduled jobs / timers (failed-first systemd timer inventory); Pools / arrays (this host's own ZFS/mdadm/btrfs storage & RAID health, previously fleet-page only); the listening-ports card gains a bind address column and a world / LAN / local scope badge; the firewall card shows the active backend, rule count and fingerprint (the drift baseline the firewall_changed alert compares against); active brute-force lockouts show as a device-card badge; and the drawer adds Disk/Swap pressure pills. Every box fits: drawer cards and large page tables (Compliance, Pools, Ports, Mounts, Containers, SMART) cap at ~15 rows and scroll internally; two latent clip bugs (host-config dump, patch history) are fixed. Performance: version-busted static assets are cached immutably for a year (no more per-load 304 revalidations), front-end scripts load defer, and _compute_fleet_risk() is file-cached for 10s so /api/home and /api/risk share the work. Hardening: agent-supplied SCAP reports are served under a self-contained sandboxed CSP (default-src 'none'; … sandbox;) so stored XSS can't reach an operator's session even if the upstream CSP is loosened; OIDC id_tokens are checked for expiry, issuer and audience; and the syslog audit-forwarder resolves its target once to close a DNS-rebinding window. Service-worker cache remotepower-shell-v3.13.0. See docs/v3.13.0.md.
v3.12.0 (pluggable storage backend): An optional embedded SQLite backend alongside the default flat-JSON store, switchable in Settings → Advanced → Storage backend. Flat JSON rewrites a whole file on every write — on a busy fleet devices.json is rewritten on every heartbeat; SQLite (WAL mode, stdlib sqlite3, no new dependencies) stores hot, high-cardinality data row-per-entity (devices, alerts, cmd_output, metrics, and the history/fleet_events/metrics_history logs) so a device update writes a single row and log appends are O(1). The new backend sits behind the existing storage helpers, so every feature behaves identically — only the on-disk representation changes. Switching is in-place and reversible: the migration takes a rollback snapshot, migrates every file, verifies a full round-trip, and only then flips the active backend (a Preview dry-run shows what would move). Endpoints GET /api/storage-backend/status and POST /api/storage-backend/migrate; CLI tools/migrate_storage.py --to sqlite|json (--dry-run/--verify-only). Flat JSON stays the default; existing installs are unaffected until you opt in. On a network filesystem (NFS/CIFS) WAL is unsafe, so a rollback journal is used automatically — a local disk is recommended. The DB self-maintains (hourly WAL checkpoint, weekly VACUUM + integrity_check; size + last integrity on Server Status, a failed check raises a critical db_integrity_failed alert). Quieter posture alerts: the listening-port & firewall audit (new_port_detected / port_exposed_world / firewall_changed) is now off by default (Settings → Security) because it is noisy on Docker hosts; you can instead keep it on and mute a specific process (e.g. docker-proxy) from the Exposure page or the managed list in Settings — muting resolves matching open alerts, and the Exposure page shows a banner when alerting is off but world-reachable services exist. 
My Account: a top-right account menu (avatar → My Account / Sign out) and a dedicated page consolidating per-user settings — profile picture (downscaled in-browser, stored under DATA_DIR/avatars/), your role & permissions, 2FA and default SSH username (moved here from Settings → Security), and your acknowledged alerts (GET /api/me, /api/me/avatar). Acknowledge → ticket webhook: each webhook destination gains an “also fire on alert ACK” option (webhook_urls[].on_ack) that POSTs the full alert record to that destination when an operator acknowledges it — point a generic/GitHub-issue/PagerDuty destination at a ticket system to open a ticket on ack. Service-worker cache remotepower-shell-v3.12.0. See docs/v3.12.0.md.
v3.11.0 (fleet posture batch): Seven features that turn already-collected (or cheaply-collectable) agent data into first-class security and operational signals — no new daemons or dependencies. Exposure (attack surface): the agent now keeps each listening socket's bind address (previously discarded) and classifies an exposure scope — local / lan / world; a service first binding to a world-reachable address raises port_exposed_world, and a new Exposure page lists every socket with a World/LAN/Local filter. Fleet Software Policy: banned / required / min version rules (optionally tag-scoped) evaluated against the installed-package inventory every host already pushes; software_policy_violation fires edge-triggered, with a policy editor + violations table. Storage / RAID health: a new probe reports ZFS / mdadm / btrfs pool state, capacity and last-scrub — storage_degraded / storage_recovered (auto-resolving) and scrub_overdue, on a new Storage page. Access watch: successful logins and their source IPs are collected; login_new_source fires on a first-seen source. Host firewall drift: a stable fingerprint of the active ufw/nftables/iptables ruleset rides the heartbeat; firewall_changed fires on divergence. Scheduled-job failures: systemd timers are inventoried; timer_failed fires when a timer's backing job fails. Posture digest: an opt-in daily/weekly email summarising offline hosts, pending updates, critical CVEs, policy violations and degraded storage, over the existing SMTP path. All detections are edge-triggered with per-device state. Service-worker cache remotepower-shell-v3.11.0. See docs/v3.11.0.md.
v3.10.0 (bind-it-together & security, round three): A third consolidation sweep: agent data that was collected but stuck at zero now flows through, two real SSRF / secret-disclosure gaps are closed, and a couple of alert-label bugs are fixed. No new headline features. Security: the container image-registry check was the one outbound path not behind the connect-time SSRF guard. It followed redirects, re-resolved DNS between the pre-flight and the fetch, and fetched the registry-controlled bearer-token realm URL (used to mint the auth token) with no check at all, which could exfiltrate configured registry credentials; every fetch, manifest and token realm alike, now routes through the SSRF-safe opener (peer-IP re-validation, redirects refused) and the realm is pre-flighted and forced to HTTPS. GET /api/config gained a recursive secret-scrub backstop: it previously redacted only known secrets by name, so the AI provider api_key and the per-registry credentials map leaked to any viewer or read-only MCP key; a recursive pass now strips any secret-named field at any depth while keeping every *_set / *_from_env indicator. The TCP uptime monitor and the Healthchecks.io ping picked up the same IP-class SSRF checks (and connect-time peer recheck) the HTTP paths already had, so a TCP monitor can't be turned into a blind internal port scanner. See docs/security-review-3.10.0.md. Bind it together: Docker/Podman containers reported restart_count / started_at / uptime_seconds hardcoded to zero, so the container-restarting alert only ever fired for Kubernetes pods and the drawer's container age was blank; the agent now fills them from a single batched docker inspect per heartbeat, so the alert fires fleet-wide and age renders. ClamAV last-scan time (parsed from the scan summary) and per-interface MAC addresses now show in the device drawer.
Fixes: the config-drift alert title read a field the drift events never send, so every one said “? file(s)”; it now names the file that changed or the number of sections that drifted. The Devices table view showed a sort arrow on the Hostname column but never reordered (missing sort key); this is fixed. Service-worker cache remotepower-shell-v3.10.0. See docs/v3.10.0.md.
v3.9.0 (bind-it-together & hardening, round two): A second consolidation sweep on top of v3.8.0 — no new headline features, but more dropped agent data wired into the UI, a few correctness bugs fixed, an SSRF gap closed, and front-end polish. Security: the HTTP uptime-monitor check used a literal string-prefix blocklist and a bare fetch, bypassable with an IPv6 loopback ([::1]), an integer/octal/hex-encoded IPv4, or a hostname that rebinds to a metadata/loopback address after the pre-flight check — it now routes through the same connect-time SSRF guard as the webhook/audit/OIDC channels (the connected peer IP is re-classified, redirects refused, the shared IP classifier replaces the string list); RFC1918 LAN targets stay allowed by design and cloud-metadata/link-local is always blocked. Inbound-webhook alert links are scheme-validated (http(s) only) at ingest. See docs/security-review-3.9.0.md. Fixes: the post-upgrade “Patched — didn't take” badge no longer false-alarms on hosts that had nothing to patch (a fleet-wide upgrade touches already-patched hosts) or on offline hosts whose command is still queued; a stray return in the metric-threshold engine could skip per-mount disk alerting for a heartbeat when CPU load eased back through the recovery band; TLS-expiry alerts now use the correct severity and title (they read a field the event never sends, so every one was labelled high / “expires in ?d”). Bind it together: CPU-load history is now plotted on the Trends page (load ÷ cores, as a saturation %) and swap joins the per-device metrics sparkline — both were collected but never charted; rkhunter last-run time shows on the AV pill; the systemd alias a watched unit resolved to (e.g. mysql.service→mariadb.service) shows in the Services table; livepatch state shows on the kernel pill when no patch is applied. 
Polish: three tables gained their missing sort wiring (Log Alert per-device & fleet-wide rules, Maintenance suppression log); typographic button glyphs were replaced with Lucide SVG icons; icon-only close buttons gained aria-labels. Image Updates: a one-click Update button on stale, compose-managed rows runs docker compose pull + up -d to fetch the new image and recreate the container, and the agent recovers the real image name when docker ps shows a bare untagged ID after a pull. Command Queue: ACME certificate actions now show in the recently-dispatched log, with Clear all pending / Clear log controls. Service-worker cache remotepower-shell-v3.9.0. See docs/v3.9.0.md.
Older releases (v3.8.0 and earlier): see CHANGELOG.md at the repository root for the complete release history, newest first.
v2.9.0: Device Drawer — full-screen slide-in panel with Actions, Settings, and Audit tabs.
v2.8.0–v2.8.1: brute-force detection (SSH + web), SSH-key-added audit, new-listening-port audit, backup-file-age monitoring.
v2.0: demo / read-only mode (RP_READ_ONLY=1), multi-doc CMDB.

1. What it is, what it isn't

RemotePower is a self-hosted fleet manager for Linux (and Windows, with limits). Each managed host runs an agent that polls the central server every 60 s by default, over outbound HTTPS only, with no inbound ports. The dashboard runs commands; monitors patches, services, CVEs and containers; alerts on metric thresholds; opens browser-based SSH terminals; and (from v3.0.0 onward) generates Terraform / Ansible / Pulumi / cloud-init from live host inventory.

What it isn't: a configuration-management system. It runs ad-hoc commands and reports back; it doesn't enforce desired state. Pair with Ansible / Puppet / Salt for that. The IaC Generator (§18) bridges the two — it produces IaC from what RemotePower sees, so you can hand off to a CM tool with a real starting point.

2. Install & deploy

2.1 Server

After deploy, browse to https://your-host/ and log in with the default admin password (printed on first run). Change it immediately under Settings.

2.2 Web terminal daemon (optional, v1.11.11+)

The script auto-detects your CGI user, installs Python dependencies via apt/dnf/pacman/apk/zypper, creates the rp-webterm daemon user, generates the daemon ↔ CGI shared secret, installs the systemd unit, and prints the nginx snippet you need to add. Run with --dry-run first if you want to see what it'll do.

Then add the printed snippet to your nginx server block above any catch-all location ^~ /api/ rule:

And ensure your http { … } block has the upgrade map (only needed once globally):

2.3 Agent

3. Device enrollment

3.1 Interactive PIN

Click Enroll device in the dashboard. You get a 6-digit PIN, valid for 10 minutes. Run sudo remotepower-agent enroll on the target machine and paste the PIN.

3.2 API tokens (v1.11.10+)

Tokens are one-time use and expire after the specified duration (default 24h, max 7 days). The full token is shown ONCE in the create response; the list endpoint only returns 8-character prefixes.

On the target machine, the agent reads the token from one of three sources, in order:

If the token has default_group or default_tags set, those are applied to the device at enrollment.

4. The agent & heartbeat

The agent polls the server every poll_interval seconds (default 60). On each heartbeat it sends:

For long-running commands (apt upgrade can take minutes), the agent runs the command synchronously, then sends a separate follow-up heartbeat with just the output. Output appears under the device's "Update history" panel.

5. Commands & the queue

Commands are queued in /var/lib/remotepower/cmds.json per device. Only one command per device is in flight at a time; the agent acknowledges by sending the output back, which clears the queue slot.

6. External monitors (ping/TCP/HTTP)

The Monitoring page → Targets section lets you configure ICMP / TCP / HTTP probes that the server runs against external targets — not the same as the agent-side metrics or service watches. Useful for:

From v1.11.8 monitors run on a periodic schedule (monitor_interval, default 300s, minimum 60s) — not just when the page is open. The schedule is piggy-backed on incoming CGI requests, so as long as any agent is heartbeating or anyone's browsing the dashboard, monitors run on time. A truly idle server (no agents, no users) won't run them, but in practice that doesn't happen.
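The piggy-backed schedule can be pictured with a small sketch. The 300 s default and 60 s minimum come from the text above; the function names, the state dictionary and its keys are hypothetical and only illustrate the idea of checking "is a sweep due?" at the start of every request.

```python
import time

MIN_INTERVAL = 60       # documented minimum for monitor_interval
DEFAULT_INTERVAL = 300  # documented default for monitor_interval


def monitors_due(last_run, now, monitor_interval=DEFAULT_INTERVAL):
    """Return True when enough time has passed for another monitor sweep."""
    interval = max(MIN_INTERVAL, monitor_interval)  # enforce the floor
    return now - last_run >= interval


def on_request(state, now=None):
    """Hypothetical hook run at the start of every incoming request.

    `state` stands in for whatever the server persists between requests.
    Returns True when the caller should run the probes now.
    """
    now = time.time() if now is None else now
    interval = state.get("monitor_interval", DEFAULT_INTERVAL)
    if monitors_due(state.get("last_monitor_run", 0.0), now, interval):
        state["last_monitor_run"] = now  # claim the slot before probing
        return True
    return False
```

Because the check only runs when a request arrives, the effective schedule is "no earlier than monitor_interval", which matches the caveat about a truly idle server.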

7. Device metrics & alert thresholds

7.1 What's measured

Metric	What it is
Memory %	Virtual memory in use
Swap %	Swap in use (0 if no swap)
Disk % per mount	Each non-pseudo mount (skipping tmpfs/squashfs/overlay)
CPU load ratio	1-minute load average ÷ logical CPU count

7.2 Default thresholds

Metric	Warning	Critical
Disk usage (per mount)	80%	90%
Memory usage	85%	95%
Swap usage	20%	50%
CPU load ratio	1.5×	3.0×

Hysteresis: a metric must drop 5 points below its warn threshold before metric_recovered fires. Without this, a metric oscillating around 80% would generate a webhook every ~60s.
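A minimal sketch of the hysteresis rule for percentage metrics follows. The event names come from the events table; the function, the level names and the decision to emit nothing on a critical-to-warning downgrade are assumptions made for illustration, not the server's actual code. The recovery buffer for the CPU load ratio is not stated here, so the sketch leaves it as a parameter.

```python
RANK = {"ok": 0, "warning": 1, "critical": 2}
EVENT = {"warning": "metric_warning", "critical": "metric_critical"}


def evaluate(value, prev_level, warn, crit, buffer=5.0):
    """Return (new_level, event); event is None when nothing should fire."""
    if value >= crit:
        level = "critical"
    elif value >= warn:
        level = "warning"
    elif prev_level != "ok" and value > warn - buffer:
        level = prev_level  # inside the hysteresis band: hold the old level
    else:
        level = "ok"

    if RANK[level] > RANK[prev_level]:
        return level, EVENT[level]        # escalation
    if level == "ok" and prev_level != "ok":
        return level, "metric_recovered"  # dropped below warn - buffer
    return level, None
```

With warn at 80 and a 5-point buffer, a disk hovering between 79% and 81% fires metric_warning once and stays quiet until it falls to 75% or lower.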

7.3 Per-device + per-mount overrides (v1.12.0 UI)

Validation: warn must be less than crit for every kind. Percentages are 1–99 (0 and 100 don't make sense as alerts). Load ratios are 0.1–100. Out-of-range values are rejected rather than silently clamped.
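The validation rules above are simple enough to sketch directly. The function name and the "cpu_load" kind label are hypothetical; the ranges and the reject-rather-than-clamp behaviour are taken from the paragraph.

```python
def validate_thresholds(kind, warn, crit):
    """Validate one (warn, crit) pair; raise ValueError instead of clamping."""
    # Load ratios use 0.1-100; every other kind is a percentage in 1-99.
    low, high = (0.1, 100.0) if kind == "cpu_load" else (1.0, 99.0)
    for name, value in (("warn", warn), ("crit", crit)):
        if not low <= value <= high:
            raise ValueError(f"{kind}: {name}={value} outside {low}-{high}")
    if not warn < crit:
        raise ValueError(f"{kind}: warn ({warn}) must be less than crit ({crit})")
    return warn, crit
```

Raising on bad input keeps the stored configuration identical to what the operator typed, which is the point of rejecting instead of clamping.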

Setting thresholds clears the device's metric_state, so the next heartbeat re-evaluates under the new values. Otherwise a metric currently at "warning" would silently stay there even after you raised the threshold.

7.4 Live metrics on the Monitor page (v1.12.0)

The Monitor page now has a Device metrics section below the external probes. It shows every device's current memory/swap/CPU/disk numbers, color-coded by alert level (green/amber/red), with clickable thresholds buttons that jump straight to the editor.

Filter by name, group, tag, or mount path. Sort by any column. Critical-state devices sort to the top when sorting by status ascending.

7.5 Direct API access

8. Web terminal (v1.11.11+)

8.1 Architecture

RemotePower's CGI model can't hold persistent WebSocket connections — fcgiwrap is request-response. So the web terminal is a separate remotepower-webterm daemon listening on 127.0.0.1:8765, with nginx proxying /api/webterm/connect to it. The CGI handles auth and audit logging via /api/webterm/auth and /api/webterm/audit; the daemon handles the actual SSH proxy.

8.2 Auth flow

8.3 Security model

8.4 Session recording

Every session is recorded to /var/lib/remotepower/webterm-sessions/<session_id>.cast in asciinema v2 format. Replayable in any asciinema player; also greppable as plain text JSON Lines.
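Since a recording is plain JSON Lines, it can also be read without a player. The sketch below relies only on the asciinema v2 layout (a JSON header object on the first line, then one [time, event-type, data] array per line); the function name is hypothetical.

```python
import json


def read_cast(path):
    """Return (header, output_text) for an asciinema v2 recording."""
    with open(path, encoding="utf-8") as fh:
        header = json.loads(fh.readline())
        if header.get("version") != 2:
            raise ValueError("not an asciinema v2 recording")
        chunks = []
        for line in fh:
            line = line.strip()
            if not line:
                continue
            _timestamp, kind, data = json.loads(line)
            if kind == "o":  # "o" = terminal output, "i" = keyboard input
                chunks.append(data)
    return header, "".join(chunks)
```

The returned text still contains terminal escape sequences, so it is suited to searching rather than to pretty display.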

Output-only by default. Keystrokes are excluded because they could include sudo SECRET_VALUE and similar. Set RECORD_INPUT=1 in the daemon's environment if you have compliance reasons that require keystroke capture; only do this if you've thought through who can read the recordings directory.

10 MiB cap per recording — at the cap we stop recording but keep proxying bytes. To replay:

9. Webhooks & notification destinations

RemotePower fires webhooks for fleet events. As of v3.0.2 you can configure multiple destinations, each with its own format and filter, in Settings → Notifications. The legacy single webhook_url field is still honoured for backward compatibility but the multi-destination array is the recommended shape.

9.1 Destinations and formats

Each destination is self-contained: URL, format adapter, optional event allowlist, optional minimum priority. Supported formats:

Per-destination filters let you do things like "Pushover only for critical-and-above" while another destination gets everything. Format is auto-detected from the URL when not explicitly set.

9.2 Events

Event	Default	Trigger
`device_offline`	on	No heartbeat in 2× poll_interval
`device_online`	on	Heartbeat resumed after offline
`monitor_down`	on	External monitor target down
`monitor_up`	on	External monitor target recovered
`service_down`	on	Watched systemd unit went inactive
`service_up`	on	Watched systemd unit recovered
`patch_alert`	on	Pending updates exceed configured threshold
`cve_found`	on	OSV scan turned up new CVEs
`log_alert`	on	Watched log pattern matched (systemd unit or file path)
`container_stopped`	on	Container/pod disappeared or stopped
`container_restarting`	on	Container restart count climbed
`containers_stale`	on	No container report for >TTL
`metric_warning`	on	Resource crossed warning threshold
`metric_critical`	on	Resource crossed critical threshold
`metric_recovered`	on	Resource dropped below warn − buffer
`drift_detected`	on	Watched config file changed against its baseline
`mailbox_threshold`	on	Mailbox file count crossed its alert threshold
`proxmox_action`	on	VM/LXC start, shutdown or snapshot via the Virtualization page
`snapshot_old` (v2.7.0)	on	Proxmox snapshot first crosses age threshold
`tls_expiry` (v2.6.1)	on	Cert within 30d (warning) or 7d / expired (critical)
`reboot_required` (v2.6.1)	on	`/run/reboot-required` appears on a host
`ssh_key_added` (v2.8.0)	on	New `authorized_keys` entry on a host (edge-triggered)
`new_port_detected` (v2.8.0)	off (since v3.12.0)	New listening port on a host
`brute_force_detected` (v2.8.0)	on	10+ SSH or web login failures from one IP in 5 minutes
`backup_stale` (v2.8.1)	on	Configured backup path exceeded its max age
`custom_script_fail`	on	Custom monitoring script returned non-zero
`custom_script_recover`	on	Custom monitoring script returned 0 after a failure
`command_queued`	off	Command sent to a device
`command_executed`	off	Command completed on a device

9.3 Per-destination test, secrets, and unmonitored-device suppression

Each destination has a Test button that fires a synthetic event using that destination's config only, without touching others. Pushover credentials are stored write-once and never echoed back from /api/config; they are also excluded from the backup export. Any event carrying a device_id is suppressed for devices marked unmonitored, so an operator who silences a host during a migration is not pinged about its disk usage.

10. CVE scanning

Every agent reports its installed-package inventory (dpkg-query / rpm / pacman / apk). The server cross-checks that inventory against the OSV.dev vulnerability database on a schedule and records the findings per device.

The CVEs page lists findings fleet-wide, ranked by severity. Severity is computed with the real CVSS v3.1 formula, and each finding records where its score came from. Debian Security Tracker urgency — a patching-priority hint, not a CVSS severity — is capped at medium and never promoted to high/critical.
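For reference, the CVSS v3.1 base-score arithmetic can be sketched as below. The weights, the rounding helper and the severity bands follow the published FIRST specification; the function names are hypothetical, and this is an illustration of the formula rather than RemotePower's own scoring code.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(x):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score from a vector such as CVSS:3.1/AV:N/AC:L/..."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":") for p in parts)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to the qualitative rating scale."""
    if score == 0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"
```

A Debian urgency hint would bypass this arithmetic entirely, which is why the text caps it at medium instead of treating it as a score.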

Findings drill down per device with fixed-version hints. A per-CVE ignore list covers accepted risk: ignore on one device or fleet-wide, with a reason. Ignored findings drop out of the Needs Attention digest. New findings fire the cve_found webhook.

Use Scan packages now in a device's action menu to force a fresh inventory upload on the next heartbeat — handy right after patching a host, rather than waiting for the periodic scan.

11. Configuration drift detection

RemotePower hashes a watch-list of critical config files on every host — sshd_config, sudoers, and similar — on each heartbeat, and flags any change against an accepted baseline.

For a drifted file you can view a diff, accept the new content as the baseline, or mark the difference as ignored (per file) so it stops raising a red status. Watched-file lists are pushed to the agent in the heartbeat. Drift surfaces on the dashboard and in the fleet event log, and fires the drift_detected webhook.
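The comparison against a baseline can be sketched as follows. The text only says that files are hashed; SHA-256, the function names and the status labels are assumptions chosen for the example.

```python
import hashlib


def file_digest(path):
    """Hash a file in blocks so large files are not read into memory at once."""
    h = hashlib.sha256()  # assumption: the real agent's algorithm is not stated
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def drift_status(watched, baseline, ignored=()):
    """Map each watched path to ok / drifted / ignored / missing / unbaselined."""
    report = {}
    for path in watched:
        try:
            digest = file_digest(path)
        except FileNotFoundError:
            report[path] = "missing"
            continue
        if path not in baseline:
            report[path] = "unbaselined"
        elif digest == baseline[path]:
            report[path] = "ok"
        elif path in ignored:
            report[path] = "ignored"  # differs, but the operator accepted it
        else:
            report[path] = "drifted"
    return report
```

Accepting a change as the new baseline then amounts to storing the fresh digest under the same path.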

Full reference — the watched-file list, customising it, re-baselining, the compliance angle: drift.md.

12. Proxmox virtualization

Connect one Proxmox VE node under Settings → Proxmox using a scoped API token rather than a full-access one. This is a server-to-API integration: no agent runs on the Proxmox node itself.

The Virtualization page lists the node's QEMU virtual machines with start / shutdown actions and a search box that filters by name or VMID. LXC containers appear on the Containers page with the same actions.

Every Proxmox guest — QEMU and LXC — has a Snapshots panel: create, list, roll back and delete. Snapshots are disk-only (no RAM state). Rollback is destructive and requires typing the guest name to confirm. Guest actions fire the proxmox_action webhook.

Create LXC containers (v3.4.0). The Containers page → LXC section has a Create container button that opens a wizard. It pulls live options from the Proxmox API — OS templates, root-disk storages, network bridges (Linux and Open vSwitch), and the next free VMID — and creates an unprivileged container in one validated request: hostname, template, disk size, cores, memory, swap, network (DHCP or a static CIDR + gateway), and a root password and/or SSH key, with start-now and start-on-boot toggles. Every field is validated server-side before the API call; the action is admin-only and audited, and the root password is passed straight to Proxmox — never logged or stored.

Delete LXC containers (v3.4.0). Each LXC card has a Delete button that opens a type-to-confirm dialog (you type the container's name, or its VMID if unnamed; the destructive button stays disabled until it matches exactly). Deleting is admin-only and audited: a running container is force-stopped and polled until it is down (bounded wait) before the container is removed. No purge — backup and replication entries are left intact. This permanently deletes the container and its disk and cannot be undone.

13. MCP server

RemotePower ships an MCP (Model Context Protocol) server that lets an MCP-capable AI client — Claude Desktop, for example — query fleet state through a set of defined read-only tools: device lists, status, patches, CVEs, drift and more.

It is read-oriented by design: the MCP tools report state, they do not queue commands or change configuration. Setup, the Claude Desktop config block, the full tool list and the security model are in mcp.md.

14. Fleet health: Needs Attention & status endpoint

The Home dashboard's Needs Attention panel is a single ranked list, computed server-side, that merges every fleet-wide signal — offline devices, critical/high CVEs, configuration drift, pending-patch pile-ups and mailbox threshold breaches — into one prioritised view. Unmonitored devices are excluded, the same gate the alert pipeline uses.

The mailbox monitor is a lightweight count-only check with no IMAP/SMTP: configure a device with one or more directory paths and the agent counts the regular files in each (for a Maildir new/ folder that is the unread-message count). A monitor can carry an alert threshold; crossing it fires the mailbox_threshold webhook, edge-triggered — it fires once on the crossing and re-arms when the count drops back below.
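The count-only check and its edge-triggered alert can be sketched in a few lines. The function names and the "armed" flag are hypothetical, and treating a count equal to the threshold as a crossing is an assumption.

```python
import os


def count_regular_files(directory):
    """Count regular files directly inside a directory (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries if e.is_file(follow_symlinks=False))


def check_threshold(count, threshold, armed):
    """Edge-triggered check: return (fire, armed_after).

    Fires once when the count reaches the threshold, then stays silent
    until the count has dropped back below it.
    """
    if count >= threshold:
        return armed, False  # fire only if we were armed
    return False, True       # below the threshold: re-arm
```

Keeping the armed flag per monitor is what prevents a full mailbox from sending one webhook per heartbeat.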

/api/status is a machine-readable fleet summary for external dashboards — Uptime Kuma, Homepage, Grafana. It is authenticated by a dedicated status token (generated in Settings), not a login session, so a monitoring tool can poll it without the endpoint being public. It returns a rolled-up health word, device online/offline counts and attention counts by severity.

15. ACME / TLS / DNS expiry

16. Audits: brute-force, SSH keys, listening ports, backup age

The agent does four edge-triggered audits per heartbeat. All four fire only on the change; baseline state is silent.

17. Mitigation runners

Every Needs Attention card whose alert has a deterministic remediation gets a Mitigate button (v3.0.1+). Clicking it asks the configured AI provider to generate a playbook for the specific finding, with operator sign-off required before any step runs. Each run is logged to mitigate_logs/<run_id>.json for audit. This is opt-in and stays disabled unless an AI assistant is configured. See mitigation.md.

18. IaC Generator

The IaC page (v3.0.0+) generates Infrastructure-as-Code for any device on demand. Pick the device, pick categories (18 total: OS & identity, packages, systemd, users, groups, SSH keys, networking, fstab, containers, repos, firewall, cron, TLS paths, env, snaps, kernel modules, sysctl, RemotePower-specific) and the output format (Terraform HCL, Ansible YAML, Pulumi Python, Pulumi TypeScript, cloud-init YAML). The server flags the device, the agent runs the relevant collectors on the next heartbeat, and the configured AI provider renders the result.

19. Server self-monitoring & scheduled backup

The Server status sidebar entry (v3.0.2+) reports the things you want to know about RemotePower itself — server version and memory, DATA_DIR disk usage with the top 20 largest files, fleet-wide device freshness, webhook delivery rate (24 h and 7 d), audit log entry count plus archive size, and scheduled backup state. GET /api/self/status is the machine-readable form for external monitoring.

A daily gzipped tarball of /var/lib/remotepower/ runs from the heartbeat hook on a 24-hour sentinel with stale-lock recovery. Retention defaults to 14 days; output path defaults to /var/lib/remotepower/backups/. The backup excludes the backups directory itself, in-flight .tmp.* files, and existing .gz archives. Owner / group are stripped so restoring on a different host doesn't fight over UIDs. A manual Run backup now button on the Server status page hits POST /api/self/backup-now.
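The exclusion and ownership rules map naturally onto a tarfile filter. This is a sketch under stated assumptions: the function name, the "remotepower" top-level archive name and the way in-flight files are recognised (".tmp." in the file name) are illustrative, not the server's actual backup routine.

```python
import os
import tarfile


def make_backup(data_dir, out_path, backups_dirname="backups"):
    """Write a gzipped tarball of data_dir, applying the exclusions above."""
    root = "remotepower"  # arbitrary top-level name inside the archive
    skip_dir = f"{root}/{backups_dirname}"

    def keep(info):
        base = os.path.basename(info.name)
        # Skip the backups directory itself (returning None prunes the tree).
        if info.name == skip_dir or info.name.startswith(skip_dir + "/"):
            return None
        # Skip in-flight temp files and existing .gz archives.
        if info.isfile() and (".tmp." in base or base.endswith(".gz")):
            return None
        # Strip ownership so a restore on another host ignores source UIDs.
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(data_dir, arcname=root, filter=keep)
    return out_path
```

Pruning the backups directory in the filter also means the archive being written can live inside it without being pulled into itself.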

The audit log itself has age-based retention (default 90 days, configurable). Entries older than retention are moved to audit_log_archive.jsonl.gz rather than dropped, and the rolling 200-event fleet log evicts to fleet_events_archive.jsonl.gz on the same principle.

20. Security & authentication

The full posture, threat model and operator hardening checklist live in security.md. This section covers what an operator needs to act on or set.

20.1 Authentication

20.2 Login lockout ladder (v3.0.2)

Failed logins escalate per consecutive episode: 10 s → 1 min → 5 min → 30 min → 2 h. Resets on the next successful login. A missing-user verify path runs a dummy hash so the timing doesn't enumerate accounts.

20.3 Forced password change (v3.0.3)

Fresh installs seed admin / remotepower with the must_change_password flag set. Every API call returns 403 until that password is changed; only POST /api/users/passwd and GET /api/public-info are reachable through the gate. The dashboard catches the 403, surfaces a clear toast and routes the user to Settings → Account; the flag clears the moment the new password is saved.

The gate is per-account — once cleared, never enforced again. API keys are unaffected (a default-password admin can't create them anyway, since the create-key endpoint is in the blocked set).

20.4 Env-var secrets (v3.0.3)

Three secret fields can now live in the environment instead of config.json — env wins over config, the secret stays out of the data directory, and it is excluded from the backup export:

When set, the relevant input under Settings is disabled and a green hint reads "✓ Password is currently being read from RP_…". To finish migrating, clear the field once and Save — the plaintext drops from config.json and from the backup export.

20.5 CSRF / X-Token / same-origin

Session tokens travel in the custom X-Token header rather than a cookie. Browsers require a CORS preflight for any cross-origin request that carries a custom header, and RemotePower serves no permissive Access-Control-Allow-Origin, so classic CSRF cannot apply. Defence-in-depth: every POST/PUT/PATCH/DELETE additionally passes an Origin / Referer same-origin check before route dispatch (v3.0.2). CLI / agent / API-key clients send no Origin and are unaffected.
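The defence-in-depth check can be sketched as a pure function. The name, the header dictionary and the scheme parameter are assumptions; the behaviour (mutating methods only, Origin first then Referer, header-less clients allowed) follows the paragraph above.

```python
from urllib.parse import urlsplit

MUTATING = {"POST", "PUT", "PATCH", "DELETE"}


def same_origin_ok(method, headers, host, scheme="https"):
    """Return True when a request passes the Origin / Referer check."""
    if method.upper() not in MUTATING:
        return True
    source = headers.get("Origin") or headers.get("Referer")
    if not source:
        return True  # CLI / agent / API-key clients send neither header
    parts = urlsplit(source)
    return parts.scheme == scheme and parts.netloc.lower() == host.lower()
```

An opaque origin such as "null" parses to an empty scheme and host, so it is rejected rather than treated as a missing header.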

20.6 Agent hardening

Agent state files in /var/lib/remotepower/ (mode 0700) use O_NOFOLLOW on every read and write to defeat symlink attacks from local non-root users; the enrolment credentials file is written with O_WRONLY|O_CREAT|O_EXCL|O_NOFOLLOW at 0600 atomically. Server-pushed log-watch paths pass through a deny list (/etc/shadow, /root/.ssh/, /proc/, /sys/, /dev/, …) with realpath() resolution so symlinks cannot bypass.
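The flags named above translate directly to os.open. The helper names are hypothetical, and the deny list holds only the entries spelled out in the text; the elided remainder is not guessed at.

```python
import os


def read_state(path):
    """Read a state file, refusing to follow a symlink at the final component."""
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as fh:
        return fh.read()


def write_credentials(path, data):
    """Create the credentials file at 0600; fail if anything already exists."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)


# Partial list: only the entries named in the manual.
DENY_PREFIXES = ("/etc/shadow", "/root/.ssh/", "/proc/", "/sys/", "/dev/")


def is_denied(path, deny=DENY_PREFIXES):
    """Resolve symlinks first, then compare against the deny list."""
    real = os.path.realpath(path)
    for entry in deny:
        base = entry.rstrip("/")
        if real == base or real.startswith(base + "/"):
            return True
    return False
```

Resolving with realpath before matching is what stops a harmless-looking path such as /tmp/app.log from pointing at a denied file through a symlink.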

21. Dashboard UI

21.1 Sidebar navigation groups (v3.0.7 / v3.1.0)

The sidebar is organised into six collapsible groups. Each group's open/closed state persists in localStorage. All groups except Admin are expanded by default. Home and Links are standalone items above the groups, always visible.

Group	Pages / Sub-items
Fleet	Devices, CMDB, Agent Containers, Proxmox LXC, Virtualization, Network
Monitoring	Targets · Device Metrics · Listening Ports · Custom Scripts
Security	TLS/DNS Expiry · ACME Certificates · Patches · CVEs · Drift · Audit
Planning	Schedule, Calendar, Tasks, Maintenance, History
Admin	Settings, Users, API Keys, Library, Scripts, IaC Generator, Server Status
Help	Documentation, AI Assistant, API Reference, About

Sub-items (Monitoring, Containers under Fleet, TLS under Security) use focused views (v3.1.0): clicking one shows only that panel. Keyboard shortcuts and the command palette navigate to the top-level page and show all panels.

showMonitorSection(sectionId, btn) — navigates to Monitoring and hides all panels except the requested one. showContainerSection(panelId, btn) and showTLSSection(panelId, btn) follow the same pattern.

21.3 Device Drawer (v2.9.0)

Clicking a device name or the ⋮ button on Devices / Dashboard opens a full-screen slide-in drawer with two tabs. Actions & Settings has the quick-action grid (run command, reboot, shutdown, WoL, upgrade packages, scan, web terminal, run script, update agent, docker compose, host config, CMDB, runbook, maintenance, adjust poll, delete) plus inline-editable device settings (group, tags, icon, monitored, poll interval, watched services, log rules, drift files, command allowlist) — all behind a single Save button. Audit has 11 collapsible sections, lazy-loaded on first open: system info, listening ports, packages, logs (filterable by unit), command history, fleet events, drift state, CVE summary, containers, metrics, host config.

21.4 Command palette (v3.0.2)

/ or Ctrl-K opens the global palette. Indexes pages, devices and quick actions. Arrow keys to navigate, Enter to activate, Esc to close. ? opens the keyboard shortcut cheat sheet. g-prefix navigation: g h / d / l / s / c / m / a / v for Home / Devices / Logs / Settings / CVE / Monitoring / Audit / serVer.

21.5 Filter, sort, density

Every fleet table — Devices, Services, CVE Findings, Containers, Monitor, TLS, Patches, Audit Log, Command History, Schedule, Maintenance, plus the admin tables — has a substring filter input and clickable column headers.

21.6 Density

The Devices grid has four density modes: Minimal (table layout, one device per row, sortable), Compact, Comfortable (default), Spacious. Selection persists per user.

Filter / sort / density preferences are stored under users[username].ui_prefs server-side, so they follow you across browsers and devices. Values are sanitised on save: filters are capped at 256 characters, sorts at 5 keys, and the stored preferences at 16 KB total per user.
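As an illustration of those caps, here is a minimal sanitiser. The shape of ui_prefs (a per-table dict with "filter" and "sort" keys), the function name, and the choice to reject rather than truncate an oversized payload are assumptions; density handling is left out for brevity.

```python
import json

MAX_FILTER_CHARS = 256
MAX_SORT_KEYS = 5
MAX_TOTAL_BYTES = 16 * 1024


def sanitise_ui_prefs(prefs):
    """Return a cleaned copy of prefs, enforcing the three caps."""
    clean = {}
    for table, raw in prefs.items():
        if not isinstance(raw, dict):
            continue  # ignore malformed entries
        sort_in = raw.get("sort")
        if not isinstance(sort_in, list):
            sort_in = []
        sort = []
        for item in sort_in:
            # Keep only well-formed [column, direction] pairs.
            if isinstance(item, (list, tuple)) and len(item) == 2:
                column, direction = item
                sort.append([str(column), "desc" if direction == "desc" else "asc"])
        clean[str(table)] = {
            "filter": str(raw.get("filter", ""))[:MAX_FILTER_CHARS],
            "sort": sort[:MAX_SORT_KEYS],
        }
    # Assumption: an oversized payload is rejected outright.
    if len(json.dumps(clean).encode("utf-8")) > MAX_TOTAL_BYTES:
        raise ValueError("ui_prefs exceeds the 16 KB per-user cap")
    return clean
```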

22. Troubleshooting

"Install RemotePower" button missing in Chrome / Brave

Fixed in v3.0.3. The fix lands as soon as the new service worker takes over — force one hard reload (Ctrl-Shift-R / Cmd-Shift-R) after upgrading. If it's still missing, the browser has cached the manifest; clear site data for the host and reload. If it's still missing after that, the install criteria aren't met (most often: not served over HTTPS, or the service worker registration failed — check the browser console).

"Password change required" 403 on every endpoint

Expected (v3.0.3). The account still has the must_change_password flag set — usually because nobody changed the default admin / remotepower password yet. Go to Settings → Account and set a new one; the flag clears and the rest of the app unlocks immediately.

SMTP / LDAP password field is disabled and shows "set via RP_..." placeholder

Expected (v3.0.3). The corresponding env var (RP_SMTP_PASSWORD or RP_LDAP_BIND_PASSWORD) is set in the systemd unit / container env and is taking precedence over config.json. To revert to config-file storage, unset the env var, restart the server, and re-enter the password in the field.
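The precedence rule can be pictured with a small resolver. The environment variable names come from the text above; the config.json key names, the function name and the treatment of an empty variable as unset are assumptions made for this sketch.

```python
import os

# config key (hypothetical) -> environment variable (from the manual)
ENV_OVERRIDES = {
    "smtp_password": "RP_SMTP_PASSWORD",
    "ldap_bind_password": "RP_LDAP_BIND_PASSWORD",
}


def resolve_secret(key, config, environ=os.environ):
    """Return (value, source); the env var wins over config.json when set."""
    env_name = ENV_OVERRIDES[key]
    value = environ.get(env_name)
    if value:  # assumption: an empty variable counts as unset
        return value, "env:" + env_name
    return config.get(key, ""), "config"
```

A UI could use the returned source to decide whether to disable the password field and show the "set via RP_..." placeholder.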

WebSocket fails with 404

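The request is landing in the ^~ /api/ prefix location instead of the exact-match = /api/webterm/connect location. nginx gives an exact-match location priority over any prefix match regardless of where it appears in the file, so a 404 here usually means the exact-match block is missing from the server block that handles the request, or its path does not match the request URI exactly. Check the output of nginx -T for the = /api/webterm/connect block.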

WebSocket fails with 502 Bad Gateway

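nginx is correctly trying to reach the daemon, but the daemon isn't running. Check that the remotepower-webterm unit is active and review its journal with journalctl -u remotepower-webterm.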

WebSocket closes immediately with 1006

Daemon process crashed mid-session. journalctl -u remotepower-webterm will show the traceback. Common causes: SSH server unreachable, password wrong, idle longer than the daemon's keepalive.

Update history is empty

Pre-v1.11.7 agents have a bug where command output never reached the server. Push an agent self-update using the Update button in the toolbar or the per-device Agent update menu item. Once an agent is on v1.11.7+, the next upgrade you trigger will populate Update history within ~60s of completion.

Monitor history has gaps

Pre-v1.11.8 monitors only ran when the page was open. Upgrade to v1.11.8+. From there onwards, monitors run on schedule as long as any CGI request hits the server.
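One way to picture request-driven scheduling is a hook that runs at the start of every request and executes whichever monitors are due. This is a sketch under stated assumptions, not the server's implementation: the function name, the monitor dict shape ("id", "interval" in seconds) and the in-memory state dict are all hypothetical.

```python
import time


def run_due_monitors(monitors, state, check, now=None):
    """Run every monitor whose interval has elapsed; return the ids that ran.

    monitors: list of {"id": str, "interval": seconds}
    state:    dict mapping monitor id -> timestamp of its last run
    check:    callable invoked with each due monitor
    """
    now = time.time() if now is None else now
    ran = []
    for monitor in monitors:
        last_run = state.get(monitor["id"], 0.0)
        if now - last_run >= monitor["interval"]:
            state[monitor["id"]] = now  # record before running to avoid re-entry
            check(monitor)
            ran.append(monitor["id"])
    return ran
```

With this kind of design, a fleet whose agents heartbeat regularly keeps monitors on schedule even when no browser is open, because each heartbeat is itself a request.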

Per-mount disk thresholds aren't taking effect

The agent needs to be v1.11.10+ to report per-mount data. Older agents only report a single root-disk percentage. Push an agent self-update.

Devices vanished from the dashboard, agents getting "Credentials rejected"

Symptom of the v1.12.0 concurrent-write corruption (fixed in v1.12.1). One of the JSON files in /var/lib/remotepower/ was corrupted. Because load() returns {} on a parse failure, the dashboard shows nothing and heartbeats can't validate device tokens.

v1.12.1+ adds automatic .bak fallback to load(), so this can't happen silently anymore. To clean up files damaged before the upgrade:

The tool uses json.JSONDecoder.raw_decode() to find the first valid JSON document and discards trailing garbage, making a .broken-<timestamp> backup before overwriting.
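The two recovery ideas, a .bak fallback on load and raw_decode() salvage, can be sketched like this. It is an approximation for illustration only: the function bodies, the path + ".bak" naming and the integer timestamp format are assumptions, not the shipped code.

```python
import json
import os
import shutil
import time


def load(path):
    """Parse path, falling back to path + '.bak'; {} only if both fail."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, "r", encoding="utf-8") as fh:
                return json.load(fh)
        except (OSError, ValueError):
            continue
    return {}


def salvage(path):
    """Keep the first valid JSON document in path and drop trailing garbage.

    Returns True if the file was rewritten, False if it was already clean.
    Raises json.JSONDecodeError if no valid document starts the file.
    """
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read().lstrip()  # raw_decode does not skip leading whitespace
    document, end = json.JSONDecoder().raw_decode(text)
    if not text[end:].strip():
        return False
    # Preserve the damaged original before overwriting it.
    shutil.copy2(path, "%s.broken-%d" % (path, int(time.time())))
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(document, fh)
    os.replace(tmp, path)  # atomic swap on the same filesystem
    return True
```

This pattern suits corruption from interleaved writes, where a complete document is followed by leftover bytes from another writer.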

Web terminal modal says "Could not load xterm.js"

Your CSP blocks cdn.jsdelivr.net. Either relax CSP for that origin, or self-host xterm.js — download @xterm/xterm@5.5.0/css/xterm.min.css, @xterm/xterm@5.5.0/lib/xterm.min.js, and @xterm/addon-fit@0.10.0/lib/addon-fit.min.js into /var/www/remotepower/static/, then edit the _loadXtermOnce() function in index.html to point there.