PharosVPN platform · DESIGN §§0–4, §7
Architecture.
One controller behind NAT. A fleet of dumb public nodes. An optional public relay. A mobile client whose profiles are end-to-end encrypted. Four trust boundaries, one CA, and a control plane that assumes it will be attacked.
§0 · what it is
A self-hostable, open-source, dual-protocol VPN fleet platform.
One codebase serves two postures from the same binaries:
- Personal — "I want my own VPN," one operator, a handful of nodes.
- Enterprise — a team managing many users across many regions.
Defaults differ; the engine is identical.
The data plane is AmneziaWG (obfuscated WireGuard) on UDP 443 and XRay (VLESS + REALITY) on TCP 443; both terminate end-user tunnels. The platform is the control plane, account system, and clients around that data plane.
§1 · goals
Five hard constraints the design defers to.
- Self-hostable in under 30 minutes. Clone, follow the README, get a working fleet. No lock-in beyond the chosen cloud provider.
- Defense-in-depth control plane. The controller issues credentials and rotates server config — the highest-value target. It assumes it will be attacked: no inbound ports, no public DNS, no public IP.
- Dumb nodes. A compromised VPN node must not yield control of the fleet. Nodes act only on cryptographically validated instructions.
- Operate with the controller offline. If the controller is down, every node keeps serving existing tunnels indefinitely: the controller is a control-plane component, never a data-plane dependency. The same applies to clients: a client connects from cached profiles when the account service is unreachable.
- The controller never holds usable user secrets. User profiles are end-to-end encrypted; a controller compromise yields ciphertext, not profiles.
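The client half of the offline goal can be made concrete with a minimal Python sketch of caravel preferring a fresh profile bundle and falling back to its last cached copy. `load_profiles`, the `fetch` callable, and `AccountServiceUnreachable` are hypothetical names, and caching the bundle as an opaque file is an assumption; the design only says clients connect from cached profiles.

```python
from pathlib import Path
from typing import Callable, Tuple


class AccountServiceUnreachable(Exception):
    """Hypothetical error raised when the account service cannot be reached."""


def load_profiles(fetch: Callable[[], bytes], cache_path: Path) -> Tuple[bytes, str]:
    """Return (encrypted profile bundle, source), preferring a fresh fetch.

    `fetch` is a hypothetical callable that asks the account service (via a
    beacon) for the user's E2E-encrypted profile bundle.
    """
    try:
        bundle = fetch()
    except AccountServiceUnreachable:
        # Offline: fall back to the last bundle that was fetched successfully.
        if cache_path.exists():
            return cache_path.read_bytes(), "cache"
        raise  # nothing cached yet, so there is nothing to connect with
    # Refresh the cache only after a successful fetch. The bundle is still
    # ciphertext at this point, so it is stored exactly as received.
    cache_path.write_bytes(bundle)
    return bundle, "network"
```

Keeping the cache refresh after a successful fetch means a failed request can never overwrite a working cached bundle.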
§2 · the three node roles + clients
A controller that dials out to everything.
Buoys are already public (they must be, to terminate tunnels), so helm initiates outbound mTLS to each of them. helm also dials out to a remote beacon (reverse tunnel), so the controller needs zero inbound ports anywhere.
| Role | Network posture | Job |
|---|---|---|
| Controller (helm) | Private, behind NAT. Zero inbound ports. | Source of truth, admin UI, issues certs/profiles, drives the fleet. |
| VPN node (buoy) | Public IP. Listens on UDP/TCP 443 + mTLS control port. | Runs the data plane. Dumb agent: applies only validated config. |
| Relay (beacon) | Public. The only public ingress for clients. | mTLS-terminating proxy. Lets clients reach a NAT'd controller. Embedded in helm by default; optionally remote. |
| Mobile client (caravel) | End-user device. | Runs the actual VPN tunnel and acquires profiles from multiple sources. |
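Because helm initiates every connection, reconnecting to a buoy (or a remote beacon) after a drop is also helm's job. The design does not specify a retry policy; the sketch below assumes capped exponential backoff with full jitter. `redial`, the `dial` callable, and the default timings are hypothetical.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def redial(dial: Callable[[], T], sleep: Callable[[float], None],
           base: float = 1.0, cap: float = 60.0, max_attempts: int = 8,
           rng: Optional[random.Random] = None) -> T:
    """Retry an outbound dial with capped exponential backoff and full jitter.

    `dial` stands in for opening helm's mTLS/gRPC stream to one buoy and is
    expected to raise ConnectionError on failure. After failed attempt n
    (0-based) we wait a random time in [0, min(cap, base * 2**n)] seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return dial()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up; the caller can mark the buoy unreachable
            sleep(rng.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

In production `sleep` would be `time.sleep` or an asyncio equivalent; injecting it keeps the policy testable. The jitter spreads out redials when many buoys drop at once, for example after helm restarts.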
§3 · component responsibilities
What each piece is on the hook for.
helm
- Source of truth. SQLite holding fleet inventory, profiles, users, devices, peers, admins, sessions, the CA, audit log, metrics samples.
- Admin Web UI — SvelteKit SPA embedded in the binary, served on localhost.
- Outbound control loop: holds a long-lived mTLS/gRPC connection to each buoy; pushes config, pushes/revokes peers, and receives a live event stream.
- Node onboarding over SSH: installs and updates the buoy agent on operator-provided VMs; all node control is gRPC.
- Issues node certs, the controller's own client cert, relay certs, and per-user/device certs.
- Account & sync service: authenticates users/admins and serves E2E-encrypted profile bundles. This surface is reached only via a beacon.
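A trimmed-down illustration of what helm's SQLite source of truth could look like. The table and column names are hypothetical, and several stores the design lists (sessions, admins, the CA, metrics samples) are omitted; the `version` columns anticipate the optimistic-concurrency rule in §7.

```python
import sqlite3

# Hypothetical, trimmed-down schema; names and columns are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    public_ip   TEXT NOT NULL,
    cert_serial TEXT,
    version     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    is_admin INTEGER NOT NULL DEFAULT 0,
    version  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS devices (
    id               INTEGER PRIMARY KEY,
    user_id          INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    cert_fingerprint TEXT NOT NULL UNIQUE,
    version          INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS peers (
    id        INTEGER PRIMARY KEY,
    node_id   INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    device_id INTEGER NOT NULL REFERENCES devices(id) ON DELETE CASCADE,
    protocol  TEXT NOT NULL CHECK (protocol IN ('awg', 'xray')),
    version   INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS profile_bundles (
    device_id  INTEGER PRIMARY KEY REFERENCES devices(id) ON DELETE CASCADE,
    ciphertext BLOB NOT NULL,  -- E2E-encrypted; helm stores but cannot read it
    version    INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS audit_log (
    id     INTEGER PRIMARY KEY,
    at     TEXT NOT NULL DEFAULT (datetime('now')),
    actor  TEXT NOT NULL,
    action TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open helm's database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    # SQLite leaves foreign-key enforcement off unless enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```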
buoy
- Stateless except for what helm gave it. All config is written to disk only after helm pushes it over mTLS.
- Data plane: `awg-quick@awg0` on UDP 443, `xray.service` on TCP 443.
- Control port (mTLS-only, gRPC): status, metrics, config push, live peer add/remove, handshake stats, service restart, plus a server-stream of live events back to helm.
- SSH is install-only. Every operational instruction is gRPC.
- Cold-start resilient. Comes up from disk every boot. Controller offline ⇒ existing peers keep working.
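A minimal sketch of the buoy's write-then-boot discipline, assuming pushed config lands as JSON files in one directory (the real data plane reads AmneziaWG and XRay config formats; `persist_config` and `load_config_on_boot` are hypothetical names). Writing through a temp file and `os.replace` means a crash mid-write leaves either the old file or the new one, never a half-written config for the next cold start.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def persist_config(config_dir: Path, name: str, payload: dict) -> Path:
    """Atomically write config that helm pushed (after mTLS validation)."""
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / f"{name}.json"
    # Temp file in the same directory, so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=config_dir, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename
        os.replace(tmp, target)  # atomic swap: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp)
        raise
    return target


def load_config_on_boot(config_dir: Path) -> Dict[str, dict]:
    """Cold start: rebuild state purely from disk, with no call to helm."""
    return {p.stem: json.loads(p.read_text())
            for p in sorted(config_dir.glob("*.json"))}
```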
beacon
- Stateless public proxy. Terminates client mTLS and forwards gRPC streams to helm. No database; every lookup is delegated.
- Strips spoofable client metadata and injects exactly one trusted value: the verified device fingerprint.
- Two transports to helm: embedded (in-process) or a remote reverse tunnel.
- Carries only ciphertext profile bundles (see §8), so a compromised remote beacon host cannot read user profiles.
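A sketch of beacon's metadata rule, assuming gRPC-style key/value metadata. The `x-pharos-` prefix, the list of spoofable keys, and using the hex SHA-256 of the leaf certificate's DER bytes as the "device fingerprint" are all assumptions; the design only says beacon strips spoofable metadata and injects exactly one trusted value.

```python
import hashlib
from typing import Iterable, List, Tuple

# Hypothetical names: the design does not specify metadata keys.
TRUSTED_PREFIX = "x-pharos-"
FINGERPRINT_KEY = TRUSTED_PREFIX + "device-fingerprint"
SPOOFABLE_KEYS = {"x-forwarded-for", "x-real-ip", "forwarded"}


def sanitize_metadata(incoming: Iterable[Tuple[str, str]],
                      peer_cert_der: bytes) -> List[Tuple[str, str]]:
    """Build the metadata beacon forwards to helm for one client stream.

    `peer_cert_der` is the client leaf certificate that already passed mTLS
    verification against the Device CA during the handshake.
    """
    forwarded = []
    for key, value in incoming:
        k = key.lower()
        # Drop anything a client could use to claim a trusted identity or origin.
        if k.startswith(TRUSTED_PREFIX) or k in SPOOFABLE_KEYS:
            continue
        forwarded.append((k, value))
    # Inject exactly one trusted value, derived from the verified certificate.
    forwarded.append((FINGERPRINT_KEY, hashlib.sha256(peer_cert_der).hexdigest()))
    return forwarded
```

Because the trusted key is stripped before being re-added, helm can rely on there being exactly one fingerprint entry, and it always comes from beacon.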
caravel
- Two decoupled layers: a VPN engine (multi-node, multi-protocol) and a set of pluggable profile sources.
- Posture-aware: personal (account login, QR, file import; admin section if the logged-in account is an admin) vs managed (MDM config present — account login and admin hidden, profiles locked). One app, one store listing.
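One way caravel might resolve its posture at startup. The `UiPosture` fields follow the rules above; the source names, and treating any MDM config (even an empty one) as the managed signal, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UiPosture:
    managed: bool
    show_account_login: bool
    show_admin: bool
    profiles_locked: bool
    sources: Tuple[str, ...]  # which pluggable profile sources are enabled


def resolve_posture(mdm_config: Optional[dict], is_admin: bool) -> UiPosture:
    """Pick personal vs managed UI for the single caravel app."""
    if mdm_config is not None:
        # Managed: account login and admin hidden, profiles locked.
        # Assumption: the only enabled source is the MDM-provided one.
        return UiPosture(managed=True, show_account_login=False,
                         show_admin=False, profiles_locked=True,
                         sources=("mdm",))
    # Personal: account login, QR, and file import; admin only for admin accounts.
    return UiPosture(managed=False, show_account_login=True,
                     show_admin=is_admin, profiles_locked=False,
                     sources=("account", "qr", "file"))
```

The VPN engine would consume whatever profiles the enabled sources yield, which keeps it independent of posture.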
§4 · trust model & PKI
One root, two intermediates, no third party.
A single in-repo root CA (built by the platform itself, no third party), generated on helm's first run, stored in helm's SQLite, and never copied off the controller. Two intermediates under it:
- Fleet CA: issues buoy node certs, the controller's client cert, and beacon relay certs.
- Device CA: issues per-user / per-device leaf certs for caravel and the admin browser.
| Certificate | Issued by | Held by | Validity |
|---|---|---|---|
| Root CA | self-signed | helm only | 10 years |
| Fleet / Device intermediates | Root CA | helm only | 5 years |
| Controller client cert | Fleet CA | helm | 1 year, auto-rotated |
| Node server cert | Fleet CA | each buoy | 1 year, auto-rotated |
| Relay cert | Fleet CA | each beacon | 1 year, auto-rotated |
| Device leaf | Device CA | each caravel / browser | 1 year |
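The table can be encoded as data to check two invariants: every certificate chains to the single root, and no certificate outlives its issuer. `CertSpec`, `chain`, and `needs_rotation` are hypothetical, years are approximated as 365 days, and the 30-day renewal window is an assumed value; the design only says these certs are auto-rotated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CertSpec:
    name: str
    issuer: Optional[str]  # None means self-signed
    validity: timedelta
    auto_rotated: bool     # mirrors the "auto-rotated" wording in the table


YEAR = timedelta(days=365)  # approximation; ignores leap days
HIERARCHY = {c.name: c for c in [
    CertSpec("root", None, 10 * YEAR, False),
    CertSpec("fleet-ca", "root", 5 * YEAR, False),
    CertSpec("device-ca", "root", 5 * YEAR, False),
    CertSpec("controller-client", "fleet-ca", YEAR, True),
    CertSpec("node-server", "fleet-ca", YEAR, True),
    CertSpec("relay", "fleet-ca", YEAR, True),
    CertSpec("device-leaf", "device-ca", YEAR, False),
]}


def chain(name: str) -> List[str]:
    """Issuer path from a certificate up to the self-signed root."""
    path = [name]
    while (issuer := HIERARCHY[path[-1]].issuer) is not None:
        path.append(issuer)
    return path


def needs_rotation(not_after: datetime, now: datetime,
                   window: timedelta = timedelta(days=30)) -> bool:
    """Renew once the remaining lifetime drops to `window` or less (assumed policy)."""
    return not_after - now <= window
```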
Compromise containment
- Compromised buoy → attacker gets that node's key plus the CA certificate (not the CA key). Cannot impersonate helm or other nodes. The operator revokes the node cert.
- Compromised remote beacon → attacker can see traffic metadata, but profile bundles are E2E-encrypted ciphertext. Cannot mint certs.
- Compromised helm → attacker gets the CA key, so the fleet is fully compromised. User profiles nonetheless remain encrypted, because the controller never holds users' private keys in usable form. Hence helm's "no inbound ports, behind NAT" posture.
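A sketch of the acceptance check that makes revocation and role separation bite, applied after ordinary chain verification. How a certificate encodes its role (for example in a SAN or OU field) is not specified by the design, so `VerifiedPeer`, the role strings, and `accept_peer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class VerifiedPeer:
    """Fields read from a peer certificate after normal chain verification."""
    serial: int
    issuer: str  # "fleet-ca" or "device-ca"
    role: str    # hypothetical encoding: "buoy", "beacon", or "device"


# Which intermediate may vouch for each role, per the PKI table.
ALLOWED_ISSUER = {"buoy": "fleet-ca", "beacon": "fleet-ca", "device": "device-ca"}


def accept_peer(peer: VerifiedPeer, expected_role: str, revoked_serials: Set[int]) -> bool:
    if peer.serial in revoked_serials:
        return False  # operator revoked this cert, e.g. after a buoy compromise
    if peer.role != expected_role:
        return False  # a relay cert cannot stand in for a node, and vice versa
    # A Device CA leaf can never pose as fleet infrastructure.
    return ALLOWED_ISSUER.get(expected_role) == peer.issuer
```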
§7 · real-time & multi-admin
The admin UI feels live and is safe for multiple admins.
A client connecting to a node appears immediately, not on a thirty-second poll.
- buoy → helm: helm holds its outbound mTLS connection open, and the buoy streams events (handshake up/down, peer connect/disconnect, errors) over a gRPC server-stream. Polling remains only as a fallback heartbeat.
- helm → browser: every open admin page holds a WebSocket, and helm pushes state changes to all of them. Open the dashboard on three machines and all three update together.
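A minimal asyncio sketch of the helm → browser fan-out, with the WebSocket plumbing left out: each admin page's handler would `subscribe()` and forward queued messages to its socket. `AdminHub`, the queue bound, and the drop-slow-subscriber policy are assumptions.

```python
import asyncio
import json
from typing import Set


class AdminHub:
    """Fans each state change out to every connected admin page."""

    def __init__(self, max_backlog: int = 256) -> None:
        self._subscribers: Set[asyncio.Queue] = set()
        self._max_backlog = max_backlog

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._max_backlog)
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    def publish(self, event: dict) -> None:
        message = json.dumps(event, sort_keys=True)
        for q in list(self._subscribers):
            try:
                q.put_nowait(message)
            except asyncio.QueueFull:
                # A stalled tab must not block everyone else: drop it and let
                # it reconnect and reload current state.
                self.unsubscribe(q)
```

`publish` never awaits, so one slow browser cannot delay delivery to the others.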
Optimistic concurrency
Every mutable record carries a version integer. A mutation must send the version the admin loaded. If helm's current version is higher, it rejects the mutation with HTTP 409 Conflict ("changed by someone else, reload"). Live WebSocket replication makes conflicts rare; the version check is the hard safety net.
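The version check can be done as a single compare-and-swap `UPDATE`, so there is no window between reading and writing the version. A sketch against a hypothetical `nodes` table; `Conflict` and `update_node_name` are stand-ins, and an HTTP layer would translate `Conflict` into 409.

```python
import sqlite3


class Conflict(Exception):
    """Maps to HTTP 409 Conflict: "changed by someone else, reload"."""


def update_node_name(conn: sqlite3.Connection, node_id: int,
                     new_name: str, loaded_version: int) -> int:
    """Apply an admin edit only if the row is still at the version the admin loaded."""
    cur = conn.execute(
        "UPDATE nodes SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, node_id, loaded_version),
    )
    if cur.rowcount == 0:
        # Someone else bumped the version first (or the row is gone).
        conn.rollback()
        raise Conflict(f"node {node_id} changed since version {loaded_version}; reload")
    conn.commit()
    return loaded_version + 1  # the version the client should hold from now on
```

A `rowcount` of 0 also results if the row was deleted, so a real handler might distinguish that case (for example with a 404) via a follow-up read.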