# Architecture

See Glossary for definitions of gateway, endpoint, credential, rule, profile, plugin, runtime, and the rest of the vocabulary used below.

# Overview — actors

Five actors take part in a clawpatrol deployment:

- Agent. The AI client the operator wants to gate (Claude, Codex, …). The agent runs as an ordinary process on the operator’s workstation and dials upstream hostnames directly; it has no awareness that clawpatrol is in the path. clawpatrol also covers the non-AI CLIs the agent shells out to (the GitHub CLI, kubectl, psql, ssh, …): those aren’t agents themselves but tools the agent uses, and the gateway applies the same policy gates to whichever flows the agent kicks off through them.
- Device. The machine the agent runs on. The device hosts a small clawpatrol client (CLI binary on Linux; system extension inside Clawpatrol.app on macOS) that captures the agent’s outbound flows and feeds them into the tunnel.
- Tunnel. A WireGuard underlay between the device and the gateway. The tunnel carries L3 packets, so every byte the agent emits travels inside it. The agent never sees a proxy URL or a CA bundle.
- Gateway. The clawpatrol process. A single Go binary that terminates the tunnel, decides per flow whether to intercept or pass through, and runs the policy plugins that inject real credentials, gate requests, and emit events. The diagram below draws the gateway on its own machine (typically a small VM the operator controls) to keep the picture clean, but the deployment shape is independent of the binary: the same gateway also runs on localhost next to the agent for single-machine setups, or anywhere reachable by the device’s WireGuard config.
- Upstream. The API or service the agent is calling (api.anthropic.com, api.github.com, an internal Kubernetes API server, a Postgres database, a ClickHouse cluster, an SSH bastion, …). The upstream sees a connection from the gateway, not from the device.

# Process diagram

The gateway is drawn on a separate machine; the device runs only the client — it does not run policy logic, does not hold credentials, and does not know upstream secrets.

The gateway pulls in three plugin families:

- Endpoint plugins define an upstream binding and the wire protocol to terminate (http, kubernetes, postgres, clickhouse_native, clickhouse_https, ssh). Each plugin owns the per-protocol decode: an http endpoint sees parsed http.Request objects; a postgres endpoint sees Query / Parse messages; a clickhouse_native endpoint sees Hello packets; an ssh endpoint sees channels and global requests.
- Credential plugins own one secret shape each (bearer token, OAuth flow, mTLS bundle, postgres user/password, ClickHouse user/password, SSH key, cookie, header token, …). Each plugin writes to one well-defined slot on the matched flow (header, startup message, hello packet, auth replay) and nothing else rewrites that slot. The agent never holds the real secret; the device only sees a placeholder.
- Approver plugins arbitrate human-in-the-loop and LLM-in-the-loop verdicts on rules that opt in (dashboard, human_approver over Slack/Discord/Telegram, llm_approver for synchronous LLM proctoring against the approver's inline policy = <<-EOT ... EOT prompt; see config/README.md). The dashboard's built-in approver pushes live pending entries to the SPA for the operator to decide while the original request is waiting.
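The "one slot per credential plugin" contract can be pictured with a minimal Go sketch. The `credentialPlugin` interface, the `bearerToken` type, the example URL, and the placeholder string are hypothetical names chosen for illustration, not clawpatrol's actual API; the sketch assumes an http endpoint whose slot is the `Authorization` header.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// credentialPlugin is a hypothetical interface: each implementation owns
// exactly one slot on the matched flow and writes nothing else.
type credentialPlugin interface {
	Inject(req *http.Request) error
}

// bearerToken owns the Authorization header slot (assumed shape).
type bearerToken struct {
	secret string // real upstream secret, held only by the gateway
}

func (b bearerToken) Inject(req *http.Request) error {
	if b.secret == "" {
		return errors.New("bearer credential has no secret configured")
	}
	// Overwrite whatever placeholder the agent sent.
	req.Header.Set("Authorization", "Bearer "+b.secret)
	return nil
}

// injectedAuth builds a request carrying the agent's placeholder, runs the
// plugin, and returns the header value the upstream would see.
func injectedAuth(placeholder, secret string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, "https://api.example.com/v1/ping", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+placeholder)

	var p credentialPlugin = bearerToken{secret: secret}
	if err := p.Inject(req); err != nil {
		return "", err
	}
	return req.Header.Get("Authorization"), nil
}

func main() {
	got, err := injectedAuth("placeholder-token", "real-secret")
	fmt.Println(got, err)
}
```

Because the plugin only touches its own slot, the rest of the request (path, body, other headers) reaches the upstream exactly as the agent sent it.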

# Connection modes

clawpatrol join <gateway> enrolls the device. What the gateway mints and what the client installs depend on the gateway’s control mode.

# Tailscale mode

The gateway embeds tsnet; it joins the tailnet in-process and exposes only /api/onboard/{start,poll,claim} and /api/cred/* on :443 via Funnel. Every other route is tailnet-only. At onboard the gateway mints a Tailscale auth key via OAuth (reusable=true; ephemeral=true for per-process runs, ephemeral=false for --whole-machine) and delivers the CA and api-token inside the approved Funnel response.

clawpatrol run -- <cmd> (Linux + macOS). Each invocation is its own ephemeral tailnet node. On Linux a new user + net + mnt namespace runs userspace wireguard-go inside tsnet.Server with MkdirTemp state (Ephemeral: true); on macOS the NETransparentProxyProvider extension hosts the tsnet stack and PPID-filters flows. Concurrent runs on one host don’t share state. Reference: run_tsnet_linux.go, run_tsnet_darwin.go, macos/netstack/wgnetstack.go.

The persisted tsnet auth key is hidden from agent processes:

- Linux: the parent reads ~/.clawpatrol/tsnet-auth-key from the host mnt ns; the child ns overlays an empty tmpfs on the dir before exec'ing the agent, re-creating only ca.crt inside the overlay. The agent sees no key and no api-token.
- macOS: the key is not written under $HOME at all. clawpatrol join hands it to the container app, which stores it in NETransparentProxyManager providerConfiguration (system VPN prefs). Subsequent clawpatrol run invocations pass an empty authKey arg; the container app reuses the stored value.

Net effect: the bearer is bound to "code running on this physical machine," not "anyone who can copy the file off-box."

clawpatrol join --whole-machine (Linux). Installs system Tailscale (tailscale up --authkey=...), sets the gateway as the exit node, and routes the whole host through. The auth key for this path is minted with ephemeral=false so the node persists. Reference: setup.go:runLogin.

clawpatrol join --whole-machine (macOS). The NE owns whole-host routing; no system Tailscale is touched. macOS never runs system tailscaled.

# WireGuard mode

The gateway runs an in-process WireGuard server (wireguard-go + gVisor netstack). At onboard it mints a keypair, allocates a /32 from gateway.wireguard.subnet_cidr, and persists the wg-quick config at ~/.config/clawpatrol/wg.conf.
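The per-device /32 allocation can be sketched with `net/netip`. This is an illustration only: the `peerAllocator` type and the choice to skip the network address and reserve the first host address for the gateway are assumptions, and the example subnet is made up.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// peerAllocator hands out one /32 per enrolled device from a subnet.
type peerAllocator struct {
	subnet netip.Prefix
	next   netip.Addr
}

func newPeerAllocator(cidr string) (*peerAllocator, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked()
	// Assumption: skip the network address and reserve the first host
	// address for the gateway itself.
	return &peerAllocator{subnet: p, next: p.Addr().Next().Next()}, nil
}

// allocate returns the next free address as a single-host prefix.
func (a *peerAllocator) allocate() (netip.Prefix, error) {
	if !a.subnet.Contains(a.next) {
		return netip.Prefix{}, errors.New("subnet exhausted")
	}
	addr := a.next
	a.next = a.next.Next()
	return netip.PrefixFrom(addr, addr.BitLen()), nil
}

// allocateN returns the first n peer addresses in CIDR notation.
func allocateN(cidr string, n int) ([]string, error) {
	alloc, err := newPeerAllocator(cidr)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		p, err := alloc.allocate()
		if err != nil {
			return nil, err
		}
		out = append(out, p.String())
	}
	return out, nil
}

func main() {
	peers, err := allocateN("10.99.0.0/24", 3)
	fmt.Println(peers, err)
}
```

A real allocator would also persist its state and reclaim addresses from removed peers; the sketch only shows the address arithmetic.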

clawpatrol run -- <cmd> (Linux). Per-process ephemeral WG peer in a fresh netns. Reference: run_linux.go.

clawpatrol join --whole-machine (Linux). Kernel WireGuard via wg-quick up. Default route flips to the WG tunnel. Reference: setup.go:wgQuickUp, wireguard.go.

clawpatrol run -- <cmd> (macOS). WG userspace inside the NE, PPID-filtered. Reference: run_darwin.go, macos/ClawpatrolExtension/Provider.swift.

# Network traffic processing

Once a flow reaches the gateway over the tunnel, the gateway inspects the destination port (and, for some families, the SNI or the resolved hostname) to pick a handler. A family is the protocol class an endpoint plugin advertises so the rule engine can target it: today the gateway ships http (the http endpoint), sql (postgres, clickhouse_native, clickhouse_https), and k8s (kubernetes). Rules are a single block kind; the family is inferred from the rule’s endpoint(s) at load time, and each family exposes its own CEL variable (http.*, sql.*, k8s.*) that the rule’s condition may reference. New protocols (e.g. ssh) ship with their own family identifier and CEL variable. Anything the gateway has no opinion on splices to the real upstream byte-for-byte. There is no HTTPS_PROXY env var, no per-tool CA configuration, and no iptables rule on the gateway host: the WG netstack accepts SYNs to any destination IP/port and hands the dispatcher the original 4-tuple intact.
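Family inference at load time might look like the following sketch. The endpoint-to-family mapping is taken from the paragraph above; the function name and the decision to reject a rule whose endpoints span more than one family are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// familyByEndpointType lists the families the text says ship today.
var familyByEndpointType = map[string]string{
	"http":              "http",
	"postgres":          "sql",
	"clickhouse_native": "sql",
	"clickhouse_https":  "sql",
	"kubernetes":        "k8s",
}

// inferFamily derives a rule's family from the endpoint types it targets.
// Assumption: a rule whose endpoints span several families is rejected,
// since its condition could only reference one CEL variable.
func inferFamily(endpointTypes []string) (string, error) {
	if len(endpointTypes) == 0 {
		return "", errors.New("rule references no endpoints")
	}
	family := ""
	for _, t := range endpointTypes {
		f, ok := familyByEndpointType[t]
		if !ok {
			return "", fmt.Errorf("unknown endpoint type %q", t)
		}
		if family != "" && f != family {
			return "", fmt.Errorf("rule mixes families %q and %q", family, f)
		}
		family = f
	}
	return family, nil
}

func main() {
	fmt.Println(inferFamily([]string{"postgres", "clickhouse_native"}))
	fmt.Println(inferFamily([]string{"http", "postgres"}))
}
```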

# Dispatch decision

The promiscuous WG forwarder picks one branch per inbound flow based on the destination port and IP. The branches are described below, with the summary table at the end of the section.

# TLS SNI

For TCP flows on :443, the gateway peeks the TLS ClientHello to recover the SNI hostname, then looks up the endpoint claiming that host within the device’s profile. If the endpoint’s family is http or k8s, the gateway terminates TLS with a leaf cert minted on the fly (P-256, 30-day validity, in-memory cache, signed by the gateway’s CA), parses the request, runs it through the rule matcher and approve chain, asks the credential plugin to inject the real secret, and round-trips upstream. Endpoints whose family isn’t HTTPS-shaped (e.g. clickhouse_https, schema-only today) fall through to passthrough.

The CA cert is provisioned on the device during onboarding so the agent’s TLS clients trust the minted leaves; the agent never sees the upstream’s real cert.
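Leaf minting with the stated parameters (P-256 key, 30-day validity, signed by the gateway's CA) can be sketched with the Go standard library. The function names and the throwaway demo CA are assumptions; the in-memory cache is omitted, and this is not the gateway's actual code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newDemoCA creates a throwaway self-signed CA standing in for the gateway's CA.
func newDemoCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo gateway CA"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(365 * 24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// mintLeaf issues a P-256 leaf for host, valid for 30 days, signed by the CA.
func mintLeaf(host string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Random serial, shifted by one so it is always positive.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 120))
	if err != nil {
		return nil, nil, err
	}
	serial.Add(serial, big.NewInt(1))
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    now.Add(-time.Minute),
		NotAfter:     now.Add(30 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	leaf, err := x509.ParseCertificate(der)
	return leaf, key, err
}

// mintAndVerify mints a leaf and checks it the way a TLS client that
// trusts the CA would.
func mintAndVerify(host string) error {
	ca, caKey, err := newDemoCA()
	if err != nil {
		return err
	}
	leaf, _, err := mintLeaf(host, ca, caKey)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err = leaf.Verify(x509.VerifyOptions{Roots: roots, DNSName: host})
	return err
}

func main() {
	fmt.Println("verify error:", mintAndVerify("api.github.com"))
}
```

The verification step mirrors why the CA has to be installed on the device: without it in the trust pool, the same `Verify` call fails and the agent's TLS client would reject the minted leaf.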

# Postgres claiming

Postgres endpoints don’t have an SNI to peek, so the gateway claims them by destination IP. The mechanism is the ConnRouter interface in config/runtime/conn_route.go: an endpoint plugin’s body satisfies ConnRouter when it exposes ConnRouteHosts() []string, returning the host:port tuples it claims (db.example.com:5432, …). At policy load the gateway resolves each host via DNS and folds the answers into a ConnIndex keyed dstIP → endpoint(s).
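A minimal sketch of the index build follows. `ConnRouter` and `ConnRouteHosts` come from the text above; the `namedEndpoint` type, the map-based index, and the injectable lookup function (which has the same signature as `net.LookupHost`, so the example runs without network access) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// ConnRouter mirrors the interface named in the text.
type ConnRouter interface {
	ConnRouteHosts() []string
}

// namedEndpoint is a hypothetical endpoint body that claims host:port tuples.
type namedEndpoint struct {
	name  string
	hosts []string
}

func (e namedEndpoint) ConnRouteHosts() []string { return e.hosts }

// lookupFunc has the same signature as net.LookupHost.
type lookupFunc func(host string) ([]string, error)

// buildConnIndex resolves every claimed host and folds the answers into
// a dstIP -> endpoint names index.
func buildConnIndex(endpoints []namedEndpoint, lookup lookupFunc) (map[string][]string, error) {
	index := make(map[string][]string)
	for _, ep := range endpoints {
		var router ConnRouter = ep
		for _, hostport := range router.ConnRouteHosts() {
			host, _, err := net.SplitHostPort(hostport)
			if err != nil {
				return nil, fmt.Errorf("endpoint %s: %w", ep.name, err)
			}
			ips, err := lookup(host)
			if err != nil {
				return nil, fmt.Errorf("endpoint %s: resolve %s: %w", ep.name, host, err)
			}
			for _, ip := range ips {
				index[ip] = append(index[ip], ep.name)
			}
		}
	}
	return index, nil
}

// demoIndex builds an index from canned DNS answers.
func demoIndex() (map[string][]string, error) {
	answers := map[string][]string{
		"db.example.com":      {"10.0.0.5"},
		"replica.example.com": {"10.0.0.5", "10.0.0.6"},
	}
	lookup := func(host string) ([]string, error) {
		ips, ok := answers[host]
		if !ok {
			return nil, fmt.Errorf("no such host %s", host)
		}
		return ips, nil
	}
	return buildConnIndex([]namedEndpoint{
		{name: "pg-writer", hosts: []string{"db.example.com:5432"}},
		{name: "pg-readonly", hosts: []string{"replica.example.com:5432"}},
	}, lookup)
}

func main() {
	fmt.Println(demoIndex())
}
```

Passing `net.LookupHost` as the lookup function would resolve through the system resolver instead of the canned answers.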

When a TCP connection lands on :5432, the WG forwarder routes it into handlePostgresConn, which consults the index by the connection’s destination IP to pick the matching endpoint. When several endpoints share an IP (writer + readonly aimed at the same RDS instance) the lookup filters by the device’s profile so the right one wins; single-database profiles fall back to "first postgres in profile" without needing DNS at all. The postgres endpoint runtime then performs auth offload and runs the flow through sql-family rule matching with the right credential.
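The lookup described above, including the profile filter and the single-database shortcut, can be sketched as follows. The function name, the string-keyed index, and the representation of a profile as an ordered list of its postgres endpoint names are assumptions, not the gateway's data model.

```go
package main

import "fmt"

// pickPostgres chooses the endpoint for a connection to dstIP.
// index maps dstIP -> endpoint names; profilePostgres lists the postgres
// endpoints in the device's profile, in configuration order.
func pickPostgres(index map[string][]string, profilePostgres []string, dstIP string) (string, bool) {
	// Single-database profile: the only candidate wins without
	// consulting the DNS-derived index.
	if len(profilePostgres) == 1 {
		return profilePostgres[0], true
	}
	inProfile := make(map[string]bool, len(profilePostgres))
	for _, name := range profilePostgres {
		inProfile[name] = true
	}
	// Several endpoints may share an IP; keep the first one that the
	// device's profile actually contains.
	for _, candidate := range index[dstIP] {
		if inProfile[candidate] {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	index := map[string][]string{"10.0.0.5": {"pg-writer", "pg-readonly"}}
	fmt.Println(pickPostgres(index, []string{"pg-readonly", "pg-analytics"}, "10.0.0.5"))
	fmt.Println(pickPostgres(index, []string{"pg-writer"}, "192.0.2.10"))
}
```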

The same ConnRouter mechanism powers clickhouse_native (claimed by direct IP) and ssh (claimed by DNS-VIP); the plugin only has to declare its host tuples and the dispatcher does the rest without main.go having to learn about new families.

# DNS interception → VIP

Some families (ssh, clickhouse_native) have no SNI and no Host header, so the gateway can’t recover the agent-dialed hostname from the wire bytes alone. Their endpoint plugins flag RequiresVIP, and the dnsvip allocator assigns each hostname a stable virtual IP at policy build, persisted to disk so VIPs survive restart.
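Stable VIP assignment can be illustrated with a small table that is serialised between restarts. The `vipTable` type, the JSON layout, and the pool range are assumptions for this sketch; the source only states that VIPs are stable and persisted to disk.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/netip"
)

// vipTable is a hypothetical persisted hostname -> VIP mapping.
type vipTable struct {
	Pool   string            `json:"pool"`
	ByHost map[string]string `json:"by_host"`
}

// assign returns the existing VIP for host, or the first free address
// in the pool if the host has none yet.
func (t *vipTable) assign(host string) (string, error) {
	if vip, ok := t.ByHost[host]; ok {
		return vip, nil
	}
	pool, err := netip.ParsePrefix(t.Pool)
	if err != nil {
		return "", err
	}
	pool = pool.Masked()
	used := make(map[string]bool, len(t.ByHost))
	for _, vip := range t.ByHost {
		used[vip] = true
	}
	for a := pool.Addr().Next(); pool.Contains(a); a = a.Next() {
		if used[a.String()] {
			continue
		}
		if t.ByHost == nil {
			t.ByHost = make(map[string]string)
		}
		t.ByHost[host] = a.String()
		return a.String(), nil
	}
	return "", errors.New("VIP pool exhausted")
}

// reload round-trips the table through JSON, standing in for a
// write to disk followed by a gateway restart.
func reload(t *vipTable) (*vipTable, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	var out vipTable
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

// demoVIPs assigns two hosts, reloads, and re-assigns the first host.
func demoVIPs() (first, second, afterReload string, err error) {
	t := &vipTable{Pool: "198.18.0.0/24"} // hypothetical pool
	if first, err = t.assign("bastion.example.com"); err != nil {
		return
	}
	if second, err = t.assign("ch.example.com"); err != nil {
		return
	}
	restored, err := reload(t)
	if err != nil {
		return
	}
	afterReload, err = restored.assign("bastion.example.com")
	return
}

func main() {
	fmt.Println(demoVIPs())
}
```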

The gateway runs an in-process DNS responder on UDP/TCP :53. The WG netstack delivers all DNS queries here regardless of the agent’s resolver setting (any port-53 datagram reaches the gateway). For VIP-bound hostnames it returns the allocated VIP; for everything else it forwards the query to the upstream resolver and returns the real A/AAAA verbatim, so unrelated traffic flows unchanged.
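The responder's answer policy reduces to a single branch, sketched below. The function name, the string-based VIP map, and the injected `forward` callback (standing in for the upstream resolver) are assumptions; DNS wire parsing is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// answerA returns the addresses to put in the response for qname.
// VIP-bound names get their allocated VIP; everything else is forwarded
// and the upstream's answer is returned unchanged.
func answerA(qname string, vips map[string]string, forward func(name string) ([]string, error)) ([]string, error) {
	// DNS names are case-insensitive and may carry a trailing dot.
	name := strings.ToLower(strings.TrimSuffix(qname, "."))
	if vip, ok := vips[name]; ok {
		return []string{vip}, nil
	}
	return forward(name)
}

func main() {
	vips := map[string]string{"bastion.example.com": "198.18.0.1"}
	upstream := func(name string) ([]string, error) {
		return []string{"203.0.113.7"}, nil // canned upstream answer
	}
	fmt.Println(answerA("Bastion.Example.com.", vips, upstream))
	fmt.Println(answerA("api.github.com.", vips, upstream))
}
```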

When the agent dials the VIP, the WG forwarder routes any port on that IP into the matching endpoint runtime, which recovers the hostname from the VIP table and dispatches into the right plugin (SSH server-toward-agent / SSH client-toward-upstream with auth replay; ClickHouse Hello-packet placeholder swap; …).

# Direct IP

Endpoint plugins can also bind to literal IPs (hosts = ["172.17.0.1"] for an in-cluster ClickHouse). Those skip dnsvip entirely — the agent dials the IP without ever issuing a DNS query. The gateway maintains an index of IP-literal bindings and consults it in the catch-all branch of the dispatcher: if the destination IP claims an endpoint, the flow goes to that endpoint’s runtime; otherwise it falls through to transparent relay.

# Intercept-or-passthrough summary

With the branches explained, the dispatch table reads as a summary:

| dst port | handler |
| --- | --- |
| `:443` | SNI peek, then HTTPS family dispatch (`http` / `k8s`) or passthrough |
| `:5432` | postgres wire-protocol gateway (auth offload + `sql`-family rule matching) |
| `:53` | DNS-VIP responder (UDP and TCP fallback) |
| any port, dst is VIP | VIP-bound endpoint runtime (today: `ssh`, `clickhouse_native` reached by hostname) |
| `else` | direct-IP endpoint lookup; falls through to transparent TCP relay when no plugin claims |

If no endpoint plugin claims the destination, the gateway falls back to a transparent relay: it dials the real destination IP and pipes bytes both ways. The top-level unknown_host setting in gateway.hcl (passthrough by default) decides what to do when an HTTPS SNI doesn’t match any configured endpoint — splice it unchanged or close it.

UDP dispatch is narrower: only :53 is handled today (DNS-VIP); other UDP datagrams are dropped.
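The summary table and the UDP rule can be condensed into one decision function. This is a sketch, not the dispatcher's code: the handler labels are invented, and the precedence between branches is assumed to follow the row order of the table.

```go
package main

import "fmt"

// dispatch names the handler for a flow. vips and directIPs are sets of
// destination IPs bound to VIP endpoints and IP-literal endpoints.
func dispatch(proto, dstIP string, dstPort int, vips, directIPs map[string]bool) string {
	if proto == "udp" {
		// Only DNS is handled over UDP; everything else is dropped.
		if dstPort == 53 {
			return "dns-vip"
		}
		return "drop"
	}
	switch {
	case dstPort == 443:
		return "sni-peek"
	case dstPort == 5432:
		return "postgres"
	case dstPort == 53:
		return "dns-vip"
	case vips[dstIP]:
		return "vip-endpoint"
	case directIPs[dstIP]:
		return "direct-ip-endpoint"
	default:
		// No plugin claims the destination: pipe bytes both ways.
		return "relay"
	}
}

func main() {
	vips := map[string]bool{"198.18.0.1": true}
	direct := map[string]bool{"172.17.0.1": true}
	fmt.Println(dispatch("tcp", "140.82.112.5", 443, vips, direct))
	fmt.Println(dispatch("tcp", "198.18.0.1", 22, vips, direct))
	fmt.Println(dispatch("tcp", "172.17.0.1", 9000, vips, direct))
	fmt.Println(dispatch("udp", "192.0.2.1", 123, vips, direct))
}
```

The `sni-peek` branch still has a second decision inside it (intercept, or apply unknown_host when no endpoint matches the SNI), which this sketch does not model.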