How We Secured OpenClaw: Zero-Knowledge Architecture for AI Agents

OpenClaw's default setup exposes API keys to the agent container. We built a zero-knowledge security layer that keeps SSH keys off the browser, LLM keys out of the container, and secrets out of every API response.


If you've been following the OpenClaw community, you've probably seen the security discussions. Exposed API keys in Docker containers. SSH private keys traveling through browsers. Secrets returned in plain JSON API responses. These aren't theoretical risks — they're the default behavior in most OpenClaw deployments.

We decided to fix all of it. Here's exactly what we built, why, and how it works.

The problem with default OpenClaw security

A standard OpenClaw deployment has a straightforward architecture: a Docker container running the gateway, your LLM API keys passed as environment variables, and SSH access managed through the dashboard.

That simplicity comes with real security costs:

  • API keys live inside the container as environment variables. If the container is compromised, every key is immediately exposed.
  • SSH private keys travel through the browser. They're generated server-side, sent to the client in API responses, stored in sessionStorage, and then sent back to the server for every operation.
  • API responses include raw secrets. Gateway tokens, SSH keys, and Hetzner credentials all appear in JSON payloads that any browser extension or network proxy can intercept.

This isn't a vulnerability in OpenClaw itself — it's a consequence of how most self-hosted deployments are configured. The default assumes a trusted network and a single operator. That assumption breaks down quickly in production.

Our approach: zero-knowledge at every layer

We built Cannes with a simple principle: no secret should exist in a place where it isn't strictly needed. That means:

  1. The agent container never sees real API keys
  2. SSH private keys never reach the browser
  3. API responses never contain raw secrets
  4. Temporary credentials are deleted as soon as they're used

Let's walk through each layer.

Layer 1: The LLM auth proxy

This is the most impactful change. Instead of passing API keys to the container as environment variables, we run a host-side proxy that intercepts every LLM API call and injects the real credentials.

Here's the flow:

  1. The OpenClaw gateway config has apiKey: "proxy-managed" — a placeholder
  2. All LLM requests route through http://host.docker.internal:3101/proxy/<provider-id>
  3. The host proxy reads the real key from /root/.model-keys.json (mode 600, root-only)
  4. It injects the correct auth header for each provider (x-api-key for Anthropic, Authorization: Bearer for OpenAI, x-goog-api-key for Google)
  5. The request forwards to the upstream provider

The container never sees the real key at any point. Even if someone gets a shell inside the container, they find "proxy-managed" — which is useless.

# Gateway config (simplified)
models:
  providers:
    - id: anthropic
      baseUrl: "http://host.docker.internal:3101/proxy/anthropic"
      apiKey: "proxy-managed"  # ← not a real key

The proxy enforces network-level isolation too. It only accepts requests from Docker bridge IPs (172.16.0.0/12). A request from the public internet or any non-container IP is rejected outright.

The key file lives at /root/.model-keys.json with permissions 600. The container runs as uid 1000 (node user). Even if Docker volume mounts were misconfigured, the container process literally cannot read the file.
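
For reference, here is a minimal sketch of what such a host-side proxy can look like in TypeScript on Node. The port, key-file path, and per-provider auth headers follow the description above; the upstream URLs and helper structure are illustrative and not a copy of vps-api/server.mjs.

// Minimal host-side LLM auth proxy (illustrative sketch, not the real server.mjs)
import http from "node:http";
import { readFileSync } from "node:fs";

// Real keys live in a root-only file on the host; the container never mounts it.
const keys: Record<string, string> = JSON.parse(readFileSync("/root/.model-keys.json", "utf8"));

// Per-provider upstream and auth header format (upstream URLs are assumptions).
const providers: Record<string, { upstream: string; auth: (k: string) => [string, string] }> = {
  anthropic: { upstream: "https://api.anthropic.com", auth: (k) => ["x-api-key", k] },
  openai: { upstream: "https://api.openai.com", auth: (k) => ["authorization", `Bearer ${k}`] },
  google: { upstream: "https://generativelanguage.googleapis.com", auth: (k) => ["x-goog-api-key", k] },
};

// Only the Docker bridge range (172.16.0.0/12) may talk to the proxy.
const fromDockerBridge = (ip: string) => /(^|:)172\.(1[6-9]|2\d|3[01])\./.test(ip);

http.createServer(async (req, res) => {
  if (!fromDockerBridge(req.socket.remoteAddress ?? "")) {
    res.writeHead(403).end();
    return;
  }
  const [, , providerId, ...rest] = (req.url ?? "").split("/"); // /proxy/<provider-id>/...
  const provider = providers[providerId];
  if (!provider || !keys[providerId]) {
    res.writeHead(404).end();
    return;
  }
  // Buffer the request body, then replay it upstream with the real credential injected.
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const [authName, authValue] = provider.auth(keys[providerId]);
  const upstream = await fetch(`${provider.upstream}/${rest.join("/")}`, {
    method: req.method,
    headers: {
      "content-type": String(req.headers["content-type"] ?? "application/json"),
      [authName]: authValue, // a real proxy would forward the remaining request headers too
    },
    body: chunks.length ? Buffer.concat(chunks) : undefined,
  });
  res.writeHead(upstream.status, { "content-type": upstream.headers.get("content-type") ?? "application/json" });
  res.end(Buffer.from(await upstream.arrayBuffer()));
}).listen(3101);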

Layer 2: SSH keys never leave the server

This was the trickiest part to get right, and the one most OpenClaw setups get wrong.

In a typical deployment, the SSH key lifecycle looks like this:

  1. Key generated on the server (rescue mode)
  2. Key sent to the browser in an API response
  3. Key stored in sessionStorage
  4. Key sent back to the server for every SSH operation
  5. Key potentially visible in browser DevTools, extensions, network logs

We eliminated steps 2-5 entirely.

In our architecture:

  1. Key generated server-side — ssh-keygen runs in Hetzner rescue mode during provisioning
  2. Key encrypted and stored in the database — AES-256-GCM encryption in Neon Postgres
  3. Server-side API routes read the key directly — the client never receives it
  4. API responses return a boolean flag — hasSshKey: true instead of the actual key
  5. Key automatically rotated after provisioning — the VPS generates a fresh keypair, old key replaced in DB

// What the API returns now:
{
  "slug": "abc12345",
  "ip": "1.2.3.4",
  "hasSshKey": true,    // boolean flag only
  "complete": true
}

// What it used to return:
{
  "slug": "abc12345",
  "ip": "1.2.3.4",
  "sshPrivateKey": "-----BEGIN OPENSSH PRIVATE KEY-----...",
  "gatewayToken": "gw-abc123..."
}

The browser doesn't need the SSH key. It needs to know whether the server has one so it can show the right UI. That's all the boolean flag provides.
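
For illustration, here is roughly what a server-side route looks like under this rule. getAgent, decryptField, and runOverSsh are assumed helper names and the route shape is a sketch; the point is that the private key is decrypted, used, and dropped entirely on the server, and only a boolean leaves it.

// Hypothetical server-side route: the SSH key is used here and never returned.
import { NextResponse } from "next/server";
import { getAgent } from "@/lib/db";          // assumed data-access helper
import { decryptField } from "@/lib/crypto";  // assumed AES-256-GCM helper (see Layer 5)
import { runOverSsh } from "@/lib/ssh";       // assumed SSH exec helper

export async function GET(_req: Request, { params }: { params: { slug: string } }) {
  const agent = await getAgent(params.slug);

  // Decrypt only in server memory, use the key, let it go out of scope.
  const privateKey = decryptField(agent.sshKeyEncrypted);
  const uptime = await runOverSsh(agent.ip, privateKey, "uptime -p");

  // The response carries a flag and derived data, never the key itself.
  return NextResponse.json({
    slug: agent.slug,
    ip: agent.ip,
    hasSshKey: Boolean(agent.sshKeyEncrypted),
    uptime,
  });
}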

Layer 3: Browser-direct key delivery via Tailscale

When you set up your LLM API keys through the Cannes dashboard, those keys travel directly from your browser to your VPS over Tailscale HTTPS. They never pass through Vercel's servers or any intermediary.

This matters because in many hosted AI platforms, your API keys transit through the vendor's cloud infrastructure. Even with TLS, the vendor's servers decrypt the request, read your keys, and forward them. With browser-direct delivery over Tailscale, the keys go straight from your machine to your VPS — end-to-end encrypted, no middleman.
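
On the client, that is nothing more exotic than pointing the request at the VPS's Tailscale HTTPS hostname instead of our own API. The hostname and route below are placeholders; what matters is that the payload never transits Vercel.

// Hypothetical client-side call: keys travel browser → VPS over Tailscale HTTPS.
async function saveModelKeys(vpsTailscaleHost: string, keys: Record<string, string>) {
  // e.g. vpsTailscaleHost = "agent-abc12345.your-tailnet.ts.net" (placeholder)
  const res = await fetch(`https://${vpsTailscaleHost}/api/model-keys`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(keys), // TLS end to end; no intermediary server decrypts this
  });
  if (!res.ok) throw new Error(`Key delivery failed: ${res.status}`);
}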

Layer 4: Credential cleanup

We don't just protect secrets — we delete them as soon as they're no longer needed.

Hetzner API tokens are particularly sensitive. They grant full control over your cloud infrastructure — create servers, destroy them, read SSH keys. We need the token during server setup, but after that, it's a liability.

As soon as provisioning completes, clearAgentHetznerToken() runs and permanently deletes the token from our database. If our database were ever compromised, the attacker would find no Hetzner tokens to abuse.
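
The cleanup itself can be a single destructive update. clearAgentHetznerToken() is the function named above; the table and column names here are assumptions for the sketch.

// Sketch of post-provisioning cleanup (schema names are illustrative).
import { neon } from "@neondatabase/serverless"; // assumed driver; any Postgres client works

const sql = neon(process.env.DATABASE_URL!);

export async function clearAgentHetznerToken(agentId: string): Promise<void> {
  // Null out the ciphertext itself: nothing left to decrypt, nothing left to leak.
  await sql`UPDATE agents SET hetzner_token_encrypted = NULL WHERE id = ${agentId}`;
}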

SSH key rotation follows the same principle. The key used during initial setup is rotated immediately after provisioning. The VPS generates a completely new keypair, the server saves the new key to the encrypted database, and the old key ceases to exist. This entire process happens server-side — the browser is never involved.
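
Sketched with the same assumed helpers, a rotation pass looks something like this; the ssh-keygen invocation and helper names are illustrative, not the exact provisioning code.

// Hypothetical rotation step: the VPS mints a new keypair; the old one is overwritten.
import { decryptField, encryptField } from "@/lib/crypto"; // assumed AES-256-GCM helpers
import { runOverSsh } from "@/lib/ssh";                    // assumed SSH exec helper
import { sql } from "@/lib/db";                            // assumed query helper

export async function rotateSshKey(agent: { id: string; ip: string; sshKeyEncrypted: string }) {
  const oldKey = decryptField(agent.sshKeyEncrypted);

  // Generate a fresh keypair on the VPS, authorize only the new public key,
  // and hand the new private key back over the existing SSH channel exactly once.
  const newKey = await runOverSsh(agent.ip, oldKey, [
    "ssh-keygen -t ed25519 -N '' -f /tmp/rotate -q",
    "cat /tmp/rotate.pub > ~/.ssh/authorized_keys", // replace, don't append: old key ceases to exist
    "cat /tmp/rotate",
    "shred -u /tmp/rotate /tmp/rotate.pub",
  ].join(" && "));

  // Persist only the encrypted form; the old ciphertext is overwritten in place.
  await sql`UPDATE agents SET ssh_key_encrypted = ${encryptField(newKey)} WHERE id = ${agent.id}`;
}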

Layer 5: Encrypted secrets at rest

Every sensitive field in the database is encrypted with AES-256-GCM before storage. This isn't just "database encryption" — it's field-level encryption, which means even a full database dump reveals nothing useful.

The encryption key is derived from environment variables that only exist in the Vercel runtime. A database backup without the runtime environment is a collection of encrypted blobs.
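
The pattern is straightforward with Node's built-in crypto. Below is a sketch of field-level AES-256-GCM, not our exact implementation; FIELD_ENCRYPTION_KEY is an assumed name for the runtime-only environment variable.

// Field-level AES-256-GCM: each value gets its own IV and auth tag (illustrative sketch).
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

const key = Buffer.from(process.env.FIELD_ENCRYPTION_KEY!, "hex"); // 32 bytes, runtime-only

export function encryptField(plaintext: string): string {
  const iv = randomBytes(12); // 96-bit IV, unique per field
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv + auth tag + ciphertext together; a dump without the key is just blobs.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

export function decryptField(stored: string): string {
  const buf = Buffer.from(stored, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}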

Layer 6: Zero secrets in API responses

We audited every API endpoint in the dashboard and applied a strict rule: no endpoint returns a raw secret. Ever.

  • sshPrivateKey: "-----BEGIN..." → hasSshKey: true
  • gatewayToken: "gw-abc..." → complete: true
  • hetznerToken: "hk-abc..." → hasHetznerToken: true

The gateway token is a partial exception — it's delivered to the browser via server component props (not API responses) because the browser needs it for HMAC-signed direct VPS API calls. But it never appears in any REST API JSON response.
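
The rule is easiest to keep when it is a code path rather than a convention: secret-bearing fields are mapped to flags before anything reaches a response serializer. The helper below is a sketch with assumed field names.

// Sketch: map secret-bearing DB fields to booleans before anything is serialized.
type AgentRow = {
  slug: string;
  ip: string;
  sshKeyEncrypted: string | null;
  hetznerTokenEncrypted: string | null;
  gatewayTokenEncrypted: string | null;
};

export function toPublicAgent(row: AgentRow) {
  return {
    slug: row.slug,
    ip: row.ip,
    hasSshKey: row.sshKeyEncrypted !== null,           // never the key itself
    hasHetznerToken: row.hetznerTokenEncrypted !== null,
    complete: row.gatewayTokenEncrypted !== null,       // presence, not the token
  };
}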

How this compares to the default

Each comparison reads default OpenClaw deployment → Cannes:

  • API key storage: environment variable in the container → host-only file, mode 600, unreadable from the container
  • Key visibility to agent: fully visible (process.env.API_KEY) → never visible ("proxy-managed")
  • SSH key handling: sent to the browser in API responses → never leaves the server, encrypted DB only
  • Container compromise impact: all API keys exposed → no keys accessible
  • Secrets in API responses: tokens and keys returned in JSON → zero secrets, boolean flags only
  • Key rotation: manual, redeploy the container → automatic, server-side, zero downtime
  • Key delivery path: via the vendor's cloud (they see your keys) → browser-direct via Tailscale HTTPS
  • Credential cleanup: manual → automatic, tokens wiped after setup

The principle: minimum secrets, minimum time

Every design decision reduces to one question: does this secret need to exist here, right now?

If the container doesn't need the API key to make LLM calls (the proxy handles it), the container shouldn't have the key. If the browser doesn't need the SSH key to show a status indicator (a boolean works), the browser shouldn't have the key. If the Hetzner token isn't needed after setup (the server is already running), the token should be deleted.

Security isn't about adding encryption on top of a permissive architecture. It's about making the architecture restrictive by default, and then verifying that every layer enforces its boundaries independently.

Everything described here is fully open source and auditable. The LLM auth proxy runs in vps-api/server.mjs. The SSH key lifecycle is in the provisioning routes. The API response filtering is in the agent API handlers. You can read every line.

Getting started with a secure OpenClaw setup

If you're running OpenClaw and want this level of security out of the box:

  1. Sign up at Cannes — paste your Hetzner API token and we provision a hardened VPS
  2. Add your LLM API keys — they go browser-direct to your VPS, never through our servers
  3. Your agent starts working — with six layers of isolation protecting every credential

No security configuration needed. No manual hardening steps. The zero-knowledge architecture is the default.

Cannes Engineering · Security & Infrastructure