#275 reported the workspace stuck on 'Disconnected' even though the agent was reachable. Root cause: the workspace boots before the agent in docker compose, every probe fails, and capabilities are cached as zero-state for the full 120s TTL. By the time the agent comes up, the cache is still stale and the UI looks broken.

Changes:
* effectiveProbeTtl(): 120s when healthy, 15s when disconnected. With the shorter window in the 'mode=disconnected' state, a stack where the workspace started probing before the agent was up recovers within ~15s of the agent becoming reachable, instead of being stuck on the first failed probe for two minutes.
* New POST /api/gateway-reprobe endpoint: forces a fresh probe regardless of TTL. Useful for diagnostic scripts and a future UI 'Reconnect' button. Auth-gated (same as /api/gateway-status).
* New forceReprobeGateway() helper exported from gateway-capabilities.
* New docs/docker.md: comprehensive setup guide covering single-host, multi-host (NAS/VPS), capability mismatches, and a step-by-step diagnostic playbook for connection failures. Cross-references the new /api/gateway-reprobe endpoint.

Foundation for #275: the docs + faster recovery cover the most common cases. Outstanding work: a better startup ordering hint when probes fail because the agent isn't up yet (toast + 'Reconnect' button in the UI), and a CI test that boots both services in compose to catch regressions in the connection contract.
Docker
Hermes Workspace + Hermes Agent in containers.
TL;DR (single-host, localhost-only)
git clone https://github.com/outsourc-e/hermes-workspace
cd hermes-workspace
cp .env.example .env
# add at least one provider key (e.g. OPENROUTER_API_KEY=...)
docker compose up -d
open http://localhost:3000
That's it. The repo's docker-compose.yml runs:
- hermes-agent (port 8642, internal only)
- hermes-workspace (port 3000, bound to 127.0.0.1)
The workspace waits for the agent's /health to return 200 before starting (via depends_on: condition: service_healthy). On a fresh laptop this takes about 15 seconds.
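When the workspace runs outside this compose file, depends_on can't order the startup for you. A minimal shell sketch of the same wait, assuming curl is installed; retry_until is an illustrative helper, not something the repo ships:

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper for illustration; not part of Hermes.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (run from somewhere that can reach the agent's port):
#   retry_until 30 2 curl -fsS http://<agent-host-or-service>:8642/health
```

The command to retry is passed as arguments, so the same helper works for any readiness check, not only /health.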
Multi-host / NAS / VPS
If the workspace and agent run on different machines, or you want LAN/Tailscale access to the workspace, three things change:
1. Agent binds publicly
In .env:
API_SERVER_HOST=0.0.0.0
API_SERVER_KEY=<a long random string>
This makes the agent listen on all interfaces, not just the Docker loopback. API_SERVER_KEY is mandatory when API_SERVER_HOST is non-loopback — the agent will refuse to start otherwise.
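One way to produce such a string, sketched with standard Unix tools only. The function name random_key is illustrative; where OpenSSL is installed, openssl rand -hex 32 gives an equivalent result:

```shell
# Print 32 random bytes as 64 lowercase hex characters.
# Illustrative helper; any sufficiently long random string works.
random_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Usage: append a fresh key to .env
#   echo "API_SERVER_KEY=$(random_key)" >> .env
```

Each call yields a new value, so the helper can be reused for any other secret that must differ from this one.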
2. Workspace knows where the agent is
In .env:
HERMES_API_URL=http://<agent-host-or-service>:8642
HERMES_API_TOKEN=<the same value as API_SERVER_KEY>
HERMES_DASHBOARD_URL=http://<agent-host-or-service>:9119
HERMES_DASHBOARD_TOKEN=<same key, or set CLAUDE_DASHBOARD_TOKEN>
Inside docker compose on the same host, <agent-host-or-service> is the service name from your compose file (e.g. hermes-agent). On a Synology NAS with a separate workspace stack, it's the LAN IP (e.g. 192.168.1.78).
3. Workspace gets a password
The workspace bind is non-loopback in Docker (0.0.0.0:3000). It refuses to start in production mode without a password to prevent accidental open exposure:
HERMES_PASSWORD=<a long random string different from API_SERVER_KEY>
If you publish the workspace behind HTTPS (reverse proxy, Tailscale Funnel, Cloudflare Tunnel), also set COOKIE_SECURE=1 so session cookies get the Secure flag.
Connection failures — diagnostic playbook
If the workspace shows "Disconnected" or "Missing Hermes APIs detected" but the agent appears to be running:
Step 1 — Verify the agent is reachable from inside the workspace container
docker compose exec hermes-workspace sh
# inside the workspace container:
curl -fsS http://hermes-agent:8642/health
curl -fsS -H "Authorization: Bearer $HERMES_API_TOKEN" http://hermes-agent:8642/v1/models | head -c 200
exit
If /health returns a JSON {"status": "ok"}, the agent is alive on the docker network.
Step 2 — Confirm the workspace's environment
docker compose exec hermes-workspace env | grep -E "HERMES_API|API_SERVER"
You should see:
- HERMES_API_URL=http://hermes-agent:8642 (or whichever service name)
- HERMES_API_TOKEN=<same value as agent's API_SERVER_KEY>
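The token comparison can also be scripted. A minimal sketch, assuming both values live in plain KEY=value env files without quoting; env_value and tokens_match are illustrative names:

```shell
# env_value FILE KEY: print the value of the last KEY=... line in FILE.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# tokens_match WORKSPACE_ENV AGENT_ENV: succeed only if the workspace's
# HERMES_API_TOKEN is non-empty and equals the agent's API_SERVER_KEY.
tokens_match() {
  token=$(env_value "$1" HERMES_API_TOKEN)
  key=$(env_value "$2" API_SERVER_KEY)
  [ -n "$token" ] && [ "$token" = "$key" ]
}

# Usage (single-host setup where both values share one .env):
#   tokens_match .env .env && echo "tokens match" || echo "token mismatch"
```

An empty token counts as a mismatch here, which catches the case where the variable was never set on the workspace side.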
Step 3 — Force a reprobe
The workspace caches the gateway capability map for 2 minutes (15 seconds when in disconnected state, since v2.2.1). If the agent came up after the workspace started probing, that cache is stale.
curl -X POST http://localhost:3000/api/gateway-reprobe
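This re-runs the probe and returns the fresh capability map. If it now reads mode=zero-fork, you're connected. The endpoint is auth-gated (same as /api/gateway-status), so a workspace with HERMES_PASSWORD set may reject an unauthenticated request.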
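Reprobing can be repeated until the agent is up. The sketch below only looks for the literal string zero-fork in the response body, because the exact response shape isn't documented here; both function names are illustrative:

```shell
# Succeeds if the response body on stdin mentions zero-fork mode.
# Assumption: the body contains that literal string when connected.
response_is_connected() {
  grep -q 'zero-fork'
}

# reprobe_until_connected [URL] [ATTEMPTS] [DELAY_SECONDS]
# Illustrative helper; the default delay follows the 15s disconnected TTL.
reprobe_until_connected() {
  url=${1:-http://localhost:3000/api/gateway-reprobe}
  attempts=${2:-8}
  delay=${3:-15}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Add credentials to this curl call if your workspace requires them.
    if curl -fsS -X POST "$url" 2>/dev/null | response_is_connected; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage:
#   reprobe_until_connected && echo "connected" || echo "still disconnected"
```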
Step 4 — Read the workspace's capability log
The workspace logs the full capability summary on every probe. Look for the [gateway] line:
docker compose logs hermes-workspace 2>&1 | grep '\[gateway\]' | tail -3
A healthy log looks like:
[gateway] gateway=http://hermes-agent:8642 dashboard=http://hermes-agent:9119 mode=zero-fork core=[health,chatCompletions,models,streaming] enhanced=[sessions,skills,memory,config,jobs,enhancedChat,conductor,kanban] missing=[mcp]
A failing log usually shows core=[] and missing=[health,...] — that means every probe got a non-2xx response. Check the agent's logs (docker compose logs hermes-agent) for matching 401/404/timeout entries.
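For scripted checks, individual fields can be pulled out of that line. A minimal sketch, assuming fields are space-separated key=value pairs whose values contain no spaces, as in the healthy example above; gateway_field is an illustrative name:

```shell
# gateway_field LINE KEY: print the value of KEY=... from a [gateway] log line.
# Assumes space-separated fields and values without spaces.
gateway_field() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# Usage: report the mode of the most recent probe
#   line=$(docker compose logs hermes-workspace 2>&1 | grep '\[gateway\]' | tail -1)
#   gateway_field "$line" mode
```

Splitting on spaces first means any prefix that docker compose adds to each log line is simply ignored.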
Common causes
| Symptom | Cause | Fix |
|---|---|---|
| core=[] and missing=[health,...] | Workspace probed before agent was ready | Wait 30s and reload, or POST /api/gateway-reprobe. Cache TTL drops to 15s in disconnected state. |
| core=[health,chatCompletions] but no models | Older agent image (pre-/v1/models) | Update: docker compose pull && docker compose up -d |
| All probes 401 | HERMES_API_TOKEN doesn't match agent's API_SERVER_KEY | Check both .env values are the same. They must match exactly. |
| Workspace UI shows "Connection refused" | Workspace using 127.0.0.1 instead of the service name | Set HERMES_API_URL=http://hermes-agent:8642 (or whichever service name). |
| Agent restart loops with API_SERVER_KEY required | Agent bound to 0.0.0.0 without a key | Set API_SERVER_KEY in .env (mandatory for non-loopback bind). |
Synology NAS / external host setups
If your workspace and agent are on different stacks on the same NAS (or different hosts entirely), they don't share a docker network. You need:
- Both to publish their ports (the agent on 8642, the workspace on 3000).
- The workspace to point at the agent's host IP, not service name. Example for Synology with NAS at 192.168.1.78:
HERMES_API_URL=http://192.168.1.78:8642
HERMES_API_TOKEN=<API_SERVER_KEY>
HERMES_DASHBOARD_URL=http://192.168.1.78:9119
- The agent to bind on 0.0.0.0:
API_SERVER_HOST=0.0.0.0
API_SERVER_KEY=<long random>
- The dashboard plugin (multi-board kanban, conductor missions) needs the dashboard service running on the agent host too. See the agent's docker-compose for that service.
If you bind the agent to 0.0.0.0 on a NAS without API_SERVER_KEY, the agent will refuse to start. This is intentional — open-internet exposure of the agent's chat endpoint without auth would be a footgun.
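The bind rule can be checked before starting the stack. A minimal sketch; which addresses count as loopback is an assumption here (the agent's own check may differ), and both function names are illustrative:

```shell
# is_loopback HOST: succeed for the usual loopback spellings.
# Assumption: an unset host keeps the agent's default, taken to be loopback.
is_loopback() {
  case "$1" in
    ''|127.*|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# bind_config_ok HOST KEY: fail when HOST is non-loopback and KEY is empty.
bind_config_ok() {
  if is_loopback "$1"; then
    return 0
  fi
  [ -n "$2" ]
}

# Usage:
#   bind_config_ok "$API_SERVER_HOST" "$API_SERVER_KEY" \
#     || echo "API_SERVER_KEY is required for a non-loopback bind" >&2
```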
Hermes Workspace + Hermes Agent: why two containers?
The workspace is the UI. The agent is the engine. Splitting them lets you:
- Update either independently (docker compose pull hermes-workspace etc.)
- Run multiple workspaces against one agent (different ports)
- Run the workspace on a tablet/phone while the agent stays on a beefy machine
The default compose colocates them for simplicity. The split-host setup above is the explicit "you know what you're doing" path.
Filing bugs
If your setup matches the playbook above and still breaks, file an issue at https://github.com/outsourc-e/hermes-workspace/issues with:
- Your docker-compose.yml (redact secrets)
- The output of docker compose logs hermes-workspace 2>&1 | grep '\[gateway\]' | tail -5
- The output of curl -fsS http://<workspace-host>:3000/api/gateway-reprobe -X POST (also redact)
That gets us to the actual cause within a couple of comments instead of a long back-and-forth.
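For the redaction step, a first pass can be automated. A minimal sketch that masks the value of any variable whose name contains KEY, TOKEN, or PASSWORD; redact_secrets is an illustrative name, and the pattern is a heuristic (quoted YAML entries, for example, are not matched), so review the output by hand before posting:

```shell
# redact_secrets: read text on stdin and mask values of variables whose
# name contains KEY, TOKEN, or PASSWORD. Handles NAME=value and NAME: value
# lines, optionally preceded by whitespace and a YAML list dash.
redact_secrets() {
  sed -E 's/^([[:space:]-]*[A-Za-z0-9_]*(KEY|TOKEN|PASSWORD)[A-Za-z0-9_]*[[:space:]]*[=:][[:space:]]*).*/\1<redacted>/'
}

# Usage:
#   redact_secrets < docker-compose.yml
#   docker compose exec hermes-workspace env | redact_secrets
```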