# Hermes Workspace OpenAI-Compat Architecture Spec

> **For Claude:** Use `writing-plans` if this turns into an implementation plan. This doc locks the product and backend compatibility direction.

**Goal:** Make Hermes Workspace work out of the box against vanilla `hermes-agent` and any OpenAI-compatible backend, while unlocking richer workspace features automatically when Claude-specific APIs are available.

**Status:** Approved architectural constraint for the next implementation pass.

---

## 1. Problem

Hermes Workspace currently depends on a forked `hermes-agent` gateway for extended functionality:

- session management
- streaming chat
- memory browser
- skills browser / install flow
- config editing
- capability-aware dashboard behavior

That fork dependency is the wrong shape for distribution.

Current downside:

- users cannot point the workspace at stock `hermes-agent` and expect it to work
- README/setup flow forces a custom fork
- chat reliability is coupled to `/api/sessions` instead of the more portable OpenAI-compatible chat interface
- product adoption is constrained by backend politics instead of frontend usability

We want to reverse that.

---

## 2. Architectural Constraint

This is the decision to lock in:

> **Hermes Workspace must work standalone against any OpenAI-compatible backend.**
>
> Claude-specific workspace features may enhance the experience when the full Hermes Agent API is available, but the product must remain usable without those endpoints.

Non-negotiable implication:

- **The fork cannot remain a product requirement.**
- Enhanced APIs are optional capability unlocks, not startup prerequisites.

---

## 3. Two-Step Strategy

### Step 1 — Make Workspace standalone now

Rewrite the workspace so the core chat product works against:

- vanilla `hermes-agent`
- any backend exposing `/v1/chat/completions`
- any backend exposing `/v1/models` optionally

In this mode, advanced features degrade gracefully when Claude-specific APIs are absent.

### Step 2 — Upstream the richer API later

Submit the custom Claude endpoints into upstream `hermes-agent`, targeting `gateway/platforms/api_server.py`.

If upstream accepts them:

- full workspace functionality works with vanilla `hermes-agent`
- no long-term fork dependency remains
- the enhanced UX becomes a first-class upstream capability, not a private patchset

---

## 4. Product Modes

The workspace should operate in two runtime modes.

### Mode A — Portable OpenAI-Compat Mode

Minimum required backend surface:

- `POST /v1/chat/completions`
- optional `GET /v1/models`

User gets:

- working chat
- streaming assistant responses when backend supports streaming
- model selection when `/v1/models` exists
- basic attachments if backend/model supports them
- clean onboarding and connection state

User does **not** need:

- `/api/sessions`
- `/api/skills`
- `/api/memory`
- `/api/config`
- Claude-specific metadata endpoints

### Mode B — Enhanced Claude Mode

When Claude-specific endpoints are present, unlock:

- session history and named sessions
- memory browser / search / editing
- skills browser / install / management
- config editor
- jobs/cron visibility
- richer capability and workspace introspection

The UI should detect these capabilities and progressively enhance.

---

## 5. Core Product Principle

**Chat is the base product. Everything else is optional enhancement.**

If a user points Hermes Workspace at a valid OpenAI-compatible backend, they should be able to send a message and receive a streamed response without caring whether the backend is Claude, OpenAI, OpenRouter, Ollama, vLLM, or something else.

Anything beyond that should be treated as capability-based augmentation.

---

## 6. Required Behavior Changes

### 6.1 Chat transport

The workspace must stop treating `/api/sessions` as the prerequisite for sending a chat message.

Instead:

1. Detect whether Claude session APIs exist.
2. If yes, use the enhanced Claude session flow.
3. If not, send chat through `POST /v1/chat/completions`.
4. If streaming is supported, render streamed deltas.
5. If streaming is not supported, render standard non-stream response cleanly.

Result:

- missing Claude sessions API must no longer cause the product to hang or hard-fail for basic chat

### 6.2 Capability detection

Capability probing should explicitly distinguish:

#### Core portable capabilities

- health / reachability
- `/v1/chat/completions`
- `/v1/models`
- streaming support if detectable
- attachment / image support if inferable

#### Claude enhancement capabilities

- `/api/sessions`
- `/api/skills`
- `/api/memory`
- `/api/config`
- `/api/jobs`

The app should expose these as two layers:

- `coreCapabilities`
- `enhancedCapabilities`

### 6.3 Graceful degradation

When Claude-specific APIs are missing, the UI must not show broken loaders, dead tabs, or cryptic errors.

Instead, each advanced surface should do one of the following:

- hide itself when not relevant
- show a clear “Not available on this backend” state
- explain what capability would unlock it
- continue to preserve the rest of the app as fully usable

Required degraded states:

- **Sessions:** fallback to ephemeral/local chat thread state
- **Memory:** read-only unavailable state with explanation
- **Skills:** unavailable state with explanation
- **Config:** unavailable state with explanation
- **Jobs:** unavailable state with explanation

### 6.4 Onboarding and setup

The setup flow must stop instructing users that a fork is required.

New setup principle:

- connect any OpenAI-compatible backend first
- verify chat works
- then advertise extra Claude-native features if supported

Onboarding copy should communicate:

- “Works with any OpenAI-compatible backend”
- “Enhanced features unlock automatically with Hermes Agent gateway APIs”

### 6.5 Documentation

README and setup docs must reflect the architecture honestly.

Required messaging:

- workspace works standalone with OpenAI-compatible backends
- vanilla `hermes-agent` is a supported target
- the richer Hermes Agent API is optional for advanced workspace features
- upstreaming those APIs is the long-term path

---

## 7. UX Requirements

### 7.1 Connection status language

Do not frame missing advanced APIs as a fatal error when core chat works.

Use status language like:

- **Connected** — chat available
- **Enhanced** — Claude workspace APIs detected
- **Partial** — chat available, some advanced features unavailable
- **Disconnected** — no usable chat backend detected

### 7.2 Feature gating

Feature gating should feel intentional, not broken.

Good examples:

- “Memory browser requires Claude memory API.”
- “Session history isn’t available on this backend yet.”
- “Connected in portable mode. Chat works; advanced workspace tools are unavailable.”

Bad examples:

- raw 404 text
- spinner forever
- generic 500 banners with no next step
- startup screen claiming setup is incomplete when chat is actually usable

### 7.3 Session behavior in portable mode

When no Claude sessions API exists, the app still needs a sane chat UX.

Portable-mode minimum:

- maintain current thread in client state
- preserve visible message history for the active page/app session
- clearly label it as local / temporary if persistence is unavailable
- avoid fake server session IDs when the backend does not provide them

---

## 8. API Design Direction

### 8.1 Portable path

Primary portable request target:

- `POST /v1/chat/completions`

Expected request compatibility:

- `model`
- `messages`
- `stream`
- `temperature` if supported
- attachments / image content where backend accepts multimodal OpenAI-style messages

Expected response handling:

- SSE stream chunks for streaming mode
- standard OpenAI chat completion JSON for non-stream mode

### 8.2 Enhanced Claude path

Enhanced path remains Claude-native where available, because it provides:

- persistent sessions
- message history
- memory/skills/config surfaces
- richer workspace affordances

That is fine, but it must sit behind capability detection instead of being assumed.

### 8.3 Upstream target

For Step 2, the custom API endpoints should be proposed upstream in:

- `gateway/platforms/api_server.py`

Intent:

- make enhanced workspace APIs part of upstream `hermes-agent`
- remove ongoing maintenance burden of a permanent fork
- let Hermes Workspace treat stock Claude as the best backend, without requiring it

---

## 9. Non-Goals

This spec does **not** require:

- universal parity across every OpenAI-compatible provider
- guaranteed session persistence on non-Hermes backends
- memory/skills/config support outside Claude
- building a backend abstraction for every vendor-specific extension

The goal is simpler:

- portable chat first
- enhanced Claude features second
- no fork requirement

---

## 10. Acceptance Criteria

This initiative is complete when all of the following are true:

### Product acceptance

- A user can launch Hermes Workspace against a stock OpenAI-compatible backend and successfully chat without patching backend code.
- A user can launch Hermes Workspace against vanilla `hermes-agent` and get a working core experience.
- Advanced features do not hard-fail the app when Claude-specific APIs are absent.
- The UI clearly communicates portable mode vs enhanced Claude mode.

### Technical acceptance

- Chat send path no longer hard-depends on `/api/sessions`.
- Capability probing includes `/v1/chat/completions` readiness, not just Claude-specific APIs.
- Missing `/api/sessions`, `/api/skills`, `/api/memory`, or `/api/config` does not block app boot or core chat.
- Portable-mode chat streaming works against OpenAI-compatible SSE responses.

### Documentation acceptance

- README no longer says the fork is required.
- Setup docs describe OpenAI-compatible standalone mode first.
- Enhanced Hermes Agent API support is documented as progressive enhancement.
- Step 2 upstreaming target is documented clearly.

---

## 11. Implementation Guidance

This is not the detailed task plan, but the engineering direction should be:

1. Separate **core chat client** from **Claude enhanced client**.
2. Refactor capability probing into portable vs enhanced layers.
3. Add OpenAI-compatible streaming parser path.
4. Add local-thread fallback for non-session backends.
5. Gate advanced screens cleanly behind capability checks.
6. Rewrite onboarding and docs around portable-first positioning.
7. After Step 1 is stable, prepare the upstream PR for Claude-native endpoints.

---

## 12. Final Decision Statement

Lock this in:

> Hermes Workspace is a standalone frontend for OpenAI-compatible chat backends.
>
> Claude-native APIs are an enhancement layer, not a requirement.
>
> Step 1 is portable compatibility now.
>
> Step 2 is upstreaming the enhanced Hermes Agent APIs so no fork is needed ever again.