mirror of https://github.com/HKUDS/nanobot.git synced 2026-04-30 14:56:01 +00:00

Xubin Ren e3bca929fb fix(webui): left-align prose inside user message pill

2026-04-23 00:07:27 +08:00

12 KiB

Raw Blame History

WebSocket Server Channel

Nanobot can act as a WebSocket server, allowing external clients (web apps, CLIs, scripts) to interact with the agent in real time via persistent connections.

Features

Bidirectional real-time communication over WebSocket
Streaming support — receive agent responses token by token
Token-based authentication (static tokens and short-lived issued tokens)
Multi-chat multiplexing — one connection can run many concurrent chat_ids
TLS/SSL support (WSS) with enforced TLSv1.2 minimum
Client allow-list via allowFrom
Auto-cleanup of dead connections

Quick Start

1. Configure

Add to config.json under channels.websocket:

{
  "channels": {
    "websocket": {
      "enabled": true,
      "host": "127.0.0.1",
      "port": 8765,
      "path": "/",
      "websocketRequiresToken": false,
      "allowFrom": ["*"],
      "streaming": true
    }
  }
}

2. Start nanobot

nanobot gateway

You should see:

WebSocket server listening on ws://127.0.0.1:8765/

3. Connect a client

# Using websocat
websocat ws://127.0.0.1:8765/?client_id=alice

# Using Python
import asyncio, json, websockets

async def main():
    async with websockets.connect("ws://127.0.0.1:8765/?client_id=alice") as ws:
        ready = json.loads(await ws.recv())
        print(ready)  # {"event": "ready", "chat_id": "...", "client_id": "alice"}
        await ws.send(json.dumps({"content": "Hello nanobot!"}))
        reply = json.loads(await ws.recv())
        print(reply["text"])

asyncio.run(main())

Connection URL

ws://{host}:{port}{path}?client_id={id}&token={token}

Parameter	Required	Description
`client_id`	No	Identifier for `allowFrom` authorization. Auto-generated as `anon-xxxxxxxxxxxx` if omitted. Truncated to 128 chars.
`token`	Conditional	Authentication token. Required when `websocketRequiresToken` is `true` or `token` (static secret) is configured.

Wire Protocol

All frames are JSON text. Each message has an event field.

Server → Client

ready — sent immediately after connection is established:

{
  "event": "ready",
  "chat_id": "uuid-v4",
  "client_id": "alice"
}

message — full agent response:

{
  "event": "message",
  "chat_id": "uuid-v4",
  "text": "Hello! How can I help?",
  "media": ["/tmp/image.png"],
  "reply_to": "msg-id"
}

media and reply_to are only present when applicable.

delta — streaming text chunk (only when streaming: true):

{
  "event": "delta",
  "chat_id": "uuid-v4",
  "text": "Hello",
  "stream_id": "s1"
}

stream_end — signals the end of a streaming segment:

{
  "event": "stream_end",
  "chat_id": "uuid-v4",
  "stream_id": "s1"
}

attached — confirmation for new_chat / attach inbound envelopes (see Multi-chat multiplexing):

{"event": "attached", "chat_id": "uuid-v4"}

error — soft error for malformed inbound envelopes. The connection stays open:

{"event": "error", "detail": "invalid chat_id"}

Client → Server

Legacy (default chat): send a plain string, or a JSON object with a recognized text field:

"Hello nanobot!"

{"content": "Hello nanobot!"}

Recognized fields: content, text, message (checked in that order). Invalid JSON is treated as plain text. These frames route to the connection's default chat_id (the one announced in ready).

Typed envelopes (multi-chat): any JSON object with a string type field is a typed envelope:

`type`	Fields	Effect
`new_chat`	—	Server mints a new `chat_id`, subscribes this connection, replies with `attached`.
`attach`	`chat_id`	Subscribe to an existing `chat_id` (e.g. after a page reload). Replies with `attached`.
`message`	`chat_id`, `content`	Send `content` on `chat_id`. First use auto-attaches; no explicit `attach` needed.

See Multi-chat multiplexing for the full flow.

Configuration Reference

All fields go under channels.websocket in config.json.

Connection

Field	Type	Default	Description
`enabled`	bool	`false`	Enable the WebSocket server.
`host`	string	`"127.0.0.1"`	Bind address. Use `"0.0.0.0"` to accept external connections.
`port`	int	`8765`	Listen port.
`path`	string	`"/"`	WebSocket upgrade path. Trailing slashes are normalized (root `/` is preserved).
`maxMessageBytes`	int	`37748736`	Maximum inbound message size in bytes (1 KB – 40 MB). Default (36 MB) is sized to accept up to 4 base64-encoded image attachments at 8 MB each; lower it if the channel only carries text.

Authentication

Field	Type	Default	Description
`token`	string	`""`	Static shared secret. When set, clients must provide `?token=<value>` matching this secret (timing-safe comparison). Issued tokens are also accepted as a fallback.
`websocketRequiresToken`	bool	`true`	When `true` and no static `token` is configured, clients must still present a valid issued token. Set to `false` to allow unauthenticated connections (only safe for local/trusted networks).
`tokenIssuePath`	string	`""`	HTTP path for issuing short-lived tokens. Must differ from `path`. See Token Issuance.
`tokenIssueSecret`	string	`""`	Secret required to obtain tokens via the issue endpoint. If empty, any client can obtain tokens (logged as a warning).
`tokenTtlS`	int	`300`	Time-to-live for issued tokens in seconds (30 – 86,400).

Access Control

Field	Type	Default	Description
`allowFrom`	list of string	`["*"]`	Allowed `client_id` values. `"*"` allows all; `[]` denies all.

Streaming

Field	Type	Default	Description
`streaming`	bool	`true`	Enable streaming mode. The agent sends `delta` + `stream_end` frames instead of a single `message`.

Keep-alive

Field	Type	Default	Description
`pingIntervalS`	float	`20.0`	WebSocket ping interval in seconds (5 – 300).
`pingTimeoutS`	float	`20.0`	Time to wait for a pong before closing the connection (5 – 300).

TLS/SSL

Field	Type	Default	Description
`sslCertfile`	string	`""`	Path to the TLS certificate file (PEM). Both `sslCertfile` and `sslKeyfile` must be set to enable WSS.
`sslKeyfile`	string	`""`	Path to the TLS private key file (PEM). Minimum TLS version is enforced as TLSv1.2.

Token Issuance

For production deployments where websocketRequiresToken: true, use short-lived tokens instead of embedding static secrets in clients.

How it works

Client sends GET {tokenIssuePath} with Authorization: Bearer {tokenIssueSecret} (or X-Nanobot-Auth header).
Server responds with a one-time-use token:

{"token": "nbwt_aBcDeFg...", "expires_in": 300}

Client opens WebSocket with ?token=nbwt_aBcDeFg...&client_id=....
The token is consumed (single use) and cannot be reused.

Example setup

{
  "channels": {
    "websocket": {
      "enabled": true,
      "port": 8765,
      "path": "/ws",
      "tokenIssuePath": "/auth/token",
      "tokenIssueSecret": "your-secret-here",
      "tokenTtlS": 300,
      "websocketRequiresToken": true,
      "allowFrom": ["*"],
      "streaming": true
    }
  }
}

Client flow:

# 1. Obtain a token
curl -H "Authorization: Bearer your-secret-here" http://127.0.0.1:8765/auth/token

# 2. Connect using the token
websocat "ws://127.0.0.1:8765/ws?client_id=alice&token=nbwt_aBcDeFg..."

Limits

Issued tokens are single-use — each token can only complete one handshake.
Outstanding tokens are capped at 10,000. Requests beyond this return HTTP 429.
Expired tokens are purged lazily on each issue or validation request.

Multi-chat multiplexing

A single WebSocket can carry many concurrent chats. The server tracks chat_id -> {connections} as a fan-out set, so the same chat can also be mirrored across multiple connections (e.g. two browser tabs).

Typical flow (web UI with a sidebar)

client                                server
  | --- connect -------------------->  |
  | <-- {"event":"ready",              |
  |      "chat_id":"d3..."}   (default)|
  |                                     |
  | --- {"type":"new_chat"} --------->  |
  | <-- {"event":"attached",            |
  |      "chat_id":"a1..."}             |
  |                                     |
  | --- {"type":"message",              |
  |      "chat_id":"a1...",             |
  |      "content":"hi"} ------------>  |
  | <-- {"event":"delta", ...}          |
  | <-- {"event":"stream_end", ...}     |
  |                                     |
  | --- {"type":"attach",               |  # after page reload
  |      "chat_id":"a1..."} --------->  |
  | <-- {"event":"attached", ...}       |

Rules

Every outbound event carries chat_id. Clients must dispatch by that field.
chat_id format: ^[A-Za-z0-9_:-]{1,64}$. Non-matching values return error.
message auto-attaches on first use — no separate attach is required for chats the server minted (new_chat) on the same connection.
Errors (invalid envelope, unknown type, bad chat_id) are soft: the server replies with {"event":"error","detail":"..."} and keeps the connection open.

Backward compatibility

Legacy clients that only send plain text or {"content": ...} keep working unchanged: those frames route to the connection's default chat_id (the one from ready). No config flag is needed.

Security boundary

chat_id is a capability: anyone holding a valid WebSocket auth credential and the chat_id can attach to that conversation and see its output. This is safe for nanobot's local, single-user model. Multi-tenant deployments should namespace chat_ids per user (or introduce a per-tenant auth gate) — nanobot does not do this today.

Security Notes

Timing-safe comparison: Static token validation uses hmac.compare_digest to prevent timing attacks.
Defense in depth: allowFrom is checked at both the HTTP handshake level and the message level.
chat_id as capability: see Multi-chat multiplexing. Auth on the WebSocket handshake is the single line of defense; callers who pass it can attach to any chat_id they know.
TLS enforcement: When SSL is enabled, TLSv1.2 is the minimum allowed version.
Default-secure: websocketRequiresToken defaults to true. Explicitly set it to false only on trusted networks.

Media Files

Outbound message events may include a media field containing local filesystem paths. Remote clients cannot access these files directly — they need either:

A shared filesystem mount, or
An HTTP file server serving the nanobot media directory

Common Patterns

Trusted local network (no auth)

{
  "channels": {
    "websocket": {
      "enabled": true,
      "host": "0.0.0.0",
      "port": 8765,
      "websocketRequiresToken": false,
      "allowFrom": ["*"],
      "streaming": true
    }
  }
}

Static token (simple auth)

{
  "channels": {
    "websocket": {
      "enabled": true,
      "token": "my-shared-secret",
      "allowFrom": ["alice", "bob"]
    }
  }
}

Clients connect with ?token=my-shared-secret&client_id=alice.

Public endpoint with issued tokens

{
  "channels": {
    "websocket": {
      "enabled": true,
      "host": "0.0.0.0",
      "port": 8765,
      "path": "/ws",
      "tokenIssuePath": "/auth/token",
      "tokenIssueSecret": "production-secret",
      "websocketRequiresToken": true,
      "sslCertfile": "/etc/ssl/certs/server.pem",
      "sslKeyfile": "/etc/ssl/private/server-key.pem",
      "allowFrom": ["*"]
    }
  }
}

Custom path

{
  "channels": {
    "websocket": {
      "enabled": true,
      "path": "/chat/ws",
      "allowFrom": ["*"]
    }
  }
}

Clients connect to ws://127.0.0.1:8765/chat/ws?client_id=.... Trailing slashes are normalized, so /chat/ws/ works the same.

12 KiB Raw Blame History Unescape Escape