From 4c25b739b5f5d3fa0a96b80731a6f1ef21a9fe2f Mon Sep 17 00:00:00 2001
From: Mizarka
Date: Wed, 22 Apr 2026 09:42:03 +0000
Subject: [PATCH] docs: add new web tool settings

---
 docs/configuration.md | 91 ++++++++++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 28 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 8cd7dd339..c0b7bb97b 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -474,19 +474,21 @@ When a channel `send()` raises, nanobot retries at the channel-manager layer. By
 >
 > If a channel is completely unreachable, nanobot cannot notify the user through that same channel. Watch logs for `Failed to send to {channel} after N attempts` to spot persistent delivery failures.
 
-## Web Search
+## Web Tools
 
-> [!TIP]
-> Use `proxy` in `tools.web` to route all web requests (search + fetch) through a proxy:
-> ```json
-> { "tools": { "web": { "proxy": "http://127.0.0.1:7890" } } }
-> ```
+nanobot incorporates basic tools for accessing the web. These include searching via APIs, and fetching arbitrary web pages in Markdown format. They are enabled by default, and can be configured in `~/.nanobot/config.json` under `tools.web`.
 
-nanobot supports multiple web search providers. Configure in `~/.nanobot/config.json` under `tools.web.search`.
+If you want to disable them, which removes both `web_search` and `web_fetch` from the tool list sent to the LLM, set `tools.web.enable` to `false`:
 
-By default, web tools are enabled and web search uses `duckduckgo`, so search works out of the box without an API key.
-
-If you want to disable all built-in web tools entirely, set `tools.web.enable` to `false`. This removes both `web_search` and `web_fetch` from the tool list sent to the LLM.
+```json
+{
+  "tools": {
+    "web": {
+      "enable": false
+    }
+  }
+}
+```
 
 If you need to allow trusted private ranges such as Tailscale / CGNAT addresses, you can explicitly exempt them from SSRF blocking with `tools.ssrfWhitelist`:
 
@@ -498,6 +500,26 @@ If you need to allow trusted private ranges such as Tailscale / CGNAT addresses,
 }
 ```
 
+> [!TIP]
+> Use `proxy` in `tools.web` to route all web requests (search + fetch) through a proxy:
+> ```json
+> { "tools": { "web": { "proxy": "http://127.0.0.1:7890" } } }
+> ```
+
+### `tools.web`
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enable` | boolean | `true` | Enable or disable all built-in web tools (`web_search` + `web_fetch`) |
+| `proxy` | string or null | `null` | Proxy for all web requests, for example `http://127.0.0.1:7890` |
+| `userAgent` | string or null | `null` | User-Agent header for all web requests. If null, a browser one will be used |
+
+### Web Search
+
+nanobot supports multiple web search providers. Configure in `~/.nanobot/config.json` under `tools.web.search`.
+
+By default, web search uses `duckduckgo`, and it works out of the box without an API key.
+
 | Provider | Config fields | Env var fallback | Free |
 |----------|--------------|------------------|------|
 | `brave` | `apiKey` | `BRAVE_API_KEY` | No |
@@ -507,17 +529,6 @@ If you need to allow trusted private ranges such as Tailscale / CGNAT addresses,
 | `searxng` | `baseUrl` | `SEARXNG_BASE_URL` | Yes (self-hosted) |
 | `duckduckgo` (default) | — | — | Yes |
 
-**Disable all built-in web tools:**
-```json
-{
-  "tools": {
-    "web": {
-      "enable": false
-    }
-  }
-}
-```
-
 **Brave:**
 ```json
 {
@@ -601,13 +612,7 @@ If you need to allow trusted private ranges such as Tailscale / CGNAT addresses,
 }
 ```
 
-| Option | Type | Default | Description |
-|--------|------|---------|-------------|
-| `enable` | boolean | `true` | Enable or disable all built-in web tools (`web_search` + `web_fetch`) |
-| `proxy` | string or null | `null` | Proxy for all web requests, for example `http://127.0.0.1:7890` |
-| `userAgent` | string or null | `null` | User agent header for all web requests. If null, a browser one will be used |
-
-### `tools.web.search`
+#### `tools.web.search`
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
@@ -616,6 +621,36 @@ If you need to allow trusted private ranges such as Tailscale / CGNAT addresses,
 | `baseUrl` | string | `""` | Base URL for SearXNG |
 | `maxResults` | integer | `5` | Results per search (1–10) |
 
+### Web Fetch
+
+> [!TIP]
+> If you are having issues with JS proof-of-work or Cloudflare captchas, set a random user agent and disable Jina Reader:
+> ```json
+> { "tools": { "web": { "userAgent": "Not-A-Browser", "fetch": { "useJinaReader": false } } } }
+> ```
+
+nanobot by default uses [Jina Reader](https://jina.ai/reader/), a third-party API, to convert arbitrary pages into Markdown format for easy digestion by the LLM, with a local fallback based on [readability-lxml](https://github.com/buriy/python-readability) if the former fails.
+
+If you want to always use the local conversion, you can force it using:
+
+```json
+{
+  "tools": {
+    "web": {
+      "fetch": {
+        "useJinaReader": false
+      }
+    }
+  }
+}
+```
+
+#### `tools.web.fetch`
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `useJinaReader` | boolean | `true` | If true, Jina Reader will be preferred over the local conversion |
+
 ## MCP (Model Context Protocol)
 
 > [!TIP]
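
For orientation, the options this patch documents would nest in `~/.nanobot/config.json` roughly as follows. This is a sketch assembled only from keys and example values that appear in the patch itself: the proxy address and `useJinaReader: false` come from its own examples, while `enable`, `userAgent`, and `maxResults` are shown at their documented defaults. It is not taken from the nanobot source, and search provider selection is left out because the relevant table rows and provider examples fall outside the hunks shown.

```json
{
  "tools": {
    "web": {
      "enable": true,
      "proxy": "http://127.0.0.1:7890",
      "userAgent": null,
      "search": {
        "maxResults": 5
      },
      "fetch": {
        "useJinaReader": false
      }
    }
  }
}
```

Going by the descriptions in the patch, this configuration would route both search and fetch requests through the proxy, and fetch would use the local readability-lxml conversion instead of Jina Reader.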