LLM-generated tool calls may wrap URLs in markdown backticks or quotes
(e.g. \https://example.com\), causing urlparse to produce empty scheme
and netloc, which leads to all fetch attempts failing silently.
Add URL cleaning at the top of WebFetchTool.execute to strip whitespace,
backticks, double quotes, and single quotes, plus an early rejection guard
for non-http(s) URLs after cleaning.
Adds Olostep (https://www.olostep.com) as an optional web_search backend
using the official olostep Python SDK (client.answers.create()).
Changes:
- pyproject.toml: adds olostep>=0.1.0 optional dependency
- schema.py: adds olostep to provider comment in WebSearchConfig
- web.py: adds _search_olostep() with lazy import and provider branching
- docs/configuration.md: documents Olostep setup under web search config
- tests: unit tests for the new provider
Backward compatible: existing users see no behavior change unless they
opt into provider: "olostep". No hard dependency at runtime path.
Co-authored-by: umerkay <umerkk164@gmail.com>
A new configuration block has been added for the web fetch tool, which
allows forcing the tool to use the local readability-lxml mode.
Combined with the previous option to modify the user agent, allows
bypassing most Cloudflare captchas and JS proof-of-work.
Assisted-by: Jo'Zahir:Qwen3.6-35B-A3B
DDGS's internal `timeout=10` relies on `requests` read-timeout semantics,
which only measure the gap between bytes — not total wall-clock time.
When the underlying HTTP connection enters CLOSE-WAIT or the server
dribbles data slowly, this timeout never fires, causing `ddgs.text` to
hang indefinitely via `asyncio.to_thread`.
Since `asyncio.to_thread` cannot cancel the underlying OS thread, the
agent's session lock is never released, blocking all subsequent messages
on the same session (observed: 8+ hours of unresponsiveness).
Fix:
- Add `timeout` field to `WebSearchConfig` (default: 30s, configurable
via config.json or NANOBOT_TOOLS__WEB__SEARCH__TIMEOUT env var)
- Wrap `asyncio.to_thread` with `asyncio.wait_for` to enforce a hard
wall-clock deadline
Closes#2804
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep multimodal tool outputs on the native content-block path while
restoring redirect SSRF checks for web_fetch image responses. Also share
image block construction, simplify persisted history sanitization, and
add regression tests for image reads and blocked private redirects.
Made-with: Cursor
- Remove trailing whitespace and normalize blank lines
- Unify string quotes and line breaks for long lines
- Sort imports alphabetically across modules
- Defer Brave API key resolution to execute() time instead of __init__,
so env var or config changes take effect without gateway restart
- Improve error message to reference actual config path
(tools.web.search.apiKey) instead of only mentioning env var
Fixes#1069 (issues 1 and 2 of 3)
Previously, WebSearchTool cached the API key in __init__, so keys added
to config.json or env vars after gateway startup were never picked up.
This caused a confusing 'BRAVE_API_KEY not configured' error even after
the key was correctly set (issue #1069).
Changes:
- Store the init-time key separately, resolve via property at each call
- Improve error message to guide users toward the correct fix
Closes#1069
Add URL validation and redirect limits to web_fetch tool to prevent potential security issues:
- Add _validate_url() function to validate URLs before fetching
- Only allow http:// and https:// schemes (prevent file://, ftp://, etc.)
- Verify URL has valid scheme and domain
- Return descriptive error messages for invalid URLs
- Limit HTTP redirects to 5 (down from default 20) to prevent DoS attacks
- Add MAX_REDIRECTS constant for easy configuration
- Explicitly configure httpx.AsyncClient with max_redirects parameter
- Improve error handling with JSON error responses for validation failures
This addresses security concerns identified in code review where web_fetch
had no URL validation or redirect limits, potentially allowing:
- Unsafe URL schemes (file://, etc.)
- Redirect-based DoS attacks
- Invalid URL formats causing unclear errors