29 Commits

Author SHA1 Message Date
Xubin Ren
aea5948b11 fix(tools): tighten web fetch URL cleaning
Made-with: Cursor
2026-05-01 19:58:19 +08:00
彭星杰
5dc96505e8 fix(web_fetch): sanitize URL to strip markdown backticks and quotes before validation
LLM-generated tool calls may wrap URLs in markdown backticks or quotes
(e.g. `https://example.com`), causing urlparse to produce an empty scheme
and netloc, which leads to all fetch attempts failing silently.

Add URL cleaning at the top of WebFetchTool.execute to strip whitespace,
backticks, double quotes, and single quotes, plus an early rejection guard
for non-http(s) URLs after cleaning.
2026-05-01 19:58:19 +08:00
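The cleaning step and the early guard described above can be sketched as follows. This is a minimal illustration, not the code in WebFetchTool.execute: the helper names `clean_url` and `check_url`, and the choice to strip only the ends of the string, are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Characters an LLM tends to wrap around URLs: whitespace, markdown backticks, quotes.
_WRAPPER_CHARS = " \t\r\n`\"'"


def clean_url(raw: str) -> str:
    """Hypothetical helper: strip wrapper characters from both ends of a URL."""
    return raw.strip(_WRAPPER_CHARS)


def check_url(raw: str) -> Optional[str]:
    """Hypothetical guard: return an error for non-http(s) URLs after cleaning, else None."""
    url = clean_url(raw)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: only http(s) URLs are supported, got {url!r}"
    return None


if __name__ == "__main__":
    # Without cleaning, the leading backtick hides the scheme from urlparse.
    print(repr(urlparse("`https://example.com`").scheme))
    print(clean_url("  `https://example.com`  "))
```

Cleaning before validation means a wrapped but otherwise valid URL is fetched normally, while anything that is still not http(s) after cleaning fails fast with a visible error instead of silently.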
chengyongru
28f9bbff31 feat(web_search): add olostep provider
Adds Olostep (https://www.olostep.com) as an optional web_search backend
using the official olostep Python SDK (client.answers.create()).

Changes:
- pyproject.toml: adds olostep>=0.1.0 optional dependency
- schema.py: adds olostep to provider comment in WebSearchConfig
- web.py: adds _search_olostep() with lazy import and provider branching
- docs/configuration.md: documents Olostep setup under web search config
- tests: unit tests for the new provider

Backward compatible: existing users see no behavior change unless they
opt into provider: "olostep". The olostep SDK is imported lazily, so it is
not a hard runtime dependency.

Co-authored-by: umerkay <umerkk164@gmail.com>
2026-04-28 19:09:38 +08:00
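The "lazy import and provider branching" pattern can be sketched like this. It is a hypothetical illustration, not the code in web.py: `OPTIONAL_PROVIDERS` and `load_provider_module` are invented names, and the sketch deliberately does not call the olostep SDK itself.

```python
import importlib
from types import ModuleType
from typing import Optional

# Hypothetical registry: provider name -> (module to import, pip requirement).
OPTIONAL_PROVIDERS = {"olostep": ("olostep", "olostep>=0.1.0")}


def load_provider_module(provider: str) -> Optional[ModuleType]:
    """Import a provider's SDK only when that provider is actually selected."""
    if provider not in OPTIONAL_PROVIDERS:
        return None  # built-in provider: nothing extra to import
    module_name, requirement = OPTIONAL_PROVIDERS[provider]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Only users who opted into this provider see the failure,
        # and the message tells them how to fix it.
        raise RuntimeError(
            f"web_search provider {provider!r} needs an optional package; "
            f"install it with: pip install '{requirement}'"
        ) from exc
```

Because the import happens inside the branch, users on other providers never need the package installed, which is what keeps the change backward compatible.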
Xubin Ren
f4d8783f5e test(web): cover configurable fetch behavior
Ensure that custom user agents are applied to direct web requests and that disabling Jina Reader forces the local readability path.

Made-with: Cursor
2026-04-28 07:25:47 +00:00
Mizarka
3d40e159ae feat(web-tools): add option to disable fetching via Jina Reader
A new configuration block has been added for the web fetch tool, which
allows forcing the tool to use the local readability-lxml mode.

Combined with the previous option to modify the User-Agent, this allows
bypassing most Cloudflare CAPTCHAs and JS proof-of-work challenges.

Assisted-by: Jo'Zahir:Qwen3.6-35B-A3B
2026-04-22 09:28:30 +00:00
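The two options (a custom User-Agent and a switch to skip Jina Reader) can be sketched as a small settings object. This is a hypothetical illustration: `WebFetchSettings`, its field names, and its defaults are assumptions, not the project's actual config schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WebFetchSettings:
    """Hypothetical config block; field names and defaults are assumptions."""

    user_agent: str = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    use_jina_reader: bool = True


def choose_fetch_path(settings: WebFetchSettings) -> str:
    """Return which fetch path the tool would take for this configuration."""
    if settings.use_jina_reader:
        return "jina_reader"
    # Jina Reader disabled: request the page directly and extract the main
    # content locally (readability-lxml, per the commit description).
    return "local_readability"


def direct_request_headers(settings: WebFetchSettings) -> Dict[str, str]:
    """Headers for requests the tool sends itself, carrying the custom User-Agent."""
    return {"User-Agent": settings.user_agent}
```

With httpx, such headers would typically be passed as `httpx.AsyncClient(headers=...)`, so every direct request carries the configured User-Agent.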
Mizarka
ec2f0ccfdb feat(web-tools): add configurable User-Agent
Assisted-by: Jo'Zahir:Qwen3.6-35B-A3B
2026-04-22 09:11:57 +00:00
yeyitech
ee061f0595 fix(web): serialize duckduckgo search calls 2026-04-14 14:10:06 +08:00
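One way to serialize DuckDuckGo calls, assuming the goal is to keep at most one blocking DDGS request in flight, is an `asyncio.Lock` held around `asyncio.to_thread`. `SerializedSearch` and the fake search function are hypothetical; the commit may use a different primitive.

```python
import asyncio
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class SerializedSearch:
    """Run a blocking search function in a worker thread, one call at a time."""

    def __init__(self, search_fn: Callable[[str], str]):
        self._search_fn = search_fn
        self._lock: Optional[asyncio.Lock] = None  # created inside the running loop

    async def run(self, query: str) -> str:
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # later callers wait here until the current call finishes
            return await asyncio.to_thread(self._search_fn, query)


def make_fake_search() -> Tuple[Callable[[str], str], Dict[str, int]]:
    """Blocking fake search that records how many calls overlap."""
    state = {"active": 0, "peak": 0}
    guard = threading.Lock()

    def fake_search(query: str) -> str:
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.02)  # simulate a slow HTTP round trip
        with guard:
            state["active"] -= 1
        return f"results for {query}"

    return fake_search, state


async def demo() -> Tuple[List[str], int]:
    fake_search, state = make_fake_search()
    searcher = SerializedSearch(fake_search)
    results = await asyncio.gather(*(searcher.run(q) for q in ["a", "b", "c"]))
    return list(results), state["peak"]
```

Creating the lock lazily inside `run` ties it to the event loop that is actually running, which avoids cross-loop errors on older Python versions.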
Mike Terhar
d3aa209cf6 add kagi web search tool 2026-04-11 16:53:05 +08:00
Xubin Ren
edb821e10d feat(agent): prompt behavior directives, tool descriptions, and loop robustness 2026-04-08 02:22:25 +08:00
hoaresky
6bd2950b99 Fix: add asyncio timeout guard for DuckDuckGo search
DDGS's internal `timeout=10` relies on `requests` read-timeout semantics,
which only measure the gap between bytes — not total wall-clock time.
When the underlying HTTP connection enters CLOSE-WAIT or the server
dribbles data slowly, this timeout never fires, causing `ddgs.text` to
hang indefinitely via `asyncio.to_thread`.

Since `asyncio.to_thread` cannot cancel the underlying OS thread, the
agent's session lock is never released, blocking all subsequent messages
on the same session (observed: 8+ hours of unresponsiveness).

Fix:
- Add `timeout` field to `WebSearchConfig` (default: 30s, configurable
  via config.json or NANOBOT_TOOLS__WEB__SEARCH__TIMEOUT env var)
- Wrap `asyncio.to_thread` with `asyncio.wait_for` to enforce a hard
  wall-clock deadline

Closes #2804

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 02:21:51 +08:00
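The fix described above, wrapping `asyncio.to_thread` in `asyncio.wait_for`, can be sketched as follows. `blocking_search` stands in for `ddgs.text`, and `search_timeout` is a simplified, hypothetical stand-in for the real config lookup (env var override, else the 30 s default).

```python
import asyncio
import os
import time


def search_timeout(default: float = 30.0) -> float:
    """Simplified stand-in for the config lookup: env var override, else 30 s."""
    return float(os.environ.get("NANOBOT_TOOLS__WEB__SEARCH__TIMEOUT", default))


def blocking_search(query: str, delay: float) -> str:
    """Stands in for ddgs.text(); sleeps to simulate a slow or hung connection."""
    time.sleep(delay)
    return f"results for {query!r}"


async def search_with_deadline(query: str, delay: float, timeout: float) -> str:
    try:
        # wait_for enforces a wall-clock deadline on the awaiting coroutine.
        return await asyncio.wait_for(
            asyncio.to_thread(blocking_search, query, delay), timeout=timeout
        )
    except asyncio.TimeoutError:
        # The worker thread keeps running (threads cannot be cancelled),
        # but the caller, and any session lock it holds, is released.
        return f"Error: web search timed out after {timeout}s"
```

Note the trade-off the commit message points out: the timed-out thread is abandoned rather than killed, so the deadline protects the session, not the thread pool.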
KimGLee
f422de8084 fix(web-search): fix Jina search format and fallback 2026-04-06 02:06:00 +08:00
Jack Lu
e7798a28ee refactor(tools): streamline Tool class and add JSON Schema for parameters
Refactor Tool methods and type handling; introduce JSON Schema support for tool parameters (schema module, validation tests).

Made-with: Cursor
2026-04-04 19:58:44 +08:00
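To illustrate what validating tool parameters against a JSON Schema involves, here is a hand-rolled sketch covering only `required`, top-level `type`, and `additionalProperties: false`. It is not the project's schema module; `validate_params` and the example schema (including the `maxChars` parameter) are hypothetical, and a full validator such as the `jsonschema` package covers far more keywords.

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to Python types (subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_params(schema: Dict[str, Any], params: Dict[str, Any]) -> List[str]:
    """Check required keys and top-level property types; return a list of errors."""
    errors = []
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter '{name}'")
    props = schema.get("properties", {})
    for name, value in params.items():
        spec = props.get(name)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected parameter '{name}'")
            continue
        expected = spec.get("type")
        py_type = _TYPES.get(expected)
        if py_type is None:
            continue  # type keyword absent or unsupported in this sketch
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            ok = False
        else:
            ok = isinstance(value, py_type)
        if not ok:
            errors.append(f"'{name}' should be {expected}")
    return errors


# Hypothetical schema for a fetch-style tool, for illustration only.
WEB_FETCH_SCHEMA = {
    "type": "object",
    "properties": {"url": {"type": "string"}, "maxChars": {"type": "integer"}},
    "required": ["url"],
}
```

Returning a list of messages rather than raising lets a tool runner hand all problems back to the model in one reply.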
Xubin Ren
fbedf7ad77 feat: harden agent runtime for long-running tasks 2026-04-01 19:12:49 +00:00
Xubin Ren
445a96ab55 fix(agent): harden multimodal tool result flow
Keep multimodal tool outputs on the native content-block path while
restoring redirect SSRF checks for web_fetch image responses. Also share
image block construction, simplify persisted history sanitization, and
add regression tests for image reads and blocked private redirects.

Made-with: Cursor
2026-03-21 05:34:56 +00:00
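Restoring redirect SSRF checks means validating every hop, not just the first URL. A minimal sketch, assuming redirects are followed manually: `get_redirect` stands in for a request made with automatic redirects disabled (for example httpx with `follow_redirects=False`) and returns the `Location` header or None. All names here are hypothetical, and the blocklist only inspects literal IPs and `localhost`.

```python
import ipaddress
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse

MAX_REDIRECTS = 5


def is_blocked_literal(url: str) -> bool:
    """Tiny blocklist: 'localhost' and literal private/loopback/link-local IPs."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a real check would also resolve DNS names
    return ip.is_private or ip.is_loopback or ip.is_link_local


def follow_redirects(
    url: str,
    get_redirect: Callable[[str], Optional[str]],
    is_blocked: Callable[[str], bool] = is_blocked_literal,
    max_redirects: int = MAX_REDIRECTS,
) -> str:
    """Return the final URL, re-checking every hop against the blocklist."""
    for _ in range(max_redirects + 1):  # the first URL plus up to max_redirects hops
        if is_blocked(url):
            raise PermissionError(f"blocked internal URL: {url}")
        location = get_redirect(url)
        if location is None:
            return url
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("too many redirects")
```

A public image URL that redirects to `http://127.0.0.1/...` is rejected at the second hop even though the original URL passed.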
vandazia
71a88da186 feat: implement native multimodal autonomous sensory capabilities 2026-03-20 22:00:38 +08:00
Xubin Ren
6e2b6396a4 security: add SSRF protection, untrusted content marking, and internal URL blocking 2026-03-16 15:05:26 +08:00
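A common shape for internal-URL blocking is to resolve the host and reject any address that is not public. This is a sketch under assumptions (the function names are hypothetical and the project's actual rules may differ); the resolver is injectable so the check can be exercised without DNS. It does not address DNS rebinding, where the address checked differs from the one later connected to.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse


def resolve_all(host: str) -> List[str]:
    """Every address the host resolves to, IPv4 and IPv6."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def _is_blocked_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 zone id
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:127.0.0.1 as 127.0.0.1
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def is_internal_url(url: str, resolve: Callable[[str], List[str]] = resolve_all) -> bool:
    """True if the URL's host is, or resolves to, a non-public address."""
    host = urlparse(url).hostname
    if not host:
        return True  # nothing to check: fail closed
    try:
        ipaddress.ip_address(host)
        addresses = [host]  # literal IP, no DNS lookup needed
    except ValueError:
        try:
            addresses = resolve(host)
        except OSError:
            return True  # resolution failed: fail closed
    return not addresses or any(_is_blocked_ip(a) for a in addresses)
```

Checking every resolved address, not only the first, stops a hostname with one public and one private record from slipping through.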
Xubin Ren
ca5047b602 feat(web): multi-provider web search + Jina Reader fetch 2026-03-13 05:44:16 +00:00
Re-bin
15529c668e fix(web): sanitize proxy logs and polish search key hint 2026-03-01 12:53:18 +00:00
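Assuming "sanitize proxy logs" means hiding credentials embedded in the proxy URL before it is logged, a minimal sketch looks like this. `redact_proxy_url` is a hypothetical name, not the project's function.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_proxy_url(proxy: str) -> str:
    """Hide any user:password embedded in a proxy URL before it is logged."""
    parts = urlsplit(proxy)
    if not (parts.username or parts.password):
        return proxy  # no credentials: safe to log as-is
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets back
        host = f"[{host}]"
    netloc = f"***@{host}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Keeping the scheme, host, and port in the log line still tells the operator which proxy was used.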
chengyongru
82be2ae1a5 feat(tool): add web search proxy 2026-03-01 16:51:54 +08:00
JK_Lu
977ca725f2 style: unify code formatting and import order
- Remove trailing whitespace and normalize blank lines
- Unify string quotes and line breaks for long lines
- Sort imports alphabetically across modules
2026-02-28 20:55:43 +08:00
Yongfeng Huang
7a3788fee9 fix(web): use self.api_key instead of undefined api_key
Made-with: Cursor
2026-02-26 15:43:04 +08:00
Re-bin
4b9ffea3fc merge origin/main into pr-1071, adopt @property api_key pattern 2026-02-24 13:41:49 +00:00
Re-bin
cda3a02f68 style(web): inline api key resolution, remove unnecessary method 2026-02-24 11:18:33 +00:00
coldxiangyu
ef57225974 fix(web): resolve API key on each call + improve error message
- Defer Brave API key resolution to execute() time instead of __init__,
  so env var or config changes take effect without gateway restart
- Improve error message to reference actual config path
  (tools.web.search.apiKey) instead of only mentioning env var

Fixes #1069 (issues 1 and 2 of 3)
2026-02-24 18:19:47 +08:00
haosenwang1018
eeaad6e0c2 fix: resolve API key at call time so config changes take effect without restart
Previously, WebSearchTool cached the API key in __init__, so keys added
to config.json or env vars after gateway startup were never picked up.
This caused a confusing 'BRAVE_API_KEY not configured' error even after
the key was correctly set (issue #1069).

Changes:
- Store the init-time key separately, resolve via property at each call
- Improve error message to guide users toward the correct fix

Closes #1069
2026-02-24 04:06:22 +08:00
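The "store the init-time key separately, resolve via property at each call" change can be sketched as below. `WebSearchToolSketch` is hypothetical and only re-reads the `BRAVE_API_KEY` environment variable; re-reading config.json is left out, and the exact error wording is an assumption.

```python
import os
from typing import Optional


class WebSearchToolSketch:
    """Hypothetical stand-in for WebSearchTool showing call-time key resolution."""

    def __init__(self, api_key: Optional[str] = None):
        self._init_api_key = api_key  # key passed at startup (may be empty)

    @property
    def api_key(self) -> str:
        # Re-read on every access so a key set after startup is picked up.
        return self._init_api_key or os.environ.get("BRAVE_API_KEY", "")

    def execute(self, query: str) -> str:
        if not self.api_key:
            return ("Error: Brave Search API key not configured. "
                    "Set tools.web.search.apiKey in config.json "
                    "or the BRAVE_API_KEY environment variable.")
        return f"would search for {query!r}"
```

Because `api_key` is a property rather than an attribute set in `__init__`, a tool created before the key existed starts working as soon as the key appears, with no gateway restart.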
chtangwin
a2379a08ac Fix: Ensure UTF-8 encoding and ensure_ascii=False for remaining file/JSON operations 2026-02-18 18:37:17 -08:00
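The two settings named in this commit can be illustrated with a small round trip. `save_json`, `load_json`, and `roundtrip_demo` are hypothetical helpers, not the project's code.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Tuple


def save_json(path: Path, data: Any) -> None:
    # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes;
    # encoding="utf-8" avoids depending on the platform's default encoding.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def load_json(path: Path) -> Any:
    return json.loads(path.read_text(encoding="utf-8"))


def roundtrip_demo(data: Any) -> Tuple[str, Any]:
    """Write data to a temporary file and return (raw file text, reloaded data)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "session.json"
        save_json(path, data)
        return path.read_text(encoding="utf-8"), load_json(path)
```

Without the explicit encoding, the same file written on Windows with a legacy code page can fail to encode non-ASCII characters or be decoded incorrectly on another machine.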
Re-bin
1a784fca1e refactor: simplify _validate_url function 2026-02-03 17:13:30 +00:00
Cheng Wang
ea849650ef feat: improve web_fetch URL validation and security
Add URL validation and redirect limits to the web_fetch tool to prevent potential security issues:

- Add _validate_url() function to validate URLs before fetching
  - Only allow http:// and https:// schemes (prevent file://, ftp://, etc.)
  - Verify URL has valid scheme and domain
  - Return descriptive error messages for invalid URLs

- Limit HTTP redirects to 5 (down from the httpx default of 20) to prevent redirect-based DoS attacks
  - Add MAX_REDIRECTS constant for easy configuration
  - Explicitly configure httpx.AsyncClient with max_redirects parameter

- Improve error handling with JSON error responses for validation failures

This addresses security concerns identified in code review where web_fetch
had no URL validation or redirect limits, potentially allowing:
- Unsafe URL schemes (file://, etc.)
- Redirect-based DoS attacks
- Invalid URL formats causing unclear errors
2026-02-02 19:34:22 +08:00
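The validation rules listed above can be sketched as follows. This mirrors the described `_validate_url()` behavior, but the name `validate_url`, the `(ok, message)` return shape, the error texts, and the JSON field names are assumptions.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlparse

MAX_REDIRECTS = 5  # per the commit, passed as httpx.AsyncClient(max_redirects=...)


def validate_url(url: str) -> Tuple[bool, str]:
    """Allow only http(s) URLs that include a domain; return (ok, error message)."""
    try:
        parsed = urlparse(url)
    except ValueError as exc:
        return False, str(exc)
    if parsed.scheme not in ("http", "https"):
        return False, f"Only http/https allowed, got '{parsed.scheme or 'none'}'"
    if not parsed.netloc:
        return False, "Missing domain"
    return True, ""


def check_before_fetch(url: str) -> Optional[str]:
    """Return a JSON error response for an invalid URL, or None if it may be fetched."""
    ok, message = validate_url(url)
    if ok:
        return None
    return json.dumps({"error": f"URL validation failed: {message}", "url": url}, ensure_ascii=False)
```

Returning a structured JSON error gives the calling model a clear reason for the rejection instead of an opaque network failure.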
Re-bin
d4cc48afd5 🐈nanobot: hello world! 2026-02-01 07:36:42 +00:00