# Image Generation

nanobot can generate and edit images through the `generate_image` tool. In the WebUI, users can enable **Image Generation** from the composer, choose an aspect ratio, and keep iterating on generated images inside the same chat.

The feature is disabled by default. Enable it in `~/.nanobot/config.json`, configure a supported image provider, then restart the gateway.

## Quick Setup

```json
{
  "providers": {
    "openrouter": {
      "apiKey": "${OPENROUTER_API_KEY}"
    }
  },
  "tools": {
    "imageGeneration": {
      "enabled": true,
      "provider": "openrouter",
      "model": "openai/gpt-5.4-image-2"
    }
  }
}
```

See [Provider Notes](#provider-notes) for AIHubMix, MiniMax, and Gemini configuration examples.

> [!TIP]
> Prefer environment variables for API keys. nanobot resolves `${VAR_NAME}` values from the environment at startup.

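To make the `${VAR_NAME}` substitution concrete, here is a minimal sketch of how such placeholders could be resolved over a loaded config. It is not nanobot's actual loader: the `resolve_env` helper, the placeholder pattern, and the choice to leave unknown variables untouched are all assumptions made for illustration.

```python
import os
import re

# Matches ${VAR_NAME} placeholders; the exact pattern is an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${VAR_NAME} in strings nested inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left as-is so a missing key is easy to spot.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    return value  # booleans, numbers, and null pass through unchanged
```

Passing an explicit `env` mapping keeps the helper easy to test without touching the real process environment.
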
## WebUI Usage

In the WebUI composer:

1. Click **Image Generation**.
2. Choose an aspect ratio: `Auto`, `1:1`, `3:4`, `9:16`, `4:3`, or `16:9`.
3. Describe the image or the edit you want.
4. Attach reference images when editing an existing image.

Generated images are rendered as assistant media in the chat. Follow-up prompts such as "make it warmer", "change the background", or "try a 16:9 version" can reuse the most recent generated artifact.

The WebUI hides provider storage details from the user. The agent sees the saved artifact path internally and can pass it back to `generate_image` as `reference_images` for iterative edits.

## Configuration Reference

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `tools.imageGeneration.enabled` | boolean | `false` | Register the `generate_image` tool |
| `tools.imageGeneration.provider` | string | `"openrouter"` | Image provider name. Supported values: `openrouter`, `aihubmix`, `minimax`, `gemini` |
| `tools.imageGeneration.model` | string | `"openai/gpt-5.4-image-2"` | Provider model name |
| `tools.imageGeneration.defaultAspectRatio` | string | `"1:1"` | Default ratio when the prompt/tool call does not specify one |
| `tools.imageGeneration.defaultImageSize` | string | `"1K"` | Default size hint, for example `1K`, `2K`, `4K`, or `1024x1024` |
| `tools.imageGeneration.maxImagesPerTurn` | number | `4` | Maximum `count` accepted by one tool call. Valid range: `1` to `8` |
| `tools.imageGeneration.saveDir` | string | `"generated"` | Relative directory under nanobot's media directory for generated artifacts |

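The defaults and limits in the table can be summarised as a small settings object. This is a sketch only: the `ImageGenerationSettings` class and its snake_case field names are hypothetical and are not taken from nanobot's source. The default values, the provider list, and the `1` to `8` range come from the table.

```python
from dataclasses import dataclass

SUPPORTED_PROVIDERS = ("openrouter", "aihubmix", "minimax", "gemini")


@dataclass
class ImageGenerationSettings:
    # Defaults mirror the table; the class itself is a hypothetical illustration.
    enabled: bool = False
    provider: str = "openrouter"
    model: str = "openai/gpt-5.4-image-2"
    default_aspect_ratio: str = "1:1"
    default_image_size: str = "1K"
    max_images_per_turn: int = 4
    save_dir: str = "generated"

    def __post_init__(self):
        # Reject provider names outside the documented set.
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported image generation provider: {self.provider}")
        # Enforce the documented 1..8 range for maxImagesPerTurn.
        if not 1 <= self.max_images_per_turn <= 8:
            raise ValueError("maxImagesPerTurn must be between 1 and 8")
```

Validating at construction time surfaces a bad `provider` or `maxImagesPerTurn` when the config is loaded rather than on the first tool call.
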
Provider settings reuse normal provider config fields:

| Option | Description |
|--------|-------------|
| `providers.<name>.apiKey` | Provider API key. Prefer `${ENV_VAR}` |
| `providers.<name>.apiBase` | Optional custom base URL |
| `providers.<name>.extraHeaders` | Headers merged into provider requests |
| `providers.<name>.extraBody` | Extra JSON fields merged into provider request bodies |

Both camelCase and snake_case config keys are accepted, but docs use camelCase to match `config.json`.

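Accepting both key spellings usually comes down to normalising keys once when the config is loaded. The sketch below shows one way to do that; `to_camel` and `normalize_keys` are hypothetical helpers, and nanobot's real loader may work differently.

```python
import re


def to_camel(key: str) -> str:
    """Convert snake_case to camelCase; camelCase keys pass through unchanged."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), key)


def normalize_keys(value):
    """Recursively rewrite dict keys so both spellings end up as camelCase."""
    if isinstance(value, dict):
        return {to_camel(key): normalize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value
```

Note that this sketch rewrites every nested key. A real loader would likely leave free-form maps such as `extraBody` and `extraHeaders` untouched, because their keys are sent to the provider as written.
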
## Provider Notes

### OpenRouter

OpenRouter uses a chat-completions style image response. Configure:

```json
{
  "tools": {
    "imageGeneration": {
      "enabled": true,
      "provider": "openrouter",
      "model": "openai/gpt-5.4-image-2"
    }
  }
}
```

For reference-image edits, use a model that supports both image generation and image editing.

### AIHubMix

AIHubMix `gpt-image-2-free` is supported through AIHubMix's unified predictions API. Internally nanobot calls:

```text
/v1/models/openai/gpt-image-2-free/predictions
```

Configure:

```json
{
  "providers": {
    "aihubmix": {
      "apiKey": "${AIHUBMIX_API_KEY}",
      "extraBody": {
        "quality": "low"
      }
    }
  },
  "tools": {
    "imageGeneration": {
      "enabled": true,
      "provider": "aihubmix",
      "model": "gpt-image-2-free"
    }
  }
}
```

`quality: low` is optional. It can make free image models faster and less likely to time out, but it is not required for correctness.

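The relationship between the configured `model` value and the predictions path shown earlier can be sketched as follows. The `aihubmix_predictions_path` helper is hypothetical, and its handling of names that already contain a `/` is an assumption. The source only documents that `gpt-image-2-free` becomes `openai/gpt-image-2-free`.

```python
def aihubmix_predictions_path(model: str) -> str:
    """Build the predictions path for a configured AIHubMix model name."""
    # Assumption: bare names get an "openai/" prefix; qualified names are kept as-is.
    model_path = model if "/" in model else f"openai/{model}"
    return f"/v1/models/{model_path}/predictions"
```

With the configuration above, `aihubmix_predictions_path("gpt-image-2-free")` yields the path listed in this section.
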
### MiniMax

MiniMax `image-01` supports text-to-image and reference-image (subject reference) edits. Supported aspect ratios are `1:1`, `16:9`, `4:3`, `3:2`, `2:3`, `3:4`, `9:16`, and `21:9`.

```json
{
  "providers": {
    "minimax": {
      "apiKey": "${MINIMAX_API_KEY}"
    }
  },
  "tools": {
    "imageGeneration": {
      "enabled": true,
      "provider": "minimax",
      "model": "image-01",
      "defaultAspectRatio": "1:1"
    }
  }
}
```

### Gemini

nanobot supports two Gemini image generation model families via Google's Generative Language API:

| Model | Endpoint | Reference images |
|-------|----------|-----------------|
| `imagen-4.0-generate-001` | `:predict` | Not supported by this integration |
| `gemini-2.5-flash-image` | `:generateContent` | Supported |

For reference-image edits, use a Gemini Flash image model:

```json
{
  "providers": {
    "gemini": {
      "apiKey": "${GEMINI_API_KEY}"
    }
  },
  "tools": {
    "imageGeneration": {
      "enabled": true,
      "provider": "gemini",
      "model": "gemini-2.5-flash-image"
    }
  }
}
```

Imagen 4 supports the aspect ratios `1:1`, `9:16`, `16:9`, `3:4`, and `4:3`. Unsupported ratios are ignored, and the model falls back to its default ratio. The `defaultImageSize` setting has no effect on Gemini models; sizing is controlled by `defaultAspectRatio` only. Reference images passed to an Imagen model are ignored, and a warning is logged.

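The Gemini rules in this section (endpoint per model family, ratio filtering for Imagen, dropped reference images) can be sketched as a small planning function. Everything here is illustrative: `gemini_endpoint` and `plan_gemini_request` are hypothetical names, detecting Imagen models by an `imagen-` prefix is an assumption, and passing the ratio through unchanged for Flash models is also an assumption.

```python
import logging

logger = logging.getLogger(__name__)

# Aspect ratios this section lists for Imagen 4.
IMAGEN_ASPECT_RATIOS = {"1:1", "9:16", "16:9", "3:4", "4:3"}


def gemini_endpoint(model: str) -> str:
    # Assumption: Imagen models are recognised by their "imagen-" prefix.
    return ":predict" if model.startswith("imagen-") else ":generateContent"


def plan_gemini_request(model, aspect_ratio=None, reference_images=()):
    """Return (endpoint, aspect_ratio, reference_images) after applying the rules."""
    endpoint = gemini_endpoint(model)
    refs = list(reference_images)
    if endpoint == ":predict":
        if aspect_ratio not in IMAGEN_ASPECT_RATIOS:
            aspect_ratio = None  # unsupported ratio: let the model use its default
        if refs:
            logger.warning("reference images are ignored for Imagen models")
            refs = []
    return endpoint, aspect_ratio, refs
```

Keeping the decision in one pure function makes the behaviour easy to check without calling the provider.
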
## Artifacts

Generated images are stored under the active nanobot instance's media directory:

```text
~/.nanobot/media/generated/YYYY-MM-DD/img_<id>.<ext>
~/.nanobot/media/generated/YYYY-MM-DD/img_<id>.json
```

For non-default config locations, the media directory is relative to the active config file's directory.

The JSON sidecar stores:

| Field | Meaning |
|-------|---------|
| `id` | Short generated image id, such as `img_ab12cd34ef56` |
| `path` | Local image path used internally for follow-up edits |
| `mime` | Detected image MIME type |
| `prompt` | Prompt used for the generation |
| `model` | Provider model |
| `provider` | Provider name |
| `source_images` | Reference image paths used for edits |
| `created_at` | Creation timestamp |

Do not paste base64 image payloads into chat. The agent should keep local artifact paths internal unless the user explicitly asks for debugging details.

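As a rough illustration of the layout and sidecar fields above, the sketch below writes an image and its JSON sidecar into a dated directory. The `save_artifact` function is hypothetical. The id generation, the use of UTC dates, and the ISO 8601 `created_at` format are assumptions, not documented nanobot behaviour.

```python
import json
import secrets
from datetime import datetime, timezone
from pathlib import Path


def save_artifact(media_dir, image_bytes, ext, mime, prompt, model, provider,
                  source_images=(), save_dir="generated"):
    """Write an image plus its JSON sidecar and return the sidecar fields."""
    now = datetime.now(timezone.utc)  # assumption: UTC dates and timestamps
    image_id = f"img_{secrets.token_hex(6)}"  # 12 hex characters after the prefix
    day_dir = Path(media_dir) / save_dir / now.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)

    image_path = day_dir / f"{image_id}.{ext}"
    image_path.write_bytes(image_bytes)

    sidecar = {
        "id": image_id,
        "path": str(image_path),
        "mime": mime,
        "prompt": prompt,
        "model": model,
        "provider": provider,
        "source_images": list(source_images),
        "created_at": now.isoformat(),
    }
    sidecar_path = day_dir / f"{image_id}.json"
    sidecar_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return sidecar
```

Storing the metadata next to the image means a follow-up edit only needs the image path to recover the prompt, model, and source images.
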
## Prompting

Good image prompts include:

- Subject and scene.
- Composition, camera, or layout.
- Style, mood, lighting, and color palette.
- Exact text that must appear in the image, quoted.
- Constraints such as "keep the same character" or "preserve the logo".

Example:

```text
A minimal app icon for nanobot: friendly robot head, rounded square, soft blue and white palette, clean vector style, no text
```

For edits, describe what should change and what must stay fixed:

```text
Use the reference image. Keep the same robot and composition, change the palette to warm orange, and add a subtle sunrise background.
```

## Troubleshooting

| Symptom | Check |
|---------|-------|
| `generate_image` is not available | Set `tools.imageGeneration.enabled` to `true` and restart the gateway |
| Missing API key error | Configure `providers.<provider>.apiKey`; if using `${VAR_NAME}`, confirm the environment variable is visible to the gateway process |
| `unsupported image generation provider` | Use `openrouter`, `aihubmix`, `minimax`, or `gemini` |
| AIHubMix says `Incorrect model ID` | Use `model: "gpt-image-2-free"`; nanobot expands it to the required `openai/gpt-image-2-free` model path internally |
| Generation times out | Try a smaller/default image size, set AIHubMix `extraBody.quality` to `"low"`, or retry later |
| Reference image rejected | Reference image paths must be inside the workspace or nanobot media directory and must be valid image files |

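The "Reference image rejected" rule can be pictured as a containment check on resolved paths. The sketch below is an assumption-laden illustration: `is_allowed_reference` is a hypothetical helper, and the suffix allow-list stands in for real image validation, which would inspect the file contents.

```python
from pathlib import Path

# Hypothetical allow-list; a real check would inspect the file contents instead.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def is_allowed_reference(path, workspace, media_dir) -> bool:
    """Return True if path is an existing image file inside an allowed root."""
    resolved = Path(path).expanduser().resolve()
    roots = [Path(root).expanduser().resolve() for root in (workspace, media_dir)]
    # Resolving first means ".." segments and symlinks cannot escape the roots.
    inside = any(root in resolved.parents for root in roots)
    return inside and resolved.is_file() and resolved.suffix.lower() in IMAGE_SUFFIXES
```

When a reference image is rejected, running a check like this against the path the agent used is a quick way to tell a location problem from a file-format problem.
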