Initial commit with translated description
130
SKILL.md
Normal file
@@ -0,0 +1,130 @@
---
name: nano-banana-pro
description: "Generate/edit images with Nano Banana Pro."
---

# Nano Banana Pro Image Generation & Editing

Generate new images or edit existing ones using Google's Nano Banana Pro API (Gemini 3 Pro Image).

## Usage

Run the script using its absolute path (do NOT cd to the skill directory first):

**Generate new image:**
```bash
uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "your image description" --filename "output-name.png" [--resolution 1K|2K|4K] [--api-key KEY]
```

**Edit existing image:**
```bash
uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "editing instructions" --filename "output-name.png" --input-image "path/to/input.png" [--resolution 1K|2K|4K] [--api-key KEY]
```

**Important:** Always run from the user's current working directory so images are saved where the user is working, not in the skill directory.

## Default Workflow (draft → iterate → final)

Goal: fast iteration without burning time on 4K until the prompt is correct.

- Draft (1K): quick feedback loop
  - `uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "<draft prompt>" --filename "yyyy-mm-dd-hh-mm-ss-draft.png" --resolution 1K`
- Iterate: adjust the prompt in small steps; use a new filename for each run
  - If editing: keep the same `--input-image` for every iteration until you’re happy.
- Final (4K): only when the prompt is locked
  - `uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "<final prompt>" --filename "yyyy-mm-dd-hh-mm-ss-final.png" --resolution 4K`
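
As a rough illustration of the draft → final loop, the commands could be assembled as argument lists rather than shell strings, which sidesteps quoting problems in prompts. This is only a sketch: `build_command` and its parameters are hypothetical names, and the script path and flags are the ones shown in the usage section.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Script path as given in the usage section; expanded here so no shell is needed.
SCRIPT = Path("~/.codex/skills/nano-banana-pro/scripts/generate_image.py").expanduser()


def build_command(prompt: str, stage: str, resolution: str,
                  input_image: Optional[str] = None,
                  now: Optional[datetime] = None) -> List[str]:
    """Return the argv list for one draft/final run (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    argv = [
        "uv", "run", str(SCRIPT),
        "--prompt", prompt,
        "--filename", f"{stamp}-{stage}.png",
        "--resolution", resolution,
    ]
    if input_image:
        # When editing, the same input image is reused on every iteration.
        argv += ["--input-image", input_image]
    return argv


draft = build_command("a red fox in snow", "draft", "1K")
final = build_command("a red fox in snow, golden hour", "final", "4K")
```

Either list could then be passed to `subprocess.run(...)` from the user's working directory, so the image lands where the user is working.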

## Resolution Options

The Gemini 3 Pro Image API supports three resolutions (uppercase K required):

- **1K** (default) - ~1024px resolution
- **2K** - ~2048px resolution
- **4K** - ~4096px resolution

When editing, if `--resolution` is left at `1K`, the script picks the output resolution from the input image's longest side: `4K` at 3000px or more, `2K` at 1500px or more, otherwise `1K`.

Map user requests to API parameters:
- No mention of resolution → `1K`
- "low resolution", "1080", "1080p", "1K" → `1K`
- "2K", "2048", "normal", "medium resolution" → `2K`
- "high resolution", "high-res", "hi-res", "4K", "ultra" → `4K`
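
The request-to-parameter mapping in this section could be sketched as a small keyword lookup. This is an illustration only: `pick_resolution` and the keyword table are hypothetical and not part of the shipped script, and a broad keyword such as "normal" may match unrelated phrases.

```python
import re

# Hypothetical keyword table mirroring the mapping listed in this section.
# Higher resolutions are checked first.
_RESOLUTION_KEYWORDS = {
    "4K": ("high resolution", "high-res", "hi-res", "4k", "ultra"),
    "2K": ("2k", "2048", "normal", "medium resolution"),
    "1K": ("low resolution", "1080", "1080p", "1k"),
}


def pick_resolution(user_request: str) -> str:
    """Return '1K', '2K', or '4K' for a free-form user request."""
    text = user_request.lower()
    for resolution, keywords in _RESOLUTION_KEYWORDS.items():
        for keyword in keywords:
            # Require non-alphanumeric neighbours so "ultra" does not
            # fire inside "ultrawide".
            pattern = rf"(?<![a-z0-9]){re.escape(keyword)}(?![a-z0-9])"
            if re.search(pattern, text):
                return resolution
    return "1K"  # No mention of resolution


print(pick_resolution("a hi-res photo of a lighthouse"))
```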

## API Key

The script checks for an API key in this order:
1. `--api-key` argument (use if the user provided a key in chat)
2. `GEMINI_API_KEY` environment variable

If neither is available, the script exits with an error message.

## Preflight + Common Failures (fast fixes)

- Preflight:
  - `command -v uv` (must exist)
  - `test -n "$GEMINI_API_KEY"` (or pass `--api-key`)
  - If editing: `test -f "path/to/input.png"`

- Common failures:
  - `Error: No API key provided.` → set `GEMINI_API_KEY` or pass `--api-key`
  - `Error loading input image:` → wrong path / unreadable file; verify `--input-image` points to a real image
  - “quota/permission/403” style API errors → wrong key, no access, or quota exceeded; try a different key/account
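
The preflight checks could also be run from Python before invoking the script. The sketch below assumes a hypothetical `preflight` helper that returns a list of problems rather than exiting; it is not part of the shipped script.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def preflight(api_key: Optional[str] = None,
              input_image: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Equivalent of `command -v uv`.
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # Equivalent of `test -n "$GEMINI_API_KEY"`, unless a key is passed in.
    if not (api_key or os.environ.get("GEMINI_API_KEY")):
        problems.append("no API key: set GEMINI_API_KEY or pass --api-key")
    # Equivalent of `test -f "path/to/input.png"` (editing only).
    if input_image is not None and not Path(input_image).is_file():
        problems.append(f"input image not found: {input_image}")
    return problems


for problem in preflight(input_image=None):
    print(f"Preflight: {problem}")
```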

## Filename Generation

Generate filenames with the pattern: `yyyy-mm-dd-hh-mm-ss-name.png`

**Format:** `{timestamp}-{descriptive-name}.png`
- Timestamp: Current date/time in format `yyyy-mm-dd-hh-mm-ss` (24-hour format)
- Name: Descriptive lowercase text with hyphens
- Keep the descriptive part concise (1-5 words typically)
- Use context from user's prompt or conversation
- If unclear, use random identifier (e.g., `x9k2`, `a7b3`)

Examples:
- Prompt "A serene Japanese garden" → `2025-11-23-14-23-05-japanese-garden.png`
- Prompt "sunset over mountains" → `2025-11-23-15-30-12-sunset-mountains.png`
- Prompt "create an image of a robot" → `2025-11-23-16-45-33-robot.png`
- Unclear context → `2025-11-23-17-12-48-x9k2.png`
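
A minimal sketch of this naming scheme, assuming the caller has already reduced the prompt to a short description (choosing which words are descriptive is left to the caller). `make_filename` is a hypothetical helper, and the random fallback here uses hex characters rather than the mixed identifiers in the examples.

```python
import re
import secrets
from datetime import datetime
from typing import Optional


def make_filename(description: str, now: Optional[datetime] = None,
                  max_words: int = 5) -> str:
    """Build `yyyy-mm-dd-hh-mm-ss-name.png` from a short description."""
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    # Lowercase, keep alphanumeric runs only, and cap the word count.
    words = re.findall(r"[a-z0-9]+", description.lower())[:max_words]
    # Fall back to a short random identifier when nothing usable remains.
    name = "-".join(words) if words else secrets.token_hex(2)
    return f"{timestamp}-{name}.png"


print(make_filename("Japanese garden"))
```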

## Image Editing

When the user wants to modify an existing image:
1. Check if they provide an image path or reference an image in the current directory
2. Use `--input-image` parameter with the path to the image
3. The prompt should contain editing instructions (e.g., "make the sky more dramatic", "remove the person", "change to cartoon style")
4. Common editing tasks: add/remove elements, change style, adjust colors, blur background, etc.

## Prompt Handling

**For generation:** Pass user's image description as-is to `--prompt`. Only rework if clearly insufficient.

**For editing:** Pass editing instructions in `--prompt` (e.g., "add a rainbow in the sky", "make it look like a watercolor painting")

Preserve user's creative intent in both cases.

## Prompt Templates (high hit-rate)

Use templates when the user is vague or when edits must be precise.

- Generation template:
  - “Create an image of: <subject>. Style: <style>. Composition: <camera/shot>. Lighting: <lighting>. Background: <background>. Color palette: <palette>. Avoid: <list>.”

- Editing template (preserve everything else):
  - “Change ONLY: <single change>. Keep identical: subject, composition/crop, pose, lighting, color palette, background, text, and overall style. Do not add new objects. If text exists, keep it unchanged.”
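
One way to apply these templates is plain string formatting. The sketch below is illustrative: the constant and function names are hypothetical, and the template wording is copied from this section.

```python
GENERATION_TEMPLATE = (
    "Create an image of: {subject}. Style: {style}. Composition: {composition}. "
    "Lighting: {lighting}. Background: {background}. Color palette: {palette}. "
    "Avoid: {avoid}."
)

EDITING_TEMPLATE = (
    "Change ONLY: {change}. Keep identical: subject, composition/crop, pose, "
    "lighting, color palette, background, text, and overall style. "
    "Do not add new objects. If text exists, keep it unchanged."
)


def build_generation_prompt(subject: str, style: str, composition: str,
                            lighting: str, background: str, palette: str,
                            avoid: str) -> str:
    """Fill every slot of the generation template."""
    return GENERATION_TEMPLATE.format(
        subject=subject, style=style, composition=composition,
        lighting=lighting, background=background, palette=palette, avoid=avoid,
    )


def build_edit_prompt(change: str) -> str:
    """Fill the editing template with a single requested change."""
    # Drop a trailing period so the template does not produce "..".
    return EDITING_TEMPLATE.format(change=change.strip().rstrip("."))


print(build_edit_prompt("make the sky stormy."))
```

The resulting string goes straight into `--prompt`.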

## Output

- Saves PNG to current directory (or specified path if filename includes directory)
- Script outputs the full path to the generated image
- **Do not read the image back** - just inform the user of the saved path

## Examples

**Generate new image:**
```bash
uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "A serene Japanese garden with cherry blossoms" --filename "2025-11-23-14-23-05-japanese-garden.png" --resolution 4K
```

**Edit existing image:**
```bash
uv run ~/.codex/skills/nano-banana-pro/scripts/generate_image.py --prompt "make the sky more dramatic with storm clouds" --filename "2025-11-23-14-25-30-dramatic-sky.png" --input-image "original-photo.jpg" --resolution 2K
```
6
_meta.json
Normal file
@@ -0,0 +1,6 @@
{
  "ownerId": "kn70pywhg0fyz996kpa8xj89s57yhv26",
  "slug": "nano-banana-pro",
  "version": "1.0.1",
  "publishedAt": 1767651987917
}
167
scripts/generate_image.py
Normal file
@@ -0,0 +1,167 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "google-genai>=1.0.0",
#     "pillow>=10.0.0",
# ]
# ///
"""
Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) API.

Usage:
    uv run generate_image.py --prompt "your image description" --filename "output.png" [--resolution 1K|2K|4K] [--api-key KEY]
"""

import argparse
import os
import sys
from pathlib import Path


def get_api_key(provided_key: str | None) -> str | None:
    """Get API key from argument first, then environment."""
    if provided_key:
        return provided_key
    return os.environ.get("GEMINI_API_KEY")


def main():
    parser = argparse.ArgumentParser(
        description="Generate images using Nano Banana Pro (Gemini 3 Pro Image)"
    )
    parser.add_argument(
        "--prompt", "-p",
        required=True,
        help="Image description/prompt"
    )
    parser.add_argument(
        "--filename", "-f",
        required=True,
        help="Output filename (e.g., sunset-mountains.png)"
    )
    parser.add_argument(
        "--input-image", "-i",
        help="Optional input image path for editing/modification"
    )
    parser.add_argument(
        "--resolution", "-r",
        choices=["1K", "2K", "4K"],
        default="1K",
        help="Output resolution: 1K (default), 2K, or 4K"
    )
    parser.add_argument(
        "--api-key", "-k",
        help="Gemini API key (overrides GEMINI_API_KEY env var)"
    )

    args = parser.parse_args()

    # Get API key
    api_key = get_api_key(args.api_key)
    if not api_key:
        print("Error: No API key provided.", file=sys.stderr)
        print("Please either:", file=sys.stderr)
        print("  1. Provide --api-key argument", file=sys.stderr)
        print("  2. Set GEMINI_API_KEY environment variable", file=sys.stderr)
        sys.exit(1)

    # Import here after checking API key to avoid slow import on error
    from google import genai
    from google.genai import types
    from PIL import Image as PILImage

    # Initialise client
    client = genai.Client(api_key=api_key)

    # Set up output path
    output_path = Path(args.filename)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # Load input image if provided
    input_image = None
    output_resolution = args.resolution
    if args.input_image:
        try:
            input_image = PILImage.open(args.input_image)
            print(f"Loaded input image: {args.input_image}")

            # Auto-detect resolution when it was left at the default.
            # Note: an explicit `--resolution 1K` is indistinguishable from
            # the default here, so it is auto-detected as well.
            if args.resolution == "1K":  # Default value
                # Map input image size to resolution
                width, height = input_image.size
                max_dim = max(width, height)
                if max_dim >= 3000:
                    output_resolution = "4K"
                elif max_dim >= 1500:
                    output_resolution = "2K"
                else:
                    output_resolution = "1K"
                print(f"Auto-detected resolution: {output_resolution} (from input {width}x{height})")
        except Exception as e:
            print(f"Error loading input image: {e}", file=sys.stderr)
            sys.exit(1)

    # Build contents (image first if editing, prompt only if generating)
    if input_image:
        contents = [input_image, args.prompt]
        print(f"Editing image with resolution {output_resolution}...")
    else:
        contents = args.prompt
        print(f"Generating image with resolution {output_resolution}...")

    try:
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=contents,
            config=types.GenerateContentConfig(
                response_modalities=["TEXT", "IMAGE"],
                image_config=types.ImageConfig(
                    image_size=output_resolution
                )
            )
        )

        # Process response and convert to PNG
        image_saved = False
        for part in response.parts:
            if part.text is not None:
                print(f"Model response: {part.text}")
            elif part.inline_data is not None:
                # Convert inline data to PIL Image and save as PNG
                from io import BytesIO

                # inline_data.data is already bytes, not base64
                image_data = part.inline_data.data
                if isinstance(image_data, str):
                    # If it's a string, it might be base64
                    import base64
                    image_data = base64.b64decode(image_data)

                image = PILImage.open(BytesIO(image_data))

                # Ensure RGB mode for PNG (convert RGBA to RGB with white background if needed)
                if image.mode == 'RGBA':
                    rgb_image = PILImage.new('RGB', image.size, (255, 255, 255))
                    rgb_image.paste(image, mask=image.split()[3])
                    rgb_image.save(str(output_path), 'PNG')
                elif image.mode == 'RGB':
                    image.save(str(output_path), 'PNG')
                else:
                    image.convert('RGB').save(str(output_path), 'PNG')
                image_saved = True

        if image_saved:
            full_path = output_path.resolve()
            print(f"\nImage saved: {full_path}")
        else:
            print("Error: No image was generated in the response.", file=sys.stderr)
            sys.exit(1)

    except Exception as e:
        print(f"Error generating image: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()