Initial commit with translated description

This commit is contained in:
2026-03-29 10:19:02 +08:00
commit c027104b0b
6 changed files with 455 additions and 0 deletions

83
README.md Normal file
View File

@@ -0,0 +1,83 @@
# Markdown.new Skill
Single-skill repository for `markdown-new` - official Cloudflare URL-to-Markdown service ([markdown.new](https://markdown.new/)) converted into a skill.
Skill entrypoint:
- `markdown-new/SKILL.md`
## What It Does
`markdown-new` converts public web pages into LLM-ready Markdown using [markdown.new](https://markdown.new), with:
- URL-to-Markdown conversion for summarization, extraction, RAG, and archiving
- conversion fallback control (`auto`, `ai`, `browser`)
- optional image retention
- optional wrapped delivery mode for downstream parsing
## Path Resolution (Important)
- Relative paths such as `scripts/markdown_new_fetch.py` are relative to the skill directory.
- Do not run `python3 scripts/markdown_new_fetch.py ...` from workspace root unless `scripts/` exists there.
- Safe command from any current directory:
```bash
python3 ~/.codex/skills/markdown-new/scripts/markdown_new_fetch.py 'https://example.com'
```
## Modes
### Conversion Modes (`--method`)
- `auto`: default pipeline, fastest successful path
- `ai`: force Workers AI conversion path
- `browser`: force Browser Rendering for JS-heavy pages
### Output Modes
- default: print Markdown to stdout
- `--output <file>`: write Markdown to file
- `--deliver-md`: write `.md` output with wrapped content; useful for reasoning LLMs on long reads because it reduces format confusion:
```text
<url>
...markdown...
</url>
```
If `--deliver-md` is used without `--output`, filename is auto-generated from the URL.
## How It Works
1. Validate the input URL (`http/https`).
2. Call `POST https://markdown.new/` with `url`, `method`, and `retain_images`.
3. Accept response as either raw markdown or JSON with markdown in `content`.
4. Normalize metadata and choose output behavior.
5. Return stdout by default, `--output` for files, and `--deliver-md` for wrapped `.md` packets.
## Install Paths
- Codex (macOS/Linux): `~/.codex/skills/markdown-new`
- Claude Code (macOS/Linux): `~/.claude/skills/markdown-new`
## Install on macOS/Linux (single command)
### Codex
```bash
mkdir -p ~/.codex/skills && rm -rf ~/.codex/skills/markdown-new && cp -R /Users/pro16/Dropbox/experiments/skills-i-use/markdown-new ~/.codex/skills/
```
### Claude Code
```bash
mkdir -p ~/.claude/skills && rm -rf ~/.claude/skills/markdown-new && cp -R /Users/pro16/Dropbox/experiments/skills-i-use/markdown-new ~/.claude/skills/
```
## Quick Usage
```bash
python3 scripts/markdown_new_fetch.py 'https://example.com'
python3 scripts/markdown_new_fetch.py 'https://example.com' --method browser --retain-images --output page.md
python3 scripts/markdown_new_fetch.py 'https://example.com' --deliver-md
```
## Credits
- `webservervis` for the markdown conversion service powering this skill.

90
SKILL.md Normal file
View File

@@ -0,0 +1,90 @@
---
name: markdown-new
description: "使用markdown.new将公共网页转换为干净的Markdown用于AI工作流。"
---
# Markdown.new
Use this skill to convert public URLs into LLM-ready Markdown via [markdown.new](https://markdown.new).
## Path Resolution (Critical)
- Resolve relative paths like `scripts/...` and `references/...` from the skill directory, not workspace root.
- If current directory is unknown, use an absolute script path.
```bash
python3 ~/.codex/skills/markdown-new/scripts/markdown_new_fetch.py 'https://example.com'
```
```bash
cd ~/.codex/skills/markdown-new
python3 scripts/markdown_new_fetch.py 'https://example.com'
```
Avoid this pattern from an arbitrary workspace root:
```bash
python3 scripts/markdown_new_fetch.py 'https://example.com'
```
## Workflow
1. Validate the input URL is public `http` or `https`.
2. Run `scripts/markdown_new_fetch.py` with `--method auto` first.
3. Re-run with `--method browser` if output misses JS-rendered content.
4. Enable `--retain-images` only when image links are required.
5. Capture response metadata (`x-markdown-tokens`, `x-rate-limit-remaining`, and JSON metadata when present) for downstream planning.
## Quick Start
Commands below assume current directory is the skill root (`~/.codex/skills/markdown-new`).
```bash
python3 scripts/markdown_new_fetch.py 'https://example.com' > page.md
```
```bash
python3 scripts/markdown_new_fetch.py 'https://example.com' --method browser --retain-images --output page.md
```
```bash
python3 scripts/markdown_new_fetch.py 'https://example.com' --deliver-md
```
## Method Selection
- `auto`: default. Let markdown.new use its fastest successful pipeline.
- `ai`: force Workers AI HTML-to-Markdown conversion.
- `browser`: force headless browser rendering for JS-heavy pages.
Use `auto` first, then retry with `browser` only when needed.
## Delivery Mode
- Use `--deliver-md` to force file output in `.md` format.
- In delivery mode, content is wrapped as:
- `<url>`
- `...markdown...`
- `</url>`
- If `--output` is omitted, the script auto-generates a filename from the URL.
## API Modes
- Prefix mode:
- `https://markdown.new/https://example.com?method=browser&retain_images=true`
- POST mode:
- `POST https://markdown.new/`
- JSON body: `{"url":"https://example.com","method":"auto","retain_images":false}`
Prefer POST mode for automation and explicit parameters.
## Limits And Safety
- Treat `429` as rate limiting (documented limit: 500 requests/day/IP).
- Convert only publicly accessible pages.
- Respect `robots.txt`, terms of service, and copyright constraints.
- Do not treat markdown.new output as guaranteed complete for every page; verify critical extractions.
## References
- `references/markdown-new-api.md`

6
_meta.json Normal file
View File

@@ -0,0 +1,6 @@
{
"ownerId": "kn7a3xe57dnqxgg5kfp78twfj98022a5",
"slug": "markdown-convert",
"version": "1.0.0",
"publishedAt": 1771528764013
}

4
agents/openai.yaml Normal file
View File

@@ -0,0 +1,4 @@
interface:
display_name: "Markdown.new"
short_description: "Convert web URLs to LLM-ready Markdown"
default_prompt: "Use $markdown-new to fetch clean Markdown from public URLs for summarization, RAG, or extraction tasks."

View File

@@ -0,0 +1,56 @@
# markdown.new API Reference Notes
## Endpoints
- Prefix conversion: `GET https://markdown.new/<absolute-url>`
- API conversion: `POST https://markdown.new/`
## POST Request
- Content type: `application/json`
- Body fields:
- `url` (string, required): public URL to convert
- `method` (string, optional): `auto` (default), `ai`, or `browser`
- `retain_images` (boolean, optional): `false` (default)
Example:
```json
{
"url": "https://example.com",
"method": "auto",
"retain_images": false
}
```
## Response
- Status: `200` on success
- Prefix mode typically returns Markdown (`text/markdown`)
- POST mode may return JSON (`application/json`) with Markdown in `content`
- Common headers:
- `x-markdown-tokens`: estimated token count for the returned Markdown
- `x-rate-limit-remaining`: remaining requests for current daily quota
## Conversion Pipeline (as documented)
1. Request native Markdown via `Accept: text/markdown`
2. Fall back to Workers AI `toMarkdown()` when HTML is returned
3. Fall back to Browser Rendering for JS-heavy pages
## Operational Notes
- Documented rate limit: 500 requests/day per IP
- `429` indicates rate-limit exhaustion
- Public URLs only; authenticated/paywalled pages may fail
- Browser rendering usually adds latency compared with `auto`/`ai`
## Skill Script Notes
- `scripts/markdown_new_fetch.py --deliver-md` writes a `.md` file and wraps the markdown body with pseudo-XML tags:
```text
<url>
...markdown...
</url>
```

View File

@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""Convert public URLs to Markdown through markdown.new."""
import argparse
import json
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Dict, Optional, Tuple
DEFAULT_API_URL = "https://markdown.new/"
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Fetch Markdown from a public URL using markdown.new"
)
parser.add_argument("url", help="Public URL to convert (http/https)")
parser.add_argument(
"--method",
choices=["auto", "ai", "browser"],
default="auto",
help="Conversion method to request (default: auto)",
)
parser.add_argument(
"--retain-images",
action="store_true",
help="Request image retention in output markdown",
)
parser.add_argument(
"--output",
help="Write markdown to this file instead of stdout",
)
parser.add_argument(
"--timeout",
type=float,
default=45.0,
help="Request timeout in seconds (default: 45)",
)
parser.add_argument(
"--api-url",
default=DEFAULT_API_URL,
help="markdown.new API endpoint (default: https://markdown.new/)",
)
parser.add_argument(
"--show-headers",
action="store_true",
help="Print response headers to stderr",
)
parser.add_argument(
"--deliver-md",
action="store_true",
help=(
"Write output into a .md file and wrap content in pseudo-XML "
"<url>...</url> tags"
),
)
return parser
def validate_url(url: str) -> None:
parsed = urllib.parse.urlparse(url)
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
raise ValueError(f"Invalid URL: {url!r}. Use absolute http/https URL.")
def build_request(api_url: str, payload: Dict[str, object]) -> urllib.request.Request:
data = json.dumps(payload).encode("utf-8")
return urllib.request.Request(
api_url,
data=data,
method="POST",
headers={
"Content-Type": "application/json",
"Accept": "text/markdown",
"User-Agent": "codex-markdown-new-skill/1.0",
},
)
def normalize_body(body: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
content_type = headers.get("content-type", "")
if "application/json" not in content_type:
return body, {}
try:
payload = json.loads(body)
except json.JSONDecodeError:
return body, {}
if not isinstance(payload, dict):
return body, {}
markdown = payload.get("content")
if not isinstance(markdown, str):
return body, {}
metadata: Dict[str, str] = {}
for key in ("title", "url", "method", "duration_ms", "timestamp"):
value = payload.get(key)
if value is not None:
metadata[f"response_{key}"] = str(value)
return markdown, metadata
def print_metadata(
headers: Dict[str, str], show_headers: bool, response_meta: Dict[str, str]
) -> None:
tokens = headers.get("x-markdown-tokens")
remaining = headers.get("x-rate-limit-remaining")
if tokens:
print(f"x-markdown-tokens: {tokens}", file=sys.stderr)
if remaining:
print(f"x-rate-limit-remaining: {remaining}", file=sys.stderr)
for key, value in sorted(response_meta.items()):
print(f"{key}: {value}", file=sys.stderr)
if show_headers:
for key, value in sorted(headers.items()):
print(f"{key}: {value}", file=sys.stderr)
def slugify_url(url: str) -> str:
parsed = urllib.parse.urlparse(url)
raw = f"{parsed.netloc}{parsed.path}".strip("/")
if not raw:
raw = parsed.netloc or "page"
slug = re.sub(r"[^a-zA-Z0-9._-]+", "_", raw).strip("_")
return slug or "page"
def resolve_output_path(url: str, output_path: Optional[str], deliver_md: bool) -> Optional[str]:
if output_path:
if deliver_md and not output_path.lower().endswith(".md"):
output_path = f"{output_path}.md"
return output_path
if deliver_md:
return f"{slugify_url(url)}.md"
return None
def wrap_in_url_tag(markdown: str) -> str:
body = markdown.rstrip("\n")
return f"<url>\n{body}\n</url>\n"
def write_output(markdown: str, output_path: Optional[str]) -> None:
if output_path:
path = Path(output_path)
if path.parent != Path("."):
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w", encoding="utf-8") as f:
f.write(markdown)
return
sys.stdout.write(markdown)
def main() -> int:
parser = build_parser()
args = parser.parse_args()
try:
validate_url(args.url)
except ValueError as exc:
print(str(exc), file=sys.stderr)
return 2
payload = {
"url": args.url,
"method": args.method,
"retain_images": bool(args.retain_images),
}
req = build_request(args.api_url, payload)
try:
with urllib.request.urlopen(req, timeout=args.timeout) as resp:
headers = {k.lower(): v for k, v in resp.headers.items()}
body = resp.read().decode("utf-8", errors="replace")
except urllib.error.HTTPError as err:
error_body = err.read().decode("utf-8", errors="replace")
print(f"HTTP {err.code} from markdown.new", file=sys.stderr)
if err.code == 429:
print(
"Rate limit reached (documented: 500 requests/day/IP). Retry later.",
file=sys.stderr,
)
if error_body:
print(error_body.strip(), file=sys.stderr)
return 1
except urllib.error.URLError as err:
print(f"Request failed: {err.reason}", file=sys.stderr)
return 1
markdown, response_meta = normalize_body(body, headers)
print_metadata(headers, args.show_headers, response_meta)
content = wrap_in_url_tag(markdown) if args.deliver_md else markdown
output_path = resolve_output_path(args.url, args.output, args.deliver_md)
write_output(content, output_path)
if args.deliver_md and output_path:
print(f"output_file: {Path(output_path).resolve()}", file=sys.stderr)
return 0
if __name__ == "__main__":
raise SystemExit(main())