Initial commit with translated description
This commit is contained in:
83
README.md
Normal file
83
README.md
Normal file
@@ -0,0 +1,83 @@
|
|||||||
|
# Markdown.new Skill
|
||||||
|
|
||||||
|
Single-skill repository for `markdown-new` - official Cloudflare URL-to-Markdown service ([markdown.new](https://markdown.new/)) converted into a skill.
|
||||||
|
|
||||||
|
Skill entrypoint:
|
||||||
|
- `markdown-new/SKILL.md`
|
||||||
|
|
||||||
|
## What It Does
|
||||||
|
|
||||||
|
`markdown-new` converts public web pages into LLM-ready Markdown using [markdown.new](https://markdown.new), with:
|
||||||
|
- URL-to-Markdown conversion for summarization, extraction, RAG, and archiving
|
||||||
|
- conversion fallback control (`auto`, `ai`, `browser`)
|
||||||
|
- optional image retention
|
||||||
|
- optional wrapped delivery mode for downstream parsing
|
||||||
|
|
||||||
|
## Path Resolution (Important)
|
||||||
|
|
||||||
|
- Relative paths such as `scripts/markdown_new_fetch.py` are relative to the skill directory.
|
||||||
|
- Do not run `python3 scripts/markdown_new_fetch.py ...` from workspace root unless `scripts/` exists there.
|
||||||
|
- Safe command from any current directory:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 ~/.codex/skills/markdown-new/scripts/markdown_new_fetch.py 'https://example.com'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Modes
|
||||||
|
|
||||||
|
### Conversion Modes (`--method`)
|
||||||
|
- `auto`: default pipeline, fastest successful path
|
||||||
|
- `ai`: force Workers AI conversion path
|
||||||
|
- `browser`: force Browser Rendering for JS-heavy pages
|
||||||
|
|
||||||
|
### Output Modes
|
||||||
|
- default: print Markdown to stdout
|
||||||
|
- `--output <file>`: write Markdown to file
|
||||||
|
- `--deliver-md`: write `.md` output with wrapped content; useful for reasoning LLMs on long reads because it reduces format confusion:
|
||||||
|
|
||||||
|
```text
|
||||||
|
<url>
|
||||||
|
...markdown...
|
||||||
|
</url>
|
||||||
|
```
|
||||||
|
|
||||||
|
If `--deliver-md` is used without `--output`, filename is auto-generated from the URL.
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
1. Validate the input URL (`http/https`).
|
||||||
|
2. Call `POST https://markdown.new/` with `url`, `method`, and `retain_images`.
|
||||||
|
3. Accept response as either raw markdown or JSON with markdown in `content`.
|
||||||
|
4. Normalize metadata and choose output behavior.
|
||||||
|
5. Return stdout by default, `--output` for files, and `--deliver-md` for wrapped `.md` packets.
|
||||||
|
|
||||||
|
## Install Paths
|
||||||
|
|
||||||
|
- Codex (macOS/Linux): `~/.codex/skills/markdown-new`
|
||||||
|
- Claude Code (macOS/Linux): `~/.claude/skills/markdown-new`
|
||||||
|
|
||||||
|
## Install on macOS/Linux (single command)
|
||||||
|
|
||||||
|
### Codex
|
||||||
|
|
||||||
|
```bash
|
||||||
|
mkdir -p ~/.codex/skills && rm -rf ~/.codex/skills/markdown-new && cp -R /Users/pro16/Dropbox/experiments/skills-i-use/markdown-new ~/.codex/skills/
|
||||||
|
```
|
||||||
|
|
||||||
|
### Claude Code
|
||||||
|
|
||||||
|
```bash
|
||||||
|
mkdir -p ~/.claude/skills && rm -rf ~/.claude/skills/markdown-new && cp -R /Users/pro16/Dropbox/experiments/skills-i-use/markdown-new ~/.claude/skills/
|
||||||
|
```
|
||||||
|
|
||||||
|
## Quick Usage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com'
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com' --method browser --retain-images --output page.md
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com' --deliver-md
|
||||||
|
```
|
||||||
|
|
||||||
|
## Credits
|
||||||
|
|
||||||
|
- `webservervis` for the markdown conversion service powering this skill.
|
||||||
90
SKILL.md
Normal file
90
SKILL.md
Normal file
@@ -0,0 +1,90 @@
|
|||||||
|
---
|
||||||
|
name: markdown-new
|
||||||
|
description: "使用markdown.new将公共网页转换为干净的Markdown,用于AI工作流。"
|
||||||
|
---
|
||||||
|
|
||||||
|
# Markdown.new
|
||||||
|
|
||||||
|
Use this skill to convert public URLs into LLM-ready Markdown via [markdown.new](https://markdown.new).
|
||||||
|
|
||||||
|
## Path Resolution (Critical)
|
||||||
|
|
||||||
|
- Resolve relative paths like `scripts/...` and `references/...` from the skill directory, not workspace root.
|
||||||
|
- If current directory is unknown, use an absolute script path.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 ~/.codex/skills/markdown-new/scripts/markdown_new_fetch.py 'https://example.com'
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ~/.codex/skills/markdown-new
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com'
|
||||||
|
```
|
||||||
|
|
||||||
|
Avoid this pattern from an arbitrary workspace root:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Workflow
|
||||||
|
|
||||||
|
1. Validate the input URL is public `http` or `https`.
|
||||||
|
2. Run `scripts/markdown_new_fetch.py` with `--method auto` first.
|
||||||
|
3. Re-run with `--method browser` if output misses JS-rendered content.
|
||||||
|
4. Enable `--retain-images` only when image links are required.
|
||||||
|
5. Capture response metadata (`x-markdown-tokens`, `x-rate-limit-remaining`, and JSON metadata when present) for downstream planning.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
Commands below assume current directory is the skill root (`~/.codex/skills/markdown-new`).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com' > page.md
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com' --method browser --retain-images --output page.md
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 scripts/markdown_new_fetch.py 'https://example.com' --deliver-md
|
||||||
|
```
|
||||||
|
|
||||||
|
## Method Selection
|
||||||
|
|
||||||
|
- `auto`: default. Let markdown.new use its fastest successful pipeline.
|
||||||
|
- `ai`: force Workers AI HTML-to-Markdown conversion.
|
||||||
|
- `browser`: force headless browser rendering for JS-heavy pages.
|
||||||
|
|
||||||
|
Use `auto` first, then retry with `browser` only when needed.
|
||||||
|
|
||||||
|
## Delivery Mode
|
||||||
|
|
||||||
|
- Use `--deliver-md` to force file output in `.md` format.
|
||||||
|
- In delivery mode, content is wrapped as:
|
||||||
|
- `<url>`
|
||||||
|
- `...markdown...`
|
||||||
|
- `</url>`
|
||||||
|
- If `--output` is omitted, the script auto-generates a filename from the URL.
|
||||||
|
|
||||||
|
## API Modes
|
||||||
|
|
||||||
|
- Prefix mode:
|
||||||
|
- `https://markdown.new/https://example.com?method=browser&retain_images=true`
|
||||||
|
- POST mode:
|
||||||
|
- `POST https://markdown.new/`
|
||||||
|
- JSON body: `{"url":"https://example.com","method":"auto","retain_images":false}`
|
||||||
|
|
||||||
|
Prefer POST mode for automation and explicit parameters.
|
||||||
|
|
||||||
|
## Limits And Safety
|
||||||
|
|
||||||
|
- Treat `429` as rate limiting (documented limit: 500 requests/day/IP).
|
||||||
|
- Convert only publicly accessible pages.
|
||||||
|
- Respect `robots.txt`, terms of service, and copyright constraints.
|
||||||
|
- Do not treat markdown.new output as guaranteed complete for every page; verify critical extractions.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- `references/markdown-new-api.md`
|
||||||
6
_meta.json
Normal file
6
_meta.json
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
{
|
||||||
|
"ownerId": "kn7a3xe57dnqxgg5kfp78twfj98022a5",
|
||||||
|
"slug": "markdown-convert",
|
||||||
|
"version": "1.0.0",
|
||||||
|
"publishedAt": 1771528764013
|
||||||
|
}
|
||||||
4
agents/openai.yaml
Normal file
4
agents/openai.yaml
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
interface:
|
||||||
|
display_name: "Markdown.new"
|
||||||
|
short_description: "Convert web URLs to LLM-ready Markdown"
|
||||||
|
default_prompt: "Use $markdown-new to fetch clean Markdown from public URLs for summarization, RAG, or extraction tasks."
|
||||||
56
references/markdown-new-api.md
Normal file
56
references/markdown-new-api.md
Normal file
@@ -0,0 +1,56 @@
|
|||||||
|
# markdown.new API Reference Notes
|
||||||
|
|
||||||
|
## Endpoints
|
||||||
|
|
||||||
|
- Prefix conversion: `GET https://markdown.new/<absolute-url>`
|
||||||
|
- API conversion: `POST https://markdown.new/`
|
||||||
|
|
||||||
|
## POST Request
|
||||||
|
|
||||||
|
- Content type: `application/json`
|
||||||
|
- Body fields:
|
||||||
|
- `url` (string, required): public URL to convert
|
||||||
|
- `method` (string, optional): `auto` (default), `ai`, or `browser`
|
||||||
|
- `retain_images` (boolean, optional): `false` (default)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"url": "https://example.com",
|
||||||
|
"method": "auto",
|
||||||
|
"retain_images": false
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Response
|
||||||
|
|
||||||
|
- Status: `200` on success
|
||||||
|
- Prefix mode typically returns Markdown (`text/markdown`)
|
||||||
|
- POST mode may return JSON (`application/json`) with Markdown in `content`
|
||||||
|
- Common headers:
|
||||||
|
- `x-markdown-tokens`: estimated token count for the returned Markdown
|
||||||
|
- `x-rate-limit-remaining`: remaining requests for current daily quota
|
||||||
|
|
||||||
|
## Conversion Pipeline (as documented)
|
||||||
|
|
||||||
|
1. Request native Markdown via `Accept: text/markdown`
|
||||||
|
2. Fall back to Workers AI `toMarkdown()` when HTML is returned
|
||||||
|
3. Fall back to Browser Rendering for JS-heavy pages
|
||||||
|
|
||||||
|
## Operational Notes
|
||||||
|
|
||||||
|
- Documented rate limit: 500 requests/day per IP
|
||||||
|
- `429` indicates rate-limit exhaustion
|
||||||
|
- Public URLs only; authenticated/paywalled pages may fail
|
||||||
|
- Browser rendering usually adds latency compared with `auto`/`ai`
|
||||||
|
|
||||||
|
## Skill Script Notes
|
||||||
|
|
||||||
|
- `scripts/markdown_new_fetch.py --deliver-md` writes a `.md` file and wraps the markdown body with pseudo-XML tags:
|
||||||
|
|
||||||
|
```text
|
||||||
|
<url>
|
||||||
|
...markdown...
|
||||||
|
</url>
|
||||||
|
```
|
||||||
216
scripts/markdown_new_fetch.py
Normal file
216
scripts/markdown_new_fetch.py
Normal file
@@ -0,0 +1,216 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Convert public URLs to Markdown through markdown.new."""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
import urllib.error
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.request
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Dict, Optional, Tuple
|
||||||
|
|
||||||
|
|
||||||
|
DEFAULT_API_URL = "https://markdown.new/"
|
||||||
|
|
||||||
|
|
||||||
|
def build_parser() -> argparse.ArgumentParser:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Fetch Markdown from a public URL using markdown.new"
|
||||||
|
)
|
||||||
|
parser.add_argument("url", help="Public URL to convert (http/https)")
|
||||||
|
parser.add_argument(
|
||||||
|
"--method",
|
||||||
|
choices=["auto", "ai", "browser"],
|
||||||
|
default="auto",
|
||||||
|
help="Conversion method to request (default: auto)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--retain-images",
|
||||||
|
action="store_true",
|
||||||
|
help="Request image retention in output markdown",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
help="Write markdown to this file instead of stdout",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--timeout",
|
||||||
|
type=float,
|
||||||
|
default=45.0,
|
||||||
|
help="Request timeout in seconds (default: 45)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--api-url",
|
||||||
|
default=DEFAULT_API_URL,
|
||||||
|
help="markdown.new API endpoint (default: https://markdown.new/)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--show-headers",
|
||||||
|
action="store_true",
|
||||||
|
help="Print response headers to stderr",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--deliver-md",
|
||||||
|
action="store_true",
|
||||||
|
help=(
|
||||||
|
"Write output into a .md file and wrap content in pseudo-XML "
|
||||||
|
"<url>...</url> tags"
|
||||||
|
),
|
||||||
|
)
|
||||||
|
return parser
|
||||||
|
|
||||||
|
|
||||||
|
def validate_url(url: str) -> None:
|
||||||
|
parsed = urllib.parse.urlparse(url)
|
||||||
|
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
|
||||||
|
raise ValueError(f"Invalid URL: {url!r}. Use absolute http/https URL.")
|
||||||
|
|
||||||
|
|
||||||
|
def build_request(api_url: str, payload: Dict[str, object]) -> urllib.request.Request:
|
||||||
|
data = json.dumps(payload).encode("utf-8")
|
||||||
|
return urllib.request.Request(
|
||||||
|
api_url,
|
||||||
|
data=data,
|
||||||
|
method="POST",
|
||||||
|
headers={
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
"Accept": "text/markdown",
|
||||||
|
"User-Agent": "codex-markdown-new-skill/1.0",
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def normalize_body(body: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
|
||||||
|
content_type = headers.get("content-type", "")
|
||||||
|
if "application/json" not in content_type:
|
||||||
|
return body, {}
|
||||||
|
|
||||||
|
try:
|
||||||
|
payload = json.loads(body)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return body, {}
|
||||||
|
|
||||||
|
if not isinstance(payload, dict):
|
||||||
|
return body, {}
|
||||||
|
|
||||||
|
markdown = payload.get("content")
|
||||||
|
if not isinstance(markdown, str):
|
||||||
|
return body, {}
|
||||||
|
|
||||||
|
metadata: Dict[str, str] = {}
|
||||||
|
for key in ("title", "url", "method", "duration_ms", "timestamp"):
|
||||||
|
value = payload.get(key)
|
||||||
|
if value is not None:
|
||||||
|
metadata[f"response_{key}"] = str(value)
|
||||||
|
|
||||||
|
return markdown, metadata
|
||||||
|
|
||||||
|
|
||||||
|
def print_metadata(
|
||||||
|
headers: Dict[str, str], show_headers: bool, response_meta: Dict[str, str]
|
||||||
|
) -> None:
|
||||||
|
tokens = headers.get("x-markdown-tokens")
|
||||||
|
remaining = headers.get("x-rate-limit-remaining")
|
||||||
|
|
||||||
|
if tokens:
|
||||||
|
print(f"x-markdown-tokens: {tokens}", file=sys.stderr)
|
||||||
|
if remaining:
|
||||||
|
print(f"x-rate-limit-remaining: {remaining}", file=sys.stderr)
|
||||||
|
for key, value in sorted(response_meta.items()):
|
||||||
|
print(f"{key}: {value}", file=sys.stderr)
|
||||||
|
|
||||||
|
if show_headers:
|
||||||
|
for key, value in sorted(headers.items()):
|
||||||
|
print(f"{key}: {value}", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
def slugify_url(url: str) -> str:
|
||||||
|
parsed = urllib.parse.urlparse(url)
|
||||||
|
raw = f"{parsed.netloc}{parsed.path}".strip("/")
|
||||||
|
if not raw:
|
||||||
|
raw = parsed.netloc or "page"
|
||||||
|
slug = re.sub(r"[^a-zA-Z0-9._-]+", "_", raw).strip("_")
|
||||||
|
return slug or "page"
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_output_path(url: str, output_path: Optional[str], deliver_md: bool) -> Optional[str]:
|
||||||
|
if output_path:
|
||||||
|
if deliver_md and not output_path.lower().endswith(".md"):
|
||||||
|
output_path = f"{output_path}.md"
|
||||||
|
return output_path
|
||||||
|
|
||||||
|
if deliver_md:
|
||||||
|
return f"{slugify_url(url)}.md"
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def wrap_in_url_tag(markdown: str) -> str:
|
||||||
|
body = markdown.rstrip("\n")
|
||||||
|
return f"<url>\n{body}\n</url>\n"
|
||||||
|
|
||||||
|
|
||||||
|
def write_output(markdown: str, output_path: Optional[str]) -> None:
|
||||||
|
if output_path:
|
||||||
|
path = Path(output_path)
|
||||||
|
if path.parent != Path("."):
|
||||||
|
path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(path, "w", encoding="utf-8") as f:
|
||||||
|
f.write(markdown)
|
||||||
|
return
|
||||||
|
sys.stdout.write(markdown)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
parser = build_parser()
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
try:
|
||||||
|
validate_url(args.url)
|
||||||
|
except ValueError as exc:
|
||||||
|
print(str(exc), file=sys.stderr)
|
||||||
|
return 2
|
||||||
|
|
||||||
|
payload = {
|
||||||
|
"url": args.url,
|
||||||
|
"method": args.method,
|
||||||
|
"retain_images": bool(args.retain_images),
|
||||||
|
}
|
||||||
|
|
||||||
|
req = build_request(args.api_url, payload)
|
||||||
|
|
||||||
|
try:
|
||||||
|
with urllib.request.urlopen(req, timeout=args.timeout) as resp:
|
||||||
|
headers = {k.lower(): v for k, v in resp.headers.items()}
|
||||||
|
body = resp.read().decode("utf-8", errors="replace")
|
||||||
|
except urllib.error.HTTPError as err:
|
||||||
|
error_body = err.read().decode("utf-8", errors="replace")
|
||||||
|
print(f"HTTP {err.code} from markdown.new", file=sys.stderr)
|
||||||
|
if err.code == 429:
|
||||||
|
print(
|
||||||
|
"Rate limit reached (documented: 500 requests/day/IP). Retry later.",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
if error_body:
|
||||||
|
print(error_body.strip(), file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
except urllib.error.URLError as err:
|
||||||
|
print(f"Request failed: {err.reason}", file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
|
||||||
|
markdown, response_meta = normalize_body(body, headers)
|
||||||
|
print_metadata(headers, args.show_headers, response_meta)
|
||||||
|
|
||||||
|
content = wrap_in_url_tag(markdown) if args.deliver_md else markdown
|
||||||
|
output_path = resolve_output_path(args.url, args.output, args.deliver_md)
|
||||||
|
write_output(content, output_path)
|
||||||
|
|
||||||
|
if args.deliver_md and output_path:
|
||||||
|
print(f"output_file: {Path(output_path).resolve()}", file=sys.stderr)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
Reference in New Issue
Block a user