# Playwright Scraper Skill 🕷️

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js](https://img.shields.io/badge/Node.js-18+-green.svg)](https://nodejs.org/)
[![Playwright](https://img.shields.io/badge/Playwright-1.40+-blue.svg)](https://playwright.dev/)

**[中文文檔](README_ZH.md)** | English

A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex websites like Discuss.com.hk.

> 📦 **Installation:** See [INSTALL.md](INSTALL.md)
> 📚 **Full Documentation:** See [SKILL.md](SKILL.md)
> 💡 **Examples:** See [examples/README.md](examples/README.md)

---

## ✨ Features

- ✅ **Pure Playwright**: Modern, powerful, easy to use
- ✅ **Anti-Bot Protection**: Hides automation signals and sends a realistic User-Agent
- ✅ **Verified**: 100% success on Discuss.com.hk
- ✅ **Simple to Use**: One-line commands
- ✅ **Customizable**: Environment variable support

---

## 🚀 Quick Start

### Installation

```bash
npm install
npx playwright install chromium
```

### Usage

```bash
# Quick scraping
node scripts/playwright-simple.js https://example.com

# Stealth mode (recommended)
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
```

---

## 📖 Two Modes

| Mode | Use Case | Speed | Anti-Bot |
|------|----------|-------|----------|
| **Simple** | Regular dynamic sites | Fast (3-5s) | None |
| **Stealth** ⭐ | Sites with anti-bot | Medium (5-20s) | Medium-High |

### Simple Mode

For sites without anti-bot protection:

```bash
node scripts/playwright-simple.js <URL>
```

### Stealth Mode (Recommended)

For sites with Cloudflare or other anti-bot protection:

```bash
node scripts/playwright-stealth.js <URL>
```

**Anti-Bot Techniques:**

- Hide `navigator.webdriver`
- Realistic User-Agent (iPhone)
- Human-like behavior simulation
- Screenshot and HTML saving support

---

## 🎯 Customization

All scripts support environment variables:

```bash
# Show browser
HEADLESS=false node scripts/playwright-stealth.js <URL>

# Custom wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-stealth.js <URL>

# Save screenshot
SCREENSHOT_PATH=/tmp/page.png node scripts/playwright-stealth.js <URL>

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js <URL>

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js <URL>
```

---

## 📊 Test Results

| Website | Result | Time |
|---------|--------|------|
| **Discuss.com.hk** | ✅ 200 OK | 5-20s |
| **Example.com** | ✅ 200 OK | 3-5s |
| **Cloudflare Protected** | ✅ Mostly successful | 10-30s |

---

## 📁 File Structure

```
playwright-scraper-skill/
├── scripts/
│   ├── playwright-simple.js    # Simple mode
│   └── playwright-stealth.js   # Stealth mode ⭐
├── examples/
│   ├── discuss-hk.sh           # Discuss.com.hk example
│   └── README.md               # More examples
├── SKILL.md                    # Full documentation
├── INSTALL.md                  # Installation guide
├── README.md                   # This file
├── README_ZH.md                # Chinese documentation
├── CONTRIBUTING.md             # Contribution guide
├── CHANGELOG.md                # Version history
└── package.json                # npm config
```

---

## 💡 Best Practices

1. **Try web_fetch first**: OpenClaw's built-in tool is the fastest option
2. **Use Simple for dynamic sites**: When there is no anti-bot protection
3. **Use Stealth for protected sites** ⭐: The main workhorse
4. **Use specialized skills**: For YouTube, Reddit, etc.

---

## 🐛 Troubleshooting

### Getting 403 blocked?

Use Stealth mode:

```bash
node scripts/playwright-stealth.js <URL>
```

### Cloudflare challenge?

Increase the wait time and run with a visible browser window:

```bash
HEADLESS=false WAIT_TIME=30000 node scripts/playwright-stealth.js <URL>
```

### Playwright not found?

Reinstall:

```bash
npm install
npx playwright install chromium
```

More issues? See [INSTALL.md](INSTALL.md)

---

## 🤝 Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md)

---

## 📄 License

MIT License - See [LICENSE](LICENSE)

---

## 🔗 Links

- [Playwright Official Docs](https://playwright.dev/)
- [Full Documentation (SKILL.md)](SKILL.md)
- [Installation Guide (INSTALL.md)](INSTALL.md)
- [Examples (examples/)](examples/)
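
---

## 🧪 Sketch: How Stealth Mode Might Work

The anti-bot techniques and environment variables described in this README can be illustrated with a short Node.js sketch. This is not the code in `scripts/playwright-stealth.js`. The function names `readOptions` and `scrape`, the 5000 ms default wait, the fallback iPhone User-Agent string, and the shape of the returned object are all assumptions made for illustration. Only the environment variable names come from this README. The Playwright calls used (`chromium.launch`, `browser.newContext`, `context.addInitScript`, `page.goto`, `page.waitForTimeout`, `page.screenshot`, `page.content`) are part of Playwright's public API.

```javascript
// Sketch only: option names mirror the environment variables in this README,
// but the defaults and function names below are assumptions.
const DEFAULT_USER_AGENT =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1';

// Turn environment variables into a plain options object (pure, no I/O).
function readOptions(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    // Only the literal string "false" shows the browser window.
    headless: env.HEADLESS !== 'false',
    // Assumed default of 5000 ms when WAIT_TIME is missing or not a number.
    waitTime: Number.isNaN(waitTime) ? 5000 : waitTime,
    screenshotPath: env.SCREENSHOT_PATH || null,
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT || DEFAULT_USER_AGENT,
  };
}

// Load a page with the webdriver flag hidden and a mobile User-Agent.
async function scrape(url, options = readOptions()) {
  // Required lazily so readOptions can be used without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: options.headless });
  try {
    const context = await browser.newContext({ userAgent: options.userAgent });
    // Runs in every page before any site script: hide navigator.webdriver.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Give dynamic content (or a challenge page) time to settle.
    await page.waitForTimeout(options.waitTime);
    if (options.screenshotPath) {
      await page.screenshot({ path: options.screenshotPath, fullPage: true });
    }
    return {
      status: response ? response.status() : null,
      title: await page.title(),
      html: options.saveHtml ? await page.content() : undefined,
    };
  } finally {
    await browser.close();
  }
}

module.exports = { readOptions, scrape };
```

With Playwright installed, a call such as `scrape('https://example.com').then(console.log)` would exercise the whole flow. Keeping `readOptions` free of browser code makes the environment handling easy to check on its own.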