Initial commit with translated description

This commit is contained in:
2026-03-29 14:36:54 +08:00
commit 35c34a7952
14 changed files with 1529 additions and 0 deletions

229
examples/README.md Normal file
View File

@@ -0,0 +1,229 @@
# Usage Examples
## Basic Usage
### 1. Quick Scrape (Example.com)
```bash
node scripts/playwright-simple.js https://example.com
```
**Output:**
```json
{
"title": "Example Domain",
"url": "https://example.com/",
"content": "Example Domain\n\nThis domain is for use...",
"metaDescription": "",
"elapsedSeconds": "3.42"
}
```
---
### 2. Anti-Bot Protected Site (Discuss.com.hk)
```bash
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
```
**Output:**
```json
{
"title": "香港討論區 discuss.com.hk",
"url": "https://m.discuss.com.hk/#hot",
"htmlLength": 186345,
"contentPreview": "...",
"cloudflare": false,
"screenshot": "./screenshot-1770467444364.png",
"data": {
"links": [
{
"text": "區議員周潔瑩疑消防通道違泊 道歉稱急於搬貨",
"href": "https://m.discuss.com.hk/index.php?action=thread&tid=32148378..."
}
]
},
"elapsedSeconds": "19.59"
}
```
---
## Advanced Usage
### 3. Custom Wait Time
```bash
WAIT_TIME=15000 node scripts/playwright-stealth.js <URL>
```
### 4. Show Browser (Debug Mode)
```bash
HEADLESS=false node scripts/playwright-stealth.js <URL>
```
### 5. Save Screenshot and HTML
```bash
SCREENSHOT_PATH=/tmp/my-page.png \
SAVE_HTML=true \
node scripts/playwright-stealth.js <URL>
```
### 6. Custom User-Agent
```bash
USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
node scripts/playwright-stealth.js <URL>
```
---
## Integration Examples
### Using in Shell Scripts
```bash
#!/bin/bash
# Run from playwright-scraper-skill directory
URL="https://example.com"
OUTPUT_FILE="result.json"
echo "🕷️ Starting scrape: $URL"
node scripts/playwright-stealth.js "$URL" > "$OUTPUT_FILE"
if [ $? -eq 0 ]; then
echo "✅ Success! Results saved to: $OUTPUT_FILE"
else
echo "❌ Failed"
exit 1
fi
```
### Batch Scraping Multiple URLs
```bash
#!/bin/bash
URLS=(
"https://example.com"
"https://example.org"
"https://example.net"
)
for url in "${URLS[@]}"; do
echo "Scraping: $url"
node scripts/playwright-stealth.js "$url" > "output_$(date +%s).json"
sleep 5 # Avoid IP blocking
done
```
---
## Calling from Node.js
```javascript
const { spawn } = require('child_process');
function scrape(url) {
return new Promise((resolve, reject) => {
const proc = spawn('node', [
'scripts/playwright-stealth.js',
url
]);
let output = '';
proc.stdout.on('data', (data) => {
output += data.toString();
});
proc.on('close', (code) => {
if (code === 0) {
try {
// Extract JSON (last line)
const lines = output.trim().split('\n');
const json = JSON.parse(lines[lines.length - 1]);
resolve(json);
} catch (e) {
reject(e);
}
} else {
reject(new Error(`Exit code: ${code}`));
}
});
});
}
// Usage
(async () => {
const result = await scrape('https://example.com');
console.log(result.title);
})();
```
---
## Common Scenarios
### Scraping News Articles
```bash
node scripts/playwright-stealth.js "https://news.example.com/article/123"
```
### Scraping E-commerce Products
```bash
WAIT_TIME=10000 \
SAVE_HTML=true \
node scripts/playwright-stealth.js "https://shop.example.com/product/456"
```
### Scraping Forum Posts
```bash
node scripts/playwright-stealth.js "https://forum.example.com/thread/789"
```
---
## Troubleshooting
### Issue: Page Not Fully Loaded
**Solution:** Increase wait time
```bash
WAIT_TIME=20000 node scripts/playwright-stealth.js <URL>
```
### Issue: Still Blocked by Cloudflare
**Solution:** Use headful mode + manual wait
```bash
HEADLESS=false \
WAIT_TIME=30000 \
node scripts/playwright-stealth.js <URL>
```
### Issue: Requires Login
**Solution:** Manually login first, export cookies, then load
(Future feature, currently not supported)
---
## Performance Tips
1. **Parallel scraping:** Use `Promise.all()` or shell `&`
2. **Delay requests:** `sleep 5` to avoid IP blocking
3. **Use proxies:** Rotate IPs (future feature)
4. **Cache results:** Avoid duplicate scraping
---
For more information, see [SKILL.md](../SKILL.md)

16
examples/discuss-hk.sh Normal file
View File

@@ -0,0 +1,16 @@
#!/bin/bash
# 範例:爬取 Discuss.com.hk 熱門話題
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="$(dirname "$SCRIPT_DIR")"
echo "🕷️ Discuss.com.hk 爬蟲範例"
echo ""
echo "使用 Playwright Stealth已驗證成功"
echo ""
cd "$SKILL_DIR" && \
WAIT_TIME=10000 \
SCREENSHOT_PATH=/tmp/discuss-hk.png \
SAVE_HTML=true \
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"