# Premium Source Authentication

## Contents

- [Overview](#overview)
- [Option 1: Keep It Simple (Recommended)](#option-1-keep-it-simple-recommended)
- [Option 2: Use Premium Sources (Advanced)](#option-2-use-premium-sources-advanced)
- [Troubleshooting](#troubleshooting)
- [Alternative: Use APIs Instead](#alternative-use-apis-instead)
- [Recommendation](#recommendation)

## Overview

WSJ and Barron's are premium financial news sources that require subscriptions. This guide explains how to authenticate and use premium sources with the finance-news skill.

**Recommendation:** For simplicity, we recommend using **free sources only** (Yahoo Finance, CNBC, MarketWatch). Premium sources add complexity and maintenance burden.

If you have subscriptions and want premium content, follow the steps below.

---
## Option 1: Keep It Simple (Recommended)

**Use free sources only.** They cover most everyday market news without any authentication setup:

- ✅ Yahoo Finance (free, reliable)
- ✅ CNBC (free, real-time news)
- ✅ MarketWatch (free, broad coverage)
- ✅ Reuters (free via Yahoo RSS)

**To disable premium sources:**

1. Edit `config/config.json` (legacy: `config/sources.json`).
2. Set `"enabled": false` for the WSJ and Barron's entries.
3. Done: no authentication is needed.
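If you would rather script step 2, here is a minimal sketch that walks the parsed config and sets `"enabled": false` on any entry whose key, name, or URL mentions WSJ or Barron's. It assumes that each source entry is a JSON object with an `"enabled"` key. The keyword list and the helper names `disable_premium`/`disable_in_file` are hypothetical, and the real `config/config.json` layout may differ, so review the result before relying on it.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical keywords; extend (e.g. with "feeds.a.dj.com") to suit your config
PREMIUM_KEYWORDS = ("wsj", "barron")


def disable_premium(node: Any, keywords: tuple[str, ...] = PREMIUM_KEYWORDS, key: str = "") -> int:
    """Set "enabled": False on matching entries anywhere in the config; return the count changed.

    Assumes source entries are JSON objects that carry an "enabled" key.
    """
    changed = 0
    if isinstance(node, dict):
        if "enabled" in node:
            # Match on the entry's own key plus its string values (name, url, ...)
            strings = [key, *(v for v in node.values() if isinstance(v, str))]
            text = " ".join(strings).lower()
            if any(k in text for k in keywords) and node["enabled"] is not False:
                node["enabled"] = False
                changed += 1
        for child_key, value in node.items():
            changed += disable_premium(value, keywords, str(child_key))
    elif isinstance(node, list):
        for item in node:
            # List items inherit the key of the list that holds them
            changed += disable_premium(item, keywords, key)
    return changed


def disable_in_file(path: Path) -> int:
    """Apply disable_premium() to a JSON file in place (this rewrites its formatting)."""
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = disable_premium(config)
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return changed


# Example: disable_in_file(Path("config/config.json"))
```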
---

## Option 2: Use Premium Sources (Advanced)

### Prerequisites

- Active WSJ or Barron's subscription
- Browser with active login session (Chrome/Firefox)
- **Option B only:** Install `requests` library if needed:

  ```bash
  pip install requests
  ```
### Step 1: Export Cookies from Browser

**Chrome:**

1. Install the [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/) extension.
2. Navigate to wsj.com (or barrons.com) while logged in.
3. Click the EditThisCookie icon → Export → Copy JSON.

**Firefox:**

1. Install the [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/) extension.
2. Navigate to wsj.com (or barrons.com) while logged in.
3. View the site's cookies, either in Cookie Quick Manager or in Firefox's built-in Storage Inspector (right-click the page → Inspect → Storage → Cookies).
4. Copy the relevant cookies (see the format in Step 2).
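Cookie exports from extensions such as EditThisCookie are a JSON array of cookie objects, each with fields including `domain`, `name`, and `value`. The sketch below reshapes such an export into the per-host layout used in Step 2. `export_to_cookie_map` is a hypothetical helper, and the sketch assumes the export contains those three fields. The exported cookies belong to `wsj.com`, but WSJ feeds are served from `feeds.a.dj.com`, so the target feed host is passed in explicitly.

```python
from typing import Iterable, Optional


def export_to_cookie_map(
    exported: Iterable[dict],
    site_domain: str,
    feed_host: str,
    names: Optional[set[str]] = None,
) -> dict[str, dict[str, str]]:
    """Reshape a cookie export into {feed_host: {name: value}}.

    Assumes each exported cookie has "domain", "name" and "value" keys.
    Keeps cookies set for site_domain or its subdomains; if names is given,
    keeps only those cookie names.
    """
    site = site_domain.lstrip(".").lower()
    selected: dict[str, str] = {}
    for cookie in exported:
        # Exported domains often have a leading dot, e.g. ".wsj.com"
        domain = str(cookie.get("domain", "")).lstrip(".").lower()
        if domain != site and not domain.endswith("." + site):
            continue
        if names is not None and cookie["name"] not in names:
            continue
        # If the same name appears twice (different paths), the later one wins
        selected[cookie["name"]] = cookie["value"]
    return {feed_host: selected}


# Example (hypothetical file name for the pasted export):
#   with open("wsj_export.json", encoding="utf-8") as f:
#       entry = export_to_cookie_map(json.load(f), "wsj.com", "feeds.a.dj.com")
#   Then merge `entry` into the dict you save as config/cookies.json.
```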
### Step 2: Create Cookie File

Create `config/cookies.json` (this file is gitignored):

```json
{
  "feeds.a.dj.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE",
    "djcs_route": "YOUR_ROUTE_HERE"
  },
  "www.barrons.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE"
  }
}
```

**Important:** Each top-level key must exactly match the hostname in the feed URL:

- WSJ feeds use `feeds.a.dj.com` (not `wsj.com`)
- Barron's feeds use `www.barrons.com`
- Check `config/config.json` for the actual feed URLs

**Note:** Cookie names and values vary by site. Export them from your browser to get the actual values.
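Cookies are looked up by exact hostname, so a quick consistency check can catch mismatched keys before you fetch anything. This is a minimal sketch that assumes you have collected the premium (WSJ/Barron's) feed URLs from `config/config.json` into a list. How they are stored there is not covered in this guide, and `check_cookie_domains` is a hypothetical helper. It reports feed hosts with no cookie entry and cookie keys that no feed uses.

```python
from urllib.parse import urlparse


def check_cookie_domains(
    feed_urls: list[str], cookies: dict[str, dict]
) -> tuple[list[str], list[str]]:
    """Return (feed hosts with no cookie entry, cookie keys that match no feed host).

    Pass only premium feed URLs; free feeds would otherwise show up as missing.
    """
    # netloc is the same key the fetch_rss() lookup in Step 3 uses
    hosts = {urlparse(url).netloc for url in feed_urls}
    missing = sorted(host for host in hosts if host not in cookies)
    unused = sorted(key for key in cookies if key not in hosts)
    return missing, unused
```

A key such as `wsj.com` would show up as unused here, which is exactly the mismatch the note above warns about.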
### Step 3: Pass Cookies to fetch_news.py

**Option A: Modify fetch_news.py (not officially supported)**

Add cookie loading to the `fetch_rss()` function, keeping its existing signature:

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_rss(url: str, limit: int = 10) -> list[dict]:
    """Fetch and parse an RSS feed with optional cookie authentication."""

    # Load cookies if the file exists (config/ sits one level above the
    # directory that contains fetch_news.py)
    cookie_file = Path(__file__).parent.parent / "config" / "cookies.json"
    cookies = {}
    if cookie_file.exists():
        with open(cookie_file, encoding="utf-8") as f:
            all_cookies = json.load(f)
        # Look up cookies by the feed URL's hostname (e.g., feeds.a.dj.com)
        domain = urlparse(url).netloc
        cookies = all_cookies.get(domain, {})

    # Build the request with a User-Agent and, if available, a Cookie header
    req = urllib.request.Request(url, headers={"User-Agent": "OpenClaw/1.0"})
    if cookies:
        cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
        req.add_header("Cookie", cookie_header)

    # ... rest of function (unchanged)
```

**Note:** This is a doc-only suggestion, not officially supported by the skill.

**Option B: Use requests library instead of urllib**

Replace `urllib` with `requests` for easier cookie handling. This adds an optional `cookies_dict` parameter, so existing calls keep working:

```python
from typing import Optional

import requests


def fetch_rss(url: str, limit: int = 10, cookies_dict: Optional[dict] = None) -> list[dict]:
    headers = {"User-Agent": "OpenClaw/1.0"}  # same User-Agent as Option A
    response = requests.get(url, headers=headers, cookies=cookies_dict, timeout=10)
    response.raise_for_status()  # raises on 4xx/5xx, e.g. 403 from expired cookies
    # ... parse with feedparser
```
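Option B does not say where `cookies_dict` comes from. One approach, sketched here under the same assumption as Option A (that `config/cookies.json` is keyed by feed hostname), is a small loader the caller runs before calling `fetch_rss()`. `cookies_for_url` is a hypothetical helper, not part of the skill.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def cookies_for_url(url: str, cookie_file: Path) -> Optional[dict]:
    """Return the cookie dict for the URL's hostname, or None if none is configured."""
    if not cookie_file.exists():
        return None
    with open(cookie_file, encoding="utf-8") as f:
        all_cookies = json.load(f)
    # Same lookup key as Option A: the URL's netloc (e.g., feeds.a.dj.com)
    return all_cookies.get(urlparse(url).netloc) or None


# Example call (the feed path is a placeholder; take real URLs from config/config.json):
#   url = "https://feeds.a.dj.com/rss/YOUR_FEED.xml"
#   items = fetch_rss(url, cookies_dict=cookies_for_url(url, Path("config/cookies.json")))
```

Returning `None` for hosts without cookies means free feeds are fetched exactly as before.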
### Step 4: Security Considerations

**Critical: Do NOT commit cookies to git**

1. **`.gitignore` already includes cookie files:**
   - `config/cookies.json`
   - `*.cookie`
   - No action needed (already configured)

2. **Set restrictive file permissions:**

   ```bash
   chmod 600 config/cookies.json
   ```

3. **Rotate cookies regularly:**
   - Browser session cookies expire (usually 7-30 days)
   - Re-export cookies when authentication fails

4. **Never share cookie files:**
   - Cookies grant full account access
   - Treat them like passwords
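To confirm that the `chmod 600` step took effect on Linux or macOS, a check like the one below flags a cookie file that other users could read. It is a minimal standard-library sketch with hypothetical helper names. POSIX permission bits are not meaningful on Windows, so it is not useful there.

```python
import os
import stat
from pathlib import Path


def cookie_file_is_private(path: Path) -> bool:
    """True if the file grants no permissions to group or others (e.g., mode 600 or 400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path: Path) -> None:
    """Equivalent of `chmod 600`: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Example:
#   p = Path("config/cookies.json")
#   if p.exists() and not cookie_file_is_private(p):
#       make_private(p)
```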
---

## Troubleshooting

### "HTTP 403 Forbidden" errors

**Cause:** Cookies expired or invalid

**Fix:**

1. Log in to WSJ/Barron's in your browser
2. Re-export cookies
3. Update `config/cookies.json`

### "Paywall detected" in articles

**Cause:** The RSS feed doesn't require authentication, but the full article does

**Fix:**

- Premium sources often provide headlines/snippets in RSS (no auth needed)
- Full articles require a subscription plus cookie auth
- If you only need headlines → no cookies needed

### Cookies not working

**Debug checklist:**

- [ ] Correct domain in cookies.json:
  - WSJ: Use `feeds.a.dj.com` (not `wsj.com`)
  - Barron's: Use `www.barrons.com` (not `barrons.com`)
  - Check `config/config.json` for actual feed URLs
- [ ] Cookie values copied completely (no truncation)
- [ ] Browser session still active (test by visiting site)
- [ ] File permissions correct (chmod 600)
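Parts of this checklist can be automated. This heuristic sketch scans a loaded `cookies.json` for entries that often cause silent failures: template placeholders left in place (the `YOUR_..._HERE` values from Step 2), empty values, hosts with no cookies, and values containing whitespace or `;`, which would corrupt the `Cookie` header built in Option A. `find_cookie_problems` is a hypothetical helper. It cannot detect truncation or expiry, so re-exporting from the browser remains the reliable fix.

```python
def find_cookie_problems(cookies: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return (host, cookie name, reason) tuples for suspicious entries (heuristic only)."""
    problems = []
    for host, jar in cookies.items():
        if not jar:
            problems.append((host, "", "no cookies for this host"))
        for name, value in jar.items():
            value = str(value)
            if not value:
                problems.append((host, name, "empty value"))
            elif value.startswith("YOUR_") and value.endswith("_HERE"):
                problems.append((host, name, "template placeholder not replaced"))
            elif any(ch.isspace() for ch in value) or ";" in value:
                # A stray space or ';' usually means a bad copy-paste
                problems.append((host, name, "whitespace or ';' in value"))
    return problems
```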
---

## Alternative: Use APIs Instead

Official APIs exist for some premium sources, but none are practical for individual users:

- **WSJ API:** Not publicly available
- **Barron's API:** Part of the Dow Jones API (enterprise only)
- **Bloomberg API:** Enterprise only

**Conclusion:** Cookie-based auth is the only practical option for individual users.

---

## Recommendation

**For most users:** Stick with free sources. They're reliable, need no authentication, and provide comprehensive market coverage.

**For premium subscribers:** Follow Option 2, but be prepared to maintain cookie files and handle expiration.