# Premium Source Authentication

## Contents
- [Overview](#overview)
- [Option 1: Keep It Simple (Recommended)](#option-1-keep-it-simple-recommended)
- [Option 2: Use Premium Sources (Advanced)](#option-2-use-premium-sources-advanced)
- [Troubleshooting](#troubleshooting)
- [Alternative: Use APIs Instead](#alternative-use-apis-instead)
- [Recommendation](#recommendation)

## Overview

WSJ and Barron's are premium financial news sources that require subscriptions. This guide explains how to authenticate with them and use them in the finance-news skill.

**Recommendation:** For simplicity, use **free sources only** (Yahoo Finance, CNBC, MarketWatch). Premium sources add complexity and an ongoing maintenance burden.

If you have subscriptions and want premium content, follow the steps below.

---

## Option 1: Keep It Simple (Recommended)

**Use free sources only.** They provide 90% of the value without authentication complexity:

- ✅ Yahoo Finance (free, reliable)
- ✅ CNBC (free, real-time news)
- ✅ MarketWatch (free, broad coverage)
- ✅ Reuters (free via Yahoo RSS)

**To disable premium sources:**
1. Edit `config/config.json` (legacy: `config/sources.json`)
2. Set `"enabled": false` for WSJ/Barron's entries
3. Done - no authentication needed
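
The edit above can also be scripted. The following is only a sketch: the layout of `config/config.json` is not shown in this guide, so the code makes no assumption about it beyond "source entries are JSON objects with an `enabled` flag and a name or URL that mentions WSJ or Barron's". The marker strings and both function names are hypothetical, and the result should be reviewed before it is relied on.

```python
import json
from pathlib import Path

# Assumption: substrings that identify a premium entry by its name or URL.
PREMIUM_MARKERS = ("wsj", "barron", "feeds.a.dj.com")


def disable_premium(node) -> int:
    """Set "enabled" to False on dicts that look like premium source entries.

    Walks nested dicts/lists in place and returns how many entries changed.
    """
    changed = 0
    if isinstance(node, dict):
        # Only the entry's own string fields (name, URL, ...) are inspected.
        text = " ".join(v.lower() for v in node.values() if isinstance(v, str))
        if node.get("enabled") is True and any(m in text for m in PREMIUM_MARKERS):
            node["enabled"] = False
            changed += 1
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        changed += disable_premium(child)
    return changed


def disable_premium_in_file(path: Path = Path("config/config.json")) -> int:
    """Rewrite the config file with premium entries disabled (not called here)."""
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = disable_premium(config)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```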

---

## Option 2: Use Premium Sources (Advanced)

### Prerequisites

- Active WSJ or Barron's subscription
- Browser with active login session (Chrome/Firefox)
- **Step 3, Option B only:** Install the `requests` library if needed:
  ```bash
  pip install requests
  ```

### Step 1: Export Cookies from Browser

**Chrome:**
1. Install extension: [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/)
2. Navigate to wsj.com (logged in)
3. Click EditThisCookie icon → Export → Copy JSON

**Firefox:**
1. Install extension: [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/)
2. Navigate to wsj.com (logged in)
3. Right-click page → Inspect → Storage → Cookies
4. Copy relevant cookies (see format below)

### Step 2: Create Cookie File

Create `config/cookies.json` (this file is gitignored):

```json
{
  "feeds.a.dj.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE",
    "djcs_route": "YOUR_ROUTE_HERE"
  },
  "www.barrons.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE"
  }
}
```

**Important:** Each top-level key must match the domain of the feed URL, not the site you logged in to:
- WSJ feeds use `feeds.a.dj.com` (not `wsj.com`)
- Barron's feeds use `www.barrons.com`
- Check `config/config.json` for actual feed URLs

**Note:** Cookie names and values vary by site. Export them from your browser to get the actual values.
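
Reshaping a browser export into the format above can be scripted. The sketch below assumes the export is a JSON array of objects that each carry at least `name` and `value` keys, which should be checked against what your extension actually produces. The helper name `build_cookie_entry` and the sample values are made up for illustration.

```python
import json
from typing import Optional


def build_cookie_entry(exported: list, feed_domain: str,
                       wanted: Optional[set] = None) -> dict:
    """Turn a browser cookie export into one cookies.json entry.

    `exported` is assumed to be a list of objects with "name" and "value"
    keys. `wanted` optionally restricts the result to specific cookie names.
    """
    pairs = {}
    for cookie in exported:
        name, value = cookie.get("name"), cookie.get("value")
        if not name or value is None:
            continue  # skip malformed records
        if wanted is None or name in wanted:
            pairs[name] = value
    # Key the entry by the feed's domain, as cookies.json expects.
    return {feed_domain: pairs}


# Example with made-up values; real exports carry more fields per cookie.
exported = json.loads("""[
  {"domain": ".wsj.com", "name": "wsjgeo", "value": "US"},
  {"domain": ".wsj.com", "name": "djcs_session", "value": "YOUR_SESSION_TOKEN_HERE"},
  {"domain": ".wsj.com", "name": "unrelated", "value": "x"}
]""")
entry = build_cookie_entry(exported, "feeds.a.dj.com",
                           wanted={"wsjgeo", "djcs_session"})
print(json.dumps(entry, indent=2))
```

The returned entry can then be merged into `config/cookies.json`, one feed domain at a time.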

### Step 3: Pass Cookies to fetch_news.py

**Option A: Modify fetch_news.py (not officially supported)**

Add cookie loading to the `fetch_rss()` function (the existing signature stays the same):

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_rss(url: str, limit: int = 10) -> list[dict]:
    """Fetch and parse RSS feed with optional cookie authentication."""

    # Load cookies if they exist
    cookie_file = Path(__file__).parent.parent / "config" / "cookies.json"
    cookies = {}
    if cookie_file.exists():
        with open(cookie_file, encoding="utf-8") as f:
            all_cookies = json.load(f)
        # Extract domain from URL (e.g., feeds.a.dj.com)
        domain = urlparse(url).netloc
        cookies = all_cookies.get(domain, {})

    # Fetch with cookies and User-Agent
    req = urllib.request.Request(url, headers={'User-Agent': 'OpenClaw/1.0'})
    if cookies:
        cookie_header = "; ".join([f"{k}={v}" for k, v in cookies.items()])
        req.add_header("Cookie", cookie_header)

    # ... rest of function (unchanged)
```

**Note:** This is a doc-only suggestion, not officially supported by the skill.

**Option B: Use requests library instead of urllib**

Replace `urllib` with `requests` for easier cookie handling. The new `cookies_dict` parameter is optional, so existing callers keep working:

```python
from typing import Optional

import requests


def fetch_rss(url: str, limit: int = 10, cookies_dict: Optional[dict] = None) -> list[dict]:
    # cookies_dict maps cookie names to values; None sends no cookies
    response = requests.get(
        url,
        cookies=cookies_dict,
        headers={"User-Agent": "OpenClaw/1.0"},
        timeout=10,
    )
    response.raise_for_status()  # raise on HTTP errors such as 403
    # ... parse with feedparser
```
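
Option B leaves open where `cookies_dict` comes from. One possible way to supply it, sketched here under the assumption that `config/cookies.json` uses the per-domain layout from Step 2 and is resolved relative to the working directory; `load_cookies_for` is a hypothetical helper, not part of the skill.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def load_cookies_for(url: str,
                     cookie_file: Path = Path("config/cookies.json")) -> Optional[dict]:
    """Return the cookie dict stored for the URL's domain, or None if absent."""
    if not cookie_file.exists():
        return None
    all_cookies = json.loads(cookie_file.read_text(encoding="utf-8"))
    # cookies.json is keyed by feed domain, e.g. "feeds.a.dj.com"
    return all_cookies.get(urlparse(url).netloc) or None
```

A caller could then write `fetch_rss(url, cookies_dict=load_cookies_for(url))`; feeds without a matching entry are fetched without cookies.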

### Step 4: Security Considerations

**Critical: Do NOT commit cookies to git**

1. **`.gitignore` already includes cookie files:**
   - `config/cookies.json`
   - `*.cookie`
   - No action needed (already configured)

2. **Set restrictive file permissions:**
   ```bash
   chmod 600 config/cookies.json
   ```

3. **Rotate cookies regularly:**
   - Browser session cookies expire (usually 7-30 days)
   - Re-export cookies when authentication fails

4. **Never share cookie files:**
   - Cookies grant full account access
   - Treat them like passwords
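
The permission step can also be checked from Python, for example before the cookie file is loaded. This is a minimal sketch for POSIX systems (permission bits behave differently on Windows); `is_private` and `make_private` are illustrative names, not functions the skill provides.

```python
import os
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """True if group and others have no access at all (e.g. mode 600)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path: Path) -> None:
    """Equivalent of `chmod 600`: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```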

---

## Troubleshooting

### "HTTP 403 Forbidden" errors

**Cause:** Cookies expired or invalid

**Fix:**
1. Log in to WSJ/Barron's in browser
2. Re-export cookies
3. Update `config/cookies.json`
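
To make this failure easier to recognize, the fetch code could translate a 403 into a message that points at the fix. The sketch below uses only the standard library. `fetch_feed_bytes` and its `opener` parameter are hypothetical (the parameter exists so the behavior can be tried without network access), and the User-Agent string is the one used in Step 3.

```python
import urllib.error
import urllib.request


def fetch_feed_bytes(url, cookie_header=None, opener=urllib.request.urlopen):
    """Fetch a feed, turning HTTP 403 into an actionable error message."""
    req = urllib.request.Request(url, headers={"User-Agent": "OpenClaw/1.0"})
    if cookie_header:
        req.add_header("Cookie", cookie_header)
    try:
        with opener(req, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise PermissionError(
                f"403 Forbidden for {url}: cookies are probably expired or "
                "invalid. Log in again, re-export them, and update "
                "config/cookies.json."
            ) from err
        raise  # other HTTP errors are passed through unchanged
```

Other status codes are deliberately left alone, so an outage or a moved feed is not mistaken for an authentication problem.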

### "Paywall detected" in articles

**Cause:** RSS feed doesn't require auth, but full article does

**Fix:**
- Premium sources often provide headlines/snippets in RSS (no auth needed)
- Full articles require subscription + cookie auth
- If you only need headlines → no cookies needed

### Cookies not working

**Debug checklist:**
- [ ] Correct domain in cookies.json:
  - WSJ: Use `feeds.a.dj.com` (not `wsj.com`)
  - Barron's: Use `www.barrons.com` (not `barrons.com`)
  - Check `config/config.json` for actual feed URLs
- [ ] Cookie values copied completely (no truncation)
- [ ] Browser session still active (test by visiting site)
- [ ] File permissions correct (chmod 600)
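
The domain item of the checklist can be automated. The sketch below takes the feed URLs as a plain list (copy them from `config/config.json`, whose layout is not assumed here) and the parsed contents of `cookies.json`. It reports feeds with no matching entry, and values that are empty or still start with the `YOUR_` placeholder prefix used in Step 2. `check_cookie_config` is a hypothetical helper; it cannot detect truncated values or expired sessions.

```python
from urllib.parse import urlparse


def check_cookie_config(feed_urls: list, all_cookies: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for url in feed_urls:
        host = urlparse(url).netloc
        entry = all_cookies.get(host)
        if not entry:
            problems.append(f"no cookies stored under '{host}' (needed for {url})")
            continue
        for name, value in entry.items():
            if not value or str(value).startswith("YOUR_"):
                problems.append(f"{host}: cookie '{name}' is empty or still a placeholder")
    return problems
```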

---

## Alternative: Use APIs Instead

Some premium sources offer APIs, but none is open to individual subscribers:
- **WSJ API:** Not publicly available
- **Barron's API:** Part of Dow Jones API (enterprise only)
- **Bloomberg API:** Enterprise only

**Conclusion:** Cookie-based auth is the only practical option for individual users.

---

## Recommendation

**For most users:** Stick with free sources. They're reliable, need no authentication, and provide comprehensive market coverage.

**For premium subscribers:** Follow Option 2, but be prepared to maintain cookie files and handle expiration.