docs/PREMIUM_SOURCES.md

# Premium Source Authentication

## Contents
- [Overview](#overview)
- [Option 1: Keep It Simple (Recommended)](#option-1-keep-it-simple-recommended)
- [Option 2: Use Premium Sources (Advanced)](#option-2-use-premium-sources-advanced)
- [Troubleshooting](#troubleshooting)
- [Alternative: Use APIs Instead](#alternative-use-apis-instead)
- [Recommendation](#recommendation)

## Overview

WSJ and Barron's are premium financial news sources that require subscriptions. This guide explains how to authenticate and use premium sources with the finance-news skill.

**Recommendation:** For simplicity, we recommend using **free sources only** (Yahoo Finance, CNBC, MarketWatch). Premium sources add complexity and maintenance burden.

If you have subscriptions and want premium content, follow the steps below.

---

## Option 1: Keep It Simple (Recommended)

**Use free sources only.** They provide 90% of the value without authentication complexity:

- ✅ Yahoo Finance (free, reliable)
- ✅ CNBC (free, real-time news)
- ✅ MarketWatch (free, broad coverage)
- ✅ Reuters (free via Yahoo RSS)

**To disable premium sources:**
1. Edit `config/config.json` (legacy: `config/sources.json`)
2. Set `"enabled": false` for WSJ/Barron's entries
3. Done - no authentication needed

---

## Option 2: Use Premium Sources (Advanced)

### Prerequisites

- Active WSJ or Barron's subscription
- Browser with active login session (Chrome/Firefox)
- **Option B only:** Install `requests` library if needed:
  ```bash
  pip install requests
  ```

### Step 1: Export Cookies from Browser

**Chrome:**
1. Install extension: [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/)
2. Navigate to wsj.com (logged in)
3. Click EditThisCookie icon → Export → Copy JSON

**Firefox:**
1. Install extension: [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/)
2. Navigate to wsj.com (logged in)
3. Right-click page → Inspect → Storage → Cookies
4. Copy relevant cookies (see format below)

### Step 2: Create Cookie File

Create `config/cookies.json` (this file is gitignored):

```json
{
  "feeds.a.dj.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE",
    "djcs_route": "YOUR_ROUTE_HERE"
  },
  "www.barrons.com": {
    "wsjgeo": "US",
    "djcs_session": "YOUR_SESSION_TOKEN_HERE"
  }
}
```

**Important:** Cookie domain must match feed URL domain:
- WSJ feeds use `feeds.a.dj.com` (not `wsj.com`)
- Barron's feeds use `www.barrons.com`
- Check `config/config.json` for actual feed URLs

**Note:** Cookie names/values vary by site. Export from browser to get actual values.

### Step 3: Pass Cookies to fetch_news.py

**Option A: Modify fetch_news.py (not officially supported)**

Add cookie loading to `fetch_rss()` function (maintains existing signature):

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def fetch_rss(url: str, limit: int = 10) -> list[dict]:
    """Fetch and parse RSS feed with optional cookie authentication."""
    
    # Load cookies if they exist
    cookie_file = Path(__file__).parent.parent / "config" / "cookies.json"
    cookies = {}
    if cookie_file.exists():
        with open(cookie_file) as f:
            all_cookies = json.load(f)
            # Extract domain from URL (e.g., feeds.a.dj.com)
            domain = urlparse(url).netloc
            cookies = all_cookies.get(domain, {})
    
    # Fetch with cookies and User-Agent
    req = urllib.request.Request(url, headers={'User-Agent': 'OpenClaw/1.0'})
    if cookies:
        cookie_header = "; ".join([f"{k}={v}" for k, v in cookies.items()])
        req.add_header("Cookie", cookie_header)
    
    # ... rest of function (unchanged)
```

**Note:** This is a doc-only suggestion, not officially supported by the skill.

**Option B: Use requests library instead of urllib**

Replace `urllib` with `requests` for easier cookie handling (maintains API signature):

```python
import requests

def fetch_rss(url: str, limit: int = 10, cookies_dict: dict = None) -> list[dict]:
    response = requests.get(url, cookies=cookies_dict, timeout=10)
    response.raise_for_status()
    # ... parse with feedparser
```

### Step 4: Security Considerations

**Critical: Do NOT commit cookies to git**

1. **`.gitignore` already includes cookie files:**
   - `config/cookies.json`
   - `*.cookie`
   - No action needed (already configured)

2. **Set restrictive file permissions:**
   ```bash
   chmod 600 config/cookies.json
   ```

2. **Set restrictive file permissions:**
   ```bash
   chmod 600 config/cookies.json
   ```

3. **Rotate cookies regularly:**
   - Browser session cookies expire (usually 7-30 days)
   - Re-export cookies when authentication fails

4. **Never share cookie files:**
   - Cookies grant full account access
   - Treat like passwords

---

## Troubleshooting

### "HTTP 403 Forbidden" errors

**Cause:** Cookies expired or invalid

**Fix:**
1. Log in to WSJ/Barron's in browser
2. Re-export cookies
3. Update `config/cookies.json`

### "Paywall detected" in articles

**Cause:** RSS feed doesn't require auth, but full article does

**Fix:**
- Premium sources often provide headlines/snippets in RSS (no auth needed)
- Full articles require subscription + cookie auth
- If you only need headlines → no cookies needed

### Cookies not working

**Debug checklist:**
- [ ] Correct domain in cookies.json:
  - WSJ: Use `feeds.a.dj.com` (not `wsj.com`)
  - Barron's: Use `www.barrons.com` (not `barrons.com`)
- Check `config/config.json` for actual feed URLs
- [ ] Cookie values copied completely (no truncation)
- [ ] Browser session still active (test by visiting site)
- [ ] File permissions correct (chmod 600)

---

## Alternative: Use APIs Instead

Some premium sources offer APIs:
- **WSJ API:** Not publicly available
- **Barron's API:** Part of Dow Jones API (enterprise only)
- **Bloomberg API:** Enterprise only

**Conclusion:** Cookie-based auth is the only practical option for individual users.

---

## Recommendation

**For most users:** Stick with free sources. They're reliable, no auth needed, and provide comprehensive market coverage.

**For premium subscribers:** Follow Option 2, but be prepared to maintain cookie files and handle expiration.
Initial commit with translated description 2026-03-29 10:21:46 +08:00			`# Premium Source Authentication`

			`## Contents`
			`- [Overview](#overview)`
			`- [Option 1: Keep It Simple (Recommended)](#option-1-keep-it-simple-recommended)`
			`- [Option 2: Use Premium Sources (Advanced)](#option-2-use-premium-sources-advanced)`
			`- [Troubleshooting](#troubleshooting)`
			`- [Alternative: Use APIs Instead](#alternative-use-apis-instead)`
			`- [Recommendation](#recommendation)`

			`## Overview`

			`WSJ and Barron's are premium financial news sources that require subscriptions. This guide explains how to authenticate and use premium sources with the finance-news skill.`

			`Recommendation: For simplicity, we recommend using free sources only (Yahoo Finance, CNBC, MarketWatch). Premium sources add complexity and maintenance burden.`

			`If you have subscriptions and want premium content, follow the steps below.`

			`---`

			`## Option 1: Keep It Simple (Recommended)`

			`Use free sources only. They provide 90% of the value without authentication complexity:`

			`- ✅ Yahoo Finance (free, reliable)`
			`- ✅ CNBC (free, real-time news)`
			`- ✅ MarketWatch (free, broad coverage)`
			`- ✅ Reuters (free via Yahoo RSS)`

			`To disable premium sources:`
			1. Edit `config/config.json` (legacy: `config/sources.json`)
			2. Set `"enabled": false` for WSJ/Barron's entries
			`3. Done - no authentication needed`

			`---`

			`## Option 2: Use Premium Sources (Advanced)`

			`### Prerequisites`

			`- Active WSJ or Barron's subscription`
			`- Browser with active login session (Chrome/Firefox)`
			- Option B only: Install `requests` library if needed:
			```bash
			`pip install requests`
			```

			`### Step 1: Export Cookies from Browser`

			`Chrome:`
			`1. Install extension: [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/)`
			`2. Navigate to wsj.com (logged in)`
			`3. Click EditThisCookie icon → Export → Copy JSON`

			`Firefox:`
			`1. Install extension: [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/)`
			`2. Navigate to wsj.com (logged in)`
			`3. Right-click page → Inspect → Storage → Cookies`
			`4. Copy relevant cookies (see format below)`

			`### Step 2: Create Cookie File`

			Create `config/cookies.json` (this file is gitignored):

			```json
			`{`
			`"feeds.a.dj.com": {`
			`"wsjgeo": "US",`
			`"djcs_session": "YOUR_SESSION_TOKEN_HERE",`
			`"djcs_route": "YOUR_ROUTE_HERE"`
			`},`
			`"www.barrons.com": {`
			`"wsjgeo": "US",`
			`"djcs_session": "YOUR_SESSION_TOKEN_HERE"`
			`}`
			`}`
			```

			`Important: Cookie domain must match feed URL domain:`
			- WSJ feeds use `feeds.a.dj.com` (not `wsj.com`)
			- Barron's feeds use `www.barrons.com`
			- Check `config/config.json` for actual feed URLs

			`Note: Cookie names/values vary by site. Export from browser to get actual values.`

			`### Step 3: Pass Cookies to fetch_news.py`

			`Option A: Modify fetch_news.py (not officially supported)`

			Add cookie loading to `fetch_rss()` function (maintains existing signature):

			```python
			`import json`
			`import urllib.request`
			`from pathlib import Path`
			`from urllib.parse import urlparse`

			`def fetch_rss(url: str, limit: int = 10) -> list[dict]:`
			`"""Fetch and parse RSS feed with optional cookie authentication."""`

			`# Load cookies if they exist`
			`cookie_file = Path(__file__).parent.parent / "config" / "cookies.json"`
			`cookies = {}`
			`if cookie_file.exists():`
			`with open(cookie_file) as f:`
			`all_cookies = json.load(f)`
			`# Extract domain from URL (e.g., feeds.a.dj.com)`
			`domain = urlparse(url).netloc`
			`cookies = all_cookies.get(domain, {})`

			`# Fetch with cookies and User-Agent`
			`req = urllib.request.Request(url, headers={'User-Agent': 'OpenClaw/1.0'})`
			`if cookies:`
			`cookie_header = "; ".join([f"{k}={v}" for k, v in cookies.items()])`
			`req.add_header("Cookie", cookie_header)`

			`# ... rest of function (unchanged)`
			```

			`Note: This is a doc-only suggestion, not officially supported by the skill.`

			`Option B: Use requests library instead of urllib`

			Replace `urllib` with `requests` for easier cookie handling (maintains API signature):

			```python
			`import requests`

			`def fetch_rss(url: str, limit: int = 10, cookies_dict: dict = None) -> list[dict]:`
			`response = requests.get(url, cookies=cookies_dict, timeout=10)`
			`response.raise_for_status()`
			`# ... parse with feedparser`
			```

			`### Step 4: Security Considerations`

			`Critical: Do NOT commit cookies to git`

			1. `.gitignore` already includes cookie files:
			- `config/cookies.json`
			- `*.cookie`
			`- No action needed (already configured)`

			`2. Set restrictive file permissions:`
			```bash
			`chmod 600 config/cookies.json`
			```

			`2. Set restrictive file permissions:`
			```bash
			`chmod 600 config/cookies.json`
			```

			`3. Rotate cookies regularly:`
			`- Browser session cookies expire (usually 7-30 days)`
			`- Re-export cookies when authentication fails`

			`4. Never share cookie files:`
			`- Cookies grant full account access`
			`- Treat like passwords`

			`---`

			`## Troubleshooting`

			`### "HTTP 403 Forbidden" errors`

			`Cause: Cookies expired or invalid`

			`Fix:`
			`1. Log in to WSJ/Barron's in browser`
			`2. Re-export cookies`
			3. Update `config/cookies.json`

			`### "Paywall detected" in articles`

			`Cause: RSS feed doesn't require auth, but full article does`

			`Fix:`
			`- Premium sources often provide headlines/snippets in RSS (no auth needed)`
			`- Full articles require subscription + cookie auth`
			`- If you only need headlines → no cookies needed`

			`### Cookies not working`

			`Debug checklist:`
			`- [ ] Correct domain in cookies.json:`
			- WSJ: Use `feeds.a.dj.com` (not `wsj.com`)
			- Barron's: Use `www.barrons.com` (not `barrons.com`)
			- Check `config/config.json` for actual feed URLs
			`- [ ] Cookie values copied completely (no truncation)`
			`- [ ] Browser session still active (test by visiting site)`
			`- [ ] File permissions correct (chmod 600)`

			`---`

			`## Alternative: Use APIs Instead`

			`Some premium sources offer APIs:`
			`- WSJ API: Not publicly available`
			`- Barron's API: Part of Dow Jones API (enterprise only)`
			`- Bloomberg API: Enterprise only`

			`Conclusion: Cookie-based auth is the only practical option for individual users.`

			`---`

			`## Recommendation`

			`For most users: Stick with free sources. They're reliable, no auth needed, and provide comprehensive market coverage.`

			`For premium subscribers: Follow Option 2, but be prepared to maintain cookie files and handle expiration.`