commit 35a94547065f71d4bd3efeccf80c440abeeb4ef9
Author: zlei9
Date:   Sun Mar 29 09:36:59 2026 +0800

    Initial commit with translated description

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c4b9342
--- /dev/null
+++ b/README.md
@@ -0,0 +1,113 @@
+# News Aggregator Skill
+
+A web-wide tech/finance news aggregation assistant with AI-powered interpretation.
+
+## ✨ Features
+
+- **Multi-source aggregation**: One-stop coverage of Silicon Valley tech, China's startup scene, open-source communities, and financial markets.
+- **Deep reading**: Deep Fetch mode automatically retrieves article bodies for in-depth AI analysis.
+- **Smart briefings**: Automatically generates magazine-quality daily/weekly reports in Chinese.
+- **Interactive menu**: Say "news-aggregator-skill 如意如意" to summon an interactive menu and pick exactly what you want.
+
+## 📚 Aggregated Sources
+
+Covers 8 major high-value channels worldwide:
+
+- **Global tech**: Hacker News, Product Hunt
+- **Open source**: GitHub Trending, V2EX
+- **China VC**: 36Kr, Tencent News (tech channel)
+- **Social/finance**: Weibo Hot Search, WallStreetCN
+
+## 📥 Installation
+
+### Step 1: Install into your code agent
+
+Add the skill to your agent using any of the following methods:
+
+#### Method A: Openskills CLI (recommended)
+
+Handles path dependencies and configuration sync automatically.
+
+```bash
+# Clone the repository
+git clone git@github.com:cclank/news-aggregator-skill.git
+
+# Install the skill
+openskills install ./news-aggregator-skill
+
+# Sync the configuration to your agent
+openskills sync
+```
+
+#### Method B: NPX (also recommended)
+
+Adds the skill directly from the remote repository.
+
+```bash
+npx skills add https://github.com/cclank/news-aggregator-skill
+```
+
+#### Method C: Standard Claude install (manual)
+
+The standard way to integrate the skill into a Claude project by hand.
+
+```bash
+# 1. Clone the repository
+git clone git@github.com:cclank/news-aggregator-skill.git
+
+# 2. Locate or create the project's skills directory
+mkdir -p YourProject/.claude/skills
+
+# 3. Copy the whole folder over
+cp -r news-aggregator-skill YourProject/.claude/skills/
+
+# 4. Verify that SKILL.md exists in the target directory
+ls YourProject/.claude/skills/news-aggregator-skill/SKILL.md
+```
+
+### Step 2: Install Python dependencies (skip if your agent can handle this itself)
+
+Enter the installed skill directory and install the dependencies:
+
+```bash
+# Enter the skill directory (adjust the path to your install method)
+cd ~/.claude/skills/news-aggregator-skill  # or YourProject/.claude/skills/news-aggregator-skill
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+## 🚀 Usage
+
+### 1. 🔮 Summon the interactive menu (recommended)
+
+The easiest way to start (an easter egg from 岚叔): just summon the smart menu:
+
+> **"news-aggregator-skill 如意如意"**
+
+The skill shows its feature list (morning briefing, Silicon Valley highlights, global scan, etc.); reply with a number to run one.
+
+### 2. 🗣️ Natural-language triggers
+
+You can also simply state what you want:
+
+- **Hot topics**: "What AI news has been on Hacker News and Product Hunt lately?"
+- **China news**: "Any big tech stories on 36Kr or Tencent News today?"
+- **Open source**: "What projects are trending on GitHub lately?"
+- **Global scan**: "Scan the whole web for the latest developments on Agents and LLMs."
+
+> ⚠️ **Note on Global Scan**: Global Scan filters each platform's trending ("hot") lists by keyword; it is not a full-text search. If a keyword (e.g. Agent) does not appear on that day's trending lists, few results may be returned.
+
+## 📊 Supported Sources
+
+| Source Name | ID | Category |
+|-------------|----|----------|
+| **Hacker News** | `hackernews` | Global Tech |
+| **GitHub Trending** | `github` | Open Source |
+| **Product Hunt** | `producthunt` | New Products |
+| **36Kr** | `36kr` | China VC |
+| **Tencent News** | `tencent` | General Tech |
+| **Weibo** | `weibo` | Social Trends |
+| **WallStreetCN** | `wallstreetcn` | Finance |
+| **V2EX** | `v2ex` | Dev Community |
+
diff --git a/SKILL.md b/SKILL.md
new file mode 100644
index 0000000..236b77e
--- /dev/null
+++ b/SKILL.md
@@ -0,0 +1,95 @@
+---
+name: news-aggregator-skill
+description: "Comprehensive news aggregator that fetches, filters, and deep-analyzes real-time content from 8 major sources (Hacker News, GitHub Trending, Product Hunt, 36Kr, Tencent News, WallStreetCN, V2EX, and Weibo). Best for daily scans, tech news briefings, finance updates, and deep dives into hot topics."
+---
+
+# News Aggregator Skill
+
+Fetch real-time hot news from multiple sources.
+
+## Tools
+
+### fetch_news.py
+
+**Usage:**
+
+### Single Source (Limit 10)
+```bash
+python3 scripts/fetch_news.py --source hackernews --limit 10
+```
+
+### Global Scan (Option 12) - **Broad Fetch Strategy**
+> **NOTE**: This strategy is specifically for the "Global Scan" scenario, where we want to catch every trend.
+
+```bash
+# 1. Fetch broadly (a large pool for semantic filtering)
+python3 scripts/fetch_news.py --source all --limit 15 --deep
+
+# 2. SEMANTIC FILTERING:
+# The agent manually filters the broad list (approx. 120 items) for the user's topics.
+```
+
+### Single Source & Combinations (Smart Keyword Expansion)
+**CRITICAL**: You MUST automatically expand the user's simple keywords to cover the entire domain field.
+* User: "AI" -> Agent uses: `--keyword "AI,LLM,GPT,Claude,Generative,Machine Learning,RAG,Agent"` +* User: "Android" -> Agent uses: `--keyword "Android,Kotlin,Google,Mobile,App"` +* User: "Finance" -> Agent uses: `--keyword "Finance,Stock,Market,Economy,Crypto,Gold"` + +```bash +# Example: User asked for "AI news from HN" (Note the expanded keywords) +python3 scripts/fetch_news.py --source hackernews --limit 20 --keyword "AI,LLM,GPT,DeepSeek,Agent" --deep +``` + +### Specific Keyword Search +Only use `--keyword` for very specific, unique terms (e.g., "DeepSeek", "OpenAI"). +```bash +python3 scripts/fetch_news.py --source all --limit 10 --keyword "DeepSeek" --deep +``` + +**Arguments:** + +- `--source`: One of `hackernews`, `weibo`, `github`, `36kr`, `producthunt`, `v2ex`, `tencent`, `wallstreetcn`, `all`. +- `--limit`: Max items per source (default 10). +- `--keyword`: Comma-separated filters (e.g. "AI,GPT"). +- `--deep`: **[NEW]** Enable deep fetching. Downloads and extracts the main text content of the articles. + +**Output:** +JSON array. If `--deep` is used, items will contain a `content` field associated with the article text. + +## Interactive Menu + +When the user says **"news-aggregator-skill 如意如意"** (or similar "menu/help" triggers): +1. **READ** the content of `templates.md` in the skill directory. +2. **DISPLAY** the list of available commands to the user exactly as they appear in the file. +3. **GUIDE** the user to select a number or copy the command to execute. + +### Smart Time Filtering & Reporting (CRITICAL) +If the user requests a specific time window (e.g., "past X hours") and the results are sparse (< 5 items): +1. **Prioritize User Window**: First, list all items that strictly fall within the user's requested time (Time < X). +2. **Smart Fill**: If the list is short, you MUST include high-value/high-heat items from a wider range (e.g. past 24h) to ensure the report provides at least 5 meaningful insights. +2. 
**Annotation**: Clearly mark these older items (e.g., "⚠️ 18h ago", "🔥 24h Hot") so the user knows they are supplementary. +3. **High Value**: Always prioritize "SOTA", "Major Release", or "High Heat" items even if they slightly exceed the time window. +4. **GitHub Trending Exception**: For purely list-based sources like **GitHub Trending**, strictly return the valid items from the fetched list (e.g. Top 10). **List ALL fetched items**. Do **NOT** perform "Smart Fill". + * **Deep Analysis (Required)**: For EACH item, you **MUST** leverage your AI capabilities to analyze: + * **Core Value (核心价值)**: What specific problem does it solve? Why is it trending? + * **Inspiration (启发思考)**: What technical or product insights can be drawn? + * **Scenarios (场景标签)**: 3-5 keywords (e.g. `#RAG #LocalFirst #Rust`). + +### 6. Response Guidelines (CRITICAL) + +**Format & Style:** +- **Language**: Simplified Chinese (简体中文). +- **Style**: Magazine/Newsletter style (e.g., "The Economist" or "Morning Brew" vibe). Professional, concise, yet engaging. +- **Structure**: + - **Global Headlines**: Top 3-5 most critical stories across all domains. + - **Tech & AI**: Specific section for AI, LLM, and Tech items. + - **Finance / Social**: Other strong categories if relevant. +- **Item Format**: + - **Title**: **MUST be a Markdown Link** to the original URL. + - ✅ Correct: `### 1. [OpenAI Releases GPT-5](https://...)` + - ❌ Incorrect: `### 1. OpenAI Releases GPT-5` + - **Metadata Line**: Must include Source, **Time/Date**, and Heat/Score. + - **1-Liner Summary**: A punchy, "so what?" summary. + - **Deep Interpretation (Bulleted)**: 2-3 bullet points explaining *why* this matters, technical details, or context. (Required for "Deep Scan"). + +**Output Artifact:** +- Always save the full report to `reports/` directory with a timestamped filename (e.g., `reports/hn_news_YYYYMMDD_HHMM.md`). +- Present the full report content to the user in the chat. 
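The item-format rules above can be sketched as a tiny renderer. This is an illustrative sketch only, not part of the skill: it assumes the JSON shape that `fetch_news.py` prints (`source`, `title`, `url`, `heat`, `time` fields), and the `render_item` helper is hypothetical.

```python
import json

def render_item(index, item):
    """Render one fetched item per the rules above:
    a Markdown-link title, then a Source | Time | Heat metadata line."""
    headline = f"### {index}. [{item['title']}]({item['url']})"
    meta = f"*{item['source']} | {item.get('time', '')} | {item.get('heat', '')}*"
    return headline + "\n" + meta

# Example input in the shape emitted by fetch_news.py
raw = ('[{"source": "Hacker News", "title": "Example Story", '
       '"url": "https://example.com", "heat": "120 points", "time": "3 hours ago"}]')
items = json.loads(raw)
print(render_item(1, items[0]))
```

The helper only covers the ✅ title and metadata lines; the agent itself composes the rest of the report (headlines, sections, deep interpretation).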
diff --git a/_meta.json b/_meta.json
new file mode 100644
index 0000000..dd5e8b0
--- /dev/null
+++ b/_meta.json
@@ -0,0 +1,6 @@
+{
+  "ownerId": "kn70rxpkwjvb2873k5x3cwyfzn7zys6f",
+  "slug": "news-aggregator-skill",
+  "version": "0.1.0",
+  "publishedAt": 1769420683107
+}
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..1190bd8
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,2 @@
+requests
+beautifulsoup4
diff --git a/scripts/fetch_news.py b/scripts/fetch_news.py
new file mode 100644
index 0000000..c1a64db
--- /dev/null
+++ b/scripts/fetch_news.py
@@ -0,0 +1,323 @@
+import argparse
+import json
+import requests
+from bs4 import BeautifulSoup
+import sys
+import time
+import re
+import concurrent.futures
+from datetime import datetime
+
+# Headers for scraping to avoid basic bot detection
+HEADERS = {
+    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
+}
+
+def filter_items(items, keyword=None):
+    if not keyword:
+        return items
+    keywords = [k.strip() for k in keyword.split(',') if k.strip()]
+    # Use lookarounds instead of \b so keywords also match when adjacent to
+    # CJK characters (e.g. "AI" inside "AI新闻"): CJK characters count as
+    # word characters in Python's re, so \b would never fire there.
+    pattern = '|'.join(r'(?<![A-Za-z0-9])' + re.escape(k) + r'(?![A-Za-z0-9])' for k in keywords)
+    regex = r'(?i)(' + pattern + r')'
+    return [item for item in items if re.search(regex, item['title'])]
+
+def fetch_url_content(url):
+    """
+    Fetches a URL and extracts its visible text content.
+    Truncates to 3000 characters.
+    """
+    if not url or not url.startswith('http'):
+        return ""
+    try:
+        response = requests.get(url, headers=HEADERS, timeout=5)
+        response.raise_for_status()
+        soup = BeautifulSoup(response.content, 'html.parser')
+        # Remove script, style, and chrome elements
+        for tag in soup(["script", "style", "nav", "footer", "header"]):
+            tag.extract()
+        # Get text and collapse whitespace
+        text = soup.get_text(separator=' ', strip=True)
+        lines = (line.strip() for line in text.splitlines())
+        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
+        text = ' '.join(chunk for chunk in chunks if chunk)
+        return text[:3000]
+    except Exception:
+        return ""
+
+def enrich_items_with_content(items, max_workers=10):
+    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
+        future_to_item = {executor.submit(fetch_url_content, item['url']): item for item in items}
+        for future in concurrent.futures.as_completed(future_to_item):
+            item = future_to_item[future]
+            try:
+                content = future.result()
+                if content:
+                    item['content'] = content
+            except Exception:
+                item['content'] = ""
+    return items
+
+# --- Source Fetchers ---
+
+def fetch_hackernews(limit=5, keyword=None):
+    base_url = "https://news.ycombinator.com"
+    news_items = []
+    page = 1
+    max_pages = 5
+
+    while len(news_items) < limit and page <= max_pages:
+        url = f"{base_url}/news?p={page}"
+        try:
+            response = requests.get(url, headers=HEADERS, timeout=10)
+            if response.status_code != 200:
+                break
+        except requests.RequestException:
+            break
+
+        soup = BeautifulSoup(response.text, 'html.parser')
+        rows = soup.select('.athing')
+        if not rows:
+            break
+
+        page_items = []
+        for row in rows:
+            try:
+                id_ = row.get('id')
+                title_line = row.select_one('.titleline a')
+                if not title_line:
+                    continue
+                title = title_line.get_text()
+                link = title_line.get('href')
+
+                # Metadata
+                score_span = soup.select_one(f'#score_{id_}')
+                score = score_span.get_text() if score_span else "0 points"
+
+                # Age/Time
+                age_span = soup.select_one(f'.age a[href="item?id={id_}"]')
+                time_str = age_span.get_text() if age_span else ""
+
+                if link and link.startswith('item?id='):
+                    link = f"{base_url}/{link}"
+
+                page_items.append({
+                    "source": "Hacker News",
+                    "title": title,
+                    "url": link,
+                    "heat": score,
+                    "time": time_str
+                })
+            except Exception:
+                continue
+
+        news_items.extend(filter_items(page_items, keyword))
+        if len(news_items) >= limit:
+            break
+        page += 1
+        time.sleep(0.5)
+
+    return news_items[:limit]
+
+def fetch_weibo(limit=5, keyword=None):
+    # Use the PC Ajax API, which returns JSON directly and is less rate-limited
+    # than scraping s.weibo.com
+    url = "https://weibo.com/ajax/side/hotSearch"
+    headers = {
+        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+        "Referer": "https://weibo.com/"
+    }
+
+    try:
+        response = requests.get(url, headers=headers, timeout=10)
+        data = response.json()
+        items = data.get('data', {}).get('realtime', [])
+
+        all_items = []
+        for item in items:
+            # The key 'note' is usually the title, sometimes 'word'
+            title = item.get('note', '') or item.get('word', '')
+            if not title:
+                continue
+
+            # 'num' is the heat value
+            heat = item.get('num', 0)
+
+            # Construct a search URL, matching the web UI:
+            # https://s.weibo.com/weibo?q=%23TITLE%23&Refer=top
+            full_url = f"https://s.weibo.com/weibo?q={requests.utils.quote(title)}&Refer=top"
+
+            all_items.append({
+                "source": "Weibo Hot Search",
+                "title": title,
+                "url": full_url,
+                "heat": f"{heat}",
+                "time": "Real-time"
+            })
+
+        return filter_items(all_items, keyword)[:limit]
+    except Exception:
+        return []
+
+def fetch_github(limit=5, keyword=None):
+    try:
+        response = requests.get("https://github.com/trending", headers=HEADERS, timeout=10)
+    except requests.RequestException:
+        return []
+
+    soup = BeautifulSoup(response.text, 'html.parser')
+    items = []
+    for article in soup.select('article.Box-row'):
+        try:
+            h2 = article.select_one('h2 a')
+            if not h2:
+                continue
+            title = h2.get_text(strip=True).replace('\n', '').replace(' ', '')
+            link = "https://github.com" + h2['href']
+
+            desc = article.select_one('p')
+            desc_text = desc.get_text(strip=True) if desc else ""
+
+            # Stars (heat): the stargazers link holds the star count
+            stars_tag = article.select_one('a[href$="/stargazers"]')
+            stars = stars_tag.get_text(strip=True) if stars_tag else ""
+
+            items.append({
+                "source": "GitHub Trending",
+                "title": f"{title} - {desc_text}",
+                "url": link,
+                "heat": f"{stars} stars",
+                "time": "Today"
+            })
+        except Exception:
+            continue
+    return filter_items(items, keyword)[:limit]
+
+def fetch_36kr(limit=5, keyword=None):
+    try:
+        response = requests.get("https://36kr.com/newsflashes", headers=HEADERS, timeout=10)
+        soup = BeautifulSoup(response.text, 'html.parser')
+        items = []
+        for item in soup.select('.newsflash-item'):
+            title_tag = item.select_one('.item-title')
+            if not title_tag:
+                continue
+            title = title_tag.get_text(strip=True)
+            href = title_tag['href']
+            time_tag = item.select_one('.time')
+            time_str = time_tag.get_text(strip=True) if time_tag else ""
+
+            items.append({
+                "source": "36Kr",
+                "title": title,
+                "url": f"https://36kr.com{href}" if not href.startswith('http') else href,
+                "time": time_str,
+                "heat": ""
+            })
+        return filter_items(items, keyword)[:limit]
+    except Exception:
+        return []
+
+def fetch_v2ex(limit=5, keyword=None):
+    try:
+        # Hot topics JSON API
+        data = requests.get("https://www.v2ex.com/api/topics/hot.json", headers=HEADERS, timeout=10).json()
+        items = []
+        for t in data:
+            # V2EX API fields: 'replies' serves as the heat metric
+            replies = t.get('replies', 0)
+            items.append({
+                "source": "V2EX",
+                "title": t['title'],
+                "url": t['url'],
+                "heat": f"{replies} replies",
+                "time": "Hot"
+            })
+        return filter_items(items, keyword)[:limit]
+    except Exception:
+        return []
+
+def fetch_tencent(limit=5, keyword=None):
+    try:
+        url = "https://i.news.qq.com/web_backend/v2/getTagInfo?tagId=aEWqxLtdgmQ%3D"
+        data = requests.get(url, headers={"Referer": "https://news.qq.com/"}, timeout=10).json()
+        items = []
+        for news in data['data']['tabs'][0]['articleList']:
+            items.append({
+                "source": "Tencent News",
+                "title": news['title'],
+                "url": news.get('url') or news.get('link_info', {}).get('url'),
+                "time": news.get('pub_time', '') or news.get('publish_time', '')
+            })
+        return filter_items(items, keyword)[:limit]
+    except Exception:
+        return []
+
+def fetch_wallstreetcn(limit=5, keyword=None):
+    try:
+        url = "https://api-one.wallstcn.com/apiv1/content/information-flow?channel=global-channel&accept=article&limit=30"
+        data = requests.get(url, timeout=10).json()
+        items = []
+        for item in data['data']['items']:
+            res = item.get('resource')
+            if res and (res.get('title') or res.get('content_short')):
+                ts = res.get('display_time', 0)
+                time_str = datetime.fromtimestamp(ts).strftime('%H:%M') if ts else ""
+                items.append({
+                    "source": "Wall Street CN",
+                    "title": res.get('title') or res.get('content_short'),
+                    "url": res.get('uri'),
+                    "time": time_str
+                })
+        return filter_items(items, keyword)[:limit]
+    except Exception:
+        return []
+
+def fetch_producthunt(limit=5, keyword=None):
+    try:
+        # Use the RSS feed for speed and reliability without an API key
+        response = requests.get("https://www.producthunt.com/feed", headers=HEADERS, timeout=10)
+        soup = BeautifulSoup(response.text, 'xml')
+        if not soup.find('item'):
+            soup = BeautifulSoup(response.text, 'html.parser')
+
+        items = []
+        for entry in soup.find_all(['item', 'entry']):
+            title_tag = entry.find('title')
+            if not title_tag:
+                continue
+            title = title_tag.get_text(strip=True)
+            link_tag = entry.find('link')
+            url = (link_tag.get('href') or link_tag.get_text(strip=True)) if link_tag else ""
+
+            pub_tag = entry.find('pubDate') or entry.find('published')
+            pub = pub_tag.get_text(strip=True) if pub_tag else ""
+
+            items.append({
+                "source": "Product Hunt",
+                "title": title,
+                "url": url,
+                "time": pub,
+                "heat": "Top Product"  # the RSS feed implies a top ranking
+            })
+        return filter_items(items, keyword)[:limit]
+    except Exception:
+        return []
+
+def main():
+    parser = argparse.ArgumentParser()
+    sources_map = {
+        'hackernews': fetch_hackernews, 'weibo': fetch_weibo, 'github': fetch_github,
+        '36kr': fetch_36kr, 'v2ex': fetch_v2ex, 'tencent': fetch_tencent,
+        'wallstreetcn': fetch_wallstreetcn, 'producthunt': fetch_producthunt
+    }
+
+    parser.add_argument('--source', default='all', help='Source(s) to fetch from (comma-separated)')
+    parser.add_argument('--limit', type=int, default=10, help='Limit per source. Default 10')
+    parser.add_argument('--keyword', help='Comma-separated keyword filter')
+    parser.add_argument('--deep', action='store_true', help='Download article content for detailed summarization')
+
+    args = parser.parse_args()
+
+    to_run = []
+    if args.source == 'all':
+        to_run = list(sources_map.values())
+    else:
+        requested_sources = [s.strip() for s in args.source.split(',')]
+        for s in requested_sources:
+            if s in sources_map:
+                to_run.append(sources_map[s])
+
+    results = []
+    for func in to_run:
+        try:
+            results.extend(func(args.limit, args.keyword))
+        except Exception:
+            pass
+
+    if args.deep and results:
+        sys.stderr.write(f"Deep fetching content for {len(results)} items...\n")
+        results = enrich_items_with_content(results)
+
+    print(json.dumps(results, indent=2, ensure_ascii=False))
+
+if __name__ == "__main__":
+    main()
diff --git a/templates.md b/templates.md
new file mode 100644
index 0000000..dd9601a
--- /dev/null
+++ b/templates.md
@@ -0,0 +1,49 @@
+# 🗞️ News Aggregator Skill Command Menu
+
+Reply with a number (e.g. "1") or copy a command directly to run a task.
+
+## 🎯 Single Source
+
+**1. 🦄 Silicon Valley Highlights (Hacker News)**
+> Use the news-aggregator skill to deep-scan Hacker News: any new AI/LLM developments in the past 5 hours?
+
+**2. 🐙 Open-Source Trends (GitHub Trending)**
+> Use the news-aggregator skill to show the top 10 trending open-source projects on GitHub.
+
+**3. 🚀 Startup Flashes (36Kr)**
+> Use the news-aggregator skill to show the 10 latest tech newsflashes on 36Kr.
+
+**4. 🐧 Tencent Tech (Tencent News)**
+> Use the news-aggregator skill: what are the top 10 stories on Tencent News' tech channel?
+
+**5. 📈 Markets (WallStreetCN)**
+> Use the news-aggregator skill to scan WallStreetCN for the top 10 market updates.
+
+**6. 🔴 Weibo Buzz (Weibo Hot Search)**
+> Use the news-aggregator skill: what are the top 10 topics on Weibo's hot search list?
+
+**7. 🐱 Product Hunters (Product Hunt)**
+> Use the news-aggregator skill to scan Product Hunt for today's top 10 new products.
+
+**8. 🤓 Geek Community (V2EX)**
+> Use the news-aggregator skill to browse 10 hot topics on V2EX.
+
+---
+
+## 🥊 Combinations
+
+**9. ☕️ Morning Global AI Briefing (Tech & AI)**
+> Use the news-aggregator skill to deep-scan Hacker News and Product Hunt: any major **AI and LLM** releases or new products in the past 24 hours?
+
+**10. 🇨🇳 China Tech Morning Report (China Tech)**
+> Use the news-aggregator skill to check 36Kr and Tencent News: any big stories in China's tech and internet scene today? Pick the 5 most important and summarize them in depth.
+
+**11. 👨‍💻 Geeks & Open Source (Dev & Open Source)**
+> Use the news-aggregator skill: what are the current hot projects and topics on GitHub and V2EX? Any fun open-source tools?
+
+---
+
+## 🌍 God's-Eye View (Global Scan)
+
+**12. 🔥 Web-Wide Sweep (Global Scan)**
+> Use the news-aggregator skill to scan all sources for hot news.
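For reference, each menu entry above ultimately resolves to a `fetch_news.py` invocation. The sketch below is illustrative only: the `MENU_COMMANDS` table and `build_command` helper are hypothetical, though the `--source`, `--limit`, `--keyword`, and `--deep` arguments are the ones documented in SKILL.md.

```python
# Hypothetical mapping from a few menu numbers to fetch_news.py argument lists.
MENU_COMMANDS = {
    1: ["--source", "hackernews", "--limit", "20", "--keyword", "AI,LLM,GPT,Agent", "--deep"],
    2: ["--source", "github", "--limit", "10"],
    12: ["--source", "all", "--limit", "15", "--deep"],  # Global Scan: broad fetch
}

def build_command(option):
    """Return the full argv for a menu option."""
    return ["python3", "scripts/fetch_news.py"] + MENU_COMMANDS[option]

print(" ".join(build_command(12)))
```

In practice the agent composes these commands from the natural-language request rather than from a fixed table; this just makes the menu-to-CLI mapping concrete.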