ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-03 09:11:59 +08:00

Author	SHA1	Message	Date
Öndery	742188c3bb	feat(agent): report accurate aggregated token usage and propagate session/user + input/output to Langfuse for agent runs (#16420 ) ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): ## Summary Agent (Canvas) runs previously did not surface token usage in the SSE stream, and RAGFlow's own Langfuse generations for agent runs were missing the prompt/completion split and the session/user correlation. This made it impossible for an external caller (or Langfuse) to reconcile an agent turn's cost with the upstream provider (e.g. OpenRouter), because a single turn can issue several distinct LLM calls (query rewriting / cross-language translation, multi-round tool reasoning, nested sub-agents, and the final answer). This PR introduces a per-run token usage sink so that every LLM call in a run is aggregated and reported once, and enriches Langfuse generations with the prompt/completion split plus session/user attributes. ## What changes ### 1. Per-run token usage sink (`common/token_utils.py`) - Adds two `contextvars`: `token_usage_sink` (a mutable per-run accumulator) and `langfuse_run_attrs` (session_id/user_id for the run). - Adds `record_run_token_usage(...)` (thread-safe via a lock, because `thread_pool_exec` copies the context into worker threads that share the sink dict) and `usage_from_response(...)` which extracts a `{prompt_tokens, completion_tokens, total_tokens}` split from OpenAI/OpenRouter-style responses. ### 2. Provider layer captures the prompt/completion split (`rag/llm/chat_model.py`) - `LiteLLMBase` and `Base` now store `self.last_usage` (prompt/completion/total) for the most recent chat call, in both the plain and tool-calling paths. 
- Streaming requests set `stream_options.include_usage = True` (LiteLLM path) so the authoritative usage arrives on the final chunk; this is read even on the usage-only chunk that carries no `choices`. - Fixes a multi-round accounting bug in `_with_tools`: token totals were overwritten by each round (`total_tokens = tol`) instead of accumulated, undercounting multi-round tool conversations. Each round is now committed to a running aggregate. ### 3. LLMBundle reports usage once, per call (`api/db/services/llm_service.py`) - New `_report_usage(total_tokens)` records the call's usage into the active run sink and returns the prompt/completion/total split for Langfuse. The split is only used when it is consistent with the authoritative total; otherwise only the total is reported. - All three chat entry points (`async_chat`, `async_chat_streamly`, `async_chat_streamly_delta`) now emit `usage_details` with `input`/`output`/`total` instead of total-only. - `_start_langfuse_observation` now applies `session_id`/`user_id` from the per-run context (`langfuse_run_attrs`) so agent-run generations are correctly grouped, even though agent LLMBundles are constructed without those attributes. ### 4. Canvas installs the sink and emits the aggregate (`agent/canvas.py`) - `Canvas.run()` installs a fresh `token_usage_sink` and `langfuse_run_attrs` (from `user_id`/`session_id`) at the start of every turn. - `message_end` now includes an aggregated `usage` object: `{prompt_tokens, completion_tokens, total_tokens, calls}` covering all LLM calls in the run. ### 5. Pass session id into the run (`api/db/services/canvas_service.py`) - `completion()` forwards `session_id` to `Canvas.run()` for Langfuse session correlation. ## Why a context variable LLM calls in an agent run originate from many places that each build their own `LLMBundle` (e.g. `cross_languages`/`keyword_extraction` helpers, the Agent component, and nested sub-agents invoked as tools). 
A run-scoped context variable is the only non-invasive chokepoint that captures all of them exactly once, including nested agents (which run in the same async context) and thread-pool tools (the executor copies the context). ## Behavior / compatibility - No public API or wire-format removal: `message_end` gains an additional optional `usage` field; existing consumers are unaffected. - When a provider does not return authoritative usage, behavior falls back to the previous token estimate (total only, no split). - Non-agent flows (Dataflow `Pipeline`, sync `Graph.run`) are untouched. ## Testing - [x] Simple agent answer: `message_end.usage.total_tokens` matches provider usage. - [x] Agent with cross-language retrieval: aggregate equals the sum of both provider calls. - [x] Tool-calling agent (multi-round): total accumulates across rounds. - [x] Nested agent (agent-as-tool): sub-agent tokens included in the parent run total. - [x] Langfuse: agent generations show input/output split and are grouped by session/user. --------- Co-authored-by: yzc <yuzhichang@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-07-02 09:35:28 +08:00
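The run-scoped sink described in the commit above can be illustrated with a minimal sketch. It is not the code from `common/token_utils.py`: the names `token_usage_sink` and `record_run_token_usage` are borrowed from the commit message, while `install_sink`, `run_demo`, and the function signatures are assumptions made for illustration. The sketch copies the context into each worker thread explicitly, so all threads share one sink dict and the lock guards the updates.

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

# Names modeled on the commit message; signatures and bodies are assumptions.
token_usage_sink = contextvars.ContextVar("token_usage_sink", default=None)
_sink_lock = threading.Lock()


def install_sink():
    """Install a fresh accumulator for the current run and return it (hypothetical helper)."""
    sink = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "calls": 0}
    token_usage_sink.set(sink)
    return sink


def record_run_token_usage(prompt_tokens, completion_tokens):
    """Add one LLM call's usage to the active sink; do nothing when no run is active."""
    sink = token_usage_sink.get()
    if sink is None:
        return
    # Copied contexts still point at the same dict, so updates must be serialized.
    with _sink_lock:
        sink["prompt_tokens"] += prompt_tokens
        sink["completion_tokens"] += completion_tokens
        sink["total_tokens"] += prompt_tokens + completion_tokens
        sink["calls"] += 1


def run_demo():
    """One call on the run's own thread plus four calls on pool threads."""
    sink = install_sink()
    record_run_token_usage(10, 5)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            # Each task runs inside its own copy of the current context.
            pool.submit(contextvars.copy_context().run, record_run_token_usage, 3, 2)
            for _ in range(4)
        ]
        for future in futures:
            future.result()
    return sink
```

Because every call site reads the same context variable, none of them needs a reference to the run object, which is the non-invasive property the commit message argues for.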
jony376	8fb692f10a	fix(agent): enforce document access on POST /api/v1/agents/rerun (#15145 ) ## Related issues Closes #15144 ### What problem does this PR solve? `POST /api/v1/agents/rerun` loaded a pipeline operation log by UUID via `PipelineOperationLogService.get_documents_info` with no authorization, then wiped chunks, reset document counters, deleted tasks, and re-queued dataflow for the victim document. Any authenticated user who knew a victim's pipeline log id could disrupt parsing on documents they did not own. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/agent_api.py` \| Call `DocumentService.accessible(doc["id"], tenant_id)` before destructive rerun operations; deny with generic `"Document not found."` \| \| `test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py` \| Unit tests: cross-tenant log rejected, missing/unauthorized same message, authorized rerun proceeds \| ### Security notes - CWE-639: Closes cross-tenant pipeline rerun / chunk wipe via leaked log UUID. - `tenant_id` from `@add_tenant_id_to_kwargs` is `current_user.id`; `DocumentService.accessible` covers team-shared KBs. ### Test plan - [ ] `pytest test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py` - [ ] Manual: attacker cannot rerun victim pipeline log id ```bash cd ragflow uv run pytest test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py -q ``` --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:17 +08:00
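The access check described in the commit above follows a common pattern: resolve the target, verify ownership, and only then run destructive steps, returning the same generic message for a missing and an unauthorized document. The sketch below uses hypothetical in-memory stand-ins (`DOC_OWNERS`, `PIPELINE_LOGS`, `accessible`, `rerun`) and does not reproduce `DocumentService.accessible` or the real endpoint.

```python
# Hypothetical in-memory data; RAGFlow's real services and storage are not reproduced here.
DOC_OWNERS = {"doc-1": "tenant-a", "doc-2": "tenant-b"}
PIPELINE_LOGS = {"log-1": "doc-1", "log-2": "doc-2"}

NOT_FOUND = "Document not found."


def accessible(doc_id, tenant_id):
    """Stand-in ownership check: the tenant must own the document."""
    return DOC_OWNERS.get(doc_id) == tenant_id


def rerun(log_id, tenant_id):
    """Return (ok, message); destructive work happens only after the check passes."""
    doc_id = PIPELINE_LOGS.get(log_id)
    # A missing log and an unauthorized document get the same reply,
    # so the response does not reveal whether the id exists.
    if doc_id is None or not accessible(doc_id, tenant_id):
        return False, NOT_FOUND
    # Chunk wipe, counter reset, and re-queue would go here.
    return True, f"rerun queued for {doc_id}"
```

Returning one message for both failure cases keeps a caller from using the endpoint to probe for valid log ids.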
Tim Wang	f0f10b6092	Fix: UserFillUp interactive forms not working in agent explore mode (#14589 ) ## Summary - Backend: `_iter_session_completion_events` in `agent_api.py` was filtering out `user_inputs` and `workflow_finished` SSE events, causing agents with UserFillUp components to silently fail in explore mode — the interactive form never appeared, while the same agent worked correctly in run (editor) mode. - Frontend: `SessionChat` component in explore mode was missing `DebugContent` children rendering inside `MessageItem`, so even if the backend forwarded the events, the form UI would not render. Added `DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and input-disabling logic to match the run mode's `chat/box.tsx` behavior. ## What was changed ### Backend (`api/apps/restful_apis/agent_api.py`) - Line 266: Added `"user_inputs"` and `"workflow_finished"` to the allowed event filter in `_iter_session_completion_events` ### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`) - Added imports: `DebugContent`, `MarkdownContent`, `useAwaitCompentData`, `useParams` - Added `sendFormMessage` from `useSendSessionMessage()` hook - Added `useAwaitCompentData` hook for form state management - Added `DebugContent` as `MessageItem` children for the latest assistant message (renders UserFillUp form) - Added `MarkdownContent` + submitted values display for previous assistant messages - Updated `NextMessageInput` disabled states to respect `isWaitting` (form submission in progress) ## Test plan - [x] Agent with UserFillUp component (e.g., email draft with send/edit/cancel options) shows interactive form in explore mode - [x] Same agent continues to work correctly in run (editor) mode - [x] Form submission sends data back to the agent and workflow continues - [x] Input field is disabled while waiting for form submission - [ ] Agents without UserFillUp components are unaffected in explore mode 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:17 +08:00
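The backend half of the fix above amounts to widening an allow-list of event names. A minimal sketch, assuming events are dicts with an `event` key: only `user_inputs` and `workflow_finished` come from the commit message, and the other entries in `ALLOWED_EVENTS` as well as the function name `iter_allowed_events` are hypothetical.

```python
# Only "user_inputs" and "workflow_finished" are named in the commit;
# the other entries are assumed for illustration.
ALLOWED_EVENTS = {"message", "message_end", "user_inputs", "workflow_finished"}


def iter_allowed_events(events):
    """Yield only the events whose name is on the allow-list."""
    for event in events:
        if event.get("event") in ALLOWED_EVENTS:
            yield event
```

With an allow-list, an event type that is missing from the set is dropped without an error, which matches the silent failure the commit describes.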
Yongteng Lei	5d391fb1f9	fix: guard Dashscope response attribute access in token/log utils (#12082 ) ### What problem does this PR solve? Guard Dashscope response attribute access in the token/log utils, since `dashscope_response` returns a dict-like object. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-22 16:17:58 +08:00
Stephen Hu	a63dcfed6f	Refactor: improve cohere calculate total counts (#12007 ) ### What problem does this PR solve? Improve how total token counts are calculated for Cohere responses. ### Type of change - [x] Refactoring	2025-12-18 10:04:28 +08:00
张哲芳	ecf0322165	fix(llm): handle None response in total_token_count_from_response (#10941 ) ### What problem does this PR solve? Fixes #10933 This PR fixes a `TypeError` in the Gemini model provider where the `total_token_count_from_response()` function could receive a `None` response object, causing the error: TypeError: argument of type 'NoneType' is not iterable Root Cause: The function attempted to use the `in` operator to check dictionary keys (lines 48, 54, 60) without first validating that `resp` was not `None`. When Gemini's `chat_streamly()` method returns `None`, this triggers the error. Solution: 1. Added a null check at the beginning of the function to return `0` if `resp is None` 2. Added `isinstance(resp, dict)` checks before all `in` operations to ensure type safety 3. This defensive programming approach prevents the TypeError while maintaining backward compatibility ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Changes Made File: `rag/utils/__init__.py` - Line 36-38: Added `if resp is None: return 0` check - Line 52: Added `isinstance(resp, dict)` before `'usage' in resp` - Line 58: Added `isinstance(resp, dict)` before `'usage' in resp` - Line 64: Added `isinstance(resp, dict)` before `'meta' in resp` ### Testing - [x] Code compiles without errors - [x] Follows existing code style and conventions - [x] Change is minimal and focused on the specific issue ### Additional Notes This fix ensures robust handling of various response types from LLM providers, particularly Gemini, w --------- Signed-off-by: Zhang Zhefang <zhangzhefang@example.com>	2025-11-20 10:04:03 +08:00
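The defensive checks listed in the commit above can be sketched as follows. This is not the body of `total_token_count_from_response`: the function name `safe_total_tokens` is hypothetical, and the key layout inside `usage` and `meta` is an assumption, since the commit only names the top-level keys.

```python
def safe_total_tokens(resp):
    """Return a total token count, or 0 when the response carries none (hypothetical helper)."""
    # Guard first: the `in` operator on None raises TypeError.
    if resp is None:
        return 0
    if isinstance(resp, dict) and "usage" in resp:
        usage = resp["usage"]
        # Assumed layout: an OpenAI-style usage dict with a total.
        if isinstance(usage, dict) and "total_tokens" in usage:
            return usage["total_tokens"]
    if isinstance(resp, dict) and "meta" in resp:
        meta = resp["meta"]
        # Assumed layout: input/output counts nested under meta["tokens"].
        tokens = meta.get("tokens", {}) if isinstance(meta, dict) else {}
        return tokens.get("input_tokens", 0) + tokens.get("output_tokens", 0)
    return 0
```

Checking `isinstance(resp, dict)` before each membership test means non-dict responses, such as plain strings, fall through to the default of 0 instead of raising.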
Jin Hai	360f5c1179	Move token related functions to common (#10942 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 08:50:05 +08:00

7 Commits