Commit Graph

7 Commits

Author SHA1 Message Date
Öndery
742188c3bb feat(agent): report accurate aggregated token usage and propagate session/user + input/output to Langfuse for agent runs (#16420)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe):

## Summary

Agent (Canvas) runs previously did not surface token usage in the SSE
stream, and RAGFlow's own Langfuse generations for agent runs were
missing the prompt/completion split and the session/user correlation.
This made it impossible for an external caller (or Langfuse) to
reconcile an agent turn's cost with the upstream provider (e.g.
OpenRouter), because a single turn can issue several distinct LLM calls
(query rewriting / cross-language translation, multi-round tool
reasoning, nested sub-agents, and the final answer).

This PR introduces a per-run token usage sink so that **every** LLM call
in a run is aggregated and reported once, and enriches Langfuse
generations with the prompt/completion split plus session/user
attributes.

## What changes

### 1. Per-run token usage sink (`common/token_utils.py`)

- Adds two `contextvars`: `token_usage_sink` (a mutable per-run
accumulator) and `langfuse_run_attrs` (session_id/user_id for the run).
- Adds `record_run_token_usage(...)` (thread-safe via a lock, because
`thread_pool_exec` copies the context into worker threads that share the
sink dict) and `usage_from_response(...)` which extracts a
`{prompt_tokens, completion_tokens, total_tokens}` split from
OpenAI/OpenRouter-style responses.

### 2. Provider layer captures the prompt/completion split
(`rag/llm/chat_model.py`)

- `LiteLLMBase` and `Base` now store `self.last_usage`
(prompt/completion/total) for the most recent chat call, in both the
plain and tool-calling paths.
- Streaming requests set `stream_options.include_usage = True` (LiteLLM
path) so the authoritative usage arrives on the final chunk; this is
read even on the usage-only chunk that carries no `choices`.
- Fixes a multi-round accounting bug in `*_with_tools`: token totals
were **overwritten** by each round (`total_tokens = tol`) instead of
accumulated, undercounting multi-round tool conversations. Each round is
now committed to a running aggregate.

### 3. LLMBundle reports usage once, per call
(`api/db/services/llm_service.py`)

- New `_report_usage(total_tokens)` records the call's usage into the
active run sink and returns the prompt/completion/total split for
Langfuse. The split is only used when it is consistent with the
authoritative total; otherwise only the total is reported.
- All three chat entry points (`async_chat`, `async_chat_streamly`,
`async_chat_streamly_delta`) now emit `usage_details` with
`input`/`output`/`total` instead of total-only.
- `_start_langfuse_observation` now applies `session_id`/`user_id` from
the per-run context (`langfuse_run_attrs`) so agent-run generations are
correctly grouped, even though agent LLMBundles are constructed without
those attributes.

### 4. Canvas installs the sink and emits the aggregate
(`agent/canvas.py`)

- `Canvas.run()` installs a fresh `token_usage_sink` and
`langfuse_run_attrs` (from `user_id`/`session_id`) at the start of every
turn.
- `message_end` now includes an aggregated `usage` object:
`{prompt_tokens, completion_tokens, total_tokens, calls}` covering all
LLM calls in the run.

### 5. Pass session id into the run
(`api/db/services/canvas_service.py`)

- `completion()` forwards `session_id` to `Canvas.run()` for Langfuse
session correlation.

## Why a context variable

LLM calls in an agent run originate from many places that each build
their own `LLMBundle` (e.g. `cross_languages`/`keyword_extraction`
helpers, the Agent component, and nested sub-agents invoked as tools). A
run-scoped context variable is the only non-invasive chokepoint that
captures all of them exactly once, including nested agents (which run in
the same async context) and thread-pool tools (the executor copies the
context).

## Behavior / compatibility

- No public API or wire-format removal: `message_end` gains an
additional optional `usage` field; existing consumers are unaffected.
- When a provider does not return authoritative usage, behavior falls
back to the previous token estimate (total only, no split).
- Non-agent flows (Dataflow `Pipeline`, sync `Graph.run`) are untouched.

## Testing
- [x] Simple agent answer: `message_end.usage.total_tokens` matches
provider usage.
- [x] Agent with cross-language retrieval: aggregate equals the sum of
both provider calls.
- [x] Tool-calling agent (multi-round): total accumulates across rounds.
- [x] Nested agent (agent-as-tool): sub-agent tokens included in the
parent run total.
- [x] Langfuse: agent generations show input/output split and are
grouped by session/user.

---------

Co-authored-by: yzc <yuzhichang@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-07-02 09:35:28 +08:00
jony376
8fb692f10a fix(agent): enforce document access on POST /api/v1/agents/rerun (#15145)
## Related issues

Closes #15144

### What problem does this PR solve?

`POST /api/v1/agents/rerun` loaded a pipeline operation log by UUID via
`PipelineOperationLogService.get_documents_info` with no authorization,
then wiped chunks, reset document counters, deleted tasks, and re-queued
dataflow for the victim document.

Any authenticated user who knew a victim's pipeline log id could disrupt
parsing on documents they did not own.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Changes

| File | Change |
|------|--------|
| `api/apps/restful_apis/agent_api.py` | Call
`DocumentService.accessible(doc["id"], tenant_id)` before destructive
rerun operations; deny with generic `"Document not found."` |
|
`test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py`
| Unit tests: cross-tenant log rejected, missing/unauthorized same
message, authorized rerun proceeds |

### Security notes

- **CWE-639:** Closes cross-tenant pipeline rerun / chunk wipe via
leaked log UUID.
- `tenant_id` from `@add_tenant_id_to_kwargs` is `current_user.id`;
`DocumentService.accessible` covers team-shared KBs.

### Test plan

- [ ] `pytest
test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py`
- [ ] Manual: attacker cannot rerun victim pipeline log id

```bash
cd ragflow
uv run pytest test/unit_test/api/apps/restful_apis/test_rerun_agent_authorization.py -q
```

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-29 09:45:17 +08:00
Tim Wang
f0f10b6092 Fix: UserFillUp interactive forms not working in agent explore mode (#14589)
## Summary

- **Backend**: `_iter_session_completion_events` in `agent_api.py` was
filtering out `user_inputs` and `workflow_finished` SSE events, causing
agents with UserFillUp components to silently fail in explore mode — the
interactive form never appeared, while the same agent worked correctly
in run (editor) mode.
- **Frontend**: `SessionChat` component in explore mode was missing
`DebugContent` children rendering inside `MessageItem`, so even if the
backend forwarded the events, the form UI would not render. Added
`DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and
input-disabling logic to match the run mode's `chat/box.tsx` behavior.

## What was changed

### Backend (`api/apps/restful_apis/agent_api.py`)
- Line 266: Added `"user_inputs"` and `"workflow_finished"` to the
allowed event filter in `_iter_session_completion_events`

### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`)
- Added imports: `DebugContent`, `MarkdownContent`,
`useAwaitCompentData`, `useParams`
- Added `sendFormMessage` from `useSendSessionMessage()` hook
- Added `useAwaitCompentData` hook for form state management
- Added `DebugContent` as `MessageItem` children for the latest
assistant message (renders UserFillUp form)
- Added `MarkdownContent` + submitted values display for previous
assistant messages
- Updated `NextMessageInput` disabled states to respect `isWaitting`
(form submission in progress)

## Test plan

- [x] Agent with UserFillUp component (e.g., email draft with
send/edit/cancel options) shows interactive form in **explore mode**
- [x] Same agent continues to work correctly in **run (editor) mode**
- [x] Form submission sends data back to the agent and workflow
continues
- [x] Input field is disabled while waiting for form submission
- [ ] Agents without UserFillUp components are unaffected in explore
mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-29 09:45:17 +08:00
Yongteng Lei
5d391fb1f9 fix: guard Dashscope response attribute access in token/log utils (#12082)
### What problem does this PR solve?

Guard Dashscope response attribute access in token/log utils, since
`dashscope_response` returns dict like object.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-22 16:17:58 +08:00
Stephen Hu
a63dcfed6f Refactor: improve cohere calculate total counts (#12007)
### What problem does this PR solve?

improve cohere calculate total counts

### Type of change


- [x] Refactoring
2025-12-18 10:04:28 +08:00
张哲芳
ecf0322165 fix(llm): handle None response in total_token_count_from_response (#10941)
### What problem does this PR solve?

Fixes #10933

This PR fixes a `TypeError` in the Gemini model provider where the
`total_token_count_from_response()` function could receive a `None`
response object, causing the error:

TypeError: argument of type 'NoneType' is not iterable

**Root Cause:**
The function attempted to use the `in` operator to check dictionary keys
(lines 48, 54, 60) without first validating that `resp` was not `None`.
When Gemini's `chat_streamly()` method returns `None`, this triggers the
error.

**Solution:**
1. Added a null check at the beginning of the function to return `0` if
`resp is None`
2. Added `isinstance(resp, dict)` checks before all `in` operations to
ensure type safety
3. This defensive programming approach prevents the TypeError while
maintaining backward compatibility

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes Made

**File:** `rag/utils/__init__.py`

- Line 36-38: Added `if resp is None: return 0` check
- Line 52: Added `isinstance(resp, dict)` before `'usage' in resp`
- Line 58: Added `isinstance(resp, dict)` before `'usage' in resp`  
- Line 64: Added `isinstance(resp, dict)` before `'meta' in resp`

### Testing

- [x] Code compiles without errors
- [x] Follows existing code style and conventions
- [x] Change is minimal and focused on the specific issue

### Additional Notes

This fix ensures robust handling of various response types from LLM
providers, particularly Gemini, w

---------

Signed-off-by: Zhang Zhefang <zhangzhefang@example.com>
2025-11-20 10:04:03 +08:00
Jin Hai
360f5c1179 Move token related functions to common (#10942)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-03 08:50:05 +08:00