## Summary
- Decrement document and knowledgebase chunk counts after chunks are
deleted
- Keep token counts unchanged because deleted chunk token totals are not
available
- Add tests for stats update, zero-delete behavior, error handling, and
transaction rollback
### Summary
1. env 'MINIO_PORT' is used for MINIO external access, which shouldn't
be used in Go config.
2. After RAGFlow 1.0 release, MINIO_PORT will be used for docker compose
internal usage. new ENV MINIO_EXTERNAL_PORT will be used for external
access.
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### Summary
Per discussion with @yuzhichang , disable agent test firstly.
https://github.com/infiniflow/ragflow/actions/runs/28562749273/job/84704079689?pr=16521
[0.094ms] [rows:0] SELECT * FROM `tenant_model_instance` WHERE
provider_id = "provider-1" AND instance_name = "default" ORDER BY
`tenant_model_instance`.`id` LIMIT 1
--- FAIL: TestInvoke_ProxyDNSPin (2.00s)
invoke_test.go:375: dial error = Invoke: do: Get "http://8.8.8.8/api":
context deadline exceeded; want pinned proxy IP 192.88.99.1:9999
(connection-refused is acceptable; an absent IP means the dialer fell
through to the default resolver and the pinning regression went
undetected)
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.358ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.283ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.523ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58 ExpectPing will have no effect as monitoring pings
is disabled. Use MonitorPingsOption to enable.
FAIL
FAIL ragflow/internal/agent/component 2.759s
ok ragflow/internal/agent/component/io 0.026s
### What problem does this PR solve?
Part of #15240 (rewriting the RAGFlow API server in Go).
Implements the two public bot endpoints from
`api/apps/restful_apis/bot_api.py`:
- **`GET /api/v1/chatbots/<dialog_id>/info`** (`chatbots_inputs`) —
returns `{title, avatar, prologue, has_tavily_key}` for a dialog the
authenticated tenant owns (tenant match + `status == VALID`), otherwise
`"Authentication error: no access to this chatbot!"`.
- **`GET /api/v1/searchbots/detail`** (`detail_share_embedded`) —
returns search-app detail for a `search_id` the tenant can access.
Permission is checked across the tenant's joined tenants; denial returns
`"Has no permission for this operation."` (operating error, `data:
false`) and a missing app returns `"Can't find this Search App!"`.
Both endpoints authenticate with an SDK **beta token** (`Authorization:
Bearer <beta>`) rather than a session — the token is resolved to a
tenant via `APIToken.query(beta=token)`, backed by a new
`APITokenDAO.GetByBeta`. Because they perform their own token-based
auth, the routes are registered on the unauthenticated route group
(mirroring the Python blueprint, which has no `@login_required`).
Both live in a new `internal/handler/bot.go` + `internal/service/bot.go`
since they share the same source module. Handler unit tests cover the
auth, success, and error-mapping paths.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ling Qin <qinling0210@163.com>
## Summary
Fix error messages in `build.sh` and add documentation in
`internal/development.md` for downloading native static libraries
(pdfium, pdf_oxide, office_oxide).
## Changes
- `build.sh`: change error hint from `uv run download_deps.py` to `uv
run ragflow_deps/download_deps.py` (correct path from project root)
- `internal/development.md`: add section 2.1 documenting how to download
native libs and install lld
## Summary
- use the project-standard 32-character ID generator when creating
shared chatbot sessions
- fix MySQL insert failures caused by writing 36-character UUID strings
into `api_4_conversation.id`
### Summary
Adopt sync.WaitGroup.Go (Go 1.25) to simplify tracked goroutine
spawning. This replaces the error-prone trio of wg.Add(1), go func(),
and defer wg.Done() with a single, self-contained call.
More info: https://github.com/golang/go/issues/63796
Signed-off-by: grandpig <grandpig@outlook.com>
### Summary
This PR fixes a Go backend bug where updating agent settings, such as
description, could clear the agent DSL.
Root cause:
PUT /api/v1/agents/:canvas_id only bound the dsl field in Go. When the
frontend submitted settings without dsl, the service still updated the
canvas with an empty DSL value.
Changes:
- Treat agent updates as partial patches.
- Preserve existing DSL when dsl is not present in the request.
- Update only specified user_canvas fields instead of saving the full
row.
- Add a regression test for settings updates preserving DSL.
Test:
`go test ./internal/service ./internal/handler`
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
- align the Go `/api/v1/thumbnails` endpoint with the frontend request
format for repeated `doc_ids`
- return thumbnail mappings for multiple documents instead of failing on
a single missing document
- preserve Python-compatible thumbnail formatting, including base64
thumbnail passthrough
### Summary
Fixes a bug in the Go chunk list handler where the available` query
parser rejected `false` and `0` even though they were documented as
supported values.`
This caused requests from the "Disabled" chunk filter to return HTTP 400
and broke the chunk list page when filtering disabled chunks.
### Summary
As title
This PR fixes dataset index task creation failing with unsupported data
type: entity.JSONMap when loading document chunking config.
#### issues:
```
2026/06/30 15:19:40 /home/infiniflow/Documents/development/ragflow/internal/dao/document.go:162
[error] unsupported data type: ragflow/internal/entity.JSONMap
```
#### Changes:
+ Adds the missing GORM type:longtext tag to ParserConfig in
DocumentDAO.GetChunkingConfig.
+ Adds a DAO regression test covering GetChunkingConfig joins across
document, knowledgebase, and tenant while scanning parser_config.
### Summary
As title
Before, it return `update success` but never insert or update any
metadata
fixed:
```go
_, err = s.docEngine.InsertMetadata(nil, []map[string]interface{}{
{
"id": docID,
"kb_id": doc.KbID,
"meta_fields": meta,
},
}, tenantID)
```
### What problem does this PR solve?
Closes#12962
MCPToolCallSessions created during agent execution (in `Agent.__init__`)
are never explicitly closed. Each session starts its own event loop
thread and opens an SSE/HTTP connection to the MCP server. When the
canvas goes out of scope, these threads and connections remain alive
indefinitely, accumulating over time and causing resource exhaustion
after prolonged use.
### Solution
1. Add a `Graph.close()` method that iterates all components, finds
MCPToolCallSessions held by Agent tools, and calls `close_sync()` on
each to properly shut down the event loop, thread, and connection.
2. Call `canvas.close()` in `finally` blocks after `canvas.run()`
completes in `canvas_service.py` and `canvas_app.py`.
3. Move MCP session cleanup to `finally` blocks in `test_tool` endpoint
(`mcp_server_app.py`) and `get_mcp_tools` (`api_utils.py`) to ensure
sessions are closed even on exceptions.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: conflict-resolver <conflict-resolver@local>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Replace fragile wall-clock timeout assertions with semantic checks for
deadline errors, retry suppression, and event ordering. Keep only
lower-bound timing checks where they prove backoff behavior. This
reduces CPU-load flakes without weakening regression coverage.
### Summary
This PR fixes incorrect dataset document counters in the Go service.
Several document creation paths inserted document records directly
through documentDAO.Create, bypassing the shared InsertDocument logic
that increments knowledgebase.doc_num. As a result, datasets could
contain documents while doc_num remained 0.
### Summary
```
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id> DELETE
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id>/feedback PUT
```
Migrates the chat session message delete and feedback APIs to the Go
server, matching the Python behavior for authorization, session
ownership checks, message/reference updates, and feedback validation.
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe):
## Summary
Agent (Canvas) runs previously did not surface token usage in the SSE
stream, and RAGFlow's own Langfuse generations for agent runs were
missing the prompt/completion split and the session/user correlation.
This made it impossible for an external caller (or Langfuse) to
reconcile an agent turn's cost with the upstream provider (e.g.
OpenRouter), because a single turn can issue several distinct LLM calls
(query rewriting / cross-language translation, multi-round tool
reasoning, nested sub-agents, and the final answer).
This PR introduces a per-run token usage sink so that **every** LLM call
in a run is aggregated and reported once, and enriches Langfuse
generations with the prompt/completion split plus session/user
attributes.
## What changes
### 1. Per-run token usage sink (`common/token_utils.py`)
- Adds two `contextvars`: `token_usage_sink` (a mutable per-run
accumulator) and `langfuse_run_attrs` (session_id/user_id for the run).
- Adds `record_run_token_usage(...)` (thread-safe via a lock, because
`thread_pool_exec` copies the context into worker threads that share the
sink dict) and `usage_from_response(...)` which extracts a
`{prompt_tokens, completion_tokens, total_tokens}` split from
OpenAI/OpenRouter-style responses.
### 2. Provider layer captures the prompt/completion split
(`rag/llm/chat_model.py`)
- `LiteLLMBase` and `Base` now store `self.last_usage`
(prompt/completion/total) for the most recent chat call, in both the
plain and tool-calling paths.
- Streaming requests set `stream_options.include_usage = True` (LiteLLM
path) so the authoritative usage arrives on the final chunk; this is
read even on the usage-only chunk that carries no `choices`.
- Fixes a multi-round accounting bug in `*_with_tools`: token totals
were **overwritten** by each round (`total_tokens = tol`) instead of
accumulated, undercounting multi-round tool conversations. Each round is
now committed to a running aggregate.
### 3. LLMBundle reports usage once, per call
(`api/db/services/llm_service.py`)
- New `_report_usage(total_tokens)` records the call's usage into the
active run sink and returns the prompt/completion/total split for
Langfuse. The split is only used when it is consistent with the
authoritative total; otherwise only the total is reported.
- All three chat entry points (`async_chat`, `async_chat_streamly`,
`async_chat_streamly_delta`) now emit `usage_details` with
`input`/`output`/`total` instead of total-only.
- `_start_langfuse_observation` now applies `session_id`/`user_id` from
the per-run context (`langfuse_run_attrs`) so agent-run generations are
correctly grouped, even though agent LLMBundles are constructed without
those attributes.
### 4. Canvas installs the sink and emits the aggregate
(`agent/canvas.py`)
- `Canvas.run()` installs a fresh `token_usage_sink` and
`langfuse_run_attrs` (from `user_id`/`session_id`) at the start of every
turn.
- `message_end` now includes an aggregated `usage` object:
`{prompt_tokens, completion_tokens, total_tokens, calls}` covering all
LLM calls in the run.
### 5. Pass session id into the run
(`api/db/services/canvas_service.py`)
- `completion()` forwards `session_id` to `Canvas.run()` for Langfuse
session correlation.
## Why a context variable
LLM calls in an agent run originate from many places that each build
their own `LLMBundle` (e.g. `cross_languages`/`keyword_extraction`
helpers, the Agent component, and nested sub-agents invoked as tools). A
run-scoped context variable is the only non-invasive chokepoint that
captures all of them exactly once, including nested agents (which run in
the same async context) and thread-pool tools (the executor copies the
context).
## Behavior / compatibility
- No public API or wire-format removal: `message_end` gains an
additional optional `usage` field; existing consumers are unaffected.
- When a provider does not return authoritative usage, behavior falls
back to the previous token estimate (total only, no split).
- Non-agent flows (Dataflow `Pipeline`, sync `Graph.run`) are untouched.
## Testing
- [x] Simple agent answer: `message_end.usage.total_tokens` matches
provider usage.
- [x] Agent with cross-language retrieval: aggregate equals the sum of
both provider calls.
- [x] Tool-calling agent (multi-round): total accumulates across rounds.
- [x] Nested agent (agent-as-tool): sub-agent tokens included in the
parent run total.
- [x] Langfuse: agent generations show input/output split and are
grouped by session/user.
---------
Co-authored-by: yzc <yuzhichang@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
## Problem
When building or updating a knowledge graph with a large number of
entities and edges, `set_graph()` in `rag/graphrag/utils.py` creates one
`asyncio` task per entity and one per edge, each calling
`embd_mdl.encode([single_name])` — a single-item HTTP request to the
embedding server.
For a graph with 17,000+ nodes and edges (real case reported in #16205),
this generates **34,000+ individual embedding API round-trips** instead
of ~266 batched calls at the default `_INSERT_BULK_SIZE=128`. The
asyncio gather over thousands of tasks makes the embedding server the
bottleneck; under load, a single slow/failed call aborts all remaining
tasks, causing the pipeline to stall and never complete.
Closes#16205. Related: #15921.
## Root Cause
```python
# Before (in set_graph, node loop):
tasks = [asyncio.create_task(graph_node_to_chunk(n, ...)) for n in nodes]
# Each task calls embd_mdl.encode([single_name]) — 1 HTTP call per node
```
`graph_node_to_chunk` checks the embed cache first, but the cache is
cold on first build, so every task makes a live API call.
## Fix
Pre-warm the embedding cache with batched calls before spawning tasks.
Each batch pre-warm calls `embd_mdl.encode(batch_of_128)` once,
populating the cache. Then every individual task hits the cache and
makes zero embedding API calls.
- Only encodes names not already in cache (no-op on warm cache / small
incremental updates)
- Uses existing project idioms: `thread_pool_exec`, `chat_limiter`,
`_INSERT_BULK_SIZE`, `get_embed_cache`, `set_embed_cache`
- Mirrors the `ENABLE_TIMEOUT_ASSERTION` timeout pattern from
`graph_node_to_chunk`
- Zero behavior change: per-task encode logic remains as a correct
fallback
## Result
| Graph size | Before | After |
|---|---|---|
| 17,576 edges | ~17,576 embedding calls → stall | ~138 batched calls |
| 17,509 nodes | ~17,509 embedding calls → stall | ~137 batched calls |
| **Total** | **~35,000 calls** | **~275 calls** |
---------
Co-authored-by: Oti_B <oti@mac.speedport.ip>
### Summary
The addition of the Context method to Go's testing.T provides
significant improvements for writing concurrent tests. It allows better
management of goroutines, ensuring they properly exit and preventing
issues like deadlocks and unfinished processes.
By using Context, errors and cancellations can be handled more
effectively, making tests more robust and easier to reason about. This
change also enables tighter integration between tests and the
application code, especially for systems that span multiple concurrent
components. Overall, it simplifies test code and enhances test stability
and maintainability.
More info: [golang/go#18368](https://github.com/golang/go/issues/18368)
Signed-off-by: blackflytech <blackflytech@outlook.com>