### Summary
Plan to start api_server, admin_server and ingestor in one binary:
- ./ragflow_main --admin
- ./ragflow_main --api
- ./ragflow_main --ingestor
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Decrement document and knowledgebase chunk counts after chunks are
deleted
- Keep token counts unchanged because deleted chunk token totals are not
available
- Add tests for stats update, zero-delete behavior, error handling, and
transaction rollback
### Summary
1. env 'MINIO_PORT' is used for MINIO external access, which shouldn't
be used in Go config.
2. After RAGFlow 1.0 release, MINIO_PORT will be used for docker compose
internal usage. new ENV MINIO_EXTERNAL_PORT will be used for external
access.
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### Summary
Per discussion with @yuzhichang , disable agent test firstly.
https://github.com/infiniflow/ragflow/actions/runs/28562749273/job/84704079689?pr=16521
[0.094ms] [rows:0] SELECT * FROM `tenant_model_instance` WHERE
provider_id = "provider-1" AND instance_name = "default" ORDER BY
`tenant_model_instance`.`id` LIMIT 1
--- FAIL: TestInvoke_ProxyDNSPin (2.00s)
invoke_test.go:375: dial error = Invoke: do: Get "http://8.8.8.8/api":
context deadline exceeded; want pinned proxy IP 192.88.99.1:9999
(connection-refused is acceptable; an absent IP means the dialer fell
through to the default resolver and the pinning regression went
undetected)
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.358ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.283ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.523ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58 ExpectPing will have no effect as monitoring pings
is disabled. Use MonitorPingsOption to enable.
FAIL
FAIL ragflow/internal/agent/component 2.759s
ok ragflow/internal/agent/component/io 0.026s
### What problem does this PR solve?
Part of #15240 (rewriting the RAGFlow API server in Go).
Implements the two public bot endpoints from
`api/apps/restful_apis/bot_api.py`:
- **`GET /api/v1/chatbots/<dialog_id>/info`** (`chatbots_inputs`) —
returns `{title, avatar, prologue, has_tavily_key}` for a dialog the
authenticated tenant owns (tenant match + `status == VALID`), otherwise
`"Authentication error: no access to this chatbot!"`.
- **`GET /api/v1/searchbots/detail`** (`detail_share_embedded`) —
returns search-app detail for a `search_id` the tenant can access.
Permission is checked across the tenant's joined tenants; denial returns
`"Has no permission for this operation."` (operating error, `data:
false`) and a missing app returns `"Can't find this Search App!"`.
Both endpoints authenticate with an SDK **beta token** (`Authorization:
Bearer <beta>`) rather than a session — the token is resolved to a
tenant via `APIToken.query(beta=token)`, backed by a new
`APITokenDAO.GetByBeta`. Because they perform their own token-based
auth, the routes are registered on the unauthenticated route group
(mirroring the Python blueprint, which has no `@login_required`).
Both live in a new `internal/handler/bot.go` + `internal/service/bot.go`
since they share the same source module. Handler unit tests cover the
auth, success, and error-mapping paths.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ling Qin <qinling0210@163.com>
## Summary
Fix error messages in `build.sh` and add documentation in
`internal/development.md` for downloading native static libraries
(pdfium, pdf_oxide, office_oxide).
## Changes
- `build.sh`: change error hint from `uv run download_deps.py` to `uv
run ragflow_deps/download_deps.py` (correct path from project root)
- `internal/development.md`: add section 2.1 documenting how to download
native libs and install lld
## Summary
- use the project-standard 32-character ID generator when creating
shared chatbot sessions
- fix MySQL insert failures caused by writing 36-character UUID strings
into `api_4_conversation.id`
### Summary
Adopt sync.WaitGroup.Go (Go 1.25) to simplify tracked goroutine
spawning. This replaces the error-prone trio of wg.Add(1), go func(),
and defer wg.Done() with a single, self-contained call.
More info: https://github.com/golang/go/issues/63796
Signed-off-by: grandpig <grandpig@outlook.com>
### Summary
This PR fixes a Go backend bug where updating agent settings, such as
description, could clear the agent DSL.
Root cause:
PUT /api/v1/agents/:canvas_id only bound the dsl field in Go. When the
frontend submitted settings without dsl, the service still updated the
canvas with an empty DSL value.
Changes:
- Treat agent updates as partial patches.
- Preserve existing DSL when dsl is not present in the request.
- Update only specified user_canvas fields instead of saving the full
row.
- Add a regression test for settings updates preserving DSL.
Test:
`go test ./internal/service ./internal/handler`
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
- align the Go `/api/v1/thumbnails` endpoint with the frontend request
format for repeated `doc_ids`
- return thumbnail mappings for multiple documents instead of failing on
a single missing document
- preserve Python-compatible thumbnail formatting, including base64
thumbnail passthrough
### Summary
Fixes a bug in the Go chunk list handler where the available` query
parser rejected `false` and `0` even though they were documented as
supported values.`
This caused requests from the "Disabled" chunk filter to return HTTP 400
and broke the chunk list page when filtering disabled chunks.
### Summary
As title
This PR fixes dataset index task creation failing with unsupported data
type: entity.JSONMap when loading document chunking config.
#### issues:
```
2026/06/30 15:19:40 /home/infiniflow/Documents/development/ragflow/internal/dao/document.go:162
[error] unsupported data type: ragflow/internal/entity.JSONMap
```
#### Changes:
+ Adds the missing GORM type:longtext tag to ParserConfig in
DocumentDAO.GetChunkingConfig.
+ Adds a DAO regression test covering GetChunkingConfig joins across
document, knowledgebase, and tenant while scanning parser_config.
### Summary
As title
Before, it return `update success` but never insert or update any
metadata
fixed:
```go
_, err = s.docEngine.InsertMetadata(nil, []map[string]interface{}{
{
"id": docID,
"kb_id": doc.KbID,
"meta_fields": meta,
},
}, tenantID)
```
### What problem does this PR solve?
Closes#12962
MCPToolCallSessions created during agent execution (in `Agent.__init__`)
are never explicitly closed. Each session starts its own event loop
thread and opens an SSE/HTTP connection to the MCP server. When the
canvas goes out of scope, these threads and connections remain alive
indefinitely, accumulating over time and causing resource exhaustion
after prolonged use.
### Solution
1. Add a `Graph.close()` method that iterates all components, finds
MCPToolCallSessions held by Agent tools, and calls `close_sync()` on
each to properly shut down the event loop, thread, and connection.
2. Call `canvas.close()` in `finally` blocks after `canvas.run()`
completes in `canvas_service.py` and `canvas_app.py`.
3. Move MCP session cleanup to `finally` blocks in `test_tool` endpoint
(`mcp_server_app.py`) and `get_mcp_tools` (`api_utils.py`) to ensure
sessions are closed even on exceptions.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: conflict-resolver <conflict-resolver@local>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Replace fragile wall-clock timeout assertions with semantic checks for
deadline errors, retry suppression, and event ordering. Keep only
lower-bound timing checks where they prove backoff behavior. This
reduces CPU-load flakes without weakening regression coverage.
### Summary
This PR fixes incorrect dataset document counters in the Go service.
Several document creation paths inserted document records directly
through documentDAO.Create, bypassing the shared InsertDocument logic
that increments knowledgebase.doc_num. As a result, datasets could
contain documents while doc_num remained 0.
### Summary
```
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id> DELETE
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id>/feedback PUT
```
Migrates the chat session message delete and feedback APIs to the Go
server, matching the Python behavior for authorization, session
ownership checks, message/reference updates, and feedback validation.
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe):
## Summary
Agent (Canvas) runs previously did not surface token usage in the SSE
stream, and RAGFlow's own Langfuse generations for agent runs were
missing the prompt/completion split and the session/user correlation.
This made it impossible for an external caller (or Langfuse) to
reconcile an agent turn's cost with the upstream provider (e.g.
OpenRouter), because a single turn can issue several distinct LLM calls
(query rewriting / cross-language translation, multi-round tool
reasoning, nested sub-agents, and the final answer).
This PR introduces a per-run token usage sink so that **every** LLM call
in a run is aggregated and reported once, and enriches Langfuse
generations with the prompt/completion split plus session/user
attributes.
## What changes
### 1. Per-run token usage sink (`common/token_utils.py`)
- Adds two `contextvars`: `token_usage_sink` (a mutable per-run
accumulator) and `langfuse_run_attrs` (session_id/user_id for the run).
- Adds `record_run_token_usage(...)` (thread-safe via a lock, because
`thread_pool_exec` copies the context into worker threads that share the
sink dict) and `usage_from_response(...)` which extracts a
`{prompt_tokens, completion_tokens, total_tokens}` split from
OpenAI/OpenRouter-style responses.
### 2. Provider layer captures the prompt/completion split
(`rag/llm/chat_model.py`)
- `LiteLLMBase` and `Base` now store `self.last_usage`
(prompt/completion/total) for the most recent chat call, in both the
plain and tool-calling paths.
- Streaming requests set `stream_options.include_usage = True` (LiteLLM
path) so the authoritative usage arrives on the final chunk; this is
read even on the usage-only chunk that carries no `choices`.
- Fixes a multi-round accounting bug in `*_with_tools`: token totals
were **overwritten** by each round (`total_tokens = tol`) instead of
accumulated, undercounting multi-round tool conversations. Each round is
now committed to a running aggregate.
### 3. LLMBundle reports usage once, per call
(`api/db/services/llm_service.py`)
- New `_report_usage(total_tokens)` records the call's usage into the
active run sink and returns the prompt/completion/total split for
Langfuse. The split is only used when it is consistent with the
authoritative total; otherwise only the total is reported.
- All three chat entry points (`async_chat`, `async_chat_streamly`,
`async_chat_streamly_delta`) now emit `usage_details` with
`input`/`output`/`total` instead of total-only.
- `_start_langfuse_observation` now applies `session_id`/`user_id` from
the per-run context (`langfuse_run_attrs`) so agent-run generations are
correctly grouped, even though agent LLMBundles are constructed without
those attributes.
### 4. Canvas installs the sink and emits the aggregate
(`agent/canvas.py`)
- `Canvas.run()` installs a fresh `token_usage_sink` and
`langfuse_run_attrs` (from `user_id`/`session_id`) at the start of every
turn.
- `message_end` now includes an aggregated `usage` object:
`{prompt_tokens, completion_tokens, total_tokens, calls}` covering all
LLM calls in the run.
### 5. Pass session id into the run
(`api/db/services/canvas_service.py`)
- `completion()` forwards `session_id` to `Canvas.run()` for Langfuse
session correlation.
## Why a context variable
LLM calls in an agent run originate from many places that each build
their own `LLMBundle` (e.g. `cross_languages`/`keyword_extraction`
helpers, the Agent component, and nested sub-agents invoked as tools). A
run-scoped context variable is the only non-invasive chokepoint that
captures all of them exactly once, including nested agents (which run in
the same async context) and thread-pool tools (the executor copies the
context).
## Behavior / compatibility
- No public API or wire-format removal: `message_end` gains an
additional optional `usage` field; existing consumers are unaffected.
- When a provider does not return authoritative usage, behavior falls
back to the previous token estimate (total only, no split).
- Non-agent flows (Dataflow `Pipeline`, sync `Graph.run`) are untouched.
## Testing
- [x] Simple agent answer: `message_end.usage.total_tokens` matches
provider usage.
- [x] Agent with cross-language retrieval: aggregate equals the sum of
both provider calls.
- [x] Tool-calling agent (multi-round): total accumulates across rounds.
- [x] Nested agent (agent-as-tool): sub-agent tokens included in the
parent run total.
- [x] Langfuse: agent generations show input/output split and are
grouped by session/user.
---------
Co-authored-by: yzc <yuzhichang@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
## Problem
When building or updating a knowledge graph with a large number of
entities and edges, `set_graph()` in `rag/graphrag/utils.py` creates one
`asyncio` task per entity and one per edge, each calling
`embd_mdl.encode([single_name])` — a single-item HTTP request to the
embedding server.
For a graph with 17,000+ nodes and edges (real case reported in #16205),
this generates **34,000+ individual embedding API round-trips** instead
of ~266 batched calls at the default `_INSERT_BULK_SIZE=128`. The
asyncio gather over thousands of tasks makes the embedding server the
bottleneck; under load, a single slow/failed call aborts all remaining
tasks, causing the pipeline to stall and never complete.
Closes#16205. Related: #15921.
## Root Cause
```python
# Before (in set_graph, node loop):
tasks = [asyncio.create_task(graph_node_to_chunk(n, ...)) for n in nodes]
# Each task calls embd_mdl.encode([single_name]) — 1 HTTP call per node
```
`graph_node_to_chunk` checks the embed cache first, but the cache is
cold on first build, so every task makes a live API call.
## Fix
Pre-warm the embedding cache with batched calls before spawning tasks.
Each batch pre-warm calls `embd_mdl.encode(batch_of_128)` once,
populating the cache. Then every individual task hits the cache and
makes zero embedding API calls.
- Only encodes names not already in cache (no-op on warm cache / small
incremental updates)
- Uses existing project idioms: `thread_pool_exec`, `chat_limiter`,
`_INSERT_BULK_SIZE`, `get_embed_cache`, `set_embed_cache`
- Mirrors the `ENABLE_TIMEOUT_ASSERTION` timeout pattern from
`graph_node_to_chunk`
- Zero behavior change: per-task encode logic remains as a correct
fallback
## Result
| Graph size | Before | After |
|---|---|---|
| 17,576 edges | ~17,576 embedding calls → stall | ~138 batched calls |
| 17,509 nodes | ~17,509 embedding calls → stall | ~137 batched calls |
| **Total** | **~35,000 calls** | **~275 calls** |
---------
Co-authored-by: Oti_B <oti@mac.speedport.ip>