### What problem does this PR solve?
Closes#12962
MCPToolCallSessions created during agent execution (in `Agent.__init__`)
are never explicitly closed. Each session starts its own event loop
thread and opens an SSE/HTTP connection to the MCP server. When the
canvas goes out of scope, these threads and connections remain alive
indefinitely, accumulating over time and causing resource exhaustion
after prolonged use.
### Solution
1. Add a `Graph.close()` method that iterates all components, finds
MCPToolCallSessions held by Agent tools, and calls `close_sync()` on
each to properly shut down the event loop, thread, and connection.
2. Call `canvas.close()` in `finally` blocks after `canvas.run()`
completes in `canvas_service.py` and `canvas_app.py`.
3. Move MCP session cleanup to `finally` blocks in `test_tool` endpoint
(`mcp_server_app.py`) and `get_mcp_tools` (`api_utils.py`) to ensure
sessions are closed even on exceptions.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: conflict-resolver <conflict-resolver@local>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe):
## Summary
Agent (Canvas) runs previously did not surface token usage in the SSE
stream, and RAGFlow's own Langfuse generations for agent runs were
missing the prompt/completion split and the session/user correlation.
This made it impossible for an external caller (or Langfuse) to
reconcile an agent turn's cost with the upstream provider (e.g.
OpenRouter), because a single turn can issue several distinct LLM calls
(query rewriting / cross-language translation, multi-round tool
reasoning, nested sub-agents, and the final answer).
This PR introduces a per-run token usage sink so that **every** LLM call
in a run is aggregated and reported once, and enriches Langfuse
generations with the prompt/completion split plus session/user
attributes.
## What changes
### 1. Per-run token usage sink (`common/token_utils.py`)
- Adds two `contextvars`: `token_usage_sink` (a mutable per-run
accumulator) and `langfuse_run_attrs` (session_id/user_id for the run).
- Adds `record_run_token_usage(...)` (thread-safe via a lock, because
`thread_pool_exec` copies the context into worker threads that share the
sink dict) and `usage_from_response(...)` which extracts a
`{prompt_tokens, completion_tokens, total_tokens}` split from
OpenAI/OpenRouter-style responses.
### 2. Provider layer captures the prompt/completion split
(`rag/llm/chat_model.py`)
- `LiteLLMBase` and `Base` now store `self.last_usage`
(prompt/completion/total) for the most recent chat call, in both the
plain and tool-calling paths.
- Streaming requests set `stream_options.include_usage = True` (LiteLLM
path) so the authoritative usage arrives on the final chunk; this is
read even on the usage-only chunk that carries no `choices`.
- Fixes a multi-round accounting bug in `*_with_tools`: token totals
were **overwritten** by each round (`total_tokens = tol`) instead of
accumulated, undercounting multi-round tool conversations. Each round is
now committed to a running aggregate.
### 3. LLMBundle reports usage once, per call
(`api/db/services/llm_service.py`)
- New `_report_usage(total_tokens)` records the call's usage into the
active run sink and returns the prompt/completion/total split for
Langfuse. The split is only used when it is consistent with the
authoritative total; otherwise only the total is reported.
- All three chat entry points (`async_chat`, `async_chat_streamly`,
`async_chat_streamly_delta`) now emit `usage_details` with
`input`/`output`/`total` instead of total-only.
- `_start_langfuse_observation` now applies `session_id`/`user_id` from
the per-run context (`langfuse_run_attrs`) so agent-run generations are
correctly grouped, even though agent LLMBundles are constructed without
those attributes.
### 4. Canvas installs the sink and emits the aggregate
(`agent/canvas.py`)
- `Canvas.run()` installs a fresh `token_usage_sink` and
`langfuse_run_attrs` (from `user_id`/`session_id`) at the start of every
turn.
- `message_end` now includes an aggregated `usage` object:
`{prompt_tokens, completion_tokens, total_tokens, calls}` covering all
LLM calls in the run.
### 5. Pass session id into the run
(`api/db/services/canvas_service.py`)
- `completion()` forwards `session_id` to `Canvas.run()` for Langfuse
session correlation.
## Why a context variable
LLM calls in an agent run originate from many places that each build
their own `LLMBundle` (e.g. `cross_languages`/`keyword_extraction`
helpers, the Agent component, and nested sub-agents invoked as tools). A
run-scoped context variable is the only non-invasive chokepoint that
captures all of them exactly once, including nested agents (which run in
the same async context) and thread-pool tools (the executor copies the
context).
## Behavior / compatibility
- No public API or wire-format removal: `message_end` gains an
additional optional `usage` field; existing consumers are unaffected.
- When a provider does not return authoritative usage, behavior falls
back to the previous token estimate (total only, no split).
- Non-agent flows (Dataflow `Pipeline`, sync `Graph.run`) are untouched.
## Testing
- [x] Simple agent answer: `message_end.usage.total_tokens` matches
provider usage.
- [x] Agent with cross-language retrieval: aggregate equals the sum of
both provider calls.
- [x] Tool-calling agent (multi-round): total accumulates across rounds.
- [x] Nested agent (agent-as-tool): sub-agent tokens included in the
parent run total.
- [x] Langfuse: agent generations show input/output split and are
grouped by session/user.
---------
Co-authored-by: yzc <yuzhichang@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
- Treat `max_tokens=0` as unset (`or 8192`) when building model context
budgets, fixing agents that silently zeroed prompts when a vLLM model
had `max_tokens: 0` in tenant config
- Replace trailing same-role canvas history in `LLM._sys_prompt_and_msg`
instead of skipping the current user prompt
- Add `LLM.fit_messages()` validation after `message_fit_in` on agent
paths so empty user content fails fast with a clear error instead of
reaching vLLM
Fixes#16411
## Root cause
Agent canvas workflow called `message_fit_in` with `int(max_length *
0.97)`. When `max_length` was `0`, both system and user content were
trimmed to empty strings. The `[HISTORY STREAMLY]` log showing only
`{"role":"user","content":""}` matches this. A secondary bug skipped
appending the formatted user prompt when history ended with a `user`
role message.
## Test plan
- [x] Added `test/unit_test/agent/component/test_llm_prompt.py` for
role-replace, validation, and zero-budget fitting
- [x] Added
`test_message_fit_in_zero_budget_preserves_non_empty_messages` in
`test_generator_message_fit_in.py`
- [ ] CI unit tests
- [ ] Manual: agent canvas `begin → Retrieval → Agent → Message` with
vLLM Qwen3; confirm user message reaches LLM
Made with [Cursor](https://cursor.com)
---------
Co-authored-by: Taranum Wasu <taranumwasu@Taranums-MacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
### What problem does this PR solve?
GET /api/v1/agents (list_agents) already supports filtering by
canvas_category, keywords, tags, and owner_ids, but it does not support
canvas_type — even though canvas_type is a persisted field on UserCanvas
and is already accepted on agent create/update APIs.
This gap causes two issues:
Filtering — clients cannot list agents by business category (e.g.
Marketing, Agent, Ingestion Pipeline) without fetching all agents and
filtering client-side.
Response payload — list_agents did not return canvas_type in each canvas
item, so consumers had to call GET /api/v1/agents/{id} per agent to read
it.
This PR adds optional canvas_type query parameter support and includes
canvas_type in the list response.
### Type of change
- [√] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
## Summary
After #16407 merged, 44 of the original 93 CodeQL alerts were still open
on the default branch. This PR closes the remaining ones by:
1. **Moving 32 existing `// codeql[...]` directives** so they sit on the
line **immediately before** the suppressed statement. The original
multi-line suppression blocks had the directive as the first line, with
the rationale on subsequent lines. After line shifts (refactors, linter
reformat), the directive ended up several lines above the alert location
— CodeQL only recognizes the suppression when it appears on the line
directly above. (32 alerts across 27 files.)
2. **Adding 9 new `// codeql[...]` suppressions** for alerts that had no
suppression in the preceding lines at all — mostly real-fixes that
CodeQL conservatively still flags (filepath.Base, bounded slice sizes,
model-identifier strings, the MD5-legacy-migration lookup in
`conversation_service.py`).
## Files changed
- `api/db/services/conversation_service.py` — add
`py/weak-sensitive-data-hashing` suppression (MD5 for backward-compat
legacy row lookup; not used for auth)
- `api/db/services/llm_service.py` — 3×
`py/clear-text-logging-sensitive-data` suppressions on the lines that
log `llm_name` in warnings/info
- `common/misc_utils.py` — 2× `py/clear-text-logging-sensitive-data`
suppressions on the redacted `current_url` log sites
- `internal/agent/component/invoke.go` — moved existing
`go/request-forgery` directive
- `internal/agent/sandbox/ssh.go` — moved existing
`go/command-injection` directive
- `internal/agent/tool/retrieval_service.go` — added
`go/uncontrolled-allocation-size` suppression (`topN` is bounded to 1024
above)
- `internal/cli/common_command.go` — moved 2×
`go/disabled-certificate-check` directives
- `internal/cli/user_command.go` — added `go/clear-text-logging`
suppression (filepath.Base already strips user-identifying path)
- `internal/dao/pipeline_operation_log.go` — moved 2× `go/sql-injection`
directives
- `internal/dao/user_canvas.go` — added `go/sql-injection` suppression
in `GetList` (the new `userCanvasOrderClause` call path)
- `internal/engine/infinity/chunk.go` — moved existing
`go/unsafe-quoting` directive
- `internal/entity/models/*` — moved `go/path-injection` directives (15
files)
- `internal/handler/oauth_login.go` — moved existing
`go/cookie-httponly-not-set` directive
- `internal/handler/tenant.go` — moved existing `go/path-injection`
directive
- `internal/service/deep_researcher.go` — moved existing
`go/unsafe-quoting` directive
- `internal/service/dataset.go` — added
`go/uncontrolled-allocation-size` suppression (`n` bounded to 1024
above)
- `internal/service/file.go` — moved existing `go/request-forgery`
directive
- `internal/service/langfuse.go` — moved 2× `go/request-forgery`
directives
- `internal/utility/mcp_client.go` — moved 3× `go/request-forgery`
directives
- `internal/utility/smtp.go` — moved existing `go/email-injection`
directive
- `rag/prompts/generator.py` — added
`py/clear-text-logging-sensitive-data` suppression
- `web/.../use-provider-fields.tsx` — added
`js/prototype-pollution-utility` suppression (FORBIDDEN_KEYS guard is on
the line above)
## Why the previous PR left alerts open
`// codeql[query-id] explanation` must be on the line **immediately
before** the suppressed statement per the [GitHub CodeQL suppression
spec](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning-with-codeql/suppressing-code-scanning-alerts).
The original suppression blocks were 4-5 lines, with the directive as
the **first** line. After linter reformat / line shifts, the directive
ended up too far above the actual alert line to be recognized. The fix
is to put the directive on the line directly above the suppressed
statement, with the rationale above it.
## Test plan
- All 9 modified Python files `ast.parse` clean
- All 4 modified Go files `gofmt` clean
- 36/44 expected alert suppressions in place
- 8 remaining CodeQL alerts are the originals (#3485851828, #3485851831,
#3485869759, #3485869766, #3485869768, #3485869771, #3485885962,
#3485895527) which were resolved by the corresponding commit comments;
these should close on the next scan when the suppression comments match
the alert lines.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
### What problem does this PR solve?
When I run RAGFlow_server.py:
```
2026-06-24 10:27:01,938 ERROR 3413485 fetch task exception
Traceback (most recent call last):
File "/home/infiniflow/Documents/development/ragflow/api/db/services/document_service.py", line 948, in _sync_progress
if t.progress_msg.strip():
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
```
fixed:
```python
if t.progress_msg.strip():
# fix:
if (t.progress_msg or "").strip():
```
Fix crash in `_sync_progress` when `progress_msg` is `None`.
#### Root Cause
`progress_msg` from task records can be `None`, causing:
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Implement OpenAI chat completions in GO
POST /api/v1/openai/<chat_id>/chat/completions
OpenAI chat cli: internal/development.md
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Use a 95% max_length threshold before truncating embedding inputs, which
reduces the chance of provider-side invalid-parameter errors on
near-limit chunks.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The parser pods suffer from OOM kills when processing large PDF
documents. The root cause is in api/db/services/task_service.py: when
layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to
MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be
processed as a single task with all pages loaded into memory
simultaneously.
This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the
same way DeepDOC already does.
### What problem does this PR solve?
The parser pods suffer from OOM kills when processing large PDF
documents. The root cause is in api/db/services/task_service.py: when
layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to
MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be
processed as a single task with all pages loaded into memory
simultaneously.
This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the
same way DeepDOC already does.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
## Summary
- The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id`
via a migration (added in a prior commit).
- Aligns the REST API layer (`chat_channel_api.py`,
`chat_channel_service.py`) to use `chat_id` consistently.
- Updates the frontend (`interface.ts`, `hooks.ts`,
`connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write
`chat_id` instead of `dialog_id`.
- The joined `dialog_name` alias in the list query is unchanged (backend
still returns it under that name).
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
Fix:
- Pass session_id to langfuse.
- Get correct status for add model_type.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Adds a legacy mode for /chat/completions that restores v0.23.0-style
output by converting start_to_think/end_to_think back into raw
<think></think> markers and streaming cumulative answer text.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat:
- Allow upsert model_type for instance model
Fix:
- Allow create instance with duplicate api_key
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
| # | Method | Endpoint | Description | Git Equivalent |
|---|--------|----------|-------------|----------------|
| 1 | `POST` | `/api/v1/{prefix}/{folder_id}/commits` | Create a
snapshot commit with file changes (add/modify/delete/rename) | `git add`
+ `git commit` |
| 2 | `GET` | `/api/v1/{prefix}/{folder_id}/commits` | List commit
history (paginated) | `git log` |
| 3 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` | Get
commit detail with file changes | `git show` |
| 4 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` |
List file changes in a commit | `git show --name-status` |
| 5 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` | Compare
two commits and return differences | `git diff` |
| 6 | `GET` | `/api/v1/{prefix}/{folder_id}/changes` | Get uncommitted
changes (add/modify/delete) | `git status` |
| 7 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` |
Get the folder tree snapshot at commit time | `git ls-tree` |
| 8 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content`
| Get a file's content as it existed in a specific commit | `git show
HEAD:file` |
| 9 | `GET` | `/api/v1/{prefix}/{file_id}/versions` | Get version
history for a specific file across all commits | `git log -- file` |
Where `{prefix}/{id}` can be:
- `folders/{folder_id}` — direct folder access
- `workspaces/{workspace_id}` — alias of `folders/{folder_id}`
- `datasets/{dataset_id}` — resolves to the dataset's folder
- `memories/{memory_id}` — resolves to the memory's folder
- `skills/{skill_id}` — resolves to the skill's folder
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
#15844
Adds a **Chat channels** capability so a RAGFlow assistant (Dialog) can
be exposed as a bot on external messaging platforms (Feishu/Lark,
Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot
in the UI, connects it to an assistant, and inbound messages are
answered from that assistant's knowledge base — replies are delivered
back on the channel.
**Feishu/Lark is implemented and tested end-to-end.** Discord, Telegram,
LINE, and WeCom are scaffolded against the same interface; the remaining
listed channels are tracked as follow-ups.
### Design
**Backend**
- New `chat_channel` table (`tenant_id`, `name`, `channel`, `config`
JSON holding `{credential: {...}}`, `dialog_id`, `status`) +
`ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`.
- Channel framework under `api/channels/`: a `core` registry +
per-channel packages that self-register a builder and implement a common
`Channel` interface (`start`/`stop`/`send` + inbound normalization) over
`IncomingMessage`/`OutgoingMessage`.
- Embedded **reconcile loop** in `ragflow_server`
(`api/channels/bootstrap.py`): loads enabled bots, and
starts/stops/restarts them as rows change (no server restart needed).
Inbound messages run the connected dialog via the non-streaming
completion path, keeping per-end-user conversation history.
- Missing optional channel SDKs degrade gracefully (channel skipped with
a warning; others unaffected). Channel-level errors are logged, not
crashed.
- Feishu's WebSocket client runs in a dedicated thread with its own
event loop to avoid cross-loop/contextvars conflicts with the channel
runtime.
**Frontend**
- **Settings → Chat channels** panel: available-channels grid +
configured-bots list with add/edit/delete and a **Connect assistant**
popup that binds a bot to a dialog.
- Brand icons via simple-icons / reused shared data-source assets, with
colored fallbacks for brands not available.
- Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix
so the settings page no longer highlights the Chat tab.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Notes
- DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id`
is also covered by a `migrate_db` `alter_db_add_column` for existing
installs.
- Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`,
`line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies.
- Screenshots / per-channel credential docs to follow.
<img width="1338" height="1290" alt="Image"
src="https://github.com/user-attachments/assets/042cb2f9-0dad-4e6a-bcf7-43ced4bbd704"
/>
<img width="1344" height="738" alt="Image"
src="https://github.com/user-attachments/assets/373cd08e-ec40-4c67-9c51-4d948b1ba617"
/>
<img width="672" height="887" alt="Image"
src="https://github.com/user-attachments/assets/5a34953f-a9a3-4c1e-869e-5eff0dc64c84"
/>
---------
## Summary
This PR passes `session_id` into Langfuse trace observations so
multi-turn chat messages can be grouped under the same session in
Langfuse.
Changes include:
- Propagate `session_id` from chat/session APIs into
`dialog_service.async_chat`.
- Pass `session_id` into Langfuse `start_observation(...)`.
- Share Langfuse `trace_context` with chat, embedding, rerank, and TTS
model bundles where applicable.
- Add unit coverage to verify Langfuse observations receive
`session_id`.
- Update affected test stubs for the new optional Langfuse context
arguments.
## Related Issue
Closes: #15636
## Change Type
- [x] Feature
- [x] Bug fix
- [x] Test
- [ ] Refactor
- [ ] Documentation
- [ ] Breaking change
## Real Behavior Proof
Before this change:
- Langfuse observations were created without `session_id`.
- Multi-turn chat traces could not be grouped by session in Langfuse.
After this change:
- Chat/session flows pass `session_id` into `async_chat`.
- Langfuse observations include `session_id`.
- Related model bundles receive shared trace context and session
metadata.
Validation result:
```bash
uv run python -m py_compile \
api/db/services/tenant_llm_service.py \
api/db/services/llm_service.py \
api/db/services/dialog_service.py \
api/db/services/conversation_service.py \
api/apps/restful_apis/chat_api.py \
test/unit_test/api/db/services/test_dialog_service_final_answer.py \
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py
```
Passed.
```bash
uv run pytest \
test/unit_test/api/db/services/test_dialog_service_final_answer.py \
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py -q
```
Result:
```text
11 passed in 16.89s
```
```bash
git diff --check
```
Passed.
## Checklist
- [x] Analyzed the issue requirement.
- [x] Checked existing Langfuse trace integration.
- [x] Implemented only the requested session grouping behavior.
- [x] Added/updated unit tests.
- [x] Ran focused tests successfully.
- [x] Ran Python compile validation.
- [x] Ran whitespace diff validation.
## Summary
`get_model_config_from_provider_instance()` was not including
`max_tokens` in its returned dict, causing all downstream consumers
(dialog truncation, message fitting, knowledge base trimming, embedding,
graphrag, RAPTOR) to fall back to the hardcoded default of **8192
tokens** regardless of the actual model context window size (e.g.,
GPT-4o 128K, Claude 200K).
Closes#15944
## Root Cause
The function builds `model_config` with only: `llm_factory`, `api_key`,
`llm_name`, `api_base`, `model_type`, `is_tools`. `max_tokens` is never
included.
Yet the data exists in four independent sources:
1. `TenantModel.extra` JSON field — written by
`provider_api_service.py:659`
2. `conf/llm_factories.json` — every model entry has `max_tokens`
3. `rag/llm/model_meta.py` — 9 provider classes fetch real context
windows from APIs
4. `TenantLLM.max_tokens` database column
None of them are read by this function.
## Fix
Two lines added, one per return path:
- **Path B** (model_obj exists → provider-instance model): reads
`max_tokens` from `model_obj.extra` JSON
- **Path C** (fallback → factory config): reads `max_tokens` from
`llm_info` (sourced from `llm_factories.json`)
Both fall back to 8192 when the value is absent, preserving backward
compatibility.
## Impact
This single 5-line change fixes the context window budget for all **78+
call sites** across **20 files** that construct `LLMBundle` or read
`max_tokens` from the config dict, including:
| Consumer | File | Effect |
|---|---|---|
| Dialog chat truncation | `dialog_service.py:562` |
`message_fit_in(msg, max_tokens * 0.95)` now uses real context window |
| Knowledge base trimming | `dialog_service.py:752` |
`kb_prompt(kbinfos, max_tokens)` now fits more retrieved content |
| Agent message fitting | `agent/component/llm.py:322` | Agent prompts
no longer truncated at 7946 tokens |
| Embedding truncation | `task_executor.py:704` | Embedding input uses
actual model limit |
| GraphRAG extraction | `graphrag/*/extractor.py` | Entity extraction
gets full context budget |
| LLM4Tenant.max_length | `tenant_llm_service.py:513` | Chat model
wrapper exposes real context window |
Fixes#15529 .
### Problem
`async_ask()` accessed `kbs[0]` without verifying that
`KnowledgebaseService.get_by_ids()` returned any knowledge bases. Empty
or stale `kb_ids` raised `IndexError`, which surfaced as HTTP 500 on
search/bot SSE endpoints.
### Fix
- Add an early guard when `kbs` is empty, yielding a final SSE error
event (consistent with `gen_mindmap()` in the same module).
- Add regression tests for empty `kb_ids` and deleted/invalid KB IDs.
### Test plan
- [ ] `pytest
test/unit_test/api/db/services/test_dialog_service_final_answer.py -k
"async_ask_empty or async_ask_stale"`
- [ ] Manual: `POST /api/v1/searchbots/ask` with invalid `kb_ids`
returns SSE error, not HTTP 500
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
## Summary
Fixes#15699.
User upgrades to v0.25.6 against an existing MySQL database, tries to
add an Ollama provider instance, and gets:
```
MySQL IntegrityError: Duplicate entry 'dbaafbfe608a11f1a5516d6066988224'
for key 'tenant_model_instance.tenantmodelinstance_api_key_provider_id'
```
The route at
[api/apps/restful_apis/provider_api.py:354](api/apps/restful_apis/provider_api.py#L354)
catches it and returns `get_error_data_result(message="Internal server
error")` — which by RAGFlow's convention is HTTP 200 with an error
`code` on the body — hence the reporter's "200 status code but the
database errored" complaint.
### Root cause
The provider-instance refactor in [PR
#15460](https://github.com/infiniflow/ragflow/pull/15460) dropped the
unique-compound-index tuple from `TenantModelInstance`:
```python
# Removed in #15460
class Meta:
db_table = "tenant_model_instance"
indexes = (
(("api_key", "provider_id"), True), # unique
)
```
and added a one-shot drop in `migrate_db()` for existing databases. But
the drop targets the wrong index name:
```python
# Before this PR — wrong name
for table_name, index_name in [
("tenant_model_instance", "idx_api_key_provider_id"), # ← doesn't exist
("tenant_model", "idx_provider_model_instance"),
]:
```
Peewee's auto-derived index name is `<lowercase
classname>_<col1>_<col2>` →
**`tenantmodelinstance_api_key_provider_id`**, which matches the user's
error verbatim. The drop raises `OperationalError: 1091 (HY000): Can't
DROP …`, the surrounding `except` clause at
[db_models.py:1736](api/db/db_models.py#L1736) swallows it as
expected-on-fresh-installs, and the legacy unique index lives on
indefinitely.
### Why Ollama hits it specifically
Ollama doesn't require an API key. The form posts `api_key: ""`. The
app-layer dedupe at
[provider_api_service.py:288-292](api/apps/services/provider_api_service.py#L288-L292):
```python
api_key_str = ""
if api_key: # ← skipped for ""
...
same_key_instance = TenantModelInstanceService.get_by_provider_id_and_api_key(...)
if same_key_instance:
return False, f"Already exist instance: ... with api_key {api_key}"
```
falls through for empty keys. Control reaches
`TenantModelInstanceService.create_instance(..., api_key="")` which
inserts a row whose `(api_key, provider_id) = ("", <provider_uuid>)`
collides with any prior Ollama row that already shipped that same pair →
the still-present unique index throws.
(`dbaafbfe608a11f1a5516d6066988224` in the user's error is the
duplicated `provider_id` UUID, paired with the empty `api_key`.)
### Fix
Add the Peewee auto-name alongside the existing `idx_*` entry so the
migration finally drops the obsolete index on next restart:
```python
legacy_indexes = [
("tenant_model_instance", "idx_api_key_provider_id"),
("tenant_model_instance", "tenantmodelinstance_api_key_provider_id"), # ← added
("tenant_model", "idx_provider_model_instance"),
]
```
The surrounding `try/except (OperationalError, ProgrammingError)`
matches `1091` / `can't DROP` / `does not exist` and treats them as
success, so every state is idempotent (see Test plan).
### Idempotency matrix
| Database state | First entry (`idx_api_key_provider_id`) | New entry
(`tenantmodelinstance_api_key_provider_id`) |
| --- | --- | --- |
| Fresh install (≥ #15460) — neither index exists | `1091` → swallowed |
`1091` → swallowed |
| Upgraded from before dc4b82523 (the user's case) — auto-name present |
`1091` → swallowed | **drops the index** |
| Upgraded after a manual rename to `idx_*` | drops the index | `1091` →
swallowed |
| Re-run of `migrate_db()` after either of the above | `1091` →
swallowed | `1091` → swallowed |
No rollback hazard: nothing depends on this unique constraint anymore
(`create_instance` dedupes by `instance_name` via `duplicate_name`, see
[tenant_model_instance_service.py:27](api/db/services/tenant_model_instance_service.py#L27)).
### What this PR does NOT change
- **`provider_api_service.create_provider_instance`** — its `if
api_key:` gate is correct *for the post-migration world*: multiple
Ollama instances with empty keys under one provider are legitimate, so
we shouldn't tighten the app-layer check.
- **`TenantModelInstance` Peewee model** — the `indexes` tuple was
already removed in #15460. New databases never get the constraint in the
first place.
- **The `except → get_error_data_result` → HTTP 200 pattern at
`provider_api.py:354`** — that's a project-wide convention; changing one
route to HTTP 500 would be inconsistent and out of scope.
## Test plan
- [ ] **Reproducer (pre-fix):** on a database originally created before
#15460, configure an Ollama provider with an empty `api_key`, then try
to create a *second* instance under the same provider — confirm the
`Duplicate entry … 'tenantmodelinstance_api_key_provider_id'` error in
the server log.
- [ ] **Verify the index is present pre-restart:** `SHOW INDEX FROM
tenant_model_instance WHERE Key_name =
'tenantmodelinstance_api_key_provider_id';` — non-empty result.
- [ ] **Restart with the fix applied:** server starts cleanly,
`migrate_db()` runs, no `Failed to drop index` in critical logs.
- [ ] **Verify the index is gone post-restart:** same `SHOW INDEX` query
— empty result.
- [ ] **Re-run the reproducer:** two Ollama instances under the same
provider, both `api_key=""`, both succeed.
- [ ] **Restart a second time** — no new errors; the matching `1091`
swallow keeps the migration idempotent.
- [ ] **Fresh install smoke test:** drop the DB volume, start clean — no
`1091` noise (the new index never existed), no functional regression.
## Files changed
- [api/db/db_models.py](api/db/db_models.py) — extend the legacy-index
drop list with `tenantmodelinstance_api_key_provider_id`; refactor the
inline list to a named `legacy_indexes` local with a comment pointing at
#15460 and #15699.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Support factory models with multiple model types, so visual chat models
can be exposed as both image2text and chat while preserving the database
model-type-per-record design.
This also updates the SILICONFLOW model list and adds a helper script to
refresh SiliconFlow models from the provider API.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Propagate `tenant_id` from memory task messages into task collection so
refactored task execution can build a valid context.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
`SearchService.get_detail` crashed with `AttributeError` (HTTP 500) when
no matching row existed, because it called `.first().to_dict()` before
the `if not search` guard — making that guard dead code.
## Root cause
`.first()` returns `None` when the query matches nothing (deleted search
app, or joined `User` not `VALID`). `None.to_dict()` raises before the
guard runs.
## Fix
```diff
.first()
- .to_dict()
)
if not search:
return {}
- return search
+ return search.to_dict()
```
Guard the `None` first, then serialize — restoring the intended `{}`
"not found" return that every caller (`search_api`, `bot_api`,
`chat_api`, `dataset_api_service`) already handles.
## Files changed
- `api/db/services/search_service.py`
## Verification
- `ruff check` — clean
- Logic: `.first()` → `None` now hits `return {}` instead of
`None.to_dict()`. Local full pytest not run (heavy RAG deps); CI
validates.
## Note
Implemented with LLM assistance (model: claude-opus-4-8).
Closes#15621
Co-authored-by: dearsishs <MCarter112116@outlook.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
### What problem does this PR solve?
Refine the stream parsing for `<think>` / `</think>` so MiniMax and
DeepSeek-style chunking both flush in the right order without mixing
think and answer buffers.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)