ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-04 01:29:35 +08:00

Author	SHA1	Message	Date
Wang Qi	ff685d3131	Delete duplicate route (#14883 ) ### What problem does this PR solve? The DELETE `/graph` route duplicates `/datasets/<dataset_id>/<index_type>`, so this PR removes it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 15:57:44 +08:00
Idriss Sbaaoui	09e1fd290a	Chore: migrate tests to restful api (#14871 ) ### What problem does this PR solve? add new testing suite for the new restful api endpoints meant to replace http and web api tests ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-13 15:07:23 +08:00
tmimmanuel	d63d3bb7d2	Go: implement provider: Novita.ai (#14850 ) ### What problem does this PR solve? Add a Go driver for Novita.ai (https://novita.ai), one of the unchecked providers on the umbrella tracking issue #14736. Novita exposes an OpenAI-compatible REST API at `https://api.novita.ai/v3/openai` and proxies a large catalog of third-party models (DeepSeek, Llama, Qwen3, Kimi, Gemma, Mistral, MiniMax, GLM, etc.) behind a single OpenAI-shaped surface — 102 models live at the time of writing. Until this PR, a tenant who configured `novita` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. ### What this PR includes - New `internal/entity/models/novita.go` with a `NovitaModel` implementing the `ModelDriver` interface (~520 lines). - New `conf/models/novita.json` with 7 representative chat models (DeepSeek-V4, Llama-3.3-70B, Qwen3-30B/235B reasoning, Kimi-K2, Gemma-3-27B, Mistral-Nemo). - `factory.go`: route `"novita"` to `NewNovitaModel`. - `internal/entity/models/novita_test.go`: 23 unit tests. ### Notable design point: `<think>...</think>` reasoning extraction Novita-routed reasoning models like `qwen3-` and `deepseek-r1-` embed their chain-of-thought inline inside content as `<think>...</think>` tags, rather than in a separate `reasoning_content` field. Verified live by probing `api.novita.ai`: ``` content head 200: <think> Okay, let's see. I need to find 15% of 80. Hmm, percentages can sometimes be tricky, but I think content tail 100: h, that works. Alternatively, 0.15 × 80. If I move the decimal two places to the left for </think> ``` Without handling, a tenant picking qwen3 via Novita would see raw `<think>` tags in their UI answer — different from every other reasoning provider in the Go layer. 
The driver detects those tags and routes the inner text to `ChatResponse.ReasonContent` (non-stream) or the sender's second arg (stream), keeping the visible answer clean of tag clutter: - `splitNovitaThink` — scans a complete content string. Used by the non-streaming path. Handles multiple `<think>` blocks, unclosed tags (the model got cut off mid-reasoning), pure-text content with no tags. - `novitaThinkExtractor` — stateful streaming version. Buffers trailing bytes that might be the start of a tag (e.g. `<thi` held back when the next chunk completes `nk>`), then emits segments in routing order so callers can pipe them to a UI. Tested with byte-level chunk boundaries and tag-spanning scenarios. ### Method coverage \| Method \| Behavior \| \|---\|---\| \| `ChatWithMessages` \| `POST /v3/openai/chat/completions`, `<think>` extraction on response \| \| `ChatStreamlyWithSender` \| SSE stream, stateful `<think>` extraction across deltas \| \| `ListModels` / `CheckConnection` \| `GET /v3/openai/models` (102 live) \| \| `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` / `OCRFile` \| `"no such method"` — Novita's OpenAI-compatible surface does not expose any \| No interface change. No new dependencies. ### How was this tested? 23 unit tests in `internal/entity/models/novita_test.go` — all pass: ``` $ go test -vet=off -run "TestNovita\|TestSplitNovita" -count=1 ./internal/entity/models/... 
ok ragflow/internal/entity/models 0.020s ``` Coverage: - `splitNovitaThink` (5 cases: pure text, single block, leading text, multiple blocks, unclosed tag) - `novitaThinkExtractor` (6 cases: single-chunk, opening tag span, closing tag span, byte-level chunking, no tags, lone `<` not as tag start) - `ChatWithMessages`: pure text, with `<think>` tags, missing API key, empty messages, HTTP error - `ChatStreamlyWithSender`: tag-stripping with spanning deltas, pure content, sender-required, stream-true-required - `ListModels` / `CheckConnection` (happy paths) - All sentinel methods `go build ./internal/entity/models/...` exits 0 on go 1.25. Live integration test against `api.novita.ai/v3/openai`: ``` === RUN TestNovitaLiveSmoke [OK] Name() = "novita" [OK] CheckConnection [OK] ListModels: 102 models (showing first 6) [deepseek/deepseek-v4-pro deepseek/deepseek-v4-flash deepseek/deepseek-v3.2 xiaomimimo/mimo-v2.5-pro moonshotai/kimi-k2.6 zai-org/glm-5.1] [OK] Chat (llama-3.3) answer="ok" reason="" [OK] Chat (qwen3) answer len=0 head="" ReasonContent len=1657 head="Okay, so I need to figure out what 15% of 80 is. Hmm, percentages can sometimes trip me up, but let ..." [OK] Stream content: 0 chunks, 0 chars; reasoning: 600 chunks, 1667 chars [OK] Embed/Rerank/Balance/TranscribeAudio/AudioSpeech/OCRFile all return "novita, no such method" NOVITA LIVE SMOKE PASSED --- PASS: TestNovitaLiveSmoke (26.18s) ``` What the live run proves on the wire: - Auth (`Bearer <key>`) accepted by `api.novita.ai`. - `/v3/openai/models` parser handles the real 102-model response. - Non-stream chat against `meta-llama/llama-3.3-70b-instruct`: clean string answer, empty ReasonContent (non-reasoning model, pure-text path). - Non-stream chat against `qwen/qwen3-30b-a3b-fp8`: 1657-char reasoning extracted from `<think>...</think>` and routed to `ChatResponse.ReasonContent`. 
Visible answer is 0 chars in this run because qwen3 spent its 600-token budget entirely on reasoning before reaching the answer phase — that's the model's behavior, not a driver bug. The important thing: no `<think>` tags leaked into the visible Answer field. - Streaming against qwen3: 600 reasoning chunks (1667 chars) emitted via the sender's 2nd arg across SSE deltas; no `<think>` tag fragments leaked into the content channel despite tag boundaries crossing chunk boundaries on the wire. - All 6 sentinel methods return the documented `"no such method"` strings. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Tracking: #14736	2026-05-13 14:10:50 +08:00
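The `<think>` handling described in the Novita commit above can be illustrated with a minimal Python sketch. This is not the Go driver code: `ThinkExtractor` and `split_think` are hypothetical names that only mirror the behavior the commit message attributes to `novitaThinkExtractor` and `splitNovitaThink` (hold back a trailing partial tag, route inner text to reasoning, tolerate unclosed tags).

```python
OPEN, CLOSE = "<think>", "</think>"


class ThinkExtractor:
    """Stateful splitter (hypothetical): separates visible content from <think> text."""

    def __init__(self):
        self.buf = ""          # bytes held back because they may start a tag
        self.in_think = False  # True while inside an open <think> block

    def feed(self, chunk):
        """Consume one streamed chunk; return a list of (kind, text) segments."""
        self.buf += chunk
        out = []
        while self.buf:
            tag = CLOSE if self.in_think else OPEN
            kind = "reasoning" if self.in_think else "content"
            idx = self.buf.find(tag)
            if idx != -1:
                # Full tag found: emit what precedes it, then switch mode.
                if idx > 0:
                    out.append((kind, self.buf[:idx]))
                self.buf = self.buf[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No full tag: hold back the longest suffix that is a tag prefix.
            hold = 0
            for n in range(min(len(tag) - 1, len(self.buf)), 0, -1):
                if self.buf.endswith(tag[:n]):
                    hold = n
                    break
            emit = self.buf[:len(self.buf) - hold]
            if emit:
                out.append((kind, emit))
            self.buf = self.buf[len(self.buf) - hold:]
            break
        return out

    def flush(self):
        """Emit whatever is still buffered once the stream has ended."""
        kind = "reasoning" if self.in_think else "content"
        rest, self.buf = self.buf, ""
        return [(kind, rest)] if rest else []


def split_think(content):
    """Non-streaming helper: return (answer, reasoning) for a complete string."""
    ex = ThinkExtractor()
    segments = ex.feed(content) + ex.flush()
    answer = "".join(t for k, t in segments if k == "content")
    reasoning = "".join(t for k, t in segments if k == "reasoning")
    return answer, reasoning
```

In this sketch the non-streaming helper simply reuses the streaming extractor, so both paths share one set of tag rules; the commit describes two separate functions, and the real driver may be structured differently.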
Jackie	71d327b11c	Fix: The text field resizing function in the knowledge block creation… (#14212 ) … modal - Add vertical resizing functionality for the text field ### What problem does this PR solve? _Fix the issue where the text content of the knowledge base editing parsing block is too long to scroll._ <img width="701" height="775" alt="image" src="https://github.com/user-attachments/assets/b258422e-fbc1-466d-abab-062e642c21d5" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: chenyun <chenyun@chenyundemacbook-pro.local>	2026-05-13 13:57:05 +08:00
Wang Qi	45d676bc05	Fix GraphRAG deletion not taking effect in the UI (#14879 ) ### What problem does this PR solve? Fix GraphRAG deletion not taking effect in the UI. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 13:49:16 +08:00
Joseff	733d75d6a7	Fix(Go): make Baidu Encode fail loudly on malformed responses (#14721 ) ### What problem does this PR solve? The Baidu (Qianfan) `Encode` method silently swallowed malformed responses. If a `data[]` item from the API was missing a field (`index`, `embedding`, or unexpected shape), the loop did `continue` instead of returning an error, leaving `nil` entries in the result slice. Callers got back partial results with no indication anything went wrong, which then crashes downstream consumers when they try to use a `nil` vector. Concrete gaps fixed: - No count-mismatch check between `data` length and input texts (only checked for empty) - No duplicate-index detection (a duplicate would silently overwrite) - No missing-index final scan - No empty-embedding rejection - No per-call context timeout - `EmbeddingConfig.Dimension` (added in #14735) was not propagated This PR replaces `map[string]interface{}` parsing with a typed `baiduEmbeddingResponse` struct, applies the standard four-layer validation (count → out-of-range → duplicate → empty → final missing-index scan), adds `context.WithTimeout(nonStreamCallTimeout)`, and forwards `embeddingConfig.Dimension` as the `dimensions` parameter (Baidu Qianfan v2 uses an OpenAI-compatible API). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 12:54:00 +08:00
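The validation order named in the Baidu commit above (count, out-of-range, duplicate, empty, final missing-index scan) can be sketched in Python as follows. This is an illustration under assumptions, not the Go implementation: `order_embeddings` is a hypothetical helper, and the `index` / `embedding` keys are taken from the field names mentioned in the commit message.

```python
def order_embeddings(items, n_inputs):
    """Place each embedding at its declared index, failing loudly on bad data.

    `items` is the decoded `data[]` array; `n_inputs` is the number of input texts.
    """
    # 1. Count mismatch between response items and input texts.
    if len(items) != n_inputs:
        raise ValueError(f"expected {n_inputs} embeddings, got {len(items)}")

    out = [None] * n_inputs
    for item in items:
        idx = item.get("index")
        emb = item.get("embedding")
        # 2. Missing or out-of-range index.
        if not isinstance(idx, int) or not 0 <= idx < n_inputs:
            raise ValueError(f"index out of range: {idx!r}")
        # 3. Duplicate index (would otherwise silently overwrite).
        if out[idx] is not None:
            raise ValueError(f"duplicate index: {idx}")
        # 4. Missing or empty embedding vector.
        if not emb:
            raise ValueError(f"empty embedding at index {idx}")
        out[idx] = list(emb)

    # 5. Defensive final scan: no slot may be left unfilled.
    missing = [i for i, v in enumerate(out) if v is None]
    if missing:
        raise ValueError(f"missing embeddings for indices {missing}")
    return out
```

Returning an error instead of skipping a malformed item means a caller never receives a partially filled result, which is the failure mode the commit describes.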
shawnxiao105-afk	8b6dd6a5c2	fix: guard whitespace-only chunks before embedding (#13938 ) ## Problem When parsing DOCX files with many tables, DeepDOC generates chunks containing only empty HTML table tags, such as: ```html <table><tr><td></td></tr><tr><td></td></tr><tr><td></td></tr><tr><td></td></tr></table> ``` After the regex cleanup at `task_executor.py:584`, this becomes `" "` (whitespace only). The guard at line 585 (`if not c`) only catches empty strings `""`, but whitespace strings are truthy in Python and pass through. When sent to Zhipu `embedding-3` API, it rejects them with error 1213: `未正常接收到prompt参数`. ## Root Cause ```python c = re.sub(r"</?(table\|td\|caption\|tr\|th)( [^<>]{0,12})?>", " ", c) if not c: # ← only catches "", not " " / "\n" / "\t" c = "None" ``` Verified with Zhipu `embedding-3`: \| Input \| Result \| \|---\|---\| \| `""` \| error 1213 \| \| `" "` \| error 1213 \| \| `"\n"` \| error 1213 \| \| `"None"` \| OK \| ## Fix ```diff - if not c: + if not c.strip(): c = "None" ``` ## Testing Reproduced with a 678KB DOCX file (166 tables, 270 chunks). Chunk #89 is the empty table above. After fix, `"None"` is sent instead and embedding succeeds. --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-13 11:47:50 +08:00
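A small Python sketch of the guard from the commit above, showing why `if not c` misses whitespace-only chunks while `if not c.strip()` catches them. The function name `clean_chunk` is hypothetical; the regex and the `"None"` placeholder are taken from the commit message (with the table-escaped `\|` written as a plain `|`).

```python
import re

# Tag-stripping pattern quoted in the commit message.
TAG_RE = re.compile(r"</?(table|td|caption|tr|th)( [^<>]{0,12})?>")


def clean_chunk(c):
    """Strip table tags; substitute "None" when nothing but whitespace is left."""
    c = TAG_RE.sub(" ", c)
    # A string of spaces is truthy, so `if not c` would let it through.
    if not c.strip():
        c = "None"
    return c


empty_table = "<table><tr><td></td></tr><tr><td></td></tr></table>"
stripped = TAG_RE.sub(" ", empty_table)
print(repr(stripped), bool(stripped))  # whitespace only, yet truthy
print(clean_chunk(empty_table))        # None
```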
Wang Qi	64bd0130d3	Add REST API backward compatibility (#14872 ) ### What problem does this PR solve? Add REST API backward compatibility ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 11:44:40 +08:00
dale053	5a5e766386	fix(api): authorize owner_ids for list chats and search apps (#14775 ) Closes #14768 ### What problem does this PR solve? The `list_chats` and `list_searches` REST API endpoints did not enforce authorization on the `owner_ids` query parameter. Any authenticated user could pass arbitrary tenant IDs to `owner_ids` and retrieve chats or search apps belonging to other tenants they are not a member of. This PR resolves the issue by: 1. Looking up the current user's authorized tenants via `TenantService.get_joined_tenants_by_user_id` and rejecting any `owner_ids` that fall outside that set. 2. When no `owner_ids` are provided, scoping the query to only the user's authorized tenants instead of returning an unfiltered result. 3. Adding unit tests that verify unauthorized `owner_ids` are rejected with `OPERATING_ERROR`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 09:43:44 +08:00
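The `owner_ids` check described in the commit above can be sketched in Python as follows. This is a hedged illustration only: `resolve_owner_ids` is a hypothetical helper, the list of joined tenant IDs stands in for the result of `TenantService.get_joined_tenants_by_user_id`, and `PermissionError` stands in for the `OPERATING_ERROR` response mentioned in the commit.

```python
def resolve_owner_ids(requested, joined_tenant_ids):
    """Return the tenant IDs a listing query may be scoped to.

    `requested` is the caller-supplied owner_ids list (possibly empty);
    `joined_tenant_ids` are the tenants the current user belongs to.
    """
    authorized = set(joined_tenant_ids)

    # No filter supplied: scope to the user's own tenants, never unfiltered.
    if not requested:
        return sorted(authorized)

    # Reject the whole request if any requested ID is outside the authorized set.
    unauthorized = [t for t in requested if t not in authorized]
    if unauthorized:
        raise PermissionError(f"not authorized for owner_ids: {unauthorized}")
    return list(requested)
```

Rejecting the request outright, rather than silently dropping the unauthorized IDs, matches the behavior the commit says its unit tests verify.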
Paul Yao	c34c81e8e6	fix: remove duplicate .wav and .aac in audio supported extensions list (#14791 ) What problem does this PR solve? In rag/app/audio.py, the supported audio extensions list contains duplicate entries: .wav appears twice (positions 3 and 5) and .aac appears twice (positions 6 and 14). While this does not affect runtime behavior, it is redundant and makes the code harder to maintain. This PR removes the duplicate entries to keep the list clean and consistent. Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 09:42:31 +08:00
writinwaters	5e46457c28	Docs: How to add Bitbucket as data source. (#14846 ) ### What problem does this PR solve? Added a guide on integrating Bitbucket as an external data source. ### Type of change - [x] Documentation Update	2026-05-12 20:48:30 +08:00
Jin Hai	ad4717f40a	Go: fix model type check when using the model (#14843 ) ### What problem does this PR solve? ``` RAGFlow(user)> chat with 'glm-ocr@test@zhipu-ai' message 'what is this' CLI error: expect model glm-ocr@zhipu-ai is a chat or multimodal model ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 19:44:01 +08:00
Wang Qi	76d5240fb5	Fix #14801 to allow searching the dataset list when adding (#14841 ) ### What problem does this PR solve? Fix #14801 to allow searching the dataset list when adding, following up on #14825. <img width="2172" height="857" alt="image" src="https://github.com/user-attachments/assets/65ea7647-56f4-4c16-8437-121b834811f0" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 19:36:23 +08:00
balibabu	3f41f8cfae	Feat: When a Wait Node precedes a Message Node within a Loop Node, the outgoing message is split into two separate messages. (#14839 ) ### What problem does this PR solve? Feat: When a Wait Node precedes a Message Node within a Loop Node, the outgoing message is split into two separate messages. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-12 18:48:44 +08:00
0xτensor	127aeac4aa	fix: expose gpt-5.5 and gpt-5.4 in OpenAI model list (#14828 ) ### What problem does this PR solve? OpenAI model catalogs used in provider selection flows were missing the latest GPT models (`gpt-5.5` and `gpt-5.4`). Because model availability is driven by seeded catalog data (`conf/llm_factories.json` → DB seed → API response), these models were not selectable in the UI or `/llm/list` responses. This PR updates and synchronizes the OpenAI catalog definitions across configuration sources and ensures the new models are correctly exposed through the API layer and validated in tests. --- ### Type of change * [x] New Feature (non-breaking change which adds functionality) --- ### Changes Made * Added `gpt-5.5` and `gpt-5.4` to OpenAI catalog definitions in: * `conf/llm_factories.json` * `conf/models/openai.json` (chat + vision support) * Ensured consistency between DB-seeded factory config and provider model configuration * Updated test coverage in: * `test_llm_list_unit.py` * seeded OpenAI catalog entries * added response-level assertion validating `/llm/list` includes both new model IDs under OpenAI grouping --- ### Root Cause OpenAI model listings in selection flows are generated from catalog data seeded via `conf/llm_factories.json`. The catalog had not been updated to include the latest GPT models, resulting in missing availability in UI and API responses. --- ### Testing * Created isolated test environment: * `python -m venv .venv-review` * installed `pytest` * Ran targeted and full test suite: * `test_list_app_grouping_availability_and_merge`: ✅ passed * Full `test_llm_list_unit.py`: ✅ 10 passed --- ### Risks / Limitations * Adding models to the catalog does not guarantee upstream provider availability or account entitlement. * Environments with pre-seeded DB catalogs may require reseed or refresh to reflect updated configuration. --- ### Notes * Changes are minimal and scoped strictly to catalog configuration and related test coverage. 
* Ensures `/llm/list` API remains aligned with expected latest OpenAI model availability. * Closes #14827	2026-05-12 18:03:47 +08:00
Haruko386	45ee5ca9cd	Go: implement provider: Jina (#14838 ) ### What problem does this PR solve? This PR completes the Jina provider The following functionalities are now supported: Jina: - [ ] Chat / Stream Chat (Not available for now: [(Jina chat API docs)](https://api.jina.ai/docs#/Search%20Foundation%20Models/chat_completions_v1_chat_completions_post)) - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] ~~Balance~~ Verified examples from the CLI: ```plaintext RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'jina-embeddings-v2-base-en@test@jina' dimension 16 +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 768 \| 0 \| \| 768 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'jina-reranker-v2-base-multilingual@test@jina' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.74316794 \| \| 2 \| 0.18713269 \| \| 1 \| 0.15817434 \| +-------+-----------------+ RAGFlow(user)> list supported models from 'jina' 'test' +---------------------------------------------+ \| model_name \| +---------------------------------------------+ \| Jina AI: Jina VLM \| \| Jina AI: Jina Reranker v3 \| \| Jina AI: Jina Code Embeddings 0.5b \| \| Jina AI: Jina Code Embeddings 1.5b \| \| Jina AI: Jina Embeddings v4 \| \| Jina AI: Jina Reranker M0 \| \| Jina AI: ReaderLM v2 \| \| Jina AI: Jina Clip v2 \| \| Jina AI: Jina Embeddings v3 \| \| Jina AI: Jina Colbert v2 \| \| Jina AI: Reader LM 0.5b \| \| Jina AI: Reader LM 1.5b \| \| Jina AI: Jina Reranker v2 Base Multilingual \| \| Jina AI: Jina Clip v1 \| \| Jina AI: Jina Reranker v1 Tiny EN \| \| Jina AI: Jina Reranker v1 Turbo EN \| \| Jina AI: Jina Reranker v1 Base EN \| \| Jina AI: Jina Colbert v1 EN \| \| Jina AI: Jina Embeddings v2 Base ES \| \| Jina AI: Jina Embeddings v2 Base Code \| \| Jina 
AI: Jina Embeddings v2 Base DE \| \| Jina AI: Jina Embeddings v2 Base ZH \| \| Jina AI: Jina Embeddings v2 Base EN \| \| Jina AI: Jina Embedding B EN v1 \| \| Jina AI: Jina Embeddings v5 Text Small \| \| Jina AI: Jina Embeddings v5 Omni Small \| \| Jina AI: Jina Embeddings v5 Omni Nano \| \| Jina AI: Jina Embeddings v5 Text Nano \| +---------------------------------------------+ RAGFlow(user)> check instance 'test' from 'jina' SUCCESS ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-12 18:03:05 +08:00
tmimmanuel	7d3836907a	Go: implement Embed (embeddings) in Mistral driver (#14807 ) ### What problem does this PR solve? The Mistral Go driver landed in #14805 with chat, list models, and check connection. `Embed` was left as a stub that returns `"not implemented"`. This PR fills the gap. `conf/models/mistral.json` did not list any embedding model out of the box, so a tenant who wanted to use Mistral end to end (chat + embeddings) could not run an embedding call. This PR adds `mistral-embed` to the config and a real `/v1/embeddings` implementation. ### What this PR includes - `conf/models/mistral.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config (matches the `URLSuffix.Embedding` field already used by openai, siliconflow, zhipu-ai), and add a `mistral-embed` entry under `models` (1024-dimensional vectors, 8192 max input tokens). - `internal/entity/models/mistral.go`: replace the `Embed` stub with a real implementation that POSTs to `/v1/embeddings`. Adds local response types `mistralEmbeddingData` and `mistralEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig`, the API key, and the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The Mistral API accepts the `input` field as an array. - Parse `data[].embedding` and copy each slice into a `[]EmbeddingData` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - An empty input slice returns `[]EmbeddingData{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. 
If any slot is still empty, return a clear error so the caller does not silently use a zero vector. ### Note on stacking This PR builds on #14805 (the Mistral driver). Until #14805 merges, this PR's diff on GitHub will include both that PR's commits and this one. After #14805 lands on `main`, GitHub will auto-reduce this PR to only the `Embed` changes (one commit, ~111 line diff in `mistral.go` plus 8 lines in `mistral.json`). ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the `go.mod` minimum). - The full method set on `MistralModel` still matches the `ModelDriver` interface. - Pattern parity with the existing OpenAI Embed implementation (`internal/entity/models/openai.go`). Closes #14806 Depends on #14805 Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 17:45:48 +08:00
buua436	14332dd75c	Go: fix dataset time unit (#14837 ) ### What problem does this PR solve? fix dataset time unit ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 17:22:16 +08:00
Jin Hai	d08bf02d9b	Go: add ASR, TTS, OCR command (#14836 ) ### What problem does this PR solve? ``` RAGFlow(user)> asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> stream asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> stream tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> ocr with 'glm-ocr@test@zhipu-ai' file './test.log'; CLI error: zhipu, no such method ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 17:17:44 +08:00
buua436	9ee481807f	Go: implement GET /api/v1/datasets/:dataset_id (#14834 ) ### What problem does this PR solve? Implement `GET /api/v1/datasets/:dataset_id`. ### Type of change - [x] Refactoring	2026-05-12 17:16:48 +08:00
Wang Qi	4374e07a29	Speed up start time (#14833 ) ### What problem does this PR solve? Speed up start time ### Type of change - [x] Refactoring	2026-05-12 17:00:45 +08:00
tmimmanuel	eaa2e46b1e	Go: implement Embed (embeddings) in Upstage driver (#14819 ) ### What problem does this PR solve? The Upstage Go driver landed in #14817 with chat, list models, and check connection. `Embed` was left as a stub that returns `"not implemented"`. This PR fills the gap. Upstage exposes an OpenAI-compatible embeddings endpoint at `https://api.upstage.ai/v1/solar/embeddings` via the `solar-embedding-1-large` family (`solar-embedding-1-large-query` for queries, `solar-embedding-1-large-passage` for passages), and the Python side has had `UpstageEmbed(OpenAIEmbed)` in `rag/llm/embedding_model.py` for a long time targeting this same path. The existing `conf/models/upstage.json` did not list any embedding model out of the box, so a tenant who wanted to use Upstage end to end could not run an embedding call. This PR fills the gap. ### What this PR includes - `conf/models/upstage.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config (matches the `URLSuffix.Embedding` field already used by openai, mistral, siliconflow, zhipu-ai), and add `solar-embedding-1-large-query` and `solar-embedding-1-large-passage` entries under `models`. - `internal/entity/models/upstage.go`: replace the `Embed` stub with a real implementation that POSTs to `/v1/solar/embeddings`. Adds local response types `upstageEmbeddingData` and `upstageEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig`, the API key, and the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The Upstage API accepts the `input` field as an array. 
- Parse `data[].embedding` and copy each slice into a `[]EmbeddingData` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - An empty input slice returns `[]EmbeddingData{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. If any slot is still empty, return a clear error so the caller does not silently use a zero vector. ### Note on stacking This PR builds on #14817 (the Upstage driver). Until #14817 merges, this PR's diff on GitHub will include both that PR's commits and this one. After #14817 lands on `main`, GitHub will auto-reduce this PR to only the `Embed` changes (one commit, ~119 line diff in `upstage.go` plus ~15 lines in `upstage.json`). ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the `go.mod` minimum). - The full method set on `UpstageModel` still matches the `ModelDriver` interface. - Pattern parity with the existing Mistral Embed (`internal/entity/models/mistral.go`) and OpenAI Embed (`internal/entity/models/openai.go`) implementations. Closes #14818 Depends on #14817 Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 16:11:06 +08:00
Haruko386	ebab3513c4	Go: implement provider: Baichuan (#14832 ) ### What problem does this PR solve? This PR completes the Baichuan provider. The following functionalities are now supported: Baichuan: - [x] Chat / Stream Chat - [x] Embedding - [ ] ~~Rerank~~ - [ ] ~~Model listing~~ - [ ] ~~Provider connection checking~~ - [ ] ~~Balance~~ Verified examples from the CLI: ```plaintext # Baichuan RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'Baichuan-Text-Embedding@test@baichuan' dimension 16; +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 1024 \| 0 \| \| 1024 \| 1 \| +-----------+-------+ RAGFlow(user)> chat with 'Baichuan-M2@test@baichuan' message 'who r u' Answer: I'm BaiChuan, a helpful AI assistant created by Baichuan-AI. I'm designed to be a knowledgeable, friendly, and reliable assistant for various tasks like answering questions, explaining concepts, writing content, and more. Feel free to ask me anything! 😊 Time: 1.637975 RAGFlow(user)> stream chat with 'Baichuan-M2@test@baichuan' message 'who r u' Answer: I'm BaiChuan-m2, an AI assistant developed by Baichuan-AI. My purpose is to help you with a wide range of tasks by providing information, answering questions, solving problems, and assisting with creative projects. Think of me as a helpful digital companion! If you have any questions or need assistance, just let me know.😊 Time: 1.692321 ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-12 16:10:32 +08:00
Achieve3318	2cc206ee85	Test: aggregation edge cases for list and scalar values (#14170 ) This PR adds focused unit tests for `aggregate_by_field` in the OceanBase memory utilities to improve behavior coverage for real-world input shapes. - Adds test coverage for list-valued aggregation fields, including whitespace trimming and skipping invalid list entries. - Adds test coverage for scalar field values to ensure blank/non-string values are ignored. - Confirms aggregation output remains correct and stable for mixed-quality message payloads. ### Why this helps It strengthens regression protection for aggregation logic used by memory retrieval flows, with no production code changes and minimal review risk.	2026-05-12 15:53:35 +08:00
Magicbook1108	f85e18afbc	Refactor: sandbox quickstart.md & add tutorial for code exec component (#14786 ) ### What problem does this PR solve? Refactor the sandbox quickstart.md and add a tutorial for the code exec component. ### Type of change - [x] Refactoring <img width="700" alt="img_v3_0211j_dcff835b-e3bb-4c77-9bc5-3b31a983229g" src="https://github.com/user-attachments/assets/7842fc0f-639a-458f-b164-bc81a99ce4a5" /> --------- Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2026-05-12 14:42:20 +08:00
buua436	e8adc977bd	Fix: some agent bugs (#14829 ) ### What problem does this PR solve? - update null checks to use `is None` for better clarity - replace `RAGFlowSelect` with `SelectWithSearch` in `DebugContent` - add max height and overflow to `DialogContent` in `ParameterDialog` - remove unused types from `DataOperationsForm` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 14:41:49 +08:00
lif	a02b456720	fix(docs): correct broken knowledge graph construction link (#13838 ) Fixes #13817 ### What problem does this PR solve? The "knowledge graph construction" link on line 21 of `docs/guides/dataset/run_retrieval_test.md` points to `./construct_knowledge_graph.md`, which doesn't exist. The actual file is at `./advanced/construct_knowledge_graph.md`. ### Type of change - [x] Documentation Update Signed-off-by: majiayu000 <1835304752@qq.com>	2026-05-12 14:27:56 +08:00
tmimmanuel	558ea51a0f	Go: implement provider: StepFun (#14815 ) ### What problem does this PR solve? Add a Go driver for StepFun (阶跃星辰), one of the unchecked providers on the umbrella tracking issue #14736. Until this PR, a tenant who configured `stepfun` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. Chat, list models, and check connection all returned `"not implemented"` instead of reaching the StepFun API. The Python side has had StepFun registered in `rag/llm/__init__.py` as a `SupportedLiteLLMProvider` with base URL `https://api.stepfun.com/v1`, plus `StepFunCV` for vision and `StepFunSeq2txt` for ASR, but no Go path. StepFun's chat API is OpenAI-compatible, so the implementation pattern is the same as the merged Moonshot driver (#14433) and OpenAI driver (#14605). ### What this PR includes - New file `internal/entity/models/stepfun.go` with a `StepFunModel` that implements the `ModelDriver` interface. - `factory.go`: route the `"stepfun"` provider name to `NewStepFunModel`. - New `conf/models/stepfun.json` with the public StepFun chat models (step-2-16k, step-1 family in 8k/32k/128k/256k context lengths, step-1-flash, and the step-1v / step-1o vision models) and `url_suffix` entries for `chat` and `models`. ### How the driver works - StepFun exposes the OpenAI-compatible API at `https://api.stepfun.com/v1`. - `ChatWithMessages` and `ChatStreamlyWithSender` post to `/chat/completions` in the same shape as the merged moonshot, openrouter, and openai drivers. - `ListModels` and `CheckConnection` call `/models` to list available ids and confirm the API key works. - `Embed` is left as `"not implemented"`. 
StepFun has not advertised a public embeddings endpoint in the API reference linked from the umbrella issue (`https://platform.stepfun.com/docs/en/api-reference/chat/chat-completion-create` is the chat endpoint), so any real implementation belongs in a separate follow-up only after the endpoint is verified. - `Rerank` and `Balance` return `"no such method"` because StepFun does not expose either. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 with no errors on go 1.25 (the `go.mod` minimum). - Method set of `StepFunModel` matches the `ModelDriver` interface: `NewInstance`, `Name`, `ChatWithMessages`, `ChatStreamlyWithSender`, `Embed`, `Rerank`, `ListModels`, `Balance`, `CheckConnection`. - Pattern parity with the merged moonshot (#14433), openai (#14605), openrouter (#14652), and xai (#14550) drivers. Closes #14814 Tracking: #14736	2026-05-12 13:49:35 +08:00
hyl64	02c2587ca4	fix(agent): support iteration item aliases in child nodes (#14146 ) ## Summary This PR fixes the iteration variable mismatch reported in #14142. Changes: - restore compatibility for `IterationItem@result` by exposing `result` alongside `item` - support bare iteration aliases like `{item}`, `{index}`, and `{result}` inside iteration child-node inputs - add focused unit/runtime tests covering both alias styles and multi-item iteration execution ## Validation ```bash pytest -q --noconftest \ test/testcases/test_web_api/test_canvas_app/test_iterationitem_unit.py \ test/testcases/test_web_api/test_canvas_app/test_iteration_runtime_unit.py \ test/testcases/test_web_api/test_canvas_app/test_invoke_component_unit.py ``` Result: `12 passed` Closes #14142	2026-05-12 13:05:21 +08:00
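The alias behaviour described in this commit can be pictured with a small sketch. It is not the RAGFlow implementation: `resolve_iteration_aliases` is a hypothetical helper, and treating `result` as a second name for the current `item` is an assumption drawn from the summary above.

```python
import re

# Hypothetical helper, not RAGFlow's code: substitutes the bare iteration
# aliases {item}, {index} and {result} inside a child-node input template.
_ALIAS = re.compile(r"\{(item|index|result)\}")

def resolve_iteration_aliases(template: str, item: object, index: int) -> str:
    # Assumption: `result` is exposed alongside `item` and carries the same value.
    values = {"item": item, "result": item, "index": index}
    return _ALIAS.sub(lambda m: str(values[m.group(1)]), template)

# Multi-item iteration: each element is rendered with its own index.
rendered = [
    resolve_iteration_aliases("{index}: {item} -> {result}", value, i)
    for i, value in enumerate(["a", "b"])
]
```

Placeholders that are not iteration aliases are left untouched, so other variable references in the same template are unaffected.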
Haruko386	128a64eae5	Refactor(Go): remove hardcode in huggingface provider (#14822) ### What problem does this PR solve? Removes hardcoded values from the `huggingface` provider. ### Type of change - [x] Refactoring	2026-05-12 11:35:26 +08:00
dependabot[bot]	139b76d2b1	Chore(deps): Bump urllib3 from 2.6.3 to 2.7.0 in /agent/sandbox (#14824) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0.
### Release notes
Sourced from [urllib3's releases](https://github.com/urllib3/urllib3/releases) and [changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst), where 2.7.0 is dated 2026-05-07.
urllib3 is fundraising for HTTP/2 support: [urllib3 is raising ~$40,000 USD](https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support) to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. Organizations that would benefit are asked to [consider contributing financially](https://opencollective.com/urllib3).
### Security
Addressed high-severity security issues. Impact was limited to specific use cases detailed in the accompanying advisories; overall user exposure was estimated to be marginal.
- Decompression-bomb safeguards of the streaming API were bypassed (see [GHSA-mf9v-mfxr-j63j](https://github.com/urllib3/urllib3/security/advisories/GHSA-mf9v-mfxr-j63j)):
  1. When `HTTPResponse.drain_conn()` was called after the response had been read and decompressed partially. (Reported by [@Cycloctane](https://github.com/Cycloctane))
  2. During the second `HTTPResponse.read(amt=N)` or `HTTPResponse.stream(amt=N)` call when the response was decompressed using the official [Brotli](https://pypi.org/project/brotli/) library. (Reported by [@kimkou2024](https://github.com/kimkou2024))
- HTTP pools created using `ProxyManager.connection_from_url` did not strip sensitive headers specified in `Retry.remove_headers_on_redirect` when redirecting to a different host. ([GHSA-qccp-gfcp-xxvc](https://github.com/urllib3/urllib3/security/advisories/GHSA-qccp-gfcp-xxvc), reported by [@christos-spearbit](https://github.com/christos-spearbit))
### Deprecations and Removals
- Used `FutureWarning` instead of `DeprecationWarning` for better visibility of existing deprecation notices. Rescheduled the removal of deprecated features to version 3.0. ([urllib3/urllib3#3763](https://github.com/urllib3/urllib3/issues/3763))
- Removed support for end-of-life Python 3.9. ([urllib3/urllib3#3720](https://github.com/urllib3/urllib3/issues/3720))
- Removed support for end-of-life PyPy3.10. ([urllib3/urllib3#4979](https://github.com/urllib3/urllib3/issues/4979))
- Bumped the minimum supported pyOpenSSL version to 19.0.0. ([urllib3/urllib3#3777](https://github.com/urllib3/urllib3/issues/3777))
### Bugfixes
- Fixed a bug where `HTTPResponse.read(amt=None)` was ignoring decompressed data buffered from previous partial reads. ([urllib3/urllib3#3636](https://github.com/urllib3/urllib3/issues/3636))
- Fixed a bug where `HTTPResponse.read()` could cache only part of the response after a partial read when `cache_content=True`. ([urllib3/urllib3#4967](https://github.com/urllib3/urllib3/issues/4967))
- Fixed `HTTPResponse.stream()` and `HTTPResponse.read_chunked()` to handle `amt=0`. ([urllib3/urllib3#3793](https://github.com/urllib3/urllib3/issues/3793))
- Updated `_TYPE_BODY` type alias to include missing `Iterable[str]`, matching the documented and runtime behavior of chunked request bodies. ([urllib3/urllib3#3798](https://github.com/urllib3/urllib3/issues/3798))
- Fixed `LocationParseError` when paths resembling schemeless URIs were passed to `HTTPConnectionPool.urlopen()`. ([urllib3/urllib3#3352](https://github.com/urllib3/urllib3/issues/3352))
- Fixed `BaseHTTPResponse.readinto()` type annotation to accept `memoryview` in addition to `bytearray`, matching the `io.RawIOBase.readinto` contract and enabling use with `io.BufferedReader` without type errors. ([urllib3/urllib3#3764](https://github.com/urllib3/urllib3/issues/3764))
### Commits
- `9a950b9` Release 2.7.0
- `5ec0de4` Merge commit from fork
- `2bdcc44` Merge commit from fork
- `f45b0df` Fix a misleading example for `ProxyManager` (#4970)
- `577193c` Switch to nightly PyPy3.11 in CI for now (#4984)
- `e90af45` Avoid infinite loop in `HTTPResponse.read_chunked` when `amt=0` (#4974)
- `67ed74f` Bump dev dependencies (#4972)
- `3abd481` Upgrade mypy to version 1.20.2 (#4978)
- `2b8725d` Drop support for EOL PyPy3.10 (#4979)
- `2944b2a` Upgrade `setup-chrome` and `setup-firefox` to fix warnings (#4973)
- Additional commits viewable in [compare view](https://github.com/urllib3/urllib3/compare/2.6.3...2.7.0)
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-12 11:10:15 +08:00
CaptainTimon	2717ee283f	feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679 ) ### What problem does this PR solve? Closes #14674. This PR improves RAPTOR configuration and tree construction while preserving the existing RAPTOR behavior as the default. RAPTOR currently builds summary layers with the original UMAP + GMM clustering path. This PR keeps that default path, and adds: - A hidden backend tree-builder option: - `tree_builder="raptor"`: default, existing RAPTOR behavior. - `tree_builder="psi"`: rank-aware Psi-style tree builder using original embedding-space cosine ranking. - A user-facing clustering method option for the default RAPTOR builder: - `clustering_method="gmm"`: existing default. - `clustering_method="ahc"`: agglomerative hierarchical clustering path. - A RAPTOR UI setting for `Clustering method` and `Max cluster`. ### What changed #### Backend - Added `tree_builder` support for RAPTOR/Psi. - Added `clustering_method` support for GMM/AHC. - Kept existing RAPTOR + GMM as the default. - Added Psi tree building from original-space cosine similarity. - Added bucketed Psi building controls for large inputs: - `raptor.ext.psi_exact_max_leaves` - `raptor.ext.psi_bucket_size` - Added method-aware RAPTOR summary metadata using existing `extra.raptor_method`. - Avoided adding a dedicated DB schema field for experimental method tracking. - Added cleanup/migration logic to avoid mixing stale RAPTOR summary trees. - Added defensive checks for Psi tree construction and summary failures. #### Frontend/UI - Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`. - Added/kept `Max cluster` in RAPTOR settings. - Enlarged max cluster UI limit to `1024`, matching backend validation. - Kept AHC editable even when a RAPTOR task has already finished. 
- Fixed the UI save payload so `clustering_method` and `tree_builder` are serialized through `parser_config.raptor.ext`, avoiding backend validation errors for extra top-level RAPTOR fields. Example saved RAPTOR config: ```json { "raptor": { "max_cluster": 317, "ext": { "clustering_method": "ahc", "tree_builder": "raptor" } } } ``` Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>	2026-05-12 09:42:31 +08:00
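The payload fix in this commit (moving the two method fields under `ext`) might look roughly like the sketch below. `to_saved_raptor_config` and the flat input form are hypothetical; only the key names and the resulting shape come from the example config in the commit message.

```python
from copy import deepcopy

# Key names taken from the commit message; everything else is illustrative.
EXT_KEYS = ("clustering_method", "tree_builder")

def to_saved_raptor_config(form: dict) -> dict:
    # Hypothetical helper: nests the method fields under `ext` so that no
    # extra top-level RAPTOR fields reach backend validation.
    raptor = deepcopy(form)
    ext = dict(raptor.pop("ext", {}))
    for key in EXT_KEYS:
        if key in raptor:
            ext[key] = raptor.pop(key)
    if ext:
        raptor["ext"] = ext
    return {"raptor": raptor}

saved = to_saved_raptor_config(
    {"max_cluster": 317, "clustering_method": "ahc", "tree_builder": "raptor"}
)
```

The input dictionary is copied first, so the caller's form state is not mutated by the save step.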
黄圣祺	415169d497	fix(dify): add GET method support to /dify/retrieval for health check (#13837 ) ## Summary - Add GET method handler to `/api/v1/dify/retrieval` endpoint for Dify external knowledge base connectivity verification - GET requests return a simple success response; POST requests retain existing retrieval logic unchanged ## Problem When Dify integrates with RAGFlow as an external knowledge base, it sends periodic GET requests to the retrieval endpoint for health/connectivity checks. The endpoint only accepted POST, causing werkzeug to return `405 Method Not Allowed`. After several successful POST retrievals, the failing GET health checks trigger Dify's circuit breaker, causing all subsequent requests to fail. Traceback from the issue: ``` werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL. ``` ## Changes - `api/apps/sdk/dify_retrieval.py`: Added a separate GET route handler (`retrieval_health_check`) that returns `get_json_result(data=True)` ## Test plan - [ ] Verify `GET /api/v1/dify/retrieval` returns `{"code": 0, "message": "success", "data": true}` - [ ] Verify `POST /api/v1/dify/retrieval` with valid API key and body still works as before - [ ] Verify Dify external knowledge base integration no longer returns 405 errors Closes #13788 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-12 09:37:07 +08:00
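The split between the new GET handler and the unchanged POST handler can be sketched without a web framework. Everything below is a stand-in: the real route lives in `api/apps/sdk/dify_retrieval.py`, the `get_json_result` signature is assumed from the response shown in the test plan, and the POST body is a placeholder rather than the actual retrieval logic.

```python
PATH = "/api/v1/dify/retrieval"

def get_json_result(data=None, code=0, message="success"):
    # Stand-in for the helper named in the commit; the real signature may differ.
    return {"code": code, "message": message, "data": data}

def retrieval_health_check(_body=None):
    # GET: a simple success response used for connectivity checks.
    return get_json_result(data=True)

def retrieval(body):
    # POST: placeholder for the existing retrieval logic (not reproduced here).
    return get_json_result(data={"records": []})

ROUTES = {("GET", PATH): retrieval_health_check, ("POST", PATH): retrieval}

def dispatch(method: str, path: str, body=None):
    handler = ROUTES.get((method, path))
    if handler is None:
        # Before the fix, GET fell into this branch and the client saw a 405.
        return 405, {"message": "Method Not Allowed"}
    return 200, handler(body)

status, payload = dispatch("GET", PATH)
```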
Ramin M.	765cdc2ec2	[Bug]: REDIS error #12870 (#13875 ) Fix for: [Bug]: REDIS error #12870	2026-05-12 09:31:47 +08:00
Jin Hai	2f2d1569e6	Go: fix retrieval test error (#14794 ) ### What problem does this PR solve? 1. Add region check in zhipu-ai embed method 2. Fix retrieval test ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 20:19:08 +08:00
Haruko386	3e90d303e0	Go: implement provider: CoHere and FishAudio (#14790 ) ### What problem does this PR solve? This PR completes the Cohere provider integration (upgrading to the new Cohere V2 API) and enhances the Fish Audio provider in RAGFlow. The following functionalities are now supported: Cohere: - [x] Chat / Think Chat / Stream Chat / Stream Think Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance Fish Audio: - [x] Model listing (`ListModels`) - [x] Balance (`Balance`) ----- Verified examples from the CLI: ```plaintext # Cohere RAGFlow(user)> think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: Okay, the user wrote "jumperwho". Let me try to figure out what they might be asking. First, I'll check if it's a misspelling. "Jumper" ...... Hmm. Since the query is unclear, the best approach is to ask the user to provide more context or correct any possible typos. Answer: It seems there might be a typo or missing context in your query "jumperwho." Could you clarify what you're referring to? For example: - Are you asking about a jumper (a type of sweater, a person who jumps, or a component in electronics)? - Is this related to a specific context, like a movie (e.g., the 2008 film Jumper) or a game? - Did you mean to ask about a person ("who") associated with jumping (e.g., a parachutist)? Let me know so I can provide a helpful response! 😊 Time: 6.710331 RAGFlow(user)> stream think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: , the user mentioned "jumperwho". Let me try to figure out what they're referring to. First, I'll check if it's a misspelling. "Jumper" could be a typo for "jumper" or maybe a username. Alternatively, it might be a combination of words like "jumper who",....... the best approach is to inform the user that I don't recognize the term and ask if they can provide more context or clarify what they mean by "jumperwho". 
That way, I can assist them better once I have more information. Answer: seems "jumperwho" isn't a widely recognized term, proper noun, or acronym in common usage. Could you provide more context or clarify what you mean by "jumperwho"? This will help me understand your question or request better! Time: 4.513596 RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'embed-v4.0@test3@cohere' dimension 16; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| embedding \| index \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| [-0.016643638 -0.001957038 0.0055713872 0.009027058 0.05275187 -0.024542313 -0.044006906 0.024119169 0.0014192933 0.006558722 0.0019129605 -0.021016119 -0.026516981 -0.017489925 0.021298215 0.017772019 0.04569948 0.008886009 0.012059584 -0.0014721862 0.... \| 0 \| \| [0.018778935 -0.0063459855 -0.0006839742 0.0046623563 0.0067668925 -0.018001877 -0.03963003 0.035744734 -0.014246088 -0.0020721585 -0.006313608 0.025124922 -0.010749322 0.01217393 -0.010231283 -0.025254432 0.021498645 -0.028880708 0.019167464 -0.0058279... 
\| 1 \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank-v4.0-pro@test@cohere' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.91744334 \| \| 1 \| 0.7458429 \| \| 2 \| 0.68729424 \| +-------+-----------------+ RAGFlow(user)> list supported models from 'cohere' 'test' +-------------------------------------+ \| model_name \| +-------------------------------------+ \| c4ai-aya-expanse-32b \| \| c4ai-aya-vision-32b \| \| cohere-transcribe-03-2026 \| \| command-a-03-2025 \| \| command-a-reasoning-08-2025 \| \| command-a-translate-08-2025 \| \| command-a-vision-07-2025 \| \| command-r-08-2024 \| \| command-r-plus-08-2024 \| \| command-r7b-12-2024 \| \| command-r7b-arabic-02-2025 \| \| embed-english-light-v3.0 \| \| embed-english-light-v3.0-image \| \| embed-english-v3.0 \| \| embed-english-v3.0-image \| \| embed-multilingual-light-v3.0 \| \| embed-multilingual-light-v3.0-image \| \| embed-multilingual-v3.0 \| \| embed-multilingual-v3.0-image \| \| embed-v4.0 \| +-------------------------------------+ RAGFlow(user)> check instance 'test' from 'cohere' SUCCESS # FishAudio RAGFlow(user)> list supported models from 'fishaudio' 'test' +----------------------------------------+ \| model_name \| +----------------------------------------+ \| Valentino Narración Biblica Fer \| \| Super Smash Bros. 
4/Ultimate Announcer \| \| Farid Dieck \| \| عصام الشوالي \| \| ALEX_CHIKNA \| \| Energetic Male \| \| voz de locutor k \| \| يي \| \| ELITE \| \| Mortal Kombat \| +----------------------------------------+ RAGFlow(user)> show balance from 'fishaudio' 'test' +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| _id \| created_at \| credit \| has_free_credit \| has_phone_sha256 \| updated_at \| user_id \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| 82ffec12cf984d88a30ec504d7909812 \| 2026-05-09T07:52:16.119000Z \| 0 \| \| false \| 2026-05-09T07:52:16.119000Z \| 2578ab1126804d6eaa630552400d7ff3 \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-11 20:18:38 +08:00
buua436	daf8a58c4b	Fix: add codeexec attachments output (#14787 ) ### What problem does this PR solve? add codeexec attachments output ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 19:16:33 +08:00
Renzo	39ee2fb120	Go: implement Rerank in NVIDIA driver (#14778 ) ## Summary - Replaces the `"no such method"` stub on `NvidiaModel.Rerank` (`internal/entity/models/nvidia.go`) with a real implementation against NVIDIA NIM's `/ranking` endpoint. - Mirrors the existing Python `NvidiaRerank` class at `rag/llm/rerank_model.py:149-190` for behavior parity: same `passages`/`query.text`/`logit` payload shape; `top_n` set to `len(documents)` so every input gets a score returned in original order (the issue body's spec omitted `top_n`, which would cause silent data loss). - Adds the `"rerank": "ranking"` URL suffix and two NIM rerank model entries (`nvidia/nv-rerankqa-mistral-4b-v3`, `nvidia/llama-3.2-nv-rerankqa-1b-v2`) to `conf/models/nvidia.json` so the picker exposes them. - Follows the same shape as the recently merged Aliyun (#14676), Gitee (#14656), and ZhipuAI (#14608) Rerank implementations: lowercase per-driver request/response types, conversion to the project-wide `RerankResponse{Data: []RerankResult}`, per-call `context.WithTimeout` of 30s. Closes #14720 ## Test plan - [x] `gofmt -l internal/entity/models/nvidia.go` — clean - [x] `go vet ./internal/entity/models/...` — no new errors introduced (the two pre-existing vet errors in `baidu.go:642` and `openrouter.go:566` are unrelated to this PR) - [x] `go build ./internal/entity/models/...` — succeeds - [x] `python3 -c "import json; json.load(open('conf/models/nvidia.json'))"` — JSON valid - [ ] Live smoke test against NVIDIA NIM with a real API key (requires reviewer with NIM credentials) ## Notes for reviewers - The issue body suggested omitting `top_n`. The Python reference includes it (`top_n: len(texts)`), and without it NVIDIA returns only the default top-K rankings rather than scores for every input. This PR follows the Python. - The URL host is `integrate.api.nvidia.com` (kept consistent with the existing chat/embeddings BaseURL in `nvidia.go`), not the legacy `ai.api.nvidia.com` host the Python uses. 
NIM's unified endpoint accepts the model names as-is, so no per-model URL transform is needed.	2026-05-11 17:21:16 +08:00
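To make the `top_n` point concrete, here is a minimal Python sketch of the request payload and of mapping scores back to input order. The `passages`, `query.text`, `logit`, and `top_n` names come from the commit message; the `rankings` and `index` response fields, both function names, and the sample response are assumptions for illustration, and no network call is made.

```python
def build_rerank_payload(model: str, query: str, documents: list[str]) -> dict:
    # top_n is set to the number of documents so every input gets a score.
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": doc} for doc in documents],
        "top_n": len(documents),
    }

def scores_in_input_order(rankings: list[dict], n_documents: int) -> list[float]:
    # Assumed response shape: a list of {"index": int, "logit": float} entries,
    # sorted by relevance rather than by input position.
    scores = [0.0] * n_documents
    for entry in rankings:
        scores[entry["index"]] = entry["logit"]
    return scores

docs = ["rag needs an llm", "rag is retrieval augmented generation"]
payload = build_rerank_payload("nvidia/llama-3.2-nv-rerankqa-1b-v2", "what is rag", docs)

# Hand-written sample response, not real NIM output.
fake_rankings = [{"index": 1, "logit": 4.2}, {"index": 0, "logit": -1.5}]
scores = scores_in_input_order(fake_rankings, len(docs))
```

If `top_n` were omitted and the service returned fewer entries than inputs, the missing documents would silently keep the `0.0` default, which is the data-loss risk the reviewer note describes.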
Jin Hai	9b3850339b	Go: add development guide document (#14785 ) ### What problem does this PR solve? As the title suggests. ### Type of change - [x] Documentation Update Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 17:20:41 +08:00
tmimmanuel	663fc1d42c	fix(opensearch): implement doc-meta dispatch surface on OSConnection (#14577 ) ### What problem does this PR solve? Fixes #14570. On OpenSearch backends (`DOC_ENGINE=opensearch`) every document-metadata write failed with `'OSConnection' object has no attribute 'create_doc_meta_idx'`, so both `PATCH /api/v1/datasets/{ds}/documents/{doc}` with `meta_fields` and `POST /api/v1/datasets/{ds}/metadata/update` were unusable while every other document operation (retrieval, parsing, name update, chunk management) worked correctly on the same OpenSearch cluster. The bug runs deeper than the missing method name in the error message suggests. `DocMetadataService` also reached into `settings.docStoreConn.es.*` directly for the index refresh, the scripted partial update, and the count call, which means that even after adding `create_doc_meta_idx` to `OSConnection` the very next call in the same metadata flow would still raise `AttributeError` because `OSConnection` exposes `self.os` rather than `self.es`. Fixing only the reported symptom would have moved the failure one line down without restoring the feature. This PR adds a uniform document-metadata dispatch surface to both connection classes so they present the same abstract API, and routes the service layer through that surface via `getattr` guards instead of poking at backend-specific attributes. The four new methods on `OSConnection` and `ESConnectionBase` are `create_doc_meta_idx`, `refresh_idx`, `count_idx`, and `replace_meta_fields`. `OSConnection.create_doc_meta_idx` reuses the existing `conf/doc_meta_es_mapping.json` schema in the OpenSearch `body=` form because OpenSearch and Elasticsearch share the same index-creation payload, and `replace_meta_fields` emits a full scripted assignment (`ctx._source.meta_fields = params.meta_fields`) on both backends so removed keys actually disappear instead of being preserved by deep-merge semantics. 
The `getattr`-guarded dispatch in `DocMetadataService` keeps the existing fall-through paths intact for Infinity and OceanBase, which continue to rely on their search-based count fallback and on the delete-then-insert metadata replacement they used before, so this change is strictly additive for those two backends. Verification: `pytest test/unit_test/rag/utils/test_opensearch_doc_meta.py` runs 16 new unit tests that pass locally and pin the `OSConnection` dispatch surface, the `create_doc_meta_idx` short-circuit when the index already exists, the mapping-file payload routing, the `IndicesClient.create` failure path, the `refresh_idx` and `count_idx` success and error sentinels, and the full-assignment script emitted by `replace_meta_fields`. The test module stubs `common.settings` and `rag.nlp` at import time so the suite runs without the heavy backend SDKs that the rest of the repository pulls in transitively. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com>	2026-05-11 17:04:28 +08:00
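The `getattr`-guarded dispatch described above can be sketched as follows. The two connection classes are fakes and `count_documents` is a hypothetical service-layer function; only the method name `count_idx` and the idea of a search-based fallback come from the commit message.

```python
class FakeOSConnection:
    # Stands in for a backend that implements the doc-meta dispatch surface.
    def count_idx(self, index_name: str) -> int:
        return 3

class FakeInfinityConnection:
    # Stands in for a backend without count_idx; it keeps its old fallback.
    pass

def search_based_count(index_name: str) -> int:
    # Placeholder for the pre-existing search-based count fallback.
    return 7

def count_documents(conn, index_name: str) -> int:
    # Use the uniform method when the backend provides it, instead of
    # reaching into backend-specific attributes such as `conn.es`.
    count_idx = getattr(conn, "count_idx", None)
    if callable(count_idx):
        return count_idx(index_name)
    return search_based_count(index_name)

os_count = count_documents(FakeOSConnection(), "doc_meta")
infinity_count = count_documents(FakeInfinityConnection(), "doc_meta")
```

Because the guard only adds a preferred path, backends that lack the method behave exactly as before, which matches the "strictly additive" claim for Infinity and OceanBase.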
box4wangjing	292b0b8bce	chore: fix some comments to improve readability (#14756 ) ### What problem does this PR solve? fix some comments to improve readability ### Type of change - [x] Documentation Update --------- Signed-off-by: box4wangjing <box4wangjing@outlook.com>	2026-05-11 16:48:48 +08:00
Octopus	c58906b69e	fix: OCR.detect() returns truthy None-tuple causing NoneType subscript crash (#13951) Fixes #13851 ## Problem `OCR.detect()` in `deepdoc/vision/ocr.py` returns `None, None, time_dict` (a truthy 3-tuple) when the text detector fails or receives a `None` image. However, the caller in `pdf_parser.py:__ocr()` checks:
```python
bxs = self.ocr.detect(np.array(img), device_id)
if not bxs:  # False! (None, None, time_dict) is a non-empty tuple → truthy
    self.boxes.append([])
    return
bxs = [(line[0], line[1][0]) for line in bxs]  # iterates (None, None, time_dict)
# line = None → None[0] → TypeError: 'NoneType' object is not subscriptable
```
This causes the `NoneType object is not subscriptable` error that appears after "OCR started" in the chunking pipeline when using PDF + General parser. ## Solution Simplified `OCR.detect()` to return `None` (falsy) instead of `None, None, time_dict` on failure. The `time_dict` was unused by the only caller of this method. The early-return guard `if not bxs:` in `pdf_parser.py` then correctly catches it. ## Testing - The method's only caller (`pdf_parser.py:__ocr`) already has an `if not bxs:` guard that handles the `None` return correctly. - No other callers of `OCR.detect()` exist in the codebase. ## Summary by CodeRabbit * Refactor * Modified OCR detection function return behavior to streamline output. The function now returns detection results only, without timing metadata. Error cases now return `None` instead of empty tuple values.	2026-05-11 16:19:28 +08:00
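The truthiness pitfall behind this fix can be reproduced in isolation. The functions below are simplified stand-ins for `OCR.detect()` and its caller, not the RAGFlow code; the box and text values are made up.

```python
def detect_before(img):
    # Old behaviour: a failure returns a 3-tuple, which is truthy.
    time_dict = {"det": 0.0}
    if img is None:
        return None, None, time_dict
    return [([0, 0, 1, 1], ("text", 0.9))]

def detect_after(img):
    # Fixed behaviour: a failure returns None, which is falsy.
    if img is None:
        return None
    return [([0, 0, 1, 1], ("text", 0.9))]

def caller(detect, img):
    bxs = detect(img)
    if not bxs:  # only catches the failure when the result is falsy
        return []
    return [(line[0], line[1][0]) for line in bxs]

# A non-empty tuple is truthy even when every element is None.
tuple_is_truthy = bool((None, None, {}))

try:
    caller(detect_before, None)
    crashed = False
except TypeError:
    # The guard was skipped and the first element, None, was subscripted.
    crashed = True

safe_result = caller(detect_after, None)
```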
Nie WeiYang	1e80be77a2	fix(web): fix incomplete Docx preview in citation reference (#14122 ) This PR fixes a UI issue where the .docx document preview was displayed incompletely when clicking on a citation/reference link during a knowledge base conversation. ### What problem does this PR solve? The Issue: In the chat interface, when a user clicks the source citation at the end of an answer, the DocPreviewer opens. However, for .docx files, if the content exceeded the window height, it was truncated and unscrollable, preventing users from reading the full referenced text. Changes: web/src/components/document-preview/doc-preview.tsx: Added the overflow-auto Tailwind class to the DocPreviewer root container to ensure scrollbars appear automatically when content overflows. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: nie.weiyang <nie.weiyang@embedway.com>	2026-05-11 16:17:48 +08:00
as-ondewo	6fb8c31c22	Fix: Document parse status set to DONE before chunks are retrievable (#13352) ### What problem does this PR solve? The document parse status was set to DONE before the document chunks were actually retrievable from Elasticsearch/OpenSearch, because the code did not wait for the index refresh. As a result, the API could report a parse status of DONE while a chunk retrieval issued at that moment returned nothing. Since the index refreshes every 1 second, this was quite likely to happen when a client waited for document parsing by polling at a short interval and then tried to retrieve chunks as soon as the status was DONE. I fixed this bug and added a test case that would have caught it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 16:04:08 +08:00
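The race described in this commit can be modelled with a toy index whose writes only become searchable after a refresh. `FakeIndex` and `finish_parsing` are hypothetical and do not use the Elasticsearch or OpenSearch clients; the sketch only shows why the status must be set after the refresh.

```python
class FakeIndex:
    # Toy model of a near-real-time index: writes are buffered until refresh().
    def __init__(self):
        self._pending = []
        self._searchable = []

    def index(self, doc):
        self._pending.append(doc)

    def refresh(self):
        self._searchable.extend(self._pending)
        self._pending.clear()

    def search(self):
        return list(self._searchable)

def finish_parsing(index: FakeIndex, chunks, refresh_first: bool) -> str:
    for chunk in chunks:
        index.index(chunk)
    if refresh_first:
        # The fix: make chunks searchable before reporting DONE.
        index.refresh()
    return "DONE"

racy = FakeIndex()
racy_status = finish_parsing(racy, ["chunk-1"], refresh_first=False)
racy_hits = racy.search()  # status is DONE, but nothing is retrievable yet

fixed = FakeIndex()
fixed_status = finish_parsing(fixed, ["chunk-1"], refresh_first=True)
fixed_hits = fixed.search()
```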
Sank	592dba1489	Refact: Added a private helper _visibility_and_status_filter (#13627 ) ### What problem does this PR solve? Added a private helper _visibility_and_status_filter(joined_tenant_ids, user_id) that returns the Peewee condition: visible to user (team or own) and status is VALID. ### Type of change - [x] Refactoring --------- Co-authored-by: Serobabov Aleksandr <40SerobabovAS@region.cbr.ru> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-05-11 15:21:41 +08:00
tmimmanuel	6ce014c23b	fix: offload blocking DB/Redis calls to thread pool for high-concurrency support (#13825 ) (#13941 ) ### What problem does this PR solve? Addresses event-loop blocking under high concurrency reported in #13825. When multiple requests hit the API simultaneously, synchronous DB/Redis calls block the async event loop, preventing Quart from handling other requests and causing cascading 502/504 timeouts. This PR wraps all remaining blocking DB/Redis calls in `canvas_app.py`, `chat_api.py`, `session.py`, and `canvas_service.py` with `await thread_pool_exec()` - Offload all synchronous `Service.`, `REDIS_CONN.`, and `APIToken.query` calls to the thread pool - Convert sync endpoint handlers (`list_chats`, `get_chat`, `templates`, `sessions`, etc.) to `async def` - Convert sync helper functions (`_ensure_owned_chat`, `_validate_llm_id`, `_validate_dataset_ids`, etc.) to async - no duplicate sync/async pairs - Wrap `CanvasReplicaService` Redis IO calls (`bootstrap`, `replace_for_set`, `commit_after_run`) - Use `asyncio.gather()` for concurrent file uploads and chat response building Note: This fixes the code-level event-loop blocking, which is a prerequisite for handling concurrent requests. For the full "30 concurrent requests without 502/504" goal described in the issue, users should also tune deployment config: - `WS=4` or higher (HTTP worker processes, default 1) - `MAX_CONCURRENT_CHATS=50` (default 10) - `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` for workflow-heavy workloads ### Performance verification Reviewer asked for a before-vs-after comparison ([comment](https://github.com/infiniflow/ragflow/pull/13941#issuecomment-4393667231)). 
I built a self-contained microbenchmark that reproduces the exact failure mode this PR targets: an async handler that performs blocking DB/Redis-style calls (50 ms each, 3 per request, 30 concurrent requests) is run twice, once with the pre-PR pattern (sync call directly inside the async handler) and once with the post-PR pattern (`await thread_pool_exec(...)`). The benchmark imports nothing from RAGFlow except `thread_pool_exec` itself, so it is hermetic and reproducible (`THREAD_POOL_MAX_WORKERS=128`, Python 3.13.12).

Throughput: wall-clock time for 30 concurrent requests (lower is better)

| flavour | wall(s) | p50(s) | p95(s) | max(s) |
|---|---:|---:|---:|---:|
| before | 4.986 | 0.158 | 0.207 | 0.269 |
| after | 0.248 | 0.181 | 0.230 | 0.231 |

The pre-PR handler serializes the entire load on the event-loop thread, so 30 × 3 × 50 ms ≈ 4.5 s shows up as the wall time. The post-PR handler parallelizes the blocking work across the thread pool and finishes the same load in 248 ms, a ~20× speedup on this workload.

Event-loop responsiveness: latency of an unrelated probe coroutine while the 30 slow requests are running (lower is better)

| flavour | samples | probe p50 (ms) | probe p95 (ms) | probe max (ms) |
|---|---:|---:|---:|---:|
| before | 1 | 5442.26 | 5442.26 | 5442.26 |
| after | 28 | 0.88 | 11.53 | 98.02 |

This is the metric that maps directly to "the API still answers other requests while one is busy". A 5 ms-interval probe was scheduled while the 30 slow handlers ran. With the pre-PR code the event loop was frozen for the entire duration of the blocking work, so only one probe sample was ever picked up and it waited 5,442 ms. After the PR, 28 probe samples landed with p50 0.88 ms / p95 11.53 ms, meaning unrelated requests are no longer starved by the slow ones. That is the regression mode behind the cascading 502/504s reported in #13825.
<details> <summary>Raw benchmark output</summary>

```
config: 30 concurrent requests, 3 blocking calls of 50ms each per request, THREAD_POOL_MAX_WORKERS=128

=== Throughput (lower wall is better) ===
flavour  wall(s)  p50(s)  p95(s)  max(s)
before   4.986    0.158   0.207   0.269
after    0.248    0.181   0.230   0.231

=== Event-loop responsiveness (lower probe latency is better) ===
flavour  samples  probe p50(ms)  probe p95(ms)  probe max(ms)
before   1        5442.26        5442.26        5442.26
after    28       0.88           11.53          98.02
```

</details>

The benchmark script is included as a comment on the PR for reproducibility. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Performance Improvement Closes [#13825](https://github.com/infiniflow/ragflow/issues/13825) --------- Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 15:08:55 +08:00
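The before/after comparison described in #13941 can be approximated with the standard library alone. The sketch below is not the benchmark script from the PR: `blocking_call`, `handler_before`, `handler_after`, and `measure` are hypothetical names, and `loop.run_in_executor` stands in for RAGFlow's `thread_pool_exec`, whose exact signature is not shown in the commit message.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(delay_s):
    """Hypothetical stand-in for one synchronous DB/Redis call."""
    time.sleep(delay_s)


async def handler_before(pool, calls, delay_s):
    # Pre-PR pattern: the sync call runs directly on the event-loop thread,
    # so no other coroutine can make progress while it sleeps.
    for _ in range(calls):
        blocking_call(delay_s)


async def handler_after(pool, calls, delay_s):
    # Post-PR pattern: each blocking call is handed to a worker thread and
    # the coroutine awaits the result, leaving the event loop free.
    loop = asyncio.get_running_loop()
    for _ in range(calls):
        await loop.run_in_executor(pool, blocking_call, delay_s)


async def run_load(handler, pool, requests, calls, delay_s):
    # Start all simulated requests at once and time the whole batch.
    start = time.perf_counter()
    await asyncio.gather(*(handler(pool, calls, delay_s) for _ in range(requests)))
    return time.perf_counter() - start


def measure(requests=30, calls=3, delay_s=0.05):
    """Return (before_wall_s, after_wall_s) for the same simulated load."""
    with ThreadPoolExecutor(max_workers=requests) as pool:
        before = asyncio.run(run_load(handler_before, pool, requests, calls, delay_s))
        after = asyncio.run(run_load(handler_after, pool, requests, calls, delay_s))
    return before, after
```

Calling `measure()` with the defaults mirrors the load described above (30 requests, 3 calls of 50 ms each). The "before" wall time should be at least `requests × calls × delay_s`, because every sleep runs back to back on one thread. The "after" figure depends on pool size and machine, so the exact numbers will differ from those reported in the PR.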
Paul Y Hui	a0efc453f3	Fix: safe argument guard and remove redundant redis call (#14060 ) ### What problem does this PR solve? - Moved if not all([email, new_pwd, new_pwd2]) guard to the top, before any decryption that could crash on None value - Removed the redundant REDIS_CONN.get() call — one call is sufficient ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-05-11 15:02:24 +08:00
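The guard ordering described in #14060 can be illustrated with a small sketch. This is not the RAGFlow handler: `decode_password` and `validate_reset` are hypothetical, and base64 decoding is used here only as a stand-in for the real decryption step, since both fail when given `None`.

```python
import base64


def decode_password(value):
    """Hypothetical stand-in for decryption; raises TypeError on None."""
    return base64.b64decode(value).decode("utf-8")


def validate_reset(email, new_pwd, new_pwd2):
    """Return an error message, or None when the input is acceptable."""
    # Guard first: reject missing fields before anything tries to decode them.
    if not all([email, new_pwd, new_pwd2]):
        return "email and passwords are required"
    if decode_password(new_pwd) != decode_password(new_pwd2):
        return "passwords do not match"
    return None
```

With the guard at the top, a request that omits a password gets a validation message instead of an unhandled exception from the decoding step.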
Jin Hai	c55e23e7e2	Go: refactor embedding interface (#14757 ) ### What problem does this PR solve? Provide embedding index according to the input text ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 14:45:30 +08:00
Ricardo-M-L	5ef7f50eef	fix: use context manager for ThreadPoolExecutor in file_service.py (#14144 ) ## Summary - Wrap 2 `ThreadPoolExecutor` instances in `file_service.py` with `with` statement - Ensures threads are properly shut down after all futures complete ## Problem `parse_docs()` (line 532) and the file processing method (line 694) create `ThreadPoolExecutor` instances that are never shut down. In a long-running server process, this leaks thread resources on every invocation — threads remain alive consuming memory even after all submitted work is complete. ## Fix Replace bare `ThreadPoolExecutor()` with `with ThreadPoolExecutor() as exe:` context manager, which calls `executor.shutdown(wait=True)` on exit. ## Test plan - [x] Verified both call sites use `with` statement after fix - [x] No remaining bare `ThreadPoolExecutor` in `file_service.py` - [x] `document_service.py:1066` is a module-level executor (different pattern, not changed in this PR) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 14:02:45 +08:00
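The pattern applied in #14144 looks roughly like the following. This is a minimal sketch, not the code from `file_service.py`: `parse_one` and `parse_all` are hypothetical names, and the per-file work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_one(name):
    """Hypothetical per-file work; the real service parses documents."""
    return name.upper()


def parse_all(names):
    # The with-block calls shutdown(wait=True) on exit, so the worker
    # threads are joined once all submitted futures have completed.
    with ThreadPoolExecutor(max_workers=4) as exe:
        futures = [exe.submit(parse_one, name) for name in names]
        return [future.result() for future in futures]
```

Results are collected in submission order. Because the executor is created and shut down inside the function, repeated calls in a long-running process do not accumulate idle worker threads.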
buua436	a03b95f8c4	Fix: shared dataset chunk index lookup (#14764 ) ### What problem does this PR solve? shared dataset chunk index lookup ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 13:50:08 +08:00

6189 Commits