ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Lynn	478c9846a1	Fix: model list (#15860 ) ### What problem does this PR solve? Remove tenant_llm call in rag. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:59:57 +08:00
Wang Qi	899f76af6b	Fix add OpenRouter base_url, UI need to select at least one model to verify (#15894 ) Fix add OpenRouter base_url, UI need to select at least one model to verify	2026-06-10 14:59:27 +08:00
chanx	6822307436	fix: rename ark_api_key to api_key for volcengine provider config (#15896 ) ### What problem does this PR solve? fix: rename ark_api_key to api_key for volcengine provider config ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:56:38 +08:00
Lynn	f632bb4a85	Fix: tenant_model migrate (#15886 ) ### What problem does this PR solve? Find instance for models. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:06:23 +08:00
chanx	c23809a4bd	Fix: Fix some model provider-related UI issues (#15884 ) ### What problem does this PR solve? Fix: Fix some model provider-related UI issues ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:05:57 +08:00
Hz_	38755c705a	feat(go): Add DeepSeek models and Gitee alias metadata tests (#15885 ) This PR expands conf/all_models.json with DeepSeek model entries and provider aliases. Changes: - Added DeepSeek model entries across `V4`, `V3.2`, `V3.1`, `V3`, `R1`, `Coder`, `Math`, `VL`, `OCR`, `Prover`, `MoE`, and `LLM` series. - Normalized model name values to lowercase canonical IDs. - Added alias values for official DeepSeek/Hugging Face names and provider-specific names from OpenRouter, VolcEngine, SiliconFlow, HuaweiCloud, and QiniuCloud. - Preserved model metadata such as max_tokens, model_types, and thinking where applicable. - Added Gitee ListModels tests to verify DeepSeek aliases map back to model metadata from all_models.json. - Added an optional Gitee integration test gated by GITEE_LIST_MODELS_INTEGRATION=1. Test: /usr/local/go/bin/go clean -cache /usr/local/go/bin/go test ./internal/entity/models -run 'TestGiteeListModels(MapsAllDeepSeekAliasesToModelMetadata\|KeepsOwnedBySuffixAfterAliasMetadataLookup\| Integration)'	2026-06-10 13:59:23 +08:00
buua436	093eec3105	fix: handle qwen rerank error response (#15881 ) ### What problem does this PR solve? Fix QWen rerank error handling so DashScope error responses without a text attribute do not raise a secondary KeyError and hide the real provider error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 13:05:24 +08:00
Wang Qi	9aa81e7cad	Fix: PaddleOCR / MinerU cannot be added (#15858 ) Fix: PaddleOCR / MinerU cannot be added	2026-06-10 13:04:13 +08:00
Idriss Sbaaoui	7f4bf69f05	Enhancement: slim Docker image, add .dockerignore, fix Go binary shipping (#15880 ) ### What problem does this PR solve? The RAGFlow Docker image was 9.06 GB with build-only compiler packages leaking into the runtime, duplicate frontend source shipped alongside compiled assets, and no .dockerignore causing ~6 GB of unnecessary context transfer per build. ### Type of change - [x] Performance Improvement	2026-06-10 11:44:22 +08:00
oktofeesh	bbc1f2ecec	feat(go-api): add RAG retrieval to chat completions (#15739 ) ## Summary - Add knowledge-base retrieval support to Go chat completions. ## What changed - Routes KB-backed chat sessions through the Go retrieval service instead of falling back to solo chat. - Resolves embedding and rerank models, validates accessible knowledge bases, and preserves tenant-aware retrieval. - Rejects mixed embedding models across selected knowledge bases before retrieval to avoid incompatible vector dimensions. - Threads the HTTP request context into streaming retrieval so cancelled requests can stop downstream retrieval work. - Applies metadata filters and message-level `doc_ids` before retrieval. - Expands parent/child chunks before building references and prompt context. - Injects retrieved knowledge through a copied dialog prompt config so the caller's original dialog is not mutated. - Honors configured empty responses when no chunks are found. - Names the metadata no-match sentinel and reuses it across retrieval/handler paths. - Adds a defensive content cast while appending streamed answers. - Adds focused unit coverage for retrieval, metadata filtering, authorization, multimodal messages, references, empty-response behavior, prompt immutability, and mixed embedding models. --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 11:07:45 +08:00
Jin Hai	7c1bd9a5a5	Go CLI: switch to admin/api server (#15861 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> use admin SUCCESS RAGFlow(api/default)> use api 'abc'; SUCCESS ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-10 10:57:00 +08:00
writinwaters	9d9c2dc92c	Docs: Supported model providers and URLs updated (#15866 ) ### What problem does this PR solve? Updated supported model providers and the corresponding URLs. ~~Synced supported model providers and base URLs with llm_factories.json, while keeping the AI Badgr configuration example via the OpenAI-API-Compatible provider.~~ ### Type of change - [x] Documentation Update	2026-06-10 10:18:14 +08:00
Haruko386	d56aeb2f5d	feat[Go]: api datasets/<dataset_id>/documents/<document_id>/metadata/… (#15846 ) ### What problem does this PR solve? As title ``` /api/v1/datasets/<dataset_id>/documents/<document_id>/metadata/config PUT ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-06-10 09:57:11 +08:00
Haruko386	a396b1ace2	feat[Go]: implement /api/v1/agents/<agent_id> and test_db_connection (#15771 ) ### What problem does this PR solve? Add two APIs in Go ``` /api/v1/agents/test_db_connection POST /api/v1/agents/<agent_id>/sessions DELETE ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-06-10 09:54:07 +08:00
Jack	87b8062df4	feat: implement POST /api/v1/searchbots/ask — streaming RAG with citations and think-tag processing (#15825 ) Implements POST /api/v1/searchbots/ask in Go with streaming SSE, citations, and think-tag processing. 23 files, 90+ unit tests. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 22:48:50 +08:00
Yingfeng	cf5cca5cbb	Fix wrong unit test path (#15864 )	2026-06-09 22:48:33 +08:00
Jack	2f99d52fb5	fix(ci): re-enable Go tests and fix compilation errors after ListModels signature change (#15862 ) ## Summary This PR re-enables the Go test steps in CI that were previously commented out, and fixes all compilation errors that have accumulated in `internal/entity/models/` since the `ListModels` return type was changed from `[]string` to `[]ListModelResponse`. ## Changes ### CI (`.github/workflows/tests.yml`) - Re-enable Prepare test resources step (clones resource repo with WordNet data) - Re-enable Test Go packages step (runs `go test ./internal/...`) - Fix resource path race condition by using `/tmp/resource-${GITHUB_RUN_ID}` instead of `/tmp/resource` - Exclude `/cli` package from Go tests (contains `main` redeclarations) ### Test fixes (16 model provider test files) All errors were caused by the upstream change from `[]string` to `[]ListModelResponse` in the `ListModels` interface: - Add `joinModelNames` test helper to extract `.Name` from `[]ListModelResponse` slices - `strings.Join(models, ",")` → `joinModelNames(models, ",")` (11 files) - `ids[i] != "..."` → `ids[i].Name != "..."` (cometapi, mistral) - `got[i] != want[i]` → `got[i].Name != want[i]` (bedrock) - `[]string` return types → `[]ListModelResponse` (google) ### Pre-existing bugs in model_test.go Bugs introduced by the upstream `entity/` → `entity/models/` directory rename: - Add missing `pm := GetProviderManager()` calls in 3 test functions - Fix `InitProviderManager` signature (`_, err :=` → `err :=`) - Fix `MaxTokens` `*int` dereference (6 comparisons) - Fix `readProviderConfig` relative path (3 levels up instead of 2) ### model.go - Add `findRepoRoot()` to make `conf/all_models.json` resolution work from any CWD, fixing `TestSiliconFlowProviderConfigLoadsLatestProModels` ### Test validation ```bash go build ./internal/... # ✅ go test ./internal/entity/models/... -count=1 # ✅ all pass ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 21:12:15 +08:00
cleanjunc	88e4d6bddb	Fix: restore GraphRAG entity ranking by indexing pagerank and n-hop paths (#15797 ) ### Summary Closes #15795 Knowledge-graph queries rank entities by `pagerank * sim` in `KGSearch`, but the entity chunks written at index time stopped carrying the values that ranking depends on. `graph_node_to_chunk` only stored `entity_type`, `description`, and `source_id`, dropping the node `pagerank` and the n-hop neighbour paths, while `search.py` still read them back as `rank_flt` and `n_hop_with_weight`. The producer of these fields, `update_nodes_pagerank_nhop_neighbour`, was removed in #6513, but the read side in `KGSearch` was never updated. The result is that on every knowledge-graph query: - `pagerank` resolves to `0`, so the `pagerank * sim` sort key is `0` for every entity and selection falls back to arbitrary order. - Every displayed entity score is `0.00`. - The n-hop relation-enrichment block is dead code because `n_hop_ents` is always empty, leaving `merge_tuples` and `is_continuous_subsequence` orphaned. This PR restores the missing index-time fields so the documented `P(E\|Q) = pagerank * sim` ranking and the n-hop enrichment work again. What changed: - `graph_node_to_chunk` now writes `rank_flt` from the node pagerank and `n_hop_with_weight` from the recomputed n-hop neighbour paths. - Reintroduced the n-hop path computation (`n_neighbor`) in `rag/graphrag/utils.py`, reusing the previously orphaned `merge_tuples` / `is_continuous_subsequence` helpers, with a direction-agnostic edge-weight lookup for undirected graphs. `set_graph` computes the paths per added or updated node and passes them through. - `KGSearch` now selects `n_hop_with_weight` in the entity keyword search so Infinity and OceanBase return it (Elasticsearch and OpenSearch already read it from `_source`), and the read is hardened against missing keys or empty strings before `json.loads`. 
- Added the `n_hop_with_weight` column to OceanBase, including the `EXTRA_COLUMNS` migration entry so existing tables get it. The other engines already map both fields via dynamic templates or the Infinity mapping. Scope note: pagerank and n-hop are re-indexed for the added or updated nodes in each pass, consistent with the existing incremental indexing design. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Testing Added unit tests in `test/unit_test/rag/graphrag/test_graphrag_utils.py`: - `n_neighbor`: path and weight shape, one-hop vs two-hop, isolated nodes, missing weights, and direction-agnostic lookup. - `graph_node_to_chunk`: `rank_flt` populated from pagerank and defaulting to `0`, `n_hop_with_weight` serialized and defaulting to an empty list. ``` uv run pytest test/unit_test/rag/graphrag/ # 106 passed uv run ruff check rag/graphrag/ rag/utils/ob_conn.py ```	2026-06-09 20:50:45 +08:00
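A minimal sketch of the ranking described in #15797, assuming each entity chunk is a plain dict. `rank_flt` is the field named in the commit message; `name`, `sim`, and `rank_entities` are hypothetical names used only for illustration and are not taken from `KGSearch`. It shows why a missing `rank_flt` collapses every `pagerank * sim` sort key to 0.

```python
def rank_entities(entities, top_n=2):
    """Sort by pagerank * sim, highest first; a missing rank_flt counts as 0."""
    def score(entity):
        # Hypothetical read path: fall back to 0 when rank_flt was never indexed.
        return float(entity.get("rank_flt") or 0) * entity["sim"]
    return sorted(entities, key=score, reverse=True)[:top_n]

indexed = [
    {"name": "a", "rank_flt": 0.1, "sim": 0.9},
    {"name": "b", "rank_flt": 0.5, "sim": 0.6},
    {"name": "c", "rank_flt": 0.3, "sim": 0.8},
]
# Same entities as they would look if the index dropped rank_flt.
stripped = [{k: v for k, v in e.items() if k != "rank_flt"} for e in indexed]

with_rank = [e["name"] for e in rank_entities(indexed)]
without_rank = [e["name"] for e in rank_entities(stripped)]
```

In this sketch the stripped entities all score 0, so the selection simply keeps the input order (Python's sort is stable) rather than reflecting relevance.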
ghost	64b860f771	fix(elasticsearch): complete Go result functions (#15148 ) ## Summary - Complete the Go Elasticsearch result functions that remained stubbed after #15160. - Add focused unit coverage for field mapping, aggregation, IDs, and highlighting behavior. - Update a stale query-builder test type import discovered during validation. ## What changed - Keep the Elasticsearch Go implementation merged in #15160 and fill in `GetFields`, `GetAggregation`, `GetHighlight`, and `GetDocIDs` in `internal/engine/elasticsearch/chunk.go`. - Add regression and invariant coverage in `internal/engine/elasticsearch/chunk_helpers_test.go`. - Update `internal/service/nlp/query_builder_test.go` to use the current `types.MatchTextExpr` type. ## Why - #15160 implemented the main Go Elasticsearch surface, but retrieval/tag flows still call result functions that returned stubs. - Completing these functions keeps Elasticsearch result processing aligned with the expected document-engine behavior for field extraction, tag aggregation, doc ID extraction, and snippet highlighting. ## Validation - `go test ./internal/engine/elasticsearch` - `GOARCH=arm64 CGO_ENABLED=1 go test ./internal/service/nlp -run TestQueryBuilder` - `git diff --check` - CodeRabbit review reported 0 issues after follow-up fixes. - Codex Security diff scan found no reportable issues. ## Notes - This PR is now a follow-up to #15160 rather than a competing implementation. - A full local `go test ./internal/service/nlp` run is blocked by local WordNet resource prerequisites; the query-builder tests touched by this PR pass with the arm64 CGO path.	2026-06-09 20:10:11 +08:00
balibabu	10bbe6b5d4	Fix: The variables in the Visual Input File of the agent operator are not displayed. (#15856 ) ### What problem does this PR solve? Fix: The variables in the Visual Input File of the agent operator are not displayed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 19:41:22 +08:00
JPette1783	acae932938	fix(go): guard four nil-pointer dereferences causing runtime panics (#15815 ) ### What problem does this PR solve? Fixes four Go paths that dereference a pointer with no prior nil check, each causing a runtime panic. Closes #15814. \| # \| File \| Bug \| Fix \| \|---\|------\|-----\|-----\| \| 1 \| `internal/entity/models/deepseek.go` \| streaming path runs `switch chatModelConfig.Effort` inside `if Thinking`; panics when `Thinking=true` and `Effort==nil` \| nil-check with default `"high"`, matching the non-streaming path in the same file \| \| 2 \| `internal/entity/models/volcengine.go` \| identical oversight: `switch modelConfig.Effort` with no guard \| nil-check with default `"medium"`, matching its non-streaming path \| \| 3 \| `internal/handler/auth.go` \| `AuthMiddleware` does `if user.IsSuperuser`; panics on every authenticated request when the DB column is `NULL` \| guard with `user.IsSuperuser != nil &&`, matching every other call site (`admin/handler.go`, `admin/service.go`, `user.go`) \| \| 4 \| `internal/service/heartbeat_sender.go` \| `responseBody["code"].(float64)` panics on any non-200 response lacking a numeric `code`; the upstream `recover()` calls `Fatal()` → `os.Exit(1)`, taking down the whole server \| comma-ok assertion (`code, ok := ...`); return an error instead of panicking \| - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 19:29:25 +08:00
Hz_	d4fe3bb148	feat(go-api): Add GET dataset metadata summary API (#15843 ) ## What Adds the RESTful dataset metadata summary endpoint: `GET /api/v1/datasets/{dataset_id}/metadata/summary` The endpoint supports optional document filtering through: `?doc_ids=doc_id_1,doc_id_2`	2026-06-09 19:27:47 +08:00
JPette1783	e050f1816e	fix(models): guard unsafe index access in Google and Ollama drivers (#15819 ) ### What problem does this PR solve? Fixes four panic / spurious-error paths in the Go model layer. Closes #15818. \| # \| File \| Bug \| Fix \| \|---\|------\|-----\|-----\| \| 1 \| \| Thinking-mode streaming path: accessed unconditionally; Gemini emits usage-only chunks with an empty slice, causing a runtime panic \| Guard each step: , , before indexing \| \| 2 \| \| is a plain for ordinary requests; the cast to silently returns , then panics immediately \| Switch on concrete type; handle both and \| \| 3 \| \| Identical panic on the streaming path \| Same switch-on-type fix \| \| 4 \| \| The field is optional (absent for non-thinking models) but the code returned an error when it was missing, breaking every ordinary Ollama completion \| Change to a silent comma-ok assertion; is empty string when the field is absent \| ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 19:26:52 +08:00
chanx	84482762d5	feat: support custom editing for model list (#15855 ) ### What problem does this PR solve? feat: support custom editing for model list ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-09 19:24:43 +08:00
Wang Qi	7ed1f1c865	Fix: vLLM cannot be added without /v1 (#15851 ) Fix: vLLM cannot be added without /v1	2026-06-09 19:11:15 +08:00
Jack	3eff41361b	fix: prevent None values in auto-metadata from causing KeyError (#15842 ) ## Problem When users configure auto-metadata for a dataset, parsing crashes with: ``` KeyError: 'properties' in gen_metadata → schema["properties"] ``` ## Root Cause Pydantic `AutoMetadataField` defaults `enum` and `description` to `None` when the frontend omits these fields: ```python class AutoMetadataField(Base): enum: Annotated[list[str] \| None, Field(default=None)] description: Annotated[str \| None, Field(default=None)] ``` These `None` values propagate through the call chain and cause two crashes:	2026-06-09 19:10:48 +08:00
Wang Qi	2773208159	Fix: MinerU cannot be added (#15841 ) Fix: MinerU cannot be added	2026-06-09 19:06:51 +08:00
Lynn	08a40711a0	Fix: model list (#15839 ) ### What problem does this PR solve? Deduplicate api_key and migrate `is_tools` in migration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 19:06:31 +08:00
euvre	f97d6396b4	fix: BaiduYiyan API key validation fails in set_api_key (#15828 ) ### What problem does this PR solve? When setting the API key for the BaiduYiyan provider, all model validations fail with the error "Fail to access model using this api key. No valid response received". Root cause: 1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override `async_chat_streamly()`. The `verify_api_key()` function uses `mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits `Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible base_url, validation always fails. 2. `verify_api_key()` in `provider_api_service.py` does not format the raw API key string into the JSON format (`{"yiyan_ak": "...", "yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via `json.loads(key)`. Fix: 1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the qianfan SDK, consistent with the existing `chat_streamly()` method. 2. Add BaiduYiyan API key formatting in `provider_api_service.py` `verify_api_key()` to match the format expected by `BaiduYiyanChat.__init__()`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-06-09 19:05:58 +08:00
buua436	7b8d6f34b3	fix: force image parser json output (#15847 ) ### What problem does this PR solve? Force image parser runtime output format to JSON so downstream chunking reads OCR results from the JSON output and image parser chunks can be displayed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-09 19:02:37 +08:00
Jin Hai	719ce15c95	Go CLI: update list supported models (#15845 ) ### What problem does this PR solve? Now list supported models will show more info. ``` RAGFlow(api/default)> list supported models from 'gitee' 'test'; +-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+ \| dimension \| max_tokens \| model_types \| name \| thinking \| +-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+ \| \| \| \| Wan2.7 \| \| \| \| \| \| HappyHorse-1.0 \| \| \| \| \| \| Qwen3.6-27B@Qwen \| \| \| \| \| \| Qwen3.6-35B-A3B@Qwen \| \| \| \| 1048576 \| [chat] \| DeepSeek-V4-Flash@deepseek-ai \| map[clear_thinking:true default_value:true] \| \| \| 1048576 \| [chat] \| DeepSeek-V4-Pro@deepseek-ai \| map[clear_thinking:true default_value:true] \| +-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-09 19:01:00 +08:00
Ju Boxiang	f0efa63bf2	fix: Remove trailing comma in CREATE TABLE tenant_model SQL (#15832 ) (#15836 ) ### What problem does this PR solve? The last column definition `INDEX idx_instance_id (instance_id),` in the `CREATE TABLE tenant_model` statement has a trailing comma, which causes a MySQL syntax error during deployment. Closes #15832 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### How was this tested? - [x] Visual inspection: the trailing comma on line 837 has been removed	2026-06-09 17:54:18 +08:00
buua436	c1496ffd43	fix: propagate memory tenant id in task collect (#15837 ) ### What problem does this PR solve? Propagate `tenant_id` from memory task messages into task collection so refactored task execution can build a valid context. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 17:47:48 +08:00
Hz_	d1c436b804	feat(api): implement `GET /api/v1/agents/prompts` endpoint in Go (#15748 ) ### Description This PR ports the `GET /api/v1/agents/prompts` endpoint from the Python backend to the Go backend. ### Changes Made - Handler: Added `GetPrompts` method to `internal/handler/agent.go`. - Router: Registered the `agents.GET("/prompts")` route in `internal/router/router.go`. - Logic: Leveraged the existing `service.LoadPrompt` utility to read `analyze_task_system`, `analyze_task_user`, `next_step`, `reflect`, and `citation_prompt` templates directly from the `rag/prompts` directory. - Unit Test: Added `TestGetPrompts_Success` to `internal/handler/agent_test.go` to mock the HTTP context and validate the JSON response structure. ### Motivation This is part of the ongoing effort to port the Agent API surface to Go. Since this specific endpoint only serves static prompt templates and does not require the complex DAG/Canvas execution engine, it can be seamlessly and safely handled by the Go backend. ### Testing - [x] Added automated unit test `TestGetPrompts_Success` (verified passing). - [x] Tested locally via `curl` against the Go server (port 9380) and Python server (port 9384). - [x] Verified that the Go JSON response structure and loaded prompt text are logically 100% identical to the Python implementation.	2026-06-09 17:03:42 +08:00
Yingfeng	01a2a44766	Clean CLI for filesystem (#15838 ) ### Type of change - [x] Refactoring	2026-06-09 17:00:10 +08:00
balibabu	287a4cfd2b	Fix: An error message appears when accessing the agent's launch page: "pagesize exceeds maximum value". (#15835 ) ### What problem does this PR solve? Fix: An error message appears when accessing the agent's launch page: "pagesize exceeds maximum value". ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-06-09 16:56:47 +08:00
Jonathan Chang	c586292993	feat: Implement checkpoint/resume support for GraphRAG community extraction and entity resolution (#15523 ) ## Summary This PR adds checkpoint/resume support for the GraphRAG `extract_community` and `resolve_entities` stages. The implementation stores successful intermediate results in the document store so interrupted ingestion can resume without repeating already-completed LLM work. Checkpoints are loaded before each stage, reused when available, saved after successful batch/community processing, and cleaned up after the stage completes successfully. ## Related Issue Closes: #15518 ## Change Type - [x] Feature - [x] Bug fix - [x] Test - [ ] Refactor - [ ] Documentation - [ ] Breaking change ## Real Behavior Proof Validation commands run locally: ```bash uv run python -m py_compile \ rag/graphrag/checkpoints.py \ rag/graphrag/general/community_reports_extractor.py \ rag/graphrag/entity_resolution.py \ rag/graphrag/general/index.py \ test/unit_test/rag/graphrag/test_checkpoints.py ``` Result: ```text Passed ``` ```bash uv run pytest test/unit_test/rag/graphrag/test_checkpoints.py ``` Result: ```text 4 passed ``` ```bash uv run pytest \ test/unit_test/rag/graphrag/test_phase_markers.py \ test/unit_test/rag/graphrag/test_graphrag_utils.py \ test/unit_test/rag/graphrag/test_checkpoints.py ``` Result: ```text 95 passed ``` ```bash git diff --check ``` Result: ```text Passed ``` ## Checklist - [x] Implemented checkpoint/resume support for `extract_community`. - [x] Implemented checkpoint/resume support for `resolve_entities`. - [x] Avoided touching unrelated API behavior. - [x] Added unit tests for the new checkpoint helper logic. - [x] Verified Python syntax compilation. - [x] Ran related GraphRAG unit tests successfully. - [x] Ran `git diff --check`. - [ ] Ran full project test suite. --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-09 15:34:47 +08:00
Jin Hai	d02eb6b596	Go: refactor CLI (#15728 ) ### What problem does this PR solve? ``` RAGFlow(user)> add api server 'ccc' host '127.0.0.1:9980'; SUCCESS RAGFlow(user)> list api server; +------------+---------------+-----------------+---------+-------------+---------------+ \| api_server \| api_server_ip \| api_server_port \| auth \| user_name \| user_password \| +------------+---------------+-----------------+---------+-------------+---------------+ \| ccc \| 127.0.0.1 \| 9980 \| no auth \| \| \| \| default \| 127.0.0.1 \| 9384 \| login \| aaa@aaa.com \| * \| +------------+---------------+-----------------+---------+-------------+---------------+ RAGFlow(user)> delete api server 'ccc'; SUCCESS RAGFlow(user)> list api server; +------------+---------------+-----------------+---------+ \| api_server \| api_server_ip \| api_server_port \| auth \| +------------+---------------+-----------------+---------+ \| default \| 127.0.0.1 \| 9384 \| no auth \| +------------+---------------+-----------------+---------+ RAGFlow(user)> show admin server; +--------------+-------+ \| field \| value \| +--------------+-------+ \| admin_server \| N/A \| +--------------+-------+ RAGFlow(user)> add admin server host '127.0.0.1:9880'; SUCCESS RAGFlow(user)> show admin server; +-------------------+-----------+ \| field \| value \| +-------------------+-----------+ \| admin_server_ip \| 127.0.0.1 \| \| admin_server_port \| 9880 \| \| auth \| no auth \| +-------------------+-----------+ RAGFlow(user)> delete admin server; SUCCESS RAGFlow(user)> show admin server; +--------------+-------+ \| field \| value \| +--------------+-------+ \| admin_server \| N/A \| +--------------+-------+ RAGFlow(user)> show current +-----------------+-------------+ \| field \| value \| +-----------------+-------------+ \| api_server_port \| 9384 \| \| user_name \| aaa@aaa.com \| \| user_password \| * \| \| mode \| api \| \| verbose \| false \| \| api_server \| default \| \| api_server_ip \| 127.0.0.1 \| \| auth \| login \| \| output \| table \| \| interactive \| true \| +-----------------+-------------+ ``` ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-09 15:22:50 +08:00
Lynn	1ab51a27bf	Fix: list intl Tongyi-Qianwen base_url (#15831 ) ### What problem does this PR solve? Display intl `base_url` for Tongyi-Qianwen ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 13:19:39 +08:00
chanx	298a23f74c	fix: resolve issue where some models do not use modelInfo parameter (#15830 ) ### What problem does this PR solve? fix: resolve issue where some models do not use modelInfo parameter ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 13:18:01 +08:00
Wang Qi	9c0cc77e35	Fix: empty response setting does not take effect (#15824 ) Fix: empty response setting does not take effect	2026-06-09 13:06:58 +08:00
Idriss Sbaaoui	faf78d3069	Fix: OceanBase tenant startup drift and docs (#15829 ) ### What problem does this PR solve? OceanBase could start without the `ragflow` tenant, so RAGFlow failed to connect with `root@ragflow`. This PR adds a safe startup reconcile step and documents the required host limits before using OceanBase. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update	2026-06-09 13:04:05 +08:00
Wang Qi	93e4f6bc09	Fix: Add bge as embedding (#15784 ) Fix: Add bge as embedding	2026-06-09 09:31:24 +08:00
DearsisHS	d94b8c14cb	fix(api): await asyncio.wait_for in verify_api_key embedding branch (#15620 ) ## Summary The embedding branch of `verify_api_key` was missing `await` on `asyncio.wait_for(...)`, so valid embedding-only providers always failed API-key verification. ## Root cause `arr, tc = asyncio.wait_for(...)` (no `await`) returns a coroutine; unpacking it raises `TypeError: cannot unpack non-iterable coroutine`, which the `except` swallows as a failure. The chat branch (`await asyncio.wait_for(check_streamly())`) and rerank branch (`arr, tc = await asyncio.wait_for(...)`) already `await` correctly. ## Fix ```diff - arr, tc = asyncio.wait_for( + arr, tc = await asyncio.wait_for( asyncio.to_thread(mdl.encode, ["Test if the api key is available"]), timeout=timeout_seconds, ) ``` ## Files changed - `api/apps/services/provider_api_service.py` ## Verification - `ruff check` — clean - Fix mirrors the already-correct chat/rerank branches in the same function. Local full pytest not run (heavy RAG deps); CI validates. ## Note Implemented with LLM assistance (model: claude-opus-4-8). Closes #15619 Co-authored-by: dearsishs <MCarter112116@outlook.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 23:01:36 +08:00
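The root cause described in #15620 can be reproduced in isolation. The following is a sketch under stated assumptions, not the project's code: `encode`, `verify_without_await`, and `verify_with_await` are hypothetical stand-ins, and only `asyncio.wait_for` and `asyncio.to_thread` are real standard-library calls.

```python
import asyncio

def encode(texts):
    # Hypothetical stand-in for a blocking embedding call: (vectors, token count).
    return [[0.0] for _ in texts], len(texts)

async def verify_without_await():
    inner = asyncio.to_thread(encode, ["ping"])
    pending = asyncio.wait_for(inner, timeout=5)  # a coroutine object, not a result
    try:
        arr, tc = pending  # unpacking a coroutine raises TypeError
        return True
    except TypeError:
        return False
    finally:
        # Close the never-awaited coroutines so no RuntimeWarning is emitted.
        pending.close()
        inner.close()

async def verify_with_await():
    arr, tc = await asyncio.wait_for(asyncio.to_thread(encode, ["ping"]), timeout=5)
    return len(arr) == 1 and tc == 1

broken_ok = asyncio.run(verify_without_await())
fixed_ok = asyncio.run(verify_with_await())
```

Without `await`, the unpacking fails before any embedding work runs, which matches the commit's observation that a broad `except` would report the key as invalid.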
DearsisHS	cbb3896aaa	fix(api): guard missing row in SearchService.get_detail (#15622 ) ## Summary `SearchService.get_detail` crashed with `AttributeError` (HTTP 500) when no matching row existed, because it called `.first().to_dict()` before the `if not search` guard — making that guard dead code. ## Root cause `.first()` returns `None` when the query matches nothing (deleted search app, or joined `User` not `VALID`). `None.to_dict()` raises before the guard runs. ## Fix ```diff .first() - .to_dict() ) if not search: return {} - return search + return search.to_dict() ``` Guard the `None` first, then serialize — restoring the intended `{}` "not found" return that every caller (`search_api`, `bot_api`, `chat_api`, `dataset_api_service`) already handles. ## Files changed - `api/db/services/search_service.py` ## Verification - `ruff check` — clean - Logic: `.first()` → `None` now hits `return {}` instead of `None.to_dict()`. Local full pytest not run (heavy RAG deps); CI validates. ## Note Implemented with LLM assistance (model: claude-opus-4-8). Closes #15621 Co-authored-by: dearsishs <MCarter112116@outlook.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 23:01:28 +08:00
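The guard-then-serialize order from #15622 can be sketched without an ORM. Everything here is an assumption for illustration: `Row`, `first`, and this `get_detail` are hypothetical stand-ins for the query result, `.first()`, and `SearchService.get_detail`.

```python
class Row:
    """Hypothetical stand-in for an ORM row that can serialize itself."""
    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return dict(self._data)

def first(rows):
    # Mirrors the .first() contract described above: None when nothing matches.
    return rows[0] if rows else None

def get_detail(rows):
    search = first(rows)
    if not search:
        return {}            # "not found" result that callers already handle
    return search.to_dict()  # serialize only after the None check

missing = get_detail([])
found = get_detail([Row({"id": "s1"})])
```

Calling `to_dict()` before the guard would raise `AttributeError` on `None`, which is the HTTP 500 path the commit removes.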
Jin Hai	55abf4f565	Go: new CLI command, list all models and show model (#15786 ) ### What problem does this PR solve? ``` RAGFlow(user)> list models; +---------------------------+------------+-------------+--------------------+---------------------------------------------+ \| alias \| max_tokens \| model_types \| name \| thinking \| +---------------------------+------------+-------------+--------------------+---------------------------------------------+ \| \| 1048576 \| [chat] \| deepseek-v4-flash \| map[clear_thinking:true default_value:true] \| \| \| 1048576 \| [chat] \| deepseek-v4-pro \| map[clear_thinking:true default_value:true] \| \| \| 1024000 \| [chat] \| minimax-m3 \| map[clear_thinking:true default_value:true] \| \| \| 64000 \| [vision] \| glm-4.5v \| map[clear_thinking:true default_value:true] \| \| [baai/bge-m3] \| 8192 \| [embedding] \| bge-m3 \| \| \| [baai/bge-reranker-v2-m3] \| 1024 \| [rerank] \| bge-reranker-v2-m3 \| \| \| \| \| [tts] \| step-audio-tts-3b \| \| \| [qwen/qwen3-asr-1.7b] \| \| [asr] \| qwen3-asr-1.7b \| \| \| [paddleocr-vl-1.5] \| \| [ocr] \| paddleocr-vl-0.9b \| \| +---------------------------+------------+-------------+--------------------+---------------------------------------------+ RAGFlow(user)> show model 'minimax-m3'; +--------------+---------------------------------------------+ \| field \| value \| +--------------+---------------------------------------------+ \| name \| minimax-m3 \| \| max_tokens \| 1024000 \| \| model_types \| [chat] \| \| thinking \| map[clear_thinking:true default_value:true] \| \| class \| \| \| alias \| \| \| ModelTypeMap \| \| +--------------+---------------------------------------------+ RAGFlow(user)> show model 'baai/bge-m3'; +--------------+---------------+ \| field \| value \| +--------------+---------------+ \| model_types \| [embedding] \| \| thinking \| \| \| class \| \| \| alias \| [baai/bge-m3] \| \| ModelTypeMap \| \| \| name \| bge-m3 \| \| max_tokens \| 8192 \| 
+--------------+---------------+ ``` --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-08 21:38:15 +08:00
Jack	35527f6755	fix: guard http.DefaultTransport type assertion in xiaomi for Go 1.25 (#15787 ) ## Problem `TestXiaomiNewModelWithCustomDefaultTransport` panics on Go 1.25: ``` panic: interface conversion: http.RoundTripper is models.roundTripperFunc, not http.Transport ``` In Go 1.25, `http.DefaultTransport` is no longer `http.Transport`, so the unchecked type assertion in `NewXiaomiModel` panics when the test replaces it with a `roundTripperFunc`. ## Fix Use a safe type assertion with fallback to a new `http.Transport`, matching the pattern already used in `modelscope.go`. ## Verification ```bash go test -run TestXiaomiNewModelWithCustomDefaultTransport ./internal/entity/models/... # PASS ``` Internal contributors only. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 21:11:21 +08:00
Yash Raj Pandey	f2aadd3871	Fix: is_english() returns False for any list argument (broken language detection) (#15489 ) ### What problem does this PR solve? `is_english()` in `rag/nlp/__init__.py` compiles a single-character regex class and `fullmatch`es it against each item: ```python pattern = re.compile(r"[`a-zA-Z0-9\s.,':;/\"?<>!\-]") # no quantifier ... eng = sum(1 for t in texts if pattern.fullmatch(t.strip())) ``` For a string argument the text is first split into single characters (`texts = list(texts)`), so each `fullmatch` sees one character and works. But for a list argument each item is a whole multi-character string, and `fullmatch` of a one-character pattern against a multi-character string always fails — so `is_english()` returns `False` for any list, regardless of content. ```python is_english("This is English") # True (ok) is_english(["The quick brown fox jumps.", "Hello world."]) # False (bug — should be True) is_english(["这是中文。"]) # False (right answer, wrong reason) ``` Many call sites pass lists and were therefore silently always-`False`, e.g.: - `rag/llm/chat_model.py:1088`, `rag/llm/cv_model.py:168,1155` — `is_english([ans])` when an answer is truncated at `max_tokens`, so an English reply gets the Chinese "······由于长度的原因，回答被截断了，要继续吗？" continuation suffix instead of the English one. - `rag/app/book.py` — `remove_contents_table(..., eng=is_english([...sections...]))`, so English books have their contents table stripped in Chinese mode. - `common/doc_store/es_conn_base.py:339`, `rag/utils/opensearch_conn.py:733` — `is_english(txt.split())` in highlight handling. - plus `rag/app/qa.py`, `rag/flow/parser/utils.py`, `common/doc_store/infinity_conn_base.py`. ### Fix Add a `+` quantifier so an all-English multi-character item matches: ```python pattern = re.compile(r"[`a-zA-Z0-9\s.,':;/\"?<>!\-]+") ``` The string path is unchanged (single characters still match) and non-English lists still return `False`. 
Adds `test/unit_test/rag/test_is_english.py`; the two list cases fail before this change and pass after. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Used the Claude CLI while working on this.	2026-06-08 20:25:23 +08:00
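The effect of the quantifier in f2aadd3871 can be checked without the rest of `is_english()`. This is a small sketch that reuses only the character class quoted in the commit message; `count_matches` is a hypothetical helper that mirrors the quoted `sum(...)` expression, and whatever threshold `is_english()` applies to that count is deliberately left out because the message does not show it.

```python
import re

# Character class copied from the commit message.
CHARS = r"[`a-zA-Z0-9\s.,':;/\"?<>!\-]"

single = re.compile(CHARS)          # before the fix: exactly one character
repeated = re.compile(CHARS + "+")  # after the fix: one or more characters


def count_matches(pattern: re.Pattern, items: list) -> int:
    # Hypothetical helper mirroring the quoted counting expression.
    return sum(1 for t in items if pattern.fullmatch(t.strip()))


english = ["The quick brown fox jumps.", "Hello world."]
chinese = ["这是中文。"]

print(count_matches(single, english))    # 0: multi-character items never fullmatch
print(count_matches(repeated, english))  # 2
print(count_matches(repeated, chinese))  # 0
print(count_matches(single, list("abc")), count_matches(repeated, list("abc")))  # 3 3
```

Single-character items match under both patterns, which is consistent with the commit's statement that the string path is unchanged.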
Lynn	b9f06e6095	Feat: model list (#15774 ) ### What problem does this PR solve? Support model list for VolcEngine. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-08 20:18:00 +08:00
Jack	338fdb65fb	feat(ci): enable go test in CI pipeline (#15750 ) ## What problem does this PR solve? Go test files are never compiled in CI — only production binaries via `go build`. This allowed a missing `"sort"` import in `metadata_filter_test.go` to be merged without detection. ## Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) ## Changes - Add `go test -count=1 ./internal/...` step after Go build in CI workflow - Fix missing `"sort"` import in `metadata_filter_test.go` (pre-existing compile error) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 20:06:57 +08:00

6670 Commits