ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 15:31:05 +08:00

Author	SHA1	Message	Date
Wang Qi	985e3c1db5	Fix document progress not set to fail when embedding model error (#16381 )	2026-06-26 16:11:54 +08:00
Öndery	8081a77c7c	Fix missing move and copy methods in Python RAGFlowS3 storage implementation (#16350 )	2026-06-26 15:51:24 +08:00
Jin Hai	2667995b25	Go CLI: Fix show model and list models (#16380 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show model 'WiseDiag-Z1 Think'; RAGFlow(api/default)> list models; RAGFlow(admin)> show model 'WiseDiag-Z1 Think'; RAGFlow(admin)> list models; ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 15:36:01 +08:00
Hz_	0de8f3e127	feat: add missing qwen models to all_models.json (#16379 ) Add 19 missing qwen models and 3 aliases to all_models.json. Models added: qwen-image-2.0-pro (2026-06-22, 2026-04-22), qwen3.5-ocr, qwen3.7-max-2026-05-17, qwen3.5-livetranslate-flash-realtime, qwen3.5-omni-plus/flash-realtime, qwen-deep-research-2025-12-15, qwen-flash-character-2026-02-26, qwen-plus-2025-11-05, qwen-deep-search-planning, qwen3-s2s-flash-realtime-2025-09-22, qwen-max-1201/longcontext/0107, qwen-1.8b-longcontext-chat Aliases: qwen3.5-plus-2026-04-20, qwen-turbo-0919, qwen-1.8b-chat	2026-06-26 15:35:30 +08:00
writinwaters	5af798607e	Docs: Added v0.26.2 release notes. (#16373 )	2026-06-26 15:18:54 +08:00
Jin Hai	8bc27d8df1	Go CLI: fix show variable (#16370 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show var 'mail.port'; +-----------+-----------+--------------+-------+ \| data_type \| name \| setting_type \| value \| +-----------+-----------+--------------+-------+ \| integer \| mail.port \| config \| 30 \| +-----------+-----------+--------------+-------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 13:51:56 +08:00
Jin Hai	65afaa1292	Model config: add tools (#16371 ) ### What problem does this PR solve? ``` { "name": "glm-4-flash", "max_tokens": 128000, "model_types": [ "chat" ], "tools": { "support": true } } ``` ``` RAGFlow(admin)> list provider 'zhipu-ai' models; +------------+---------------+------------+---------------+----------------+-----------+-----------+ \| dimensions \| max_dimension \| max_tokens \| model_type \| name \| thinking \| tools \| +------------+---------------+------------+---------------+----------------+-----------+-----------+ \| \| \| 204800 \| [chat] \| glm-5 \| supported \| supported \| \| \| \| 204800 \| [chat] \| glm-5-turbo \| supported \| supported \| ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 11:37:51 +08:00
Jack	70250ec88c	Fix: remove deepdoc dep (#16372 ) dev-20260626	2026-06-26 11:32:16 +08:00
Yash Raj Pandey	dd2c88b768	fix(excel_parser): keep zero-valued cells when building Excel text chunks (#16287 )	2026-06-26 09:30:09 +08:00
Jin Hai	58da1d6bc3	Go CLI: fix model related commands (#16368 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show provider 'zhipu-ai' RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test'; RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test' balance; RAGFlow(api/default)> show provider 'zhipu-ai' model 'glm-4.5'; ``` ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 07:07:49 +08:00
Jin Hai	dbefadd86a	Go CLI: refactor (#16355 )	2026-06-25 20:36:50 +08:00
Jack	304d9e02bb	Refactor: migrate pdf_parser.py to golang (#16323 ) ### What problem does this PR solve? Http API based on onnx model. pdf_parser.py to golang ### Type of change - [x] Refactoring	2026-06-25 20:16:16 +08:00
Harsh Kashyap	c7052f4dd1	fix(rag/nlp): treat string input as one phrase in is_english (#16308 )	2026-06-25 20:07:09 +08:00
Wang Qi	5defb4e7d6	Revert "fix(deepdoc): keep zero and false Excel cells in __call__" (#16366 ) Reverts infiniflow/ragflow#16318	2026-06-25 19:56:47 +08:00
Harsh Kashyap	8d3c3f868c	fix(api): validate immutable document fields when value is zero (#16309 )	2026-06-25 19:29:12 +08:00
Harsh Kashyap	66d86154ab	fix(deepdoc): accept GFM table separators with one or more dashes (#16319 )	2026-06-25 19:25:57 +08:00
Hz_	e290a0d23e	feat(go-api): Langfuse API key migration behavior (#16356 ) ## Summary - Align Langfuse API key set/get/delete behavior with the Python implementation. - Improve DAO handling for Langfuse credential save/delete flows. - Add tests for Langfuse service error handling and API key lifecycle behavior.	2026-06-25 19:25:55 +08:00
Yoorim Choi	46b97bd1a1	fix(web): fix layout issues with text, overflow, and spacing consistency (#16324 )	2026-06-25 19:25:32 +08:00
cleanjunc	e8bb534b90	fix: naive_merge splits oversized sections and counts overlap tokens correctly (#15802 )	2026-06-25 19:19:38 +08:00
Harsh Kashyap	0af5d43e8d	fix(deepdoc): keep zero and false Excel cells in __call__ (#16318 )	2026-06-25 19:12:57 +08:00
Haruko386	43b96223b4	feat[go]: add router for connectors/<connector_id> PATCH (#16358 ) ### What problem does this PR solve? As title /api/v1/connectors/<connector_id> PATCH was implemented in #15512 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-06-25 19:07:52 +08:00
Haruko386	74597b8683	feat[Go]: implement api: Search/Get/Update-Messages (#16307 ) ### What problem does this PR solve? As title: implement: ``` /api/v1/messages/search GET /api/v1/messages GET /api/v1/messages/<memory_id>:<message_id>/content GET /api/v1/memories/<memory_id>/config GET /api/v1/messages/<memory_id>:<message_id> PUT ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-25 19:07:34 +08:00
Harsh Kashyap	49312cace3	fix(api): align use_sql Markdown separator with Source header (#16317 )	2026-06-25 19:00:01 +08:00
balibabu	1dfc24003b	Fix: An empty message notification pops up at the top of the agent conversation. (#16353 )	2026-06-25 17:32:24 +08:00
Wang Qi	31e50b164f	Fix [ID:0] not converted to Fig. 1 (#16357 )	2026-06-25 17:17:46 +08:00
Wang Qi	ac9469e5f5	Fix add VLLM without apikey will fail (#16352 )	2026-06-25 17:17:29 +08:00
Wang Qi	97c519662a	Add env ALLOW_ANY_HOST to skip host check (#16351 )	2026-06-25 17:17:02 +08:00
maoyifeng	6e7aa75e71	Go:CLI add new response function (#16347 ) ### What problem does this PR solve? add new response function ### Type of change - [ ] New Feature (non-breaking change which adds functionality)	2026-06-25 16:49:47 +08:00
Yash Raj Pandey	091417980e	fix(html_parser): preserve original text when splitting oversized blocks (#16052 ) ### Bug `RAGFlowHtmlParser.chunk_block()` splits an oversized block by slicing the tokenized string and storing the joined tokens: ```python tks_str = rag_tokenizer.tokenize(block) ... tokens = tks_str.split(" ") while start < len(tokens): chunks.append(" ".join(tokens[start:start + chunk_token_num])) # tokenized form, not source ``` On the default (Elasticsearch) backend `rag_tokenizer.tokenize` transforms text: it lowercases/stems Latin words and inserts spaces between CJK characters. So any text block longer than `chunk_token_num` is stored as garbled, lowercased, space-segmented text instead of the source content. The small-block branch correctly stores the original `block`, so only oversized blocks are corrupted. Affects HTML and EPUB ingestion (both go through `chunk_block`), degrading retrieved chunks and the answers generated from them. ### Real tokenizer behavior (infinity-sdk 0.7.0, ES backend) ``` tokenize("Hello World FOO Bar Baz Qux Jumps") -> "hello world foo bar baz qux jump" # lowercased + stemmed tokenize("你好世界这是一个测试") -> "你好世界这是一个测试" # spaces inserted ``` ### Fix Split the original text: break it into atoms (whitespace-delimited runs for space-separated scripts, per-character for spaceless scripts such as Chinese) and pack them into pieces of at most `chunk_token_num` tokens. This preserves the source characters and still splits scripts that have no whitespace — a plain whitespace split would leave CJK as one un-splittable chunk. 
### Proof (real tokenizer, before/after) Running the old vs new split against the real `infinity.rag_tokenizer`: ``` ENGLISH "Hello World FOO Bar Baz Qux Lazy Dogs" (chunk_token_num=4) OLD: ['hello world foo bar', 'baz qux jump over', 'lazi dog'] # lowercased + stemmed NEW: ['Hello World FOO Bar ', 'Baz Qux Jumps Over ', 'Lazy Dogs'] # preserved; each <= 4 tokens NEW preserves text exactly: True CHINESE "你好世界这是一个测试用例需要被切分成多个块" (chunk_token_num=3) OLD: ['你好世界这是', '一个测试用例需要', ...] # spurious spaces NEW: ['你好世', '界这是', '一个测', ...] # preserved; each <= 3 tokens NEW preserves text exactly: True ``` ### Tests Added `test/unit_test/deepdoc/parser/test_html_parser.py` (English + Chinese oversized blocks, plus small-block merge). Before the fix the two oversized tests fail (English shows lowercasing, Chinese shows inserted spaces); after the fix all pass. `ruff check` clean.	2026-06-25 16:43:35 +08:00
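The split-and-pack idea described in the commit above (#16052) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `split_atoms`, `count_tokens`, and `pack_atoms` are hypothetical names, and the token counter is a stand-in that charges one token per whitespace-delimited word or per CJK character instead of calling `rag_tokenizer`.

```python
import re
from typing import Callable, List

# Assumption: a basic CJK range is enough for the illustration.
_CJK = r"\u4e00-\u9fff"
# An atom is one CJK character, a run of other non-space characters
# plus its trailing whitespace, or a bare run of whitespace.
_ATOM_RE = re.compile(rf"[{_CJK}]|[^\s{_CJK}]+\s*|\s+")


def split_atoms(text: str) -> List[str]:
    """Break text into atoms whose concatenation equals the input."""
    return _ATOM_RE.findall(text)


def count_tokens(atom: str) -> int:
    """Hypothetical counter: whitespace-only atoms are free, others cost one."""
    return 0 if atom.isspace() else 1


def pack_atoms(text: str, chunk_token_num: int,
               counter: Callable[[str], int] = count_tokens) -> List[str]:
    """Pack atoms into pieces of at most chunk_token_num tokens each."""
    pieces: List[str] = []
    current: List[str] = []
    used = 0
    for atom in split_atoms(text):
        cost = counter(atom)
        # Start a new piece when adding this atom would exceed the budget.
        if current and used + cost > chunk_token_num:
            pieces.append("".join(current))
            current, used = [], 0
        current.append(atom)
        used += cost
    if current:
        pieces.append("".join(current))
    return pieces
```

Because the pieces are slices of the source string rather than tokenizer output, joining them reproduces the input exactly, which is the property the commit's tests check.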
Jin Hai	edfa9be67f	Go CLI: fix list provider instance tasks (#16345 )	2026-06-25 15:49:31 +08:00
balibabu	3f3a2ece3d	Fix: Flexible Chat Configuration (#16293 )	2026-06-25 14:56:30 +08:00
Muhammad Furqan	fe14cc35cf	fix(agent/tools): DeepL component fails validation and drops errors (#16332 ) ### What problem does this PR solve? `DeepLParam.check()` validated `self.top_n`, but DeepL has no such parameter (it is not defined on the param class or its base), so `check()` always raised `AttributeError` and a DeepL component could never pass validation. Removed the bogus `top_n` check. Also fixed the `_run` except branch, which computed `be_output("Error...")` but never returned it, silently dropping the error message. Closes #16329 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/unit_test/agent/component/test_deepl.py` covering `DeepLParam.check()` with valid defaults and rejection of invalid source/target languages.	2026-06-25 14:40:56 +08:00
Harsh Kashyap	09047d6edf	fix(web): bump lodash past vulnerable range (#16281 )	2026-06-25 14:40:39 +08:00
Idriss Sbaaoui	fb8e5ad4b2	Fix multimodal chat image routing for VLM channel requests (#16343 )	2026-06-25 14:38:29 +08:00
Muhammad Furqan	3747a6bfeb	fix(agent/tools): PubMed tool always returns "Unknown Authors" (#16330 ) ### What problem does this PR solve? Fixes the PubMed tool always emitting `Authors: Unknown Authors`. The `safe_find` closure in `_format_pubmed_content` was hardcoded to search from the article root, so the per-author `LastName`/`ForeName` lookups never matched. `safe_find` now accepts an optional `base` node (defaults to `child`, preserving the existing field lookups), and the author loop passes the current `<Author>` element. Closes #16328 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/testcases/test_web_api/test_canvas_app/test_pubmed_unit.py` covering per-author parsing, intact title/journal/DOI fields, and the no-authors fallback. Before: `Authors: Unknown Authors` After: `Authors: Furqan Khan, Jane Smith`	2026-06-25 14:34:37 +08:00
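The `safe_find` change described in the commit above (#16330) might look roughly like this. It is a sketch under assumptions: `format_authors` is a hypothetical helper, the default base is the article element passed in (the commit says the real default is `child`), and the XML in the usage note only mimics the PubMed element names mentioned in the message.

```python
import xml.etree.ElementTree as ET


def format_authors(article: ET.Element) -> str:
    """Join 'ForeName LastName' for each <Author>, with a fallback string."""

    def safe_find(path: str, base: ET.Element = None) -> str:
        # Search from the given base node; fall back to the article root
        # so existing root-level lookups keep working.
        node = (article if base is None else base).find(path)
        return node.text.strip() if node is not None and node.text else ""

    names = []
    for author in article.findall(".//Author"):
        # Pass the current <Author> element so the lookup is per author.
        fore = safe_find("ForeName", base=author)
        last = safe_find("LastName", base=author)
        full = f"{fore} {last}".strip()
        if full:
            names.append(full)
    return ", ".join(names) or "Unknown Authors"
```

For example, an article whose `AuthorList` holds two `Author` elements (Furqan Khan and Jane Smith) yields `Furqan Khan, Jane Smith`, while an article with no `Author` elements yields `Unknown Authors`.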
Harsh Kashyap	b9445c67e2	fix(agent): coerce None Switch inputs before string operators (#16320 ) ## Summary - Coerce `None` canvas values to `""` before string comparison operators in `Switch.process_operator`. - Prevents `AttributeError` when upstream components yield `None` and the Switch uses contains/start with/end with. ## Test plan - [x] `.v/bin/python -m ruff check agent/component/switch.py test/unit_test/agent/component/test_switch.py` - [x] `.v/bin/python -m pytest test/unit_test/agent/component/test_switch.py -q` (3 passed) Fixes #16315 --------- Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>	2026-06-25 14:18:24 +08:00
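The `None` coercion described in the commit above (#16320) can be illustrated with a small standalone function. `compare_text` is a hypothetical name, not the actual `Switch.process_operator`; only the three string operators named in the message are covered.

```python
def compare_text(value, operator: str, expected: str) -> bool:
    """Apply a string operator after coercing a None input to ''."""
    # Without this coercion, None.startswith(...) raises AttributeError.
    text = "" if value is None else str(value)
    if operator == "contains":
        return expected in text
    if operator == "start with":
        return text.startswith(expected)
    if operator == "end with":
        return text.endswith(expected)
    raise ValueError(f"unsupported operator: {operator}")
```

With the coercion in place, a `None` input behaves like an empty string for all three operators instead of raising.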
Hz_	54fb5b0fa7	feat(go-api): add Go support for POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks (#16256 ) ## Summary Add the Go implementation of `POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks`. This wires the full create-chunk path in Go: - router and handler registration - request/response structs - chunk creation service logic - embedding generation - chunk insert into doc engine - chunk/token counter increment - `tag_feas` validation - `image_base64` decoding and chunk image storage/merge - unit tests for handler and service ## Testing Unit tests: - `/usr/local/go/bin/go test ./internal/handler` - `/usr/local/go/bin/go test ./internal/service/chunk` - `/usr/local/go/bin/go test ./internal/service` - `/usr/local/go/bin/go test ./...` All passed locally. Manual curl checks: - basic text chunk: Go passed - chunk with `important_keywords` / `questions` / `tag_kwd` / `tag_feas`: Go passed - blank content validation: Go matched expected `code=102` - invalid `image_base64` validation: Go matched expected `code=102` - image upload and repeated image upload / merge path: Go passed twice	2026-06-25 14:15:29 +08:00
chanx	d44359826d	fix(web): agent log refetch and slider percentage rounding (#16344 )	2026-06-25 13:49:25 +08:00
Jin Hai	17b066e6ae	Go CLI: fix list dataset files by dataset name (#16341 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> list dataset 'ccc' files; Total: 1 ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 13:41:58 +08:00
Hz_	a6cc3023c5	feat(go-api): implement dataset document upload API (#16295 ) ## Summary Migrated the dataset document upload API (`POST /api/v1/datasets/:dataset_id/documents`) from Python to the Go backend. It supports local file uploads (`type=local`), web page ingestion (`type=web`), and empty document creation (`type=empty`). ## Changes - Router: Registered `POST /api/v1/datasets/:dataset_id/documents` route. - Handler: Implemented `UploadDocuments` handler and its routing functions (`uploadLocalDocuments`, `uploadWebDocument`, `uploadEmptyDocument`). - Service: Implemented `UploadLocalDocuments`, `UploadWebDocument`, and `UploadEmptyDocument` in `DocumentService`. - Refactoring: Moved permission checking logic to a shared helper for reuse in file and document services. - Tests: Added comprehensive unit tests for the new handler and service upload paths. ## Verification Ran and passed the test suite for service and handler packages: - `go test ./internal/service` - `go test ./internal/handler`	2026-06-25 13:36:49 +08:00
Hz_	ced51114f4	feat(go-api): add dataset search endpoint (#16304 ) ### What problem does this PR solve? - added the new dataset search route and handler - reused the existing shared SearchDatasets service by adapting single-dataset requests into dataset_ids=[dataset_id] - aligned handler error responses with Python behavior for argument/data errors - aligned key service error messages such as invalid search_id and mixed embedding models - added focused handler and service tests for request mapping and error behavior ### Tests: `/usr/local/go/bin/go test ./internal/service -run 'TestSearchDatasetRequestToSearchDatasetsRequest\|TestDatasetServiceSearchDatasets'` `/usr/local/go/bin/go test ./internal/handler -run 'TestDatasetsHandlerSearchDataset'`	2026-06-25 13:32:22 +08:00
Willsgao	824c88423c	fix(agent): log Wikipedia disambiguation and page errors instead of s… (#16207 ) ## Problem The Wikipedia tool silently swallows all exceptions with `except Exception: pass`, making it impossible to debug failures when fetching Wikipedia pages. ## Fix Replace the bare `except Exception: pass` with specific exception handling: - `DisambiguationError`: log available options - `PageError`: log page not found - `Exception`: log unexpected errors with full traceback Co-authored-by: wills <willsgao@163.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-25 13:10:29 +08:00
buua436	479a9a715e	feat: unify provider id or name routing (#16336 )	2026-06-25 13:04:21 +08:00
Wang Qi	d0fc75f1bb	Fix when empty response not set, it report: ERROR: 'knowledge' (#16338 )	2026-06-25 13:02:24 +08:00
Ilya Bogin	10d02e54a8	Add Keenable web search tool to the agent (#16233 ) Adds Keenable as a web search tool in the agent, alongside the existing Tavily/DuckDuckGo/SearXNG/Google tools. The main difference from the other search tools is that it doesn't require an API key. By default it uses Keenable's keyless public endpoint, so it works out of the box. Providing a key (in the tool config) switches to the authenticated endpoint and lifts the rate limits. ### Changes - Backend: `agent/tools/keenable.py` — `KeenableSearch`, follows the Tavily/DuckDuckGo tool shape (results go through `_retrieve_chunks`). Auto-registered by `agent/tools/__init__.py`. - Frontend: wired into the agent builder — operator + icon, config form (optional API key, search mode, site filter, top N), the search tool menu, and the existing api_key export sanitizer. ### Config - API key: optional. Blank = keyless free tier; set it to lift limits / enable `realtime` mode. - `site`: restrict to a single domain. - `mode`: `pro` (default) or `realtime`. ### Notes `KEENABLE_API_URL` can override the API base (HTTPS enforced; defaults to `https://api.keenable.ai`). The tool only sends the query (no URL fetch), so there's no SSRF surface. Verified the frontend with `vite build` and the backend search path against the public endpoint.	2026-06-25 12:12:28 +08:00
Jin Hai	06d45c50cb	Example: list_datasets.sh (#16335 ) ### Type of change - [x] Other (please describe): example Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 10:36:07 +08:00
Jin Hai	7ef4a4a06a	Go CLI: list provider instance models, sync and list provider (#16311 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> list provider 'zhipu-ai' instance 'test' models sync; +------------+---------------+------------+-------------+------------------+---------------------------------------------+ \| dimensions \| max_dimension \| max_tokens \| model_types \| name \| thinking \| +------------+---------------+------------+-------------+------------------+---------------------------------------------+ \| \| \| 128000 \| [chat] \| glm-4.5@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 128000 \| [chat] \| glm-4.5-air@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-4.6@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-4.7@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-5@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 200000 \| [chat] \| glm-5-turbo@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-5.1@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| \| [chat] \| glm-5.2@z-ai \| \| +------------+---------------+------------+-------------+------------------+---------------------------------------------+ RAGFlow(api/default)> list provider 'zhipu-ai' instance 'test' models; RAGFlow(api/default)> list dataset 'aaa' ingestion tasks; RAGFlow(api/default)> list dataset '0abe79f9423311f1ad8d38a74640adcc' documents; ``` --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 10:01:21 +08:00
Yingfeng	5b0b86c276	More resilient graph engine (#16325 ) ### What problem does this PR solve? - OpenTelemetry integration - Checkpoint conformance tests - State inspector API - Callbacks - A series of fault injection tests - Pregel integration tests ### Type of change - [x] Refactoring dev-20260625	2026-06-24 23:05:07 +08:00
Haruko386	dd46ece3bc	feat[go]: datasets/<dataset_id>/chunks DELETE (#16185 ) ### What problem does this PR solve? As title: `documents.POST("/ingest", r.documentHandler.Ingest)`: --- <img width="3750" height="2039" alt="image" src="https://github.com/user-attachments/assets/533c1c3d-af3e-47e6-9f51-a278539b7066" /> `datasets.DELETE("/:dataset_id/chunks", r.chunkHandler.StopParsing)` --- <img width="3621" height="2040" alt="image" src="https://github.com/user-attachments/assets/022adcdb-1e47-4883-9611-1a695c34007d" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-24 19:43:18 +08:00
Haruko386	c2665d4ab1	implement: <dataset_id>/embedding/check POST (#16266 )	2026-06-24 19:09:43 +08:00

6973 Commits