ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Harsh Kashyap	0af5d43e8d	fix(deepdoc): keep zero and false Excel cells in __call__ (#16318 )	2026-06-25 19:12:57 +08:00
Haruko386	43b96223b4	feat[go]: add router for connectors/<connector_id> PATCH (#16358 ) ### What problem does this PR solve? As title /api/v1/connectors/<connector_id> PATCH was implemented in #15512 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-06-25 19:07:52 +08:00
Haruko386	74597b8683	feat[Go]: implement api: Search/Get/Update-Messages (#16307 ) ### What problem does this PR solve? As title: implement: ``` /api/v1/messages/search GET /api/v1/messages GET /api/v1/messages/<memory_id>:<message_id>/content GET /api/v1/memories/<memory_id>/config GET /api/v1/messages/<memory_id>:<message_id> PUT ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-25 19:07:34 +08:00
Harsh Kashyap	49312cace3	fix(api): align use_sql Markdown separator with Source header (#16317 )	2026-06-25 19:00:01 +08:00
balibabu	1dfc24003b	Fix: An empty message notification pops up at the top of the agent conversation. (#16353 )	2026-06-25 17:32:24 +08:00
Wang Qi	31e50b164f	Fix [ID:0] not converted to Fig. 1 (#16357 )	2026-06-25 17:17:46 +08:00
Wang Qi	ac9469e5f5	Fix: adding VLLM without an API key fails (#16352 )	2026-06-25 17:17:29 +08:00
Wang Qi	97c519662a	Add env ALLOW_ANY_HOST to skip host check (#16351 )	2026-06-25 17:17:02 +08:00
maoyifeng	6e7aa75e71	Go:CLI add new response function (#16347 ) ### What problem does this PR solve? add new response function ### Type of change - [ ] New Feature (non-breaking change which adds functionality)	2026-06-25 16:49:47 +08:00
Yash Raj Pandey	091417980e	fix(html_parser): preserve original text when splitting oversized blocks (#16052 ) ### Bug `RAGFlowHtmlParser.chunk_block()` splits an oversized block by slicing the tokenized string and storing the joined tokens: ```python tks_str = rag_tokenizer.tokenize(block) ... tokens = tks_str.split(" ") while start < len(tokens): chunks.append(" ".join(tokens[start:start + chunk_token_num])) # tokenized form, not source ``` On the default (Elasticsearch) backend `rag_tokenizer.tokenize` transforms text: it lowercases/stems Latin words and inserts spaces between CJK characters. So any text block longer than `chunk_token_num` is stored as garbled, lowercased, space-segmented text instead of the source content. The small-block branch correctly stores the original `block`, so only oversized blocks are corrupted. Affects HTML and EPUB ingestion (both go through `chunk_block`), degrading retrieved chunks and the answers generated from them. ### Real tokenizer behavior (infinity-sdk 0.7.0, ES backend) ``` tokenize("Hello World FOO Bar Baz Qux Jumps") -> "hello world foo bar baz qux jump" # lowercased + stemmed tokenize("你好世界这是一个测试") -> "你好世界这是一个测试" # spaces inserted ``` ### Fix Split the original text: break it into atoms (whitespace-delimited runs for space-separated scripts, per-character for spaceless scripts such as Chinese) and pack them into pieces of at most `chunk_token_num` tokens. This preserves the source characters and still splits scripts that have no whitespace — a plain whitespace split would leave CJK as one un-splittable chunk. 
### Proof (real tokenizer, before/after) Running the old vs new split against the real `infinity.rag_tokenizer`: ``` ENGLISH "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs" (chunk_token_num=4) OLD: ['hello world foo bar', 'baz qux jump over', 'lazi dog'] # lowercased + stemmed NEW: ['Hello World FOO Bar ', 'Baz Qux Jumps Over ', 'Lazy Dogs'] # preserved; each <= 4 tokens NEW preserves text exactly: True CHINESE "你好世界这是一个测试用例需要被切分成多个块" (chunk_token_num=3) OLD: ['你好世界这是', '一个测试用例需要', ...] # spurious spaces NEW: ['你好世', '界这是', '一个测', ...] # preserved; each <= 3 tokens NEW preserves text exactly: True ``` ### Tests Added `test/unit_test/deepdoc/parser/test_html_parser.py` (English + Chinese oversized blocks, plus small-block merge). Before the fix the two oversized tests fail (English shows lowercasing, Chinese shows inserted spaces); after the fix all pass. `ruff check` clean.	2026-06-25 16:43:35 +08:00
Jin Hai	edfa9be67f	Go CLI: fix list provider instance tasks (#16345 )	2026-06-25 15:49:31 +08:00
balibabu	3f3a2ece3d	Fix: Flexible Chat Configuration (#16293 )	2026-06-25 14:56:30 +08:00
Muhammad Furqan	fe14cc35cf	fix(agent/tools): DeepL component fails validation and drops errors (#16332 ) ### What problem does this PR solve? `DeepLParam.check()` validated `self.top_n`, but DeepL has no such parameter (it is not defined on the param class or its base), so `check()` always raised `AttributeError` and a DeepL component could never pass validation. Removed the bogus `top_n` check. Also fixed the `_run` except branch, which computed `be_output("Error...")` but never returned it, silently dropping the error message. Closes #16329 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/unit_test/agent/component/test_deepl.py` covering `DeepLParam.check()` with valid defaults and rejection of invalid source/target languages.	2026-06-25 14:40:56 +08:00
Harsh Kashyap	09047d6edf	fix(web): bump lodash past vulnerable range (#16281 )	2026-06-25 14:40:39 +08:00
Idriss Sbaaoui	fb8e5ad4b2	Fix multimodal chat image routing for VLM channel requests (#16343 )	2026-06-25 14:38:29 +08:00
Muhammad Furqan	3747a6bfeb	fix(agent/tools): PubMed tool always returns "Unknown Authors" (#16330 ) ### What problem does this PR solve? Fixes the PubMed tool always emitting `Authors: Unknown Authors`. The `safe_find` closure in `_format_pubmed_content` was hardcoded to search from the article root, so the per-author `LastName`/`ForeName` lookups never matched. `safe_find` now accepts an optional `base` node (defaults to `child`, preserving the existing field lookups), and the author loop passes the current `<Author>` element. Closes #16328 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/testcases/test_web_api/test_canvas_app/test_pubmed_unit.py` covering per-author parsing, intact title/journal/DOI fields, and the no-authors fallback. Before: `Authors: Unknown Authors` After: `Authors: Furqan Khan, Jane Smith`	2026-06-25 14:34:37 +08:00
Harsh Kashyap	b9445c67e2	fix(agent): coerce None Switch inputs before string operators (#16320 ) ## Summary - Coerce `None` canvas values to `""` before string comparison operators in `Switch.process_operator`. - Prevents `AttributeError` when upstream components yield `None` and the Switch uses contains/start with/end with. ## Test plan - [x] `.v/bin/python -m ruff check agent/component/switch.py test/unit_test/agent/component/test_switch.py` - [x] `.v/bin/python -m pytest test/unit_test/agent/component/test_switch.py -q` (3 passed) Fixes #16315 --------- Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>	2026-06-25 14:18:24 +08:00
Hz_	54fb5b0fa7	feat(go-api): add Go support for POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks (#16256 ) ## Summary Add the Go implementation of `POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks`. This wires the full create-chunk path in Go: - router and handler registration - request/response structs - chunk creation service logic - embedding generation - chunk insert into doc engine - chunk/token counter increment - `tag_feas` validation - `image_base64` decoding and chunk image storage/merge - unit tests for handler and service ## Testing Unit tests: - `/usr/local/go/bin/go test ./internal/handler` - `/usr/local/go/bin/go test ./internal/service/chunk` - `/usr/local/go/bin/go test ./internal/service` - `/usr/local/go/bin/go test ./...` All passed locally. Manual curl checks: - basic text chunk: Go passed - chunk with `important_keywords` / `questions` / `tag_kwd` / `tag_feas`: Go passed - blank content validation: Go matched expected `code=102` - invalid `image_base64` validation: Go matched expected `code=102` - image upload and repeated image upload / merge path: Go passed twice	2026-06-25 14:15:29 +08:00
chanx	d44359826d	fix(web): agent log refetch and slider percentage rounding (#16344 )	2026-06-25 13:49:25 +08:00
Jin Hai	17b066e6ae	Go CLI: fix list dataset files by dataset name (#16341 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> list dataset 'ccc' files; Total: 1 ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 13:41:58 +08:00
Hz_	a6cc3023c5	feat(go-api): implement dataset document upload API (#16295 ) ## Summary Migrated the dataset document upload API (`POST /api/v1/datasets/:dataset_id/documents`) from Python to the Go backend. It supports local file uploads (`type=local`), web page ingestion (`type=web`), and empty document creation (`type=empty`). ## Changes - Router: Registered `POST /api/v1/datasets/:dataset_id/documents` route. - Handler: Implemented `UploadDocuments` handler and its routing functions (`uploadLocalDocuments`, `uploadWebDocument`, `uploadEmptyDocument`). - Service: Implemented `UploadLocalDocuments`, `UploadWebDocument`, and `UploadEmptyDocument` in `DocumentService`. - Refactoring: Moved permission checking logic to a shared helper for reuse in file and document services. - Tests: Added comprehensive unit tests for the new handler and service upload paths. ## Verification Ran and passed the test suite for service and handler packages: - `go test ./internal/service` - `go test ./internal/handler`	2026-06-25 13:36:49 +08:00
Hz_	ced51114f4	feat(go-api): add dataset search endpoint (#16304 ) ### What problem does this PR solve? - added the new dataset search route and handler - reused the existing shared SearchDatasets service by adapting single-dataset requests into dataset_ids=[dataset_id] - aligned handler error responses with Python behavior for argument/data errors - aligned key service error messages such as invalid search_id and mixed embedding models - added focused handler and service tests for request mapping and error behavior ### Tests: `/usr/local/go/bin/go test ./internal/service -run 'TestSearchDatasetRequestToSearchDatasetsRequest\|TestDatasetServiceSearchDatasets'` `/usr/local/go/bin/go test ./internal/handler -run 'TestDatasetsHandlerSearchDataset'`	2026-06-25 13:32:22 +08:00
Willsgao	824c88423c	fix(agent): log Wikipedia disambiguation and page errors instead of s… (#16207 ) ## Problem The Wikipedia tool silently swallows all exceptions with `except Exception: pass`, making it impossible to debug failures when fetching Wikipedia pages. ## Fix Replace the bare `except Exception: pass` with specific exception handling: - `DisambiguationError`: log available options - `PageError`: log page not found - `Exception`: log unexpected errors with full traceback Co-authored-by: wills <willsgao@163.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-25 13:10:29 +08:00
buua436	479a9a715e	feat: unify provider id or name routing (#16336 )	2026-06-25 13:04:21 +08:00
Wang Qi	d0fc75f1bb	Fix: when the empty response is not set, it reports ERROR: 'knowledge' (#16338 )	2026-06-25 13:02:24 +08:00
Ilya Bogin	10d02e54a8	Add Keenable web search tool to the agent (#16233 ) Adds Keenable as a web search tool in the agent, alongside the existing Tavily/DuckDuckGo/SearXNG/Google tools. The main difference from the other search tools is that it doesn't require an API key. By default it uses Keenable's keyless public endpoint, so it works out of the box. Providing a key (in the tool config) switches to the authenticated endpoint and lifts the rate limits. ### Changes - Backend: `agent/tools/keenable.py` — `KeenableSearch`, follows the Tavily/DuckDuckGo tool shape (results go through `_retrieve_chunks`). Auto-registered by `agent/tools/__init__.py`. - Frontend: wired into the agent builder — operator + icon, config form (optional API key, search mode, site filter, top N), the search tool menu, and the existing api_key export sanitizer. ### Config - API key: optional. Blank = keyless free tier; set it to lift limits / enable `realtime` mode. - `site`: restrict to a single domain. - `mode`: `pro` (default) or `realtime`. ### Notes `KEENABLE_API_URL` can override the API base (HTTPS enforced; defaults to `https://api.keenable.ai`). The tool only sends the query (no URL fetch), so there's no SSRF surface. Verified the frontend with `vite build` and the backend search path against the public endpoint.	2026-06-25 12:12:28 +08:00
Jin Hai	06d45c50cb	Example: list_datasets.sh (#16335 ) ### Type of change - [x] Other (please describe): example Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 10:36:07 +08:00
Jin Hai	7ef4a4a06a	Go CLI: list provider instance models, sync and list provider (#16311 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> list provider 'zhipu-ai' instance 'test' models sync; +------------+---------------+------------+-------------+------------------+---------------------------------------------+ \| dimensions \| max_dimension \| max_tokens \| model_types \| name \| thinking \| +------------+---------------+------------+-------------+------------------+---------------------------------------------+ \| \| \| 128000 \| [chat] \| glm-4.5@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 128000 \| [chat] \| glm-4.5-air@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-4.6@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-4.7@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-5@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 200000 \| [chat] \| glm-5-turbo@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| 202752 \| [chat] \| glm-5.1@z-ai \| map[clear_thinking:true default_value:true] \| \| \| \| \| [chat] \| glm-5.2@z-ai \| \| +------------+---------------+------------+-------------+------------------+---------------------------------------------+ RAGFlow(api/default)> list provider 'zhipu-ai' instance 'test' models; RAGFlow(api/default)> list dataset 'aaa' ingestion tasks; RAGFlow(api/default)> list dataset '0abe79f9423311f1ad8d38a74640adcc' documents; ``` --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-25 10:01:21 +08:00
Yingfeng	5b0b86c276	More resilient graph engine (#16325 ) ### What problem does this PR solve? - OpenTelemetry integration - Checkpoint conformance tests - State inspector API - Callbacks - A series of fault injection tests - Pregel integration tests ### Type of change - [x] Refactoring dev-20260625	2026-06-24 23:05:07 +08:00
Haruko386	dd46ece3bc	feat[go]: datasets/<dataset_id>/chunks DELETE (#16185 ) ### What problem does this PR solve? As title: `documents.POST("/ingest", r.documentHandler.Ingest)`: --- <img width="3750" height="2039" alt="image" src="https://github.com/user-attachments/assets/533c1c3d-af3e-47e6-9f51-a278539b7066" /> `datasets.DELETE("/:dataset_id/chunks", r.chunkHandler.StopParsing)` --- <img width="3621" height="2040" alt="image" src="https://github.com/user-attachments/assets/022adcdb-1e47-4883-9611-1a695c34007d" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-24 19:43:18 +08:00
Haruko386	c2665d4ab1	implement: <dataset_id>/embedding/check POST (#16266 )	2026-06-24 19:09:43 +08:00
Haruko386	48534d5af3	fix: new dataset can not update configuration (#16291 )	2026-06-24 19:08:56 +08:00
Jin Hai	1fc02606ea	Go CLI: fix key commands (#16306 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> set key 'ragflow-JgnarFSCUiV99oOvvMDei7ZzZg1cVlqGd1AMHrHeKE4'; SUCCESS RAGFlow(api/default)> unset key; SUCCESS RAGFlow(api/default)> list provider 'zhipu-ai' instances; RAGFlow(api/default)> list providers; RAGFlow(api/default)> list available providers; RAGFlow(api/default)> list provider 'zhipu-ai' instance 'test' models; ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-24 18:48:09 +08:00
Hz_	9a91564194	feat(go-api): align chat session get/update with python behavior (#16239 ) ## Summary Align `/chats/:chat_id/sessions/:session_id` GET and PATCH with Python behavior.	2026-06-24 17:34:01 +08:00
Hz_	dc8ff63f1d	feat(go-api): add dataset tags endpoints (#16231 ) ## Summary - add `GET /api/v1/datasets/:dataset_id/tags` - add `PUT /api/v1/datasets/:dataset_id/tags` - implement dataset tag listing and rename flow - align rename tag validation and response shape with the Python API - add handler and service tests for dataset tags ## Routes - `GET /api/v1/datasets/:dataset_id/tags` - `PUT /api/v1/datasets/:dataset_id/tags` ## Test - Run specific tests for dataset tags: ``` go test -v ./internal/service ./internal/handler -run 'TestDatasetServiceListTags\|TestDatasetServiceRenameTag\|TestDatasetsHandlerListTags\|TestDatasetsHandlerRenameTag' ``` - Run all tests for service and handler to verify no regressions: ``` go test ./internal/service ./internal/handler ``` - use curl cmd to test	2026-06-24 17:05:58 +08:00
Jin Hai	9624f70b22	Go CLI: refactor (#16299 ) ``` RAGFlow(api/default)> list dataset 'e93ab2c04ad111f1b17438a74640adcc' documents; Total: 1 RAGFlow(api/default)> list datasets; RAGFlow(api/default)> list chats; Total: 2 RAGFlow(api/default)> list agents; Total: 1 RAGFlow(api/default)> list searches; Total: 1 RAGFlow(api/default)> list keys; +----------------------------------+---------------+----------------------------------+-----------------------------------------------------+---------------+ \| beta \| create_time \| tenant_id \| token \| update_time \| +----------------------------------+---------------+----------------------------------+-----------------------------------------------------+---------------+ \| GKsLEdSUkl76gJz1k_4fJpSQRIlWsiki \| 1782285917523 \| 2ba4881420fa11f19e9c38a74640adcc \| ragflow-JgnarFSCUiV99oOvvMDei7ZzZg1cVlqGd1AMHrHeKE4 \| 1782285917523 \| +----------------------------------+---------------+----------------------------------+-----------------------------------------------------+---------------+ RAGFlow(api/default)> create key; SUCCESS RAGFlow(api/default)> drop key 'ragflow-aA4R7AuUD158yh2LDh7IDBiqwOKFDKeTwUSQSLVdPdM'; SUCCESS ``` --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-24 16:50:40 +08:00
Hz_	a8651e7f83	fix(go): normalizeDatasetID (#16301 ) fix `normalizeDatasetID`	2026-06-24 15:46:37 +08:00
Hz_	e35860ad74	feat(go-api): Align document metadata batch APIs and upload_info with Python (#16269 ) ## Summary Align the Go implementations of these APIs with the Python behavior: - `POST /api/v1/datasets/:dataset_id/metadata/update` - `PATCH /api/v1/datasets/:dataset_id/documents/metadatas` - `POST /api/v1/documents/upload` ## What changed - Added the Go routes and handlers for the 3 APIs. - Aligned batch document metadata updates with Python semantics: - support `match` in update items - support list append / replace behavior - support deleting specific list values - remove metadata entirely when it becomes empty - create metadata for documents that previously had none when updates apply - count `updated` only when a document actually changes - Aligned `documents/upload` file uploads with Python-style `upload_info` behavior: - store upload-info blobs in the per-user downloads bucket - return lightweight upload descriptors instead of normal file-management responses - Improved URL upload behavior: - SSRF-guarded fetch with redirect validation - redirect limit aligned to Python behavior - normalize filename and MIME type - add `.pdf` when the fetched content is PDF - normalize HTML content into readable text instead of storing raw HTML shells ## Validation ### Unit tests Passed: - `go test ./internal/service` - `go test ./internal/handler` Also verified targeted cases for: - batch metadata update semantics - upload_info URL handling - upload_info download bucket behavior ### curl checks Verified the new Go endpoints with `curl` and compared the response shape and behavior with Python for: - `POST /api/v1/datasets/{dataset_id}/metadata/update` - `PATCH /api/v1/datasets/{dataset_id}/documents/metadatas` - `POST /api/v1/documents/upload` The Go responses were checked against Python for: - argument validation - success response shape - metadata update results - upload_info result structure - file vs URL input handling	2026-06-24 14:52:47 +08:00
Haruko386	97718ec779	feat[Go] implement datasets/<dataset_id>/<index_type> DELETE (#16257 ) ### What problem does this PR solve? As title ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-24 14:47:55 +08:00
Hz_	368db6fa58	feat(go-api): migrate datasets tags aggregation API to Go (#16181 ) ### Description Migrates the datasets tags aggregation API `GET /api/v1/datasets/tags/aggregation` from Python to Go. ### Changes - Registered the `GET /api/v1/datasets/tags/aggregation` route. - Implemented `AggregateTags` in datasets `handler` and `service`. - Added handler and service `unit tests`. ### Test Verification - Verified by comparing results between Python (9380) and Go (9384) services. - Tested scenarios: single dataset, multiple datasets, empty parameters, and unauthorized/invalid IDs. - All tests and Go `unit tests` passed.	2026-06-24 14:42:10 +08:00
kpdev	68d2ca0ff1	fix(api): use dataset-owner tenant for legacy /chunks docstore cleanup (#15961 )	2026-06-24 14:24:40 +08:00
Lynn	ede46e0bb8	Fix: guess volc embedding model (#16298 )	2026-06-24 14:11:55 +08:00
Jin Hai	e615e4faab	Go CLI: fix mode switch (#16294 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> add admin host '127.0.0.1:9383'; SUCCESS RAGFlow(api/default)> use admin; SUCCESS RAGFlow(admin)> delete api 'default'; SUCCESS RAGFlow(admin)> delete api 'default'; CLI error: api server: default not found RAGFlow(admin)> add api 'default' host '127.0.0.1:9384'; SUCCESS RAGFlow(admin)> use api 'default'; SUCCESS RAGFlow(api/default)> delete admin SUCCESS RAGFlow(api/default)> delete admin; CLI error: admin server not exists RAGFlow(api/default)> list api server; +------------+---------------+-----------------+---------+ \| api_server \| api_server_ip \| api_server_port \| auth \| +------------+---------------+-----------------+---------+ \| default \| 127.0.0.1 \| 9384 \| no auth \| +------------+---------------+-----------------+---------+ RAGFlow(api/default)> add admin host '127.0.0.1:9383'; SUCCESS RAGFlow(api/default)> show admin server; +-------------------+-----------+ \| field \| value \| +-------------------+-----------+ \| admin_server_ip \| 127.0.0.1 \| \| admin_server_port \| 9383 \| \| auth \| no auth \| +-------------------+-----------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-24 13:41:01 +08:00
Ambercssa	e9cdd09b67	fix(agent): handle different reference data formats (#16276 )	2026-06-24 13:33:59 +08:00
Wang Qi	6046bc6a8e	Fix: handle empty folder when link to datasets (#16296 )	2026-06-24 13:31:32 +08:00
helloxjade	1b2da645c3	fix: deduplicate markdown table chunks (#16143 )	2026-06-24 13:22:57 +08:00
Ju Boxiang	39b194453d	Fix: paginate get_flatted_meta_by_kbs to support datasets with >10k documents (#16034 ) (#16095 )	2026-06-24 13:20:07 +08:00
minion1227	14565b289a	Fix: docx parsing raises ValueError on 'Heading' styles (#16284 )	2026-06-24 13:16:16 +08:00
minion1227	0c19190daf	Fix: MCP document metadata cache can loop forever when documents returns an empty docs page (#16285 )	2026-06-24 13:09:48 +08:00
ちー	5928b8b9ae	fix(document_service): prevent NoneType error on progress_msg.strip() (#16289 ) ### What problem does this PR solve? When I run RAGFlow_server.py: ``` 2026-06-24 10:27:01,938 ERROR 3413485 fetch task exception Traceback (most recent call last): File "/home/infiniflow/Documents/development/ragflow/api/db/services/document_service.py", line 948, in _sync_progress if t.progress_msg.strip(): ^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'strip' ``` fixed: ```python if t.progress_msg.strip(): # fix: if (t.progress_msg or "").strip(): ``` Fix crash in `_sync_progress` when `progress_msg` is `None`. #### Root Cause `progress_msg` from task records can be `None`, causing: ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-24 13:07:40 +08:00
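
Commit 091417980e above describes splitting an oversized block by cutting the original text into atoms (whitespace-delimited runs for spaced scripts, single characters for spaceless scripts) and packing them into bounded pieces. A minimal sketch of that idea follows; it is not the code from the PR. The function name `split_preserving_text`, the `max_atoms` parameter, and the CJK range used are hypothetical, and the sketch counts atoms as a stand-in for tokenizer tokens.

```python
import re

# Assumption: treat the basic CJK Unified Ideographs range as "spaceless script".
_CJK = r"\u4e00-\u9fff"
# An atom is one CJK character, or one run of non-space non-CJK characters,
# each with any trailing whitespace kept attached so nothing is lost.
_ATOM = re.compile(rf"[{_CJK}]\s*|[^\s{_CJK}]+\s*")


def split_preserving_text(block: str, max_atoms: int) -> list[str]:
    """Split block into pieces of at most max_atoms atoms, keeping every source character."""
    if max_atoms < 1:
        raise ValueError("max_atoms must be >= 1")
    stripped = block.lstrip()
    lead = block[: len(block) - len(stripped)]  # leading whitespace, if any
    atoms = _ATOM.findall(stripped)
    if not atoms:
        # Empty or whitespace-only input: nothing to split.
        return [block] if block else []
    atoms[0] = lead + atoms[0]
    return ["".join(atoms[i:i + max_atoms]) for i in range(0, len(atoms), max_atoms)]


english = "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs"
chinese = "你好世界这是一个测试用例需要被切分成多个块"
print(split_preserving_text(english, 4))
print(split_preserving_text(chinese, 3)[:3])
```

Because each piece is a slice of the source string, joining the pieces reproduces the input exactly, which is the property the commit message calls "preserves text exactly".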


6954 Commits