ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-05 10:58:34 +08:00

Author	SHA1	Message	Date
Jack	939933649a	Refactor: Consolidate Web API & HTTP API for document list_docs (#14176) ### What problem does this PR solve? Before consolidation: Web API - POST /v1/document/list; HTTP API - GET /api/v1/datasets/<dataset_id>/documents. After consolidation: RESTful API - GET /api/v1/datasets/<dataset_id>/documents. ### Type of change - [x] Refactoring	2026-04-20 14:54:40 +08:00
Daniil Sivak	c93ec0a1f3	Fix: reject empty/space-only content in update_chunk API (#14082) Closes #6541 ### What problem does this PR solve? Add content validation to `update_chunk` (SDK and non-SDK) to reject empty or whitespace-only content before it reaches the embedding model. Before: Calling `update_chunk` with empty or whitespace-only content (such as `" "`, `""`, or `"\n"`) bypassed validation and was sent directly to the embedding model, which returned an error. This was the same bug previously fixed for `add_chunk` in #6390, but `update_chunk` was missed. After: Empty or whitespace-only content is caught by validation, and the API returns the error `` `content` is required ``. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-15 18:43:53 +08:00
Idriss Sbaaoui	d6987b4d8f	Fix p3 CI failures (#14069) ### What problem does this PR solve? Fix stale tests at the p3 level. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-14 10:47:07 +08:00
Jack	577c96bf2a	Refactor: Merge document update API (#13962) ### What problem does this PR solve? Refactor: merge document.rename into document.update_document ### Type of change - [x] Refactoring ## Summary by CodeRabbit * New Features * Added a unified document update API (PUT) supporting name, metadata, parser/chunk settings, and status changes. * Breaking Changes * Legacy single-parameter rename endpoint removed; renames now require dataset + document identifiers. * `/list` now reads dataset id from a different query parameter. * Validation / Bug Fixes * Stricter meta_fields and parser-config validation; unauthenticated requests return 401. * Frontend * UI now sends dataset id when saving document names. * Tests * Numerous unit and HTTP tests adjusted or removed to match new API and validations. --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com> Co-authored-by: Qi Wang <wangq8@outlook.com> Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com>	2026-04-09 11:17:38 +08:00
Jack	c4b0aaa874	Fix: #6098 - Add validation logic for parser_config when updating a document (#13911) ### What problem does this PR solve? Add validation logic for parser_config and refactor the processing flow. Before this change, validation and update logic were interleaved: some validation ran, then some updates, then another round of "validation-and-then-update", so updates could already be applied when a later check failed. After this change, all validation logic runs first, and update logic executes only after ALL validation has passed. Parameters that come from the front end are validated with Pydantic; validation that depends on data from the DB lives in separate methods. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-04-07 11:33:05 +08:00
Yongteng Lei	b7daf6285b	Refa: Chat conversations /conversation API to RESTful (#13893) ### What problem does this PR solve? Refactor the chat conversations /conversation API to RESTful style. ### Type of change - [x] Refactoring	2026-04-02 20:49:23 +08:00
Idriss Sbaaoui	dd529137eb	Fix: markdown table double extraction in parser (#13892) ### What problem does this PR solve? Fixes markdown tables being parsed twice (once as markdown and again as generated HTML), which caused duplicate table chunks in the chunk list UI. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-02 13:31:56 +08:00
Yongteng Lei	b622c47ed6	Refa: Chats /chat API to RESTful (#13881) ### What problem does this PR solve? Refactor the Chats /chat API to RESTful style. ### Type of change - [x] Refactoring	2026-04-01 20:10:37 +08:00
Liu An	b1d28b5898	Revert "Refa: Chats /chat API to RESTful (#13871)" (#13877) ### What problem does this PR solve? This reverts commit `1a608ac411`. ### Type of change - [x] Other (please describe):	2026-04-01 11:05:29 +08:00
Yongteng Lei	1a608ac411	Refa: Chats /chat API to RESTful (#13871) ### What problem does this PR solve? Refactor the Chats /chat API to RESTful style. ### Type of change - [x] Refactoring	2026-04-01 10:50:22 +08:00
Heyang Wang	641b319647	feat: support reading tags via API (#12891) (#13732) ### What problem does this PR solve? Enable reading Tag Set tags via API (expose the tag_kwd field in list-chunks results). ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-29 20:17:01 +08:00
Lynn	6a4a9debd2	Fix: allow creating a dataset with the resume chunk_method (#13798) ### What problem does this PR solve? Allow creating a dataset with the `resume` chunk_method. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-26 19:06:51 +08:00
Lynn	4bb1acaa5b	Refactor: dataset / kb API to RESTful style (#13690) ### What problem does this PR solve? 1. Split the dataset API into gateway and service, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-19 14:41:36 +08:00
Jin Hai	986dcf1cc8	Revert "Refactor: dataset / kb API to RESTful style" (#13646) Reverts infiniflow/ragflow#13619	2026-03-17 12:09:48 +08:00
Lynn	1db5409d82	Refactor: dataset / kb API to RESTful style (#13619) ### What problem does this PR solve? 1. Split the dataset API into gateway and service, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-16 22:51:34 +08:00
Jin Hai	a2d72202cf	Revert "Refactor dataset / kb API to RESTful style" (#13614) Reverts infiniflow/ragflow#13263	2026-03-16 10:44:38 +08:00
Lynn	7c32e206be	Refactor dataset / kb API to RESTful style (#13263) ### What problem does this PR solve? 1. Split the dataset API into gateway and service, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-13 20:02:35 +08:00
Yongteng Lei	e1b632a7bb	Feat: add "delete all" support for delete operations (#13530) ### What problem does this PR solve? Add "delete all" support for delete operations. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-12 09:47:42 +08:00
Liu An	852393c114	Test: Lower priority of chat assistant and chunk list API tests (#13540) ### What problem does this PR solve? Mark test cases as lower priority (p3) for: - Creating chat assistants - Deleting chat assistants - Listing chat assistants - Listing chunks within datasets ### Type of change - [x] Update testcases	2026-03-11 19:00:18 +08:00
Yongteng Lei	51be1f1442	Refa: empty ids mean a no-op (#13439) ### What problem does this PR solve? An empty ids list means the operation is a no-op. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-06 18:16:42 +08:00
Lynn	62cb292635	Feat/tenant model (#13072) ### What problem does this PR solve? Add an id to the tenant_llm table and use it in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
6ba3i	22c4d72891	tests: improve RAGFlow coverage based on Codecov report (#13219) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-26 19:03:26 +08:00
PandaMan	d43aebe701	Fix/13142 auto metadata (#13217) ### What problem does this PR solve? Close #13142 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-26 10:25:48 +08:00
6ba3i	38011f2c16	tests: improve RAGFlow coverage based on Codecov report (#13200) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-25 19:12:11 +08:00
6ba3i	fabbfcab90	Fix: failing p3 tests for SDK/HTTP APIs (#13062) ### What problem does this PR solve? Adjust highlight parsing, add a row-count SQL override, tweak retrieval thresholding, and update tests with engine-aware skips/utilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 14:56:10 +08:00
Liu An	c4f60b349d	Fix(test): downgrade test priorities (#12913) ### What problem does this PR solve? Changed test priorities in multiple test files, downgrading from p1 to p2 and from p2 to p3. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-30 20:02:56 +08:00
6ba3i	aee9860970	Make document change-status idempotent for Infinity doc store (#12717) ### What problem does this PR solve? This PR makes the document change-status endpoint idempotent under the Infinity doc store. If a document already has the requested status, the handler returns success without touching the engine, preventing unnecessary updates and avoiding missing-table errors while keeping responses consistent. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 19:11:21 +08:00
6ba3i	0795616b34	Align p3 HTTP/SDK tests with current backend behavior (#12563) ### What problem does this PR solve? Updates pre-existing HTTP API and SDK tests to align with current backend behavior (validation errors, 404s, and schema defaults). This ensures p3 regression coverage is accurate without changing production code. ### Type of change - [x] Other (please describe): align p3 HTTP/SDK tests with current backend behavior --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-13 19:22:47 +08:00
Lynn	f9d4179bf2	Feat: memory SDK (#12538) ### What problem does this PR solve? Move the memory and message APIs to /api, and add SDK support. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-09 17:45:58 +08:00
Jin Hai	cc9546b761	Fix IDE warnings (#12010) ### What problem does this PR solve? As the title says. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-18 11:27:02 +08:00
Jin Hai	30019dab9f	Change knowledge base to dataset (#11976) ### What problem does this PR solve? As the title says. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 10:03:33 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Liu An	bfc84ba95b	Test: handle duplicate names by appending "(1)" (#11244) ### What problem does this PR solve? - Updated tests to reflect new behavior of handling duplicate dataset names - Instead of returning an error, the system now appends "(1)" to duplicate names - This problem was introduced by PR #10960 ### Type of change - [x] Testcase update	2025-11-13 15:18:32 +08:00
Billy Bao	19f71a961a	Fix: create-dataset behavior mismatch between HTTP API and web UI (#10960) ### What problem does this PR solve? Fix: create-dataset behavior mismatch between HTTP API and web UI #10925 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-04 13:45:14 +08:00
Liu An	119713153c	Test: update test cases for chunk retrieval pagination (#10839) ### What problem does this PR solve? Updated test cases in test_retrieval_chunks.py to: - Remove skip mark from page pagination test case (#6646 resolved) - Add skip marks for page_size=1 tests due to new issue (#10692) ### Type of change - [x] Test update	2025-10-29 09:41:36 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654) ### What problem does this PR solve? - Introduced a GPU profile in .env - Added Dockerfile_tei - Fixed datrie - Removed the LIGHTEN flag ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
writinwaters	6e862553cb	Docs: Deprecated 'Create session with agent' (#9464) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-08-14 12:13:11 +08:00
Liu An	b55c3d07dc	Test: Update error message assertions for chunk update tests (#9468) ### What problem does this PR solve? Modify test cases to accept additional error message format when updating chunks. fix actions: https://github.com/infiniflow/ragflow/actions/runs/16942741621/job/48015850297 ### Type of change - [x] Update test cases	2025-08-14 12:11:20 +08:00
Liu An	46dc3f1c48	Fix: Update test assertions and add GraphRAG config in dataset tests (#9386) ### What problem does this PR solve? - Modify error message assertion in chunk update test to check for document ownership - Add GraphRAG configuration with `use_graphrag: False` in dataset update tests - Fix actions: https://github.com/infiniflow/ragflow/actions/runs/16863637898/job/47767511582 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-11 17:15:48 +08:00
Zhichang Yu	342a04ec8a	Added infinity rank_feature support (#9044) ### What problem does this PR solve? Added infinity rank_feature support ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-29 09:14:23 +08:00
Liu An	b5ffca332a	Refa: validation utils to use Pydantic v2 style models (#9037) ### What problem does this PR solve? - Update BaseModel to use model_config instead of Config class - Replace StrEnum with Literal types for method fields - Convert Field declarations to Annotated style ### Type of change - [x] Refactoring	2025-07-25 12:16:45 +08:00
Liu An	b4b6d296ea	Fix: Increase timeouts for document parsing and model checks (#8996) ### What problem does this PR solve? - Extended embedding model timeout from 3 to 10 seconds in api_utils.py - Added more time for large file batches and concurrent parsing operations to prevent test flakiness - Import from #8940 - https://github.com/infiniflow/ragflow/actions/runs/16422052652 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-23 15:08:36 +08:00
Liu An	0020c50000	Fix: Refactor parser config handling and add GraphRAG defaults (#8778) ### What problem does this PR solve? - Update `get_parser_config` to merge provided configs with defaults - Add GraphRAG configuration defaults for all chunk methods - Make raptor and graphrag fields non-nullable in ParserConfig schema - Update related test cases to reflect config changes - Ensure backward compatibility while adding new GraphRAG support - #8396 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-23 09:29:37 +08:00
Liu An	f8524462b0	Fix: Increase default `chunk_token_num` from 128 to 512 in parser config (#8753) ### What problem does this PR solve? Updated the default `chunk_token_num` value in `api_utils.py` and `validation_utils.py` to 512 to accommodate larger text chunks. Adjusted corresponding test cases in HTTP and SDK API tests to reflect this change. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-10 09:53:20 +08:00
Yongteng Lei	4d7bfd2ba3	Fix: typo process_duration (#8696) ### What problem does this PR solve? Fix typo process_duration. ### Type of change - [x] Documentation Update - [x] Refactoring	2025-07-07 14:11:47 +08:00
Liu An	dac5bcdf17	Fix: Enforce default embedding model in create_dataset / update_dataset (#8486) ### What problem does this PR solve? Previous: - Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI' - Did not respect user-configured default embedding_model Now: - Correctly prioritizes user-configured default embedding_model Other: - Make embedding_model optional in CreateDatasetReq with proper None handling - Add default embedding model fallback in dataset update when empty - Enhance validation utils to handle None values and string normalization - Update SDK default embedding model to None to match API changes - Adjust related test cases to reflect new validation rules ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 16:41:32 +08:00
Jin Hai	e470645efd	Refactor code (#8341) ### What problem does this PR solve? 1. Rename variables 2. Update if statements ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-18 16:40:30 +08:00
Liu An	a3bebeb599	Fix: Enforce 255-byte filename limit (#8290) ### What problem does this PR solve? - Add filename length validation (<=255 bytes) for document upload/rename in both HTTP and SDK APIs - Update error messages for consistency - Fix comparison operator in SDK from '>=' to '>' for filename length check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-16 16:39:41 +08:00
Liu An	64af09ce7b	Test: Add web API test suite for knowledge base operations (#8254) ### What problem does this PR solve? - Implement RAGFlowWebApiAuth class for web API authentication - Add comprehensive test cases for KB CRUD operations - Set up common fixtures and utilities in conftest.py - Add helper functions in common.py for web API requests The changes establish a complete testing framework for knowledge base management via web API endpoints. ### Type of change - [x] Add test case	2025-06-13 16:39:10 +08:00
Liu An	7fbbc9650d	Fix: Move pagerank field from create to update dataset API (#8217) ### What problem does this PR solve? - Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq - Add pagerank update logic in dataset update endpoint - Update API documentation to reflect changes - Modify related test cases and SDK references #8208 This change makes pagerank a mutable property that can only be set after dataset creation, and only when using Elasticsearch as the doc engine. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:49 +08:00


56 Commits