ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Jin Hai	fa75aee3b9	Refactor system API (#13958 ) ### What problem does this PR solve? - ping - token - log level ### Type of change - [x] Refactoring <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Refactor * System endpoints consolidated under /api/v1/system: ping, health check, and token management moved to the centralized API surface. * Token management unified at /api/v1/system/tokens with list/create/delete behavior. * Documentation * API reference updated to reflect the new /api/v1/system paths. * Tests * Client fixtures and test utilities updated to use /api/v1/system/tokens; one unit test for health/oceanbase status removed. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-08 15:26:18 +08:00
Jin Hai	ad789f5c43	Fix list files (#13960 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Bug Fixes * Standardized the query parameter used when listing documents so listings behave consistently across the web and client interfaces. * Clarified the error message shown when a required dataset ID is missing to give clearer guidance to users. * Tests * Updated test coverage to reflect the standardized dataset identifier usage. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-08 13:38:30 +08:00
dataCenter430	62a1333cf2	Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940 ) … ### What problem does this PR solve? Closes #13857 Parent-child chunking was introduced in v0.23.0 but is only configurable through the web UI. Users managing datasets programmatically cannot enable it via the HTTP API or Python SDK because `ParserConfig` uses `extra="forbid"`, rejecting the `children_delimiter` field at validation. ### What does this PR change? Adds a `parent_child` nested config to `ParserConfig`, following the same pattern as `raptor` and `graphrag`: ```json "parser_config": { "parent_child": { "use_parent_child": true, "children_delimiter": "\n" } } ``` - api/utils/validation_utils.py — new ParentChildConfig model, added to ParserConfig - api/utils/api_utils.py — naive defaults + flatten to children_delimiter for the execution layer - api/apps/services/dataset_api_service.py — flatten on the update path - test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG - test/testcases/test_http_api/test_dataset_management/test_create_dataset.py — 4 valid + 2 invalid test cases No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py). Existing UI flow via ext is unaffected. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added parent-child chunking configuration for dataset creation and updates with new `use_parent_child` toggle and customizable `children_delimiter` setting to specify how parent chunks are split into child chunks. * Documentation * Updated HTTP and Python API references with parent-child chunking configuration details and examples. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-04-08 11:36:57 +08:00
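The "flatten to children_delimiter" step mentioned in #13940 might look roughly like the sketch below. It is only an illustration: the function name `flatten_parent_child`, the choice to omit `children_delimiter` when the toggle is off, and the `"\n"` fallback are assumptions, not the code in `api/utils/api_utils.py`.

```python
def flatten_parent_child(parser_config: dict) -> dict:
    """Turn the nested parent_child block into a flat children_delimiter key.

    Hypothetical helper; the real flattening logic may differ.
    """
    # Copy every key except the nested block so the input is not mutated.
    flat = {k: v for k, v in parser_config.items() if k != "parent_child"}
    nested = parser_config.get("parent_child") or {}
    if nested.get("use_parent_child"):
        # Assumed fallback delimiter when none is given.
        flat["children_delimiter"] = nested.get("children_delimiter", "\n")
    return flat
```

With this shape the execution layer keeps reading a single flat key, which matches the commit's statement that `rag/app/naive.py` and `rag/nlp/search.py` were left unchanged.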
MkDev11	cfee2bc9db	feat: Auto-adjust chunk recall weights based on user feedback (#12689 ) ### What problem does this PR solve? Implements automatic adjustment of knowledge base chunk recall weights based on user feedback (upvotes/downvotes). When users upvote or downvote a response, the system locates the corresponding knowledge snippets and adjusts their recall weight to improve future retrieval quality. Closes #12670 How it works: 1. User upvotes/downvotes a response via `POST /thumbup` 2. System extracts chunk IDs from the conversation reference 3. For each referenced chunk: - Reads current `pagerank_fea` value from document store - Increments (+1) for upvote or decrements (-1) for downvote - Clamps weight to [0, 100] range - Updates chunk in ES/Infinity/OceanBase 4. Future retrievals score these chunks higher/lower based on accumulated feedback Files changed: - `api/db/services/chunk_feedback_service.py` - New service for updating chunk pagerank weights - `api/apps/conversation_app.py` - Integrated feedback service into thumbup endpoint - `test/testcases/test_web_api/test_chunk_feedback/` - Unit tests ### Type of change - [x] New Feature (non-breaking change which adds functionality) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Chat message feedback now updates per-chunk relevance weights (feature-flag gated), with configurable weighting and atomic updates across storage backends. * Bug Fixes * Stricter validation for message feedback inputs and more robust handling of feedback transitions. * Tests * Expanded test coverage for chunk-feedback behavior, weighting strategies, storage backends, and thumb-flip scenarios. * Chores * CI workflow extended to run the new chunk-feedback web API tests. 
<!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>	2026-04-08 09:52:18 +08:00
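The per-chunk weight update described in #12689 can be sketched as follows. Only the +1/-1 step and the [0, 100] clamp come from the commit message; the function name `adjust_weight` and the constant names are hypothetical, and the read/write against ES, Infinity or OceanBase is left out.

```python
MIN_WEIGHT = 0    # lower clamp from the commit message
MAX_WEIGHT = 100  # upper clamp from the commit message


def adjust_weight(current: int, upvote: bool) -> int:
    """Return the new pagerank_fea value after one feedback event."""
    step = 1 if upvote else -1
    # Clamp so repeated votes cannot push the weight out of range.
    return max(MIN_WEIGHT, min(MAX_WEIGHT, current + step))
```

Because the result is clamped, a downvote on a chunk already at 0 and an upvote on a chunk already at 100 leave the weight unchanged.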
Jin Hai	931021875a	Refactor system/version API to RESTful style (#13956 ) ### What problem does this PR solve? Refactor version API to RESTful style. Python and go server API also updated. ### Type of change - [x] Refactoring <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * Refactor * Migrated core API endpoints to the `/api/v1/` namespace for improved consistency and organization. * Standardized system version, search, and chat list endpoints under the new API versioning structure. * New Features * Added MinIO region configuration support, allowing specification of storage engine regional settings via environment variables or configuration files. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-07 19:07:47 +08:00
Jack	c4b0aaa874	Fix: #6098 - Add validation logic for parser_config when update document (#13911 ) ### What problem does this PR solve? Add validation logic for parser_config and refactor the processing flow. Before this change, validation and update logic were interleaved: some validation ran, then some update logic, then another round of validation followed by update. After this change, all validation runs first, and update logic runs only after all validation has passed. Parameters that come from the front end are validated with Pydantic; validation that depends on data from the DB lives in separate methods. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-04-07 11:33:05 +08:00
qinling0210	49386bc1b5	Implement UpdateDataset and UpdateMetadata in GO (#13928 ) ### What problem does this PR solve? Implement UpdateDataset and UpdateMetadata in Go. Add CLI commands: UPDATE CHUNK <chunk_id> OF DATASET <dataset_name> SET <update_fields>; REMOVE TAGS 'tag1', 'tag2' from DATASET 'dataset_name'; SET METADATA OF DOCUMENT <doc_id> TO <meta> ### Type of change - [ ] Refactoring	2026-04-07 09:44:51 +08:00
Magicbook1108	69264b3a70	Feat: Refact pipeline (#13826 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 19:26:45 +08:00
Yongteng Lei	b7daf6285b	Refa: Chat conversations /conversation API to RESTFul (#13893 ) ### What problem does this PR solve? Refactor the chat conversations /conversation API to RESTFul. ### Type of change - [x] Refactoring	2026-04-02 20:49:23 +08:00
Idriss Sbaaoui	ee1bb8a8b5	Fix: overlapping document parse race that can clear chunks (#13900 ) ### What problem does this PR solve? This PR fixes a race in batch document parsing where overlapping parse requests for the same document could clear/rewrite chunk state and make previously parsed content appear lost. It adds an atomic per-document parse guard so only one parse can run at a time for that document (Fixes #13864 ). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-02 18:50:56 +08:00
Yongteng Lei	b622c47ed6	Refa: Chats /chat API to RESTFul (#13881 ) ### What problem does this PR solve? Refactor Chats /chat API to RESTFul. ### Type of change - [x] Refactoring	2026-04-01 20:10:37 +08:00
Liu An	b1d28b5898	Revert "Refa: Chats /chat API to RESTFul (#13871 )" (#13877 ) ### What problem does this PR solve? This reverts commit `1a608ac411`. ### Type of change - [x] Other (please describe):	2026-04-01 11:05:29 +08:00
Yongteng Lei	1a608ac411	Refa: Chats /chat API to RESTFul (#13871 ) ### What problem does this PR solve? Chats /chat API to RESTFul. ### Type of change - [x] Refactoring	2026-04-01 10:50:22 +08:00
Paul Y Hui	3e702c6265	fix: guard against missing/malformed Authorization header in apikey_required (#13860 ) ### What problem does this PR solve? Previously, `apikey_required` called `request.headers.get('Authorization').split()[1]` without checking for None or insufficient parts, causing an unhandled AttributeError or IndexError (500) instead of a proper 403 JSON response. This applies the same guarding pattern already used by `token_required` in the same file. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-03-31 15:25:00 +08:00
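The guard added in #13860 can be illustrated with a small standalone helper. This is a sketch under assumptions: `extract_api_token` is a hypothetical name, and the actual decorator works on the framework's request object and returns a 403 JSON response where this helper returns `None`.

```python
from typing import Optional


def extract_api_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token part of an Authorization header, or None if unusable."""
    if not authorization:
        # Header missing: the unguarded code raised AttributeError here.
        return None
    parts = authorization.split()
    if len(parts) < 2:
        # Header has no second part: the unguarded code raised IndexError here.
        return None
    return parts[1]
```

A caller can then map `None` to a 403 response instead of letting the exception surface as a 500.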
Zhichang Yu	0d85a8e7aa	feat: add dynamic log level adjustment APIs (#13850 ) Add REST APIs to dynamically query and modify log levels at runtime for both Python (Flask) and Go servers. Changes: - common/log_utils.py: add set_log_level() and get_log_levels() functions - admin/server/routes.py: add GET/PUT /api/v1/admin/log_levels endpoints - api/apps/system_app.py: add GET/PUT /api/{version}/system/log_levels endpoints - internal/logger/logger.go: add GetLevel() and SetLevel() with atomic level support - internal/handler/system.go: add GetLogLevel, SetLogLevel, Health handlers - internal/router/router.go: route /health to systemHandler - internal/admin/handler.go: add GetLogLevel, SetLogLevel handlers - internal/admin/router.go: add /api/v1/admin/log_level routes ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 18:40:58 +08:00
Heyang Wang	641b319647	feat: support reading tags via API (#12891 ) (#13732 ) ### What problem does this PR solve? Enable reading Tag Set tags via API (expose tag_kwd field), so that the list chunks query result includes the tags. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-29 20:17:01 +08:00
Idriss Sbaaoui	3b1e77a6d4	Fix: shared KB embedding authorization for team members (#13809 ) ### What problem does this PR solve? fixes issue #13799 where team members get model not authorized when running RAG on an admin-shared knowledge base after the admin changes the KB embedding model (for example to bge-m3). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-26 21:01:07 +08:00
Lynn	8d4a3d0dfe	Fix: create dataset with chunk_method or pipeline (#13814 ) ### What problem does this PR solve? Allow create datasets with parse_type == 1/None and chunk_method, or parse_type == 2 and pipeline_id. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-26 20:43:53 +08:00
Lynn	6a4a9debd2	Fix: allow create dataset with resume chunk_method (#13798 ) ### What problem does this PR solve? Allow create dataset with resume chunk_method. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-26 19:06:51 +08:00
Syed Shahmeer Ali	ff92b5575b	Fix: /file2document/convert blocks event loop on large folders causing 504 timeout (#13784 ) Problem The /file2document/convert endpoint ran all file lookups, document deletions, and insertions synchronously inside the request cycle. Linking a large folder (~1.7GB with many files) caused 504 Gateway Timeout because the blocking DB loop held the HTTP connection open for too long. Fix - Extracted the heavy DB work into a plain sync function _convert_files - Inputs are validated and folder file IDs expanded upfront (fast path) - The blocking work is dispatched to a thread pool via get_running_loop().run_in_executor() and the endpoint returns 200 immediately - Frontend only checks data.code === 0 so the response change (file2documents list → True) has no impact Fixes #13781 --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 16:45:10 +08:00
Yongteng Lei	d19ca71b43	Refa: Searches /search API to RESTFul (#13770 ) ### What problem does this PR solve? Searches /search API to RESTFul ### Type of change - [x] Documentation Update - [x] Refactoring Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-26 01:07:41 +08:00
Jin Hai	24fcd6bbc7	Update CI (#13774 ) ### What problem does this PR solve? CI isn't stable, try to fix it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-25 18:17:52 +08:00
Yongteng Lei	1b29522279	Fix: migrate_add_unique_email silently skips unique constraint (#13744 ) ### What problem does this PR solve? Fix migrate_add_unique_email-silently-skips-unique-constraint-when-non-unique-user_email-index-exists. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-24 20:24:24 +08:00
Yongteng Lei	3d10e2075c	Refa: files /file API to RESTFul style (#13741 ) ### What problem does this PR solve? Files /file API to RESTFul style. ### Type of change - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-24 19:24:41 +08:00
Idriss Sbaaoui	df2cc32f51	Fix: dataset settings save (#13745 ) ### What problem does this PR solve? Saving dataset settings failed with validation error 101 (Extra inputs are not permitted) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-23 17:46:41 +08:00
Yongteng Lei	dd839f30e8	Fix: code supports matplotlib (#13724 ) ### What problem does this PR solve? Code as "final" node: ![img_v3_02vs_aece4caf-8403-4939-9e68-9845a22c2cfg](https://github.com/user-attachments/assets/9d87b8df-da6b-401c-bf6d-8b807fe92c22) Code as "mid" node: ![img_v3_02vv_f74f331f-d755-44ab-a18c-96fff8cbd34g](https://github.com/user-attachments/assets/c94ef3f9-2a6c-47cb-9d2b-19703d2752e4) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-20 20:32:00 +08:00
Lynn	4bb1acaa5b	Refactor: dataset / kb API to RESTFul style (#13690 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-19 14:41:36 +08:00
NeedmeFordev	c3f79dbcb0	fix(jira): prevent missed incremental updates after issue edits (#13674 ) ### What problem does this PR solve? Fixes [#13505](https://github.com/infiniflow/ragflow/issues/13505): Jira incremental sync could miss updated issues after initial sync, especially near time boundaries. Root cause: - Jira JQL uses minute-level precision for `updated` filters. - Incremental windows had no overlap buffer, so boundary updates could be skipped. - Sync log cursor tracking used a backward-facing update for `poll_range_start`. - Existing-doc updates in `upload_document` lacked a KB ownership guard for doc-id collisions. What changed: - Added Jira incremental overlap buffer (`time_buffer_seconds`, defaulting to `JIRA_SYNC_TIME_BUFFER_SECONDS`) when building JQL lower-bound time. - Preserved second-level post-filtering to avoid duplicate reprocessing while still catching boundary updates. - Improved Jira sync logging to include start/end window and overlap configuration. - Updated sync cursor tracking in `increase_docs` to keep `poll_range_start` moving forward with max update time. - Added KB ID safety check before updating existing document records in `upload_document`. Verification performed: - Python syntax compile checks passed for modified files. - Manual verification flow: 1. Run full Jira sync. 2. Edit an already-indexed Jira issue. 3. Run next incremental sync. 4. Confirm updated content is re-ingested into KB. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 23:31:05 +08:00
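The overlap buffer from #13674 can be sketched as below. The function names and the exact post-filter rule are assumptions; the commit message only states that JQL `updated` filters have minute-level precision, that a `time_buffer_seconds` overlap is subtracted from the lower bound, and that a second-level post-filter avoids reprocessing.

```python
from datetime import datetime, timedelta


def build_updated_clause(poll_start: datetime, time_buffer_seconds: int) -> str:
    """Build a JQL lower bound that starts slightly before the sync window."""
    buffered = poll_start - timedelta(seconds=time_buffer_seconds)
    # JQL compares at minute precision, so seconds are dropped here.
    return 'updated >= "{}"'.format(buffered.strftime("%Y-%m-%d %H:%M"))


def keep_issue(issue_updated: datetime, poll_start: datetime) -> bool:
    """Second-level post-filter (assumed rule): skip issues older than the window."""
    return issue_updated >= poll_start
```

The query deliberately over-fetches by the buffer, and the post-filter then discards issues that fall before the real window start.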
Daniil Sivak	60ad32a0c2	Feat: support epub parsing (#13650 ) Closes #1398 ### What problem does this PR solve? Adds native support for EPUB files. EPUB content is extracted in spine (reading) order and parsed using the existing HTML parser. No new dependencies required. ### Type of change - [x] New Feature (non-breaking change which adds functionality) To check this parser manually: ```python uv run --python 3.12 python -c " from deepdoc.parser import EpubParser with open('$HOME/some_epub_book.epub', 'rb') as f: data = f.read() sections = EpubParser()(None, binary=data, chunk_token_num=512) print(f'Got {len(sections)} sections') for i, s in enumerate(sections[:5]): print(f'\n--- Section {i} ---') print(s[:200]) " ```	2026-03-17 20:14:06 +08:00
Idriss Sbaaoui	1399c60164	fix builtin model fail when parsing (#13657 ) ### What problem does this PR solve? Using a builtin model when parsing raised an error because the code expects fid==builtin, while split_model_name_and_factory returns id=None. This PR allows the model to be accepted whether with or without @Builtin. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-17 19:38:54 +08:00
balibabu	6cae364ac2	Feat: Export Agent Logs. (#13658 ) ### What problem does this PR solve? Feat: Export Agent Logs. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-17 18:51:26 +08:00
Yongteng Lei	ca6c3218c3	Refa: follow-up expose agent structured outputs in non-stream completions (#13524 ) ### What problem does this PR solve? Follow-up expose agent structured outputs in non-stream completions #13389. ### Type of change - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-17 17:11:27 +08:00
Jin Hai	986dcf1cc8	Revert "Refactor: dataset / kb API to RESTFul style" (#13646 ) Reverts infiniflow/ragflow#13619	2026-03-17 12:09:48 +08:00
Lynn	1db5409d82	Refactor: dataset / kb API to RESTFul style (#13619 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-16 22:51:34 +08:00
Yongteng Lei	af7e24ba8c	Feat: add_chunk supports add image (#13629 ) ### What problem does this PR solve? Add_chunk supports add image. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-16 20:15:36 +08:00
Jin Hai	a2d72202cf	Revert "Refactor dataset / kb API to RESTFul style" (#13614 ) Reverts infiniflow/ragflow#13263	2026-03-16 10:44:38 +08:00
Lynn	7c32e206be	Refactor dataset / kb API to RESTFul style (#13263 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-13 20:02:35 +08:00
Magicbook1108	161659becc	Fix: model selection rule in get_model_config_by_type_and_name (#13569 ) ### What problem does this PR solve? Fix: model selection rule in get_model_config_by_type_and_name ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-13 19:46:13 +08:00
balibabu	717f1f1362	Feat: Modify the style of the release confirmation box. (#13542 ) ### What problem does this PR solve? Feat: Modify the style of the release confirmation box. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: balibabu <assassin_cike@163.com> Co-authored-by: 6ba3i <isbaaoui09@gmail.com>	2026-03-13 16:31:17 +08:00
Lynn	02070bab2a	Feat: record user_id in memory (#13585 ) ### What problem does this PR solve? Get user_id from canvas and record it. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-13 15:38:35 +08:00
Ethan T.	71804bf5bc	fix(db_models): guard MySQL-specific SQL in migration with DB_TYPE check (fixes #13544 ) (#13582 ) ## Summary Fixes #13544: PostgreSQL startup crash because `update_tenant_llm_to_id_primary_key()` unconditionally uses MySQL-specific SQL. - Split `update_tenant_llm_to_id_primary_key()` into `_update_tenant_llm_to_id_primary_key_mysql()` and `_update_tenant_llm_to_id_primary_key_postgres()`, dispatching on `settings.DATABASE_TYPE` - MySQL path: unchanged (existing `DATABASE()`, `SET @row = 0`, `AUTO_INCREMENT`, `DROP PRIMARY KEY` logic) - PostgreSQL path: uses `current_database()`, `ROW_NUMBER() OVER (ORDER BY ...)` for sequential IDs, `CREATE SEQUENCE` + `nextval()` for auto-increment, and `information_schema.table_constraints` to find the PK constraint name - Also fix `migrate_add_unique_email()`: MySQL-only `information_schema.statistics` is replaced with `pg_indexes` on PostgreSQL ## Test plan - [ ] Start RAGFlow with `DB_TYPE=postgres` — startup should complete without `function database() does not exist` error - [ ] Start RAGFlow with `DB_TYPE=mysql` (default) — existing behaviour unchanged, migration runs as before - [ ] Fresh PostgreSQL install: verify `tenant_llm.id` column is created as a serial primary key after migration - [ ] Idempotency: running migration twice on PostgreSQL should be a no-op (column already exists check passes) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: gambletan <gambletan@github> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 11:53:01 +08:00
qinling0210	1be07a0a34	Fix "Result window is too large" during meta data search (#13521 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13210#issuecomment-3982878498 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-12 18:59:56 +08:00
Jinghan Xu	f6b06fab72	Fix: allow document parsing status recovery after transient errors (#13341 ) ### What problem does this PR solve? Fixes #13285 When an LLM returns a transient error (e.g. overloaded) during parsing, the task progress is set to -1. Previously, the progress could never be updated again, leaving the document permanently stuck in FAIL status even after the task successfully recovered and completed. Three coordinated changes address this: 1. task_service.update_progress: relax the progress update guard to accept prog >= 1 even when current progress is -1, so a task that recovers from a transient failure can report completion. 2. document_service.get_unfinished_docs: include documents that are marked FAIL (progress == -1) but still have at least one non-failed task (task.progress >= 0) in the polling set, so their status can be re-synced once a task recovers. Documents where all tasks have permanently failed are excluded to avoid unnecessary polling. 3. document_service.update_progress: explicitly set document status to RUNNING when not all tasks have finished, instead of preserving whatever stale status (potentially FAIL) the document previously had. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-12 18:02:12 +08:00
Josh	a353c7bdd7	Fix: avoid empty doc filter in knowledge retrieval (#13484 ) ## Summary Fix knowledge-base chat retrieval when no individual document IDs are selected. ## Root Cause `async_chat()` initialized `doc_ids` as an empty list when the request did not explicitly select documents. That empty list was then forwarded into retrieval as an active `doc_id` filter, effectively becoming `doc_id IN []` and suppressing all chunk matches. ## Changes - treat missing selected document IDs as `None` instead of `[]` - keep explicit document filtering when IDs are actually provided - add regression coverage for the shared chat retrieval path ## Validation - `python3 -m py_compile api/db/services/dialog_service.py test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - `.venv/bin/python -m pytest test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - manually verified that chat completions again inject retrieved knowledge into the prompt --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-12 16:03:30 +08:00
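The root cause in #13484 is the difference between "no filter" and "filter on an empty set". A minimal sketch of that distinction, with hypothetical names `normalize_doc_ids` and `filter_chunks` and chunks modelled as plain dicts:

```python
from typing import Optional, Sequence


def normalize_doc_ids(selected: Optional[Sequence[str]]) -> Optional[list]:
    """Treat a missing or empty selection as "no filter" (None), not as []."""
    return list(selected) if selected else None


def filter_chunks(chunks: list, doc_ids: Optional[Sequence[str]]) -> list:
    """Apply a doc_id filter only when IDs were actually provided."""
    if doc_ids is None:
        return list(chunks)
    allowed = set(doc_ids)
    # An empty list here would behave like `doc_id IN []` and match nothing.
    return [c for c in chunks if c["doc_id"] in allowed]
```

Passing `[]` straight into `filter_chunks` reproduces the bug, while normalizing it first restores retrieval across all documents.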
Yongteng Lei	375a910bcf	Fix: add deadlock retry (#13552 ) ### What problem does this PR solve? Add deadlock retry. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-12 12:39:01 +08:00
Yongteng Lei	e1b632a7bb	Feat: add delete all support for delete operations (#13530 ) ### What problem does this PR solve? Add delete all support for delete operations. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-12 09:47:42 +08:00
Ethan T.	1cee8b1a7b	fix: use context managers for file handles to prevent resource leaks (#13514 ) ## Summary - Convert bare `open()` calls to `with` context managers or `Path.read_text()` - File handles leak if not properly closed, especially on exceptions - Fixes in crypt.py, sequence2txt_model.py, term_weight.py, deepdoc/vision/__init__.py ## Test plan - [x] File operations work correctly with context managers - [x] Resources properly cleaned up on exceptions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 16:47:06 +08:00
qinling0210	1815f5950b	Call get_flatted_meta_by_kbs in dify retrieval (#13509 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13388 Call get_flatted_meta_by_kbs in dify retrieval. Remove get_meta_by_kbs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-11 13:42:24 +08:00
Josh	2d2d3cdbcf	Fix document metadata loading for paged listings (#13515 ) ## Summary - scope normal document-list metadata lookups to the current page's document IDs - keep the `return_empty_metadata=True` path dataset-wide because it needs full knowledge of docs that already have metadata - add unit tests for both paged listing paths and the unchanged empty-metadata behavior ## Why `DocumentService.get_list()` and the normal `get_by_kb_id()` path were calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`, which loads metadata for the entire dataset on every page request. That becomes especially problematic on large datasets. The metadata scan path paginates through the full metadata index without an explicit sort, while the ES helper only switches to `search_after` beyond `10000` results when a sort is present. In practice this can lead to unnecessary full-dataset metadata work, slower document-list loading, and unreliable `meta_fields` in list responses for large KBs. This change keeps the existing empty-metadata filter behavior intact, but scopes normal list responses to metadata for the current page only.	2026-03-11 13:42:16 +08:00
yzy	07c9cf6cbe	Fix: return structured JSON output for non-streaming agent API (#13389 ) ### What problem does this PR solve? Previously, when an Agent component was configured with structured output, the non-streaming /agents/{agent_id}/completions API never returned the structured field in its response. The root cause: the non-streaming code path only collected message events to build full_content, then returned the workflow_finished payload — which only contains the output of the last component in the execution path (typically a Message component). Any structured output set by upstream components (e.g., Agent or LLM) was silently discarded. This PR fixes the non-streaming handler to iterate node_finished events and collect structured output from intermediate components. If any component produced a non-empty structured value, it is included in the final response under data.structured. The streaming path is unaffected, as it already exposes node_finished events to the caller. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 19:22:04 +08:00
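The fix in #13389 amounts to scanning `node_finished` events for structured output instead of relying on the final `workflow_finished` payload. The sketch below assumes a simplified event shape (`event`, `data.component_id`, `data.outputs.structured`) that is hypothetical and may not match the real event schema; `collect_structured` is likewise an illustrative name.

```python
def collect_structured(events: list) -> dict:
    """Gather non-empty structured outputs from intermediate components."""
    structured = {}
    for ev in events:
        if ev.get("event") != "node_finished":
            continue  # message and workflow_finished events are ignored here
        data = ev.get("data") or {}
        value = (data.get("outputs") or {}).get("structured")
        if value:
            # Keyed by component so upstream Agent/LLM outputs are not lost.
            structured[data.get("component_id")] = value
    return structured
```

A non-streaming handler could attach the returned dict to the response as `data.structured` whenever it is non-empty.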


1450 Commits