ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 15:31:05 +08:00

Author	SHA1	Message	Date
Wang Qi	638b59fbcd	Fix handle move file failed (#16384 ) Follow on PR: #16350	2026-06-26 18:46:21 +08:00
Wang Qi	985e3c1db5	Fix document progress not set to fail when embedding model error (#16381 )	2026-06-26 16:11:54 +08:00
Harsh Kashyap	8d3c3f868c	fix(api): validate immutable document fields when value is zero (#16309 )	2026-06-25 19:29:12 +08:00
Harsh Kashyap	49312cace3	fix(api): align use_sql Markdown separator with Source header (#16317 )	2026-06-25 19:00:01 +08:00
Wang Qi	ac9469e5f5	Fix add VLLM without apikey will fail (#16352 )	2026-06-25 17:17:29 +08:00
Idriss Sbaaoui	fb8e5ad4b2	Fix multimodal chat image routing for VLM channel requests (#16343 )	2026-06-25 14:38:29 +08:00
buua436	479a9a715e	feat: unify provider id or name routing (#16336 )	2026-06-25 13:04:21 +08:00
Wang Qi	d0fc75f1bb	Fix when empty response not set, it report: ERROR: 'knowledge' (#16338 )	2026-06-25 13:02:24 +08:00
kpdev	68d2ca0ff1	fix(api): use dataset-owner tenant for legacy /chunks docstore cleanup (#15961 )	2026-06-24 14:24:40 +08:00
Ambercssa	e9cdd09b67	fix(agent): handle different reference data formats (#16276 )	2026-06-24 13:33:59 +08:00
Wang Qi	6046bc6a8e	Fix: handle empty folder when link to datasets (#16296 )	2026-06-24 13:31:32 +08:00
Ju Boxiang	39b194453d	Fix: paginate get_flatted_meta_by_kbs to support datasets with >10k documents (#16034 ) (#16095 )	2026-06-24 13:20:07 +08:00
ちー	5928b8b9ae	fix(document_service): prevent NoneType error on progress_msg.strip() (#16289 ) ### What problem does this PR solve? When I run RAGFlow_server.py: ``` 2026-06-24 10:27:01,938 ERROR 3413485 fetch task exception Traceback (most recent call last): File "/home/infiniflow/Documents/development/ragflow/api/db/services/document_service.py", line 948, in _sync_progress if t.progress_msg.strip(): ^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'strip' ``` fixed: ```python if t.progress_msg.strip(): # fix: if (t.progress_msg or "").strip(): ``` Fix crash in `_sync_progress` when `progress_msg` is `None`. #### Root Cause `progress_msg` from task records can be `None`, causing: ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-24 13:07:40 +08:00
buua436	ba4021a9de	fix: restore dataflow rerun and detail payload (#16292 )	2026-06-24 13:06:06 +08:00
buua436	d5d9d19fbe	fix: keep chat channel bindings consistent (#16274 )	2026-06-24 11:51:35 +08:00
Wang Qi	a4f325be24	Fix: add /v1/document/upload_info -> /api/v1/documents/upload back (#16264 )	2026-06-23 17:47:55 +08:00
buua436	aba5d172bd	feat: add whatsapp web qr chat channel (#16238 ) Adds a WhatsApp chat channel backed by a QR-based web login flow so users can connect without manual token setup.	2026-06-23 17:45:31 +08:00
buua436	b409cfc3d5	feat: add dingtalk chat channel (#16183 ) ### What does this PR do? This PR adds a new DingTalk chat channel integration and hardens the inbound callback path. ### Summary - Adds DingTalk as a selectable chat channel in the UI and backend channel registry. - Adds the DingTalk chat channel icon asset. - Acknowledges DingTalk Stream callbacks and deduplicates repeated inbound messages to avoid duplicate replies.	2026-06-18 20:06:00 +08:00
Wang Qi	5ca1686ac7	Fix that agent cannot be the same name (#16192 )	2026-06-18 19:10:21 +08:00
qinling0210	563d855780	Implement OpenAI chat completions in GO (#16177 ) ### What problem does this PR solve? Implement OpenAI chat completions in GO POST /api/v1/openai/<chat_id>/chat/completions OpenAI chat cli: internal/development.md ### Type of change - [x] Refactoring	2026-06-18 18:07:27 +08:00
buua436	a2de7d0060	fix: chat channel defaults and feishu shutdown (#16176 ) This PR keeps the chat-channel default values and Feishu shutdown behavior consistent after the rebase.	2026-06-18 17:44:48 +08:00
Lynn	47bd9dd049	Fix: replace tenant_llm apis (#16131 ) Replace tenant_llm apis with provider-instance apis.	2026-06-18 16:38:32 +08:00
buua436	ea70663f09	feat: support wecom websocket channel (#16175 ) Added WeCom chat channel websocket mode alongside the existing webhook mode, plus frontend support for selecting the connection type.	2026-06-18 13:10:09 +08:00
buua436	43d121ad38	feat: add qqbot chat channel (#16140 ) ### What problem does this PR solve? Adds qqbot as a built-in chat channel so it can be discovered and started by the channel bootstrapper and shown in the chat channel settings UI. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-17 18:49:38 +08:00
buua436	be869f5d96	fix: chat channel runtime (#16129 ) ### What problem does this PR solve? Fix chat channel message routing to use the connected `chat_id`, and make the Feishu websocket client bind to the thread-local event loop. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 15:52:13 +08:00
buua436	78b4906f7a	fix: tighten embedding truncation threshold (#16123 ) ### What problem does this PR solve? Use a 95% max_length threshold before truncating embedding inputs, which reduces the chance of provider-side invalid-parameter errors on near-limit chunks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 14:18:02 +08:00
euvre	9bd53ce675	fix: return full record in get_ingestion_log (#16120 ) ### What problem does this PR solve? The `get_ingestion_log` endpoint (both Python `dataset_api_service.get_ingestion_log` and Go `DatasetService.GetIngestionLog`) was returning only the dataset-level field set, which omits critical fields such as `dsl`, `document_id`, `parser_id`, `document_name`, `pipeline_id`, etc. This caused the front-end dataflow-result page to be unable to render the pipeline timeline and chunks when viewing a single ingestion log, regardless of whether the log was a dataset-level operation (graph/raptor/mindmap) or a per-file parse. ### Background `PipelineOperationLogService` provides two field sets: \| Method \| Fields \| \|---\|---\| \| `get_dataset_logs_fields` \| Minimal set (progress, status, timestamps, etc.) \| \| `get_file_logs_fields` \| Superset — includes `document_id`, `dsl`, `parser_id`, `document_name`, `pipeline_id`, … \| When listing logs, the API correctly distinguishes dataset-level vs file-level logs and uses the appropriate converter. However, when fetching a single log by ID, both the Python and Go implementations were hardcoded to the dataset-level set, dropping the extra fields that the front-end needs.	2026-06-17 13:03:51 +08:00
euvre	fe46244d30	fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#16106 ) The parser pods suffer from OOM kills when processing large PDF documents. The root cause is in api/db/services/task_service.py: when layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be processed as a single task with all pages loaded into memory simultaneously. This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the same way DeepDOC already does.	2026-06-17 09:33:53 +08:00
Wang Qi	17e3aad7ae	Revert "fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM" (#16104 ) Reverts infiniflow/ragflow#15951	2026-06-16 20:11:45 +08:00
euvre	d2a18d5c46	fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#15951 ) ### What problem does this PR solve? The parser pods suffer from OOM kills when processing large PDF documents. The root cause is in api/db/services/task_service.py: when layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be processed as a single task with all pages loaded into memory simultaneously. This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the same way DeepDOC already does. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [x] Performance Improvement - [ ] Other (please describe):	2026-06-16 20:07:19 +08:00
Wang Qi	8067e97f0d	Refactor: rename /chat_channels to /chat-channels (#16099 )	2026-06-16 19:15:43 +08:00
Kevin Hu	15f50e5cb2	fix: rename dialog_id to chat_id in chat_channel (backend + frontend) (#16096 ) ## Summary - The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id` via a migration (added in a prior commit). - Aligns the REST API layer (`chat_channel_api.py`, `chat_channel_service.py`) to use `chat_id` consistently. - Updates the frontend (`interface.ts`, `hooks.ts`, `connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write `chat_id` instead of `dialog_id`. - The joined `dialog_name` alias in the list query is unchanged (backend still returns it under that name). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-16 19:02:20 +08:00
Lynn	70792de899	Fix: v0.26.1 model provider (#16073 ) ### What problem does this PR solve? Fix: - Pass session_id to langfuse. - Get correct status for add model_type. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 16:21:43 +08:00
Lynn	b4a161b50e	Fix: filter unsupported model_type (#16062 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 13:15:42 +08:00
Kevin Hu	5a817762fa	Refactor: Change table chat_channel status data type. (#16061 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring	2026-06-16 12:02:12 +08:00
buua436	8e235b7b95	fix: add legacy chat/completions mode (#16014 ) ### What problem does this PR solve? Adds a legacy mode for /chat/completions that restores v0.23.0-style output by converting start_to_think/end_to_think back into raw <think></think> markers and streaming cumulative answer text. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 10:34:06 +08:00
Lynn	47495c1f6a	Feat: model provider (#16028 ) ### What problem does this PR solve? Feat: - Allow upsert model_type for instance model Fix: - Allow create instance with duplicate api_key ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-06-15 19:10:33 +08:00
Wang Qi	f6a2075ad0	Fix one data source can be synced to multiple dataset (#16023 ) Test add/delete - worked.	2026-06-15 16:54:25 +08:00
dripsmvcp	53d4d9b3bd	fix(api): return 4xx not 500 when attachment blob is missing (#15509 ) Guard the agent-attachment download against a missing or empty storage blob so the caller gets a structured 4xx (`Document not found!`) instead of an HTTP 500. Same bug class as #15365 on document preview. Resolve #15502	2026-06-15 15:41:49 +08:00
Yingfeng	b5bea72e4b	Add git-like file commit API (#15978 ) ### What problem does this PR solve? \| # \| Method \| Endpoint \| Description \| Git Equivalent \| \|---\|--------\|----------\|-------------\|----------------\| \| 1 \| `POST` \| `/api/v1/{prefix}/{folder_id}/commits` \| Create a snapshot commit with file changes (add/modify/delete/rename) \| `git add` + `git commit` \| \| 2 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits` \| List commit history (paginated) \| `git log` \| \| 3 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` \| Get commit detail with file changes \| `git show` \| \| 4 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` \| List file changes in a commit \| `git show --name-status` \| \| 5 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` \| Compare two commits and return differences \| `git diff` \| \| 6 \| `GET` \| `/api/v1/{prefix}/{folder_id}/changes` \| Get uncommitted changes (add/modify/delete) \| `git status` \| \| 7 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` \| Get the folder tree snapshot at commit time \| `git ls-tree` \| \| 8 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content` \| Get a file's content as it existed in a specific commit \| `git show HEAD:file` \| \| 9 \| `GET` \| `/api/v1/{prefix}/{file_id}/versions` \| Get version history for a specific file across all commits \| `git log -- file` \| Where `{prefix}/{id}` can be: - `folders/{folder_id}` — direct folder access - `workspaces/{workspace_id}` — alias of `folders/{folder_id}` - `datasets/{dataset_id}` — resolves to the dataset's folder - `memories/{memory_id}` — resolves to the memory's folder - `skills/{skill_id}` — resolves to the skill's folder ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-06-15 11:19:56 +08:00
Wang Qi	59d4203947	Fix last login time (#16004 )	2026-06-15 10:06:24 +08:00
Kevin Hu	b5a426e6e0	Feat: chat channels — connect assistants to external messaging bots (#15850 ) ### What problem does this PR solve? #15844 Adds a Chat channels capability so a RAGFlow assistant (Dialog) can be exposed as a bot on external messaging platforms (Feishu/Lark, Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot in the UI, connects it to an assistant, and inbound messages are answered from that assistant's knowledge base — replies are delivered back on the channel. Feishu/Lark is implemented and tested end-to-end. Discord, Telegram, LINE, and WeCom are scaffolded against the same interface; the remaining listed channels are tracked as follow-ups. ### Design Backend - New `chat_channel` table (`tenant_id`, `name`, `channel`, `config` JSON holding `{credential: {...}}`, `dialog_id`, `status`) + `ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`. - Channel framework under `api/channels/`: a `core` registry + per-channel packages that self-register a builder and implement a common `Channel` interface (`start`/`stop`/`send` + inbound normalization) over `IncomingMessage`/`OutgoingMessage`. - Embedded reconcile loop in `ragflow_server` (`api/channels/bootstrap.py`): loads enabled bots, and starts/stops/restarts them as rows change (no server restart needed). Inbound messages run the connected dialog via the non-streaming completion path, keeping per-end-user conversation history. - Missing optional channel SDKs degrade gracefully (channel skipped with a warning; others unaffected). Channel-level errors are logged, not crashed. - Feishu's WebSocket client runs in a dedicated thread with its own event loop to avoid cross-loop/contextvars conflicts with the channel runtime. Frontend - Settings → Chat channels panel: available-channels grid + configured-bots list with add/edit/delete and a Connect assistant popup that binds a bot to a dialog. - Brand icons via simple-icons / reused shared data-source assets, with colored fallbacks for brands not available. - Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix so the settings page no longer highlights the Chat tab. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Notes - DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id` is also covered by a `migrate_db` `alter_db_add_column` for existing installs. - Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`, `line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies. - Screenshots / per-channel credential docs to follow.	2026-06-12 18:21:30 +08:00
Carl Harris	a2de880b6d	fix(profile): enforce profile name validation and input constraints (#15694 ) ### What problem does this PR solve? The Profile Name field currently lacks application-level validation and allows users to save excessively long names and unsupported special characters. While the database enforces a maximum length of 100 characters, neither the frontend nor backend validates nickname format before persistence. This can result in inconsistent user data, poor user experience, and UI layout issues when long names wrap across multiple lines. This PR introduces consistent frontend and backend validation for profile names, enforces length and character constraints, provides clear validation feedback, and prevents invalid values from being saved. Fixes #15693 ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-12 11:13:18 +08:00
Jonathan Chang	de06c9a60b	feat: Langfuse session grouping for multi-turn chat traces (#15679 ) ## Summary This PR passes `session_id` into Langfuse trace observations so multi-turn chat messages can be grouped under the same session in Langfuse. Changes include: - Propagate `session_id` from chat/session APIs into `dialog_service.async_chat`. - Pass `session_id` into Langfuse `start_observation(...)`. - Share Langfuse `trace_context` with chat, embedding, rerank, and TTS model bundles where applicable. - Add unit coverage to verify Langfuse observations receive `session_id`. - Update affected test stubs for the new optional Langfuse context arguments. ## Related Issue Closes: #15636 ## Change Type - [x] Feature - [x] Bug fix - [x] Test - [ ] Refactor - [ ] Documentation - [ ] Breaking change ## Real Behavior Proof Before this change: - Langfuse observations were created without `session_id`. - Multi-turn chat traces could not be grouped by session in Langfuse. After this change: - Chat/session flows pass `session_id` into `async_chat`. - Langfuse observations include `session_id`. - Related model bundles receive shared trace context and session metadata. Validation result: ```bash uv run python -m py_compile \ api/db/services/tenant_llm_service.py \ api/db/services/llm_service.py \ api/db/services/dialog_service.py \ api/db/services/conversation_service.py \ api/apps/restful_apis/chat_api.py \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py ``` Passed. ```bash uv run pytest \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py -q ``` Result: ```text 11 passed in 16.89s ``` ```bash git diff --check ``` Passed. ## Checklist - [x] Analyzed the issue requirement. - [x] Checked existing Langfuse trace integration. - [x] Implemented only the requested session grouping behavior. - [x] Added/updated unit tests. - [x] Ran focused tests successfully. - [x] Ran Python compile validation. - [x] Ran whitespace diff validation.	2026-06-12 10:18:06 +08:00
Lynn	9d5950963b	Fix: get is_tools from model record (#15946 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 17:29:28 +08:00
少卿	9614605bf9	fix: propagate max_tokens from model config to downstream consumers (#15945 ) ## Summary `get_model_config_from_provider_instance()` was not including `max_tokens` in its returned dict, causing all downstream consumers (dialog truncation, message fitting, knowledge base trimming, embedding, graphrag, RAPTOR) to fall back to the hardcoded default of 8192 tokens regardless of the actual model context window size (e.g., GPT-4o 128K, Claude 200K). Closes #15944 ## Root Cause The function builds `model_config` with only: `llm_factory`, `api_key`, `llm_name`, `api_base`, `model_type`, `is_tools`. `max_tokens` is never included. Yet the data exists in four independent sources: 1. `TenantModel.extra` JSON field — written by `provider_api_service.py:659` 2. `conf/llm_factories.json` — every model entry has `max_tokens` 3. `rag/llm/model_meta.py` — 9 provider classes fetch real context windows from APIs 4. `TenantLLM.max_tokens` database column None of them are read by this function. ## Fix Two lines added, one per return path: - Path B (model_obj exists → provider-instance model): reads `max_tokens` from `model_obj.extra` JSON - Path C (fallback → factory config): reads `max_tokens` from `llm_info` (sourced from `llm_factories.json`) Both fall back to 8192 when the value is absent, preserving backward compatibility. ## Impact This single 5-line change fixes the context window budget for all 78+ call sites across 20 files that construct `LLMBundle` or read `max_tokens` from the config dict, including: \| Consumer \| File \| Effect \| \|---\|---\|---\| \| Dialog chat truncation \| `dialog_service.py:562` \| `message_fit_in(msg, max_tokens * 0.95)` now uses real context window \| \| Knowledge base trimming \| `dialog_service.py:752` \| `kb_prompt(kbinfos, max_tokens)` now fits more retrieved content \| \| Agent message fitting \| `agent/component/llm.py:322` \| Agent prompts no longer truncated at 7946 tokens \| \| Embedding truncation \| `task_executor.py:704` \| Embedding input uses actual model limit \| \| GraphRAG extraction \| `graphrag/*/extractor.py` \| Entity extraction gets full context budget \| \| LLM4Tenant.max_length \| `tenant_llm_service.py:513` \| Chat model wrapper exposes real context window \|	2026-06-11 17:24:58 +08:00
balibabu	70ae25fc7b	Fix: Remove the pagination from the search and retrieval pages. (#15942 ) ### What problem does this PR solve? Fix: Remove the pagination from the search and retrieval pages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 16:36:05 +08:00
jaso0n0818	2971849783	fix: guard docStoreConn.delete with index_exist in parse and stop_parsing (#15876 ) ## What problem does this PR solve? Closes #15874 Both the `POST /api/v1/datasets/<dataset_id>/chunks` (re-parse) and `DELETE /api/v1/datasets/<dataset_id>/chunks` (stop-parsing) handlers called `settings.docStoreConn.delete` unconditionally. When the tenant/dataset index has not been created yet — fresh dataset, first parse interrupted before any chunks were indexed, or index manually removed — the delete call throws and the handler returns HTTP 500 after the document state was already mutated (RUNNING with zeroed counters for the parse path; CANCEL with zeroed counters for the stop path), leaving the document in an inconsistent state. The newer `parse_documents` path in `document_api.py` already uses `index_exist` before deleting: ## How to fix? Apply the same `index_exist` guard to both call sites in `chunk_api.py`: - `parse` (POST path, line ~192): guard the delete before `TaskService.filter_delete`. - `stop_parsing` (DELETE path, line ~242): guard the delete after `DocumentService.update_by_id`. Both sites already have the correct `search.index_name(tenant_id)` and `dataset_id` parameters; the guard is a one-line addition at each site. ## Type of change - [x] Bug fix (non-breaking change which fixes an issue) --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 16:30:03 +08:00
bohdansolovie	381091df71	fix(dialog): guard async_ask() against empty or invalid kb_ids (#15530 ) Fixes #15529 . ### Problem `async_ask()` accessed `kbs[0]` without verifying that `KnowledgebaseService.get_by_ids()` returned any knowledge bases. Empty or stale `kb_ids` raised `IndexError`, which surfaced as HTTP 500 on search/bot SSE endpoints. ### Fix - Add an early guard when `kbs` is empty, yielding a final SSE error event (consistent with `gen_mindmap()` in the same module). - Add regression tests for empty `kb_ids` and deleted/invalid KB IDs. ### Test plan - [ ] `pytest test/unit_test/api/db/services/test_dialog_service_final_answer.py -k "async_ask_empty or async_ask_stale"` - [ ] Manual: `POST /api/v1/searchbots/ask` with invalid `kb_ids` returns SSE error, not HTTP 500 --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:52:59 +08:00
kpdev	de18313f97	fix(api): POST /documents/stop removes partial chunks and resets counters (#15789 ) ### What problem does this PR solve? `POST /api/v1/datasets/{dataset_id}/documents/stop` (`stop_parse_documents`) cancels parsing tasks and sets `run` to `CANCEL`, but it does not remove chunks already indexed in the doc store or reset `progress` / `chunk_num`. REST callers can end up with a “cancelled” document that still returns partial chunks in `GET .../chunks` and in retrieval. Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`) already performs full cleanup: it resets counters and calls `docStoreConn.delete`. This PR aligns the newer stop endpoint with that behavior so both paths leave the dataset consistent. Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes - Update `stop_parse_documents` in `document_api.py` to reset `progress` and `chunk_num` to `0` and delete partial chunks via `docStoreConn.delete` after `cancel_all_task_of`. - Add unit test `test_stop_parse_documents_cleans_partial_chunks` to assert counters reset and doc store delete is invoked. ### Test plan - [x] Unit test: `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks -v` - [ ] Manual: upload a slow document, start parse, call `POST .../documents/stop` while `RUNNING`, verify `GET .../chunks` returns zero chunks and UI `chunk_count` is 0 - [ ] Control: legacy `DELETE .../chunks` behavior unchanged --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:51:32 +08:00
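
Note on d2a18d5c46 / fe46244d30 (paginate non-DeepDOC PDF parsing tasks): the idea described in those commit messages, splitting one large PDF into bounded page ranges so that each task loads only a slice of the document, might be sketched roughly as below. This is an illustration only, not the code in `api/db/services/task_service.py`; the helper name `split_page_ranges` and the page size of 12 are assumptions made for the example.

```python
def split_page_ranges(total_pages: int, page_size: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into half-open (from_page, to_page) ranges.

    Hypothetical helper: one parsing task would be created per range, so no
    single task has to hold more than `page_size` pages in memory.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        (start, min(start + page_size, total_pages))
        for start in range(0, total_pages, page_size)
    ]


# A 25-page PDF with an assumed page size of 12 yields three tasks.
ranges = split_page_ranges(25, 12)

# With a page size as large as 100 million, the whole PDF collapses into one
# task, which is the situation the commit message identifies as the OOM cause.
single = split_page_ranges(25, 100_000_000)
```

With a very large page size the helper degenerates to a single range covering the entire document, which mirrors the behavior the fix removes.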


1777 Commits
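
Note on 5928b8b9ae (NoneType error on `progress_msg.strip()`): the guard quoted in that commit message, `(t.progress_msg or "").strip()`, can be tried in isolation with the sketch below. The `Task` dataclass and `collect_progress_messages` are hypothetical stand-ins for the task records and the loop in `_sync_progress`; only the guard expression itself comes from the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Stand-in for a task record; progress_msg may be None, as the commit notes.
    progress_msg: Optional[str] = None


def collect_progress_messages(tasks: list[Task]) -> list[str]:
    """Return the non-empty, stripped progress messages.

    `(t.progress_msg or "")` replaces None with an empty string before
    `.strip()` is called, so a missing message is skipped instead of raising
    AttributeError.
    """
    messages = []
    for t in tasks:
        msg = (t.progress_msg or "").strip()
        if msg:
            messages.append(msg)
    return messages


msgs = collect_progress_messages([Task(None), Task("  done "), Task("")])
```

The same `value or default` pattern treats both `None` and the empty string as "no message", which matches the truthiness check in the original `if` statement.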