ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Wang Qi	8067e97f0d	Refactor: rename /chat_channels to /chat-channels (#16099 )	2026-06-16 19:15:43 +08:00
Kevin Hu	15f50e5cb2	fix: rename dialog_id to chat_id in chat_channel (backend + frontend) (#16096 ) ## Summary - The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id` via a migration (added in a prior commit). - Aligns the REST API layer (`chat_channel_api.py`, `chat_channel_service.py`) to use `chat_id` consistently. - Updates the frontend (`interface.ts`, `hooks.ts`, `connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write `chat_id` instead of `dialog_id`. - The joined `dialog_name` alias in the list query is unchanged (backend still returns it under that name). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-16 19:02:20 +08:00
Lynn	b4a161b50e	Fix: filter unsupported model_type (#16062 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 13:15:42 +08:00
Kevin Hu	5a817762fa	Refactor: Change table chat_channel status data type. (#16061 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring	2026-06-16 12:02:12 +08:00
buua436	8e235b7b95	fix: add legacy chat/completions mode (#16014 ) ### What problem does this PR solve? Adds a legacy mode for /chat/completions that restores v0.23.0-style output by converting start_to_think/end_to_think back into raw <think></think> markers and streaming cumulative answer text. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 10:34:06 +08:00
Lynn	47495c1f6a	Feat: model provider (#16028 ) ### What problem does this PR solve? Feat: - Allow upsert model_type for instance model Fix: - Allow create instance with duplicate api_key ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-06-15 19:10:33 +08:00
dripsmvcp	53d4d9b3bd	fix(api): return 4xx not 500 when attachment blob is missing (#15509 ) Guard the agent-attachment download against a missing or empty storage blob so the caller gets a structured 4xx (`Document not found!`) instead of an HTTP 500. Same bug class as #15365 on document preview. Resolve #15502	2026-06-15 15:41:49 +08:00
Yingfeng	b5bea72e4b	Add git-like file commit API (#15978 ) ### What problem does this PR solve?
| # | Method | Endpoint | Description | Git Equivalent |
|---|--------|----------|-------------|----------------|
| 1 | `POST` | `/api/v1/{prefix}/{folder_id}/commits` | Create a snapshot commit with file changes (add/modify/delete/rename) | `git add` + `git commit` |
| 2 | `GET` | `/api/v1/{prefix}/{folder_id}/commits` | List commit history (paginated) | `git log` |
| 3 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` | Get commit detail with file changes | `git show` |
| 4 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` | List file changes in a commit | `git show --name-status` |
| 5 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` | Compare two commits and return differences | `git diff` |
| 6 | `GET` | `/api/v1/{prefix}/{folder_id}/changes` | Get uncommitted changes (add/modify/delete) | `git status` |
| 7 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` | Get the folder tree snapshot at commit time | `git ls-tree` |
| 8 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content` | Get a file's content as it existed in a specific commit | `git show HEAD:file` |
| 9 | `GET` | `/api/v1/{prefix}/{file_id}/versions` | Get version history for a specific file across all commits | `git log -- file` |
Where `{prefix}/{id}` can be:
- `folders/{folder_id}`: direct folder access
- `workspaces/{workspace_id}`: alias of `folders/{folder_id}`
- `datasets/{dataset_id}`: resolves to the dataset's folder
- `memories/{memory_id}`: resolves to the memory's folder
- `skills/{skill_id}`: resolves to the skill's folder
### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-06-15 11:19:56 +08:00
Wang Qi	59d4203947	Fix last login time (#16004 ) Fix last login time	2026-06-15 10:06:24 +08:00
Kevin Hu	b5a426e6e0	Feat: chat channels — connect assistants to external messaging bots (#15850 ) ### What problem does this PR solve? #15844 Adds a Chat channels capability so a RAGFlow assistant (Dialog) can be exposed as a bot on external messaging platforms (Feishu/Lark, Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot in the UI, connects it to an assistant, and inbound messages are answered from that assistant's knowledge base — replies are delivered back on the channel. Feishu/Lark is implemented and tested end-to-end. Discord, Telegram, LINE, and WeCom are scaffolded against the same interface; the remaining listed channels are tracked as follow-ups. ### Design Backend - New `chat_channel` table (`tenant_id`, `name`, `channel`, `config` JSON holding `{credential: {...}}`, `dialog_id`, `status`) + `ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`. - Channel framework under `api/channels/`: a `core` registry + per-channel packages that self-register a builder and implement a common `Channel` interface (`start`/`stop`/`send` + inbound normalization) over `IncomingMessage`/`OutgoingMessage`. - Embedded reconcile loop in `ragflow_server` (`api/channels/bootstrap.py`): loads enabled bots, and starts/stops/restarts them as rows change (no server restart needed). Inbound messages run the connected dialog via the non-streaming completion path, keeping per-end-user conversation history. - Missing optional channel SDKs degrade gracefully (channel skipped with a warning; others unaffected). Channel-level errors are logged, not crashed. - Feishu's WebSocket client runs in a dedicated thread with its own event loop to avoid cross-loop/contextvars conflicts with the channel runtime. Frontend - Settings → Chat channels panel: available-channels grid + configured-bots list with add/edit/delete and a Connect assistant popup that binds a bot to a dialog. 
- Brand icons via simple-icons / reused shared data-source assets, with colored fallbacks for brands not available. - Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix so the settings page no longer highlights the Chat tab. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Notes - DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id` is also covered by a `migrate_db` `alter_db_add_column` for existing installs. - Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`, `line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies. - Screenshots / per-channel credential docs to follow.	2026-06-12 18:21:30 +08:00
Carl Harris	a2de880b6d	fix(profile): enforce profile name validation and input constraints (#15694 ) ### What problem does this PR solve? The Profile Name field currently lacks application-level validation and allows users to save excessively long names and unsupported special characters. While the database enforces a maximum length of 100 characters, neither the frontend nor backend validates nickname format before persistence. This can result in inconsistent user data, poor user experience, and UI layout issues when long names wrap across multiple lines. This PR introduces consistent frontend and backend validation for profile names, enforces length and character constraints, provides clear validation feedback, and prevents invalid values from being saved. Fixes #15693 ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-12 11:13:18 +08:00
Jonathan Chang	de06c9a60b	feat: Langfuse session grouping for multi-turn chat traces (#15679 ) ## Summary This PR passes `session_id` into Langfuse trace observations so multi-turn chat messages can be grouped under the same session in Langfuse. Changes include: - Propagate `session_id` from chat/session APIs into `dialog_service.async_chat`. - Pass `session_id` into Langfuse `start_observation(...)`. - Share Langfuse `trace_context` with chat, embedding, rerank, and TTS model bundles where applicable. - Add unit coverage to verify Langfuse observations receive `session_id`. - Update affected test stubs for the new optional Langfuse context arguments. ## Related Issue Closes: #15636 ## Change Type - [x] Feature - [x] Bug fix - [x] Test - [ ] Refactor - [ ] Documentation - [ ] Breaking change ## Real Behavior Proof Before this change: - Langfuse observations were created without `session_id`. - Multi-turn chat traces could not be grouped by session in Langfuse. After this change: - Chat/session flows pass `session_id` into `async_chat`. - Langfuse observations include `session_id`. - Related model bundles receive shared trace context and session metadata. Validation result: ```bash uv run python -m py_compile \ api/db/services/tenant_llm_service.py \ api/db/services/llm_service.py \ api/db/services/dialog_service.py \ api/db/services/conversation_service.py \ api/apps/restful_apis/chat_api.py \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py ``` Passed. ```bash uv run pytest \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py -q ``` Result: ```text 11 passed in 16.89s ``` ```bash git diff --check ``` Passed. ## Checklist - [x] Analyzed the issue requirement. - [x] Checked existing Langfuse trace integration. - [x] Implemented only the requested session grouping behavior. 
- [x] Added/updated unit tests. - [x] Ran focused tests successfully. - [x] Ran Python compile validation. - [x] Ran whitespace diff validation.	2026-06-12 10:18:06 +08:00
balibabu	70ae25fc7b	Fix: Remove the pagination from the search and retrieval pages. (#15942 ) ### What problem does this PR solve? Fix: Remove the pagination from the search and retrieval pages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 16:36:05 +08:00
jaso0n0818	2971849783	fix: guard docStoreConn.delete with index_exist in parse and stop_parsing (#15876 ) ## What problem does this PR solve? Closes #15874 Both the `POST /api/v1/datasets/<dataset_id>/chunks` (re-parse) and `DELETE /api/v1/datasets/<dataset_id>/chunks` (stop-parsing) handlers called `settings.docStoreConn.delete` unconditionally. When the tenant/dataset index has not been created yet — fresh dataset, first parse interrupted before any chunks were indexed, or index manually removed — the delete call throws and the handler returns HTTP 500 after the document state was already mutated (RUNNING with zeroed counters for the parse path; CANCEL with zeroed counters for the stop path), leaving the document in an inconsistent state. The newer `parse_documents` path in `document_api.py` already uses `index_exist` before deleting: ## How to fix? Apply the same `index_exist` guard to both call sites in `chunk_api.py`: - `parse` (POST path, line ~192): guard the delete before `TaskService.filter_delete`. - `stop_parsing` (DELETE path, line ~242): guard the delete after `DocumentService.update_by_id`. Both sites already have the correct `search.index_name(tenant_id)` and `dataset_id` parameters; the guard is a one-line addition at each site. ## Type of change - [x] Bug fix (non-breaking change which fixes an issue) --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 16:30:03 +08:00
kpdev	de18313f97	fix(api): POST /documents/stop removes partial chunks and resets counters (#15789 ) ### What problem does this PR solve? `POST /api/v1/datasets/{dataset_id}/documents/stop` (`stop_parse_documents`) cancels parsing tasks and sets `run` to `CANCEL`, but it does not remove chunks already indexed in the doc store or reset `progress` / `chunk_num`. REST callers can end up with a “cancelled” document that still returns partial chunks in `GET .../chunks` and in retrieval. Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`) already performs full cleanup: it resets counters and calls `docStoreConn.delete`. This PR aligns the newer stop endpoint with that behavior so both paths leave the dataset consistent. Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes - Update `stop_parse_documents` in `document_api.py` to reset `progress` and `chunk_num` to `0` and delete partial chunks via `docStoreConn.delete` after `cancel_all_task_of`. - Add unit test `test_stop_parse_documents_cleans_partial_chunks` to assert counters reset and doc store delete is invoked. ### Test plan - [x] Unit test: `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks -v` - [ ] Manual: upload a slow document, start parse, call `POST .../documents/stop` while `RUNNING`, verify `GET .../chunks` returns zero chunks and UI `chunk_count` is 0 - [ ] Control: legacy `DELETE .../chunks` behavior unchanged --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:51:32 +08:00
bohdansolovie	47fb462e46	fix(api): guard dataset delete when File2Document row is missing (#15533 ) ## Summary Fixes #15532 — `delete_datasets()` crashes with `IndexError` when a document has no `File2Document` row. `delete_datasets()` in `dataset_api_service.py` called `File2DocumentService.get_by_document_id()` and immediately accessed `f2d[0].file_id` without checking whether the lookup returned any rows. Documents created via API ingestion or connector sync may exist without a linked file record, causing dataset deletion to abort with HTTP 500. This PR mirrors the existing guard already used in `file_service.py` and `document_api_service.py`.	2026-06-11 15:18:08 +08:00
Idriss Sbaaoui	9871a7e0b6	fix: replicate model provider (#15933 ) ### What problem does this PR solve? Fix the Replicate model provider failing with a valid API key. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:08:33 +08:00
zaviermeekz-cpu	a1dc2da7b4	fix: add model_name to embed completion request (#15883 ) (#15888 ) ### What problem does this PR solve? When embedding a chatbot, the API returned `"Model Name is required"`. The embed widget now includes the assistant's `llm_id` as `model_name` in the completion request. ### Type of change - [x] Bug Fix ### How has this been tested? - Created a chatbot with a default model. - Embedded it and sent a message – the error is gone and the assistant replies correctly. ### Related Issue Closes #15883 Co-authored-by: RAGFlow Dev <dev@ragflow.local> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 14:38:37 +08:00
zaviermeekz-cpu	c50f9c59aa	fix: allow zero message history window and clear history for new sessions (#15897 ) (#15902 ) ### What problem does this PR solve? Two bugs in the Agent Categorize component: 1. The backend rejected `message_history_window_size = 0` while frontend allowed it, causing API errors. 2. When calling the agent API without a `session_id`, a new session was created but retained history from previous conversations. ### Type of change - [x] Bug Fix ### How has this been tested? - Issue 1: `CategorizeParam().check()` now accepts `0` and rejects negative values. - Issue 2: `canvas.clear_history()` is called for new sessions (no `session_id`), ensuring fresh conversation state. Verified via UI and API that a second call without `session_id` does not remember the first conversation. ### Related Issue Closes #15897 Co-authored-by: RAGFlow Dev <dev@ragflow.local> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 13:24:48 +08:00
Wang Qi	238a01d9e3	Fix multiple tags (#15931 ) Fix multiple tags	2026-06-11 10:55:28 +08:00
Lynn	32559d2dfc	Fix: model list (#15914 ) ### What problem does this PR solve? Display OCR tag for model providers. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 09:40:45 +08:00
Wang Qi	acaeb416ca	Fix cannot add fish audio (#15913 ) Fix cannot add fish audio	2026-06-10 20:27:43 +08:00
balibabu	aafe6c5534	Fix: The dataset retrieval test returned an incorrect total number. (#15901 ) ### What problem does this PR solve? Fix: The dataset retrieval test returned an incorrect total number. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-06-10 19:11:31 +08:00
Wang Qi	3091d91cf7	Fix no need to put inactive models to bottom (#15903 ) Fix no need to put inactive models to bottom	2026-06-10 16:55:02 +08:00
buua436	dcf623d60d	feat: support multi-type factory models (#15893 ) ### What problem does this PR solve? Support factory models with multiple model types, so visual chat models can be exposed as both image2text and chat while preserving the database model-type-per-record design. This also updates the SILICONFLOW model list and adds a helper script to refresh SiliconFlow models from the provider API. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-10 15:35:21 +08:00
Lynn	478c9846a1	Fix: model list (#15860 ) ### What problem does this PR solve? Remove tenant_llm call in rag. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:59:57 +08:00
Wang Qi	9aa81e7cad	Fix paddle ocr / minerU cannot add (#15858 ) Fix paddle ocr / minerU cannot add	2026-06-10 13:04:13 +08:00
Wang Qi	7ed1f1c865	Fix VLLM cannot add without /v1 (#15851 ) Fix VLLM cannot add without /v1	2026-06-09 19:11:15 +08:00
Wang Qi	2773208159	Fix: MinerU cannot be added (#15841 ) Fix: MinerU cannot be added	2026-06-09 19:06:51 +08:00
euvre	f97d6396b4	fix: BaiduYiyan API key validation fails in set_api_key (#15828 ) ### What problem does this PR solve? When setting the API key for the BaiduYiyan provider, all model validations fail with the error "Fail to access model using this api key. No valid response received". Root cause: 1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override `async_chat_streamly()`. The `verify_api_key()` function uses `mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits `Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible base_url, validation always fails. 2. `verify_api_key()` in `provider_api_service.py` does not format the raw API key string into the JSON format (`{"yiyan_ak": "...", "yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via `json.loads(key)`. Fix: 1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the qianfan SDK, consistent with the existing `chat_streamly()` method. 2. Add BaiduYiyan API key formatting in `provider_api_service.py` `verify_api_key()` to match the format expected by `BaiduYiyanChat.__init__()`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-06-09 19:05:58 +08:00
buua436	c1496ffd43	fix: propagate memory tenant id in task collect (#15837 ) ### What problem does this PR solve? Propagate `tenant_id` from memory task messages into task collection so refactored task execution can build a valid context. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 17:47:48 +08:00
balibabu	287a4cfd2b	Fix: An error message appears when accessing the agent's launch page: "pagesize exceeds maximum value". (#15835 ) ### What problem does this PR solve? Fix: An error message appears when accessing the agent's launch page: "pagesize exceeds maximum value". ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-06-09 16:56:47 +08:00
Lynn	1ab51a27bf	Fix: list intl Tongyi-Qianwen base_url (#15831 ) ### What problem does this PR solve? Display intl `base_url` for Tongyi-Qianwen ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-09 13:19:39 +08:00
Lynn	b9f06e6095	Feat: model list (#15774 ) ### What problem does this PR solve? Support model list for VolcEngine. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-08 20:18:00 +08:00
buua436	0c5245e454	fix: await lmstudio embedding verification (#15772 ) ### What problem does this PR solve? Fix LM-Studio provider connection verification so embedding checks await the async wrapper correctly and log the full traceback on failures. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 19:17:47 +08:00
buua436	e81bca73d5	fix: normalize agent session chunks (#15756 ) ### What problem does this PR solve? Normalize agent session chunk references so they are mapped through a dedicated helper instead of duplicating the field extraction inline. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 15:29:55 +08:00
buua436	6bf7056422	feat: add placeholder model metas (#15753 ) ### What problem does this PR solve? add placeholder model metas ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 14:54:59 +08:00
qinling0210	c960dc2a4c	Refine handling of POST /api/v1/datasets/search in Go (#15583 ) ### What problem does this PR solve? Refine handling of POST /api/v1/datasets/search in Go ### Type of change - [x] Refactoring	2026-06-08 11:49:37 +08:00
Lynn	b05d5a5228	Feat: get model list from remote (#15711 ) ### What problem does this PR solve? Feat: - Get model list from remote provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-08 11:02:40 +08:00
Wang Qi	aa9545e4c9	Revert "fix: duplicate document ingest guard" (#15707 ) Reverts infiniflow/ragflow#15638	2026-06-05 17:45:29 +08:00
Wang Qi	214ee319f8	Revert "fix(api): authorize owner_ids for list chats and search apps (#14775 ) (#15698 ) This reverts PR #14775 commit `5a5e766386`.	2026-06-05 17:26:02 +08:00
Wang Qi	4cbe597d7e	Refactor: consolidate to use @login_required (#15652 ) Refactor: consolidate to use @login_required	2026-06-05 11:35:00 +08:00
kpdev	bd49fd70aa	fix(api): set SDK document download Content-Type from filename (#15112 ) (#15113 ) ## Summary - Infer `Content-Type` from the stored document filename on SDK download routes. - Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and `GET /api/v1/documents/<document_id>`. - Aligns with REST preview/download via `CONTENT_TYPE_MAP`. ## Test plan - [x] `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename` - [x] Manual: `curl -sSI` on SDK dataset document download for a PDF; expect `Content-Type: application/pdf` Fixes #15112.	2026-06-05 10:08:53 +08:00
Lynn	794c1f4b25	Fix: volc engine and other json key factories (#15653 ) ### What problem does this PR solve? Fix: - VolcEngine adapt to new api_key format - Save dict api_key as json ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 09:45:44 +08:00
buua436	423fb6faae	fix: duplicate document ingest guard (#15638 ) ### What problem does this PR solve? When a document is rerun or updated concurrently, the previous unconditional update could overwrite a newer task state. This change adds an `update_time`-based optimistic lock so the update only succeeds if the record has not been modified by another flow in the meantime. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 17:57:51 +08:00
Lynn	b65b18ba4c	Fix: model provider (#15634 ) ### What problem does this PR solve? Do not display `success` when the check does not pass. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 16:05:00 +08:00
buua436	c70f19e138	Fix: remove duplicate document preview access check (#15625 ) ### What problem does this PR solve? remove duplicate document preview access check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 13:05:15 +08:00
Lynn	597ac1e900	Fix: search bot and verify model instance (#15588 ) ### What problem does this PR solve? Fix: - Verify provider with empty llm list in llm_factories.json - Set search bot's chat_llm_name, use tenant default chat model as default ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 11:59:55 +08:00
kpdev	d26d799467	fix(api): restore accessible check on document preview (#15505 ) Restore `DocumentService.accessible` on `GET /api/v1/documents/{doc_id}/preview` so cross-tenant users cannot stream documents by UUID. Fixes #15501 ### What problem does this PR solve? PR #15146 (`71a52d579`) moved the agent attachment download route and accidentally removed the `DocumentService.accessible(doc_id, current_user.id)` guard from the REST preview handler. The endpoint still requires login, but any authenticated user who knows another tenant's `doc_id` can download the raw file bytes. This restores the same authorization check that existed before #15146, returning a generic `"Document not found!"` when access is denied (no cross-tenant ID enumeration). SDK download routes tracked in #15125 are unchanged. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 09:59:07 +08:00
dripsmvcp	2196f2260a	fix(api): restore DocumentService.accessible check on /preview (#15508 ) ## Summary Restore the `DocumentService.accessible(doc_id, current_user.id)` check that PR #15146 dropped from the REST document preview handler. Any authenticated caller could download any tenant's document bytes by guessing/knowing the `doc_id`. ## Root cause `api/apps/restful_apis/document_api.py` — the `GET /documents/<doc_id>/preview` handler called `DocumentService.get_by_id` and went straight to `File2DocumentService.get_storage_address` + `STORAGE_IMPL.get`, with no tenant check between the lookup and the read. The handler's docstring even promises "user must belong to the tenant that owns the document's knowledge base" — the code didn't enforce it. ## Fix - Add `current_user` to the existing `api.apps` import. - Immediately after `get_by_id`, call `DocumentService.accessible(doc_id, current_user.id)`; on denial, return the same `get_data_error_result(message="Document not found!")` shape used for the missing-doc branch. That makes a cross-tenant probe indistinguishable from a missing-doc probe, preventing ID enumeration (the issue body calls this out explicitly). - Emit `logging.warning` with caller user + doc_id for audit. - Restores symmetry with peer routes that already call `accessible(doc_id, user_id)` (e.g. `_run_sync` at `document_api.py:1380`). ## Test plan Adds `test/unit_test/api/apps/restful_apis/test_document_preview_accessible.py`: - `test_cross_tenant_preview_is_denied` — owner tenant ≠ caller tenant; asserts the response shape is `Document not found!` and the storage backend (`thread_pool_exec(STORAGE_IMPL.get, ...)`) is never invoked. - `test_missing_doc_returns_not_found` — missing-doc behaviour unchanged. Stub-loader pattern mirrors `test/unit_test/api/apps/sdk/test_dify_retrieval.py` (added in #15028, passing in CI). 
## Provenance — how this fix was produced This PR was authored against a small cited knowledge base committed in the working tree as a `.vouch/` (see [vouchdev/vouch](https://github.com/vouchdev/vouch)). The loop used here: 1. Grounding first. Before reading the handler, queried the KB for prior context: `vouch context "tenant scoped accessible authorization"` → retrieved a cited claim distilled from PR #15028 (which restored the same `accessible()` check on `/dify/retrieval`). The retrieved rule: > ragflow REST endpoints that load by tenant-scoped id must call `<Service>.accessible(id, tenant_id)` after `get_by_id` and before storage/DB read; deny with code 109 'No authorization.' and log a warning. Established by PR #15028. 2. Applied the pattern with a domain refinement. For an API/JSON endpoint, `No authorization.` is the right denial shape. For a byte-streaming, browser-facing endpoint like `/preview`, leaking existence itself enables enumeration — so per the issue's expected behaviour, this PR denies with `Document not found!` (indistinguishable from missing) instead. Same auth check, narrower response. 3. Recorded the refinement back into the KB as a new cited claim, so the next IDOR-class issue starts already grounded in both the general pattern and the byte-route nuance. Net effect of the workflow: the fix replicates a known-good pattern instead of reinventing it, and the place where the pattern was nuanced is now retrievable for the next pass. Mechanism is fully independent of this PR — it's not a runtime dependency, just process discipline. Closes #15501	2026-06-04 09:58:26 +08:00
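
The access-check pattern described in the last commit above (look up the document, check tenant access, and answer a denied request exactly like a missing one before any storage read) can be sketched as follows. This is a minimal illustration, not the RAGFlow handler: `Doc`, `Store`, `accessible`, and `preview` are hypothetical stand-ins for `DocumentService`, the storage backend, and the route, and the response dictionaries are simplified.

```python
import logging
from dataclasses import dataclass

# Same payload for "missing" and "denied", so a cross-tenant probe
# cannot tell whether the document exists.
NOT_FOUND = {"message": "Document not found!"}


@dataclass
class Doc:
    doc_id: str
    tenant_id: str
    blob: bytes


class Store:
    """Hypothetical in-memory stand-in for the DB lookup and blob storage."""

    def __init__(self, docs):
        self._docs = {d.doc_id: d for d in docs}
        self.reads = 0  # counts storage reads so tests can assert none happened

    def get_by_id(self, doc_id):
        return self._docs.get(doc_id)

    def read_blob(self, doc_id):
        self.reads += 1
        return self._docs[doc_id].blob


def accessible(doc, user_tenants):
    """Assumed access rule: the caller must belong to the owning tenant."""
    return doc.tenant_id in user_tenants


def preview(store, doc_id, user_id, user_tenants):
    doc = store.get_by_id(doc_id)
    if doc is None:
        return dict(NOT_FOUND)
    # Authorization happens after the lookup and before any storage read.
    if not accessible(doc, user_tenants):
        logging.warning("preview denied: user=%s doc=%s", user_id, doc_id)
        return dict(NOT_FOUND)
    return {"message": "ok", "data": store.read_blob(doc_id)}


store = Store([Doc("d1", "t1", b"pdf-bytes")])
missing = preview(store, "nope", "u2", {"t2"})
denied = preview(store, "d1", "u2", {"t2"})
allowed = preview(store, "d1", "u1", {"t1"})
```

Here `missing` and `denied` are identical, and the storage read counter only increases for the `allowed` call, which mirrors what the two unit tests named in the commit message assert.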


1203 Commits