ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
euvre	a339e8a579	feat: handle partial upload success in document batch upload (#16438 )	2026-06-29 13:06:14 +08:00
Tim Wang	f0f10b6092	Fix: UserFillUp interactive forms not working in agent explore mode (#14589 ) ## Summary - Backend: `_iter_session_completion_events` in `agent_api.py` was filtering out `user_inputs` and `workflow_finished` SSE events, causing agents with UserFillUp components to silently fail in explore mode — the interactive form never appeared, while the same agent worked correctly in run (editor) mode. - Frontend: `SessionChat` component in explore mode was missing `DebugContent` children rendering inside `MessageItem`, so even if the backend forwarded the events, the form UI would not render. Added `DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and input-disabling logic to match the run mode's `chat/box.tsx` behavior. ## What was changed ### Backend (`api/apps/restful_apis/agent_api.py`) - Line 266: Added `"user_inputs"` and `"workflow_finished"` to the allowed event filter in `_iter_session_completion_events` ### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`) - Added imports: `DebugContent`, `MarkdownContent`, `useAwaitCompentData`, `useParams` - Added `sendFormMessage` from `useSendSessionMessage()` hook - Added `useAwaitCompentData` hook for form state management - Added `DebugContent` as `MessageItem` children for the latest assistant message (renders UserFillUp form) - Added `MarkdownContent` + submitted values display for previous assistant messages - Updated `NextMessageInput` disabled states to respect `isWaitting` (form submission in progress) ## Test plan - [x] Agent with UserFillUp component (e.g., email draft with send/edit/cancel options) shows interactive form in explore mode - [x] Same agent continues to work correctly in run (editor) mode - [x] Form submission sends data back to the agent and workflow continues - [x] Input field is disabled while waiting for form submission - [ ] Agents without UserFillUp components are unaffected in explore mode 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:17 +08:00
kpdev	212429bf9d	fix(api): gate sandbox artifact download on agent session ownership (#16169 ) Fixes #16168 ## Summary - Add session-scoped authorization for `GET /api/v1/documents/artifact/<filename>` - Allow download only when the artifact filename appears in the caller's `api_4_conversation` message and `UserCanvasService.accessible(dialog_id, user_id)` passes - Deny with generic `"Artifact not found."` before storage access (no cross-user enumeration) - Return 4xx when the blob is missing (existing behavior preserved) ## Approach Sandbox artifacts are runtime CodeExec outputs, not KB documents — this uses the same session gate pattern as `agent_chat_completion`, not `DocumentService.accessible`. ## Test plan - [x] Unit: denied when filename not referenced in user sessions - [x] Unit: denied when agent canvas is not accessible - [x] Unit: authorized user receives bytes; missing blob returns `"Artifact not found."` - [ ] `pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py -k get_artifact` --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
Zhichang Yu	faef22c18a	Harden closed-advisory fixes (#16409 ) ## Summary - harden reopened advisory fixes across REST connector, invoke, document downloads, and markdown rendering - add targeted regression coverage for redirect-safe SSRF handling, invoke SSRF checks, document access control, and markdown sanitization - verify each referenced GHSA against the original GitHub advisory text and align the closed-advisory plan with the implemented remediation ## What changed - add tenant access checks to document download endpoints to avoid cross-tenant document disclosure - add per-hop SSRF validation, DNS pinning, redirect handling, and redirect limits to the REST API connector - ensure invoke requests validate and pin the resolved host and never follow redirects implicitly - keep the generic rate-limited request path wrapped, not just GET and POST helpers - sanitize markdown HTML before rendering in the highlight markdown component ## Validation - `cd web && npm test -- --runInBand src/components/highlight-markdown/__tests__/index.test.tsx` - `.venv/bin/python -m pytest -q test/unit_test/data_source/test_rest_api_connector.py` - targeted `test/testcases/test_web_api/...` unit additions were reviewed, but the suite cannot be executed end-to-end in this environment because parent `test/testcases/conftest.py` requires a local service on `127.0.0.1:9380` ## Notes - all GHSA entries referenced by the plan were checked against the original GitHub advisory text, not sampled - the closed-advisory plan document was updated locally during review, but is intentionally not included in this PR	2026-06-29 09:45:16 +08:00
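
The per-hop SSRF validation, DNS pinning, and redirect limit listed in the commit above could look roughly like the following. This is a minimal sketch under stated assumptions, not the connector's actual code: `is_blocked_ip`, `validate_hops`, the `resolve` callback, and the default limit of 3 redirects are all hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_ip(ip: str) -> bool:
    """Return True for addresses a server-side fetch should not reach."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def validate_hops(urls, resolve, max_redirects=3):
    """Validate every URL in a redirect chain and pin each host to one IP.

    `urls` is the initial URL followed by each redirect target; `resolve`
    is a hypothetical callback that maps a hostname to an IP string.
    """
    if len(urls) - 1 > max_redirects:
        raise ValueError("too many redirects")
    pinned = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"unsupported URL: {url}")
        ip = resolve(parts.hostname)
        if is_blocked_ip(ip):
            raise ValueError(f"blocked address for {parts.hostname}: {ip}")
        # The caller should connect to this exact IP so that DNS cannot
        # change between validation and the request (pinning).
        pinned.append((url, ip))
    return pinned
```

The point of checking each hop separately is that a public first URL can still redirect to an internal address, so validating only the initial request is not enough.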
kpdev	de18313f97	fix(api): POST /documents/stop removes partial chunks and resets counters (#15789 ) ### What problem does this PR solve? `POST /api/v1/datasets/{dataset_id}/documents/stop` (`stop_parse_documents`) cancels parsing tasks and sets `run` to `CANCEL`, but it does not remove chunks already indexed in the doc store or reset `progress` / `chunk_num`. REST callers can end up with a “cancelled” document that still returns partial chunks in `GET .../chunks` and in retrieval. Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`) already performs full cleanup: it resets counters and calls `docStoreConn.delete`. This PR aligns the newer stop endpoint with that behavior so both paths leave the dataset consistent. Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes - Update `stop_parse_documents` in `document_api.py` to reset `progress` and `chunk_num` to `0` and delete partial chunks via `docStoreConn.delete` after `cancel_all_task_of`. - Add unit test `test_stop_parse_documents_cleans_partial_chunks` to assert counters reset and doc store delete is invoked. ### Test plan - [x] Unit test: `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks -v` - [ ] Manual: upload a slow document, start parse, call `POST .../documents/stop` while `RUNNING`, verify `GET .../chunks` returns zero chunks and UI `chunk_count` is 0 - [ ] Control: legacy `DELETE .../chunks` behavior unchanged --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:51:32 +08:00
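
The cleanup order described in the commit above (cancel tasks, reset counters, delete partial chunks) can be illustrated with in-memory stand-ins. Everything here is hypothetical: `FakeDocStore`, `stop_and_clean`, and the dict-based document and task records only mimic the behavior the message attributes to `stop_parse_documents`, `cancel_all_task_of`, and `docStoreConn.delete`.

```python
class FakeDocStore:
    """In-memory stand-in for a chunk index keyed by document id."""

    def __init__(self):
        self.chunks = {}  # doc_id -> list of chunk texts

    def delete(self, doc_id):
        """Remove all chunks of a document and return how many were removed."""
        return len(self.chunks.pop(doc_id, []))


def stop_and_clean(doc, tasks, store):
    """Cancel parsing and leave no partial state behind."""
    tasks[doc["id"]] = "CANCELLED"  # stands in for cancelling queued tasks
    doc.update(run="CANCEL", progress=0, chunk_num=0)  # reset counters
    store.delete(doc["id"])  # drop chunks indexed before the stop
    return doc
```

After the call, listing chunks for the document yields nothing, which matches the "cancelled means empty" expectation in the test plan.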
Wang Qi	aa9545e4c9	Revert "fix: duplicate document ingest guard" (#15707 ) Reverts infiniflow/ragflow#15638	2026-06-05 17:45:29 +08:00
kpdev	bd49fd70aa	fix(api): set SDK document download Content-Type from filename (#15112 ) (#15113 ) ## Summary - Infer `Content-Type` from the stored document filename on SDK download routes. - Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and `GET /api/v1/documents/<document_id>`. - Aligns with REST preview/download via `CONTENT_TYPE_MAP`. ## Test plan - [x] `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename` - [x] Manual: `curl -sSI` on SDK dataset document download for a PDF; expect `Content-Type: application/pdf` Fixes #15112.	2026-06-05 10:08:53 +08:00
buua436	423fb6faae	fix: duplicate document ingest guard (#15638 ) ### What problem does this PR solve? When a document is rerun or updated concurrently, the previous unconditional update could overwrite a newer task state. This change adds an `update_time`-based optimistic lock so the update only succeeds if the record has not been modified by another flow in the meantime. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 17:57:51 +08:00
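
An `update_time`-based optimistic lock of the kind this commit describes can be sketched with the standard-library `sqlite3` module. The table layout, column names, and the `update_if_unchanged` helper are assumptions for illustration only; the project's ORM code is not shown in the log.

```python
import sqlite3


def update_if_unchanged(conn, doc_id, expected_update_time, new_run, new_update_time):
    """Apply the update only if nobody modified the row since we read it."""
    cur = conn.execute(
        "UPDATE document SET run = ?, update_time = ? "
        "WHERE id = ? AND update_time = ?",
        (new_run, new_update_time, doc_id, expected_update_time),
    )
    conn.commit()
    # rowcount is 0 when the stored update_time no longer matches,
    # meaning another flow won the race and this write must be retried or dropped.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, run TEXT, update_time INTEGER)")
conn.execute("INSERT INTO document VALUES ('d1', 'UNSTART', 100)")
conn.commit()
```

A writer that read `update_time = 100` succeeds once; a second writer still holding the stale value of 100 is rejected instead of overwriting the newer state.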
buua436	c70f19e138	Fix: remove duplicate document preview access check (#15625 ) ### What problem does this PR solve? remove duplicate document preview access check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 13:05:15 +08:00
kpdev	d26d799467	fix(api): restore accessible check on document preview (#15505 ) Restore `DocumentService.accessible` on `GET /api/v1/documents/{doc_id}/preview` so cross-tenant users cannot stream documents by UUID. Fixes #15501 ### What problem does this PR solve? PR #15146 (`71a52d579`) moved the agent attachment download route and accidentally removed the `DocumentService.accessible(doc_id, current_user.id)` guard from the REST preview handler. The endpoint still requires login, but any authenticated user who knows another tenant's `doc_id` can download the raw file bytes. This restores the same authorization check that existed before #15146, returning a generic `"Document not found!"` when access is denied (no cross-tenant ID enumeration). SDK download routes tracked in #15125 are unchanged. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 09:59:07 +08:00
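
The idea of returning the same "not found" result for a missing document and for one the caller may not access can be shown with in-memory data. This is a sketch, not the handler itself: the dictionaries and the `accessible` and `preview` functions below are hypothetical stand-ins for `DocumentService.accessible` and the preview route.

```python
DOC_TENANT = {"doc-1": "tenant-a"}                      # document -> owning tenant
USER_TENANTS = {"alice": {"tenant-a"}, "bob": {"tenant-b"}}
STORAGE = {"doc-1": b"%PDF-1.7"}                        # document -> raw bytes
NOT_FOUND = "Document not found!"


def accessible(doc_id, user_id):
    """True only if the document exists and the user belongs to its tenant."""
    tenant = DOC_TENANT.get(doc_id)
    return tenant is not None and tenant in USER_TENANTS.get(user_id, set())


def preview(doc_id, user_id):
    """Return (ok, payload); denial and absence look identical to the caller."""
    if not accessible(doc_id, user_id):
        return False, NOT_FOUND  # checked before any storage read
    blob = STORAGE.get(doc_id)
    if blob is None:
        return False, NOT_FOUND
    return True, blob
```

Because the cross-tenant probe and the missing-document probe produce the same response, the endpoint cannot be used to enumerate valid IDs.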
dripsmvcp	2196f2260a	fix(api): restore DocumentService.accessible check on /preview (#15508 ) ## Summary Restore the `DocumentService.accessible(doc_id, current_user.id)` check that PR #15146 dropped from the REST document preview handler. Any authenticated caller could download any tenant's document bytes by guessing/knowing the `doc_id`. ## Root cause `api/apps/restful_apis/document_api.py` — the `GET /documents/<doc_id>/preview` handler called `DocumentService.get_by_id` and went straight to `File2DocumentService.get_storage_address` + `STORAGE_IMPL.get`, with no tenant check between the lookup and the read. The handler's docstring even promises "user must belong to the tenant that owns the document's knowledge base" — the code didn't enforce it. ## Fix - Add `current_user` to the existing `api.apps` import. - Immediately after `get_by_id`, call `DocumentService.accessible(doc_id, current_user.id)`; on denial, return the same `get_data_error_result(message="Document not found!")` shape used for the missing-doc branch. That makes a cross-tenant probe indistinguishable from a missing-doc probe, preventing ID enumeration (the issue body calls this out explicitly). - Emit `logging.warning` with caller user + doc_id for audit. - Restores symmetry with peer routes that already call `accessible(doc_id, user_id)` (e.g. `_run_sync` at `document_api.py:1380`). ## Test plan Adds `test/unit_test/api/apps/restful_apis/test_document_preview_accessible.py`: - `test_cross_tenant_preview_is_denied` — owner tenant ≠ caller tenant; asserts the response shape is `Document not found!` and the storage backend (`thread_pool_exec(STORAGE_IMPL.get, ...)`) is never invoked. - `test_missing_doc_returns_not_found` — missing-doc behaviour unchanged. Stub-loader pattern mirrors `test/unit_test/api/apps/sdk/test_dify_retrieval.py` (added in #15028, passing in CI). 
## Provenance — how this fix was produced This PR was authored against a small cited knowledge base committed in the working tree as a `.vouch/` (see [vouchdev/vouch](https://github.com/vouchdev/vouch)). The loop used here: 1. Grounding first. Before reading the handler, queried the KB for prior context: `vouch context "tenant scoped accessible authorization"` → retrieved a cited claim distilled from PR #15028 (which restored the same `accessible()` check on `/dify/retrieval`). The retrieved rule: > ragflow REST endpoints that load by tenant-scoped id must call `<Service>.accessible(id, tenant_id)` after `get_by_id` and before storage/DB read; deny with code 109 'No authorization.' and log a warning. Established by PR #15028. 2. Applied the pattern with a domain refinement. For an API/JSON endpoint, `No authorization.` is the right denial shape. For a byte-streaming, browser-facing endpoint like `/preview`, leaking existence itself enables enumeration — so per the issue's expected behaviour, this PR denies with `Document not found!` (indistinguishable from missing) instead. Same auth check, narrower response. 3. Recorded the refinement back into the KB as a new cited claim, so the next IDOR-class issue starts already grounded in both the general pattern and the byte-route nuance. Net effect of the workflow: the fix replicates a known-good pattern instead of reinventing it, and the place where the pattern was nuanced is now retrievable for the next pass. Mechanism is fully independent of this PR — it's not a runtime dependency, just process discipline. Closes #15501	2026-06-04 09:58:26 +08:00
Wang Qi	b946df8ba2	Fix: consolidate beta auth (#15581 )	2026-06-03 19:58:06 +08:00
kpdev	76968af0ba	Guard missing storage blobs on preview and image endpoints (#15366 ) Fixes [#15365](https://github.com/infiniflow/ragflow/issues/15365) — `get_document_image()` and document preview call `make_response(None)` when storage returns no bytes, causing HTTP 500.	2026-06-03 11:33:03 +08:00
kpdev	0f6f7b3c3c	fix(api): document image_id parsing for hyphenated thumbnail keys (#15115 ) (#15116 ) ### What problem does this PR solve? Fixes #15115. `GET /api/v1/documents/images/<image_id>` returned Image not found when the thumbnail storage object key contained hyphens (e.g. `page-1.png`). Document APIs build URLs as `{dataset_id}-{thumbnail}`, but `get_document_image()` used `image_id.split("-")` and required exactly two segments, so keys like `<kb_id>-page-1.png` were rejected even though the blob existed. This PR splits only on the first hyphen (`split("-", 1)`) and sets `Content-Type` from the object key extension via `CONTENT_TYPE_MAP` instead of hardcoding `image/JPEG`.	2026-06-02 10:54:14 +08:00
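
The difference between `split("-")` and `split("-", 1)` that this fix relies on is easy to see in isolation. The helper name `split_image_id` is hypothetical, and the sketch assumes the dataset ID itself contains no hyphen, as the `{dataset_id}-{thumbnail}` format in the message implies.

```python
def split_image_id(image_id: str) -> tuple[str, str]:
    """Split '<dataset_id>-<object_key>' on the first hyphen only."""
    parts = image_id.split("-", 1)
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected '<dataset_id>-<object_key>'")
    return parts[0], parts[1]


# A plain split yields three segments for a hyphenated key, which is why
# a check for exactly two segments rejected it.
segments = "kb123-page-1.png".split("-")
```

Limiting the split to one occurrence keeps any later hyphens inside the object key.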
kpdev	252cc19f93	Infer Content-Type for document image endpoint (#15368 ) ## Summary Fixes [#15367](https://github.com/infiniflow/ragflow/issues/15367) — `GET /api/v1/documents/images/<image_id>` always returned `Content-Type: image/JPEG` even for PNG/WebP chunk images and extensioned thumbnails. ## Related Issue Fixes #15367 ## Change Type - [x] Bug fix - [x] Regression tests - [ ] New feature - [ ] Refactor ## What Changed - Added `_detect_image_content_type_from_bytes()` — PNG/JPEG/GIF/WebP/BMP magic-byte detection - Added `_content_type_for_document_image()` — object-key extension via `CONTENT_TYPE_MAP`, then magic bytes, else `application/octet-stream` - `get_document_image()` — set inferred `Content-Type` instead of hardcoded `image/JPEG` - Also guards missing storage blob (`Image not found.`) to avoid `make_response(None)` (same handler; complements #15365) ## Files Changed \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/document_api.py` \| MIME inference helpers + handler update \| \| `test/testcases/test_web_api/test_document_app/test_document_metadata.py` \| 3 unit tests \| ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_object_extension_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_magic_bytes_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_missing_blob_unit -v ``` ## Test Plan - [x] `.png` object key → `image/png` - [x] Extensionless chunk key + PNG bytes → `image/png` (magic bytes) - [x] Missing blob → 4xx `"Image not found."` - [ ] CI green	2026-06-01 19:08:32 +08:00
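
The fallback chain described above (object-key extension, then magic bytes, then `application/octet-stream`) might be sketched as follows. The function names `sniff_image_type` and `content_type_for_image` and the `ext_map` argument are hypothetical; they are not the project's `_detect_image_content_type_from_bytes` or `CONTENT_TYPE_MAP`, whose exact contents the log does not show. The byte signatures are the standard ones for each format.

```python
import os
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess an image MIME type from the leading bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: 'RIFF', 4 size bytes, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"BM"):
        return "image/bmp"
    return None


def content_type_for_image(object_key: str, data: bytes, ext_map: dict) -> str:
    """Prefer the key's extension, then magic bytes, then a generic type."""
    ext = os.path.splitext(object_key)[1].lstrip(".").lower()
    return ext_map.get(ext) or sniff_image_type(data) or "application/octet-stream"
```

Checking the extension first keeps the common case cheap, while the byte check covers extensionless chunk keys.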
Wang Qi	0aff6a3f32	Feature: Allow page_size max value 100 (#15292 )	2026-05-28 11:13:01 +08:00
Wang Qi	f4d36f7082	Fix #15170 cannot filter document status (#15216 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 18:58:37 +08:00
Ahmad Intisar	e6068a7f7e	Fix: table parser metadata (#15127 ) ### What problem does this PR solve? This PR improves the table upload flow for CSV/Excel files by allowing table column role configuration at upload time. Previously, users had to: 1. Upload and parse a table file. 2. Open parser settings and manually set table column roles. 3. Re-parse the file for the roles to take effect. This was inefficient and required an unnecessary second parse. With this change: 1. When the knowledge base uses table parsing, the upload dialog extracts CSV/Excel headers client-side. 2. Users can choose Auto mode or Manual mode. 3. In Manual mode, users can assign per-column roles before upload. 4. The selected parser config is sent with the upload request and applied server-side during document creation. Result: configured table column roles are applied from the first parse. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-05-25 16:05:38 +08:00
buua436	71a52d579c	fix: move agent attachment download api (#15146 ) ### What problem does this PR solve? move agent attachment download api to the correct route and update frontend callers ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Notes - Move the attachment download endpoint from document routes to agent routes. - Update frontend download callers to use the agent attachment endpoint. - Reuse the shared file response header helper instead of duplicating it in `agent_api.py`.	2026-05-22 15:22:05 +08:00
buua436	ea1764a7dc	Revert "fix(api): infer /documents/{id}/download Content-Type from filename when ext is omitted (#15052 )" (#15138 ) Reverts infiniflow/ragflow#15053	2026-05-22 11:46:01 +08:00
kpdev	6932615852	fix(api): infer /documents/{id}/download Content-Type from filename when ext is omitted (#15052 ) (#15053 ) ## Summary - Align GET `/api/v1/documents/<doc_id>/download` with `/preview`: resolve extension and MIME type from the stored document name when the `ext` query parameter is omitted, instead of defaulting to `markdown`. - When `?ext=` is present, behavior stays the same as before (explicit extension / `Content-Type` mapping). - Enforce the same access + document lookup pattern as preview (`accessible` + `get_by_id`). - Extend unit tests for the no-`ext` PDF filename case. ## Test plan - [x] `uv run pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_download_attachment_success_and_exception_unit` - [x] Optional: `curl -sSI` against `/api/v1/documents/<pdf_doc_id>/download` without `ext` and confirm `Content-Type: application/pdf` Fixes #15052.	2026-05-21 15:31:36 +08:00
Hamza Amin Khokhar	2dbe3b8a62	fix: metadata_condition returning all docs when filter matches nothing (#14967 ) ### What problem does this PR solve? When _parse_doc_id_filter_with_metadata returns [], the empty list is falsy so the WHERE id IN (...) clause was silently skipped, causing the full dataset to be returned instead of an empty result. Change `if doc_ids:` to `if doc_ids is not None:` in both get_list() and get_by_kb_id() to distinguish between no filter (None) and a filter that matched zero documents ([]). Fixes #14962 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-18 18:54:30 +08:00
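
The falsy-empty-list bug fixed here can be reproduced without a database. The function `filter_docs` and the sample records are hypothetical; only the `if doc_ids:` versus `if doc_ids is not None:` distinction comes from the commit message.

```python
from typing import Optional


def filter_docs(docs: list, doc_ids: Optional[list]) -> list:
    """Apply an ID filter, treating None as 'no filter' and [] as 'match nothing'."""
    if doc_ids is not None:  # `if doc_ids:` would skip the filter for []
        allowed = set(doc_ids)
        return [d for d in docs if d["id"] in allowed]
    return list(docs)


DOCS = [{"id": "a"}, {"id": "b"}]
```

With the truthiness check, an empty filter result silently fell through to the unfiltered branch and returned the whole dataset.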
buua436	58819f5d3e	fix: add document download endpoint and refactor existing download function (#14927 ) ### What problem does this PR solve? add document download endpoint and refactor existing download function ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-15 09:36:58 +08:00
Magicbook1108	f7e8c39dcc	Fix: filter api in dataset document (#14728 ) ### What problem does this PR solve? Fix: filter api in dataset document ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-09 14:45:40 +08:00
web-dev0521	d51fb88573	Fix: enforce tenant authorization on document download endpoint (#14618 ) (#14625 ) ### What problem does this PR solve? Closes #14618. The `GET /v1/document/get/<doc_id>` endpoint in `api/apps/document_app.py` was protected only by `@login_required` and called `DocumentService.get_by_id(doc_id)` without verifying that the document's knowledge base belonged to the requesting user's tenant. Any authenticated user who knew (or guessed) a document ID could download files belonging to any other tenant — a cross-tenant IDOR. This PR adds a `DocumentService.accessible(doc_id, current_user.id)` check before serving the file. The helper already exists and joins `Document` → `Knowledgebase` → `UserTenant` to verify the requesting user belongs to the tenant that owns the document's KB. The same pattern is already used by `api/apps/restful_apis/document_api.py` and mirrors the tenant scoping in the SDK route at `api/apps/sdk/doc.py`. The check returns the existing `"Document not found!"` error for both non-existent and inaccessible documents, so attackers cannot use the response to enumerate valid doc IDs across tenants. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Other (please describe): Security fix (cross-tenant IDOR / authorization bypass)	2026-05-08 14:24:03 +08:00
buua436	f703169117	Refa: migrate document preview/download to RESTful API (#14633 ) ### What problem does this PR solve? migrate document preview/download to RESTful API ### Type of change - [x] Refactoring	2026-05-08 13:26:13 +08:00
buua436	06c6da5d94	Fix: add document delete permission check (#14472 ) ### What problem does this PR solve? add document delete permission check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-30 11:01:09 +08:00
Jack	c81081f8ef	Refactor: Doc change parser (#14327 ) ### What problem does this PR solve? Before migration, Web API: POST /v1/document/change_parser; HTTP API: PATCH /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: PATCH /api/v1/datasets/<dataset_id>/documents ### Type of change - [x] Refactoring	2026-04-27 23:42:57 +08:00
Jack	c5116b90e5	Refactor: migrate document thumbnails API (#14344 ) ### What problem does this PR solve? Before migration: GET /v1/document/thumbnails After migration: GET /api/v1/thumbnails ### Type of change - [x] Refactoring	2026-04-27 21:29:09 +08:00
Jack	49912a156e	Refactor: migrate document run api (#14351 ) ### What problem does this PR solve? Before migration: POST /v1/document/run After migration: POST /api/v1/documents/ingest/ ### Type of change - [x] Refactoring	2026-04-27 21:25:58 +08:00
Jack	a536980e22	Refactor: Doc batch change status (#14337 ) ### What problem does this PR solve? Before migration, Web API: POST /v1/document/change_status. After consolidation, RESTful API: POST /api/v1/datasets/<dataset_id>/documents/batch-update-status ### Type of change - [x] Refactoring	2026-04-27 20:00:23 +08:00
Jack	61a24a2c14	Refactor: migrate doc upload info used in chat (#14359 ) ### What problem does this PR solve? Before migration: POST /v1/document/upload_info/ After migration: POST /api/v1/documents/upload/ ### Type of change - [x] Refactoring	2026-04-27 16:58:42 +08:00
Jack	290f0294d6	Refactor: migrate artifact API (#14348 ) ### What problem does this PR solve? Before migration: GET /v1/document/artifact/<filename> After migration: GET /api/v1/documents/artifact/<filename> ### Type of change - [x] Refactoring	2026-04-27 15:19:41 +08:00
buua436	a9e5724b46	Refa: unify document create flows under REST documents API (#14345 ) ### What problem does this PR solve? unify document create flows under REST documents API ### Type of change - [x] Refactoring	2026-04-27 10:18:16 +08:00
euvre	4dcc42e0e1	feat(api): add unified index API and dataset management endpoints (#14222 ) ### What problem does this PR solve? ## Summary Refactor the dataset API layer into a clean service/REST separation pattern, add a unified `/index` API for graph/raptor/mindmap operations, and introduce several new dataset management endpoints with full test coverage. ## Changes ### Service Layer (`dataset_api_service.py`) - Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace function for all index types - Added `run_index`, `delete_index` service functions - Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`, `get_ingestion_log` - Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`, `rename_tag` - Added `get_flattened_metadata`, `get_auto_metadata`, `update_auto_metadata` ### REST API Layer (`dataset_api.py`) New unified routes: \| Method \| Route \| Description \| \|--------\|-------\|-------------\| \| POST \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Run index task \| \| GET \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Trace index task \| \| DELETE \| `/datasets/<id>/<index_type>` \| Delete index \| \| GET \| `/datasets/<id>` \| Get dataset details \| \| GET \| `/datasets/<id>/ingestions/summary` \| Ingestion summary \| \| GET \| `/datasets/<id>/ingestions` \| List ingestion logs \| \| GET \| `/datasets/<id>/ingestions/<log_id>` \| Get single ingestion log \| \| POST \| `/datasets/<id>/embedding` \| Run embedding \| \| GET \| `/datasets/<id>/tags` \| List tags \| \| GET \| `/datasets/tags/aggregation` \| Aggregate tags across datasets \| \| DELETE \| `/datasets/<id>/tags` \| Delete tags \| \| PUT \| `/datasets/<id>/tags` \| Rename tag \| \| GET \| `/datasets/metadata/flattened` \| Get flattened metadata \| \| GET/PUT \| `/datasets/<id>/metadata/config` \| New metadata config path \| Removed routes (replaced by unified `/index`): - `POST /datasets/<id>/mindmap` - `GET /datasets/<id>/mindmap` 
Preserved legacy routes (backward compatibility): - `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor` - `/auto_metadata` GET/PUT ### Test Suite - Updated `common.py` helpers: added `trace_index`, removed `run_mindmap`/`trace_mindmap` - Added 7 new test files with 39 test cases total: \| Test File \| Cases \| \|-----------\|-------\| \| `test_get_dataset.py` \| 4 \| \| `test_ingestion_summary.py` \| 2 \| \| `test_ingestion_logs.py` \| 5 \| \| `test_index_api.py` \| 14 \| \| `test_embedding.py` \| 2 \| \| `test_tags.py` \| 8 \| \| `test_flattened_metadata.py` \| 4 \| - Deleted `test_mindmap_tasks.py` (covered by unified index tests) ## Design Decisions 1. Unified `/index?type=...` — single endpoint replaces 3 separate route pairs for graph/raptor/mindmap 2. Backward compatibility — old routes (`/run_graphrag`, `/run_raptor`, `/auto_metadata`) preserved alongside new paths 3. `_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}` — input validation via constant set 4. `_INDEX_TYPE_TO_TASK_ID_FIELD` — maps index type to KB model task ID field for clean dispatch ## Files Changed - `api/apps/restful_apis/dataset_api.py` - `api/apps/services/dataset_api_service.py` - `sdk/python/ragflow_sdk/modules/dataset.py` - `test/testcases/test_http_api/common.py` - `test/testcases/test_http_api/test_dataset_management/` (7 new files) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 09:38:01 +08:00
Jack	dbf8c6ed90	Refactor: Doc metadata update (#14289 ) ### What problem does this PR solve? Before migration, Web API: POST /v1/document/metadata/update. After migration, RESTful API: PATCH /api/v2/datasets/<dataset_id>/documents/metadatas ### Type of change - [x] Refactoring	2026-04-23 12:04:34 +08:00
Jack	c08cd8e090	Refactor: Migrate document metadata config update API (#14286 ) ### What problem does this PR solve? Before migration, Web API: POST /v1/document/update_metadata_setting. After consolidation, RESTful API: PUT /api/v1/datasets/<dataset_id>/documents/<document_id>/metadata/config ### Type of change - [x] Refactoring	2026-04-22 20:01:31 +08:00
Jack	3d8a82c0aa	Refactor: Consolidation WEB API & HTTP API for document delete api (#14254 ) ### What problem does this PR solve? Before consolidation, Web API: POST /v1/document/rm; HTTP API: DELETE /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: DELETE /api/v1/datasets/<dataset_id>/documents ### Type of change - [x] Refactoring	2026-04-22 10:49:52 +08:00
Jack	2d05475693	Refactor: Consolidation WEB API & HTTP API for document infos (#14239 ) ### What problem does this PR solve? Before consolidation, Web API: POST /v1/document/infos; HTTP API: GET /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: GET /api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2 ### Type of change - [ ] Refactoring	2026-04-21 19:35:11 +08:00
Jack	009e538a4e	Refactor: Consolidation WEB API & HTTP API for document get_filter (#14248 ) ### What problem does this PR solve? Before consolidation, Web API: POST /v1/document/filter; HTTP API: GET /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: GET /api/v1/datasets/<dataset_id>/documents?type=filter ### Type of change - [x] Refactoring	2026-04-21 18:55:30 +08:00
Jack	939933649a	Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176 ) ### What problem does this PR solve? Before consolidation, Web API: POST /v1/document/list; HTTP API: GET /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: GET /api/v1/datasets/<dataset_id>/documents ### Type of change - [x] Refactoring	2026-04-20 14:54:40 +08:00
Jack	bc5f78996b	Consolidation of document upload API (#14106 ) ### What problem does this PR solve? Consolidation of Web API & HTTP API for document upload. Before consolidation, Web API: POST /v1/document/upload; HTTP API: POST /api/v1/datasets/<dataset_id>/documents. After consolidation, RESTful API: POST /api/v1/datasets/<dataset_id>/documents ### Type of change - [x] Refactoring	2026-04-15 11:27:43 +08:00
Jack	576431de99	Refactor: Change update doc from PUT to PATCH (#14067 ) ### What problem does this PR solve? Before this change, `update_document` in api/apps/restful_apis/document_api.py used PUT. It now uses PATCH, which is more suitable for an update. ### Type of change - [x] Refactoring	2026-04-14 17:12:23 +08:00
Jack	4046a4cfb6	Consolidation of metadata summary API (#14031 ) ### What problem does this PR solve? Consolidation of Web API & HTTP API for document metadata summary. Before consolidation, Web API: POST /api/v1/document/metadata/summary; HTTP API: GET /v1/datasets/<dataset_id>/metadata/summary. After consolidation, RESTful API: GET /v1/datasets/<dataset_id>/metadata/summary ### Type of change - [x] Refactoring	2026-04-10 18:41:30 +08:00
Jack	577c96bf2a	Refactor: Merge document update API (#13962 ) ### What problem does this PR solve? Refactor: merge document.rename into document.update_document ### Type of change - [x] Refactoring ## Summary by CodeRabbit * New Features * Added a unified document update API (PUT) supporting name, metadata, parser/chunk settings, and status changes. * Breaking Changes * Legacy single-parameter rename endpoint removed; renames now require dataset + document identifiers. * `/list` now reads dataset id from a different query parameter. * Validation / Bug Fixes * Stricter meta_fields and parser-config validation; unauthenticated requests return 401. * Frontend * UI now sends dataset id when saving document names. * Tests * Numerous unit and HTTP tests adjusted or removed to match new API and validations. --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com> Co-authored-by: Qi Wang <wangq8@outlook.com> Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com>	2026-04-09 11:17:38 +08:00

45 Commits