## Summary
- **Backend**: `_iter_session_completion_events` in `agent_api.py` was
filtering out `user_inputs` and `workflow_finished` SSE events, causing
agents with UserFillUp components to silently fail in explore mode — the
interactive form never appeared, while the same agent worked correctly
in run (editor) mode.
- **Frontend**: `SessionChat` component in explore mode was missing
`DebugContent` children rendering inside `MessageItem`, so even if the
backend forwarded the events, the form UI would not render. Added
`DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and
input-disabling logic to match the run mode's `chat/box.tsx` behavior.
## What was changed
### Backend (`api/apps/restful_apis/agent_api.py`)
- Line 266: Added `"user_inputs"` and `"workflow_finished"` to the
allowed event filter in `_iter_session_completion_events`
### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`)
- Added imports: `DebugContent`, `MarkdownContent`,
`useAwaitCompentData`, `useParams`
- Added `sendFormMessage` from `useSendSessionMessage()` hook
- Added `useAwaitCompentData` hook for form state management
- Added `DebugContent` as `MessageItem` children for the latest
assistant message (renders UserFillUp form)
- Added `MarkdownContent` + submitted values display for previous
assistant messages
- Updated `NextMessageInput` disabled states to respect `isWaitting`
(form submission in progress)
## Test plan
- [x] Agent with UserFillUp component (e.g., email draft with
send/edit/cancel options) shows interactive form in **explore mode**
- [x] Same agent continues to work correctly in **run (editor) mode**
- [x] Form submission sends data back to the agent and workflow
continues
- [x] Input field is disabled while waiting for form submission
- [ ] Agents without UserFillUp components are unaffected in explore
mode
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Fixes#16168
## Summary
- Add session-scoped authorization for `GET
/api/v1/documents/artifact/<filename>`
- Allow download only when the artifact filename appears in the caller's
`api_4_conversation` message and
`UserCanvasService.accessible(dialog_id, user_id)` passes
- Deny with generic `"Artifact not found."` before storage access (no
cross-user enumeration)
- Return 4xx when the blob is missing (existing behavior preserved)
## Approach
Sandbox artifacts are runtime CodeExec outputs, not KB documents — this
uses the same session gate pattern as `agent_chat_completion`, not
`DocumentService.accessible`.
## Test plan
- [x] Unit: denied when filename not referenced in user sessions
- [x] Unit: denied when agent canvas is not accessible
- [x] Unit: authorized user receives bytes; missing blob returns
`"Artifact not found."`
- [ ] `pytest
test/testcases/test_web_api/test_document_app/test_document_metadata.py
-k get_artifact`
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
- harden reopened advisory fixes across REST connector, invoke, document
downloads, and markdown rendering
- add targeted regression coverage for redirect-safe SSRF handling,
invoke SSRF checks, document access control, and markdown sanitization
- verify each referenced GHSA against the original GitHub advisory text
and align the closed-advisory plan with the implemented remediation
## What changed
- add tenant access checks to document download endpoints to avoid
cross-tenant document disclosure
- add per-hop SSRF validation, DNS pinning, redirect handling, and
redirect limits to the REST API connector
- ensure invoke requests validate and pin the resolved host and never
follow redirects implicitly
- keep the generic rate-limited request path wrapped, not just GET and
POST helpers
- sanitize markdown HTML before rendering in the highlight markdown
component
## Validation
- `cd web && npm test -- --runInBand
src/components/highlight-markdown/__tests__/index.test.tsx`
- `.venv/bin/python -m pytest -q
test/unit_test/data_source/test_rest_api_connector.py`
- targeted `test/testcases/test_web_api/...` unit additions were
reviewed, but the suite cannot be executed end-to-end in this
environment because parent `test/testcases/conftest.py` requires a local
service on `127.0.0.1:9380`
## Notes
- all GHSA entries referenced by the plan were checked against the
original GitHub advisory text, not sampled
- the closed-advisory plan document was updated locally during review,
but is intentionally not included in this PR
### What problem does this PR solve?
`POST /api/v1/datasets/{dataset_id}/documents/stop`
(`stop_parse_documents`) cancels parsing tasks and sets `run` to
`CANCEL`, but it does **not** remove chunks already indexed in the doc
store or reset `progress` / `chunk_num`. REST callers can end up with a
“cancelled” document that still returns partial chunks in `GET
.../chunks` and in retrieval.
Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`)
already performs full cleanup: it resets counters and calls
`docStoreConn.delete`. This PR aligns the newer stop endpoint with that
behavior so both paths leave the dataset consistent.
Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Update `stop_parse_documents` in `document_api.py` to reset `progress`
and `chunk_num` to `0` and delete partial chunks via
`docStoreConn.delete` after `cancel_all_task_of`.
- Add unit test `test_stop_parse_documents_cleans_partial_chunks` to
assert counters reset and doc store delete is invoked.
### Test plan
- [x] Unit test: `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks
-v`
- [ ] Manual: upload a slow document, start parse, call `POST
.../documents/stop` while `RUNNING`, verify `GET .../chunks` returns
zero chunks and UI `chunk_count` is 0
- [ ] Control: legacy `DELETE .../chunks` behavior unchanged
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
## Summary
- Infer `Content-Type` from the stored document filename on SDK download
routes.
- Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and
`GET /api/v1/documents/<document_id>`.
- Aligns with REST preview/download via `CONTENT_TYPE_MAP`.
## Test plan
- [x] `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename`
- [x] Manual: `curl -sSI` on SDK dataset document download for a PDF;
expect `Content-Type: application/pdf`
Fixes#15112.
### What problem does this PR solve?
When a document is rerun or updated concurrently, the previous
unconditional update could overwrite a newer task state.
This change adds an `update_time`-based optimistic lock so the update
only succeeds if the record has not been modified by another flow in the
meantime.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
remove duplicate document preview access check
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Restore `DocumentService.accessible` on `GET
/api/v1/documents/{doc_id}/preview` so cross-tenant users cannot stream
documents by UUID.
Fixes#15501
### What problem does this PR solve?
PR #15146 (`71a52d579`) moved the agent attachment download route and
accidentally removed the `DocumentService.accessible(doc_id,
current_user.id)` guard from the REST preview handler. The endpoint
still requires login, but any authenticated user who knows another
tenant's `doc_id` can download the raw file bytes.
This restores the same authorization check that existed before #15146,
returning a generic `"Document not found!"` when access is denied (no
cross-tenant ID enumeration). SDK download routes tracked in #15125 are
unchanged.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Restore the `DocumentService.accessible(doc_id, current_user.id)` check
that PR #15146 dropped from the REST document preview handler. Any
authenticated caller could download any tenant's document bytes by
guessing/knowing the `doc_id`.
## Root cause
`api/apps/restful_apis/document_api.py` — the `GET
/documents/<doc_id>/preview` handler called `DocumentService.get_by_id`
and went straight to `File2DocumentService.get_storage_address` +
`STORAGE_IMPL.get`, with no tenant check between the lookup and the
read. The handler's docstring even promises "user must belong to the
tenant that owns the document's knowledge base" — the code didn't
enforce it.
## Fix
- Add `current_user` to the existing `api.apps` import.
- Immediately after `get_by_id`, call
`DocumentService.accessible(doc_id, current_user.id)`; on denial, return
the **same** `get_data_error_result(message="Document not found!")`
shape used for the missing-doc branch. That makes a cross-tenant probe
indistinguishable from a missing-doc probe, preventing ID enumeration
(the issue body calls this out explicitly).
- Emit `logging.warning` with caller user + doc_id for audit.
- Restores symmetry with peer routes that already call
`accessible(doc_id, user_id)` (e.g. `_run_sync` at
`document_api.py:1380`).
## Test plan
Adds
`test/unit_test/api/apps/restful_apis/test_document_preview_accessible.py`:
- **`test_cross_tenant_preview_is_denied`** — owner tenant ≠ caller
tenant; asserts the response shape is `Document not found!` and the
storage backend (`thread_pool_exec(STORAGE_IMPL.get, ...)`) is **never**
invoked.
- **`test_missing_doc_returns_not_found`** — missing-doc behaviour
unchanged.
Stub-loader pattern mirrors
`test/unit_test/api/apps/sdk/test_dify_retrieval.py` (added in #15028,
passing in CI).
## Provenance — how this fix was produced
This PR was authored against a small cited knowledge base committed in
the working tree as a `.vouch/` (see
[vouchdev/vouch](https://github.com/vouchdev/vouch)). The loop used
here:
1. **Grounding first.** Before reading the handler, queried the KB for
prior context: `vouch context "tenant scoped accessible authorization"`
→ retrieved a cited claim distilled from PR #15028 (which restored the
same `accessible()` check on `/dify/retrieval`). The retrieved rule:
> *ragflow REST endpoints that load by tenant-scoped id must call
`<Service>.accessible(id, tenant_id)` after `get_by_id` and before
storage/DB read; deny with code 109 'No authorization.' and log a
warning. Established by PR #15028.*
2. **Applied the pattern with a domain refinement.** For an API/JSON
endpoint, `No authorization.` is the right denial shape. For a
**byte-streaming, browser-facing** endpoint like `/preview`, leaking
*existence* itself enables enumeration — so per the issue's expected
behaviour, this PR denies with `Document not found!` (indistinguishable
from missing) instead. Same auth check, narrower response.
3. **Recorded the refinement back into the KB** as a new cited claim, so
the next IDOR-class issue starts already grounded in both the general
pattern and the byte-route nuance.
Net effect of the workflow: the fix replicates a known-good pattern
instead of reinventing it, *and* the place where the pattern was nuanced
is now retrievable for the next pass. Mechanism is fully independent of
this PR — it's not a runtime dependency, just process discipline.
Closes#15501
### What problem does this PR solve?
Fixes#15115.
`GET /api/v1/documents/images/<image_id>` returned **Image not found**
when the thumbnail storage object key contained hyphens (e.g.
`page-1.png`). Document APIs build URLs as `{dataset_id}-{thumbnail}`,
but `get_document_image()` used `image_id.split("-")` and required
exactly two segments, so keys like `<kb_id>-page-1.png` were rejected
even though the blob existed.
This PR splits only on the first hyphen (`split("-", 1)`) and sets
`Content-Type` from the object key extension via `CONTENT_TYPE_MAP`
instead of hardcoding `image/JPEG`.
### What problem does this PR solve?
This PR improves the table upload flow for CSV/Excel files by allowing
table column role configuration at upload time.
Previously, users had to:
1. Upload and parse a table file.
2. Open parser settings and manually set table column roles.
3. Re-parse the file for the roles to take effect.
This was inefficient and required an unnecessary second parse.
With this change:
1. When the knowledge base uses table parsing, the upload dialog
extracts CSV/Excel headers client-side.
2. Users can choose Auto mode or Manual mode.
3. In Manual mode, users can assign per-column roles before upload.
4. The selected parser config is sent with the upload request and
applied server-side during document creation.
Result: configured table column roles are applied from the first parse.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
### What problem does this PR solve?
move agent attachment download api to the correct route and update
frontend callers
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Notes
- Move the attachment download endpoint from document routes to agent
routes.
- Update frontend download callers to use the agent attachment endpoint.
- Reuse the shared file response header helper instead of duplicating it
in `agent_api.py`.
## Summary
- Align **GET `/api/v1/documents/<doc_id>/download`** with
**`/preview`**: resolve extension and MIME type from the stored document
name when the **`ext` query parameter is omitted**, instead of
defaulting to `markdown`.
- When **`?ext=`** is present, behavior stays the same as before
(explicit extension / `Content-Type` mapping).
- Enforce the same access + document lookup pattern as preview
(**`accessible`** + **`get_by_id`**).
- Extend unit tests for the no-`ext` PDF filename case.
## Test plan
- [x] `uv run pytest
test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_download_attachment_success_and_exception_unit`
- [x] Optional: `curl -sSI` against
`/api/v1/documents/<pdf_doc_id>/download` without `ext` and confirm
`Content-Type: application/pdf`
Fixes#15052.
### What problem does this PR solve?
When _parse_doc_id_filter_with_metadata returns [], the empty list is
falsy so the WHERE id IN (...) clause was silently skipped, causing the
full dataset to be returned instead of an empty result.
Change `if doc_ids:` to `if doc_ids is not None:` in both get_list() and
get_by_kb_id() to distinguish between no filter (None) and a filter that
matched zero documents ([]).
Fixes#14962
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
add document download endpoint and refactor existing download function
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Closes#14618.
The `GET /v1/document/get/<doc_id>` endpoint in
`api/apps/document_app.py` was protected only by `@login_required` and
called `DocumentService.get_by_id(doc_id)` without verifying that the
document's knowledge base belonged to the requesting user's tenant. Any
authenticated user who knew (or guessed) a document ID could download
files belonging to any other tenant — a cross-tenant IDOR.
This PR adds a `DocumentService.accessible(doc_id, current_user.id)`
check before serving the file. The helper already exists and joins
`Document` → `Knowledgebase` → `UserTenant` to verify the requesting
user belongs to the tenant that owns the document's KB. The same pattern
is already used by `api/apps/restful_apis/document_api.py` and mirrors
the tenant scoping in the SDK route at `api/apps/sdk/doc.py`.
The check returns the existing `"Document not found!"` error for both
non-existent and inaccessible documents, so attackers cannot use the
response to enumerate valid doc IDs across tenants.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe): Security fix (cross-tenant IDOR /
authorization bypass)
### What problem does this PR solve?
Before migration
Web API: POST /v1/document/change_parser
HTTP API: PATCH /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API
PATCH /api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration: GET /v1/document/thumbnails
After migration: GET /api/v1/thumbnails
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration: POST /v1/document/run
After migration: POST /api/v1/documents/ingest/
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration
Web API: POST /v1/document/change_status
After consolidation, Restful API
POST /api/v1/datasets/<dataset_id>/documents/batch-update-status
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration: POST /v1/document/upload_info/
After migration: POST /api/v1/documentss/upload/
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration: GET /v1/document/artifact/<filename>
After migration: GET /api/v1/documents/artifact/<filename>
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration
Web API: POST /v1/document/metadata/update
After migration, Restful API
PATCH /api/v2/datasets/<dataset_id>/documents/metadatas
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before migration
Web API: POST /v1/document/update_metadata_setting
After consolidation, Restful API
PUT
/api/v1/datasets/<dataset_id>/documents/<document_id>/metadata/config
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/rm
Http API - DELETE /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- DELETE
/api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/infos
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2
### Type of change
- [ ] Refactoring
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/filter
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents?type=filter
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/list
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Consolidation WEB API & HTTP API for document upload
Before consolidation
Web API: POST /v1/document/upload
Http API - POST /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- POST
/api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before change, update_document in api/apps/restful_apis/document_api.py
is using "PUT".
After change, it will use "PATCH" which is more suitable.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Consolidation WEB API & HTTP API for document metadata summary
Before consolidation
Web API: POST /api/v1/document/metadata/summary
Http API - GET /v1/datasets/<dataset_id>/metadata/summary
After consolidation, Restful API -- GET
/v1/datasets/<dataset_id>/metadata/summary
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Refactor: merge document.rename into document.update_document
### Type of change
- [x] Refactoring
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a unified document update API (PUT) supporting name, metadata,
parser/chunk settings, and status changes.
* **Breaking Changes**
* Legacy single-parameter rename endpoint removed; renames now require
dataset + document identifiers.
* `/list` now reads dataset id from a different query parameter.
* **Validation / Bug Fixes**
* Stricter meta_fields and parser-config validation; unauthenticated
requests return 401.
* **Frontend**
* UI now sends dataset id when saving document names.
* **Tests**
* Numerous unit and HTTP tests adjusted or removed to match new API and
validations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
Co-authored-by: Qi Wang <wangq8@outlook.com>
Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>