Harden closed-advisory fixes (#16409)

## Summary
- harden reopened advisory fixes across REST connector, invoke, document
downloads, and markdown rendering
- add targeted regression coverage for redirect-safe SSRF handling,
invoke SSRF checks, document access control, and markdown sanitization
- verify each referenced GHSA against the original GitHub advisory text
and align the closed-advisory plan with the implemented remediation

## What changed
- add tenant access checks to document download endpoints to avoid
cross-tenant document disclosure
- add per-hop SSRF validation, DNS pinning, redirect handling, and
redirect limits to the REST API connector
- ensure invoke requests validate and pin the resolved host and never
follow redirects implicitly
- keep the generic rate-limited request path wrapped, not just GET and
POST helpers
- sanitize markdown HTML before rendering in the highlight markdown
component
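
The per-hop SSRF validation, DNS pinning, and redirect limit described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the connector's actual code: `SSRFError`, `is_public_ip`, `resolve_and_validate`, `fetch_with_checked_redirects`, `MAX_REDIRECTS`, and the `send` callable are all hypothetical names. `send(url, pinned_ip)` is assumed to perform exactly one request against the pinned IP without following redirects, and to return `(status, headers, body)`.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 5  # hypothetical limit; the real value is not given in this PR


class SSRFError(Exception):
    """Raised when a URL or redirect target fails validation (hypothetical)."""


def is_public_ip(ip: str) -> bool:
    """Return True only for addresses that are safe to reach from the server."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def resolve_and_validate(host, port, resolver=socket.getaddrinfo):
    """Resolve once, reject if any answer is non-public, return the IP to pin."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    ips = [info[4][0] for info in infos]
    if not ips:
        raise SSRFError(f"no addresses for {host}")
    for ip in ips:
        if not is_public_ip(ip):
            raise SSRFError(f"{host} resolves to blocked address {ip}")
    return ips[0]


def fetch_with_checked_redirects(url, send, resolver=socket.getaddrinfo,
                                 max_redirects=MAX_REDIRECTS):
    """Follow redirects manually, validating and pinning the host on every hop."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise SSRFError(f"unsupported URL: {url}")
        port = parts.port or (443 if parts.scheme == "https" else 80)
        pinned_ip = resolve_and_validate(parts.hostname, port, resolver)
        # `send` must connect to pinned_ip and must not follow redirects itself.
        status, headers, body = send(url, pinned_ip)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = urljoin(url, headers["Location"])
            continue
        return status, body
    raise SSRFError("too many redirects")
```

Validating every hop matters because a public host can answer with a redirect to an internal address. Pinning the resolved IP for the actual connection keeps a second DNS lookup from returning a different, unvalidated address.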

## Validation
- `cd web && npm test -- --runInBand src/components/highlight-markdown/__tests__/index.test.tsx`
- `.venv/bin/python -m pytest -q test/unit_test/data_source/test_rest_api_connector.py`
- targeted `test/testcases/test_web_api/...` unit additions were
reviewed, but the suite cannot be executed end-to-end in this
environment because the parent `test/testcases/conftest.py` requires a
local service on `127.0.0.1:9380`

## Notes
- all GHSA entries referenced by the plan were checked against the
original GitHub advisory text, not sampled
- the closed-advisory plan document was updated locally during review,
but is intentionally not included in this PR

Author: Zhichang Yu
Date: 2026-06-28 11:17:54 +08:00
Committed by: GitHub
Parent: f90be41eab
Commit: c4fe68eaa0
11 changed files with 398 additions and 12 deletions

@@ -2003,6 +2003,10 @@ async def download(dataset_id, document_id):
     """
     if not document_id:
         return get_error_data_result(message="Specify document_id please.")
+    if not KnowledgebaseService.accessible(kb_id=dataset_id, user_id=current_user.id):
+        return get_data_error_result(message="Document not found!")
+    if not DocumentService.accessible(document_id, current_user.id):
+        return get_data_error_result(message="Document not found!")
     doc = DocumentService.query(kb_id=dataset_id, id=document_id)
     if not doc:
         return get_error_data_result(message=f"The dataset not own the document {document_id}.")
@@ -2060,6 +2064,8 @@ async def download_document(document_id):
     """
     if not document_id:
         return get_error_data_result(message="Specify document_id please.")
+    if not DocumentService.accessible(document_id, current_user.id):
+        return get_data_error_result(message="Document not found!")
     doc = DocumentService.query(id=document_id)
     if not doc:
         return get_error_data_result(message=f"The dataset not own the document {document_id}.")
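
The ordering of the checks in the two hunks above can be illustrated with a small self-contained sketch. It assumes in-memory stand-ins (`DOCUMENTS`, `KB_USERS`, `kb_accessible`, `doc_accessible`) in place of the real `KnowledgebaseService.accessible` and `DocumentService.accessible`, whose implementations are not shown in this diff. The tuple return value and the final error message are also hypothetical. The point it shows is that both access failures return the same "Document not found!" message, so a caller without access cannot tell a missing document from one owned by another tenant.

```python
from dataclasses import dataclass

NOT_FOUND = "Document not found!"  # same message the diff returns on denial


@dataclass(frozen=True)
class Document:
    id: str
    kb_id: str


# Hypothetical in-memory stand-ins for the real service layer.
DOCUMENTS = {
    "doc-1": Document("doc-1", "kb-a"),
    "doc-2": Document("doc-2", "kb-b"),
}
KB_USERS = {"kb-a": {"alice", "carol"}, "kb-b": {"bob", "carol"}}


def kb_accessible(kb_id: str, user_id: str) -> bool:
    """Assumed behaviour: the user belongs to the dataset's tenant."""
    return user_id in KB_USERS.get(kb_id, set())


def doc_accessible(document_id: str, user_id: str) -> bool:
    """Assumed behaviour: the document exists and its dataset is accessible."""
    doc = DOCUMENTS.get(document_id)
    return doc is not None and kb_accessible(doc.kb_id, user_id)


def download(dataset_id: str, document_id: str, user_id: str):
    """Mirror the check order of the patched endpoint; returns (ok, payload)."""
    if not document_id:
        return False, "Specify document_id please."
    # Access checks run before any lookup that could reveal existence.
    if not kb_accessible(dataset_id, user_id):
        return False, NOT_FOUND
    if not doc_accessible(document_id, user_id):
        return False, NOT_FOUND
    doc = DOCUMENTS[document_id]
    if doc.kb_id != dataset_id:
        return False, f"Document {document_id} is not in dataset {dataset_id}."
    return True, doc.id
```

A regression test for the cross-tenant case would then assert that a user from one tenant receives the generic not-found result for another tenant's document, for example `download("kb-b", "doc-2", "alice")`.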