Files
ragflow/internal/router
Hz_ e35860ad74 feat(go-api): Align document metadata batch APIs and upload_info with Python (#16269)
## Summary

  Align the Go implementations of these APIs with the Python behavior:

  - `POST /api/v1/datasets/:dataset_id/metadata/update`
  - `PATCH /api/v1/datasets/:dataset_id/documents/metadatas`
  - `POST /api/v1/documents/upload`

  ## What changed

  - Added the Go routes and handlers for the 3 APIs.
  - Aligned batch document metadata updates with Python semantics:
    - support `match` in update items
    - support list append / replace behavior
    - support deleting specific list values
    - remove metadata entirely when it becomes empty
- create metadata for documents that previously had none when updates
apply
    - count `updated` only when a document actually changes
- Aligned `documents/upload` file uploads with Python-style
`upload_info` behavior:
    - store upload-info blobs in the per-user downloads bucket
- return lightweight upload descriptors instead of normal
file-management responses
  - Improved URL upload behavior:
    - SSRF-guarded fetch with redirect validation
    - redirect limit aligned to Python behavior
    - normalize filename and MIME type
    - add `.pdf` when the fetched content is PDF
- normalize HTML content into readable text instead of storing raw HTML
shells

  ## Validation

  ### Unit tests

  Passed:

  - `go test ./internal/service`
  - `go test ./internal/handler`

  Also verified targeted cases for:

  - batch metadata update semantics
  - upload_info URL handling
  - upload_info download bucket behavior

  ### curl checks

Verified the new Go endpoints with `curl` and compared the response
shape and behavior with Python for:

  - `POST /api/v1/datasets/{dataset_id}/metadata/update`
  - `PATCH /api/v1/datasets/{dataset_id}/documents/metadatas`
  - `POST /api/v1/documents/upload`

  The Go responses were checked against Python for:
  - argument validation
  - success response shape
  - metadata update results
  - upload_info result structure
  - file vs URL input handling
2026-06-24 14:52:47 +08:00
..