mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
## Summary
Align the Go implementations of these APIs with the Python behavior:
- `POST /api/v1/datasets/:dataset_id/metadata/update`
- `PATCH /api/v1/datasets/:dataset_id/documents/metadatas`
- `POST /api/v1/documents/upload`
## What changed
- Added the Go routes and handlers for the 3 APIs.
- Aligned batch document metadata updates with Python semantics:
- support `match` in update items
- support list append / replace behavior
- support deleting specific list values
- remove metadata entirely when it becomes empty
- create metadata for documents that previously had none when updates
apply
- count `updated` only when a document actually changes
- Aligned `documents/upload` file uploads with Python-style
`upload_info` behavior:
- store upload-info blobs in the per-user downloads bucket
- return lightweight upload descriptors instead of normal
file-management responses
- Improved URL upload behavior:
- SSRF-guarded fetch with redirect validation
- redirect limit aligned to Python behavior
- normalize filename and MIME type
- add `.pdf` when the fetched content is PDF
- normalize HTML content into readable text instead of storing raw HTML
shells
## Validation
### Unit tests
Passed:
- `go test ./internal/service`
- `go test ./internal/handler`
Also verified targeted cases for:
- batch metadata update semantics
- upload_info URL handling
- upload_info download bucket behavior
### curl checks
Verified the new Go endpoints with `curl` and compared the response
shape and behavior with Python for:
- `POST /api/v1/datasets/{dataset_id}/metadata/update`
- `PATCH /api/v1/datasets/{dataset_id}/documents/metadatas`
- `POST /api/v1/documents/upload`
The Go responses were checked against Python for:
- argument validation
- success response shape
- metadata update results
- upload_info result structure
- file vs URL input handling