11 Commits

Author SHA1 Message Date
Hz_
a6cc3023c5 feat(go-api): implement dataset document upload API (#16295)
## Summary
Migrated the dataset document upload API (`POST
/api/v1/datasets/:dataset_id/documents`) from Python to the Go backend.
It supports local file uploads (`type=local`), web page ingestion
(`type=web`), and empty document creation (`type=empty`).

## Changes
- **Router**: Registered `POST /api/v1/datasets/:dataset_id/documents`
route.
- **Handler**: Implemented `UploadDocuments` handler and its routing
functions (`uploadLocalDocuments`, `uploadWebDocument`,
`uploadEmptyDocument`).
- **Service**: Implemented `UploadLocalDocuments`, `UploadWebDocument`,
and `UploadEmptyDocument` in `DocumentService`.
- **Refactoring**: Moved permission checking logic to a shared helper
for reuse in file and document services.
- **Tests**: Added comprehensive unit tests for the new handler and
service upload paths.

## Verification
Ran and passed the test suite for service and handler packages:
- `go test ./internal/service`
- `go test ./internal/handler`
2026-06-25 13:36:49 +08:00
Haruko386
dd46ece3bc feat[go]: datasets/<dataset_id>/chunks DELETE (#16185)
### What problem does this PR solve?

As title:

`documents.POST("/ingest", r.documentHandler.Ingest)`:

---

<img width="3750" height="2039" alt="image"
src="https://github.com/user-attachments/assets/533c1c3d-af3e-47e6-9f51-a278539b7066"
/>

`datasets.DELETE("/:dataset_id/chunks", r.chunkHandler.StopParsing)`

---

<img width="3621" height="2040" alt="image"
src="https://github.com/user-attachments/assets/022adcdb-1e47-4883-9611-1a695c34007d"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-24 19:43:18 +08:00
Hz_
e35860ad74 feat(go-api): Align document metadata batch APIs and upload_info with Python (#16269)
## Summary

  Align the Go implementations of these APIs with the Python behavior:

  - `POST /api/v1/datasets/:dataset_id/metadata/update`
  - `PATCH /api/v1/datasets/:dataset_id/documents/metadatas`
  - `POST /api/v1/documents/upload`

  ## What changed

  - Added the Go routes and handlers for the 3 APIs.
  - Aligned batch document metadata updates with Python semantics:
    - support `match` in update items
    - support list append / replace behavior
    - support deleting specific list values
    - remove metadata entirely when it becomes empty
- create metadata for documents that previously had none when updates
apply
    - count `updated` only when a document actually changes
- Aligned `documents/upload` file uploads with Python-style
`upload_info` behavior:
    - store upload-info blobs in the per-user downloads bucket
- return lightweight upload descriptors instead of normal
file-management responses
  - Improved URL upload behavior:
    - SSRF-guarded fetch with redirect validation
    - redirect limit aligned to Python behavior
    - normalize filename and MIME type
    - add `.pdf` when the fetched content is PDF
- normalize HTML content into readable text instead of storing raw HTML
shells

  ## Validation

  ### Unit tests

  Passed:

  - `go test ./internal/service`
  - `go test ./internal/handler`

  Also verified targeted cases for:

  - batch metadata update semantics
  - upload_info URL handling
  - upload_info download bucket behavior

  ### curl checks

Verified the new Go endpoints with `curl` and compared the response
shape and behavior with Python for:

  - `POST /api/v1/datasets/{dataset_id}/metadata/update`
  - `PATCH /api/v1/datasets/{dataset_id}/documents/metadatas`
  - `POST /api/v1/documents/upload`

  The Go responses were checked against Python for:
  - argument validation
  - success response shape
  - metadata update results
  - upload_info result structure
  - file vs URL input handling
2026-06-24 14:52:47 +08:00
Haruko386
5046626c17 feat[Go]: implement /datasets/<dataset_id>/documents/batch-update-status (#16258)
### What problem does this PR solve?

accident close #16072

As title

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-23 19:19:08 +08:00
Hz_
f59332bc37 feat(go-api): implement Go-side document PATCH API & align parsing/metadata sync behavior (#15975)
### What problem does this PR solve?

This PR implements the Go backend counterpart for the document partial
update API:
`PATCH /api/v1/datasets/:dataset_id/documents/:document_id`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-06-18 11:08:47 +08:00
Jin Hai
e96bc37d06 Go: use NATS as the message queue (#15327)
### What problem does this PR solve?

```
RAGFlow(admin)> mq publish 'msg2';
SUCCESS
RAGFlow(admin)> mq publish 'msg3';
SUCCESS
RAGFlow(admin)> mq list;
+---------+---------------+
| message | subject       |
+---------+---------------+
| msg1    | tasks.RAGFLOW |
| msg2    | tasks.RAGFLOW |
| msg3    | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq pull 2;
+---------+---------------+
| message | subject       |
+---------+---------------+
| msg1    | tasks.RAGFLOW |
| msg2    | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq pull noack;
+---------+---------------+
| message | subject       |
+---------+---------------+
| abc     | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq show
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+
| ack_pending_count | consumer_count | memory | message_count | pending_count | redelivered_count | waiting_count |
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+
| 2                 | 1              | 0      | 2             | 0             | 1                 | 0             |
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+

RAGFlow(admin)> list ingestors;
+--------------+-------------------------------------------+--------+
| host         | name                                      | status |
+--------------+-------------------------------------------+--------+
| 192.168.1.38 | ingestor-8f0e4bd5650a4ac58b0151969fbf6935 | alive  |
+--------------+-------------------------------------------+--------+

RAGFlow(admin)> list ingestion tasks;
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+
| document_id                      | id                               | status    | step | user        | user_id                          |
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+
| ffe64fae423411f1a2d938a74640adcc | 90d3d0f6528941c1ac8eb0360effccc4 | COMPLETED | 5    | aaa@aaa.com | 2ba4881420fa11f19e9c38a74640adcc |
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+

RAGFlow(admin)> remove ingestion tasks '90d3d0f6528941c1ac8eb0360effccc4';
+---------+----------------------------------+
| delete  | task_id                          |
+---------+----------------------------------+
| success | 90d3d0f6528941c1ac8eb0360effccc4 |
+---------+----------------------------------+

RAGFlow(admin)> stop ingestion tasks 'e89e20d9a25848a1b79bd9345ddbfe1d';
+----------+----------------------------------+
| status   | task_id                          |
+----------+----------------------------------+
| STOPPING | e89e20d9a25848a1b79bd9345ddbfe1d |
+----------+----------------------------------+

# Publish a message
RAGFlow(admin)> mq publish 'cdd';
SUCCESS

# List current tasks in the message queue
RAGFlow(admin)> mq list
+----------------------------------+---------------+
| message                          | subject       |
+----------------------------------+---------------+
| 7ce392a3c1624cd2be4b5276e8825059 | tasks.RAGFLOW |
+----------------------------------+---------------+

# Consume a task from the message queue
RAGFlow(admin)> mq pull
+------+-----+----------------+
| ack  | id  | type           |
+------+-----+----------------+
| true | cdd | ingestion_test |
+------+-----+----------------+

# User mode
# List ingestion tasks, followed by dataset id
RAGFlow(user)> list ingestion tasks from '0abe79f9423311f1ad8d38a74640adcc';
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d |        | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

RAGFlow(user)> list ingestion tasks;
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-06-02T19:02:31+08:00 | 1780398151417 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | e89e20d9a25848a1b79bd9345ddbfe1d |        | COMPLETED | 2026-06-02T19:02:52+08:00 | 1780398172208 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

# Create an ingestion task
# First argument is document id, second argument is dataset id
RAGFlow(user)> start ingestion 'ffe64fae423411f1a2d938a74640adcc' from '0abe79f9423311f1ad8d38a74640adcc';
+----------------------------------+-------------------------------------------+
| document_id                      | result                                    |
+----------------------------------+-------------------------------------------+
| ffe64fae423411f1a2d938a74640adcc | task_id: 8d758cd14a8b4ba8ab505003fb52017d |
+----------------------------------+-------------------------------------------+

# Pause an ingestion task, first argument is ingestion id
RAGFlow(user)> stop ingestion '8d758cd14a8b4ba8ab505003fb52017d';
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d |        | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

# Delete an ingestion task
RAGFlow(api/default)> remove ingestion tasks 'f366450a27d54677aec1c7090add30f0';
+---------+----------------------------------+
| remove  | task_id                          |
+---------+----------------------------------+
| success | f366450a27d54677aec1c7090add30f0 |
+---------+----------------------------------+

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-12 14:56:44 +08:00
Hz_
d4fe3bb148 feat(go-api): Add GET dataset metadata summary API (#15843)
## What

Adds the RESTful dataset metadata summary endpoint:

`GET /api/v1/datasets/{dataset_id}/metadata/summary`

The endpoint supports optional document filtering through:

`?doc_ids=doc_id_1,doc_id_2`
2026-06-09 19:27:47 +08:00
Jack
338fdb65fb feat(ci): enable go test in CI pipeline (#15750)
## What problem does this PR solve?

Go test files are never compiled in CI — only production binaries via
`go build`. This allowed a missing `"sort"` import in
`metadata_filter_test.go` to be merged without detection.

## Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

## Changes

- Add `go test -count=1 ./internal/...` step after Go build in CI
workflow
- Fix missing `"sort"` import in `metadata_filter_test.go` (pre-existing
compile error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 20:06:57 +08:00
Hz_
074c331cdf fix(go-api): sync document handler interface and enforce preview acce… (#15688)
### Description

This PR syncs the `documentServiceIface` interface and introduces
handler methods for document preview, artifact fetching, and downloading
in the Go API. It also ensures that strict dataset alignment and access
checks are enforced when retrieving or downloading documents.

Furthermore, this PR introduces comprehensive unit tests for both the
newly added Handler and Service methods to ensure robustness and prevent
future regressions.

### Key Changes
* **Router & Handler Integration**: 
  * Added and wired new API endpoints in `internal/router/router.go`.
* Synchronized the `documentServiceIface` with `GetDocumentArtifact`,
`GetDocumentPreview`, and `DownloadDocument`.
* Implemented handlers for these endpoints in
`internal/handler/document.go`.
* **Access & Validation Enforcement**: 
* Refactored `internal/service/document.go` to strictly check if a
document belongs to the requested dataset before allowing downloads or
previews.
* Added robust artifact file sanitization (`sanitizeArtifactFilename`)
and attachment handling (`shouldForceArtifactAttachment`).
* **Comprehensive Unit Testing**:
* **Handler Layer (`internal/handler/document_test.go`)**: Added mock
service implementations and Gin router tests covering success,
not-found, and internal error states for all 3 new endpoints.
* **Service Layer (`internal/service/document_test.go`)**: Added
extensive business logic tests including dataset mismatch checks,
non-existent document checks, and artifact file validation.
2026-06-08 11:37:06 +08:00
Jack
c6eee09ed3 feat: migrate POST /api/v1/datasets/<dataset_id>/documents/stop to Go (#15597)
## Summary

Migrate the stop parse documents endpoint from Python to Go.

### Python endpoint
`POST /api/v1/datasets/<dataset_id>/documents/stop` —
`api/apps/restful_apis/document_api.py:1542-1641`

### Changes
| File | Change |
|------|--------|
| `internal/dao/task.go` | Add `GetByDocID` method |
| `internal/dao/task_test.go` | 3 DAO tests (new file) |
| `internal/service/document.go` | Add `StopParseDocuments` + refactor
shared helpers |
| `internal/service/document_test.go` | 8 service tests |
| `internal/handler/document.go` | Add handler + request struct +
interface |
| `internal/handler/document_test.go` | 5 handler tests |
| `internal/router/router.go` | Add `POST /:dataset_id/documents/stop`
route |

### How it works
1. Validates all document IDs belong to the dataset
2. For each document in RUNNING/CANCEL state (or with unfinished tasks):
- Sets Redis cancel signal `{task_id}-cancel` for each associated task
   - Updates `document.run` to CANCEL ("2")
3. Returns `{"success_count": N, "errors": [...]}`

### Test strategy
- **DAO/Service**: SQLite in-memory DB, zero mocks. Redis is nil-safe by
design.
- **Handler**: `fakeDocumentService` implementing `documentServiceIface`
interface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-04 14:16:13 +08:00
Jack
67c3e73d70 feat: migrate DELETE /api/v1/datasets/:dataset_id/documents to Go (#15577)
## Summary

Migrate the batch document deletion endpoint from Python to Go. Two
modes supported: explicit `ids` list and `delete_all`.

## Changes

| File | Change |
|------|--------|
| `internal/dao/file2document.go` | Add `GetByDocumentID`,
`DeleteByDocumentID` |
| `internal/dao/file2document_test.go` | 5 new tests |
| `internal/dao/kb_test.go` | 2 new tests (`DecreaseDocumentNum`) |
| `internal/service/document.go` | Add `deleteDocumentFull` +
`DeleteDocuments`, refactor `DeleteDocument` |
| `internal/service/document_test.go` | 10 new tests |
| `internal/handler/document.go` | Add `documentServiceIface` +
`DeleteDocuments` handler |
| `internal/handler/document_test.go` | 7 new tests |
| `internal/router/router.go` | Register `DELETE /:dataset_id/documents`
|
| `cmd/server_main.go` | Support `RAGFLOW_DICT_PATH` env var |
| `internal/binding/rag_analyzer.go` | Use `-lpcre2-8` dynamic linking |
| `internal/dao/database.go` | Skip Error 1091/1138 during migration |
| `internal/service/llm.go` | Fix vet warning |

## Per-document cleanup

- Delete tasks from DB
- Hard-delete document + decrement KB counters
- Delete chunks from document engine (nil-guarded)
- Delete metadata from document engine (nil-guarded)
- Remove file2document mapping + file record + storage blob

## Test Results

**24 unit tests all passing** (7 DAO + 10 service + 7 handler) using
SQLite :memory: + gin.TestMode.

See [test report](docs/test_report_delete_documents.md) for manual
integration test results.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 20:55:53 +08:00