## Summary
Align the Go implementations of these APIs with the Python behavior:
- `POST /api/v1/datasets/:dataset_id/metadata/update`
- `PATCH /api/v1/datasets/:dataset_id/documents/metadatas`
- `POST /api/v1/documents/upload`
## What changed
- Added the Go routes and handlers for the 3 APIs.
- Aligned batch document metadata updates with Python semantics:
- support `match` in update items
- support list append / replace behavior
- support deleting specific list values
- remove metadata entirely when it becomes empty
- create metadata for documents that previously had none when updates
apply
- count `updated` only when a document actually changes
- Aligned `documents/upload` file uploads with Python-style
`upload_info` behavior:
- store upload-info blobs in the per-user downloads bucket
- return lightweight upload descriptors instead of normal
file-management responses
- Improved URL upload behavior:
- SSRF-guarded fetch with redirect validation
- redirect limit aligned to Python behavior
- normalize filename and MIME type
- add `.pdf` when the fetched content is PDF
- normalize HTML content into readable text instead of storing raw HTML
shells
## Validation
### Unit tests
Passed:
- `go test ./internal/service`
- `go test ./internal/handler`
Also verified targeted cases for:
- batch metadata update semantics
- upload_info URL handling
- upload_info download bucket behavior
### curl checks
Verified the new Go endpoints with `curl` and compared the response
shape and behavior with Python for:
- `POST /api/v1/datasets/{dataset_id}/metadata/update`
- `PATCH /api/v1/datasets/{dataset_id}/documents/metadatas`
- `POST /api/v1/documents/upload`
The Go responses were checked against Python for:
- argument validation
- success response shape
- metadata update results
- upload_info result structure
- file vs URL input handling
### Description
Migrates the datasets tags aggregation API `GET
/api/v1/datasets/tags/aggregation` from Python to Go.
### Changes
- Registered the `GET /api/v1/datasets/tags/aggregation` route.
- Implemented `AggregateTags` in datasets `handler` and `service`.
- Added handler and service `unit tests`.
### Test Verification
- Verified by comparing results between Python (9380) and Go (9384)
services.
- Tested scenarios: single dataset, multiple datasets, empty parameters,
and unauthorized/invalid IDs.
- All tests and Go `unit tests` passed.
### What problem does this PR solve?
```
RAGFlow(api/default)> add admin host '127.0.0.1:9383';
SUCCESS
RAGFlow(api/default)> use admin;
SUCCESS
RAGFlow(admin)> delete api 'default';
SUCCESS
RAGFlow(admin)> delete api 'default';
CLI error: api server: default not found
RAGFlow(admin)> add api 'default' host '127.0.0.1:9384';
SUCCESS
RAGFlow(admin)> use api 'default';
SUCCESS
RAGFlow(api/default)> delete admin
SUCCESS
RAGFlow(api/default)> delete admin;
CLI error: admin server not exists
RAGFlow(api/default)> list api server;
+------------+---------------+-----------------+---------+
| api_server | api_server_ip | api_server_port | auth |
+------------+---------------+-----------------+---------+
| default | 127.0.0.1 | 9384 | no auth |
+------------+---------------+-----------------+---------+
RAGFlow(api/default)> add admin host '127.0.0.1:9383';
SUCCESS
RAGFlow(api/default)> show admin server;
+-------------------+-----------+
| field | value |
+-------------------+-----------+
| admin_server_ip | 127.0.0.1 |
| admin_server_port | 9383 |
| auth | no auth |
+-------------------+-----------+
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
When I run RAGFlow_server.py:
```
2026-06-24 10:27:01,938 ERROR 3413485 fetch task exception
Traceback (most recent call last):
File "/home/infiniflow/Documents/development/ragflow/api/db/services/document_service.py", line 948, in _sync_progress
if t.progress_msg.strip():
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
```
fixed:
```python
if t.progress_msg.strip():
# fix:
if (t.progress_msg or "").strip():
```
Fix crash in `_sync_progress` when `progress_msg` is `None`.
#### Root Cause
`progress_msg` from task records can be `None`, causing:
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR follows up on
[#15863](https://github.com/infiniflow/ragflow/pull/15863) (Korean i18n)
with translation refinements and i18n coverage for hardcoded strings
found in the UI.
- Refine awkward Korean phrasing (e.g. 'Chunk 만들기' → 'Chunk 생성', '유형' →
'타입', etc.)
- Apply i18n to hardcoded strings in `message-item`,
`next-message-item`, `multi-select`, `chat-prompt-engine`, and various
filter hooks
- Rename `use-selelct-filters.ts` → `use-select-filters.ts` (typo fix)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix release
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
This PR fixes two issues discovered during testing of the PaddleOCR
async API refactoring:
### 1. PP-OCRv6 returns `ocrResults` instead of `layoutParsingResults`
Models like PP-OCRv6 are pure text recognition models that return
results in `ocrResults.prunedResult.rec_texts` format rather than the
`layoutParsingResults.prunedResult.parsing_res_list` format used by
layout-aware models (PaddleOCR-VL series).
**Changes:**
- `deepdoc/parser/paddleocr_parser.py`: Extract `ocrResults` alongside
`layoutParsingResults` in `_send_request()`, add fallback logic in
`_transfer_to_sections()` and `parse_image()`
- `internal/entity/models/paddleocr.go`: Add `ocrResults` struct and
fallback extraction in Go OCR handler
### 2. Image parsing not integrated into picture chunker
The `parse_image()` method existed in PaddleOCRParser but was never
called from `rag/app/picture.py` (the module that handles image file
uploads). Users configuring PaddleOCR as their layout recognizer would
still get local deepdoc OCR for images.
**Changes:**
- `rag/app/picture.py`: When `layout_recognize` is set to PaddleOCR, use
`PaddleOCROcrModel.parse_image()` instead of local OCR. Falls back
gracefully to local OCR on failure.
## Testing
Verified end-to-end in Docker:
- PaddleOCR-VL-1.6 PDF parsing: ✅ (10 text blocks with bbox)
- PaddleOCR-VL-1.6 image parsing: ✅ (219 chars)
- PP-OCRv6 PDF parsing with ocrResults fallback: ✅ (10 text blocks)
- PP-OCRv6 image parsing with ocrResults fallback: ✅ (136 chars)
## Related PRs
- #15967 (merged) - PaddleOCR async Job API refactoring + new models
- #16086 (merged) - PaddleOCR image parsing support
### What problem does this PR solve?
`_get_optimal_clusters` in `rag/raptor.py` had two edge-case issues in
GMM cluster-count selection:
1. It used `np.arange(1, max_clusters)`, which never evaluates the
upper-bound candidate (`max_clusters`).
2. When effective `max_clusters` becomes `1`, the candidate list was
empty and `argmin` crashed.
This PR makes candidate evaluation inclusive (`1..max_clusters`) and
guards the single-cluster case by returning `1` directly.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Validation
- `pytest test/unit_test/rag/test_raptor_psi_tree_builder.py
--config-file pyproject.toml -q`
- `ruff check rag/raptor.py
test/unit_test/rag/test_raptor_psi_tree_builder.py`
### Tests added
- Regression test for `max_cluster == 1` path (no crash, returns 1)
- Regression test verifying upper-bound candidate is evaluated and can
be selected
_AI-assistance disclosure: parts of this change (bug triage and test
scaffolding) were drafted with AI assistance and fully reviewed and
verified by me._
---------
Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
### What problem does this PR solve?
- Tools management
- Pregel engine wrapper for better usage
- UT race
- Coding style
### Type of change
- [x] Refactoring
### What problem does this PR solve?
```
RAGFlow(admin)> show model 'abc';
+------------+----------------------------------------------------------------+
| field | value |
+------------+----------------------------------------------------------------+
| command | get_model_by_model_name |
| error | 'get model by model name' is implemented in enterprise edition |
| model_name | abc |
+------------+----------------------------------------------------------------+
RAGFlow(admin)> list models;
+-----------------+--------------------------------------------------------+
| command | error |
+-----------------+--------------------------------------------------------+
| list_all_models | 'list all models' is implemented in enterprise edition |
+-----------------+--------------------------------------------------------+
```
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Implement:
1. `/api/v1/datasets/<dataset_id>/documents/<document_id>/chunks GET`
2.
`/api/v1/datasets/<dataset_id>/documents/<document_id>/chunks/<chunk_id>
PATCH`
3. `/api/v1/datasets/<dataset_id>/documents/<document_id>/chunks PATCH`
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Provider default public key for CLI
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Refactor the Go agent port's logging so every log line — gin access,
agent canvas events, harness warnings, fatal boot errors — flows through
a single common.Logger (zap) backed by a rotated file, with structured
fields, level filtering, and configurable rotation.
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Problem
`thread_pool_exec()` dispatches work via `loop.run_in_executor()`, which
submits the callable with a plain `executor.submit(func, *args)` and
does **not** copy the caller's `contextvars.Context`. So a `ContextVar`
set in the async caller is not visible inside the function running in
the worker thread.
This differs from `asyncio.to_thread()`, which runs the callable inside
a copied context. `run_in_executor()` has never propagated context
(verified on Python 3.12 and 3.13) — so this is a pre-existing gap in
the helper, **not** a regression or a Python-version compatibility
issue.
Concretely, any code that sets a `ContextVar` in async code and reads it
inside a function dispatched via `thread_pool_exec` (request tracing,
per-task state, Langfuse trace propagation, etc.) silently loses that
context.
## Fix
Copy the current context before submitting and run the callable inside
it with `ctx.run()`, matching what `asyncio.to_thread()` does:
```python
async def thread_pool_exec(func, *args, **kwargs):
loop = asyncio.get_running_loop()
ctx = contextvars.copy_context()
if kwargs:
inner = functools.partial(func, *args, **kwargs)
return await loop.run_in_executor(_thread_pool_executor(), ctx.run, inner)
return await loop.run_in_executor(_thread_pool_executor(), ctx.run, func, *args)
```
This explicitly **adds** ContextVar propagation to the helper (it does
not restore any prior behavior). Backward-compatible.
## Tests
`TestThreadPoolExec` covers propagation, the kwargs path, per-call
isolation and the unset-default case.
> Note: the branch name still contains `python313` for historical
reasons; the change is unrelated to any Python version.
### What problem does this PR solve?
Updated the release workflow to install SIMDe headers into the MSYS2
toolchain include directory. Adjusted CMake flags to remove references
to the previous SIMDE_INCLUDE_DIR.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Stabilizes the Go unit-test surface so the test suite can run reliably
in CI and locally via \`bash build.sh --test\`.
## Verification
\`\`\`bash
bash build.sh --test -- -count=10 -run TestWithCancel_SequentialAgent
./internal/harness/core/
bash build.sh --test -- -count=5 -run TestSiliconflowChatExtracts
./internal/entity/models/
bash build.sh --test # full suite
\`\`\`
All previously failing packages (\`admin\`, \`cli\`, \`handler\`,
\`parser\`,
\`router\`, \`service\`, \`service/chunk\`) now build and test
successfully.
\`TestWithCancel_SequentialAgent\` passes 10/10 (was flaky). SiliconFlow
reasoning test passes after switching the assertion to the SiliconFlow
wire
format.
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
Several keys added in recent releases were missing from the French
(`fr.ts`) locale file.
- **`top`** — missing in both the common section and the dataset section
- **Chat channels** — all UI strings for the new chat channels feature
(`chatChannels`, `chatChannelDesc.*`, `connectDialog`, `notConnected`,
etc.)
- **Username validation** — `usernameMaxLength`,
`usernameInvalidCharacters`
- **Model editing** — `editCustomModelTitle`
## Changes
- `web/src/locales/fr.ts` — 47 lines added, no other files touched
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
```
RAGFlow(admin)> show role 'user' default models;
+--------------------------+-----------------------------------------------------------------+-----------+
| command | error | role_name |
+--------------------------+-----------------------------------------------------------------+-----------+
| show_role_default_models | 'show role default models' is implemented in enterprise edition | user |
+--------------------------+-----------------------------------------------------------------+-----------+
RAGFlow(admin)> set role 'user' default chat 'glm4.5@test@zhipu-ai';
+------------+---------------------------------------------------------------+
| field | value |
+------------+---------------------------------------------------------------+
| model_id | |
| model_type | chat |
| role_name | user |
| command | set_role_default_model |
| error | 'set role default model' is implemented in enterprise edition |
+------------+---------------------------------------------------------------+
RAGFlow(admin)> reset role 'user' default chat;
+------------+-----------------------------------------------------------------+
| field | value |
+------------+-----------------------------------------------------------------+
| command | reset_role_default_model |
| error | 'reset role default model' is implemented in enterprise edition |
| model_type | chat |
| role_name | user |
+------------+-----------------------------------------------------------------+
```
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- add public Go route for `/api/v1/searchbots/detail`
- implement beta-token auth flow for shared search access
- add tenant-based access check for shared search apps
- add joined search detail query for the share response
- align Go response shape with the current Python runtime behavior
- add DAO / service / handler tests for the new endpoint
close#16132
## Summary
This PR completes the Go-side merge and cleanup for chat channel APIs,
including handler/service wiring, route registration, and test coverage.
Implemented and aligned 5 chat channel APIs:
```
- POST `/api/v1/chat-channels`
- GET `/api/v1/chat-channels`
- GET `/api/v1/chat-channels/:channel_id`
- PATCH `/api/v1/chat-channels/:channel_id`
- DELETE `/api/v1/chat-channels/:channel_id`
```
Co-authored-by: Haruko386 <tryeverypossible@163.com>
### What problem does this PR solve?
```
fixed:
RAGFlow(api/default)> use model 'minimax-m2.5@test@minimax'
SUCCESS
RAGFlow(api/default)> chat message 'who r u'
Answer: Hey! I'm MiniMax-M2.5, an AI assistant here to help you with questions, tasks, or whatever you need. What can I do for you?
Time: 1.727263
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Removed references to 'simde' from the package lists and updated paths
for compiler detection and CMake configuration to ensure proper handling
of Windows executables.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)