### What problem does this PR solve?
```
RAGFlow(api/default)> show model 'WiseDiag-Z1 Think';
RAGFlow(api/default)> list models;
RAGFlow(admin)> show model 'WiseDiag-Z1 Think';
RAGFlow(admin)> list models;
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
```
RAGFlow(api/default)> show var 'mail.port';
+-----------+-----------+--------------+-------+
| data_type | name | setting_type | value |
+-----------+-----------+--------------+-------+
| integer | mail.port | config | 30 |
+-----------+-----------+--------------+-------+
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
```
RAGFlow(api/default)> show provider 'zhipu-ai';
RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test';
RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test' balance;
RAGFlow(api/default)> show provider 'zhipu-ai' model 'glm-4.5';
```
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Align Langfuse API key set/get/delete behavior with the Python
implementation.
- Improve DAO handling for Langfuse credential save/delete flows.
- Add tests for Langfuse service error handling and API key lifecycle
behavior.
### What problem does this PR solve?
As title.
The `PATCH /api/v1/connectors/<connector_id>` endpoint was implemented in #15512.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
As title. Implements the following endpoints:
```
/api/v1/messages/search GET
/api/v1/messages GET
/api/v1/messages/<memory_id>:<message_id>/content GET
/api/v1/memories/<memory_id>/config GET
/api/v1/messages/<memory_id>:<message_id> PUT
```
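Several of these routes address one message through a composite `<memory_id>:<message_id>` path segment. Below is a minimal Python sketch of splitting that segment. It assumes neither ID contains a colon, and `parse_message_ref` is a hypothetical helper, not the handler added in this PR:

```python
def parse_message_ref(ref: str) -> tuple[str, str]:
    """Split a '<memory_id>:<message_id>' path segment into its two IDs.

    Hypothetical helper; assumes neither ID contains ':'.
    """
    memory_id, sep, message_id = ref.partition(":")
    # Reject a missing separator, an empty half, or extra colons.
    if not sep or not memory_id or not message_id or ":" in message_id:
        raise ValueError(f"expected '<memory_id>:<message_id>', got {ref!r}")
    return memory_id, message_id
```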
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Bug
`RAGFlowHtmlParser.chunk_block()` splits an oversized block by slicing
the **tokenized** string and storing the joined tokens:
```python
tks_str = rag_tokenizer.tokenize(block)
...
tokens = tks_str.split(" ")
while start < len(tokens):
    chunks.append(" ".join(tokens[start:start + chunk_token_num]))  # tokenized form, not source
    ...
```
On the default (Elasticsearch) backend, `rag_tokenizer.tokenize`
transforms text: it lowercases/stems Latin words and inserts spaces
between CJK characters. So any text block longer than `chunk_token_num`
is stored as garbled, lowercased, space-segmented text instead of the
source content. The small-block branch correctly stores the original
`block`, so only oversized blocks are corrupted. Affects HTML and EPUB
ingestion (both go through `chunk_block`), degrading retrieved chunks
and the answers generated from them.
### Real tokenizer behavior (infinity-sdk 0.7.0, ES backend)
```
tokenize("Hello World FOO Bar Baz Qux Jumps") -> "hello world foo bar baz qux jump" # lowercased + stemmed
tokenize("你好世界这是一个测试") -> "你好世界 这 是 一个 测试" # spaces inserted
```
### Fix
Split the **original** text: break it into atoms (whitespace-delimited
runs for space-separated scripts, per-character for spaceless scripts
such as Chinese) and pack them into pieces of at most `chunk_token_num`
tokens. This preserves the source characters and still splits scripts
that have no whitespace — a plain whitespace split would leave CJK as
one un-splittable chunk.
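The splitting strategy above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in this PR. The helper name `split_preserving`, the CJK code-point ranges, and counting one atom as one token are all assumptions:

```python
import re

# Assumed atom definition for this sketch: one CJK ideograph, or a run of
# non-whitespace, non-CJK characters. Each atom keeps its trailing whitespace,
# so "".join(atoms) reproduces the input exactly. Leading whitespace becomes
# its own atom.
_CJK = "\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff"
_ATOM_RE = re.compile(rf"\s+|[{_CJK}]\s*|[^\s{_CJK}]+\s*")


def split_preserving(text: str, chunk_token_num: int) -> list[str]:
    """Pack atoms of the original text into pieces of at most chunk_token_num atoms.

    Hypothetical helper: one atom counts as one token, which only
    approximates the tokenizer's own token count.
    """
    if chunk_token_num <= 0:
        raise ValueError("chunk_token_num must be positive")
    atoms = _ATOM_RE.findall(text)
    return [
        "".join(atoms[i:i + chunk_token_num])
        for i in range(0, len(atoms), chunk_token_num)
    ]
```

On the English and Chinese strings used in the proof, this sketch produces the same NEW pieces, and joining the pieces returns the input unchanged. Keeping trailing whitespace inside each atom is what makes the split lossless.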
### Proof (real tokenizer, before/after)
Running the old vs new split against the real `infinity.rag_tokenizer`:
```
ENGLISH "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs" (chunk_token_num=4)
OLD: ['hello world foo bar', 'baz qux jump over', 'lazi dog'] # lowercased + stemmed
NEW: ['Hello World FOO Bar ', 'Baz Qux Jumps Over ', 'Lazy Dogs'] # preserved; each <= 4 tokens
NEW preserves text exactly: True
CHINESE "你好世界这是一个测试用例需要被切分成多个块" (chunk_token_num=3)
OLD: ['你好世界 这 是', '一个 测试用例 需要', ...] # spurious spaces
NEW: ['你好世', '界这是', '一个测', ...] # preserved; each <= 3 tokens
NEW preserves text exactly: True
```
### Tests
Added `test/unit_test/deepdoc/parser/test_html_parser.py` (English +
Chinese oversized blocks, plus small-block merge). Before the fix the
two oversized tests fail (English shows lowercasing, Chinese shows
inserted spaces); after the fix all pass. `ruff check` clean.
### What problem does this PR solve?
`DeepLParam.check()` validated `self.top_n`, but DeepL has no such
parameter (it is not defined on the param class or its base), so
`check()` always raised `AttributeError` and a DeepL component could
never pass validation. Removed the bogus `top_n` check.
Also fixed the `_run` except branch, which computed
`be_output("**Error**...")` but never returned it, silently dropping the
error message.
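The `_run` fix is the common "compute but forget to return" pattern. Below is a minimal sketch with a hypothetical `TranslatorSketch` class. It is not RAGFlow's DeepL component, and the output dict shape and error-string format are assumptions:

```python
class TranslatorSketch:
    """Hypothetical stand-in for an agent component; illustrative only."""

    def __init__(self, translate):
        self._translate = translate  # callable that performs the translation

    def be_output(self, content: str) -> dict:
        # Assumed output shape for this sketch only.
        return {"content": content}

    def _run(self, text: str) -> dict:
        try:
            return self.be_output(self._translate(text))
        except Exception as e:
            # The bug: calling be_output(...) here without `return` discards
            # the error message and the method falls through returning None.
            return self.be_output("**Error**: " + str(e))
```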
Closes #16329
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Add test cases
### Testing
Added `test/unit_test/agent/component/test_deepl.py` covering
`DeepLParam.check()` with valid defaults and rejection of invalid
source/target languages.
### What problem does this PR solve?
Fixes the PubMed tool always emitting `Authors: Unknown Authors`. The
`safe_find` closure in `_format_pubmed_content` was hardcoded to search
from the article root, so the per-author `LastName`/`ForeName` lookups
never matched.
`safe_find` now accepts an optional `base` node (defaults to `child`,
preserving the existing field lookups), and the author loop passes the
current `<Author>` element.
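Below is a minimal sketch of the fixed lookup using `xml.etree.ElementTree`. Only `safe_find`, `child`, and the `LastName`/`ForeName` tags come from the description. `format_authors`, the `AuthorList/Author` path, and the name formatting are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def format_authors(child: ET.Element) -> str:
    """Hypothetical sketch of the per-author lookup; not the tool's actual code."""

    def safe_find(path: str, base: Optional[ET.Element] = None) -> Optional[str]:
        # Default to the article root (`child`) so existing field lookups keep working.
        node = (child if base is None else base).find(path)
        return node.text if node is not None and node.text else None

    names = []
    for author in child.findall(".//AuthorList/Author"):
        # Search relative to the current <Author>, not the article root.
        last = safe_find("LastName", author)
        fore = safe_find("ForeName", author)
        if last:
            names.append(f"{fore} {last}" if fore else last)
    return ", ".join(names) if names else "Unknown Authors"
```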
Closes #16328
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Add test cases
### Testing
Added `test/testcases/test_web_api/test_canvas_app/test_pubmed_unit.py`
covering per-author parsing, intact title/journal/DOI fields, and the
no-authors fallback.
Before: `Authors: Unknown Authors`
After: `Authors: Furqan Khan, Jane Smith`
## Summary
Add the Go implementation of
`POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks`.
This wires the full create-chunk path in Go:
- router and handler registration
- request/response structs
- chunk creation service logic
- embedding generation
- chunk insert into doc engine
- chunk/token counter increment
- `tag_feas` validation
- `image_base64` decoding and chunk image storage/merge
- unit tests for handler and service
## Testing
Unit tests:
- `/usr/local/go/bin/go test ./internal/handler`
- `/usr/local/go/bin/go test ./internal/service/chunk`
- `/usr/local/go/bin/go test ./internal/service`
- `/usr/local/go/bin/go test ./...`
All passed locally.
Manual curl checks:
- basic text chunk: Go passed
- chunk with `important_keywords` / `questions` / `tag_kwd` /
`tag_feas`: Go passed
- blank content validation: Go matched expected `code=102`
- invalid `image_base64` validation: Go matched expected `code=102`
- image upload and repeated image upload / merge path: Go passed twice
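The two `code=102` validation checks can be sketched as follows. The Go handler is not shown here. This Python version is illustrative only, and the request field names (`content`, `image_base64`), the constant name, and the messages are assumptions:

```python
import base64

VALIDATION_ERROR_CODE = 102  # value taken from the manual checks; name is hypothetical


def validate_create_chunk(body: dict) -> tuple[int, str]:
    """Return (code, message); code 0 means the body passed these checks."""
    content = body.get("content", "")
    if not isinstance(content, str) or not content.strip():
        return VALIDATION_ERROR_CODE, "`content` is required"
    image_b64 = body.get("image_base64")
    if image_b64:
        try:
            # validate=True rejects characters outside the base64 alphabet;
            # binascii.Error is a subclass of ValueError.
            base64.b64decode(image_b64, validate=True)
        except ValueError:
            return VALIDATION_ERROR_CODE, "`image_base64` is not valid base64"
    return 0, ""
```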
### What problem does this PR solve?
```
RAGFlow(api/default)> list dataset 'ccc' files;
Total: 1
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
Migrated the dataset document upload API
(`POST /api/v1/datasets/:dataset_id/documents`) from Python to the Go backend.
It supports local file uploads (`type=local`), web page ingestion
(`type=web`), and empty document creation (`type=empty`).
## Changes
- **Router**: Registered `POST /api/v1/datasets/:dataset_id/documents`
route.
- **Handler**: Implemented `UploadDocuments` handler and its routing
functions (`uploadLocalDocuments`, `uploadWebDocument`,
`uploadEmptyDocument`).
- **Service**: Implemented `UploadLocalDocuments`, `UploadWebDocument`,
and `UploadEmptyDocument` in `DocumentService`.
- **Refactoring**: Moved permission checking logic to a shared helper
for reuse in file and document services.
- **Tests**: Added unit tests for the new handler and service upload
paths.
## Verification
Ran and passed the test suite for service and handler packages:
- `go test ./internal/service`
- `go test ./internal/handler`
### What problem does this PR solve?
- added the new dataset search route and handler
- reused the existing shared `SearchDatasets` service by adapting
single-dataset requests into `dataset_ids=[dataset_id]`
- aligned handler error responses with Python behavior for argument/data
errors
- aligned key service error messages, such as those for an invalid
`search_id` and for mixed embedding models
- added focused handler and service tests for request mapping and error
behavior
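The request adaptation in the second bullet can be sketched as a copy-and-override. This Python sketch is hypothetical (the Go service uses typed structs), and the example fields besides `dataset_ids` are assumptions:

```python
def to_search_datasets_request(dataset_id: str, req: dict) -> dict:
    """Map a single-dataset search request onto the multi-dataset form.

    Hypothetical mirror of the Go adapter; illustrative only.
    """
    adapted = dict(req)  # shallow copy so the caller's request is not mutated
    adapted["dataset_ids"] = [dataset_id]
    return adapted
```

Reusing the shared service this way keeps one search code path; the single-dataset route only differs in how it builds `dataset_ids`.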
### Tests
- `/usr/local/go/bin/go test ./internal/service -run 'TestSearchDatasetRequestToSearchDatasetsRequest|TestDatasetServiceSearchDatasets'`
- `/usr/local/go/bin/go test ./internal/handler -run 'TestDatasetsHandlerSearchDataset'`
## Problem
The Wikipedia tool silently swallows all exceptions with
`except Exception: pass`, making it impossible to debug failures when
fetching Wikipedia pages.
## Fix
Replace the catch-all `except Exception: pass` with specific exception
handling:
- `DisambiguationError`: log available options
- `PageError`: log page not found
- `Exception`: log unexpected errors with full traceback
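Below is a minimal sketch of the logging pattern. To stay runnable without the `wikipedia` package, it defines stand-in exception classes. The real tool would catch `wikipedia.exceptions.DisambiguationError` and `wikipedia.exceptions.PageError`. `fetch_page` and the log messages are hypothetical:

```python
import logging

logger = logging.getLogger("wikipedia_tool_sketch")


# Stand-ins for wikipedia.exceptions.* so the sketch runs on its own.
class DisambiguationError(Exception):
    def __init__(self, title, options):
        super().__init__(title)
        self.title = title
        self.options = options


class PageError(Exception):
    pass


def fetch_page(fetch, title):
    """Return fetch(title), logging failures instead of silently swallowing them."""
    try:
        return fetch(title)
    except DisambiguationError as e:
        logger.warning("Wikipedia: %r is ambiguous; options: %s", e.title, e.options[:10])
    except PageError:
        logger.warning("Wikipedia: page %r not found", title)
    except Exception:
        # logger.exception records the full traceback at ERROR level.
        logger.exception("Wikipedia: unexpected error fetching %r", title)
    return None
```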
Co-authored-by: wills <willsgao@163.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Adds Keenable as a web search tool in the agent, alongside the existing
Tavily/DuckDuckGo/SearXNG/Google tools.
The main difference from the other search tools is that it doesn't
require an API key. By default it uses Keenable's keyless public
endpoint, so it works out of the box. Providing a key (in the tool
config) switches to the authenticated endpoint and lifts the rate
limits.
### Changes
- Backend: `agent/tools/keenable.py` adds `KeenableSearch`, which follows
the Tavily/DuckDuckGo tool shape (results go through `_retrieve_chunks`)
and is auto-registered by `agent/tools/__init__.py`.
- Frontend: wired into the agent builder, covering the operator + icon,
the config form (optional API key, search mode, site filter, top N), the
search tool menu, and the existing `api_key` export sanitizer.
### Config
- API key: optional. Leave it blank for the keyless free tier; set it to
lift rate limits and enable `realtime` mode.
- `site`: restrict to a single domain.
- `mode`: `pro` (default) or `realtime`.
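The settings above imply a small validation step. This sketch is hypothetical. The class name, the `top_n` default, and reading "set it to ... enable `realtime` mode" as "realtime requires a key" are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = ("pro", "realtime")


@dataclass
class KeenableConfigSketch:
    """Hypothetical mirror of the tool settings; not the real param class."""

    api_key: Optional[str] = None  # blank/None -> keyless free tier
    mode: str = "pro"              # "pro" (default) or "realtime"
    site: Optional[str] = None     # optional single-domain restriction
    top_n: int = 10                # assumed default, for illustration only

    @property
    def authenticated(self) -> bool:
        # Any non-blank key switches to the authenticated endpoint.
        return bool(self.api_key and self.api_key.strip())

    def check(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
        if self.mode == "realtime" and not self.authenticated:
            raise ValueError("realtime mode requires an API key")
        if self.top_n <= 0:
            raise ValueError("top_n must be positive")
```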
### Notes
`KEENABLE_API_URL` can override the API base (HTTPS enforced; defaults
to `https://api.keenable.ai`). The tool only sends the query (no URL
fetch), so there's no SSRF surface. Verified the frontend with
`vite build` and the backend search path against the public endpoint.
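Resolving the API base with the HTTPS check might look like this. `resolve_api_base` is a hypothetical name; only `KEENABLE_API_URL` and the default URL come from the description:

```python
import os
from urllib.parse import urlparse

DEFAULT_API_BASE = "https://api.keenable.ai"


def resolve_api_base(env=None) -> str:
    """Return the API base, preferring KEENABLE_API_URL and enforcing https."""
    env = os.environ if env is None else env
    base = (env.get("KEENABLE_API_URL") or DEFAULT_API_BASE).rstrip("/")
    parsed = urlparse(base)
    # Reject non-HTTPS schemes and scheme-only values such as "https://".
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"KEENABLE_API_URL must be an https:// URL, got {base!r}")
    return base
```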
### What problem does this PR solve?
- OpenTelemetry integration
- Checkpoint conformance tests
- State inspector API
- Callbacks
- A series of fault injection tests
- Pregel integration tests
### Type of change
- [x] Refactoring