6978 Commits

Author SHA1 Message Date
Wang Qi
638b59fbcd Fix handle move file failed (#16384)
Follow-up to PR #16350.
2026-06-26 18:46:21 +08:00
balibabu
d14d2068c4 Fix: If the type of the loop variable in the Loop operator is set to object, an error occurs when clicking the Variable Replicator operator inside it. (#16388) 2026-06-26 18:44:56 +08:00
Lynn
bf1eabea72 Feat: support new qwen model (#16385) 2026-06-26 17:30:16 +08:00
buua436
f80d4c7843 fix: tighten loop validation (#16374) 2026-06-26 16:29:08 +08:00
chanx
9610173a74 feat: add log icon to parsing status display (#16383) 2026-06-26 16:13:01 +08:00
Wang Qi
985e3c1db5 Fix document progress not set to fail when embedding model error (#16381) 2026-06-26 16:11:54 +08:00
Öndery
8081a77c7c Fix missing move and copy methods in Python RAGFlowS3 storage implementation (#16350) 2026-06-26 15:51:24 +08:00
Jin Hai
2667995b25 Go CLI: Fix show model and list models (#16380)
### What problem does this PR solve?

```
RAGFlow(api/default)> show model 'WiseDiag-Z1 Think';

RAGFlow(api/default)> list models;

RAGFlow(admin)> show model 'WiseDiag-Z1 Think';

RAGFlow(admin)> list models;
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-26 15:36:01 +08:00
Hz_
0de8f3e127 feat: add missing qwen models to all_models.json (#16379)
Add 19 missing qwen models and 3 aliases to all_models.json.

Models added: qwen-image-2.0-pro (2026-06-22, 2026-04-22), qwen3.5-ocr,
qwen3.7-max-2026-05-17, qwen3.5-livetranslate-flash-realtime,
qwen3.5-omni-plus/flash-realtime, qwen-deep-research-2025-12-15,
qwen-flash-character-2026-02-26, qwen-plus-2025-11-05,
qwen-deep-search-planning, qwen3-s2s-flash-realtime-2025-09-22,
qwen-max-1201/longcontext/0107, qwen-1.8b-longcontext-chat

Aliases: qwen3.5-plus-2026-04-20, qwen-turbo-0919, qwen-1.8b-chat
2026-06-26 15:35:30 +08:00
writinwaters
5af798607e Docs: Added v0.26.2 release notes. (#16373) 2026-06-26 15:18:54 +08:00
Jin Hai
8bc27d8df1 Go CLI: fix show variable (#16370)
### What problem does this PR solve?

```
RAGFlow(api/default)> show var 'mail.port';
+-----------+-----------+--------------+-------+
| data_type | name      | setting_type | value |
+-----------+-----------+--------------+-------+
| integer   | mail.port | config       | 30    |
+-----------+-----------+--------------+-------+
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-26 13:51:56 +08:00
Jin Hai
65afaa1292 Model config: add tools (#16371)
### What problem does this PR solve?

```
{
      "name": "glm-4-flash",
      "max_tokens": 128000,
      "model_types": [
        "chat"
      ],
      "tools": {
        "support": true
      }
}
```

```
RAGFlow(admin)> list provider 'zhipu-ai' models;
+------------+---------------+------------+---------------+----------------+-----------+-----------+
| dimensions | max_dimension | max_tokens | model_type    | name           | thinking  | tools     |
+------------+---------------+------------+---------------+----------------+-----------+-----------+
|            |               | 204800     | [chat]        | glm-5          | supported | supported |
|            |               | 204800     | [chat]        | glm-5-turbo    | supported | supported |
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-26 11:37:51 +08:00
Jack
70250ec88c Fix: remove deepdoc dep (#16372) dev-20260626 2026-06-26 11:32:16 +08:00
Yash Raj Pandey
dd2c88b768 fix(excel_parser): keep zero-valued cells when building Excel text chunks (#16287) 2026-06-26 09:30:09 +08:00
Jin Hai
58da1d6bc3 Go CLI: fix model related commands (#16368)
### What problem does this PR solve?

```
RAGFlow(api/default)> show provider 'zhipu-ai'

RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test';

RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test' balance;

RAGFlow(api/default)> show provider 'zhipu-ai' model 'glm-4.5';
```

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-26 07:07:49 +08:00
Jin Hai
dbefadd86a Go CLI: refactor (#16355) 2026-06-25 20:36:50 +08:00
Jack
304d9e02bb Refactor: migrate pdf_parser.py to golang (#16323)
### What problem does this PR solve?

HTTP API based on an ONNX model.
Migrates pdf_parser.py to Go.

### Type of change

- [x] Refactoring
2026-06-25 20:16:16 +08:00
Harsh Kashyap
c7052f4dd1 fix(rag/nlp): treat string input as one phrase in is_english (#16308) 2026-06-25 20:07:09 +08:00
Wang Qi
5defb4e7d6 Revert "fix(deepdoc): keep zero and false Excel cells in __call__" (#16366)
Reverts infiniflow/ragflow#16318
2026-06-25 19:56:47 +08:00
Harsh Kashyap
8d3c3f868c fix(api): validate immutable document fields when value is zero (#16309) 2026-06-25 19:29:12 +08:00
Harsh Kashyap
66d86154ab fix(deepdoc): accept GFM table separators with one or more dashes (#16319) 2026-06-25 19:25:57 +08:00
Hz_
e290a0d23e feat(go-api): Langfuse API key migration behavior (#16356)
## Summary

- Align Langfuse API key set/get/delete behavior with the Python
implementation.
- Improve DAO handling for Langfuse credential save/delete flows.
- Add tests for Langfuse service error handling and API key lifecycle
behavior.
2026-06-25 19:25:55 +08:00
Yoorim Choi
46b97bd1a1 fix(web): fix layout issues with text, overflow, and spacing consistency (#16324) 2026-06-25 19:25:32 +08:00
cleanjunc
e8bb534b90 fix: naive_merge splits oversized sections and counts overlap tokens correctly (#15802) 2026-06-25 19:19:38 +08:00
Harsh Kashyap
0af5d43e8d fix(deepdoc): keep zero and false Excel cells in __call__ (#16318) 2026-06-25 19:12:57 +08:00
Haruko386
43b96223b4 feat[go]: add router for connectors/<connector_id> PATCH (#16358)
### What problem does this PR solve?

As title

/api/v1/connectors/<connector_id> PATCH was implemented in #15512

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-06-25 19:07:52 +08:00
Haruko386
74597b8683 feat[Go]: implement API: Search/Get/Update-Messages (#16307)
### What problem does this PR solve?

As title:
implement:
```
/api/v1/messages/search GET
/api/v1/messages GET
/api/v1/messages/<memory_id>:<message_id>/content GET
/api/v1/memories/<memory_id>/config GET
/api/v1/messages/<memory_id>:<message_id> PUT
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-25 19:07:34 +08:00
Harsh Kashyap
49312cace3 fix(api): align use_sql Markdown separator with Source header (#16317) 2026-06-25 19:00:01 +08:00
balibabu
1dfc24003b Fix: An empty message notification pops up at the top of the agent conversation. (#16353) 2026-06-25 17:32:24 +08:00
Wang Qi
31e50b164f Fix [ID:0] not converted to Fig. 1 (#16357) 2026-06-25 17:17:46 +08:00
Wang Qi
ac9469e5f5 Fix: adding VLLM without an API key fails (#16352) 2026-06-25 17:17:29 +08:00
Wang Qi
97c519662a Add env ALLOW_ANY_HOST to skip host check (#16351) 2026-06-25 17:17:02 +08:00
maoyifeng
6e7aa75e71 Go:CLI add new response function (#16347)
### What problem does this PR solve?

Add a new response function.

### Type of change

- [ ] New Feature (non-breaking change which adds functionality)
2026-06-25 16:49:47 +08:00
Yash Raj Pandey
091417980e fix(html_parser): preserve original text when splitting oversized blocks (#16052)
### Bug

`RAGFlowHtmlParser.chunk_block()` splits an oversized block by slicing
the **tokenized** string and storing the joined tokens:

```python
tks_str = rag_tokenizer.tokenize(block)
...
tokens = tks_str.split(" ")
while start < len(tokens):
    chunks.append(" ".join(tokens[start:start + chunk_token_num]))  # tokenized form, not source
```

On the default (Elasticsearch) backend `rag_tokenizer.tokenize`
transforms text: it lowercases/stems Latin words and inserts spaces
between CJK characters. So any text block longer than `chunk_token_num`
is stored as garbled, lowercased, space-segmented text instead of the
source content. The small-block branch correctly stores the original
`block`, so only oversized blocks are corrupted. Affects HTML and EPUB
ingestion (both go through `chunk_block`), degrading retrieved chunks
and the answers generated from them.

### Real tokenizer behavior (infinity-sdk 0.7.0, ES backend)

```
tokenize("Hello World FOO Bar Baz Qux Jumps")  -> "hello world foo bar baz qux jump"   # lowercased + stemmed
tokenize("你好世界这是一个测试")                 -> "你好世界 这 是 一个 测试"            # spaces inserted
```

### Fix

Split the **original** text: break it into atoms (whitespace-delimited
runs for space-separated scripts, per-character for spaceless scripts
such as Chinese) and pack them into pieces of at most `chunk_token_num`
tokens. This preserves the source characters and still splits scripts
that have no whitespace — a plain whitespace split would leave CJK as
one un-splittable chunk.
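
The atom-and-pack idea described above can be sketched roughly as follows. This is an illustration, not the PR's code: the function name `split_preserving_source`, the regular expression, the CJK range (limited here to U+4E00 to U+9FFF), and the `count_tokens` hook are all assumptions, and each atom is assumed to cost one token unless a real tokenizer is plugged in.

```python
import re
from typing import Callable, List

# An atom is a single CJK character, a run of other non-space characters
# together with the whitespace that follows it, or a leading whitespace run.
_ATOM_RE = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+\s*|\s+")


def split_preserving_source(
    text: str,
    chunk_token_num: int,
    count_tokens: Callable[[str], int] = lambda atom: 1,  # assumption: 1 token per atom
) -> List[str]:
    """Greedily pack atoms of the original text into chunks of bounded size."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for atom in _ATOM_RE.findall(text):
        cost = count_tokens(atom)
        # Close the current chunk before it would exceed the budget.
        if current and used + cost > chunk_token_num:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(atom)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks
```

Because every character of the input lands in exactly one atom, joining the returned chunks reproduces the source text unchanged, which is the property the fix is after.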

### Proof (real tokenizer, before/after)

Running the old vs new split against the real `infinity.rag_tokenizer`:

```
ENGLISH "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs"  (chunk_token_num=4)
  OLD: ['hello world foo bar', 'baz qux jump over', 'lazi dog']          # lowercased + stemmed
  NEW: ['Hello World FOO Bar ', 'Baz Qux Jumps Over ', 'Lazy Dogs']      # preserved; each <= 4 tokens
  NEW preserves text exactly: True

CHINESE "你好世界这是一个测试用例需要被切分成多个块"  (chunk_token_num=3)
  OLD: ['你好世界 这 是', '一个 测试用例 需要', ...]                      # spurious spaces
  NEW: ['你好世', '界这是', '一个测', ...]                               # preserved; each <= 3 tokens
  NEW preserves text exactly: True
```

### Tests

Added `test/unit_test/deepdoc/parser/test_html_parser.py` (English +
Chinese oversized blocks, plus small-block merge). Before the fix the
two oversized tests fail (English shows lowercasing, Chinese shows
inserted spaces); after the fix all pass. `ruff check` clean.
2026-06-25 16:43:35 +08:00
Jin Hai
edfa9be67f Go CLI: fix list provider instance tasks (#16345) 2026-06-25 15:49:31 +08:00
balibabu
3f3a2ece3d Fix: Flexible Chat Configuration (#16293) 2026-06-25 14:56:30 +08:00
Muhammad Furqan
fe14cc35cf fix(agent/tools): DeepL component fails validation and drops errors (#16332)
### What problem does this PR solve?

`DeepLParam.check()` validated `self.top_n`, but DeepL has no such
parameter (it is not defined on the param class or its base), so
`check()` always raised `AttributeError` and a DeepL component could
never pass validation. Removed the bogus `top_n` check.

Also fixed the `_run` except branch, which computed
`be_output("**Error**...")` but never returned it, silently dropping the
error message.
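
The dropped-error pattern described above looks roughly like this. It is a simplified sketch, not the component's code: `be_output` here is a stand-in with an assumed signature, and `run_buggy` / `run_fixed` are hypothetical names.

```python
from typing import Callable, Dict, Optional


def be_output(message: str) -> Dict[str, str]:
    """Stand-in for the component's output helper (assumed shape)."""
    return {"content": message}


def run_buggy(translate: Callable[[str], str], text: str) -> Optional[Dict[str, str]]:
    try:
        return be_output(translate(text))
    except Exception as e:
        # The error output is built and then discarded, so the
        # function falls through and returns None.
        be_output(f"**Error**: {e}")


def run_fixed(translate: Callable[[str], str], text: str) -> Dict[str, str]:
    try:
        return be_output(translate(text))
    except Exception as e:
        # Returning the value is the whole fix.
        return be_output(f"**Error**: {e}")
```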

Closes #16329

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Add test cases

### Testing

Added `test/unit_test/agent/component/test_deepl.py` covering
`DeepLParam.check()` with valid defaults and rejection of invalid
source/target languages.
2026-06-25 14:40:56 +08:00
Harsh Kashyap
09047d6edf fix(web): bump lodash past vulnerable range (#16281) 2026-06-25 14:40:39 +08:00
Idriss Sbaaoui
fb8e5ad4b2 Fix multimodal chat image routing for VLM channel requests (#16343) 2026-06-25 14:38:29 +08:00
Muhammad Furqan
3747a6bfeb fix(agent/tools): PubMed tool always returns "Unknown Authors" (#16330)
### What problem does this PR solve?

Fixes the PubMed tool always emitting `Authors: Unknown Authors`. The
`safe_find` closure in `_format_pubmed_content` was hardcoded to search
from the article root, so the per-author `LastName`/`ForeName` lookups
never matched.

`safe_find` now accepts an optional `base` node (defaults to `child`,
preserving the existing field lookups), and the author loop passes the
current `<Author>` element.
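
A minimal sketch of the `base` parameter idea, using the standard library's `xml.etree.ElementTree`. The function name `format_authors`, the sample XML layout, and the exact fallback handling are assumptions made for illustration; only `safe_find`, `child`, and the `LastName`/`ForeName` fields come from the description above.

```python
import xml.etree.ElementTree as ET


def format_authors(article_xml: str) -> str:
    child = ET.fromstring(article_xml)  # article root, named as in the PR text

    def safe_find(path, base=None):
        # Search from `base` when given; otherwise fall back to the article root.
        node = (child if base is None else base).find(path)
        return node.text.strip() if node is not None and node.text else ""

    names = []
    for author in child.findall(".//Author"):
        # Passing the current <Author> element makes the lookup per-author.
        fore = safe_find("ForeName", base=author)
        last = safe_find("LastName", base=author)
        full = f"{fore} {last}".strip()
        if full:
            names.append(full)
    return ", ".join(names) if names else "Unknown Authors"
```

Without the `base` argument, `find("LastName")` on the article root only inspects the root's direct children, so it never reaches the nested author fields.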

Closes #16328

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Add test cases

### Testing

Added `test/testcases/test_web_api/test_canvas_app/test_pubmed_unit.py`
covering per-author parsing, intact title/journal/DOI fields, and the
no-authors fallback.

Before: `Authors: Unknown Authors`
After:  `Authors: Furqan Khan, Jane Smith`
2026-06-25 14:34:37 +08:00
Harsh Kashyap
b9445c67e2 fix(agent): coerce None Switch inputs before string operators (#16320)
## Summary
- Coerce `None` canvas values to `""` before string comparison operators
in `Switch.process_operator`.
- Prevents `AttributeError` when upstream components yield `None` and
the Switch uses contains/start with/end with.
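
The coercion can be sketched as below. This is not the real `Switch.process_operator`; the function name, argument order, and the set of operators handled are assumptions chosen to keep the example small.

```python
from typing import Any


def process_operator_sketch(operator: str, value: Any, target: str) -> bool:
    # Coerce None to "" so the string methods below never receive None.
    text = "" if value is None else str(value)
    if operator == "contains":
        return target in text
    if operator == "start with":
        return text.startswith(target)
    if operator == "end with":
        return text.endswith(target)
    raise ValueError(f"unsupported operator: {operator}")
```

With the coercion in place, a `None` input simply fails to match a non-empty target instead of raising.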

## Test plan
- [x] `.v/bin/python -m ruff check agent/component/switch.py
test/unit_test/agent/component/test_switch.py`
- [x] `.v/bin/python -m pytest
test/unit_test/agent/component/test_switch.py -q` (3 passed)

Fixes #16315

---------

Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>
2026-06-25 14:18:24 +08:00
Hz_
54fb5b0fa7 feat(go-api): add Go support for POST /api/v1/datasets/{dataset_id}/documents/{document_id}/chunks (#16256)
## Summary
Add the Go implementation of `POST
/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks`.

This wires the full create-chunk path in Go:
- router and handler registration
- request/response structs
- chunk creation service logic
- embedding generation
- chunk insert into doc engine
- chunk/token counter increment
- `tag_feas` validation
- `image_base64` decoding and chunk image storage/merge
- unit tests for handler and service

## Testing
Unit tests:
- `/usr/local/go/bin/go test ./internal/handler`
- `/usr/local/go/bin/go test ./internal/service/chunk`
- `/usr/local/go/bin/go test ./internal/service`
- `/usr/local/go/bin/go test ./...`

All passed locally.

Manual curl checks:
- basic text chunk: Go passed
- chunk with `important_keywords` / `questions` / `tag_kwd` /
`tag_feas`: Go passed
- blank content validation: Go matched expected `code=102`
- invalid `image_base64` validation: Go matched expected `code=102`
- image upload and repeated image upload / merge path: Go passed twice
2026-06-25 14:15:29 +08:00
chanx
d44359826d fix(web): agent log refetch and slider percentage rounding (#16344) 2026-06-25 13:49:25 +08:00
Jin Hai
17b066e6ae Go CLI: fix list dataset files by dataset name (#16341)
### What problem does this PR solve?

```
RAGFlow(api/default)> list dataset 'ccc' files;
Total: 1
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-25 13:41:58 +08:00
Hz_
a6cc3023c5 feat(go-api): implement dataset document upload API (#16295)
## Summary
Migrated the dataset document upload API (`POST
/api/v1/datasets/:dataset_id/documents`) from Python to the Go backend.
It supports local file uploads (`type=local`), web page ingestion
(`type=web`), and empty document creation (`type=empty`).

## Changes
- **Router**: Registered `POST /api/v1/datasets/:dataset_id/documents`
route.
- **Handler**: Implemented `UploadDocuments` handler and its routing
functions (`uploadLocalDocuments`, `uploadWebDocument`,
`uploadEmptyDocument`).
- **Service**: Implemented `UploadLocalDocuments`, `UploadWebDocument`,
and `UploadEmptyDocument` in `DocumentService`.
- **Refactoring**: Moved permission checking logic to a shared helper
for reuse in file and document services.
- **Tests**: Added comprehensive unit tests for the new handler and
service upload paths.

## Verification
Ran and passed the test suite for service and handler packages:
- `go test ./internal/service`
- `go test ./internal/handler`
2026-06-25 13:36:49 +08:00
Hz_
ced51114f4 feat(go-api): add dataset search endpoint (#16304)
### What problem does this PR solve?


- added the new dataset search route and handler
- reused the existing shared SearchDatasets service by adapting
single-dataset requests into dataset_ids=[dataset_id]
- aligned handler error responses with Python behavior for argument/data
errors
- aligned key service error messages such as invalid search_id and mixed
embedding models
- added focused handler and service tests for request mapping and error
behavior

### Tests:

`/usr/local/go/bin/go test ./internal/service -run
'TestSearchDatasetRequestToSearchDatasetsRequest|TestDatasetServiceSearchDatasets'`
`/usr/local/go/bin/go test ./internal/handler -run
'TestDatasetsHandlerSearchDataset'`
2026-06-25 13:32:22 +08:00
Willsgao
824c88423c fix(agent): log Wikipedia disambiguation and page errors instead of s… (#16207)
## Problem
The Wikipedia tool silently swallows all exceptions with `except
Exception: pass`, making it impossible to debug failures when fetching
Wikipedia pages.

## Fix
Replace the bare `except Exception: pass` with specific exception
handling:
- `DisambiguationError`: log available options
- `PageError`: log page not found
- `Exception`: log unexpected errors with full traceback
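
The three branches listed above could look roughly like this. To stay self-contained, the sketch defines local stand-in exception classes named after the ones the PR mentions, and takes the page loader as a hypothetical `load` callable; the real tool's structure may differ.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("wikipedia_tool_sketch")


class DisambiguationError(Exception):
    """Stand-in: the query matches several pages."""

    def __init__(self, title: str, options: List[str]):
        super().__init__(title)
        self.options = options


class PageError(Exception):
    """Stand-in: no page exists for the query."""


def fetch_page(load: Callable[[str], str], query: str) -> Optional[str]:
    try:
        return load(query)
    except DisambiguationError as e:
        logger.warning("'%s' is ambiguous; options: %s", query, e.options)
    except PageError:
        logger.warning("no Wikipedia page found for '%s'", query)
    except Exception:
        # logger.exception records the full traceback.
        logger.exception("unexpected error while fetching '%s'", query)
    return None
```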

Co-authored-by: wills <willsgao@163.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-25 13:10:29 +08:00
buua436
479a9a715e feat: unify provider id or name routing (#16336) 2026-06-25 13:04:21 +08:00
Wang Qi
d0fc75f1bb Fix: when empty response is not set, it reports ERROR: 'knowledge' (#16338) 2026-06-25 13:02:24 +08:00
Ilya Bogin
10d02e54a8 Add Keenable web search tool to the agent (#16233)
Adds Keenable as a web search tool in the agent, alongside the existing
Tavily/DuckDuckGo/SearXNG/Google tools.

The main difference from the other search tools is that it doesn't require an
API key. By default it uses Keenable's keyless public endpoint, so it works out
of the box. Providing a key (in the tool config) switches to the authenticated
endpoint and lifts the rate limits.

### Changes

- Backend: `agent/tools/keenable.py` defines `KeenableSearch`, which follows
  the Tavily/DuckDuckGo tool shape (results go through `_retrieve_chunks`).
  Auto-registered by `agent/tools/__init__.py`.
- Frontend: wired into the agent builder: operator + icon, config form
  (optional API key, search mode, site filter, top N), the search tool menu,
  and the existing api_key export sanitizer.

### Config

- API key: optional. Blank = keyless free tier; set it to lift limits /
  enable `realtime` mode.
- `site`: restrict to a single domain.
- `mode`: `pro` (default) or `realtime`.

### Notes

`KEENABLE_API_URL` can override the API base (HTTPS enforced; defaults to
`https://api.keenable.ai`). The tool only sends the query (no URL fetch), so
there's no SSRF surface. Verified the frontend with `vite build` and the
backend search path against the public endpoint.
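
One way the base-URL override with HTTPS enforcement might be resolved is sketched below. The function name `resolve_api_base` and the error behavior are assumptions; only the `KEENABLE_API_URL` variable and the default URL come from the notes above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

DEFAULT_API_BASE = "https://api.keenable.ai"


def resolve_api_base(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    # Fall back to the default when the variable is unset or empty.
    base = env.get("KEENABLE_API_URL") or DEFAULT_API_BASE
    parsed = urlparse(base)
    # Reject anything that is not an https:// URL with a host.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"KEENABLE_API_URL must be an https:// URL, got {base!r}")
    return base.rstrip("/")
```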
2026-06-25 12:12:28 +08:00