6841 Commits

Author SHA1 Message Date
Jin Hai
20d11648a4 Go: add statistics command (#16119)
### What problem does this PR solve?

Prepare for enterprise command

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-18 15:21:44 +08:00
Haruko386
351b61a243 Go CLI: add support for windows, linux, macos (#16082)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2026-06-18 15:20:00 +08:00
jaso0n0818
a70c7e8cc7 fix(deepdoc): attach lone header lines to the following section when delimiter is set (#16109)
## Summary
Fixes #15487 — lone markdown headers are no longer isolated as empty
chunks when a custom `delimiter` is set.

- Merge consecutive lone headers before attaching to the following prose
body
- Skip code fences, tables, lists, and blockquotes via
`_is_attachable_body()`
- Unit tests include the `# Title / ## Intro / Body` regression from
CodeRabbit review

## Validation
- `pytest test/unit_test/deepdoc/parser/test_markdown_parser.py` (11
passed locally)

Closes #15487
2026-06-18 14:24:09 +08:00
Haruko386
27d723e13a fix: fix some bugs in check_conn and drop_inst (#16180)
### What problem does this PR solve?

As title:

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-18 14:19:46 +08:00
balibabu
a9021528c3 Fix: Lint error. (#16172)
### What problem does this PR solve?

Fix: Lint error.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-18 13:14:18 +08:00
buua436
ea70663f09 feat: support wecom websocket channel (#16175)
Added WeCom chat channel websocket mode alongside the existing webhook mode, plus frontend support for selecting the connection type.
2026-06-18 13:10:09 +08:00
Hz_
69dbc44983 feat(go-api): migrate MCP server detail and download API to Go (#16113)
### What problem does this PR solve?

- Migrated MCP server detail and export (download) API from Python to
Go.
- Registered route: `GET /api/v1/mcp/servers/:mcp_id` (supporting
`?mode=download` query parameter).
2026-06-18 11:09:22 +08:00
Hz_
f59332bc37 feat(go-api): implement Go-side document PATCH API & align parsing/metadata sync behavior (#15975)
### What problem does this PR solve?

This PR implements the Go backend counterpart for the document partial
update API:
`PATCH /api/v1/datasets/:dataset_id/documents/:document_id`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-06-18 11:08:47 +08:00
Idriss Sbaaoui
8ff6a21af9 Fix: cli points to the wrong api endpoints (#16171)
### What problem does this PR solve?

fix the cli endpoints

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-18 10:54:33 +08:00
xu haiLong
a9ddcae0b3 Fix: MCP dataset discovery fails due to REST API max page size limit … (#16148)
Fix #16146
2026-06-18 09:39:37 +08:00
Wang Qi
99a25dca34 Fix Chat/Search/Agent bot show image (#16152)
Fix Chat/Search/Agent bot show image
2026-06-18 09:38:31 +08:00
Hz_
065797b047 Refactor(go-cli): improve variable and label naming in CLI parseAddModel (#16145)
### What problem does this PR solve?

This PR improves code readability in the CLI parser by renaming the loop
index `i` to `modelIndex`. It also renames the loop label `A` to
`optionsLoop` to align with standard Go naming conventions.
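
For readers less familiar with Go loop labels, the renaming can be illustrated with a minimal sketch. Only the names `modelIndex` and `optionsLoop` come from this PR; the function `collectModels` and its token format are hypothetical and are not the actual `parseAddModel` code.

```go
package main

import (
	"fmt"
	"strings"
)

// collectModels is a hypothetical stand-in for the CLI parser: it gathers
// model names from tokens until it reaches the first option token (one that
// starts with "--").
func collectModels(tokens []string) []string {
	var models []string
optionsLoop:
	for modelIndex := 0; modelIndex < len(tokens); modelIndex++ {
		token := tokens[modelIndex]
		switch {
		case strings.HasPrefix(token, "--"):
			// A bare break would only leave the switch; the label exits the loop.
			break optionsLoop
		default:
			models = append(models, token)
		}
	}
	return models
}

func main() {
	fmt.Println(collectModels([]string{"model-a", "model-b", "--flag", "model-c"}))
}
```

A descriptive label such as `optionsLoop` tells the reader which loop a `break` leaves, which a single-letter label like `A` does not.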

### Type of change

- [x] Refactoring
2026-06-17 20:21:42 +08:00
Wang Qi
27a05be643 Fix the launch script (#16159)
Fix the launch script
2026-06-17 20:20:37 +08:00
Haruko386
a3e3bdd386 fix back release.yml to old version (#16160)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
v0.26.1
2026-06-17 20:02:42 +08:00
dependabot[bot]
c1c79c2e55 build(deps): bump python-multipart from 0.0.21 to 0.0.31 (#16088)
2026-06-17 19:39:42 +08:00
Liu An
4379269374 Docs: Update version references to v0.26.1 in READMEs and docs (#16158)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.26.0 to v0.26.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-06-17 19:35:32 +08:00
Idriss Sbaaoui
7d3928e501 Enhancement: update ci for parallel test execution (#16133)
### What problem does this PR solve?

split ci into multiple jobs

### Type of change

- [x] Performance Improvement
2026-06-17 19:22:24 +08:00
BitToby
2ab9256e8a fix(go): correct OpenRouter streaming URL routing and reasoning parameter (#16111)
### What problem does this PR solve?

Fixes two bugs in the OpenRouter streaming chat request builder
(`internal/entity/models/openrouter.go`, `ChatStreamlyWithSender`):

1. **qwen/glm models streamed to a broken URL.** The code routed any
`qwen`/`glm` model to `URLSuffix.AsyncChat`, but
`conf/models/openrouter.json` defines no `async_chat` suffix (empty), so
the request was POSTed to `<base>/` instead of
`<base>/chat/completions`, breaking streaming for every qwen/glm model.
The non-stream path has no such branch. Fix: all models use the standard
`Chat` suffix, consistent with the non-stream path.

2. **Streaming reasoning was never enabled.** The request set reasoning
via a non-standard `thinking` key, which OpenRouter ignores. OpenRouter's
API, along with this provider's own non-stream request (line ~110) and
its streamed `delta.reasoning` parser (line ~311), uses the `reasoning`
object. Fix: send `reasoning: {"enabled": <thinking>}` (and
`{"effort": ...}` when set, taking precedence as in the non-stream
path).
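
The two fixes can be sketched roughly as follows. This is not the code in `openrouter.go`: the helpers `chatURL` and `buildStreamBody` are hypothetical, the message list is omitted, and the sketch reads "taking precedence" as the `effort` form replacing the `enabled` form.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// chatURL joins the base URL with the standard chat suffix for every model,
// with no special case for qwen/glm.
func chatURL(base string) string {
	return strings.TrimRight(base, "/") + "/chat/completions"
}

// buildStreamBody is a hypothetical helper. It places reasoning settings
// under the "reasoning" key; a non-empty effort replaces the enabled flag
// (an assumption about what "taking precedence" means).
func buildStreamBody(model string, thinking bool, effort string) ([]byte, error) {
	reasoning := map[string]any{"enabled": thinking}
	if effort != "" {
		reasoning = map[string]any{"effort": effort}
	}
	return json.Marshal(map[string]any{
		"model":     model,
		"stream":    true,
		"reasoning": reasoning,
	})
}

func main() {
	body, err := buildStreamBody("example/model", true, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(chatURL("https://example.invalid/api/v1/"))
	fmt.Println(string(body))
}
```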

Closes #16110

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 19:14:13 +08:00
balibabu
cf7b06c0f3 Fix: A pipeline created from a template fails immediately upon execution with a "hierarchy does not exist" error. (#16151)
### What problem does this PR solve?

Fix: A pipeline created from a template fails immediately upon execution
with a "hierarchy does not exist" error.
2026-06-17 19:07:04 +08:00
Lynn
a5cce29f22 Fix: add mimo (#16136)
### What problem does this PR solve?

Add chat model factory for Xiaomi model.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 19:02:33 +08:00
writinwaters
cb2e061120 Docs: Updated v0.26.1 release date. (#16154)
### What problem does this PR solve?

Updated v0.26.1 release date.

### Type of change

- [x] Documentation Update
2026-06-17 18:53:06 +08:00
buua436
43d121ad38 feat: add qqbot chat channel (#16140)
### What problem does this PR solve?
Adds qqbot as a built-in chat channel so it can be discovered and
started by the channel bootstrapper and shown in the chat channel
settings UI.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2026-06-17 18:49:38 +08:00
Hunnyboy1217
e178c81bb4 refactor(go-models): harden Ollama ListModels and route through ParseListModel (#15853) (#15955)
### What problem does this PR solve?

Part of #15853 (provider model-list refactor).

Refactors **Ollama** `ListModels` onto the shared `ParseListModel`
pattern and fixes two correctness issues:

- **Endpoint:** switch the models suffix from `api/ps` (only
currently-running models) to `api/tags` (all installed models) — the
latter is what a model picker should show.
- **Parsing:** Ollama returns `{"models":[{"name","model"}]}`, a
non-OpenAI shape. Decode it into a typed struct, map the names into
`ModelList`, then enrich through `ParseListModel`. This removes the
previous unchecked type assertions (`result["models"].([]interface{})` /
`.(map[string]interface{})` / `.(string)`) that **panicked** when the
body was missing the `models` array or any field, and adds a fallback to
the `model` field when `name` is blank.
- Drops the no-op GET request body and a dead base-URL reassignment.
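
The parsing change above can be sketched as follows, assuming only the response shape quoted in this PR. The type `tagsResponse` and the helper `modelNames` are hypothetical, the enrichment step through `ParseListModel` is left out, and returning an error for a missing `models` array is this sketch's choice rather than confirmed behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// tagsResponse mirrors the {"models":[{"name","model"}]} shape quoted above.
type tagsResponse struct {
	Models []struct {
		Name  string `json:"name"`
		Model string `json:"model"`
	} `json:"models"`
}

// modelNames decodes the body into the typed struct and returns one name per
// entry, using the "model" field when "name" is blank. No type assertions
// are involved, so a malformed body yields an error instead of a panic.
func modelNames(body []byte) ([]string, error) {
	var resp tagsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode model list: %w", err)
	}
	if resp.Models == nil {
		return nil, errors.New("response has no models array")
	}
	names := make([]string, 0, len(resp.Models))
	for _, m := range resp.Models {
		name := strings.TrimSpace(m.Name)
		if name == "" {
			name = strings.TrimSpace(m.Model)
		}
		if name != "" {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	body := []byte(`{"models":[{"name":"a","model":"a:latest"},{"name":"","model":"b:latest"}]}`)
	names, err := modelNames(body)
	fmt.Println(names, err)
}
```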

#### Drive-by fix
Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed
in #15900) so the models test package builds; auto-resolves against the
sibling #15853 PRs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-06-17 18:47:27 +08:00
balibabu
70f319c536 Fix: The pipeline created from the template fails immediately upon execution. (#16149)
### What problem does this PR solve?

Fix: The pipeline created from the template fails immediately upon
execution.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 17:03:17 +08:00
chanx
9302233b95 fix: misc frontend fixes for agent log, login, search settings (#16137)
### What problem does this PR solve?

fix: misc frontend fixes for agent log, login, search settings
- agent-log: restore server-side pagination on export and search;
replace hardcoded labels with i18n keys; switch container to
text-text-primary
- login: validate register nickname against NICKNAME_PATTERN with
reusable setting i18n
- next-search: align llm_setting schema with chat
(LlmSettingFieldSchema + LLMIdFormField nested, LlmSettingEnabledSchema
at form root) so the slider Switch reads the correct path; strip
*Enabled flags before submit to avoid backend "Unrecognized field name"
errors
- locales: add common.reset (zh/en)
- skills/go-naming: fix relative link to rules/named.md

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 16:20:26 +08:00
balibabu
3247e353c7 Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. (#16134)
### What problem does this PR solve?

Fix: The .docx file is not displaying fully; the hierarchy of the
pipeline created from the template is missing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 16:18:47 +08:00
Wang Qi
fcb4f78d97 Dev: add go starter (#16138)
Dev: add go starter
2026-06-17 16:09:53 +08:00
Wang Qi
e08bcd4d0d Update doc rerank_id from int to string (#16142)
Update doc rerank_id from int to string
2026-06-17 16:09:33 +08:00
buua436
be869f5d96 fix: chat channel runtime (#16129)
### What problem does this PR solve?
Fix chat channel message routing to use the connected `chat_id`, and
make the Feishu websocket client bind to the thread-local event loop.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 15:52:13 +08:00
Idriss Sbaaoui
44164e18d8 Enhancement: optimize ci (#16130)
### What problem does this PR solve?

optimize ci by fixing flaky clean-ups and redundant tasks

### Type of change

- [x] Performance Improvement
2026-06-17 15:16:11 +08:00
Wang Qi
b3ac03b96c Set default Paddle OCR URL (#16128)
Set default Paddle OCR URL
2026-06-17 14:29:20 +08:00
buua436
486b28c409 fix: show telegram chat channel (#16125)
### What problem does this PR solve?
Show Telegram in the chat channel picker alongside the existing Discord
and Feishu entries.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 14:18:16 +08:00
buua436
78b4906f7a fix: tighten embedding truncation threshold (#16123)
### What problem does this PR solve?
Use a 95% max_length threshold before truncating embedding inputs, which
reduces the chance of provider-side invalid-parameter errors on
near-limit chunks.
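
One possible reading of the threshold, as a minimal sketch: inputs longer than 95% of `max_length` are cut down to that limit. The helper `truncateForEmbedding` is hypothetical, it assumes the input is already tokenized, and it is written in Go for illustration only; the PR text does not say where or how the real check is implemented.

```go
package main

import "fmt"

// truncateForEmbedding is a hypothetical helper. It keeps at most 95% of
// maxLength tokens, leaving headroom for provider-side counting differences.
func truncateForEmbedding(tokens []string, maxLength int) []string {
	limit := maxLength * 95 / 100
	if limit < 0 {
		limit = 0
	}
	if len(tokens) <= limit {
		return tokens
	}
	return tokens[:limit]
}

func main() {
	tokens := make([]string, 500)
	fmt.Println(len(truncateForEmbedding(tokens, 512)))
}
```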

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 14:18:02 +08:00
Zhichang Yu
e45659868a feat(agent): ship the Go agent canvas port — eino interrupt/resume + Redis check-pointing (#16035)
Replaces the Python agent canvas runtime with a Go implementation that
runs inside `cmd/server_main`.

The canvas compiles into an eino Workflow that pauses on wait-for-user
via native Interrupt/Resume (no sentinel flag) and resumes from a
Redis-backed CheckPointStore.

All 21 Python agent components and ~35 tools are ported with functional
parity.

Sandbox providers now read their JSON config from the admin-panel
system_settings table with env fallback.

234 files / +35,413 / -6,111. All Go files are gofmt-clean (CI gate
added); drops the v2 DSL E2E step and the gap-analysis plan (both
redundant after the port ships).

## Type of change

- [x] Refactoring
- [x] New feature
- [x] Bug fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-17 13:24:03 +08:00
Wang Qi
2290bb0023 Fix MinerU table option sanitization (#16118)
Follow-up to issue #14831 and PR #14920: fix the table options so that
HTML tags are not sanitized when table recognition is enabled.
2026-06-17 13:06:07 +08:00
euvre
9bd53ce675 fix: return full record in get_ingestion_log (#16120)
### What problem does this PR solve?

The `get_ingestion_log` endpoint (both Python
`dataset_api_service.get_ingestion_log` and Go
`DatasetService.GetIngestionLog`) was returning only the
**dataset-level** field set, which omits critical fields such as `dsl`,
`document_id`, `parser_id`, `document_name`, `pipeline_id`, etc.

This caused the front-end **dataflow-result page** to be unable to
render the pipeline timeline and chunks when viewing a single ingestion
log, regardless of whether the log was a dataset-level operation
(graph/raptor/mindmap) or a per-file parse.

### Background

`PipelineOperationLogService` provides two field sets:

| Method | Fields |
|---|---|
| `get_dataset_logs_fields` | Minimal set (progress, status, timestamps,
etc.) |
| `get_file_logs_fields` | Superset — includes `document_id`, `dsl`,
`parser_id`, `document_name`, `pipeline_id`, … |

When listing logs, the API correctly distinguishes dataset-level vs
file-level logs and uses the appropriate converter. However, when
**fetching a single log by ID**, both the Python and Go implementations
were hardcoded to the dataset-level set, dropping the extra fields that
the front-end needs.
2026-06-17 13:03:51 +08:00
Hunnyboy1217
fd196f694e feat(go-models): harden ListModels for FishAudio (#15853) (#15957)
### What problem does this PR solve?

Part of #15853 (provider model-list refactor). Final two providers.

- **voyage:** Voyage AI exposes no live model-list endpoint — its public
API only has `/v1/embeddings` and `/v1/rerank` — so the previous
`ListModels` was a `no such method` stub. Replace it with a
static-catalog listing sourced from the loaded provider definition,
carrying each model's `max_tokens`, `model_types`, and embedding
`dimensions`. `list models from voyage` now returns the 13-model catalog
instead of erroring.
- **fishaudio:** route the existing `/model` voice listing through the
shared `ParseListModel` helper for consistency; keep the human-readable
`title` as the model name and fall back to `_id` when a title is blank.

#### Drive-by fix
Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed
in #15900); auto-resolves against the sibling #15853 PRs.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

Co-authored-by: Haruko386 <tryeverypossible@163.com>
2026-06-17 11:56:20 +08:00
writinwaters
0aaba0033f Docs: Updated Converse with chat assistant (#16117)
### What problem does this PR solve?

Miscellaneous editorial updates to the API reference.

### Type of change

- [x] Documentation Update
2026-06-17 11:50:14 +08:00
Wang Qi
02ccd35241 Fix RAGFlow cannot start (#16116)
# Summary
- The culprit is commit b4c8711d5 / PR #15415 (fix: upgrade crawl4ai to
0.8.0).
- That upgrade brought in unclecode-litellm, which installs the same
top-level litellm namespace as upstream litellm.
- The crash happens when files from one LiteLLM distribution are mixed
with files from the other: custom_guardrail.py expects
GuardrailTracingDetail, but types/utils.py can come from the older
conflicting package.
2026-06-17 11:27:31 +08:00
Hz_
b48f03d0f5 feat(go/dao): migrate chat channel database entity and DAO to Go (#16055)
## Changes
1. **Entity (`internal/entity/chat_channel.go`)**:
- Implemented `ChatChannel` struct mapping the `chat_channel` database
table.
- Declared `ChatChannelListResponse` as a DTO to filter out sensitive
credentials (`config` field) and fetch the associated `dialog_name` via
left join.
2. **GORM Migration (`internal/dao/database.go`)**:
- Registered `&entity.ChatChannel{}` in the `dataModels` array inside
`InitDB()` to enable safe GORM schema synchronization.
3. **DAO (`internal/dao/chat_channel.go`)**:
- Implemented `ChatChannelDAO` wrapping GORM CRUD methods (`Create`,
`GetByID`, `UpdateByID`, `DeleteByID`).
- Implemented `ListByTenantID` performing a `LEFT JOIN` on the `dialog`
table to retrieve `dialog_name` while excluding `config` values to avoid
credential leaks.
4. **Test (`internal/dao/chat_channel_test.go`)**:
- Added tests covering the full CRUD lifecycle and the GORM left-join
list query mapping.
2026-06-17 11:26:13 +08:00
balibabu
5de00bdf50 Fix: Importing the MCP dialog causes duplicate submissions. (#16037)
### What problem does this PR solve?

Fix: Importing the MCP dialog causes duplicate submissions.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 09:49:51 +08:00
euvre
fe46244d30 fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#16106)
The parser pods suffer from OOM kills when processing large PDF
documents. The root cause is in api/db/services/task_service.py: when
layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to
MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be
processed as a single task with all pages loaded into memory
simultaneously.

This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the
same way DeepDOC already does.
2026-06-17 09:33:53 +08:00
Jin Hai
6865039a22 Go: add more start server parameters (#16093)
### What problem does this PR solve?

```
$ ./bin/ragflow_server --version 
RAGFlow version: v0.26.0-65-g549f6109c

$ ./bin/ragflow_server --debug # start server with debug log level

$ ./bin/admin_server --version 
RAGFlow version: v0.26.0-65-g549f6109c

$ ./bin/admin_server --debug # start server with debug log level

$ ./bin/admin_server --init-superuser # init default superuser

$ ./bin/ingestor --version
RAGFlow version: v0.26.0-68-g6f6c39706

$ ./bin/ingestor --debug
```
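
A minimal sketch of how such switches could be parsed with Go's standard `flag` package. Whether the real binaries use `flag` or another library is not stated here, and the `options` type, `parseArgs` helper, and printed placeholder text are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two switches shown in the transcript above.
type options struct {
	showVersion bool
	debug       bool
}

// parseArgs is a hypothetical helper built on the standard flag package,
// which accepts both -debug and --debug spellings.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("ragflow_server", flag.ContinueOnError)
	fs.BoolVar(&opts.showVersion, "version", false, "print the version and exit")
	fs.BoolVar(&opts.debug, "debug", false, "start with debug log level")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if opts.showVersion {
		fmt.Println("version: (placeholder)")
		return
	}
	fmt.Println("debug logging:", opts.debug)
}
```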


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-16 20:27:37 +08:00
Wang Qi
17e3aad7ae Revert "fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM" (#16104)
Reverts infiniflow/ragflow#15951
2026-06-16 20:11:45 +08:00
buua436
1e4796da9d Docs: update chat completions docs (#16100)
### What problem does this PR solve?
Syncs the /api/v1/chat/completions docs with the current behavior,
including the new legacy streaming mode.
### Type of change
- [x] Documentation Update
2026-06-16 20:08:23 +08:00
dependabot[bot]
b732636546 build(deps): bump aiohttp from 3.13.3 to 3.14.1 (#16090)
2026-06-16 20:07:32 +08:00
euvre
d2a18d5c46 fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#15951)
### What problem does this PR solve?

The parser pods suffer from OOM kills when processing large PDF
documents. The root cause is in api/db/services/task_service.py: when
layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to
MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be
processed as a single task with all pages loaded into memory
simultaneously.

This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the
same way DeepDOC already does.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2026-06-16 20:07:19 +08:00
Rander
62698725ca feat(paddleocr): add image parsing support with async Job API (#16086)
## Summary

Add image parsing capability to PaddleOCR integration, building on top
of #15967 (async Job API migration).

## Changes

### `deepdoc/parser/paddleocr_parser.py`
- Add `parse_image()` method that uses the same async Job API flow as
`parse_pdf()`
- Extracts text from `layoutParsingResults` → `prunedResult` →
`parsing_res_list`
- Returns concatenated block content as a single string

### `rag/llm/ocr_model.py`
- Add `parse_image()` wrapper to `PaddleOCROcrModel` with availability
check and logging

## Relationship to other PRs

- **Depends on**: #15967 (async Job API migration) — this PR is based on
that branch
- **Replaces**: #14826 (original image processing PR based on old sync
API)

## Notes

This PR uses `base_url` and the async Job API (submit → poll → fetch)
consistent with #15967, rather than the old `api_url` + sync POST
pattern from #14826.
2026-06-16 19:34:38 +08:00
Rander
1235da7093 refactor(paddleocr): migrate from sync API to async Job API (#15967)
## Summary

Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.

## Changes

### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)

### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`
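
The preference order for the environment variables can be sketched as follows. The actual change is in Python; this Go version and the helper `resolveBaseURL` are hypothetical, and only the two variable names come from the PR.

```go
package main

import (
	"fmt"
	"os"
)

// resolveBaseURL is a hypothetical helper: it prefers PADDLEOCR_BASE_URL and
// falls back to the older PADDLEOCR_API_URL. The lookup function is passed
// in so the logic can be exercised without touching the real environment.
func resolveBaseURL(getenv func(string) string) string {
	if v := getenv("PADDLEOCR_BASE_URL"); v != "" {
		return v
	}
	return getenv("PADDLEOCR_API_URL")
}

func main() {
	fmt.Println(resolveBaseURL(os.Getenv))
}
```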

### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)

### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config

## Backward Compatibility

- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups

## Why not use the `paddleocr` SDK package directly?

RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly via
HTTP, following the same logic as the SDK internally.
2026-06-16 19:34:21 +08:00
Jin Hai
3d8bc76e27 Go refactor: merge similar functions (#16098)
### What problem does this PR solve?

Merge password related functions

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-16 19:26:42 +08:00