## Summary
- Replace `gen_conf={}` with `gen_conf=None` + guard in
`rag/llm/chat_model.py` (12 instances across Base, BaiChuanChat,
LocalLLM, MistralChat, ReplicateChat, BaiduYiyanChat, GoogleChat
classes)
- Replace `doc_ids=[]` with `doc_ids=None` + guard in
`api/db/services/document_service.py` (1 instance)
- Mutable default arguments are shared across all calls, causing
potential cross-request state contamination
- See Python docs:
https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects
## Test plan
- [x] Verify LLM calls work with and without explicit gen_conf
- [x] No behavior change for existing callers — `None` is replaced with
`{}` at function entry
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Closes#14853
The `/agents/download` and `/agents/<agent_id>/upload` endpoints in the
agent API are missing `@login_required` and `@add_tenant_id_to_kwargs`
decorators, allowing unauthenticated access. This is a security issue —
any user can upload files to or download files from an agent without
being logged in. Additionally, the upload endpoint bypasses canvas
access control (`@_require_canvas_access_async`).
This PR adds the missing authentication and authorization decorators to
both endpoints and replaces the manual `user_id` / `created_by` lookups
with the `tenant_id` provided by the auth middleware, making these
endpoints consistent with the rest of the agent API.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
In `api/utils/web_utils.py`, `__get_pdf_from_html()` creates a Chrome
WebDriver but only calls `driver.quit()` inside the `TimeoutException`
handler. If the page element becomes stale before the timeout (no
exception raised), the WebDriver is never quit, leaking the Chrome
browser process and returning `None`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
- Move the PDF printing logic and `driver.quit()` outside the `except`
block so they execute on all code paths
- Use `try/finally` to ensure `driver.quit()` is always called, even if
the `Page.printToPDF` DevTools call fails
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Summary
`Graph.set_variable_param_value()` in `agent/canvas.py` has a bug in its
nested path traversal logic. The `for` loop iterates through **all**
keys in the path (including the last one), descending into every level.
After the loop, it then tries to set `cur[keys[-1]] = value`, but `cur`
has already descended one level too deep.
**Example:** For `path = "a.b"`, `value = "hello"`:
- **Before (bug):** `obj["a"]["b"]` becomes `{"b": "hello"}` instead of
`"hello"`
- **After (fix):** `obj["a"]["b"]` becomes `"hello"` as expected
The fix changes `for key in keys:` to `for key in keys[:-1]:`, so the
loop only navigates to the parent dict, and the final key is set
directly. This is consistent with how the read-side counterpart
`get_variable_param_value()` works.
This method is called by `set_variable_value()` when assigning to nested
variable paths (e.g., `component@root.nested.key`), which is used by the
`VariableAssigner` component.
## Test plan
- [ ] Create a canvas with a VariableAssigner that writes to a nested
path (e.g., `component@obj.nested.key`)
- [ ] Verify the value is set correctly at the expected path, not
wrapped in an extra dict layer
- [ ] Verify single-key paths (e.g., `component@key`) still work
correctly
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed a bug in variable parameter assignment where nested structures
were being incorrectly modified, ensuring values are now properly set at
their intended locations without unintended overwrites.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### What problem does this PR solve?
This PR implement TTS for MiniMax provider and CLI testing for TTS
**The following functionalities are now supported:**
**MiniMax:**
- [x] Chat / Stream Chat
- [x] Embedding
- [x] Rerank
- [x] Model listing
- [x] Provider connection checking
- [x] Text To Speech
- [ ] OCRFile
- [ ] ~~Audio To Text~~
- [ ] ~~Balance~~
**Verified examples from the CLI:**
```plaintext
RAGFlow(user)> tts with 'speech-2.8-hd@test@minimax' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"voice_setting": {"voice_id": "English_radiant_girl", "speed": 1, "vol": 1, "pitch": 0}, "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "wav", "channel": 1}, "output_format": "hex"}'
Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/speech-2.8-hd_output.wav
SUCCESS
RAGFlow(user)> stream tts with 'speech-2.8-hd@test@minimax' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"voice_setting": {"voice_id": "English_radiant_girl", "speed": 1, "vol": 1, "pitch": 0}, "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "wav", "channel": 1}, "output_format": "hex"}'
Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/speech-2.8-hd_output.wav
SUCCESS
```
Set `Play` to play audio in CLI
Set `Save` `PATH_TO_SAVE` to save file
Set `format` to save file in wav or mp3
Set `Param` align with official request body
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes#9837
The Langfuse integration currently only sends the output text to
`langfuse_generation.update()` without including token usage
information. This means Langfuse cannot track input/output token
consumption for cost analysis and monitoring.
### Solution
Add the `usage` parameter to `langfuse_generation.update()` with:
- `input`: approximate input token count from `message_fit_in()`
- `output`: approximate output token count from
`num_tokens_from_string(answer)`
- `total`: sum of input and output
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
\`assert \"string\"\` always passes in Python because non-empty strings
are truthy. This silently skips input validation:
- **variable_assigner.py line 51**: \`assert \"Variable is not
complete.\"\` → \`raise ValueError(\"Variable is not complete.\")\`
- **loop.py line 59**: \`assert \"Loop Variable is not complete.\"\` →
\`raise ValueError(\"Loop Variable is not complete.\")\`
Without this fix, incomplete variables pass validation silently and
cause a confusing KeyError on the next line.
## Description
This PR fixes critical bugs and improves the robustness of the RAG
reranking module while maintaining **100% backward compatibility** with
all existing functionality and providers.
## Key Changes
1. **Network Stability**: Added 30s timeout to all API requests to
prevent service blocking
2. **Boundary Protection**: Added empty query/text validation for all
rerank models
3. **Response Fault Tolerance**: Replaced hardcoded key access with
`.get()` to avoid KeyError crashes
4. **Bug Fixes**:
- Fixed `Ai302Rerank` (completely non-functional before)
- Fixed `GPUStackRerank` incorrect exception catching
- Fixed `_normalize_rank` empty array crash
5. **Code Specification**: Added type annotations, standardized
unimplemented class prompts
## Compatibility
- ✅ No changes to any class/method names
- ✅ All rerank providers (Jina/Cohere/NVIDIA/HuggingFace etc.) work as
before
- ✅ No breaking changes, zero impact on existing workflows
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
- Update version tags in README files (including translations) from
v0.25.3 to v0.25.4
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Includes gpt-5.4-mini and gpt-5.4-nano to the OpenAI model list
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add a Go driver for Voyage AI (https://voyageai.com), one of the
unchecked providers on the umbrella tracking issue #14736. Voyage AI is
**embed + rerank only** — no chat, no streaming, no `/v1/models`
endpoint. It's the first provider in the Go layer of this shape.
Until this PR, a tenant who configured `voyage` as a model provider in
the Go layer fell through to the default branch of
`internal/entity/models/factory.go` and got the dummy driver.
### What this PR includes
- New `internal/entity/models/voyage.go` with a `VoyageModel`
implementing the `ModelDriver` interface.
- New `conf/models/voyage.json` with 6 embedding models (`voyage-3.5`,
`voyage-3.5-lite`, `voyage-3-large`, `voyage-code-3`, `voyage-law-2`,
`voyage-finance-2`) and 2 rerank models (`rerank-2`, `rerank-2-lite`).
- `factory.go`: route `"voyage"` to `NewVoyageModel`.
- `internal/entity/models/voyage_test.go`: 19 unit tests.
### How the driver works
- **Embed**: `POST /v1/embeddings`. Response is OpenAI-shaped (`{data:
[{embedding, index, object, text}], model, usage}`). Driver reorders by
`index`, rejects duplicate / out-of-range / missing slots, and
short-circuits empty input without an HTTP call.
- **Rerank**: `POST /v1/rerank`. Voyage uses **`top_k`** as the request
param name (not `top_n` like Aliyun/SiliconFlow); the driver translates
`RerankConfig.TopN` → `top_k`. Response is Cohere-shaped (`{data:
[{relevance_score, index}], model}`), so the existing
`RerankResponse{Data: []RerankResult{Index, RelevanceScore}}` shape fits
cleanly.
- **`ListModels`**: returns a hardcoded list of `voyageKnownModels`.
Voyage does **not** expose `/v1/models` (probed live, returns 404), so
the driver synthesizes the list from the same set the config ships. New
upstream models are added by extending one slice.
- **`CheckConnection`**: pings a 1-input embed call against
`voyage-3.5`. Without `/v1/models`, this is the cheapest way to verify
the API key + network path before a tenant tries a real workload.
- **`ChatWithMessages` / `ChatStreamlyWithSender` / `Balance` /
`TranscribeAudio` / `AudioSpeech` / `OCRFile`**: all return `"no such
method"`. Voyage does not host any of these surfaces.
No interface change. No new dependencies.
### How was this tested?
**19 unit tests** in `internal/entity/models/voyage_test.go` — all pass
on go 1.25:
```
$ go test -vet=off -run TestVoyage -count=1 ./internal/entity/models/...
ok ragflow/internal/entity/models 0.036s
```
Coverage: Name; Embed (happy path, reorder, empty-input, missing
key/model, duplicate index, out-of-range index, missing slot); Rerank
(happy path with `top_k` assertion, default-to-len-documents, empty
documents, out-of-range index); ListModels (static list, missing key);
CheckConnection (happy, 401); chat methods sentinels; Balance sentinel;
audio/OCR sentinels.
`go build ./internal/entity/models/...` exits 0.
**Live integration test** against `api.voyageai.com`:
```
=== RUN TestVoyageLiveSmoke
[OK] Name() = "voyage"
[OK] ListModels (static): 8 models -> [voyage-3.5 voyage-3.5-lite voyage-3-large voyage-code-3 voyage-law-2 voyage-finance-2 rerank-2 rerank-2-lite]
[OK] CheckConnection
[OK] Embed vectors=3 dim=1024 indices=[0 1 2]
[OK] Embed(empty) -> 0 vectors
[OK] Rerank results=3 scores=[0.8125 0.59765625 0.39453125]
[OK] ChatWithMessages returns voyage, no such method
[OK] Balance returns voyage, no such method
VOYAGE LIVE SMOKE PASSED
--- PASS: TestVoyageLiveSmoke (0.81s)
```
What the live run proves on the wire:
- Auth (`Bearer <key>`) accepted by `api.voyageai.com`.
- Embed `voyage-3.5` on 3 inputs returns 3 vectors at dim 1024 with
`index` field preserved as `[0, 1, 2]` — the reorder-by-index code is
exercised on real data.
- Empty input short-circuits without an HTTP call (mock server would
have been hit if it did).
- Rerank `rerank-2` on 3 docs returns 3 real `relevance_score` floats
`[0.8125, 0.598, 0.395]`. The `top_k` translation works on the live
wire.
- All sentinel methods return the documented `"no such method"` strings.
### Note on PR history
This branch was previously named for LocalAI Embed work which is now
consolidated into PR #14813. The branch was reset to `upstream/main` and
rebuilt for Voyage. Diff against `main` is a clean +838 lines across 4
files.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Tracking: #14736
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
Closes#14774.
Adds free-form tags on agents (UserCanvas) with full UI + API:
- Stored as comma-separated `tags` column on `UserCanvas` with online
migration.
- New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT
/v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query.
- "Edit tags" item in agent dropdown opens a chip-style editor dialog;
tags render as badges on each agent card.
- New "Tags" facet in the agents filter bar, with counts.
## Implementation notes
- **Tag matching is exact-token**: the SQL filter wraps stored tags as
`,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`.
- **Server-side normalization** in `UserCanvasService.update_tags`:
dedup (case-insensitive), per-tag cap of 64 chars, total length capped
at 512 chars to fit the column, commas inside tag values are replaced
with spaces.
- **Tenant authorization**: `PUT /v1/agent/<id>/tags` gates on
`UserCanvasService.accessible(canvas_id, tenant_id)`.
- **Tag listing scope**: `UserCanvasService.list_tags` follows the same
own + team-shared rule as `get_by_tenant_ids`.
- **i18n**: keys added to `en.ts` and `zh.ts` only (per project
convention; other locales fall back).
- **`HomeCard`** gets a non-breaking `extra?: ReactNode` slot for the
chip row; no `src/components/ui/` files modified.
## Test plan
- [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column
exists (`DESCRIBE user_canvas`).
- [ ] Agents page renders cards normally (no console error from missing
field).
- [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog
was unmounting with the dropdown).
- [ ] Typing a tag without pressing Enter and clicking Save persists it
(regression: last typed tag was being dropped).
- [ ] Chip input supports Enter/comma to commit, Backspace on empty to
remove, `×` to remove individual chip.
- [ ] Tag containing a comma sent via API is stored with the comma
replaced by a space.
- [ ] 20 long tags sent via API does not error (length cap silently
truncates).
- [ ] "Tags" filter in the filter bar shows counts and narrows the list.
- [ ] Filtering by `ml` does **not** return agents tagged `ml-ops`.
- [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc.
- [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or
no permission.`
Introduce comprehensive floating widget customization: add new widget
settings (title, subtitle, footer, colors, mute, streaming) with types
and defaults, and expose them via EmbedDialog UI (split into Embed Setup
and Widget Customization tabs). Persist and load settings through Agent
page by reading/writing globals and wiring an onSaveWidgetSettings
handler to setAgent; show a loading ButtonLoading for saving. Update
embed iframe query params and FloatingChatWidget to honor URL params
(colors, text, mute/streaming) with validation/normalization, color
darkening for gradients, footer link normalization, and improved
styling. Also add copy-to-clipboard in message toolbar, adjust syntax
highlighter layout and Copy button, and add i18n key for muteWidget.
### What problem does this PR solve?
Adds a few fields to the embed widget modal to customize the appearance
of the floating widget when embedded into a page.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Noah <Noah.Thompson@ecn.forces.gc.ca>
## Summary
- Convert `pdfplumber.open()` to use `with` context manager in
`api/utils/file_utils.py` (`thumbnail_img` function)
- If any exception occurs between `open()` and `close()`, the PDF file
handle leaks
- The rest of the codebase (e.g. `read_potential_broken_pdf` in the same
file) already uses `with pdfplumber.open(...)` correctly
## Test plan
- [x] PDF thumbnail generation works correctly with context manager
- [x] Resources properly cleaned up on exceptions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
# feat: Add Generic REST API Connector
## What problem does this PR solve?
RAGFlow supports many specific data source connectors (MySQL, Slack,
Google Drive, etc.), but there was no way to connect an arbitrary REST
API as a data source. Users with custom or third-party APIs had to write
a new connector class for each one.
This PR adds a **generic, configuration-driven REST API connector** that
lets users connect any REST API as a data source entirely through the UI
— no code changes needed per API.
---
## Features
### Core Connector (`common/data_source/rest_api_connector.py`)
- Implements `LoadConnector` and `PollConnector` interfaces for full and
incremental sync
- **Configurable authentication:** None, API Key (custom header), Bearer
Token, Basic Auth
- **Pluggable pagination:** Page-based, Offset-based, Cursor-based, or
None
- Smart page-size inference from user's query parameters to avoid
duplicate/conflicting params
- Configurable request delay between pages to prevent API rate limiting
- Auto-detection of the items array in JSON responses (`items`,
`results`, `data`, `records`, or first list found)
- **Advanced field mapping** with dot-notation (`country.name`), array
wildcards (`newsType[*].name`), type hints, and default values
- Optional content template rendering (`"Title: {title}\nBody: {body}"`)
- HTML stripping for content fields
- Stable document IDs via `hash128` from a configurable ID field or
auto-generated from item content
- Pydantic configuration schema with automatic coercion of UI string
inputs to dicts/lists
### Backend Registration (`rag/svr/sync_data_source.py`,
`common/constants.py`, `common/data_source/config.py`)
- `REST_API` sync class wired into RAGFlow's `func_factory`
- Full sync (`load_from_state`) and incremental polling (`poll_source`)
support
- Credentials and config passed from task to connector following
existing patterns (MySQL, SeaFile, etc.)
### Test Connection Endpoint (`api/apps/connector_app.py`)
- `POST /v1/connector/<id>/test` validates config schema,
authentication, and API connectivity without triggering a sync
- Clear error messages for auth failures vs. config issues
### Frontend UI (`web/src/pages/user-setting/data-source/constant/`)
- **Postman-style configuration:** Base URL, Query Parameters (key=value
per line), Auth, Content Fields, Metadata Fields, Pagination Type
- Auth-type-aware form: fields for API key header/value, Bearer token,
or Basic username/password appear only when relevant
- **Advanced Settings** toggle for: Custom Headers, Max Pages, Request
Delay, Poll Timestamp Field, Request Body (POST)
- Connector icon (SVG) and i18n strings (English)
- **"Test Connection"** button to validate before syncing
---
## Controls & Safety
- Configurable max pages safety cap (default: 1000, adjustable in UI)
- Configurable request delay between pages (default: 0.5s, adjustable in
UI)
- Auth errors (401/403) fail immediately without retries; transient
errors retry with exponential backoff
- Diagnostic logging: auth setup confirmation, request details on
failure, content field extraction status
---
## Type of change
- [x] New Feature (non-breaking change which adds functionality)
##Visual Screenshots of Features
<img width="482" height="510" alt="Screenshot 2026-03-11 at 5 19 52 PM"
src="https://github.com/user-attachments/assets/dcb7ab4a-1622-44f3-bb02-d6f0527314c4"
/>
(Connector can be configured within the external data sources tab)
Configuration Parameters:
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 46 PM"
src="https://github.com/user-attachments/assets/5e154e71-4ab5-4872-bfb2-04f02b73c18a"
/>
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 54 PM"
src="https://github.com/user-attachments/assets/00cb14b7-0bcf-4b94-9d71-34e93369ecb2"
/>
Connection can be tested before attaching to dataset:
<img width="981" height="681" alt="Screenshot 2026-03-11 at 5 21 40 PM"
src="https://github.com/user-attachments/assets/aaa6eeeb-89a7-4349-bc34-2423bf8be9ee"
/>
Ingestion tested with API connector (works perfectly fine):
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 22 30 PM"
src="https://github.com/user-attachments/assets/afcd0d58-cadd-4152-badc-d2f14d96fbec"
/>
Search & Retrieval works as well with metadata flow:
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 23 05 PM"
src="https://github.com/user-attachments/assets/d41ee935-dcf7-4456-b317-22a76ca032c0"
/>
---------
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### Related issues
Closes#14781
### What problem does this PR solve?
Some retrieval endpoints accepted caller-supplied `tenant_rerank_id` and
resolved it through `get_model_config_by_id(...)`. That helper loaded
`TenantLLM` rows by global database id and returned decoded model
configuration without checking whether the model belonged to the
authenticated tenant or the dataset owner tenant.
This meant dataset access was validated, but rerank-model selection was
not. A caller who knew or could guess another tenant's
`tenant_rerank_id` could attempt retrieval with a foreign rerank model
config, creating a cross-tenant authorization gap for model usage.
This PR closes that gap by making `tenant_rerank_id` resolution
tenant-aware across the retrieval paths that accept it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Solution
- Extend `get_model_config_by_id(...)` to accept an optional
`allowed_tenant_ids` set and reject `TenantLLM` rows whose `tenant_id`
is outside that set.
- Pass the allowed tenant scope from retrieval endpoints that accept
`tenant_rerank_id`:
- `api/apps/sdk/doc.py`
- `api/apps/sdk/session.py`
- `api/apps/services/dataset_api_service.py`
- Use the authenticated tenant plus dataset-owner tenant ids already
derived by each retrieval flow as the authorization boundary for rerank
model selection.
- Add focused unit coverage to assert unauthorized `tenant_rerank_id`
values are rejected and that the allowed tenant set is propagated
correctly.
### Testing
- `python -m py_compile` on:
- `api/db/joint_services/tenant_model_service.py`
- `api/apps/services/dataset_api_service.py`
- `api/apps/sdk/doc.py`
- `api/apps/sdk/session.py`
- Added unit tests in:
-
`test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py`
-
`test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
### Notes for reviewers
- This change is intentionally narrow: it affects only the
`tenant_rerank_id` path, not the normal `rerank_id` name-based
resolution path.
- Local lint/syntax checks passed.
- Full pytest execution could not be completed in this environment
because the local test runtime is missing `strenum`, so the route-test
files fail during collection before exercising the updated cases.
---------
Co-authored-by: jony376 <jony376@gmail.com>
### What problem does this PR solve?
A draft 0.25.3 release note.
### Type of change
- [x] Documentation Update
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
The LocalAI Go driver landed in #14809 and Embed landed in #14811.
`Rerank` was left as a stub that returns `"not implemented"`. This PR
fills the gap.
LocalAI exposes a public rerank endpoint at `<tenant-url>/v1/rerank`
with a Cohere-shaped request and response (`{model, query, documents,
top_n}` → `{results: [{index, relevance_score}]}`). The Python side has
had `LocalAIRerank` in `rag/llm/rerank_model.py` for a long time. Until
this PR, a tenant who wanted to use LocalAI for reranking in the Go
layer got `"not implemented"`.
### What this PR includes
- `conf/models/localai.json`: add `"rerank": "rerank"` under
`url_suffix` so the driver can build the URL from config. This matches
the `URLSuffix.Rerank` field already used by aliyun and siliconflow.
- `internal/entity/models/localai.go`: replace the `Rerank` stub with a
real implementation that POSTs to `/v1/rerank`. Adds local
request/response types `localAIRerankRequest` and
`localAIRerankResponse`.
No factory change. No interface change.
### How the implementation works
- Validate the model name and resolve the tenant-supplied base URL with
the existing `resolveBaseURL` helper.
- Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so
the call has a clear deadline. Same pattern `ChatWithMessages`,
`ListModels`, and `Embed` already use in this file.
- Only set the `Authorization` header when a non-empty API key was
supplied. LocalAI accepts an empty key by default, so this preserves the
optional-auth contract.
- Default `top_n` to `len(documents)` when `rerankConfig.TopN == 0`,
matching the existing Aliyun and SiliconFlow rerank implementations.
- Validate every `results[].index` against `len(documents)`. If the
upstream returns an out-of-range index, fail clearly instead of silently
writing past the slice.
- An empty `documents` slice returns `&RerankResponse{}` with no HTTP
call.
- Non-200 responses propagate the upstream status line and body.
### Note on stacking
This PR builds on #14809 (LocalAI driver) and #14811 (LocalAI Embed).
Until both merge, this PR's diff on GitHub will include all three
commits. After #14809 and #14811 land on `main`, GitHub will auto-reduce
this PR to only the `Rerank` changes (one commit, ~99 line diff in
`localai.go` plus 1 line in `localai.json`).
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### How was this tested?
- `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the
`go.mod` minimum).
- The full method set on `LocalAIModel` still matches the `ModelDriver`
interface.
- Pattern parity with the existing Aliyun Rerank
(`internal/entity/models/aliyun.go`) and SiliconFlow Rerank
(`internal/entity/models/siliconflow.go`) implementations.
Closes#14812
Depends on #14809, #14811
Tracking: #14736
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
- Update version tags in README files (including translations) from
v0.25.2 to v0.25.3
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png'
+----------------------------------------------------------+
| text |
+----------------------------------------------------------+
| 生活不是等待风暴过去,而是学会在雨中翩翩起舞。
——佚名 |
+----------------------------------------------------------+
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes#14884
The ZhipuAI Go driver in `internal/entity/models/zhipu-ai.go` had a stub
`ListModels` method that always returned `"zhipu-ai, no such method"`.
The DeepSeek, Gitee, NVIDIA, OpenAI, SiliconFlow, and OpenRouter drivers
in the same package already implement `ListModels` against the
OpenAI-compatible `/models` endpoint, and the model picker UI relies on
it. This PR brings ZhipuAI in line with that pattern.
### Changes
- `internal/entity/models/zhipu-ai.go`: implement
`ZhipuAIModel.ListModels`.
- Resolve region with default fallback.
- GET `${BaseURL[region]}/${URLSuffix.Models}` (resolves to
`https://open.bigmodel.cn/api/paas/v4/models` with the default region).
- Send `Authorization: Bearer <api_key>` when an API key is configured.
Omit the header when the key is empty, so an unauthenticated caller gets
a clear `401` from upstream.
- Surface non-200 responses with the upstream status line and body,
matching the other Go drivers.
- Parse the response via the package-level `DSModelList` / `DSModel`
types already used by DeepSeek, Gitee, and SiliconFlow.
- When the response includes `owned_by`, render the entry as
`id@owned_by`, matching the convention of Gitee and SiliconFlow.
- `conf/models/zhipu-ai.json`: add `"models": "models"` to `url_suffix`.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Set embedded models during form initialization.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add a Go driver for LongCat (Meituan, https://longcat.chat), one of the
unchecked providers on the umbrella tracking issue #14736. LongCat
exposes an OpenAI-compatible REST API at
`https://api.longcat.chat/openai/v1` with three public chat models
including `LongCat-Flash-Thinking`, a reasoning model that returns
chain-of-thought in `reasoning_content` (OpenAI o-series shape).
Until this PR, a tenant who configured `longcat` as a model provider in
the Go layer fell through to the default branch of
`internal/entity/models/factory.go` and got the dummy driver.
### What this PR includes
- New `internal/entity/models/longcat.go` with a `LongCatModel`
implementing the `ModelDriver` interface.
- New `conf/models/longcat.json` with the 3 public chat models
(Flash-Chat, Flash-Lite, Flash-Thinking) and `url_suffix` for `chat` and
`models`.
- `factory.go`: route `"longcat"` to `NewLongCatModel`.
Method coverage:
- `ChatWithMessages`: `POST /openai/v1/chat/completions`, non-streaming
- `ChatStreamlyWithSender`: SSE stream against the same endpoint
- `ListModels` / `CheckConnection`: `GET /openai/v1/models`
- **Reasoning extraction**: `message.reasoning_content` (non-stream) and
`delta.reasoning_content` (stream) flow into
`ChatResponse.ReasonContent` / the sender's second arg. Matches the
OpenAI o-series convention also used by kimi-k2.6 and DeepSeek-R1.
- **`reasoning_effort` propagation**: `ChatConfig.Effort` → request body
`reasoning_effort` (LongCat-Flash-Thinking honors it; non-reasoning
models ignore it).
- `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` /
`OCRFile` return `"no such method"` (LongCat does not expose any of
these surfaces).
No interface change. No new dependencies.
### How was this tested?
**21 unit tests** in `internal/entity/models/longcat_test.go` — all
pass:
```
$ go test -vet=off -run TestLongCat -count=1 -v ./internal/entity/models/...
=== RUN TestLongCatName
--- PASS: TestLongCatName (0.00s)
=== RUN TestLongCatChatHappyPath
--- PASS: TestLongCatChatHappyPath (0.00s)
=== RUN TestLongCatChatExtractsReasoningContent
--- PASS: TestLongCatChatExtractsReasoningContent (0.00s)
=== RUN TestLongCatChatPropagatesReasoningEffort
--- PASS: TestLongCatChatPropagatesReasoningEffort (0.00s)
=== RUN TestLongCatChatOmitsReasoningEffortWhenUnset
--- PASS: TestLongCatChatOmitsReasoningEffortWhenUnset (0.00s)
=== RUN TestLongCatChatRequiresAPIKey
--- PASS: TestLongCatChatRequiresAPIKey (0.00s)
=== RUN TestLongCatChatRequiresMessages
--- PASS: TestLongCatChatRequiresMessages (0.00s)
=== RUN TestLongCatChatRejectsHTTPError
--- PASS: TestLongCatChatRejectsHTTPError (0.00s)
=== RUN TestLongCatStreamHappyPath
--- PASS: TestLongCatStreamHappyPath (0.00s)
=== RUN TestLongCatStreamExtractsReasoningContent
--- PASS: TestLongCatStreamExtractsReasoningContent (0.00s)
=== RUN TestLongCatStreamRejectsExplicitFalse
--- PASS: TestLongCatStreamRejectsExplicitFalse (0.00s)
=== RUN TestLongCatStreamRequiresSender
--- PASS: TestLongCatStreamRequiresSender (0.00s)
=== RUN TestLongCatStreamFailsWithoutTerminal
--- PASS: TestLongCatStreamFailsWithoutTerminal (0.00s)
=== RUN TestLongCatListModelsHappyPath
--- PASS: TestLongCatListModelsHappyPath (0.00s)
=== RUN TestLongCatListModelsRequiresAPIKey
--- PASS: TestLongCatListModelsRequiresAPIKey (0.00s)
=== RUN TestLongCatCheckConnectionDelegatesToListModels
--- PASS: TestLongCatCheckConnectionDelegatesToListModels (0.00s)
=== RUN TestLongCatEmbedReturnsNoSuchMethod
--- PASS: TestLongCatEmbedReturnsNoSuchMethod (0.00s)
=== RUN TestLongCatRerankReturnsNoSuchMethod
--- PASS: TestLongCatRerankReturnsNoSuchMethod (0.00s)
=== RUN TestLongCatBalanceReturnsNoSuchMethod
--- PASS: TestLongCatBalanceReturnsNoSuchMethod (0.00s)
=== RUN TestLongCatAudioOCRReturnNoSuchMethod
--- PASS: TestLongCatAudioOCRReturnNoSuchMethod (0.00s)
PASS
ok ragflow/internal/entity/models 0.020s
```
`go build ./internal/entity/models/...` exits 0 on go 1.25.
**Live integration test** against `api.longcat.chat`:
```
=== RUN TestLongCatLiveSmoke
[OK] Name() = "longcat"
[OK] CheckConnection
[OK] ListModels: 5 models -> [LongCat-Flash-Lite LongCat-Flash-Chat LongCat-Flash-Thinking-2601 LongCat-Flash-Omni-2603 LongCat-2.0-Preview]
[OK] Chat (Flash-Chat) answer="Got it! Let me know if you" reason=""
[OK] Chat (Flash-Thinking) answer len=443 head="To find 15 % of 80, follow these steps:\n\n1. **Convert the percentage to a frac..."
ReasonContent len=557 head="The user asks: \"15% of 80?\" They want step by step reasoning and final answer in \\boxed{}. So we need to compute 15% of ..."
[OK] Stream content: 78 chunks, 351 chars
[OK] Stream reasoning: 107 chunks, 537 chars
[OK] Balance returns longcat, no such method
[OK] Embed returns longcat, no such method
[OK] Rerank returns longcat, no such method
LONGCAT LIVE SMOKE PASSED
--- PASS: TestLongCatLiveSmoke (31.01s)
```
What the live run proves on the wire:
- Auth header (`Bearer <key>`) is accepted by `api.longcat.chat`.
- `/openai/v1/models` parser handles the real 5-model response (note:
live API returns versioned aliases `LongCat-Flash-Thinking-2601`,
`LongCat-Flash-Omni-2603`, `LongCat-2.0-Preview` plus the un-versioned
`LongCat-Flash-Chat` and `LongCat-Flash-Lite`).
- Non-stream chat against `LongCat-Flash-Chat`: visible answer parses
correctly, `ReasonContent` correctly empty.
- Non-stream chat against `LongCat-Flash-Thinking`: 443-char answer
flows into `Answer`, 557-char chain-of-thought flows into
`ReasonContent` via the new `message.reasoning_content` extraction.
- Streaming chat against `LongCat-Flash-Thinking`: 107 reasoning chunks
(537 chars) reach the sender's second arg via `delta.reasoning_content`;
78 content chunks (351 chars) reach the first arg. Before this code, the
reasoning chunks would have been silently dropped.
- All sentinel methods (Balance, Embed, Rerank, audio/OCR) return the
documented `"no such method"` strings.
### Note on PR history
This branch was previously named for LocalAI work which is now
consolidated into PR #14813. The branch was reset to `upstream/main` and
rebuilt for LongCat. The diff against `main` is a clean +969 lines
across 4 files.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Tracking: #14736
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
The delete /graph is duplicated of
`/datasets/<dataset_id>/<index_type>`, delete it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
add new testing suite for the new restful api endpoints meant to replace
http and web api tests
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
Add a Go driver for Novita.ai (https://novita.ai), one of the unchecked
providers on the umbrella tracking issue #14736. Novita exposes an
OpenAI-compatible REST API at `https://api.novita.ai/v3/openai` and
proxies a large catalog of third-party models (DeepSeek, Llama, Qwen3,
Kimi, Gemma, Mistral, MiniMax, GLM, etc.) behind a single OpenAI-shaped
surface — 102 models live at the time of writing.
Until this PR, a tenant who configured `novita` as a model provider in
the Go layer fell through to the default branch of
`internal/entity/models/factory.go` and got the dummy driver.
### What this PR includes
- New `internal/entity/models/novita.go` with a `NovitaModel`
implementing the `ModelDriver` interface (~520 lines).
- New `conf/models/novita.json` with 7 representative chat models
(DeepSeek-V4, Llama-3.3-70B, Qwen3-30B/235B reasoning, Kimi-K2,
Gemma-3-27B, Mistral-Nemo).
- `factory.go`: route `"novita"` to `NewNovitaModel`.
- `internal/entity/models/novita_test.go`: 23 unit tests.
### Notable design point: `<think>...</think>` reasoning extraction
Novita-routed reasoning models like `qwen3-*` and `deepseek-r1-*` embed
their chain-of-thought **inline inside content as `<think>...</think>`
tags**, rather than in a separate `reasoning_content` field. Verified
live by probing `api.novita.ai`:
```
content head 200: <think>
Okay, let's see. I need to find 15% of 80. Hmm, percentages can sometimes be tricky, but I think
content tail 100: h, that works.
Alternatively, 0.15 × 80. If I move the decimal two places to the left for </think>
```
Without handling, a tenant picking qwen3 via Novita would see raw
`<think>` tags in their UI answer — different from every other reasoning
provider in the Go layer.
The driver detects those tags and routes the inner text to
`ChatResponse.ReasonContent` (non-stream) or the sender's second arg
(stream), keeping the visible answer clean of tag clutter:
- **`splitNovitaThink`** — scans a complete content string. Used by the
non-streaming path. Handles multiple `<think>` blocks, unclosed tags
(the model got cut off mid-reasoning), pure-text content with no tags.
- **`novitaThinkExtractor`** — stateful streaming version. Buffers
trailing bytes that might be the start of a tag (e.g. `<thi` held back
when the next chunk completes `nk>`), then emits segments in routing
order so callers can pipe them to a UI. Tested with byte-level chunk
boundaries and tag-spanning scenarios.
### Method coverage
| Method | Behavior |
|---|---|
| `ChatWithMessages` | `POST /v3/openai/chat/completions`, `<think>`
extraction on response |
| `ChatStreamlyWithSender` | SSE stream, stateful `<think>` extraction
across deltas |
| `ListModels` / `CheckConnection` | `GET /v3/openai/models` (102 live)
|
| `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` /
`OCRFile` | `"no such method"` — Novita's OpenAI-compatible surface does
not expose any |
No interface change. No new dependencies.
### How was this tested?
**23 unit tests** in `internal/entity/models/novita_test.go` — all pass:
```
$ go test -vet=off -run "TestNovita|TestSplitNovita" -count=1 ./internal/entity/models/...
ok ragflow/internal/entity/models 0.020s
```
Coverage:
- `splitNovitaThink` (5 cases: pure text, single block, leading text,
multiple blocks, unclosed tag)
- `novitaThinkExtractor` (6 cases: single-chunk, opening tag span,
closing tag span, byte-level chunking, no tags, lone `<` not as tag
start)
- `ChatWithMessages`: pure text, with `<think>` tags, missing API key,
empty messages, HTTP error
- `ChatStreamlyWithSender`: tag-stripping with spanning deltas, pure
content, sender-required, stream-true-required
- `ListModels` / `CheckConnection` (happy paths)
- All sentinel methods
`go build ./internal/entity/models/...` exits 0 on go 1.25.
**Live integration test** against `api.novita.ai/v3/openai`:
```
=== RUN TestNovitaLiveSmoke
[OK] Name() = "novita"
[OK] CheckConnection
[OK] ListModels: 102 models (showing first 6) [deepseek/deepseek-v4-pro deepseek/deepseek-v4-flash deepseek/deepseek-v3.2 xiaomimimo/mimo-v2.5-pro moonshotai/kimi-k2.6 zai-org/glm-5.1]
[OK] Chat (llama-3.3) answer="ok" reason=""
[OK] Chat (qwen3) answer len=0 head=""
ReasonContent len=1657 head="Okay, so I need to figure out what 15% of 80 is. Hmm, percentages can sometimes trip me up, but let ..."
[OK] Stream content: 0 chunks, 0 chars; reasoning: 600 chunks, 1667 chars
[OK] Embed/Rerank/Balance/TranscribeAudio/AudioSpeech/OCRFile all return "novita, no such method"
NOVITA LIVE SMOKE PASSED
--- PASS: TestNovitaLiveSmoke (26.18s)
```
What the live run proves on the wire:
- Auth (`Bearer <key>`) accepted by `api.novita.ai`.
- `/v3/openai/models` parser handles the real 102-model response.
- Non-stream chat against `meta-llama/llama-3.3-70b-instruct`: clean
string answer, empty ReasonContent (non-reasoning model, pure-text
path).
- Non-stream chat against `qwen/qwen3-30b-a3b-fp8`: 1657-char reasoning
extracted from `<think>...</think>` and routed to
`ChatResponse.ReasonContent`. Visible answer is 0 chars in this run
because qwen3 spent its 600-token budget entirely on reasoning before
reaching the answer phase — that's the model's behavior, not a driver
bug. The important thing: **no `<think>` tags leaked into the visible
Answer field**.
- Streaming against qwen3: 600 reasoning chunks (1667 chars) emitted via
the sender's 2nd arg across SSE deltas; **no `<think>` tag fragments
leaked into the content channel** despite tag boundaries crossing chunk
boundaries on the wire.
- All 6 sentinel methods return the documented `"no such method"`
strings.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Tracking: #14736
… modal
- Add vertical resizing functionality for the text field
### What problem does this PR solve?
_Fix the issue where the text content of the knowledge base editing
parsing block is too long to scroll._
<img width="701" height="775" alt="image"
src="https://github.com/user-attachments/assets/b258422e-fbc1-466d-abab-062e642c21d5"
/>
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: chenyun <chenyun@chenyundemacbook-pro.local>
### What problem does this PR solve?
Fix delete graphrag not take effect in UI
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The Baidu (Qianfan) `Encode` method silently swallowed malformed
responses. If a `data[]` item from the API was missing a field (`index`,
`embedding`, or unexpected shape), the loop did `continue` instead of
returning an error, leaving `nil` entries in the result slice. Callers
got back partial results with no indication anything went wrong, which
then crashes downstream consumers when they try to use a `nil` vector.
Concrete gaps fixed:
- No count-mismatch check between `data` length and input texts (only
checked for empty)
- No duplicate-index detection (a duplicate would silently overwrite)
- No missing-index final scan
- No empty-embedding rejection
- No per-call context timeout
- `EmbeddingConfig.Dimension` (added in #14735) was not propagated
This PR replaces `map[string]interface{}` parsing with a typed
`baiduEmbeddingResponse` struct, applies the standard four-layer
validation (count → out-of-range → duplicate → empty → final
missing-index scan), adds `context.WithTimeout(nonStreamCallTimeout)`,
and forwards `embeddingConfig.Dimension` as the `dimensions` parameter
(Baidu Qianfan v2 uses an OpenAI-compatible API).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
When parsing DOCX files with many tables, DeepDOC generates chunks
containing only empty HTML table tags, such as:
```html
<table><tr><td></td></tr><tr><td></td></tr><tr><td></td></tr><tr><td></td></tr></table>
```
After the regex cleanup at `task_executor.py:584`, this becomes `" "`
(whitespace only).
The guard at line 585 (`if not c`) only catches empty strings `""`, but
whitespace strings are truthy in Python and pass through. When sent to
Zhipu `embedding-3` API, it rejects them with error 1213:
`未正常接收到prompt参数`.
## Root Cause
```python
c = re.sub(r"</?(table|td|caption|tr|th)( [^<>]{0,12})?>", " ", c)
if not c: # ← only catches "", not " " / "\n" / "\t"
c = "None"
```
Verified with Zhipu `embedding-3`:
| Input | Result |
|---|---|
| `""` | error 1213 |
| `" "` | error 1213 |
| `"\n"` | error 1213 |
| `"None"` | OK |
## Fix
```diff
- if not c:
+ if not c.strip():
c = "None"
```
## Testing
Reproduced with a 678KB DOCX file (166 tables, 270 chunks). Chunk #89 is
the empty table above. After fix, `"None"` is sent instead and embedding
succeeds.
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
Closes#14768
### What problem does this PR solve?
The `list_chats` and `list_searches` REST API endpoints did not enforce
authorization on the `owner_ids` query parameter. Any authenticated user
could pass arbitrary tenant IDs to `owner_ids` and retrieve chats or
search apps belonging to other tenants they are not a member of.
This PR resolves the issue by:
1. Looking up the current user's authorized tenants via
`TenantService.get_joined_tenants_by_user_id` and rejecting any
`owner_ids` that fall outside that set.
2. When no `owner_ids` are provided, scoping the query to only the
user's authorized tenants instead of returning an unfiltered result.
3. Adding unit tests that verify unauthorized `owner_ids` are rejected
with `OPERATING_ERROR`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
What problem does this PR solve?
In rag/app/audio.py, the supported audio extensions list contains
duplicate entries: .wav appears twice (positions 3 and 5) and .aac
appears twice (positions 6 and 14). While this does not affect runtime
behavior, it is redundant and makes the code harder to maintain.
This PR removes the duplicate entries to keep the list clean and
consistent.
Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
```
RAGFlow(user)> chat with 'glm-ocr@test@zhipu-ai' message 'what is this'
CLI error: expect model glm-ocr@zhipu-ai is a chat or multimodal model
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: When a Wait Node precedes a Message Node within a Loop Node, the
outgoing message is split into two separate messages.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)