Commit Graph

529 Commits

Author SHA1 Message Date
Tim Wang
ca96d61e73 Feat: Add New API model provider for OpenAI-compatible gateways (#15991)
## Summary

Add support for **"New API"** as a model provider, enabling connection
to [New API](https://github.com/QuantumNous/new-api) /
[one-api](https://github.com/songquanpeng/one-api) compatible gateways
that aggregate multiple LLM backends behind a unified OpenAI-compatible
`/v1` endpoint.

### Features

- **All model types**: Chat, Embedding, Rerank, Image2Text, TTS,
Speech2Text
- **List Models discovery**: `NewAPI(OpenAIAPICompatible)` class in
`model_meta.py` queries the gateway's `/v1/models` to auto-discover
available models via the native `GET /api/v1/providers/<name>/models`
endpoint
- **Model parameter editing**: Pencil icon on each discovered model row
to edit `model_type`, `max_tokens`, and `features` (e.g. tool call
support) before submitting
- **Custom model addition**: "Add Custom Model" button at the bottom of
the List Models dropdown for models not returned by the API
- **Gear icon settings**: Enabled the Settings gear button on provider
instances to manage models on existing instances (viewMode)
- **viewMode credential passthrough**: Fixed List Models in viewMode —
merges `initialValues` credentials when `api_key`/`base_url` fields are
hidden by `hideWhenInstanceExists`

### Changes

**Backend** (8 files):
- `rag/llm/chat_model.py` — `NewAPIChat(Base)` class
- `rag/llm/embedding_model.py` — `NewAPIEmbed(OpenAIEmbed)` class (no
auto `/v1` append)
- `rag/llm/rerank_model.py` — `NewAPIRerank(Base)` class (uses `/rerank`
endpoint)
- `rag/llm/cv_model.py` — `NewAPICv(GptV4)` class
- `rag/llm/tts_model.py` — `NewAPITTS(OpenAITTS)` class
- `rag/llm/sequence2txt_model.py` — `NewAPISeq2txt(GPTSeq2txt)` class
- `rag/llm/model_meta.py` — `NewAPI(OpenAIAPICompatible)` class for List
Models discovery
- `conf/llm_factories.json` — New API factory entry with all model type
tags

**Frontend** (8 files + 1 new SVG):
- `web/src/assets/svg/llm/new-api.svg` — New API logo icon
- `web/src/constants/llm.ts` — `LLMFactory.NewAPI` enum + `IconMap`
entry
- `web/src/components/svg-icon.tsx` — `NewAPI` added to `svgIcons`
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/field-config/local-llm-configs.ts`
— New API `buildLocalConfig`
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/constants.ts`
— `LIST_MODEL_PROVIDERS` includes NewAPI
- `web/src/pages/user-setting/setting-model/components/used-model.tsx` —
Enable Settings gear button
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-picker.ts`
— viewMode credential merge + model editing state/handlers
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-options.tsx`
— Pencil edit icon per model row
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/index.tsx`
— `AddCustomModelDialog` import + edit dialog rendering

**Note on Go implementation**: A Go model driver (`NewAPIModel`
delegating to `OpenAIModel`) has been prepared but is deferred until the
Go runtime is enabled in a future release (current v0.26.0 images use
`API_PROXY_SCHEME=python` and do not compile Go binaries). Will submit
as a follow-up PR.

## Related

- Depends on: #15996 (provider instance API improvements — server-side
credential lookup, idempotent `add_model`, security fixes — required for
viewMode gear icon and batch model submission)

## Test plan

- [ ] Add New API provider with api_key and base_url pointing to an
OpenAI-compatible gateway
- [ ] Click "List Models" — should discover and display available models
from `/v1/models`
- [ ] Click pencil icon on a model — should open edit dialog to change
model_type, max_tokens, features
- [ ] Select multiple models and click OK — should add all selected
models
- [ ] Click gear icon on the added instance — should open viewMode with
List Models working
- [ ] In viewMode, select new models including pre-existing ones, click
OK — should succeed (requires #15996)
- [ ] Verify all model types work: create a Chat assistant, Embedding
KB, Rerank setting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Tim Wang <wanghualoong@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-26 18:47:20 +08:00
Lynn
ede46e0bb8 Fix: guess volc embedding model (#16298) 2026-06-24 14:11:55 +08:00
Günter Lukas
398f488b1b fix: support Google Cloud Gemini eu/us multipoint endpoints (#15990)
fix: support Google Cloud Gemini eu/us multipoint endpoints (#15990)
2026-06-24 11:07:05 +08:00
Lynn
a5cce29f22 Fix: add mimo (#16136)
### What problem does this PR solve?

Add chat model factory for Xiaomi model.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 19:02:33 +08:00
Rander
62698725ca feat(paddleocr): add image parsing support with async Job API (#16086)
## Summary

Add image parsing capability to PaddleOCR integration, building on top
of #15967 (async Job API migration).

## Changes

### `deepdoc/parser/paddleocr_parser.py`
- Add `parse_image()` method that uses the same async Job API flow as
`parse_pdf()`
- Extracts text from `layoutParsingResults` → `prunedResult` →
`parsing_res_list`
- Returns concatenated block content as a single string

### `rag/llm/ocr_model.py`
- Add `parse_image()` wrapper to `PaddleOCROcrModel` with availability
check and logging

## Relationship to other PRs

- **Depends on**: #15967 (async Job API migration) — this PR is based on
that branch
- **Replaces**: #14826 (original image processing PR based on old sync
API)

## Notes

This PR uses `base_url` and the async Job API (submit → poll → fetch)
consistent with #15967, rather than the old `api_url` + sync POST
pattern from #14826.
2026-06-16 19:34:38 +08:00
Rander
1235da7093 refactor(paddleocr): migrate from sync API to async Job API (#15967)
## Summary

Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.

## Changes

### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)

### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`

### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)

### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config

## Backward Compatibility

- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups

## Why not use the `paddleocr` SDK package directly?

RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly via
HTTP, following the same logic as the SDK internally.
2026-06-16 19:34:21 +08:00
Dexterity
bde2b1fc6d fix(llm): correct error handling, token accounting, and truncation in embedding providers (#15424)
### Summary

Closes #15423

`rag/llm/embedding_model.py` hosts about 40 embedding providers that
shared several defects affecting indexing reliability, cost accounting,
and error visibility. This PR fixes four concrete bugs.

**Masked, inconsistent errors (27 sites).** Nearly every provider ran
`log_exception(_e, res)` followed by `raise Exception(f"Error: {res}")`.
Because `log_exception` always raises, the second line was dead code,
and the surfaced exception varied with whether the SDK response exposed
a `.text` attribute. Every failure path now raises a single
`EmbeddingError` that includes the underlying response detail, so the
cause of a failed embedding is consistent and visible.

**Fabricated token counts.** `LocalAIEmbed` returned a hardcoded `1024`
and `OllamaEmbed` added `128` per text. These values feed `used_tokens`
and therefore billing and usage tracking. Both now report the real count
from the API (Ollama `prompt_eval_count`, LocalAI `usage`) and fall back
to a local token count only when the server omits it.

**Truncation overshoot.** The `8196` limit used by Mistral and Bedrock
exceeded the standard `8192` ceiling and could push boundary sized
inputs past the model limit. Limits are corrected to `8192` and made
intentional per provider, and providers that rely on server side
truncation now request it explicitly (Ollama `truncate=True`, Cohere
`truncate="END"`).

**Missing batching on Zhipu and Ollama.** Both issued one request per
text. They now batch like the other OpenAI compatible providers, turning
N round trips into `ceil(N / batch_size)`. Batched results are realigned
by response `index` so a chunk always keeps its own vector.

A shared `Base._batched_encode` helper owns the batch loop, optional
truncation, result accumulation, and the single error path. It is the
mechanism that lets these fixes live in one place instead of across 27
duplicated sites. The public `encode()` and `encode_queries()` contract
stays the same, so existing callers are unaffected.

Tests covering all four fixes are added under
`test/unit_test/rag/llm/test_embedding_model.py`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-11 19:29:46 +08:00
jaso0n0818
d4fbc013b9 fix: tolerate raw api_key string in AzureEmbed and AzureGptV4 __init__ (#15877)
Fixes #15587

## Problem

`AzureEmbed.__init__` in `rag/llm/embedding_model.py` and
`AzureGptV4.__init__` in `rag/llm/cv_model.py` both call
`json.loads(key)` unconditionally:

```python
api_key = json.loads(key).get("api_key", "")
api_version = json.loads(key).get("api_version", "2024-02-01")
```

When a user stores a plain API key string (not a JSON object) in the
model configuration — which is a valid and common way to configure Azure
OpenAI — `json.loads` raises `JSONDecodeError`. This makes the model
fail to initialize and causes document parsing/embedding to return a 500
error.

## Fix

Wrap `json.loads` in `try/except (json.JSONDecodeError, TypeError)` and
fall back to using the raw string as the `api_key` with the default
`api_version`. This is the same pattern already applied to the Azure
chat model in PR #15604.

## Files changed

- `rag/llm/embedding_model.py` — `AzureEmbed.__init__`
- `rag/llm/cv_model.py` — `AzureGptV4.__init__`

Fixes #15857

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
2026-06-11 16:28:29 +08:00
Idriss Sbaaoui
9871a7e0b6 fix: replicate model provider (#15933)
### What problem does this PR solve?

FIx replicate model provider failing with valid api key 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Wang Qi <wangq8@outlook.com>
2026-06-11 15:08:33 +08:00
Jack
0d3e410826 fix: strip Ollama-style tag suffix from LocalAI model names (#15908)
## Summary

LocalAI exposes two API surfaces with conflicting naming conventions:
- `GET /api/tags` returns model names with `:latest` suffix (Ollama
format)
- `POST /v1/chat/completions` expects names without `:latest` (OpenAI
format)

RAGFlow discovered models via `/api/tags` and stored the tagged name,
then used it with `/v1/chat/completions`, causing a 404 error because
LocalAI didn't recognize `model:latest`.

## Fix

In `LocalAI.get_model_list()`, strip the tag suffix from model names
using `model["name"].rsplit(":", 1)[0]`, so stored names match what the
OpenAI-compatible endpoints expect.
2026-06-10 19:05:05 +08:00
Idriss Sbaaoui
357cb84cd4 Fix: cohere call failing (#15899)
### What problem does this PR solve?

cohere api call failing because of missing prefix

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-10 15:57:10 +08:00
buua436
093eec3105 fix: handle qwen rerank error response (#15881)
### What problem does this PR solve?
Fix QWen rerank error handling so DashScope error responses without a
text attribute do not raise a secondary KeyError and hide the real
provider error.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-10 13:05:24 +08:00
euvre
f97d6396b4 fix: BaiduYiyan API key validation fails in set_api_key (#15828)
### What problem does this PR solve?

When setting the API key for the BaiduYiyan provider, all model
validations fail with the error "Fail to access model using this api
key. No valid response received".

**Root cause:**

1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override
`async_chat_streamly()`. The `verify_api_key()` function uses
`mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits
`Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu
Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible
base_url, validation always fails.

2. `verify_api_key()` in `provider_api_service.py` does not format the
raw API key string into the JSON format (`{"yiyan_ak": "...",
"yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via
`json.loads(key)`.

**Fix:**

1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the
qianfan SDK, consistent with the existing `chat_streamly()` method.
2. Add BaiduYiyan API key formatting in `provider_api_service.py`
`verify_api_key()` to match the format expected by
`BaiduYiyanChat.__init__()`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-06-09 19:05:58 +08:00
Wang Qi
93e4f6bc09 Fix: Add bge as embedding (#15784)
Fix: Add bge as embedding
2026-06-09 09:31:24 +08:00
Lynn
b9f06e6095 Feat: model list (#15774)
### What problem does this PR solve?

Support model list for VolcEngine.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-08 20:18:00 +08:00
Wang Qi
c5d0060e0b Delete not supported model providers list (#15783)
Delete not supported model providers list
2026-06-08 20:06:03 +08:00
Wang Qi
8e4fba6cd2 Fix OpenRouter key JSONDecodeError (#15776)
Fix OpenRouter key JSONDecodeError
2026-06-08 19:19:10 +08:00
euvre
2c64febc93 feat: add ModelMeta implementations for Xinference, LocalAI, BaiduYiyan, and Tencent Cloud (#15752)
### What problem does this PR solve?

This PR adds `ModelMeta` implementations for four additional LLM/RAG
ecosystem platforms, building on the ModelMeta infrastructure introduced
in #15711.

Currently, only `Ollama` and `VolcEngine` have `ModelMeta` classes that
enable remote model list fetching. This PR extends that support to four
more platforms.

### Changes

Added four new `ModelMeta` subclasses in `rag/llm/model_meta.py`:

| Platform | `_FACTORY_NAME` | Has model list | Has full model info |
Approach |

|----------|-----------------|----------------|---------------------|----------|
| **Xinference** | `"Xinference"` |  |  | Parses `model_type` and
`context_length` from `/v1/models` response. Maps 6 model types
(LLM/embedding/rerank/image/TTS/speech2text). |
| **LocalAI** | `"LocalAI"` |  |  | Uses Ollama-compatible `GET
/api/tags` + `POST /api/show` endpoints. Returns capabilities
(completion/embedding/vision/tools/thinking) and
`general.context_length`. |
| **BaiduYiyan** | `"BaiduYiyan"` |  |  | Uses Qianfan SDK static
model catalog + `get_model_info()` for `max_input_tokens`. Returns 60
models (56 chat + 4 embedding) with real context lengths. |
| **Tencent Cloud** | `"Tencent Cloud"` |  |  | `NotImplementedError`
— uses SDK-based SID/SK HMAC signing, no model list REST API available.
|

All classes are automatically discovered and registered via the existing
`__init__.py` mechanism — no additional configuration needed.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-08 19:05:25 +08:00
天海蒼灆
17f27b9df2 fix(browser): show resolved variables in workflow run log input (#15325)
### What problem does this PR solve?

Browser parsed sys.query from prompts but never called set_input_value,
so node_finished inputs displayed null in the agent orchestration run
log.
Additionally, Browser’s tenant-model path could trigger unsupported
structured-output modes (response_format/tool_choice) for some
OpenAI-compatible providers (notably DeepSeek thinking models), causing
step failures.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 18:12:56 +08:00
Wang Qi
4bbd59823a Addd OpenRouter OpenAI API compatible list models (#15764)
Addd OpenRouter OpenAI API compatible list models
1. openrouter
2. OpenAI API compatible
3. VLLM
4. LM Studio

Open Router
<img width="1318" height="1217" alt="image"
src="https://github.com/user-attachments/assets/1d11b1e3-8c72-44fd-bff2-e9502d88d97d"
/>

VLLM
<img width="1433" height="931" alt="image"
src="https://github.com/user-attachments/assets/088801a6-0481-4623-976b-e7e93253ea07"
/>
2026-06-08 16:42:17 +08:00
buua436
6bf7056422 feat: add placeholder model metas (#15753)
### What problem does this PR solve?

add placeholder model metas

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 14:54:59 +08:00
cleanjunc
38f9ea5fec fix(rerank): normalize reranker scores onto a single scale before hybrid blend (#15429)
### What problem does this PR solve?

Closes #15428

The hybrid score in `rag/nlp/search.py` (`rerank_by_model`) blends
reranker similarity with token similarity on a fixed `[0, 1]` scale:

```python
return tkweight * np.array(tksim) + vtweight * vtsim + rank_fea  # tkweight=0.3, vtweight=0.7
```

The reranker implementations did not agree on that scale. Only three of
roughly 17 providers normalized their output, and `NvidiaRerank`
returned raw, unbounded logits. Weighted at `0.7`, a negative logit
could push a genuinely relevant chunk below pure keyword matches, and
its magnitude swamped `tksim`, which lives in `[0, 1]`. The practical
effect was that the same query produced differently scaled scores
depending on the configured reranker, and logit based providers degraded
retrieval quality instead of improving it.

This PR enforces a single scoring contract in one place:

- `Base.similarity` is now the only public entry point. It
short-circuits empty input and guarantees a normalized result. Each
provider implements its raw scoring in `_compute_rank`, which removes
sixteen duplicated empty input guards and the three scattered
normalization calls.
- Normalization is range aware. Providers that already return calibrated
`[0, 1]` relevance scores (Cohere, Jina, Voyage, and others) keep their
absolute magnitudes, so `similarity_threshold` filtering and the
reported `vector_similarity` stay meaningful. Only out-of-range output
such as NVIDIA logits is min-max rescaled into `[0, 1]`.
- The twelve leftover `[DEBUG ...]` prints in `rerank_by_model`,
introduced in #14231, are removed. They ran on every retrieval, added
per chunk overhead, and leaked queries, keywords, and document content
to stdout and logs.

A new regression suite in
`test/unit_test/rag/llm/test_rerank_normalization.py` covers logit
rescaling (positive, negative, and flat batches), preservation of
already calibrated scores, ordering, empty input handling, and the per
provider HTTP path. It also asserts that no provider overrides
`similarity()`, so the contract cannot silently drift.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 11:53:22 +08:00
Lynn
b05d5a5228 Feat: get model list from remote (#15711)
### What problem does this PR solve?

Feat:
- Get model list from remote provider. 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-08 11:02:40 +08:00
Lynn
794c1f4b25 Fix: volc engine and other json key factories (#15653)
### What problem does this PR solve?

Fix:
- VolcEngine adapt to new api_key format
- Save dict api_key as json

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 09:45:44 +08:00
Wang Qi
d41373cfa9 Feature: Add the new anthropic and voyage models (#15516)
add the newanthropic and voyage models. Strip opus 4.7 and 4.8 of
certain usnspported keys

Co-authored-by: Idriss Sbaaoui <112825897+6ba3i@users.noreply.github.com>
2026-06-02 17:29:18 +08:00
Aeovy
600590cd18 Fix: disable thinking to avoid potential infinite loops in Qwen3.5/Qwen3.6 models (#15101)
### What problem does this PR solve?

This PR fixes the issue where Qwen3.5/Qwen3.6 series models may spend
excessive time on simple document-parsing tasks, such as Auto Metadata
extraction, keyword extraction, question generation, and image
description when using the MinerU parser.

For these tasks, Qwen3.5/Qwen3.6 models may perform unnecessary
reasoning by default, which can lead to very long response times, high
token consumption, and, in some cases, potential infinite output loops.

Since Qwen3.5/Qwen3.6 multimodal models are instantiated as `CvModel`
when configured as `image2text`, the existing `enable_thinking=False`
logic in `chat_model.py` does not apply to them. This PR adds the
corresponding handling for the CV/image-to-text model path as well.

This helps reduce unnecessary thinking time, avoid potential infinite
loops, and improve parsing efficiency without noticeably affecting
output quality for these simple extraction and image-description tasks.

Fixes #15083.
2026-06-02 13:21:35 +08:00
nickmopen
bebf6ed244 fix(llm): strip non-generation keys from gen_conf for LiteLLM providers (#15427) (#15432)
### What problem does this PR solve?

Fixes #15427.

All LiteLLM-routed chats fail with:

- Anthropic: `litellm.BadRequestError: AnthropicException -
{"type":"invalid_request_error","message":"model_type: Extra inputs are
not permitted"}`
- OpenAI: `litellm.BadRequestError: OpenAIException - Unknown parameter:
'model_type'`

This is a regression from v0.25.4.

#### Root cause

A chat assistant's `llm_setting` is forwarded to the model as
`gen_conf`. `llm_setting` can legitimately carry RAGFlow-internal
metadata such as `model_type` (the chat REST APIs in
`api/apps/restful_apis/` read it back out of `llm_setting`), so that key
ends up inside `gen_conf`.

`Base._clean_conf` (OpenAI-compatible providers) already **whitelists**
the keys it forwards, so direct-OpenAI providers were unaffected.
`LiteLLMBase._clean_conf` only dropped `max_tokens` and passed
everything else straight through to `litellm.acompletion`, which
forwarded `model_type` to the upstream provider — and Anthropic / OpenAI
reject it. Because both Claude and GPT route through LiteLLM, every chat
broke.

#### Fix

- Extract the allowed-key set into a shared `ALLOWED_GEN_CONF_KEYS`
constant and reuse it in `Base._clean_conf`.
- Apply the same whitelist in `LiteLLMBase._clean_conf`, plus the
LiteLLM-specific reasoning params (`thinking`, `reasoning_effort`,
`extra_body`) that the model-family policies inject for reasoning
models.

This covers all four LiteLLM completion paths (`async_chat`,
`async_chat_streamly`, `async_chat_with_tools`,
`async_chat_streamly_with_tools`), since they all route through
`_clean_conf`.

#### Tests

Adds `test/unit_test/rag/llm/test_clean_conf_whitelist.py` covering both
backends: `model_type` (and other stray keys) are dropped, genuine
generation params and `thinking` survive, `max_tokens` is removed, and
the whitelist invariants hold.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Added test cases
2026-06-02 10:04:11 +08:00
Wang Qi
1a6df01b53 Bug fix: Enhance embeding model to give better error message (#15346)
To resolve https://github.com/infiniflow/ragflow/issues/15343 enhance
the model embedding message to give extact failure message to customer.


# QWen

## Retrieval
<img width="3321" height="1033" alt="image"
src="https://github.com/user-attachments/assets/6b82921a-a3a7-4a33-a383-1cf316398ee2"
/>

## Chat
<img width="2241" height="311" alt="image"
src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716"
/>


# SiliconFlow
## Retrieval
<img width="3321" height="1033" alt="image"
src="https://github.com/user-attachments/assets/ee2cd191-a27d-4729-b53d-2fbdb4e352cd"
/>

## Chat
<img width="1562" height="210" alt="image"
src="https://github.com/user-attachments/assets/10376a8e-a3f4-422f-bc2e-96f2a8a96448"
/>

# Baichuan
## Retrieval
<img width="3321" height="1107" alt="image"
src="https://github.com/user-attachments/assets/dcb5409d-f7fc-4804-b186-5e1ee11e09c4"
/>

## Chat
<img width="2241" height="311" alt="image"
src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716"
/>


# Zhipu
zhipu is good.
2026-06-01 19:18:16 +08:00
呆萌闷油瓶
658ff06ca4 feat: add 4 new models for siliconflow (#15383)
### What problem does this PR solve?

Added 4 new models:
deepseek-ai/DeepSeek-V4-Pro
deepseek-ai/DeepSeek-V4-Flash
Pro/moonshotai/Kimi-K2.6
Pro/zai-org/GLM-5.1

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-29 19:28:29 +08:00
wdeveloper16
4b36801b53 fix: resolve asyncio correctness issues (fire-and-forget tasks, event loop nesting) (#14761)
## Summary

Fixes the confirmed asyncio anti-patterns from #14755. Only the three
verified bugs are addressed; patterns already correctly using
`asyncio.new_event_loop()` in a fresh thread are left untouched.

### Changes

**`api/apps/restful_apis/tenant_api.py` — fire-and-forget
`send_invite_email`**

`asyncio.create_task()` was called without storing the `Task` reference.
CPython's GC can collect an unfinished task, silently cancelling it and
swallowing exceptions. Fixed by storing the task in a module-level
`_background_tasks: set[Task]` with a `done_callback` to discard it on
completion — the standard Python idiom for safe background tasks.

**`api/apps/restful_apis/agent_api.py` — fire-and-forget
`background_run`**

Same root cause in the webhook "Immediately" execution path. Same fix
applied.

**`rag/llm/chat_model.py` (`LocalLLM._stream_response`) —
`asyncio.get_event_loop()` on running loop**

`asyncio.get_event_loop()` returns Quart's running event loop when
called from an async context.
Calling `loop.run_until_complete()` on it raises `RuntimeError`.
Replaced with `asyncio.new_event_loop()` so the generator
uses a dedicated fresh loop, closed in a `finally` block.

## What was NOT changed

- `llm_service._sync_from_async_stream` and
`evaluation_service._sync_from_async_gen`: both already correctly use
`asyncio.new_event_loop()` inside a fresh thread.
- `llm_service._run_coroutine_sync`: only caller is `rag/app/resume.py`
(sync context), so `thread.join()` is correct there.
- `requests` in agent tools: sync methods dispatched through thread
pools; httpx migration is a separate, larger refactor.

## Test plan

- [ ] Invite a team member and confirm the email is sent with no task
warnings in logs.
- [ ] Trigger a webhook agent in "Immediately" mode; confirm canvas
state is persisted after background run.
- [ ] Verify `LocalLLM` (Jina backend) chat and streaming work
end-to-end.

Closes #14755

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-05-25 22:45:40 +08:00
Jonathan Hill
111cdc77b5 fix: guard LLM response against empty choices (fixes #14711) (#14988)
## Summary

Fixes 10 unguarded `response.choices[0]` accesses that cause
`IndexError` or `AttributeError` when the LLM returns an empty `choices`
list — the scenario described in #14711.

- `rag/llm/cv_model.py`
- `rag/llm/chat_model.py`

Each access site is now guarded with:
```python
if not response.choices:
    raise ValueError("LLM returned empty response")
```

## Verification

Detected and verified by [pact](https://github.com/qizwiz/pact) — a
sheaf-cohomological LLM contract checker using Z3 as a local theory
solver.

**pact sheaf-cohomological proof status after fix:**

| File | Ȟ¹ (after) | Z3 |
|------|-----------|-----|
| `rag/llm/cv_model.py` | 0 | UNSAT ✓ |
| `rag/llm/chat_model.py` | 0 | UNSAT ✓ |

All access sites proven safe (Z3 UNSAT certificate).

The checker was also used to verify the autogen streaming-None fix in
[microsoft/autogen#7711](https://github.com/microsoft/autogen/pull/7711).

## Test plan
- [ ] Existing test suite passes
- [ ] Manually test with a provider that returns empty `choices` under
load (e.g. Vertex AI)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Jonathan Hill <jonathan.f.hill@gmail.com>
2026-05-21 15:37:19 +08:00
Kevin Hu
e7544562cc Feat: @tool decorator for chat-model tool registration (#15047)
## Summary

- Adds a lightweight `@tool` decorator and `FunctionToolSession` adapter
in `rag/llm/tool_decorator.py` that let callers register plain Python
functions as LLM tools without hand-writing OpenAI function schemas or
building an MCP-style session.
- Refactors `Base.bind_tools` and `LiteLLMBase.bind_tools` in
`rag/llm/chat_model.py` to accept either the new decorator form
`bind_tools(tools=[fn1, fn2])` or the existing `(toolcall_session,
tools_schemas)` form, so existing agent/dialog call-sites in
`agent/component/agent_with_tools.py`, `api/db/services/llm_service.py`,
and `api/db/services/dialog_service.py` are unaffected.
- Adds 8 unit tests in `test/unit_test/rag/llm/test_tool_decorator.py`
covering schema shape, required/optional inference, sync + async
dispatch, and bad-input rejection.

## Usage

```python
from rag.llm.tool_decorator import tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city.

    :param city: City name to look up.
    """
    return f"{city}: 21 C, partly cloudy"

chat_mdl.bind_tools(tools=[get_weather])
ans, tk = await chat_mdl.async_chat_with_tools(system, history)
```

The decorator introspects `inspect.signature` + type hints + the
docstring (`:param name:` style) and attaches an OpenAI-format
`openai_schema` to the callable. `FunctionToolSession` duck-types the
existing `ToolCallSession` protocol, dispatching async callables
directly and sync ones through `thread_pool_exec` so the event loop is
never blocked.

## Design notes

- `tool_decorator.py` deliberately does **not** live inside
`rag/llm/__init__.py` to avoid forcing every consumer through the heavy
provider auto-discovery loop and to sidestep a circular import
(`__init__.py` imports `chat_model`, which would otherwise need symbols
from `__init__.py`).
- `FunctionToolSession` is duck-typed against
`common.mcp_tool_call_conn.ToolCallSession` rather than explicitly
inheriting from it, so importing the decorator doesn't pull the MCP
client SDK into the import graph.
- Docstring parsing is intentionally minimal (`:param name:` only) to
keep this dependency-free; Google/NumPy styles can be added later via
`docstring_parser` if needed.

## Test plan

- [x] `python -m pytest test/unit_test/rag/llm/test_tool_decorator.py
-v` — 8 passed
- [x] `python -m pytest test/unit_test/rag/llm/
--ignore=test/unit_test/rag/llm/test_perplexity_embed.py` — 11 passed
(the ignored test has a pre-existing `numpy` import that's unrelated)
- [ ] Reviewer: smoke-test the new path end-to-end with a live model via
`chat_mdl.bind_tools(tools=[my_fn])` to confirm the OpenAI-format
schemas pass through unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 15:32:17 +08:00
Magicbook1108
b28e134944 Feat: add local & ssh provider in admin panel (#15039)
### What problem does this PR solve?

Feat: add local & ssh provider in admin panel

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-20 16:56:20 +08:00
wdeveloper16
14c0985182 feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767)
Closes #14753

## What changed

| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove
`strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix
legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` →
`from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |

`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.

## Why uv.lock is not regenerated

`uv lock --python 3.13` fails because:

1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13

The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.

## RFC corrections

Two claims in the original RFC (#14753) did not hold up under code
review:

- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
2026-05-15 14:40:53 +08:00
sham-sr
ef2969a462 fix(llm): Tongyi-Qianwen embeddings use correct DashScope native API for intl URLs (#14784)
## Summary
- Fixes **Tongyi-Qianwen** (`QWenEmbed`) text embeddings when the
configured `base_url` points at DashScope **international**
(`dashscope-intl.aliyuncs.com`) or **China** (`dashscope.aliyuncs.com`)
hosts, including values copied from Model Studio that use the
**OpenAI-compatible** path (`.../compatible-mode/v1`).
- The `dashscope` Python SDK (`TextEmbedding.call`) expects the
**native** HTTP root (`https://<host>/api/v1`), not the
OpenAI-compatible base URL. Without mapping, international accounts
could hit the wrong host or path.

## Implementation
- Added `_dashscope_native_http_api_url()` to normalize known DashScope
hosts to `.../api/v1`, and wired `QWenEmbed` to set
`dashscope.base_http_api_url` before each embedding call (document and
query).

## Notes
- In-code comments document the Tongyi-Qianwen / DashScope intl vs CN
behavior for future maintainers.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 10:07:48 +08:00
07heco
8dc5b1b42d fix: optimize reranking module robustness and bug fixes (#14264)
## Description
This PR fixes critical bugs and improves the robustness of the RAG
reranking module while maintaining **100% backward compatibility** with
all existing functionality and providers.

## Key Changes
1. **Network Stability**: Added 30s timeout to all API requests to
prevent service blocking
2. **Boundary Protection**: Added empty query/text validation for all
rerank models
3. **Response Fault Tolerance**: Replaced hardcoded key access with
`.get()` to avoid KeyError crashes
4. **Bug Fixes**:
   - Fixed `Ai302Rerank` (completely non-functional before)
   - Fixed `GPUStackRerank` incorrect exception catching
   - Fixed `_normalize_rank` empty array crash
5. **Code Specification**: Added type annotations, standardized
unimplemented class prompts

## Compatibility
-  No changes to any class/method names
-  All rerank providers (Jina/Cohere/NVIDIA/HuggingFace etc.) work as
before
-  No breaking changes, zero impact on existing workflows

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-14 11:56:09 +08:00
07heco
e46989832e fix: complete robustness fixes for rerank module addressing all review comments (#14265)
## Summary
This PR fully addresses all CodeRabbit review feedback and enhances the
robustness of the reranking module with 100% backward compatibility.

## Key Fixes
1. Fixed JinaRerank hardcoded base_url to support subclass endpoint
overrides
2. Corrected GPUStackRerank exception handling to use proper requests
exceptions and preserve stack traces
3. Added 30s timeout to all API calls to prevent service hanging
4. Added empty input validation for all rerank providers
5. Replaced direct dict key access with .get() to eliminate KeyError
crashes
6. Fixed _normalize_rank edge case for empty arrays
7. Implemented missing functionality for Ai302Rerank
8. Standardized type hints and fixed typo issues

## Compatibility
- No breaking changes to any existing functionality
- All rerank providers work as originally intended
- Fully compatible with existing configurations and workflows

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 12:40:41 +08:00
Ricardo-M-L
13922209e6 fix(llm): add timeout to HTTP requests in LLM integration layer (#14313)
### What problem does this PR solve?

Multiple `requests.post()` calls across the LLM integration layer lack a
`timeout` parameter. Without a timeout, a single unresponsive upstream
service can block the calling thread **indefinitely**, eventually
exhausting the thread pool and degrading the entire system.

This is a well-known issue — Python's `requests` library defaults to
`timeout=None` (infinite wait), and [the library docs explicitly
recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts)
always setting a timeout.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Change

Added `timeout` to all `requests.post()` calls missing it:

| File | Calls fixed | Timeout |
|------|-------------|---------|
| `rag/llm/rerank_model.py` | 9 | 30s |
| `rag/llm/embedding_model.py` | 8 | 30s |
| `rag/llm/cv_model.py` | 3 | 60s |
| `rag/llm/tts_model.py` | 2 | 60s |
| `rag/llm/sequence2txt_model.py` | 2 | 60s |

Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and
audio transcription use 60s (heavier workloads with file uploads).

Note: other files in the codebase (e.g. `check_minio_alive`,
`check_ragflow_server_alive`) already use `timeout=10`, so this PR
brings the LLM layer in line with existing practice.

Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 11:19:07 +08:00
VincentLambert
08bb53bbb1 Feat: add BedrockCV for vision/image2text inference via LiteLLM (#14705)
## Summary

- `CvModel["Bedrock"]` was absent from `rag/llm/cv_model.py`, causing
`model_instance()` to return `None` when a Bedrock model was used as a
PDF parser — even after correct model resolution.
- This PR adds `BedrockCV`, enabling Bedrock vision models (e.g.
`amazon.nova-pro-v1:0`, `anthropic.claude-3-5-sonnet`) to be used as PDF
parsers.

## What problem does this PR solve?

When a Bedrock model is selected as the PDF parser in a knowledge base,
ingestion failed with:

```
'LiteLLMBase' object has no attribute 'describe_with_prompt'
```

The root cause: `LiteLLMBase` (the Bedrock chat implementation) was the
only registered handler for the Bedrock factory. It does not implement
`describe_with_prompt`. `CvModel` had no Bedrock entry, so
`model_instance()` returned `None` for `image2text` requests.

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Changes

**`rag/llm/cv_model.py`**

Adds `BedrockCV(Base)` with `_FACTORY_NAME = "Bedrock"`:

- Uses `litellm.completion` with the `bedrock/` prefix (consistent with
`LiteLLMBase`)
- Parses AWS credentials from the JSON key assembled by `add_llm`
(`auth_mode`, `bedrock_ak`, `bedrock_sk`, `bedrock_region`,
`aws_role_arn`)
- Supports three auth modes: `access_key_secret`, `iam_role` (via STS
`assume_role`), and default credential chain (IRSA, instance profile)
- Implements `describe_with_prompt` and `describe`

## Test plan

- [ ] Configure a Bedrock vision model (e.g. `amazon.nova-pro-v1:0`)
with valid AWS credentials
- [ ] Select it as PDF parser in a knowledge base
- [ ] Verify ingestion of a PDF document completes without errors
- [ ] Verify `CvModel["Bedrock"]` resolves to `BedrockCV`

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 10:29:58 +08:00
Igor Ilinskii
889aba6a32 fix base_url handling in HuggingfaceRerank (#14555)
### What problem does this PR solve?

HuggingfaceRerank.post() unconditionally prepends `http://` to base_url,
which already contains a protocol. This creates invalid URLs like
http://http://127.0.0.1:8080/rerank, breaking all requests. The fix
normalizes URL handling to match the rest of the codebase, removing
redunant `http://`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Related Issues
- #7318 
- #7796

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-05-11 10:04:40 +08:00
Ricardo-M-L
1046042e01 fix(llm): replace mutable default gen_conf={} with None + defensive copy (#14566)
### What

19 methods across `rag/llm/chat_model.py` and `rag/llm/cv_model.py`
declare `gen_conf={}` (or `gen_conf: dict = {}`) as a parameter default
and then mutate `gen_conf` in place — typically `del
gen_conf["max_tokens"]`, `gen_conf["penalty_score"] = ...`, or
`gen_conf.pop(...)` as part of provider-specific normalization.

### The two bugs in this pattern

**1. Mutable default argument (Python footgun).** Python evaluates
default values **once** at function-definition time, so the single `{}`
dict is *shared* across every caller that doesn't pass `gen_conf`. The
first such call's mutations leak into the default seen by every
subsequent call.

```python
# Before
def chat_streamly(self, system, history, gen_conf={}, **kwargs):
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]   # mutates the SHARED default dict
    ...
```

After call N with `max_tokens` set, call N+1 that omits `gen_conf` no
longer sees `max_tokens` — even though the caller never touched it.

**2. Caller-dict pollution.** When the caller *does* pass a `gen_conf`
dict, the same in-place mutations modify the caller's dict. A reused
`gen_conf` (very common for chat-loop callers that build the config once
and pass it on every turn) silently loses `max_tokens`,
`presence_penalty`, etc. after the first round.

### The fix

In every affected method:

- Change `gen_conf={}` (or `gen_conf: dict = {}`) → `gen_conf=None`.
- Add `gen_conf = dict(gen_conf or {})` as the first statement of the
body so all subsequent mutations operate on a fresh local copy.

```python
# After
def chat_streamly(self, system, history, gen_conf=None, **kwargs):
    gen_conf = dict(gen_conf or {})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]   # local copy — safe
    ...
```

This is byte-for-byte identical provider-side behavior for callers that
already pass a fresh `gen_conf` per call. The new `dict(...)` copy is
O(small constant) per call.

### Files changed

- `rag/llm/chat_model.py` — 17 methods
- `rag/llm/cv_model.py` — 2 methods

### Tests

Adds `test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py` — an
`ast`-based regression guard that walks both modules and asserts no
parameter named `gen_conf` ever has a mutable literal (`{}` or `[]`) as
its default. The test caught **five additional `gen_conf: dict = {}`
sites** that an initial `gen_conf={}` text grep had missed (annotated
parameters with whitespace), and would fail again if the pattern is ever
reintroduced.

```
$ pytest test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py -v
============================== 3 passed in 0.04s ===============================
```

`ruff check` passes on all touched files.

### Notes

- This PR is intentionally focused on **just** the `gen_conf` default +
copy fix. There's a related (but separate) `history.insert(0, ...)`
pattern in the same files that mutates the caller's history list in 12
places — left for a follow-up so this PR stays mechanical and easy to
review.

### Latest revision (`700bb54a7`) — addresses CodeRabbit review

- Type annotation: `gen_conf: dict = None` → `gen_conf: dict | None =
None` (5 occurrences in `chat_model.py`). The old annotation was a
static-checker mismatch since `None` isn't a `dict`.
- Regression test: the AST check accessed `default.keys` directly.
`ast.List` has no `.keys` attribute — a future `gen_conf=[]` would crash
with `AttributeError` instead of being caught. Use `getattr` for both
`.keys` (Dict) and `.elts` (List). Manually verified the updated check
correctly catches both `gen_conf={}` and `gen_conf=[]` while ignoring
`gen_conf=None` and non-empty literals.

---------

Co-authored-by: Ricardo <ricardo@example.com>
2026-05-09 13:11:44 +08:00
Zhichang Yu
86fe78c73f feat(llm): add MiniMax GroupId header support (#14610)
## Summary
- Add MiniMax provider GroupId query parameter support in `LiteLLMBase`
- Extract `group_id` from key configuration in `__init__`
- Append `GroupId` as query parameter to `api_base` in
`_construct_complete_args`

## Why this change is needed

MiniMax provides an OpenAI-compatible API endpoint
(`/v1/chat/completions`), but `GroupId` is a MiniMax-specific account
identifier required for billing and rate limiting - it is not part of
the OpenAI standard.

Looking at LiteLLM's `MinimaxChatConfig`:
- `get_complete_url()` only constructs the base URL (e.g.,
`https://api.minimaxi.com/v1/chat/completions`)
- LiteLLM does **not** automatically inject `GroupId` into requests
- This must be handled by the caller (ragflow's chat_model.py)

The implementation appends `GroupId` as a query parameter to `api_base`:
```python
api_base = completion_args.get("api_base", self.base_url)
separator = "&" if "?" in api_base else "?"
completion_args["api_base"] = f"{api_base}{separator}GroupId={self.group_id}"
```

This matches MiniMax's official API format (as documented by
LlamaFactory):
```bash
curl --location 'https://api.minimaxi.chat/v1/text/chatcompletion?GroupId=你的GroupId' \
  --header 'Authorization: Bearer 你的API_Key'
```

## Test plan
- [ ] Verify MiniMax API calls work with GroupId query parameter
- [ ] Verify backward compatibility for other providers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:54:49 +08:00
euvre
8269fa01b4 Fix AttributeError when appending non-streaming tool calls to chat history in Agentic Agent (#14456)
### What problem does this PR solve?

Fix #14340 

## Problem Description

When using an **Agentic Agent** (not Workflow) with one or more
Retrieval tools (e.g., Dataset Retrieval + Memory Retrieval), the agent
silently returns an empty response (`agent_response: ""`) after hanging
for several minutes. The server logs show:

```
AttributeError: 'ChatCompletionMessageToolCall' object has no attribute 'index'
```

This error propagates as a `GENERIC_ERROR`, causing the canvas to return
an empty response. The subsequent Memory save task then receives the
empty `agent_response` and logs:

```
Document for referred_document_id XXXX not found
```

## Reproduction Steps

1. Set `DOC_ENGINE=infinity` (or `elasticsearch` — the engine itself is
not the root cause).
2. Create a blank **Agentic Agent** (not a Workflow).
3. Add **two Retrieval tools** to the Agent node:
   - `Retrieval_DS` → Dataset (Knowledge Base)
   - `Retrieval_Mem` → Memory component
4. Add a **Message** node with **Save to Memory** enabled.
5. Launch the agent and send any message (e.g., "hola").
6. The agent hangs and returns an empty response.

## Root Cause Analysis

The crash occurs in `_append_history` and `_append_history_batch` inside
`rag/llm/chat_model.py`. These methods directly access `.index` on tool
call objects:

```python
# _append_history_batch
{
    "index": tc.index,   # <-- crashes here
    ...
}
```

However, **non-streaming** LLM responses (`stream=False`) return
`ChatCompletionMessageToolCall` objects, which **do not have an `index`
field** according to the OpenAI API specification. The `index` field
only exists on `ChoiceDeltaToolCall` objects returned in **streaming**
responses (`stream=True`).

When the agentic agent triggers an internal `full_question` call (used
to compress multi-turn conversation history), the request is incorrectly
routed through `async_chat_with_tools` because `is_tools=True` is set at
the `LLMBundle` level. If the LLM decides to emit `tool_calls` during
this auxiliary request, the code enters the non-streaming tool loop and
crashes when trying to append history.

## Fix

Replaced all direct `.index` accesses with `getattr(..., "index", None)`
for safe, backward-compatible access:

| Method | File | Line | Change |
|--------|------|------|--------|
| `_append_history` | `rag/llm/chat_model.py` | ~L304 |
`tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L332 | `tc.index`
→ `getattr(tc, "index", None)` |
| `_append_history` | `rag/llm/chat_model.py` | ~L1467 |
`tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L1496 |
`tc.index` → `getattr(tc, "index", None)` |

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-05-06 14:39:40 +08:00
sapienza yoan
811e9826d0 perf: avoid O(n²) array growth in embedding accumulation (#14369)
### What problem does this PR solve?

Both tokenizer (`rag/flow/tokenizer/tokenizer.py`) and
`BuiltinEmbed.encode`
(`rag/llm/embedding_model.py`) currently accumulate embedding batches
via
`np.concatenate` inside the per-batch loop. `np.concatenate` allocates a
new
array and copies all existing data on every call, so accumulating N
batches
is O(N²) in both time and peak memory.

Replacing the incremental concatenate with a list-of-batches + a single
`np.vstack` at the end gives O(N) total work.

For tokenizer the title-vector broadcast `np.concatenate([vts[0]] * N)`
is
also replaced by `np.tile`, which does the same job with a single
contiguous
allocation instead of building a Python list of references.

This is purely a CPU/memory optimisation — output shape and dtype are
unchanged. Measured impact grows with document size:
  -   1k chunks (batch 512, 2 iters):    ~negligible
  -  10k chunks (20 iters):              ~10× speedup on this stage
  - 100k chunks (195 iters):             ~100× speedup, and peak RAM
drops from O(N) extra to near-zero

### Type of change

- [x] Performance Improvement

Co-authored-by: yoan sapienza <Yoan Sapienza yoan.sapienza@orange.fr Yoan Sapienza zappy@macbookpro.home>
2026-04-30 11:00:10 +08:00
FuturMix
2548c28d65 feat: add FuturMix as model provider (#14419)
## Summary

Add [FuturMix](https://futurmix.ai) as a new model provider. FuturMix is
an OpenAI-compatible unified AI gateway that provides access to 22+
models (GPT, Claude, Gemini, DeepSeek, and more) through a single API
endpoint and key.

- **API Base**: `https://futurmix.ai/v1` (OpenAI-compatible)
- **Supported capabilities**: Chat, Embedding, Image2Text, TTS,
Speech2Text, Rerank

### Changes

| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Add `FuturMix` to `SupportedLiteLLMProvider`
enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `FuturMixChat(Base)` — follows
Astraflow/Avian pattern |
| `rag/llm/embedding_model.py` | Add `FuturMixEmbed(OpenAIEmbed)` —
follows Astraflow pattern |
| `rag/llm/cv_model.py` | Add `FuturMixCV(GptV4)` — follows
SILICONFLOW/OpenRouter pattern |
| `rag/llm/tts_model.py` | Add `FuturMixTTS(OpenAITTS)` — follows
CometAPI/DeerAPI pattern |
| `rag/llm/sequence2txt_model.py` | Add `FuturMixSeq2txt(GPTSeq2txt)` —
follows StepFun pattern |
| `rag/llm/rerank_model.py` | Add `FuturMixRerank(OpenAI_APIRerank)` |
| `conf/llm_factories.json` | Add factory config with 8 chat, 2
embedding, 1 image2text, 2 TTS, 1 speech2text models |
| `docs/guides/models/supported_models.mdx` | Add FuturMix to supported
models table |

### Models included

- **Chat**: claude-sonnet-4-20250514, claude-3.5-haiku, gpt-4o,
gpt-4o-mini, gemini-2.5-flash, gemini-2.0-flash, deepseek-chat,
deepseek-reasoner
- **Embedding**: text-embedding-3-small, text-embedding-3-large
- **Image2Text**: gpt-4o
- **TTS**: tts-1, tts-1-hd
- **Speech2Text**: whisper-1

## Test plan

- [ ] Verify FuturMix appears in the model provider list in RAGFlow UI
- [ ] Configure FuturMix with API key and test chat completion
- [ ] Test embedding model with document indexing
- [ ] Test image2text with a sample image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-30 10:59:37 +08:00
Stephen Hu
345bec812d refactor: improve QwenRerank logic (#14388)
### What problem does this PR solve?

improve QwenRerank logic

### Type of change

- [x] Refactoring
2026-04-28 20:17:34 +08:00
buua436
e6e80041f5 Fix: agent toolcall null response & schema validation & DeepSeek think history (#14425)
### What problem does this PR solve?
agent toolcall null response & schema validation & DeepSeek think
history

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 17:09:08 +08:00
wdeveloper16
78188ce9e9 Feat: add OpenDataLoader PDF parser backend (#14058) (#14097)
### What problem does this PR solve?

Closes #14058.

RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU,
Docling, TCADP, PaddleOCR). This PR adds **OpenDataLoader**
([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf))
as a new optional backend, giving users a deterministic, local-first
alternative with competitive table extraction accuracy.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---

### Changes

#### Backend
- `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser`
class inheriting `RAGFlowPdfParser`. Implements `check_installation()`
(guards Python package + Java 11+ runtime), `parse_pdf()` with
JSON-first extraction (heading/paragraph/table/list/image/formula) and
Markdown fallback, position-tag generation compatible with the shared
`@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup.
- `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in
`PARSERS` dict, added to `chunk_token_num=0` override list.
- `rag/flow/parser/parser.py` — `"opendataloader"` branch in the
pipeline PDF handler + check validation list.

#### Infrastructure
- `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in
via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH.

#### Frontend
- `web/src/components/layout-recognize-form-field.tsx` —
`OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown.
Cascades automatically to the pipeline editor's Parser component.

#### Docs
- `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader
entry and full env-var reference.

---

### Environment variables

| Variable | Default | Description |
|---|---|---|
| `USE_OPENDATALOADER` | `false` | Set `true` to install
`opendataloader-pdf` on container startup |
| `OPENDATALOADER_VERSION` | latest | Pin the PyPI release (e.g.
`==2.2.1`) |
| `OPENDATALOADER_HYBRID` | _(unset)_ | Enable hybrid AI mode (e.g.
`docling-fast`) |
| `OPENDATALOADER_IMAGE_OUTPUT` | _(unset)_ | `off` / `embedded` /
`external` |
| `OPENDATALOADER_OUTPUT_DIR` | _(tmp)_ | Persistent output dir; temp
dir used + cleaned if unset |
| `OPENDATALOADER_DELETE_OUTPUT` | `1` | `0` to retain intermediate
files for debugging |
| `OPENDATALOADER_SANITIZE` | _(unset)_ | `1` to filter prompt-injection
patterns from output |

---

### Dependencies

- **Runtime**: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not
added to `pyproject.toml` core deps. Installed by
`ensure_opendataloader()` at container startup when
`USE_OPENDATALOADER=true`.
- **System**: Java 11+ on PATH (JVM is the underlying engine). The
installer skips with a warning if `java` is not found.

---

### How to test

**Standalone parser:**
```bash
source .venv/bin/activate
uv pip install opendataloader-pdf
python3 -c "
import sys; sys.path.insert(0, '.')
from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser
p = OpenDataLoaderParser()
print('available:', p.check_installation())
s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline')
print(f'sections={len(s)} tables={len(t)}')
"

```
### Benchmark vs Docling
```
file                      parser            secs  sections  tables
----------------------------------------------------------------------
text-heavy.pdf            docling           45.29       148      10
text-heavy.pdf            opendataloader     3.14       559       0
table-heavy.pdf           docling           7.05        76       3
table-heavy.pdf           opendataloader     3.71        90       0
complex.pdf               docling            42.67       114       8
complex.pdf               opendataloader     3.51       180       0
```
2026-04-25 00:33:02 +08:00
qinling0210
1473000135 Implement retrieval_test in GO (#14231)
### What problem does this PR solve?

Implement retrieval_test in GO

### Type of change

- [x] Refactoring
2026-04-24 15:30:14 +08:00
ucloudnb666
f853a39b40 feat: Add Astraflow provider support (global + China endpoints) (#14270)
## Add Astraflow Provider Support

This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud
/ 优刻得) as a new AI model provider in RAGFlow, with support for both
global and China endpoints.

### About Astraflow
Astraflow is an OpenAI-compatible AI model aggregation platform
supporting 200+ models from major providers including DeepSeek, Qwen,
GPT, Claude, Gemini, Llama, Mistral, and more.

| Variant | Factory Name | Endpoint | Env Var |
|---------|-------------|----------|---------|
| Global | `Astraflow` | `https://api-us-ca.umodelverse.ai/v1` |
`ASTRAFLOW_API_KEY` |
| China | `Astraflow-CN` | `https://api.modelverse.cn/v1` |
`ASTRAFLOW_CN_API_KEY` |

- **API key signup**: https://astraflow.ucloud.cn/

---

### Files Changed

| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Register `Astraflow` and `Astraflow-CN` in
`SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and
`LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `AstraflowChat` and `AstraflowCNChat`
(OpenAI-compatible `Base` subclass) |
| `rag/llm/embedding_model.py` | Add `AstraflowEmbed` and
`AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) |
| `rag/llm/rerank_model.py` | Add `AstraflowRerank` and
`AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) |
| `rag/llm/cv_model.py` | Add `AstraflowCV` and `AstraflowCNCV`
(subclasses of `GptV4`) |
| `rag/llm/tts_model.py` | Add `AstraflowTTS` and `AstraflowCNTTS`
(subclasses of `OpenAITTS`) |
| `rag/llm/sequence2txt_model.py` | Add `AstraflowSeq2txt` and
`AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) |
| `conf/llm_factories.json` | Register `Astraflow` and `Astraflow-CN`
factories with a curated list of popular models |

---

### Supported Model Types
-  **Chat / LLM** — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7,
Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more
-  **Text Embedding** — text-embedding-3-small/large
-  **Image / Vision (IMAGE2TEXT)** — GPT-4o, GPT-4.1, Claude, Gemini,
Llama-4, etc.
-  **Text Re-Rank**
-  **TTS** — tts-1
-  **Speech-to-Text (SPEECH2TEXT)** — whisper-1

### Implementation Notes
- Uses the `openai/` LiteLLM prefix — consistent with other
OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI,
OpenRouter, n1n, Avian, etc.)
- `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249)
are separate factory entries, allowing users to choose the optimal
endpoint based on their region.
- All model classes cleanly subclass existing base classes (`Base`,
`OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`)
with no custom logic needed — the provider is fully OpenAI-compatible.

---------

Co-authored-by: user <user@xzaaaMacBook-Air.local>
2026-04-22 15:38:34 +08:00