Commit Graph

509 Commits

Author SHA1 Message Date
buua436
6bf7056422 feat: add placeholder model metas (#15753)
### What problem does this PR solve?

add placeholder model metas

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 14:54:59 +08:00
cleanjunc
38f9ea5fec fix(rerank): normalize reranker scores onto a single scale before hybrid blend (#15429)
### What problem does this PR solve?

Closes #15428

The hybrid score in `rag/nlp/search.py` (`rerank_by_model`) blends
reranker similarity with token similarity on a fixed `[0, 1]` scale:

```python
return tkweight * np.array(tksim) + vtweight * vtsim + rank_fea  # tkweight=0.3, vtweight=0.7
```

The reranker implementations did not agree on that scale. Only three of
roughly 17 providers normalized their output, and `NvidiaRerank`
returned raw, unbounded logits. Weighted at `0.7`, a negative logit
could push a genuinely relevant chunk below pure keyword matches, and
its magnitude swamped `tksim`, which lives in `[0, 1]`. The practical
effect was that the same query produced differently scaled scores
depending on the configured reranker, and logit based providers degraded
retrieval quality instead of improving it.

This PR enforces a single scoring contract in one place:

- `Base.similarity` is now the only public entry point. It
short-circuits empty input and guarantees a normalized result. Each
provider implements its raw scoring in `_compute_rank`, which removes
sixteen duplicated empty input guards and the three scattered
normalization calls.
- Normalization is range aware. Providers that already return calibrated
`[0, 1]` relevance scores (Cohere, Jina, Voyage, and others) keep their
absolute magnitudes, so `similarity_threshold` filtering and the
reported `vector_similarity` stay meaningful. Only out-of-range output
such as NVIDIA logits is min-max rescaled into `[0, 1]`.
- The twelve leftover `[DEBUG ...]` prints in `rerank_by_model`,
introduced in #14231, are removed. They ran on every retrieval, added
per chunk overhead, and leaked queries, keywords, and document content
to stdout and logs.

A new regression suite in
`test/unit_test/rag/llm/test_rerank_normalization.py` covers logit
rescaling (positive, negative, and flat batches), preservation of
already calibrated scores, ordering, empty input handling, and the per
provider HTTP path. It also asserts that no provider overrides
`similarity()`, so the contract cannot silently drift.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 11:53:22 +08:00
Lynn
b05d5a5228 Feat: get model list from remote (#15711)
### What problem does this PR solve?

Feat:
- Get model list from remote provider. 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-08 11:02:40 +08:00
Lynn
794c1f4b25 Fix: volc engine and other json key factories (#15653)
### What problem does this PR solve?

Fix:
- VolcEngine adapt to new api_key format
- Save dict api_key as json

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 09:45:44 +08:00
Wang Qi
d41373cfa9 Feature: Add the new anthropic and voyage models (#15516)
add the newanthropic and voyage models. Strip opus 4.7 and 4.8 of
certain usnspported keys

Co-authored-by: Idriss Sbaaoui <112825897+6ba3i@users.noreply.github.com>
2026-06-02 17:29:18 +08:00
Aeovy
600590cd18 Fix: disable thinking to avoid potential infinite loops in Qwen3.5/Qwen3.6 models (#15101)
### What problem does this PR solve?

This PR fixes the issue where Qwen3.5/Qwen3.6 series models may spend
excessive time on simple document-parsing tasks, such as Auto Metadata
extraction, keyword extraction, question generation, and image
description when using the MinerU parser.

For these tasks, Qwen3.5/Qwen3.6 models may perform unnecessary
reasoning by default, which can lead to very long response times, high
token consumption, and, in some cases, potential infinite output loops.

Since Qwen3.5/Qwen3.6 multimodal models are instantiated as `CvModel`
when configured as `image2text`, the existing `enable_thinking=False`
logic in `chat_model.py` does not apply to them. This PR adds the
corresponding handling for the CV/image-to-text model path as well.

This helps reduce unnecessary thinking time, avoid potential infinite
loops, and improve parsing efficiency without noticeably affecting
output quality for these simple extraction and image-description tasks.

Fixes #15083.
2026-06-02 13:21:35 +08:00
nickmopen
bebf6ed244 fix(llm): strip non-generation keys from gen_conf for LiteLLM providers (#15427) (#15432)
### What problem does this PR solve?

Fixes #15427.

All LiteLLM-routed chats fail with:

- Anthropic: `litellm.BadRequestError: AnthropicException -
{"type":"invalid_request_error","message":"model_type: Extra inputs are
not permitted"}`
- OpenAI: `litellm.BadRequestError: OpenAIException - Unknown parameter:
'model_type'`

This is a regression from v0.25.4.

#### Root cause

A chat assistant's `llm_setting` is forwarded to the model as
`gen_conf`. `llm_setting` can legitimately carry RAGFlow-internal
metadata such as `model_type` (the chat REST APIs in
`api/apps/restful_apis/` read it back out of `llm_setting`), so that key
ends up inside `gen_conf`.

`Base._clean_conf` (OpenAI-compatible providers) already **whitelists**
the keys it forwards, so direct-OpenAI providers were unaffected.
`LiteLLMBase._clean_conf` only dropped `max_tokens` and passed
everything else straight through to `litellm.acompletion`, which
forwarded `model_type` to the upstream provider — and Anthropic / OpenAI
reject it. Because both Claude and GPT route through LiteLLM, every chat
broke.

#### Fix

- Extract the allowed-key set into a shared `ALLOWED_GEN_CONF_KEYS`
constant and reuse it in `Base._clean_conf`.
- Apply the same whitelist in `LiteLLMBase._clean_conf`, plus the
LiteLLM-specific reasoning params (`thinking`, `reasoning_effort`,
`extra_body`) that the model-family policies inject for reasoning
models.

This covers all four LiteLLM completion paths (`async_chat`,
`async_chat_streamly`, `async_chat_with_tools`,
`async_chat_streamly_with_tools`), since they all route through
`_clean_conf`.

#### Tests

Adds `test/unit_test/rag/llm/test_clean_conf_whitelist.py` covering both
backends: `model_type` (and other stray keys) are dropped, genuine
generation params and `thinking` survive, `max_tokens` is removed, and
the whitelist invariants hold.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Added test cases
2026-06-02 10:04:11 +08:00
Wang Qi
1a6df01b53 Bug fix: Enhance embeding model to give better error message (#15346)
To resolve https://github.com/infiniflow/ragflow/issues/15343 enhance
the model embedding message to give extact failure message to customer.


# QWen

## Retrieval
<img width="3321" height="1033" alt="image"
src="https://github.com/user-attachments/assets/6b82921a-a3a7-4a33-a383-1cf316398ee2"
/>

## Chat
<img width="2241" height="311" alt="image"
src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716"
/>


# SiliconFlow
## Retrieval
<img width="3321" height="1033" alt="image"
src="https://github.com/user-attachments/assets/ee2cd191-a27d-4729-b53d-2fbdb4e352cd"
/>

## Chat
<img width="1562" height="210" alt="image"
src="https://github.com/user-attachments/assets/10376a8e-a3f4-422f-bc2e-96f2a8a96448"
/>

# Baichuan
## Retrieval
<img width="3321" height="1107" alt="image"
src="https://github.com/user-attachments/assets/dcb5409d-f7fc-4804-b186-5e1ee11e09c4"
/>

## Chat
<img width="2241" height="311" alt="image"
src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716"
/>


# Zhipu
zhipu is good.
2026-06-01 19:18:16 +08:00
呆萌闷油瓶
658ff06ca4 feat: add 4 new models for siliconflow (#15383)
### What problem does this PR solve?

Added 4 new models:
deepseek-ai/DeepSeek-V4-Pro
deepseek-ai/DeepSeek-V4-Flash
Pro/moonshotai/Kimi-K2.6
Pro/zai-org/GLM-5.1

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-29 19:28:29 +08:00
wdeveloper16
4b36801b53 fix: resolve asyncio correctness issues (fire-and-forget tasks, event loop nesting) (#14761)
## Summary

Fixes the confirmed asyncio anti-patterns from #14755. Only the three
verified bugs are addressed; patterns already correctly using
`asyncio.new_event_loop()` in a fresh thread are left untouched.

### Changes

**`api/apps/restful_apis/tenant_api.py` — fire-and-forget
`send_invite_email`**

`asyncio.create_task()` was called without storing the `Task` reference.
CPython's GC can collect an unfinished task, silently cancelling it and
swallowing exceptions. Fixed by storing the task in a module-level
`_background_tasks: set[Task]` with a `done_callback` to discard it on
completion — the standard Python idiom for safe background tasks.

**`api/apps/restful_apis/agent_api.py` — fire-and-forget
`background_run`**

Same root cause in the webhook "Immediately" execution path. Same fix
applied.

**`rag/llm/chat_model.py` (`LocalLLM._stream_response`) —
`asyncio.get_event_loop()` on running loop**

`asyncio.get_event_loop()` returns Quart's running event loop when
called from an async context.
Calling `loop.run_until_complete()` on it raises `RuntimeError`.
Replaced with `asyncio.new_event_loop()` so the generator
uses a dedicated fresh loop, closed in a `finally` block.

## What was NOT changed

- `llm_service._sync_from_async_stream` and
`evaluation_service._sync_from_async_gen`: both already correctly use
`asyncio.new_event_loop()` inside a fresh thread.
- `llm_service._run_coroutine_sync`: only caller is `rag/app/resume.py`
(sync context), so `thread.join()` is correct there.
- `requests` in agent tools: sync methods dispatched through thread
pools; httpx migration is a separate, larger refactor.

## Test plan

- [ ] Invite a team member and confirm the email is sent with no task
warnings in logs.
- [ ] Trigger a webhook agent in "Immediately" mode; confirm canvas
state is persisted after background run.
- [ ] Verify `LocalLLM` (Jina backend) chat and streaming work
end-to-end.

Closes #14755

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-05-25 22:45:40 +08:00
Jonathan Hill
111cdc77b5 fix: guard LLM response against empty choices (fixes #14711) (#14988)
## Summary

Fixes 10 unguarded `response.choices[0]` accesses that cause
`IndexError` or `AttributeError` when the LLM returns an empty `choices`
list — the scenario described in #14711.

- `rag/llm/cv_model.py`
- `rag/llm/chat_model.py`

Each access site is now guarded with:
```python
if not response.choices:
    raise ValueError("LLM returned empty response")
```

## Verification

Detected and verified by [pact](https://github.com/qizwiz/pact) — a
sheaf-cohomological LLM contract checker using Z3 as a local theory
solver.

**pact sheaf-cohomological proof status after fix:**

| File | Ȟ¹ (after) | Z3 |
|------|-----------|-----|
| `rag/llm/cv_model.py` | 0 | UNSAT ✓ |
| `rag/llm/chat_model.py` | 0 | UNSAT ✓ |

All access sites proven safe (Z3 UNSAT certificate).

The checker was also used to verify the autogen streaming-None fix in
[microsoft/autogen#7711](https://github.com/microsoft/autogen/pull/7711).

## Test plan
- [ ] Existing test suite passes
- [ ] Manually test with a provider that returns empty `choices` under
load (e.g. Vertex AI)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Jonathan Hill <jonathan.f.hill@gmail.com>
2026-05-21 15:37:19 +08:00
Kevin Hu
e7544562cc Feat: @tool decorator for chat-model tool registration (#15047)
## Summary

- Adds a lightweight `@tool` decorator and `FunctionToolSession` adapter
in `rag/llm/tool_decorator.py` that let callers register plain Python
functions as LLM tools without hand-writing OpenAI function schemas or
building an MCP-style session.
- Refactors `Base.bind_tools` and `LiteLLMBase.bind_tools` in
`rag/llm/chat_model.py` to accept either the new decorator form
`bind_tools(tools=[fn1, fn2])` or the existing `(toolcall_session,
tools_schemas)` form, so existing agent/dialog call-sites in
`agent/component/agent_with_tools.py`, `api/db/services/llm_service.py`,
and `api/db/services/dialog_service.py` are unaffected.
- Adds 8 unit tests in `test/unit_test/rag/llm/test_tool_decorator.py`
covering schema shape, required/optional inference, sync + async
dispatch, and bad-input rejection.

## Usage

```python
from rag.llm.tool_decorator import tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city.

    :param city: City name to look up.
    """
    return f"{city}: 21 C, partly cloudy"

chat_mdl.bind_tools(tools=[get_weather])
ans, tk = await chat_mdl.async_chat_with_tools(system, history)
```

The decorator introspects `inspect.signature` + type hints + the
docstring (`:param name:` style) and attaches an OpenAI-format
`openai_schema` to the callable. `FunctionToolSession` duck-types the
existing `ToolCallSession` protocol, dispatching async callables
directly and sync ones through `thread_pool_exec` so the event loop is
never blocked.

## Design notes

- `tool_decorator.py` deliberately does **not** live inside
`rag/llm/__init__.py` to avoid forcing every consumer through the heavy
provider auto-discovery loop and to sidestep a circular import
(`__init__.py` imports `chat_model`, which would otherwise need symbols
from `__init__.py`).
- `FunctionToolSession` is duck-typed against
`common.mcp_tool_call_conn.ToolCallSession` rather than explicitly
inheriting from it, so importing the decorator doesn't pull the MCP
client SDK into the import graph.
- Docstring parsing is intentionally minimal (`:param name:` only) to
keep this dependency-free; Google/NumPy styles can be added later via
`docstring_parser` if needed.

## Test plan

- [x] `python -m pytest test/unit_test/rag/llm/test_tool_decorator.py
-v` — 8 passed
- [x] `python -m pytest test/unit_test/rag/llm/
--ignore=test/unit_test/rag/llm/test_perplexity_embed.py` — 11 passed
(the ignored test has a pre-existing `numpy` import that's unrelated)
- [ ] Reviewer: smoke-test the new path end-to-end with a live model via
`chat_mdl.bind_tools(tools=[my_fn])` to confirm the OpenAI-format
schemas pass through unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 15:32:17 +08:00
Magicbook1108
b28e134944 Feat: add local & ssh provider in admin panel (#15039)
### What problem does this PR solve?

Feat: add local & ssh provider in admin panel

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-20 16:56:20 +08:00
wdeveloper16
14c0985182 feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767)
Closes #14753

## What changed

| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove
`strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix
legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` →
`from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |

`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.

## Why uv.lock is not regenerated

`uv lock --python 3.13` fails because:

1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13

The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.

## RFC corrections

Two claims in the original RFC (#14753) did not hold up under code
review:

- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
2026-05-15 14:40:53 +08:00
sham-sr
ef2969a462 fix(llm): Tongyi-Qianwen embeddings use correct DashScope native API for intl URLs (#14784)
## Summary
- Fixes **Tongyi-Qianwen** (`QWenEmbed`) text embeddings when the
configured `base_url` points at DashScope **international**
(`dashscope-intl.aliyuncs.com`) or **China** (`dashscope.aliyuncs.com`)
hosts, including values copied from Model Studio that use the
**OpenAI-compatible** path (`.../compatible-mode/v1`).
- The `dashscope` Python SDK (`TextEmbedding.call`) expects the
**native** HTTP root (`https://<host>/api/v1`), not the
OpenAI-compatible base URL. Without mapping, international accounts
could hit the wrong host or path.

## Implementation
- Added `_dashscope_native_http_api_url()` to normalize known DashScope
hosts to `.../api/v1`, and wired `QWenEmbed` to set
`dashscope.base_http_api_url` before each embedding call (document and
query).

## Notes
- In-code comments document the Tongyi-Qianwen / DashScope intl vs CN
behavior for future maintainers.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 10:07:48 +08:00
07heco
8dc5b1b42d fix: optimize reranking module robustness and bug fixes (#14264)
## Description
This PR fixes critical bugs and improves the robustness of the RAG
reranking module while maintaining **100% backward compatibility** with
all existing functionality and providers.

## Key Changes
1. **Network Stability**: Added 30s timeout to all API requests to
prevent service blocking
2. **Boundary Protection**: Added empty query/text validation for all
rerank models
3. **Response Fault Tolerance**: Replaced hardcoded key access with
`.get()` to avoid KeyError crashes
4. **Bug Fixes**:
   - Fixed `Ai302Rerank` (completely non-functional before)
   - Fixed `GPUStackRerank` incorrect exception catching
   - Fixed `_normalize_rank` empty array crash
5. **Code Specification**: Added type annotations, standardized
unimplemented class prompts

## Compatibility
-  No changes to any class/method names
-  All rerank providers (Jina/Cohere/NVIDIA/HuggingFace etc.) work as
before
-  No breaking changes, zero impact on existing workflows

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-14 11:56:09 +08:00
07heco
e46989832e fix: complete robustness fixes for rerank module addressing all review comments (#14265)
## Summary
This PR fully addresses all CodeRabbit review feedback and enhances the
robustness of the reranking module with 100% backward compatibility.

## Key Fixes
1. Fixed JinaRerank hardcoded base_url to support subclass endpoint
overrides
2. Corrected GPUStackRerank exception handling to use proper requests
exceptions and preserve stack traces
3. Added 30s timeout to all API calls to prevent service hanging
4. Added empty input validation for all rerank providers
5. Replaced direct dict key access with .get() to eliminate KeyError
crashes
6. Fixed _normalize_rank edge case for empty arrays
7. Implemented missing functionality for Ai302Rerank
8. Standardized type hints and fixed typo issues

## Compatibility
- No breaking changes to any existing functionality
- All rerank providers work as originally intended
- Fully compatible with existing configurations and workflows

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 12:40:41 +08:00
Ricardo-M-L
13922209e6 fix(llm): add timeout to HTTP requests in LLM integration layer (#14313)
### What problem does this PR solve?

Multiple `requests.post()` calls across the LLM integration layer lack a
`timeout` parameter. Without a timeout, a single unresponsive upstream
service can block the calling thread **indefinitely**, eventually
exhausting the thread pool and degrading the entire system.

This is a well-known issue — Python's `requests` library defaults to
`timeout=None` (infinite wait), and [the library docs explicitly
recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts)
always setting a timeout.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Change

Added `timeout` to all `requests.post()` calls missing it:

| File | Calls fixed | Timeout |
|------|-------------|---------|
| `rag/llm/rerank_model.py` | 9 | 30s |
| `rag/llm/embedding_model.py` | 8 | 30s |
| `rag/llm/cv_model.py` | 3 | 60s |
| `rag/llm/tts_model.py` | 2 | 60s |
| `rag/llm/sequence2txt_model.py` | 2 | 60s |

Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and
audio transcription use 60s (heavier workloads with file uploads).

Note: other files in the codebase (e.g. `check_minio_alive`,
`check_ragflow_server_alive`) already use `timeout=10`, so this PR
brings the LLM layer in line with existing practice.

Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 11:19:07 +08:00
VincentLambert
08bb53bbb1 Feat: add BedrockCV for vision/image2text inference via LiteLLM (#14705)
## Summary

- `CvModel["Bedrock"]` was absent from `rag/llm/cv_model.py`, causing
`model_instance()` to return `None` when a Bedrock model was used as a
PDF parser — even after correct model resolution.
- This PR adds `BedrockCV`, enabling Bedrock vision models (e.g.
`amazon.nova-pro-v1:0`, `anthropic.claude-3-5-sonnet`) to be used as PDF
parsers.

## What problem does this PR solve?

When a Bedrock model is selected as the PDF parser in a knowledge base,
ingestion failed with:

```
'LiteLLMBase' object has no attribute 'describe_with_prompt'
```

The root cause: `LiteLLMBase` (the Bedrock chat implementation) was the
only registered handler for the Bedrock factory. It does not implement
`describe_with_prompt`. `CvModel` had no Bedrock entry, so
`model_instance()` returned `None` for `image2text` requests.

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Changes

**`rag/llm/cv_model.py`**

Adds `BedrockCV(Base)` with `_FACTORY_NAME = "Bedrock"`:

- Uses `litellm.completion` with the `bedrock/` prefix (consistent with
`LiteLLMBase`)
- Parses AWS credentials from the JSON key assembled by `add_llm`
(`auth_mode`, `bedrock_ak`, `bedrock_sk`, `bedrock_region`,
`aws_role_arn`)
- Supports three auth modes: `access_key_secret`, `iam_role` (via STS
`assume_role`), and default credential chain (IRSA, instance profile)
- Implements `describe_with_prompt` and `describe`

## Test plan

- [ ] Configure a Bedrock vision model (e.g. `amazon.nova-pro-v1:0`)
with valid AWS credentials
- [ ] Select it as PDF parser in a knowledge base
- [ ] Verify ingestion of a PDF document completes without errors
- [ ] Verify `CvModel["Bedrock"]` resolves to `BedrockCV`

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 10:29:58 +08:00
Igor Ilinskii
889aba6a32 fix base_url handling in HuggingfaceRerank (#14555)
### What problem does this PR solve?

HuggingfaceRerank.post() unconditionally prepends `http://` to base_url,
which already contains a protocol. This creates invalid URLs like
http://http://127.0.0.1:8080/rerank, breaking all requests. The fix
normalizes URL handling to match the rest of the codebase, removing
redunant `http://`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Related Issues
- #7318 
- #7796

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-05-11 10:04:40 +08:00
Ricardo-M-L
1046042e01 fix(llm): replace mutable default gen_conf={} with None + defensive copy (#14566)
### What

19 methods across `rag/llm/chat_model.py` and `rag/llm/cv_model.py`
declare `gen_conf={}` (or `gen_conf: dict = {}`) as a parameter default
and then mutate `gen_conf` in place — typically `del
gen_conf["max_tokens"]`, `gen_conf["penalty_score"] = ...`, or
`gen_conf.pop(...)` as part of provider-specific normalization.

### The two bugs in this pattern

**1. Mutable default argument (Python footgun).** Python evaluates
default values **once** at function-definition time, so the single `{}`
dict is *shared* across every caller that doesn't pass `gen_conf`. The
first such call's mutations leak into the default seen by every
subsequent call.

```python
# Before
def chat_streamly(self, system, history, gen_conf={}, **kwargs):
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]   # mutates the SHARED default dict
    ...
```

After call N with `max_tokens` set, call N+1 that omits `gen_conf` no
longer sees `max_tokens` — even though the caller never touched it.

**2. Caller-dict pollution.** When the caller *does* pass a `gen_conf`
dict, the same in-place mutations modify the caller's dict. A reused
`gen_conf` (very common for chat-loop callers that build the config once
and pass it on every turn) silently loses `max_tokens`,
`presence_penalty`, etc. after the first round.

### The fix

In every affected method:

- Change `gen_conf={}` (or `gen_conf: dict = {}`) → `gen_conf=None`.
- Add `gen_conf = dict(gen_conf or {})` as the first statement of the
body so all subsequent mutations operate on a fresh local copy.

```python
# After
def chat_streamly(self, system, history, gen_conf=None, **kwargs):
    gen_conf = dict(gen_conf or {})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]   # local copy — safe
    ...
```

This is byte-for-byte identical provider-side behavior for callers that
already pass a fresh `gen_conf` per call. The new `dict(...)` copy is
O(small constant) per call.

### Files changed

- `rag/llm/chat_model.py` — 17 methods
- `rag/llm/cv_model.py` — 2 methods

### Tests

Adds `test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py` — an
`ast`-based regression guard that walks both modules and asserts no
parameter named `gen_conf` ever has a mutable literal (`{}` or `[]`) as
its default. The test caught **five additional `gen_conf: dict = {}`
sites** that an initial `gen_conf={}` text grep had missed (annotated
parameters with whitespace), and would fail again if the pattern is ever
reintroduced.

```
$ pytest test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py -v
============================== 3 passed in 0.04s ===============================
```

`ruff check` passes on all touched files.

### Notes

- This PR is intentionally focused on **just** the `gen_conf` default +
copy fix. There's a related (but separate) `history.insert(0, ...)`
pattern in the same files that mutates the caller's history list in 12
places — left for a follow-up so this PR stays mechanical and easy to
review.

### Latest revision (`700bb54a7`) — addresses CodeRabbit review

- Type annotation: `gen_conf: dict = None` → `gen_conf: dict | None =
None` (5 occurrences in `chat_model.py`). The old annotation was a
static-checker mismatch since `None` isn't a `dict`.
- Regression test: the AST check accessed `default.keys` directly.
`ast.List` has no `.keys` attribute — a future `gen_conf=[]` would crash
with `AttributeError` instead of being caught. Use `getattr` for both
`.keys` (Dict) and `.elts` (List). Manually verified the updated check
correctly catches both `gen_conf={}` and `gen_conf=[]` while ignoring
`gen_conf=None` and non-empty literals.

---------

Co-authored-by: Ricardo <ricardo@example.com>
2026-05-09 13:11:44 +08:00
Zhichang Yu
86fe78c73f feat(llm): add MiniMax GroupId header support (#14610)
## Summary
- Add MiniMax provider GroupId query parameter support in `LiteLLMBase`
- Extract `group_id` from key configuration in `__init__`
- Append `GroupId` as query parameter to `api_base` in
`_construct_complete_args`

## Why this change is needed

MiniMax provides an OpenAI-compatible API endpoint
(`/v1/chat/completions`), but `GroupId` is a MiniMax-specific account
identifier required for billing and rate limiting - it is not part of
the OpenAI standard.

Looking at LiteLLM's `MinimaxChatConfig`:
- `get_complete_url()` only constructs the base URL (e.g.,
`https://api.minimaxi.com/v1/chat/completions`)
- LiteLLM does **not** automatically inject `GroupId` into requests
- This must be handled by the caller (ragflow's chat_model.py)

The implementation appends `GroupId` as a query parameter to `api_base`:
```python
api_base = completion_args.get("api_base", self.base_url)
separator = "&" if "?" in api_base else "?"
completion_args["api_base"] = f"{api_base}{separator}GroupId={self.group_id}"
```

This matches MiniMax's official API format (as documented by
LlamaFactory):
```bash
curl --location 'https://api.minimaxi.chat/v1/text/chatcompletion?GroupId=你的GroupId' \
  --header 'Authorization: Bearer 你的API_Key'
```

## Test plan
- [ ] Verify MiniMax API calls work with GroupId query parameter
- [ ] Verify backward compatibility for other providers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 11:54:49 +08:00
euvre
8269fa01b4 Fix AttributeError when appending non-streaming tool calls to chat history in Agentic Agent (#14456)
### What problem does this PR solve?

Fix #14340 

## Problem Description

When using an **Agentic Agent** (not Workflow) with one or more
Retrieval tools (e.g., Dataset Retrieval + Memory Retrieval), the agent
silently returns an empty response (`agent_response: ""`) after hanging
for several minutes. The server logs show:

```
AttributeError: 'ChatCompletionMessageToolCall' object has no attribute 'index'
```

This error propagates as a `GENERIC_ERROR`, causing the canvas to return
an empty response. The subsequent Memory save task then receives the
empty `agent_response` and logs:

```
Document for referred_document_id XXXX not found
```

## Reproduction Steps

1. Set `DOC_ENGINE=infinity` (or `elasticsearch` — the engine itself is
not the root cause).
2. Create a blank **Agentic Agent** (not a Workflow).
3. Add **two Retrieval tools** to the Agent node:
   - `Retrieval_DS` → Dataset (Knowledge Base)
   - `Retrieval_Mem` → Memory component
4. Add a **Message** node with **Save to Memory** enabled.
5. Launch the agent and send any message (e.g., "hola").
6. The agent hangs and returns an empty response.

## Root Cause Analysis

The crash occurs in `_append_history` and `_append_history_batch` inside
`rag/llm/chat_model.py`. These methods directly access `.index` on tool
call objects:

```python
# _append_history_batch
{
    "index": tc.index,   # <-- crashes here
    ...
}
```

However, **non-streaming** LLM responses (`stream=False`) return
`ChatCompletionMessageToolCall` objects, which **do not have an `index`
field** according to the OpenAI API specification. The `index` field
only exists on `ChoiceDeltaToolCall` objects returned in **streaming**
responses (`stream=True`).

When the agentic agent triggers an internal `full_question` call (used
to compress multi-turn conversation history), the request is incorrectly
routed through `async_chat_with_tools` because `is_tools=True` is set at
the `LLMBundle` level. If the LLM decides to emit `tool_calls` during
this auxiliary request, the code enters the non-streaming tool loop and
crashes when trying to append history.

## Fix

Replaced all direct `.index` accesses with `getattr(..., "index", None)`
for safe, backward-compatible access:

| Method | File | Line | Change |
|--------|------|------|--------|
| `_append_history` | `rag/llm/chat_model.py` | ~L304 |
`tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L332 | `tc.index`
→ `getattr(tc, "index", None)` |
| `_append_history` | `rag/llm/chat_model.py` | ~L1467 |
`tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L1496 |
`tc.index` → `getattr(tc, "index", None)` |

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-05-06 14:39:40 +08:00
sapienza yoan
811e9826d0 perf: avoid O(n²) array growth in embedding accumulation (#14369)
### What problem does this PR solve?

Both tokenizer (`rag/flow/tokenizer/tokenizer.py`) and
`BuiltinEmbed.encode`
(`rag/llm/embedding_model.py`) currently accumulate embedding batches
via
`np.concatenate` inside the per-batch loop. `np.concatenate` allocates a
new
array and copies all existing data on every call, so accumulating N
batches
is O(N²) in both time and peak memory.

Replacing the incremental concatenate with a list-of-batches + a single
`np.vstack` at the end gives O(N) total work.

For tokenizer the title-vector broadcast `np.concatenate([vts[0]] * N)`
is
also replaced by `np.tile`, which does the same job with a single
contiguous
allocation instead of building a Python list of references.

This is purely a CPU/memory optimisation — output shape and dtype are
unchanged. Measured impact grows with document size:
  -   1k chunks (batch 512, 2 iters):    ~negligible
  -  10k chunks (20 iters):              ~10× speedup on this stage
  - 100k chunks (195 iters):             ~100× speedup, and peak RAM
drops from O(N) extra to near-zero

### Type of change

- [x] Performance Improvement

Co-authored-by: yoan sapienza <Yoan Sapienza yoan.sapienza@orange.fr Yoan Sapienza zappy@macbookpro.home>
2026-04-30 11:00:10 +08:00
FuturMix
2548c28d65 feat: add FuturMix as model provider (#14419)
## Summary

Add [FuturMix](https://futurmix.ai) as a new model provider. FuturMix is
an OpenAI-compatible unified AI gateway that provides access to 22+
models (GPT, Claude, Gemini, DeepSeek, and more) through a single API
endpoint and key.

- **API Base**: `https://futurmix.ai/v1` (OpenAI-compatible)
- **Supported capabilities**: Chat, Embedding, Image2Text, TTS,
Speech2Text, Rerank

### Changes

| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Add `FuturMix` to `SupportedLiteLLMProvider`
enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `FuturMixChat(Base)` — follows
Astraflow/Avian pattern |
| `rag/llm/embedding_model.py` | Add `FuturMixEmbed(OpenAIEmbed)` —
follows Astraflow pattern |
| `rag/llm/cv_model.py` | Add `FuturMixCV(GptV4)` — follows
SILICONFLOW/OpenRouter pattern |
| `rag/llm/tts_model.py` | Add `FuturMixTTS(OpenAITTS)` — follows
CometAPI/DeerAPI pattern |
| `rag/llm/sequence2txt_model.py` | Add `FuturMixSeq2txt(GPTSeq2txt)` —
follows StepFun pattern |
| `rag/llm/rerank_model.py` | Add `FuturMixRerank(OpenAI_APIRerank)` |
| `conf/llm_factories.json` | Add factory config with 8 chat, 2
embedding, 1 image2text, 2 TTS, 1 speech2text models |
| `docs/guides/models/supported_models.mdx` | Add FuturMix to supported
models table |

### Models included

- **Chat**: claude-sonnet-4-20250514, claude-3.5-haiku, gpt-4o,
gpt-4o-mini, gemini-2.5-flash, gemini-2.0-flash, deepseek-chat,
deepseek-reasoner
- **Embedding**: text-embedding-3-small, text-embedding-3-large
- **Image2Text**: gpt-4o
- **TTS**: tts-1, tts-1-hd
- **Speech2Text**: whisper-1

## Test plan

- [ ] Verify FuturMix appears in the model provider list in RAGFlow UI
- [ ] Configure FuturMix with API key and test chat completion
- [ ] Test embedding model with document indexing
- [ ] Test image2text with a sample image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-30 10:59:37 +08:00
Stephen Hu
345bec812d refactor: improve QwenRerank logic (#14388)
### What problem does this PR solve?

improve QwenRerank logic

### Type of change

- [x] Refactoring
2026-04-28 20:17:34 +08:00
buua436
e6e80041f5 Fix: agent toolcall null response & schema validation & DeepSeek think history (#14425)
### What problem does this PR solve?
agent toolcall null response & schema validation & DeepSeek think
history

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 17:09:08 +08:00
wdeveloper16
78188ce9e9 Feat: add OpenDataLoader PDF parser backend (#14058) (#14097)
### What problem does this PR solve?

Closes #14058.

RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU,
Docling, TCADP, PaddleOCR). This PR adds **OpenDataLoader**
([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf))
as a new optional backend, giving users a deterministic, local-first
alternative with competitive table extraction accuracy.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---

### Changes

#### Backend
- `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser`
class inheriting `RAGFlowPdfParser`. Implements `check_installation()`
(guards Python package + Java 11+ runtime), `parse_pdf()` with
JSON-first extraction (heading/paragraph/table/list/image/formula) and
Markdown fallback, position-tag generation compatible with the shared
`@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup.
- `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in
`PARSERS` dict, added to `chunk_token_num=0` override list.
- `rag/flow/parser/parser.py` — `"opendataloader"` branch in the
pipeline PDF handler + check validation list.

#### Infrastructure
- `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in
via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH.

#### Frontend
- `web/src/components/layout-recognize-form-field.tsx` —
`OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown.
Cascades automatically to the pipeline editor's Parser component.

#### Docs
- `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader
entry and full env-var reference.

---

### Environment variables

| Variable | Default | Description |
|---|---|---|
| `USE_OPENDATALOADER` | `false` | Set `true` to install
`opendataloader-pdf` on container startup |
| `OPENDATALOADER_VERSION` | latest | Pin the PyPI release (e.g.
`==2.2.1`) |
| `OPENDATALOADER_HYBRID` | _(unset)_ | Enable hybrid AI mode (e.g.
`docling-fast`) |
| `OPENDATALOADER_IMAGE_OUTPUT` | _(unset)_ | `off` / `embedded` /
`external` |
| `OPENDATALOADER_OUTPUT_DIR` | _(tmp)_ | Persistent output dir; temp
dir used + cleaned if unset |
| `OPENDATALOADER_DELETE_OUTPUT` | `1` | `0` to retain intermediate
files for debugging |
| `OPENDATALOADER_SANITIZE` | _(unset)_ | `1` to filter prompt-injection
patterns from output |

---

### Dependencies

- **Runtime**: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not
added to `pyproject.toml` core deps. Installed by
`ensure_opendataloader()` at container startup when
`USE_OPENDATALOADER=true`.
- **System**: Java 11+ on PATH (JVM is the underlying engine). The
installer skips with a warning if `java` is not found.

---

### How to test

**Standalone parser:**
```bash
source .venv/bin/activate
uv pip install opendataloader-pdf
python3 -c "
import sys; sys.path.insert(0, '.')
from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser
p = OpenDataLoaderParser()
print('available:', p.check_installation())
s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline')
print(f'sections={len(s)} tables={len(t)}')
"

```
### Benchmark vs Docling
```
file                      parser            secs  sections  tables
----------------------------------------------------------------------
text-heavy.pdf            docling           45.29       148      10
text-heavy.pdf            opendataloader     3.14       559       0
table-heavy.pdf           docling           7.05        76       3
table-heavy.pdf           opendataloader     3.71        90       0
complex.pdf               docling            42.67       114       8
complex.pdf               opendataloader     3.51       180       0
```
2026-04-25 00:33:02 +08:00
qinling0210
1473000135 Implement retrieval_test in GO (#14231)
### What problem does this PR solve?

Implement retrieval_test in GO

### Type of change

- [x] Refactoring
2026-04-24 15:30:14 +08:00
ucloudnb666
f853a39b40 feat: Add Astraflow provider support (global + China endpoints) (#14270)
## Add Astraflow Provider Support

This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud
/ 优刻得) as a new AI model provider in RAGFlow, with support for both
global and China endpoints.

### About Astraflow
Astraflow is an OpenAI-compatible AI model aggregation platform
supporting 200+ models from major providers including DeepSeek, Qwen,
GPT, Claude, Gemini, Llama, Mistral, and more.

| Variant | Factory Name | Endpoint | Env Var |
|---------|-------------|----------|---------|
| Global | `Astraflow` | `https://api-us-ca.umodelverse.ai/v1` |
`ASTRAFLOW_API_KEY` |
| China | `Astraflow-CN` | `https://api.modelverse.cn/v1` |
`ASTRAFLOW_CN_API_KEY` |

- **API key signup**: https://astraflow.ucloud.cn/

---

### Files Changed

| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Register `Astraflow` and `Astraflow-CN` in
`SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and
`LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `AstraflowChat` and `AstraflowCNChat`
(OpenAI-compatible `Base` subclass) |
| `rag/llm/embedding_model.py` | Add `AstraflowEmbed` and
`AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) |
| `rag/llm/rerank_model.py` | Add `AstraflowRerank` and
`AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) |
| `rag/llm/cv_model.py` | Add `AstraflowCV` and `AstraflowCNCV`
(subclasses of `GptV4`) |
| `rag/llm/tts_model.py` | Add `AstraflowTTS` and `AstraflowCNTTS`
(subclasses of `OpenAITTS`) |
| `rag/llm/sequence2txt_model.py` | Add `AstraflowSeq2txt` and
`AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) |
| `conf/llm_factories.json` | Register `Astraflow` and `Astraflow-CN`
factories with a curated list of popular models |

---

### Supported Model Types
-  **Chat / LLM** — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7,
Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more
-  **Text Embedding** — text-embedding-3-small/large
-  **Image / Vision (IMAGE2TEXT)** — GPT-4o, GPT-4.1, Claude, Gemini,
Llama-4, etc.
-  **Text Re-Rank**
-  **TTS** — tts-1
-  **Speech-to-Text (SPEECH2TEXT)** — whisper-1

### Implementation Notes
- Uses the `openai/` LiteLLM prefix — consistent with other
OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI,
OpenRouter, n1n, Avian, etc.)
- `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249)
are separate factory entries, allowing users to choose the optimal
endpoint based on their region.
- All model classes cleanly subclass existing base classes (`Base`,
`OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`)
with no custom logic needed — the provider is fully OpenAI-compatible.

---------

Co-authored-by: user <user@xzaaaMacBook-Air.local>
2026-04-22 15:38:34 +08:00
rhinoceros.xn
4e992de91f Add tongyi gte-rerank-v2 (#14215)
https://bailian.console.aliyun.com/cn-beijing?tab=api#/api/?type=model&url=2780056

### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change
- [x] Other (please describe): add gte-rerank-v2、qwen3-rerank
2026-04-20 11:39:17 +08:00
Yongteng Lei
fac46ef67f Refa: change Minimax base url to mainland by default to align with UI (#14195)
### What problem does this PR solve?

Change Minimax base url to mainland by default to align with UI.

### Type of change

- [x] Refactoring
2026-04-17 19:08:57 +08:00
Idriss Sbaaoui
ff27ce86d6 fix: gpt-5 name-based config clearing from base chat path (#13949)
### What problem does this PR solve?

fix #13944 where OpenAI-compatible custom endpoints failed verification
when model names contained `gpt-5` becauser of incorrect name-based
handling in the Base/backend=`base` path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-07 11:24:47 +08:00
Yongteng Lei
dd839f30e8 Fix: code supports matplotlib (#13724)
### What problem does this PR solve?

Code as "final" node: 

![img_v3_02vs_aece4caf-8403-4939-9e68-9845a22c2cfg](https://github.com/user-attachments/assets/9d87b8df-da6b-401c-bf6d-8b807fe92c22)

Code as "mid" node:

![img_v3_02vv_f74f331f-d755-44ab-a18c-96fff8cbd34g](https://github.com/user-attachments/assets/c94ef3f9-2a6c-47cb-9d2b-19703d2752e4)


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-20 20:32:00 +08:00
tmimmanuel
13d0df1562 feat: add Perplexity contextualized embeddings API as a new model provider (#13709)
### What problem does this PR solve?

Adds Perplexity contextualized embeddings API as a new model provider,
as requested in #13610.

- `PerplexityEmbed` provider in `rag/llm/embedding_model.py` supporting
both standard (`/v1/embeddings`) and contextualized
(`/v1/contextualizedembeddings`) endpoints
- All 4 Perplexity embedding models registered in
`conf/llm_factories.json`: `pplx-embed-v1-0.6b`, `pplx-embed-v1-4b`,
`pplx-embed-context-v1-0.6b`, `pplx-embed-context-v1-4b`
- Frontend entries (enum, icon mapping, API key URL) in
`web/src/constants/llm.ts`
- Updated `docs/guides/models/supported_models.mdx`
- 22 unit tests in `test/unit_test/rag/llm/test_perplexity_embed.py`

Perplexity's API returns `base64_int8` encoded embeddings (not
OpenAI-compatible), so this uses a custom `requests`-based
implementation. Contextualized vs standard model is auto-detected from
the model name.

Closes #13610

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2026-03-20 10:47:48 +08:00
Idriss Sbaaoui
9070408b04 Fix : model-specific handling (#13675)
### What problem does this PR solve?

add a handler for gpt 5 models that do not accept parameters by dropping
them, and centralize all models with specific paramter handling function
into a single helper.
solves issue #13639 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-03-18 17:28:20 +08:00
Ethan Clarke
35cd56f990 feat: add MiniMax-M2.5 and M2.5-highspeed models (#13557)
## Summary

Add MiniMax's latest M2.5 model family to the model registry and update
the default API base URL to the international endpoint for broader
accessibility.

## Changes

- **Add MiniMax-M2.5 models** to `conf/llm_factories.json`:
- `MiniMax-M2.5` — Peak Performance. Ultimate Value. Master the Complex.
  - `MiniMax-M2.5-highspeed` — Same performance, faster and more agile.
- Both support 204,800 token context window and tool calling (`is_tools:
true`).
- **Update default MiniMax API base URL** in `rag/llm/__init__.py`:
- From `https://api.minimaxi.com/v1` (domestic) to
`https://api.minimax.io/v1` (international).
- Chinese users can still override via the Base URL field in the UI
settings (as documented in existing i18n strings).

## Supported Models

| Model | Context Window | Tool Calling | Description |
|-------|---------------|-------------|-------------|
| `MiniMax-M2.5` | 204,800 tokens | Yes | Peak Performance. Ultimate
Value. |
| `MiniMax-M2.5-highspeed` | 204,800 tokens | Yes | Same performance,
faster and more agile. |

## API Documentation

- OpenAI Compatible API:
https://platform.minimax.io/docs/api-reference/text-openai-api

## Testing

- [x] JSON validation passes
- [x] Python syntax validation passes
- [x] Ruff lint passes
- [x] MiniMax-M2.5 API call verified (returns valid response)
- [x] MiniMax-M2.5-highspeed API call verified (returns valid response)

Co-authored-by: PR Bot <pr-bot@minimaxi.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-12 20:41:46 +08:00
Ethan T.
1cee8b1a7b fix: use context managers for file handles to prevent resource leaks (#13514)
## Summary
- Convert bare `open()` calls to `with` context managers or
`Path.read_text()`
- File handles leak if not properly closed, especially on exceptions
- Fixes in crypt.py, sequence2txt_model.py, term_weight.py,
deepdoc/vision/__init__.py

## Test plan
- [x] File operations work correctly with context managers
- [x] Resources properly cleaned up on exceptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:47:06 +08:00
Yongteng Lei
3c80a0ae09 Fix: support vLLM's new reasoning field (#13493)
### What problem does this PR solve?

Support vLLM's new reasoning field

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 21:13:14 +08:00
Jonah Hartmann
6023eb27ac feat: add Ragcon provider (#13425)
### What problem does this PR solve?

This PR aims to extend the list of possible providers. Adds new Provider
"RAGcon" within the Ollama Modal. It provides all model types except OCR
via Openai-compatible endpoints.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
2026-03-06 09:37:27 +08:00
Magicbook1108
5fc3bd38b0 Feat: Support siliconflow.com (#13308)
### What problem does this PR solve?

Feat: Support siliconflow.com

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 15:37:42 +08:00
Stephen Hu
aec2ef4232 refactor:improve tts model's codes (#13137)
### What problem does this PR solve?

improve tts model's codes

### Type of change

- [x] Refactoring
2026-02-28 10:18:00 +08:00
Yuxing Deng
51b180d991 fix: adding GPUStack chat model requires v1 suffix (#13237)
### What problem does this PR solve?

Refer to issue: #13236
The base url for GPUStack chat model requires `/v1` suffix. For the
other model type like `Embedding` or `Rerank`, the `/v1` suffix is not
required and will be appended in code.
So keep the same logic for chat model as other model type.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 20:13:07 +08:00
avianion
5f53fbe0f1 feat: Add Avian as an LLM provider (#13256)
### What problem does this PR solve?

This PR adds [Avian](https://avian.io) as a new LLM provider to RAGFlow.
Avian provides an OpenAI-compatible API with competitive pricing,
offering access to models like DeepSeek V3.2, Kimi K2.5, GLM-5, and
MiniMax M2.5.

**Provider details:**
- API Base URL: `https://api.avian.io/v1`
- Auth: Bearer token via API key
- OpenAI-compatible (chat completions, streaming, function calling)
- Models:
  - `deepseek/deepseek-v3.2` — 164K context, $0.26/$0.38 per 1M tokens
  - `moonshotai/kimi-k2.5` — 131K context, $0.45/$2.20 per 1M tokens
  - `z-ai/glm-5` — 131K context, $0.30/$2.55 per 1M tokens
  - `minimax/minimax-m2.5` — 1M context, $0.30/$1.10 per 1M tokens

**Changes:**
- `rag/llm/chat_model.py` — Add `AvianChat` class extending `Base`
- `rag/llm/__init__.py` — Register in `SupportedLiteLLMProvider`,
`FACTORY_DEFAULT_BASE_URL`, `LITELLM_PROVIDER_PREFIX`
- `conf/llm_factories.json` — Add Avian factory with model definitions
- `web/src/constants/llm.ts` — Add to `LLMFactory` enum, `IconMap`,
`APIMapUrl`
- `web/src/components/svg-icon.tsx` — Register SVG icon
- `web/src/assets/svg/llm/avian.svg` — Provider icon
- `docs/references/supported_models.mdx` — Add to supported models table

This follows the same pattern as other OpenAI-compatible providers
(e.g., n1n #12680, TokenPony).

cc @KevinHuSh @JinHai-CN

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2026-02-27 17:36:55 +08:00
Magicbook1108
98e1d5aa5c Refact: switch from google-generativeai to google-genai (#13140)
### What problem does this PR solve?

Refact: switch from oogle-generativeai to google-genai  #13132
Refact: commnet out unused pywencai.

### Type of change

- [x] Refactoring
2026-02-24 10:28:33 +08:00
Magicbook1108
109441628b Fix: upload image files (#13071)
### What problem does this PR solve?

Fix: upload image files

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-11 09:47:33 +08:00
Yongteng Lei
48591cb1e7 Refa: boost OpenAI-compatible reranker UX (#13087)
### What problem does this PR solve?

boost OpenAI-compatible reranker UX.

### Type of change

- [x] Refactoring
2026-02-10 16:13:21 +08:00
Yongteng Lei
6361fc4b33 Feat: update stepfun list (#12991)
### What problem does this PR solve?

Update stepfun list.

Add TTS and Sequence2Text functionalities.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-05 12:47:04 +08:00
Yongteng Lei
3a86e7c224 Feat: support doubao-embedding-vision model (#12983)
### What problem does this PR solve?

Add support `doubao-embedding-vision` model.
`doubao-embedding-large-text` is deprecated.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2026-02-05 09:49:46 +08:00
eviaaaaa
c59ae4c7c2 Fix: codeExec return types & error handling; Update Spark model mappings (#12896)
## What problem does this PR solve?

This PR addresses three specific issues to improve agent reliability and
model support:

1. **`codeExec` Output Limitation**: Previously, the `codeExec` tool was
strictly limited to returning `string` types. I updated the output
constraint to `object` to support structured data (Dicts, Lists, etc.)
required for complex downstream tasks.
2. **`codeExec` Error Handling**: Improved the execution logic so that
when runtime errors occur, the tool captures the exception and returns
the error message as the output instead of causing the process to abort
or fail silently.
3. **Spark Model Configuration**:
    - Added support for the `MAX-32k` model variant.
- Fixed the `Spark-Lite` mapping from `general` to `lite` to match the
latest API specifications.

## Type of change

- [x] Bug Fix (fixes execution logic and model mapping)
- [x] New Feature / Enhancement (adds model support and improves tool
flexibility)

## Key Changes

### `agent/tools/code_exec.py`
- Changed the output type definition from `string` to `object`.
- Refactored the execution flow to gracefully catch exceptions and
return error messages as part of the tool output.

### `rag/llm/chat_model.py`
- Added `"Spark-Max-32K": "max-32k"` to the model list.
- Updated `"Spark-Lite"` value from `"general"` to `"lite"`.

## Checklist
- [x] My code follows the style guidelines of this project.
- [x] I have performed a self-review of my own code.

Signed-off-by: evilhero <2278596667@qq.com>
2026-01-29 19:22:35 +08:00