ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
jiashi19	0d7ad0ed0c	Feat/agent thinking switch (#15446 ) ### What problem does this PR solve? This PR adds an Agent LLM setting to control thinking mode for official providers that expose a thinking switch. Related to #12842. Closes #15445. Some providers expose thinking controls through provider-specific request fields, but Agent LLM settings did not have a unified option for users to enable or disable thinking mode. This PR adds a `Thinking` selector with: - System default - Enabled - Disabled Initial support is limited to the verified official providers: - Qwen / DashScope: `enable_thinking` - Kimi / Moonshot: `thinking.type` - GLM / ZHIPU-AI: `thinking.type` For LiteLLM-based providers, provider-specific fields are forwarded through `extra_body` before `drop_params` filtering so the request parameters are preserved. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: jiashi <jiashi19@outlook.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
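The commit maps a tri-state `Thinking` setting onto provider-specific request fields. Below is a minimal sketch of that mapping. The helper name `build_thinking_extra_body`, the factory keys (`Tongyi-Qianwen`, `Moonshot`, `ZHIPU-AI`), and the `"enabled"`/`"disabled"` values for `thinking.type` are assumptions, not the PR's code. Only the field names `enable_thinking` and `thinking.type` come from the commit message.

```python
from typing import Any, Optional


def build_thinking_extra_body(factory: str, mode: Optional[str]) -> dict[str, Any]:
    """Hypothetical helper: translate a tri-state Thinking setting into request fields.

    mode is None ("System default"), "enabled", or "disabled".
    Factory names and thinking.type values are assumed, not taken from RAGFlow.
    """
    if mode is None:
        # System default: send nothing and let the provider decide.
        return {}
    if mode not in ("enabled", "disabled"):
        raise ValueError(f"unknown thinking mode: {mode!r}")
    if factory == "Tongyi-Qianwen":
        # Qwen / DashScope uses a boolean flag.
        return {"enable_thinking": mode == "enabled"}
    if factory in ("Moonshot", "ZHIPU-AI"):
        # Kimi and GLM use a nested thinking.type field.
        return {"thinking": {"type": mode}}
    # Providers without a verified switch get no extra fields.
    return {}
```

Per the commit, these fields are placed in `extra_body` on LiteLLM-based providers so that `drop_params` filtering does not remove them.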
Tim Wang	ca96d61e73	Feat: Add New API model provider for OpenAI-compatible gateways (#15991 ) ## Summary Add support for "New API" as a model provider, enabling connection to [New API](https://github.com/QuantumNous/new-api) / [one-api](https://github.com/songquanpeng/one-api) compatible gateways that aggregate multiple LLM backends behind a unified OpenAI-compatible `/v1` endpoint. ### Features - All model types: Chat, Embedding, Rerank, Image2Text, TTS, Speech2Text - List Models discovery: `NewAPI(OpenAIAPICompatible)` class in `model_meta.py` queries the gateway's `/v1/models` to auto-discover available models via the native `GET /api/v1/providers/<name>/models` endpoint - Model parameter editing: Pencil icon on each discovered model row to edit `model_type`, `max_tokens`, and `features` (e.g. tool call support) before submitting - Custom model addition: "Add Custom Model" button at the bottom of the List Models dropdown for models not returned by the API - Gear icon settings: Enabled the Settings gear button on provider instances to manage models on existing instances (viewMode) - viewMode credential passthrough: Fixed List Models in viewMode — merges `initialValues` credentials when `api_key`/`base_url` fields are hidden by `hideWhenInstanceExists` ### Changes Backend (8 files): - `rag/llm/chat_model.py` — `NewAPIChat(Base)` class - `rag/llm/embedding_model.py` — `NewAPIEmbed(OpenAIEmbed)` class (no auto `/v1` append) - `rag/llm/rerank_model.py` — `NewAPIRerank(Base)` class (uses `/rerank` endpoint) - `rag/llm/cv_model.py` — `NewAPICv(GptV4)` class - `rag/llm/tts_model.py` — `NewAPITTS(OpenAITTS)` class - `rag/llm/sequence2txt_model.py` — `NewAPISeq2txt(GPTSeq2txt)` class - `rag/llm/model_meta.py` — `NewAPI(OpenAIAPICompatible)` class for List Models discovery - `conf/llm_factories.json` — New API factory entry with all model type tags Frontend (8 files + 1 new SVG): - `web/src/assets/svg/llm/new-api.svg` — New API logo icon - `web/src/constants/llm.ts` — 
`LLMFactory.NewAPI` enum + `IconMap` entry - `web/src/components/svg-icon.tsx` — `NewAPI` added to `svgIcons` - `web/src/pages/user-setting/setting-model/modal/provider-modal/field-config/local-llm-configs.ts` — New API `buildLocalConfig` - `web/src/pages/user-setting/setting-model/modal/provider-modal/constants.ts` — `LIST_MODEL_PROVIDERS` includes NewAPI - `web/src/pages/user-setting/setting-model/components/used-model.tsx` — Enable Settings gear button - `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-picker.ts` — viewMode credential merge + model editing state/handlers - `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-options.tsx` — Pencil edit icon per model row - `web/src/pages/user-setting/setting-model/modal/provider-modal/index.tsx` — `AddCustomModelDialog` import + edit dialog rendering Note on Go implementation: A Go model driver (`NewAPIModel` delegating to `OpenAIModel`) has been prepared but is deferred until the Go runtime is enabled in a future release (current v0.26.0 images use `API_PROXY_SCHEME=python` and do not compile Go binaries). Will submit as a follow-up PR. 
## Related - Depends on: #15996 (provider instance API improvements — server-side credential lookup, idempotent `add_model`, security fixes — required for viewMode gear icon and batch model submission) ## Test plan - [ ] Add New API provider with api_key and base_url pointing to an OpenAI-compatible gateway - [ ] Click "List Models" — should discover and display available models from `/v1/models` - [ ] Click pencil icon on a model — should open edit dialog to change model_type, max_tokens, features - [ ] Select multiple models and click OK — should add all selected models - [ ] Click gear icon on the added instance — should open viewMode with List Models working - [ ] In viewMode, select new models including pre-existing ones, click OK — should succeed (requires #15996) - [ ] Verify all model types work: create a Chat assistant, Embedding KB, Rerank setting 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Tim Wang <wanghualoong@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-26 18:47:20 +08:00
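The New API provider discovers models by querying the gateway's OpenAI-compatible `/v1/models` endpoint. Here is a standard-library sketch of that discovery step. It assumes the usual OpenAI list shape (`{"data": [{"id": ...}, ...]}`) and a `base_url` that already ends in `/v1`. The function names are hypothetical and are not the `NewAPI(OpenAIAPICompatible)` implementation.

```python
import json
import urllib.request
from typing import Any


def parse_model_ids(payload: dict[str, Any]) -> list[str]:
    """Extract unique, sorted model IDs from an OpenAI-style GET /v1/models body."""
    data = payload.get("data") or []
    ids = {item["id"] for item in data if isinstance(item, dict) and item.get("id")}
    return sorted(ids)


def list_models(base_url: str, api_key: str, timeout: float = 10.0) -> list[str]:
    """Fetch the model list from an OpenAI-compatible gateway (hypothetical helper).

    Assumes base_url already includes the version prefix, e.g. https://gateway.example/v1.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```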
Günter Lukas	398f488b1b	fix: support Google Cloud Gemini eu/us multipoint endpoints (#15990 )	2026-06-24 11:07:05 +08:00
Lynn	a5cce29f22	Fix: add mimo (#16136 ) ### What problem does this PR solve? Add a chat model factory for Xiaomi MiMo models. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 19:02:33 +08:00
Idriss Sbaaoui	9871a7e0b6	fix: replicate model provider (#15933 ) ### What problem does this PR solve? Fix the Replicate model provider failing even with a valid API key. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:08:33 +08:00
euvre	f97d6396b4	fix: BaiduYiyan API key validation fails in set_api_key (#15828 ) ### What problem does this PR solve? When setting the API key for the BaiduYiyan provider, all model validations fail with the error "Fail to access model using this api key. No valid response received". Root cause: 1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override `async_chat_streamly()`. The `verify_api_key()` function uses `mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits `Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible base_url, validation always fails. 2. `verify_api_key()` in `provider_api_service.py` does not format the raw API key string into the JSON format (`{"yiyan_ak": "...", "yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via `json.loads(key)`. Fix: 1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the qianfan SDK, consistent with the existing `chat_streamly()` method. 2. Add BaiduYiyan API key formatting in `provider_api_service.py` `verify_api_key()` to match the format expected by `BaiduYiyanChat.__init__()`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-06-09 19:05:58 +08:00
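The fix formats the BaiduYiyan key into the JSON object that `BaiduYiyanChat.__init__()` reads with `json.loads(key)`. The log does not say how the raw key string is split into access and secret keys. This sketch therefore assumes both halves are already available as separate strings, and the helper names are hypothetical.

```python
import json


def format_yiyan_key(access_key: str, secret_key: str) -> str:
    """Serialize the AK/SK pair into the JSON shape described in the commit."""
    return json.dumps({"yiyan_ak": access_key, "yiyan_sk": secret_key})


def parse_yiyan_key(key: str) -> tuple[str, str]:
    """Inverse of format_yiyan_key; raises ValueError on malformed input.

    json.JSONDecodeError is itself a ValueError subclass, so a raw non-JSON
    key surfaces as ValueError too.
    """
    obj = json.loads(key)
    try:
        return obj["yiyan_ak"], obj["yiyan_sk"]
    except (KeyError, TypeError) as exc:
        raise ValueError("expected keys 'yiyan_ak' and 'yiyan_sk'") from exc
```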
Wang Qi	8e4fba6cd2	Fix OpenRouter key JSONDecodeError (#15776 )	2026-06-08 19:19:10 +08:00
Lynn	794c1f4b25	Fix: volc engine and other json key factories (#15653 ) ### What problem does this PR solve? Fix: - Adapt VolcEngine to the new api_key format - Save dict-type api_key values as JSON ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 09:45:44 +08:00
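These two commits concern providers whose stored key may be either a plain token or a JSON object. The log does not show either implementation. The following is a minimal sketch of one tolerant round trip: dict keys are stored as JSON text, and a plain token (which is not valid JSON and would make a bare `json.loads()` raise `JSONDecodeError`) is handed back unchanged. Both function names are hypothetical.

```python
import json
from typing import Any, Union


def serialize_api_key(api_key: Union[str, dict[str, Any]]) -> str:
    """Store dict-shaped credentials as JSON text; keep plain keys unchanged."""
    if isinstance(api_key, dict):
        return json.dumps(api_key)
    return api_key


def load_api_key(stored: str) -> Union[str, dict[str, Any]]:
    """Return a dict for JSON-object keys, otherwise the raw string."""
    try:
        value = json.loads(stored)
    except json.JSONDecodeError:
        # Plain tokens such as "sk-or-..." are not JSON.
        return stored
    # Numbers, lists, etc. that happen to parse as JSON are still treated as raw keys.
    return value if isinstance(value, dict) else stored
```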
Wang Qi	d41373cfa9	Feature: Add the new anthropic and voyage models (#15516 ) Add the new Anthropic and Voyage models. Strip certain unsupported keys from Opus 4.7 and 4.8 requests. Co-authored-by: Idriss Sbaaoui <112825897+6ba3i@users.noreply.github.com>	2026-06-02 17:29:18 +08:00
nickmopen	bebf6ed244	fix(llm): strip non-generation keys from gen_conf for LiteLLM providers (#15427 ) (#15432 ) ### What problem does this PR solve? Fixes #15427. All LiteLLM-routed chats fail with: - Anthropic: `litellm.BadRequestError: AnthropicException - {"type":"invalid_request_error","message":"model_type: Extra inputs are not permitted"}` - OpenAI: `litellm.BadRequestError: OpenAIException - Unknown parameter: 'model_type'` This is a regression from v0.25.4. #### Root cause A chat assistant's `llm_setting` is forwarded to the model as `gen_conf`. `llm_setting` can legitimately carry RAGFlow-internal metadata such as `model_type` (the chat REST APIs in `api/apps/restful_apis/` read it back out of `llm_setting`), so that key ends up inside `gen_conf`. `Base._clean_conf` (OpenAI-compatible providers) already whitelists the keys it forwards, so direct-OpenAI providers were unaffected. `LiteLLMBase._clean_conf` only dropped `max_tokens` and passed everything else straight through to `litellm.acompletion`, which forwarded `model_type` to the upstream provider — and Anthropic / OpenAI reject it. Because both Claude and GPT route through LiteLLM, every chat broke. #### Fix - Extract the allowed-key set into a shared `ALLOWED_GEN_CONF_KEYS` constant and reuse it in `Base._clean_conf`. - Apply the same whitelist in `LiteLLMBase._clean_conf`, plus the LiteLLM-specific reasoning params (`thinking`, `reasoning_effort`, `extra_body`) that the model-family policies inject for reasoning models. This covers all four LiteLLM completion paths (`async_chat`, `async_chat_streamly`, `async_chat_with_tools`, `async_chat_streamly_with_tools`), since they all route through `_clean_conf`. #### Tests Adds `test/unit_test/rag/llm/test_clean_conf_whitelist.py` covering both backends: `model_type` (and other stray keys) are dropped, genuine generation params and `thinking` survive, `max_tokens` is removed, and the whitelist invariants hold. 
### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Added test cases	2026-06-02 10:04:11 +08:00
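The fix applies a shared key whitelist to the LiteLLM path. Here is a minimal sketch of that approach. The exact contents of `ALLOWED_GEN_CONF_KEYS` are not listed in the log, so the generation-key set below is a placeholder. The three reasoning keys and the `max_tokens` removal follow the commit message. The function returns a new dict, so the caller's `gen_conf` is not mutated.

```python
from typing import Any, Optional

# Placeholder set: the real ALLOWED_GEN_CONF_KEYS contents are not shown in the log.
ALLOWED_GEN_CONF_KEYS = frozenset(
    {"temperature", "top_p", "max_tokens", "presence_penalty", "frequency_penalty", "stop"}
)
# Reasoning params named in the commit for the LiteLLM path.
LITELLM_EXTRA_KEYS = frozenset({"thinking", "reasoning_effort", "extra_body"})


def clean_conf_litellm(gen_conf: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Keep only whitelisted keys, then drop max_tokens as LiteLLMBase._clean_conf does."""
    allowed = ALLOWED_GEN_CONF_KEYS | LITELLM_EXTRA_KEYS
    cleaned = {k: v for k, v in (gen_conf or {}).items() if k in allowed}
    cleaned.pop("max_tokens", None)
    return cleaned
```

With this filter, a stray `model_type` carried over from `llm_setting` never reaches `litellm.acompletion`.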
呆萌闷油瓶	658ff06ca4	feat: add 4 new models for siliconflow (#15383 ) ### What problem does this PR solve? Added 4 new models: deepseek-ai/DeepSeek-V4-Pro deepseek-ai/DeepSeek-V4-Flash Pro/moonshotai/Kimi-K2.6 Pro/zai-org/GLM-5.1 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 19:28:29 +08:00
wdeveloper16	4b36801b53	fix: resolve asyncio correctness issues (fire-and-forget tasks, event loop nesting) (#14761 ) ## Summary Fixes the confirmed asyncio anti-patterns from #14755. Only the three verified bugs are addressed; patterns already correctly using `asyncio.new_event_loop()` in a fresh thread are left untouched. ### Changes `api/apps/restful_apis/tenant_api.py` — fire-and-forget `send_invite_email` `asyncio.create_task()` was called without storing the `Task` reference. CPython's GC can collect an unfinished task, silently cancelling it and swallowing exceptions. Fixed by storing the task in a module-level `_background_tasks: set[Task]` with a `done_callback` to discard it on completion — the standard Python idiom for safe background tasks. `api/apps/restful_apis/agent_api.py` — fire-and-forget `background_run` Same root cause in the webhook "Immediately" execution path. Same fix applied. `rag/llm/chat_model.py` (`LocalLLM._stream_response`) — `asyncio.get_event_loop()` on running loop `asyncio.get_event_loop()` returns Quart's running event loop when called from an async context. Calling `loop.run_until_complete()` on it raises `RuntimeError`. Replaced with `asyncio.new_event_loop()` so the generator uses a dedicated fresh loop, closed in a `finally` block. ## What was NOT changed - `llm_service._sync_from_async_stream` and `evaluation_service._sync_from_async_gen`: both already correctly use `asyncio.new_event_loop()` inside a fresh thread. - `llm_service._run_coroutine_sync`: only caller is `rag/app/resume.py` (sync context), so `thread.join()` is correct there. - `requests` in agent tools: sync methods dispatched through thread pools; httpx migration is a separate, larger refactor. ## Test plan - [ ] Invite a team member and confirm the email is sent with no task warnings in logs. - [ ] Trigger a webhook agent in "Immediately" mode; confirm canvas state is persisted after background run. 
- [ ] Verify `LocalLLM` (Jina backend) chat and streaming work end-to-end. Closes #14755 --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-05-25 22:45:40 +08:00
Jonathan Hill	111cdc77b5	fix: guard LLM response against empty choices (fixes #14711 ) (#14988 ) ## Summary Fixes 10 unguarded `response.choices[0]` accesses that cause `IndexError` or `AttributeError` when the LLM returns an empty `choices` list — the scenario described in #14711. - `rag/llm/cv_model.py` - `rag/llm/chat_model.py` Each access site is now guarded with: ```python if not response.choices: raise ValueError("LLM returned empty response") ``` ## Verification Detected and verified by [pact](https://github.com/qizwiz/pact) — a sheaf-cohomological LLM contract checker using Z3 as a local theory solver. pact sheaf-cohomological proof status after fix: \| File \| Ȟ¹ (after) \| Z3 \| \|------\|-----------\|-----\| \| `rag/llm/cv_model.py` \| 0 \| UNSAT ✓ \| \| `rag/llm/chat_model.py` \| 0 \| UNSAT ✓ \| All access sites proven safe (Z3 UNSAT certificate). The checker was also used to verify the autogen streaming-None fix in [microsoft/autogen#7711](https://github.com/microsoft/autogen/pull/7711). ## Test plan - [ ] Existing test suite passes - [ ] Manually test with a provider that returns empty `choices` under load (e.g. Vertex AI) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Signed-off-by: Jonathan Hill <jonathan.f.hill@gmail.com>	2026-05-21 15:37:19 +08:00
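The guard added at each access site can be exercised in isolation. In this sketch, `first_choice_content` is a hypothetical wrapper and `SimpleNamespace` objects stand in for OpenAI-style response objects. The error message matches the one quoted in the commit.

```python
from types import SimpleNamespace
from typing import Any


def first_choice_content(response: Any) -> str:
    """Return the first choice's message content, failing loudly on empty choices."""
    # getattr also covers a missing or None `choices` attribute.
    choices = getattr(response, "choices", None)
    if not choices:
        raise ValueError("LLM returned empty response")
    # content can be None (e.g. tool-call-only replies); normalize to "".
    return choices[0].message.content or ""


# Test doubles for OpenAI-style response objects (illustration only).
ok_response = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="hi"))])
empty_response = SimpleNamespace(choices=[])
```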
Kevin Hu	e7544562cc	Feat: @tool decorator for chat-model tool registration (#15047 ) ## Summary - Adds a lightweight `@tool` decorator and `FunctionToolSession` adapter in `rag/llm/tool_decorator.py` that let callers register plain Python functions as LLM tools without hand-writing OpenAI function schemas or building an MCP-style session. - Refactors `Base.bind_tools` and `LiteLLMBase.bind_tools` in `rag/llm/chat_model.py` to accept either the new decorator form `bind_tools(tools=[fn1, fn2])` or the existing `(toolcall_session, tools_schemas)` form, so existing agent/dialog call-sites in `agent/component/agent_with_tools.py`, `api/db/services/llm_service.py`, and `api/db/services/dialog_service.py` are unaffected. - Adds 8 unit tests in `test/unit_test/rag/llm/test_tool_decorator.py` covering schema shape, required/optional inference, sync + async dispatch, and bad-input rejection. ## Usage ```python from rag.llm.tool_decorator import tool @tool def get_weather(city: str) -> str: """Get current weather for a city. :param city: City name to look up. """ return f"{city}: 21 C, partly cloudy" chat_mdl.bind_tools(tools=[get_weather]) ans, tk = await chat_mdl.async_chat_with_tools(system, history) ``` The decorator introspects `inspect.signature` + type hints + the docstring (`:param name:` style) and attaches an OpenAI-format `openai_schema` to the callable. `FunctionToolSession` duck-types the existing `ToolCallSession` protocol, dispatching async callables directly and sync ones through `thread_pool_exec` so the event loop is never blocked. ## Design notes - `tool_decorator.py` deliberately does not live inside `rag/llm/__init__.py` to avoid forcing every consumer through the heavy provider auto-discovery loop and to sidestep a circular import (`__init__.py` imports `chat_model`, which would otherwise need symbols from `__init__.py`). 
- `FunctionToolSession` is duck-typed against `common.mcp_tool_call_conn.ToolCallSession` rather than explicitly inheriting from it, so importing the decorator doesn't pull the MCP client SDK into the import graph. - Docstring parsing is intentionally minimal (`:param name:` only) to keep this dependency-free; Google/NumPy styles can be added later via `docstring_parser` if needed. ## Test plan - [x] `python -m pytest test/unit_test/rag/llm/test_tool_decorator.py -v` — 8 passed - [x] `python -m pytest test/unit_test/rag/llm/ --ignore=test/unit_test/rag/llm/test_perplexity_embed.py` — 11 passed (the ignored test has a pre-existing `numpy` import that's unrelated) - [ ] Reviewer: smoke-test the new path end-to-end with a live model via `chat_mdl.bind_tools(tools=[my_fn])` to confirm the OpenAI-format schemas pass through unchanged 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 15:32:17 +08:00
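To show how signature, type hints, and `:param name:` docstrings can combine into an OpenAI function schema, here is a simplified, hypothetical re-implementation. It is not the code in `rag/llm/tool_decorator.py`. It only maps plain builtin types, so annotations such as `Optional[int]` fall back to `"string"`.

```python
import inspect
import re
import typing
from typing import Any, Callable

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}
_PARAM_RE = re.compile(r":param\s+(\w+):\s*(.+)")


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Attach an OpenAI-format function schema to fn as fn.openai_schema."""
    hints = typing.get_type_hints(fn)
    doc = inspect.getdoc(fn) or ""
    param_docs = dict(_PARAM_RE.findall(doc))
    summary = doc.split(":param")[0].strip()  # text before the first :param line
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        prop: dict[str, Any] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if name in param_docs:
            prop["description"] = param_docs[name].strip()
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    fn.openai_schema = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": summary,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
    return fn


@tool
def get_weather(city: str, days: int = 1) -> str:
    """Get current weather for a city.

    :param city: City name to look up.
    :param days: Number of days to forecast.
    """
    return f"{city}: 21 C, partly cloudy"
```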
Magicbook1108	b28e134944	Feat: add local & ssh provider in admin panel (#15039 ) ### What problem does this PR solve? Feat: add local & ssh provider in admin panel ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-20 16:56:20 +08:00
wdeveloper16	14c0985182	feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767 ) Closes #14753 ## What changed \| File \| Change \| \|---\|---\| \| `pyproject.toml` \| `requires-python` → `>=3.13,<3.15`; remove `strenum==0.4.15` \| \| `Dockerfile` \| `uv python install 3.13`, `uv sync --python 3.13` \| \| `.github/workflows/tests.yml` \| `uv sync --python 3.13` on both matrix legs \| \| `CLAUDE.md` \| dev setup command + requirements note updated \| \| `deepdoc/parser/mineru_parser.py` \| `from strenum import StrEnum` → `from enum import StrEnum` \| \| `agent/tools/code_exec.py` \| same \| `StrEnum` has been in the stdlib since Python 3.11 — the `strenum` backport package is no longer needed once the floor is 3.13. ## Why uv.lock is not regenerated `uv lock --python 3.13` fails because: 1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0` 2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels) depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0` 3. These two constraints are irreconcilable on Python 3.13 The lockfile regeneration requires loosening the `numpy` upper bound in the `infiniflow/graspologic` fork. Once that fork commit is updated and the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will succeed. ## RFC corrections Two claims in the original RFC (#14753) did not hold up under code review: - "graspologic hard-blocks 3.13" — the infiniflow fork at the pinned commit has no `<3.13` Python constraint. The blocker is the transitive `numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a direct Python version cap. - "free-threading throughput gains for I/O-bound workload" — Python 3.13 free-threading requires a special `--disable-gil` build and provides no benefit for async I/O code (the GIL is already released during I/O). The real motivation is forward compatibility and improved error messages.	2026-05-15 14:40:53 +08:00
Ricardo-M-L	1046042e01	fix(llm): replace mutable default `gen_conf={}` with None + defensive copy (#14566 ) ### What 19 methods across `rag/llm/chat_model.py` and `rag/llm/cv_model.py` declare `gen_conf={}` (or `gen_conf: dict = {}`) as a parameter default and then mutate `gen_conf` in place — typically `del gen_conf["max_tokens"]`, `gen_conf["penalty_score"] = ...`, or `gen_conf.pop(...)` as part of provider-specific normalization. ### The two bugs in this pattern 1. Mutable default argument (Python footgun). Python evaluates default values once at function-definition time, so the single `{}` dict is shared across every caller that doesn't pass `gen_conf`. The first such call's mutations leak into the default seen by every subsequent call. ```python # Before def chat_streamly(self, system, history, gen_conf={}, **kwargs): if "max_tokens" in gen_conf: del gen_conf["max_tokens"] # mutates the SHARED default dict ... ``` After call N with `max_tokens` set, call N+1 that omits `gen_conf` no longer sees `max_tokens` — even though the caller never touched it. 2. Caller-dict pollution. When the caller does pass a `gen_conf` dict, the same in-place mutations modify the caller's dict. A reused `gen_conf` (very common for chat-loop callers that build the config once and pass it on every turn) silently loses `max_tokens`, `presence_penalty`, etc. after the first round. ### The fix In every affected method: - Change `gen_conf={}` (or `gen_conf: dict = {}`) → `gen_conf=None`. - Add `gen_conf = dict(gen_conf or {})` as the first statement of the body so all subsequent mutations operate on a fresh local copy. ```python # After def chat_streamly(self, system, history, gen_conf=None, **kwargs): gen_conf = dict(gen_conf or {}) if "max_tokens" in gen_conf: del gen_conf["max_tokens"] # local copy — safe ... ``` This is byte-for-byte identical provider-side behavior for callers that already pass a fresh `gen_conf` per call. The new `dict(...)` copy is O(small constant) per call.
### Files changed - `rag/llm/chat_model.py` — 17 methods - `rag/llm/cv_model.py` — 2 methods ### Tests Adds `test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py` — an `ast`-based regression guard that walks both modules and asserts no parameter named `gen_conf` ever has a mutable literal (`{}` or `[]`) as its default. The test caught five additional `gen_conf: dict = {}` sites that an initial `gen_conf={}` text grep had missed (annotated parameters with whitespace), and would fail again if the pattern is ever reintroduced. ``` $ pytest test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py -v ============================== 3 passed in 0.04s =============================== ``` `ruff check` passes on all touched files. ### Notes - This PR is intentionally focused on just the `gen_conf` default + copy fix. There's a related (but separate) `history.insert(0, ...)` pattern in the same files that mutates the caller's history list in 12 places — left for a follow-up so this PR stays mechanical and easy to review. ### Latest revision (`700bb54a7`) — addresses CodeRabbit review - Type annotation: `gen_conf: dict = None` → `gen_conf: dict | None = None` (5 occurrences in `chat_model.py`). The old annotation was a static-checker mismatch since `None` isn't a `dict`. - Regression test: the AST check accessed `default.keys` directly. `ast.List` has no `.keys` attribute — a future `gen_conf=[]` would crash with `AttributeError` instead of being caught. Use `getattr` for both `.keys` (Dict) and `.elts` (List). Manually verified the updated check correctly catches both `gen_conf={}` and `gen_conf=[]` while ignoring `gen_conf=None` and non-empty literals. --------- Co-authored-by: Ricardo <ricardo@example.com>	2026-05-09 13:11:44 +08:00
Zhichang Yu	86fe78c73f	feat(llm): add MiniMax GroupId header support (#14610 ) ## Summary - Add MiniMax provider GroupId query parameter support in `LiteLLMBase` - Extract `group_id` from key configuration in `__init__` - Append `GroupId` as query parameter to `api_base` in `_construct_complete_args` ## Why this change is needed MiniMax provides an OpenAI-compatible API endpoint (`/v1/chat/completions`), but `GroupId` is a MiniMax-specific account identifier required for billing and rate limiting - it is not part of the OpenAI standard. Looking at LiteLLM's `MinimaxChatConfig`: - `get_complete_url()` only constructs the base URL (e.g., `https://api.minimaxi.com/v1/chat/completions`) - LiteLLM does not automatically inject `GroupId` into requests - This must be handled by the caller (ragflow's chat_model.py) The implementation appends `GroupId` as a query parameter to `api_base`: ```python api_base = completion_args.get("api_base", self.base_url) separator = "&" if "?" in api_base else "?" completion_args["api_base"] = f"{api_base}{separator}GroupId={self.group_id}" ``` This matches MiniMax's official API format (as documented by LlamaFactory): ```bash curl --location 'https://api.minimaxi.chat/v1/text/chatcompletion?GroupId=<your GroupId>' \ --header 'Authorization: Bearer <your API key>' ``` ## Test plan - [ ] Verify MiniMax API calls work with GroupId query parameter - [ ] Verify backward compatibility for other providers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 11:54:49 +08:00
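The commit builds the URL by string concatenation. As an alternative (not the PR's code), `urllib.parse` can set the parameter in a way that URL-encodes the value and replaces an existing `GroupId` instead of appending a duplicate when the same `api_base` is processed twice. `with_group_id` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_group_id(api_base: str, group_id: str) -> str:
    """Return api_base with its GroupId query parameter set (replacing any existing one)."""
    parts = urlsplit(api_base)
    # Keep every other query parameter, drop any previous GroupId.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "GroupId"]
    query.append(("GroupId", group_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```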
euvre	8269fa01b4	Fix AttributeError when appending non-streaming tool calls to chat history in Agentic Agent (#14456 ) ### What problem does this PR solve? Fix #14340 ## Problem Description When using an Agentic Agent (not Workflow) with one or more Retrieval tools (e.g., Dataset Retrieval + Memory Retrieval), the agent silently returns an empty response (`agent_response: ""`) after hanging for several minutes. The server logs show: ``` AttributeError: 'ChatCompletionMessageToolCall' object has no attribute 'index' ``` This error propagates as a `GENERIC_ERROR`, causing the canvas to return an empty response. The subsequent Memory save task then receives the empty `agent_response` and logs: ``` Document for referred_document_id XXXX not found ``` ## Reproduction Steps 1. Set `DOC_ENGINE=infinity` (or `elasticsearch` — the engine itself is not the root cause). 2. Create a blank Agentic Agent (not a Workflow). 3. Add two Retrieval tools to the Agent node: - `Retrieval_DS` → Dataset (Knowledge Base) - `Retrieval_Mem` → Memory component 4. Add a Message node with Save to Memory enabled. 5. Launch the agent and send any message (e.g., "hola"). 6. The agent hangs and returns an empty response. ## Root Cause Analysis The crash occurs in `_append_history` and `_append_history_batch` inside `rag/llm/chat_model.py`. These methods directly access `.index` on tool call objects: ```python # _append_history_batch { "index": tc.index, # <-- crashes here ... } ``` However, non-streaming LLM responses (`stream=False`) return `ChatCompletionMessageToolCall` objects, which do not have an `index` field according to the OpenAI API specification. The `index` field only exists on `ChoiceDeltaToolCall` objects returned in streaming responses (`stream=True`). 
When the agentic agent triggers an internal `full_question` call (used to compress multi-turn conversation history), the request is incorrectly routed through `async_chat_with_tools` because `is_tools=True` is set at the `LLMBundle` level. If the LLM decides to emit `tool_calls` during this auxiliary request, the code enters the non-streaming tool loop and crashes when trying to append history. ## Fix Replaced all direct `.index` accesses with `getattr(..., "index", None)` for safe, backward-compatible access: \| Method \| File \| Line \| Change \| \|--------\|------\|------\|--------\| \| `_append_history` \| `rag/llm/chat_model.py` \| ~L304 \| `tool_call.index` → `getattr(tool_call, "index", None)` \| \| `_append_history_batch` \| `rag/llm/chat_model.py` \| ~L332 \| `tc.index` → `getattr(tc, "index", None)` \| \| `_append_history` \| `rag/llm/chat_model.py` \| ~L1467 \| `tool_call.index` → `getattr(tool_call, "index", None)` \| \| `_append_history_batch` \| `rag/llm/chat_model.py` \| ~L1496 \| `tc.index` → `getattr(tc, "index", None)` \| ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: noob <yixiao121314@outlook.com>	2026-05-06 14:39:40 +08:00
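The fix reads `index` with `getattr(..., "index", None)` because only streaming `ChoiceDeltaToolCall` objects carry it. Here is a sketch of the history-entry conversion under that rule. Apart from `index`, the entry layout is an assumption modeled on OpenAI's tool-call message format. `SimpleNamespace` objects stand in for the two SDK types.

```python
from types import SimpleNamespace
from typing import Any


def tool_call_to_history(tc: Any) -> dict[str, Any]:
    """Hypothetical converter from a streamed or non-streamed tool call to a history entry."""
    return {
        # Present on streaming deltas, absent on ChatCompletionMessageToolCall.
        "index": getattr(tc, "index", None),
        "id": tc.id,
        "type": "function",
        "function": {"name": tc.function.name, "arguments": tc.function.arguments},
    }


# Test doubles (illustration only).
_fn = SimpleNamespace(name="retrieval", arguments='{"q": "hola"}')
streamed_tc = SimpleNamespace(index=0, id="call_1", function=_fn)  # like ChoiceDeltaToolCall
plain_tc = SimpleNamespace(id="call_2", function=_fn)  # like ChatCompletionMessageToolCall
```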
FuturMix	2548c28d65	feat: add FuturMix as model provider (#14419 ) ## Summary Add [FuturMix](https://futurmix.ai) as a new model provider. FuturMix is an OpenAI-compatible unified AI gateway that provides access to 22+ models (GPT, Claude, Gemini, DeepSeek, and more) through a single API endpoint and key. - API Base: `https://futurmix.ai/v1` (OpenAI-compatible) - Supported capabilities: Chat, Embedding, Image2Text, TTS, Speech2Text, Rerank ### Changes \| File \| Change \| \|------\|--------\| \| `rag/llm/__init__.py` \| Add `FuturMix` to `SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` \| \| `rag/llm/chat_model.py` \| Add `FuturMixChat(Base)` — follows Astraflow/Avian pattern \| \| `rag/llm/embedding_model.py` \| Add `FuturMixEmbed(OpenAIEmbed)` — follows Astraflow pattern \| \| `rag/llm/cv_model.py` \| Add `FuturMixCV(GptV4)` — follows SILICONFLOW/OpenRouter pattern \| \| `rag/llm/tts_model.py` \| Add `FuturMixTTS(OpenAITTS)` — follows CometAPI/DeerAPI pattern \| \| `rag/llm/sequence2txt_model.py` \| Add `FuturMixSeq2txt(GPTSeq2txt)` — follows StepFun pattern \| \| `rag/llm/rerank_model.py` \| Add `FuturMixRerank(OpenAI_APIRerank)` \| \| `conf/llm_factories.json` \| Add factory config with 8 chat, 2 embedding, 1 image2text, 2 TTS, 1 speech2text models \| \| `docs/guides/models/supported_models.mdx` \| Add FuturMix to supported models table \| ### Models included - Chat: claude-sonnet-4-20250514, claude-3.5-haiku, gpt-4o, gpt-4o-mini, gemini-2.5-flash, gemini-2.0-flash, deepseek-chat, deepseek-reasoner - Embedding: text-embedding-3-small, text-embedding-3-large - Image2Text: gpt-4o - TTS: tts-1, tts-1-hd - Speech2Text: whisper-1 ## Test plan - [ ] Verify FuturMix appears in the model provider list in RAGFlow UI - [ ] Configure FuturMix with API key and test chat completion - [ ] Test embedding model with document indexing - [ ] Test image2text with a sample image 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-30 10:59:37 +08:00
buua436	e6e80041f5	Fix: agent toolcall null response & schema validation & DeepSeek think history (#14425 ) ### What problem does this PR solve? agent toolcall null response & schema validation & DeepSeek think history ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-28 17:09:08 +08:00
ucloudnb666	f853a39b40	feat: Add Astraflow provider support (global + China endpoints) (#14270 ) ## Add Astraflow Provider Support This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud / 优刻得) as a new AI model provider in RAGFlow, with support for both global and China endpoints. ### About Astraflow Astraflow is an OpenAI-compatible AI model aggregation platform supporting 200+ models from major providers including DeepSeek, Qwen, GPT, Claude, Gemini, Llama, Mistral, and more. \| Variant \| Factory Name \| Endpoint \| Env Var \| \|---------\|-------------\|----------\|---------\| \| Global \| `Astraflow` \| `https://api-us-ca.umodelverse.ai/v1` \| `ASTRAFLOW_API_KEY` \| \| China \| `Astraflow-CN` \| `https://api.modelverse.cn/v1` \| `ASTRAFLOW_CN_API_KEY` \| - API key signup: https://astraflow.ucloud.cn/ --- ### Files Changed \| File \| Change \| \|------\|--------\| \| `rag/llm/__init__.py` \| Register `Astraflow` and `Astraflow-CN` in `SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` \| \| `rag/llm/chat_model.py` \| Add `AstraflowChat` and `AstraflowCNChat` (OpenAI-compatible `Base` subclass) \| \| `rag/llm/embedding_model.py` \| Add `AstraflowEmbed` and `AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) \| \| `rag/llm/rerank_model.py` \| Add `AstraflowRerank` and `AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) \| \| `rag/llm/cv_model.py` \| Add `AstraflowCV` and `AstraflowCNCV` (subclasses of `GptV4`) \| \| `rag/llm/tts_model.py` \| Add `AstraflowTTS` and `AstraflowCNTTS` (subclasses of `OpenAITTS`) \| \| `rag/llm/sequence2txt_model.py` \| Add `AstraflowSeq2txt` and `AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) \| \| `conf/llm_factories.json` \| Register `Astraflow` and `Astraflow-CN` factories with a curated list of popular models \| --- ### Supported Model Types - ✅ Chat / LLM — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7, Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more - ✅ Text Embedding — text-embedding-3-small/large - ✅ Image / Vision (IMAGE2TEXT) — GPT-4o, GPT-4.1, Claude, Gemini, Llama-4, etc. - ✅ Text Re-Rank - ✅ TTS — tts-1 - ✅ Speech-to-Text (SPEECH2TEXT) — whisper-1 ### Implementation Notes - Uses the `openai/` LiteLLM prefix — consistent with other OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI, OpenRouter, n1n, Avian, etc.) - `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249) are separate factory entries, allowing users to choose the optimal endpoint based on their region. - All model classes cleanly subclass existing base classes (`Base`, `OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`) with no custom logic needed — the provider is fully OpenAI-compatible. --------- Co-authored-by: user <user@xzaaaMacBook-Air.local>	2026-04-22 15:38:34 +08:00
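The Astraflow notes say both factories route through LiteLLM's `openai/` prefix, each with its own regional default endpoint. A minimal Python sketch of that lookup, assuming plain string keys: the commit registers entries in `FACTORY_DEFAULT_BASE_URL` and `LITELLM_PROVIDER_PREFIX`, but their real keys may be `SupportedLiteLLMProvider` members, and `resolve_litellm_target` is a hypothetical helper, not RAGFlow code.

```python
from __future__ import annotations

from typing import Optional

# Simplified stand-ins for the registries named in the commit; in RAGFlow the
# keys may be SupportedLiteLLMProvider enum members rather than plain strings.
FACTORY_DEFAULT_BASE_URL: dict[str, str] = {
    "Astraflow": "https://api-us-ca.umodelverse.ai/v1",
    "Astraflow-CN": "https://api.modelverse.cn/v1",
}
LITELLM_PROVIDER_PREFIX: dict[str, str] = {
    "Astraflow": "openai/",
    "Astraflow-CN": "openai/",
}


def resolve_litellm_target(factory: str, model_name: str, base_url: Optional[str] = None) -> tuple[str, str]:
    """Hypothetical helper: return (LiteLLM model id, API base) for a factory.

    A user-supplied base_url wins; otherwise the factory's regional default is used.
    """
    model_id = LITELLM_PROVIDER_PREFIX[factory] + model_name
    api_base = base_url or FACTORY_DEFAULT_BASE_URL[factory]
    return model_id, api_base
```

Keeping the two regions as separate factory keys, as the commit does, means region selection is just a different dictionary lookup rather than a branch inside each model class.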
Idriss Sbaaoui	ff27ce86d6	fix: gpt-5 name-based config clearing from base chat path (#13949 ) ### What problem does this PR solve? Fixes #13944, where OpenAI-compatible custom endpoints failed verification when model names contained `gpt-5` because of incorrect name-based handling in the Base/backend=`base` path. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-07 11:24:47 +08:00
Yongteng Lei	dd839f30e8	Fix: code supports matplotlib (#13724 ) ### What problem does this PR solve? Code as "final" node: ![img_v3_02vs_aece4caf-8403-4939-9e68-9845a22c2cfg](https://github.com/user-attachments/assets/9d87b8df-da6b-401c-bf6d-8b807fe92c22) Code as "mid" node: ![img_v3_02vv_f74f331f-d755-44ab-a18c-96fff8cbd34g](https://github.com/user-attachments/assets/c94ef3f9-2a6c-47cb-9d2b-19703d2752e4) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-20 20:32:00 +08:00
Idriss Sbaaoui	9070408b04	Fix : model-specific handling (#13675 ) ### What problem does this PR solve? Adds a handler for GPT-5 models that do not accept certain parameters, dropping those parameters before the request, and centralizes all model-specific parameter handling into a single helper function. Solves issue #13639 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-03-18 17:28:20 +08:00
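The centralized helper described in #13675 can be sketched as a table of (matcher, rule) pairs applied to the generation config. The set of parameters dropped for GPT-5 below is an assumption for illustration, and `apply_model_specific_rules` is a hypothetical name, not the helper RAGFlow actually uses.

```python
from __future__ import annotations

from typing import Any, Callable

# Assumed set of sampling parameters rejected by GPT-5 models (illustrative only).
GPT5_UNSUPPORTED = frozenset({"temperature", "top_p", "presence_penalty", "frequency_penalty"})


def _drop_gpt5_unsupported(conf: dict[str, Any]) -> dict[str, Any]:
    # Keep only the keys the model is assumed to accept.
    return {k: v for k, v in conf.items() if k not in GPT5_UNSUPPORTED}


# Each entry pairs a model-name predicate with a config transform, so new
# model-specific quirks are added here instead of being scattered across classes.
MODEL_RULES: list[tuple[Callable[[str], bool], Callable[[dict[str, Any]], dict[str, Any]]]] = [
    (lambda name: "gpt-5" in name.lower(), _drop_gpt5_unsupported),
]


def apply_model_specific_rules(model_name: str, gen_conf: dict[str, Any]) -> dict[str, Any]:
    """Return a new config with every matching rule applied; the input is not mutated."""
    conf = dict(gen_conf)
    for matches, rule in MODEL_RULES:
        if matches(model_name):
            conf = rule(conf)
    return conf
```

Matching on the name alone is exactly what the later fix #13949 narrowed for OpenAI-compatible custom endpoints, so a fuller version would likely also consider the provider or backend, not just the model string.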
Yongteng Lei	3c80a0ae09	Fix: support vLLM's new reasoning field (#13493 ) ### What problem does this PR solve? Support vLLM's new reasoning field ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 21:13:14 +08:00
Jonah Hartmann	6023eb27ac	feat: add Ragcon provider (#13425 ) ### What problem does this PR solve? This PR extends the list of available providers. It adds a new provider, "RAGcon", within the Ollama modal. RAGcon provides all model types except OCR via OpenAI-compatible endpoints. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>	2026-03-06 09:37:27 +08:00
Yuxing Deng	51b180d991	fix: adding GPUStack chat model requires v1 suffix (#13237 ) ### What problem does this PR solve? Refer to issue: #13236 The base URL for the GPUStack chat model required a `/v1` suffix. For other model types such as `Embedding` or `Rerank`, the `/v1` suffix is not required because the code appends it. This PR applies the same logic to the chat model. ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 20:13:07 +08:00
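The normalization that #13237 extends to the GPUStack chat model amounts to appending `/v1` only when the user has not already entered it. A minimal sketch, with `ensure_v1_suffix` as a hypothetical name; the actual RAGFlow code may differ, for example in how it treats other path suffixes.

```python
def ensure_v1_suffix(base_url: str) -> str:
    """Hypothetical helper: append "/v1" to an OpenAI-compatible base URL if absent.

    Trailing slashes are stripped first so "http://host/v1/" and "http://host"
    both normalize to a single ".../v1" form, and the call is idempotent.
    """
    url = base_url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```

Because the function is idempotent, it is safe to apply to every model type, whether or not the stored URL already carries the suffix.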
avianion	5f53fbe0f1	feat: Add Avian as an LLM provider (#13256 ) ### What problem does this PR solve? This PR adds [Avian](https://avian.io) as a new LLM provider to RAGFlow. Avian provides an OpenAI-compatible API with competitive pricing, offering access to models like DeepSeek V3.2, Kimi K2.5, GLM-5, and MiniMax M2.5. Provider details: - API Base URL: `https://api.avian.io/v1` - Auth: Bearer token via API key - OpenAI-compatible (chat completions, streaming, function calling) - Models: - `deepseek/deepseek-v3.2` — 164K context, $0.26/$0.38 per 1M tokens - `moonshotai/kimi-k2.5` — 131K context, $0.45/$2.20 per 1M tokens - `z-ai/glm-5` — 131K context, $0.30/$2.55 per 1M tokens - `minimax/minimax-m2.5` — 1M context, $0.30/$1.10 per 1M tokens Changes: - `rag/llm/chat_model.py` — Add `AvianChat` class extending `Base` - `rag/llm/__init__.py` — Register in `SupportedLiteLLMProvider`, `FACTORY_DEFAULT_BASE_URL`, `LITELLM_PROVIDER_PREFIX` - `conf/llm_factories.json` — Add Avian factory with model definitions - `web/src/constants/llm.ts` — Add to `LLMFactory` enum, `IconMap`, `APIMapUrl` - `web/src/components/svg-icon.tsx` — Register SVG icon - `web/src/assets/svg/llm/avian.svg` — Provider icon - `docs/references/supported_models.mdx` — Add to supported models table This follows the same pattern as other OpenAI-compatible providers (e.g., n1n #12680, TokenPony). cc @KevinHuSh @JinHai-CN ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-02-27 17:36:55 +08:00
eviaaaaa	c59ae4c7c2	Fix: codeExec return types & error handling; Update Spark model mappings (#12896 ) ## What problem does this PR solve? This PR addresses three specific issues to improve agent reliability and model support: 1. `codeExec` Output Limitation: Previously, the `codeExec` tool was strictly limited to returning `string` types. I updated the output constraint to `object` to support structured data (Dicts, Lists, etc.) required for complex downstream tasks. 2. `codeExec` Error Handling: Improved the execution logic so that when runtime errors occur, the tool captures the exception and returns the error message as the output instead of causing the process to abort or fail silently. 3. Spark Model Configuration: - Added support for the `MAX-32k` model variant. - Fixed the `Spark-Lite` mapping from `general` to `lite` to match the latest API specifications. ## Type of change - [x] Bug Fix (fixes execution logic and model mapping) - [x] New Feature / Enhancement (adds model support and improves tool flexibility) ## Key Changes ### `agent/tools/code_exec.py` - Changed the output type definition from `string` to `object`. - Refactored the execution flow to gracefully catch exceptions and return error messages as part of the tool output. ### `rag/llm/chat_model.py` - Added `"Spark-Max-32K": "max-32k"` to the model list. - Updated `"Spark-Lite"` value from `"general"` to `"lite"`. ## Checklist - [x] My code follows the style guidelines of this project. - [x] I have performed a self-review of my own code. Signed-off-by: evilhero <2278596667@qq.com>	2026-01-29 19:22:35 +08:00
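The `codeExec` change in #12896 (return any object, and report runtime errors as output instead of aborting) can be illustrated with a small sketch. It uses a bare `exec` and a `main` entry-point convention purely as assumptions for illustration; `run_user_code` is hypothetical, the code is not sandboxed, and it is not the tool's real implementation.

```python
from __future__ import annotations

from typing import Any


def run_user_code(source: str, args: dict[str, Any]) -> Any:
    """Hypothetical sketch: execute `source`, call its `main(**args)`, return the result.

    The result may be any object (dict, list, str, ...), mirroring the change of the
    output type from `string` to `object`. Runtime errors become a structured output
    rather than propagating. NOTE: plain exec() offers no isolation; real code
    execution belongs in a sandbox.
    """
    namespace: dict[str, Any] = {}
    try:
        exec(source, namespace)
        return namespace["main"](**args)
    except Exception as exc:  # report the failure as data instead of aborting the run
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Returning the error as a value lets a downstream agent node inspect and react to the failure, which is the behaviour the PR describes.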
Yongteng Lei	b57c82b122	Feat: add kimi-k2.5 (#12852 ) ### What problem does this PR solve? Add kimi-k2.5 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-28 12:41:20 +08:00
Yongteng Lei	2a758402ad	Fix: Hunyuan cannot work properly (#12843 ) ### What problem does this PR solve? Hunyuan cannot work properly ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-27 17:04:53 +08:00
Kevin Hu	927db0b373	Refa: asyncio.to_thread to ThreadPoolExecutor to break thread limitat… (#12716 ) ### Type of change - [x] Refactoring	2026-01-20 13:29:37 +08:00
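`asyncio.to_thread` runs work on the event loop's default executor, a `ThreadPoolExecutor` whose default size is `min(32, os.cpu_count() + 4)` workers; submitting to a dedicated executor lifts that ceiling. A minimal sketch of the pattern, where the pool size of 64 and the name `run_blocking` are assumptions rather than RAGFlow's actual values:

```python
import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Dedicated pool for blocking LLM/IO calls; 64 workers is an illustrative choice.
_BLOCKING_POOL = ThreadPoolExecutor(max_workers=64, thread_name_prefix="blocking-call")


async def run_blocking(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a blocking callable on the dedicated pool without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor forwards positional args only, so bind kwargs with partial.
    return await loop.run_in_executor(_BLOCKING_POOL, functools.partial(func, *args, **kwargs))
```

The trade-off is that a larger process-wide pool must be sized deliberately: more concurrent threads remove the executor bottleneck but can push harder against downstream rate limits.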
n1n.ai	f3d347f55f	feat: Add n1n provider (#12680 ) This PR adds n1n as an LLM provider to RAGFlow. Co-authored-by: Qun <qun@ip-10-5-5-38.us-west-2.compute.internal>	2026-01-19 13:12:42 +08:00
Pegasus	b091ff2730	Fix enable_thinking parameter for Qwen3 models (#12603 ) ### Issue When using Qwen3 models (`qwen3-32b`, `qwen3-max`) through the Tongyi-Qianwen provider for non-streaming calls (e.g., knowledge graph generation), the API fails with: ``` parameter.enable_thinking must be set to false for non-streaming calls ``` Closes #12424 ### Root Cause In `LiteLLMBase.async_chat()`, the `extra_body={"enable_thinking": False}` was set in `kwargs` but never forwarded to `_construct_completion_args()`. ### What problem does this PR solve? Pass merged kwargs to `_construct_completion_args()` using `{**gen_conf, **kwargs}` to safely handle potential duplicate parameters. ### Changes - `rag/llm/chat_model.py`: Forward kwargs containing `extra_body` to `_construct_completion_args()` in `async_chat()` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461	2026-01-14 16:35:46 +08:00
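The fix in #12603 hinges on merging the two dicts before the call rather than unpacking both into it: `f(**a, **b)` raises `TypeError` when `a` and `b` share a key, whereas `{**a, **b}` lets the later mapping win. A minimal sketch, in which `construct_completion_args` and `build_chat_args` are hypothetical stand-ins for the `_construct_completion_args()` and `async_chat()` logic:

```python
from __future__ import annotations

from typing import Any


def construct_completion_args(model: str, messages: list[dict[str, str]], **params: Any) -> dict[str, Any]:
    # Hypothetical stand-in: bundle everything that would be sent to the completion call.
    return {"model": model, "messages": messages, **params}


def build_chat_args(
    model: str, messages: list[dict[str, str]], gen_conf: dict[str, Any], **kwargs: Any
) -> dict[str, Any]:
    # Merge first so duplicate keys resolve in favour of kwargs instead of raising
    # "got multiple values for keyword argument"; extra_body now reaches the call.
    merged = {**gen_conf, **kwargs}
    return construct_completion_args(model, messages, **merged)
```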
Stephen Hu	f1dc2df23c	Fix:Bedrock assume_role auth mode fails with LiteLLM "Extra inputs are not permitted" error (#12495 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/12489 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-08 12:53:41 +08:00
Stephen Hu	6f2fc2f1cb	refactor:re order logics in clean_gen_conf (#12391 ) ### What problem does this PR solve? re order logics in clean_gen_conf #12388 ### Type of change - [x] Refactoring	2026-01-04 10:31:56 +08:00
Magicbook1108	f8fd1ea7e1	Feat: Further update Bedrock model configs (#12029 ) ### What problem does this PR solve? Feat: Further update Bedrock model configs #12020 #12008 <img width="700" alt="2b4f0f7fab803a2a2d5f345c756a2c69" src="https://github.com/user-attachments/assets/e1b9eaad-5c60-47bd-a6f4-88a104ce0c63" /> <img width="700" alt="afe88ec3c58f745f85c5c507b040c250" src="https://github.com/user-attachments/assets/9de39745-395d-4145-930b-96eb452ad6ef" /> <img width="700" alt="1a21bb2b7cd8003dce1e5207f27efc69" src="https://github.com/user-attachments/assets/ddba1682-6654-4954-aa71-41b8ebc04ac0" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 11:32:20 +08:00
Magicbook1108	e84d5412bc	Feat: bedrock iam authentication (#12020 ) ### What problem does this PR solve? Feat: bedrock iam authentication #12008 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-18 17:13:09 +08:00
Yongteng Lei	6be0338aa0	Fix: Azure-OpenAI resource not found (#11934 ) ### What problem does this PR solve? Azure-OpenAI resource not found. #11750 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-13 11:32:46 +08:00
Magicbook1108	948bc93786	Feat: Add GPT-5.2 & pro (#11929 ) ### What problem does this PR solve? Feat: Add GPT-5.2 & pro ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-12 17:35:08 +08:00
Magicbook1108	ca2d6f3301	Fix: duplicate output by async_chat_streamly (#11842 ) ### What problem does this PR solve? Fix: duplicate output by async_chat_streamly Refact: revert manual modification ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-09 19:21:52 +08:00
N0bodycan	9863862348	fix: prevent redundant retries in async_chat_streamly upon success (#11832 ) ## What changes were proposed in this pull request? Added a return statement after the successful completion of the async for loop in async_chat_streamly. ## Why are the changes needed? Previously, the code lacked a break/return mechanism inside the try block. This caused the retry loop (for attempt in range...) to continue executing even after the LLM response was successfully generated and yielded, resulting in duplicate requests (up to max_retries times). ## Does this PR introduce any user-facing change? No (it fixes an internal logic bug).	2025-12-09 17:14:30 +08:00
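The bug fixed in #11832 is easy to reproduce in miniature: a retry loop around an async generator must stop once the stream completes, or each remaining iteration issues a fresh request. A minimal sketch of the corrected shape, where `stream_with_retries`, the `open_stream` factory, and retrying only on `ConnectionError` are assumptions for illustration:

```python
import asyncio
from typing import AsyncIterator, Callable


async def stream_with_retries(
    open_stream: Callable[[], AsyncIterator[str]],
    max_retries: int = 3,
    delay: float = 0.0,
) -> AsyncIterator[str]:
    """Yield chunks from open_stream(), retrying on ConnectionError up to max_retries times."""
    for attempt in range(max_retries):
        try:
            async for chunk in open_stream():
                yield chunk
            return  # success: without this, the loop would request the answer again
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(delay)
```

A failure after some chunks were already yielded would still replay those chunks on retry; avoiding that needs extra bookkeeping beyond this sketch.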
Yongteng Lei	c51e6b2a58	Refa: migrate CV model chat to Async (#11828 ) ### What problem does this PR solve? Migrate CV model chat to Async. #11750 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-12-09 13:08:37 +08:00
Yongteng Lei	51ec708c58	Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779 ) ### What problem does this PR solve? Cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats. ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-12-08 09:43:03 +08:00
Yongteng Lei	e3f40db963	Refa: make RAGFlow more asynchronous 2 (#11689 ) ### What problem does this PR solve? Make RAGFlow more asynchronous 2. #11551, #11579, #11619. ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-12-03 14:19:53 +08:00
Kevin Hu	a6681d6366	Revert "Refa: make RAGFlow more asynchronous 2" (#11669 ) Reverts infiniflow/ragflow#11664	2025-12-02 19:42:05 +08:00
Yongteng Lei	627c11c429	Refa: make RAGFlow more asynchronous 2 (#11664 ) ### What problem does this PR solve? Make RAGFlow more asynchronous 2. #11551, #11579, #11619. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring - [x] Performance Improvement	2025-12-02 18:57:07 +08:00
Yongteng Lei	a713f54732	Refa: add MiniMax-M2 and remove deprecated MiniMax models (#11642 ) ### What problem does this PR solve? Add MiniMax-M2 and remove deprecated models. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2025-12-02 14:43:44 +08:00
Yongteng Lei	b6c4722687	Refa: make RAGFlow more asynchronous (#11601 ) ### What problem does this PR solve? Try to make this more asynchronous. Verified in chat and agent scenarios, reducing blocking behavior. #11551, #11579. However, the impact of these changes still requires further investigation to ensure everything works as expected. ### Type of change - [x] Refactoring	2025-12-01 14:24:06 +08:00


262 Commits