### What problem does this PR solve?
1. Add license announcement
2. Add sanity check on API config
3. Add base class: BaseModel
4. Add GetBaseURL
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Switching pagesize on a chunk page did not reset the current page.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
remove old and add latest cohere models
### Type of change
- [x] Refactoring
- [x] Other (please describe): update models
### What problem does this PR solve?
Fix: Model provider add verify and fixed form in modal not resetting
issue
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Closes#15388.
Chat completion routes did not reliably honor per-request generation
settings:
- `/api/v1/chat/completions` copied generation settings with a
truthiness check, so valid zero values such as `temperature: 0`, `top_p:
0`, `frequency_penalty: 0`, `presence_penalty: 0`, and `max_tokens: 0`
were dropped.
- `/api/v1/openai/{chat_id}/chat/completions` did not forward standard
generation settings into the request-specific dialog LLM settings before
calling `async_chat`.
This PR preserves explicitly supplied generation parameters, including
zero values, and merges request-level overrides into existing dialog
settings where appropriate.
The supported generation parameter keys and merge behavior live in a
shared REST API helper to keep both completion routes aligned.
Validation:
- `git diff --check`
- `python3 -m py_compile api/apps/restful_apis/_generation_params.py
api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `uv run ruff check api/apps/restful_apis/_generation_params.py
api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `ZHIPU_AI_API_KEY=dummy uv run pytest
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py
-q -k generation_params`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
When uploading `.md` files with `parser=naive` and `delimiter="\n"`,
markdown headers (e.g., `## Quick Travel`) become separate chunks with
very short content (16-18 characters). This causes retrieval issues:
when the header is matched, the corresponding body text is not included
in the chunk.
## Related Issues
Closes#15487
## Checklist
- [x] Code changes are minimal and focused
- [x] Unit tests added (12/12 passed)
- [x] No breaking changes
### What problem does this PR solve?
Fix:
- Handle siliconflow and siliconflow_intl api_key
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
add the newanthropic and voyage models. Strip opus 4.7 and 4.8 of
certain usnspported keys
Co-authored-by: Idriss Sbaaoui <112825897+6ba3i@users.noreply.github.com>
## What
#15240
implementation for PUT /api/v1/mcp/servers/:mcp_id
## Changes
- Adds the Go implementation for `PUT /api/v1/mcp/servers/:mcp_id`.
- Wires MCP service and handler into the Go server/router for the update
route.
- Preserves Python-style behavior for ownership checks, partial update
fields, MCP type/name/URL validation, `headers`/`variables`
normalization, and tool metadata scrubbing.
### What problem does this PR solve?
Closes#15379
Around 29 Go model providers in `internal/entity/models/` share an
`http.Client` configured with `Timeout: 120 * time.Second`, and reuse
that same client for `ChatStreamlyWithSender`. Go's
`http.Client.Timeout` is a hard ceiling on the whole request that also
covers reading the response body, so it behaves as a wall clock on
streaming. Any streamed chat response that lasts longer than 120 seconds
gets cut off in the middle with a timeout error. Long generations,
reasoning model outputs, and slow or overloaded upstreams are the common
victims.
The providers that already behave correctly (`groq`, `mistral`,
`voyage`, `anthropic`) set no client `Timeout` and instead wrap each
request in a `context.WithTimeout`. This change converges the affected
providers onto that same pattern.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
mark mysql migrations as applied
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Harden `NewN1NModel` to avoid panics when `http.DefaultTransport` is a
custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `n1n_test.go` with coverage for name/factory plus
`TestN1NNewModelWithCustomDefaultTransport`.
Co-authored-by: Cursor <cursoragent@cursor.com>
### What problem does this PR solve?
This PR aligns `POST /api/v1/system/tokens` in Go with the Python
implementation.
### Type of change
- Keep the token creation flow under the system API route.
- Preserve the owner-tenant authorization check.
- Generate and persist API tokens consistently with the current Go
service flow.
- Return the created token payload in the standard API response format.
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix:
- Use @ to avoid split by `_` in model_name.
- Verify api_key when add instance.
- Pop api_key in list intances response.
- Remove useless index.
- Sort providers, instances and models by name.
- Get `is_tools` from llm_factories.json
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
implement /api/v1/datasets/<dataset_id>/metadata/config
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR fixes the issue where Qwen3.5/Qwen3.6 series models may spend
excessive time on simple document-parsing tasks, such as Auto Metadata
extraction, keyword extraction, question generation, and image
description when using the MinerU parser.
For these tasks, Qwen3.5/Qwen3.6 models may perform unnecessary
reasoning by default, which can lead to very long response times, high
token consumption, and, in some cases, potential infinite output loops.
Since Qwen3.5/Qwen3.6 multimodal models are instantiated as `CvModel`
when configured as `image2text`, the existing `enable_thinking=False`
logic in `chat_model.py` does not apply to them. This PR adds the
corresponding handling for the CV/image-to-text model path as well.
This helps reduce unnecessary thinking time, avoid potential infinite
loops, and improve parsing efficiency without noticeably affecting
output quality for these simple extraction and image-description tasks.
Fixes#15083.
### What problem does this PR solve?
Fixes#15286.
When calling `/api/v1/openai/<chat_id>/chat/completions` with `"stream":
true`, the response contains the answer **twice** — the final message
repeats everything that was already streamed.
#### Root cause
RAGFlow's `async_chat` streams the body as incremental `delta.content`
chunks, then emits a terminating `final` event whose `answer` is the
**complete** (decorated) message. The handler re-emitted that full
answer as one more `delta.content` chunk:
```python
if ans.get("final"):
if ans.get("answer"):
full_content = ans["answer"]
response["choices"][0]["delta"]["content"] = full_content # <-- whole answer again
yield ...
```
So a client accumulating `delta.content` ends up with the message
duplicated.
#### Fix
Drop the re-emission. The complete answer from the `final` event is now
surfaced **only** through the trailing chunk's `final_content` and
`reference` fields, which matches OpenAI streaming semantics: deltas are
incremental, and the final chunk carries only `finish_reason` / `usage`
(plus RAGFlow's `reference` / `final_content` extensions).
This matches the expected behavior described in the issue: "The stream
should only yield content chunks once, and the final message should only
contain reference, usage, and finish_reason."
#### Testability refactor
The streaming SSE assembly was a closure inside the request handler, so
it could only be exercised against a live server + real LLM. I extracted
it into a module-level `_stream_chat_completion_sse` async generator
(behavior-preserving) so it can be unit-tested with a fake event stream.
#### Tests
Adds
`test/unit_test/api/apps/restful_apis/test_openai_stream_no_duplicate.py`
(same import-stub pattern as the existing `test_get_agent_session.py`):
- body is streamed exactly once (the regression);
- the complete answer is never re-emitted as a content chunk;
- the terminating chunk has `finish_reason="stop"`, `content=None`, and
correct `usage`;
- `final_content` / `reference` are present on the trailing chunk;
- reasoning (`think`) deltas stream separately and are not duplicated.
> Note: this is unrelated to #15442, which only changes the `stream`
default — it does not touch the duplication logic.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Added test cases
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Feature: #14961
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
Fixes#15115.
`GET /api/v1/documents/images/<image_id>` returned **Image not found**
when the thumbnail storage object key contained hyphens (e.g.
`page-1.png`). Document APIs build URLs as `{dataset_id}-{thumbnail}`,
but `get_document_image()` used `image_id.split("-")` and required
exactly two segments, so keys like `<kb_id>-page-1.png` were rejected
even though the blob existed.
This PR splits only on the first hyphen (`split("-", 1)`) and sets
`Content-Type` from the object key extension via `CONTENT_TYPE_MAP`
instead of hardcoding `image/JPEG`.
### What problem does this PR solve?
Fixes#15117.
Chunk images are stored with `img_id = f"{bucket}-{objname}"` in
`image2id()` (`rag/utils/base64_image.py`). When loading via
`id2image()`, the code used `image_id.split("-")` and required exactly
two segments. Object keys that contain hyphens (e.g. `page-1.jpg`)
produce more than two segments, so `id2image` returns `None` and chunk
image previews fail even though the blob exists.
This is the same parsing issue as #15115 (HTTP thumbnail route); this PR
fixes the indexing/retrieval path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Test plan
- [x] `pytest test/unit_test/rag/utils/test_base64_image.py`
- [ ] Manual: index a chunk with an `objname` containing hyphens and
confirm `img_id` resolves to an image in retrieval
Fixes#15117.
### Related issues
Closes#15312
### What problem does this PR solve?
`tools/scripts/mysql_migration.py` built batch INSERT SQL for the
`tenant_model_provider` stage using f-strings with raw `llm_factory` and
`tenant_id` values. If either value contained a single quote, migration
SQL could fail; this also created unnecessary SQL-injection risk in the
migration path.
This PR replaces string interpolation with parameterized SQL
placeholders in `TenantModelProviderStage.execute()`. The migration now
safely handles quoted values and executes deterministically across
existing tenant data.
### Related issues
Closes#15358
<!-- After filing upstream, replace XXXX with your issue number. -->
---
### What problem does this PR solve?
`POST /api/v1/openai/<chat_id>/chat/completions` forwards `messages` to
`async_chat` without normalizing `content`. Downstream, `dialog_service`
assumes string content:
```python
re.sub(r"##\d+\$\$", "", m["content"])
```
OpenAI-compatible clients may send `content` as an **array** of parts
(text, `image_url`, etc.), including text-only arrays. That causes
`TypeError` and HTTP **500** instead of a valid response or a clear
**400**.
`openai_api.py` also reads `messages[-1]["content"]` directly for
`prompt` without handling list-shaped content.
This PR normalizes array `content` to a string (concatenating `type:
text` parts) before calling `async_chat`, matching a minimal
OpenAI-compat path. Image parts can be documented as unsupported or
handled in a follow-up if vision integration is required.
### What problem does this PR solve?
Fix auto metadata type issue
https://github.com/infiniflow/ragflow/issues/15323
Type information is missing at frontend - backend correctly store the
type information for the auto metadata type.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes#15245 — `POST /api/v1/chat/completions` with `stream=true`
intermittently returns 500:
```
data:{"code": 500, "message": "failed to encode response: json:
unsupported value: NaN (status code: 500)", "data": {...}}
```
…even though "the same question" works on retry.
## Root cause
The streaming path serialized the answer with bare `json.dumps(...)`
(`api/apps/restful_apis/chat_api.py:1221`). `json.dumps` defaults to
`allow_nan=True` and emits the literal token `NaN` for NaN /
Infinity float values. That is valid Python-flavored JSON but
**invalid per RFC 8259**, so downstream consumers reject it. The
reporter's gateway is Go-based and the error wording
(`failed to encode response: json: unsupported value: NaN`) is
straight from Go's `encoding/json`.
How NaN gets into the payload: retrieval scoring in
`rag/nlp/search.py` runs `np.mean(...)` over aggregations that can
be empty, and similarity denominators can be zero. Reference chunk
fields like `similarity`, `vector_similarity`, `term_similarity`
can therefore be NaN depending on which chunks a given query
retrieves — which is exactly why the failure is intermittent for
the same question.
The non-streaming branch (`get_json_result(data=answer)`,
`chat_api.py:1243`) has the same vulnerability — Quart's `jsonify`
also defaults to `allow_nan=True` and the same retrieval pipeline
feeds both branches.
`agent/tools/exesql.py:88-102` already has the same NaN/Inf guard
for SQL results. This PR brings the chat completions path up to
parity.
## Fix
Add a small `_sanitize_json_floats(obj)` helper near the top of
`api/apps/restful_apis/chat_api.py`. It walks `dict` / `list` /
`tuple` and replaces any `float` that is `NaN` or `±Infinity` with
`None`. Apply it at the two serialization boundaries:
- **Streaming branch** (`stream()`): sanitize the SSE payload before
`json.dumps`.
- **Non-streaming branch**: sanitize the `answer` dict before
`get_json_result(data=...)`.
The terminal `data:True` frame and the `code:500` error frame carry
no scores and are left untouched.
Added `import math` to the existing alphabetical import block.
No change to retrieval logic — replacing NaN with `null` at the
serialization boundary is conservative: clients still parse the
JSON, a missing-score chunk is a strictly better failure mode than
a 500 that kills the whole reply.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#15427.
All LiteLLM-routed chats fail with:
- Anthropic: `litellm.BadRequestError: AnthropicException -
{"type":"invalid_request_error","message":"model_type: Extra inputs are
not permitted"}`
- OpenAI: `litellm.BadRequestError: OpenAIException - Unknown parameter:
'model_type'`
This is a regression from v0.25.4.
#### Root cause
A chat assistant's `llm_setting` is forwarded to the model as
`gen_conf`. `llm_setting` can legitimately carry RAGFlow-internal
metadata such as `model_type` (the chat REST APIs in
`api/apps/restful_apis/` read it back out of `llm_setting`), so that key
ends up inside `gen_conf`.
`Base._clean_conf` (OpenAI-compatible providers) already **whitelists**
the keys it forwards, so direct-OpenAI providers were unaffected.
`LiteLLMBase._clean_conf` only dropped `max_tokens` and passed
everything else straight through to `litellm.acompletion`, which
forwarded `model_type` to the upstream provider — and Anthropic / OpenAI
reject it. Because both Claude and GPT route through LiteLLM, every chat
broke.
#### Fix
- Extract the allowed-key set into a shared `ALLOWED_GEN_CONF_KEYS`
constant and reuse it in `Base._clean_conf`.
- Apply the same whitelist in `LiteLLMBase._clean_conf`, plus the
LiteLLM-specific reasoning params (`thinking`, `reasoning_effort`,
`extra_body`) that the model-family policies inject for reasoning
models.
This covers all four LiteLLM completion paths (`async_chat`,
`async_chat_streamly`, `async_chat_with_tools`,
`async_chat_streamly_with_tools`), since they all route through
`_clean_conf`.
#### Tests
Adds `test/unit_test/rag/llm/test_clean_conf_whitelist.py` covering both
backends: `model_type` (and other stray keys) are dropped, genuine
generation params and `thinking` survive, `max_tokens` is removed, and
the whitelist invariants hold.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Added test cases
## What
#15240
Implements `GET /api/v1/mcp/servers` in the Go API server.
## Changes
- Added MCP server DAO list query with tenant scoping.
- Added MCP service response wrapper.
- Added MCP handler for list request parsing and response formatting.
- Wired `GET /api/v1/mcp/servers` under authenticated `/api/v1` routes.
- Initialized MCP service and handler in the Go server startup.
- update_time and update_date now both map to update_date
- create_time and create_date now both map to create_date
- default ordering now returns create_date
## API Behavior
Matches the Python endpoint behavior:
- Requires authenticated user.
- Lists MCP servers for the current user tenant.
- Supports `keywords`.
- Supports `mcp_id` and repeated/comma-separated `mcp_ids`.
- Supports `page`, `page_size`, `orderby`, and `desc`.
- Returns:
```json
{
"code": 0,
"message": "success",
"data": {
"mcp_servers": [],
"total": 0
}
}
```
## Summary
Add Tigris configuration to the Configuration and Backup & migration
pages, using the existing AWS_S3 backend — no code changes required.
Fix `region` → `region_name` in the existing S3 config example in
`backup_and_migration.md`. The code in `s3_conn.py` reads `region_name`,
so the previous field name was silently ignored.
##Context
With MinIO's open-source repository archived (#13840 on
infiniflow/ragflow), users need documented alternatives for object
storage. Tigris is S3-compatible and works with RAGFlow's existing
AWS_S3 backend out of the box.
## Changes
`configurations.md`: Added `### s3 (Tigris)` section after `### minio`,
matching the existing reference style. Includes config block, field
descriptions, and a pointer to `service_conf.yaml.template` for other
S3-compatible backends.
`backup_and_migration.md`: Added Tigris config block under single-bucket
mode. Fixed region → region_name in the existing S3 example. Added
Tigris to the supported backends list.
##Notes
No new files — edits to existing docs only.
Config field names (`access_key`, `secret_key`, `region_name`,
`endpoint_url`, `bucket`, `prefix_path`, `signature_version`,
`addressing_style`) verified against `rag/utils/s3_conn.py`.
## Summary
- Skip MinerU `header`, `footer`, and `page_number` blocks when
converting `content_list.json` into sections.
- Ignore unsupported block types explicitly so future MinerU output
types cannot re-emit the previous text block.
Fixes duplicate text in General/naive chunks when parsing PDFs via
MinerU (reported with repeated page headers and body text in slices).
Closes#15335
## Test plan
- [x] `pytest test/unit_test/deepdoc/parser/test_mineru_parser.py -v`
(4/4 passed)
## Summary
- Add custom `base_url` support to the Google Go model driver.
- Preserve Google URL suffix configuration when creating custom base URL
driver instances.
- Validate Google chat/stream request inputs before constructing the SDK
client.
- Cover Google model listing, connection checks, base URL resolution,
and request validation with focused tests.
## What changed
- `GoogleModel.NewInstance` now returns a Google driver configured with
the supplied base URL map.
- Google SDK client creation now resolves configured base URLs through
`genai.HTTPOptions.BaseURL`.
- Base URL lookup supports configured regions, empty-region keys, and
`default` fallback.
- Google chat, streaming chat, embeddings, and model listing now reject
blank API keys before creating SDK clients.
- Google chat and streaming chat now reject blank model names locally,
and streaming chat rejects a nil sender.
- Existing message handling, embeddings, pagination, and provider errors
are preserved.
## Why
Google custom model instances could not use configured base URLs because
`NewInstance` returned `nil` and the SDK client path ignored the driver
base URL map. The request validation keeps invalid Google calls from
reaching SDK client construction with blank credentials or incomplete
chat inputs.