ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
bitloi	a75ea7ba7c	Fix: Chat completion generation parameter overrides (#15389 ) ### What problem does this PR solve? Closes #15388. Chat completion routes did not reliably honor per-request generation settings: - `/api/v1/chat/completions` copied generation settings with a truthiness check, so valid zero values such as `temperature: 0`, `top_p: 0`, `frequency_penalty: 0`, `presence_penalty: 0`, and `max_tokens: 0` were dropped. - `/api/v1/openai/{chat_id}/chat/completions` did not forward standard generation settings into the request-specific dialog LLM settings before calling `async_chat`. This PR preserves explicitly supplied generation parameters, including zero values, and merges request-level overrides into existing dialog settings where appropriate. The supported generation parameter keys and merge behavior live in a shared REST API helper to keep both completion routes aligned. Validation: - `git diff --check` - `python3 -m py_compile api/apps/restful_apis/_generation_params.py api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py` - `uv run ruff check api/apps/restful_apis/_generation_params.py api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py` - `ZHIPU_AI_API_KEY=dummy uv run pytest test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py -q -k generation_params` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-03 11:46:10 +08:00
kpdev	76968af0ba	Guard missing storage blobs on preview and image endpoints (#15366 ) Fixes [#15365](https://github.com/infiniflow/ragflow/issues/15365) — `get_document_image()` and document preview call `make_response(None)` when storage returns no bytes, causing HTTP 500.	2026-06-03 11:33:03 +08:00
nickmopen	5b02fe4841	fix(api): stop duplicating answer in openai-compatible chat completions stream (#15286 ) (#15443 ) ### What problem does this PR solve? Fixes #15286. When calling `/api/v1/openai/<chat_id>/chat/completions` with `"stream": true`, the response contains the answer twice — the final message repeats everything that was already streamed. #### Root cause RAGFlow's `async_chat` streams the body as incremental `delta.content` chunks, then emits a terminating `final` event whose `answer` is the complete (decorated) message. The handler re-emitted that full answer as one more `delta.content` chunk: ```python if ans.get("final"): if ans.get("answer"): full_content = ans["answer"] response["choices"][0]["delta"]["content"] = full_content # <-- whole answer again yield ... ``` So a client accumulating `delta.content` ends up with the message duplicated. #### Fix Drop the re-emission. The complete answer from the `final` event is now surfaced only through the trailing chunk's `final_content` and `reference` fields, which matches OpenAI streaming semantics: deltas are incremental, and the final chunk carries only `finish_reason` / `usage` (plus RAGFlow's `reference` / `final_content` extensions). This matches the expected behavior described in the issue: "The stream should only yield content chunks once, and the final message should only contain reference, usage, and finish_reason." #### Testability refactor The streaming SSE assembly was a closure inside the request handler, so it could only be exercised against a live server + real LLM. I extracted it into a module-level `_stream_chat_completion_sse` async generator (behavior-preserving) so it can be unit-tested with a fake event stream. #### Tests Adds `test/unit_test/api/apps/restful_apis/test_openai_stream_no_duplicate.py` (same import-stub pattern as the existing `test_get_agent_session.py`): - body is streamed exactly once (the regression); - the complete answer is never re-emitted as a content chunk; - the terminating chunk has `finish_reason="stop"`, `content=None`, and correct `usage`; - `final_content` / `reference` are present on the trailing chunk; - reasoning (`think`) deltas stream separately and are not duplicated. > Note: this is unrelated to #15442, which only changes the `stream` default — it does not touch the duplication logic. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Added test cases --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-02 13:20:40 +08:00
kpdev	0f6f7b3c3c	fix(api): document image_id parsing for hyphenated thumbnail keys (#15115 ) (#15116 ) ### What problem does this PR solve? Fixes #15115. `GET /api/v1/documents/images/<image_id>` returned Image not found when the thumbnail storage object key contained hyphens (e.g. `page-1.png`). Document APIs build URLs as `{dataset_id}-{thumbnail}`, but `get_document_image()` used `image_id.split("-")` and required exactly two segments, so keys like `<kb_id>-page-1.png` were rejected even though the blob existed. This PR splits only on the first hyphen (`split("-", 1)`) and sets `Content-Type` from the object key extension via `CONTENT_TYPE_MAP` instead of hardcoding `image/JPEG`.	2026-06-02 10:54:14 +08:00
kpdev	a4bc066f74	fix(rag): id2image parsing for hyphenated storage object keys (#15117 ) (#15118 ) ### What problem does this PR solve? Fixes #15117. Chunk images are stored with `img_id = f"{bucket}-{objname}"` in `image2id()` (`rag/utils/base64_image.py`). When loading via `id2image()`, the code used `image_id.split("-")` and required exactly two segments. Object keys that contain hyphens (e.g. `page-1.jpg`) produce more than two segments, so `id2image` returns `None` and chunk image previews fail even though the blob exists. This is the same parsing issue as #15115 (HTTP thumbnail route); this PR fixes the indexing/retrieval path. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Test plan - [x] `pytest test/unit_test/rag/utils/test_base64_image.py` - [ ] Manual: index a chunk with an `objname` containing hyphens and confirm `img_id` resolves to an image in retrieval Fixes #15117.	2026-06-02 10:52:51 +08:00
Hernandez Avelino	09d0a17453	fix(api): handle array message content on OpenAI chat completions (#15359 ) ### Related issues Closes #15358 <!-- After filing upstream, replace XXXX with your issue number. --> --- ### What problem does this PR solve? `POST /api/v1/openai/<chat_id>/chat/completions` forwards `messages` to `async_chat` without normalizing `content`. Downstream, `dialog_service` assumes string content: ```python re.sub(r"##\d+\$\$", "", m["content"]) ``` OpenAI-compatible clients may send `content` as an array of parts (text, `image_url`, etc.), including text-only arrays. That causes `TypeError` and HTTP 500 instead of a valid response or a clear 400. `openai_api.py` also reads `messages[-1]["content"]` directly for `prompt` without handling list-shaped content. This PR normalizes array `content` to a string (concatenating `type: text` parts) before calling `async_chat`, matching a minimal OpenAI-compat path. Image parts can be documented as unsupported or handled in a follow-up if vision integration is required.	2026-06-02 10:27:03 +08:00
Rene Arredondo	e1403171f1	fix(chat): sanitize NaN/Inf scores before serializing chat completions (#15245 ) (#15266 ) ## Summary Fixes #15245 — `POST /api/v1/chat/completions` with `stream=true` intermittently returns 500: ``` data:{"code": 500, "message": "failed to encode response: json: unsupported value: NaN (status code: 500)", "data": {...}} ``` …even though "the same question" works on retry. ## Root cause The streaming path serialized the answer with bare `json.dumps(...)` (`api/apps/restful_apis/chat_api.py:1221`). `json.dumps` defaults to `allow_nan=True` and emits the literal token `NaN` for NaN / Infinity float values. That is valid Python-flavored JSON but invalid per RFC 8259, so downstream consumers reject it. The reporter's gateway is Go-based and the error wording (`failed to encode response: json: unsupported value: NaN`) is straight from Go's `encoding/json`. How NaN gets into the payload: retrieval scoring in `rag/nlp/search.py` runs `np.mean(...)` over aggregations that can be empty, and similarity denominators can be zero. Reference chunk fields like `similarity`, `vector_similarity`, `term_similarity` can therefore be NaN depending on which chunks a given query retrieves — which is exactly why the failure is intermittent for the same question. The non-streaming branch (`get_json_result(data=answer)`, `chat_api.py:1243`) has the same vulnerability — Quart's `jsonify` also defaults to `allow_nan=True` and the same retrieval pipeline feeds both branches. `agent/tools/exesql.py:88-102` already has the same NaN/Inf guard for SQL results. This PR brings the chat completions path up to parity. ## Fix Add a small `_sanitize_json_floats(obj)` helper near the top of `api/apps/restful_apis/chat_api.py`. It walks `dict` / `list` / `tuple` and replaces any `float` that is `NaN` or `±Infinity` with `None`. Apply it at the two serialization boundaries: - Streaming branch (`stream()`): sanitize the SSE payload before `json.dumps`. - Non-streaming branch: sanitize the `answer` dict before `get_json_result(data=...)`. The terminal `data:True` frame and the `code:500` error frame carry no scores and are left untouched. Added `import math` to the existing alphabetical import block. No change to retrieval logic — replacing NaN with `null` at the serialization boundary is conservative: clients still parse the JSON, a missing-score chunk is a strictly better failure mode than a 500 that kills the whole reply. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-02 10:08:34 +08:00
nickmopen	bebf6ed244	fix(llm): strip non-generation keys from gen_conf for LiteLLM providers (#15427 ) (#15432 ) ### What problem does this PR solve? Fixes #15427. All LiteLLM-routed chats fail with: - Anthropic: `litellm.BadRequestError: AnthropicException - {"type":"invalid_request_error","message":"model_type: Extra inputs are not permitted"}` - OpenAI: `litellm.BadRequestError: OpenAIException - Unknown parameter: 'model_type'` This is a regression from v0.25.4. #### Root cause A chat assistant's `llm_setting` is forwarded to the model as `gen_conf`. `llm_setting` can legitimately carry RAGFlow-internal metadata such as `model_type` (the chat REST APIs in `api/apps/restful_apis/` read it back out of `llm_setting`), so that key ends up inside `gen_conf`. `Base._clean_conf` (OpenAI-compatible providers) already whitelists the keys it forwards, so direct-OpenAI providers were unaffected. `LiteLLMBase._clean_conf` only dropped `max_tokens` and passed everything else straight through to `litellm.acompletion`, which forwarded `model_type` to the upstream provider — and Anthropic / OpenAI reject it. Because both Claude and GPT route through LiteLLM, every chat broke. #### Fix - Extract the allowed-key set into a shared `ALLOWED_GEN_CONF_KEYS` constant and reuse it in `Base._clean_conf`. - Apply the same whitelist in `LiteLLMBase._clean_conf`, plus the LiteLLM-specific reasoning params (`thinking`, `reasoning_effort`, `extra_body`) that the model-family policies inject for reasoning models. This covers all four LiteLLM completion paths (`async_chat`, `async_chat_streamly`, `async_chat_with_tools`, `async_chat_streamly_with_tools`), since they all route through `_clean_conf`. #### Tests Adds `test/unit_test/rag/llm/test_clean_conf_whitelist.py` covering both backends: `model_type` (and other stray keys) are dropped, genuine generation params and `thinking` survive, `max_tokens` is removed, and the whitelist invariants hold. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Added test cases	2026-06-02 10:04:11 +08:00
monsterDavid	d398d617ca	fix(mineru): skip page chrome blocks to prevent duplicate chunks (#15387 ) ## Summary - Skip MinerU `header`, `footer`, and `page_number` blocks when converting `content_list.json` into sections. - Ignore unsupported block types explicitly so future MinerU output types cannot re-emit the previous text block. Fixes duplicate text in General/naive chunks when parsing PDFs via MinerU (reported with repeated page headers and body text in slices). Closes #15335 ## Test plan - [x] `pytest test/unit_test/deepdoc/parser/test_mineru_parser.py -v` (4/4 passed)	2026-06-01 20:15:04 +08:00
Wang Qi	1a6df01b53	Bug fix: Enhance embeding model to give better error message (#15346 ) To resolve https://github.com/infiniflow/ragflow/issues/15343 enhance the model embedding message to give extact failure message to customer. # QWen ## Retrieval <img width="3321" height="1033" alt="image" src="https://github.com/user-attachments/assets/6b82921a-a3a7-4a33-a383-1cf316398ee2" /> ## Chat <img width="2241" height="311" alt="image" src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716" /> # SiliconFlow ## Retrieval <img width="3321" height="1033" alt="image" src="https://github.com/user-attachments/assets/ee2cd191-a27d-4729-b53d-2fbdb4e352cd" /> ## Chat <img width="1562" height="210" alt="image" src="https://github.com/user-attachments/assets/10376a8e-a3f4-422f-bc2e-96f2a8a96448" /> # Baichuan ## Retrieval <img width="3321" height="1107" alt="image" src="https://github.com/user-attachments/assets/dcb5409d-f7fc-4804-b186-5e1ee11e09c4" /> ## Chat <img width="2241" height="311" alt="image" src="https://github.com/user-attachments/assets/ec311365-62d5-407a-8915-5c8d72be9716" /> # Zhipu zhipu is good.	2026-06-01 19:18:16 +08:00
kpdev	252cc19f93	Infer Content-Type for document image endpoint (#15368 ) ## Summary Fixes [#15367](https://github.com/infiniflow/ragflow/issues/15367) — `GET /api/v1/documents/images/<image_id>` always returned `Content-Type: image/JPEG` even for PNG/WebP chunk images and extensioned thumbnails. ## Related Issue Fixes #15367 ## Change Type - [x] Bug fix - [x] Regression tests - [ ] New feature - [ ] Refactor ## What Changed - Added `_detect_image_content_type_from_bytes()` — PNG/JPEG/GIF/WebP/BMP magic-byte detection - Added `_content_type_for_document_image()` — object-key extension via `CONTENT_TYPE_MAP`, then magic bytes, else `application/octet-stream` - `get_document_image()` — set inferred `Content-Type` instead of hardcoded `image/JPEG` - Also guards missing storage blob (`Image not found.`) to avoid `make_response(None)` (same handler; complements #15365) ## Files Changed \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/document_api.py` \| MIME inference helpers + handler update \| \| `test/testcases/test_web_api/test_document_app/test_document_metadata.py` \| 3 unit tests \| ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_object_extension_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_magic_bytes_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_missing_blob_unit -v ``` ## Test Plan - [x] `.png` object key → `image/png` - [x] Extensionless chunk key + PNG bytes → `image/png` (magic bytes) - [x] Missing blob → 4xx `"Image not found."` - [ ] CI green	2026-06-01 19:08:32 +08:00
kpdev	b35266e9a5	Return 4xx when file download storage blob is missing (#15371 ) ## Summary Fixes [#15369](https://github.com/infiniflow/ragflow/issues/15369) — `GET /api/v1/files/<file_id>` calls `make_response(None)` when both primary and fallback storage lookups return empty, causing HTTP 500. ## Related Issue Fixes #15369 ## Change Type - [x] Bug fix - [x] Regression tests ## What Changed - `file_api.download()` — after fallback `STORAGE_IMPL.get`, return `get_error_data_result(message="This file is empty.")` when `not blob`, matching document REST download semantics. ## Files Changed \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/file_api.py` \| Empty-blob guard before `make_response()` \| \| `test/testcases/test_web_api/test_file_app/test_file_routes_unit.py` \| Regression test \| ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_missing_blob_returns_error -v pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_falls_back_to_document_storage -v ``` ## Test Plan - [x] Both storage paths empty → `"This file is empty."` (no `make_response(None)`) - [x] Existing fallback success test still passes - [ ] CI green	2026-06-01 19:08:06 +08:00
Idriss Sbaaoui	da1ed6f0e7	Feat: add new tests and tescases for restful api suite (#15347 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-06-01 11:02:40 +08:00
web-dev0521	cd18cfab79	feat(connector): implement Outlook data source connector (issue #15332 ) (#15333 ) ### What problem does this PR solve? Closes #15332. RAGFlow can index Gmail and generic IMAP mailboxes but had no native connector for Outlook / Microsoft 365 mail. Organisations on Microsoft 365 had no way to bring mailbox content into a knowledge base through Microsoft Graph. This PR adds a net-new Outlook data source that: - Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives). - Pages over `/users/{id}/mailFolders/{folder}/messages/delta` per mailbox and persists `@odata.deltaLink` values in `OutlookCheckpoint.delta_links`, so incremental syncs only fetch changed messages. - Supports two scoping modes: - Tenant-wide (default): enumerates every user in the tenant via `/users` and syncs each mailbox. Requires `User.Read.All`. - Targeted: when `user_ids` is provided (comma-separated UPNs or object IDs), only those mailboxes are synced. `User.Read.All` is not needed in this mode. - Lets the caller pick the mail folder (`inbox`, `sentitems`, `archive`, ...). Defaults to `inbox`. - Maps each message to a `Document` shaped after the Gmail connector: one `TextSection` carrying `From/To/Cc/Subject` headers + body, with HTML bodies stripped to text inline (no extra dependency). - Surfaces typed errors on the validation probe: 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with `Mail.Read` / `User.Read.All` hint), 404 on a configured mailbox → `ConnectorValidationError`, 5xx → `UnexpectedValidationError`. - Skips messages flagged `@removed` by the delta semantics and messages whose `receivedDateTime` is older than `poll_range_start`. #### Files \| File \| Change \| \|------\|--------\| \| `common/data_source/outlook_connector.py` \| New — `OutlookConnector` (`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`) + `OutlookCheckpoint` + tiny `_strip_html` helper. \| \| `common/data_source/config.py` \| `DocumentSource.OUTLOOK = "outlook"`. \| \| `common/constants.py` \| `FileSource.OUTLOOK = "outlook"`. \| \| `common/data_source/__init__.py` \| Export `OutlookConnector`. \| \| `rag/svr/sync_data_source.py` \| `Outlook(SyncBase)` with `batch_size` normalisation, CSV/list parsing of `user_ids`; registered in `func_factory`. \| \| `web/src/pages/user-setting/data-source/constant/index.tsx` \| `DataSourceKey.OUTLOOK`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder, user_ids, batch_size), default values. \| \| `web/src/locales/en.ts`, `web/src/locales/zh.ts` \| `outlookDescription` + 5 tooltip keys (EN + ZH). \| \| `test/unit_test/data_source/test_outlook_connector_unit.py` \| New — 19 unit tests (`p1`/`p2`/`p3`) covering auth, validation (tenant-wide vs specific user vs error paths), checkpoint helpers, user enumeration pagination, message filtering, HTML body stripping. \| #### Required Azure AD permissions - `Mail.Read` (Application, admin-granted) — always. - `User.Read.All` (Application, admin-granted) — only when `user_ids` is left blank so the connector can enumerate mailboxes. #### Out of scope - Attachment indexing. The current connector emits message body + headers; binary attachments are flagged via `metadata.has_attachments` but not pulled. Adding attachment hydration is straightforward but scoped out per the issue's "decide whether attachments are indexed in the first version" note. - Delegated (per-user) OAuth. The connector uses app-only credentials, consistent with the SharePoint / Teams precedent in this codebase. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 21:52:29 +08:00
galuis116	d1f6594618	Fix: JWT algorithm-confusion in OIDC ID token verification (#15181 ) ### What problem does this PR solve? Closes #15180. `OIDCClient.parse_id_token` in `api/apps/auth/oidc.py` read the JWT signing algorithm from the unverified JWT header and passed it through to `jwt.decode(..., algorithms=[alg], ...)` as the trust anchor. This is the textbook JWT algorithm-confusion vulnerability (CWE-345 / CWE-347). Any unauthenticated client capable of reaching the OIDC callback could take over an arbitrary account on any RAGFlow deployment with OIDC login enabled: 1. `alg: "none"` — present a JWT with `{"alg": "none"}` and no signature segment → `jwt.decode(..., algorithms=["none"])` → PyJWT's `NoneAlgorithm` accepts the token without verification → login as any user. 2. RSA / HMAC confusion — fetch the public RSA key from the provider's JWKS (it's public), forge a JWT with `{"alg": "HS256"}` HMAC-signed using the public-key bytes as the secret → `jwt.decode(..., algorithms=["HS256"], key=public_key)` → verifier accepts → login as any user. (Modern PyJWT independently refuses to use a PEM-formatted key as an HMAC secret, which mitigates this leg for PEM key formats; the fix here is the only mitigation for raw / DER / JWK octet keys and for older PyJWT versions.) ### What changed `api/apps/auth/oidc.py`: - New module constants `_ALLOWED_OIDC_SIGNING_ALGS` (asymmetric-only: `RS`, `ES`, `PS`, `EdDSA` — explicitly excludes `none` and `HS`) and `_DEFAULT_OIDC_SIGNING_ALGS = ("RS256",)` (the OIDC Core 1.0 §2 spec default). - New helper `_resolve_id_token_signing_algs(metadata)` — intersects the provider's advertised `id_token_signing_alg_values_supported` from `/.well-known/openid-configuration` with the safe allowlist; falls back to RS256 when the field is missing or contains only unsafe values. - `OIDCClient.__init__` now stores the resolved allowlist on `self.id_token_signing_algs` — pinned once, from a trusted source, at construction time. - `parse_id_token` no longer calls `jwt.get_unverified_header` and no longer reads `alg` from the JWT header. It passes `self.id_token_signing_algs` to `jwt.decode(..., algorithms=...)`. `PyJWKClient.get_signing_key_from_jwt` still reads the `kid` from the header internally for JWKS lookup — that's fine, `kid` is not a security decision; the signature still proves which key was actually used. `test/testcases/test_web_api/test_auth_app/test_oidc_client_unit.py`: - Existing `test_parse_id_token_success_and_error` drops its `jwt.get_unverified_header` mock (no longer called by `parse_id_token`). - `_metadata` and `_make_client` helpers grew an optional `signing_algs` parameter so tests can configure what the discovery document advertises. - New `TestSSRFValidation` / algorithm-confusion regression block (7 tests): - `test_id_token_signing_algs_default_to_rs256_when_metadata_missing` - `test_id_token_signing_algs_intersect_metadata_with_safe_allowlist` - `test_id_token_signing_algs_fall_back_when_only_unsafe_advertised` - `test_id_token_signing_algs_ignores_non_string_entries` - `test_id_token_signing_algs_handles_non_list_metadata_field` - `test_parse_id_token_passes_pinned_algorithms_to_jwt_decode` — sabotages `jwt.get_unverified_header` to raise on call, proving the verification path never consults the unverified header. - `test_parse_id_token_rejects_alg_none` — uses real PyJWT to encode an `alg: "none"` token; `parse_id_token` raises `ValueError("Error parsing ID Token: …")` instead of accepting it. - `test_parse_id_token_rejects_hs256_when_allowlist_is_asymmetric` — uses real PyJWT to forge an `alg: "HS256"` token with a non-PEM shared secret (so PyJWT's incidental PEM-as-HMAC refusal isn't what blocks it); `parse_id_token` raises because `HS256` is not in the pinned allowlist. Sanity-checked end-to-end with real PyJWT outside the project test runner: - `alg=none` forged token + `algorithms=["RS256"]` → `InvalidAlgorithmError` ✓ - `alg=HS256` forged token + `algorithms=["RS256"]` → `InvalidAlgorithmError` ✓ - Same `alg=HS256` token + `algorithms=["HS256"]` → accepted ({'sub': 'admin'}) — confirming the attack path was real before the fix. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: galuis116 <contact@duerrimports.com>	2026-05-29 19:37:01 +08:00
kpdev	cb1ea5a47f	Validate chunk image_base64 before doc-store write (#15364 ) ## Summary Fixes [#15363](https://github.com/infiniflow/ragflow/issues/15363) — `add_chunk` / `update_chunk` indexed chunks with `image_id` before validating or storing `image_base64`, leaving orphan chunks on invalid input. ## Related Issue Fixes #15363 ## Change Type - [x] Bug fix - [x] Regression tests ## What Changed - Added `_decode_chunk_image_base64()` — strict base64 decode with structured 4xx errors - Added `_store_chunk_image_or_error()` — catches `store_chunk_image` failures - `add_chunk` / `update_chunk`: decode + store image before `docStoreConn.insert` / `update`; only set `img_id` after successful storage ## Files Changed \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/chunk_api.py` \| Helpers + reorder image handling \| \| `test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py` \| 3 regression tests \| ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_invalid_image_base64_does_not_index_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_update_chunk_invalid_image_base64_does_not_update_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_valid_image_base64_stores_before_insert -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py -v ``` ## Test Plan - [x] Invalid `image_base64` on add → 4xx, no doc-store insert - [x] Invalid `image_base64` on update → 4xx, no doc-store update - [x] Valid PNG base64 on add → image stored, chunk indexed with `img_id` - [ ] CI green	2026-05-29 19:36:46 +08:00
monsterDavid	53bb2bd9e8	fix(metadata): preserve empty AND results across filter conditions (#15386 ) ## Summary - Fix `meta_filter()` AND logic so an empty result from an early condition is not overwritten when a later condition matches. - Add regression tests for empty-first AND, successful AND intersection, and OR behavior after an empty first condition. Fixes incorrect `/retrieval` metadata filtering when multiple AND conditions are used and the first condition matches no documents. Closes #15360 ## Test plan - [x] `pytest test/unit_test/common/test_metadata_filter_operators.py -v` (19/19 passed)	2026-05-29 19:33:26 +08:00
web-dev0521	bda2117a25	feat(connector): implement OneDrive data source connector (issue #15330 ) (#15331 ) ### What problem does this PR solve? Closes #15330. RAGFlow had no connector for OneDrive / OneDrive for Business. Users who store working documents in OneDrive could not index them into a knowledge base without manually downloading and re-uploading files. This PR adds a net-new OneDrive data source that: - Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives). - Enumerates every drive visible to the service principal and pages through `/drives/{id}/root/delta`, persisting `@odata.deltaLink` values per drive so subsequent syncs only fetch changed items. - Optionally narrows ingestion to a sub-folder (`folder_path`) without needing a separate code path. - Surfaces typed errors on the validation probe (`GET /drives?$top=1`): 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with a `Files.Read.All` hint), 5xx → `UnexpectedValidationError`. - Filters folders, soft-deleted items, and unsupported extensions (`.pdf .docx .doc .xlsx .xls .pptx .ppt .txt .md .csv`). #### Files \| File \| Change \| \|------\|--------\| \| `common/data_source/onedrive_connector.py` \| New — `OneDriveConnector` + `OneDriveCheckpoint`. \| \| `common/data_source/config.py` \| `DocumentSource.ONEDRIVE = "onedrive"`. \| \| `common/constants.py` \| `FileSource.ONEDRIVE = "onedrive"`. \| \| `common/data_source/__init__.py` \| Export `OneDriveConnector`. \| \| `rag/svr/sync_data_source.py` \| `OneDrive(SyncBase)` with `batch_size` normalisation; registered in `func_factory`. \| \| `web/src/pages/user-setting/data-source/constant/index.tsx` \| `DataSourceKey.ONEDRIVE`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder_path, batch_size), default values. \| \| `web/src/locales/en.ts`, `web/src/locales/zh.ts` \| `onedriveDescription` + 4 tooltip keys (EN + ZH). \| \| `test/unit_test/data_source/test_onedrive_connector_unit.py` \| New — 13 unit tests (`p1`/`p2`) covering auth, validation, checkpoint helpers, and document filtering. \| #### Required Azure AD permission `Files.Read.All` (Application, admin-granted). #### Out of scope - Interactive end-user OAuth (delegated permissions) — the connector uses app-only credentials, consistent with the SharePoint / Teams precedent. - Binary download of file contents — the sync layer emits `Document`s carrying `webUrl` + metadata; bytes are hydrated downstream by the parse pipeline. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 19:26:06 +08:00
buua436	bd6251f462	Fix: default OpenAI chat completions to non-stream (#15394 ) ### What problem does this PR solve? default OpenAI chat completions to non-stream when `stream` is omitted https://github.com/infiniflow/ragflow/issues/15356 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 17:47:47 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
web-dev0521	98bc9ca6ac	feat: implement Microsoft Teams data source connector (#15193 ) ### What problem does this PR solve? Closes #15191. RAGFlow shipped a Microsoft Teams connector stub (`common/data_source/teams_connector.py`) whose document-loading methods all returned `[]`, `Teams._generate()` was a `pass`, and Teams was commented out of the data-source settings UI. As a result there was no way to index Teams channel conversations into a knowledge base. This PR implements the connector end to end on top of Microsoft Graph (Office365-REST-Python-Client). It shares the MSAL client-credentials auth shape with the SharePoint connector. Backend - `common/data_source/teams_connector.py` - `load_credentials()` now builds the Graph client using an MSAL client-credentials token callback — the form `GraphClient` actually expects. (The previous stub passed a raw access-token string to `GraphClient(...)`, which is not how that client is driven.) Token acquisition is lazy, so credential loading performs no network call. - `validate_connector_settings()` lists teams via Graph. - `load_from_checkpoint()` is now a generator that pages teams → channels → messages, flattens each top-level post together with its replies into one blob-based `Document` (`extension` `.txt`/`.html`, `blob`, `size_bytes`, `doc_updated_at`). Incremental syncs are bounded by message `lastModifiedDateTime` (falling back to `createdDateTime`). Per-message errors surface as `ConnectorFailure` instead of aborting the run. - `retrieve_all_slim_docs_perm_sync()` yields id-only `SlimDocument` batches and the checkpoint helpers return proper `TeamsCheckpoint`s. - ACL → `ExternalAccess` mapping is intentionally left best-effort (`load_from_checkpoint_with_perm_sync` delegates to the standard load) because the sync pipeline does not currently persist `ExternalAccess`. - `rag/svr/sync_data_source.py` - Implemented `Teams._generate()` using the existing `CheckpointOutputWrapper` pattern (same shape as Confluence/Jira/Google Drive), supporting full reindex and incremental polling from `poll_range_start`. - `TeamsConnector` is already exported from `common/data_source/__init__.py`. Frontend (`web/`) - Enabled the `TEAMS` data-source enum and added its form fields (`tenant_id`, `client_id`, `client_secret`), default values, display metadata, and a Teams icon. - Added `teamsDescription` / `teamsTenantIdTip` to `en.ts` and `zh.ts`. Tests - `test/unit_test/data_source/test_teams_connector_unit.py`: mock-based unit tests covering credential loading (incomplete creds raise, happy path sets the Graph client, fetch-without-creds raises), post/reply flattening (incl. the HTML vs text extension), incremental `lastModifiedDateTime` filtering, and slim-doc listing. All 6 pass; `ruff check` is clean. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-28 17:10:38 +08:00
web-dev0521	5de021ebb4	feat: implement Slack data source connector (#15188 ) ### What problem does this PR solve? Closes #15187. RAGFlow shipped a Slack connector (`common/data_source/slack_connector.py`) but it was never usable: `Slack._generate()` in the sync worker was a `pass` stub, the connector's document-generating code was incompatible with the current data model, and Slack was commented out of the data-source settings UI. As a result, teams had no way to index Slack channels/threads into a knowledge base. This PR completes the connector end to end. Backend - `common/data_source/slack_connector.py` - Rewrote `thread_to_doc` to produce a blob-based `Document` (`extension`/`blob`/`size_bytes`). The previous implementation built the doc with a `sections=[...]` argument and omitted the now-required `blob`/`extension`/ `size_bytes` fields, so it raised a validation error against the current `Document` model. Thread messages are now cleaned and flattened into a single UTF-8 text blob. - Added `load_from_state()` / `poll_source(start, end)` generators. The connector's checkpoint interface is a no-op stub, so both full and incremental syncs run through a single channel-iterating generator built on the existing module helpers (`get_channels`, `filter_channels`, `get_channel_messages`, `_process_message`), with per-channel thread de-duplication. - `rag/svr/sync_data_source.py` - Implemented `Slack._generate()`. Credentials are loaded via `StaticCredentialsProvider` (the connector requires `slack_bot_token` and does not support `load_credentials`). Supports full reindex and incremental polling from `poll_range_start`, plus the optional channel filter. Modeled on the Confluence/Dropbox wrappers. - `SlackConnector` was already exported from `common/data_source/__init__.py`. Frontend (`web/`) - Enabled the `SLACK` data-source enum and added its form fields (Slack bot token + optional channel filter), default values, display metadata, and a Slack icon. - Added `slackDescription` / `slackBotTokenTip` / `slackChannelsTip` strings to `en.ts` and `zh.ts`. Tests - `test/unit_test/data_source/test_slack_connector_unit.py`: unit tests covering credential loading (`load_credentials` raises, `set_credentials_provider` initializes clients, missing credentials raises) and document generation (standalone message + flattened thread, blob/extension/size_bytes/metadata, and the incremental poll time window). All 5 pass; `ruff check` is clean. Required Slack scopes: `channels:read`, `channels:history`, `users:read`. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-28 15:46:07 +08:00
web-dev0521	c4c4e228e3	feat: implement SharePoint data source connector (#15190 ) ### What problem does this PR solve? Closes #15189. RAGFlow shipped a SharePoint connector stub (`common/data_source/sharepoint_connector.py`) whose document-loading methods all returned `[]`, `SharePoint._generate()` was a `pass`, and SharePoint was commented out of the data-source settings UI. As a result there was no way to index files stored in SharePoint document libraries. This PR implements the connector end to end on top of Microsoft Graph (Office365-REST-Python-Client). Backend - `common/data_source/sharepoint_connector.py` - `load_credentials()` now builds the Graph client using an MSAL client-credentials token callback — the form `GraphClient` actually expects. (The previous stub passed a raw access-token string to `GraphClient(...)`, which is not how that client is driven.) Token acquisition is lazy, so credential loading does no network call. - `validate_connector_settings()` resolves the configured site via Graph. - `load_from_checkpoint()` is now a generator that enumerates every document library under the site, walks folders depth-first, downloads each file, and yields blob-based `Document` objects (`extension` / `blob` / `size_bytes` / `doc_updated_at`). Incremental syncs are bounded by file `lastModifiedDateTime`. Per-file errors are surfaced as `ConnectorFailure` rather than aborting the run. - `retrieve_all_slim_docs_perm_sync()` yields id-only `SlimDocument` batches (no downloads) and the checkpoint helpers return proper checkpoints. - ACL → `ExternalAccess` mapping is intentionally left best-effort (`load_from_checkpoint_with_perm_sync` delegates to the standard load) because the sync pipeline does not currently persist `ExternalAccess`; this can be extended once that plumbing exists. - `rag/svr/sync_data_source.py` - Implemented `SharePoint._generate()` using the existing `CheckpointOutputWrapper` pattern (same shape as Confluence/Jira/Google Drive), supporting full reindex and incremental polling from `poll_range_start`. - `SharePointConnector` is already exported from `common/data_source/__init__.py`. Frontend (`web/`) - Enabled the `SHAREPOINT` data-source enum and added its form fields `site_url`, `tenant_id`, `client_id`, `client_secret`), default values, display metadata, and a SharePoint icon. - Added `sharepointDescription` / `sharepointSiteUrlTip` to `en.ts` and `zh.ts`. Tests - `test/unit_test/data_source/test_sharepoint_connector_unit.py`: mock-based unit tests covering credential loading (incomplete creds raise, happy path sets the Graph client, fetch-without-creds raises), drive traversal + file download, incremental `lastModifiedDateTime` filtering, and slim-doc listing. All 6 pass; `ruff check` is clean. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-28 13:26:08 +08:00
Wang Qi	0aff6a3f32	Feature: Allow page_size max value 100 (#15292 ) Feature: Allow page_size max value 100	2026-05-28 11:13:01 +08:00
Idriss Sbaaoui	0940f1a135	Feat: add new tests and tescases for restful api suite (#15299 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-28 11:03:12 +08:00
Jack	f0cb7a544b	Refactor: Task Executor (#15154 ) ### What problem does this PR solve? 1. Break huge function into smaller pieces 2. Add unit test for the smaller pieces function 3. Layer-ed design a. infra layer - task_context.py, recording_context.py, write_operation_interceptor.py, ... b. service layer - *_service.py c. business layer - task_handler.py 4. Default behavior: use "refactor-ed version" - can switch to original version by change env variable ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring - [x] Performance Improvement --------- Co-authored-by: Liu An <asiro@qq.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-05-27 21:54:17 +08:00
Idriss Sbaaoui	1f34a18242	Feat: add new tests and tescases for restful api suite (#15277 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-27 13:07:49 +08:00
Liu An	0639dba89a	Docs: Update version references to v0.25.6 in READMEs and docs (#15248 ) ### What problem does this PR solve? - Update version tags in README files (including translations) from v0.25.5 to v0.25.6 - Modify Docker image references and documentation to reflect new version - Update version badges and image descriptions - Maintain consistency across all language variants of README files ### Type of change - [x] Documentation Update	2026-05-26 19:45:43 +08:00
Idriss Sbaaoui	036ed5b236	Feat: add new tests and tescases for restful api suite (#15230 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-26 13:24:22 +08:00
天海蒼灆	0d2a17254c	fix(api): allow canvas_type in agent create and update APIs (#15201 ) ### What problem does this PR solve? Creating or updating an agent via `POST /api/v1/agents` and `PUT /api/v1/agents/{agent_id}` did not persist `canvas_type` because the handler `req` dict never assigned the field before `UserCanvasService.save` / `update_by_id`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 11:31:46 +08:00
Idriss Sbaaoui	c3b38d397f	Feat: add new tests and tescases for restful api suite (#15223 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-26 10:08:45 +08:00
Idriss Sbaaoui	7d200d5bd7	Feat: add new tests and tescases for restful api suite (#15208 ) ### What problem does this PR solve? extend restful api suite ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-05-25 19:03:56 +08:00
Wang Qi	f4d36f7082	Fix #15170 cannot filter document status (#15216 ) Fix #15170 cannot filter document status ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 18:58:37 +08:00
Wang Qi	4776bfa8a2	Fix: Correct the API path (#15204 ) Follow on PR #15146 to reslove the backwad compatability issue. 1. /agents/<attachment_id>/download -> /agents/attachments/<attachment_id>/download ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 17:11:24 +08:00
Jonathan Chang	9d1006e4ec	fix: The output of the parser in the ingestion pipeline contains HTML tags (#14920 ) ## Summary This change fixes ingestion quality issues where MinerU parser output may contain HTML fragments (for example, table-related tags like `<tr>`, `<td>`, `<br>`), which were previously passed directly into chunking/tokenization and degraded chunk quality. The fix adds a sanitization step in the MinerU parser path so parsed sections are normalized to clean text before chunking. ## Change Type (select all) - [x] Bug fix - [x] Ingestion pipeline improvement - [x] Parser/chunking quality fix ## Related Issue - https://github.com/infiniflow/ragflow/issues/14831	2026-05-25 16:06:36 +08:00
Wang Qi	5069561abc	Fix /chat/completions to allow send only the latest message (#15197 ) ### What problem does this PR solve? 1. Fix /chat/completions to send only the latest message 2. Allo chat stream=False ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 14:23:33 +08:00
Wang Qi	bb148edf4c	Revert "Fix: /openai/<chat_id>/chat/completions not aware of session_id" (#15205 ) Reverts infiniflow/ragflow#15155 because this is never supported, keep it as it is.	2026-05-25 14:23:10 +08:00
Wang Qi	e6dd397531	Fix: /openai/<chat_id>/chat/completions not aware of session_id (#15155 ) ### What problem does this PR solve? Fix: /openai/<chat_id>/chat/completions not aware of session_id ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-22 20:38:56 +08:00
Wang Qi	87918650ff	Refactor: Move API files (#15151 ) Refactor: Move API files	2026-05-22 17:44:05 +08:00
kpdev	faf77a5a8a	feat(evaluation): track token usage in evaluation results (#13487 ) ## Summary Implements the TODO in `evaluation_service.py`: Track token usage in evaluation results. ## Changes - Import `num_tokens_from_string` from `common.token_utils` - Prompt tokens: Use the full prompt returned by `async_chat` when available (includes system prompt + knowledge base + query), otherwise fall back to the question token count - Completion tokens: Count tokens in the generated answer - Storage: Store `token_usage` as `{prompt_tokens, completion_tokens, total_tokens}` in each `EvaluationResult` instead of `None` ## Why The evaluation pipeline previously saved `token_usage: None` for every result. This change allows downstream consumers (e.g. evaluation dashboards, cost tracking) to see approximate token usage per test case using the same tokenizer (tiktoken cl100k_base) used elsewhere in RAGFlow. ## Testing - No new tests added; existing evaluation flow unchanged - Token counting uses existing `num_tokens_from_string` utility --------- Co-authored-by: kiannidev <kiannidev@users.noreply.github.com>	2026-05-22 15:19:53 +08:00
Full Stack Developer	8f90740d2e	feat: pass chat_template_kwargs through agent chat completion (#14542 ) ### What problem does this PR solve? The agent API currently does not pass chat_template_kwargs to the underlying LLM call path, so clients cannot control template-level model behavior (such as thinking-mode toggles) when invoking /agents/chat/completion. This PR adds passthrough support for chat_template_kwargs across agent execution flows (session and non-session, streaming and non-streaming) by propagating it through canvas runtime state and into LLM invocation kwargs. This addresses the feature gap raised in [Issue #14182](https://github.com/infiniflow/ragflow/issues/14182). Closes #14182 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-22 15:15:49 +08:00
dale053	c33d0b8081	fix: prevent sensitive fields from leaking in user API responses (#14792 ) Closes #14789 ### What problem does this PR solve? User API endpoints (`login`, `user_profile`, `user_add`, `forget_reset_password`) were returning full user objects via `to_json()` / `to_dict()`, which included sensitive fields like `password` and `access_token` in the response body. This leaks credentials to the client. This PR adds a `to_safe_dict()` method on the `User` model that strips sensitive fields (`password`, `access_token`) and replaces all affected call sites to use it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-22 15:14:26 +08:00
Wang Qi	a9ec78cb9c	Refactor: enahnce retry and timeout (#14983 ) ### What problem does this PR solve? 1. Enhance retry and timeout, and adjust the default timeout 2. NER: spacy do not batch chunks 3. extract _has_cancel_and_exit 4. enhance log messages ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-22 13:16:39 +08:00
dale053	6ab25bf715	fix: block SSRF in misc_utils.download_img for OAuth avatars (#14868 ) ### What problem does this PR solve? Closes #14865 `download_img` in `common/misc_utils.py` is used for OAuth avatar URLs. The previous implementation called `async_request` from `common.http_client`, which followed redirects without re-validating each hop and did not apply the same SSRF protections as this path needs. That made it possible to reach non-public or disallowed targets (for example via redirects or unsafe URLs) when fetching avatars. This change replaces that flow with an explicit, bounded fetch: each URL (including every redirect target) is checked with `common.ssrf_guard.assert_url_is_safe`, DNS is pinned with `pin_dns_global`, `httpx` streams the body with `follow_redirects=False` and a manual redirect loop (capped by `RAGFLOW_OAUTH_AVATAR_MAX_REDIRECTS`), and total response size is capped (`RAGFLOW_OAUTH_AVATAR_MAX_BYTES`). Timeouts, proxy, and user agent align with `HTTP_CLIENT_*` env vars without importing `http_client`, so lightweight tests stay simple. Unit tests cover empty/None URLs, loopback, cloud metadata-style addresses, and disallowed schemes so SSRF regressions are caught early. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-22 12:12:04 +08:00
buua436	ea1764a7dc	Revert "fix(api): infer /documents/{id}/download Content-Type from filename when ext is omitted (#15052 )" (#15138 ) Reverts infiniflow/ragflow#15053	2026-05-22 11:46:01 +08:00
Jin Hai	775ea55679	Docs: update python version to 3.13 (#15103 ) ### What problem does this PR solve? 1. update python version to 3.13 2. upgrade ormsgpack to 1.6.0 ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-21 19:09:19 +08:00
Haruko386	a725e114f9	Go: implement ASR and TTS for Xinference (#15096 ) ### What problem does this PR solve? implement ASR and TTS for Xinference ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-21 18:28:06 +08:00
dripsmvcp	12a148d541	fix(api): guard against missing session in get_agent_session (#15011 ) `GET /agents/<agent_id>/sessions/<session_id>` crashed with `AttributeError: 'NoneType' object has no attribute 'to_dict'` when the session lookup failed: `_, conv = API4ConversationService.get_by_id(...)` returned `(False, None)`, then `conv.to_dict()` was called unconditionally. This is reachable in multi-instance deployments: the session row may not yet be visible on the node servicing the immediate follow-up GET after a session is created on a different node. Add the same `if not exists` guard already used by every other call site of `API4ConversationService.get_by_id` (see agent_api.py:1147, sdk/session.py:179, conversation_service.py:248, canvas_service.py:323). Closes #14989 ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-05-21 15:37:10 +08:00
dripsmvcp	ce9a4425d2	fix(imap): handle multi-address headers in _parse_singular_addr (#15006 ) Replace the RuntimeError with a warning + first-address fallback so a single email whose From header contains multiple addresses no longer crashes the entire IMAP sync task. Also add regression tests covering: - #14963: RFC 5322 quoted display names with commas (e.g. "Schlüter, Sabine" <s@x>) parsed as one address, not two. - #14964: multi-address headers warn instead of raising. Closes #14964 Refs #14963	2026-05-21 15:37:02 +08:00
天海蒼灆	3e5b11a523	Feat(browser control)：Add new agent component 'browser' to control browser by AI (#14888 ) ### What problem does this PR solve? This PR adds a new `Browser` operator to Agent workflows, enabling prompt-driven browser automation in RAGFlow.Technically based ‘Browser-Use’ It includes: - Backend browser component execution with tenant LLM integration - Upload source support (file IDs, URLs, variables, CSV/JSON array) - Downloaded file persistence to RAGFlow storage - Frontend node/operator integration, form config, icon, and i18n updates - Unit tests for upload/download and ID parsing logic - Dependency and Docker updates for browser-use runtime support ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-21 15:32:32 +08:00

1 2 3 4 5 ...

343 Commits