ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
writinwaters	c2597f132e	Docs: Added a guide on how to ingest an RSS feed. (#15467 ) ### What problem does this PR solve? Added a guide on how to ingest an RSS feed. ### Type of change - [x] Documentation Update	2026-06-01 20:23:36 +08:00
monsterDavid	d398d617ca	fix(mineru): skip page chrome blocks to prevent duplicate chunks (#15387 ) ## Summary - Skip MinerU `header`, `footer`, and `page_number` blocks when converting `content_list.json` into sections. - Ignore unsupported block types explicitly so future MinerU output types cannot re-emit the previous text block. Fixes duplicate text in General/naive chunks when parsing PDFs via MinerU (reported with repeated page headers and body text in slices). Closes #15335 ## Test plan - [x] `pytest test/unit_test/deepdoc/parser/test_mineru_parser.py -v` (4/4 passed)	2026-06-01 20:15:04 +08:00
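The block filtering described in the MinerU fix can be sketched as follows. This is a minimal sketch under stated assumptions, not the parser's code. It assumes each `content_list.json` entry is a dict with `type` and `text` keys, and the `SKIP_TYPES` / `SUPPORTED_TYPES` sets and the `blocks_to_sections` helper are hypothetical names.

```python
# Hypothetical sketch: block shape and type names are assumptions, not MinerU's schema.
SKIP_TYPES = {"header", "footer", "page_number"}                   # page chrome to drop
SUPPORTED_TYPES = {"text", "title", "table", "image", "equation"}  # assumed set


def blocks_to_sections(content_list):
    """Convert MinerU-style blocks into (text, type) sections."""
    sections = []
    for block in content_list:
        btype = block.get("type")
        if btype in SKIP_TYPES:
            continue  # headers/footers/page numbers repeat on every page
        if btype not in SUPPORTED_TYPES:
            # Skip unknown types explicitly instead of falling through
            # and re-emitting the previous block's text.
            continue
        text = block.get("text", "").strip()
        if text:
            sections.append((text, btype))
    return sections
```

For example, a page whose blocks are a header, one body paragraph, a page number, and an unknown block type yields only the body paragraph.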
oktofeesh	f0e4f2d5d8	fix(go-models): apply custom Google base URLs (#15385 ) ## Summary - Add custom `base_url` support to the Google Go model driver. - Preserve Google URL suffix configuration when creating custom base URL driver instances. - Validate Google chat/stream request inputs before constructing the SDK client. - Cover Google model listing, connection checks, base URL resolution, and request validation with focused tests. ## What changed - `GoogleModel.NewInstance` now returns a Google driver configured with the supplied base URL map. - Google SDK client creation now resolves configured base URLs through `genai.HTTPOptions.BaseURL`. - Base URL lookup supports configured regions, empty-region keys, and `default` fallback. - Google chat, streaming chat, embeddings, and model listing now reject blank API keys before creating SDK clients. - Google chat and streaming chat now reject blank model names locally, and streaming chat rejects a nil sender. - Existing message handling, embeddings, pagination, and provider errors are preserved. ## Why Google custom model instances could not use configured base URLs because `NewInstance` returned `nil` and the SDK client path ignored the driver base URL map. The request validation keeps invalid Google calls from reaching SDK client construction with blank credentials or incomplete chat inputs.	2026-06-01 19:24:29 +08:00
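The base URL lookup described above (configured region, then an empty-region key, then `default`) can be illustrated in Python, although the actual driver is written in Go. This is a minimal sketch that assumes the lookup order matches the order in which the PR lists the rules. `resolve_base_url` is a hypothetical name.

```python
from typing import Optional


def resolve_base_url(base_urls: dict, region: str) -> Optional[str]:
    """Hypothetical lookup order: exact region, then "" key, then "default"."""
    for key in (region, "", "default"):
        url = base_urls.get(key)
        if url:
            return url
    return None  # no configured URL: caller keeps the SDK's built-in endpoint
```

Returning `None` instead of raising leaves the choice of falling back to the SDK default with the caller.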
euvre	fb3bd3de02	fix(deepdoc): add English caption patterns to fix missing figure/table numbering (#15481 ) ### What problem does this PR solve? ## Problem When parsing PDFs containing English figure/table captions (e.g. "Fig. 20", "Figure 20", "Table 20"), the `is_caption` method in `TableStructureRecognizer` failed to recognize them as captions. This caused figure numbering gaps in the parsed output (e.g. Fig. 19 → Fig. 21, skipping Fig. 20). ## Root Cause The `is_caption` regex only matched Chinese caption formats: ```python patt = [r"[图表]+[ 0-9:：]{2,}"] ``` When the layout recognizer also failed to assign a `caption` layout type to a given text block, English captions were entirely missed. ## Fix Added three case-insensitive English caption patterns to `is_caption` in `deepdoc/vision/table_structure_recognizer.py`: - `(?i)Fig\.?\s*\d+` — matches `Fig. 20`, `Fig 20`, `FIG. 20`, etc. - `(?i)Figure\s+\d+` — matches `Figure 20`, `FIGURE 20`, etc. - `(?i)Table\s+\d+` — matches `Table 20`, `TABLE 20`, etc. ## Files Changed - `deepdoc/vision/table_structure_recognizer.py` — extended `is_caption` regex patterns - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: noob <yixiao121314@outlook.com>	2026-06-01 19:22:11 +08:00
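The extended pattern list can be exercised in isolation. The four patterns below are copied from the PR. The `is_caption` wrapper is a hypothetical stand-in that assumes `re.match`, so a pattern only matches at the start of the text block. The real method may match differently.

```python
import re

# Original Chinese pattern plus the three English patterns added by the fix.
CAPTION_PATTERNS = [
    r"[图表]+[ 0-9:：]{2,}",
    r"(?i)Fig\.?\s*\d+",
    r"(?i)Figure\s+\d+",
    r"(?i)Table\s+\d+",
]


def is_caption(text: str) -> bool:
    """Assumed semantics: a caption must start with one of the patterns."""
    return any(re.match(p, text.strip()) for p in CAPTION_PATTERNS)
```

Anchoring at the start (the assumption here) keeps body sentences such as "see Table 2 below" from being misread as captions.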
Wang Qi	1a6df01b53	Bug fix: Enhance embedding model to give better error message (#15346 ) To resolve https://github.com/infiniflow/ragflow/issues/15343, enhance the model embedding error handling to give the exact failure message to the customer. # QWen ## Retrieval [screenshot] ## Chat [screenshot] # SiliconFlow ## Retrieval [screenshot] ## Chat [screenshot] # Baichuan ## Retrieval [screenshot] ## Chat [screenshot] # Zhipu Zhipu is good.	2026-06-01 19:18:16 +08:00
kpdev	252cc19f93	Infer Content-Type for document image endpoint (#15368 ) ## Summary Fixes [#15367](https://github.com/infiniflow/ragflow/issues/15367) — `GET /api/v1/documents/images/<image_id>` always returned `Content-Type: image/JPEG` even for PNG/WebP chunk images and extensioned thumbnails. ## Related Issue Fixes #15367 ## Change Type - [x] Bug fix - [x] Regression tests - [ ] New feature - [ ] Refactor ## What Changed - Added `_detect_image_content_type_from_bytes()` — PNG/JPEG/GIF/WebP/BMP magic-byte detection - Added `_content_type_for_document_image()` — object-key extension via `CONTENT_TYPE_MAP`, then magic bytes, else `application/octet-stream` - `get_document_image()` — set inferred `Content-Type` instead of hardcoded `image/JPEG` - Also guards missing storage blob (`Image not found.`) to avoid `make_response(None)` (same handler; complements #15365) ## Files Changed - `api/apps/restful_apis/document_api.py`: MIME inference helpers + handler update - `test/testcases/test_web_api/test_document_app/test_document_metadata.py`: 3 unit tests ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_object_extension_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_content_type_from_magic_bytes_unit -v pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_get_document_image_missing_blob_unit -v ``` ## Test Plan - [x] `.png` object key → `image/png` - [x] Extensionless chunk key + PNG bytes → `image/png` (magic bytes) - [x] Missing blob → 4xx `"Image not found."` - [ ] CI green	2026-06-01 19:08:32 +08:00
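The inference order described above (object-key extension, then magic bytes, then `application/octet-stream`) can be sketched as follows. This is a minimal sketch, not the handler's code. The signatures are the standard PNG, JPEG, GIF, RIFF/WEBP, and BMP headers. `ext_map` stands in for `CONTENT_TYPE_MAP`, whose real shape is assumed here, and both function names are hypothetical.

```python
from typing import Optional


def detect_image_content_type(data: bytes) -> Optional[str]:
    """Sniff common image formats from their leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"BM"):
        return "image/bmp"
    return None


def content_type_for_image(object_key: str, data: bytes, ext_map: dict) -> str:
    """Extension first, then magic bytes, else a generic binary type."""
    _, dot, ext = object_key.rpartition(".")
    if dot and ext.lower() in ext_map:
        return ext_map[ext.lower()]
    return detect_image_content_type(data) or "application/octet-stream"
```

Trying the extension first avoids reading headers for thumbnails that already have a reliable suffix. Byte sniffing only handles extensionless chunk keys.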
kpdev	b35266e9a5	Return 4xx when file download storage blob is missing (#15371 ) ## Summary Fixes [#15369](https://github.com/infiniflow/ragflow/issues/15369) — `GET /api/v1/files/<file_id>` calls `make_response(None)` when both primary and fallback storage lookups return empty, causing HTTP 500. ## Related Issue Fixes #15369 ## Change Type - [x] Bug fix - [x] Regression tests ## What Changed - `file_api.download()` — after fallback `STORAGE_IMPL.get`, return `get_error_data_result(message="This file is empty.")` when `not blob`, matching document REST download semantics. ## Files Changed - `api/apps/restful_apis/file_api.py`: Empty-blob guard before `make_response()` - `test/testcases/test_web_api/test_file_app/test_file_routes_unit.py`: Regression test ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_missing_blob_returns_error -v pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_falls_back_to_document_storage -v ``` ## Test Plan - [x] Both storage paths empty → `"This file is empty."` (no `make_response(None)`) - [x] Existing fallback success test still passes - [ ] CI green	2026-06-01 19:08:06 +08:00
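The guard ordering described above can be sketched as follows. This is a minimal sketch in which the two storage lookups are injected callables and the result is a simple `(ok, payload)` tuple. `download_blob` is hypothetical, and the real handler's error helper and status code are not reproduced here.

```python
from typing import Callable, Optional, Tuple


def download_blob(get_primary: Callable[[], Optional[bytes]],
                  get_fallback: Callable[[], Optional[bytes]]) -> Tuple[bool, object]:
    """Try primary storage, then fallback; never build a response from None."""
    blob = get_primary()
    if not blob:
        blob = get_fallback()  # e.g. the document storage location
    if not blob:
        # Both lookups empty: report an error instead of make_response(None).
        return False, "This file is empty."
    return True, blob
```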
balibabu	f194e8b4c4	Fix: The newly added model did not appear in the drop-down menu. (#15476 ) ### What problem does this PR solve? Fix: The newly added model did not appear in the drop-down menu. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-01 17:56:41 +08:00
euvre	1e80419c21	fix: restore TitleChunker output for json/chunks upstream formats (#15396 ) fix: restore TitleChunker output for json/chunks upstream formats ## Summary The refactor commit `e194027b` (#14247) introduced two regressions that caused `TitleChunker` to produce zero chunks when the upstream Parser node outputs `json` or `chunks` format (e.g. PDF parsing). ## Root Cause ### 1. Dead code in `extract_line_records` (critical) After refactor, when `payload` is `None` (which is the case for `json` and `chunks` output formats), the method returns an empty list immediately via `return []`, so no records are ever extracted from structured upstream output. The original `json`/`chunks` handling code became unreachable dead code. ### 2. Unconditional overwrite in `build_chunks_from_record_groups` The `chunks` variable assigned in the `if` branch for markdown/text/html formats was unconditionally overwritten by the statement below it, due to a missing `else` keyword. ## Fix - Remove the premature `return []` so the `json`/`chunks` branch is reachable again. - Add `else` branch in `build_chunks_from_record_groups` so the two format families are handled independently. ## Test Plan - [x] Verified no lint errors on the changed file - [ ] Tested with a PDF document parsed via DeepDOC → TitleChunker pipeline - [ ] Tested with markdown input through TitleChunker - [ ] Tested hierarchy and group chunking modes ## Impact - Fixes the regression where documents parsed with `json`/`chunks` output format produced no chunks from `TitleChunker`. - No API or configuration changes. Fully backward compatible. Signed-off-by: noob <yixiao121314@outlook.com>	2026-06-01 17:14:22 +08:00
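The second regression (a missing `else` that lets the fallback assignment overwrite the markdown/text/html result) can be reproduced in a few lines. The record shapes and function names here are hypothetical. Only the control-flow pattern mirrors the PR description.

```python
def build_chunks_buggy(fmt, records):
    if fmt in ("markdown", "text", "html"):
        chunks = [r["text"] for r in records]
    # Missing `else`: this line always runs and overwrites the branch above.
    chunks = [r.get("content", "") for r in records]
    return chunks


def build_chunks_fixed(fmt, records):
    if fmt in ("markdown", "text", "html"):
        chunks = [r["text"] for r in records]
    else:  # json/chunks formats are handled independently
        chunks = [r.get("content", "") for r in records]
    return chunks
```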
balibabu	82202fa469	Fix: Unable to create dataset (#15472 ) ### What problem does this PR solve? Fix: Unable to create dataset ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-01 15:30:52 +08:00
Wang Qi	10e8690890	GraphRAG - NER - spacy - fix spacy extraction (#14783 ) Fix spacy extraction	2026-06-01 13:05:54 +08:00
sxxtony	12579dbc3d	Go: implement dataset ingestion log APIs (#15421 ) ### What problem does this PR solve? Part of the Python → Go API server rewrite tracked in #15240 (Dataset ingestion section). This PR implements the three dataset ingestion endpoints in the Go API server, mirroring the existing Python `dataset_api_service` behaviour: - `GET /api/v1/datasets/<dataset_id>/ingestions/summary` - `GET /api/v1/datasets/<dataset_id>/ingestions` - `GET /api/v1/datasets/<dataset_id>/ingestions/<log_id>` ### Type of change - [x] Refactoring - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>	2026-06-01 11:23:44 +08:00
glorydavid03023	3774916060	Go: implement Embed in GPUStack driver (#15182 ) ### What problem does this PR solve? The Go GPUStack driver returned a stub error for `Embed()` even though GPUStack exposes OpenAI-compatible embeddings on the v1-openai route (not `v1/embeddings`). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-01 11:22:43 +08:00
Haruko386	2d7044b57e	feat[Go] implement api/v1/thumbnails API (#15416 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-06-01 11:22:08 +08:00
Idriss Sbaaoui	da1ed6f0e7	Feat: add new tests and test cases for RESTful API suite (#15347 ) ### What problem does this PR solve? Extend the RESTful API test suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test	2026-06-01 11:02:40 +08:00
Wang Qi	4972af4367	Fix memory empty issue (#15411 ) Fix memory empty issue	2026-06-01 10:25:56 +08:00
balibabu	e13431cdc0	Fix: If the filename is too long, it overflows the confirmation box for deleting the file. (#15287 ) ### What problem does this PR solve? Fix: If the filename is too long, it overflows the confirmation box for deleting the file. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-01 10:22:56 +08:00
web-dev0521	cd18cfab79	feat(connector): implement Outlook data source connector (issue #15332 ) (#15333 ) ### What problem does this PR solve? Closes #15332. RAGFlow can index Gmail and generic IMAP mailboxes but had no native connector for Outlook / Microsoft 365 mail. Organisations on Microsoft 365 had no way to bring mailbox content into a knowledge base through Microsoft Graph. This PR adds a net-new Outlook data source that: - Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives). - Pages over `/users/{id}/mailFolders/{folder}/messages/delta` per mailbox and persists `@odata.deltaLink` values in `OutlookCheckpoint.delta_links`, so incremental syncs only fetch changed messages. - Supports two scoping modes: - Tenant-wide (default): enumerates every user in the tenant via `/users` and syncs each mailbox. Requires `User.Read.All`. - Targeted: when `user_ids` is provided (comma-separated UPNs or object IDs), only those mailboxes are synced. `User.Read.All` is not needed in this mode. - Lets the caller pick the mail folder (`inbox`, `sentitems`, `archive`, ...). Defaults to `inbox`. - Maps each message to a `Document` shaped after the Gmail connector: one `TextSection` carrying `From/To/Cc/Subject` headers + body, with HTML bodies stripped to text inline (no extra dependency). - Surfaces typed errors on the validation probe: 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with `Mail.Read` / `User.Read.All` hint), 404 on a configured mailbox → `ConnectorValidationError`, 5xx → `UnexpectedValidationError`. - Skips messages flagged `@removed` by the delta semantics and messages whose `receivedDateTime` is older than `poll_range_start`. 
#### Files - `common/data_source/outlook_connector.py`: New — `OutlookConnector` (`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`) + `OutlookCheckpoint` + tiny `_strip_html` helper. - `common/data_source/config.py`: `DocumentSource.OUTLOOK = "outlook"`. - `common/constants.py`: `FileSource.OUTLOOK = "outlook"`. - `common/data_source/__init__.py`: Export `OutlookConnector`. - `rag/svr/sync_data_source.py`: `Outlook(SyncBase)` with `batch_size` normalisation, CSV/list parsing of `user_ids`; registered in `func_factory`. - `web/src/pages/user-setting/data-source/constant/index.tsx`: `DataSourceKey.OUTLOOK`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder, user_ids, batch_size), default values. - `web/src/locales/en.ts`, `web/src/locales/zh.ts`: `outlookDescription` + 5 tooltip keys (EN + ZH). - `test/unit_test/data_source/test_outlook_connector_unit.py`: New — 19 unit tests (`p1`/`p2`/`p3`) covering auth, validation (tenant-wide vs specific user vs error paths), checkpoint helpers, user enumeration pagination, message filtering, HTML body stripping. #### Required Azure AD permissions - `Mail.Read` (Application, admin-granted) — always. - `User.Read.All` (Application, admin-granted) — only when `user_ids` is left blank so the connector can enumerate mailboxes. #### Out of scope - Attachment indexing. The current connector emits message body + headers; binary attachments are flagged via `metadata.has_attachments` but not pulled. Adding attachment hydration is straightforward but scoped out per the issue's "decide whether attachments are indexed in the first version" note. - Delegated (per-user) OAuth. The connector uses app-only credentials, consistent with the SharePoint / Teams precedent in this codebase. 
### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 21:52:29 +08:00
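The delta-sync loop described for the Outlook connector (follow `@odata.nextLink` until a page carries `@odata.deltaLink`, skip `@removed` entries, and skip messages older than `poll_range_start`) can be sketched as follows. This is a minimal sketch, not the connector's code. The HTTP call is replaced by an injected `fetch(url) -> dict` so that authentication (MSAL in the real connector) is out of scope. `sync_folder` and `_parse_graph_time` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


def _parse_graph_time(value: str) -> datetime:
    # Graph returns UTC timestamps such as "2026-05-01T10:00:00Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sync_folder(fetch: Callable[[str], Dict], start_url: str,
                poll_range_start: datetime) -> Tuple[List[Dict], str]:
    """Page a delta query; return (messages, deltaLink to store in the checkpoint)."""
    url, messages = start_url, []
    while True:
        page = fetch(url)
        for msg in page.get("value", []):
            if "@removed" in msg:
                continue  # deleted or moved since the last sync
            if _parse_graph_time(msg["receivedDateTime"]) < poll_range_start:
                continue  # outside the polling window
            messages.append(msg)
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]  # more pages in this round
        else:
            # Final page: persist this link so the next sync only sees changes.
            return messages, page["@odata.deltaLink"]
```

Storing the returned link per mailbox, as `OutlookCheckpoint.delta_links` does in the PR, is what makes later syncs incremental.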
Rintaro	11af34a895	fix(opensearch): repair document-metadata path broken by #14577 (#15393 ) ### What problem does this PR solve? Document metadata is completely broken on the OpenSearch backend (`DOC_ENGINE=opensearch`). Both failures were introduced by #14577, which added a doc-metadata dispatch surface but only validated it against Elasticsearch. 1. Index creation rejected (`mapper_parsing_exception`). `OSConnection.create_doc_meta_idx` feeds `conf/doc_meta_es_mapping.json` verbatim to OpenSearch. That file declares a top-level `"dynamic": "runtime"`. Runtime fields are Elasticsearch-only; OpenSearch cannot parse the value: mapper_parsing_exception: Could not convert [dynamic.dynamic] to boolean (400) 2. `search()` signature mismatch (`TypeError`). `DocMetadataService` (added by #14577) calls `docStoreConn.search(...)` with snake_case kwargs (`select_fields=`, `index_names=`, `knowledgebase_ids=`, …), matching `ESConnection.search`. But `OSConnection.search` still uses camelCase parameters (`selectFields`, `indexNames`, `knowledgebaseIds`, …): TypeError: OSConnection.search() got an unexpected keyword argument 'select_fields' The UI then shows "0 fields" for every document on OpenSearch. ### Fix 1. In `OSConnection.create_doc_meta_idx`, normalize a top-level `"dynamic": "runtime"` to `True` for the OpenSearch request only. The shared mapping file is left untouched, so the Elasticsearch backend keeps its runtime-field behavior. Dynamic field discovery is preserved on OpenSearch. 2. Rename the `OSConnection.search()` parameters (and their in-method local uses) from camelCase to snake_case so they match `ESConnection.search()` and the `DocMetadataService` call sites. The change is confined to `search()`; `get/insert/update/delete` keep their existing positional signatures (they are called positionally from `rag/nlp/search.py`). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Affected backends OpenSearch only. 
Elasticsearch, Infinity and OceanBase are untouched. ### How to reproduce 1. `DOC_ENGINE=opensearch`, restart the stack. 2. Upload/parse a document, then open the dataset's document list / set metadata. - Before: index creation 400s (`Could not convert [dynamic.dynamic]`), and/or `TypeError ... 'select_fields'`; document metadata shows 0 fields. ### Risk & backward compatibility - ES default deployment: no change. `doc_meta_es_mapping.json` is not modified, so ES still receives `"dynamic": "runtime"`. - `search()` rename is internal; the only kwarg caller (`DocMetadataService`) already uses the snake_case names this PR aligns to. ### Test plan - [ ] `DOC_ENGINE=opensearch`: per-tenant `ragflow_doc_meta_*` index is created (no `mapper_parsing_exception`); document metadata reads/writes work. - [ ] `DOC_ENGINE=elasticsearch` regression: doc-meta index still created with runtime mapping; metadata unchanged.	2026-05-29 21:49:36 +08:00
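The first fix (send OpenSearch a copy of the mapping with `"dynamic": "runtime"` replaced by `True`, leaving the shared file untouched) can be sketched as follows. This is a minimal sketch that assumes the loaded JSON dict carries `dynamic` at its top level, as the PR states. `opensearch_doc_meta_mapping` is a hypothetical name.

```python
import copy


def opensearch_doc_meta_mapping(es_mapping: dict) -> dict:
    """Return an OpenSearch-safe copy; the shared ES mapping is not mutated."""
    mapping = copy.deepcopy(es_mapping)
    if mapping.get("dynamic") == "runtime":
        # Runtime fields are Elasticsearch-only; True keeps dynamic field discovery.
        mapping["dynamic"] = True
    return mapping
```

The deep copy is the point of the design: the Elasticsearch code path can keep loading the same dict and still receive `"runtime"`.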
Rintaro	3dfc16973c	fix(opensearch): implement get_scores for KNN second-pass scoring (#15390 ) ### What problem does this PR solve? On the OpenSearch backend (`DOC_ENGINE=opensearch`), every retrieval that performs the KNN second-pass scoring crashes with: AttributeError: 'OSConnection' object has no attribute 'get_scores' Root cause. #14970 ("Refactor: Drop the vector fetch for ES") added a `get_scores()` helper to `ESConnectionBase` (`common/doc_store/es_conn_base.py`) and introduced `Dealer._knn_scores()` in `rag/nlp/search.py`, which calls `self.dataStore.get_scores(res)`. `search.py` routes Infinity and OceanBase to their own similarity paths via `DOC_ENGINE_INFINITY` / `DOC_ENGINE_OCEANBASE`, but OpenSearch sets neither flag, so it falls into the Elasticsearch branch and calls `get_scores`. `OSConnection` (which subclasses `DocStoreConnection` directly, not `ESConnectionBase`) never received that method, so any vector-search hit triggers the crash. It reproduces with any normal embedding (e.g. 1024-dim mistral-embed) as soon as a KNN query returns hits. ### Fix Add `OSConnection.get_scores()`, mirroring `ESConnectionBase.get_scores()`. OpenSearch hit headers expose `_score` exactly like Elasticsearch (the existing `OSConnection.__getSource` already reads `d["_score"]`), so the implementation is identical. Scope note: Infinity and OceanBase deliberately do not use `get_scores` (#14970 routes them elsewhere), so this fix is intentionally limited to the OpenSearch backend, which is the only one reaching the ES KNN-score path. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Affected backends OpenSearch only. Elasticsearch already implements `get_scores`; Infinity / OceanBase are routed away from it. ### How to reproduce 1. `DOC_ENGINE=opensearch` (docker `.env`), restart the stack. 2. Create a knowledge base with any dense embedding model and parse a document. 3. 
Run a retrieval / chat over that KB -> 500 with the AttributeError above. ### Risk & backward compatibility None for the default Elasticsearch deployment -- the change only adds a method to `OSConnection`. No default values or ES/Infinity/OceanBase behavior change. ### Test plan - [ ] With `DOC_ENGINE=opensearch`, retrieval over a KB returns scored chunks (no AttributeError). - [ ] `DOC_ENGINE=elasticsearch` regression: retrieval unchanged. - [ ] Empty-result path: `_knn_scores` early-returns `{}` (guarded), get_scores handles an empty `hits` list gracefully.	2026-05-29 21:49:15 +08:00
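Because OpenSearch hits carry `_id` and `_score` in the same place as Elasticsearch hits (`res["hits"]["hits"]`), a `get_scores` helper can read them directly. This is a minimal sketch that assumes the helper returns a dict keyed by chunk id. The real return shape of `ESConnectionBase.get_scores()` is not shown in the PR.

```python
def get_scores(res: dict) -> dict:
    """Map each hit's _id to its _score (assumed return shape)."""
    hits = res.get("hits", {}).get("hits", [])
    return {hit["_id"]: hit["_score"] for hit in hits}
```

An empty `hits` list yields `{}`, which matches the empty-result case listed in the test plan.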
jony376	a2500fed43	fix(api): move dify retrieval health check to /dify/retrieval/health (#15311 ) ### Related issues Closes #15310 ### What problem does this PR solve? `/api/v1/dify/retrieval` had duplicate `GET` route registrations in `dify_retrieval_api.py`: one for authenticated retrieval and another for unauthenticated health checks. Sharing the same path and method created ambiguous routing behavior and an unstable API contract for Dify external knowledge base integration. This PR separates concerns by moving the health-check endpoint to `GET /api/v1/dify/retrieval/health`, while keeping retrieval on `/api/v1/dify/retrieval`. This makes auth behavior deterministic and prevents route shadowing/conflicts. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 21:47:55 +08:00
OrbisAI Security	b4c8711d51	fix: upgrade crawl4ai to 0.8.0 (CVE-2026-26217) (#15415 ) ## Summary Upgrade crawl4ai from 0.7.6 to 0.8.0 to fix CVE-2026-26217. ## Vulnerability - ID: CVE-2026-26217 - Severity: CRITICAL - Scanner: trivy - Rule: `CVE-2026-26217` - File: `uv.lock` - Assessment: Likely exploitable Description: Crawl4AI Has Local File Inclusion in Docker API via file:// URLs ## Evidence Scanner confirmation: trivy rule `CVE-2026-26217` flagged this pattern. Production code: This file is in the production codebase, not test-only code. ## Threat Model Context This is a web service - vulnerabilities in request handlers are directly exploitable by remote attackers. ## Changes - `pyproject.toml` - `uv.lock` ## Verification - [x] Build passes - [x] Scanner re-scan confirms fix - [x] LLM code review passed --- This change addresses a pattern flagged by static analysis. The code path handles user-influenced input and the fix reduces the attack surface against both manual and automated exploitation. --- Automated security fix by [OrbisAI Security](https://orbisappsec.com)	2026-05-29 21:38:41 +08:00
Attili-sys	a28a0c6986	File addition .rooignore (#15414 ) This PR introduces a `.rooignore` file to the root of the repository to optimize how AI coding assistants (like Roo) interact with the RAGFlow codebase. Currently, when AI agents index the workspace, they can waste tokens and processing time reading through generated files, caches, large dependency artifacts, and runtime logs. This `.rooignore` file provides a standard configuration to exclude these irrelevant directories and files (such as `.venv/`, `node_modules/`, `__pycache__/`, logs, and large binaries). This significantly reduces indexing noise, prevents accidental reads of sensitive or bulky local data, and ensures AI coding agents remain focused strictly on relevant source code. ### Type of change - [x] Other (please describe): Developer Experience (DX) / AI Tooling configuration	2026-05-29 20:37:44 +08:00
Hz_	539d38bc20	fix: backfill missing api token beta values (#15405 ) ### What problem does this PR solve? This PR updates `SystemService.ListAPITokens` to lazily backfill missing `beta` values for API tokens, matching the Python behavior of `/api/v1/system/tokens`. ### What changed - When an API token has an empty `beta`, generate a new one. - Persist the generated `beta` back to the `api_token` table. - Keep the handler/routing unchanged. - `GET /api/v1/system/tokens` now returns tokens with `beta` filled in for older records that were missing it. - This aligns Go behavior with the Python implementation.	2026-05-29 20:04:10 +08:00
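The lazy backfill can be illustrated in Python, although the change itself is in the Go `SystemService`. This is a minimal sketch. Tokens are plain dicts, persistence is an injected callback, and the use of `secrets.token_urlsafe` to produce `beta` values is an assumption. The real value format is not described in the PR, and `backfill_beta` is a hypothetical name.

```python
import secrets
from typing import Callable, Dict, List


def backfill_beta(tokens: List[Dict], persist: Callable[[Dict], None]) -> List[Dict]:
    """Fill any empty 'beta' on read and write it back (format is assumed)."""
    for token in tokens:
        if not token.get("beta"):
            token["beta"] = secrets.token_urlsafe(24)  # hypothetical generator
            persist(token)  # e.g. update the matching api_token row
    return tokens
```

Only records that were missing a value are written back, so repeated list calls do not regenerate existing `beta` values.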
oktofeesh	be28177955	fix(go-models): harden Hunyuan embedding validation (#15249 ) ## Summary - Validate Hunyuan embedding model name and API key before building requests. - Reuse region-aware base URL validation for embedding requests. - Replace the stale unsupported Embed test with happy-path and validation coverage. ## What changed - Added early Hunyuan Embed validation for missing model names and API keys. - Routed Embed through the same base URL region guard used by the other Hunyuan methods. - Updated Hunyuan tests to configure the embedding suffix and cover Embed success plus invalid inputs. ## Why Hunyuan Embed is implemented, but the existing test still expected it to be unsupported and could panic before returning a normal validation error. This keeps the implemented embedding path aligned with the current driver behavior and prevents nil input panics. Closes #15087 Refs #14736	2026-05-29 19:50:01 +08:00
galuis116	d1f6594618	Fix: JWT algorithm-confusion in OIDC ID token verification (#15181 ) ### What problem does this PR solve? Closes #15180. `OIDCClient.parse_id_token` in `api/apps/auth/oidc.py` read the JWT signing algorithm from the unverified JWT header and passed it through to `jwt.decode(..., algorithms=[alg], ...)` as the trust anchor. This is the textbook JWT algorithm-confusion vulnerability (CWE-345 / CWE-347). Any unauthenticated client capable of reaching the OIDC callback could take over an arbitrary account on any RAGFlow deployment with OIDC login enabled: 1. `alg: "none"` — present a JWT with `{"alg": "none"}` and no signature segment → `jwt.decode(..., algorithms=["none"])` → PyJWT's `NoneAlgorithm` accepts the token without verification → login as any user. 2. RSA / HMAC confusion — fetch the public RSA key from the provider's JWKS (it's public), forge a JWT with `{"alg": "HS256"}` HMAC-signed using the public-key bytes as the secret → `jwt.decode(..., algorithms=["HS256"], key=public_key)` → verifier accepts → login as any user. (Modern PyJWT independently refuses to use a PEM-formatted key as an HMAC secret, which mitigates this leg for PEM key formats; the fix here is the only mitigation for raw / DER / JWK octet keys and for older PyJWT versions.) ### What changed `api/apps/auth/oidc.py`: - New module constants `_ALLOWED_OIDC_SIGNING_ALGS` (asymmetric-only: `RS`, `ES`, `PS`, `EdDSA` — explicitly excludes `none` and `HS`) and `_DEFAULT_OIDC_SIGNING_ALGS = ("RS256",)` (the OIDC Core 1.0 §2 spec default). - New helper `_resolve_id_token_signing_algs(metadata)` — intersects the provider's advertised `id_token_signing_alg_values_supported` from `/.well-known/openid-configuration` with the safe allowlist; falls back to RS256 when the field is missing or contains only unsafe values. - `OIDCClient.__init__` now stores the resolved allowlist on `self.id_token_signing_algs` — pinned once, from a trusted source, at construction time. 
- `parse_id_token` no longer calls `jwt.get_unverified_header` and no longer reads `alg` from the JWT header. It passes `self.id_token_signing_algs` to `jwt.decode(..., algorithms=...)`. `PyJWKClient.get_signing_key_from_jwt` still reads the `kid` from the header internally for JWKS lookup — that's fine, `kid` is not a security decision; the signature still proves which key was actually used. `test/testcases/test_web_api/test_auth_app/test_oidc_client_unit.py`: - Existing `test_parse_id_token_success_and_error` drops its `jwt.get_unverified_header` mock (no longer called by `parse_id_token`). - `_metadata` and `_make_client` helpers grew an optional `signing_algs` parameter so tests can configure what the discovery document advertises. - New `TestSSRFValidation` / algorithm-confusion regression block (7 tests): - `test_id_token_signing_algs_default_to_rs256_when_metadata_missing` - `test_id_token_signing_algs_intersect_metadata_with_safe_allowlist` - `test_id_token_signing_algs_fall_back_when_only_unsafe_advertised` - `test_id_token_signing_algs_ignores_non_string_entries` - `test_id_token_signing_algs_handles_non_list_metadata_field` - `test_parse_id_token_passes_pinned_algorithms_to_jwt_decode` — sabotages `jwt.get_unverified_header` to raise on call, proving the verification path never consults the unverified header. - `test_parse_id_token_rejects_alg_none` — uses real PyJWT to encode an `alg: "none"` token; `parse_id_token` raises `ValueError("Error parsing ID Token: …")` instead of accepting it. - `test_parse_id_token_rejects_hs256_when_allowlist_is_asymmetric` — uses real PyJWT to forge an `alg: "HS256"` token with a non-PEM shared secret (so PyJWT's incidental PEM-as-HMAC refusal isn't what blocks it); `parse_id_token` raises because `HS256` is not in the pinned allowlist. 
Sanity-checked end-to-end with real PyJWT outside the project test runner: - `alg=none` forged token + `algorithms=["RS256"]` → `InvalidAlgorithmError` ✓ - `alg=HS256` forged token + `algorithms=["RS256"]` → `InvalidAlgorithmError` ✓ - Same `alg=HS256` token + `algorithms=["HS256"]` → accepted ({'sub': 'admin'}) — confirming the attack path was real before the fix. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: galuis116 <contact@duerrimports.com>	2026-05-29 19:37:01 +08:00
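The allowlist resolution described for `_resolve_id_token_signing_algs` can be sketched as a pure function. This is a minimal sketch, not the module's code. The concrete algorithm names in `ALLOWED_ALGS` are an assumption that expands the PR's "RS, ES, PS, EdDSA" families, and `resolve_id_token_signing_algs` is a hypothetical name.

```python
from typing import Tuple

# Assumed concrete members of the asymmetric-only families named in the PR.
ALLOWED_ALGS = ("RS256", "RS384", "RS512", "ES256", "ES384", "ES512",
                "PS256", "PS384", "PS512", "EdDSA")
DEFAULT_ALGS = ("RS256",)  # OIDC Core default


def resolve_id_token_signing_algs(metadata: dict) -> Tuple[str, ...]:
    """Intersect the provider's advertised algorithms with the safe allowlist."""
    advertised = metadata.get("id_token_signing_alg_values_supported")
    if not isinstance(advertised, list):
        return DEFAULT_ALGS  # missing or malformed field
    safe = tuple(a for a in advertised if isinstance(a, str) and a in ALLOWED_ALGS)
    return safe or DEFAULT_ALGS  # only "none"/HS* advertised: fall back
```

The resolved tuple would then be passed as `algorithms=` to PyJWT's `jwt.decode`, so `none` and `HS256` tokens are rejected regardless of what their header claims.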
kpdev	cb1ea5a47f	Validate chunk image_base64 before doc-store write (#15364 ) ## Summary Fixes [#15363](https://github.com/infiniflow/ragflow/issues/15363) — `add_chunk` / `update_chunk` indexed chunks with `image_id` before validating or storing `image_base64`, leaving orphan chunks on invalid input. ## Related Issue Fixes #15363 ## Change Type - [x] Bug fix - [x] Regression tests ## What Changed - Added `_decode_chunk_image_base64()` — strict base64 decode with structured 4xx errors - Added `_store_chunk_image_or_error()` — catches `store_chunk_image` failures - `add_chunk` / `update_chunk`: decode + store image before `docStoreConn.insert` / `update`; only set `img_id` after successful storage ## Files Changed - `api/apps/restful_apis/chunk_api.py`: Helpers + reorder image handling - `test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py`: 3 regression tests ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_invalid_image_base64_does_not_index_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_update_chunk_invalid_image_base64_does_not_update_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_valid_image_base64_stores_before_insert -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py -v ``` ## Test Plan - [x] Invalid `image_base64` on add → 4xx, no doc-store insert - [x] Invalid `image_base64` on update → 4xx, no doc-store update - [x] Valid PNG base64 on add → image stored, chunk indexed with `img_id` - [ ] CI green	2026-05-29 19:36:46 +08:00
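The reordering described above (strict decode, then store, then index with `img_id`) can be sketched as follows. This is a minimal sketch with injected `store_image` and `insert` callables. The helper names and error strings are hypothetical. `base64.b64decode(..., validate=True)` is the standard-library strict mode that rejects non-alphabet characters instead of discarding them.

```python
import base64
from typing import Callable, Optional, Tuple


def decode_chunk_image(image_base64) -> Tuple[Optional[bytes], Optional[str]]:
    """Strictly decode base64; return (bytes, None) on success or (None, error)."""
    if not isinstance(image_base64, str) or not image_base64.strip():
        return None, "image_base64 must be a non-empty string."
    try:
        data = base64.b64decode(image_base64, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError; non-ASCII input raises ValueError too
        return None, "image_base64 is not valid base64."
    if not data:
        return None, "image_base64 decoded to empty bytes."
    return data, None


def add_chunk(chunk: dict, store_image: Callable[[bytes], str],
              insert: Callable[[dict], None]) -> Optional[str]:
    """Decode and store the image first; index the chunk only if both succeed."""
    doc = {k: v for k, v in chunk.items() if k != "image_base64"}
    if "image_base64" in chunk:
        data, err = decode_chunk_image(chunk["image_base64"])
        if err:
            return err  # error path: the doc store is never touched
        try:
            doc["img_id"] = store_image(data)
        except Exception as exc:  # a storage failure is an error, not an orphan chunk
            return f"Failed to store chunk image: {exc}"
    insert(doc)
    return None
```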
Dexterity	04aa8d04e8	fix(go-models): raise SSE scanner buffer so large stream chunks are not dropped (#15382 ) ### Summary Closes #15381 Every provider in `internal/entity/models/` reads its streaming response with `bufio.NewScanner(resp.Body)` and iterates over `scanner.Scan()`. The default `bufio.Scanner` maximum token size is 64KB, so when an upstream sends a single SSE `data:` line larger than 64KB (long content deltas, large tool or function call argument blobs, bundled `reasoning_content`, or providers that emit a whole message in one event), `scanner.Scan()` returns `false` and `scanner.Err()` returns `bufio.ErrTooLong`. Streaming chat then ends with an error partway through the response. This change adds `scanner.Buffer(make([]byte, 64*1024), 1024*1024)` immediately after every SSE scanner that was still bare, raising the cap to 1MB. 1MB is the value already used for streaming chat in `openai.go`, `modelscope.go`, `groq.go`, `mistral.go`, `xai.go` and the other already-patched providers (the 8MB cap in the repo is reserved for TTS and embedding paths), so this simply converges the remaining providers onto the established pattern. Nothing else changes: line parsing, `data:` prefix handling, `[DONE]` detection, JSON unmarshalling, error handling, and the existing `scanner.Err()` checks all stay the same. Providers covered (23 scanners across 22 files): 302ai, aliyun, baichuan, baidu, cohere, deepinfra, deepseek, gitee, huggingface, lmstudio, minimax (the chat scanner, whose TTS scanner was already bumped), moonshot, nvidia, ollama, openrouter, orcarouter, paddleocr, siliconflow, tokenhub, vllm, volcengine, xunfei, zhipu-ai. `jiekouai.go` is excluded because it is covered by the in-flight #15337. 
A table-driven regression test (`sse_scanner_buffer_test.go`) streams a single 128KB `data:` content delta followed by `data: [DONE]` through an `httptest` server and asserts that `ChatStreamlyWithSender` delivers the full content with no error across a representative subset of providers. Without the buffer fix, the test fails with `bufio.Scanner: token too long`. This PR also removes three duplicate declarations of the package-level `roundTripperFunc` test helper that several recently merged provider PRs each added independently, which had left the `internal/entity/models` test package unable to compile. The helper now lives in a single place and is shared. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 19:34:00 +08:00
monsterDavid	53bb2bd9e8	fix(metadata): preserve empty AND results across filter conditions (#15386 ) ## Summary - Fix `meta_filter()` AND logic so an empty result from an early condition is not overwritten when a later condition matches. - Add regression tests for empty-first AND, successful AND intersection, and OR behavior after an empty first condition. Fixes incorrect `/retrieval` metadata filtering when multiple AND conditions are used and the first condition matches no documents. Closes #15360 ## Test plan - [x] `pytest test/unit_test/common/test_metadata_filter_operators.py -v` (19/19 passed)	2026-05-29 19:33:26 +08:00
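One way to keep an empty AND result from being overwritten is to track "nothing evaluated yet" with `None` instead of relying on set truthiness. A minimal sketch, assuming each condition has already been resolved to a set of document ids; `combine_matches` is a hypothetical helper, not the actual `meta_filter()` code:

```python
from typing import Iterable, Optional, Set


def combine_matches(per_condition: Iterable[Set[str]], logic: str = "and") -> Set[str]:
    """Combine the document ids matched by each filter condition.

    `None` marks "no condition evaluated yet". A truthiness test such as
    `if not result:` would mistake an empty AND result for "unset" and let a
    later condition overwrite it.
    """
    result: Optional[Set[str]] = None
    for matched in per_condition:
        if result is None:
            result = set(matched)
        elif logic == "and":
            result &= matched  # once empty, an AND result stays empty
        else:
            result |= matched
    return result if result is not None else set()
```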
bitloi	2d229dd8aa	fix(go): resolve custom base_url for empty default region (#15043 ) ### What problem does this PR solve? Fixes custom `base_url` resolution when a model instance has no configured region. Some drivers read custom base URLs from `BaseURL[""]` when `apiConfig.Region` is empty, while others normalize empty region to `"default"` and read `BaseURL["default"]`. This PR adds the `"default"` alias only for empty-region custom base URLs while preserving the existing empty-region key. Closes #15042 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 19:33:09 +08:00
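Storing the empty-region URL under both keys lets either lookup convention find it. A minimal Python sketch of that aliasing (the actual change lives in the Go drivers); all three function names are hypothetical:

```python
from typing import Dict, Optional


def with_default_alias(base_urls: Dict[str, str]) -> Dict[str, str]:
    """Copy base_urls, adding a "default" alias for an empty-region entry.

    The existing "" key is kept, and an explicit "default" entry is never overwritten.
    """
    aliased = dict(base_urls)
    if "" in aliased and "default" not in aliased:
        aliased["default"] = aliased[""]
    return aliased


def lookup_as_given(base_urls: Dict[str, str], region: str) -> Optional[str]:
    # Driver style 1: reads BaseURL[region] directly, so an empty region reads "".
    return base_urls.get(region)


def lookup_normalized(base_urls: Dict[str, str], region: str) -> Optional[str]:
    # Driver style 2: normalizes an empty region to "default" before the lookup.
    return base_urls.get(region or "default")
```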
Haruko386	d766e49128	feat[Go]: implement /system/stats and refactor /system/config/log (#15407 ) ### What problem does this PR solve? As title ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-29 19:32:21 +08:00
Hz_	d2f0a18f42	fix: persist logout access token invalidation (#15397 ) ### What this PR fixes This PR fixes an issue in the Python backend where user logout did not reliably persist the invalidated access_token to the database. Although the logout endpoint returned success and logged that the token had been invalidated, the user.access_token value could remain unchanged in the database, which meant the previous login token could stay valid longer than expected. ### What changed - Resolve the real user object before updating the token - Persist the invalidated access_token before calling logout_user() - Return a server error if the token update is not written successfully ### Impact - Logging out now correctly replaces the stored access_token with an INVALID_... value - The previous login session is properly invalidated - The change is limited to the logout flow and is intentionally small in scope	2026-05-29 19:31:45 +08:00
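The key point of the fix is to confirm that the token write actually landed before reporting a successful logout. A minimal sketch using the standard-library `sqlite3` module, assuming a hypothetical `user(id, access_token)` table; the `INVALID_<hex>` format only imitates the `INVALID_...` value mentioned above:

```python
import sqlite3
import uuid


def invalidate_access_token(conn: sqlite3.Connection, user_id: str) -> str:
    """Overwrite the stored token and verify the write before logging out."""
    new_token = f"INVALID_{uuid.uuid4().hex}"
    with conn:  # commits on success, rolls back if the block raises
        cur = conn.execute(
            "UPDATE user SET access_token = ? WHERE id = ?", (new_token, user_id)
        )
    if cur.rowcount != 1:
        # Surface a server error instead of reporting a successful logout.
        raise RuntimeError(f"failed to invalidate token for user {user_id!r}")
    return new_token
```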
Alexander Laurent	faa9c5469e	feat: add Go MCP server delete API (#15262 ) ## What #15240 Implementation for DELETE /api/v1/mcp/servers/:mcp_id	2026-05-29 19:29:55 +08:00
Hz_	09e91a8e61	Fix user registration initialization in Go API (#15349 ) ### What problem does this PR solve? This PR fixes several behavior gaps in the Go implementation of the user registration API. ### Type of change - Make `nickname` required for user registration. - Align registration error messages and response data with expected API behavior. - Handle password decryption errors for registration more consistently. - Generate UUID v1-style IDs for new users, access tokens, tenants, user-tenant records, and root files. - Initialize default user fields during registration, including: - language - color schema - timezone - last login time - Create user, tenant, user-tenant relation, tenant LLM records, and root folder in a single DB transaction. - Initialize default tenant LLM records from configured default models. - Avoid partial registration data when one creation step fails. - Use locale-based default language fallback for user profile responses.	2026-05-29 19:29:23 +08:00
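The all-or-nothing behavior can be illustrated with a single transaction. A minimal sketch using `sqlite3`, assuming a simplified three-table schema (tenant LLM records and the root folder are left out); `register_user` and its `fail_before_link` switch are hypothetical, and the switch exists only to demonstrate the rollback:

```python
import sqlite3
import uuid


def register_user(conn: sqlite3.Connection, email: str, nickname: str,
                  fail_before_link: bool = False) -> str:
    """Create user, tenant, and user-tenant rows atomically (hypothetical schema)."""
    if not nickname:
        raise ValueError("nickname is required")
    user_id, tenant_id = uuid.uuid1().hex, uuid.uuid1().hex  # UUID v1-style ids
    with conn:  # one transaction: commit if the block succeeds, roll back if it raises
        conn.execute("INSERT INTO user (id, email, nickname) VALUES (?, ?, ?)",
                     (user_id, email, nickname))
        conn.execute("INSERT INTO tenant (id, name) VALUES (?, ?)", (tenant_id, nickname))
        if fail_before_link:
            raise RuntimeError("simulated failure in a later step")
        conn.execute("INSERT INTO user_tenant (user_id, tenant_id, role) VALUES (?, ?, 'owner')",
                     (user_id, tenant_id))
    return user_id
```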
呆萌闷油瓶	658ff06ca4	feat: add 4 new models for siliconflow (#15383 ) ### What problem does this PR solve? Added 4 new models: deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash, Pro/moonshotai/Kimi-K2.6, Pro/zai-org/GLM-5.1 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 19:28:29 +08:00
web-dev0521	bda2117a25	feat(connector): implement OneDrive data source connector (issue #15330 ) (#15331 ) ### What problem does this PR solve? Closes #15330. RAGFlow had no connector for OneDrive / OneDrive for Business. Users who store working documents in OneDrive could not index them into a knowledge base without manually downloading and re-uploading files. This PR adds a net-new OneDrive data source that: - Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives). - Enumerates every drive visible to the service principal and pages through `/drives/{id}/root/delta`, persisting `@odata.deltaLink` values per drive so subsequent syncs only fetch changed items. - Optionally narrows ingestion to a sub-folder (`folder_path`) without needing a separate code path. - Surfaces typed errors on the validation probe (`GET /drives?$top=1`): 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with a `Files.Read.All` hint), 5xx → `UnexpectedValidationError`. - Filters folders, soft-deleted items, and unsupported extensions (`.pdf .docx .doc .xlsx .xls .pptx .ppt .txt .md .csv`). #### Files \| File \| Change \| \|------\|--------\| \| `common/data_source/onedrive_connector.py` \| New — `OneDriveConnector` + `OneDriveCheckpoint`. \| \| `common/data_source/config.py` \| `DocumentSource.ONEDRIVE = "onedrive"`. \| \| `common/constants.py` \| `FileSource.ONEDRIVE = "onedrive"`. \| \| `common/data_source/__init__.py` \| Export `OneDriveConnector`. \| \| `rag/svr/sync_data_source.py` \| `OneDrive(SyncBase)` with `batch_size` normalisation; registered in `func_factory`. \| \| `web/src/pages/user-setting/data-source/constant/index.tsx` \| `DataSourceKey.ONEDRIVE`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder_path, batch_size), default values. 
\| \| `web/src/locales/en.ts`, `web/src/locales/zh.ts` \| `onedriveDescription` + 4 tooltip keys (EN + ZH). \| \| `test/unit_test/data_source/test_onedrive_connector_unit.py` \| New — 13 unit tests (`p1`/`p2`) covering auth, validation, checkpoint helpers, and document filtering. \| #### Required Azure AD permission `Files.Read.All` (Application, admin-granted). #### Out of scope - Interactive end-user OAuth (delegated permissions) — the connector uses app-only credentials, consistent with the SharePoint / Teams precedent. - Binary download of file contents — the sync layer emits `Document`s carrying `webUrl` + metadata; bytes are hydrated downstream by the parse pipeline. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-29 19:26:06 +08:00
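The filtering and per-drive checkpoint rules above can be sketched in Python. This assumes raw Microsoft Graph `driveItem` dictionaries from a delta page; `should_ingest`, `record_delta_link`, and the plain-dict checkpoint are hypothetical, not the `OneDriveConnector` / `OneDriveCheckpoint` API:

```python
from pathlib import PurePosixPath

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".txt", ".md", ".csv"}


def should_ingest(item: dict) -> bool:
    """Keep files with a supported extension; skip folders and soft-deleted items."""
    # In Graph driveItem JSON, folders carry a "folder" facet and removed items a "deleted" facet.
    if "folder" in item or "deleted" in item:
        return False
    return PurePosixPath(item.get("name", "")).suffix.lower() in SUPPORTED_EXTENSIONS


def record_delta_link(delta_links: dict, drive_id: str, page: dict) -> None:
    # Only the last page of a delta round carries @odata.deltaLink (earlier pages
    # carry @odata.nextLink); saving it per drive lets the next sync resume there.
    link = page.get("@odata.deltaLink")
    if link:
        delta_links[drive_id] = link
```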
buua436	bd6251f462	Fix: default OpenAI chat completions to non-stream (#15394 ) ### What problem does this PR solve? default OpenAI chat completions to non-stream when `stream` is omitted https://github.com/infiniflow/ragflow/issues/15356 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 17:47:47 +08:00
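A minimal sketch of the defaulting rule, assuming the request body has already been parsed into a dict; `resolve_stream_flag` is a hypothetical helper:

```python
def resolve_stream_flag(body: dict) -> bool:
    """Return True only when the request explicitly sets "stream": true.

    An omitted field falls back to non-streaming, matching the OpenAI Chat
    Completions default. Requiring the JSON boolean true (rather than any
    truthy value) is an assumption of this sketch.
    """
    return body.get("stream", False) is True
```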
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
glorydavid03023	b79f79d9b9	fix(go-models): harden Novita default transport handling (#15350 ) ## Summary - Harden `NewNovitaModel` to avoid panics when `http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper. - Fall back to a safe transport (`ProxyFromEnvironment`) while preserving existing pooling/timeout settings. Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-29 14:28:46 +08:00
bitloi	ea3a5dba11	fix: validate custom model inputs (#15200 ) ### What problem does this PR solve? Closes #15199. The add-custom-model endpoint is routed through `/api/v1/providers/:provider_name/instances/:instance_name/models`, but the handler previously trusted `provider_name` and `instance_name` from the JSON body instead of the path target. A request could therefore hit one provider/instance URL while operating on a different body provider/instance. The same handler only rejected `model_types` when the slice was nil. An empty array passed validation and reached `ModelProviderService.AddCustomModel`, where `request.ModelTypes[0]` could panic. This PR makes the path provider/instance authoritative, rejects mismatched body values, rejects missing or empty `model_types`, and adds a service-level guard so direct service callers cannot hit the same panic path. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 10:15:01 +08:00
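A minimal Python sketch of the validation order described above (the actual handler is Go). `BadRequest` and `validate_add_custom_model` are hypothetical names, and treating an empty body value as absent is an assumption:

```python
class BadRequest(ValueError):
    """Hypothetical 400-style error."""


def validate_add_custom_model(path_provider: str, path_instance: str, body: dict) -> dict:
    # The URL path is authoritative; body copies are optional but must match it.
    for key, path_value in (("provider_name", path_provider), ("instance_name", path_instance)):
        body_value = body.get(key)
        if body_value not in (None, "", path_value):
            raise BadRequest(f"{key} in body ({body_value!r}) does not match path ({path_value!r})")
    model_types = body.get("model_types")
    if not isinstance(model_types, list) or not model_types:
        # Rejects both a missing field and [] before anything indexes model_types[0].
        raise BadRequest("model_types must be a non-empty list")
    return {**body, "provider_name": path_provider, "instance_name": path_instance}
```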
web-dev0521	550bdf215c	feat(go-api): implement tenant member management (issue #15294 ) (#15295 ) ## Summary Ports the Python `tenant_api` team/member management endpoints to Go, adding 4 endpoints under `/api/v1/tenants/:tenant_id/`: - `GET /tenants/:tenant_id/users` — list non-owner members with user details (owner only) - `POST /tenants/:tenant_id/users` — invite a user by email; creates invite-role join record (owner only) - `DELETE /tenants/:tenant_id/users` — remove a member by `user_id`; owner can remove anyone, members can remove themselves - `PATCH /tenants/:tenant_id` — accept a pending invitation, transitioning role `invite → normal` Closes #15294	2026-05-29 10:13:09 +08:00
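The authorization rules for removal and invitation acceptance reduce to two small checks. A minimal Python sketch (the endpoints themselves are Go); the function names and membership dict shape are hypothetical, and requiring the caller to be the invited user is an assumption not stated in the commit:

```python
def can_remove_member(caller_id: str, owner_id: str, target_id: str) -> bool:
    # The tenant owner may remove anyone; any other member may remove only themselves.
    return caller_id == owner_id or caller_id == target_id


def accept_invitation(membership: dict, caller_id: str) -> dict:
    # Only a pending invite can be accepted; the role moves from "invite" to "normal".
    if membership["user_id"] != caller_id or membership["role"] != "invite":
        raise PermissionError("no pending invitation for this user")
    return {**membership, "role": "normal"}
```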
Haruko386	834236a3ec	feat[Go]: implement /api/v1/system/status GET (#15348 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-29 10:12:12 +08:00
oktofeesh	58eb957c30	fix(go-models): harden JieKouAI driver requests (#15337 ) ## Summary - Harden JieKouAI request validation before outbound provider calls - Force non-streaming and streaming chat methods to use their expected stream modes - Make model listing use a bodyless GET and parse model responses without panics Closes #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-29 10:09:27 +08:00
nickmopen	e023c165b6	Fix(kb): enforce tenant authorization on UpdateMetadataSetting (#15268 ) (#15270 ) ## Summary Closes #15268. The `UpdateMetadataSetting` handler at `internal/handler/kb.go:126` retrieved the authenticated user via `GetUser(c)` but discarded the user object (`_, errorCode, errorMessage := GetUser(c)`), then forwarded the caller-supplied `kb_id` straight to the service layer with no ownership check. Any authenticated user could mutate the `parser_config` / metadata of any knowledge base in the system by guessing or harvesting a `kb_id` — a classic IDOR (CWE-284, OWASP A01). This is the only handler in `internal/handler/kb.go` missing the check; every sibling (`ListTags`, `ListTagsFromKbs`, `RenameTag`, `KnowledgeGraph`, `DeleteKnowledgeGraph`, `GetMeta`, `GetBasicInfo`) already calls `h.kbService.Accessible(kbID, user.ID)`. The same defensive check on the document preview endpoint was added in PR #14625 — this PR closes the matching gap on the KB metadata endpoint. --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-29 10:08:55 +08:00
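The missing step was an access check keyed on the authenticated user before the write. A minimal Python sketch of that pattern, where `accessible` only mimics the role of `h.kbService.Accessible(kbID, user.ID)` using a hypothetical tenant-membership model:

```python
from typing import Dict, Set


def accessible(kb_tenant: Dict[str, str], user_tenants: Dict[str, Set[str]],
               kb_id: str, user_id: str) -> bool:
    # Hypothetical model: a KB is reachable if its tenant is one the user belongs to.
    tenant_id = kb_tenant.get(kb_id)
    return tenant_id is not None and tenant_id in user_tenants.get(user_id, set())


def update_metadata_setting(kbs: dict, kb_tenant: Dict[str, str], user_tenants: Dict[str, Set[str]],
                            user_id: str, kb_id: str, parser_config: dict) -> None:
    # Authorize against the authenticated user before touching the caller-supplied kb_id.
    if not accessible(kb_tenant, user_tenants, kb_id, user_id):
        raise PermissionError("no authorization for this knowledge base")
    kbs[kb_id]["parser_config"] = parser_config
```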
glorydavid03023	7fc909acc9	fix(go-models): harden ModelScope default transport handling (#15339 ) ## Summary - Harden `NewModelScopeModel` to avoid panics when `http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper. - Fall back to a safe transport (`ProxyFromEnvironment`) while preserving existing pooling/timeout settings. - Add `TestModelScopeNewModelWithCustomDefaultTransport` regression coverage. Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-28 19:41:11 +08:00
web-dev0521	0a7662cf3e	feat(go-api): implement GET /api/v1/agents list endpoint (issue #15328 ) (#15329 ) ## Summary Closes: #15328 - Implements `GET /api/v1/agents` — the agent/canvas listing endpoint needed to complete the Home dashboard tile in `web/src/pages/home/`. - Mirrors Python `api/apps/restful_apis/agent_api.py::list_agents` exactly: tenant-join auth, optional `owner_ids` guard, keyword filter, pagination, ordering, and `canvas_category` filter (default: `agent_canvas`). - Scope: read-only list only. Full agent CRUD and canvas runtime are explicitly out of scope (separate slice of #15240).	2026-05-28 19:40:54 +08:00
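The list semantics (category filter, keyword match, ordering, pagination) can be sketched as below. Apart from the `agent_canvas` default, the parameter names and defaults are assumptions, and the tenant-join auth and `owner_ids` guard are omitted:

```python
from typing import List, Tuple


def list_agents(
    canvases: List[dict],
    keywords: str = "",
    page: int = 1,
    page_size: int = 30,
    orderby: str = "update_time",
    desc: bool = True,
    canvas_category: str = "agent_canvas",
) -> Tuple[List[dict], int]:
    """Filter, order, and paginate canvases; returns (page_rows, total_matches)."""
    needle = keywords.lower()
    rows = [
        c for c in canvases
        if c.get("canvas_category") == canvas_category and needle in c.get("title", "").lower()
    ]
    rows.sort(key=lambda c: c[orderby], reverse=desc)
    start = (max(page, 1) - 1) * page_size
    return rows[start:start + page_size], len(rows)
```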
web-dev0521	f80ec17fc5	feat(go-api): implement connector (data source) management endpoints (#15274 ) ## Summary Ports the connector (data source) management endpoints that power `web/src/pages/user-setting/data-source/` from Python (`api/apps/restful_apis/connector_api.py`) to Go. Previously only `GET /connectors` (list) was implemented in Go; this adds the rest of the lifecycle. Closes #15273 (subtask of #15240). ## Endpoints implemented All under base path `/api/v1` (mirrors the Python routes): \| Method \| Path \| Description \| \|--------\|------\|-------------\| \| POST \| `/connectors/{connector_id}/test` \| Validate stored credentials \| `GET /connectors` (list) was already present and is unchanged. --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-28 19:40:15 +08:00
web-dev0521	98bc9ca6ac	feat: implement Microsoft Teams data source connector (#15193 ) ### What problem does this PR solve? Closes #15191. RAGFlow shipped a Microsoft Teams connector stub (`common/data_source/teams_connector.py`) whose document-loading methods all returned `[]`, `Teams._generate()` was a `pass`, and Teams was commented out of the data-source settings UI. As a result there was no way to index Teams channel conversations into a knowledge base. This PR implements the connector end to end on top of Microsoft Graph (Office365-REST-Python-Client). It shares the MSAL client-credentials auth shape with the SharePoint connector. Backend - `common/data_source/teams_connector.py` - `load_credentials()` now builds the Graph client using an MSAL client-credentials token callback — the form `GraphClient` actually expects. (The previous stub passed a raw access-token string to `GraphClient(...)`, which is not how that client is driven.) Token acquisition is lazy, so credential loading performs no network call. - `validate_connector_settings()` lists teams via Graph. - `load_from_checkpoint()` is now a generator that pages teams → channels → messages, flattens each top-level post together with its replies into one blob-based `Document` (`extension` `.txt`/`.html`, `blob`, `size_bytes`, `doc_updated_at`). Incremental syncs are bounded by message `lastModifiedDateTime` (falling back to `createdDateTime`). Per-message errors surface as `ConnectorFailure` instead of aborting the run. - `retrieve_all_slim_docs_perm_sync()` yields id-only `SlimDocument` batches and the checkpoint helpers return proper `TeamsCheckpoint`s. - ACL → `ExternalAccess` mapping is intentionally left best-effort (`load_from_checkpoint_with_perm_sync` delegates to the standard load) because the sync pipeline does not currently persist `ExternalAccess`. 
- `rag/svr/sync_data_source.py` - Implemented `Teams._generate()` using the existing `CheckpointOutputWrapper` pattern (same shape as Confluence/Jira/Google Drive), supporting full reindex and incremental polling from `poll_range_start`. - `TeamsConnector` is already exported from `common/data_source/__init__.py`. Frontend (`web/`) - Enabled the `TEAMS` data-source enum and added its form fields (`tenant_id`, `client_id`, `client_secret`), default values, display metadata, and a Teams icon. - Added `teamsDescription` / `teamsTenantIdTip` to `en.ts` and `zh.ts`. Tests - `test/unit_test/data_source/test_teams_connector_unit.py`: mock-based unit tests covering credential loading (incomplete creds raise, happy path sets the Graph client, fetch-without-creds raises), post/reply flattening (incl. the HTML vs text extension), incremental `lastModifiedDateTime` filtering, and slim-doc listing. All 6 pass; `ruff check` is clean. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-28 17:10:38 +08:00
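The incremental cutoff and post/reply flattening can be sketched with the standard library, assuming Graph `chatMessage` dictionaries whose `body` holds `contentType` and `content`. Every function name here is hypothetical and not part of `TeamsConnector`:

```python
from datetime import datetime, timezone
from typing import List


def parse_graph_time(value: str) -> datetime:
    # Graph returns UTC timestamps such as "2026-05-28T09:00:00Z";
    # fromisoformat before Python 3.11 rejects a bare "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def message_time(message: dict) -> datetime:
    # Prefer lastModifiedDateTime so edited posts are re-synced; fall back to createdDateTime.
    return parse_graph_time(message.get("lastModifiedDateTime") or message["createdDateTime"])


def changed_since(messages: List[dict], start: datetime) -> List[dict]:
    # `start` must be timezone-aware, e.g. datetime(2026, 5, 1, tzinfo=timezone.utc).
    return [m for m in messages if message_time(m) >= start]


def flatten_thread(post: dict, replies: List[dict]) -> dict:
    """Fold a top-level post and its replies into one blob-based document."""
    parts = [post["body"]["content"]] + [r["body"]["content"] for r in replies]
    extension = ".html" if post["body"].get("contentType") == "html" else ".txt"
    blob = "\n\n".join(parts).encode("utf-8")
    return {"id": post["id"], "extension": extension, "blob": blob, "size_bytes": len(blob)}
```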
glorydavid03023	b7d88f0b09	fix(go-models): harden Voyage default transport handling (#15341 ) ## Summary - Harden `NewVoyageModel` to avoid panics when `http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper. - Fall back to a safe transport (`ProxyFromEnvironment`) while preserving existing pooling/timeout settings. - Add `TestVoyageNewModelWithCustomDefaultTransport` regression coverage. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 16:46:58 +08:00
glorydavid03023	ff9aa4e2c7	fix(go-models): harden LongCat default transport handling (#15340 ) ## Summary - Harden `NewLongCatModel` to avoid panics when `http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper. - Fall back to a safe transport (`ProxyFromEnvironment`) while preserving existing pooling/timeout settings. - Add `TestLongCatNewModelWithCustomDefaultTransport` regression coverage. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 16:45:59 +08:00

6479 Commits