mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
### What problem does this PR solve?

Closes #15332.

RAGFlow can index Gmail and generic IMAP mailboxes but had no native connector for Outlook / Microsoft 365 mail. Organisations on Microsoft 365 had no way to bring mailbox content into a knowledge base through Microsoft Graph.

This PR adds a net-new Outlook data source that:

- Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives).
- Pages over `/users/{id}/mailFolders/{folder}/messages/delta` per mailbox and persists `@odata.deltaLink` values in `OutlookCheckpoint.delta_links`, so incremental syncs only fetch changed messages.
- Supports two scoping modes:
  - **Tenant-wide** (default): enumerates every user in the tenant via `/users` and syncs each mailbox. Requires `User.Read.All`.
  - **Targeted**: when `user_ids` is provided (comma-separated UPNs or object IDs), only those mailboxes are synced. `User.Read.All` is not needed in this mode.
- Lets the caller pick the mail folder (`inbox`, `sentitems`, `archive`, ...). Defaults to `inbox`.
- Maps each message to a `Document` shaped after the Gmail connector: one `TextSection` carrying `From/To/Cc/Subject` headers + body, with HTML bodies stripped to text inline (no extra dependency).
- Surfaces typed errors on the validation probe: 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with `Mail.Read` / `User.Read.All` hint), 404 on a configured mailbox → `ConnectorValidationError`, 5xx → `UnexpectedValidationError`.
- Skips messages flagged `@removed` by the delta semantics and messages whose `receivedDateTime` is older than `poll_range_start`.

#### Files

| File | Change |
|------|--------|
| `common/data_source/outlook_connector.py` | **New**: `OutlookConnector` (`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`) + `OutlookCheckpoint` + tiny `_strip_html` helper. |
| `common/data_source/config.py` | `DocumentSource.OUTLOOK = "outlook"`. |
| `common/constants.py` | `FileSource.OUTLOOK = "outlook"`. |
| `common/data_source/__init__.py` | Export `OutlookConnector`. |
| `rag/svr/sync_data_source.py` | `Outlook(SyncBase)` with `batch_size` normalisation, CSV/list parsing of `user_ids`; registered in `func_factory`. |
| `web/src/pages/user-setting/data-source/constant/index.tsx` | `DataSourceKey.OUTLOOK`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder, user_ids, batch_size), default values. |
| `web/src/locales/en.ts`, `web/src/locales/zh.ts` | `outlookDescription` + 5 tooltip keys (EN + ZH). |
| `test/unit_test/data_source/test_outlook_connector_unit.py` | **New**: 19 unit tests (`p1`/`p2`/`p3`) covering auth, validation (tenant-wide vs specific user vs error paths), checkpoint helpers, user enumeration pagination, message filtering, HTML body stripping. |

#### Required Azure AD permissions

- `Mail.Read` (Application, admin-granted): always.
- `User.Read.All` (Application, admin-granted): only when `user_ids` is left blank so the connector can enumerate mailboxes.

#### Out of scope

- **Attachment indexing.** The current connector emits message body + headers; binary attachments are flagged via `metadata.has_attachments` but not pulled. Adding attachment hydration is straightforward but scoped out per the issue's "decide whether attachments are indexed in the first version" note.
- **Delegated (per-user) OAuth.** The connector uses app-only credentials, consistent with the SharePoint / Teams precedent in this codebase.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
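The delta-sync behaviour described in the summary above can be illustrated with a small sketch. This is not the code in `outlook_connector.py`; it is a simplified stand-in, assuming the standard Microsoft Graph delta response shape (`value`, `@odata.nextLink`, `@odata.deltaLink`). The function name `sync_mailbox`, the injected `fetch_json` callable, and the `"user:folder"` checkpoint key are hypothetical choices made for illustration, and the fake pages stand in for real Graph responses.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def delta_url(user_id: str, folder: str) -> str:
    """Initial delta endpoint for one mailbox folder."""
    return f"{GRAPH_BASE}/users/{user_id}/mailFolders/{folder}/messages/delta"


def parse_graph_ts(value: str) -> datetime:
    """Parse a Graph timestamp such as 2024-05-01T10:00:00Z (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sync_mailbox(
    fetch_json: Callable[[str], dict],  # hypothetical: returns the parsed JSON body for a URL
    user_id: str,
    folder: str,
    delta_links: Dict[str, str],  # stands in for OutlookCheckpoint.delta_links
    poll_range_start: Optional[datetime] = None,
) -> List[dict]:
    """Page through a delta query, filter messages, and store the new delta link."""
    key = f"{user_id}:{folder}"  # assumed key format, for illustration only
    # Resume from a stored delta link if one exists, otherwise start a full pass.
    url: Optional[str] = delta_links.get(key) or delta_url(user_id, folder)
    kept: List[dict] = []
    while url:
        page = fetch_json(url)
        for msg in page.get("value", []):
            if "@removed" in msg:
                continue  # deleted or moved out of the folder since the last sync
            received = msg.get("receivedDateTime")
            if poll_range_start and received and parse_graph_ts(received) < poll_range_start:
                continue  # older than the requested polling window
            kept.append(msg)
        if "@odata.deltaLink" in page:
            # The final page carries the link to use for the next incremental sync.
            delta_links[key] = page["@odata.deltaLink"]
        url = page.get("@odata.nextLink")
    return kept


# Fake two-page response used instead of a network call.
first = delta_url("alice@example.com", "inbox")
pages = {
    first: {
        "value": [
            {"id": "1", "receivedDateTime": "2024-05-01T10:00:00Z"},
            {"id": "2", "@removed": {"reason": "deleted"}},
        ],
        "@odata.nextLink": "page-2",
    },
    "page-2": {
        "value": [{"id": "3", "receivedDateTime": "2020-01-01T00:00:00Z"}],
        "@odata.deltaLink": "delta-token-1",
    },
}
checkpoint: Dict[str, str] = {}
messages = sync_mailbox(
    pages.__getitem__,
    "alice@example.com",
    "inbox",
    checkpoint,
    poll_range_start=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

In this run only message `1` survives filtering, and the checkpoint ends up holding the delta link from the last page, so a later call would start from that link rather than from the full folder listing.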
97 lines
3.4 KiB
Python
"""
Thanks to https://github.com/onyx-dot-app/onyx

Content of this directory is under the "MIT Expat" license as defined below.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

from .blob_connector import BlobStorageConnector
from .rss_connector import RSSConnector
from .slack_connector import SlackConnector
from .gmail_connector import GmailConnector
from .notion_connector import NotionConnector
from .confluence_connector import ConfluenceConnector
from .discord_connector import DiscordConnector
from .dropbox_connector import DropboxConnector
from .google_drive.connector import GoogleDriveConnector
from .jira.connector import JiraConnector
from .sharepoint_connector import SharePointConnector
from .onedrive_connector import OneDriveConnector
from .outlook_connector import OutlookConnector
from .teams_connector import TeamsConnector
from .moodle_connector import MoodleConnector
from .airtable_connector import AirtableConnector
from .dingtalk_ai_table_connector import DingTalkAITableConnector
from .asana_connector import AsanaConnector
from .imap_connector import ImapConnector
from .zendesk_connector import ZendeskConnector
from .seafile_connector import SeaFileConnector
from .rdbms_connector import RDBMSConnector
from .webdav_connector import WebDAVConnector
from .rest_api_connector import RestAPIConnector
from .config import BlobType, DocumentSource
from .models import Document, TextSection, ImageSection, BasicExpertInfo
from .exceptions import (
    ConnectorMissingCredentialError,
    ConnectorValidationError,
    CredentialExpiredError,
    InsufficientPermissionsError,
    UnexpectedValidationError
)

__all__ = [
    "BlobStorageConnector",
    "RSSConnector",
    "SlackConnector",
    "GmailConnector",
    "NotionConnector",
    "ConfluenceConnector",
    "DiscordConnector",
    "DropboxConnector",
    "GoogleDriveConnector",
    "JiraConnector",
    "SharePointConnector",
    "OneDriveConnector",
    "OutlookConnector",
    "TeamsConnector",
    "MoodleConnector",
    "BlobType",
    "DocumentSource",
    "Document",
    "TextSection",
    "ImageSection",
    "BasicExpertInfo",
    "ConnectorMissingCredentialError",
    "ConnectorValidationError",
    "CredentialExpiredError",
    "InsufficientPermissionsError",
    "UnexpectedValidationError",
    "AirtableConnector",
    "AsanaConnector",
    "ImapConnector",
    "ZendeskConnector",
    "SeaFileConnector",
    "RDBMSConnector",
    "WebDAVConnector",
    "DingTalkAITableConnector",
    "RestAPIConnector",
]