mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
### What problem does this PR solve?

Closes #15330.

RAGFlow had no connector for OneDrive / OneDrive for Business. Users who store working documents in OneDrive could not index them into a knowledge base without manually downloading and re-uploading files.

This PR adds a net-new OneDrive data source that:

- Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives).
- Enumerates every drive visible to the service principal and pages through `/drives/{id}/root/delta`, persisting `@odata.deltaLink` values per drive so subsequent syncs only fetch changed items.
- Optionally narrows ingestion to a sub-folder (`folder_path`) without needing a separate code path.
- Surfaces typed errors on the validation probe (`GET /drives?$top=1`): 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with a `Files.Read.All` hint), 5xx → `UnexpectedValidationError`.
- Filters folders, soft-deleted items, and unsupported extensions (`.pdf .docx .doc .xlsx .xls .pptx .ppt .txt .md .csv`).

#### Files

| File | Change |
|------|--------|
| `common/data_source/onedrive_connector.py` | **New**: `OneDriveConnector` + `OneDriveCheckpoint`. |
| `common/data_source/config.py` | `DocumentSource.ONEDRIVE = "onedrive"`. |
| `common/constants.py` | `FileSource.ONEDRIVE = "onedrive"`. |
| `common/data_source/__init__.py` | Export `OneDriveConnector`. |
| `rag/svr/sync_data_source.py` | `OneDrive(SyncBase)` with `batch_size` normalisation; registered in `func_factory`. |
| `web/src/pages/user-setting/data-source/constant/index.tsx` | `DataSourceKey.ONEDRIVE`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder_path, batch_size), default values. |
| `web/src/locales/en.ts`, `web/src/locales/zh.ts` | `onedriveDescription` + 4 tooltip keys (EN + ZH). |
| `test/unit_test/data_source/test_onedrive_connector_unit.py` | **New**: 13 unit tests (`p1`/`p2`) covering auth, validation, checkpoint helpers, and document filtering. |

#### Required Azure AD permission

`Files.Read.All` (Application, admin-granted).

#### Out of scope

- Interactive end-user OAuth (delegated permissions): the connector uses app-only credentials, consistent with the SharePoint / Teams precedent.
- Binary download of file contents: the sync layer emits `Document`s carrying `webUrl` + metadata; bytes are hydrated downstream by the parse pipeline.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
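
The filtering, delta paging, and error mapping described in the summary above can be sketched roughly as follows. This is a minimal illustration and not the code in `onedrive_connector.py`: the helper names `is_indexable`, `walk_delta`, and `error_for_status`, and the injected `fetch_page` callable, are hypothetical, and the exception classes are local stand-ins for the ones exported by `common/data_source`. It assumes the usual Microsoft Graph response shape, where folders carry a `folder` facet, deleted items carry a `deleted` facet, intermediate pages carry `@odata.nextLink`, and the final page carries `@odata.deltaLink`.

```python
import os
from typing import Callable, Optional

# Extensions listed in the PR description.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls",
    ".pptx", ".ppt", ".txt", ".md", ".csv",
}


# Local stand-ins (assumption) for the package's exception types.
class ConnectorMissingCredentialError(Exception): ...
class InsufficientPermissionsError(Exception): ...
class UnexpectedValidationError(Exception): ...


def is_indexable(item: dict) -> bool:
    """Return True for a live file whose extension is supported."""
    if "folder" in item or "deleted" in item:
        return False
    ext = os.path.splitext(item.get("name", ""))[1].lower()
    return ext in SUPPORTED_EXTENSIONS


def walk_delta(
    fetch_page: Callable[[str], dict], start_url: str
) -> tuple[list[dict], Optional[str]]:
    """Follow nextLink pages from start_url.

    Returns the indexable items and the deltaLink to persist for the
    next sync (None if the server never returned one).
    """
    items: list[dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch_page(url)
        items.extend(i for i in page.get("value", []) if is_indexable(i))
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")
    return items, delta_link


def error_for_status(status: int) -> Optional[type]:
    """Map a validation-probe HTTP status to an exception class."""
    if status == 401:
        return ConnectorMissingCredentialError
    if status == 403:
        return InsufficientPermissionsError
    if status >= 500:
        return UnexpectedValidationError
    return None
```

Passing `fetch_page` in as a callable keeps the paging logic independent of the HTTP client, so it can be exercised in unit tests with an in-memory dictionary of pages. On the next sync, the stored delta link would be used as `start_url` for that drive.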
95 lines
3.3 KiB
Python
"""
Thanks to https://github.com/onyx-dot-app/onyx

Content of this directory is under the "MIT Expat" license as defined below.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

from .blob_connector import BlobStorageConnector
from .rss_connector import RSSConnector
from .slack_connector import SlackConnector
from .gmail_connector import GmailConnector
from .notion_connector import NotionConnector
from .confluence_connector import ConfluenceConnector
from .discord_connector import DiscordConnector
from .dropbox_connector import DropboxConnector
from .google_drive.connector import GoogleDriveConnector
from .jira.connector import JiraConnector
from .sharepoint_connector import SharePointConnector
from .onedrive_connector import OneDriveConnector
from .teams_connector import TeamsConnector
from .moodle_connector import MoodleConnector
from .airtable_connector import AirtableConnector
from .dingtalk_ai_table_connector import DingTalkAITableConnector
from .asana_connector import AsanaConnector
from .imap_connector import ImapConnector
from .zendesk_connector import ZendeskConnector
from .seafile_connector import SeaFileConnector
from .rdbms_connector import RDBMSConnector
from .webdav_connector import WebDAVConnector
from .rest_api_connector import RestAPIConnector
from .config import BlobType, DocumentSource
from .models import Document, TextSection, ImageSection, BasicExpertInfo
from .exceptions import (
    ConnectorMissingCredentialError,
    ConnectorValidationError,
    CredentialExpiredError,
    InsufficientPermissionsError,
    UnexpectedValidationError,
)

__all__ = [
    "BlobStorageConnector",
    "RSSConnector",
    "SlackConnector",
    "GmailConnector",
    "NotionConnector",
    "ConfluenceConnector",
    "DiscordConnector",
    "DropboxConnector",
    "GoogleDriveConnector",
    "JiraConnector",
    "SharePointConnector",
    "OneDriveConnector",
    "TeamsConnector",
    "MoodleConnector",
    "BlobType",
    "DocumentSource",
    "Document",
    "TextSection",
    "ImageSection",
    "BasicExpertInfo",
    "ConnectorMissingCredentialError",
    "ConnectorValidationError",
    "CredentialExpiredError",
    "InsufficientPermissionsError",
    "UnexpectedValidationError",
    "AirtableConnector",
    "AsanaConnector",
    "ImapConnector",
    "ZendeskConnector",
    "SeaFileConnector",
    "RDBMSConnector",
    "WebDAVConnector",
    "DingTalkAITableConnector",
    "RestAPIConnector",
]