mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
### What problem does this PR solve?

Closes #15330.

RAGFlow had no connector for OneDrive / OneDrive for Business. Users who store working documents in OneDrive could not index them into a knowledge base without manually downloading and re-uploading files.

This PR adds a net-new OneDrive data source that:

- Authenticates against Microsoft Graph with the same MSAL client-credentials flow already used by the SharePoint and Teams connectors (no new auth primitives).
- Enumerates every drive visible to the service principal and pages through `/drives/{id}/root/delta`, persisting `@odata.deltaLink` values per drive so subsequent syncs only fetch changed items.
- Optionally narrows ingestion to a sub-folder (`folder_path`) without needing a separate code path.
- Surfaces typed errors on the validation probe (`GET /drives?$top=1`): 401 → `ConnectorMissingCredentialError`, 403 → `InsufficientPermissionsError` (with a `Files.Read.All` hint), 5xx → `UnexpectedValidationError`.
- Filters out folders, soft-deleted items, and files with unsupported extensions (supported: `.pdf .docx .doc .xlsx .xls .pptx .ppt .txt .md .csv`).

#### Files

| File | Change |
|------|--------|
| `common/data_source/onedrive_connector.py` | **New**: `OneDriveConnector` + `OneDriveCheckpoint`. |
| `common/data_source/config.py` | `DocumentSource.ONEDRIVE = "onedrive"`. |
| `common/constants.py` | `FileSource.ONEDRIVE = "onedrive"`. |
| `common/data_source/__init__.py` | Export `OneDriveConnector`. |
| `rag/svr/sync_data_source.py` | `OneDrive(SyncBase)` with `batch_size` normalisation; registered in `func_factory`. |
| `web/src/pages/user-setting/data-source/constant/index.tsx` | `DataSourceKey.ONEDRIVE`, visibility map (`syncDeletedFiles: true`), info entry, form fields (tenant_id, client_id, client_secret, folder_path, batch_size), default values. |
| `web/src/locales/en.ts`, `web/src/locales/zh.ts` | `onedriveDescription` + 4 tooltip keys (EN + ZH). |
| `test/unit_test/data_source/test_onedrive_connector_unit.py` | **New**: 13 unit tests (`p1`/`p2`) covering auth, validation, checkpoint helpers, and document filtering. |

#### Required Azure AD permission

`Files.Read.All` (Application, admin-granted).

#### Out of scope

- Interactive end-user OAuth (delegated permissions): the connector uses app-only credentials, consistent with the SharePoint / Teams precedent.
- Binary download of file contents: the sync layer emits `Document`s carrying `webUrl` + metadata; bytes are hydrated downstream by the parse pipeline.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
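The delta paging and filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code in `onedrive_connector.py`: `fetch_page`, `sync_drive`, `is_indexable`, and the plain `delta_links` dict are hypothetical stand-ins for the connector's authenticated HTTP call and its `OneDriveCheckpoint`, and `folder_path` narrowing is left out. It assumes the usual Microsoft Graph delta response shape (`value`, `@odata.nextLink`, `@odata.deltaLink`) and the `folder` / `deleted` facets on drive items.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterator

# Extension allow-list quoted from the PR description.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls",
    ".pptx", ".ppt", ".txt", ".md", ".csv",
}


def is_indexable(item: dict) -> bool:
    """Keep only live files whose extension is in the allow-list."""
    # Graph marks folders with a `folder` facet and removed items with `deleted`.
    if "folder" in item or "deleted" in item:
        return False
    suffix = PurePosixPath(item.get("name", "")).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS


def sync_drive(
    drive_id: str,
    fetch_page: Callable[[str], dict],
    delta_links: dict[str, str],
) -> Iterator[dict]:
    """Yield indexable items for one drive and record its new deltaLink.

    `fetch_page` (hypothetical) performs an authenticated GET and returns the
    decoded JSON body; `delta_links` stands in for per-drive checkpoint state.
    """
    # Resume from the stored deltaLink if there is one, else enumerate from scratch.
    url = delta_links.get(drive_id) or (
        f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/delta"
    )
    while url:
        page = fetch_page(url)
        for item in page.get("value", []):
            if is_indexable(item):
                yield item
        # The last page of a delta round carries the link for the next sync.
        if "@odata.deltaLink" in page:
            delta_links[drive_id] = page["@odata.deltaLink"]
        url = page.get("@odata.nextLink")
```

Because `sync_drive` is a generator, the stored link is only updated once the caller has consumed the final page, so an interrupted sync in this sketch would restart from the previous `@odata.deltaLink`.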
(1). Deploy RAGFlow services and images
https://ragflow.io/docs/build_docker_image
(2). Configure the required environment for testing
Install Python dependencies (including test dependencies):
uv sync --python 3.13 --only-group test --no-default-groups --frozen
Activate the environment:
source .venv/bin/activate
Install SDK:
uv pip install sdk/python
Modify the .env file by adding the following lines:
COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.25.6 # Replace with the image you are using
Start the containers (wait about two minutes for the services to come up):
docker compose -f docker/docker-compose.yml up -d
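Instead of waiting a fixed two minutes, one option is to poll the API port until it accepts connections. The helper below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical name, and 9380 is assumed to be the mapped API port (the same one used in `HOST_ADDRESS` in the test steps). An open port only shows that something is listening, not that every service has finished initialising.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 9380)` blocks for up to three minutes and returns `False` if the port never opens.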
(3). Test Elasticsearch
a) Run SDK tests against Elasticsearch:
export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380 # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Elasticsearch:
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
(4). Test Infinity
Modify the .env file:
DOC_ENGINE=${DOC_ENGINE:-infinity}
Recreate the containers (`down -v` also removes the existing volumes):
docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d
a) Run SDK tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
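The four pytest invocations in steps (3) and (4) differ only in the suite path and in whether `DOC_ENGINE` is set. A small driver can make that matrix explicit; the sketch below is only an illustration, with `build_run`, `plan`, and `run_all` as hypothetical helper names, and it assumes the containers for the chosen engine are already running.

```python
import os
import subprocess
from typing import Optional

# Suite paths taken from the commands above.
SUITES = ("test/testcases/test_sdk_api", "test/testcases/test_http_api")


def build_run(suite: str, level: str = "p2", doc_engine: Optional[str] = None):
    """Return (argv, extra_env) for one pytest invocation."""
    argv = ["pytest", "-s", "--tb=short", f"--level={level}", suite]
    env = {"HTTP_API_TEST_LEVEL": level, "HOST_ADDRESS": "http://127.0.0.1:9380"}
    if doc_engine:
        # Elasticsearch runs leave DOC_ENGINE unset, as in step (3).
        env["DOC_ENGINE"] = doc_engine
    return argv, env


def plan(doc_engine: Optional[str] = None, level: str = "p2"):
    """Build the SDK and HTTP API runs for one document engine."""
    return [build_run(suite, level, doc_engine) for suite in SUITES]


def run_all(doc_engine: Optional[str] = None) -> None:
    """Execute both suites, stopping at the first failing run."""
    for argv, env in plan(doc_engine):
        subprocess.run(argv, env={**os.environ, **env}, check=True)
```

Calling `run_all()` would cover step (3) and `run_all("infinity")` step (4), once the containers have been recreated for that engine.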