mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
### What problem does this PR solve?

Incremental Seafile sync only ingests files whose modification time falls in the poll window; documents removed in Seafile were never removed from the knowledge base. This contributes to [#14362](https://github.com/infiniflow/ragflow/issues/14362) (datasource “sync deleted files” coordination).

This PR adds a **slim snapshot** (`retrieve_all_slim_docs_perm_sync`) that enumerates current remote file IDs **without downloading content**, using the same logical IDs as full ingest (`seafile:{repo_id}:{file_id}`). When **`sync_deleted_files`** is enabled on incremental runs, **`SeaFile._generate`** returns **`(document_generator, file_list)`** so **`SyncBase`** can run **`cleanup_stale_documents_for_task`** and remove stale KB documents.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### What changed

- **`common/data_source/seafile_connector.py`**: `SeaFileConnector` implements **`SlimConnectorWithPermSync`**; **`_list_files_recursive(..., filter_by_mtime=...)`** supports full-tree listing for snapshots; **`retrieve_all_slim_docs_perm_sync()`** reuses the same library/root scan as ingest and applies the same **size** ceiling; logging for snapshot start/end and counts.
- **`rag/svr/sync_data_source.py`**: **`SeaFile._generate`** validates **`batch_size`**, captures **`end_ts`** before snapshot + **`poll_source`**, wraps slim retrieval in **`try`/`except`** (**`file_list = None`** on failure so ingest continues), returns **`(generator, file_list)`**.
- **`web/src/pages/user-setting/data-source/constant/index.tsx`**: **`syncDeletedFiles`** for Seafile in **`DataSourceFeatureVisibilityMap`**.
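
### Illustrative sketch

The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `make_doc_id`, `generate`, `stale_ids`, and their parameters are hypothetical stand-ins for `SeaFile._generate` and `cleanup_stale_documents_for_task`, whose real signatures are not shown here. It also assumes that a `file_list` of `None` means "skip cleanup", which seems the safe reading of "ingest continues" on snapshot failure.

```python
import logging
from typing import Callable, Iterable, Iterator, Optional, Set, Tuple

logger = logging.getLogger(__name__)


def make_doc_id(repo_id: str, file_id: str) -> str:
    # Same logical ID shape the PR describes for full ingest and snapshot.
    return f"seafile:{repo_id}:{file_id}"


def generate(
    poll_source: Callable[[], Iterator[str]],
    list_remote_ids: Callable[[], Iterable[Tuple[str, str]]],
    sync_deleted_files: bool,
) -> Tuple[Iterator[str], Optional[Set[str]]]:
    """Hypothetical stand-in for SeaFile._generate.

    Returns (document_generator, file_list). file_list stays None when
    deletion sync is disabled or when the slim snapshot fails.
    """
    file_list: Optional[Set[str]] = None
    if sync_deleted_files:
        try:
            # Enumerate remote IDs only; no file content is downloaded.
            file_list = {make_doc_id(r, f) for r, f in list_remote_ids()}
        except Exception:
            # A failed snapshot must not block ingest.
            logger.exception("slim snapshot failed; skipping stale cleanup")
            file_list = None
    return poll_source(), file_list


def stale_ids(kb_ids: Iterable[str], file_list: Optional[Set[str]]) -> Set[str]:
    """Hypothetical core of the cleanup step: KB docs absent from the snapshot.

    Assumption: with no snapshot (None) nothing is treated as stale, so a
    transient listing error never deletes knowledge base documents.
    """
    if file_list is None:
        return set()
    return set(kb_ids) - file_list
```

The point of returning `None` rather than an empty set on failure is that an empty snapshot would otherwise mark every KB document as stale.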