Mirror of https://github.com/infiniflow/ragflow.git
## Summary

Fixes #14159: files from different datasets can overwrite each other in Azure Blob storage.

## Problem

Both `azure_spn_conn.py` and `azure_sas_conn.py` ignore the `bucket` parameter in every storage operation (`put`, `get`, `rm`, `obj_exist`, `get_presigned_url`). Blobs are named using only the filename, in a single flat namespace. As a result, if two datasets each contain a file with the same name, the later upload overwrites the earlier one.

The MinIO and S3 implementations correctly use the bucket (typically the knowledge base ID) as a path prefix, which gives each dataset its own logical folder:

- MinIO: uses the `use_prefix_path` decorator → `{orig_bucket}/{fnm}`
- S3: uses the `use_prefix_path` decorator → `{prefix_path}/{bucket}/{fnm}`

## Fix

Prepend `{bucket}/` to the file path in all 5 operations across both Azure connector files:

| File | Methods fixed |
|------|---------------|
| `azure_spn_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |
| `azure_sas_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |

This matches the existing convention, in which `bucket` is the knowledge base ID and serves as a directory prefix.

## ⚠️ Migration Note

Existing Azure SPN/SAS deployments store files without the bucket prefix. After this fix, all five operations address `{bucket}/{filename}`. New files are therefore written there, while existing files remain at `{filename}`. Those existing files will no longer be found by `get`, `obj_exist`, or `rm`. Existing deployments may need a one-time migration script or a manual file move. New deployments are unaffected.

## Testing

- Verified that all 5 methods in both files apply the `{bucket}/` prefix consistently.
- The `health()` method is intentionally left unchanged, because it uses a hardcoded test filename that has no bucket semantics.

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
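The prefixing rule from the Fix section can be modeled without the Azure SDK. The sketch below is a hypothetical, self-contained illustration and is not the code in `azure_spn_conn.py` or `azure_sas_conn.py`. It uses three illustrative names:

- `InMemoryBlobStore` stands in for a single Azure Blob container, meaning one flat namespace of blob names.
- `prefix_with_bucket` is an illustrative decorator in the spirit of the `use_prefix_path` decorator mentioned above; it is not that decorator's actual code.
- `migrate_legacy_blob` is a hypothetical helper for the one-time move described in the Migration Note.

```python
import functools


def prefix_with_bucket(method):
    """Hypothetical decorator (not ragflow's use_prefix_path): rewrite the
    object name to '{bucket}/{fnm}' before calling the wrapped method."""

    @functools.wraps(method)
    def wrapper(self, bucket, fnm, *args, **kwargs):
        return method(self, bucket, f"{bucket}/{fnm}", *args, **kwargs)

    return wrapper


class InMemoryBlobStore:
    """Stand-in for one Azure Blob container: a single flat namespace of blob names."""

    def __init__(self):
        self.blobs = {}  # blob name -> bytes

    @prefix_with_bucket
    def put(self, bucket, fnm, binary):
        self.blobs[fnm] = binary  # fnm arrives here as '{bucket}/{filename}'

    @prefix_with_bucket
    def get(self, bucket, fnm):
        return self.blobs.get(fnm)

    @prefix_with_bucket
    def rm(self, bucket, fnm):
        self.blobs.pop(fnm, None)

    @prefix_with_bucket
    def obj_exist(self, bucket, fnm):
        return fnm in self.blobs


def migrate_legacy_blob(store, bucket, fnm):
    """Hypothetical one-time move of a pre-fix blob from '{fnm}' to '{bucket}/{fnm}'.

    The caller must already know which bucket (knowledge base) the file belongs to.
    Returns True if a blob was moved.
    """
    new_name = f"{bucket}/{fnm}"
    if fnm not in store.blobs or new_name in store.blobs:
        return False  # nothing to move, or a post-fix upload already exists
    store.blobs[new_name] = store.blobs.pop(fnm)
    return True


if __name__ == "__main__":
    store = InMemoryBlobStore()
    store.put("kb_a", "report.pdf", b"from A")
    store.put("kb_b", "report.pdf", b"from B")  # no longer overwrites kb_a's copy
    print(store.get("kb_a", "report.pdf"))  # b'from A'
    print(sorted(store.blobs))  # ['kb_a/report.pdf', 'kb_b/report.pdf']

    # A blob written before the fix sits at the bare filename, so prefixed lookups miss it.
    store.blobs["notes.txt"] = b"legacy"
    print(store.obj_exist("kb_a", "notes.txt"))  # False
    migrate_legacy_blob(store, "kb_a", "notes.txt")
    print(store.get("kb_a", "notes.txt"))  # b'legacy'
```

Rewriting the name in one place, either a decorator or a shared path helper, keeps all five operations consistent. That consistency is what the Testing section verifies by hand. The migration helper assumes the caller already knows which bucket each legacy file belongs to; this sketch does not try to infer it.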