ragflow/rag/utils
057806d7f1 fix: prepend bucket prefix to Azure SPN and SAS storage paths (#14185)
## Summary

Fixes #14159 — files from different datasets can overwrite each other in
Azure Blob storage.

## Problem

Both `azure_spn_conn.py` and `azure_sas_conn.py` ignore the `bucket`
parameter in all storage operations (`put`, `get`, `rm`, `obj_exist`,
`get_presigned_url`). Files are stored flat using only the filename, so
two datasets containing a file with the same name will overwrite each
other.

The MinIO and S3 implementations correctly use the bucket (typically the
knowledge base ID) as a path prefix to create logical folder isolation:
- MinIO: uses `use_prefix_path` decorator → `{orig_bucket}/{fnm}`
- S3: uses `use_prefix_path` decorator → `{prefix_path}/{bucket}/{fnm}`

## Fix

Prepend `{bucket}/` to the file path in all 5 operations across both
Azure connector files:

| File | Methods fixed |
|------|---------------|
| `azure_spn_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |
| `azure_sas_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |

This matches the existing convention where `bucket` is the knowledge
base ID used as a directory prefix.
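
The effect of the prefix can be sketched with an in-memory stand-in for a blob container. This is an illustration only, not the connector code: `prefixed_path` and `FakeContainer` are hypothetical names, and no Azure SDK calls are involved. It only shows that keying objects by `{bucket}/{fnm}` keeps same-named files from different datasets apart.

```python
# Illustrative sketch only. `prefixed_path` and `FakeContainer` are
# hypothetical names and do not appear in the RAGFlow connectors.

def prefixed_path(bucket, fnm):
    """Build the object path as `{bucket}/{fnm}`."""
    return f"{bucket}/{fnm}"


class FakeContainer:
    """A dict-backed stand-in for a storage container."""

    def __init__(self):
        self._objects = {}

    def put(self, bucket, fnm, binary):
        # Every operation goes through the same path helper.
        self._objects[prefixed_path(bucket, fnm)] = binary

    def get(self, bucket, fnm):
        return self._objects.get(prefixed_path(bucket, fnm))

    def obj_exist(self, bucket, fnm):
        return prefixed_path(bucket, fnm) in self._objects

    def rm(self, bucket, fnm):
        self._objects.pop(prefixed_path(bucket, fnm), None)


store = FakeContainer()
# Two datasets upload a file with the same name.
store.put("kb_a", "report.pdf", b"A")
store.put("kb_b", "report.pdf", b"B")
```

With flat storage both uploads would land on the key `report.pdf` and the second would replace the first. With the prefix they are stored as `kb_a/report.pdf` and `kb_b/report.pdf`, and removing one leaves the other in place.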

## ⚠️ Migration Note

Existing Azure SPN/SAS deployments have files stored without the bucket
prefix. After this fix, new files are stored under `{bucket}/{filename}`
while existing files remain at `{filename}`, so lookups that now use the
prefixed path will not find them until they are moved. A one-time
migration script or manual file move may be needed for existing
deployments. New deployments are unaffected.
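
A first step for such a migration could be to compute the list of moves before touching storage. The sketch below is a hedged illustration, not a shipped script: `plan_migration` is a hypothetical helper, and it assumes the owning bucket for each flat filename can be looked up from application metadata (passed in here as the `owner_of` dict). Names with no known owner are reported rather than guessed.

```python
# Hypothetical planning helper; it performs no storage operations.

def plan_migration(flat_names, owner_of):
    """Return (moves, unresolved) for files stored without a bucket prefix.

    flat_names: iterable of object names currently stored as `{filename}`.
    owner_of:   dict mapping filename -> bucket (assumed to come from
                application metadata; this lookup is an assumption).
    """
    moves = []
    unresolved = []
    for name in sorted(flat_names):
        bucket = owner_of.get(name)
        if bucket is None:
            # Owner unknown: leave for manual review instead of guessing.
            unresolved.append(name)
            continue
        moves.append((name, f"{bucket}/{name}"))
    return moves, unresolved


moves, unresolved = plan_migration(
    ["a.pdf", "b.txt", "orphan.bin"],
    {"a.pdf": "kb1", "b.txt": "kb2"},
)
```

Each pair in `moves` is an `(old_path, new_path)` tuple that a separate step could apply as a copy followed by a delete. Where two datasets had already written the same filename before the fix, only one flat copy exists, so a plan like this can assign it to at most one bucket.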

## Testing

- Verified the fix is consistent across all 5 methods in both files
- The `health()` method is intentionally left unchanged as it uses a
hardcoded test filename without bucket semantics

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-07 20:48:32 +08:00