9 Commits

Author SHA1 Message Date
kpdev
b35266e9a5 Return 4xx when file download storage blob is missing (#15371)
## Summary

Fixes [#15369](https://github.com/infiniflow/ragflow/issues/15369) —
`GET /api/v1/files/<file_id>` calls `make_response(None)` when both
primary and fallback storage lookups return empty, causing HTTP 500.

## Related Issue

Fixes #15369

## Change Type

- [x] Bug fix
- [x] Regression tests

## What Changed

- **`file_api.download()`** — after fallback `STORAGE_IMPL.get`, return
`get_error_data_result(message="This file is empty.")` when `not blob`,
matching document REST download semantics.

## Files Changed

| File | Change |
|------|--------|
| `api/apps/restful_apis/file_api.py` | Empty-blob guard before
`make_response()` |
| `test/testcases/test_web_api/test_file_app/test_file_routes_unit.py` |
Regression test |

## Validation

```bash
cd /root/gittensor/ragflow
pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_missing_blob_returns_error -v
pytest test/testcases/test_web_api/test_file_app/test_file_routes_unit.py::test_download_falls_back_to_document_storage -v
```

## Test Plan

- [x] Both storage paths empty → `"This file is empty."` (no
`make_response(None)`)
- [x] Existing fallback success test still passes
- [ ] CI green
2026-06-01 19:08:06 +08:00
akie
c11650bb4c Fix IDOR: Add permission checks to file ancestry endpoints (#14725)
Close #14292

## Issue

File ancestry endpoints return folder metadata without validating tenant
permissions, allowing any authenticated user to query arbitrary
`file_id` values across tenant boundaries.

## Affected Endpoints
- `GET /v1/file/parent_folder?file_id={file_id}`
- `GET /v1/file/all_parent_folder?file_id={file_id}`  
- `GET /api/v1/files/{id}/ancestors`

## Root Cause

These endpoints **skip the permission check** that other file operations
(Delete, Download, Move) perform.

## Expected Permission Check

All file operations should follow this 3-step validation:

- Check file.tenant_id
- Check if user_id belongs to this tenant (via user_tenant join table)
- Check KB permission type (team permission)


**Code reference:** This is implemented in `checkFileTeamPermission()`
and used by Delete/Download/Move, but **missing** from
GetParentFolder/GetAllParentFolders.

## Reproduction

```bash
# User B (tenant: BBB) accessing User A's file (tenant: AAA)
curl -H "Authorization: Bearer USER_B_TOKEN" \
  "http://localhost:9384/v1/file/parent_folder?file_id=AAA_FILE_123"

# Result: Returns User A's folder metadata 
# Expected: "No authorization." 
Fix
Pass userID from handler to service and call checkFileTeamPermission() — same as Download/Delete/Move handlers.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 16:03:23 +08:00
jony376
6547751936 Fix: missing authorization checks in /files/link-to-datasets (#14649)
### Related issues
Closes #14648

### What problem does this PR solve?

This PR fixes an authorization flaw in `POST /files/link-to-datasets`.

Before this change, the endpoint only checked whether the supplied
`file_ids` and `kb_ids` existed. It did not verify whether the
authenticated user was actually allowed to access those files or target
datasets. As a result, an authenticated user who knew valid IDs could
relink another user's files to arbitrary datasets.

This was especially risky because the relinking flow is state-changing:
the background worker removes existing file-document mappings and then
recreates documents under the attacker-supplied dataset IDs.

This change makes the route enforce the same permission model already
used by nearby file and document operations:

- each resolved file must pass `check_file_team_permission(...)`
- each target dataset must pass `check_kb_team_permission(...)`
- authorization is enforced before scheduling background relinking work

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Testing

- Added regression coverage in
`test/testcases/test_web_api/test_file_app/test_file2document_routes_unit.py`
- Covered:
  - unauthorized file access is rejected
  - unauthorized dataset access is rejected
- existing success path still returns immediately after scheduling
background work
- Attempted to run:
- `python -m pytest
test\\testcases\\test_web_api\\test_file_app\\test_file2document_routes_unit.py
-q`
- Local execution in this workspace is currently blocked by missing test
dependencies during bootstrap, including `ragflow_sdk`

---------

Co-authored-by: jony376 <jony376@gmail.com>
2026-05-08 13:49:23 +08:00
buua436
47129fdd08 Fix: optimize file batch delete (#14473)
### What problem does this PR solve?

optimize file batch delete

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 11:00:39 +08:00
Wang Qi
b684c89950 Add backward compat APIs (#14427)
### What problem does this PR solve?

Add backward compat APIs:

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 15:15:49 +08:00
Wang Qi
aae45b959b Refactor: API file2document (#14306)
Refactor: API file2document
2026-04-23 11:40:45 +08:00
Syed Shahmeer Ali
ff92b5575b Fix: /file2document/convert blocks event loop on large folders causing 504 timeout (#13784)
Problem

The /file2document/convert endpoint ran all file lookups, document
deletions, and insertions synchronously inside the
request cycle. Linking a large folder (~1.7GB with many files) caused
504 Gateway Timeout because the blocking DB loop
  held the HTTP connection open for too long.

  Fix

- Extracted the heavy DB work into a plain sync function _convert_files
- Inputs are validated and folder file IDs expanded upfront (fast path)
- The blocking work is dispatched to a thread pool via
get_running_loop().run_in_executor() and the endpoint returns 200
  immediately
- Frontend only checks data.code === 0 so the response change
(file2documents list → True) has no impact

  Fixes #13781

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 16:45:10 +08:00
Yongteng Lei
3d10e2075c Refa: files /file API to RESTFul style (#13741)
### What problem does this PR solve?

Files /file API to RESTFul style.

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-24 19:24:41 +08:00
6ba3i
22c4d72891 tests: improve RAGFlow coverage based on Codecov report (#13219)
### What problem does this PR solve?

Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.

### Type of change

- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
2026-02-26 19:03:26 +08:00