Commit Graph

6581 Commits

Author SHA1 Message Date
Hz_
074c331cdf fix(go-api): sync document handler interface and enforce preview acce… (#15688)
### Description

This PR syncs the `documentServiceIface` interface and introduces
handler methods for document preview, artifact fetching, and downloading
in the Go API. It also ensures that strict dataset alignment and access
checks are enforced when retrieving or downloading documents.

Furthermore, this PR introduces comprehensive unit tests for both the
newly added Handler and Service methods to ensure robustness and prevent
future regressions.

### Key Changes
* **Router & Handler Integration**: 
  * Added and wired new API endpoints in `internal/router/router.go`.
* Synchronized the `documentServiceIface` with `GetDocumentArtifact`,
`GetDocumentPreview`, and `DownloadDocument`.
* Implemented handlers for these endpoints in
`internal/handler/document.go`.
* **Access & Validation Enforcement**: 
* Refactored `internal/service/document.go` to strictly check if a
document belongs to the requested dataset before allowing downloads or
previews.
* Added robust artifact file sanitization (`sanitizeArtifactFilename`)
and attachment handling (`shouldForceArtifactAttachment`).
* **Comprehensive Unit Testing**:
* **Handler Layer (`internal/handler/document_test.go`)**: Added mock
service implementations and Gin router tests covering success,
not-found, and internal error states for all 3 new endpoints.
* **Service Layer (`internal/service/document_test.go`)**: Added
extensive business logic tests including dataset mismatch checks,
non-existent document checks, and artifact file validation.
2026-06-08 11:37:06 +08:00
Lynn
b05d5a5228 Feat: get model list from remote (#15711)
### What problem does this PR solve?

Feat:
- Get model list from remote provider. 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-08 11:02:40 +08:00
kpdev
b0a45809ff fix(onedrive): normalize folder_path for Graph delta URL (#15503)
Prepend a leading slash and reject `..` segments so scoped OneDrive
delta queries use `root:/path:/delta` instead of `root:path:/delta`.

Fixes #15500

### What problem does this PR solve?

The OneDrive connector builds Microsoft Graph delta URLs from optional
`config.folder_path`. When users enter a path without a leading slash
(e.g. `Documents/Reports` instead of `/Documents/Reports`), the
connector produces a malformed URL such as
`root:Documents/Reports:/delta`. Per [Microsoft Graph path-based
addressing](https://learn.microsoft.com/en-us/graph/onedrive-addressing-driveitems),
the segment after `root:` must start with `/` (e.g.
`root:/Documents/Reports:/delta`). Sync and validation then fail or
return no documents, which is hard to diagnose from the UI because the
optional folder field does not enforce the format.

This PR normalizes `folder_path` at connector construction time (prepend
`/`, trim whitespace and trailing slashes) and rejects `..` segments
before any Graph request is made.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 09:56:47 +08:00
Jack
5a04ac0864 feat: Dify-compatible retrieval API endpoint (#15704)
## Summary

Dify-compatible retrieval API for external knowledge base integration.

## Changes

- **New handler**: DifyRetrievalHandler with POST/GET
/api/v1/dify/retrieval
- **Health check**: GET /api/v1/dify/retrieval/health
- **Full pipeline**: KB validation -> permission check -> embedding ->
metadata filter -> chunk retrieval -> child chunk aggregation ->
optional KG search -> response assembly
- **12 tests** covering all paths (success, errors, metadata filter, KG
mode)
- **Testability**: Handler dependencies defined as interfaces
(KBServiceIface, ModelServiceIface, etc.)

## Files

| File | Type |
|------|------|
| internal/handler/dify_retrieval_handler.go | New — handler +
interfaces |
| internal/handler/dify_retrieval_handler_test.go | New — 12 tests |
| internal/router/router.go | Modified — route registration |
| cmd/server_main.go | Modified — handler wiring |
| internal/service/kg/pipeline.go | Modified — SetChatModel/SetEmbModel
|
| internal/service/kg/retrieval.go | New — helper functions |
| internal/service/kg/scoring.go | Moved from service package |
| internal/service/kg/search.go | New — KG search functions |
| internal/service/kg/types.go | New — type definitions |

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 21:16:25 +08:00
Hz_
1deb1313d2 feat(go-cli): support batch model add/remove and optional embedding dimension (#15631)
## Summary

  This PR improves the Go CLI in two areas:

1. It adds batch model management support, allowing multiple models to
be added or removed in a single command.
2. It makes the `dimension` argument optional for the `embed text`
command.

These changes keep the existing single-model and explicit-dimension
behaviors compatible while making the CLI more convenient for common
workflows.

  ## What Changed

  ### 1. Batch model add/remove support

The CLI now supports operating on multiple model names provided in a
single quoted string.

  Supported commands include:

```
add model 'x1 x2 x3' to provider 'vllm' instance 'test' with tokens 1024 chat think vision, token 2048 chat, token 1024 think vision;

drop model 'x1 x2 x3' from 'vllm' 'test';

remove model 'x1 x2 x3' from 'vllm' 'test';

```

For add model, each config segment after with is matched to the
corresponding model name by position.

  Example mapping:

  - x1 -> tokens 1024, chat + vision, thinking=true
  - x2 -> tokens 2048, chat
  - x3 -> tokens 1024, vision, thinking=true

  The existing single-model syntax remains supported.

  ### 2. Optional embedding dimension

Previously, the Go CLI required dimension to be explicitly provided for
embed text.

  Before:

embed text 'what is rag' 'who are you' with 'model@test@provider'
dimension 8192;

  Now both forms are supported:

embed text 'what is rag' 'who are you' with 'model@test@provider'
dimension 8192;

  embed text 'what is rag' 'who are you' with 'model@test@provider';

When omitted, the CLI leaves dimension unset and relies on
provider/backend behavior.


  ## Tests

  Added parser tests covering:

  - Multiple models with multiple config segments
  - Model type deduplication
  - Model/config count mismatch
  - Drop/remove multiple models
  - Optional embedding dimension parsing
2026-06-05 19:31:06 +08:00
balibabu
9c14e3f377 Fix: When adding a chat in the main interface, a warning will automatically pop up (#15685)
### What problem does this PR solve?

Fix: When adding a chat in the main interface, a warning will
automatically pop up (even if embedding and LLM model have already been
configured).
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 19:09:22 +08:00
Jack
ea79d65d08 feat: add KGSearchRetrieval for full KG pipeline (N-hop, scoring, query_rewrite, community) (#15690)
## Summary

`KGSearchRetrieval` composes entity search, type search, relation
search, N-hop analysis, score fusion, LLM-based query\_rewrite, and
community reports into a single synthetic chunk for KG-enhanced
retrieval.

### Components

| Component | Source | Status |
|-----------|--------|--------|
| Entity/relation/community search | Direct `DocEngine.Search` calls | 
|
| N-hop analysis + score fusion | `common.AnalyzeNHopPaths` /
`DoubleHitBoost` / `FuseRelationScores` |  #15666 |
| Query rewrite prompt + parser | `common.BuildQueryRewritePrompt` /
`ParseQueryRewriteResponse` |  #15669 |
| Token budget | `common.BuildKGContent` + `NumTokensFromString` | 
#15666 |
| LLM query rewrite integration | `queryRewrite` function with fallback
|  |

### Testing

11 tests (pure function + mock engine):

```
=== RUN   TestKgEntityFromChunk_Basic          --- PASS
=== RUN   TestKgEntityFromChunk_ScoreFallback  --- PASS
=== RUN   TestKgEntityFromChunk_MissingFields  --- PASS
=== RUN   TestKgRelationFromChunk_Basic        --- PASS
=== RUN   TestKgRelationFromChunk_MissingFrom  --- PASS
=== RUN   TestSearchKGTypeSamples_Success      --- PASS
=== RUN   TestSearchKGTypeSamples_Empty        --- PASS
=== RUN   TestKGSearchRetrieval_Basic          --- PASS
=== RUN   TestKGSearchRetrieval_NoEntities     --- PASS
=== RUN   TestQueryRewrite_Fallback            --- PASS
=== RUN   TestQueryRewrite_EmptyQuestion       --- PASS
```

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:00:27 +08:00
Wang Qi
aa9545e4c9 Revert "fix: duplicate document ingest guard" (#15707)
Reverts infiniflow/ragflow#15638
2026-06-05 17:45:29 +08:00
Wang Qi
214ee319f8 Revert "fix(api): authorize owner_ids for list chats and search apps (#14775) (#15698)
This reverts PR #14775  commit 5a5e766386.
2026-06-05 17:26:02 +08:00
Yufeng He
6cba5a544a fix(agent): skip empty switch conditions (#15691)
## What
- make `Switch` ignore conditions that have no evaluable items
- add a regression for blank `cpn_id` items falling through to the else
branch
- keep the existing non-empty `and` condition behavior covered

Fixes #15643.

## Verified
- `python -m py_compile agent\component\switch.py
test\unit_test\agent\component\test_switch.py`
- `python -m pytest test\unit_test\agent\component\test_switch.py -q` ->
`2 passed`
- `python -m ruff check agent\component\switch.py
test\unit_test\agent\component\test_switch.py`
- `git diff --check`

I also checked `python -m ruff format --check` on the touched files. It
would reformat pre-existing style in `agent/component/switch.py` beyond
this bug fix, so I kept the patch scoped instead of reformatting the
whole file.
2026-06-05 17:20:44 +08:00
Liu An
aab01af6f2 fix: Update Dockerfile and release workflow to use GitHub mirror instead of Gitee (#15700)
### What problem does this PR solve?

Update Dockerfile and release workflow to use GitHub mirror instead of
Gitee

### Type of change

- [x] Other (please describe): CI
2026-06-05 16:10:52 +08:00
tmimmanuel
f78ef328bb Go: implement Bedrock embeddings (#15543)
### What problem does this PR solve?

Fixes #15542.

AWS Bedrock support for the Go model provider layer was added in #15166,
but embedding support was intentionally left out of scope and
`BedrockModel.Embed(...)` still returned the `no such method` sentinel.
This PR implements Bedrock text embeddings under the umbrella provider
tracker #14736.

### What this PR includes

- `internal/entity/models/bedrock.go`: implement
`BedrockModel.Embed(...)` through Bedrock Runtime `InvokeModel` with
existing SigV4 auth, region resolution, and runtime URL helpers.
- Titan embeddings: supports `amazon.titan-embed-text-v1` and
`amazon.titan-embed-text-v2:0`; v2 forwards `EmbeddingConfig.Dimension`
as `dimensions` when provided, while v1 keeps the payload minimal.
- Cohere embeddings: supports `cohere.embed-english-v3`,
`cohere.embed-multilingual-v3`, and `cohere.embed-v4:0`; batches input
texts and maps returned vectors to RAGFlow `EmbeddingData` in input
order.
- `conf/models/bedrock.json`: adds the `embedding` URL suffix (`invoke`)
and Bedrock embedding model entries.
- `internal/entity/models/bedrock_test.go`: adds unit tests for Titan,
Cohere, typed Cohere responses, validation, empty input, unsupported
models, and HTTP error propagation.

Reference docs:

- Bedrock InvokeModel API:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html
- Titan Text Embeddings:
https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
- Cohere Embed models on Bedrock:
https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- [x] `jq empty conf/models/bedrock.json`
- [x] `git diff --check`
- [x] `go test ./internal/entity/models/... -run Bedrock -count=1`
- [x] `go test ./internal/entity/models/... -run '^$' -count=1`
- [x] `go test ./internal/entity/models/... -run Bedrock -race -count=1`

Note: `go test ./internal/entity/models/... -count=1` currently fails in
unrelated existing Astraflow coverage
(`TestAstraflowEmbedReturnsNoSuchMethod` panics in
`internal/entity/models/astraflow.go`). The Bedrock-specific tests and
compile-only package check pass.
2026-06-05 13:26:32 +08:00
web-dev0521
b8db200757 feat(go-api): implement MCP server management endpoints (#15281)
## Summary

Ports the MCP (Model Context Protocol) server management endpoints that
power `web/src/pages/user-setting/mcp/` from Python
(`api/apps/restful_apis/mcp_api.py`) to Go. There were no MCP routes in
the Go server before this change.

Closes #15275 (subtask of #15240).

## Endpoints implemented (base path `/api/v1`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/mcp/servers` | List tenant servers (keyword / order /
pagination) |
| POST | `/mcp/servers` | Create a server |
| GET | `/mcp/servers/{mcp_id}` | Get one (`?mode=download` exports
config) |
| PUT | `/mcp/servers/{mcp_id}` | Update a server |
| DELETE | `/mcp/servers/{mcp_id}` | Delete a server |
| POST | `/mcp/import` | Bulk import from JSON config |
| POST | `/mcp/servers/{mcp_id}/test` | Connect + list tools (see notes)
|

## Implementation

Follows the existing `handler → service → dao` layering (per PR #14790):

- **entity** (`internal/entity/mcp.go`): added `MCPServerType` constants
and `IsValidMCPServerType` over the existing `MCPServer` model.
- **dao** (`internal/dao/mcp.go`): new `MCPServerDAO` with tenant-scoped
CRUD, a keyword filter, and a **whitelisted order-column map** (guards
against SQL injection via the caller-supplied `orderby`).
- **service** (`internal/service/mcp.go`): new `MCPService` —
list/get/export/create/update/delete/import/test — mirroring
`MCPServerService` and the `mcp_api` request validation, with sentinel
errors for clean code mapping.
- **handler** (`internal/handler/mcp.go`): new `MCPHandler` with the
seven handlers and Python-compatible response codes.
- **router / server_main**: registered the `/mcp` group and wired the
handler.

## Deviations from Python (documented in code)

1. **Bulk import is at `POST /mcp/import`, not `/mcp/servers/import`.**
gin (v1.9.1) cannot register a static segment and a path param at the
same tree node, so `/mcp/servers/import` would collide with
`/mcp/servers/:mcp_id` and panic at startup. The frontend should call
`/mcp/import`.
2. **No live tool discovery on create/update/import.** The Python path
runs `get_mcp_tools` over SSE / streamable-HTTP and stores
`variables.tools`. The Go server has no MCP client yet, so these persist
`variables`/`headers` but leave `variables.tools` unpopulated.
3. **`/test` returns a data error (`ErrMCPTestUnsupported`)** until a Go
MCP client lands. Per the issue, the live-connection path is scoped as a
follow-up; the handler still validates `url` + `server_type`.

## Testing

- Added `internal/service/mcp_test.go` covering `IsValidMCPServerType`
and the `TestServer` validation/short-circuit paths (no DB required).
- No Go toolchain was available in the dev environment, so `go build
./...` / `go vet ./...` verification is left to CI.

## Follow-ups

- Go MCP client (SSE / streamable-HTTP) to enable live tool discovery
and the real `/test` behavior.
- Reconcile the `/mcp/import` vs `/mcp/servers/import` path with the
frontend.

---------
2026-06-05 13:25:09 +08:00
web-dev0521
1d7e45115b feat(connectors): add Salesforce CRM data source connector (#15462)
### What problem does this PR solve?

Closes #15461.

RAGFlow had no way to ingest Salesforce CRM data, so support / sales
teams couldn't ground responses on live Accounts, Contacts,
Opportunities, Cases, or Knowledge articles. This adds a first-class
Salesforce data source connector that authenticates against a Connected
App via OAuth 2.0 client-credentials, queries selected SObjects via
SOQL, and turns each record into an indexable document with incremental
sync.

**Highlights**
- `common/data_source/salesforce_connector.py`: new
`SalesforceConnector` (`CheckpointedConnectorWithPermSync` +
`SlimConnectorWithPermSync`).
- OAuth 2.0 client-credentials flow; canonical `instance_url` from the
token response so multi-pod orgs route correctly.
- Per-object `SystemModstamp` cursor stored in
`SalesforceCheckpoint.cursors` — a failure mid-object doesn't rewind
sibling objects, and re-syncs only fetch changed rows.
- Deterministic record-to-text formatter (sorted keys) so SOQL field
reordering on the server doesn't mark every row "changed" on each poll.
- `_get_json` raises on non-2xx so 429 / 5xx never silently advance the
checkpoint past missing data.
- `Knowledge__kav` is in the default object set but is skipped silently
when the org doesn't have Salesforce Knowledge enabled (404 on
describe).
- Slim-doc IDs are scoped as `<Object>/<Id>` so prune deletes can't
collide across object types.
- `common/constants.py`, `common/data_source/config.py`,
`common/data_source/__init__.py`: register `salesforce` in `FileSource`
/ `DocumentSource` and export `SalesforceConnector`.
- `rag/svr/sync_data_source.py`: new `Salesforce(SyncBase)` class routed
through `load_from_checkpoint` (poll_source would re-walk every object
each run) and added to `func_factory`.
- Frontend:
- `web/src/pages/user-setting/data-source/constant/index.tsx`: new
`DataSourceKey.SALESFORCE`, form fields (instance URL, client ID/secret,
objects, api_version, batch size), `syncDeletedFiles` capability,
default form values, and tile entry with the new icon.
  - `web/src/locales/{en,zh}.ts`: description + per-field tooltips.
- `web/src/assets/svg/data-source/salesforce.svg`: 48x48 brand-style
icon to match the other Microsoft / cloud tiles.

**Verification**
- `npm run build` (vite + esbuild) passes (1m 26s).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-05 13:24:36 +08:00
Jack
e629c0203b feat: add KG entity/relation/community search functions (#15689)
## Summary

Knowledge Graph search functions for entity, relation, community report,
and type-samples retrieval. Uses DocEngine.SelectFields (PR #15684) for
KG-specific fields.

### Functions

| Function | Description |
|----------|-------------|
| `SearchKGEntities` | Hybrid search over KG entities (dense + text +
fusion) |
| `SearchKGEntitiesByTypes` | Entity search filtered by
`entity_type_kwd` |
| `SearchKGRelations` | Hybrid search over KG relations |
| `SearchKGCommunityReports` | Community report search by entity names |
| `SearchKGTypeSamples` | Type→entities mapping for query_rewrite |

### Internal helpers

| Helper | Description |
|--------|-------------|
| `buildHybridExpr` | Shared dense+text+fusion expression construction |
| `buildKGDenseExpr` | Wraps `Embed()` call for vector search |
| `Parse*` | Convert raw chunks to typed structs |

### Testing

35 tests (pure function + mock integration)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 13:23:04 +08:00
Haruko386
4b2af1347c feat[Go]: implement Agent/Workflow PUT /api/v1/agents/<canvas_id>/tags (#15641)
feat[Go]: implement Agent/Workflow PUT /api/v1/agents/<canvas_id>/tags (#15641)
2026-06-05 13:22:23 +08:00
buua436
71649db3b0 fix: prevent duplicated post-think text (#15651)
### What problem does this PR solve?
This fixes duplicated post-think text in streamed chat responses. When
the model emits text immediately after `</think>`, the stream state now
advances its cursor correctly so the same visible prefix is not emitted
twice.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 13:21:26 +08:00
Jack
f6ff862a24 fix: restore case-insensitive contains/not contains/not in and consolidate metadata filter pipeline (#15686)
## Summary

This PR fixes case-sensitivity regressions introduced in #15656 and
consolidates the metadata filtering pipeline by removing the duplicate
`applySingleCondition` adapter layer.

### Bug fixes
1. **contains / not contains**: restored case-insensitive matching (was
lost when `applySingleCondition` was replaced by
`common.MetaFilter.matchValue` which lacked `strings.ToLower`)
2. **not in**: restored case-insensitive matching (was lost for same
reason; uses `strings.EqualFold`)
3. **!= with date filter values**: non-date metadata values now
correctly match the `≠` operator (a non-date value IS not equal to any
date, but was returning false)

### Architecture
4. **Removed `applySingleCondition`** (65 lines) — the inline switch was
a duplicate of `common.MetaFilter` logic. `ApplyMetaFilter` now converts
conditions and delegates to `common.MetaFilter` once per filter set,
eliminating ~25 lines of duplicate AND/OR merge logic.
5. **Added `filterSet`** — O(n+m) hash-map fast path for `in`/`not in`
operators, replacing the O(n*m) linear scan in `matchValue`.
6. **Exported `NormalizeOperator`** from `common` for consistent
operator alias handling.

### Cleanup
7. Removed 18 lines of dead code (`matchValue`'s `in`/`not in` branches
already bypassed by `filterOut` delegation)
8. Fixed orphaned godoc comment for `convertOperator`
9. Fixed incorrect `filterSet` doc comment (claimed "matching EqualFold"
but used `strings.ToLower`)
10. Completed `convertToMetaCondition` operator normalization
documentation

### Testing
- 60 tests (24 service + 36 common), all passing
- New tests: `==`, `≠`, `>`, `<`, `≥`, `≤`, `empty`, `not empty` through
`ApplyMetaFilter`
- New tests: `<`, `≤`, `≠` through `MetaFilter`; `not-in-empty-list`
through `filterSet`
- All 18 `MetaFilter` tests pass; all 10 `filterSet` unit tests pass

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:47:55 +08:00
Jack
ee32d91aab feat: add EnrichChunksWithDocMetadata function to attach document metadata to chunks (#15659)
## Summary

Add `EnrichChunksWithDocMetadata` as a method on `MetadataService` that
attaches document metadata to retrieval chunks in-place. Equivalent to
Python's `enrich_chunks_with_document_metadata()` from
`api/utils/reference_metadata_utils.py`.

### Usage

```go
metadataSvc.EnrichChunksWithDocMetadata(chunks, tenantID, metadataFields)
```

### Changes

- **`service/metadata.go`**: Added `EnrichChunksWithDocMetadata` method
- **`service/enrich_metadata_test.go`** (new): 7 test cases

### Algorithm

1. Collect unique `(kb_id, doc_id)` pairs from chunks
2. Fetch metadata from ES via `SearchMetadata(kbID, tenantID, docIDs)`
3. Attach `document_metadata` field to each matching chunk
4. Optionally filter to specified `metadataFields`

### Testing

All 7 tests pass:

```
=== RUN   TestEnrichChunksWithDocMetadata_NoChunks       --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_EmptyChunks     --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_EmptyDocID      --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_DuplicateDocIDs --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_MultipleKBs     --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_WithMetadataFields --- PASS
=== RUN   TestEnrichChunksWithDocMetadata_MixedFields     --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:42:23 +08:00
Jack
3b1ae3f829 feat: support SelectFields override in DocEngine for KG-specific queries (#15684)
## Summary

Both ES and Infinity engines now respect `SearchRequest.SelectFields`,
allowing callers to specify output columns for KG
entity/relation/community queries instead of the default chunk columns.

### Changes

- **`internal/engine/elasticsearch/chunk.go`**: Added `SelectFields`
override after default `outputColumns`
- **`internal/engine/infinity/chunk.go`**: Added `SelectFields` override
after default `outputColumns`
- **`internal/engine/elasticsearch/kg_test.go`** (new): Integration test
(skipped unless `ES_TEST=1`)

### Usage

```go
result, err := docEngine.Search(ctx, \&types.SearchRequest{
    KbIDs:        kbIDs,
    SelectFields: []string{entity_kwd, entity_type_kwd, rank_flt, n_hop_with_weight},
    Filter:       map[string]interface{}{knowledge_graph_kwd: entity},
})
```

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:41:39 +08:00
Wang Qi
4cbe597d7e Refactor: consolidate to use @login_required (#15652)
Refactor: consolidate to use @login_required
2026-06-05 11:35:00 +08:00
bitloi
9f3e289b78 Fix: preserve markdown tables during delimiter extraction (#15632)
### What problem does this PR solve?

Markdown extraction can split tables row by row when delimiter-based
extraction uses a newline delimiter. That loses table structure during
chunking even though delimiters should still split normally outside
tables.

This PR keeps the follow-up to #15482 intentionally narrow:

- preserve Markdown pipe tables during delimiter-based extraction
- preserve borderless pipe tables during delimiter-based extraction
- preserve multiline HTML tables during delimiter-based extraction
- keep delimiter splitting unchanged outside protected table ranges

Refs #15482

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Testing

- `ruff check deepdoc/parser/markdown_parser.py
test/unit_test/deepdoc/parser/test_markdown_parser.py`
- `python3 run_tests.py -t
test/unit_test/deepdoc/parser/test_markdown_parser.py`
- `git diff --check`
2026-06-05 10:35:33 +08:00
dripsmvcp
431f52a5d4 feat[Go]: implement GET /agents/templates (issue #15240) (#15573)
## Summary

Port the canvas-template catalogue endpoint to the Go API server. Listed
in the Go-API port checklist of #15240.

Mirrors `list_agent_template` in `api/apps/restful_apis/agent_api.py`:
returns every row from the `canvas_template` table so that the UI can
render the template gallery on the New-Agent screen.

## What

- `internal/dao/canvas_template.go` — new `CanvasTemplateDAO.GetAll()`
ordered by `create_time desc` (newest templates first).
- `internal/service/agent.go` — wire the new DAO into `AgentService` and
expose `ListTemplates() ([]*entity.CanvasTemplate, error)`.
- `internal/handler/agent.go` — new `AgentHandler.ListTemplates` HTTP
handler (auth-gated, mirrors Python `@login_required`).
- `internal/router/router.go` — `agents.GET("/templates",
r.agentHandler.ListTemplates)` registered alongside the existing `GET
/agents`.
- `internal/handler/agent_test.go` — three new tests covering: success
path, empty-list → JSON array (not `null`), and the auth gate.

## Notes

- `CanvasTemplate` entity, GORM tags, and DB migration already exist in
`internal/entity/canvas.go` and `internal/dao/database.go` — no schema
change required.
- The handler coerces a `nil` slice to `[]*entity.CanvasTemplate{}` so
the JSON payload is always an array (the frontend does `data.map(...)`
on it).

## Test plan

- [x] `go vet ./internal/handler ./internal/service ./internal/dao
./internal/router` clean
- [x] Three unit tests added; existing `TestListAgents_Success`
untouched
- [ ] CI runs `go test ./internal/handler` with cgo binding linked

## Related

- Tracker: #15240
2026-06-05 10:13:30 +08:00
Jack
a237a89b90 feat: add QueryRewrite prompt builder and response parser (#15669)
QueryRewrite prompt builder and response parser. Zero external
dependencies.

### Functions
- `BuildQueryRewritePrompt`: Renders `minirag_query2kwd` prompt with
query and type pool
- `ParseQueryRewriteResponse`: Parses LLM JSON response with fallback
for markdown and extra text

### Testing
```
=== RUN   TestBuildQueryRewritePrompt             --- PASS
=== RUN   TestParseQueryRewriteResponse_ValidJSON --- PASS
=== RUN   TestParseQueryRewriteResponse_MarkdownBlock --- PASS
=== RUN   TestParseQueryRewriteResponse_ExtraText --- PASS
=== RUN   TestParseQueryRewriteResponse_Invalid   --- PASS
=== RUN   TestParseQueryRewriteResponse_EmptyEntities --- PASS
```

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 10:11:14 +08:00
Jack
bf6c091c9f feat: add KG scoring utilities (#15666)
KG scoring utilities as pure functions.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 10:10:59 +08:00
kpdev
bd49fd70aa fix(api): set SDK document download Content-Type from filename (#15112) (#15113)
## Summary

- Infer `Content-Type` from the stored document filename on SDK download
routes.
- Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and
`GET /api/v1/documents/<document_id>`.
- Aligns with REST preview/download via `CONTENT_TYPE_MAP`.

## Test plan

- [x] `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename`
- [x] Manual: `curl -sSI` on SDK dataset document download for a PDF;
expect `Content-Type: application/pdf`

Fixes #15112.
2026-06-05 10:08:53 +08:00
Lynn
794c1f4b25 Fix: volc engine and other json key factories (#15653)
### What problem does this PR solve?

Fix:
- VolcEngine adapt to new api_key format
- Save dict api_key as json

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 09:45:44 +08:00
He Wang
7789862cc5 fix(docker): mount tmpfs on es01 /tmp for entrypoint permissions (#15655)
### What problem does this PR solve?

On some Linux hosts (e.g. x86_64 with enforced POSIX ACL on overlay
storage), the official `elasticsearch` Docker image cannot start because
`docker-entrypoint.sh` needs to create temporary files under `/tmp` for
bash here-documents, while the image ACL grants `user:elasticsearch`
only `r-x` on `/tmp`:

```
/usr/local/bin/docker-entrypoint.sh: line 73/84: cannot create temp file for here-document: Permission denied
```

RAGFlow users hit this when running `docker compose` with the default
`es01` service. See also Refs #284.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

## Summary

Mount a writable `tmpfs` at `/tmp` for the `es01` service so
Elasticsearch entrypoint scripts can run on ACL-enforced environments.
Closes the startup failure described in #284 for non-ARM deployments.

## Changes

- Add `tmpfs: /tmp:mode=1777,size=512m` to `es01` in
`docker/docker-compose-base.yml`
- Document why the mount is required (ES image `/tmp` ACL vs entrypoint
here-documents)

## Test plan

- [x] Verified on Linux (x86_64): `docker run --rm elasticsearch:8.11.3
bash -c 'mktemp'` fails without tmpfs and succeeds with `--tmpfs
/tmp:mode=1777,size=512m`
- [x] Verified `es01` becomes healthy after `docker compose up -d es01`
with this change
- [ ] Upstream maintainers: `docker compose -f
docker/docker-compose-base.yml --profile elasticsearch up -d es01` on a
host where ACL is enforced


Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-04 23:19:31 +08:00
Jack
eee6ad546f feat: add ResolveReferenceMetadata utility function (#15663)
Add `ResolveReferenceMetadata` to parse `include_metadata` /
`metadata_fields` from request and config payloads.

### Changes
- **New**: `internal/common/reference_metadata.go` — pure function, zero
dependencies
- **New**: `internal/common/reference_metadata_test.go` — 8 test cases

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 22:34:18 +08:00
Jack
96a416629d refactor: change GetFlattedMetaByKBs return type to common.MetaData (#15656)
## Summary

Change `GetFlattedMetaByKBs` return type from `map[string]interface{}`
to strongly-typed `common.MetaData`.

**Depends on**: #15648 (provides `MetaData`, `MetaValueDocs` types)

### Changes
- `service/metadata.go`: Changed return type, removed type assertions
- `service/metadata_filter.go`: Updated all metadata function signatures
- `service/metadata_filter_test.go` (new): 12 test cases

### Bug fix
`applySingleCondition` used `.([]interface{})` assertions on `[]string`
data, silently breaking operators like `!=`, `contains`, `start with`,
etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 22:16:04 +08:00
web-dev0521
98f2a2e60b feat(connectors): add Azure Blob Storage data source connector (#15466)
### What problem does this PR solve?

Closes #15465.

RAGFlow supports S3, Google Cloud Storage, R2, and OCI as data sources
but not Azure Blob Storage, leaving Azure users without a way to index
container objects into a knowledge base. This adds a first-class Azure
Blob Storage data-source connector — distinct from RAGFlow's existing
Azure storage *backends* (`rag/utils/azure_sas_conn.py`,
`rag/utils/azure_spn_conn.py`) which store RAGFlow's own files.

**Highlights**
- `common/data_source/azure_blob_connector.py`: new `AzureBlobConnector`
(`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`).
- Uses the existing `azure-storage-blob` dependency (already in
`pyproject.toml`).
  - Three auth modes, tried in order of precedence:
1. **Account key** — `account_name` + `account_key` + `container_name`.
    2. **Connection string** — `connection_string` + `container_name`.
3. **SAS token** — `container_url` + `sas_token` (same shape as
`RAGFlowAzureSasBlob`).
- ETag fingerprint stored per blob in `AzureBlobCheckpoint.etags` —
unchanged blobs (same ETag as last run) are skipped without a download.
Only new/modified blobs are fetched.
  - Optional `prefix` scopes indexing to a virtual folder.
- `validate_connector_settings()` probes `get_container_properties()`
and maps `AuthenticationFailed / 403 / ContainerNotFound` to typed
connector exceptions.
  - Slim-doc IDs are blob names so prune reconciles correctly.
- `common/constants.py`, `common/data_source/config.py`,
`common/data_source/__init__.py`: register `azure_blob` in `FileSource`
/ `DocumentSource` and export `AzureBlobConnector`.
- `rag/svr/sync_data_source.py`: new `AzureBlob(SyncBase)` class routed
through `load_from_checkpoint` (ETag fingerprint owns change-detection)
and added to `func_factory`.
- Frontend:
- `web/src/pages/user-setting/data-source/constant/index.tsx`: new
`DataSourceKey.AZURE_BLOB`, auth-mode selector (account key / connection
string / SAS token), all credential fields, prefix + batch-size,
`syncDeletedFiles` capability, default form values, tile entry with
icon.
- `web/src/locales/{en,zh}.ts`: description + per-field tooltips for all
9 new keys.
- `web/src/assets/svg/data-source/azure-blob.svg`: Azure-branded
stacked-cylinders icon.

**Verification**
- `npm run build` (vite + esbuild) passes (37 s).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-04 21:06:01 +08:00
Jack
a78a3fdd47 fix: add nil guard to DocumentDAO.GetByIDs and add tests (#15649)
## Summary

`DocumentDAO.GetByIDs()` generated `WHERE id IN ()` for empty/nil ID
slices, which is invalid SQL and would fail on most databases. This PR
adds a nil guard and comprehensive tests.

### Changes

- **Modified**: `internal/dao/document.go` — Added `len(ids) == 0` guard
to `GetByIDs`
- **New**: `internal/dao/document_test.go` — 4 test cases covering
success, empty IDs, nil IDs, and no-match

### Testing

```
=== RUN   TestDocumentGetByIDs_Success   --- PASS
=== RUN   TestDocumentGetByIDs_EmptyIDs  --- PASS
=== RUN   TestDocumentGetByIDs_NilIDs    --- PASS
=== RUN   TestDocumentGetByIDs_NoMatch   --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 21:00:02 +08:00
Jack
461c190c49 feat: migrate meta_filter and convert_conditions to Go (#15648)
## Summary

Migrate the metadata filtering utilities `meta_filter` and
`convert_conditions` from `common/metadata_utils.py` to Go as pure
functions with zero external dependencies.

These functions are used by `dify/retrieval`, `openai/chat/completions`,
`document_api`, and `chunk_api` for filtering documents by metadata
conditions.

### Changes

- **New**: `internal/common/metadata_utils.go` — `ConvertConditions()`
and `MetaFilter()` with full operator support
- **New**: `internal/common/metadata_utils_test.go` — 18 test cases
covering all operators and edge cases

### Supported Operators

`=`, `≠`, `>`, `<`, `≥`, `≤`, `contains`, `not contains`, `in`, `not
in`, `start with`, `end with`, `empty`, `not empty`

### Design

- Numeric comparison via `strconv.ParseFloat`
- Date comparison via YYYY-MM-DD format detection
- Case-insensitive string comparison fallback
- `and` / `or` logic support for multiple conditions
- Zero external dependencies — pure functions only
2026-06-04 20:14:27 +08:00
Jack
e627f5d8c5 feat: implement POST /api/v1/searchbots/related_questions API (#15639)
## Summary

Implement the `POST /api/v1/searchbots/related_questions` endpoint in
Go, generating related search questions via LLM.

### Changes

- **New**: `internal/handler/related_questions.go` — Handler with
injectable LLM interface, prompt constant, and response parsing
- **New**: `internal/handler/related_questions_test.go` — 9 tests (4
handler + 5 parse)
- **Modified**: `internal/router/router.go` — Added route +
`RelatedQuestionsHandler` to struct
- **Modified**: `cmd/server_main.go` — Wired handler with
`SearchService` and `ModelProviderService`

### Testing

All 9 tests pass:

```
=== RUN   TestRelatedQuestionsHandler_Success        --- PASS
=== RUN   TestRelatedQuestionsHandler_EmptyResponse  --- PASS
=== RUN   TestRelatedQuestionsHandler_LLMFailure     --- PASS
=== RUN   TestRelatedQuestionsHandler_MissingQuestion --- PASS
=== RUN   TestParseRelatedQuestions_Standard         --- PASS
=== RUN   TestParseRelatedQuestions_Empty            --- PASS
=== RUN   TestParseRelatedQuestions_NoNumberedLines  --- PASS
=== RUN   TestParseRelatedQuestions_MixedContent     --- PASS
=== RUN   TestParseRelatedQuestions_MultiDigit       --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 19:13:58 +08:00
Jack
6143205b37 feat: implement GET /api/v1/agents/<agent_id>/versions/<version_id> API (#15640)
## Summary

Implement the `GET /api/v1/agents/<agent_id>/versions/<version_id>`
endpoint in Go, returning full version details including DSL.

Depends on #15629 which introduced the version list endpoint and
`UserCanvasVersionDAO` infrastructure.

### Changes

- **Modified**: `internal/handler/agent.go` — Added `GetAgentVersion`
handler with auth check and ownership verification
- **Modified**: `internal/router/router.go` — Registered `GET
/:agent_id/versions/:version_id` route
- **New/Modified tests**: Service and handler tests for the version
detail endpoint

### Testing

```
=== RUN   TestGetVersion_Success       --- PASS
=== RUN   TestGetVersion_WrongCanvas   --- PASS
=== RUN   TestGetVersion_NotFound      --- PASS
=== RUN   TestGetAgentVersionHandler_Success      --- PASS
=== RUN   TestGetAgentVersionHandler_VersionNotFound --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 19:13:26 +08:00
buua436
423fb6faae fix: duplicate document ingest guard (#15638)
### What problem does this PR solve?
When a document is rerun or updated concurrently, the previous
unconditional update could overwrite a newer task state.
This change adds an `update_time`-based optimistic lock so the update
only succeeds if the record has not been modified by another flow in the
meantime.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 17:57:51 +08:00
Haruko386
baeb0c0431 Refactor[Go Model Provider]: refactor baseURL and modelConfig (#15627)
### What problem does this PR solve?

As Title

### Type of change

- [x] Refactoring
2026-06-04 17:50:22 +08:00
buua436
04dc3bb19c fix: pass search id to searchbots ask (#15646)
### What problem does this PR solve?
This change ensures `/searchbots/ask` receives `search_id` from the
frontend, so the backend can load the matching search configuration when
the shared search flow invokes the endpoint.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 17:41:56 +08:00
Jack
23aae19898 feat: implement POST /api/v1/agents/<agent_id>/upload API (#15633)
## Summary

Implement the `POST /api/v1/agents/<agent_id>/upload` endpoint in Go,
allowing file uploads associated with agent canvases.

### Changes

- **Modified**: `internal/service/agent.go` — Added `CheckCanvasAccess`
method (owner + team-level permission semantics)
- **Modified**: `internal/handler/agent.go` — Added `UploadAgentFile`
handler with auth check, multipart file parsing, and delegation to
`FileService`. Added `fileUploader` interface for testability.
- **Modified**: `internal/router/router.go` — Registered `POST
/:agent_id/upload` route
- **Modified**: `cmd/server_main.go` — Wired `fileService` into
`AgentHandler`
- **New**: `internal/service/agent_test.go` — 4 service-level tests for
`CheckCanvasAccess` (owner, team member, private denial, not found)
- **New**: `internal/handler/agent_upload_test.go` — 3 handler-level
tests (success with fake file service, cross-user denial, empty file
rejection)

### Testing

All 7 tests pass with zero mocking of the DB layer (in-memory SQLite):

```
=== RUN   TestCheckCanvasAccess_Owner               --- PASS
=== RUN   TestCheckCanvasAccess_NotOwner            --- PASS
=== RUN   TestCheckCanvasAccess_PrivateCanvas_Denied --- PASS
=== RUN   TestCheckCanvasAccess_NotFound            --- PASS
=== RUN   TestUploadAgentFileHandler_Success        --- PASS
=== RUN   TestUploadAgentFileHandler_NoPermission   --- PASS
=== RUN   TestUploadAgentFileHandler_NoFiles        --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:21:47 +08:00
Lynn
b65b18ba4c Fix: model provider (#15634)
### What problem does this PR solve?

Not display `success` when check not passed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 16:05:00 +08:00
Jack
02d163a177 feat: implement GET /api/v1/agents/<agent_id>/versions API (#15629)
## Summary

Implement the `GET /api/v1/agents/<agent_id>/versions` endpoint in Go,
listing all version snapshots for an agent canvas in descending update
time order.

### Changes

- **New**: `internal/dao/user_canvas_version.go` —
`UserCanvasVersionDAO` with `ListByCanvasID` (ordered by update_time
DESC) and `GetByID`
- **Modified**: `internal/service/agent.go` — Added `CheckCanvasAccess`,
`ListVersions`, `GetVersion` methods
- **Modified**: `internal/handler/agent.go` — Added `ListAgentVersions`
handler with auth check
- **Modified**: `internal/router/router.go` — Registered `GET
/:agent_id/versions` route
- **New**: `internal/service/agent_test.go` — 5 service-level tests
(SQLite in-memory DB, zero mock)
- **Modified**: `internal/handler/agent_test.go` — 3 handler-level tests
(real DB, pre-authenticated context)

### Testing

All 8 tests pass with zero mocking (in-memory SQLite replaces MySQL):

```
=== RUN   TestListVersions_Success         --- PASS
=== RUN   TestListVersions_Empty           --- PASS
=== RUN   TestCheckCanvasAccess_Owner      --- PASS
=== RUN   TestCheckCanvasAccess_NotOwner   --- PASS
=== RUN   TestCheckCanvasAccess_NotFound   --- PASS
=== RUN   TestListAgentVersionsHandler_Success      --- PASS
=== RUN   TestListAgentVersionsHandler_NoPermission --- PASS
=== RUN   TestListAgentVersionsHandler_CanvasNotFound --- PASS
```

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 15:36:26 +08:00
Jack
c6eee09ed3 feat: migrate POST /api/v1/datasets/<dataset_id>/documents/stop to Go (#15597)
## Summary

Migrate the stop parse documents endpoint from Python to Go.

### Python endpoint
`POST /api/v1/datasets/<dataset_id>/documents/stop` —
`api/apps/restful_apis/document_api.py:1542-1641`

### Changes
| File | Change |
|------|--------|
| `internal/dao/task.go` | Add `GetByDocID` method |
| `internal/dao/task_test.go` | 3 DAO tests (new file) |
| `internal/service/document.go` | Add `StopParseDocuments` + refactor
shared helpers |
| `internal/service/document_test.go` | 8 service tests |
| `internal/handler/document.go` | Add handler + request struct +
interface |
| `internal/handler/document_test.go` | 5 handler tests |
| `internal/router/router.go` | Add `POST /:dataset_id/documents/stop`
route |

### How it works
1. Validates all document IDs belong to the dataset
2. For each document in RUNNING/CANCEL state (or with unfinished tasks):
- Sets Redis cancel signal `{task_id}-cancel` for each associated task
   - Updates `document.run` to CANCEL ("2")
3. Returns `{"success_count": N, "errors": [...]}`

### Test strategy
- **DAO/Service**: SQLite in-memory DB, zero mocks. Redis is nil-safe by
design.
- **Handler**: `fakeDocumentService` implementing `documentServiceIface`
interface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-04 14:16:13 +08:00
Yufeng He
5db1b296fb fix: fall back from empty Docling native chunks (#15601)
## Summary
- keep the native Docling chunking path when it returns usable chunks
- fall back to the standard Docling response parser when a chunked
request gets HTTP 200 but returns no usable chunks
- add a regression test for older Docling servers that accept the
chunking request but return a standard conversion payload

## Why
Older external Docling servers can accept a request containing
`do_chunking: true` and still return the standard conversion response
shape. The current code treats any HTTP 200 from the chunked request as
a native chunk response, finds no chunk entries, and returns zero
sections without trying the standard response parser.

Fixes #15569.

## Validation
- `python -m pytest
test\\unit_test\\deepdoc\\parser\\test_docling_parser_remote.py -q`
- `python -m py_compile deepdoc\\parser\\docling_parser.py
test\\unit_test\\deepdoc\\parser\\test_docling_parser_remote.py`
- `python -m ruff check deepdoc\\parser\\docling_parser.py
test\\unit_test\\deepdoc\\parser\\test_docling_parser_remote.py`
- `git diff --check`
2026-06-04 13:42:58 +08:00
bitloi
01a5598aa5 Fix: markdown fenced code block extraction (#15630)
### What problem does this PR solve?

Markdown extraction currently applies custom delimiters before
respecting fenced code blocks. When a delimiter such as a newline is
configured, fenced code can be split into separate chunks, and longer
outer fences can be closed incorrectly by shorter nested fences.

This PR keeps the fix intentionally narrow for the Markdown chunking
discussion in #15482:

- preserve fenced code blocks when delimiter-based extraction is used
- support both backtick and tilde fences
- respect fence length so longer outer fences can contain shorter inner
fences
- keep delimiter splitting unchanged outside fenced blocks

Refs #15482

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Testing

- `ruff check deepdoc/parser/markdown_parser.py
test/unit_test/deepdoc/parser/test_markdown_parser.py`
- `python3 run_tests.py -t
test/unit_test/deepdoc/parser/test_markdown_parser.py`
2026-06-04 13:33:46 +08:00
buua436
c70f19e138 Fix: remove duplicate document preview access check (#15625)
### What problem does this PR solve?

remove duplicate document preview access check

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 13:05:15 +08:00
Lynn
597ac1e900 Fix: search bot and verify model instance (#15588)
### What problem does this PR solve?

Fix:
- Verify provider with empty llm list in llm_factories.json
- Set search bot's chat_llm_name, use tenant default chat model as
default

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 11:59:55 +08:00
buua436
bbacb31226 Fix: think stream tail handling (#15582)
### What problem does this PR solve?

think stream tail handling
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 10:04:35 +08:00
kpdev
d26d799467 fix(api): restore accessible check on document preview (#15505)
Restore `DocumentService.accessible` on `GET
/api/v1/documents/{doc_id}/preview` so cross-tenant users cannot stream
documents by UUID.

Fixes #15501

### What problem does this PR solve?

PR #15146 (`71a52d579`) moved the agent attachment download route and
accidentally removed the `DocumentService.accessible(doc_id,
current_user.id)` guard from the REST preview handler. The endpoint
still requires login, but any authenticated user who knows another
tenant's `doc_id` can download the raw file bytes.

This restores the same authorization check that existed before #15146,
returning a generic `"Document not found!"` when access is denied (no
cross-tenant ID enumeration). SDK download routes tracked in #15125 are
unchanged.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-04 09:59:07 +08:00
dripsmvcp
2196f2260a fix(api): restore DocumentService.accessible check on /preview (#15508)
## Summary
Restore the `DocumentService.accessible(doc_id, current_user.id)` check
that PR #15146 dropped from the REST document preview handler. Any
authenticated caller could download any tenant's document bytes by
guessing/knowing the `doc_id`.

## Root cause
`api/apps/restful_apis/document_api.py` — the `GET
/documents/<doc_id>/preview` handler called `DocumentService.get_by_id`
and went straight to `File2DocumentService.get_storage_address` +
`STORAGE_IMPL.get`, with no tenant check between the lookup and the
read. The handler's docstring even promises "user must belong to the
tenant that owns the document's knowledge base" — the code didn't
enforce it.

## Fix
- Add `current_user` to the existing `api.apps` import.
- Immediately after `get_by_id`, call
`DocumentService.accessible(doc_id, current_user.id)`; on denial, return
the **same** `get_data_error_result(message="Document not found!")`
shape used for the missing-doc branch. That makes a cross-tenant probe
indistinguishable from a missing-doc probe, preventing ID enumeration
(the issue body calls this out explicitly).
- Emit `logging.warning` with caller user + doc_id for audit.
- Restores symmetry with peer routes that already call
`accessible(doc_id, user_id)` (e.g. `_run_sync` at
`document_api.py:1380`).

## Test plan
Adds
`test/unit_test/api/apps/restful_apis/test_document_preview_accessible.py`:

- **`test_cross_tenant_preview_is_denied`** — owner tenant ≠ caller
tenant; asserts the response shape is `Document not found!` and the
storage backend (`thread_pool_exec(STORAGE_IMPL.get, ...)`) is **never**
invoked.
- **`test_missing_doc_returns_not_found`** — missing-doc behaviour
unchanged.

Stub-loader pattern mirrors
`test/unit_test/api/apps/sdk/test_dify_retrieval.py` (added in #15028,
passing in CI).

## Provenance — how this fix was produced

This PR was authored against a small cited knowledge base committed in
the working tree as a `.vouch/` (see
[vouchdev/vouch](https://github.com/vouchdev/vouch)). The loop used
here:

1. **Grounding first.** Before reading the handler, queried the KB for
prior context: `vouch context "tenant scoped accessible authorization"`
→ retrieved a cited claim distilled from PR #15028 (which restored the
same `accessible()` check on `/dify/retrieval`). The retrieved rule:

> *ragflow REST endpoints that load by tenant-scoped id must call
`<Service>.accessible(id, tenant_id)` after `get_by_id` and before
storage/DB read; deny with code 109 'No authorization.' and log a
warning. Established by PR #15028.*

2. **Applied the pattern with a domain refinement.** For an API/JSON
endpoint, `No authorization.` is the right denial shape. For a
**byte-streaming, browser-facing** endpoint like `/preview`, leaking
*existence* itself enables enumeration — so per the issue's expected
behaviour, this PR denies with `Document not found!` (indistinguishable
from missing) instead. Same auth check, narrower response.

3. **Recorded the refinement back into the KB** as a new cited claim, so
the next IDOR-class issue starts already grounded in both the general
pattern and the byte-route nuance.

Net effect of the workflow: the fix replicates a known-good pattern
instead of reinventing it, *and* the place where the pattern was nuanced
is now retrievable for the next pass. Mechanism is fully independent of
this PR — it's not a runtime dependency, just process discipline.

Closes #15501
2026-06-04 09:58:26 +08:00
euvre
9a9d3ddf5f fix: show default embedding model when provider is not yet registered (#15511)
### What problem does this PR solve?

### Problem

On the Model Providers page, the Embedding Model dropdown in System
Model Settings shows empty (no default selected), even though a default
embedding model is configured in `service_conf.yaml`.

### Root Cause

Two issues were identified:

1. **Backend: `_get_model_info` fails for unregistered providers**
The tenant's `embd_id` is set to `bge-m3@xxxx` during initialization
(from the placeholder config `factory: 'xxxx'`). The `_get_model_info`
function requires the provider to exist in `tenant_model_provider`
table, but `xxxx` is never a real provider. Even after the user adds a
real provider (e.g., ZHIPU-AI), the stale `embd_id` still references the
non-existent one, causing the function to return `None`.

2. **Frontend: default models cache not invalidated after adding
provider**
`useAddProviderInstance` only invalidates `addedProviders` and
`allModels` caches after adding a provider instance, but does **not**
invalidate the `defaultModels` cache. This means the default model list
is not re-fetched until the user manually refreshes the page.

### Fix

**`api/apps/services/models_api_service.py`**

- Added `_resolve_model_from_tenant_providers()` helper: when the
default model's provider doesn't exist (e.g., placeholder `xxxx`), it
searches through the tenant's actually registered providers for a model
of the same type and returns the first match.
- When an instance name doesn't match (e.g., `"default"` vs actual name
`"1"`), the function now auto-resolves to the first real instance under
that provider.
- Falls back to `FACTORY_LLM_INFOS` validation when neither provider nor
instance exists.

**`web/src/hooks/use-llm-request.tsx`**

- Added `queryClient.invalidateQueries({ queryKey:
LlmKeys.defaultModels() })` to `useAddProviderInstance` so that the
default model list is re-fetched immediately after a provider instance
is added, eliminating the need for a manual page refresh.

### Testing

- Verified with a tenant whose `embd_id=bge-m3@xxxx` and only provider
is ZHIPU-AI (instance `1`): `_resolve_model_from_tenant_providers`
correctly resolves to `embedding-2@1@ZHIPU-AI`.
- After adding a provider via the UI, the embedding model dropdown now
immediately shows the resolved default without requiring a page refresh.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-06-04 09:55:49 +08:00