ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Danut Matei	e2b0da9eea	fix(opensearch): keep the BM25 leg in hybrid search (#15760 ) ### What problem does this PR solve? Fixes the OpenSearch side of #10747: hybrid search drops the keyword (BM25) leg and ends up doing plain vector search. When a search has both a text and a vector leg, `OSConnection.search()` throws the text query away: del q["query"] q["query"] = {"knn": knn_query} The text clause only stays on as a filter inside the knn query, so it narrows the candidate set but doesn't count towards scoring. So hybrid search on OpenSearch behaves like plain vector search, unlike the Elasticsearch backend. What I changed: - when both legs are present, send a real hybrid query `{"hybrid": {"queries": [bm25, {"knn": ...}]}}` and let a normalization-processor search pipeline score and combine the two legs - only the actual filters (kb_id, available_int, ...) go in the knn filter, not the text must clause - create the pipeline on startup if it's missing, so there's no separate provisioning step. name and weights can be set under `os:` in service_conf.yaml, or via `OS_HYBRID_PIPELINE`; defaults are `ragflow_hybrid_pipeline` and `[0.5, 0.5]` - normalization-processor needs OpenSearch 2.10+. on older clusters, or when the pipeline can't be created, log a warning and fall back to vector-only instead of pointing at a pipeline that doesn't exist This is only the hybrid-search fix; `create_doc_meta_idx` is already on main. Testing (there's no OpenSearch path in CI): added a unit test (`test/unit_test/rag/utils/test_opensearch_hybrid_search.py`, no services needed) that checks the query built in each case — hybrid + pipeline param for text+vector, plain knn for vector-only, plain bool for text-only, the knn filter never carrying the text query_string, and the vector-only fallback when the pipeline isn't available. 
Also ran it against a real OpenSearch 2.19.1 container with a doc that matches the keyword but sits outside the knn top-k: pure knn returns `['D1','D2','D5']` (keyword doc missing), the hybrid query returns `['A','D1','D2','D5']` (keyword doc present). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Danut Matei <matei.danut.dm@gmail.com>	2026-06-08 16:17:47 +08:00
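The query shapes described in this commit can be sketched as follows. This is a minimal illustration, not the actual `OSConnection.search()` code: the function name `build_search`, the vector field name `q_vec`, and the `pipeline_available` flag are assumptions made for the example. The `hybrid` query type and the `search_pipeline` request parameter are standard OpenSearch features.

```python
from typing import Any, Optional

DEFAULT_PIPELINE = "ragflow_hybrid_pipeline"  # default name quoted in the commit message


def build_search(
    text_clause: Optional[dict[str, Any]],
    vector: Optional[list[float]],
    filters: list[dict[str, Any]],
    vector_field: str = "q_vec",  # hypothetical field name
    k: int = 10,
    pipeline_available: bool = True,
    pipeline_name: str = DEFAULT_PIPELINE,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return (request body, query-string params) for one search call."""
    bm25_leg = None
    if text_clause is not None:
        # The text clause is scored; the real filters only narrow the set.
        bm25_leg = {"bool": {"must": [text_clause], "filter": filters}}

    knn_leg = None
    if vector is not None:
        knn_body: dict[str, Any] = {"vector": vector, "k": k}
        if filters:
            # Only real filters go here, never the text clause.
            knn_body["filter"] = {"bool": {"filter": filters}}
        knn_leg = {"knn": {vector_field: knn_body}}

    if bm25_leg is not None and knn_leg is not None:
        if pipeline_available:
            body = {"query": {"hybrid": {"queries": [bm25_leg, knn_leg]}}}
            return body, {"search_pipeline": pipeline_name}
        # Fallback described in the commit: vector-only when no pipeline exists.
        return {"query": knn_leg}, {}
    if knn_leg is not None:
        return {"query": knn_leg}, {}
    if bm25_leg is not None:
        return {"query": bm25_leg}, {}
    raise ValueError("need a text clause, a vector, or both")
```

Returning the request parameters separately from the body keeps the fallback path simple: when the pipeline is missing, the caller sends no `search_pipeline` parameter at all.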
Jack	8f4809d1b5	feat: implement POST /api/v1/searchbots/retrieval_test (#15710 ) ## What problem does this PR solve? Implements `POST /api/v1/searchbots/retrieval_test` in the Go API server, aligning with the Python `bot_api.py` counterpart. Also applies security hardening and consistency fixes discovered during CTO-level code review: - Missing endpoint: `retrieval_test` was not available in Go, requiring Python fallback - Security: Both `chunkHandler` and `searchBotHandler` leaked `err.Error()` to API consumers - Python alignment: Default values, empty question handling, and `top_k <= 0` validation differed from Python behavior - Test gaps: `chunkHandler.RetrievalTest` had zero unit tests; several edge cases uncovered ## Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring ## Summary ### New Endpoint - `POST /api/v1/searchbots/retrieval_test` — retrieval test with full field support (page, size, top_k, use_kg, cross_languages, keyword, similarity_threshold, vector_similarity_weight) ### New Type - `common.StringSlice` — JSON type that accepts both `"kb1"` and `["kb1", "kb2"]`, matching Python API flexibility ### Security - Both `searchBotHandler` and `chunkHandler` now use `common.Warn()` + generic error messages instead of leaking `err.Error()` to API consumers - All error responses include consistent `"data": nil` shape - `chunkHandler.RetrievalTest` uses interface-based DI (`chunkService`) to enable testability ### Python Alignment - Handler-level defaults align with Python `bot_api.py` (page=1, size=30, top_k=1024, similarity_threshold=0.0, vector_similarity_weight=0.3) - `top_k <= 0` validation matching Python behavior - Empty/whitespace question returns 200 + empty result (matches `chunk_api.py`) - `chunkHandler` `Datasets` field uses `common.StringSlice` for string-or-array flexibility ### Refactoring - `ChunkServiceIface` → `ChunkRetriever`, 
`chunkSvcIface` → `chunkService` (Go-conventional naming) - Extracted `applyRetrievalDefaults`, `toRetrievalServiceRequest` from handler body - Regex moved to package-level var in `parseRelatedQuestions` - `service.RetrievalTestRequest.Datasets` type changed to `common.StringSlice` - `chunkHandler` now uses consumer-side interface for DI ### Tests - 37 unit tests across both handlers: auth, validation, defaults, StringSlice edge cases, empty/whitespace KbID, service errors, JSON format, `top_k <= 0`, field mapping verification ## Files Changed \| File \| Change \| \|------\|--------\| \| `cmd/server_main.go` \| Wire new handler + chunkService + difyRetrievalHandler \| \| `internal/common/json_types.go` \| New StringSlice type \| \| `internal/common/json_types_test.go` \| StringSlice tests \| \| `internal/handler/chunk.go` \| Interface-based DI, security, Python alignment, defaults \| \| `internal/handler/chunk_test.go` \| New — 9 comprehensive tests \| \| `internal/handler/searchbot.go` \| New endpoint + refactoring + `top_k <= 0` validation \| \| `internal/handler/searchbot_test.go` \| 18 tests covering all edge cases \| \| `internal/router/router.go` \| Register new route + difyRetrievalHandler \| \| `internal/service/chunk.go` \| Datasets type → StringSlice, Question binding relaxed \| 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 16:16:56 +08:00
balibabu	9c32b73cf7	Fix: The embedded website floating component on the agent page does not display citations. (#15767 ) ### What problem does this PR solve? Fix: The embedded website floating component on the agent page does not display citations. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 16:09:36 +08:00
buua436	e81bca73d5	fix: normalize agent session chunks (#15756 ) ### What problem does this PR solve? Normalize agent session chunk references so they are mapped through a dedicated helper instead of duplicating the field extraction inline. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 15:29:55 +08:00
qinling0210	5e0a7ce408	Update Rerank logic in GO (#15755) ### What problem does this PR solve? Sync the rerank logic from the following PRs to the Go implementation: https://github.com/infiniflow/ragflow/pull/15429 https://github.com/infiniflow/ragflow/pull/15434 ### Type of change - [x] Refactoring	2026-06-08 15:28:10 +08:00
buua436	6bf7056422	feat: add placeholder model metas (#15753 ) ### What problem does this PR solve? add placeholder model metas ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 14:54:59 +08:00
balibabu	c935f305e2	Fix: The time zone is not displayed on the personal profile page. (#15759 ) ### What problem does this PR solve? Fix: The time zone is not displayed on the personal profile page. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 14:33:52 +08:00
bitloi	220ee9dbfb	fix: normalize reasoning model families (#15612 ) ### What problem does this PR solve? Closes #15611. RAGFlow's fallback reasoning parser only recognized the exact model family `qwen3`. For provider-prefixed Qwen model names such as SiliconFlow's `qwen/qwen3-8b`, the derived model class can be `qwen/qwen3`, so inline `<think>...</think>` content was not split from the visible answer when `reasoning_content` was absent. This PR normalizes model-family detection before fallback reasoning extraction, keeps the parser nil-safe, and adds focused tests for Qwen3 variants plus Gitee and SiliconFlow chat responses. It also makes SiliconFlow propagate `ChatConfig.Thinking` into the chat request body, matching the existing Gitee behavior, so Qwen thinking mode is actually enabled when requested. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring ### Validation - `/root/go/bin/gofmt -l internal/entity/models/common.go internal/entity/models/common_test.go internal/entity/models/reasoning_family_provider_test.go internal/entity/models/siliconflow.go` - `git diff --check` - `/root/go/bin/go test ./internal/entity/models -run 'Test(NormalizeModelFamily\|GetThinkingAndAnswer\|GiteeChatExtractsQwenThinkingFromInlineContent\|SiliconflowChatExtractsProviderPrefixedQwenThinkingFromInlineContent)' -vet=off -count=1` Note: the full package command `/root/go/bin/go test ./internal/entity/models -vet=off -count=1` now runs locally, but it currently fails on an unrelated existing `TestAstraflowEmbedReturnsNoSuchMethod` panic in `internal/entity/models/astraflow.go:482`.	2026-06-08 13:32:52 +08:00
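The idea behind this fix can be illustrated in a few lines. The change itself is in Go (`internal/entity/models/common.go`); the Python sketch below only mirrors the behavior described above, and the function names `normalize_model_family` and `split_thinking` are hypothetical.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def normalize_model_family(model_class: str) -> str:
    """Drop any provider prefix, e.g. 'qwen/qwen3' becomes 'qwen3'."""
    return model_class.strip().lower().rsplit("/", 1)[-1]


def split_thinking(content: str) -> tuple[str, str]:
    """Split inline <think>...</think> text from the visible answer."""
    match = _THINK_RE.search(content)
    if match is None:
        return "", content
    thinking = match.group(1).strip()
    answer = (content[: match.start()] + content[match.end():]).strip()
    return thinking, answer


def extract(model_class: str, content: str) -> tuple[str, str]:
    # Fallback parsing only applies to families known to inline their
    # reasoning; this sketch assumes qwen3 is the only such family.
    if normalize_model_family(model_class) == "qwen3":
        return split_thinking(content)
    return "", content
```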
oktofeesh	b1a2210d06	fix(go-models): increase JieKouAI SSE scanner buffer (#15737 ) ## Summary - Raise the JieKouAI streaming SSE scanner buffer to handle larger data chunks without truncation.	2026-06-08 13:10:10 +08:00
tmimmanuel	5e25e2600b	Go: implement Xiaomi chat provider (#15626 ) ### What problem does this PR solve? Implements the Xiaomi MiMo chat provider for the Go model provider layer. Reference issue: #14736 Official docs used: - Xiaomi MiMo OpenAI-compatible chat API: https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api - Xiaomi MiMo model and rate limits: https://platform.xiaomimimo.com/docs/en-US/quick-start/model - Xiaomi MiMo model hyperparameters: https://platform.xiaomimimo.com/docs/en-US/quick-start/model-hyperparameters	2026-06-08 13:09:36 +08:00
cleanjunc	38f9ea5fec	fix(rerank): normalize reranker scores onto a single scale before hybrid blend (#15429 ) ### What problem does this PR solve? Closes #15428 The hybrid score in `rag/nlp/search.py` (`rerank_by_model`) blends reranker similarity with token similarity on a fixed `[0, 1]` scale: ```python return tkweight * np.array(tksim) + vtweight * vtsim + rank_fea # tkweight=0.3, vtweight=0.7 ``` The reranker implementations did not agree on that scale. Only three of roughly 17 providers normalized their output, and `NvidiaRerank` returned raw, unbounded logits. Weighted at `0.7`, a negative logit could push a genuinely relevant chunk below pure keyword matches, and its magnitude swamped `tksim`, which lives in `[0, 1]`. The practical effect was that the same query produced differently scaled scores depending on the configured reranker, and logit based providers degraded retrieval quality instead of improving it. This PR enforces a single scoring contract in one place: - `Base.similarity` is now the only public entry point. It short-circuits empty input and guarantees a normalized result. Each provider implements its raw scoring in `_compute_rank`, which removes sixteen duplicated empty input guards and the three scattered normalization calls. - Normalization is range aware. Providers that already return calibrated `[0, 1]` relevance scores (Cohere, Jina, Voyage, and others) keep their absolute magnitudes, so `similarity_threshold` filtering and the reported `vector_similarity` stay meaningful. Only out-of-range output such as NVIDIA logits is min-max rescaled into `[0, 1]`. - The twelve leftover `[DEBUG ...]` prints in `rerank_by_model`, introduced in #14231, are removed. They ran on every retrieval, added per chunk overhead, and leaked queries, keywords, and document content to stdout and logs. 
A new regression suite in `test/unit_test/rag/llm/test_rerank_normalization.py` covers logit rescaling (positive, negative, and flat batches), preservation of already calibrated scores, ordering, empty input handling, and the per provider HTTP path. It also asserts that no provider overrides `similarity()`, so the contract cannot silently drift. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 11:53:22 +08:00
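The range-aware normalization described in this commit can be sketched as below. This is an illustration under stated assumptions, not the code in `Base.similarity`: the function names are hypothetical, and the value returned for a flat out-of-range batch (0.5) is an arbitrary choice for the sketch, since the commit message does not say what the real code returns in that case.

```python
def normalize_scores(raw: list[float]) -> list[float]:
    """Keep calibrated [0, 1] scores as-is; min-max rescale anything else."""
    if not raw:
        return []
    lo, hi = min(raw), max(raw)
    if lo >= 0.0 and hi <= 1.0:
        # Already calibrated: preserve absolute magnitudes so that
        # similarity_threshold filtering stays meaningful.
        return list(raw)
    if hi == lo:
        # Flat out-of-range batch: assumption for this sketch only.
        return [0.5] * len(raw)
    return [(score - lo) / (hi - lo) for score in raw]


def blend(tksim: list[float], vtsim: list[float],
          tkweight: float = 0.3, vtweight: float = 0.7) -> list[float]:
    """Hybrid score on a shared [0, 1] scale (rank features omitted)."""
    return [tkweight * t + vtweight * v for t, v in zip(tksim, vtsim)]
```

With raw logits such as `[-3.0, 0.0, 5.0]`, rescaling before the blend keeps a negative logit from dragging a relevant chunk below pure keyword matches.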
dripsmvcp	3d7adf2193	feat[Go]: implement GET /plugin/tools (issue #15240 ) (#15570 ) ## Summary Port the Python `GET /v1/plugin/tools` endpoint to the Go API server. Listed in the Go-API port checklist of #15240. Returns the metadata of every embedded LLM tool plugin in the same JSON shape the Python endpoint emits (camelCase keys preserved), so existing frontends bind to the Go server without changes.	2026-06-08 11:53:19 +08:00
cleanjunc	91983106f2	fix(retrieval): keep rerank window aligned to page_size for deep pagination (#15434 ) ### What problem does this PR solve? Closes #15433 Reranked retrieval drops results and returns short pages once pagination crosses the first candidate block, for the common page sizes 10 and 30. In `rag/nlp/search.py`, the candidate window (`RERANK_LIMIT`) is rounded up to a multiple of `page_size` to keep block based pagination aligned, and then clamped back to 64: ```python RERANK_LIMIT = math.ceil(64 / page_size) * page_size if page_size > 1 else 1 # e.g. 70 for page_size=10 RERANK_LIMIT = max(30, RERANK_LIMIT) if rerank_mdl and top > 0: RERANK_LIMIT = min(RERANK_LIMIT, top, 64) # clamps back to 64, breaking the multiple ``` `RERANK_LIMIT` is used both as the backend block size (`page = global_offset // RERANK_LIMIT`) and as the modulus that slices a page out of a reranked block (`begin = global_offset % RERANK_LIMIT`). When it stops being a multiple of `page_size`, the block that gets fetched and the slice taken from it no longer agree. With `page_size=10` and `top=1024`, page 7 returns only 4 of 10 results and the head of the next block is never shown on any page. This happens whenever the result set spans more than one block, which is the default. Fix The window math is moved into a small reusable helper, `Dealer._rerank_window`, which: - targets a pool of about 64 candidates, - bounds it by `top` when a reranker is active, and - always rounds to a whole number of pages, so the window stays an exact multiple of `page_size`. The call site becomes a single line, and the alignment invariant now lives in one documented place. Behavior is unchanged on every path that was already aligned (the non reranked path and any `top` that already produced a page multiple). 
Verification A simulation of the full retrieval path (per block rerank, similarity threshold filter, and the exact `page // window` and `offset % window` math) confirms the fix loses nothing where the old code lost real results: ``` ps=10 top=1024: new window=70 dropped_valid=0 \| old window=64 dropped_valid=16 ps=30 top=1024: new window=90 dropped_valid=0 \| old window=64 dropped_valid=66 ``` New unit tests in `test/unit_test/rag/test_search_pagination.py` cover the alignment invariant, cross block pagination (every candidate surfaced once, in order, no gaps, no short interior pages), the reported regression, and parity with the old window on the previously correct paths. All 114 cases pass and `ruff check` is clean. Fixes the reranked deep pagination data loss described above. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 11:53:12 +08:00
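The window arithmetic above can be reproduced with a short sketch. It is not the actual `Dealer._rerank_window` helper; the function names and the exact bounding rule are assumptions, chosen so that the results match the window sizes reported in the verification output (70 for `page_size=10`, 90 for `page_size=30`).

```python
import math


def rerank_window(page_size: int, top: int, has_reranker: bool,
                  target: int = 64) -> int:
    """Candidate window: about `target` items, always a page multiple."""
    page_size = max(1, page_size)
    pool = target
    if has_reranker and top > 0:
        pool = min(pool, top)
    pages = max(1, math.ceil(pool / page_size))
    return pages * page_size


def results_on_page(page: int, page_size: int, window: int) -> int:
    """How many items a page can take from its block, if the block is full."""
    global_offset = (page - 1) * page_size
    begin = global_offset % window  # offset of the page inside its block
    return min(page_size, window - begin)
```

With the old clamped window of 64, `results_on_page(7, 10, 64)` gives 4, which is the short page reported in the issue. With the aligned window of 70, the same page gets all 10 results.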
qinling0210	c960dc2a4c	Refine handling of POST /api/v1/datasets/search in GO (#15583 ) ### What problem does this PR solve? Refine handling of POST /api/v1/datasets/search in GO ### Type of change - [x] Refactoring	2026-06-08 11:49:37 +08:00
Hz_	074c331cdf	fix(go-api): sync document handler interface and enforce preview acce… (#15688 ) ### Description This PR syncs the `documentServiceIface` interface and introduces handler methods for document preview, artifact fetching, and downloading in the Go API. It also ensures that strict dataset alignment and access checks are enforced when retrieving or downloading documents. Furthermore, this PR introduces comprehensive unit tests for both the newly added Handler and Service methods to ensure robustness and prevent future regressions. ### Key Changes * Router & Handler Integration: * Added and wired new API endpoints in `internal/router/router.go`. * Synchronized the `documentServiceIface` with `GetDocumentArtifact`, `GetDocumentPreview`, and `DownloadDocument`. * Implemented handlers for these endpoints in `internal/handler/document.go`. * Access & Validation Enforcement: * Refactored `internal/service/document.go` to strictly check if a document belongs to the requested dataset before allowing downloads or previews. * Added robust artifact file sanitization (`sanitizeArtifactFilename`) and attachment handling (`shouldForceArtifactAttachment`). * Comprehensive Unit Testing: * Handler Layer (`internal/handler/document_test.go`): Added mock service implementations and Gin router tests covering success, not-found, and internal error states for all 3 new endpoints. * Service Layer (`internal/service/document_test.go`): Added extensive business logic tests including dataset mismatch checks, non-existent document checks, and artifact file validation.	2026-06-08 11:37:06 +08:00
Lynn	b05d5a5228	Feat: get model list from remote (#15711) ### What problem does this PR solve? Feat: Get model list from remote provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-08 11:02:40 +08:00
kpdev	b0a45809ff	fix(onedrive): normalize folder_path for Graph delta URL (#15503 ) Prepend a leading slash and reject `..` segments so scoped OneDrive delta queries use `root:/path:/delta` instead of `root:path:/delta`. Fixes #15500 ### What problem does this PR solve? The OneDrive connector builds Microsoft Graph delta URLs from optional `config.folder_path`. When users enter a path without a leading slash (e.g. `Documents/Reports` instead of `/Documents/Reports`), the connector produces a malformed URL such as `root:Documents/Reports:/delta`. Per [Microsoft Graph path-based addressing](https://learn.microsoft.com/en-us/graph/onedrive-addressing-driveitems), the segment after `root:` must start with `/` (e.g. `root:/Documents/Reports:/delta`). Sync and validation then fail or return no documents, which is hard to diagnose from the UI because the optional folder field does not enforce the format. This PR normalizes `folder_path` at connector construction time (prepend `/`, trim whitespace and trailing slashes) and rejects `..` segments before any Graph request is made. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 09:56:47 +08:00
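The normalization rule described above (trim whitespace and trailing slashes, prepend `/`, reject `..`) can be sketched as follows. The function names and the `/me/drive` base URL are assumptions for the example; the connector's real helper and drive addressing may differ.

```python
from typing import Optional

GRAPH_DRIVE = "https://graph.microsoft.com/v1.0/me/drive"  # assumed base URL


def normalize_folder_path(folder_path: Optional[str]) -> str:
    """Return '' for the drive root, else a path with one leading slash."""
    path = (folder_path or "").strip().strip("/")
    if not path:
        return ""
    segments = [segment.strip() for segment in path.split("/")]
    if any(segment == ".." for segment in segments):
        raise ValueError("folder_path must not contain '..' segments")
    return "/" + "/".join(segments)


def delta_url(folder_path: Optional[str]) -> str:
    """Build the delta URL using Graph path-based addressing."""
    normalized = normalize_folder_path(folder_path)
    if not normalized:
        return f"{GRAPH_DRIVE}/root/delta"
    return f"{GRAPH_DRIVE}/root:{normalized}:/delta"
```

Validating at construction time, as the PR does, means a bad path fails before any Graph request is made instead of surfacing as an empty sync.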
Jack	5a04ac0864	feat: Dify-compatible retrieval API endpoint (#15704)

## Summary

Dify-compatible retrieval API for external knowledge base integration.

## Changes

- New handler: DifyRetrievalHandler with POST/GET /api/v1/dify/retrieval
- Health check: GET /api/v1/dify/retrieval/health
- Full pipeline: KB validation -> permission check -> embedding -> metadata filter -> chunk retrieval -> child chunk aggregation -> optional KG search -> response assembly
- 12 tests covering all paths (success, errors, metadata filter, KG mode)
- Testability: Handler dependencies defined as interfaces (KBServiceIface, ModelServiceIface, etc.)

## Files

| File | Type |
|------|------|
| internal/handler/dify_retrieval_handler.go | New: handler + interfaces |
| internal/handler/dify_retrieval_handler_test.go | New: 12 tests |
| internal/router/router.go | Modified: route registration |
| cmd/server_main.go | Modified: handler wiring |
| internal/service/kg/pipeline.go | Modified: SetChatModel/SetEmbModel |
| internal/service/kg/retrieval.go | New: helper functions |
| internal/service/kg/scoring.go | Moved from service package |
| internal/service/kg/search.go | New: KG search functions |
| internal/service/kg/types.go | New: type definitions |

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 21:16:25 +08:00
Hz_	1deb1313d2	feat(go-cli): support batch model add/remove and optional embedding dimension (#15631 ) ## Summary This PR improves the Go CLI in two areas: 1. It adds batch model management support, allowing multiple models to be added or removed in a single command. 2. It makes the `dimension` argument optional for the `embed text` command. These changes keep the existing single-model and explicit-dimension behaviors compatible while making the CLI more convenient for common workflows. ## What Changed ### 1. Batch model add/remove support The CLI now supports operating on multiple model names provided in a single quoted string. Supported commands include: ``` add model 'x1 x2 x3' to provider 'vllm' instance 'test' with tokens 1024 chat think vision, token 2048 chat, token 1024 think vision; drop model 'x1 x2 x3' from 'vllm' 'test'; remove model 'x1 x2 x3' from 'vllm' 'test'; ``` For add model, each config segment after with is matched to the corresponding model name by position. Example mapping: - x1 -> tokens 1024, chat + vision, thinking=true - x2 -> tokens 2048, chat - x3 -> tokens 1024, vision, thinking=true The existing single-model syntax remains supported. ### 2. Optional embedding dimension Previously, the Go CLI required dimension to be explicitly provided for embed text. Before: embed text 'what is rag' 'who are you' with 'model@test@provider' dimension 8192; Now both forms are supported: embed text 'what is rag' 'who are you' with 'model@test@provider' dimension 8192; embed text 'what is rag' 'who are you' with 'model@test@provider'; When omitted, the CLI leaves dimension unset and relies on provider/backend behavior. ## Tests Added parser tests covering: - Multiple models with multiple config segments - Model type deduplication - Model/config count mismatch - Drop/remove multiple models - Optional embedding dimension parsing	2026-06-05 19:31:06 +08:00
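The positional matching of config segments to model names can be illustrated with a small sketch. The real parser is part of the Go CLI; the function name and the returned dictionary shape below are hypothetical, and the sketch covers only the segment grammar shown in the examples above (`tokens N` or `token N` followed by type words, with `think` treated as a flag).

```python
def map_models_to_configs(names: str, configs: str) -> dict[str, dict]:
    """Pair each model name with the config segment at the same position."""
    model_names = names.split()
    segments = [segment.strip() for segment in configs.split(",")]
    if len(model_names) != len(segments):
        raise ValueError("model/config count mismatch")

    result: dict[str, dict] = {}
    for name, segment in zip(model_names, segments):
        words = segment.split()
        if len(words) < 2 or words[0] not in ("token", "tokens"):
            raise ValueError(f"bad config segment: {segment!r}")
        types: list[str] = []
        thinking = False
        for word in words[2:]:
            if word == "think":
                thinking = True
            elif word not in types:  # deduplicate repeated model types
                types.append(word)
        result[name] = {"tokens": int(words[1]), "types": types,
                        "thinking": thinking}
    return result
```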
balibabu	9c14e3f377	Fix: When adding a chat in the main interface, a warning will automatically pop up (#15685 ) ### What problem does this PR solve? Fix: When adding a chat in the main interface, a warning will automatically pop up (even if embedding and LLM model have already been configured). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 19:09:22 +08:00
Jack	ea79d65d08	feat: add KGSearchRetrieval for full KG pipeline (N-hop, scoring, query_rewrite, community) (#15690 ) ## Summary `KGSearchRetrieval` composes entity search, type search, relation search, N-hop analysis, score fusion, LLM-based query\_rewrite, and community reports into a single synthetic chunk for KG-enhanced retrieval. ### Components \| Component \| Source \| Status \| \|-----------\|--------\|--------\| \| Entity/relation/community search \| Direct `DocEngine.Search` calls \| ✅ \| \| N-hop analysis + score fusion \| `common.AnalyzeNHopPaths` / `DoubleHitBoost` / `FuseRelationScores` \| ✅ #15666 \| \| Query rewrite prompt + parser \| `common.BuildQueryRewritePrompt` / `ParseQueryRewriteResponse` \| ✅ #15669 \| \| Token budget \| `common.BuildKGContent` + `NumTokensFromString` \| ✅ #15666 \| \| LLM query rewrite integration \| `queryRewrite` function with fallback \| ✅ \| ### Testing 11 tests (pure function + mock engine): ``` === RUN TestKgEntityFromChunk_Basic --- PASS === RUN TestKgEntityFromChunk_ScoreFallback --- PASS === RUN TestKgEntityFromChunk_MissingFields --- PASS === RUN TestKgRelationFromChunk_Basic --- PASS === RUN TestKgRelationFromChunk_MissingFrom --- PASS === RUN TestSearchKGTypeSamples_Success --- PASS === RUN TestSearchKGTypeSamples_Empty --- PASS === RUN TestKGSearchRetrieval_Basic --- PASS === RUN TestKGSearchRetrieval_NoEntities --- PASS === RUN TestQueryRewrite_Fallback --- PASS === RUN TestQueryRewrite_EmptyQuestion --- PASS ``` --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 18:00:27 +08:00
Wang Qi	aa9545e4c9	Revert "fix: duplicate document ingest guard" (#15707 ) Reverts infiniflow/ragflow#15638	2026-06-05 17:45:29 +08:00
Wang Qi	214ee319f8	Revert "fix(api): authorize owner_ids for list chats and search apps (#14775)" (#15698) This reverts PR #14775 commit `5a5e766386`.	2026-06-05 17:26:02 +08:00
Yufeng He	6cba5a544a	fix(agent): skip empty switch conditions (#15691 ) ## What - make `Switch` ignore conditions that have no evaluable items - add a regression for blank `cpn_id` items falling through to the else branch - keep the existing non-empty `and` condition behavior covered Fixes #15643. ## Verified - `python -m py_compile agent\component\switch.py test\unit_test\agent\component\test_switch.py` - `python -m pytest test\unit_test\agent\component\test_switch.py -q` -> `2 passed` - `python -m ruff check agent\component\switch.py test\unit_test\agent\component\test_switch.py` - `git diff --check` I also checked `python -m ruff format --check` on the touched files. It would reformat pre-existing style in `agent/component/switch.py` beyond this bug fix, so I kept the patch scoped instead of reformatting the whole file.	2026-06-05 17:20:44 +08:00
Liu An	aab01af6f2	fix: Update Dockerfile and release workflow to use GitHub mirror instead of Gitee (#15700 ) ### What problem does this PR solve? Update Dockerfile and release workflow to use GitHub mirror instead of Gitee ### Type of change - [x] Other (please describe): CI	2026-06-05 16:10:52 +08:00
tmimmanuel	f78ef328bb	Go: implement Bedrock embeddings (#15543 ) ### What problem does this PR solve? Fixes #15542. AWS Bedrock support for the Go model provider layer was added in #15166, but embedding support was intentionally left out of scope and `BedrockModel.Embed(...)` still returned the `no such method` sentinel. This PR implements Bedrock text embeddings under the umbrella provider tracker #14736. ### What this PR includes - `internal/entity/models/bedrock.go`: implement `BedrockModel.Embed(...)` through Bedrock Runtime `InvokeModel` with existing SigV4 auth, region resolution, and runtime URL helpers. - Titan embeddings: supports `amazon.titan-embed-text-v1` and `amazon.titan-embed-text-v2:0`; v2 forwards `EmbeddingConfig.Dimension` as `dimensions` when provided, while v1 keeps the payload minimal. - Cohere embeddings: supports `cohere.embed-english-v3`, `cohere.embed-multilingual-v3`, and `cohere.embed-v4:0`; batches input texts and maps returned vectors to RAGFlow `EmbeddingData` in input order. - `conf/models/bedrock.json`: adds the `embedding` URL suffix (`invoke`) and Bedrock embedding model entries. - `internal/entity/models/bedrock_test.go`: adds unit tests for Titan, Cohere, typed Cohere responses, validation, empty input, unsupported models, and HTTP error propagation. Reference docs: - Bedrock InvokeModel API: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html - Titan Text Embeddings: https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html - Cohere Embed models on Bedrock: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - [x] `jq empty conf/models/bedrock.json` - [x] `git diff --check` - [x] `go test ./internal/entity/models/... -run Bedrock -count=1` - [x] `go test ./internal/entity/models/... 
-run '^$' -count=1` - [x] `go test ./internal/entity/models/... -run Bedrock -race -count=1` Note: `go test ./internal/entity/models/... -count=1` currently fails in unrelated existing Astraflow coverage (`TestAstraflowEmbedReturnsNoSuchMethod` panics in `internal/entity/models/astraflow.go`). The Bedrock-specific tests and compile-only package check pass.	2026-06-05 13:26:32 +08:00
web-dev0521	b8db200757	feat(go-api): implement MCP server management endpoints (#15281 ) ## Summary Ports the MCP (Model Context Protocol) server management endpoints that power `web/src/pages/user-setting/mcp/` from Python (`api/apps/restful_apis/mcp_api.py`) to Go. There were no MCP routes in the Go server before this change. Closes #15275 (subtask of #15240). ## Endpoints implemented (base path `/api/v1`) \| Method \| Path \| Description \| \|--------\|------\|-------------\| \| GET \| `/mcp/servers` \| List tenant servers (keyword / order / pagination) \| \| POST \| `/mcp/servers` \| Create a server \| \| GET \| `/mcp/servers/{mcp_id}` \| Get one (`?mode=download` exports config) \| \| PUT \| `/mcp/servers/{mcp_id}` \| Update a server \| \| DELETE \| `/mcp/servers/{mcp_id}` \| Delete a server \| \| POST \| `/mcp/import` \| Bulk import from JSON config \| \| POST \| `/mcp/servers/{mcp_id}/test` \| Connect + list tools (see notes) \| ## Implementation Follows the existing `handler → service → dao` layering (per PR #14790): - entity (`internal/entity/mcp.go`): added `MCPServerType` constants and `IsValidMCPServerType` over the existing `MCPServer` model. - dao (`internal/dao/mcp.go`): new `MCPServerDAO` with tenant-scoped CRUD, a keyword filter, and a whitelisted order-column map (guards against SQL injection via the caller-supplied `orderby`). - service (`internal/service/mcp.go`): new `MCPService` — list/get/export/create/update/delete/import/test — mirroring `MCPServerService` and the `mcp_api` request validation, with sentinel errors for clean code mapping. - handler (`internal/handler/mcp.go`): new `MCPHandler` with the seven handlers and Python-compatible response codes. - router / server_main: registered the `/mcp` group and wired the handler. ## Deviations from Python (documented in code) 1. Bulk import is at `POST /mcp/import`, not `/mcp/servers/import`. 
gin (v1.9.1) cannot register a static segment and a path param at the same tree node, so `/mcp/servers/import` would collide with `/mcp/servers/:mcp_id` and panic at startup. The frontend should call `/mcp/import`. 2. No live tool discovery on create/update/import. The Python path runs `get_mcp_tools` over SSE / streamable-HTTP and stores `variables.tools`. The Go server has no MCP client yet, so these persist `variables`/`headers` but leave `variables.tools` unpopulated. 3. `/test` returns a data error (`ErrMCPTestUnsupported`) until a Go MCP client lands. Per the issue, the live-connection path is scoped as a follow-up; the handler still validates `url` + `server_type`. ## Testing - Added `internal/service/mcp_test.go` covering `IsValidMCPServerType` and the `TestServer` validation/short-circuit paths (no DB required). - No Go toolchain was available in the dev environment, so `go build ./...` / `go vet ./...` verification is left to CI. ## Follow-ups - Go MCP client (SSE / streamable-HTTP) to enable live tool discovery and the real `/test` behavior. - Reconcile the `/mcp/import` vs `/mcp/servers/import` path with the frontend. ---------	2026-06-05 13:25:09 +08:00
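The whitelisted order-column map mentioned for the DAO layer is a common guard against SQL injection through a caller-supplied sort key. The sketch below shows the pattern in Python; the column names, the default, and the choice to fall back to the default on unknown input are assumptions for this example and are not taken from `internal/dao/mcp.go`.

```python
# Hypothetical whitelist: request value -> trusted column name.
ORDER_COLUMNS = {
    "create_time": "create_time",
    "update_time": "update_time",
    "name": "name",
}
DEFAULT_ORDER = "create_time"


def order_clause(orderby: str, desc: bool = True) -> str:
    """Build an ORDER BY fragment from trusted column names only."""
    # Unknown or malicious values never reach the SQL text; they fall
    # back to the default column (assumption: the real code may reject).
    column = ORDER_COLUMNS.get(orderby, ORDER_COLUMNS[DEFAULT_ORDER])
    direction = "DESC" if desc else "ASC"
    return f"{column} {direction}"
```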
web-dev0521	1d7e45115b	feat(connectors): add Salesforce CRM data source connector (#15462 ) ### What problem does this PR solve? Closes #15461. RAGFlow had no way to ingest Salesforce CRM data, so support / sales teams couldn't ground responses on live Accounts, Contacts, Opportunities, Cases, or Knowledge articles. This adds a first-class Salesforce data source connector that authenticates against a Connected App via OAuth 2.0 client-credentials, queries selected SObjects via SOQL, and turns each record into an indexable document with incremental sync. Highlights - `common/data_source/salesforce_connector.py`: new `SalesforceConnector` (`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`). - OAuth 2.0 client-credentials flow; canonical `instance_url` from the token response so multi-pod orgs route correctly. - Per-object `SystemModstamp` cursor stored in `SalesforceCheckpoint.cursors` — a failure mid-object doesn't rewind sibling objects, and re-syncs only fetch changed rows. - Deterministic record-to-text formatter (sorted keys) so SOQL field reordering on the server doesn't mark every row "changed" on each poll. - `_get_json` raises on non-2xx so 429 / 5xx never silently advance the checkpoint past missing data. - `Knowledge__kav` is in the default object set but is skipped silently when the org doesn't have Salesforce Knowledge enabled (404 on describe). - Slim-doc IDs are scoped as `<Object>/<Id>` so prune deletes can't collide across object types. - `common/constants.py`, `common/data_source/config.py`, `common/data_source/__init__.py`: register `salesforce` in `FileSource` / `DocumentSource` and export `SalesforceConnector`. - `rag/svr/sync_data_source.py`: new `Salesforce(SyncBase)` class routed through `load_from_checkpoint` (poll_source would re-walk every object each run) and added to `func_factory`. 
- Frontend: - `web/src/pages/user-setting/data-source/constant/index.tsx`: new `DataSourceKey.SALESFORCE`, form fields (instance URL, client ID/secret, objects, api_version, batch size), `syncDeletedFiles` capability, default form values, and tile entry with the new icon. - `web/src/locales/{en,zh}.ts`: description + per-field tooltips. - `web/src/assets/svg/data-source/salesforce.svg`: 48x48 brand-style icon to match the other Microsoft / cloud tiles. Verification - `npm run build` (vite + esbuild) passes (1m 26s). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-05 13:24:36 +08:00
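The Salesforce commit above mentions a deterministic record-to-text formatter that sorts keys so that field reordering on the server does not make an unchanged row look modified. A minimal sketch of that idea follows; `format_record` and the `key: value` output layout are assumptions for illustration, not the connector's actual code.

```python
def format_record(record: dict) -> str:
    """Render a record as text with keys in sorted order (hypothetical helper)."""
    lines = []
    for key in sorted(record):
        value = record[key]
        if value is None:
            continue  # assumption: empty fields are left out of the text
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


# The same record with a different field order renders to identical text.
first = {"Name": "Acme", "Id": "001", "Industry": None}
second = {"Industry": None, "Id": "001", "Name": "Acme"}
same_text = format_record(first) == format_record(second)
```

Because the rendered text is independent of field order, a hash or direct comparison of it only changes when a field value changes.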
Jack	e629c0203b	feat: add KG entity/relation/community search functions (#15689 ) ## Summary Knowledge Graph search functions for entity, relation, community report, and type-samples retrieval. Uses DocEngine.SelectFields (PR #15684) for KG-specific fields. ### Functions \| Function \| Description \| \|----------\|-------------\| \| `SearchKGEntities` \| Hybrid search over KG entities (dense + text + fusion) \| \| `SearchKGEntitiesByTypes` \| Entity search filtered by `entity_type_kwd` \| \| `SearchKGRelations` \| Hybrid search over KG relations \| \| `SearchKGCommunityReports` \| Community report search by entity names \| \| `SearchKGTypeSamples` \| Type→entities mapping for query_rewrite \| ### Internal helpers \| Helper \| Description \| \|--------\|-------------\| \| `buildHybridExpr` \| Shared dense+text+fusion expression construction \| \| `buildKGDenseExpr` \| Wraps `Embed()` call for vector search \| \| `Parse*` \| Convert raw chunks to typed structs \| ### Testing 35 tests (pure function + mock integration) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 13:23:04 +08:00
Haruko386	4b2af1347c	feat[Go]: implement Agent/Workflow PUT /api/v1/agents/<canvas_id>/tags (#15641)	2026-06-05 13:22:23 +08:00
buua436	71649db3b0	fix: prevent duplicated post-think text (#15651 ) ### What problem does this PR solve? This fixes duplicated post-think text in streamed chat responses. When the model emits text immediately after `</think>`, the stream state now advances its cursor correctly so the same visible prefix is not emitted twice. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 13:21:26 +08:00
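The cursor fix described in the commit above can be illustrated with a small sketch. It assumes the stream delivers cumulative snapshots of the model output and that visible text is whatever follows `</think>`; the class and method names are hypothetical and the real stream state may be organised differently.

```python
class VisibleStream:
    """Emit only the not-yet-sent visible text after the closing think tag."""

    END_TAG = "</think>"

    def __init__(self):
        self.cursor = 0  # number of visible characters already emitted

    def feed(self, full_text: str) -> str:
        idx = full_text.find(self.END_TAG)
        if idx < 0:
            return ""  # still inside the think block, nothing visible yet
        visible = full_text[idx + len(self.END_TAG):]
        delta = visible[self.cursor:]
        # Advance the cursor even for text that arrives together with the tag,
        # otherwise the same prefix would be emitted again on the next call.
        self.cursor = len(visible)
        return delta
```

Feeding `"<think>a</think>Hel"` and then `"<think>a</think>Hello"` yields `"Hel"` followed by `"lo"`, with no repeated prefix.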
Jack	f6ff862a24	fix: restore case-insensitive contains/not contains/not in and consolidate metadata filter pipeline (#15686 ) ## Summary This PR fixes case-sensitivity regressions introduced in #15656 and consolidates the metadata filtering pipeline by removing the duplicate `applySingleCondition` adapter layer. ### Bug fixes 1. contains / not contains: restored case-insensitive matching (was lost when `applySingleCondition` was replaced by `common.MetaFilter.matchValue` which lacked `strings.ToLower`) 2. not in: restored case-insensitive matching (was lost for same reason; uses `strings.EqualFold`) 3. != with date filter values: non-date metadata values now correctly match the `≠` operator (a non-date value IS not equal to any date, but was returning false) ### Architecture 4. Removed `applySingleCondition` (65 lines) — the inline switch was a duplicate of `common.MetaFilter` logic. `ApplyMetaFilter` now converts conditions and delegates to `common.MetaFilter` once per filter set, eliminating ~25 lines of duplicate AND/OR merge logic. 5. Added `filterSet` — O(n+m) hash-map fast path for `in`/`not in` operators, replacing the O(nm) linear scan in `matchValue`. 6. Exported `NormalizeOperator`* from `common` for consistent operator alias handling. ### Cleanup 7. Removed 18 lines of dead code (`matchValue`'s `in`/`not in` branches already bypassed by `filterOut` delegation) 8. Fixed orphaned godoc comment for `convertOperator` 9. Fixed incorrect `filterSet` doc comment (claimed "matching EqualFold" but used `strings.ToLower`) 10. 
Completed `convertToMetaCondition` operator normalization documentation ### Testing - 60 tests (24 service + 36 common), all passing - New tests: `==`, `≠`, `>`, `<`, `≥`, `≤`, `empty`, `not empty` through `ApplyMetaFilter` - New tests: `<`, `≤`, `≠` through `MetaFilter`; `not-in-empty-list` through `filterSet` - All 18 `MetaFilter` tests pass; all 10 `filterSet` unit tests pass --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 12:47:55 +08:00
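The case-insensitive `contains` / `not in` behaviour and the set-based fast path from the commit above can be sketched in a few lines. This is an illustration in Python rather than the Go code from the PR; the function names are hypothetical, and lower-casing is used because the commit message mentions `strings.ToLower`.

```python
def contains_ci(value: str, needle: str) -> bool:
    """Case-insensitive substring test."""
    return needle.lower() in value.lower()


def make_filter_set(values: list) -> frozenset:
    """Lower-case the filter values once so each lookup is O(1)."""
    return frozenset(v.lower() for v in values)


def docs_not_in(doc_values: dict, excluded: list) -> list:
    """Return IDs of documents whose value is not in `excluded`, ignoring case."""
    lowered = make_filter_set(excluded)  # built once: O(m)
    return [doc_id for doc_id, value in doc_values.items()
            if value.lower() not in lowered]  # one lookup per document: O(n)
```

Building the set once and probing it per document is what turns the O(nm) scan into O(n+m).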
Jack	ee32d91aab	feat: add EnrichChunksWithDocMetadata function to attach document metadata to chunks (#15659 ) ## Summary Add `EnrichChunksWithDocMetadata` as a method on `MetadataService` that attaches document metadata to retrieval chunks in-place. Equivalent to Python's `enrich_chunks_with_document_metadata()` from `api/utils/reference_metadata_utils.py`. ### Usage ```go metadataSvc.EnrichChunksWithDocMetadata(chunks, tenantID, metadataFields) ``` ### Changes - `service/metadata.go`: Added `EnrichChunksWithDocMetadata` method - `service/enrich_metadata_test.go` (new): 7 test cases ### Algorithm 1. Collect unique `(kb_id, doc_id)` pairs from chunks 2. Fetch metadata from ES via `SearchMetadata(kbID, tenantID, docIDs)` 3. Attach `document_metadata` field to each matching chunk 4. Optionally filter to specified `metadataFields` ### Testing All 7 tests pass: ``` === RUN TestEnrichChunksWithDocMetadata_NoChunks --- PASS === RUN TestEnrichChunksWithDocMetadata_EmptyChunks --- PASS === RUN TestEnrichChunksWithDocMetadata_EmptyDocID --- PASS === RUN TestEnrichChunksWithDocMetadata_DuplicateDocIDs --- PASS === RUN TestEnrichChunksWithDocMetadata_MultipleKBs --- PASS === RUN TestEnrichChunksWithDocMetadata_WithMetadataFields --- PASS === RUN TestEnrichChunksWithDocMetadata_MixedFields --- PASS ``` Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 11:42:23 +08:00
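The four-step algorithm listed in the commit above can be sketched as follows. This is a Python illustration under stated assumptions: chunks are plain dicts, and `fetch_metadata(kb_id, doc_ids)` is a hypothetical stand-in for the `SearchMetadata` call that returns a `doc_id -> metadata` mapping.

```python
from collections import defaultdict


def enrich_chunks(chunks, fetch_metadata, metadata_fields=None):
    """Attach a `document_metadata` dict to each chunk in place."""
    # 1. Collect unique (kb_id, doc_id) pairs, grouped by knowledge base.
    doc_ids_by_kb = defaultdict(set)
    for chunk in chunks:
        kb_id, doc_id = chunk.get("kb_id"), chunk.get("doc_id")
        if kb_id and doc_id:
            doc_ids_by_kb[kb_id].add(doc_id)

    # 2. Fetch metadata once per knowledge base.
    meta_by_key = {}
    for kb_id, doc_ids in doc_ids_by_kb.items():
        for doc_id, meta in fetch_metadata(kb_id, sorted(doc_ids)).items():
            meta_by_key[(kb_id, doc_id)] = meta

    # 3 and 4. Attach the metadata, optionally restricted to selected fields.
    for chunk in chunks:
        meta = meta_by_key.get((chunk.get("kb_id"), chunk.get("doc_id")))
        if meta is None:
            continue
        if metadata_fields is not None:
            meta = {k: v for k, v in meta.items() if k in metadata_fields}
        chunk["document_metadata"] = meta
    return chunks
```

Deduplicating the document IDs first means a document that contributes many chunks is looked up only once.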
Jack	3b1ae3f829	feat: support SelectFields override in DocEngine for KG-specific queries (#15684) ## Summary Both ES and Infinity engines now respect `SearchRequest.SelectFields`, allowing callers to specify output columns for KG entity/relation/community queries instead of the default chunk columns. ### Changes - `internal/engine/elasticsearch/chunk.go`: Added `SelectFields` override after default `outputColumns` - `internal/engine/infinity/chunk.go`: Added `SelectFields` override after default `outputColumns` - `internal/engine/elasticsearch/kg_test.go` (new): Integration test (skipped unless `ES_TEST=1`) ### Usage ```go result, err := docEngine.Search(ctx, &types.SearchRequest{ KbIDs: kbIDs, SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "n_hop_with_weight"}, Filter: map[string]interface{}{"knowledge_graph_kwd": "entity"}, }) ``` Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 11:41:39 +08:00
Wang Qi	4cbe597d7e	Refactor: consolidate to use @login_required (#15652)	2026-06-05 11:35:00 +08:00
bitloi	9f3e289b78	Fix: preserve markdown tables during delimiter extraction (#15632 ) ### What problem does this PR solve? Markdown extraction can split tables row by row when delimiter-based extraction uses a newline delimiter. That loses table structure during chunking even though delimiters should still split normally outside tables. This PR keeps the follow-up to #15482 intentionally narrow: - preserve Markdown pipe tables during delimiter-based extraction - preserve borderless pipe tables during delimiter-based extraction - preserve multiline HTML tables during delimiter-based extraction - keep delimiter splitting unchanged outside protected table ranges Refs #15482 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Testing - `ruff check deepdoc/parser/markdown_parser.py test/unit_test/deepdoc/parser/test_markdown_parser.py` - `python3 run_tests.py -t test/unit_test/deepdoc/parser/test_markdown_parser.py` - `git diff --check`	2026-06-05 10:35:33 +08:00
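The table-preservation idea from the commit above can be sketched for the simplest case. The sketch below only handles bordered pipe tables (rows that start with `|`) and a newline delimiter; borderless and HTML tables, which the PR also covers, are left out, and the function name is hypothetical.

```python
def split_preserving_tables(text: str) -> list:
    """Split on newlines, but keep consecutive pipe-table rows in one piece."""
    pieces = []
    table_rows = []
    for line in text.split("\n"):
        if line.lstrip().startswith("|"):
            table_rows.append(line)  # still inside a protected table range
            continue
        if table_rows:
            pieces.append("\n".join(table_rows))  # close the table as one piece
            table_rows = []
        if line.strip():
            pieces.append(line)
    if table_rows:
        pieces.append("\n".join(table_rows))
    return pieces
```

Text outside a table is still split line by line, so delimiter behaviour is unchanged outside the protected ranges.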
dripsmvcp	431f52a5d4	feat[Go]: implement GET /agents/templates (issue #15240 ) (#15573 ) ## Summary Port the canvas-template catalogue endpoint to the Go API server. Listed in the Go-API port checklist of #15240. Mirrors `list_agent_template` in `api/apps/restful_apis/agent_api.py`: returns every row from the `canvas_template` table so that the UI can render the template gallery on the New-Agent screen. ## What - `internal/dao/canvas_template.go` — new `CanvasTemplateDAO.GetAll()` ordered by `create_time desc` (newest templates first). - `internal/service/agent.go` — wire the new DAO into `AgentService` and expose `ListTemplates() ([]entity.CanvasTemplate, error)`. - `internal/handler/agent.go` — new `AgentHandler.ListTemplates` HTTP handler (auth-gated, mirrors Python `@login_required`). - `internal/router/router.go` — `agents.GET("/templates", r.agentHandler.ListTemplates)` registered alongside the existing `GET /agents`. - `internal/handler/agent_test.go` — three new tests covering: success path, empty-list → JSON array (not `null`), and the auth gate. ## Notes - `CanvasTemplate` entity, GORM tags, and DB migration already exist in `internal/entity/canvas.go` and `internal/dao/database.go` — no schema change required. - The handler coerces a `nil` slice to `[]entity.CanvasTemplate{}` so the JSON payload is always an array (the frontend does `data.map(...)` on it). ## Test plan - [x] `go vet ./internal/handler ./internal/service ./internal/dao ./internal/router` clean - [x] Three unit tests added; existing `TestListAgents_Success` untouched - [ ] CI runs `go test ./internal/handler` with cgo binding linked ## Related - Tracker: #15240	2026-06-05 10:13:30 +08:00
Jack	a237a89b90	feat: add QueryRewrite prompt builder and response parser (#15669 ) QueryRewrite prompt builder and response parser. Zero external dependencies. ### Functions - `BuildQueryRewritePrompt`: Renders `minirag_query2kwd` prompt with query and type pool - `ParseQueryRewriteResponse`: Parses LLM JSON response with fallback for markdown and extra text ### Testing ``` === RUN TestBuildQueryRewritePrompt --- PASS === RUN TestParseQueryRewriteResponse_ValidJSON --- PASS === RUN TestParseQueryRewriteResponse_MarkdownBlock --- PASS === RUN TestParseQueryRewriteResponse_ExtraText --- PASS === RUN TestParseQueryRewriteResponse_Invalid --- PASS === RUN TestParseQueryRewriteResponse_EmptyEntities --- PASS ``` Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 10:11:14 +08:00
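The fallback parsing described in the commit above (plain JSON, JSON inside a Markdown fence, JSON surrounded by extra text) can be sketched as below. This is a Python illustration, not the Go `ParseQueryRewriteResponse` itself; the function name and the choice to return `None` on failure are assumptions.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example fence-safe
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_response(raw: str):
    """Parse an LLM reply as JSON, tolerating a Markdown fence or extra text."""
    text = raw.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: take the outermost {...} span and try again.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None
```

Trying the strict parse first keeps well-formed replies on the fast path; the looser brace-span search only runs when that fails.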
Jack	bf6c091c9f	feat: add KG scoring utilities (#15666 ) KG scoring utilities as pure functions. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 10:10:59 +08:00
kpdev	bd49fd70aa	fix(api): set SDK document download Content-Type from filename (#15112 ) (#15113 ) ## Summary - Infer `Content-Type` from the stored document filename on SDK download routes. - Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and `GET /api/v1/documents/<document_id>`. - Aligns with REST preview/download via `CONTENT_TYPE_MAP`. ## Test plan - [x] `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename` - [x] Manual: `curl -sSI` on SDK dataset document download for a PDF; expect `Content-Type: application/pdf` Fixes #15112.	2026-06-05 10:08:53 +08:00
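Inferring a `Content-Type` from a filename, as the commit above does, can be illustrated with Python's standard `mimetypes` module. Note the swap: the PR uses RAGFlow's own `CONTENT_TYPE_MAP`, whereas this sketch uses the standard library, and the helper name and fallback value are assumptions.

```python
import mimetypes


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a Content-Type from the file extension, falling back to a default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

For a file named `report.pdf` this returns `application/pdf`, which matches the manual `curl -sSI` check listed in the test plan.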
Lynn	794c1f4b25	Fix: volc engine and other json key factories (#15653) ### What problem does this PR solve? Fix: - Adapt VolcEngine to the new api_key format - Save a dict api_key as JSON ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 09:45:44 +08:00
He Wang	7789862cc5	fix(docker): mount tmpfs on es01 /tmp for entrypoint permissions (#15655 ) ### What problem does this PR solve? On some Linux hosts (e.g. x86_64 with enforced POSIX ACL on overlay storage), the official `elasticsearch` Docker image cannot start because `docker-entrypoint.sh` needs to create temporary files under `/tmp` for bash here-documents, while the image ACL grants `user:elasticsearch` only `r-x` on `/tmp`: ``` /usr/local/bin/docker-entrypoint.sh: line 73/84: cannot create temp file for here-document: Permission denied ``` RAGFlow users hit this when running `docker compose` with the default `es01` service. See also Refs #284. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ## Summary Mount a writable `tmpfs` at `/tmp` for the `es01` service so Elasticsearch entrypoint scripts can run on ACL-enforced environments. Closes the startup failure described in #284 for non-ARM deployments. ## Changes - Add `tmpfs: /tmp:mode=1777,size=512m` to `es01` in `docker/docker-compose-base.yml` - Document why the mount is required (ES image `/tmp` ACL vs entrypoint here-documents) ## Test plan - [x] Verified on Linux (x86_64): `docker run --rm elasticsearch:8.11.3 bash -c 'mktemp'` fails without tmpfs and succeeds with `--tmpfs /tmp:mode=1777,size=512m` - [x] Verified `es01` becomes healthy after `docker compose up -d es01` with this change - [ ] Upstream maintainers: `docker compose -f docker/docker-compose-base.yml --profile elasticsearch up -d es01` on a host where ACL is enforced Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 23:19:31 +08:00
Jack	eee6ad546f	feat: add ResolveReferenceMetadata utility function (#15663 ) Add `ResolveReferenceMetadata` to parse `include_metadata` / `metadata_fields` from request and config payloads. ### Changes - New: `internal/common/reference_metadata.go` — pure function, zero dependencies - New: `internal/common/reference_metadata_test.go` — 8 test cases Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 22:34:18 +08:00
Jack	96a416629d	refactor: change GetFlattedMetaByKBs return type to common.MetaData (#15656 ) ## Summary Change `GetFlattedMetaByKBs` return type from `map[string]interface{}` to strongly-typed `common.MetaData`. Depends on: #15648 (provides `MetaData`, `MetaValueDocs` types) ### Changes - `service/metadata.go`: Changed return type, removed type assertions - `service/metadata_filter.go`: Updated all metadata function signatures - `service/metadata_filter_test.go` (new): 12 test cases ### Bug fix `applySingleCondition` used `.([]interface{})` assertions on `[]string` data, silently breaking operators like `!=`, `contains`, `start with`, etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 22:16:04 +08:00
web-dev0521	98f2a2e60b	feat(connectors): add Azure Blob Storage data source connector (#15466 ) ### What problem does this PR solve? Closes #15465. RAGFlow supports S3, Google Cloud Storage, R2, and OCI as data sources but not Azure Blob Storage, leaving Azure users without a way to index container objects into a knowledge base. This adds a first-class Azure Blob Storage data-source connector — distinct from RAGFlow's existing Azure storage backends (`rag/utils/azure_sas_conn.py`, `rag/utils/azure_spn_conn.py`) which store RAGFlow's own files. Highlights - `common/data_source/azure_blob_connector.py`: new `AzureBlobConnector` (`CheckpointedConnectorWithPermSync` + `SlimConnectorWithPermSync`). - Uses the existing `azure-storage-blob` dependency (already in `pyproject.toml`). - Three auth modes, tried in order of precedence: 1. Account key — `account_name` + `account_key` + `container_name`. 2. Connection string — `connection_string` + `container_name`. 3. SAS token — `container_url` + `sas_token` (same shape as `RAGFlowAzureSasBlob`). - ETag fingerprint stored per blob in `AzureBlobCheckpoint.etags` — unchanged blobs (same ETag as last run) are skipped without a download. Only new/modified blobs are fetched. - Optional `prefix` scopes indexing to a virtual folder. - `validate_connector_settings()` probes `get_container_properties()` and maps `AuthenticationFailed / 403 / ContainerNotFound` to typed connector exceptions. - Slim-doc IDs are blob names so prune reconciles correctly. - `common/constants.py`, `common/data_source/config.py`, `common/data_source/__init__.py`: register `azure_blob` in `FileSource` / `DocumentSource` and export `AzureBlobConnector`. - `rag/svr/sync_data_source.py`: new `AzureBlob(SyncBase)` class routed through `load_from_checkpoint` (ETag fingerprint owns change-detection) and added to `func_factory`. 
- Frontend: - `web/src/pages/user-setting/data-source/constant/index.tsx`: new `DataSourceKey.AZURE_BLOB`, auth-mode selector (account key / connection string / SAS token), all credential fields, prefix + batch-size, `syncDeletedFiles` capability, default form values, tile entry with icon. - `web/src/locales/{en,zh}.ts`: description + per-field tooltips for all 9 new keys. - `web/src/assets/svg/data-source/azure-blob.svg`: Azure-branded stacked-cylinders icon. Verification - `npm run build` (vite + esbuild) passes (37 s). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-04 21:06:01 +08:00
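The ETag-based change detection in the Azure Blob commit above boils down to comparing the current listing against the fingerprints stored in the checkpoint. A minimal sketch, assuming both are plain `blob name -> ETag` mappings (the function name is hypothetical and no Azure SDK call is involved):

```python
def select_changed(listing: dict, previous_etags: dict):
    """Return (names to download, new checkpoint) from a container listing."""
    changed = [name for name, etag in listing.items()
               if previous_etags.get(name) != etag]  # new or modified blobs only
    return changed, dict(listing)
```

Blobs whose ETag matches the stored value never appear in `changed`, so they are skipped without a download.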
Jack	a78a3fdd47	fix: add nil guard to DocumentDAO.GetByIDs and add tests (#15649 ) ## Summary `DocumentDAO.GetByIDs()` generated `WHERE id IN ()` for empty/nil ID slices, which is invalid SQL and would fail on most databases. This PR adds a nil guard and comprehensive tests. ### Changes - Modified: `internal/dao/document.go` — Added `len(ids) == 0` guard to `GetByIDs` - New: `internal/dao/document_test.go` — 4 test cases covering success, empty IDs, nil IDs, and no-match ### Testing ``` === RUN TestDocumentGetByIDs_Success --- PASS === RUN TestDocumentGetByIDs_EmptyIDs --- PASS === RUN TestDocumentGetByIDs_NilIDs --- PASS === RUN TestDocumentGetByIDs_NoMatch --- PASS ``` Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 21:00:02 +08:00
Jack	461c190c49	feat: migrate meta_filter and convert_conditions to Go (#15648 ) ## Summary Migrate the metadata filtering utilities `meta_filter` and `convert_conditions` from `common/metadata_utils.py` to Go as pure functions with zero external dependencies. These functions are used by `dify/retrieval`, `openai/chat/completions`, `document_api`, and `chunk_api` for filtering documents by metadata conditions. ### Changes - New: `internal/common/metadata_utils.go` — `ConvertConditions()` and `MetaFilter()` with full operator support - New: `internal/common/metadata_utils_test.go` — 18 test cases covering all operators and edge cases ### Supported Operators `=`, `≠`, `>`, `<`, `≥`, `≤`, `contains`, `not contains`, `in`, `not in`, `start with`, `end with`, `empty`, `not empty` ### Design - Numeric comparison via `strconv.ParseFloat` - Date comparison via YYYY-MM-DD format detection - Case-insensitive string comparison fallback - `and` / `or` logic support for multiple conditions - Zero external dependencies — pure functions only	2026-06-04 20:14:27 +08:00
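The comparison order listed under Design in the commit above (numeric first, then `YYYY-MM-DD` dates, then case-insensitive strings) can be sketched as follows. This is a Python illustration with hypothetical helper names, not the Go `MetaFilter` code.

```python
import re
from datetime import date

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def comparable_pair(value: str, target: str):
    """Convert both sides to the most specific type they share."""
    try:
        return float(value), float(target)  # numeric comparison
    except ValueError:
        pass
    if DATE_RE.match(value) and DATE_RE.match(target):
        try:
            return date.fromisoformat(value), date.fromisoformat(target)
        except ValueError:
            pass  # looks like a date but is not a valid one
    return value.lower(), target.lower()  # case-insensitive string fallback


def greater_than(value: str, target: str) -> bool:
    left, right = comparable_pair(value, target)
    return left > right


def equals(value: str, target: str) -> bool:
    left, right = comparable_pair(value, target)
    return left == right
```

Trying the numeric parse first matters: as strings, `"10"` sorts before `"9"`, while as numbers it is greater.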
Jack	e627f5d8c5	feat: implement POST /api/v1/searchbots/related_questions API (#15639 ) ## Summary Implement the `POST /api/v1/searchbots/related_questions` endpoint in Go, generating related search questions via LLM. ### Changes - New: `internal/handler/related_questions.go` — Handler with injectable LLM interface, prompt constant, and response parsing - New: `internal/handler/related_questions_test.go` — 9 tests (4 handler + 5 parse) - Modified: `internal/router/router.go` — Added route + `RelatedQuestionsHandler` to struct - Modified: `cmd/server_main.go` — Wired handler with `SearchService` and `ModelProviderService` ### Testing All 9 tests pass: ``` === RUN TestRelatedQuestionsHandler_Success --- PASS === RUN TestRelatedQuestionsHandler_EmptyResponse --- PASS === RUN TestRelatedQuestionsHandler_LLMFailure --- PASS === RUN TestRelatedQuestionsHandler_MissingQuestion --- PASS === RUN TestParseRelatedQuestions_Standard --- PASS === RUN TestParseRelatedQuestions_Empty --- PASS === RUN TestParseRelatedQuestions_NoNumberedLines --- PASS === RUN TestParseRelatedQuestions_MixedContent --- PASS === RUN TestParseRelatedQuestions_MultiDigit --- PASS ``` Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 19:13:58 +08:00
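The parse tests named in the commit above (`Standard`, `NoNumberedLines`, `MixedContent`, `MultiDigit`) suggest the LLM reply is read as a numbered list. A minimal sketch under that assumption, in Python rather than Go and with a hypothetical function name:

```python
import re

NUMBERED_LINE = re.compile(r"^\s*\d+\.\s*(.+?)\s*$")


def parse_related_questions(text: str) -> list:
    """Keep only lines of the form '<number>. <question>'."""
    questions = []
    for line in text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            questions.append(match.group(1))
    return questions
```

Lines that are not numbered, such as an introductory sentence from the model, are ignored.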
Jack	6143205b37	feat: implement GET /api/v1/agents/<agent_id>/versions/<version_id> API (#15640 ) ## Summary Implement the `GET /api/v1/agents/<agent_id>/versions/<version_id>` endpoint in Go, returning full version details including DSL. Depends on #15629 which introduced the version list endpoint and `UserCanvasVersionDAO` infrastructure. ### Changes - Modified: `internal/handler/agent.go` — Added `GetAgentVersion` handler with auth check and ownership verification - Modified: `internal/router/router.go` — Registered `GET /:agent_id/versions/:version_id` route - New/Modified tests: Service and handler tests for the version detail endpoint ### Testing ``` === RUN TestGetVersion_Success --- PASS === RUN TestGetVersion_WrongCanvas --- PASS === RUN TestGetVersion_NotFound --- PASS === RUN TestGetAgentVersionHandler_Success --- PASS === RUN TestGetAgentVersionHandler_VersionNotFound --- PASS ``` Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 19:13:26 +08:00
buua436	423fb6faae	fix: duplicate document ingest guard (#15638 ) ### What problem does this PR solve? When a document is rerun or updated concurrently, the previous unconditional update could overwrite a newer task state. This change adds an `update_time`-based optimistic lock so the update only succeeds if the record has not been modified by another flow in the meantime. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-04 17:57:51 +08:00
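The `update_time`-based optimistic lock from the commit above can be illustrated with an in-memory SQLite table. The table layout, column names, and helper name are assumptions made for this sketch; RAGFlow's actual model and ORM calls are not shown in the commit message.

```python
import sqlite3


def update_if_unchanged(conn, doc_id, expected_update_time, progress, new_update_time):
    """Apply the update only if nobody has modified the row since it was read."""
    cursor = conn.execute(
        "UPDATE document SET progress = ?, update_time = ? "
        "WHERE id = ? AND update_time = ?",
        (progress, new_update_time, doc_id, expected_update_time),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows means another flow won the race


# Hypothetical schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, progress REAL, update_time INTEGER)")
conn.execute("INSERT INTO document VALUES ('d1', 0.0, 100)")
```

A caller that read `update_time = 100` succeeds once; a second caller still holding the stale value `100` gets `False` and leaves the newer state untouched.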


6595 Commits