ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-04 09:39:32 +08:00

Author	SHA1	Message	Date
balibabu	c849c76f8a	Feat: Add a prefix to the `name` of the `FormField` associated with the chat. (#16178 ) Fix: Add a prefix to the `name` of the `FormField` associated with the chat.	2026-06-22 19:18:11 +08:00
Zhichang Yu	3f805a64f1	feat(agent): align Go agent behavior with Python (except retrieval component) (#16225 ) ## Summary Aligns the Go agent runtime/canvas/components/tools behavior with the Python `agent/` implementation so the same stored canvas DSL produces the same execution result on either side. Every component, tool, and runtime primitive in `internal/agent/` is now driven by the same semantics as its Python counterpart — variable resolution, template substitution, control flow, error reporting, retry/cancel, and stream event shapes. The retrieval component is the one explicit exception in this PR. It is being reworked in a separate change and is excluded from this alignment pass; the wrapper slot (`universe_a_wrappers.go → newRetrievalComponent`) is preserved. ## Scope of alignment ### Components (all aligned with `agent/component/`) `Begin` · `Message` · `LLM` (incl. ChatTemplateKwargs, MessageHistoryWindowSize, VisualFiles, Cite, OutputStructure, JSONOutput, TopP, MaxRetries, DelayAfterError, credentials) · `Agent` (react + tool artifact capture + `Reset()` interface-assert) · `Switch` (12/12 operators, Python-equivalent semantics) · `Categorize` · `Invoke` · `Iteration` · `Loop` (macro-expansion through `workflowx.AddLoopNode`) · `UserFillUp` (Python-equivalent interrupt/resume via eino `compose.Interrupt`/`ResumeWithData`) · `FillUp` · `DataOperations` · `ListOperations` · `StringTransform` · `VariableAggregator` · `VariableAssigner` · `Browser` (full stagehand runtime parity) · `DocsGenerator` · `ExcelProcessor`. 
### Tools (all aligned with `agent/tools/`) `Retrieval` (wrapper slot only — logic out of scope) · `MCPToolAdapter` (streamable-HTTP) · `CodeExec` (sandbox bridge with `code_exec_contract.go` matching Python contract) · `AkShare` · `ArXiv` · `Crawler` · `DeepL` · `DuckDuckGo` · `Email` · `ExeSQL` · `GitHub` · `Google` · `GoogleScholar` · `Jin10` · `PubMed` · `QWeather` · `SearXNG` · `Tavily` · `Tushare` · `Wencai` · `Wikipedia` · `YahooFinance` — uniform `eino tool.InvokableTool` interface, SSRF protection, shared HTTP client. ### Canvas execution engine (`internal/agent/canvas/`) Aligned with Python's `agent/canvas.py`: - Scheduler (`scheduler.go`): state pre/post handlers, node lambdas, per-component timeout resolver (4-level: per-class env → per-class table → uniform env → 600s fallback), `legacyNoOpNames`. - Loop subgraph (`loop_subgraph.go`): Python-equivalent `AddLoopNode` macro expansion + condition translation. - Multibranch (`multibranch.go`): `Switch` / `Categorize` routing via `compose.NewGraphMultiBranch` — same branch selection semantics as Python. - Parallel subgraph (`parallel_subgraph.go`): matches Python's parallel fan-out contract. - Interrupt/Resume (`interrupt_resume.go`): `UserFillUpNodeBody` / `IsInterruptError` / `ExtractInterruptContexts` — replaces the deprecated Python sentinel chain with eino's native interrupt API, preserving the same external behavior. - Checkpoint (`checkpoint_store.go`): `RedisCheckPointStore` Get/Set/Delete, with business metadata (status / canvas_id / parent_run_id) on a parallel Redis Hash. - RunTracker (`run_tracker.go`): Start / MarkSucceeded / MarkFailed / MarkCancelled / AttachCheckpoint — same lifecycle as the Python run record. - Cancel (`cancel.go`): Redis pub/sub watch. - Stream (`stream.go`): SSE channel with `messages` / `waiting` / `errors` / `done` events, same shape as Python's `agent.canvas.RunEvent` payload. 
### DSL bridge (`internal/agent/dsl/`) - `normalize.go`: v1↔v2 collapsed into a single wire format — Python and Go consume the same stored JSON. - `reset.go`: per-run state reset matches Python's `Canvas.reset()` semantics. - Testdata mirrors Python's `agent_msg.json` / `all.json` / etc. ### Runtime (`internal/agent/runtime/`) - `CanvasState` / `NewCanvasState` / `GetVar` / `SetVar` / `ReadVars`: same `{{cpn_id@param}}` resolution model. - `ResolveTemplate` (regex fast path + gonja fallback) — Python Jinja-style semantics. - `selector.go`, `metrics.go`, `component.go`: shared runtime contracts. ## Out of scope (intentionally) - `Retrieval` component logic — wrapped only; full parity lands in a follow-up PR. - Frontend — only minor dsl-bridge / canvas UX fixes ride along. - CLI / admin / model registry — orthogonal to agent behavior. ## How alignment is verified `internal/service/agent_run_e2e_test.go` exercises the full production chain against real Python-shaped DSL fixtures: ``` loadCanvasForUser → versionDAO.GetLatest → decodeCanvasFromDSL → canvas.Compile → cc.Workflow.Invoke → answer extraction ``` using in-memory SQLite + miniredis (no Docker). Covers: - `TestRunAgent_RealCanvas_BeginMessage` — happy path, `{{sys.query}}` resolution - `TestRunAgent_RealCanvas_WaitForUserResume` — two-run resume cycle (Python-equivalent) - `TestRunAgent_RealCanvas_CompileFails` — unknown component name → sanitized error (Python-equivalent) - `TestRunAgent_RealCanvas_InvokeFails` — unresolvable template ref (Python-equivalent) - `TestRunAgent_RunTracker_AttachCheckpoint_CallSequence` — Start→AttachCheckpoint→MarkSucceeded lifecycle `internal/handler/agent_test.go` — SSE streaming parity (`Content-Type: text/event-stream`, `data: {…}\n\n`, trailing `data: [DONE]\n\n`, OpenAI-compatible non-stream `choices`). `internal/agent/canvas/fixture_compile_test.go` + per-component tests pin the Python-equivalent outputs. 
``` go test -count=1 -v -run 'TestRunAgent_RealCanvas\|TestRunAgent_RunTracker' ./internal/service/ ``` ## Design reference `docs/develop/agent-go-port-design.md` (1329 lines, last cross-checked 2026-06-17) — module layout, per-component / per-tool inventory, corner-case catalogue, and the actionable backlog (Section 14, including the retrieval alignment follow-up). --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-06-22 11:58:29 +08:00
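The four-level per-component timeout resolution described in the scheduler notes above (per-class env, then per-class table, then uniform env, then the 600s fallback) can be sketched as follows. This is an illustrative Python sketch, not the Go code from `scheduler.go`: the environment variable names `AGENT_TIMEOUT_<CLASS>` and `AGENT_TIMEOUT`, the table values, and the rule that invalid or non-positive values are skipped are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional

FALLBACK_SECONDS = 600.0  # the only value taken from the commit message

# Hypothetical per-class table; the real table contents are not given in the source.
PER_CLASS_TABLE = {"LLM": 300.0, "Browser": 900.0}


def _parse_seconds(value: Optional[str]) -> Optional[float]:
    """Return a positive float, or None when the value is missing or invalid."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return None
    return seconds if seconds > 0 else None


def resolve_timeout(
    component_class: str,
    env: Mapping[str, str] = os.environ,
    table: Mapping[str, float] = PER_CLASS_TABLE,
) -> float:
    """Resolve a timeout by checking the four levels in priority order."""
    # Level 1: per-class environment override (hypothetical variable name).
    per_class = _parse_seconds(env.get(f"AGENT_TIMEOUT_{component_class.upper()}"))
    if per_class is not None:
        return per_class
    # Level 2: per-class default table.
    if component_class in table:
        return table[component_class]
    # Level 3: uniform environment override (hypothetical variable name).
    uniform = _parse_seconds(env.get("AGENT_TIMEOUT"))
    if uniform is not None:
        return uniform
    # Level 4: fixed fallback.
    return FALLBACK_SECONDS
```

Passing `env` and `table` as parameters keeps the resolver easy to test without touching the process environment.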
balibabu	a9021528c3	Fix: Lint error. (#16172 ) ### What problem does this PR solve? Fix: Lint error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-18 13:14:18 +08:00
Wang Qi	99a25dca34	Fix Chat/Search/Agent bot show image (#16152 ) Fix Chat/Search/Agent bot show image	2026-06-18 09:38:31 +08:00
balibabu	3247e353c7	Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. (#16134 ) ### What problem does this PR solve? Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 16:18:47 +08:00
Rander	1235da7093	refactor(paddleocr): migrate from sync API to async Job API (#15967 ) ## Summary Migrate PaddleOCR integration from the deprecated synchronous HTTP API to the new asynchronous Job API (`submit → poll → fetch`), aligning with PaddleOCR 3.6.0+ architecture. ## Changes ### Python (`deepdoc/parser/paddleocr_parser.py`) - Replace synchronous `requests.post()` with async Job API flow (submit → poll → fetch) - Authentication: `token {token}` → `Bearer {token}` - File transfer: base64 JSON body → multipart file upload - Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout controlled by `request_timeout`) - Result: fetch full JSONL from result URL, preserving `prunedResult` with bbox info for crop functionality - Rename `api_url` → `base_url` (backward compatible: `api_url` still accepted as fallback) ### Python (`rag/llm/ocr_model.py`) - Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to `paddleocr_api_url` / `PADDLEOCR_API_URL` ### Go (`internal/entity/models/paddleocr.go`) - Add `Client-Platform: ragflow` header to submit and poll requests - Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5, max 15s) ### Python (`common/constants.py`) - Add `PADDLEOCR_BASE_URL` to env keys and default config ## Backward Compatibility - Old env var `PADDLEOCR_API_URL` still works (used as fallback) - Frontend field `paddleocr_api_url` still works (backend reads it as fallback) - No user-facing configuration changes required for existing setups ## Why not use the `paddleocr` SDK package directly? RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing `block_bbox`, `block_label`, `parsing_res_list`) from the raw API response for PDF crop functionality. The SDK's public `parse_document()` API only returns `DocParsingResult` with `markdown_text`, discarding the bbox data. Therefore we implement the async Job API flow directly via HTTP, following the same logic as the SDK internally.	2026-06-16 19:34:21 +08:00
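The polling schedule named in the PaddleOCR commit above (initial 3s, ×1.5, max 15s, bounded by `request_timeout`) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in `paddleocr_parser.py`: the function names `backoff_delays` and `poll_until_done`, the `check` callback, and the injectable `sleep` and `clock` parameters are hypothetical and exist only to make the example self-contained and testable.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(
    initial: float = 3.0, factor: float = 1.5, cap: float = 15.0
) -> Iterator[float]:
    """Yield an unbounded series of delays that grows by `factor` up to `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_done(
    check: Callable[[], Optional[dict]],
    request_timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call `check` until it returns a result or `request_timeout` seconds pass.

    `check` is a hypothetical callback that returns None while the job is
    still running and a result dict once it has finished.
    """
    deadline = clock() + request_timeout
    for delay in backoff_delays():
        result = check()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("job did not finish before request_timeout")
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
    raise AssertionError("unreachable: backoff_delays is unbounded")
```

With these defaults the waits are 3, 4.5, 6.75, 10.125 seconds and then 15 seconds for every later attempt.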
chanx	cac87d7f77	fix: remove unnecessary 'asChild' prop from FilterButton component (#16094 ) ### What problem does this PR solve? fix: remove unnecessary 'asChild' prop from FilterButton component ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 17:55:04 +08:00
balibabu	ba93ac3bd7	Feat: Move less important chat settings into a collapsible panel. (#16024 ) ### What problem does this PR solve? Feat: Move less important chat settings into a collapsible panel. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-15 19:09:19 +08:00
balibabu	fa6d29603a	Fix: Adjust chat line height. (#16021 ) ### What problem does this PR solve? Fix: Adjust chat line height. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-15 16:53:45 +08:00
buua436	400dfd50d8	feat: add custom value support for s3 region (#15968 ) ### What problem does this PR solve? Allow S3-compatible data source region fields to accept custom values while preserving search-and-select behavior. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-15 11:40:28 +08:00
Zhichang Yu	3fa15c0e2f	feat(agent): Go port — canvas engine, 22 components, DSL v2, 13 endpoints (#15952 ) Ports the agent canvas subsystem from Python to Go. ## What's included ### Canvas Engine (Phase 0/1) - State engine, scheduler, variable resolver, Redis checkpoint store, cancel protocol - 209 tests across canvas / component / io packages ### 22 Components (P0–P4) \| Tier \| Components \| \|---\|---\| \| P0 T1+T2+T3 \| LLM, Agent, ExitLoop, Switch, Categorize, Begin, Message, Invoke \| \| P1 T3 \| VariableAggregator, VariableAssigner, StringTransform, ListOperations, DataOperations \| \| P2 T3 \| Iteration, IterationItem, Loop, LoopItem \| \| P3 T3 \| UserFillUp, Fillup \| \| P4 T5 \| Browser, ExcelProcessor, DocsGenerator \| ### DSL v2 Schema (Phase 2.5) - Typed v2 in-memory model with v1-to-v2 auto-detect converter - v1 legacy field stripping per plan §2.11.7 ### HTTP Endpoints & Bug Fixes (Plans PR1–PR3) - DELETE SQL bug fix: gorm v2 `Where("id = ?", id).Delete(...)` pattern - CreateAgent validation: title/DSL required, duplicate check, 103 envelope - 13 new endpoints: templates, prompts, tags, sessions CRUD, chat/completions (SSE + non-stream stubs), rerun, test_db_connection, logs, webhook/logs - 756 Go unit tests (745 → 756, +18) - 17 → 0 Python integration test failures (test_agents.py + test_session_management/) ### Tools 21 eino tools: HTTPHelper, search tools, financial/data tools, mandatory stubs ### Infrastructure OTel observability, NATS message queue, DeepDoc gRPC client, SSRF guards, IDOR mitigation	2026-06-12 22:58:28 +08:00
balibabu	89aac82663	Fix: chat/agent -- Default avatar is not displaying correctly. (#15948 ) ### What problem does this PR solve? Fix: chat/agent -- Default avatar is not displaying correctly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-12 17:54:36 +08:00
Yingfeng	bae8c6f109	Improve docx preview (#15907 )	2026-06-11 20:43:58 +08:00
balibabu	70ae25fc7b	Fix: Remove the pagination from the search and retrieval pages. (#15942 ) ### What problem does this PR solve? Fix: Remove the pagination from the search and retrieval pages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 16:36:05 +08:00
monsterDavid	a851228ded	fix(preview): authenticate markdown document preview requests (#15589 ) ## Summary Fixes [#15585](https://github.com/infiniflow/ragflow/issues/15585). - Route markdown preview through the shared `request` client (same as txt/image previewers) so `Authorization` headers and interceptors are applied consistently. - Add a unit test covering `AUTH_BETA` token loading for embedded search auth. ## Root cause Search result preview for `.md`/`.mdx` used raw `fetch`, which did not apply the same auth path as other preview types. That led to `401` on `GET /api/v1/documents/{id}/preview` even when the user was logged in or using an embedded search `auth` query param. ## Test plan - [ ] Log in, run a search, open a markdown citation link — preview loads (no 401). - [ ] Open an embedded shared search URL with `auth` query param, preview a markdown file — preview loads. - [ ] Confirm PDF/txt preview still works in the same search UI. --------- Co-authored-by: MkDev11 <89318445+bitloi@users.noreply.github.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:46:20 +08:00
chanx	84482762d5	feat: support custom editing for model list (#15855 ) ### What problem does this PR solve? feat: support custom editing for model list ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-09 19:24:43 +08:00
balibabu	d025e18176	Fix: Add a waiting status to the messages on the chat page. (#15773 ) ### What problem does this PR solve? Fix: Add a waiting status to the messages on the chat page. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 19:17:00 +08:00
chanx	7dd4030986	fix: Resolve error when checking pipeline parsing result (#15778 ) ### What problem does this PR solve? fix: Resolve error when checking pipeline parsing result ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 19:16:21 +08:00
chanx	2bd8900638	Fix: Model provider bugs (#15770 ) ### What problem does this PR solve? Fix: Model provider bugs ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 17:04:05 +08:00
chanx	144abbe2eb	feat: Unify the 'Add Model Provider' modal (#15768 ) ### What problem does this PR solve? feat: Unify the 'Add Model Provider' modal ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-06-08 16:46:52 +08:00
balibabu	9c32b73cf7	Fix: The embedded website floating component on the agent page does not display citations. (#15767 ) ### What problem does this PR solve? Fix: The embedded website floating component on the agent page does not display citations. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 16:09:36 +08:00
balibabu	9c14e3f377	Fix: When adding a chat in the main interface, a warning will automatically pop up (#15685 ) ### What problem does this PR solve? Fix: When adding a chat in the main interface, a warning will automatically pop up (even if embedding and LLM model have already been configured). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-05 19:09:22 +08:00
chanx	a678ed7b1f	Fix: Switching pagesize on a chunk page did not reset the current page. (#15401 ) ### What problem does this PR solve? Fix: Switching pagesize on a chunk page did not reset the current page. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-03 15:57:57 +08:00
Julian	33ef724b5f	Add Bulk action for linking Multiple Files to Datasets (#14960 ) ### What problem does this PR solve? Feature: #14961 ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-06-02 12:23:33 +08:00
balibabu	f194e8b4c4	Fix: The newly added model did not appear in the drop-down menu. (#15476 ) ### What problem does this PR solve? Fix: The newly added model did not appear in the drop-down menu. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-01 17:56:41 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
balibabu	187dc8a1e6	Fix: The Creativity parameter of chat was not saved. (#15243 ) ### What problem does this PR solve? Fix: The Creativity parameter of chat was not saved. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-27 11:02:30 +08:00
chanx	bce11527c3	Fix: Fixed metadata issue (#15226 ) ### What problem does this PR solve? Fix: Fixed metadata issue - The dataset's built-in metadata is now active, but it appears to be disabled in the individual file configuration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-26 13:16:15 +08:00
balibabu	c7c75c0a87	Feat: Enable agent messages to display base64 images (#15212 ) ### What problem does this PR solve? Feat: Enable agent messages to display base64 images ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-25 19:02:03 +08:00
balibabu	0f92353bd9	Fix: Replace the red highlight at the top of the PDF document with yellow. (#15203 ) ### What problem does this PR solve? Fix: Replace the red highlight at the top of the PDF document with yellow. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 17:21:36 +08:00
Ahmad Intisar	e6068a7f7e	Fix: table parser metadata (#15127 ) ### What problem does this PR solve? This PR improves the table upload flow for CSV/Excel files by allowing table column role configuration at upload time. Previously, users had to: 1. Upload and parse a table file. 2. Open parser settings and manually set table column roles. 3. Re-parse the file for the roles to take effect. This was inefficient and required an unnecessary second parse. With this change: 1. When the knowledge base uses table parsing, the upload dialog extracts CSV/Excel headers client-side. 2. Users can choose Auto mode or Manual mode. 3. In Manual mode, users can assign per-column roles before upload. 4. The selected parser config is sent with the upload request and applied server-side during document creation. Result: configured table column roles are applied from the first parse. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-05-25 16:05:38 +08:00
buua436	71a52d579c	fix: move agent attachment download api (#15146 ) ### What problem does this PR solve? move agent attachment download api to the correct route and update frontend callers ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Notes - Move the attachment download endpoint from document routes to agent routes. - Update frontend download callers to use the agent attachment endpoint. - Reuse the shared file response header helper instead of duplicating it in `agent_api.py`.	2026-05-22 15:22:05 +08:00
balibabu	1ed8a118cf	Fix: The folder tree menu for moving folders cannot be scrolled. (#15037 ) ### What problem does this PR solve? Fix: The folder tree menu for moving folders cannot be scrolled. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-20 14:59:36 +08:00
Magicbook1108	b69a6a5d80	Feat: full optimization on connector dashboard (#14979 ) ### What problem does this PR solve? This PR improves the connector dashboard task management experience and adds better visibility into connector execution logs. ### Overview: #### Before [screenshot] #### Now: [screenshot: Screenshot from 2026-05-18 16-31-30] ### 1. Add a new logging page to the connector dashboard A new logging page has been added so users can view connector task execution logs directly from the connector dashboard. ### 2. Merge the Resume button into Confirm The separate Resume button has been removed. The Confirm button now represents different actions depending on the current task state: - Save: Save form changes and reschedule tasks. - Stop: Cancel currently scheduled or running tasks. - Resume: Create new scheduled tasks after the previous tasks have been stopped. - Start: Start tasks when no task has been started yet. ### 3. Separate syncing and pruning tasks Connector tasks are now separated into syncing and pruning. Pruning is controlled by the Sync deleted files option: - When Sync deleted files is disabled, only syncing tasks are shown. - When Sync deleted files is enabled, both syncing and pruning tasks are shown. Now: Sync deleted files disabled [screenshot: Sync deleted files disabled] Now: Sync deleted files enabled [screenshot: Sync deleted files enabled] ### 4. Update logs in backend [screenshot] ### 5. Remove connector resume API - Removed: `POST /v1/connectors/<connector_id>/resume` - Replaced by: `PATCH /v1/connectors/<connector_id>` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-19 10:07:11 +08:00
Wang Qi	13b422037f	Refactor: enhance graphrag - part 2 (#14972 ) ### What problem does this PR solve? 1. Expose `batch_chunk_token_size` for configuration. 2. Retrieve chunks when building the subgraph for each document, instead of retrieving all documents' chunks at the beginning. 3. Get all chunks for a document; the limit used to be hard-coded to 10000. 4. Delete the unused method `run_graphrag`. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring Follow on: #14617	2026-05-18 16:10:21 +08:00
小熊	09d45046e5	Feat/web markdown UI updates (#14214 ) ### What problem does this PR solve? LLM/chat and search UIs render Markdown in several places (document preview, floating chat widget, next-search, etc.). Plugin lists and behavior were duplicated or inconsistent, and single newlines in model output were not always rendered as visible line breaks, which hurts readability for chat-style content. This PR centralizes shared remark/rehype configuration (including `remark-breaks` for newline handling) and wires the main Markdown surfaces to use it, so behavior is consistent and easier to maintain. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-05-15 22:29:44 +08:00
yingjianzh	4c68a6b86c	fix(agent): pass top_k and fix similarity weight slider behavior (#14760 ) ### What problem does this PR solve? This PR fixes two issues in Agent Retrieval behavior and configuration UX: 1. `top_k` configured in Agent Retrieval was not passed down to the backend retriever call, so retrieval could ignore the configured vector recall limit. 2. Similarity weight slider semantics were confusing in Agent forms because the Agent field stores `keywords_similarity_weight` while UI interactions were interpreted as vector weight. This could cause displayed values and actual behavior to diverge. This PR ensures Agent retrieval uses configured `top_k`, and makes the slider behavior consistent and explicit for both vector and keyword weight modes. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-15 10:49:14 +08:00
balibabu	41072ed44d	Feat: This enables SelectWithSearch to search by label. (#14925 ) ### What problem does this PR solve? Feat: This enables SelectWithSearch to search by label. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: balibabu <assassin_cike@163.com>	2026-05-14 20:33:11 +08:00
plind	dd76653dc1	feat: add tag management for Agents with filtering and sorting (#14774 ) (#14799 ) ## Summary Closes #14774. Adds free-form tags on agents (UserCanvas) with full UI + API: - Stored as comma-separated `tags` column on `UserCanvas` with online migration. - New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT /v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query. - "Edit tags" item in agent dropdown opens a chip-style editor dialog; tags render as badges on each agent card. - New "Tags" facet in the agents filter bar, with counts. ## Implementation notes - Tag matching is exact-token: the SQL filter wraps stored tags as `,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`. - Server-side normalization in `UserCanvasService.update_tags`: dedup (case-insensitive), per-tag cap of 64 chars, total length capped at 512 chars to fit the column, commas inside tag values are replaced with spaces. - Tenant authorization: `PUT /v1/agent/<id>/tags` gates on `UserCanvasService.accessible(canvas_id, tenant_id)`. - Tag listing scope: `UserCanvasService.list_tags` follows the same own + team-shared rule as `get_by_tenant_ids`. - i18n: keys added to `en.ts` and `zh.ts` only (per project convention; other locales fall back). - `HomeCard` gets a non-breaking `extra?: ReactNode` slot for the chip row; no `src/components/ui/` files modified. ## Test plan - [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column exists (`DESCRIBE user_canvas`). - [ ] Agents page renders cards normally (no console error from missing field). - [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog was unmounting with the dropdown). - [ ] Typing a tag without pressing Enter and clicking Save persists it (regression: last typed tag was being dropped). - [ ] Chip input supports Enter/comma to commit, Backspace on empty to remove, `×` to remove individual chip. 
- [ ] Tag containing a comma sent via API is stored with the comma replaced by a space. - [ ] 20 long tags sent via API does not error (length cap silently truncates). - [ ] "Tags" filter in the filter bar shows counts and narrows the list. - [ ] Filtering by `ml` does not return agents tagged `ml-ops`. - [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc. - [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or no permission.`	2026-05-13 21:41:32 +08:00
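The tag rules listed in the implementation notes above (case-insensitive dedup, 64-character per-tag cap, 512-character total cap, commas replaced with spaces, and exact-token matching by wrapping the stored string as `,…,`) can be illustrated with a small in-memory sketch. It is not the code of `UserCanvasService.update_tags` or the SQL filter: the function names `normalize_tags` and `has_tag` are hypothetical, dropping whole tags once the total cap is reached is an assumed reading of "silently truncates", and the case-sensitive match is also an assumption.

```python
from typing import Iterable, List

MAX_TAG_LEN = 64     # per-tag cap from the commit message
MAX_TOTAL_LEN = 512  # cap on the stored comma-separated string


def normalize_tags(tags: Iterable[str]) -> str:
    """Return the comma-separated string that would be stored in the column."""
    seen = set()
    kept: List[str] = []
    total = 0
    for raw in tags:
        # A comma inside a tag would break the separator, so it becomes a space.
        tag = raw.replace(",", " ").strip()[:MAX_TAG_LEN]
        key = tag.lower()
        if not tag or key in seen:
            continue
        # Account for the separating comma on every tag after the first.
        extra = len(tag) + (1 if kept else 0)
        if total + extra > MAX_TOTAL_LEN:
            break  # assumed behavior: remaining tags are dropped silently
        seen.add(key)
        kept.append(tag)
        total += extra
    return ",".join(kept)


def has_tag(stored: str, wanted: str) -> bool:
    """Exact-token match: wrap both sides in commas so `ml` cannot match `ml-ops`."""
    return f",{wanted}," in f",{stored},"
```

Wrapping the stored value in commas is what turns a substring test into a whole-token test, which is why `ml` and `ml-ops` stay distinct.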
47NoahThompson	9e0f976729	Add widget customization and persistence (#14603 ) Introduce comprehensive floating widget customization: add new widget settings (title, subtitle, footer, colors, mute, streaming) with types and defaults, and expose them via EmbedDialog UI (split into Embed Setup and Widget Customization tabs). Persist and load settings through Agent page by reading/writing globals and wiring an onSaveWidgetSettings handler to setAgent; show a loading ButtonLoading for saving. Update embed iframe query params and FloatingChatWidget to honor URL params (colors, text, mute/streaming) with validation/normalization, color darkening for gradients, footer link normalization, and improved styling. Also add copy-to-clipboard in message toolbar, adjust syntax highlighter layout and Copy button, and add i18n key for muteWidget. ### What problem does this PR solve? Adds a few fields to the embed widget modal to customize the appearance of the floating widget when embedded into a page. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Noah <Noah.Thompson@ecn.forces.gc.ca>	2026-05-13 21:13:11 +08:00
Ahmad Intisar	e994051eb9	Feature/generic api connector (#13545 ) # feat: Add Generic REST API Connector ## What problem does this PR solve? RAGFlow supports many specific data source connectors (MySQL, Slack, Google Drive, etc.), but there was no way to connect an arbitrary REST API as a data source. Users with custom or third-party APIs had to write a new connector class for each one. This PR adds a generic, configuration-driven REST API connector that lets users connect any REST API as a data source entirely through the UI — no code changes needed per API. --- ## Features ### Core Connector (`common/data_source/rest_api_connector.py`) - Implements `LoadConnector` and `PollConnector` interfaces for full and incremental sync - Configurable authentication: None, API Key (custom header), Bearer Token, Basic Auth - Pluggable pagination: Page-based, Offset-based, Cursor-based, or None - Smart page-size inference from user's query parameters to avoid duplicate/conflicting params - Configurable request delay between pages to prevent API rate limiting - Auto-detection of the items array in JSON responses (`items`, `results`, `data`, `records`, or first list found) - Advanced field mapping with dot-notation (`country.name`), array wildcards (`newsType[].name`), type hints, and default values - Optional content template rendering (`"Title: {title}\nBody: {body}"`) - HTML stripping for content fields - Stable document IDs via `hash128` from a configurable ID field or auto-generated from item content - Pydantic configuration schema with automatic coercion of UI string inputs to dicts/lists ### Backend Registration (`rag/svr/sync_data_source.py`, `common/constants.py`, `common/data_source/config.py`) - `REST_API` sync class wired into RAGFlow's `func_factory` - Full sync (`load_from_state`) and incremental polling (`poll_source`) support - Credentials and config passed from task to connector following existing patterns (MySQL, SeaFile, etc.) 
### Test Connection Endpoint (`api/apps/connector_app.py`) - `POST /v1/connector/<id>/test` validates config schema, authentication, and API connectivity without triggering a sync - Clear error messages for auth failures vs. config issues ### Frontend UI (`web/src/pages/user-setting/data-source/constant/`) - Postman-style configuration:* Base URL, Query Parameters (key=value per line), Auth, Content Fields, Metadata Fields, Pagination Type - Auth-type-aware form: fields for API key header/value, Bearer token, or Basic username/password appear only when relevant - Advanced Settings toggle for: Custom Headers, Max Pages, Request Delay, Poll Timestamp Field, Request Body (POST) - Connector icon (SVG) and i18n strings (English) - "Test Connection" button to validate before syncing --- ## Controls & Safety - Configurable max pages safety cap (default: 1000, adjustable in UI) - Configurable request delay between pages (default: 0.5s, adjustable in UI) - Auth errors (401/403) fail immediately without retries; transient errors retry with exponential backoff - Diagnostic logging: auth setup confirmation, request details on failure, content field extraction status --- ## Type of change - [x] New Feature (non-breaking change which adds functionality) ##Visual Screenshots of Features <img width="482" height="510" alt="Screenshot 2026-03-11 at 5 19 52 PM" src="https://github.com/user-attachments/assets/dcb7ab4a-1622-44f3-bb02-d6f0527314c4" /> (Connector can be configured within the external data sources tab) Configuration Parameters: <img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 46 PM" src="https://github.com/user-attachments/assets/5e154e71-4ab5-4872-bfb2-04f02b73c18a" /> <img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 54 PM" src="https://github.com/user-attachments/assets/00cb14b7-0bcf-4b94-9d71-34e93369ecb2" /> Connection can be tested before attaching to dataset: <img width="981" height="681" alt="Screenshot 2026-03-11 at 5 21 40 PM" 
src="https://github.com/user-attachments/assets/aaa6eeeb-89a7-4349-bc34-2423bf8be9ee" /> Ingestion tested with API connector (works perfectly fine): <img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 22 30 PM" src="https://github.com/user-attachments/assets/afcd0d58-cadd-4152-badc-d2f14d96fbec" /> Search & Retrieval works as well with metadata flow: <img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 23 05 PM" src="https://github.com/user-attachments/assets/d41ee935-dcf7-4456-b317-22a76ca032c0" /> --------- Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-13 20:35:01 +08:00
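The items-array auto-detection and dot-notation field mapping described in the commit above (e994051eb9) could look roughly like the sketch below. It is only an illustration under stated assumptions: `find_items` and `resolve_path` are hypothetical names, not functions from `rest_api_connector.py`, and the lookup order of the candidate keys is assumed from the order they are listed in the message.

```python
from typing import Any, List

# Candidate keys are taken from the commit message; the lookup order is an assumption.
ITEM_KEYS = ("items", "results", "data", "records")


def find_items(payload: Any) -> List[Any]:
    """Return the list of records inside a decoded JSON response (hypothetical helper)."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ITEM_KEYS:
            if isinstance(payload.get(key), list):
                return payload[key]
        # Fall back to the first list-valued field found.
        for value in payload.values():
            if isinstance(value, list):
                return value
    return []


def resolve_path(item: Any, path: str, default: Any = None) -> Any:
    """Resolve 'a.b' and 'a[].b' style paths against one decoded JSON item."""
    current = [item]
    saw_wildcard = False
    for part in path.split("."):
        wildcard = part.endswith("[]")
        key = part[:-2] if wildcard else part
        next_values = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue
            value = node[key]
            if wildcard:
                # 'key[]' fans out over every element of the list.
                if isinstance(value, list):
                    next_values.extend(value)
            else:
                next_values.append(value)
        saw_wildcard = saw_wildcard or wildcard
        current = next_values
    if not current:
        return default
    # Wildcard paths yield a list; plain paths yield a single value.
    return current if saw_wildcard else current[0]


response = {
    "total": 2,
    "results": [
        {"title": "A", "country": {"name": "Norway"},
         "newsType": [{"name": "tech"}, {"name": "ai"}]},
        {"title": "B"},
    ],
}
items = find_items(response)
print(resolve_path(items[0], "country.name"))
print(resolve_path(items[0], "newsType[].name"))
print(resolve_path(items[1], "country.name", default="unknown"))
```

Type hints and HTML stripping, which the commit message also mentions, are left out to keep the sketch short.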
Wang Qi	76d5240fb5	Fix #14801 to allow search dataset list when add (#14841 ) ### What problem does this PR solve? Fix #14801 to allow search dataset list when add, following on #14825 <img width="2172" height="857" alt="image" src="https://github.com/user-attachments/assets/65ea7647-56f4-4c16-8437-121b834811f0" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 19:36:23 +08:00
CaptainTimon	2717ee283f	feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679 ) ### What problem does this PR solve? Closes #14674. This PR improves RAPTOR configuration and tree construction while preserving the existing RAPTOR behavior as the default. RAPTOR currently builds summary layers with the original UMAP + GMM clustering path. This PR keeps that default path, and adds: - A hidden backend tree-builder option: - `tree_builder="raptor"`: default, existing RAPTOR behavior. - `tree_builder="psi"`: rank-aware Psi-style tree builder using original embedding-space cosine ranking. - A user-facing clustering method option for the default RAPTOR builder: - `clustering_method="gmm"`: existing default. - `clustering_method="ahc"`: agglomerative hierarchical clustering path. - A RAPTOR UI setting for `Clustering method` and `Max cluster`. ### What changed #### Backend - Added `tree_builder` support for RAPTOR/Psi. - Added `clustering_method` support for GMM/AHC. - Kept existing RAPTOR + GMM as the default. - Added Psi tree building from original-space cosine similarity. - Added bucketed Psi building controls for large inputs: - `raptor.ext.psi_exact_max_leaves` - `raptor.ext.psi_bucket_size` - Added method-aware RAPTOR summary metadata using existing `extra.raptor_method`. - Avoided adding a dedicated DB schema field for experimental method tracking. - Added cleanup/migration logic to avoid mixing stale RAPTOR summary trees. - Added defensive checks for Psi tree construction and summary failures. #### Frontend/UI - Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`. - Added/kept `Max cluster` in RAPTOR settings. - Enlarged max cluster UI limit to `1024`, matching backend validation. - Kept AHC editable even when a RAPTOR task has already finished. 
- Fixed the UI save payload so `clustering_method` and `tree_builder` are serialized through `parser_config.raptor.ext`, avoiding backend validation errors for extra top-level RAPTOR fields. Example saved RAPTOR config: ```json { "raptor": { "max_cluster": 317, "ext": { "clustering_method": "ahc", "tree_builder": "raptor" } } } ``` Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>	2026-05-12 09:42:31 +08:00
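The payload fix described in the commit above (2717ee283f) amounts to moving two fields under `raptor.ext` before saving. A minimal sketch of that reshaping, assuming a flat form dictionary as input: `to_raptor_payload` and the flat input shape are hypothetical, and the actual fix lives in the frontend code, not in Python.

```python
from typing import Any, Dict

# Field names come from the commit message; the helper itself is illustrative.
EXT_FIELDS = ("clustering_method", "tree_builder")


def to_raptor_payload(form: Dict[str, Any]) -> Dict[str, Any]:
    """Nest experimental RAPTOR fields under 'ext' instead of the top level."""
    raptor = {k: v for k, v in form.items() if k not in EXT_FIELDS}
    # Preserve any 'ext' values already present, then add the moved fields.
    ext = dict(raptor.pop("ext", {}))
    ext.update({k: form[k] for k in EXT_FIELDS if k in form})
    if ext:
        raptor["ext"] = ext
    return {"raptor": raptor}


payload = to_raptor_payload(
    {"max_cluster": 317, "clustering_method": "ahc", "tree_builder": "raptor"}
)
print(payload)
```

With the sample input, the result has the same shape as the example config quoted in the commit message.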
Nie WeiYang	1e80be77a2	fix(web): fix incomplete Docx preview in citation reference (#14122 ) This PR fixes a UI issue where the .docx document preview was displayed incompletely when clicking on a citation/reference link during a knowledge base conversation. ### What problem does this PR solve? The Issue: In the chat interface, when a user clicks the source citation at the end of an answer, the DocPreviewer opens. However, for .docx files, if the content exceeded the window height, it was truncated and unscrollable, preventing users from reading the full referenced text. Changes: web/src/components/document-preview/doc-preview.tsx: Added the overflow-auto Tailwind class to the DocPreviewer root container to ensure scrollbars appear automatically when content overflows. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: nie.weiyang <nie.weiyang@embedway.com>	2026-05-11 16:17:48 +08:00
Wang Qi	3838770e7a	GraphRAG feature - Part 1 - add spacy to extract entity and relation (#14670 ) ### What problem does this PR solve? GraphRAG feature - Part 1 - add spacy to extract entity and relation <img width="1621" height="1288" alt="image" src="https://github.com/user-attachments/assets/aadeddad-94da-46c6-adad-9c3784181f61" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 12:59:59 +08:00
很拉风的James	6cb4bc2947	Fix: Radio.Group cloneElement crashes on non-element children (#14407 ) ### What problem does this PR solve? `Radio.Group` in `web/src/components/ui/radio.tsx` injects the parent's `disabled` prop into each child via `React.cloneElement` with `as React.ReactElement` and no validation. This throws at runtime when a consumer passes strings, numbers, `null`, `false`, or other non-element nodes, while the cast hides the unsafe access from TypeScript. Use `React.isValidElement<RadioProps>(child)` as a type guard before calling `cloneElement`. Non-element children pass through unchanged, and `child.props` access becomes type-checked without an `as` cast. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 09:54:42 +08:00
chanx	8ac14b597f	Fix: Some bugs (#14734 ) ### What problem does this PR solve? Fix: Some bugs - Error during batch modification of metadata in the Knowledge Base - Manually configured metadata is not displayed in search settings ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-09 17:40:22 +08:00
buua436	de2abe9ed8	Fix: tag parser id (#14724 ) ### What problem does this PR solve? tag parser id ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-09 14:29:09 +08:00
Tim Wang	1bcb6deb6f	Fix: collapsible thinking display and separate deep research retrieval tag (#14613 ) ## Summary - Collapsible thinking: Replace `<section>` with `<details>` for `<think>` content, so model thinking output is collapsed by default (click to expand). Works for all models that output `<think>` tags (Qwen3, DeepSeek, Gemini, Claude, etc.). - Fix double thinking tags: When reasoning/deep research mode is enabled in knowledge base chat, both the retrieval progress and model thinking were wrapped in `<think>` tags, producing two "Thinking..." blocks. Now retrieval progress uses a dedicated `<retrieving>` tag rendered as a separate "Retrieving..." collapsible with a distinct green accent. ### Before - Thinking content displayed as flat gray-bordered `<section>`, occupying significant screen space - Deep research + model thinking both use `<think>` → two identical "Thinking..." blocks ### After - Thinking content collapsed by default in a `<details>` element, click "Thinking..." to expand - Deep research shows "Retrieving..." (green border), model thinking shows "Thinking..." (gray border) ## Changes Backend (`api/db/services/dialog_service.py`) - Deep research callback: replace `start_to_think`/`end_to_think` marker flags with direct `<retrieving>`/`</retrieving>` answer text Frontend - `web/src/utils/chat.ts`: `replaceThinkToSection()` now uses `<details>` instead of `<section>`; add new `replaceRetrievingToSection()` - 4 tsx files: import and pipe `replaceRetrievingToSection`, whitelist `details`, `summary`, `retrieving` in DOMPurify `ADD_TAGS` - 4 less files: `section.think` → `details.think` with `<summary>` styles; add `details.retrieving` with green accent; dark mode and RTL variants ## Test plan - [ ] Open a chat WITHOUT knowledge base, ask a question to a model with thinking (e.g. Qwen3) → thinking content should be collapsed by default, click "Thinking..." 
to expand - [ ] Open a chat WITH knowledge base and reasoning enabled, ask a question → "Retrieving..." (green) shows retrieval progress, "Thinking..." (gray) shows model thinking, each independently collapsible - [ ] Verify dark mode renders correctly for both collapsible blocks - [ ] Verify RTL layout renders correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: wanghualoong <wanghualoong@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-08 14:40:00 +08:00
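The tag rewriting described in the commit above (1bcb6deb6f) is done in TypeScript in `web/src/utils/chat.ts`. The Python sketch below only illustrates the transformation itself, under assumptions: `collapse_tag` is a hypothetical name, the class names mirror the `details.think` and `details.retrieving` selectors mentioned in the message, and the exact markup the real functions emit may differ.

```python
import re


def collapse_tag(text: str, tag: str, label: str) -> str:
    """Wrap each <tag>...</tag> block in a collapsed <details> element."""
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    # A function replacement avoids backslash handling in the inner content.
    return pattern.sub(
        lambda m: (
            f'<details class="{tag}"><summary>{label}</summary>'
            f"{m.group(1)}</details>"
        ),
        text,
    )


answer = "<retrieving>searching docs</retrieving><think>compare hits</think>Done."
answer = collapse_tag(answer, "retrieving", "Retrieving...")
answer = collapse_tag(answer, "think", "Thinking...")
print(answer)
```

Because `<details>` has no `open` attribute here, both blocks start collapsed. The sketch does not handle a streamed answer whose closing tag has not arrived yet.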
buua436	f703169117	Refa: migrate document preview/download to RESTful API (#14633 ) ### What problem does this PR solve? migrate document preview/download to RESTful API ### Type of change - [x] Refactoring	2026-05-08 13:26:13 +08:00

750 Commits