Commit Graph

750 Commits

Author SHA1 Message Date
balibabu
c849c76f8a Feat: Add a prefix to the name of the FormField associated with the chat. (#16178)
Fix: Add a prefix to the `name` of the `FormField` associated with the chat.
2026-06-22 19:18:11 +08:00
Zhichang Yu
3f805a64f1 feat(agent): align Go agent behavior with Python (except retrieval component) (#16225)
## Summary

Aligns the **Go agent runtime/canvas/components/tools** behavior with
the **Python `agent/` implementation** so the same stored canvas DSL
produces the same execution result on either side. Every component,
tool, and runtime primitive in `internal/agent/` is now driven by the
same semantics as its Python counterpart — variable resolution, template
substitution, control flow, error reporting, retry/cancel, and stream
event shapes.

The **retrieval component is the one explicit exception** in this PR. It
is being reworked in a separate change and is excluded from this
alignment pass; the wrapper slot (`universe_a_wrappers.go →
newRetrievalComponent`) is preserved.

## Scope of alignment

### Components (all aligned with `agent/component/`)
`Begin` · `Message` · `LLM` (incl. ChatTemplateKwargs,
MessageHistoryWindowSize, VisualFiles, Cite, OutputStructure,
JSONOutput, TopP, MaxRetries, DelayAfterError, credentials) · `Agent`
(react + tool artifact capture + `Reset()` interface-assert) · `Switch`
(12/12 operators, Python-equivalent semantics) · `Categorize` · `Invoke`
· `Iteration` · `Loop` (macro-expansion through `workflowx.AddLoopNode`)
· `UserFillUp` (Python-equivalent interrupt/resume via eino
`compose.Interrupt`/`ResumeWithData`) · `FillUp` · `DataOperations` ·
`ListOperations` · `StringTransform` · `VariableAggregator` ·
`VariableAssigner` · `Browser` (full stagehand runtime parity) ·
`DocsGenerator` · `ExcelProcessor`.

### Tools (all aligned with `agent/tools/`)
`Retrieval` (wrapper slot only — logic out of scope) · `MCPToolAdapter`
(streamable-HTTP) · `CodeExec` (sandbox bridge with
`code_exec_contract.go` matching Python contract) · `AkShare` · `ArXiv`
· `Crawler` · `DeepL` · `DuckDuckGo` · `Email` · `ExeSQL` · `GitHub` ·
`Google` · `GoogleScholar` · `Jin10` · `PubMed` · `QWeather` · `SearXNG`
· `Tavily` · `Tushare` · `Wencai` · `Wikipedia` · `YahooFinance` —
uniform `eino tool.InvokableTool` interface, SSRF protection, shared
HTTP client.

### Canvas execution engine (`internal/agent/canvas/`)
Aligned with Python's `agent/canvas.py`:
- **Scheduler** (`scheduler.go`): state pre/post handlers, node lambdas,
per-component timeout resolver (4-level: per-class env → per-class table
→ uniform env → 600s fallback), `legacyNoOpNames`.
- **Loop subgraph** (`loop_subgraph.go`): Python-equivalent
`AddLoopNode` macro expansion + condition translation.
- **Multibranch** (`multibranch.go`): `Switch` / `Categorize` routing
via `compose.NewGraphMultiBranch` — same branch selection semantics as
Python.
- **Parallel subgraph** (`parallel_subgraph.go`): matches Python's
parallel fan-out contract.
- **Interrupt/Resume** (`interrupt_resume.go`): `UserFillUpNodeBody` /
`IsInterruptError` / `ExtractInterruptContexts` — replaces the
deprecated Python sentinel chain with eino's native interrupt API,
preserving the same external behavior.
- **Checkpoint** (`checkpoint_store.go`): `RedisCheckPointStore`
Get/Set/Delete, with business metadata (status / canvas_id /
parent_run_id) on a parallel Redis Hash.
- **RunTracker** (`run_tracker.go`): Start / MarkSucceeded / MarkFailed
/ MarkCancelled / AttachCheckpoint — same lifecycle as the Python run
record.
- **Cancel** (`cancel.go`): Redis pub/sub watch.
- **Stream** (`stream.go`): SSE channel with `messages` / `waiting` /
`errors` / `done` events, same shape as Python's `agent.canvas.RunEvent`
payload.

### DSL bridge (`internal/agent/dsl/`)
- `normalize.go`: v1↔v2 collapsed into a single wire format — Python and
Go consume the same stored JSON.
- `reset.go`: per-run state reset matches Python's `Canvas.reset()`
semantics.
- Testdata mirrors Python's `agent_msg.json` / `all.json` / etc.

### Runtime (`internal/agent/runtime/`)
- `CanvasState` / `NewCanvasState` / `GetVar` / `SetVar` / `ReadVars`:
same `{{cpn_id@param}}` resolution model.
- `ResolveTemplate` (regex fast path + gonja fallback) — Python
Jinja-style semantics.
- `selector.go`, `metrics.go`, `component.go`: shared runtime contracts.

## Out of scope (intentionally)

- **`Retrieval` component logic** — wrapped only; full parity lands in a
follow-up PR.
- **Frontend** — only minor dsl-bridge / canvas UX fixes ride along.
- **CLI / admin / model registry** — orthogonal to agent behavior.

## How alignment is verified

`internal/service/agent_run_e2e_test.go` exercises the **full production
chain** against real Python-shaped DSL fixtures:
```
loadCanvasForUser → versionDAO.GetLatest → decodeCanvasFromDSL →
canvas.Compile → cc.Workflow.Invoke → answer extraction
```
using in-memory SQLite + miniredis (no Docker). Covers:
- `TestRunAgent_RealCanvas_BeginMessage` — happy path, `{{sys.query}}`
resolution
- `TestRunAgent_RealCanvas_WaitForUserResume` — two-run resume cycle
(Python-equivalent)
- `TestRunAgent_RealCanvas_CompileFails` — unknown component name →
sanitized error (Python-equivalent)
- `TestRunAgent_RealCanvas_InvokeFails` — unresolvable template ref
(Python-equivalent)
- `TestRunAgent_RunTracker_AttachCheckpoint_CallSequence` —
Start→AttachCheckpoint→MarkSucceeded lifecycle

`internal/handler/agent_test.go` — SSE streaming parity (`Content-Type:
text/event-stream`, `data: {…}\n\n`, trailing `data: [DONE]\n\n`,
OpenAI-compatible non-stream `choices`).

`internal/agent/canvas/fixture_compile_test.go` + per-component tests
pin the Python-equivalent outputs.

```
go test -count=1 -v -run 'TestRunAgent_RealCanvas|TestRunAgent_RunTracker' ./internal/service/
```

## Design reference

`docs/develop/agent-go-port-design.md` (1329 lines, last cross-checked
2026-06-17) — module layout, per-component / per-tool inventory,
corner-case catalogue, and the actionable backlog (Section 14, including
the retrieval alignment follow-up).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-22 11:58:29 +08:00
balibabu
a9021528c3 Fix: Lint error. (#16172)
### What problem does this PR solve?

Fix: Lint error.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-18 13:14:18 +08:00
Wang Qi
99a25dca34 Fix Chat/Search/Agent bot show image (#16152)
Fix Chat/Search/Agent bot show image
2026-06-18 09:38:31 +08:00
balibabu
3247e353c7 Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. (#16134)
### What problem does this PR solve?

Fix: The .docx file is not displaying fully; the hierarchy of the
pipeline created from the template is missing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 16:18:47 +08:00
Rander
1235da7093 refactor(paddleocr): migrate from sync API to async Job API (#15967)
## Summary

Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.

## Changes

### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)

### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`

### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)

### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config

## Backward Compatibility

- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups

## Why not use the `paddleocr` SDK package directly?

RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly via
HTTP, following the same logic as the SDK internally.
2026-06-16 19:34:21 +08:00
chanx
cac87d7f77 fix: remove unnecessary 'asChild' prop from FilterButton component (#16094)
### What problem does this PR solve?

fix: remove unnecessary 'asChild' prop from FilterButton component

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-16 17:55:04 +08:00
balibabu
ba93ac3bd7 Feat: Move less important chat settings into a collapsible panel. (#16024)
### What problem does this PR solve?

Feat: Move less important chat settings into a collapsible panel.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-06-15 19:09:19 +08:00
balibabu
fa6d29603a Fix: Adjust chat line height. (#16021)
### What problem does this PR solve?

Fix: Adjust chat line height.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-15 16:53:45 +08:00
buua436
400dfd50d8 feat: add custom value support for s3 region (#15968)
### What problem does this PR solve?
Allow S3-compatible data source region fields to accept custom values
while preserving search-and-select behavior.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2026-06-15 11:40:28 +08:00
Zhichang Yu
3fa15c0e2f feat(agent): Go port — canvas engine, 22 components, DSL v2, 13 endpoints (#15952)
Ports the agent canvas subsystem from Python to Go.

## What's included

### Canvas Engine (Phase 0/1)
- State engine, scheduler, variable resolver, Redis checkpoint store,
cancel protocol
- **209 tests** across canvas / component / io packages

### 22 Components (P0–P4)
| Tier | Components |
|---|---|
| P0 T1+T2+T3 | LLM, Agent, ExitLoop, Switch, Categorize, Begin,
Message, Invoke |
| P1 T3 | VariableAggregator, VariableAssigner, StringTransform,
ListOperations, DataOperations |
| P2 T3 | Iteration, IterationItem, Loop, LoopItem |
| P3 T3 | UserFillUp, Fillup |
| P4 T5 | Browser, ExcelProcessor, DocsGenerator |

### DSL v2 Schema (Phase 2.5)
- Typed v2 in-memory model with v1-to-v2 auto-detect converter
- v1 legacy field stripping per plan §2.11.7

### HTTP Endpoints & Bug Fixes (Plans PR1–PR3)
- **DELETE SQL bug fix**: gorm v2 `Where("id = ?", id).Delete(...)`
pattern
- **CreateAgent validation**: title/DSL required, duplicate check, 103
envelope
- **13 new endpoints**: templates, prompts, tags, sessions CRUD,
chat/completions (SSE + non-stream stubs), rerun, test_db_connection,
logs, webhook/logs
- **756 Go unit tests** (745 → 756, +18)
- **17 → 0 Python integration test failures** (test_agents.py +
test_session_management/)

### Tools
21 eino tools: HTTPHelper, search tools, financial/data tools, mandatory
stubs

### Infrastructure
OTel observability, NATS message queue, DeepDoc gRPC client, SSRF
guards, IDOR mitigation
2026-06-12 22:58:28 +08:00
balibabu
89aac82663 Fix: chat/agent -- Default avatar is not displaying correctly. (#15948)
### What problem does this PR solve?

Fix: chat/agent -- Default avatar is not displaying correctly.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-12 17:54:36 +08:00
Yingfeng
bae8c6f109 Improve docx preview (#15907) 2026-06-11 20:43:58 +08:00
balibabu
70ae25fc7b Fix: Remove the pagination from the search and retrieval pages. (#15942)
### What problem does this PR solve?

Fix: Remove the pagination from the search and retrieval pages.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-11 16:36:05 +08:00
monsterDavid
a851228ded fix(preview): authenticate markdown document preview requests (#15589)
## Summary

Fixes [#15585](https://github.com/infiniflow/ragflow/issues/15585).

- Route markdown preview through the shared `request` client (same as
txt/image previewers) so `Authorization` headers and interceptors are
applied consistently.
- Add a unit test covering `AUTH_BETA` token loading for embedded search
auth.

## Root cause

Search result preview for `.md`/`.mdx` used raw `fetch`, which did not
apply the same auth path as other preview types. That led to `401` on
`GET /api/v1/documents/{id}/preview` even when the user was logged in or
using an embedded search `auth` query param.

## Test plan

- [ ] Log in, run a search, open a markdown citation link — preview
loads (no 401).
- [ ] Open an embedded shared search URL with `auth` query param,
preview a markdown file — preview loads.
- [ ] Confirm PDF/txt preview still works in the same search UI.

---------

Co-authored-by: MkDev11 <89318445+bitloi@users.noreply.github.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
2026-06-11 15:46:20 +08:00
chanx
84482762d5 feat: support custom editing for model list (#15855)
### What problem does this PR solve?

feat: support custom editing for model list

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-06-09 19:24:43 +08:00
balibabu
d025e18176 Fix: Add a waiting status to the messages on the chat page. (#15773)
### What problem does this PR solve?

Fix: Add a waiting status to the messages on the chat page.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 19:17:00 +08:00
chanx
7dd4030986 fix: Resolve error when checking pipeline parsing result (#15778)
### What problem does this PR solve?

fix: Resolve error when checking pipeline parsing result

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 19:16:21 +08:00
chanx
2bd8900638 Fix: Model provider bugs (#15770)
### What problem does this PR solve?

Fix: Model provider bugs

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 17:04:05 +08:00
chanx
144abbe2eb feat: Unify the 'Add Model Provider' modal (#15768)
### What problem does this PR solve?

feat:Unify the 'Add Model Provider' modal

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-06-08 16:46:52 +08:00
balibabu
9c32b73cf7 Fix: The embedded website floating component on the agent page does not display citations. (#15767)
### What problem does this PR solve?

Fix: The embedded website floating component on the agent page does not
display citations.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-08 16:09:36 +08:00
balibabu
9c14e3f377 Fix: When adding a chat in the main interface, a warning will automatically pop up (#15685)
### What problem does this PR solve?

Fix: When adding a chat in the main interface, a warning will
automatically pop up (even if embedding and LLM model have already been
configured).
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-05 19:09:22 +08:00
chanx
a678ed7b1f Fix: Switching pagesize on a chunk page did not reset the current page. (#15401)
### What problem does this PR solve?

Fix: Switching pagesize on a chunk page did not reset the current page.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-03 15:57:57 +08:00
Julian
33ef724b5f Add Bulk action for linking Multiple Files to Datasets (#14960)
### What problem does this PR solve?

Feature: #14961 


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-06-02 12:23:33 +08:00
balibabu
f194e8b4c4 Fix: The newly added model did not appear in the drop-down menu. (#15476)
### What problem does this PR solve?

Fix: The newly added model did not appear in the drop-down menu.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-01 17:56:41 +08:00
Lynn
dc4b82523b Feat: tenant llm provider (#14595)
### What problem does this PR solve?

Python implementation of the Go-based model_provider API suite.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: bill <yibie_jingnian@163.com>
2026-05-29 17:39:41 +08:00
balibabu
187dc8a1e6 Fix: The Creativity parameter of chat was not saved. (#15243)
### What problem does this PR solve?

Fix: The Creativity parameter of chat was not saved.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-27 11:02:30 +08:00
chanx
bce11527c3 Fix: Fixed metadata issue (#15226)
### What problem does this PR solve?

Fix: Fixed metadata issue

- The dataset's built-in metadata is now active, but it appears to be
disabled in the individual file configuration.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-26 13:16:15 +08:00
balibabu
c7c75c0a87 Feat: Enable agent messages to display base64 images (#15212)
### What problem does this PR solve?

Feat: Enable agent messages to display base64 images

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-05-25 19:02:03 +08:00
balibabu
0f92353bd9 Fix: Replace the red highlight at the top of the PDF document with yellow. (#15203)
### What problem does this PR solve?

Fix: Replace the red highlight at the top of the PDF document with
yellow.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-25 17:21:36 +08:00
Ahmad Intisar
e6068a7f7e Fix: table parser metadata (#15127)
### What problem does this PR solve?

This PR improves the table upload flow for CSV/Excel files by allowing
table column role configuration at upload time.

Previously, users had to:
1. Upload and parse a table file.
2. Open parser settings and manually set table column roles.
3. Re-parse the file for the roles to take effect.

This was inefficient and required an unnecessary second parse.

With this change:
1. When the knowledge base uses table parsing, the upload dialog
extracts CSV/Excel headers client-side.
2. Users can choose Auto mode or Manual mode.
3. In Manual mode, users can assign per-column roles before upload.
4. The selected parser config is sent with the upload request and
applied server-side during document creation.

Result: configured table column roles are applied from the first parse.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
2026-05-25 16:05:38 +08:00
buua436
71a52d579c fix: move agent attachment download api (#15146)
### What problem does this PR solve?

move agent attachment download api to the correct route and update
frontend callers

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Notes

- Move the attachment download endpoint from document routes to agent
routes.
- Update frontend download callers to use the agent attachment endpoint.
- Reuse the shared file response header helper instead of duplicating it
in `agent_api.py`.
2026-05-22 15:22:05 +08:00
balibabu
1ed8a118cf Fix: The folder tree menu for moving folders cannot be scrolled. (#15037)
### What problem does this PR solve?

Fix: The folder tree menu for moving folders cannot be scrolled.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-20 14:59:36 +08:00
Magicbook1108
b69a6a5d80 Feat: full optimization on connector dashboard (#14979)
### What problem does this PR solve?

This PR improves the connector dashboard task management experience and
adds better visibility into connector execution logs.

### Overview:

#### Before
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/e4a8ed6f-2e18-4f0f-8528-41a514550052"
/>

#### Now:
<img width="700" alt="Screenshot from 2026-05-18 16-31-30"
src="https://github.com/user-attachments/assets/d4ca193b-847a-49ae-9e4f-5fbca60ea627"
/>

### 1. Add a new logging page to the connector dashboard

A new logging page has been added so users can view connector task
execution logs directly from the connector dashboard.

### 2. Merge the Resume button into Confirm

The separate **Resume** button has been removed. The **Confirm** button
now represents different actions depending on the current task state:

- **Save**: Save form changes and reschedule tasks.
- **Stop**: Cancel currently scheduled or running tasks.
- **Resume**: Create new scheduled tasks after the previous tasks have
been stopped.
- **Start**: Start tasks when no task has been started yet.

### 3. Separate syncing and pruning tasks

Connector tasks are now separated into **syncing** and **pruning**.

Pruning is controlled by the **Sync deleted files** option:

- When **Sync deleted files** is disabled, only syncing tasks are shown.
- When **Sync deleted files** is enabled, both syncing and pruning tasks
are shown.

**Now: Sync deleted files disabled**

<img width="700" alt="Sync deleted files disabled"
src="https://github.com/user-attachments/assets/dbd9232e-614a-407f-a0b1-c109e5fa567d"
/>

**Now: Sync deleted files enabled**

<img width="700" alt="Sync deleted files enabled"
src="https://github.com/user-attachments/assets/1f527f48-ccb3-4ee8-97ca-086891489296"
/>

### 4. Update logs in backend

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/10a95a3f-98c1-4e67-8afa-ddf6cda5b0b2"
/>

### 5. Remove connector resume API

- Removed: `POST /v1/connectors/<connector_id>/resume`
- Replaced by: `PATCH /v1/connectors/<connector_id>`


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-19 10:07:11 +08:00
Wang Qi
13b422037f Refactor: enhance graphrag - part 2 (#14972)
### What problem does this PR solve?
1. expose batch_chunk_token_size for configuration
2. retrieve chunks when build subgraph for the doc, not retreive all
docs chunks at the begining
3. get all chunks for a document, used to be hard coded 10000
4. delete not used method run_graphrag

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

Follow on: #14617
2026-05-18 16:10:21 +08:00
小熊
09d45046e5 Feat/web markdown UI updates (#14214)
### What problem does this PR solve?

LLM/chat and search UIs render Markdown in several places (document
preview, floating chat widget, next-search, etc.). Plugin lists and
behavior were duplicated or inconsistent, and single newlines in model
output were not always rendered as visible line breaks, which hurts
readability for chat-style content.

This PR centralizes shared **remark/rehype** configuration (including
**`remark-breaks`** for newline handling) and wires the main Markdown
surfaces to use it, so behavior is consistent and easier to maintain.


### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-05-15 22:29:44 +08:00
yingjianzh
4c68a6b86c fix(agent): pass top_k and fix similarity weight slider behavior (#14760)
### What problem does this PR solve?

This PR fixes two issues in Agent Retrieval behavior and configuration
UX:

1. `top_k` configured in Agent Retrieval was not passed down to the
backend retriever call, so retrieval could ignore the configured vector
recall limit.
2. Similarity weight slider semantics were confusing in Agent forms
because the Agent field stores `keywords_similarity_weight` while UI
interactions were interpreted as vector weight. This could cause
displayed values and actual behavior to diverge.

This PR ensures Agent retrieval uses configured `top_k`, and makes the
slider behavior consistent and explicit for both vector and keyword
weight modes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-15 10:49:14 +08:00
balibabu
41072ed44d Feat: This enables SelectWithSearch to search by label. (#14925)
### What problem does this PR solve?

Feat: This enables SelectWithSearch to search by label.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: balibabu <assassin_cike@163.com>
2026-05-14 20:33:11 +08:00
plind
dd76653dc1 feat: add tag management for Agents with filtering and sorting (#14774) (#14799)
## Summary
Closes #14774.

Adds free-form tags on agents (UserCanvas) with full UI + API:
- Stored as comma-separated `tags` column on `UserCanvas` with online
migration.
- New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT
/v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query.
- "Edit tags" item in agent dropdown opens a chip-style editor dialog;
tags render as badges on each agent card.
- New "Tags" facet in the agents filter bar, with counts.

## Implementation notes
- **Tag matching is exact-token**: the SQL filter wraps stored tags as
`,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`.
- **Server-side normalization** in `UserCanvasService.update_tags`:
dedup (case-insensitive), per-tag cap of 64 chars, total length capped
at 512 chars to fit the column, commas inside tag values are replaced
with spaces.
- **Tenant authorization**: `PUT /v1/agent/<id>/tags` gates on
`UserCanvasService.accessible(canvas_id, tenant_id)`.
- **Tag listing scope**: `UserCanvasService.list_tags` follows the same
own + team-shared rule as `get_by_tenant_ids`.
- **i18n**: keys added to `en.ts` and `zh.ts` only (per project
convention; other locales fall back).
- **`HomeCard`** gets a non-breaking `extra?: ReactNode` slot for the
chip row; no `src/components/ui/` files modified.

## Test plan
- [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column
exists (`DESCRIBE user_canvas`).
- [ ] Agents page renders cards normally (no console error from missing
field).
- [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog
was unmounting with the dropdown).
- [ ] Typing a tag without pressing Enter and clicking Save persists it
(regression: last typed tag was being dropped).
- [ ] Chip input supports Enter/comma to commit, Backspace on empty to
remove, `×` to remove individual chip.
- [ ] Tag containing a comma sent via API is stored with the comma
replaced by a space.
- [ ] 20 long tags sent via API does not error (length cap silently
truncates).
- [ ] "Tags" filter in the filter bar shows counts and narrows the list.
- [ ] Filtering by `ml` does **not** return agents tagged `ml-ops`.
- [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc.
- [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or
no permission.`
2026-05-13 21:41:32 +08:00
47NoahThompson
9e0f976729 Add widget customization and persistence (#14603)
Introduce comprehensive floating widget customization: add new widget
settings (title, subtitle, footer, colors, mute, streaming) with types
and defaults, and expose them via EmbedDialog UI (split into Embed Setup
and Widget Customization tabs). Persist and load settings through Agent
page by reading/writing globals and wiring an onSaveWidgetSettings
handler to setAgent; show a loading ButtonLoading for saving. Update
embed iframe query params and FloatingChatWidget to honor URL params
(colors, text, mute/streaming) with validation/normalization, color
darkening for gradients, footer link normalization, and improved
styling. Also add copy-to-clipboard in message toolbar, adjust syntax
highlighter layout and Copy button, and add i18n key for muteWidget.

### What problem does this PR solve?

Adds a few fields to the embed widget modal to customize the appearance
of the floating widget when embedded into a page.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Noah <Noah.Thompson@ecn.forces.gc.ca>
2026-05-13 21:13:11 +08:00
Ahmad Intisar
e994051eb9 Feature/generic api connector (#13545)
# feat: Add Generic REST API Connector

## What problem does this PR solve?

RAGFlow supports many specific data source connectors (MySQL, Slack,
Google Drive, etc.), but there was no way to connect an arbitrary REST
API as a data source. Users with custom or third-party APIs had to write
a new connector class for each one.

This PR adds a **generic, configuration-driven REST API connector** that
lets users connect any REST API as a data source entirely through the UI
— no code changes needed per API.

---

## Features

### Core Connector (`common/data_source/rest_api_connector.py`)

- Implements `LoadConnector` and `PollConnector` interfaces for full and
incremental sync
- **Configurable authentication:** None, API Key (custom header), Bearer
Token, Basic Auth
- **Pluggable pagination:** Page-based, Offset-based, Cursor-based, or
None
- Smart page-size inference from user's query parameters to avoid
duplicate/conflicting params
- Configurable request delay between pages to prevent API rate limiting
- Auto-detection of the items array in JSON responses (`items`,
`results`, `data`, `records`, or first list found)
- **Advanced field mapping** with dot-notation (`country.name`), array
wildcards (`newsType[*].name`), type hints, and default values
- Optional content template rendering (`"Title: {title}\nBody: {body}"`)
- HTML stripping for content fields
- Stable document IDs via `hash128` from a configurable ID field or
auto-generated from item content
- Pydantic configuration schema with automatic coercion of UI string
inputs to dicts/lists

### Backend Registration (`rag/svr/sync_data_source.py`,
`common/constants.py`, `common/data_source/config.py`)

- `REST_API` sync class wired into RAGFlow's `func_factory`
- Full sync (`load_from_state`) and incremental polling (`poll_source`)
support
- Credentials and config passed from task to connector following
existing patterns (MySQL, SeaFile, etc.)

### Test Connection Endpoint (`api/apps/connector_app.py`)

- `POST /v1/connector/<id>/test` validates config schema,
authentication, and API connectivity without triggering a sync
- Clear error messages for auth failures vs. config issues

### Frontend UI (`web/src/pages/user-setting/data-source/constant/`)

- **Postman-style configuration:** Base URL, Query Parameters (key=value
per line), Auth, Content Fields, Metadata Fields, Pagination Type
- Auth-type-aware form: fields for API key header/value, Bearer token,
or Basic username/password appear only when relevant
- **Advanced Settings** toggle for: Custom Headers, Max Pages, Request
Delay, Poll Timestamp Field, Request Body (POST)
- Connector icon (SVG) and i18n strings (English)
- **"Test Connection"** button to validate before syncing

---

## Controls & Safety

- Configurable max pages safety cap (default: 1000, adjustable in UI)
- Configurable request delay between pages (default: 0.5s, adjustable in
UI)
- Auth errors (401/403) fail immediately without retries; transient
errors retry with exponential backoff
- Diagnostic logging: auth setup confirmation, request details on
failure, content field extraction status

---

## Type of change

- [x] New Feature (non-breaking change which adds functionality)


##Visual Screenshots of Features
<img width="482" height="510" alt="Screenshot 2026-03-11 at 5 19 52 PM"
src="https://github.com/user-attachments/assets/dcb7ab4a-1622-44f3-bb02-d6f0527314c4"
/>
(Connector can be configured within the external data sources tab)

Configuration Parameters:
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 46 PM"
src="https://github.com/user-attachments/assets/5e154e71-4ab5-4872-bfb2-04f02b73c18a"
/>
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 54 PM"
src="https://github.com/user-attachments/assets/00cb14b7-0bcf-4b94-9d71-34e93369ecb2"
/>

Connection can be tested before attaching to dataset:
<img width="981" height="681" alt="Screenshot 2026-03-11 at 5 21 40 PM"
src="https://github.com/user-attachments/assets/aaa6eeeb-89a7-4349-bc34-2423bf8be9ee"
/>

Ingestion tested with API connector (works perfectly fine):
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 22 30 PM"
src="https://github.com/user-attachments/assets/afcd0d58-cadd-4152-badc-d2f14d96fbec"
/>

Search & Retrieval works as well with metadata flow:
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 23 05 PM"
src="https://github.com/user-attachments/assets/d41ee935-dcf7-4456-b317-22a76ca032c0"
/>

---------

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-13 20:35:01 +08:00
Wang Qi
76d5240fb5 Fix #14801 to allow search dataset list when add (#14841)
### What problem does this PR solve?

Fix #14801 to allow search dataset list when add, following on #14825

<img width="2172" height="857" alt="image"
src="https://github.com/user-attachments/assets/65ea7647-56f4-4c16-8437-121b834811f0"
/>


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-12 19:36:23 +08:00
CaptainTimon
2717ee283f feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679)
### What problem does this PR solve?

Closes #14674.

This PR improves RAPTOR configuration and tree construction while
preserving the existing RAPTOR behavior as the default.

RAPTOR currently builds summary layers with the original UMAP + GMM
clustering path. This PR keeps that default path, and adds:

- A hidden backend tree-builder option:
  - `tree_builder="raptor"`: default, existing RAPTOR behavior.
- `tree_builder="psi"`: rank-aware Psi-style tree builder using original
embedding-space cosine ranking.
- A user-facing clustering method option for the default RAPTOR builder:
  - `clustering_method="gmm"`: existing default.
- `clustering_method="ahc"`: agglomerative hierarchical clustering path.
- A RAPTOR UI setting for `Clustering method` and `Max cluster`.

### What changed

#### Backend

- Added `tree_builder` support for RAPTOR/Psi.
- Added `clustering_method` support for GMM/AHC.
- Kept existing RAPTOR + GMM as the default.
- Added Psi tree building from original-space cosine similarity.
- Added bucketed Psi building controls for large inputs:
  - `raptor.ext.psi_exact_max_leaves`
  - `raptor.ext.psi_bucket_size`
- Added method-aware RAPTOR summary metadata using existing
`extra.raptor_method`.
- Avoided adding a dedicated DB schema field for experimental method
tracking.
- Added cleanup/migration logic to avoid mixing stale RAPTOR summary
trees.
- Added defensive checks for Psi tree construction and summary failures.

#### Frontend/UI

- Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`.
- Added/kept `Max cluster` in RAPTOR settings.
- Enlarged max cluster UI limit to `1024`, matching backend validation.
- Kept AHC editable even when a RAPTOR task has already finished.
- Fixed the UI save payload so `clustering_method` and `tree_builder`
are serialized through `parser_config.raptor.ext`, avoiding backend
validation errors for extra top-level RAPTOR fields.

Example saved RAPTOR config:

```json
{
  "raptor": {
    "max_cluster": 317,
    "ext": {
      "clustering_method": "ahc",
      "tree_builder": "raptor"
    }
  }
}

Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>
2026-05-12 09:42:31 +08:00
Nie WeiYang
1e80be77a2 fix(web): fix incomplete Docx preview in citation reference (#14122)
This PR fixes a UI issue where the .docx document preview was displayed
incompletely when clicking on a citation/reference link during a
knowledge base conversation.

### What problem does this PR solve?

The Issue:
In the chat interface, when a user clicks the source citation at the end
of an answer, the DocPreviewer opens. However, for .docx files, if the
content exceeded the window height, it was truncated and unscrollable,
preventing users from reading the full referenced text.

Changes:
web/src/components/document-preview/doc-preview.tsx: Added the
overflow-auto Tailwind class to the DocPreviewer root container to
ensure scrollbars appear automatically when content overflows.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: nie.weiyang <nie.weiyang@embedway.com>
2026-05-11 16:17:48 +08:00
Wang Qi
3838770e7a GraphRAG feature - Part 1 - add spacy to extract entity and relation (#14670)
### What problem does this PR solve?

GraphRAG feature - Part 1 - add spacy to extract entity and relation

<img width="1621" height="1288" alt="image"
src="https://github.com/user-attachments/assets/aadeddad-94da-46c6-adad-9c3784181f61"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-11 12:59:59 +08:00
很拉风的James
6cb4bc2947 Fix: Radio.Group cloneElement crashes on non-element children (#14407)
### What problem does this PR solve?

`Radio.Group` in `web/src/components/ui/radio.tsx` injects the parent's
`disabled` prop into each child via `React.cloneElement` with
`as React.ReactElement` and no validation.

This throws at runtime when a consumer passes strings, numbers, `null`,
`false`, or other non-element nodes, while the cast hides the unsafe
access from TypeScript.

Use `React.isValidElement<RadioProps>(child)` as a type guard before
calling `cloneElement`. Non-element children pass through unchanged,
and `child.props` access becomes type-checked without an `as` cast.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-11 09:54:42 +08:00
chanx
8ac14b597f Fix: Some bugs (#14734)
### What problem does this PR solve?

Fix: Some bugs
- Error during batch modification of metadata in the Knowledge Base
- Manually configured metadata is not displayed in search settings

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-09 17:40:22 +08:00
buua436
de2abe9ed8 Fix: tag parser id (#14724)
### What problem does this PR solve?
tag parser id
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-09 14:29:09 +08:00
Tim Wang
1bcb6deb6f Fix: collapsible thinking display and separate deep research retrieval tag (#14613)
## Summary

- **Collapsible thinking**: Replace `<section>` with `<details>` for
`<think>` content, so model thinking output is collapsed by default
(click to expand). Works for all models that output `<think>` tags
(Qwen3, DeepSeek, Gemini, Claude, etc.).
- **Fix double thinking tags**: When reasoning/deep research mode is
enabled in knowledge base chat, both the retrieval progress and model
thinking were wrapped in `<think>` tags, producing two "Thinking..."
blocks. Now retrieval progress uses a dedicated `<retrieving>` tag
rendered as a separate "Retrieving..." collapsible with a distinct green
accent.

### Before
- Thinking content displayed as flat gray-bordered `<section>`,
occupying significant screen space
- Deep research + model thinking both use `<think>` → two identical
"Thinking..." blocks

### After
- Thinking content collapsed by default in a `<details>` element, click
"Thinking..." to expand
- Deep research shows "Retrieving..." (green border), model thinking
shows "Thinking..." (gray border)

## Changes

**Backend (`api/db/services/dialog_service.py`)**
- Deep research callback: replace `start_to_think`/`end_to_think` marker
flags with direct `<retrieving>`/`</retrieving>` answer text

**Frontend**
- `web/src/utils/chat.ts`: `replaceThinkToSection()` now uses
`<details>` instead of `<section>`; add new
`replaceRetrievingToSection()`
- 4 tsx files: import and pipe `replaceRetrievingToSection`, whitelist
`details`, `summary`, `retrieving` in DOMPurify `ADD_TAGS`
- 4 less files: `section.think` → `details.think` with `<summary>`
styles; add `details.retrieving` with green accent; dark mode and RTL
variants

## Test plan
- [ ] Open a chat WITHOUT knowledge base, ask a question to a model with
thinking (e.g. Qwen3) → thinking content should be collapsed by default,
click "Thinking..." to expand
- [ ] Open a chat WITH knowledge base and reasoning enabled, ask a
question → "Retrieving..." (green) shows retrieval progress,
"Thinking..." (gray) shows model thinking, each independently
collapsible
- [ ] Verify dark mode renders correctly for both collapsible blocks
- [ ] Verify RTL layout renders correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <wanghualoong@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-08 14:40:00 +08:00
buua436
f703169117 Refa: migrate document preview/download to RESTful API (#14633)
### What problem does this PR solve?

migrate document preview/download to RESTful API

### Type of change
- [x] Refactoring
2026-05-08 13:26:13 +08:00