ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Tim Wang	f0f10b6092	Fix: UserFillUp interactive forms not working in agent explore mode (#14589) ## Summary - Backend: `_iter_session_completion_events` in `agent_api.py` was filtering out `user_inputs` and `workflow_finished` SSE events, causing agents with UserFillUp components to silently fail in explore mode: the interactive form never appeared, while the same agent worked correctly in run (editor) mode. - Frontend: `SessionChat` component in explore mode was missing `DebugContent` children rendering inside `MessageItem`, so even if the backend forwarded the events, the form UI would not render. Added `DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and input-disabling logic to match the run mode's `chat/box.tsx` behavior. ## What was changed ### Backend (`api/apps/restful_apis/agent_api.py`) - Line 266: Added `"user_inputs"` and `"workflow_finished"` to the allowed event filter in `_iter_session_completion_events` ### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`) - Added imports: `DebugContent`, `MarkdownContent`, `useAwaitCompentData`, `useParams` - Added `sendFormMessage` from `useSendSessionMessage()` hook - Added `useAwaitCompentData` hook for form state management - Added `DebugContent` as `MessageItem` children for the latest assistant message (renders UserFillUp form) - Added `MarkdownContent` + submitted values display for previous assistant messages - Updated `NextMessageInput` disabled states to respect `isWaitting` (form submission in progress) ## Test plan - [x] Agent with UserFillUp component (e.g., email draft with send/edit/cancel options) shows interactive form in explore mode - [x] Same agent continues to work correctly in run (editor) mode - [x] Form submission sends data back to the agent and workflow continues - [x] Input field is disabled while waiting for form submission - [ ] Agents without UserFillUp components are unaffected in explore mode 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:17 +08:00
kpdev	212429bf9d	fix(api): gate sandbox artifact download on agent session ownership (#16169 ) Fixes #16168 ## Summary - Add session-scoped authorization for `GET /api/v1/documents/artifact/<filename>` - Allow download only when the artifact filename appears in the caller's `api_4_conversation` message and `UserCanvasService.accessible(dialog_id, user_id)` passes - Deny with generic `"Artifact not found."` before storage access (no cross-user enumeration) - Return 4xx when the blob is missing (existing behavior preserved) ## Approach Sandbox artifacts are runtime CodeExec outputs, not KB documents — this uses the same session gate pattern as `agent_chat_completion`, not `DocumentService.accessible`. ## Test plan - [x] Unit: denied when filename not referenced in user sessions - [x] Unit: denied when agent canvas is not accessible - [x] Unit: authorized user receives bytes; missing blob returns `"Artifact not found."` - [ ] `pytest test/testcases/test_web_api/test_document_app/test_document_metadata.py -k get_artifact` --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
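The session gate described in #16169 can be sketched roughly as follows. This is an illustration only: `may_download_artifact`, the session dictionaries, and the `agent_accessible` callback are hypothetical stand-ins for the `api_4_conversation` lookup and `UserCanvasService.accessible`, not the code in the PR.

```python
from typing import Callable, Iterable, Mapping


def may_download_artifact(
    filename: str,
    caller_sessions: Iterable[Mapping],
    agent_accessible: Callable[[str], bool],
) -> bool:
    """Allow the download only if one of the caller's own sessions references
    the file and the caller can still access the agent behind that session."""
    for session in caller_sessions:
        # Hypothetical message shape: {"content": "..."}.
        referenced = any(
            filename in str(message.get("content", ""))
            for message in session.get("messages", [])
        )
        if referenced and agent_accessible(session["dialog_id"]):
            return True
    # Same answer for "not yours" and "does not exist", so nothing is enumerable.
    return False
```

A caller would run this check before touching storage and respond with the generic `"Artifact not found."` whenever it returns `False`.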
Renzo	6079ded70b	fix: require explicit anonymous webhook access (#14890 ) ### What problem does this PR solve? Fixes #14882 Agent webhook execution currently fails open when the saved webhook `security` block is missing/empty, or when `auth_type` is set to `none`. This allows unauthenticated webhook invocation without an explicit operator opt-in. This PR makes anonymous webhook access explicit: - Rejects missing or empty webhook security config. - Requires `allow_anonymous: true` when `auth_type` is `none`. - Preserves explicit anonymous webhooks by having the frontend serialize `allow_anonymous: true` when the user selects `None` auth. - Updates webhook unit tests to cover both denied implicit-anonymous configs and allowed explicit-anonymous configs. ### Type of change - [x] Bug Fix - [x] Security hardening - [x] Test ### Tests - [x] `ZHIPU_AI_API_KEY=dummy uv run python -m pytest --confcutdir=test/testcases/test_web_api/test_agent_app test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py` - [x] `uv run ruff check api/apps/restful_apis/agent_api.py test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py` - [x] `npm exec eslint src/pages/agent/utils.ts src/pages/agent/form/begin-form/schema.ts` --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
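A minimal sketch of the fail-closed rule from #14890, assuming the saved `security` block is a plain dictionary. The function name and the returned reason strings are invented for illustration; only the `auth_type` and `allow_anonymous` keys come from the PR description.

```python
from typing import Any, Mapping, Optional, Tuple


def validate_webhook_security(
    security: Optional[Mapping[str, Any]],
) -> Tuple[bool, str]:
    """Decide whether a webhook call may proceed to the next auth step."""
    if not security:
        # Missing or empty config used to fail open; now it is rejected.
        return False, "missing security config"
    auth_type = security.get("auth_type")
    if not auth_type:
        return False, "missing auth_type"
    if auth_type == "none":
        # Anonymous access needs an explicit operator opt-in.
        if security.get("allow_anonymous") is True:
            return True, "explicit anonymous access"
        return False, "auth_type none requires allow_anonymous: true"
    # Any other scheme is left to its own verifier (not shown here).
    return True, f"defer to {auth_type} verification"
```

The strict `is True` comparison is a deliberate choice in this sketch: a truthy string such as `"false"` should not count as an opt-in.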
philluiz2323	43a9d53c72	fix(agent): enforce tenant ownership on agentbots completions/inputs (#15457 ) ### What problem does this PR solve? Fixes #15456. The SDK agent-bot routes `POST /api/v1/agentbots/<agent_id>/completions` and `GET /api/v1/agentbots/<agent_id>/inputs` (`api/apps/restful_apis/bot_api.py`) authenticate the caller with a beta API token — which only yields the caller's `tenant_id` — but then load and run the agent named in the URL without verifying the agent belongs to the caller's tenant. `UserCanvasService.get_agent_dsl_with_release` even accepts a `tenant_id` it never uses, and `begin_inputs` calls `get_by_id` directly. Any holder of a single valid beta token could therefore run another tenant's agent (leaking its DSL/prompts/tool config) or read another tenant's agent metadata and begin input form, just by substituting a victim `agent_id`. This PR adds the project's existing ownership gate, `UserCanvasService.accessible(agent_id, tenant_id)`, to both endpoints right after token authentication — mirroring the checks already enforced on the equivalent first-party routes in `api/apps/restful_apis/agent_api.py` (lines 75/578/775) and on the sibling `chatbot_completions` / `create_agent_session` / `delete_agent_session` handlers in the same file. On failure it returns the same `Can't find agent by ID: <id>` message already used by `begin_inputs`, so it does not reveal whether an `agent_id` exists in another tenant. Added a regression test (`test/unit_test/api/apps/restful_apis/test_agentbots_access_control.py`, following the existing stubbed-loader pattern from `test_get_agent_session.py`) asserting that an inaccessible `agent_id` is rejected before the agent is loaded (`begin_inputs`) or executed (`completions`), and that an accessible agent still proceeds. 
### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
Rene Arredondo	7ecc0908ef	fix(agent): authenticate "Thinking" button in shared/embedded chat via beta token (#14985 ) (#15238 ) ## Summary Fixes #14985 — clicking the Thinking button in a shared/embedded chat returns 401 and bounces the user to the login page, even though the same share page can chat with the agent just fine. ## Root cause In shared chat, `useGetSharedChatSearchParams` binds `conversationId` to the URL's `shared_id` query param — which is the beta APIToken, not the real agent id. That `conversationId` propagates through the component tree: ```tsx <WorkFlowTimeline canvasId={conversationId}> → useFetchMessageTrace(canvasId) → GET /api/v1/agents/<sharedId>/logs/<messageId> ``` But `/agents/<agent_id>/logs/<message_id>` is decorated with `@login_required` (`api/apps/restful_apis/agent_api.py:842-846`). The share page only holds the beta token — there is no session JWT — so the request 401s and quart-auth redirects to the login page. The reporter's server log matches exactly: ``` load_user from jwt got exception No b'.' found in value load_user: No APIToken found for token=ULG10SWG3E... Unauthorized request (quart_auth) GET /api/v1/agents/394013f8d42211f0bad6123fa55e8ed9/logs/96fd72e2-... 1.1 401 ``` The `394013f8...` segment in the URL is the `shared_id` (beta token), not an actual agent id. `_load_user` already accepts the regular `APIToken.token` field, but not `APIToken.beta`, by design — beta is a much weaker share-link credential than a personal API key. The sibling endpoints `/agentbots/<id>/completions` and `/agentbots/<id>/inputs` already use the right auth pattern for this scope (beta-token via `_get_sdk_authorization_token` → `APIToken.query(beta=token)`). Trace just didn't have a parallel. ## Fix ### Backend (`api/apps/restful_apis/bot_api.py`) Added a beta-token sibling endpoint: ``` GET /api/v1/agentbots/<shared_id>/logs/<message_id> ``` - Same auth shape as the existing `agentbots` endpoints. 
- The `<shared_id>` path segment is a client-supplied label only. The real `agent_id` used to build the Redis key (`<agent_id>-<message_id>-logs`) is taken from `APIToken.dialog_id` on the looked-up token, so the endpoint never trusts client-supplied identifiers for the data lookup. - Returns the same `{data: ...}` shape as the existing `/agents/<id>/logs/<message_id>` endpoint, so the frontend doesn't need to reshape the response. ### Frontend - `web/src/utils/api.ts`: added `sharedTrace(sharedId, messageId)` URL builder. - `web/src/services/agent-service.ts`: added `fetchSharedTrace({ shared_id, message_id })`. - `web/src/hooks/use-agent-request.ts`: `useFetchMessageTrace` takes an optional `isShare` argument. When set, it calls `fetchSharedTrace`; `isShare` is also folded into the `queryKey` so the two modes never share cached results. - `web/src/pages/agent/log-sheet/workflow-timeline.tsx`: forwards the already-existing `isShare` prop into the hook. All other existing call sites of `useFetchMessageTrace` (webhook timeline, pipeline log, dataflow result) pass no `isShare` argument → undefined → falsy → unchanged behavior. ## Test plan - [ ] In the regular Agent UI (logged-in user): open the trace / log sheet for any message and click into "Thinking" — the timeline should still load via `/agents/<id>/logs/<msg>`, same as before. - [ ] From the Agent page, click Chat in new tab to open `/chat/share?shared_id=<token>&from=agent`. Send a message, wait for a response, then click Thinking on the assistant turn. The trace panel should load instead of redirecting to the login page. - [ ] Same flow but with the agent embedded in an iframe ("Embed into webpage") — confirm there is no login redirect. - [ ] In DevTools → Network, confirm the share-chat trace request goes to `/api/v1/agentbots/<sharedId>/logs/<msgId>` and returns 200 with the same JSON shape as the logged-in path. 
- [ ] Confirm the chat completions, inputs, and upload flows in the share page still work — they were not touched. - [ ] Send a bogus / expired beta token to the new endpoint and confirm it returns the standard "Authentication error: API key is invalid!" response (no traceback, no 500). - [ ] Run `uv run pytest` to make sure no existing tests regress. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
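The key-derivation rule in #15238 (take the agent id from the looked-up token, never from the URL) might look like the sketch below. The in-memory `tokens_by_beta` mapping is a hypothetical stand-in for `APIToken.query(beta=token)`; only the `<agent_id>-<message_id>-logs` key format and the `dialog_id` field are taken from the PR text.

```python
from typing import Mapping, Optional


def shared_trace_key(
    tokens_by_beta: Mapping[str, Mapping[str, str]],
    beta_token: str,
    shared_id: str,
    message_id: str,
) -> Optional[str]:
    """Build the trace-log key for a share-page request, or None if the token is unknown."""
    record = tokens_by_beta.get(beta_token)
    if record is None:
        return None  # caller answers with the standard invalid-API-key error
    # shared_id from the path is deliberately ignored: it is only a label.
    agent_id = record["dialog_id"]
    return f"{agent_id}-{message_id}-logs"
```

Because the path segment never reaches the key, substituting another value for `<shared_id>` cannot redirect the lookup to a different agent's logs.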
jony376	7b81f63653	fix(agent): bind session_id to path agent_id on GET/DELETE agent sessions (#15374 ) ## Related issues Closes #15128 ### What problem does this PR solve? `GET` and `DELETE` `/api/v1/agents/<agent_id>/sessions/<session_id>` verified canvas access for `agent_id` in the URL but loaded/deleted sessions only by `session_id`, without checking `conv.dialog_id == agent_id`. Any user with access to any agent could read or delete another agent's `API4Conversation` session (messages, references, DSL, etc.) when they knew the session UUID. Agent completions in the same file already enforce this binding; chat sessions do too — these two routes were inconsistent. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/agent_api.py` \| Require `conv.dialog_id == agent_id` in `get_agent_session` and `delete_agent_session_item`; return generic `"Session not found!"` on mismatch \| \| `test/unit_test/api/apps/restful_apis/test_get_agent_session.py` \| Add IDOR regression tests for GET/DELETE; fix success fixture to include `dialog_id`; track `delete_by_id` calls \| ### Test plan - [x] Unit tests added for GET/DELETE IDOR and success paths - [ ] `pytest test/unit_test/api/apps/restful_apis/test_get_agent_session.py` Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
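The binding check added in #15374 can be illustrated with a small sketch. The `Conversation` dataclass and the dictionary-backed store are assumptions made for the example; the real handlers work against `API4Conversation` rows.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Conversation:
    id: str
    dialog_id: str  # id of the agent that owns this session


def load_session_for_agent(
    store: Mapping[str, Conversation], agent_id: str, session_id: str
) -> Optional[Conversation]:
    """Return the session only when it belongs to the agent named in the path."""
    conv = store.get(session_id)
    if conv is None or conv.dialog_id != agent_id:
        # Unknown session and foreign session look identical to the caller,
        # who receives the generic "Session not found!" response.
        return None
    return conv
```

Collapsing both failure cases into one result is what keeps a guessed session UUID from confirming that the session exists under another agent.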
Zhichang Yu	faef22c18a	Harden closed-advisory fixes (#16409 ) ## Summary - harden reopened advisory fixes across REST connector, invoke, document downloads, and markdown rendering - add targeted regression coverage for redirect-safe SSRF handling, invoke SSRF checks, document access control, and markdown sanitization - verify each referenced GHSA against the original GitHub advisory text and align the closed-advisory plan with the implemented remediation ## What changed - add tenant access checks to document download endpoints to avoid cross-tenant document disclosure - add per-hop SSRF validation, DNS pinning, redirect handling, and redirect limits to the REST API connector - ensure invoke requests validate and pin the resolved host and never follow redirects implicitly - keep the generic rate-limited request path wrapped, not just GET and POST helpers - sanitize markdown HTML before rendering in the highlight markdown component ## Validation - `cd web && npm test -- --runInBand src/components/highlight-markdown/__tests__/index.test.tsx` - `.venv/bin/python -m pytest -q test/unit_test/data_source/test_rest_api_connector.py` - targeted `test/testcases/test_web_api/...` unit additions were reviewed, but the suite cannot be executed end-to-end in this environment because parent `test/testcases/conftest.py` requires a local service on `127.0.0.1:9380` ## Notes - all GHSA entries referenced by the plan were checked against the original GitHub advisory text, not sampled - the closed-advisory plan document was updated locally during review, but is intentionally not included in this PR	2026-06-29 09:45:16 +08:00
Zhichang Yu	195bfffb5e	fix(security): address 93 CodeQL code-scanning alerts across 61 files (#16407 ) ## Summary Resolves all 93 open alerts at https://github.com/infiniflow/ragflow/security/code-scanning by rule: \| Rule \| Count \| Treatment \| \|------\|-------\|-----------\| \| py/clear-text-logging-sensitive-data \| 23 \| Real fix — log scrubbing \| \| go/path-injection \| 15 \| Real fix where possible, suppression with rationale \| \| go/request-forgery \| 8 \| Suppression with rationale (operator-controlled URLs) \| \| go/clear-text-logging \| 10 \| Real fix — log scrubbing \| \| go/unsafe-quoting \| 5 \| Real fix — escape or refactor \| \| go/sql-injection \| 3 \| Real fix — orderby whitelist + CodeQL comment \| \| go/uncontrolled-allocation-size \| 2 \| Real fix — cap to 1024 \| \| go/incorrect-integer-conversion \| 3 \| Real fix — ParseInt + range check \| \| go/insecure-hostkeycallback \| 1 \| Real fix — known_hosts file \| \| go/disabled-certificate-check \| 2 \| Suppression with rationale \| \| go/command-injection \| 1 \| Suppression (sanitized via shq()) \| \| go/email-injection \| 1 \| Suppression with rationale \| \| go/cookie-httponly-not-set \| 1 \| Suppression (SPA bootstrap) \| \| js/stack-trace-exposure \| 1 \| Real fix — generic client message \| \| js/prototype-pollution-utility \| 1 \| Real fix — reject __proto__/constructor/prototype \| \| py/weak-sensitive-data-hashing \| 1 \| Real fix — MD5 → SHA-256 \| \| py/incomplete-url-substring-sanitization \| 3 \| Real fix — urlparse(hostname) \| \| py/paramiko-missing-host-key-validation \| 1 \| Real fix — load_system_host_keys + RejectPolicy \| \| cpp/integer-multiplication-cast-to-long \| 2 \| Real fix — cast to size_t \| ## Real fixes (with measurable security improvement) SSH host key verification (Go + Python) Replace `InsecureIgnoreHostKey()` / `paramiko.AutoAddPolicy()` with proper host key verification against a known_hosts file (configurable via `SSH_KNOWN_HOSTS` env / `known_hosts` 
config field; fail-closed when unset). Loads `~/.ssh/known_hosts` first via `load_system_host_keys()` so existing setups keep working. SQL injection in `user_canvas` Add `userCanvasOrderableColumns` whitelist + `userCanvasOrderClause` helper. Both `GetList()` and `ListByTenantIDs()` now route the user-supplied `orderby` query param through the helper, defaulting to `create_time` on miss. SQL injection in `pipeline_operation_log` Existing whitelist documented via CodeQL comment. Real SQL injection in `infinity/chunk.go:931` Escape `'` → `''` on user-controlled `questionText` before splicing into `filter_fulltext(...)` SQL filter. Real SQL injection in `elasticsearch/sql.go:75` Defense-in-depth escape on tokenizer output before splicing into `MATCH(...)`. Python code injection in `result_protocol.go` Replace raw JSON literal embedding into Python/JS expressions with base64 + `json.loads` / `JSON.parse(Buffer.from(..., 'base64').toString('utf8'))`. Eliminates both the unsafe-quoting sink and the brittleness of mixing JSON true/false/null with Python syntax. URL substring check bypass in `embedding_model.py` Replace `if "dashscope-intl.aliyuncs.com" in u` with `urlparse(u).hostname == "dashscope-intl.aliyuncs.com"` so a base_url like `https://attacker.example/?u=dashscope-intl.aliyuncs.com` cannot bypass the routing. Prototype pollution in `setNestedValue` (TS) Reject `__proto__`/`constructor`/`prototype` keys before any assignment. Integer overflow - scrypt params via `ParseInt` + non-positive check (`internal/common/password.go`) - `topN` and `n` caps to 1024 (retrieval_service.go, dataset.go) - `nallocstatesize` cast to `size_t` (cpp/re2/onepass.cc) Cookie httponly* Set explicitly with rationale: this is the OAuth bootstrap cookie intentionally read by the SPA. Stack trace exposure Replace `error.message` in HTTP 500 response with generic `"internal error"`; full error still logged server-side via `console.error`. 
Weak hashing MD5 → SHA-256 for deterministic `conv_id` derivation (`conversation_service.py`). Log scrubbing Remove or redact user-controlled / sensitive content from clear-text logs across 8 ingestion parsers, `llm_service.py` ×11, `tenant_llm_service.py` ×7, `misc_utils.py` ×4, `redis_conn.py` ×10, `conftest.py` ×4, `init_data.py`, `dataset_api_service.py`, `generator.py`, `mysql_migration.py`, `cli.go`, `user_command.go`, `pdf_parser.go`. Most patterns converted to parameterized logging (`logging.info("...: %d", n)`) or static messages. ## CodeQL suppressions (each with rationale) For alerts where the data flow is genuinely safe but CodeQL can't see the context — operator-controlled URLs, sanitized inputs, etc. — I added `// codeql[go/<rule>] <rationale>` annotations rather than dismissing them, so future readers can audit the rationale inline: - `internal/agent/component/invoke.go:135` — Invoke is a generic canvas HTTP client - `internal/service/langfuse.go` ×2 — host is per-tenant operator config - `internal/service/file.go:1184` — already SSRF-guarded by `assertURLSafe` - `internal/utility/mcp_client.go` ×3 — already `AssertURLSafe` + IP-pinned - `internal/entity/models/bedrock.go` — sigv4-signed request, URL can't be tampered - `internal/service/deep_researcher.go:269` — `callback` is SSE display string, not SQL - `internal/engine/infinity/chunk.go:346` — UUIDs can't contain `'` (RFC 4122) - `internal/cli/common_command.go` ×2 — CLI trusts operator-configured URL - `internal/utility/smtp.go:194` — msg is server-built, not user form input - `internal/entity/models/*` ×14 (path-injection) — audio file paths are caller-supplied ## Test plan - ✅ All 13 modified Go packages build cleanly - ✅ 663 tests pass across `internal/agent/sandbox`, `internal/common`, `internal/agent/component`, `internal/engine/infinity`, `internal/dao` - ✅ All 11 modified Python files parse via `ast.parse` - ✅ TypeScript `tsc --noEmit` clean on the modified `use-provider-fields.tsx` - ✅ 
`node --check` clean on the modified JS file 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-06-29 09:45:16 +08:00
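The URL substring bypass fixed in `embedding_model.py` is easy to reproduce in isolation. The sketch below contrasts the two checks quoted in #16407; the helper names are made up for the example, and the `/api/v1` path on the legitimate URL is an assumption.

```python
from urllib.parse import urlparse

INTL_HOST = "dashscope-intl.aliyuncs.com"


def is_intl_by_substring(base_url: str) -> bool:
    """Old check: matches the host name anywhere in the URL."""
    return INTL_HOST in base_url


def is_intl_by_hostname(base_url: str) -> bool:
    """New check: compares only the parsed host component."""
    return urlparse(base_url).hostname == INTL_HOST


attacker = "https://attacker.example/?u=dashscope-intl.aliyuncs.com"
legit = "https://dashscope-intl.aliyuncs.com/api/v1"

print(is_intl_by_substring(attacker), is_intl_by_hostname(attacker))
print(is_intl_by_substring(legit), is_intl_by_hostname(legit))
```

Only the hostname comparison rejects the attacker-controlled URL, since the query string is no longer part of what gets matched.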
Zhichang Yu	477f2fcebd	feat[Go]: port agent webhook trigger, agent file upload/download, component input-form + debug endpoints from Python (#16403) - [x] New Feature (non-breaking change which adds functionality)	2026-06-29 09:45:16 +08:00
Zhichang Yu	f58fae5fb7	feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396) - [x] New Feature (non-breaking change which adds functionality)	2026-06-29 09:45:16 +08:00
Wang Qi	638b59fbcd	Fix handle move file failed (#16384 ) Follow on PR: #16350	2026-06-26 18:46:21 +08:00
Wang Qi	ac9469e5f5	Fix add VLLM without apikey will fail (#16352 )	2026-06-25 17:17:29 +08:00
buua436	479a9a715e	feat: unify provider id or name routing (#16336 )	2026-06-25 13:04:21 +08:00
kpdev	68d2ca0ff1	fix(api): use dataset-owner tenant for legacy /chunks docstore cleanup (#15961 )	2026-06-24 14:24:40 +08:00
Ambercssa	e9cdd09b67	fix(agent): handle different reference data formats (#16276 )	2026-06-24 13:33:59 +08:00
buua436	ba4021a9de	fix: restore dataflow rerun and detail payload (#16292 )	2026-06-24 13:06:06 +08:00
Wang Qi	a4f325be24	Fix: add /v1/document/upload_info -> /api/v1/documents/upload back (#16264 )	2026-06-23 17:47:55 +08:00
buua436	aba5d172bd	feat: add whatsapp web qr chat channel (#16238 ) Adds a WhatsApp chat channel backed by a QR-based web login flow so users can connect without manual token setup.	2026-06-23 17:45:31 +08:00
Wang Qi	5ca1686ac7	Fix that agent cannot be the same name (#16192)	2026-06-18 19:10:21 +08:00
euvre	9bd53ce675	fix: return full record in get_ingestion_log (#16120 ) ### What problem does this PR solve? The `get_ingestion_log` endpoint (both Python `dataset_api_service.get_ingestion_log` and Go `DatasetService.GetIngestionLog`) was returning only the dataset-level field set, which omits critical fields such as `dsl`, `document_id`, `parser_id`, `document_name`, `pipeline_id`, etc. This caused the front-end dataflow-result page to be unable to render the pipeline timeline and chunks when viewing a single ingestion log, regardless of whether the log was a dataset-level operation (graph/raptor/mindmap) or a per-file parse. ### Background `PipelineOperationLogService` provides two field sets: \| Method \| Fields \| \|---\|---\| \| `get_dataset_logs_fields` \| Minimal set (progress, status, timestamps, etc.) \| \| `get_file_logs_fields` \| Superset — includes `document_id`, `dsl`, `parser_id`, `document_name`, `pipeline_id`, … \| When listing logs, the API correctly distinguishes dataset-level vs file-level logs and uses the appropriate converter. However, when fetching a single log by ID, both the Python and Go implementations were hardcoded to the dataset-level set, dropping the extra fields that the front-end needs.	2026-06-17 13:03:51 +08:00
Wang Qi	8067e97f0d	Refactor: rename /chat_channels to /chat-channels (#16099 )	2026-06-16 19:15:43 +08:00
Kevin Hu	15f50e5cb2	fix: rename dialog_id to chat_id in chat_channel (backend + frontend) (#16096 ) ## Summary - The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id` via a migration (added in a prior commit). - Aligns the REST API layer (`chat_channel_api.py`, `chat_channel_service.py`) to use `chat_id` consistently. - Updates the frontend (`interface.ts`, `hooks.ts`, `connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write `chat_id` instead of `dialog_id`. - The joined `dialog_name` alias in the list query is unchanged (backend still returns it under that name). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-16 19:02:20 +08:00
Lynn	b4a161b50e	Fix: filter unsupported model_type (#16062 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 13:15:42 +08:00
Kevin Hu	5a817762fa	Refactor: Change table chat_channel status data type. (#16061 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring	2026-06-16 12:02:12 +08:00
buua436	8e235b7b95	fix: add legacy chat/completions mode (#16014 ) ### What problem does this PR solve? Adds a legacy mode for /chat/completions that restores v0.23.0-style output by converting start_to_think/end_to_think back into raw <think></think> markers and streaming cumulative answer text. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-16 10:34:06 +08:00
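The legacy conversion described in #16014 could be sketched as a small generator. The event dictionaries here are a hypothetical shape chosen for the example (a boolean `start_to_think` or `end_to_think` flag, or an `answer` delta); the real stream format is not shown in the commit message.

```python
from typing import Dict, Iterable, Iterator


def to_legacy_stream(events: Iterable[Dict]) -> Iterator[str]:
    """Turn think-boundary events into raw <think></think> markers and
    yield the cumulative answer text after every event."""
    answer = ""
    for event in events:
        if event.get("start_to_think"):
            answer += "<think>"
        elif event.get("end_to_think"):
            answer += "</think>"
        else:
            # Assumed: ordinary events carry an incremental text delta.
            answer += event.get("answer", "")
        yield answer
```

Each yielded value repeats everything produced so far, which is the cumulative behavior the v0.23.0-style clients are said to expect.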
Lynn	47495c1f6a	Feat: model provider (#16028 ) ### What problem does this PR solve? Feat: - Allow upsert model_type for instance model Fix: - Allow create instance with duplicate api_key ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-06-15 19:10:33 +08:00
dripsmvcp	53d4d9b3bd	fix(api): return 4xx not 500 when attachment blob is missing (#15509 ) Guard the agent-attachment download against a missing or empty storage blob so the caller gets a structured 4xx (`Document not found!`) instead of an HTTP 500. Same bug class as #15365 on document preview. Resolve #15502	2026-06-15 15:41:49 +08:00
Yingfeng	b5bea72e4b	Add git-like file commit API (#15978 ) ### What problem does this PR solve? \| # \| Method \| Endpoint \| Description \| Git Equivalent \| \|---\|--------\|----------\|-------------\|----------------\| \| 1 \| `POST` \| `/api/v1/{prefix}/{folder_id}/commits` \| Create a snapshot commit with file changes (add/modify/delete/rename) \| `git add` + `git commit` \| \| 2 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits` \| List commit history (paginated) \| `git log` \| \| 3 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` \| Get commit detail with file changes \| `git show` \| \| 4 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` \| List file changes in a commit \| `git show --name-status` \| \| 5 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` \| Compare two commits and return differences \| `git diff` \| \| 6 \| `GET` \| `/api/v1/{prefix}/{folder_id}/changes` \| Get uncommitted changes (add/modify/delete) \| `git status` \| \| 7 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` \| Get the folder tree snapshot at commit time \| `git ls-tree` \| \| 8 \| `GET` \| `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content` \| Get a file's content as it existed in a specific commit \| `git show HEAD:file` \| \| 9 \| `GET` \| `/api/v1/{prefix}/{file_id}/versions` \| Get version history for a specific file across all commits \| `git log -- file` \| Where `{prefix}/{id}` can be: - `folders/{folder_id}` — direct folder access - `workspaces/{workspace_id}` — alias of `folders/{folder_id}` - `datasets/{dataset_id}` — resolves to the dataset's folder - `memories/{memory_id}` — resolves to the memory's folder - `skills/{skill_id}` — resolves to the skill's folder ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-06-15 11:19:56 +08:00
Wang Qi	59d4203947	Fix last login time (#16004)	2026-06-15 10:06:24 +08:00
Kevin Hu	b5a426e6e0	Feat: chat channels — connect assistants to external messaging bots (#15850 ) ### What problem does this PR solve? #15844 Adds a Chat channels capability so a RAGFlow assistant (Dialog) can be exposed as a bot on external messaging platforms (Feishu/Lark, Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot in the UI, connects it to an assistant, and inbound messages are answered from that assistant's knowledge base — replies are delivered back on the channel. Feishu/Lark is implemented and tested end-to-end. Discord, Telegram, LINE, and WeCom are scaffolded against the same interface; the remaining listed channels are tracked as follow-ups. ### Design Backend - New `chat_channel` table (`tenant_id`, `name`, `channel`, `config` JSON holding `{credential: {...}}`, `dialog_id`, `status`) + `ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`. - Channel framework under `api/channels/`: a `core` registry + per-channel packages that self-register a builder and implement a common `Channel` interface (`start`/`stop`/`send` + inbound normalization) over `IncomingMessage`/`OutgoingMessage`. - Embedded reconcile loop in `ragflow_server` (`api/channels/bootstrap.py`): loads enabled bots, and starts/stops/restarts them as rows change (no server restart needed). Inbound messages run the connected dialog via the non-streaming completion path, keeping per-end-user conversation history. - Missing optional channel SDKs degrade gracefully (channel skipped with a warning; others unaffected). Channel-level errors are logged, not crashed. - Feishu's WebSocket client runs in a dedicated thread with its own event loop to avoid cross-loop/contextvars conflicts with the channel runtime. Frontend - Settings → Chat channels panel: available-channels grid + configured-bots list with add/edit/delete and a Connect assistant popup that binds a bot to a dialog. 
- Brand icons via simple-icons / reused shared data-source assets, with colored fallbacks for brands not available. - Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix so the settings page no longer highlights the Chat tab. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Notes - DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id` is also covered by a `migrate_db` `alter_db_add_column` for existing installs. - Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`, `line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies. - Screenshots / per-channel credential docs to follow.	2026-06-12 18:21:30 +08:00
Carl Harris	a2de880b6d	fix(profile): enforce profile name validation and input constraints (#15694 ) ### What problem does this PR solve? The Profile Name field currently lacks application-level validation and allows users to save excessively long names and unsupported special characters. While the database enforces a maximum length of 100 characters, neither the frontend nor backend validates nickname format before persistence. This can result in inconsistent user data, poor user experience, and UI layout issues when long names wrap across multiple lines. This PR introduces consistent frontend and backend validation for profile names, enforces length and character constraints, provides clear validation feedback, and prevents invalid values from being saved. Fixes #15693 ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-12 11:13:18 +08:00
Jonathan Chang	de06c9a60b	feat: Langfuse session grouping for multi-turn chat traces (#15679 ) ## Summary This PR passes `session_id` into Langfuse trace observations so multi-turn chat messages can be grouped under the same session in Langfuse. Changes include: - Propagate `session_id` from chat/session APIs into `dialog_service.async_chat`. - Pass `session_id` into Langfuse `start_observation(...)`. - Share Langfuse `trace_context` with chat, embedding, rerank, and TTS model bundles where applicable. - Add unit coverage to verify Langfuse observations receive `session_id`. - Update affected test stubs for the new optional Langfuse context arguments. ## Related Issue Closes: #15636 ## Change Type - [x] Feature - [x] Bug fix - [x] Test - [ ] Refactor - [ ] Documentation - [ ] Breaking change ## Real Behavior Proof Before this change: - Langfuse observations were created without `session_id`. - Multi-turn chat traces could not be grouped by session in Langfuse. After this change: - Chat/session flows pass `session_id` into `async_chat`. - Langfuse observations include `session_id`. - Related model bundles receive shared trace context and session metadata. Validation result: ```bash uv run python -m py_compile \ api/db/services/tenant_llm_service.py \ api/db/services/llm_service.py \ api/db/services/dialog_service.py \ api/db/services/conversation_service.py \ api/apps/restful_apis/chat_api.py \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py ``` Passed. ```bash uv run pytest \ test/unit_test/api/db/services/test_dialog_service_final_answer.py \ test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py -q ``` Result: ```text 11 passed in 16.89s ``` ```bash git diff --check ``` Passed. ## Checklist - [x] Analyzed the issue requirement. - [x] Checked existing Langfuse trace integration. - [x] Implemented only the requested session grouping behavior. 
- [x] Added/updated unit tests. - [x] Ran focused tests successfully. - [x] Ran Python compile validation. - [x] Ran whitespace diff validation.	2026-06-12 10:18:06 +08:00
balibabu	70ae25fc7b	Fix: Remove the pagination from the search and retrieval pages. (#15942 ) ### What problem does this PR solve? Fix: Remove the pagination from the search and retrieval pages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 16:36:05 +08:00
jaso0n0818	2971849783	fix: guard docStoreConn.delete with index_exist in parse and stop_parsing (#15876 ) ## What problem does this PR solve? Closes #15874 Both the `POST /api/v1/datasets/<dataset_id>/chunks` (re-parse) and `DELETE /api/v1/datasets/<dataset_id>/chunks` (stop-parsing) handlers called `settings.docStoreConn.delete` unconditionally. When the tenant/dataset index has not been created yet — fresh dataset, first parse interrupted before any chunks were indexed, or index manually removed — the delete call throws and the handler returns HTTP 500 after the document state was already mutated (RUNNING with zeroed counters for the parse path; CANCEL with zeroed counters for the stop path), leaving the document in an inconsistent state. The newer `parse_documents` path in `document_api.py` already uses `index_exist` before deleting: ## How to fix? Apply the same `index_exist` guard to both call sites in `chunk_api.py`: - `parse` (POST path, line ~192): guard the delete before `TaskService.filter_delete`. - `stop_parsing` (DELETE path, line ~242): guard the delete after `DocumentService.update_by_id`. Both sites already have the correct `search.index_name(tenant_id)` and `dataset_id` parameters; the guard is a one-line addition at each site. ## Type of change - [x] Bug fix (non-breaking change which fixes an issue) --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 16:30:03 +08:00
kpdev	de18313f97	fix(api): POST /documents/stop removes partial chunks and resets counters (#15789 ) ### What problem does this PR solve? `POST /api/v1/datasets/{dataset_id}/documents/stop` (`stop_parse_documents`) cancels parsing tasks and sets `run` to `CANCEL`, but it does not remove chunks already indexed in the doc store or reset `progress` / `chunk_num`. REST callers can end up with a “cancelled” document that still returns partial chunks in `GET .../chunks` and in retrieval. Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`) already performs full cleanup: it resets counters and calls `docStoreConn.delete`. This PR aligns the newer stop endpoint with that behavior so both paths leave the dataset consistent. Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes - Update `stop_parse_documents` in `document_api.py` to reset `progress` and `chunk_num` to `0` and delete partial chunks via `docStoreConn.delete` after `cancel_all_task_of`. - Add unit test `test_stop_parse_documents_cleans_partial_chunks` to assert counters reset and doc store delete is invoked. ### Test plan - [x] Unit test: `pytest test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks -v` - [ ] Manual: upload a slow document, start parse, call `POST .../documents/stop` while `RUNNING`, verify `GET .../chunks` returns zero chunks and UI `chunk_count` is 0 - [ ] Control: legacy `DELETE .../chunks` behavior unchanged --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:51:32 +08:00
bohdansolovie	47fb462e46	fix(api): guard dataset delete when File2Document row is missing (#15533 ) ## Summary Fixes #15532 — `delete_datasets()` crashes with `IndexError` when a document has no `File2Document` row. `delete_datasets()` in `dataset_api_service.py` called `File2DocumentService.get_by_document_id()` and immediately accessed `f2d[0].file_id` without checking whether the lookup returned any rows. Documents created via API ingestion or connector sync may exist without a linked file record, causing dataset deletion to abort with HTTP 500. This PR mirrors the existing guard already used in `file_service.py` and `document_api_service.py`.	2026-06-11 15:18:08 +08:00
Idriss Sbaaoui	9871a7e0b6	fix: replicate model provider (#15933 ) ### What problem does this PR solve? Fix the Replicate model provider failing with a valid API key ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:08:33 +08:00
zaviermeekz-cpu	a1dc2da7b4	fix: add model_name to embed completion request (#15883 ) (#15888 ) ### What problem does this PR solve? When embedding a chatbot, the API returned `"Model Name is required"`. The embed widget now includes the assistant's `llm_id` as `model_name` in the completion request. ### Type of change - [x] Bug Fix ### How has this been tested? - Created a chatbot with a default model. - Embedded it and sent a message – the error is gone and the assistant replies correctly. ### Related Issue Closes #15883 Co-authored-by: RAGFlow Dev <dev@ragflow.local> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 14:38:37 +08:00
zaviermeekz-cpu	c50f9c59aa	fix: allow zero message history window and clear history for new sessions (#15897 ) (#15902 ) ### What problem does this PR solve? Two bugs in the Agent Categorize component: 1. The backend rejected `message_history_window_size = 0` while frontend allowed it, causing API errors. 2. When calling the agent API without a `session_id`, a new session was created but retained history from previous conversations. ### Type of change - [x] Bug Fix ### How has this been tested? - Issue 1: `CategorizeParam().check()` now accepts `0` and rejects negative values. - Issue 2: `canvas.clear_history()` is called for new sessions (no `session_id`), ensuring fresh conversation state. Verified via UI and API that a second call without `session_id` does not remember the first conversation. ### Related Issue Closes #15897 Co-authored-by: RAGFlow Dev <dev@ragflow.local> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 13:24:48 +08:00
Wang Qi	238a01d9e3	Fix multiple tags (#15931 ) Fix multiple tags	2026-06-11 10:55:28 +08:00
Lynn	32559d2dfc	Fix: model list (#15914 ) ### What problem does this PR solve? Display OCR tag for model providers. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-11 09:40:45 +08:00
Wang Qi	acaeb416ca	Fix cannot add fish audio (#15913 ) Fix cannot add fish audio	2026-06-10 20:27:43 +08:00
balibabu	aafe6c5534	Fix: The dataset retrieval test returned an incorrect total number. (#15901 ) ### What problem does this PR solve? Fix: The dataset retrieval test returned an incorrect total number. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-06-10 19:11:31 +08:00
Wang Qi	3091d91cf7	Fix no need to put inactive models to bottom (#15903 ) Fix no need to put inactive models to bottom	2026-06-10 16:55:02 +08:00
buua436	dcf623d60d	feat: support multi-type factory models (#15893 ) ### What problem does this PR solve? Support factory models with multiple model types, so visual chat models can be exposed as both image2text and chat while preserving the database model-type-per-record design. This also updates the SILICONFLOW model list and adds a helper script to refresh SiliconFlow models from the provider API. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-10 15:35:21 +08:00
Lynn	478c9846a1	Fix: model list (#15860 ) ### What problem does this PR solve? Remove tenant_llm call in rag. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 14:59:57 +08:00
Wang Qi	9aa81e7cad	Fix paddle ocr / minerU cannot add (#15858 ) Fix paddle ocr / minerU cannot add	2026-06-10 13:04:13 +08:00
Wang Qi	7ed1f1c865	Fix VLLM cannot add without /v1 (#15851 ) Fix VLLM cannot add without /v1	2026-06-09 19:11:15 +08:00
Wang Qi	2773208159	Fix: MinerU cannot be added (#15841 ) Fix: MinerU cannot be added	2026-06-09 19:06:51 +08:00
euvre	f97d6396b4	fix: BaiduYiyan API key validation fails in set_api_key (#15828 ) ### What problem does this PR solve? When setting the API key for the BaiduYiyan provider, all model validations fail with the error "Fail to access model using this api key. No valid response received". Root cause: 1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override `async_chat_streamly()`. The `verify_api_key()` function uses `mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits `Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible base_url, validation always fails. 2. `verify_api_key()` in `provider_api_service.py` does not format the raw API key string into the JSON format (`{"yiyan_ak": "...", "yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via `json.loads(key)`. Fix: 1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the qianfan SDK, consistent with the existing `chat_streamly()` method. 2. Add BaiduYiyan API key formatting in `provider_api_service.py` `verify_api_key()` to match the format expected by `BaiduYiyanChat.__init__()`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-06-09 19:05:58 +08:00
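
Note on f97d6396b4: the commit message says `BaiduYiyanChat.__init__()` reads its credentials via `json.loads(key)` and expects the shape `{"yiyan_ak": "...", "yiyan_sk": "..."}`. A minimal sketch of that round trip is shown here. The helper names `format_yiyan_key` and `parse_yiyan_key` are hypothetical and are not taken from the RAGFlow code base. How `verify_api_key()` actually derives the two values from the raw key string is not stated in the commit message, so the sketch takes them as separate arguments.

```python
import json


def format_yiyan_key(access_key: str, secret_key: str) -> str:
    """Hypothetical helper: pack both credentials into the JSON string
    format that the commit message says BaiduYiyanChat expects."""
    return json.dumps({"yiyan_ak": access_key, "yiyan_sk": secret_key})


def parse_yiyan_key(key: str) -> tuple[str, str]:
    """Hypothetical mirror of the json.loads(key) step described in the
    commit message; raises ValueError if the key is not usable."""
    try:
        data = json.loads(key)
    except json.JSONDecodeError as exc:
        raise ValueError("key is not valid JSON") from exc
    if not isinstance(data, dict) or "yiyan_ak" not in data or "yiyan_sk" not in data:
        raise ValueError("key must contain 'yiyan_ak' and 'yiyan_sk'")
    return data["yiyan_ak"], data["yiyan_sk"]


if __name__ == "__main__":
    packed = format_yiyan_key("my-access-key", "my-secret-key")
    print(parse_yiyan_key(packed))
```

A raw, unformatted key string fails at the `json.loads` step in this sketch, which is consistent with the second root cause listed in the commit message.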

1223 Commits