2156 Commits

Author SHA1 Message Date
Renzo
6c872256a9 fix: require explicit anonymous webhook access (#14890)
### What problem does this PR solve?

Fixes #14882

Agent webhook execution currently fails open when the saved webhook
`security` block is missing/empty, or when `auth_type` is set to `none`.
This allows unauthenticated webhook invocation without an explicit
operator opt-in.

This PR makes anonymous webhook access explicit:
- Rejects missing or empty webhook security config.
- Requires `allow_anonymous: true` when `auth_type` is `none`.
- Preserves explicit anonymous webhooks by having the frontend serialize
`allow_anonymous: true` when the user selects `None` auth.
- Updates webhook unit tests to cover both denied implicit-anonymous
configs and allowed explicit-anonymous configs.
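
The rule described above can be sketched as a small predicate. This is a minimal illustration, not the code in `agent_api.py`: the function name `anonymous_access_allowed` is hypothetical, and only the `auth_type` and `allow_anonymous` keys come from this PR description.

```python
from typing import Any, Mapping, Optional


def anonymous_access_allowed(security: Optional[Mapping[str, Any]]) -> bool:
    """Return True only when anonymous access was explicitly opted into.

    Hypothetical helper; the real check lives in the webhook handler.
    """
    # A missing or empty security block is rejected (fail closed).
    if not security:
        return False
    if security.get("auth_type") == "none":
        # `none` alone is not enough: the operator must opt in explicitly.
        return security.get("allow_anonymous") is True
    # Any other auth type needs credentials, so anonymous calls are denied.
    return False
```

With this shape, `{"auth_type": "none"}` is denied while `{"auth_type": "none", "allow_anonymous": True}` is allowed.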

### Type of change

- [x] Bug Fix
- [x] Security hardening
- [x] Test

### Tests

- [x] `ZHIPU_AI_API_KEY=dummy uv run python -m pytest
--confcutdir=test/testcases/test_web_api/test_agent_app
test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py`
- [x] `uv run ruff check api/apps/restful_apis/agent_api.py
test/testcases/test_web_api/test_agent_app/test_agents_webhook_unit.py`
- [x] `npm exec eslint src/pages/agent/utils.ts
src/pages/agent/form/begin-form/schema.ts`

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-28 13:20:29 +08:00
Rene Arredondo
78832ffc92 fix(agent): authenticate "Thinking" button in shared/embedded chat via beta token (#14985) (#15238)
## Summary

Fixes #14985 — clicking the **Thinking** button in a shared/embedded
chat returns 401 and bounces the user to the login page, even though
the same share page can chat with the agent just fine.

## Root cause

In shared chat, `useGetSharedChatSearchParams` binds `conversationId`
to the URL's `shared_id` query param — which is the **beta APIToken**,
not the real agent id. That `conversationId` propagates through the
component tree:

```tsx
<WorkFlowTimeline canvasId={conversationId}>
  → useFetchMessageTrace(canvasId)
  → GET /api/v1/agents/<sharedId>/logs/<messageId>
```

But `/agents/<agent_id>/logs/<message_id>` is decorated with
`@login_required` (`api/apps/restful_apis/agent_api.py:842-846`).
The share page only holds the beta token — there is no session JWT
— so the request 401s and quart-auth redirects to the login page.
The reporter's server log matches exactly:

```
load_user from jwt got exception No b'.' found in value
load_user: No APIToken found for token=ULG10SWG3E...
Unauthorized request (quart_auth)
GET /api/v1/agents/394013f8d42211f0bad6123fa55e8ed9/logs/96fd72e2-... 1.1 401
```

The `394013f8...` segment in the URL is the `shared_id` (beta
token), not an actual agent id. `_load_user` already accepts the
regular `APIToken.token` field, but not `APIToken.beta`, by design
— beta is a much weaker share-link credential than a personal API
key.

The sibling endpoints `/agentbots/<id>/completions` and
`/agentbots/<id>/inputs` already use the right auth pattern for
this scope (beta-token via `_get_sdk_authorization_token` →
`APIToken.query(beta=token)`). Trace just didn't have a parallel.

## Fix

### Backend (`api/apps/restful_apis/bot_api.py`)

Added a beta-token sibling endpoint:

```
GET /api/v1/agentbots/<shared_id>/logs/<message_id>
```

- Same auth shape as the existing `agentbots` endpoints.
- The `<shared_id>` path segment is a client-supplied label only.
  The real `agent_id` used to build the Redis key
  (`<agent_id>-<message_id>-logs`) is taken from
  `APIToken.dialog_id` on the looked-up token, so the endpoint
  never trusts client-supplied identifiers for the data lookup.
- Returns the same `{data: ...}` shape as the existing
  `/agents/<id>/logs/<message_id>` endpoint, so the frontend
  doesn't need to reshape the response.
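
A rough sketch of the lookup rule above, assuming an in-memory dict in place of `APIToken.query(beta=token)`. `TokenRecord` and `trace_log_key` are hypothetical names used only for illustration; the key format `<agent_id>-<message_id>-logs` is the one stated in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenRecord:
    """Hypothetical stand-in for an APIToken row."""
    beta: str
    dialog_id: str


def trace_log_key(tokens: Dict[str, TokenRecord], beta_token: str,
                  shared_id: str, message_id: str) -> Optional[str]:
    """Build the Redis log key from the token's own dialog_id."""
    record = tokens.get(beta_token)
    if record is None:
        # Unknown or expired beta token: the caller returns an auth error.
        return None
    # `shared_id` from the URL path is deliberately ignored for the lookup.
    return f"{record.dialog_id}-{message_id}-logs"
```

Because the agent id comes from the stored token, a caller cannot read another agent's trace by changing the path segment.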

### Frontend

- `web/src/utils/api.ts`: added `sharedTrace(sharedId, messageId)`
  URL builder.
- `web/src/services/agent-service.ts`: added
  `fetchSharedTrace({ shared_id, message_id })`.
- `web/src/hooks/use-agent-request.ts`: `useFetchMessageTrace`
  takes an optional `isShare` argument. When set, it calls
  `fetchSharedTrace`; `isShare` is also folded into the
  `queryKey` so the two modes never share cached results.
- `web/src/pages/agent/log-sheet/workflow-timeline.tsx`:
  forwards the already-existing `isShare` prop into the hook.

All other existing call sites of `useFetchMessageTrace` (webhook
timeline, pipeline log, dataflow result) pass no `isShare`
argument → undefined → falsy → unchanged behavior.

## Test plan

- [ ] In the regular Agent UI (logged-in user): open the trace /
      log sheet for any message and click into "Thinking" — the
      timeline should still load via `/agents/<id>/logs/<msg>`,
      same as before.
- [ ] From the Agent page, click **Chat in new tab** to open
      `/chat/share?shared_id=<token>&from=agent`. Send a message,
      wait for a response, then click **Thinking** on the
      assistant turn. The trace panel should load instead of
      redirecting to the login page.
- [ ] Same flow but with the agent embedded in an iframe ("Embed
      into webpage") — confirm there is no login redirect.
- [ ] In DevTools → Network, confirm the share-chat trace request
      goes to `/api/v1/agentbots/<sharedId>/logs/<msgId>` and
      returns 200 with the same JSON shape as the logged-in path.
- [ ] Confirm the chat completions, inputs, and upload flows in
      the share page still work — they were not touched.
- [ ] Send a bogus / expired beta token to the new endpoint and
      confirm it returns the standard "Authentication error: API
      key is invalid!" response (no traceback, no 500).
- [ ] Run `uv run pytest` to make sure no existing tests regress.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-28 13:00:50 +08:00
jiashi19
db188cc705 Feat/agent thinking switch (#15446)
### What problem does this PR solve?

This PR adds an Agent LLM setting to control thinking mode for official
providers that expose a thinking switch.

Related to #12842.  
Closes #15445.

Some providers expose thinking controls through provider-specific
request fields, but Agent LLM settings did not have a unified option for
users to enable or disable thinking mode.

This PR adds a `Thinking` selector with:

- System default
- Enabled
- Disabled
<img width="452" height="278" alt="8566b0b4-0546-4c8a-913d-f9bbd38319f6"
src="https://github.com/user-attachments/assets/25b497f7-1ba0-4bfe-940d-6fe79287d6ab"
/>
<img width="471" height="971" alt="8a0a6bee-f45f-48d5-bd83-17af260de3db"
src="https://github.com/user-attachments/assets/41ad43c1-5087-48f1-bf37-f2ca14c2be2f"
/>
Initial support is limited to the verified official providers:

- Qwen / DashScope: `enable_thinking`
- Kimi / Moonshot: `thinking.type`
- GLM / ZHIPU-AI: `thinking.type`

For LiteLLM-based providers, provider-specific fields are forwarded
through `extra_body` before `drop_params` filtering so the request
parameters are preserved.
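
As a sketch of how the three-state selector could map onto provider fields: the field names `enable_thinking` and `thinking.type` come from the list above, while the provider keys, the function name, and the `"enabled"` / `"disabled"` values are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def thinking_extra_body(provider: str, thinking: Optional[bool]) -> Dict[str, Any]:
    """Map the selector to provider-specific request fields.

    `thinking` is None for "System default", True for "Enabled",
    False for "Disabled". Provider keys here are hypothetical.
    """
    if thinking is None:
        # System default: send nothing and let the provider decide.
        return {}
    if provider == "qwen":
        return {"enable_thinking": thinking}
    if provider in ("kimi", "glm"):
        # Assumed value strings; check the provider docs before relying on them.
        return {"thinking": {"type": "enabled" if thinking else "disabled"}}
    # Unverified providers get no thinking field at all.
    return {}
```

The returned dict is what would be forwarded through `extra_body`.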



### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: jiashi <jiashi19@outlook.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-28 12:02:55 +08:00
Zhichang Yu
c4fe68eaa0 Harden closed-advisory fixes (#16409)
## Summary
- harden reopened advisory fixes across REST connector, invoke, document
downloads, and markdown rendering
- add targeted regression coverage for redirect-safe SSRF handling,
invoke SSRF checks, document access control, and markdown sanitization
- verify each referenced GHSA against the original GitHub advisory text
and align the closed-advisory plan with the implemented remediation

## What changed
- add tenant access checks to document download endpoints to avoid
cross-tenant document disclosure
- add per-hop SSRF validation, DNS pinning, redirect handling, and
redirect limits to the REST API connector
- ensure invoke requests validate and pin the resolved host and never
follow redirects implicitly
- keep the generic rate-limited request path wrapped, not just GET and
POST helpers
- sanitize markdown HTML before rendering in the highlight markdown
component
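
The per-hop SSRF idea can be sketched as follows. This is not the connector's code: `fetch_with_checked_redirects`, the injected `resolve` / `fetch` callables, and the limit of 5 redirects are assumptions, and a real client would also have to connect to the pinned IP rather than re-resolve the host.

```python
import ipaddress
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlparse

MAX_REDIRECTS = 5  # assumed limit, for illustration only


def is_public_ip(ip: str) -> bool:
    """Reject private, loopback, link-local and other non-public addresses."""
    addr = ipaddress.ip_address(ip)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_multicast or addr.is_reserved or addr.is_unspecified)


def fetch_with_checked_redirects(
    url: str,
    resolve: Callable[[str], str],
    fetch: Callable[[str, str], Tuple[int, Optional[str]]],
) -> Tuple[str, int]:
    """Follow redirects manually, validating the resolved IP on every hop.

    `resolve(host)` returns an IP; `fetch(url, pinned_ip)` returns
    (status, Location header or None) without following redirects itself.
    """
    for _ in range(MAX_REDIRECTS + 1):
        host = urlparse(url).hostname
        if host is None:
            raise ValueError("URL has no host")
        ip = resolve(host)
        if not is_public_ip(ip):
            raise ValueError(f"blocked address for host {host}")
        status, location = fetch(url, ip)
        if status in (301, 302, 303, 307, 308) and location:
            # Re-validate the next hop instead of trusting the redirect target.
            url = urljoin(url, location)
            continue
        return url, status
    raise ValueError("too many redirects")
```

Checking every hop matters because a public URL can redirect to an internal address after the first validation has passed.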

## Validation
- `cd web && npm test -- --runInBand
src/components/highlight-markdown/__tests__/index.test.tsx`
- `.venv/bin/python -m pytest -q
test/unit_test/data_source/test_rest_api_connector.py`
- targeted `test/testcases/test_web_api/...` unit additions were
reviewed, but the suite cannot be executed end-to-end in this
environment because parent `test/testcases/conftest.py` requires a local
service on `127.0.0.1:9380`

## Notes
- all GHSA entries referenced by the plan were checked against the
original GitHub advisory text, not sampled
- the closed-advisory plan document was updated locally during review,
but is intentionally not included in this PR
2026-06-28 11:17:54 +08:00
Zhichang Yu
a06343eafe fix(codeql): close remaining 44 CodeQL alerts post-merge (#16408)
## Summary

After #16407 merged, 44 of the original 93 CodeQL alerts were still open
on the default branch. This PR closes the remaining ones by:

1. **Moving 32 existing `// codeql[...]` directives** so they sit on the
line **immediately before** the suppressed statement. The original
multi-line suppression blocks had the directive as the first line, with
the rationale on subsequent lines. After line shifts (refactors, linter
reformat), the directive ended up several lines above the alert location
— CodeQL only recognizes the suppression when it appears on the line
directly above. (32 alerts across 27 files.)

2. **Adding 9 new `// codeql[...]` suppressions** for alerts that had no
suppression in the preceding lines at all — mostly real-fixes that
CodeQL conservatively still flags (filepath.Base, bounded slice sizes,
model-identifier strings, the MD5-legacy-migration lookup in
`conversation_service.py`).

## Files changed

- `api/db/services/conversation_service.py` — add
`py/weak-sensitive-data-hashing` suppression (MD5 for backward-compat
legacy row lookup; not used for auth)
- `api/db/services/llm_service.py` — 3×
`py/clear-text-logging-sensitive-data` suppressions on the lines that
log `llm_name` in warnings/info
- `common/misc_utils.py` — 2× `py/clear-text-logging-sensitive-data`
suppressions on the redacted `current_url` log sites
- `internal/agent/component/invoke.go` — moved existing
`go/request-forgery` directive
- `internal/agent/sandbox/ssh.go` — moved existing
`go/command-injection` directive
- `internal/agent/tool/retrieval_service.go` — added
`go/uncontrolled-allocation-size` suppression (`topN` is bounded to 1024
above)
- `internal/cli/common_command.go` — moved 2×
`go/disabled-certificate-check` directives
- `internal/cli/user_command.go` — added `go/clear-text-logging`
suppression (filepath.Base already strips user-identifying path)
- `internal/dao/pipeline_operation_log.go` — moved 2× `go/sql-injection`
directives
- `internal/dao/user_canvas.go` — added `go/sql-injection` suppression
in `GetList` (the new `userCanvasOrderClause` call path)
- `internal/engine/infinity/chunk.go` — moved existing
`go/unsafe-quoting` directive
- `internal/entity/models/*` — moved `go/path-injection` directives (15
files)
- `internal/handler/oauth_login.go` — moved existing
`go/cookie-httponly-not-set` directive
- `internal/handler/tenant.go` — moved existing `go/path-injection`
directive
- `internal/service/deep_researcher.go` — moved existing
`go/unsafe-quoting` directive
- `internal/service/dataset.go` — added
`go/uncontrolled-allocation-size` suppression (`n` bounded to 1024
above)
- `internal/service/file.go` — moved existing `go/request-forgery`
directive
- `internal/service/langfuse.go` — moved 2× `go/request-forgery`
directives
- `internal/utility/mcp_client.go` — moved 3× `go/request-forgery`
directives
- `internal/utility/smtp.go` — moved existing `go/email-injection`
directive
- `rag/prompts/generator.py` — added
`py/clear-text-logging-sensitive-data` suppression
- `web/.../use-provider-fields.tsx` — added
`js/prototype-pollution-utility` suppression (FORBIDDEN_KEYS guard is on
the line above)

## Why the previous PR left alerts open

`// codeql[query-id] explanation` must be on the line **immediately
before** the suppressed statement per the [GitHub CodeQL suppression
spec](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning-with-codeql/suppressing-code-scanning-alerts).
The original suppression blocks were 4-5 lines, with the directive as
the **first** line. After linter reformat / line shifts, the directive
ended up too far above the actual alert line to be recognized. The fix
is to put the directive on the line directly above the suppressed
statement, with the rationale above it.
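
To make the placement rule concrete, here is a small hypothetical checker (not part of this PR or of CodeQL) that tests whether a directive sits on the line directly above a flagged line. The function name and the 1-indexed convention are assumptions.

```python
import re
from typing import List


def directive_covers_line(source_lines: List[str], alert_line: int, query_id: str) -> bool:
    """Return True if `codeql[query_id]` is on the line just above `alert_line`.

    `alert_line` is 1-indexed, as in a code-scanning alert location.
    """
    if alert_line < 2 or alert_line > len(source_lines):
        return False
    previous = source_lines[alert_line - 2]
    pattern = r"codeql\[" + re.escape(query_id) + r"\]"
    return re.search(pattern, previous) is not None
```

A block that starts with the directive and continues with rationale lines fails this check; moving the directive to the last comment line passes it.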

## Test plan

- All 9 modified Python files `ast.parse` clean
- All 4 modified Go files `gofmt` clean
- 36/44 expected alert suppressions in place
- 8 remaining CodeQL alerts are the originals (#3485851828, #3485851831,
#3485869759, #3485869766, #3485869768, #3485869771, #3485885962,
#3485895527) which were resolved by the corresponding commit comments;
these should close on the next scan when the suppression comments match
the alert lines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-27 20:49:06 +08:00
Zhichang Yu
730f33b1f9 fix(security): address 93 CodeQL code-scanning alerts across 61 files (#16407)
## Summary

Resolves all 93 open alerts at
https://github.com/infiniflow/ragflow/security/code-scanning by rule:

| Rule | Count | Treatment |
|------|-------|-----------|
| py/clear-text-logging-sensitive-data | 23 | Real fix — log scrubbing |
| go/path-injection | 15 | Real fix where possible, suppression with rationale |
| go/request-forgery | 8 | Suppression with rationale (operator-controlled URLs) |
| go/clear-text-logging | 10 | Real fix — log scrubbing |
| go/unsafe-quoting | 5 | Real fix — escape or refactor |
| go/sql-injection | 3 | Real fix — orderby whitelist + CodeQL comment |
| go/uncontrolled-allocation-size | 2 | Real fix — cap to 1024 |
| go/incorrect-integer-conversion | 3 | Real fix — ParseInt + range check |
| go/insecure-hostkeycallback | 1 | Real fix — known_hosts file |
| go/disabled-certificate-check | 2 | Suppression with rationale |
| go/command-injection | 1 | Suppression (sanitized via shq()) |
| go/email-injection | 1 | Suppression with rationale |
| go/cookie-httponly-not-set | 1 | Suppression (SPA bootstrap) |
| js/stack-trace-exposure | 1 | Real fix — generic client message |
| js/prototype-pollution-utility | 1 | Real fix — reject `__proto__`/`constructor`/`prototype` |
| py/weak-sensitive-data-hashing | 1 | Real fix — MD5 → SHA-256 |
| py/incomplete-url-substring-sanitization | 3 | Real fix — urlparse(hostname) |
| py/paramiko-missing-host-key-validation | 1 | Real fix — load_system_host_keys + RejectPolicy |
| cpp/integer-multiplication-cast-to-long | 2 | Real fix — cast to size_t |

## Real fixes (with measurable security improvement)

**SSH host key verification (Go + Python)**  
Replace `InsecureIgnoreHostKey()` / `paramiko.AutoAddPolicy()` with
proper host key verification against a known_hosts file (configurable
via `SSH_KNOWN_HOSTS` env / `known_hosts` config field; fail-closed when
unset). Loads `~/.ssh/known_hosts` first via `load_system_host_keys()`
so existing setups keep working.

**SQL injection in `user_canvas`**  
Add `userCanvasOrderableColumns` whitelist + `userCanvasOrderClause`
helper. Both `GetList()` and `ListByTenantIDs()` now route the
user-supplied `orderby` query param through the helper, defaulting to
`create_time` on miss.
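
The whitelist pattern, sketched in Python for brevity (the actual helper is the Go function `userCanvasOrderClause`). The column set other than `create_time` is an assumption, as is the function name.

```python
# Columns a caller may sort by; everything except create_time is assumed.
ORDERABLE_COLUMNS = frozenset({"create_time", "update_time", "title"})
DEFAULT_ORDER_COLUMN = "create_time"


def order_clause(orderby: str, desc: bool = True) -> str:
    """Return an ORDER BY fragment built only from whitelisted column names."""
    # User input is never spliced into SQL; it only selects a known column.
    column = orderby if orderby in ORDERABLE_COLUMNS else DEFAULT_ORDER_COLUMN
    return f"{column} {'DESC' if desc else 'ASC'}"
```

An input such as `"id; DROP TABLE x"` misses the whitelist and falls back to `create_time`.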

**SQL injection in `pipeline_operation_log`**  
Existing whitelist documented via CodeQL comment.

**Real SQL injection in `infinity/chunk.go:931`**  
Escape `'` → `''` on user-controlled `questionText` before splicing into
`filter_fulltext(...)` SQL filter.

**Real SQL injection in `elasticsearch/sql.go:75`**  
Defense-in-depth escape on tokenizer output before splicing into
`MATCH(...)`.

**Python code injection in `result_protocol.go`**  
Replace raw JSON literal embedding into Python/JS expressions with
base64 + `json.loads` / `JSON.parse(Buffer.from(...,
'base64').toString('utf8'))`. Eliminates both the unsafe-quoting sink
and the brittleness of mixing JSON true/false/null with Python syntax.
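
A minimal sketch of the base64 approach for the Python side, assuming a hypothetical helper name `python_literal_for`; the Go code in `result_protocol.go` is not reproduced here.

```python
import base64
import json
from typing import Any


def python_literal_for(payload: Any) -> str:
    """Return a Python expression that decodes `payload` at run time.

    The base64 alphabet contains no quotes or backslashes, so the
    embedded string cannot break out of its literal.
    """
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return f"json.loads(base64.b64decode('{encoded}').decode('utf-8'))"
```

Since the JSON is parsed by `json.loads` instead of being pasted as source, `true` / `false` / `null` no longer need to be valid Python syntax.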

**URL substring check bypass in `embedding_model.py`**  
Replace `if "dashscope-intl.aliyuncs.com" in u` with
`urlparse(u).hostname == "dashscope-intl.aliyuncs.com"` so a base_url
like `https://attacker.example/?u=dashscope-intl.aliyuncs.com` cannot
bypass the routing.
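
The difference between the two checks can be shown with the standard library alone. The helper name `uses_intl_endpoint` is hypothetical; the host name and the spoofed URL are the ones quoted above.

```python
from urllib.parse import urlparse

INTL_HOST = "dashscope-intl.aliyuncs.com"


def uses_intl_endpoint(base_url: str) -> bool:
    """Compare the parsed hostname instead of searching the whole URL."""
    return urlparse(base_url).hostname == INTL_HOST


spoofed = "https://attacker.example/?u=dashscope-intl.aliyuncs.com"
# The old substring test is fooled by the query string...
substring_match = INTL_HOST in spoofed
# ...while the hostname comparison sees attacker.example.
hostname_match = uses_intl_endpoint(spoofed)
```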

**Prototype pollution in `setNestedValue` (TS)**  
Reject `__proto__`/`constructor`/`prototype` keys before any assignment.

**Integer overflow**  
- scrypt params via `ParseInt` + non-positive check
(`internal/common/password.go`)
- `topN` and `n` caps to 1024 (retrieval_service.go, dataset.go)
- `nalloc*statesize` cast to `size_t` (cpp/re2/onepass.cc)

**Cookie httponly**  
Set explicitly with rationale: this is the OAuth bootstrap cookie
intentionally read by the SPA.

**Stack trace exposure**  
Replace `error.message` in HTTP 500 response with generic `"internal
error"`; full error still logged server-side via `console.error`.

**Weak hashing**  
MD5 → SHA-256 for deterministic `conv_id` derivation
(`conversation_service.py`).

**Log scrubbing**  
Remove or redact user-controlled / sensitive content from clear-text
logs across 8 ingestion parsers, `llm_service.py` ×11,
`tenant_llm_service.py` ×7, `misc_utils.py` ×4, `redis_conn.py` ×10,
`conftest.py` ×4, `init_data.py`, `dataset_api_service.py`,
`generator.py`, `mysql_migration.py`, `cli.go`, `user_command.go`,
`pdf_parser.go`. Most patterns converted to parameterized logging
(`logging.info("...: %d", n)`) or static messages.
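
A small sketch of the two patterns named above, parameterized logging and redaction. `redact_url` and `log_fetch` are hypothetical helpers, and keeping only scheme and host is an assumed redaction policy, not necessarily what `misc_utils.py` does.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("log_scrubbing_example")


def redact_url(url: str) -> str:
    """Keep only scheme and host; drop credentials, path, and query."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname or ''}"


def log_fetch(url: str, chunk_count: int) -> None:
    # Static format string; values are passed as arguments, not concatenated.
    logger.info("Fetched %s (%d chunks)", redact_url(url), chunk_count)
```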

## CodeQL suppressions (each with rationale)

For alerts where the data flow is genuinely safe but CodeQL can't see
the context — operator-controlled URLs, sanitized inputs, etc. — I added
`// codeql[go/<rule>] <rationale>` annotations rather than dismissing
them, so future readers can audit the rationale inline:

- `internal/agent/component/invoke.go:135` — Invoke is a generic canvas
HTTP client
- `internal/service/langfuse.go` ×2 — host is per-tenant operator config
- `internal/service/file.go:1184` — already SSRF-guarded by
`assertURLSafe`
- `internal/utility/mcp_client.go` ×3 — already `AssertURLSafe` +
IP-pinned
- `internal/entity/models/bedrock.go` — sigv4-signed request, URL can't
be tampered
- `internal/service/deep_researcher.go:269` — `callback` is SSE display
string, not SQL
- `internal/engine/infinity/chunk.go:346` — UUIDs can't contain `'` (RFC
4122)
- `internal/cli/common_command.go` ×2 — CLI trusts operator-configured
URL
- `internal/utility/smtp.go:194` — msg is server-built, not user form
input
- `internal/entity/models/*` ×14 (path-injection) — audio file paths are
caller-supplied

## Test plan

- All 13 modified Go packages build cleanly
- 663 tests pass across `internal/agent/sandbox`, `internal/common`,
`internal/agent/component`, `internal/engine/infinity`, `internal/dao`
- All 11 modified Python files parse via `ast.parse`
- TypeScript `tsc --noEmit` clean on the modified
`use-provider-fields.tsx`
- `node --check` clean on the modified JS file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-27 19:48:29 +08:00
Zhichang Yu
70546ea406 feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396)
Ported retrieval node, added Keenable web search tool
- [x] New Feature (non-breaking change which adds functionality)
2026-06-26 22:55:49 +08:00
Wang Qi
3a829fb6dd Fix VLM PDF parser only parse first 12 pages, and default page range for PDF files align with backend (#16394)
1. Fix the VLM parser parsing only the first 12 pages.
2. Fix the frontend default page range (1 - 100000) so it stays aligned with the backend.
2026-06-26 20:15:25 +08:00
Tim Wang
ca96d61e73 Feat: Add New API model provider for OpenAI-compatible gateways (#15991)
## Summary

Add support for **"New API"** as a model provider, enabling connection
to [New API](https://github.com/QuantumNous/new-api) /
[one-api](https://github.com/songquanpeng/one-api) compatible gateways
that aggregate multiple LLM backends behind a unified OpenAI-compatible
`/v1` endpoint.

### Features

- **All model types**: Chat, Embedding, Rerank, Image2Text, TTS,
Speech2Text
- **List Models discovery**: `NewAPI(OpenAIAPICompatible)` class in
`model_meta.py` queries the gateway's `/v1/models` to auto-discover
available models via the native `GET /api/v1/providers/<name>/models`
endpoint
- **Model parameter editing**: Pencil icon on each discovered model row
to edit `model_type`, `max_tokens`, and `features` (e.g. tool call
support) before submitting
- **Custom model addition**: "Add Custom Model" button at the bottom of
the List Models dropdown for models not returned by the API
- **Gear icon settings**: Enabled the Settings gear button on provider
instances to manage models on existing instances (viewMode)
- **viewMode credential passthrough**: Fixed List Models in viewMode —
merges `initialValues` credentials when `api_key`/`base_url` fields are
hidden by `hideWhenInstanceExists`

### Changes

**Backend** (8 files):
- `rag/llm/chat_model.py` — `NewAPIChat(Base)` class
- `rag/llm/embedding_model.py` — `NewAPIEmbed(OpenAIEmbed)` class (no
auto `/v1` append)
- `rag/llm/rerank_model.py` — `NewAPIRerank(Base)` class (uses `/rerank`
endpoint)
- `rag/llm/cv_model.py` — `NewAPICv(GptV4)` class
- `rag/llm/tts_model.py` — `NewAPITTS(OpenAITTS)` class
- `rag/llm/sequence2txt_model.py` — `NewAPISeq2txt(GPTSeq2txt)` class
- `rag/llm/model_meta.py` — `NewAPI(OpenAIAPICompatible)` class for List
Models discovery
- `conf/llm_factories.json` — New API factory entry with all model type
tags

**Frontend** (8 files + 1 new SVG):
- `web/src/assets/svg/llm/new-api.svg` — New API logo icon
- `web/src/constants/llm.ts` — `LLMFactory.NewAPI` enum + `IconMap`
entry
- `web/src/components/svg-icon.tsx` — `NewAPI` added to `svgIcons`
- `web/src/pages/user-setting/setting-model/modal/provider-modal/field-config/local-llm-configs.ts`
— New API `buildLocalConfig`
- `web/src/pages/user-setting/setting-model/modal/provider-modal/constants.ts`
— `LIST_MODEL_PROVIDERS` includes NewAPI
- `web/src/pages/user-setting/setting-model/components/used-model.tsx` —
Enable Settings gear button
- `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-picker.ts`
— viewMode credential merge + model editing state/handlers
- `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-options.tsx`
— Pencil edit icon per model row
- `web/src/pages/user-setting/setting-model/modal/provider-modal/index.tsx`
— `AddCustomModelDialog` import + edit dialog rendering

**Note on Go implementation**: A Go model driver (`NewAPIModel`
delegating to `OpenAIModel`) has been prepared but is deferred until the
Go runtime is enabled in a future release (current v0.26.0 images use
`API_PROXY_SCHEME=python` and do not compile Go binaries). Will submit
as a follow-up PR.

## Related

- Depends on: #15996 (provider instance API improvements — server-side
credential lookup, idempotent `add_model`, security fixes — required for
viewMode gear icon and batch model submission)

## Test plan

- [ ] Add New API provider with api_key and base_url pointing to an
OpenAI-compatible gateway
- [ ] Click "List Models" — should discover and display available models
from `/v1/models`
- [ ] Click pencil icon on a model — should open edit dialog to change
model_type, max_tokens, features
- [ ] Select multiple models and click OK — should add all selected
models
- [ ] Click gear icon on the added instance — should open viewMode with
List Models working
- [ ] In viewMode, select new models including pre-existing ones, click
OK — should succeed (requires #15996)
- [ ] Verify all model types work: create a Chat assistant, Embedding
KB, Rerank setting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Tim Wang <wanghualoong@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-26 18:47:20 +08:00
chanx
10140b1d02 fix: adjust table height and button position in DatasetTable component (#16390) 2026-06-26 18:46:55 +08:00
balibabu
d14d2068c4 Fix: If the type of the loop variable in the Loop operator is set to object, an error occurs when clicking the Variable Replicator operator inside it. (#16388) 2026-06-26 18:44:56 +08:00
chanx
9610173a74 feat: add log icon to parsing status display (#16383) 2026-06-26 16:13:01 +08:00
Yoorim Choi
46b97bd1a1 fix(web): fix layout issues with text, overflow, and spacing consistency (#16324) 2026-06-25 19:25:32 +08:00
balibabu
1dfc24003b Fix: An empty message notification pops up at the top of the agent conversation. (#16353) 2026-06-25 17:32:24 +08:00
Wang Qi
31e50b164f Fix [ID:0] not converted to Fig. 1 (#16357) 2026-06-25 17:17:46 +08:00
balibabu
3f3a2ece3d Fix: Flexible Chat Configuration (#16293) 2026-06-25 14:56:30 +08:00
Harsh Kashyap
09047d6edf fix(web): bump lodash past vulnerable range (#16281) 2026-06-25 14:40:39 +08:00
chanx
d44359826d fix(web): agent log refetch and slider percentage rounding (#16344) 2026-06-25 13:49:25 +08:00
Ilya Bogin
10d02e54a8 Add Keenable web search tool to the agent (#16233)
Adds Keenable as a web search tool in the agent, alongside the existing
Tavily/DuckDuckGo/SearXNG/Google tools.

The main difference from the other search tools is that it doesn't
require an API key. By default it uses Keenable's keyless public
endpoint, so it works out of the box. Providing a key (in the tool
config) switches to the authenticated endpoint and lifts the rate
limits.

### Changes

- Backend: `agent/tools/keenable.py` — `KeenableSearch`, follows the
  Tavily/DuckDuckGo tool shape (results go through `_retrieve_chunks`).
  Auto-registered by `agent/tools/__init__.py`.
- Frontend: wired into the agent builder — operator + icon, config form
  (optional API key, search mode, site filter, top N), the search tool
  menu, and the existing api_key export sanitizer.

### Config

- API key: optional. Blank = keyless free tier; set it to lift limits /
  enable `realtime` mode.
- `site`: restrict to a single domain.
- `mode`: `pro` (default) or `realtime`.

### Notes

`KEENABLE_API_URL` can override the API base (HTTPS enforced; defaults
to `https://api.keenable.ai`). The tool only sends the query (no URL
fetch), so there's no SSRF surface. Verified the frontend with
`vite build` and the backend search path against the public endpoint.
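
One way the `KEENABLE_API_URL` override with HTTPS enforcement could look. The function name `resolve_api_base` is hypothetical, and raising on a non-HTTPS value is an assumption; the actual tool may handle that case differently.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

DEFAULT_API_URL = "https://api.keenable.ai"


def resolve_api_base(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the API base from KEENABLE_API_URL, falling back to the default."""
    env = os.environ if env is None else env
    base = env.get("KEENABLE_API_URL") or DEFAULT_API_URL
    # Assumed behavior: refuse plain-HTTP overrides outright.
    if urlparse(base).scheme != "https":
        raise ValueError("KEENABLE_API_URL must use HTTPS")
    return base.rstrip("/")
```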
2026-06-25 12:12:28 +08:00
buua436
ba4021a9de fix: restore dataflow rerun and detail payload (#16292) 2026-06-24 13:06:06 +08:00
buua436
8d4f4a093b fix: restore dataflow defaults and SSE response (#16290) 2026-06-24 11:51:24 +08:00
Yoorim Choi
6a8281721f fix(i18n): fix missing i18n coverage and refine Korean translations (#16203)
### What problem does this PR solve?

This PR follows up on
[#15863](https://github.com/infiniflow/ragflow/pull/15863) (Korean i18n)
with translation refinements and i18n coverage for hardcoded strings
found in the UI.

- Refine awkward Korean phrasing (e.g. 'Chunk 만들기' → 'Chunk 생성', '유형' →
'타입', etc.)
- Apply i18n to hardcoded strings in `message-item`,
`next-message-item`, `multi-select`, `chat-prompt-engine`, and various
filter hooks
- Rename `use-selelct-filters.ts` → `use-select-filters.ts` (typo fix)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-24 10:14:19 +08:00
buua436
aba5d172bd feat: add whatsapp web qr chat channel (#16238)
Adds a WhatsApp chat channel backed by a QR-based web login flow so users can connect without manual token setup.
2026-06-23 17:45:31 +08:00
balibabu
d8ee1ffaad Fix: When re-entering the agent page, the data from the previous session flashes briefly. (#16251)
2026-06-23 14:13:47 +08:00
VincentLambert
a4fcc988e7 i18n(fr): add missing French translations for chat channels, username validation and model editing (#16217)
## Summary

Several keys added in recent releases were missing from the French
(`fr.ts`) locale file.

- **`top`** — missing in both the common section and the dataset section
- **Chat channels** — all UI strings for the new chat channels feature
(`chatChannels`, `chatChannelDesc.*`, `connectDialog`, `notConnected`,
etc.)
- **Username validation** — `usernameMaxLength`,
`usernameInvalidCharacters`
- **Model editing** — `editCustomModelTitle`

## Changes

- `web/src/locales/fr.ts` — 47 lines added, no other files touched


🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-22 20:09:59 +08:00
balibabu
c849c76f8a Feat: Add a prefix to the name of the FormField associated with the chat. (#16178)
Fix: Add a prefix to the `name` of the `FormField` associated with the chat.
2026-06-22 19:18:11 +08:00
Nick M
329e09f16a Fix: metadata add modal sends empty value due to stale closure (#15229)
Closes #15139.

The "+ Add" flow in the Set/Edit Metadata modal posted updates with an
empty value, so backend saves were silent no-ops and the document's "X
fields" count stayed at 0 despite a "Success" toast.

The value `<Input>` updates `tempValues` synchronously per keystroke but
only writes through to `metaData.values` on blur (via
`handleValueBlur`). When the user clicks the nested modal's Confirm
button without first blurring, the click handler races the blur and
`handleSave` closes over the pre-blur `metaData.values` — still the
initial `['']`. `addUpdateValue` then queues an empty-string update; the
auto-fire save sends it, and after `resetOperations()` the outer Save
button posts `updates: []`.

Read from `tempValues` instead so the queued update carries the typed
value.

Regression test in `tests/use-manage-values-modal.test.ts` asserts that
`handleSave` passes the typed value (not the pre-blur empty string) to
`addUpdateValue` in the add-new code path.
2026-06-22 16:30:42 +08:00
Zhichang Yu
3f805a64f1 feat(agent): align Go agent behavior with Python (except retrieval component) (#16225)
## Summary

Aligns the **Go agent runtime/canvas/components/tools** behavior with
the **Python `agent/` implementation** so the same stored canvas DSL
produces the same execution result on either side. Every component,
tool, and runtime primitive in `internal/agent/` is now driven by the
same semantics as its Python counterpart — variable resolution, template
substitution, control flow, error reporting, retry/cancel, and stream
event shapes.

The **retrieval component is the one explicit exception** in this PR. It
is being reworked in a separate change and is excluded from this
alignment pass; the wrapper slot (`universe_a_wrappers.go →
newRetrievalComponent`) is preserved.

## Scope of alignment

### Components (all aligned with `agent/component/`)
`Begin` · `Message` · `LLM` (incl. ChatTemplateKwargs,
MessageHistoryWindowSize, VisualFiles, Cite, OutputStructure,
JSONOutput, TopP, MaxRetries, DelayAfterError, credentials) · `Agent`
(react + tool artifact capture + `Reset()` interface-assert) · `Switch`
(12/12 operators, Python-equivalent semantics) · `Categorize` · `Invoke`
· `Iteration` · `Loop` (macro-expansion through `workflowx.AddLoopNode`)
· `UserFillUp` (Python-equivalent interrupt/resume via eino
`compose.Interrupt`/`ResumeWithData`) · `FillUp` · `DataOperations` ·
`ListOperations` · `StringTransform` · `VariableAggregator` ·
`VariableAssigner` · `Browser` (full stagehand runtime parity) ·
`DocsGenerator` · `ExcelProcessor`.

### Tools (all aligned with `agent/tools/`)
`Retrieval` (wrapper slot only — logic out of scope) · `MCPToolAdapter`
(streamable-HTTP) · `CodeExec` (sandbox bridge with
`code_exec_contract.go` matching Python contract) · `AkShare` · `ArXiv`
· `Crawler` · `DeepL` · `DuckDuckGo` · `Email` · `ExeSQL` · `GitHub` ·
`Google` · `GoogleScholar` · `Jin10` · `PubMed` · `QWeather` · `SearXNG`
· `Tavily` · `Tushare` · `Wencai` · `Wikipedia` · `YahooFinance` —
uniform `eino tool.InvokableTool` interface, SSRF protection, shared
HTTP client.

### Canvas execution engine (`internal/agent/canvas/`)
Aligned with Python's `agent/canvas.py`:
- **Scheduler** (`scheduler.go`): state pre/post handlers, node lambdas,
per-component timeout resolver (4-level: per-class env → per-class table
→ uniform env → 600s fallback), `legacyNoOpNames`.
- **Loop subgraph** (`loop_subgraph.go`): Python-equivalent
`AddLoopNode` macro expansion + condition translation.
- **Multibranch** (`multibranch.go`): `Switch` / `Categorize` routing
via `compose.NewGraphMultiBranch` — same branch selection semantics as
Python.
- **Parallel subgraph** (`parallel_subgraph.go`): matches Python's
parallel fan-out contract.
- **Interrupt/Resume** (`interrupt_resume.go`): `UserFillUpNodeBody` /
`IsInterruptError` / `ExtractInterruptContexts` — replaces the
deprecated Python sentinel chain with eino's native interrupt API,
preserving the same external behavior.
- **Checkpoint** (`checkpoint_store.go`): `RedisCheckPointStore`
Get/Set/Delete, with business metadata (status / canvas_id /
parent_run_id) on a parallel Redis Hash.
- **RunTracker** (`run_tracker.go`): Start / MarkSucceeded / MarkFailed
/ MarkCancelled / AttachCheckpoint — same lifecycle as the Python run
record.
- **Cancel** (`cancel.go`): Redis pub/sub watch.
- **Stream** (`stream.go`): SSE channel with `messages` / `waiting` /
`errors` / `done` events, same shape as Python's `agent.canvas.RunEvent`
payload.
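
The four-level timeout lookup described for the scheduler can be sketched as follows. This is an illustration in Python rather than the Go code in `scheduler.go`; the environment variable names (`AGENT_TIMEOUT_<CLASS>`, `AGENT_TIMEOUT`) and the table values are assumptions made for the example, and only the lookup order and the 600s fallback come from the description above.

```python
import os
from typing import Mapping, Optional

FALLBACK_TIMEOUT_S = 600.0  # final fallback named in the PR description

# Hypothetical per-class table; the real values live in the Go scheduler.
CLASS_TIMEOUT_TABLE = {"LLM": 300.0, "Browser": 900.0}


def _positive_seconds(raw: Optional[str]) -> Optional[float]:
    """Parse an env value; missing, malformed, or non-positive input counts as unset."""
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_timeout(
    component_class: str,
    env: Optional[Mapping[str, str]] = None,
    table: Mapping[str, float] = CLASS_TIMEOUT_TABLE,
) -> float:
    """Resolve a timeout: per-class env, per-class table, uniform env, fallback."""
    env = os.environ if env is None else env

    # Level 1: per-class environment override (variable name is an assumption).
    per_class = _positive_seconds(env.get(f"AGENT_TIMEOUT_{component_class.upper()}"))
    if per_class is not None:
        return per_class

    # Level 2: static per-class table.
    if component_class in table:
        return table[component_class]

    # Level 3: uniform environment override (variable name is an assumption).
    uniform = _positive_seconds(env.get("AGENT_TIMEOUT"))
    if uniform is not None:
        return uniform

    # Level 4: hard-coded fallback.
    return FALLBACK_TIMEOUT_S
```

Passing `env` explicitly keeps the lookup testable without touching the process environment.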

### DSL bridge (`internal/agent/dsl/`)
- `normalize.go`: v1↔v2 collapsed into a single wire format — Python and
Go consume the same stored JSON.
- `reset.go`: per-run state reset matches Python's `Canvas.reset()`
semantics.
- Testdata mirrors Python's `agent_msg.json` / `all.json` / etc.

### Runtime (`internal/agent/runtime/`)
- `CanvasState` / `NewCanvasState` / `GetVar` / `SetVar` / `ReadVars`:
same `{{cpn_id@param}}` resolution model.
- `ResolveTemplate` (regex fast path + gonja fallback) — Python
Jinja-style semantics.
- `selector.go`, `metrics.go`, `component.go`: shared runtime contracts.
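
As a rough picture of the regex fast path for `{{cpn_id@param}}` references, the sketch below substitutes references from a flat lookup table. It is written in Python for illustration only; the function name, the variable store shape, and the choice to raise on an unknown reference are assumptions, and the gonja fallback for full Jinja-style templates is not modeled.

```python
import re
from typing import Mapping, Tuple

# Matches {{cpn_id@param}}, allowing optional whitespace inside the braces.
_REF = re.compile(r"\{\{\s*([^{}@\s]+)@([^{}@\s]+)\s*\}\}")


def resolve_template(template: str, variables: Mapping[Tuple[str, str], object]) -> str:
    """Replace each {{cpn_id@param}} with the stored value for (cpn_id, param)."""

    def substitute(match: "re.Match[str]") -> str:
        key = (match.group(1), match.group(2))
        if key not in variables:
            # Assumption: an unresolvable reference is an error, not an empty string.
            raise KeyError(f"unresolved reference: {match.group(0)}")
        return str(variables[key])

    return _REF.sub(substitute, template)
```

For example, `resolve_template("Hello {{begin@name}}", {("begin", "name"): "Ada"})` yields `"Hello Ada"`.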

## Out of scope (intentionally)

- **`Retrieval` component logic** — wrapped only; full parity lands in a
follow-up PR.
- **Frontend** — only minor dsl-bridge / canvas UX fixes ride along.
- **CLI / admin / model registry** — orthogonal to agent behavior.

## How alignment is verified

`internal/service/agent_run_e2e_test.go` exercises the **full production
chain** against real Python-shaped DSL fixtures:
```
loadCanvasForUser → versionDAO.GetLatest → decodeCanvasFromDSL →
canvas.Compile → cc.Workflow.Invoke → answer extraction
```
using in-memory SQLite + miniredis (no Docker). Covers:
- `TestRunAgent_RealCanvas_BeginMessage` — happy path, `{{sys.query}}`
resolution
- `TestRunAgent_RealCanvas_WaitForUserResume` — two-run resume cycle
(Python-equivalent)
- `TestRunAgent_RealCanvas_CompileFails` — unknown component name →
sanitized error (Python-equivalent)
- `TestRunAgent_RealCanvas_InvokeFails` — unresolvable template ref
(Python-equivalent)
- `TestRunAgent_RunTracker_AttachCheckpoint_CallSequence` —
Start→AttachCheckpoint→MarkSucceeded lifecycle

`internal/handler/agent_test.go` — SSE streaming parity (`Content-Type:
text/event-stream`, `data: {…}\n\n`, trailing `data: [DONE]\n\n`,
OpenAI-compatible non-stream `choices`).
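
The SSE framing checked by the handler test can be illustrated with a small generator. This is a Python sketch, not the Go handler; the payload fields in the usage example are hypothetical, and only the `data: {…}\n\n` frame shape and the trailing `data: [DONE]\n\n` come from the description above.

```python
import json
from typing import Iterable, Iterator, Mapping


def sse_frames(events: Iterable[Mapping[str, object]]) -> Iterator[str]:
    """Yield one `data:` frame per event, then the terminating [DONE] frame."""
    for event in events:
        # Each frame is a single data line followed by a blank line.
        yield f"data: {json.dumps(event, ensure_ascii=False)}\n\n"
    yield "data: [DONE]\n\n"
```

A client can split the stream on blank lines, strip the `data: ` prefix, and stop when it sees `[DONE]`:

```python
frames = list(sse_frames([{"event": "messages", "content": "hi"}]))
```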

`internal/agent/canvas/fixture_compile_test.go` + per-component tests
pin the Python-equivalent outputs.

```
go test -count=1 -v -run 'TestRunAgent_RealCanvas|TestRunAgent_RunTracker' ./internal/service/
```

## Design reference

`docs/develop/agent-go-port-design.md` (1329 lines, last cross-checked
2026-06-17) — module layout, per-component / per-tool inventory,
corner-case catalogue, and the actionable backlog (Section 14, including
the retrieval alignment follow-up).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-22 11:58:29 +08:00
buua436
b409cfc3d5 feat: add dingtalk chat channel (#16183)
### What does this PR do?
This PR adds a new DingTalk chat channel integration and hardens the
inbound callback path.

### Summary
- Adds DingTalk as a selectable chat channel in the UI and backend
channel registry.
- Adds the DingTalk chat channel icon asset.
- Acknowledges DingTalk Stream callbacks and deduplicates repeated
inbound messages to avoid duplicate replies.
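
One common way to deduplicate repeated inbound callbacks is a short-lived set of seen message IDs. The sketch below shows that idea in Python; it is not the implementation in this PR, and the class name, the TTL value, and the use of a message ID as the dedup key are all assumptions.

```python
import time
from typing import Callable, Dict


class MessageDeduplicator:
    """Remember message IDs for a limited time so retried deliveries are ignored."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_time(self, message_id: str) -> bool:
        """Return True only the first time an ID is seen within the TTL window."""
        now = self._clock()
        # Drop expired entries so the table does not grow without bound.
        self._expires_at = {k: v for k, v in self._expires_at.items() if v > now}
        if message_id in self._expires_at:
            return False
        self._expires_at[message_id] = now + self._ttl_s
        return True
```

A callback handler would acknowledge every delivery but only generate a reply when `first_time(...)` returns `True`.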
2026-06-18 20:06:00 +08:00
Wang Qi
5ca1686ac7 Fix that agent cannot be the same name (#16192)
Fix that agent cannot be the same name
2026-06-18 19:10:21 +08:00
buua436
a2de7d0060 fix: chat channel defaults and feishu shutdown (#16176)
This PR keeps the chat-channel default values and Feishu shutdown behavior consistent after the rebase.
2026-06-18 17:44:48 +08:00
euvre
72db9044e2 fix: use RESTful pipeline detail API with knowledgeId and logId (#16182)
The pipeline file log detail hook (`useFetchPipelineFileLogDetail`) was
calling the legacy `kbService.getPipelineDetail({ log_id })` endpoint,
which does not match the current RESTful API contract. The backend now
expects both `datasetId` and `logId` to construct the correct URL (`GET
/api/v1/datasets/{datasetId}/ingestions/{logId}`).
2026-06-18 16:24:35 +08:00
Wang Qi
b47af3b5de Fix search rename error with multiple error message (#664) (#16186) 2026-06-18 15:51:41 +08:00
balibabu
a9021528c3 Fix: Lint error. (#16172)
### What problem does this PR solve?

Fix: Lint error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-18 13:14:18 +08:00
buua436
ea70663f09 feat: support wecom websocket channel (#16175)
Added WeCom chat channel websocket mode alongside the existing webhook mode, plus frontend support for selecting the connection type.
2026-06-18 13:10:09 +08:00
Wang Qi
99a25dca34 Fix Chat/Search/Agent bot show image (#16152)
Fix Chat/Search/Agent bot show image
2026-06-18 09:38:31 +08:00
balibabu
cf7b06c0f3 Fix: A pipeline created from a template fails immediately upon execution with a "hierarchy does not exist" error. (#16151)
### What problem does this PR solve?

Fix: A pipeline created from a template fails immediately upon execution
with a "hierarchy does not exist" error.
2026-06-17 19:07:04 +08:00
buua436
43d121ad38 feat: add qqbot chat channel (#16140)
### What problem does this PR solve?
Adds qqbot as a built-in chat channel so it can be discovered and
started by the channel bootstrapper and shown in the chat channel
settings UI.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2026-06-17 18:49:38 +08:00
balibabu
70f319c536 Fix: The pipeline created from the template fails immediately upon execution. (#16149)
### What problem does this PR solve?

Fix: The pipeline created from the template fails immediately upon
execution.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 17:03:17 +08:00
chanx
9302233b95 fix: misc frontend fixes for agent log, login, search settings (#16137)
### What problem does this PR solve?

fix: misc frontend fixes for agent log, login, search settings
- agent-log: restore server-side pagination on export and search;
replace hardcoded labels with i18n keys; switch container to
text-text-primary
- login: validate register nickname against NICKNAME_PATTERN with
reusable setting i18n
- next-search: align llm_setting schema with chat
(LlmSettingFieldSchema + LLMIdFormField nested,
LlmSettingEnabledSchema at form root) so the slider Switch reads the
correct path; strip *Enabled flags before submit to avoid backend
"Unrecognized field name" errors
- locales: add common.reset (zh/en)
- skills/go-naming: fix relative link to rules/named.md

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 16:20:26 +08:00
balibabu
3247e353c7 Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. (#16134)
### What problem does this PR solve?

Fix: The .docx file is not displaying fully; the hierarchy of the
pipeline created from the template is missing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 16:18:47 +08:00
Wang Qi
b3ac03b96c Set default Paddle OCR URL (#16128)
Set default Paddle OCR URL
2026-06-17 14:29:20 +08:00
buua436
486b28c409 fix: show telegram chat channel (#16125)
### What problem does this PR solve?
Show Telegram in the chat channel picker alongside the existing Discord
and Feishu entries.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 14:18:16 +08:00
Zhichang Yu
e45659868a feat(agent): ship the Go agent canvas port — eino interrupt/resume + Redis check-pointing (#16035)
Replaces the Python agent canvas runtime with a Go implementation that
runs inside `cmd/server_main`.

The canvas compiles into an eino Workflow that pauses on wait-for-user
via native Interrupt/Resume (no sentinel flag) and resumes from a
Redis-backed CheckPointStore.

All 21 Python agent components and ~35 tools are ported with functional
parity.

Sandbox providers now read their JSON config from the admin-panel
system_settings table with env fallback.

234 files / +35,413 / -6,111. All Go files are gofmt-clean (CI gate
added). Also drops the v2 DSL E2E step and the gap-analysis plan (both
redundant now that the port ships).

## Type of change

- [x] Refactoring
- [x] New feature
- [x] Bug fix

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-17 13:24:03 +08:00
balibabu
5de00bdf50 Fix: Importing the MCP dialog causes duplicate submissions. (#16037)
### What problem does this PR solve?

Fix: Importing the MCP dialog causes duplicate submissions.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-17 09:49:51 +08:00
Rander
1235da7093 refactor(paddleocr): migrate from sync API to async Job API (#15967)
## Summary

Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.

## Changes

### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)
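
The polling schedule above (initial 3s, ×1.5, max 15s, bounded by a total timeout) can be sketched as a delay generator plus a small poll loop. This is an illustrative Python sketch, not the code in `paddleocr_parser.py`; the function names and the `check` callback are assumptions.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(
    timeout_s: float, initial: float = 3.0, factor: float = 1.5, max_delay: float = 15.0
) -> Iterator[float]:
    """Yield sleep intervals that grow by `factor`, cap at `max_delay`, and sum to `timeout_s`."""
    elapsed = 0.0
    delay = initial
    while elapsed < timeout_s:
        step = min(delay, timeout_s - elapsed)  # never sleep past the deadline
        yield step
        elapsed += step
        delay = min(delay * factor, max_delay)


def poll_until_done(
    check: Callable[[], Optional[T]],
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `check` until it returns a non-None result or the time budget is spent."""
    result = check()
    if result is not None:
        return result
    for delay in backoff_delays(timeout_s):
        sleep(delay)
        result = check()
        if result is not None:
            return result
    raise TimeoutError(f"job did not finish within {timeout_s} seconds")
```

With a 30-second budget the intervals are 3, 4.5, 6.75, and 10.125 seconds, followed by a final 5.625-second wait that ends exactly at the deadline.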

### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`

### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)

### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config

## Backward Compatibility

- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups
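
The fallback behavior can be pictured as an ordered lookup over the new and legacy keys. The Python sketch below is an assumption about shape only: the function name is hypothetical, and the relative priority of the config field versus the environment variable within each pair is a guess, since the description only says the `base_url` names are preferred over the `api_url` names.

```python
from typing import Mapping, Optional


def resolve_paddleocr_base_url(
    config: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """Return the first non-empty URL, preferring the new base_url names."""
    candidates = (
        (config, "paddleocr_base_url"),  # new config field
        (env, "PADDLEOCR_BASE_URL"),     # new env var
        (config, "paddleocr_api_url"),   # legacy config field (fallback)
        (env, "PADDLEOCR_API_URL"),      # legacy env var (fallback)
    )
    for source, key in candidates:
        value = (source.get(key) or "").strip()
        if value:
            return value
    return None
```

An existing setup that only defines `PADDLEOCR_API_URL` still resolves to that value, which is the compatibility property the list above describes.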

## Why not use the `paddleocr` SDK package directly?

RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly over
HTTP, following the same logic the SDK uses internally.
2026-06-16 19:34:21 +08:00
Wang Qi
8067e97f0d Refactor: rename /chat_channels to /chat-channels (#16099) 2026-06-16 19:15:43 +08:00
Kevin Hu
15f50e5cb2 fix: rename dialog_id to chat_id in chat_channel (backend + frontend) (#16096)
## Summary

- The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id`
via a migration (added in a prior commit).
- Aligns the REST API layer (`chat_channel_api.py`,
`chat_channel_service.py`) to use `chat_id` consistently.
- Updates the frontend (`interface.ts`, `hooks.ts`,
`connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write
`chat_id` instead of `dialog_id`.
- The joined `dialog_name` alias in the list query is unchanged (backend
still returns it under that name).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 19:02:20 +08:00
chanx
cac87d7f77 fix: remove unnecessary 'asChild' prop from FilterButton component (#16094)
### What problem does this PR solve?

fix: remove unnecessary 'asChild' prop from FilterButton component

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-16 17:55:04 +08:00
chanx
ff2e76e77c fix: remove unnecessary div in profile page layout (#16091)
### What problem does this PR solve?

fix: remove unnecessary div in profile page layout

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-16 17:42:29 +08:00