Replaces the Python agent canvas runtime with a Go implementation that
runs inside `cmd/server_main`.
The canvas compiles into an eino Workflow that pauses on wait-for-user
via native Interrupt/Resume (no sentinel flag) and resumes from a
Redis-backed CheckPointStore.
All 21 Python agent components and ~35 tools are ported with functional
parity.
Sandbox providers now read their JSON config from the admin-panel
system_settings table with env fallback.
234 files / +35,413 / -6,111. All Go files are gofmt-clean (CI gate
added); drops the v2 DSL E2E step and the gap-analysis plan (both
redundant after the port ships).
## Type of change
- [x] Refactoring
- [x] New feature
- [x] Bug fix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
### What problem does this PR solve?
| # | Method | Endpoint | Description | Git Equivalent |
|---|--------|----------|-------------|----------------|
| 1 | `POST` | `/api/v1/{prefix}/{folder_id}/commits` | Create a
snapshot commit with file changes (add/modify/delete/rename) | `git add`
+ `git commit` |
| 2 | `GET` | `/api/v1/{prefix}/{folder_id}/commits` | List commit
history (paginated) | `git log` |
| 3 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` | Get
commit detail with file changes | `git show` |
| 4 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` |
List file changes in a commit | `git show --name-status` |
| 5 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` | Compare
two commits and return differences | `git diff` |
| 6 | `GET` | `/api/v1/{prefix}/{folder_id}/changes` | Get uncommitted
changes (add/modify/delete) | `git status` |
| 7 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` |
Get the folder tree snapshot at commit time | `git ls-tree` |
| 8 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content`
| Get a file's content as it existed in a specific commit | `git show
HEAD:file` |
| 9 | `GET` | `/api/v1/{prefix}/{file_id}/versions` | Get version
history for a specific file across all commits | `git log -- file` |
Where `{prefix}/{id}` can be:
- `folders/{folder_id}` — direct folder access
- `workspaces/{workspace_id}` — alias of `folders/{folder_id}`
- `datasets/{dataset_id}` — resolves to the dataset's folder
- `memories/{memory_id}` — resolves to the memory's folder
- `skills/{skill_id}` — resolves to the skill's folder
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
The Profile **Name** field currently lacks application-level validation
and allows users to save excessively long names and unsupported special
characters.
While the database enforces a maximum length of 100 characters, neither
the frontend nor backend validates nickname format before persistence.
This can result in inconsistent user data, poor user experience, and UI
layout issues when long names wrap across multiple lines.
This PR introduces consistent frontend and backend validation for
profile names, enforces length and character constraints, provides clear
validation feedback, and prevents invalid values from being saved.
Fixes#15693
### Type of change
* [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
`POST /api/v1/datasets/{dataset_id}/documents/stop`
(`stop_parse_documents`) cancels parsing tasks and sets `run` to
`CANCEL`, but it does **not** remove chunks already indexed in the doc
store or reset `progress` / `chunk_num`. REST callers can end up with a
“cancelled” document that still returns partial chunks in `GET
.../chunks` and in retrieval.
Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`)
already performs full cleanup: it resets counters and calls
`docStoreConn.delete`. This PR aligns the newer stop endpoint with that
behavior so both paths leave the dataset consistent.
Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Update `stop_parse_documents` in `document_api.py` to reset `progress`
and `chunk_num` to `0` and delete partial chunks via
`docStoreConn.delete` after `cancel_all_task_of`.
- Add unit test `test_stop_parse_documents_cleans_partial_chunks` to
assert counters reset and doc store delete is invoked.
### Test plan
- [x] Unit test: `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks
-v`
- [ ] Manual: upload a slow document, start parse, call `POST
.../documents/stop` while `RUNNING`, verify `GET .../chunks` returns
zero chunks and UI `chunk_count` is 0
- [ ] Control: legacy `DELETE .../chunks` behavior unchanged
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
## Summary
Fixes [#15585](https://github.com/infiniflow/ragflow/issues/15585).
- Route markdown preview through the shared `request` client (same as
txt/image previewers) so `Authorization` headers and interceptors are
applied consistently.
- Add a unit test covering `AUTH_BETA` token loading for embedded search
auth.
## Root cause
Search result preview for `.md`/`.mdx` used raw `fetch`, which did not
apply the same auth path as other preview types. That led to `401` on
`GET /api/v1/documents/{id}/preview` even when the user was logged in or
using an embedded search `auth` query param.
## Test plan
- [ ] Log in, run a search, open a markdown citation link — preview
loads (no 401).
- [ ] Open an embedded shared search URL with `auth` query param,
preview a markdown file — preview loads.
- [ ] Confirm PDF/txt preview still works in the same search UI.
---------
Co-authored-by: MkDev11 <89318445+bitloi@users.noreply.github.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Refine the stream parsing for `<think>` / `</think>` so MiniMax and
DeepSeek-style chunking both flush in the right order without mixing
think and answer buffers.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Infer `Content-Type` from the stored document filename on SDK download
routes.
- Covers `GET /api/v1/datasets/<dataset_id>/documents/<document_id>` and
`GET /api/v1/documents/<document_id>`.
- Aligns with REST preview/download via `CONTENT_TYPE_MAP`.
## Test plan
- [x] `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_download_mimetype_from_filename`
- [x] Manual: `curl -sSI` on SDK dataset document download for a PDF;
expect `Content-Type: application/pdf`
Fixes#15112.
### What problem does this PR solve?
Closes#15388.
Chat completion routes did not reliably honor per-request generation
settings:
- `/api/v1/chat/completions` copied generation settings with a
truthiness check, so valid zero values such as `temperature: 0`, `top_p:
0`, `frequency_penalty: 0`, `presence_penalty: 0`, and `max_tokens: 0`
were dropped.
- `/api/v1/openai/{chat_id}/chat/completions` did not forward standard
generation settings into the request-specific dialog LLM settings before
calling `async_chat`.
This PR preserves explicitly supplied generation parameters, including
zero values, and merges request-level overrides into existing dialog
settings where appropriate.
The supported generation parameter keys and merge behavior live in a
shared REST API helper to keep both completion routes aligned.
Validation:
- `git diff --check`
- `python3 -m py_compile api/apps/restful_apis/_generation_params.py
api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `uv run ruff check api/apps/restful_apis/_generation_params.py
api/apps/restful_apis/chat_api.py api/apps/restful_apis/openai_api.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `ZHIPU_AI_API_KEY=dummy uv run pytest
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py
-q -k generation_params`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#15115.
`GET /api/v1/documents/images/<image_id>` returned **Image not found**
when the thumbnail storage object key contained hyphens (e.g.
`page-1.png`). Document APIs build URLs as `{dataset_id}-{thumbnail}`,
but `get_document_image()` used `image_id.split("-")` and required
exactly two segments, so keys like `<kb_id>-page-1.png` were rejected
even though the blob existed.
This PR splits only on the first hyphen (`split("-", 1)`) and sets
`Content-Type` from the object key extension via `CONTENT_TYPE_MAP`
instead of hardcoding `image/JPEG`.
### Related issues
Closes#15358
<!-- After filing upstream, replace XXXX with your issue number. -->
---
### What problem does this PR solve?
`POST /api/v1/openai/<chat_id>/chat/completions` forwards `messages` to
`async_chat` without normalizing `content`. Downstream, `dialog_service`
assumes string content:
```python
re.sub(r"##\d+\$\$", "", m["content"])
```
OpenAI-compatible clients may send `content` as an **array** of parts
(text, `image_url`, etc.), including text-only arrays. That causes
`TypeError` and HTTP **500** instead of a valid response or a clear
**400**.
`openai_api.py` also reads `messages[-1]["content"]` directly for
`prompt` without handling list-shaped content.
This PR normalizes array `content` to a string (concatenating `type:
text` parts) before calling `async_chat`, matching a minimal
OpenAI-compat path. Image parts can be documented as unsupported or
handled in a follow-up if vision integration is required.
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
Closes#15180.
`OIDCClient.parse_id_token` in `api/apps/auth/oidc.py` read the JWT
signing
algorithm from the **unverified** JWT header and passed it through to
`jwt.decode(..., algorithms=[alg], ...)` as the trust anchor. This is
the
textbook JWT algorithm-confusion vulnerability (CWE-345 / CWE-347). Any
unauthenticated client capable of reaching the OIDC callback could take
over
an arbitrary account on any RAGFlow deployment with OIDC login enabled:
1. **`alg: "none"`** — present a JWT with `{"alg": "none"}` and no
signature segment → `jwt.decode(..., algorithms=["none"])` → PyJWT's
`NoneAlgorithm` accepts the token without verification → login as any
user.
2. **RSA / HMAC confusion** — fetch the public RSA key from the
provider's
JWKS (it's public), forge a JWT with `{"alg": "HS256"}` HMAC-signed
using the public-key bytes as the secret → `jwt.decode(...,
algorithms=["HS256"], key=public_key)` → verifier accepts → login as
any user. (Modern PyJWT independently refuses to use a PEM-formatted
key as an HMAC secret, which mitigates this leg for PEM key formats;
the fix here is the only mitigation for raw / DER / JWK octet keys and
for older PyJWT versions.)
### What changed
**`api/apps/auth/oidc.py`:**
- New module constants `_ALLOWED_OIDC_SIGNING_ALGS` (asymmetric-only:
`RS*`, `ES*`, `PS*`, `EdDSA` — explicitly excludes `none` and `HS*`)
and `_DEFAULT_OIDC_SIGNING_ALGS = ("RS256",)` (the OIDC Core 1.0 §2
spec default).
- New helper `_resolve_id_token_signing_algs(metadata)` — intersects the
provider's advertised `id_token_signing_alg_values_supported` from
`/.well-known/openid-configuration` with the safe allowlist; falls back
to RS256 when the field is missing or contains only unsafe values.
- `OIDCClient.__init__` now stores the resolved allowlist on
`self.id_token_signing_algs` — pinned once, from a trusted source, at
construction time.
- `parse_id_token` no longer calls `jwt.get_unverified_header` and no
longer reads `alg` from the JWT header. It passes
`self.id_token_signing_algs` to `jwt.decode(..., algorithms=...)`.
`PyJWKClient.get_signing_key_from_jwt` still reads the `kid` from the
header internally for JWKS lookup — that's fine, `kid` is not a
security decision; the signature still proves which key was actually
used.
**`test/testcases/test_web_api/test_auth_app/test_oidc_client_unit.py`:**
- Existing `test_parse_id_token_success_and_error` drops its
`jwt.get_unverified_header` mock (no longer called by `parse_id_token`).
- `_metadata` and `_make_client` helpers grew an optional `signing_algs`
parameter so tests can configure what the discovery document advertises.
- New `TestSSRFValidation` / algorithm-confusion regression block (7
tests):
- `test_id_token_signing_algs_default_to_rs256_when_metadata_missing`
- `test_id_token_signing_algs_intersect_metadata_with_safe_allowlist`
- `test_id_token_signing_algs_fall_back_when_only_unsafe_advertised`
- `test_id_token_signing_algs_ignores_non_string_entries`
- `test_id_token_signing_algs_handles_non_list_metadata_field`
- `test_parse_id_token_passes_pinned_algorithms_to_jwt_decode` —
sabotages `jwt.get_unverified_header` to raise on call, proving the
verification path never consults the unverified header.
- `test_parse_id_token_rejects_alg_none` — uses real PyJWT to encode an
`alg: "none"` token; `parse_id_token` raises `ValueError("Error
parsing ID Token: …")` instead of accepting it.
- `test_parse_id_token_rejects_hs256_when_allowlist_is_asymmetric` —
uses real PyJWT to forge an `alg: "HS256"` token with a non-PEM
shared secret (so PyJWT's incidental PEM-as-HMAC refusal isn't what
blocks it); `parse_id_token` raises because `HS256` is not in the
pinned allowlist.
Sanity-checked end-to-end with real PyJWT outside the project test
runner:
- `alg=none` forged token + `algorithms=["RS256"]` →
`InvalidAlgorithmError` ✓
- `alg=HS256` forged token + `algorithms=["RS256"]` →
`InvalidAlgorithmError` ✓
- Same `alg=HS256` token + `algorithms=["HS256"]` → **accepted**
({'sub': 'admin'})
— confirming the attack path was real before the fix.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: galuis116 <contact@duerrimports.com>
### What problem does this PR solve?
default OpenAI chat completions to non-stream when `stream` is omitted
https://github.com/infiniflow/ragflow/issues/15356
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Python implementation of the Go-based model_provider API suite.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: bill <yibie_jingnian@163.com>
### What problem does this PR solve?
Closes#15187.
RAGFlow shipped a Slack connector
(`common/data_source/slack_connector.py`) but it was never usable:
`Slack._generate()` in the sync worker was a `pass` stub, the
connector's document-generating code was incompatible with the current
data model,
and Slack was commented out of the data-source settings UI. As a result,
teams had no way to index Slack channels/threads into a knowledge base.
This PR completes the connector end to end.
**Backend**
- `common/data_source/slack_connector.py`
- Rewrote `thread_to_doc` to produce a blob-based `Document`
(`extension`/`blob`/`size_bytes`). The previous implementation built the
doc with a `sections=[...]` argument and omitted the now-required
`blob`/`extension`/ `size_bytes` fields, so it raised a validation error
against the current `Document` model. Thread messages are now cleaned
and flattened into a single UTF-8 text blob.
- Added `load_from_state()` / `poll_source(start, end)` generators. The
connector's checkpoint interface is a no-op stub, so both full and
incremental syncs run through a single channel-iterating generator built
on the existing module helpers (`get_channels`, `filter_channels`,
`get_channel_messages`, `_process_message`), with per-channel thread
de-duplication.
- `rag/svr/sync_data_source.py`
- Implemented `Slack._generate()`. Credentials are loaded via
`StaticCredentialsProvider` (the connector requires `slack_bot_token`
and does not support `load_credentials`). Supports full reindex and
incremental polling from `poll_range_start`, plus the optional channel
filter. Modeled on the Confluence/Dropbox wrappers.
- `SlackConnector` was already exported from
`common/data_source/__init__.py`.
**Frontend (`web/`)**
- Enabled the `SLACK` data-source enum and added its form fields (Slack
bot token + optional channel filter), default values, display metadata,
and a Slack icon.
- Added `slackDescription` / `slackBotTokenTip` / `slackChannelsTip`
strings to `en.ts` and `zh.ts`.
**Tests**
- `test/unit_test/data_source/test_slack_connector_unit.py`: unit tests
covering credential loading (`load_credentials` raises,
`set_credentials_provider` initializes clients, missing credentials
raises) and document generation (standalone message + flattened thread,
blob/extension/size_bytes/metadata, and the incremental poll time
window). All 5 pass; `ruff check` is clean.
Required Slack scopes: `channels:read`, `channels:history`,
`users:read`.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
Creating or updating an agent via `POST /api/v1/agents` and `PUT
/api/v1/agents/{agent_id}` did not persist `canvas_type` because the
handler `req` dict never assigned the field before
`UserCanvasService.save` / `update_by_id`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Cursor <cursoragent@cursor.com>
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
Follow on PR #15146 to reslove the backwad compatability issue.
1. /agents/<attachment_id>/download ->
/agents/attachments/<attachment_id>/download
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
This change fixes ingestion quality issues where MinerU parser output
may contain HTML fragments (for example, table-related tags like `<tr>`,
`<td>`, `<br>`), which were previously passed directly into
chunking/tokenization and degraded chunk quality.
The fix adds a sanitization step in the MinerU parser path so parsed
sections are normalized to clean text before chunking.
## Change Type (select all)
- [x] Bug fix
- [x] Ingestion pipeline improvement
- [x] Parser/chunking quality fix
## Related Issue
- https://github.com/infiniflow/ragflow/issues/14831
### What problem does this PR solve?
1. Fix /chat/completions to send only the latest message
2. Allo chat stream=False
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: /openai/<chat_id>/chat/completions not aware of session_id
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Closes#14789
### What problem does this PR solve?
User API endpoints (`login`, `user_profile`, `user_add`,
`forget_reset_password`) were returning full user objects via
`to_json()` / `to_dict()`, which included sensitive fields like
`password` and `access_token` in the response body. This leaks
credentials to the client.
This PR adds a `to_safe_dict()` method on the `User` model that strips
sensitive fields (`password`, `access_token`) and replaces all affected
call sites to use it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Enhance retry and timeout, and adjust the default timeout
2. NER: spacy do not batch chunks
3. extract _has_cancel_and_exit
4. enhance log messages
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
implement ASR and TTS for Xinference
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
Closes#15048.
Several SDK session routes in `api/apps/sdk/session.py` called
`.split()` directly on `request.headers.get("Authorization")`. When
clients omitted the header, the handlers raised `AttributeError` before
returning the existing `Authorization is not valid!` response.
This PR centralizes SDK Authorization parsing in a small helper and
keeps the existing error response for missing, empty, or malformed
headers.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Tests
- `ZHIPU_AI_API_KEY=dummy uv run --python 3.13 --group test pytest
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py::test_sdk_session_routes_missing_authorization_unit
-q`
- `uv run --python 3.13 --group test ruff check api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `python3 -m py_compile api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `git diff --check`
## Summary
- Align **GET `/api/v1/documents/<doc_id>/download`** with
**`/preview`**: resolve extension and MIME type from the stored document
name when the **`ext` query parameter is omitted**, instead of
defaulting to `markdown`.
- When **`?ext=`** is present, behavior stays the same as before
(explicit extension / `Content-Type` mapping).
- Enforce the same access + document lookup pattern as preview
(**`accessible`** + **`get_by_id`**).
- Extend unit tests for the no-`ext` PDF filename case.
## Test plan
- [x] `uv run pytest
test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_download_attachment_success_and_exception_unit`
- [x] Optional: `curl -sSI` against
`/api/v1/documents/<pdf_doc_id>/download` without `ext` and confirm
`Content-Type: application/pdf`
Fixes#15052.