## Summary
- harden reopened advisory fixes across REST connector, invoke, document
downloads, and markdown rendering
- add targeted regression coverage for redirect-safe SSRF handling,
invoke SSRF checks, document access control, and markdown sanitization
- verify each referenced GHSA against the original GitHub advisory text
and align the closed-advisory plan with the implemented remediation
## What changed
- add tenant access checks to document download endpoints to avoid
cross-tenant document disclosure
- add per-hop SSRF validation, DNS pinning, redirect handling, and
redirect limits to the REST API connector
- ensure invoke requests validate and pin the resolved host and never
follow redirects implicitly
- keep the generic rate-limited request path wrapped, not just GET and
POST helpers
- sanitize markdown HTML before rendering in the highlight markdown
component
## Validation
- `cd web && npm test -- --runInBand
src/components/highlight-markdown/__tests__/index.test.tsx`
- `.venv/bin/python -m pytest -q
test/unit_test/data_source/test_rest_api_connector.py`
- targeted `test/testcases/test_web_api/...` unit additions were
reviewed, but the suite cannot be executed end-to-end in this
environment because parent `test/testcases/conftest.py` requires a local
service on `127.0.0.1:9380`
## Notes
- all GHSA entries referenced by the plan were checked against the
original GitHub advisory text, not sampled
- the closed-advisory plan document was updated locally during review,
but is intentionally not included in this PR
### What problem does this PR solve?
The `get_ingestion_log` endpoint (both Python
`dataset_api_service.get_ingestion_log` and Go
`DatasetService.GetIngestionLog`) was returning only the
**dataset-level** field set, which omits critical fields such as `dsl`,
`document_id`, `parser_id`, `document_name`, `pipeline_id`, etc.
This caused the front-end **dataflow-result page** to be unable to
render the pipeline timeline and chunks when viewing a single ingestion
log, regardless of whether the log was a dataset-level operation
(graph/raptor/mindmap) or a per-file parse.
### Background
`PipelineOperationLogService` provides two field sets:
| Method | Fields |
|---|---|
| `get_dataset_logs_fields` | Minimal set (progress, status, timestamps,
etc.) |
| `get_file_logs_fields` | Superset — includes `document_id`, `dsl`,
`parser_id`, `document_name`, `pipeline_id`, … |
When listing logs, the API correctly distinguishes dataset-level vs
file-level logs and uses the appropriate converter. However, when
**fetching a single log by ID**, both the Python and Go implementations
were hardcoded to the dataset-level set, dropping the extra fields that
the front-end needs.
## Summary
- The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id`
via a migration (added in a prior commit).
- Aligns the REST API layer (`chat_channel_api.py`,
`chat_channel_service.py`) to use `chat_id` consistently.
- Updates the frontend (`interface.ts`, `hooks.ts`,
`connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write
`chat_id` instead of `dialog_id`.
- The joined `dialog_name` alias in the list query is unchanged (backend
still returns it under that name).
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
Adds a legacy mode for /chat/completions that restores v0.23.0-style
output by converting start_to_think/end_to_think back into raw
<think></think> markers and streaming cumulative answer text.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat:
- Allow upsert model_type for instance model
Fix:
- Allow create instance with duplicate api_key
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
Guard the agent-attachment download against a missing or empty storage blob so the caller gets a structured 4xx (`Document not found!`) instead of an HTTP 500. Same bug class as #15365 on document preview.
Resolve#15502
### What problem does this PR solve?
| # | Method | Endpoint | Description | Git Equivalent |
|---|--------|----------|-------------|----------------|
| 1 | `POST` | `/api/v1/{prefix}/{folder_id}/commits` | Create a
snapshot commit with file changes (add/modify/delete/rename) | `git add`
+ `git commit` |
| 2 | `GET` | `/api/v1/{prefix}/{folder_id}/commits` | List commit
history (paginated) | `git log` |
| 3 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` | Get
commit detail with file changes | `git show` |
| 4 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` |
List file changes in a commit | `git show --name-status` |
| 5 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` | Compare
two commits and return differences | `git diff` |
| 6 | `GET` | `/api/v1/{prefix}/{folder_id}/changes` | Get uncommitted
changes (add/modify/delete) | `git status` |
| 7 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` |
Get the folder tree snapshot at commit time | `git ls-tree` |
| 8 | `GET` |
`/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content`
| Get a file's content as it existed in a specific commit | `git show
HEAD:file` |
| 9 | `GET` | `/api/v1/{prefix}/{file_id}/versions` | Get version
history for a specific file across all commits | `git log -- file` |
Where `{prefix}/{id}` can be:
- `folders/{folder_id}` — direct folder access
- `workspaces/{workspace_id}` — alias of `folders/{folder_id}`
- `datasets/{dataset_id}` — resolves to the dataset's folder
- `memories/{memory_id}` — resolves to the memory's folder
- `skills/{skill_id}` — resolves to the skill's folder
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
#15844
Adds a **Chat channels** capability so a RAGFlow assistant (Dialog) can
be exposed as a bot on external messaging platforms (Feishu/Lark,
Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot
in the UI, connects it to an assistant, and inbound messages are
answered from that assistant's knowledge base — replies are delivered
back on the channel.
**Feishu/Lark is implemented and tested end-to-end.** Discord, Telegram,
LINE, and WeCom are scaffolded against the same interface; the remaining
listed channels are tracked as follow-ups.
### Design
**Backend**
- New `chat_channel` table (`tenant_id`, `name`, `channel`, `config`
JSON holding `{credential: {...}}`, `dialog_id`, `status`) +
`ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`.
- Channel framework under `api/channels/`: a `core` registry +
per-channel packages that self-register a builder and implement a common
`Channel` interface (`start`/`stop`/`send` + inbound normalization) over
`IncomingMessage`/`OutgoingMessage`.
- Embedded **reconcile loop** in `ragflow_server`
(`api/channels/bootstrap.py`): loads enabled bots, and
starts/stops/restarts them as rows change (no server restart needed).
Inbound messages run the connected dialog via the non-streaming
completion path, keeping per-end-user conversation history.
- Missing optional channel SDKs degrade gracefully (channel skipped with
a warning; others unaffected). Channel-level errors are logged, not
crashed.
- Feishu's WebSocket client runs in a dedicated thread with its own
event loop to avoid cross-loop/contextvars conflicts with the channel
runtime.
**Frontend**
- **Settings → Chat channels** panel: available-channels grid +
configured-bots list with add/edit/delete and a **Connect assistant**
popup that binds a bot to a dialog.
- Brand icons via simple-icons / reused shared data-source assets, with
colored fallbacks for brands not available.
- Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix
so the settings page no longer highlights the Chat tab.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Notes
- DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id`
is also covered by a `migrate_db` `alter_db_add_column` for existing
installs.
- Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`,
`line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies.
- Screenshots / per-channel credential docs to follow.
<img width="1338" height="1290" alt="Image"
src="https://github.com/user-attachments/assets/042cb2f9-0dad-4e6a-bcf7-43ced4bbd704"
/>
<img width="1344" height="738" alt="Image"
src="https://github.com/user-attachments/assets/373cd08e-ec40-4c67-9c51-4d948b1ba617"
/>
<img width="672" height="887" alt="Image"
src="https://github.com/user-attachments/assets/5a34953f-a9a3-4c1e-869e-5eff0dc64c84"
/>
---------
### What problem does this PR solve?
The Profile **Name** field currently lacks application-level validation
and allows users to save excessively long names and unsupported special
characters.
While the database enforces a maximum length of 100 characters, neither
the frontend nor backend validates nickname format before persistence.
This can result in inconsistent user data, poor user experience, and UI
layout issues when long names wrap across multiple lines.
This PR introduces consistent frontend and backend validation for
profile names, enforces length and character constraints, provides clear
validation feedback, and prevents invalid values from being saved.
Fixes#15693
### Type of change
* [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
This PR passes `session_id` into Langfuse trace observations so
multi-turn chat messages can be grouped under the same session in
Langfuse.
Changes include:
- Propagate `session_id` from chat/session APIs into
`dialog_service.async_chat`.
- Pass `session_id` into Langfuse `start_observation(...)`.
- Share Langfuse `trace_context` with chat, embedding, rerank, and TTS
model bundles where applicable.
- Add unit coverage to verify Langfuse observations receive
`session_id`.
- Update affected test stubs for the new optional Langfuse context
arguments.
## Related Issue
Closes: #15636
## Change Type
- [x] Feature
- [x] Bug fix
- [x] Test
- [ ] Refactor
- [ ] Documentation
- [ ] Breaking change
## Real Behavior Proof
Before this change:
- Langfuse observations were created without `session_id`.
- Multi-turn chat traces could not be grouped by session in Langfuse.
After this change:
- Chat/session flows pass `session_id` into `async_chat`.
- Langfuse observations include `session_id`.
- Related model bundles receive shared trace context and session
metadata.
Validation result:
```bash
uv run python -m py_compile \
api/db/services/tenant_llm_service.py \
api/db/services/llm_service.py \
api/db/services/dialog_service.py \
api/db/services/conversation_service.py \
api/apps/restful_apis/chat_api.py \
test/unit_test/api/db/services/test_dialog_service_final_answer.py \
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py
```
Passed.
```bash
uv run pytest \
test/unit_test/api/db/services/test_dialog_service_final_answer.py \
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py -q
```
Result:
```text
11 passed in 16.89s
```
```bash
git diff --check
```
Passed.
## Checklist
- [x] Analyzed the issue requirement.
- [x] Checked existing Langfuse trace integration.
- [x] Implemented only the requested session grouping behavior.
- [x] Added/updated unit tests.
- [x] Ran focused tests successfully.
- [x] Ran Python compile validation.
- [x] Ran whitespace diff validation.
### What problem does this PR solve?
Fix: Remove the pagination from the search and retrieval pages.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## What problem does this PR solve?
Closes#15874
Both the `POST /api/v1/datasets/<dataset_id>/chunks` (re-parse) and
`DELETE /api/v1/datasets/<dataset_id>/chunks` (stop-parsing) handlers
called `settings.docStoreConn.delete` unconditionally. When the
tenant/dataset index has not been created yet — fresh dataset, first
parse interrupted before any chunks were indexed, or index manually
removed — the delete call throws and the handler returns HTTP 500
**after** the document state was already mutated (RUNNING with zeroed
counters for the parse path; CANCEL with zeroed counters for the stop
path), leaving the document in an inconsistent state.
The newer `parse_documents` path in `document_api.py` already uses
`index_exist` before deleting:
## How to fix?
Apply the same `index_exist` guard to both call sites in `chunk_api.py`:
- **`parse`** (POST path, line ~192): guard the delete before
`TaskService.filter_delete`.
- **`stop_parsing`** (DELETE path, line ~242): guard the delete after
`DocumentService.update_by_id`.
Both sites already have the correct `search.index_name(tenant_id)` and
`dataset_id` parameters; the guard is a one-line addition at each site.
## Type of change
- [x] Bug fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
`POST /api/v1/datasets/{dataset_id}/documents/stop`
(`stop_parse_documents`) cancels parsing tasks and sets `run` to
`CANCEL`, but it does **not** remove chunks already indexed in the doc
store or reset `progress` / `chunk_num`. REST callers can end up with a
“cancelled” document that still returns partial chunks in `GET
.../chunks` and in retrieval.
Legacy `DELETE /api/v1/datasets/{dataset_id}/chunks` (`stop_parsing`)
already performs full cleanup: it resets counters and calls
`docStoreConn.delete`. This PR aligns the newer stop endpoint with that
behavior so both paths leave the dataset consistent.
Fixes [#15788](https://github.com/infiniflow/ragflow/issues/15788).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Update `stop_parse_documents` in `document_api.py` to reset `progress`
and `chunk_num` to `0` and delete partial chunks via
`docStoreConn.delete` after `cancel_all_task_of`.
- Add unit test `test_stop_parse_documents_cleans_partial_chunks` to
assert counters reset and doc store delete is invoked.
### Test plan
- [x] Unit test: `pytest
test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py::TestDocRoutesUnit::test_stop_parse_documents_cleans_partial_chunks
-v`
- [ ] Manual: upload a slow document, start parse, call `POST
.../documents/stop` while `RUNNING`, verify `GET .../chunks` returns
zero chunks and UI `chunk_count` is 0
- [ ] Control: legacy `DELETE .../chunks` behavior unchanged
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
## Summary
Fixes#15532 — `delete_datasets()` crashes with `IndexError` when a
document has no `File2Document` row.
`delete_datasets()` in `dataset_api_service.py` called
`File2DocumentService.get_by_document_id()` and immediately accessed
`f2d[0].file_id` without checking whether the lookup returned any rows.
Documents created via API ingestion or connector sync may exist without
a linked file record, causing dataset deletion to abort with HTTP 500.
This PR mirrors the existing guard already used in `file_service.py` and
`document_api_service.py`.
### What problem does this PR solve?
FIx replicate model provider failing with valid api key
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
When embedding a chatbot, the API returned `"Model Name is required"`.
The embed widget now includes the assistant's `llm_id` as `model_name`
in the completion request.
### Type of change
- [x] Bug Fix
### How has this been tested?
- Created a chatbot with a default model.
- Embedded it and sent a message – the error is gone and the assistant
replies correctly.
### Related Issue
Closes#15883
Co-authored-by: RAGFlow Dev <dev@ragflow.local>
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Two bugs in the Agent Categorize component:
1. The backend rejected `message_history_window_size = 0` while frontend
allowed it, causing API errors.
2. When calling the agent API without a `session_id`, a new session was
created but retained history from previous conversations.
### Type of change
- [x] Bug Fix
### How has this been tested?
- Issue 1: `CategorizeParam().check()` now accepts `0` and rejects
negative values.
- Issue 2: `canvas.clear_history()` is called for new sessions (no
`session_id`), ensuring fresh conversation state. Verified via UI and
API that a second call without `session_id` does not remember the first
conversation.
### Related Issue
Closes#15897
Co-authored-by: RAGFlow Dev <dev@ragflow.local>
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Fix: The dataset retrieval test returned an incorrect total number.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
Support factory models with multiple model types, so visual chat models
can be exposed as both image2text and chat while preserving the database
model-type-per-record design.
This also updates the SILICONFLOW model list and adds a helper script to
refresh SiliconFlow models from the provider API.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
When setting the API key for the BaiduYiyan provider, all model
validations fail with the error "Fail to access model using this api
key. No valid response received".
**Root cause:**
1. `BaiduYiyanChat` in `rag/llm/chat_model.py` does not override
`async_chat_streamly()`. The `verify_api_key()` function uses
`mdl.async_chat_streamly()` to validate, but `BaiduYiyanChat` inherits
`Base.async_chat_streamly()` which uses the OpenAI client, not the Baidu
Qianfan SDK (qianfan). Since BaiduYiyan has no OpenAI-compatible
base_url, validation always fails.
2. `verify_api_key()` in `provider_api_service.py` does not format the
raw API key string into the JSON format (`{"yiyan_ak": "...",
"yiyan_sk": "..."}`) that `BaiduYiyanChat.__init__()` expects via
`json.loads(key)`.
**Fix:**
1. Add `async_chat_streamly()` method to `BaiduYiyanChat` using the
qianfan SDK, consistent with the existing `chat_streamly()` method.
2. Add BaiduYiyan API key formatting in `provider_api_service.py`
`verify_api_key()` to match the format expected by
`BaiduYiyanChat.__init__()`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Propagate `tenant_id` from memory task messages into task collection so
refactored task execution can build a valid context.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: An error message appears when accessing the agent's launch page:
"pagesize exceeds maximum value".
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
Display intl `base_url` for Tongyi-Qianwen
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Support model list for VolcEngine.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix LM-Studio provider connection verification so embedding checks await
the async wrapper correctly and log the full traceback on failures.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Normalize agent session chunk references so they are mapped through a
dedicated helper instead of duplicating the field extraction inline.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)