ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 15:31:05 +08:00

Author	SHA1	Message	Date
Zhichang Yu	195bfffb5e	fix(security): address 93 CodeQL code-scanning alerts across 61 files (#16407 ) ## Summary Resolves all 93 open alerts at https://github.com/infiniflow/ragflow/security/code-scanning by rule: \| Rule \| Count \| Treatment \| \|------\|-------\|-----------\| \| py/clear-text-logging-sensitive-data \| 23 \| Real fix — log scrubbing \| \| go/path-injection \| 15 \| Real fix where possible, suppression with rationale \| \| go/request-forgery \| 8 \| Suppression with rationale (operator-controlled URLs) \| \| go/clear-text-logging \| 10 \| Real fix — log scrubbing \| \| go/unsafe-quoting \| 5 \| Real fix — escape or refactor \| \| go/sql-injection \| 3 \| Real fix — orderby whitelist + CodeQL comment \| \| go/uncontrolled-allocation-size \| 2 \| Real fix — cap to 1024 \| \| go/incorrect-integer-conversion \| 3 \| Real fix — ParseInt + range check \| \| go/insecure-hostkeycallback \| 1 \| Real fix — known_hosts file \| \| go/disabled-certificate-check \| 2 \| Suppression with rationale \| \| go/command-injection \| 1 \| Suppression (sanitized via shq()) \| \| go/email-injection \| 1 \| Suppression with rationale \| \| go/cookie-httponly-not-set \| 1 \| Suppression (SPA bootstrap) \| \| js/stack-trace-exposure \| 1 \| Real fix — generic client message \| \| js/prototype-pollution-utility \| 1 \| Real fix — reject __proto__/constructor/prototype \| \| py/weak-sensitive-data-hashing \| 1 \| Real fix — MD5 → SHA-256 \| \| py/incomplete-url-substring-sanitization \| 3 \| Real fix — urlparse(hostname) \| \| py/paramiko-missing-host-key-validation \| 1 \| Real fix — load_system_host_keys + RejectPolicy \| \| cpp/integer-multiplication-cast-to-long \| 2 \| Real fix — cast to size_t \| ## Real fixes (with measurable security improvement) SSH host key verification (Go + Python) Replace `InsecureIgnoreHostKey()` / `paramiko.AutoAddPolicy()` with proper host key verification against a known_hosts file (configurable via `SSH_KNOWN_HOSTS` env / `known_hosts` 
config field; fail-closed when unset). Loads `~/.ssh/known_hosts` first via `load_system_host_keys()` so existing setups keep working. SQL injection in `user_canvas` Add `userCanvasOrderableColumns` whitelist + `userCanvasOrderClause` helper. Both `GetList()` and `ListByTenantIDs()` now route the user-supplied `orderby` query param through the helper, defaulting to `create_time` on miss. SQL injection in `pipeline_operation_log` Existing whitelist documented via CodeQL comment. Real SQL injection in `infinity/chunk.go:931` Escape `'` → `''` on user-controlled `questionText` before splicing into `filter_fulltext(...)` SQL filter. Real SQL injection in `elasticsearch/sql.go:75` Defense-in-depth escape on tokenizer output before splicing into `MATCH(...)`. Python code injection in `result_protocol.go` Replace raw JSON literal embedding into Python/JS expressions with base64 + `json.loads` / `JSON.parse(Buffer.from(..., 'base64').toString('utf8'))`. Eliminates both the unsafe-quoting sink and the brittleness of mixing JSON true/false/null with Python syntax. URL substring check bypass in `embedding_model.py` Replace `if "dashscope-intl.aliyuncs.com" in u` with `urlparse(u).hostname == "dashscope-intl.aliyuncs.com"` so a base_url like `https://attacker.example/?u=dashscope-intl.aliyuncs.com` cannot bypass the routing. Prototype pollution in `setNestedValue` (TS) Reject `__proto__`/`constructor`/`prototype` keys before any assignment. Integer overflow - scrypt params via `ParseInt` + non-positive check (`internal/common/password.go`) - `topN` and `n` caps to 1024 (retrieval_service.go, dataset.go) - `nallocstatesize` cast to `size_t` (cpp/re2/onepass.cc) Cookie httponly* Set explicitly with rationale: this is the OAuth bootstrap cookie intentionally read by the SPA. Stack trace exposure Replace `error.message` in HTTP 500 response with generic `"internal error"`; full error still logged server-side via `console.error`. 
Weak hashing MD5 → SHA-256 for deterministic `conv_id` derivation (`conversation_service.py`). Log scrubbing Remove or redact user-controlled / sensitive content from clear-text logs across 8 ingestion parsers, `llm_service.py` ×11, `tenant_llm_service.py` ×7, `misc_utils.py` ×4, `redis_conn.py` ×10, `conftest.py` ×4, `init_data.py`, `dataset_api_service.py`, `generator.py`, `mysql_migration.py`, `cli.go`, `user_command.go`, `pdf_parser.go`. Most patterns converted to parameterized logging (`logging.info("...: %d", n)`) or static messages. ## CodeQL suppressions (each with rationale) For alerts where the data flow is genuinely safe but CodeQL can't see the context — operator-controlled URLs, sanitized inputs, etc. — I added `// codeql[go/<rule>] <rationale>` annotations rather than dismissing them, so future readers can audit the rationale inline: - `internal/agent/component/invoke.go:135` — Invoke is a generic canvas HTTP client - `internal/service/langfuse.go` ×2 — host is per-tenant operator config - `internal/service/file.go:1184` — already SSRF-guarded by `assertURLSafe` - `internal/utility/mcp_client.go` ×3 — already `AssertURLSafe` + IP-pinned - `internal/entity/models/bedrock.go` — sigv4-signed request, URL can't be tampered - `internal/service/deep_researcher.go:269` — `callback` is SSE display string, not SQL - `internal/engine/infinity/chunk.go:346` — UUIDs can't contain `'` (RFC 4122) - `internal/cli/common_command.go` ×2 — CLI trusts operator-configured URL - `internal/utility/smtp.go:194` — msg is server-built, not user form input - `internal/entity/models/*` ×14 (path-injection) — audio file paths are caller-supplied ## Test plan - ✅ All 13 modified Go packages build cleanly - ✅ 663 tests pass across `internal/agent/sandbox`, `internal/common`, `internal/agent/component`, `internal/engine/infinity`, `internal/dao` - ✅ All 11 modified Python files parse via `ast.parse` - ✅ TypeScript `tsc --noEmit` clean on the modified `use-provider-fields.tsx` - ✅ 
`node --check` clean on the modified JS file 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-06-29 09:45:16 +08:00
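The URL substring bypass described in #16407 can be reproduced in a few lines. This is a minimal sketch, not the code from `embedding_model.py`: the function names and the `genuine` URL are illustrative assumptions, and only the host name and the spoofed URL come from the commit message.

```python
from urllib.parse import urlparse

INTL_HOST = "dashscope-intl.aliyuncs.com"


def is_intl_by_substring(base_url: str) -> bool:
    # Old pattern: true if the host name appears anywhere in the URL,
    # including the query string or path.
    return INTL_HOST in base_url


def is_intl_by_hostname(base_url: str) -> bool:
    # New pattern: compare only the parsed host component.
    return urlparse(base_url).hostname == INTL_HOST


# Spoofed URL taken from the commit message; genuine URL is an assumed example.
spoofed = "https://attacker.example/?u=dashscope-intl.aliyuncs.com"
genuine = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

substring_spoofed = is_intl_by_substring(spoofed)
hostname_spoofed = is_intl_by_hostname(spoofed)
hostname_genuine = is_intl_by_hostname(genuine)
```

The substring check accepts the spoofed URL because the trusted name appears in its query string, while the hostname comparison sees only `attacker.example`.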
Zhichang Yu	dfe2dc346d	feat[Go]: port agent attachment download, chatbot + agentbot completion/info endpoints from Python (#16405 ) ## Summary Ports five Python agent APIs to Go under the v1 Gin router: - `GET /api/v1/agents/attachments/<attachment_id>/download` - `POST /api/v1/chatbots/<dialog_id>/completions` (SSE) - `GET /api/v1/chatbots/<dialog_id>/info` - `POST /api/v1/agentbots/<agent_id>/completions` (SSE) - `GET /api/v1/agentbots/<agent_id>/inputs` Mirrors the existing Python wire shape (`{code, message, data:{answer,reference,...}}` per Python `canvas_service.completion`) so the iframe SDK and existing JS widgets keep working. ## Behavioural parity with Python \| # \| Concern \| How it's met \| \|---\|---------\|--------------\| \| R0 \| Bot routes must not require regular user session \| Routes mount on `apiNoAuth` (router.go:198-202), with `BetaAuthMiddleware` only \| \| R3 \| Two SSE formats in Go drift \| F2: `AgentChatCompletions` and `AgentbotCompletion` share `service.WriteChatbotRunEvent` \| \| R7 \| `GetBySessionID` returns `(nil, nil)` on miss \| Defensive nil-check before `session.UserID != tenantID` \| \| R8 \| Begin component name vs ID \| `FindBeginComponentID` resolves name → ID first, then `ExtractComponentInputForm(dsl, beginID)` \| \| R9 \| Defensive PromptConfig parsing \| `stringFromMap` helper used for `prologue` and `tavily_api_key` \| \| R10 \| `BetaAuthMiddleware` Bearer-prefix pre-filter \| Removed — `GetUserByToken` is called unconditionally, falls back to `GetUserByBetaAPIToken` \| \| F8 \| Multi-turn chatbot history \| `ChatbotCompletion` reads prior turns from `session.Message`, appends user turn, calls LLM, persists new pair via new `API4ConversationDAO.Update` \| \| F9 \| UUID gate stricter than plan \| Removed — only `filepath.Base` + CR/LF/quote header sanitization remains \| \| H2 \| Defence-in-depth IDOR \| `AgentbotCompletion` calls `loadCanvas` before delegating to `RunAgent` \| \| M2 \| SSE error leakage \| 
`WriteChatbotFrame` emits generic `"an internal error occurred"`; real error logged via `common.Error` \| ## Verification ```bash $ go vet ./... # clean (only pre-existing issues) $ go build ./... # success $ go test ./internal/handler/ ./internal/service/ ./internal/agent/dsl/ ./internal/common/ ./internal/dao/ ok ragflow/internal/handler 0.617s ok ragflow/internal/service 1.729s ok ragflow/internal/agent/dsl 0.008s ok ragflow/internal/common 0.087s ok ragflow/internal/dao 0.083s ``` 1199 tests pass across 5 packages. ## Known follow-ups (out of scope for this PR) - F1: token-level streaming in `ChatbotCompletion` (currently emits one frame per turn) - F3: per-route `auth_types` attribute in Go (currently applied via route group middleware) --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-06-29 09:45:16 +08:00
Zhichang Yu	477f2fcebd	feat[Go]: port agent webhook trigger, agent file upload/download, component input-form + debug endpoints from Python (#16403 ) port agent webhook trigger, agent file upload/download, component input-form + debug endpoints from Python - [x] New Feature (non-breaking change which adds functionality)	2026-06-29 09:45:16 +08:00
Zhichang Yu	f58fae5fb7	feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396 ) Ported retrieval node, added Keenable web search tool - [x] New Feature (non-breaking change which adds functionality)	2026-06-29 09:45:16 +08:00
Liu An	f86a0e7386	Docs: Update version references to v0.26.2 in READMEs and docs (#16387 ) v0.26.2	2026-06-29 09:45:16 +08:00
Haruko386	9d18f33296	fix: remove dup-method (#16393 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-26 20:51:10 +08:00
Wang Qi	3a829fb6dd	Fix VLM PDF parser only parse first 12 pages, and default page range for PDF files align with backend (#16394 ) 1. Fix VLM parser only parse first 12 pages 2. Fix frontend default pages 1 - 100000, keep aligned with backend.	2026-06-26 20:15:25 +08:00
Haruko386	a57a841a11	feat[Go]: implement Create-Chat/Session, Delete-Session (#16386 ) ### What problem does this PR solve? As title: implement: ```go chats.POST("", r.chatHandler.Create) chats.POST("/:chat_id/sessions", r.chatSessionHandler.CreateSession) chats.DELETE("/:chat_id/sessions", r.chatSessionHandler.DeleteSessions) ``` bug fixed: `f80d4c7843/internal/handler/chat.go (L84)` ↓ ```go result, err := h.chatService.ListChats(userID, "1", keywords, page, pageSize, orderby, desc) ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-06-26 19:23:45 +08:00
Hz_	e3063da390	feat(go-api): add chat update endpoints (#16378 ) ## Summary - Added Go API route `PUT /api/v1/chats/:chat_id` to align with Python `PUT /api/v1/chats/<chat_id>` chat update behavior. - Added Go API route `PATCH /api/v1/chats/:chat_id` to align with Python `PATCH /api/v1/chats/<chat_id>` partial chat update behavior. - Added matching handler and service logic for owner checks, tenant validation, persisted-field filtering, read-only field filtering, `dataset_ids` to `kb_ids` conversion, and PATCH shallow merge semantics for `prompt_config` and `llm_setting`.	2026-06-26 19:22:57 +08:00
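One plausible reading of the "PATCH shallow merge semantics" in #16378 is sketched below. This is an assumption about the behaviour, not the Go handler itself: `apply_patch` and `MERGED_FIELDS` are hypothetical names, and only the two field names come from the commit message.

```python
from typing import Any, Dict

# Fields that the commit message says are shallow-merged on PATCH.
MERGED_FIELDS = ("prompt_config", "llm_setting")


def apply_patch(stored: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new record with `patch` applied on top of `stored`."""
    updated = dict(stored)
    for key, value in patch.items():
        if (
            key in MERGED_FIELDS
            and isinstance(value, dict)
            and isinstance(stored.get(key), dict)
        ):
            # Shallow merge: top-level keys of the nested object are overlaid;
            # anything nested deeper is replaced wholesale.
            updated[key] = {**stored[key], **value}
        else:
            # Every other field is simply overwritten.
            updated[key] = value
    return updated


stored = {"name": "demo", "llm_setting": {"temperature": 0.1, "top_p": 0.3}}
patched = apply_patch(stored, {"llm_setting": {"temperature": 0.7}})
```

Under this reading, a PATCH that sends only `temperature` leaves `top_p` untouched, whereas a PUT would replace the whole `llm_setting` object.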
Haruko386	a1f1dd5007	feat[Go]: implement Add messages for Go (#16375 ) ### What problem does this PR solve? As title ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-26 19:21:52 +08:00
Jin Hai	f763044889	Go CLI: Fix show admin server and api server (#16382 ) ### What problem does this PR solve? RAGFlow(api/default)> show admin server; RAGFlow(api/default)> show api server 'default'; RAGFlow(admin)> show admin server; RAGFlow(admin)> show api server 'default'; ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 19:16:14 +08:00
Tim Wang	ca96d61e73	Feat: Add New API model provider for OpenAI-compatible gateways (#15991 ) ## Summary Add support for "New API" as a model provider, enabling connection to [New API](https://github.com/QuantumNous/new-api) / [one-api](https://github.com/songquanpeng/one-api) compatible gateways that aggregate multiple LLM backends behind a unified OpenAI-compatible `/v1` endpoint. ### Features - All model types: Chat, Embedding, Rerank, Image2Text, TTS, Speech2Text - List Models discovery: `NewAPI(OpenAIAPICompatible)` class in `model_meta.py` queries the gateway's `/v1/models` to auto-discover available models via the native `GET /api/v1/providers/<name>/models` endpoint - Model parameter editing: Pencil icon on each discovered model row to edit `model_type`, `max_tokens`, and `features` (e.g. tool call support) before submitting - Custom model addition: "Add Custom Model" button at the bottom of the List Models dropdown for models not returned by the API - Gear icon settings: Enabled the Settings gear button on provider instances to manage models on existing instances (viewMode) - viewMode credential passthrough: Fixed List Models in viewMode — merges `initialValues` credentials when `api_key`/`base_url` fields are hidden by `hideWhenInstanceExists` ### Changes Backend (8 files): - `rag/llm/chat_model.py` — `NewAPIChat(Base)` class - `rag/llm/embedding_model.py` — `NewAPIEmbed(OpenAIEmbed)` class (no auto `/v1` append) - `rag/llm/rerank_model.py` — `NewAPIRerank(Base)` class (uses `/rerank` endpoint) - `rag/llm/cv_model.py` — `NewAPICv(GptV4)` class - `rag/llm/tts_model.py` — `NewAPITTS(OpenAITTS)` class - `rag/llm/sequence2txt_model.py` — `NewAPISeq2txt(GPTSeq2txt)` class - `rag/llm/model_meta.py` — `NewAPI(OpenAIAPICompatible)` class for List Models discovery - `conf/llm_factories.json` — New API factory entry with all model type tags Frontend (8 files + 1 new SVG): - `web/src/assets/svg/llm/new-api.svg` — New API logo icon - `web/src/constants/llm.ts` — 
`LLMFactory.NewAPI` enum + `IconMap` entry - `web/src/components/svg-icon.tsx` — `NewAPI` added to `svgIcons` - `web/src/pages/user-setting/setting-model/modal/provider-modal/field-config/local-llm-configs.ts` — New API `buildLocalConfig` - `web/src/pages/user-setting/setting-model/modal/provider-modal/constants.ts` — `LIST_MODEL_PROVIDERS` includes NewAPI - `web/src/pages/user-setting/setting-model/components/used-model.tsx` — Enable Settings gear button - `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-picker.ts` — viewMode credential merge + model editing state/handlers - `web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-options.tsx` — Pencil edit icon per model row - `web/src/pages/user-setting/setting-model/modal/provider-modal/index.tsx` — `AddCustomModelDialog` import + edit dialog rendering Note on Go implementation: A Go model driver (`NewAPIModel` delegating to `OpenAIModel`) has been prepared but is deferred until the Go runtime is enabled in a future release (current v0.26.0 images use `API_PROXY_SCHEME=python` and do not compile Go binaries). Will submit as a follow-up PR. 
## Related - Depends on: #15996 (provider instance API improvements — server-side credential lookup, idempotent `add_model`, security fixes — required for viewMode gear icon and batch model submission) ## Test plan - [ ] Add New API provider with api_key and base_url pointing to an OpenAI-compatible gateway - [ ] Click "List Models" — should discover and display available models from `/v1/models` - [ ] Click pencil icon on a model — should open edit dialog to change model_type, max_tokens, features - [ ] Select multiple models and click OK — should add all selected models - [ ] Click gear icon on the added instance — should open viewMode with List Models working - [ ] In viewMode, select new models including pre-existing ones, click OK — should succeed (requires #15996) - [ ] Verify all model types work: create a Chat assistant, Embedding KB, Rerank setting 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Tim Wang <wanghualoong@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-26 18:47:20 +08:00
chanx	10140b1d02	fix: adjust table height and button position in DatasetTable component (#16390 )	2026-06-26 18:46:55 +08:00
Wang Qi	638b59fbcd	Fix handle move file failed (#16384 ) Follow on PR: #16350	2026-06-26 18:46:21 +08:00
balibabu	d14d2068c4	Fix: If the type of the loop variable in the Loop operator is set to `object`, an error occurs when clicking the Variable Replicator operator inside it. (#16388 )	2026-06-26 18:44:56 +08:00
Lynn	bf1eabea72	Feat: support new qwen model (#16385 )	2026-06-26 17:30:16 +08:00
buua436	f80d4c7843	fix: tighten loop validation (#16374 )	2026-06-26 16:29:08 +08:00
chanx	9610173a74	feat: add log icon to parsing status display (#16383 )	2026-06-26 16:13:01 +08:00
Wang Qi	985e3c1db5	Fix document progress not set to fail when embedding model error (#16381 )	2026-06-26 16:11:54 +08:00
Öndery	8081a77c7c	Fix missing move and copy methods in Python RAGFlowS3 storage implementation (#16350 )	2026-06-26 15:51:24 +08:00
Jin Hai	2667995b25	Go CLI: Fix show model and list models (#16380 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show model 'WiseDiag-Z1 Think'; RAGFlow(api/default)> list models; RAGFlow(admin)> show model 'WiseDiag-Z1 Think'; RAGFlow(admin)> list models; ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 15:36:01 +08:00
Hz_	0de8f3e127	feat: add missing qwen models to all_models.json (#16379 ) Add 19 missing qwen models and 3 aliases to all_models.json. Models added: qwen-image-2.0-pro (2026-06-22, 2026-04-22), qwen3.5-ocr, qwen3.7-max-2026-05-17, qwen3.5-livetranslate-flash-realtime, qwen3.5-omni-plus/flash-realtime, qwen-deep-research-2025-12-15, qwen-flash-character-2026-02-26, qwen-plus-2025-11-05, qwen-deep-search-planning, qwen3-s2s-flash-realtime-2025-09-22, qwen-max-1201/longcontext/0107, qwen-1.8b-longcontext-chat Aliases: qwen3.5-plus-2026-04-20, qwen-turbo-0919, qwen-1.8b-chat	2026-06-26 15:35:30 +08:00
writinwaters	5af798607e	Docs: Added v0.26.2 release notes. (#16373 )	2026-06-26 15:18:54 +08:00
Jin Hai	8bc27d8df1	Go CLI: fix show variable (#16370 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show var 'mail.port'; +-----------+-----------+--------------+-------+ \| data_type \| name \| setting_type \| value \| +-----------+-----------+--------------+-------+ \| integer \| mail.port \| config \| 30 \| +-----------+-----------+--------------+-------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 13:51:56 +08:00
Jin Hai	65afaa1292	Model config: add tools (#16371 ) ### What problem does this PR solve? ``` { "name": "glm-4-flash", "max_tokens": 128000, "model_types": [ "chat" ], "tools": { "support": true } } ``` ``` RAGFlow(admin)> list provider 'zhipu-ai' models; +------------+---------------+------------+---------------+----------------+-----------+-----------+ \| dimensions \| max_dimension \| max_tokens \| model_type \| name \| thinking \| tools \| +------------+---------------+------------+---------------+----------------+-----------+-----------+ \| \| \| 204800 \| [chat] \| glm-5 \| supported \| supported \| \| \| \| 204800 \| [chat] \| glm-5-turbo \| supported \| supported \| ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 11:37:51 +08:00
Jack	70250ec88c	Fix: remove deepdoc dep (#16372 ) dev-20260626	2026-06-26 11:32:16 +08:00
Yash Raj Pandey	dd2c88b768	fix(excel_parser): keep zero-valued cells when building Excel text chunks (#16287 )	2026-06-26 09:30:09 +08:00
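The zero-valued cell fix (#16287) is listed by title only, so the snippet below is a guess at the class of bug rather than the actual `excel_parser` code. It assumes the usual cause, a truthiness test that drops `0` and `False` along with empty cells; both function names and the sample row are hypothetical.

```python
def row_to_text_truthy(headers, row):
    # Suspected buggy pattern: `if value` also filters out 0, 0.0 and False.
    return "; ".join(f"{h}: {v}" for h, v in zip(headers, row) if v)


def row_to_text(headers, row):
    # Keep every cell that is actually populated; skip only None and "".
    return "; ".join(
        f"{h}: {v}" for h, v in zip(headers, row) if v is not None and v != ""
    )


headers = ["item", "qty", "in_stock", "note"]
row = ["bolt", 0, False, None]

lossy = row_to_text_truthy(headers, row)
kept = row_to_text(headers, row)
```

With the truthiness test the chunk text loses the quantity and the stock flag; the explicit `None` and empty-string check keeps them.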
Jin Hai	58da1d6bc3	Go CLI: fix model related commands (#16368 ) ### What problem does this PR solve? ``` RAGFlow(api/default)> show provider 'zhipu-ai' RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test'; RAGFlow(api/default)> show provider 'zhipu-ai' instance 'test' balance; RAGFlow(api/default)> show provider 'zhipu-ai' model 'glm-4.5'; ``` ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-26 07:07:49 +08:00
Jin Hai	dbefadd86a	Go CLI: refactor (#16355 )	2026-06-25 20:36:50 +08:00
Jack	304d9e02bb	Refactor: migrate pdf_parser.py to golang (#16323 ) ### What problem does this PR solve? Http API based on onnx model. pdf_parser.py to golang ### Type of change - [x] Refactoring	2026-06-25 20:16:16 +08:00
Harsh Kashyap	c7052f4dd1	fix(rag/nlp): treat string input as one phrase in is_english (#16308 )	2026-06-25 20:07:09 +08:00
Wang Qi	5defb4e7d6	Revert "fix(deepdoc): keep zero and false Excel cells in __call__" (#16366 ) Reverts infiniflow/ragflow#16318	2026-06-25 19:56:47 +08:00
Harsh Kashyap	8d3c3f868c	fix(api): validate immutable document fields when value is zero (#16309 )	2026-06-25 19:29:12 +08:00
Harsh Kashyap	66d86154ab	fix(deepdoc): accept GFM table separators with one or more dashes (#16319 )	2026-06-25 19:25:57 +08:00
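For #16319, a delimiter-row check that accepts one or more dashes per cell might look like the sketch below. The regular expression and the name `is_gfm_separator` are assumptions for illustration; the deepdoc implementation is not shown in the log.

```python
import re

# One delimiter cell: optional alignment colons around one or more dashes.
_CELL = r"\s*:?-+:?\s*"
_SEPARATOR_RE = re.compile(rf"^\s*\|?{_CELL}(\|{_CELL})*\|?\s*$")


def is_gfm_separator(line: str) -> bool:
    # Require at least one pipe so a bare "---" rule is not treated as a table row.
    return "|" in line and _SEPARATOR_RE.match(line) is not None


single_dash = is_gfm_separator("| - | - |")
aligned = is_gfm_separator("|:--|--:|")
header_row = is_gfm_separator("| a | b |")
```

A stricter pattern that demanded three dashes per cell would reject the first example, which is the case the commit title describes.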
Hz_	e290a0d23e	feat(go-api): Langfuse API key migration behavior (#16356 ) ## Summary - Align Langfuse API key set/get/delete behavior with the Python implementation. - Improve DAO handling for Langfuse credential save/delete flows. - Add tests for Langfuse service error handling and API key lifecycle behavior.	2026-06-25 19:25:55 +08:00
Yoorim Choi	46b97bd1a1	fix(web): fix layout issues with text, overflow, and spacing consistency (#16324 )	2026-06-25 19:25:32 +08:00
cleanjunc	e8bb534b90	fix: naive_merge splits oversized sections and counts overlap tokens correctly (#15802 )	2026-06-25 19:19:38 +08:00
Harsh Kashyap	0af5d43e8d	fix(deepdoc): keep zero and false Excel cells in __call__ (#16318 )	2026-06-25 19:12:57 +08:00
Haruko386	43b96223b4	feat[go]: add router for connectors/<connector_id> PATCH (#16358 ) ### What problem does this PR solve? As title /api/v1/connectors/<connector_id> PATCH was implemented in #15512 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-06-25 19:07:52 +08:00
Haruko386	74597b8683	feat[Go]: implement api: Search/Get/Update-Messages (#16307 ) ### What problem does this PR solve? As title: implement: ``` /api/v1/messages/search GET /api/v1/messages GET /api/v1/messages/<memory_id>:<message_id>/content GET /api/v1/memories/<memory_id>/config GET /api/v1/messages/<memory_id>:<message_id> PUT ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-25 19:07:34 +08:00
Harsh Kashyap	49312cace3	fix(api): align use_sql Markdown separator with Source header (#16317 )	2026-06-25 19:00:01 +08:00
balibabu	1dfc24003b	Fix: An empty message notification pops up at the top of the agent conversation. (#16353 )	2026-06-25 17:32:24 +08:00
Wang Qi	31e50b164f	Fix [ID:0] not converted to Fig. 1 (#16357 )	2026-06-25 17:17:46 +08:00
Wang Qi	ac9469e5f5	Fix add VLLM without apikey will fail (#16352 )	2026-06-25 17:17:29 +08:00
Wang Qi	97c519662a	Add env ALLOW_ANY_HOST to skip host check (#16351 )	2026-06-25 17:17:02 +08:00
maoyifeng	6e7aa75e71	Go:CLI add new response function (#16347 ) ### What problem does this PR solve? add new response function ### Type of change - [ ] New Feature (non-breaking change which adds functionality)	2026-06-25 16:49:47 +08:00
Yash Raj Pandey	091417980e	fix(html_parser): preserve original text when splitting oversized blocks (#16052 ) ### Bug `RAGFlowHtmlParser.chunk_block()` splits an oversized block by slicing the tokenized string and storing the joined tokens: ```python tks_str = rag_tokenizer.tokenize(block) ... tokens = tks_str.split(" ") while start < len(tokens): chunks.append(" ".join(tokens[start:start + chunk_token_num])) # tokenized form, not source ``` On the default (Elasticsearch) backend `rag_tokenizer.tokenize` transforms text: it lowercases/stems Latin words and inserts spaces between CJK characters. So any text block longer than `chunk_token_num` is stored as garbled, lowercased, space-segmented text instead of the source content. The small-block branch correctly stores the original `block`, so only oversized blocks are corrupted. Affects HTML and EPUB ingestion (both go through `chunk_block`), degrading retrieved chunks and the answers generated from them. ### Real tokenizer behavior (infinity-sdk 0.7.0, ES backend) ``` tokenize("Hello World FOO Bar Baz Qux Jumps") -> "hello world foo bar baz qux jump" # lowercased + stemmed tokenize("你好世界这是一个测试") -> "你好世界这是一个测试" # spaces inserted ``` ### Fix Split the original text: break it into atoms (whitespace-delimited runs for space-separated scripts, per-character for spaceless scripts such as Chinese) and pack them into pieces of at most `chunk_token_num` tokens. This preserves the source characters and still splits scripts that have no whitespace — a plain whitespace split would leave CJK as one un-splittable chunk. 
### Proof (real tokenizer, before/after) Running the old vs new split against the real `infinity.rag_tokenizer`: ``` ENGLISH "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs" (chunk_token_num=4) OLD: ['hello world foo bar', 'baz qux jump over', 'lazi dog'] # lowercased + stemmed NEW: ['Hello World FOO Bar ', 'Baz Qux Jumps Over ', 'Lazy Dogs'] # preserved; each <= 4 tokens NEW preserves text exactly: True CHINESE "你好世界这是一个测试用例需要被切分成多个块" (chunk_token_num=3) OLD: ['你好世界这是', '一个测试用例需要', ...] # spurious spaces NEW: ['你好世', '界这是', '一个测', ...] # preserved; each <= 3 tokens NEW preserves text exactly: True ``` ### Tests Added `test/unit_test/deepdoc/parser/test_html_parser.py` (English + Chinese oversized blocks, plus small-block merge). Before the fix the two oversized tests fail (English shows lowercasing, Chinese shows inserted spaces); after the fix all pass. `ruff check` clean.	2026-06-25 16:43:35 +08:00
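The atom-based split described in #16052 can be sketched as follows. This is an illustration under stated assumptions, not the code in `chunk_block()`: the function name, the character ranges treated as spaceless scripts, and the default of one token per atom are all hypothetical stand-ins for the real `rag_tokenizer` counts.

```python
import re
from typing import Callable, List, Optional

# Assumption: these ranges (kana, CJK ideographs, hangul) count as spaceless scripts.
_CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
# An atom is one spaceless-script character, or one whitespace-delimited run,
# each with its trailing whitespace attached so nothing is lost on rejoin.
_ATOM_RE = re.compile(rf"[{_CJK}]\s*|[^\s{_CJK}]+\s*|\s+")


def split_preserving_text(
    text: str,
    chunk_token_num: int,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> List[str]:
    # Assumption: without a real tokenizer, every atom costs one token.
    cost_of = count_tokens or (lambda atom: 1)
    pieces: List[str] = []
    current: List[str] = []
    used = 0
    for atom in _ATOM_RE.findall(text):
        cost = cost_of(atom)
        if current and used + cost > chunk_token_num:
            pieces.append("".join(current))
            current, used = [], 0
        current.append(atom)
        used += cost
    if current:
        pieces.append("".join(current))
    return pieces


english = "Hello World FOO Bar Baz Qux Jumps Over Lazy Dogs"
chinese = "你好世界这是一个测试用例需要被切分成多个块"
english_pieces = split_preserving_text(english, 4)
chinese_pieces = split_preserving_text(chinese, 3)
```

Because each piece is a concatenation of untouched source atoms, joining the pieces gives back the input exactly. The English string here is the one obtained by joining the NEW pieces listed in the commit message.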
Jin Hai	edfa9be67f	Go CLI: fix list provider instance tasks (#16345 )	2026-06-25 15:49:31 +08:00
balibabu	3f3a2ece3d	Fix: Flexible Chat Configuration (#16293 )	2026-06-25 14:56:30 +08:00
Muhammad Furqan	fe14cc35cf	fix(agent/tools): DeepL component fails validation and drops errors (#16332 ) ### What problem does this PR solve? `DeepLParam.check()` validated `self.top_n`, but DeepL has no such parameter (it is not defined on the param class or its base), so `check()` always raised `AttributeError` and a DeepL component could never pass validation. Removed the bogus `top_n` check. Also fixed the `_run` except branch, which computed `be_output("Error...")` but never returned it, silently dropping the error message. Closes #16329 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/unit_test/agent/component/test_deepl.py` covering `DeepLParam.check()` with valid defaults and rejection of invalid source/target languages.	2026-06-25 14:40:56 +08:00
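The dropped-error half of #16332 comes down to a missing `return` in an `except` branch. The sketch below reproduces that pattern in isolation; `be_output` here is a hypothetical stand-in with an assumed return shape, and the two runner functions are illustrative, not the DeepL component's `_run`.

```python
def be_output(message):
    # Hypothetical stand-in for the component helper named in the commit message.
    return {"content": message}


def run_dropping_error(translate, text):
    try:
        return be_output(translate(text))
    except Exception as exc:
        # Bug pattern: the error output is built and then discarded,
        # so the function falls through and returns None.
        be_output(f"Error: {exc}")


def run_returning_error(translate, text):
    try:
        return be_output(translate(text))
    except Exception as exc:
        # Fix: hand the error output back to the caller.
        return be_output(f"Error: {exc}")


def failing_translate(text):
    raise RuntimeError("quota exceeded")


dropped = run_dropping_error(failing_translate, "hello")
reported = run_returning_error(failing_translate, "hello")
```

Callers of the first variant receive `None` and cannot tell a failure from an empty result; the second variant surfaces the message.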


6991 Commits