ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Jin Hai	20d11648a4	Go: add statistics command (#16119 ) ### What problem does this PR solve? Prepare for enterprise command ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-18 15:21:44 +08:00
Haruko386	351b61a243	Go CLI: add support for windows, linux, macos (#16082 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-06-18 15:20:00 +08:00
jaso0n0818	a70c7e8cc7	fix(deepdoc): attach lone header lines to the following section when delimiter is set (#16109 ) ## Summary Fixes #15487 — lone markdown headers are no longer isolated as empty chunks when a custom `delimiter` is set. - Merge consecutive lone headers before attaching to the following prose body - Skip code fences, tables, lists, and blockquotes via `_is_attachable_body()` - Unit tests include the `# Title / ## Intro / Body` regression from CodeRabbit review ## Validation - `pytest test/unit_test/deepdoc/parser/test_markdown_parser.py` (11 passed locally) Closes #15487	2026-06-18 14:24:09 +08:00
Haruko386	27d723e13a	fix: fix some bugs in check_conn and drop_inst (#16180 ) ### What problem does this PR solve? As title: ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-18 14:19:46 +08:00
balibabu	a9021528c3	Fix: Lint error. (#16172 ) ### What problem does this PR solve? Fix: Lint error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-18 13:14:18 +08:00
buua436	ea70663f09	feat: support wecom websocket channel (#16175 ) Added WeCom chat channel websocket mode alongside the existing webhook mode, plus frontend support for selecting the connection type.	2026-06-18 13:10:09 +08:00
Hz_	69dbc44983	feat(go-api): migrate MCP server detail and download API to Go (#16113 ) ### What problem does this PR solve? - Migrated MCP server detail and export (download) API from Python to Go. - Registered route: `GET /api/v1/mcp/servers/:mcp_id` (supporting `?mode=download` query parameter).	2026-06-18 11:09:22 +08:00
Hz_	f59332bc37	feat(go-api): implement Go-side document PATCH API & align parsing/metadata sync behavior (#15975 ) ### What problem does this PR solve? This PR implements the Go backend counterpart for the document partial update API: `PATCH /api/v1/datasets/:dataset_id/documents/:document_id` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-06-18 11:08:47 +08:00
Idriss Sbaaoui	8ff6a21af9	Fix: cli points to the wrong api endpoints (#16171 ) ### What problem does this PR solve? fix the cli endpoints ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-18 10:54:33 +08:00
xu haiLong	a9ddcae0b3	Fix: MCP dataset discovery fails due to REST API max page size limit … (#16148 ) Fix #16146	2026-06-18 09:39:37 +08:00
Wang Qi	99a25dca34	Fix Chat/Search/Agent bot show image (#16152 ) Fix Chat/Search/Agent bot show image	2026-06-18 09:38:31 +08:00
Hz_	065797b047	Refactor(go-cli): improve variable and label naming in CLI parseAddModel (#16145 ) ### What problem does this PR solve? This PR improves code readability in the CLI parser by renaming the loop index `i` to `modelIndex`. It also renames the loop label `A` to `optionsLoop` to align with standard Go naming conventions. ### Type of change - [x] Refactoring	2026-06-17 20:21:42 +08:00
Wang Qi	27a05be643	Fix the launch script (#16159 ) Fix the launch script	2026-06-17 20:20:37 +08:00
Haruko386	a3e3bdd386	fix back release.yml to old version (#16160 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) v0.26.1	2026-06-17 20:02:42 +08:00
dependabot[bot]	c1c79c2e55	build(deps): bump python-multipart from 0.0.21 to 0.0.31 (#16088 )	2026-06-17 19:39:42 +08:00
Liu An	4379269374	Docs: Update version references to v0.26.1 in READMEs and docs (#16158 ) ### What problem does this PR solve? - Update version tags in README files (including translations) from v0.26.0 to v0.26.1 - Modify Docker image references and documentation to reflect new version - Update version badges and image descriptions - Maintain consistency across all language variants of README files ### Type of change - [x] Documentation Update	2026-06-17 19:35:32 +08:00
Idriss Sbaaoui	7d3928e501	Enhancement: update ci for parallel test execution (#16133 ) ### What problem does this PR solve? split ci into multiple jobs ### Type of change - [x] Performance Improvement	2026-06-17 19:22:24 +08:00
BitToby	2ab9256e8a	fix(go): correct OpenRouter streaming URL routing and reasoning parameter (#16111 ) ### What problem does this PR solve? Fixes two bugs in the OpenRouter streaming chat request builder (`internal/entity/models/openrouter.go`, `ChatStreamlyWithSender`): 1. qwen/glm models streamed to a broken URL. The code routed any `qwen`/`glm` model to `URLSuffix.AsyncChat`, but `conf/models/openrouter.json` defines no `async_chat` suffix (empty), so the request was POSTed to `<base>/` instead of `<base>/chat/completions` — breaking streaming for every qwen/glm model. The non-stream path has no such branch. Fix: all models use the standard `Chat` suffix, consistent with the non-stream path. 2. Streaming reasoning was never enabled. The request set reasoning via a non-standard `thinking` key, which OpenRouter ignores. OpenRouter's API — and this provider's own non-stream request (line ~110) and its streamed `delta.reasoning` parser (line ~311) — use the `reasoning` object. Fix: send `reasoning: {"enabled": <thinking>}` (and `{"effort": ...}` when set, taking precedence as in the non-stream path). Closes #16110 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 19:14:13 +08:00
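A minimal sketch of the corrected request shape, assuming a plain map body. `buildStreamBody` and `streamURL` are hypothetical helpers, not the functions in `openrouter.go`, and the model name is a placeholder. The commit says an effort value takes precedence; this sketch assumes that means it replaces the `enabled` flag rather than being sent alongside it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// streamURL: every model, including qwen/glm, uses the standard chat suffix.
func streamURL(base string) string {
	return strings.TrimRight(base, "/") + "/chat/completions"
}

// buildStreamBody sends reasoning as an OpenRouter `reasoning` object instead
// of a non-standard `thinking` key. A non-empty effort replaces `enabled`
// (assumed interpretation of "taking precedence").
func buildStreamBody(model string, thinking bool, effort string) map[string]any {
	reasoning := map[string]any{"enabled": thinking}
	if effort != "" {
		reasoning = map[string]any{"effort": effort}
	}
	return map[string]any{
		"model":     model,
		"stream":    true,
		"reasoning": reasoning,
	}
}

func main() {
	b, err := json.Marshal(buildStreamBody("qwen/example-model", true, ""))
	if err != nil {
		panic(err)
	}
	fmt.Println(streamURL("https://openrouter.ai/api/v1"))
	fmt.Println(string(b)) // {"model":"qwen/example-model","reasoning":{"enabled":true},"stream":true}
}
```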
balibabu	cf7b06c0f3	Fix: A pipeline created from a template fails immediately upon execution with a "hierarchy does not exist" error. (#16151 ) ### What problem does this PR solve? Fix: A pipeline created from a template fails immediately upon execution with a "hierarchy does not exist" error.	2026-06-17 19:07:04 +08:00
Lynn	a5cce29f22	Fix: add mimo (#16136 ) ### What problem does this PR solve? Add chat model factory for Xiaomi model. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 19:02:33 +08:00
writinwaters	cb2e061120	Docs: Updated v0.26.1 release date. (#16154 ) ### What problem does this PR solve? Updated v0.26.1 release date. ### Type of change - [x] Documentation Update	2026-06-17 18:53:06 +08:00
buua436	43d121ad38	feat: add qqbot chat channel (#16140 ) ### What problem does this PR solve? Adds qqbot as a built-in chat channel so it can be discovered and started by the channel bootstrapper and shown in the chat channel settings UI. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-06-17 18:49:38 +08:00
Hunnyboy1217	e178c81bb4	refactor(go-models): harden Ollama ListModels and route through ParseListModel (#15853 ) (#15955 ) ### What problem does this PR solve? Part of #15853 (provider model-list refactor). Refactors Ollama `ListModels` onto the shared `ParseListModel` pattern and fixes two correctness issues: - Endpoint: switch the models suffix from `api/ps` (only currently-running models) to `api/tags` (all installed models) — the latter is what a model picker should show. - Parsing: Ollama returns `{"models":[{"name","model"}]}`, a non-OpenAI shape. Decode it into a typed struct, map the names into `ModelList`, then enrich through `ParseListModel`. This removes the previous unchecked type assertions (`result["models"].([]interface{})` / `.(map[string]interface{})` / `.(string)`) that panicked when the body was missing the `models` array or any field, and adds a fallback to the `model` field when `name` is blank. - Drops the no-op GET request body and a dead base-URL reassignment. #### Drive-by fix Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed in #15900) so the models test package builds; auto-resolves against the sibling #15853 PRs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-06-17 18:47:27 +08:00
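Decoding the `/api/tags` response into a typed struct, as the commit describes, might look like the following. `tagsResponse` and `modelNames` are hypothetical names, and dropping entries where both `name` and `model` are blank is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagsResponse covers only the fields used here from Ollama's
// {"models":[{"name": "...", "model": "..."}]} shape.
type tagsResponse struct {
	Models []struct {
		Name  string `json:"name"`
		Model string `json:"model"`
	} `json:"models"`
}

// modelNames avoids chained type assertions: a missing "models" array just
// yields an empty list. Falls back to "model" when "name" is blank.
func modelNames(body []byte) ([]string, error) {
	var resp tagsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(resp.Models))
	for _, m := range resp.Models {
		name := m.Name
		if name == "" {
			name = m.Model
		}
		if name != "" { // assumption: skip entries with no usable identifier
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	names, err := modelNames([]byte(`{"models":[{"name":"llama3:8b"},{"model":"qwen2:7b"}]}`))
	fmt.Println(names, err) // [llama3:8b qwen2:7b] <nil>
}
```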
balibabu	70f319c536	Fix: The pipeline created from the template fails immediately upon execution. (#16149 ) ### What problem does this PR solve? Fix: The pipeline created from the template fails immediately upon execution. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 17:03:17 +08:00
chanx	9302233b95	fix: misc frontend fixes for agent log, login, search settings (#16137 ) ### What problem does this PR solve? fix: misc frontend fixes for agent log, login, search settings - agent-log: restore server-side pagination on export and search; replace hardcoded labels with i18n keys; switch container to text-text-primary - login: validate register nickname against NICKNAME_PATTERN with reusable setting i18n - next-search: align llm_setting schema with chat (LlmSettingFieldSchema + LLMIdFormField nested, LlmSettingEnabledSchema at form root) so the slider Switch reads the correct path; strip *Enabled flags before submit to avoid backend "Unrecognized field name" errors - locales: add common.reset (zh/en) - skills/go-naming: fix relative link to rules/named.md ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 16:20:26 +08:00
balibabu	3247e353c7	Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. (#16134 ) ### What problem does this PR solve? Fix: The .docx file is not displaying fully; the hierarchy of the pipeline created from the template is missing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 16:18:47 +08:00
Wang Qi	fcb4f78d97	Dev: add go starter (#16138 ) Dev: add go starter	2026-06-17 16:09:53 +08:00
Wang Qi	e08bcd4d0d	Update doc rerank_id from int to string (#16142 ) Update doc rerank_id from int to string	2026-06-17 16:09:33 +08:00
buua436	be869f5d96	fix: chat channel runtime (#16129 ) ### What problem does this PR solve? Fix chat channel message routing to use the connected `chat_id`, and make the Feishu websocket client bind to the thread-local event loop. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 15:52:13 +08:00
Idriss Sbaaoui	44164e18d8	Enhancement: optimize ci (#16130 ) ### What problem does this PR solve? optimize ci by fixing flaky clean-ups and redundant tasks ### Type of change - [x] Performance Improvement	2026-06-17 15:16:11 +08:00
Wang Qi	b3ac03b96c	Set default Paddle OCR URL (#16128 ) Set default Paddle OCR URL	2026-06-17 14:29:20 +08:00
buua436	486b28c409	fix: show telegram chat channel (#16125 ) ### What problem does this PR solve? Show Telegram in the chat channel picker alongside the existing Discord and Feishu entries. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 14:18:16 +08:00
buua436	78b4906f7a	fix: tighten embedding truncation threshold (#16123 ) ### What problem does this PR solve? Use a 95% max_length threshold before truncating embedding inputs, which reduces the chance of provider-side invalid-parameter errors on near-limit chunks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 14:18:02 +08:00
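As a rough illustration, a 95% threshold means a 512-token limit truncates at 486 tokens rather than 512. `truncateForEmbedding` is a hypothetical helper, and string slices stand in for whatever tokenizer output the real code uses; the assumption is that the headroom absorbs differences between local and provider-side token counts.

```go
package main

import "fmt"

// truncateForEmbedding cuts inputs at 95% of maxLength instead of at
// maxLength itself (hypothetical helper; integer math rounds down).
func truncateForEmbedding(tokens []string, maxLength int) []string {
	limit := maxLength * 95 / 100
	if len(tokens) <= limit {
		return tokens
	}
	return tokens[:limit]
}

func main() {
	toks := make([]string, 600)
	fmt.Println(len(truncateForEmbedding(toks, 512))) // 486
}
```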
Zhichang Yu	e45659868a	feat(agent): ship the Go agent canvas port — eino interrupt/resume + Redis check-pointing (#16035 ) Replaces the Python agent canvas runtime with a Go implementation that runs inside `cmd/server_main`. The canvas compiles into an eino Workflow that pauses on wait-for-user via native Interrupt/Resume (no sentinel flag) and resumes from a Redis-backed CheckPointStore. All 21 Python agent components and ~35 tools are ported with functional parity. Sandbox providers now read their JSON config from the admin-panel system_settings table with env fallback. 234 files / +35,413 / -6,111. All Go files are gofmt-clean (CI gate added); drops the v2 DSL E2E step and the gap-analysis plan (both redundant after the port ships). ## Type of change - [x] Refactoring - [x] New feature - [x] Bug fix 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-06-17 13:24:03 +08:00
Wang Qi	2290bb0023	Fix MinerU table option sanitization (#16118 ) Follows up on issue #14831 and PR #14920 to fix the table options: when table recognition is enabled, HTML tags are not sanitized.	2026-06-17 13:06:07 +08:00
euvre	9bd53ce675	fix: return full record in get_ingestion_log (#16120 ) ### What problem does this PR solve? The `get_ingestion_log` endpoint (both Python `dataset_api_service.get_ingestion_log` and Go `DatasetService.GetIngestionLog`) was returning only the dataset-level field set, which omits critical fields such as `dsl`, `document_id`, `parser_id`, `document_name`, `pipeline_id`, etc. This caused the front-end dataflow-result page to be unable to render the pipeline timeline and chunks when viewing a single ingestion log, regardless of whether the log was a dataset-level operation (graph/raptor/mindmap) or a per-file parse. ### Background `PipelineOperationLogService` provides two field sets: `get_dataset_logs_fields`, a minimal set (progress, status, timestamps, etc.), and `get_file_logs_fields`, a superset that also includes (`document_id`, `dsl`, `parser_id`, `document_name`, `pipeline_id`, …). When listing logs, the API correctly distinguishes dataset-level vs file-level logs and uses the appropriate converter. However, when fetching a single log by ID, both the Python and Go implementations were hardcoded to the dataset-level set, dropping the extra fields that the front-end needs.	2026-06-17 13:03:51 +08:00
Hunnyboy1217	fd196f694e	feat(go-models): harden ListModels for FishAudio (#15853 ) (#15957 ) ### What problem does this PR solve? Part of #15853 (provider model-list refactor). Final two providers. - voyage: Voyage AI exposes no live model-list endpoint — its public API only has `/v1/embeddings` and `/v1/rerank` — so the previous `ListModels` was a `no such method` stub. Replace it with a static-catalog listing sourced from the loaded provider definition, carrying each model's `max_tokens`, `model_types`, and embedding `dimensions`. `list models from voyage` now returns the 13-model catalog instead of erroring. - fishaudio: route the existing `/model` voice listing through the shared `ParseListModel` helper for consistency; keep the human-readable `title` as the model name and fall back to `_id` when a title is blank. #### Drive-by fix Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed in #15900); auto-resolves against the sibling #15853 PRs. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring Co-authored-by: Haruko386 <tryeverypossible@163.com>	2026-06-17 11:56:20 +08:00
writinwaters	0aaba0033f	Docs: Updated Converse with chat assistant (#16117 ) ### What problem does this PR solve? Miscellaneous editorial updates to the API reference. ### Type of change - [x] Documentation Update	2026-06-17 11:50:14 +08:00
Wang Qi	02ccd35241	Fix RAGFlow cannot start (#16116 ) # Summary - The culprit is commit `b4c8711d5` / PR #15415 (fix: upgrade crawl4ai to 0.8.0). - That upgrade brought in unclecode-litellm, which installs the same top-level litellm namespace as upstream litellm. - The crash happens when files from one LiteLLM distribution are mixed with files from the other: custom_guardrail.py expects GuardrailTracingDetail, but types/utils.py can come from the older conflicting package.	2026-06-17 11:27:31 +08:00
Hz_	b48f03d0f5	feat(go/dao): migrate chat channel database entity and DAO to Go (#16055 ) ## Changes 1. Entity (`internal/entity/chat_channel.go`): - Implemented `ChatChannel` struct mapping the `chat_channel` database table. - Declared `ChatChannelListResponse` as a DTO to filter out sensitive credentials (`config` field) and fetch the associated `dialog_name` via left join. 2. GORM Migration (`internal/dao/database.go`): - Registered `&entity.ChatChannel{}` in the `dataModels` array inside `InitDB()` to enable safe GORM schema synchronization. 3. DAO (`internal/dao/chat_channel.go`): - Implemented `ChatChannelDAO` wrapping GORM CRUD methods (`Create`, `GetByID`, `UpdateByID`, `DeleteByID`). - Implemented `ListByTenantID` performing a `LEFT JOIN` on the `dialog` table to retrieve `dialog_name` while excluding `config` values to avoid credential leaks. 4. Test (`internal/dao/chat_channel_test.go`): - Added integration tests covering the full CRUD lifecycle and the GORM left-join list query mapping.	2026-06-17 11:26:13 +08:00
balibabu	5de00bdf50	Fix: Importing the MCP dialog causes duplicate submissions. (#16037 ) ### What problem does this PR solve? Fix: Importing the MCP dialog causes duplicate submissions. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-17 09:49:51 +08:00
euvre	fe46244d30	fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#16106 ) The parser pods suffer from OOM kills when processing large PDF documents. The root cause is in api/db/services/task_service.py: when layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be processed as a single task with all pages loaded into memory simultaneously. This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the same way DeepDOC already does.	2026-06-17 09:33:53 +08:00
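The difference between one giant task and paginated tasks can be sketched as follows. `splitPages` and `pageRange` are hypothetical names; the actual change is Python code in `api/db/services/task_service.py`, and the page size it uses for non-DeepDOC parsing is not stated in the commit.

```go
package main

import "fmt"

// pageRange is a half-open [From, To) span of PDF pages handled by one task.
type pageRange struct{ From, To int }

// splitPages divides a document into tasks of at most pageSize pages, so no
// single task has to hold every page in memory at once.
func splitPages(totalPages, pageSize int) []pageRange {
	if pageSize <= 0 {
		pageSize = totalPages
	}
	var tasks []pageRange
	for start := 0; start < totalPages; start += pageSize {
		end := start + pageSize
		if end > totalPages {
			end = totalPages
		}
		tasks = append(tasks, pageRange{start, end})
	}
	return tasks
}

func main() {
	fmt.Println(splitPages(25, 12)) // [{0 12} {12 24} {24 25}]
}
```

Passing a page size of 100,000,000 (the `MAXIMUM_TASK_PAGE_NUMBER` value cited above) yields a single range covering every page, which is the behavior that led to the OOM kills.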
Jin Hai	6865039a22	Go: add more start server parameters (#16093 ) ### What problem does this PR solve? ``` $ ./bin/ragflow_server --version RAGFlow version: v0.26.0-65-g549f6109c $ ./bin/ragflow_server --debug # start server with debug log level $ ./bin/admin_server --version RAGFlow version: v0.26.0-65-g549f6109c $ ./bin/admin_server --debug # start server with debug log level $ ./bin/admin_server --init-superuser # init default superuser $ ./bin/ingestor --version RAGFlow version: v0.26.0-68-g6f6c39706 $ ./bin/ingestor --debug ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-16 20:27:37 +08:00
Wang Qi	17e3aad7ae	Revert "fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM" (#16104 ) Reverts infiniflow/ragflow#15951	2026-06-16 20:11:45 +08:00
buua436	1e4796da9d	Docs: update chat completions docs (#16100 ) ### What problem does this PR solve? Syncs the /api/v1/chat/completions docs with the current behavior, including the new legacy streaming mode. ### Type of change - [x] Documentation Update	2026-06-16 20:08:23 +08:00
dependabot[bot]	b732636546	build(deps): bump aiohttp from 3.13.3 to 3.14.1 (#16090 )	2026-06-16 20:07:32 +08:00
euvre	d2a18d5c46	fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#15951 ) ### What problem does this PR solve? The parser pods suffer from OOM kills when processing large PDF documents. The root cause is in api/db/services/task_service.py: when layout_recognize is not DeepDOC (e.g. Plain Text), page_size was set to MAXIMUM_TASK_PAGE_NUMBER (100 million), causing the entire PDF to be processed as a single task with all pages loaded into memory simultaneously. This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the same way DeepDOC already does. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [x] Performance Improvement - [ ] Other (please describe):	2026-06-16 20:07:19 +08:00
Rander	62698725ca	feat(paddleocr): add image parsing support with async Job API (#16086 ) ## Summary Add image parsing capability to PaddleOCR integration, building on top of #15967 (async Job API migration). ## Changes ### `deepdoc/parser/paddleocr_parser.py` - Add `parse_image()` method that uses the same async Job API flow as `parse_pdf()` - Extracts text from `layoutParsingResults` → `prunedResult` → `parsing_res_list` - Returns concatenated block content as a single string ### `rag/llm/ocr_model.py` - Add `parse_image()` wrapper to `PaddleOCROcrModel` with availability check and logging ## Relationship to other PRs - Depends on: #15967 (async Job API migration) — this PR is based on that branch - Replaces: #14826 (original image processing PR based on old sync API) ## Notes This PR uses `base_url` and the async Job API (submit → poll → fetch) consistent with #15967, rather than the old `api_url` + sync POST pattern from #14826.	2026-06-16 19:34:38 +08:00
Rander	1235da7093	refactor(paddleocr): migrate from sync API to async Job API (#15967 ) ## Summary Migrate PaddleOCR integration from the deprecated synchronous HTTP API to the new asynchronous Job API (`submit → poll → fetch`), aligning with PaddleOCR 3.6.0+ architecture. ## Changes ### Python (`deepdoc/parser/paddleocr_parser.py`) - Replace synchronous `requests.post()` with async Job API flow (submit → poll → fetch) - Authentication: `token {token}` → `Bearer {token}` - File transfer: base64 JSON body → multipart file upload - Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout controlled by `request_timeout`) - Result: fetch full JSONL from result URL, preserving `prunedResult` with bbox info for crop functionality - Rename `api_url` → `base_url` (backward compatible: `api_url` still accepted as fallback) ### Python (`rag/llm/ocr_model.py`) - Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to `paddleocr_api_url` / `PADDLEOCR_API_URL` ### Go (`internal/entity/models/paddleocr.go`) - Add `Client-Platform: ragflow` header to submit and poll requests - Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5, max 15s) ### Python (`common/constants.py`) - Add `PADDLEOCR_BASE_URL` to env keys and default config ## Backward Compatibility - Old env var `PADDLEOCR_API_URL` still works (used as fallback) - Frontend field `paddleocr_api_url` still works (backend reads it as fallback) - No user-facing configuration changes required for existing setups ## Why not use the `paddleocr` SDK package directly? RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing `block_bbox`, `block_label`, `parsing_res_list`) from the raw API response for PDF crop functionality. The SDK's public `parse_document()` API only returns `DocParsingResult` with `markdown_text`, discarding the bbox data. Therefore we implement the async Job API flow directly via HTTP, following the same logic as the SDK internally.	2026-06-16 19:34:21 +08:00
Jin Hai	3d8bc76e27	Go refactor: merge similar functions (#16098 ) ### What problem does this PR solve? Merge password related functions ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-16 19:26:42 +08:00


6841 Commits