ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-02 16:55:42 +08:00

Author	SHA1	Message	Date
Julian	33ef724b5f	Add Bulk action for linking Multiple Files to Datasets (#14960 ) ### What problem does this PR solve? Feature: #14961 ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-06-02 12:23:33 +08:00
balibabu	f194e8b4c4	Fix: The newly added model did not appear in the drop-down menu. (#15476 ) ### What problem does this PR solve? Fix: The newly added model did not appear in the drop-down menu. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-01 17:56:41 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
balibabu	187dc8a1e6	Fix: The Creativity parameter of chat was not saved. (#15243 ) ### What problem does this PR solve? Fix: The Creativity parameter of chat was not saved. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-27 11:02:30 +08:00
chanx	bce11527c3	Fix: Fixed metadata issue (#15226 ) ### What problem does this PR solve? Fix: Fixed metadata issue - The dataset's built-in metadata is now active, but it appears to be disabled in the individual file configuration. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-26 13:16:15 +08:00
balibabu	c7c75c0a87	Feat: Enable agent messages to display base64 images (#15212 ) ### What problem does this PR solve? Feat: Enable agent messages to display base64 images ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-25 19:02:03 +08:00
balibabu	0f92353bd9	Fix: Replace the red highlight at the top of the PDF document with yellow. (#15203 ) ### What problem does this PR solve? Fix: Replace the red highlight at the top of the PDF document with yellow. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-25 17:21:36 +08:00
Ahmad Intisar	e6068a7f7e	Fix: table parser metadata (#15127 ) ### What problem does this PR solve? This PR improves the table upload flow for CSV/Excel files by allowing table column role configuration at upload time. Previously, users had to: 1. Upload and parse a table file. 2. Open parser settings and manually set table column roles. 3. Re-parse the file for the roles to take effect. This was inefficient and required an unnecessary second parse. With this change: 1. When the knowledge base uses table parsing, the upload dialog extracts CSV/Excel headers client-side. 2. Users can choose Auto mode or Manual mode. 3. In Manual mode, users can assign per-column roles before upload. 4. The selected parser config is sent with the upload request and applied server-side during document creation. Result: configured table column roles are applied from the first parse. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-05-25 16:05:38 +08:00
buua436	71a52d579c	fix: move agent attachment download api (#15146 ) ### What problem does this PR solve? move agent attachment download api to the correct route and update frontend callers ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Notes - Move the attachment download endpoint from document routes to agent routes. - Update frontend download callers to use the agent attachment endpoint. - Reuse the shared file response header helper instead of duplicating it in `agent_api.py`.	2026-05-22 15:22:05 +08:00
balibabu	1ed8a118cf	Fix: The folder tree menu for moving folders cannot be scrolled. (#15037 ) ### What problem does this PR solve? Fix: The folder tree menu for moving folders cannot be scrolled. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-20 14:59:36 +08:00
Magicbook1108	b69a6a5d80	Feat: full optimization on connector dashboard (#14979) ### What problem does this PR solve? This PR improves the connector dashboard task management experience and adds better visibility into connector execution logs. ### Overview: #### Before <img width="700" alt="image" src="https://github.com/user-attachments/assets/e4a8ed6f-2e18-4f0f-8528-41a514550052" /> #### Now: <img width="700" alt="Screenshot from 2026-05-18 16-31-30" src="https://github.com/user-attachments/assets/d4ca193b-847a-49ae-9e4f-5fbca60ea627" /> ### 1. Add a new logging page to the connector dashboard A new logging page has been added so users can view connector task execution logs directly from the connector dashboard. ### 2. Merge the Resume button into Confirm The separate Resume button has been removed. The Confirm button now represents different actions depending on the current task state: - Save: Save form changes and reschedule tasks. - Stop: Cancel currently scheduled or running tasks. - Resume: Create new scheduled tasks after the previous tasks have been stopped. - Start: Start tasks when no task has been started yet. ### 3. Separate syncing and pruning tasks Connector tasks are now separated into syncing and pruning. Pruning is controlled by the Sync deleted files option: - When Sync deleted files is disabled, only syncing tasks are shown. - When Sync deleted files is enabled, both syncing and pruning tasks are shown. Now: Sync deleted files disabled <img width="700" alt="Sync deleted files disabled" src="https://github.com/user-attachments/assets/dbd9232e-614a-407f-a0b1-c109e5fa567d" /> Now: Sync deleted files enabled <img width="700" alt="Sync deleted files enabled" src="https://github.com/user-attachments/assets/1f527f48-ccb3-4ee8-97ca-086891489296" /> ### 4. Update logs in backend <img width="700" alt="image" src="https://github.com/user-attachments/assets/10a95a3f-98c1-4e67-8afa-ddf6cda5b0b2" /> ### 5. Remove connector resume API - Removed: `POST /v1/connectors/<connector_id>/resume` - Replaced by: `PATCH /v1/connectors/<connector_id>` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-19 10:07:11 +08:00
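The merged Confirm button described in #14979 can be sketched as a small state-to-label function. This is only an illustration: the parameter names and the priority order (unsaved form changes first, then active tasks) are assumptions, not taken from the RAGFlow frontend.

```python
def confirm_action(form_dirty: bool, has_tasks: bool, tasks_active: bool) -> str:
    """Pick the label of the single Confirm button.

    Hypothetical inputs:
      form_dirty   - the form has unsaved changes
      has_tasks    - at least one task was created for this connector before
      tasks_active - a task is currently scheduled or running
    """
    if form_dirty:
        return "Save"    # save form changes and reschedule tasks
    if tasks_active:
        return "Stop"    # cancel scheduled or running tasks
    if has_tasks:
        return "Resume"  # previous tasks were stopped, schedule new ones
    return "Start"       # no task has been started yet


if __name__ == "__main__":
    print(confirm_action(form_dirty=False, has_tasks=True, tasks_active=False))
```

Deriving the label from state in one place is what makes a separate Resume button, and a separate resume endpoint, unnecessary.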
Wang Qi	13b422037f	Refactor: enhance graphrag - part 2 (#14972) ### What problem does this PR solve? 1. Expose batch_chunk_token_size for configuration. 2. Retrieve chunks when building the subgraph for each document, rather than retrieving all documents' chunks at the beginning. 3. Get all chunks for a document; the limit used to be hard-coded to 10000. 4. Delete the unused method run_graphrag. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring Follow on: #14617	2026-05-18 16:10:21 +08:00
小熊	09d45046e5	Feat/web markdown UI updates (#14214 ) ### What problem does this PR solve? LLM/chat and search UIs render Markdown in several places (document preview, floating chat widget, next-search, etc.). Plugin lists and behavior were duplicated or inconsistent, and single newlines in model output were not always rendered as visible line breaks, which hurts readability for chat-style content. This PR centralizes shared remark/rehype configuration (including `remark-breaks` for newline handling) and wires the main Markdown surfaces to use it, so behavior is consistent and easier to maintain. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-05-15 22:29:44 +08:00
yingjianzh	4c68a6b86c	fix(agent): pass top_k and fix similarity weight slider behavior (#14760 ) ### What problem does this PR solve? This PR fixes two issues in Agent Retrieval behavior and configuration UX: 1. `top_k` configured in Agent Retrieval was not passed down to the backend retriever call, so retrieval could ignore the configured vector recall limit. 2. Similarity weight slider semantics were confusing in Agent forms because the Agent field stores `keywords_similarity_weight` while UI interactions were interpreted as vector weight. This could cause displayed values and actual behavior to diverge. This PR ensures Agent retrieval uses configured `top_k`, and makes the slider behavior consistent and explicit for both vector and keyword weight modes. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-15 10:49:14 +08:00
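The two fixes in #14760 can be illustrated with a short Python sketch. It assumes the vector weight and `keywords_similarity_weight` are complements that sum to 1, which the commit message suggests but does not state. The function names, the `build_retrieval_kwargs` helper, and the default values are hypothetical and are not RAGFlow APIs.

```python
def to_vector_weight(keywords_similarity_weight: float) -> float:
    """Convert the stored keyword weight into the vector weight shown on a slider.

    Assumption: the two weights sum to 1.
    """
    if not 0.0 <= keywords_similarity_weight <= 1.0:
        raise ValueError("weight must be within [0, 1]")
    return round(1.0 - keywords_similarity_weight, 4)


def to_keywords_weight(vector_weight: float) -> float:
    """Inverse conversion, applied before the form value is stored."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("weight must be within [0, 1]")
    return round(1.0 - vector_weight, 4)


def build_retrieval_kwargs(form: dict) -> dict:
    """Forward the configured top_k instead of silently dropping it.

    The fallback values below are placeholders for illustration only.
    """
    return {
        "top_k": int(form.get("top_k", 1024)),
        "keywords_similarity_weight": float(
            form.get("keywords_similarity_weight", 0.3)
        ),
    }


if __name__ == "__main__":
    print(to_vector_weight(0.3))
    print(build_retrieval_kwargs({"top_k": 50}))
```

Keeping both conversions next to each other makes it explicit which quantity is stored and which is displayed, so the two cannot drift apart.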
balibabu	41072ed44d	Feat: This enables SelectWithSearch to search by label. (#14925 ) ### What problem does this PR solve? Feat: This enables SelectWithSearch to search by label. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: balibabu <assassin_cike@163.com>	2026-05-14 20:33:11 +08:00
plind	dd76653dc1	feat: add tag management for Agents with filtering and sorting (#14774 ) (#14799 ) ## Summary Closes #14774. Adds free-form tags on agents (UserCanvas) with full UI + API: - Stored as comma-separated `tags` column on `UserCanvas` with online migration. - New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT /v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query. - "Edit tags" item in agent dropdown opens a chip-style editor dialog; tags render as badges on each agent card. - New "Tags" facet in the agents filter bar, with counts. ## Implementation notes - Tag matching is exact-token: the SQL filter wraps stored tags as `,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`. - Server-side normalization in `UserCanvasService.update_tags`: dedup (case-insensitive), per-tag cap of 64 chars, total length capped at 512 chars to fit the column, commas inside tag values are replaced with spaces. - Tenant authorization: `PUT /v1/agent/<id>/tags` gates on `UserCanvasService.accessible(canvas_id, tenant_id)`. - Tag listing scope: `UserCanvasService.list_tags` follows the same own + team-shared rule as `get_by_tenant_ids`. - i18n: keys added to `en.ts` and `zh.ts` only (per project convention; other locales fall back). - `HomeCard` gets a non-breaking `extra?: ReactNode` slot for the chip row; no `src/components/ui/` files modified. ## Test plan - [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column exists (`DESCRIBE user_canvas`). - [ ] Agents page renders cards normally (no console error from missing field). - [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog was unmounting with the dropdown). - [ ] Typing a tag without pressing Enter and clicking Save persists it (regression: last typed tag was being dropped). - [ ] Chip input supports Enter/comma to commit, Backspace on empty to remove, `×` to remove individual chip. 
- [ ] Tag containing a comma sent via API is stored with the comma replaced by a space. - [ ] 20 long tags sent via API does not error (length cap silently truncates). - [ ] "Tags" filter in the filter bar shows counts and narrows the list. - [ ] Filtering by `ml` does not return agents tagged `ml-ops`. - [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc. - [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or no permission.`	2026-05-13 21:41:32 +08:00
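The normalization and exact-token matching rules listed in #14799 can be sketched in plain Python. This is an illustrative sketch, not the code in `UserCanvasService.update_tags`: the function names are hypothetical, the limits (64 and 512) come from the commit message, and keeping the first-seen spelling of a duplicate tag is an assumption.

```python
MAX_TAG_LEN = 64     # per-tag cap from the commit message
MAX_TOTAL_LEN = 512  # total length of the comma-separated column value


def normalize_tags(tags: list[str]) -> str:
    """Return the comma-separated value that would be stored in the column."""
    seen: set[str] = set()
    kept: list[str] = []
    total = 0
    for raw in tags:
        # Commas would break the storage format, so they become spaces.
        tag = raw.replace(",", " ").strip()[:MAX_TAG_LEN]
        if not tag or tag.lower() in seen:
            continue  # skip blanks and case-insensitive duplicates
        cost = len(tag) + (1 if kept else 0)  # +1 for the separator
        if total + cost > MAX_TOTAL_LEN:
            break  # silently truncate instead of raising
        seen.add(tag.lower())
        kept.append(tag)
        total += cost
    return ",".join(kept)


def has_tag(stored: str, wanted: str) -> bool:
    """Exact-token match: wrap both sides in commas so 'ml' != 'ml-ops'."""
    return f",{wanted}," in f",{stored},"


if __name__ == "__main__":
    stored = normalize_tags(["ML", "ml", "data,science", "ml-ops"])
    print(stored)
    print(has_tag("ml-ops,nlp", "ml"))
```

The comma wrapping is the same trick the SQL filter is described as using: the search term only matches when it is delimited on both sides.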
47NoahThompson	9e0f976729	Add widget customization and persistence (#14603 ) Introduce comprehensive floating widget customization: add new widget settings (title, subtitle, footer, colors, mute, streaming) with types and defaults, and expose them via EmbedDialog UI (split into Embed Setup and Widget Customization tabs). Persist and load settings through Agent page by reading/writing globals and wiring an onSaveWidgetSettings handler to setAgent; show a loading ButtonLoading for saving. Update embed iframe query params and FloatingChatWidget to honor URL params (colors, text, mute/streaming) with validation/normalization, color darkening for gradients, footer link normalization, and improved styling. Also add copy-to-clipboard in message toolbar, adjust syntax highlighter layout and Copy button, and add i18n key for muteWidget. ### What problem does this PR solve? Adds a few fields to the embed widget modal to customize the appearance of the floating widget when embedded into a page. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Noah <Noah.Thompson@ecn.forces.gc.ca>	2026-05-13 21:13:11 +08:00
Ahmad Intisar	e994051eb9	Feature/generic api connector (#13545 ) # feat: Add Generic REST API Connector ## What problem does this PR solve? RAGFlow supports many specific data source connectors (MySQL, Slack, Google Drive, etc.), but there was no way to connect an arbitrary REST API as a data source. Users with custom or third-party APIs had to write a new connector class for each one. This PR adds a generic, configuration-driven REST API connector that lets users connect any REST API as a data source entirely through the UI — no code changes needed per API. --- ## Features ### Core Connector (`common/data_source/rest_api_connector.py`) - Implements `LoadConnector` and `PollConnector` interfaces for full and incremental sync - Configurable authentication: None, API Key (custom header), Bearer Token, Basic Auth - Pluggable pagination: Page-based, Offset-based, Cursor-based, or None - Smart page-size inference from user's query parameters to avoid duplicate/conflicting params - Configurable request delay between pages to prevent API rate limiting - Auto-detection of the items array in JSON responses (`items`, `results`, `data`, `records`, or first list found) - Advanced field mapping with dot-notation (`country.name`), array wildcards (`newsType[].name`), type hints, and default values - Optional content template rendering (`"Title: {title}\nBody: {body}"`) - HTML stripping for content fields - Stable document IDs via `hash128` from a configurable ID field or auto-generated from item content - Pydantic configuration schema with automatic coercion of UI string inputs to dicts/lists ### Backend Registration (`rag/svr/sync_data_source.py`, `common/constants.py`, `common/data_source/config.py`) - `REST_API` sync class wired into RAGFlow's `func_factory` - Full sync (`load_from_state`) and incremental polling (`poll_source`) support - Credentials and config passed from task to connector following existing patterns (MySQL, SeaFile, etc.) 
### Test Connection Endpoint (`api/apps/connector_app.py`) - `POST /v1/connector/<id>/test` validates config schema, authentication, and API connectivity without triggering a sync - Clear error messages for auth failures vs. config issues ### Frontend UI (`web/src/pages/user-setting/data-source/constant/`) - Postman-style configuration:* Base URL, Query Parameters (key=value per line), Auth, Content Fields, Metadata Fields, Pagination Type - Auth-type-aware form: fields for API key header/value, Bearer token, or Basic username/password appear only when relevant - Advanced Settings toggle for: Custom Headers, Max Pages, Request Delay, Poll Timestamp Field, Request Body (POST) - Connector icon (SVG) and i18n strings (English) - "Test Connection" button to validate before syncing --- ## Controls & Safety - Configurable max pages safety cap (default: 1000, adjustable in UI) - Configurable request delay between pages (default: 0.5s, adjustable in UI) - Auth errors (401/403) fail immediately without retries; transient errors retry with exponential backoff - Diagnostic logging: auth setup confirmation, request details on failure, content field extraction status --- ## Type of change - [x] New Feature (non-breaking change which adds functionality) ##Visual Screenshots of Features <img width="482" height="510" alt="Screenshot 2026-03-11 at 5 19 52 PM" src="https://github.com/user-attachments/assets/dcb7ab4a-1622-44f3-bb02-d6f0527314c4" /> (Connector can be configured within the external data sources tab) Configuration Parameters: <img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 46 PM" src="https://github.com/user-attachments/assets/5e154e71-4ab5-4872-bfb2-04f02b73c18a" /> <img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 54 PM" src="https://github.com/user-attachments/assets/00cb14b7-0bcf-4b94-9d71-34e93369ecb2" /> Connection can be tested before attaching to dataset: <img width="981" height="681" alt="Screenshot 2026-03-11 at 5 21 40 PM" 
src="https://github.com/user-attachments/assets/aaa6eeeb-89a7-4349-bc34-2423bf8be9ee" /> Ingestion tested with API connector (works perfectly fine): <img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 22 30 PM" src="https://github.com/user-attachments/assets/afcd0d58-cadd-4152-badc-d2f14d96fbec" /> Search & Retrieval works as well with metadata flow: <img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 23 05 PM" src="https://github.com/user-attachments/assets/d41ee935-dcf7-4456-b317-22a76ca032c0" /> --------- Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-13 20:35:01 +08:00
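Two of the connector behaviors described in #13545, item-array auto-detection and dot-notation field mapping with `[]` wildcards, can be sketched as follows. This is a minimal illustration under stated assumptions: `find_items` and `extract` are hypothetical names, and the real `rest_api_connector.py` may resolve paths differently.

```python
from typing import Any


def find_items(payload: Any) -> list:
    """Locate the list of records in a JSON response.

    Tries the well-known keys named in the commit message first,
    then falls back to the first list value found.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ("items", "results", "data", "records"):
            if isinstance(payload.get(key), list):
                return payload[key]
        for value in payload.values():
            if isinstance(value, list):
                return value
    return []


def extract(item: Any, path: str, default: Any = None) -> Any:
    """Resolve paths such as 'country.name' or 'newsType[].name'."""
    parts = path.split(".")

    def walk(node: Any, idx: int) -> Any:
        if idx == len(parts):
            return node
        part = parts[idx]
        if part.endswith("[]"):  # array wildcard: map over every element
            seq = node.get(part[:-2]) if isinstance(node, dict) else None
            if not isinstance(seq, list):
                return None
            values = [walk(element, idx + 1) for element in seq]
            return [v for v in values if v is not None]
        if isinstance(node, dict) and part in node:
            return walk(node[part], idx + 1)
        return None

    value = walk(item, 0)
    return default if value is None else value


if __name__ == "__main__":
    response = {"total": 1, "results": [
        {"title": "Hi", "country": {"name": "Chile"},
         "newsType": [{"name": "a"}, {"name": "b"}]}
    ]}
    for record in find_items(response):
        print(extract(record, "country.name"), extract(record, "newsType[].name"))
        # Content template rendering with plain str.format_map.
        print("Title: {title}".format_map(record))
```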
Wang Qi	76d5240fb5	Fix #14801 to allow searching the dataset list when adding (#14841) ### What problem does this PR solve? Fix #14801 to allow searching the dataset list when adding, following on #14825 <img width="2172" height="857" alt="image" src="https://github.com/user-attachments/assets/65ea7647-56f4-4c16-8437-121b834811f0" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 19:36:23 +08:00
CaptainTimon	2717ee283f	feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679 ) ### What problem does this PR solve? Closes #14674. This PR improves RAPTOR configuration and tree construction while preserving the existing RAPTOR behavior as the default. RAPTOR currently builds summary layers with the original UMAP + GMM clustering path. This PR keeps that default path, and adds: - A hidden backend tree-builder option: - `tree_builder="raptor"`: default, existing RAPTOR behavior. - `tree_builder="psi"`: rank-aware Psi-style tree builder using original embedding-space cosine ranking. - A user-facing clustering method option for the default RAPTOR builder: - `clustering_method="gmm"`: existing default. - `clustering_method="ahc"`: agglomerative hierarchical clustering path. - A RAPTOR UI setting for `Clustering method` and `Max cluster`. ### What changed #### Backend - Added `tree_builder` support for RAPTOR/Psi. - Added `clustering_method` support for GMM/AHC. - Kept existing RAPTOR + GMM as the default. - Added Psi tree building from original-space cosine similarity. - Added bucketed Psi building controls for large inputs: - `raptor.ext.psi_exact_max_leaves` - `raptor.ext.psi_bucket_size` - Added method-aware RAPTOR summary metadata using existing `extra.raptor_method`. - Avoided adding a dedicated DB schema field for experimental method tracking. - Added cleanup/migration logic to avoid mixing stale RAPTOR summary trees. - Added defensive checks for Psi tree construction and summary failures. #### Frontend/UI - Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`. - Added/kept `Max cluster` in RAPTOR settings. - Enlarged max cluster UI limit to `1024`, matching backend validation. - Kept AHC editable even when a RAPTOR task has already finished. 
- Fixed the UI save payload so `clustering_method` and `tree_builder` are serialized through `parser_config.raptor.ext`, avoiding backend validation errors for extra top-level RAPTOR fields. Example saved RAPTOR config: ```json { "raptor": { "max_cluster": 317, "ext": { "clustering_method": "ahc", "tree_builder": "raptor" } } } ``` Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>	2026-05-12 09:42:31 +08:00
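The payload fix in #14679 amounts to moving the experimental fields under `ext` before saving. The sketch below shows that reshaping in Python for illustration only; the actual change lives in the web frontend, and `build_raptor_config` is a hypothetical name.

```python
import json

# Fields that the commit message says must be nested under raptor.ext.
EXT_KEYS = ("clustering_method", "tree_builder")


def build_raptor_config(form: dict) -> dict:
    """Nest experimental fields under 'ext' so no extra top-level keys remain."""
    raptor = {k: v for k, v in form.items() if k not in EXT_KEYS and k != "ext"}
    ext = dict(form.get("ext", {}))  # keep any ext values already present
    ext.update({k: form[k] for k in EXT_KEYS if k in form})
    if ext:
        raptor["ext"] = ext
    return {"raptor": raptor}


if __name__ == "__main__":
    form = {"max_cluster": 317, "clustering_method": "ahc", "tree_builder": "raptor"}
    print(json.dumps(build_raptor_config(form), indent=2))
```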
Nie WeiYang	1e80be77a2	fix(web): fix incomplete Docx preview in citation reference (#14122 ) This PR fixes a UI issue where the .docx document preview was displayed incompletely when clicking on a citation/reference link during a knowledge base conversation. ### What problem does this PR solve? The Issue: In the chat interface, when a user clicks the source citation at the end of an answer, the DocPreviewer opens. However, for .docx files, if the content exceeded the window height, it was truncated and unscrollable, preventing users from reading the full referenced text. Changes: web/src/components/document-preview/doc-preview.tsx: Added the overflow-auto Tailwind class to the DocPreviewer root container to ensure scrollbars appear automatically when content overflows. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: nie.weiyang <nie.weiyang@embedway.com>	2026-05-11 16:17:48 +08:00
Wang Qi	3838770e7a	GraphRAG feature - Part 1 - add spacy to extract entity and relation (#14670 ) ### What problem does this PR solve? GraphRAG feature - Part 1 - add spacy to extract entity and relation <img width="1621" height="1288" alt="image" src="https://github.com/user-attachments/assets/aadeddad-94da-46c6-adad-9c3784181f61" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 12:59:59 +08:00
很拉风的James	6cb4bc2947	Fix: Radio.Group cloneElement crashes on non-element children (#14407 ) ### What problem does this PR solve? `Radio.Group` in `web/src/components/ui/radio.tsx` injects the parent's `disabled` prop into each child via `React.cloneElement` with `as React.ReactElement` and no validation. This throws at runtime when a consumer passes strings, numbers, `null`, `false`, or other non-element nodes, while the cast hides the unsafe access from TypeScript. Use `React.isValidElement<RadioProps>(child)` as a type guard before calling `cloneElement`. Non-element children pass through unchanged, and `child.props` access becomes type-checked without an `as` cast. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 09:54:42 +08:00
chanx	8ac14b597f	Fix: Some bugs (#14734 ) ### What problem does this PR solve? Fix: Some bugs - Error during batch modification of metadata in the Knowledge Base - Manually configured metadata is not displayed in search settings ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-09 17:40:22 +08:00
buua436	de2abe9ed8	Fix: tag parser id (#14724 ) ### What problem does this PR solve? tag parser id ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-09 14:29:09 +08:00
Tim Wang	1bcb6deb6f	Fix: collapsible thinking display and separate deep research retrieval tag (#14613 ) ## Summary - Collapsible thinking: Replace `<section>` with `<details>` for `<think>` content, so model thinking output is collapsed by default (click to expand). Works for all models that output `<think>` tags (Qwen3, DeepSeek, Gemini, Claude, etc.). - Fix double thinking tags: When reasoning/deep research mode is enabled in knowledge base chat, both the retrieval progress and model thinking were wrapped in `<think>` tags, producing two "Thinking..." blocks. Now retrieval progress uses a dedicated `<retrieving>` tag rendered as a separate "Retrieving..." collapsible with a distinct green accent. ### Before - Thinking content displayed as flat gray-bordered `<section>`, occupying significant screen space - Deep research + model thinking both use `<think>` → two identical "Thinking..." blocks ### After - Thinking content collapsed by default in a `<details>` element, click "Thinking..." to expand - Deep research shows "Retrieving..." (green border), model thinking shows "Thinking..." (gray border) ## Changes Backend (`api/db/services/dialog_service.py`) - Deep research callback: replace `start_to_think`/`end_to_think` marker flags with direct `<retrieving>`/`</retrieving>` answer text Frontend - `web/src/utils/chat.ts`: `replaceThinkToSection()` now uses `<details>` instead of `<section>`; add new `replaceRetrievingToSection()` - 4 tsx files: import and pipe `replaceRetrievingToSection`, whitelist `details`, `summary`, `retrieving` in DOMPurify `ADD_TAGS` - 4 less files: `section.think` → `details.think` with `<summary>` styles; add `details.retrieving` with green accent; dark mode and RTL variants ## Test plan - [ ] Open a chat WITHOUT knowledge base, ask a question to a model with thinking (e.g. Qwen3) → thinking content should be collapsed by default, click "Thinking..." 
to expand - [ ] Open a chat WITH knowledge base and reasoning enabled, ask a question → "Retrieving..." (green) shows retrieval progress, "Thinking..." (gray) shows model thinking, each independently collapsible - [ ] Verify dark mode renders correctly for both collapsible blocks - [ ] Verify RTL layout renders correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: wanghualoong <wanghualoong@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-08 14:40:00 +08:00
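The tag rewriting described in #14613 can be sketched with two regular expressions. This is a Python analogue written for illustration; the real `replaceThinkToSection()` and `replaceRetrievingToSection()` are TypeScript functions in `web/src/utils/chat.ts`, and the exact markup they emit is an assumption here.

```python
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_RETRIEVING = re.compile(r"<retrieving>(.*?)</retrieving>", re.DOTALL)


def _wrap(css_class: str, label: str):
    """Build a replacement callback that wraps the captured body."""
    def repl(match: re.Match) -> str:
        return (
            f'<details class="{css_class}"><summary>{label}</summary>'
            f"{match.group(1)}</details>"
        )
    return repl


def collapse_blocks(text: str) -> str:
    """Turn <retrieving> and <think> spans into separate collapsible blocks.

    A <details> element without the 'open' attribute renders collapsed.
    """
    text = _RETRIEVING.sub(_wrap("retrieving", "Retrieving..."), text)
    return _THINK.sub(_wrap("think", "Thinking..."), text)


if __name__ == "__main__":
    answer = "<retrieving>3 chunks</retrieving><think>plan</think>Final answer"
    print(collapse_blocks(answer))
```

Using distinct tags for retrieval progress and model reasoning is what lets each block get its own label and styling.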
buua436	f703169117	Refa: migrate document preview/download to RESTful API (#14633 ) ### What problem does this PR solve? migrate document preview/download to RESTful API ### Type of change - [x] Refactoring	2026-05-08 13:26:13 +08:00
Attili-sys	24af0875e5	Feat/configurable metadata display (#13464 ) ### What problem does this PR solve? Currently, RAGFlow's Search and Chat interfaces display only raw vectorized text chunks during retrieval, without contextual information about their source documents. Users cannot see document titles, page numbers, upload dates, or custom metadata fields that would help them understand and trust the retrieved results. This PR introduces an optional metadata display feature that enriches retrieved chunks with document-level metadata in both the Search tab and Chatbot interface. Key improvements: - Search results: Display document metadata as styled badges beneath chunk snippets - Chat citations: Show metadata in citation popovers and reference lists for better source context - LLM context: Metadata is injected into the LLM prompt to enable more accurate, citation-aware responses - External API support: Applications using RAGFlow's SDK retrieval endpoints (`/v1/retrieval`, `/v1/searchbots/retrieval_test`) can opt-in via request parameters - User control: Multi-select dropdown UI allows users to choose which metadata fields to display Implementation approach: - ✅ Reuses existing `DocMetadataService` infrastructure (no new database tables or indices) - ✅ Settings stored in existing JSON configuration fields (`search_config.reference_metadata`, `prompt_config.reference_metadata`) - ✅ No database migrations required - ✅ Disabled by default (fully opt-in and backward-compatible) - ✅ Dynamic metadata field selection populated from actual document metadata keys - ✅ Fixed critical bug where Python's builtin `set()` was shadowed by a route handler function Modified endpoints (all backward-compatible): - `POST /v1/retrieval` (Public SDK) - `POST /v1/searchbots/retrieval_test` (Searchbots) - `POST /v1/chunk/retrieval_test` (UI/Internal) - Chat completions endpoints (via `extra_body.reference_metadata` or `prompt_config`) ### Type of change - [x] New Feature (non-breaking 
change which adds functionality) ###Images - <img width="879" height="1275" alt="image" src="https://github.com/user-attachments/assets/95b2d731-31ae-45a1-b081-bf5893f52aeb" /> <br><br> <br><br> <img width="1532" height="362" alt="image" src="https://github.com/user-attachments/assets/9cebc65b-b7a7-459f-b25e-3b13fa9b638e" /> <br><br> <br><br> <img width="2586" height="1320" alt="image" src="https://github.com/user-attachments/assets/2153d493-d899-461f-a7a9-041391e07776" /> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Attili-sys <Attili-sys@users.noreply.github.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-04-30 23:13:27 +08:00
Yingfeng	4ee0702aed	Feat: add skills space to context engine (#13908 ) ### What problem does this PR solve? issue #13714 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-30 12:36:03 +08:00
balibabu	a736948493	Fix: Clicking the button in the bottom-right corner of the `/chats/widget` page fails to display the dialog box. (#14465 ) ### What problem does this PR solve? Fix: Clicking the button in the bottom-right corner of the `/chats/widget` page fails to display the dialog box. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 17:03:33 +08:00
balibabu	ce933357c6	Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. (#14459 ) ### What problem does this PR solve? Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-29 14:37:48 +08:00
Jack	2d522ccb36	Fix: thumbnails issue in chat (#14415) ### What problem does this PR solve? In chat, the thumbnails didn't display correctly. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) Steps to reproduce: 1. Create a dataset and upload a file (see attached). 2. Parse the document. 3. Once parsing has completed, create a chat and associate it with the dataset. 4. Ask a question (DAP VS DAPE comparison). 5. Check the result.	2026-04-28 11:39:29 +08:00
Jack	290f0294d6	Refactor: migrate artifact API (#14348 ) ### What problem does this PR solve? Before migration: GET /v1/document/artifact/<filename> After migration: GET /api/v1/documents/artifact/<filename> ### Type of change - [x] Refactoring	2026-04-27 15:19:41 +08:00
wdeveloper16	78188ce9e9	Feat: add OpenDataLoader PDF parser backend (#14058 ) (#14097 ) ### What problem does this PR solve? Closes #14058. RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU, Docling, TCADP, PaddleOCR). This PR adds OpenDataLoader ([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf)) as a new optional backend, giving users a deterministic, local-first alternative with competitive table extraction accuracy. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --- ### Changes #### Backend - `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser` class inheriting `RAGFlowPdfParser`. Implements `check_installation()` (guards Python package + Java 11+ runtime), `parse_pdf()` with JSON-first extraction (heading/paragraph/table/list/image/formula) and Markdown fallback, position-tag generation compatible with the shared `@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup. - `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in `PARSERS` dict, added to `chunk_token_num=0` override list. - `rag/flow/parser/parser.py` — `"opendataloader"` branch in the pipeline PDF handler + check validation list. #### Infrastructure - `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH. #### Frontend - `web/src/components/layout-recognize-form-field.tsx` — `OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown. Cascades automatically to the pipeline editor's Parser component. #### Docs - `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader entry and full env-var reference. 
--- ### Environment variables \| Variable \| Default \| Description \| \|---\|---\|---\| \| `USE_OPENDATALOADER` \| `false` \| Set `true` to install `opendataloader-pdf` on container startup \| \| `OPENDATALOADER_VERSION` \| latest \| Pin the PyPI release (e.g. `==2.2.1`) \| \| `OPENDATALOADER_HYBRID` \| _(unset)_ \| Enable hybrid AI mode (e.g. `docling-fast`) \| \| `OPENDATALOADER_IMAGE_OUTPUT` \| _(unset)_ \| `off` / `embedded` / `external` \| \| `OPENDATALOADER_OUTPUT_DIR` \| _(tmp)_ \| Persistent output dir; temp dir used + cleaned if unset \| \| `OPENDATALOADER_DELETE_OUTPUT` \| `1` \| `0` to retain intermediate files for debugging \| \| `OPENDATALOADER_SANITIZE` \| _(unset)_ \| `1` to filter prompt-injection patterns from output \| --- ### Dependencies - Runtime: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not added to `pyproject.toml` core deps. Installed by `ensure_opendataloader()` at container startup when `USE_OPENDATALOADER=true`. - System: Java 11+ on PATH (JVM is the underlying engine). The installer skips with a warning if `java` is not found. --- ### How to test Standalone parser: ```bash source .venv/bin/activate uv pip install opendataloader-pdf python3 -c " import sys; sys.path.insert(0, '.') from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser p = OpenDataLoaderParser() print('available:', p.check_installation()) s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline') print(f'sections={len(s)} tables={len(t)}') " ``` ### Benchmark vs Docling ``` file parser secs sections tables ---------------------------------------------------------------------- text-heavy.pdf docling 45.29 148 10 text-heavy.pdf opendataloader 3.14 559 0 table-heavy.pdf docling 7.05 76 3 table-heavy.pdf opendataloader 3.71 90 0 complex.pdf docling 42.67 114 8 complex.pdf opendataloader 3.51 180 0 ```	2026-04-25 00:33:02 +08:00
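The shared position-tag format `@@page\tx0\tx1\ty0\ty1##` mentioned in #14097 can be written and parsed as shown below. This is a sketch under assumptions: a single integer page number and one decimal place for coordinates are choices made here, and `make_position_tag` / `parse_position_tag` are hypothetical helpers rather than RAGFlow functions.

```python
import re

# One tag: @@<page>\t<x0>\t<x1>\t<y0>\t<y1>##
_TAG = re.compile(r"@@(\d+)\t([\d.]+)\t([\d.]+)\t([\d.]+)\t([\d.]+)##")


def make_position_tag(page: int, x0: float, x1: float, y0: float, y1: float) -> str:
    """Serialize a bounding box; field order follows the commit message."""
    return f"@@{page}\t{x0:.1f}\t{x1:.1f}\t{y0:.1f}\t{y1:.1f}##"


def parse_position_tag(tag: str) -> tuple[int, float, float, float, float]:
    """Parse a tag back into (page, x0, x1, y0, y1)."""
    match = _TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a position tag: {tag!r}")
    page, *coords = match.groups()
    x0, x1, y0, y1 = (float(c) for c in coords)
    return int(page), x0, x1, y0, y1


if __name__ == "__main__":
    tag = make_position_tag(3, 10, 200.5, 40, 80)
    print(repr(tag))
    print(parse_position_tag(tag))
```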
balibabu	4841ce4239	Fix: Component definition is missing display name. (#14255 ) ### What problem does this PR solve? Fix: Component definition is missing display name. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-21 16:53:08 +08:00
balibabu	a2bea30749	Fix: Editing an empty response in the retrieval operator will cause the focus to shift to the metadata input box. (#14253 ) ### What problem does this PR solve? Fix: Editing an empty response in the retrieval operator will cause the focus to shift to the metadata input box. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-21 16:19:55 +08:00
chanx	05c39b90a8	Fix: pipeline parser log not display (#14251 ) ### What problem does this PR solve? Fix: pipeline parser log not display ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-21 15:24:13 +08:00
balibabu	78b800e685	Fix: The minimum value for the "Suggested text block size" input box is set to 1. (#14246 ) ### What problem does this PR solve? Fix: The minimum value for the "Suggested text block size" input box is set to 1. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-21 14:06:36 +08:00
balibabu	c43367eca3	Fix: The number of chunks in the file list is not displayed. (#14232 ) ### What problem does this PR solve? Fix: The number of chunks in the file list is not displayed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-20 19:24:20 +08:00
chanx	dce0b1c030	Fix: Pipeline page style optimizations (#14128 ) ### What problem does this PR solve? Fix: Pipeline page style optimizations ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-15 19:05:54 +08:00
balibabu	c56a7f99d1	Fix: The pop-up menu of the PromptEditor will be blocked. #14126 (#14127 ) ### What problem does this PR solve? Fix: The pop-up menu of the PromptEditor will be blocked. #14126 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-15 18:42:02 +08:00
xinmotlanthua	e1dede1366	fix(web): replace hardcoded English strings with i18n in floating chat widget (#14095 )

## Summary

- Replace 3 hardcoded English strings in `floating-chat-widget.tsx` with `react-i18next` `t()` calls so the widget respects the `locale` query parameter
- Add `useTranslation` hook to the component
- Add translation keys (`chat.chatSupport`, `chat.replyInstantly`, `chat.typeYourMessage`) to all 14 locale files

## Strings replaced

| Original | i18n key |
|---|---|
| `'Chat Support'` | `t('chat.chatSupport')` |
| `'We typically reply instantly'` | `t('chat.replyInstantly')` |
| `'Type your message...'` | `t('chat.typeYourMessage')` |

Closes #14072

Co-authored-by: khanhkhanhlele <namkhanh2172@gmail.com>

2026-04-14 20:12:56 +08:00
balibabu	1bc4868abe	Fix: The file count in the file header did not change after uploading or deleting files. (#14034 ) ### What problem does this PR solve? Fix: The file count in the file header did not change after uploading or deleting files. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-04-14 18:07:32 +08:00
balibabu	27ebc64ec0	Feat: Adapted for the upgraded knowledge graph of @antv/g6. (#14103 ) ### What problem does this PR solve? Feat: Adapted for the upgraded knowledge graph of @antv/g6. ### Type of change - [x] Refactoring	2026-04-14 16:33:52 +08:00
Magicbook1108	1376c004a9	Fix: update docs generator (#14070 )

### What problem does this PR solve?

Refactor: update docs generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

1. Support multiple document generator components and correctly display messages in the message component. The document generator will not overwrite other messages.
2. Support Chinese content and ensure correct Markdown rendering in PDF and DOCX
3. Simplify configuration page and support more output format
4. Hide download from other components except for message
5. Sanitize filename
6. And more changes on usability

2026-04-14 15:24:43 +08:00
balibabu	d2b744facd	Fix: The indented tree text generated on the search page overlaps. #14077 (#14078 ) ### What problem does this PR solve? Fix: The indented tree text generated on the search page overlaps. #14077 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-04-14 10:02:00 +08:00
balibabu	9a38af7cbf	Feat: Hide the download button embedded in the agent page. (#14083 ) ### What problem does this PR solve? Feat: Hide the download button embedded in the agent page. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-13 21:06:41 +08:00
balibabu	a023305b96	Fix: The chat page is not displaying the meta tags. (#14071 ) ### What problem does this PR solve? Fix: The chat page is not displaying the meta tags. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-13 16:18:25 +08:00
balibabu	462be53b76	Fix: When creating a dataset, if no `chunk_method` is selected, there is no indication that this is a required field. (#14039 ) ### What problem does this PR solve? Fix: When creating a dataset, if no `chunk_method` is selected, there is no indication that this is a required field. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-10 19:05:14 +08:00
balibabu	11c89d87da	Fix: The dataset on the search page is not displaying the required field error message. (#14041 ) ### What problem does this PR solve? Fix: The dataset on the search page is not displaying the required field error message. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-10 18:20:50 +08:00

727 Commits