ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 15:31:05 +08:00

Author	SHA1	Message	Date
euvre	6dd38eca6a	fix: file logs not displayed in dataset ingestion page (#14479 ) ### What problem does this PR solve? ## Summary Fixed a bug where the File Logs tab in the dataset ingestion page always showed "No logs" even after files were parsed successfully. ## Root Cause Both the File Logs and Dataset Logs tabs on the frontend called the same backend endpoint `/datasets/{dataset_id}/ingestions`. However, the backend only queried `get_dataset_logs_by_kb_id`, which hard-filtered records by `document_id == GRAPH_RAPTOR_FAKE_DOC_ID` (dataset-level logs). As a result, real file-level logs were never returned, causing the table to appear empty. ## Changes ### Backend - `api/apps/restful_apis/dataset_api.py` - Added two new query parameters to `list_ingestion_logs`: - `log_type` — `"file"` or `"dataset"` (default: `"dataset"`) - `keywords` — search keyword for filtering by document / task name - `api/apps/services/dataset_api_service.py` - Updated `list_ingestion_logs` signature to accept `log_type` and `keywords`. - Added conditional routing: - When `log_type == "file"`, call `PipelineOperationLogService.get_file_logs_by_kb_id` - Otherwise, call `PipelineOperationLogService.get_dataset_logs_by_kb_id` - `api/db/services/pipeline_operation_log_service.py` - Extended `get_dataset_logs_by_kb_id` with an optional `keywords` parameter so dataset logs can also be searched. ### Frontend - `web/src/pages/dataset/dataset-overview/hook.ts` - Removed the separate API function switching (`listPipelineDatasetLogs` vs `listDataPipelineLogDocument`). - Unified both tabs to call `listDataPipelineLogDocument` with the new `log_type` query parameter (`"file"` or `"dataset"`). - Ensured `keywords` and filter values are passed through correctly. 
## Behavior After Fix \| Tab \| `log_type` \| Returned Records \| Searchable Field \| \|---\|---\|---\|---\| \| File Logs \| `file` \| Real document-level logs \| `document_name` (file name) \| \| Dataset Logs \| `dataset` \| GraphRAG / RAPTOR / MindMap logs \| `document_name` (task type) \| ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com> Co-authored-by: Wang Qi <wangq8@outlook.com> Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-04-29 22:10:24 +08:00
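The `log_type` routing described in this commit can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's code: the value of `GRAPH_RAPTOR_FAKE_DOC_ID`, the record layout, the helper names, and the case-insensitive keyword match are all assumptions standing in for the real database queries.

```python
from typing import Dict, List

# Placeholder value: the commit names this constant but does not show its value.
GRAPH_RAPTOR_FAKE_DOC_ID = "graph_raptor_fake_doc_id"

Record = Dict[str, str]


def _matches(record: Record, keywords: str) -> bool:
    """Case-insensitive substring match on document_name (an assumption)."""
    return keywords.lower() in record["document_name"].lower()


def get_dataset_logs(records: List[Record], keywords: str = "") -> List[Record]:
    """Dataset-level logs are the ones stored under the fake document ID."""
    return [r for r in records
            if r["document_id"] == GRAPH_RAPTOR_FAKE_DOC_ID and _matches(r, keywords)]


def get_file_logs(records: List[Record], keywords: str = "") -> List[Record]:
    """File-level logs are every record attached to a real document."""
    return [r for r in records
            if r["document_id"] != GRAPH_RAPTOR_FAKE_DOC_ID and _matches(r, keywords)]


def list_ingestion_logs(records: List[Record], log_type: str = "dataset",
                        keywords: str = "") -> List[Record]:
    """Route to the file or dataset query; 'dataset' stays the default."""
    if log_type == "file":
        return get_file_logs(records, keywords)
    return get_dataset_logs(records, keywords)


RECORDS = [
    {"document_id": "doc-1", "document_name": "manual.pdf"},
    {"document_id": "doc-2", "document_name": "faq.md"},
    {"document_id": GRAPH_RAPTOR_FAKE_DOC_ID, "document_name": "RAPTOR"},
]
```

Calling `list_ingestion_logs(RECORDS, log_type="file")` returns only the two document records, while the default call returns only the record stored under the fake document ID, which mirrors the pre-fix behavior of the single endpoint.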
Wang Qi	5018459112	Fix metadata config (#14480 ) ### What problem does this PR solve? Fix metadata config ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 21:09:54 +08:00
Wang Qi	c4d0b0ebcf	Fix visit dataset error (#14490 ) ### What problem does this PR solve? Fix visit dataset error ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 20:17:00 +08:00
balibabu	1692f0928f	Fix: The pipeline column header in the FileLogsTable is displaying incorrectly. (#14489 ) ### What problem does this PR solve? Fix: The pipeline column header in the FileLogsTable is displaying incorrectly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 19:52:28 +08:00
writinwaters	9280c64518	Docs: Updated Title chunker references (#14483 ) ### What problem does this PR solve? Updated Title chunker references ### Type of change - [x] Documentation Update	2026-04-29 19:37:24 +08:00
Magicbook1108	de8c6ad0f3	Feat: enable sync deleted file for Discord (#14451 ) ### What problem does this PR solve? Feat: enable sync deleted file for Discord ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-29 19:05:40 +08:00
bitloi	2bc8c6d35e	feat(dropbox): support deleted-file sync (#14476 ) ### What problem does this PR solve? Partially addresses #14362 by adding deleted-file sync support for the Dropbox data source. Dropbox previously did not provide the slim current-file snapshot required by stale document reconciliation, and its sync runner returned only document batches. As a result, enabling deleted-file sync could not remove local documents that had been deleted from Dropbox. This PR: - Adds `retrieve_all_slim_docs_perm_sync()` to `DropboxConnector`. - Reuses Dropbox metadata traversal to collect current remote file IDs without downloading file contents. - Wires incremental Dropbox sync to return `(document_generator, file_list)` when `sync_deleted_files` is enabled. - Enables the deleted-file sync toggle for Dropbox in the data source settings UI. - Adds regression coverage for slim snapshots, nested folders, paginated listings, duplicate filenames, and full reindex behavior. Tests: - `uv run pytest test/unit_test/common/test_dropbox_connector.py -q` - `uv run pytest test/unit_test/rag/test_sync_data_source.py -q` - `uv run pytest test/unit_test/common/test_dropbox_connector.py test/unit_test/rag/test_sync_data_source.py -q` - `uv run ruff check common/data_source/dropbox_connector.py rag/svr/sync_data_source.py test/unit_test/common/test_dropbox_connector.py test/unit_test/rag/test_sync_data_source.py` - `./node_modules/.bin/eslint src/pages/user-setting/data-source/constant/index.tsx` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-29 19:05:11 +08:00
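A rough sketch of the two ideas in this commit: walking a paginated listing to collect remote file IDs without downloading contents, and returning `(document_generator, file_list)` only when `sync_deleted_files` is enabled. The `PAGES` data, `list_page`, `collect_remote_ids`, `document_batches`, and `run_sync` are hypothetical stand-ins, not the Dropbox SDK or the connector's actual methods.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Entry = Dict[str, str]

# Hypothetical paginated listing: cursor -> (entries, next_cursor).
PAGES: Dict[Optional[str], Tuple[List[Entry], Optional[str]]] = {
    None: ([{"id": "id:a", "name": "report.pdf"},
            {"id": "id:b", "name": "notes/report.pdf"}], "cursor-1"),
    "cursor-1": ([{"id": "id:c", "name": "notes/todo.md"}], None),
}


def list_page(cursor: Optional[str]) -> Tuple[List[Entry], Optional[str]]:
    """Stand-in for one metadata listing call; no file content is fetched."""
    return PAGES[cursor]


def collect_remote_ids() -> List[str]:
    """Follow cursors until exhausted, keeping only the file IDs."""
    ids: List[str] = []
    cursor: Optional[str] = None
    while True:
        entries, cursor = list_page(cursor)
        ids.extend(entry["id"] for entry in entries)
        if cursor is None:
            return ids


def document_batches() -> Iterator[List[Entry]]:
    """Stand-in for the generator of changed-document batches."""
    yield [{"id": "id:c", "name": "notes/todo.md"}]


def run_sync(sync_deleted_files: bool):
    """Return only the generator, or (generator, file_list) for deletion sync."""
    docs = document_batches()
    if sync_deleted_files:
        return docs, collect_remote_ids()
    return docs
```

Keying the snapshot on IDs rather than names is what keeps duplicate filenames in different folders (here `report.pdf` and `notes/report.pdf`) distinct.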
Magicbook1108	db1a73b255	Feat: enable sync deleted files in gitlab (#14481 ) ### What problem does this PR solve? Feat: enable sync deleted files in gitlab ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-29 19:04:10 +08:00
euvre	a0f9ae16d2	Fix: RAPTOR "Generation scope" reset to "Single file" when selecting "Dataset" (#14477 ) ## Problem In the Dataset Configuration page, changing the RAPTOR Generation scope from "Single file" to "Dataset" and clicking Save did not persist the change. After refreshing or re-entering the page, the scope always reverted to "Single file". ## Root Cause 1. Backend: The `RaptorConfig` Pydantic model in `api/utils/validation_utils.py` was configured with `extra="forbid"` but did not declare a `scope` field. When the frontend sent `"scope": "dataset"`, Pydantic rejected the request. 2. Frontend: The `extractRaptorConfigExt` utility in `web/src/hooks/parser-config-utils.ts` treated `scope` as an unknown field and moved it into the nested `ext` object. Consequently, the backend could not read `raptor_config.get("scope", "file")` correctly, so the default `"file"` was always used. ## Changes - Added `scope: Literal["file", "dataset"]` to the backend `RaptorConfig` model with a default of `"file"`. - Added `scope` to the known-field whitelist in the frontend `extractRaptorConfigExt` helper so it is transmitted as a top-level raptor field instead of being buried in `ext`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-29 18:46:28 +08:00
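The frontend half of this bug can be illustrated with a small Python analogue of the known-field split. This is only a sketch: the real helper is TypeScript, and the whitelist contents other than `scope` are hypothetical.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical whitelists; only "scope" is taken from the commit message.
KNOWN_BEFORE: FrozenSet[str] = frozenset({"use_raptor", "prompt"})
KNOWN_AFTER: FrozenSet[str] = KNOWN_BEFORE | {"scope"}


def extract_raptor_config_ext(raw: Dict[str, Any],
                              known: FrozenSet[str]) -> Dict[str, Any]:
    """Keep known fields at the top level; move everything else into 'ext'."""
    top = {k: v for k, v in raw.items() if k in known}
    ext = {k: v for k, v in raw.items() if k not in known}
    if ext:
        top["ext"] = ext
    return top


form = {"use_raptor": True, "scope": "dataset"}

# Before the fix: "scope" is unknown, so it is buried in "ext".
before = extract_raptor_config_ext(form, KNOWN_BEFORE)
# After the fix: "scope" is whitelisted and stays a top-level field.
after = extract_raptor_config_ext(form, KNOWN_AFTER)
```

With the old whitelist, a backend-style read such as `before.get("scope", "file")` falls back to `"file"` because the value sits under `ext`; with `scope` whitelisted, the same read on `after` yields `"dataset"`.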
Wang Qi	1b84892e3a	Fix delete graph (#14484 ) ### What problem does this PR solve? Fix delete graph ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 18:09:10 +08:00
Wang Qi	3991bdfaf5	Fix graph task type (#14475 ) ### What problem does this PR solve? Fix graph task type ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 17:05:56 +08:00
Magicbook1108	e0b3070012	Feat: enable sync deleted files for Gmail && fix google drive issues (#14462 ) ### What problem does this PR solve? Feat: enable sync deleted files for Gmail && fix google drive issues ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: bill <yibie_jingnian@163.com> Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-29 17:03:56 +08:00
balibabu	a736948493	Fix: Clicking the button in the bottom-right corner of the `/chats/widget` page fails to display the dialog box. (#14465 ) ### What problem does this PR solve? Fix: Clicking the button in the bottom-right corner of the `/chats/widget` page fails to display the dialog box. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 17:03:33 +08:00
Wang Qi	9690923516	Fix delete graphrag raptor (#14469 ) ### What problem does this PR solve? Fix delete graphrag raptor ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 16:47:42 +08:00
balibabu	ce933357c6	Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. (#14459 ) ### What problem does this PR solve? Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-29 14:37:48 +08:00
Magicbook1108	3b7a6eaa6c	Feat: sync deleted files in Bitbucket (#14450 ) ### What problem does this PR solve? Feat: sync deleted files in Bitbucket ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-29 11:29:17 +08:00
Paras Sondhi	74fa54f122	feat(google-drive): optimize memory payload and enable sync deletion (#14372 ) Addresses the Google Drive integration for #14362 This PR completely overhauls the Google Drive sync logic to accurately detect remote deletions, while drastically reducing the memory footprint during the snapshot phase. ### What changed under the hood: * Killed the memory bloat: Swapped out the massive document dictionary objects for a lightweight `collections.namedtuple` (`SlimDoc = namedtuple('SlimDoc', ['id'])`). This prevents RAM spikes during `retrieve_all_slim_docs_perm_sync` on massive enterprise drives. * Flawless downstream integration: The `SlimDoc` object relies on simple duck typing. It perfectly delivers the `.id` attribute required by `ConnectorService.cleanup_stale_documents_for_task`, meaning your core `hash128` vector cleanup logic runs natively without modification. * Fixed the Shared Drive blindspot: The standard API query was missing team folders. Injected the `corpora="allDrives"` and `includeItemsFromAllDrives=True` override flags so the connector now accurately maps state across both personal workspaces and organizational Shared Drives. ### Testing: Isolated the Google API retrieval logic locally to prove the `SlimDoc` mapping works and correctly registers state drops when a file is trashed remotely. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Performance Improvement	2026-04-29 10:04:36 +08:00
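As a minimal sketch of the duck-typing point above: any object exposing `.id` is enough to compute which local documents no longer exist remotely. `SlimDoc` is defined as in the commit message; `find_stale_ids` is a hypothetical helper for illustration and does not reproduce `ConnectorService.cleanup_stale_documents_for_task`.

```python
from collections import namedtuple
from typing import Iterable, List

# Defined exactly as quoted in the commit message.
SlimDoc = namedtuple('SlimDoc', ['id'])


def find_stale_ids(local_ids: Iterable[str], remote_docs: Iterable) -> List[str]:
    """Return local IDs missing from the remote snapshot.

    Only the `.id` attribute of each remote object is read, so a one-field
    namedtuple works as well as a full document dictionary wrapper would.
    """
    remote_ids = {doc.id for doc in remote_docs}
    return sorted(set(local_ids) - remote_ids)


# Remote snapshot after "file-2" was trashed.
snapshot = [SlimDoc(id="file-1"), SlimDoc(id="file-3")]
stale = find_stale_ids(["file-1", "file-2", "file-3"], snapshot)
```

Here `stale` contains only `"file-2"`, the document that is present locally but absent from the remote snapshot.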
Magicbook1108	0d18b293f5	Fix: enable sync deleted file in airtable (#14438 ) ### What problem does this PR solve? Fix: enable sync deleted file in airtable ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-28 20:09:08 +08:00
euvre	35f6d81b73	Refactor: migrate chunk retrieval_test and knowledge_graph to REST API endpoints (#14402 ) ### What problem does this PR solve? ## Summary Migrate two web API endpoints to REST-style HTTP API endpoints, following the pattern established in #14222: \| Old Endpoint \| New Endpoint \| \|---\|---\| \| `POST /v1/chunk/retrieval_test` \| `POST /api/v1/datasets/<dataset_id>/search` \| \| `GET /v1/chunk/knowledge_graph` \| `GET /api/v1/datasets/<dataset_id>/graph` \|	2026-04-28 20:00:26 +08:00
Magicbook1108	d532151be0	Feat: more model for paddle (#14436 ) ### What problem does this PR solve? Feat: more model for paddle ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-28 18:07:00 +08:00
Jack	c330005659	Fix: document level auto metadata config missing after save (#14421 ) ### What problem does this PR solve? Steps to reproduce (existing bug before API migration): 1. create a new dataset 2. upload a file 3. click on "General" in the "Parse" column, then click on "switch or configure ingestion pipeline" 4. click on "Settings" (to the right of "Auto metadata") 5. click "Add" to add new metadata 6. click on "Save" 7. re-open "Settings"; the newly added metadata is not there ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-28 17:09:23 +08:00
Magicbook1108	18fbfafca6	Feat: enable sync deleted files for more connectors (#14353 ) ### What problem does this PR solve? Feat: enable sync deleted files for connectors ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-28 15:07:14 +08:00
buua436	444e564329	Fix: align chat recommendation and thumbup APIs (#14413 ) ### What problem does this PR solve? align chat recommendation and thumbup APIs ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-28 12:55:16 +08:00
Jack	2d522ccb36	Fix: thumbnails issue in chat (#14415 ) ### What problem does this PR solve? In chat, the thumbnails didn't display correctly ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) Steps to reproduce: 1. create dataset and upload a file (see attached) 2. parse the document 3. once parsing completed, create a chat and associate it with the dataset 4. ask a question (DAP VS DAPE comparison) 5. check result	2026-04-28 11:39:29 +08:00
Jack	c81081f8ef	Refactor: Doc change parser (#14327 ) ### What problem does this PR solve? Before migration Web API: POST /v1/document/change_parser HTTP API: PATCH /api/v1/datasets/<dataset_id>/documents After consolidation, Restful API PATCH /api/v1/datasets/<dataset_id>/documents ### Type of change - [x] Refactoring	2026-04-27 23:42:57 +08:00
Jack	c5116b90e5	Refactor: migrate document thumbnails API (#14344 ) ### What problem does this PR solve? Before migration: GET /v1/document/thumbnails After migration: GET /api/v1/thumbnails ### Type of change - [x] Refactoring	2026-04-27 21:29:09 +08:00
Jack	49912a156e	Refactor: migrate document run api (#14351 ) ### What problem does this PR solve? Before migration: POST /v1/document/run After migration: POST /api/v1/documents/ingest/ ### Type of change - [x] Refactoring	2026-04-27 21:25:58 +08:00
Jack	a536980e22	Refactor: Doc batch change status (#14337 ) ### What problem does this PR solve? Before migration Web API: POST /v1/document/change_status After consolidation, Restful API POST /api/v1/datasets/<dataset_id>/documents/batch-update-status ### Type of change - [x] Refactoring	2026-04-27 20:00:23 +08:00
Wang Qi	488c3ef6a3	Add task API (#14393 ) ### What problem does this PR solve? Add task API ### Type of change - [x] Refactor	2026-04-27 19:16:37 +08:00
buua436	82313020c7	Refa: align list operations and strict mode (#14387 ) ### What problem does this PR solve? align list operations and strict mode ### Type of change - [x] Refactoring	2026-04-27 19:13:00 +08:00
buua436	4f6651968a	Fix: prioritize explore session ID and reset default conversation variables (#14399 ) ### What problem does this PR solve? prioritize explore session ID and reset default conversation variables ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-27 18:52:40 +08:00
Jack	61a24a2c14	Refactor: migrate doc upload info used in chat (#14359 ) ### What problem does this PR solve? Before migration: POST /v1/document/upload_info/ After migration: POST /api/v1/documentss/upload/ ### Type of change - [x] Refactoring	2026-04-27 16:58:42 +08:00
Jack	290f0294d6	Refactor: migrate artifact API (#14348 ) ### What problem does this PR solve? Before migration: GET /v1/document/artifact/<filename> After migration: GET /api/v1/documents/artifact/<filename> ### Type of change - [x] Refactoring	2026-04-27 15:19:41 +08:00
LeonTung	6a23dfeec1	chore(CLAUDE.md): add shared UI component lock convention to CLAUDE.md (#14381 ) ### What problem does this PR solve? AI coding agents (Claude, Copilot, etc.) tend to directly edit files in `src/components/ui/` when asked to tweak styles or add props, treating them like ordinary feature code. This silently breaks the shared component library that both shadcn primitives and project-authored common components live in. This PR adds a `Shared UI Component Lock` convention to `web/CLAUDE.md` to instruct AI agents to treat the entire `src/components/ui/` directory as read-only. Any customization must be done via wrappers or composition outside the directory; exceptions require explicit user approval. ### Type of change - [x] Other (please describe): Update `CLAUDE.md`	2026-04-27 12:03:32 +08:00
euvre	33bb464ce3	fix: skip canvas SSE fetch in chat shared page to eliminate spurious 103 error (#14190 ) ## What does this PR do? Fixes the `hint : 103 Only owner of canvas authorized for this operation` error that appears when opening a Chat shared link (`/chats/share?shared_id=...&from=chat`). ## Root Cause The Chat shared page (`web/src/pages/next-chats/share/index.tsx`) unconditionally calls `useFetchFlowSSE()`, which requests `/api/canvas/getsse/{sharedId}`. This is an Agent Canvas endpoint that validates canvas ownership. When sharing a Chat dialog (not an Agent): 1. `sharedId` is a `dialog_id`, not a `canvas_id` 2. The API token's `tenant_id` doesn't match any canvas owner 3. The backend returns `code: 103, message: "Only owner of canvas authorized for this operation."` 4. The global error interceptor in `request.ts` displays it as a notification: `hint : 103 Only owner of canvas authorized for this operation.` ## Changes - `web/src/hooks/use-agent-request.ts`: Added an `enabled` parameter to `useFetchFlowSSE` so callers can conditionally skip the query. - `web/src/pages/next-chats/share/index.tsx`: Only enable `useFetchFlowSSE` when `from === SharedFrom.Agent`. For Chat shares, the hook is disabled, avoiding the unnecessary canvas API call entirely. ## Related Issue Closes #14115 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 11:27:39 +08:00
buua436	a9e5724b46	Refa: unify document create flows under REST documents API (#14345 ) ### What problem does this PR solve? unify document create flows under REST documents API ### Type of change - [x] Refactoring	2026-04-27 10:18:16 +08:00
euvre	4dcc42e0e1	feat(api): add unified index API and dataset management endpoints (#14222 ) ### What problem does this PR solve? ## Summary Refactor the dataset API layer into a clean service/REST separation pattern, add a unified `/index` API for graph/raptor/mindmap operations, and introduce several new dataset management endpoints with full test coverage. ## Changes ### Service Layer (`dataset_api_service.py`) - Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace function for all index types - Added `run_index`, `delete_index` service functions - Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`, `get_ingestion_log` - Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`, `rename_tag` - Added `get_flattened_metadata`, `get_auto_metadata`, `update_auto_metadata` ### REST API Layer (`dataset_api.py`) New unified routes: \| Method \| Route \| Description \| \|--------\|-------\|-------------\| \| POST \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Run index task \| \| GET \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Trace index task \| \| DELETE \| `/datasets/<id>/<index_type>` \| Delete index \| \| GET \| `/datasets/<id>` \| Get dataset details \| \| GET \| `/datasets/<id>/ingestions/summary` \| Ingestion summary \| \| GET \| `/datasets/<id>/ingestions` \| List ingestion logs \| \| GET \| `/datasets/<id>/ingestions/<log_id>` \| Get single ingestion log \| \| POST \| `/datasets/<id>/embedding` \| Run embedding \| \| GET \| `/datasets/<id>/tags` \| List tags \| \| GET \| `/datasets/tags/aggregation` \| Aggregate tags across datasets \| \| DELETE \| `/datasets/<id>/tags` \| Delete tags \| \| PUT \| `/datasets/<id>/tags` \| Rename tag \| \| GET \| `/datasets/metadata/flattened` \| Get flattened metadata \| \| GET/PUT \| `/datasets/<id>/metadata/config` \| New metadata config path \| Removed routes (replaced by unified `/index`): - `POST /datasets/<id>/mindmap` - `GET /datasets/<id>/mindmap` 
Preserved legacy routes (backward compatibility): - `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor` - `/auto_metadata` GET/PUT ### Test Suite - Updated `common.py` helpers: added `trace_index`, removed `run_mindmap`/`trace_mindmap` - Added 7 new test files with 39 test cases total: \| Test File \| Cases \| \|-----------\|-------\| \| `test_get_dataset.py` \| 4 \| \| `test_ingestion_summary.py` \| 2 \| \| `test_ingestion_logs.py` \| 5 \| \| `test_index_api.py` \| 14 \| \| `test_embedding.py` \| 2 \| \| `test_tags.py` \| 8 \| \| `test_flattened_metadata.py` \| 4 \| - Deleted `test_mindmap_tasks.py` (covered by unified index tests) ## Design Decisions 1. Unified `/index?type=...` — single endpoint replaces 3 separate route pairs for graph/raptor/mindmap 2. Backward compatibility — old routes (`/run_graphrag`, `/run_raptor`, `/auto_metadata`) preserved alongside new paths 3. `_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}` — input validation via constant set 4. `_INDEX_TYPE_TO_TASK_ID_FIELD` — maps index type to KB model task ID field for clean dispatch ## Files Changed - `api/apps/restful_apis/dataset_api.py` - `api/apps/services/dataset_api_service.py` - `sdk/python/ragflow_sdk/modules/dataset.py` - `test/testcases/test_http_api/common.py` - `test/testcases/test_http_api/test_dataset_management/` (7 new files) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 09:38:01 +08:00
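Design decisions 3 and 4 could look roughly like the sketch below. The two constant names come from the commit message; the task ID field names, the `kb` dictionary, and `resolve_task_id` are hypothetical and only illustrate validation followed by table-driven dispatch.

```python
from typing import Dict, Optional

_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}

# Field names are assumptions; the commit only says the map points at
# the KB model's task ID field for each index type.
_INDEX_TYPE_TO_TASK_ID_FIELD: Dict[str, str] = {
    "graph": "graphrag_task_id",
    "raptor": "raptor_task_id",
    "mindmap": "mindmap_task_id",
}


def resolve_task_id(kb: Dict[str, Optional[str]], index_type: str) -> Optional[str]:
    """Validate the ?type= value, then look up the matching task ID field."""
    if index_type not in _VALID_INDEX_TYPES:
        raise ValueError(f"Invalid index type: {index_type!r}")
    return kb.get(_INDEX_TYPE_TO_TASK_ID_FIELD[index_type])


# Hypothetical KB row with one running task.
kb_row = {"graphrag_task_id": None, "raptor_task_id": "task-42", "mindmap_task_id": None}
```

A single lookup table keeps the run, trace, and delete handlers free of per-type branching; supporting another index type would mean adding one entry to each constant.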
wdeveloper16	78188ce9e9	Feat: add OpenDataLoader PDF parser backend (#14058 ) (#14097 ) ### What problem does this PR solve? Closes #14058. RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU, Docling, TCADP, PaddleOCR). This PR adds OpenDataLoader ([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf)) as a new optional backend, giving users a deterministic, local-first alternative with competitive table extraction accuracy. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --- ### Changes #### Backend - `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser` class inheriting `RAGFlowPdfParser`. Implements `check_installation()` (guards Python package + Java 11+ runtime), `parse_pdf()` with JSON-first extraction (heading/paragraph/table/list/image/formula) and Markdown fallback, position-tag generation compatible with the shared `@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup. - `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in `PARSERS` dict, added to `chunk_token_num=0` override list. - `rag/flow/parser/parser.py` — `"opendataloader"` branch in the pipeline PDF handler + check validation list. #### Infrastructure - `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH. #### Frontend - `web/src/components/layout-recognize-form-field.tsx` — `OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown. Cascades automatically to the pipeline editor's Parser component. #### Docs - `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader entry and full env-var reference. 
--- ### Environment variables \| Variable \| Default \| Description \| \|---\|---\|---\| \| `USE_OPENDATALOADER` \| `false` \| Set `true` to install `opendataloader-pdf` on container startup \| \| `OPENDATALOADER_VERSION` \| latest \| Pin the PyPI release (e.g. `==2.2.1`) \| \| `OPENDATALOADER_HYBRID` \| _(unset)_ \| Enable hybrid AI mode (e.g. `docling-fast`) \| \| `OPENDATALOADER_IMAGE_OUTPUT` \| _(unset)_ \| `off` / `embedded` / `external` \| \| `OPENDATALOADER_OUTPUT_DIR` \| _(tmp)_ \| Persistent output dir; temp dir used + cleaned if unset \| \| `OPENDATALOADER_DELETE_OUTPUT` \| `1` \| `0` to retain intermediate files for debugging \| \| `OPENDATALOADER_SANITIZE` \| _(unset)_ \| `1` to filter prompt-injection patterns from output \| --- ### Dependencies - Runtime: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not added to `pyproject.toml` core deps. Installed by `ensure_opendataloader()` at container startup when `USE_OPENDATALOADER=true`. - System: Java 11+ on PATH (JVM is the underlying engine). The installer skips with a warning if `java` is not found. --- ### How to test Standalone parser: ```bash source .venv/bin/activate uv pip install opendataloader-pdf python3 -c " import sys; sys.path.insert(0, '.') from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser p = OpenDataLoaderParser() print('available:', p.check_installation()) s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline') print(f'sections={len(s)} tables={len(t)}') " ``` ### Benchmark vs Docling ``` file parser secs sections tables ---------------------------------------------------------------------- text-heavy.pdf docling 45.29 148 10 text-heavy.pdf opendataloader 3.14 559 0 table-heavy.pdf docling 7.05 76 3 table-heavy.pdf opendataloader 3.71 90 0 complex.pdf docling 42.67 114 8 complex.pdf opendataloader 3.51 180 0 ```	2026-04-25 00:33:02 +08:00
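The defaults in the environment variable table can be summarized in a small sketch. In the PR, `USE_OPENDATALOADER` is handled by `docker/entrypoint.sh`; the Python function below, its name, its result keys, and its parsing rules (for example, treating only a case-insensitive `true` as enabled) are assumptions for illustration.

```python
from typing import Any, Dict, Mapping


def load_opendataloader_config(env: Mapping[str, str]) -> Dict[str, Any]:
    """Apply the documented defaults to a mapping of environment variables."""
    return {
        # Opt-in: off unless explicitly set to "true".
        "enabled": env.get("USE_OPENDATALOADER", "false").strip().lower() == "true",
        # Unset means "latest"; a value such as "==2.2.1" pins the release.
        "version": env.get("OPENDATALOADER_VERSION") or None,
        "hybrid": env.get("OPENDATALOADER_HYBRID") or None,
        "image_output": env.get("OPENDATALOADER_IMAGE_OUTPUT") or None,
        # Unset means a temporary directory is used and cleaned up.
        "output_dir": env.get("OPENDATALOADER_OUTPUT_DIR") or None,
        # Default "1" deletes intermediate files; "0" retains them.
        "delete_output": env.get("OPENDATALOADER_DELETE_OUTPUT", "1") != "0",
        "sanitize": env.get("OPENDATALOADER_SANITIZE") == "1",
    }


defaults = load_opendataloader_config({})
debug = load_opendataloader_config({
    "USE_OPENDATALOADER": "true",
    "OPENDATALOADER_VERSION": "==2.2.1",
    "OPENDATALOADER_DELETE_OUTPUT": "0",
})
```

Passing `os.environ` instead of a literal dictionary would read the live process environment.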
balibabu	3ccd58f28c	Fix: The button styles in the PaddleOCR dialog are not applying correctly. (#14350 ) ### What problem does this PR solve? Fix: The button styles in the PaddleOCR dialog are not applying correctly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Copilot <copilot@github.com>	2026-04-24 20:17:01 +08:00
writinwaters	1870c934c6	Refact: Updated rootAsHeadingTip (#14363 ) ### What problem does this PR solve? Updated rootAsHeadingTip. ### Type of change - [x] Documentation Update	2026-04-24 20:08:44 +08:00
buua436	9ad752f497	Refa: migrate agent webhook routes to REST APIs (#14330 ) ### What problem does this PR solve? migrate agent webhook routes to REST APIs ### Type of change - [x] Refactoring	2026-04-24 17:55:53 +08:00
Wang Qi	b8d831c1c3	Fix api user patch verb does not work (#14358 ) ### What problem does this PR solve? Fix api user patch verb does not work ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue)	2026-04-24 17:27:41 +08:00
Wang Qi	199fbceb72	Refactor user REST API (#14334 ) ### What problem does this PR solve? Refactor user REST API ### Type of change - [x] Refactoring	2026-04-24 10:25:15 +08:00
Magicbook1108	c74aece63c	Feat: Agent api (#14157 ) ### What problem does this PR solve? 1. List agents Prev API: - `/v1/canvas/list GET` - `/api/v1/agents GET` Current API: `/api/v2/agents GET` 2. Get canvas template Prev API: `/v1/canvas/templates GET` Current API: `/api/v2/agents/templates GET` 3. Delete an agent Prev API: - `/v1/canvas/rm POST` - `/api/v1/agents/<agent_id> DELETE` Current API: `/api/v2/agents/<agent_id> DELETE` 4. Update an agent Prev API: - `/api/v1/agents/<agent_id> PUT` - `/v1/canvas/setting POST ` Current API: `/api/v2/agents/<agent_id> PATCH` 5. Create an agent Prev API: - `/v1/canvas/set POST` - `/api/v1/agents POST` Current API: `/api/v2/agents POST` 6. Get an agent Prev API: - `/v1/canvas/get/<canvas_id> GET ` Current API: `/api/v2/agents/<agent_id> GET` 7. Reset an agent Prev API: - `/v1/canvas/reset POST` Current API: `/api/v2/agents/<agent_id>/reset POST` 8. Upload a file to an agent Prev API: - `/v1/canvas/upload/<canvas_id> POST` Current API: `/api/v2/agents/<agent_id>/upload POST` 9. Input form Prev API: - `/v1/canvas/input_form GET` Current API: `/api/v2/agents/<agent_id>/components/<component_id>/input-form GET` 10. Debug an agent Prev API: - `/v1/canvas/debug POST` Current API: `/api/v2/agents/<agent_id>/components/<component_id>/debug POST` 11. Trace an agent Prev API: - `/v1/canvas/trace GET` Current API: `/api/v2/agents/<agent_id>/logs/<message_id> GET` 12. Get an agent version list Prev API: - `/v1/canvas/getlistversion/<canvas_id>` Current API: `/api/v2/agents/<agent_id>/versions GET` 13. Get a version of agent Prev API: - `/v1/canvas/getversion/<version_id>` Current API: `/api/v2/agents/<agent_id>/versions/<version_id> GET` 14. Test db connection Prev API: - `/v1/canvas/test_db_connect POST` Current API: `/api/v2/agents/test_db_connection` 15. Rerun the agent Prev API: - `/v1/canvas/rerun POST` Current API: `/api/v2/agents/rerun POST` 16. 
Get prompts Prev API: - `/v1/canvas/prompts GET` Current API: `/api/v2/agents/prompts GET` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: chanx <1243304602@qq.com>	2026-04-24 10:02:22 +08:00
Magicbook1108	75a5548b85	Feat: optimize title chunk (#14325 ) ### What problem does this PR solve? Feat: optimize title chunk 1. Add a new button to enable "Use root chunk as H0 heading", so that the first chunk is carried on to all remaining chunks. 2. Update resume agent template ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-23 18:55:55 +08:00
Wang Qi	ba47c13eb5	Fix commit override from #14298 of api-key to api_key (#14328 ) ### What problem does this PR solve? Fix commit override from https://github.com/infiniflow/ragflow/pull/14298/ of `api-key` to `api_key` ### Type of change - [x] Refactoring	2026-04-23 17:16:32 +08:00
Wang Qi	4458763a93	API refactor: stats_api and plugin_api (#14324 ) ### What problem does this PR solve? API refactor: stats_api and plugin_api ### Type of change - [x] Refactoring	2026-04-23 17:16:04 +08:00
buua436	7817b0d779	Refa: migrate chunk APIs to RESTful routes (#14291 ) ### What problem does this PR solve? migrate chunk APIs to RESTful routes ### Type of change - [x] Refactoring	2026-04-23 14:17:23 +08:00
Magicbook1108	76b017ca32	Refact: system apis (#14298 ) ### What problem does this PR solve? Refact: system apis ### Type of change - [x] Refactoring	2026-04-23 14:09:42 +08:00
buua436	aa4526266f	Refa: migrate MCP APIs to RESTful api (#14317 ) ### What problem does this PR solve? migrate MCP APIs to RESTful api ### Type of change - [x] Refactoring	2026-04-23 12:51:27 +08:00

2161 Commits