ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
kpdev	cb1ea5a47f	Validate chunk image_base64 before doc-store write (#15364 ) ## Summary Fixes [#15363](https://github.com/infiniflow/ragflow/issues/15363) — `add_chunk` / `update_chunk` indexed chunks with `image_id` before validating or storing `image_base64`, leaving orphan chunks on invalid input. ## Related Issue Fixes #15363 ## Change Type - [x] Bug fix - [x] Regression tests ## What Changed - Added `_decode_chunk_image_base64()` — strict base64 decode with structured 4xx errors - Added `_store_chunk_image_or_error()` — catches `store_chunk_image` failures - `add_chunk` / `update_chunk`: decode + store image before `docStoreConn.insert` / `update`; only set `img_id` after successful storage ## Files Changed \| File \| Change \| \|------\|--------\| \| `api/apps/restful_apis/chunk_api.py` \| Helpers + reorder image handling \| \| `test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py` \| 3 regression tests \| ## Validation ```bash cd /root/gittensor/ragflow pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_invalid_image_base64_does_not_index_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_update_chunk_invalid_image_base64_does_not_update_chunk -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py::test_restful_add_chunk_valid_image_base64_stores_before_insert -v pytest test/testcases/test_web_api/test_chunk_app/test_chunk_routes_unit.py -v ``` ## Test Plan - [x] Invalid `image_base64` on add → 4xx, no doc-store insert - [x] Invalid `image_base64` on update → 4xx, no doc-store update - [x] Valid PNG base64 on add → image stored, chunk indexed with `img_id` - [ ] CI green	2026-05-29 19:36:46 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
buua436	a03b95f8c4	Fix: shared dataset chunk index lookup (#14764 ) ### What problem does this PR solve? shared dataset chunk index lookup ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 13:50:08 +08:00
euvre	35f6d81b73	Refactor: migrate chunk retrieval_test and knowledge_graph to REST API endpoints (#14402 ) ### What problem does this PR solve? ## Summary Migrate two web API endpoints to REST-style HTTP API endpoints, following the pattern established in #14222: \| Old Endpoint \| New Endpoint \| \|---\|---\| \| `POST /v1/chunk/retrieval_test` \| `POST /api/v1/datasets/<dataset_id>/search` \| \| `GET /v1/chunk/knowledge_graph` \| `GET /api/v1/datasets/<dataset_id>/graph` \|	2026-04-28 20:00:26 +08:00
euvre	4dcc42e0e1	feat(api): add unified index API and dataset management endpoints (#14222 ) ### What problem does this PR solve? ## Summary Refactor the dataset API layer into a clean service/REST separation pattern, add a unified `/index` API for graph/raptor/mindmap operations, and introduce several new dataset management endpoints with full test coverage. ## Changes ### Service Layer (`dataset_api_service.py`) - Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace function for all index types - Added `run_index`, `delete_index` service functions - Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`, `get_ingestion_log` - Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`, `rename_tag` - Added `get_flattened_metadata`, `get_auto_metadata`, `update_auto_metadata` ### REST API Layer (`dataset_api.py`) New unified routes: \| Method \| Route \| Description \| \|--------\|-------\|-------------\| \| POST \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Run index task \| \| GET \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Trace index task \| \| DELETE \| `/datasets/<id>/<index_type>` \| Delete index \| \| GET \| `/datasets/<id>` \| Get dataset details \| \| GET \| `/datasets/<id>/ingestions/summary` \| Ingestion summary \| \| GET \| `/datasets/<id>/ingestions` \| List ingestion logs \| \| GET \| `/datasets/<id>/ingestions/<log_id>` \| Get single ingestion log \| \| POST \| `/datasets/<id>/embedding` \| Run embedding \| \| GET \| `/datasets/<id>/tags` \| List tags \| \| GET \| `/datasets/tags/aggregation` \| Aggregate tags across datasets \| \| DELETE \| `/datasets/<id>/tags` \| Delete tags \| \| PUT \| `/datasets/<id>/tags` \| Rename tag \| \| GET \| `/datasets/metadata/flattened` \| Get flattened metadata \| \| GET/PUT \| `/datasets/<id>/metadata/config` \| New metadata config path \| Removed routes (replaced by unified `/index`): - `POST /datasets/<id>/mindmap` - `GET /datasets/<id>/mindmap` 
Preserved legacy routes (backward compatibility): - `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor` - `/auto_metadata` GET/PUT ### Test Suite - Updated `common.py` helpers: added `trace_index`, removed `run_mindmap`/`trace_mindmap` - Added 7 new test files with 39 test cases total: \| Test File \| Cases \| \|-----------\|-------\| \| `test_get_dataset.py` \| 4 \| \| `test_ingestion_summary.py` \| 2 \| \| `test_ingestion_logs.py` \| 5 \| \| `test_index_api.py` \| 14 \| \| `test_embedding.py` \| 2 \| \| `test_tags.py` \| 8 \| \| `test_flattened_metadata.py` \| 4 \| - Deleted `test_mindmap_tasks.py` (covered by unified index tests) ## Design Decisions 1. Unified `/index?type=...` — single endpoint replaces 3 separate route pairs for graph/raptor/mindmap 2. Backward compatibility — old routes (`/run_graphrag`, `/run_raptor`, `/auto_metadata`) preserved alongside new paths 3. `_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}` — input validation via constant set 4. `_INDEX_TYPE_TO_TASK_ID_FIELD` — maps index type to KB model task ID field for clean dispatch ## Files Changed - `api/apps/restful_apis/dataset_api.py` - `api/apps/services/dataset_api_service.py` - `sdk/python/ragflow_sdk/modules/dataset.py` - `test/testcases/test_http_api/common.py` - `test/testcases/test_http_api/test_dataset_management/` (7 new files) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 09:38:01 +08:00
buua436	9ad752f497	Refa: migrate agent webhook routes to REST APIs (#14330) ### What problem does this PR solve? migrate agent webhook routes to REST APIs ### Type of change - [x] Refactoring	2026-04-24 17:55:53 +08:00
buua436	7817b0d779	Refa: migrate chunk APIs to RESTful routes (#14291 ) ### What problem does this PR solve? migrate chunk APIs to RESTful routes ### Type of change - [x] Refactoring	2026-04-23 14:17:23 +08:00
Jack	3d8a82c0aa	Refactor: Consolidation WEB API & HTTP API for document delete api (#14254) ### What problem does this PR solve? Before consolidation: Web API `POST /v1/document/rm`; HTTP API `DELETE /api/v1/datasets/<dataset_id>/documents`. After consolidation: RESTful API `DELETE /api/v1/datasets/<dataset_id>/documents`. ### Type of change - [x] Refactoring	2026-04-22 10:49:52 +08:00
Liu An	6e33d8722f	Revert "Fix: forwarding highlight param" (#14249 ) Reverts infiniflow/ragflow#14112	2026-04-21 15:23:18 +08:00
Jack	939933649a	Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176) ### What problem does this PR solve? Before consolidation: Web API `POST /v1/document/list`; HTTP API `GET /api/v1/datasets/<dataset_id>/documents`. After consolidation: RESTful API `GET /api/v1/datasets/<dataset_id>/documents`. ### Type of change - [x] Refactoring	2026-04-20 14:54:40 +08:00
Daniil Sivak	22c6648348	Fix: forwarding highlight param (#14112) Closes #9078 ### What problem does this PR solve? The `retrieval_test` endpoint in `chunk_app.py` never forwarded the `highlight` request parameter to `retriever.retrieval()`, so the search engine never produced highlight snippets. Additionally, the frontend always rendered `content_with_weight` instead of preferring the `highlight` field, and the CSS rule color `var(--accent-primary)` didn't work because the variable stores an RGB triplet `(45,212,191)` requiring the `rgb()` wrapper. ### Before - Search page: displayed raw `content_with_weight` as a wall of plain white text with no term highlighting, including markdown headings rendered as literal text - Retrieval testing page: showed `content_with_weight` in a plain `<p>` tag, no `<em>` tags rendered, no highlight coloring - Children chunks: when child chunks were consolidated into a parent via `retrieval_by_children`, any highlight data from children was discarded - TOC chunks: chunks fetched via `retrieval_by_toc` had no `highlight` field, appearing as plain text while other chunks had highlights [Screenshots: retrieval testing; search] ### After - Search page: displays the `highlight` field with search terms rendered in teal/cyan color (`rgb(var(--accent-primary))`) - Retrieval testing page: sends `highlight: true` in the request, uses `HighLightMarkdown` component to render `<em>` tags with proper coloring - Children chunks: highlights from child chunks are joined and preserved on the parent - TOC chunks: when other chunks have highlights, TOC-fetched chunks use `content_with_weight` as a highlight fallback [Screenshots: retrieval testing; search] ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-17 20:59:20 +08:00
Daniil Sivak	c93ec0a1f3	Fix: reject empty/space-only content in update_chunk API (#14082 ) Closes #6541 ### What problem does this PR solve? Add content validation to `update_chunk` (SDK and non-SDK) to reject empty or whitespace-only content before it reaches the embedding model. Before: Calling `update_chunk` with space-only content (like `" "`, `""`, `"\n"`) bypassed validation and was sent directly to the embedding model, which returned an error. This was the same bug previously fixed for `add_chunk` in #6390, but `update_chunk` was missed. After: Empty/whitespace-only content is caught by validation and returns an error: `` `content` is required `` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-15 18:43:53 +08:00
Ea001	38cefd88e2	Fix tag_feas code injection in retrieval ranking (#13923 ) ## Summary - remove eval-based parsing from retrieval rank feature scoring - validate `tag_feas` at write time in chunk APIs and SDK routes - add regression tests for safe parsing and malicious payload rejection ## Details `tag_feas` is intended to be structured rank-feature data, but the retrieval ranking path was evaluating stored values as Python expressions. This change treats `tag_feas` strictly as data. ### What changed - replace `eval()` in `rag/nlp/search.py` with safe parsing via `json.loads()` and optional `ast.literal_eval()` compatibility for legacy Python-dict strings - strictly filter parsed values down to `dict[str, finite number]` - reject invalid `tag_feas` payloads at write time in web chunk routes and SDK document chunk routes - add focused regression tests to prove executable strings are ignored and invalid payloads are rejected ## Validation - `python -m pytest test/unit_test/common/test_tag_feature_utils.py test/unit_test/rag/test_rank_feature_scores.py -q` --------- Co-authored-by: unknown <zhenglinkai@CCN.Local> Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-04-15 16:31:11 +08:00
Zhichang Yu	b7744e053e	fix: support dense_vector from ES fields response (ES 9.x compatibility) (#13972) fix: support dense_vector from ES fields response (ES 9.x compatibility) - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Configuration Chore (non-breaking change which updates configuration) ## Summary by CodeRabbit * Bug Fixes * More accurate handling and unwrapping of dense-vector fields so returned values have correct shapes. * Field selection reliably limits returned data and falls back to alternate result locations when needed. * Use of consistent result IDs and tolerant handling when score values are missing. * Chores / Configuration * Increased build memory and adjusted build-time flags for the frontend build. * Simplified runtime model/GPU checks and removed an automated runtime GPU-install attempt. * Build Fixes * `web/vite.config.ts`: make `build.minify` and `build.sourcemap` respect `VITE_MINIFY` and `VITE_BUILD_SOURCEMAP` env vars from Dockerfile instead of hardcoding `terser` and `true`. * Environment * Allow stack version override and default the runtime image tag to "latest". ## Summary by CodeRabbit * Bug Fixes * Correct unwrapping of dense-vector fields and reliable field selection with fallback locations. * Consistent use of hit-level IDs and tolerant handling when score values are missing. * Chores / Configuration * Increased frontend build memory and added build-time minify/sourcemap flags; build minification and sourcemap now configurable. * Removed runtime GPU detection for model initialization; force CPU initialization. * Environment * Allow stack version override and default runtime image tag to "latest". --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 17:44:13 +08:00
Jin Hai	ad789f5c43	Fix list files (#13960) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ## Summary by CodeRabbit * Bug Fixes * Standardized the query parameter used when listing documents so listings behave consistently across the web and client interfaces. * Clarified the error message shown when a required dataset ID is missing to give clearer guidance to users. * Tests * Updated test coverage to reflect the standardized dataset identifier usage. --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-08 13:38:30 +08:00
Idriss Sbaaoui	f3b4d6ab0e	Fix: ci fails (#13778 ) ### What problem does this PR solve? fix tests failing at p2 and p3 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-25 17:56:13 +08:00
Magicbook1108	161659becc	Fix: model selection rule in get_model_config_by_type_and_name (#13569) ### What problem does this PR solve? Fix: model selection rule in get_model_config_by_type_and_name ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-13 19:46:13 +08:00
Yongteng Lei	51be1f1442	Refa: empty ids means no-op operation (#13439 ) ### What problem does this PR solve? Empty ids means no-op operation. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-06 18:16:42 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
6ba3i	22c4d72891	tests: improve RAGFlow coverage based on Codecov report (#13219 ) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-26 19:03:26 +08:00
6ba3i	38011f2c16	tests: improve RAGFlow coverage based on Codecov report (#13200 ) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-25 19:12:11 +08:00
Liu An	c4f60b349d	Fix(test): downgrade test priorities (#12913 ) ### What problem does this PR solve? Changed test priorities in multiple test files, downgrading from p1 to p2 and p2 to p3. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-30 20:02:56 +08:00
6ba3i	2b20d0b3bb	Fix: Web API tests by normalizing errors, validation, and uploads (#12620) ### What problem does this PR solve? Fixes web API behavior mismatches that caused test failures by normalizing error responses, tightening validations, correcting error messages, and closing upload file handles. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-16 11:09:22 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654) ### What problem does this PR solve? Introduced gpu profile in .env; added Dockerfile_tei; fixed datrie; removed LIGHTEN flag. ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
Zhichang Yu	342a04ec8a	Added infinity rank_feature support (#9044 ) ### What problem does this PR solve? Added infinity rank_feature support ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-29 09:14:23 +08:00
Liu An	0b40eb3e90	Test: Add tests for chunk API endpoints (#8616 ) ### What problem does this PR solve? - Add comprehensive test suite for chunk operations including: - Test files for create, list, retrieve, update, and delete chunks - Authorization tests - Batch operations tests - Update test configurations and common utilities - Validate `important_kwd` and `question_kwd` fields are lists in chunk_app.py - Reorganize imports and clean up duplicate code ### Type of change - [x] Add test cases	2025-07-02 09:49:08 +08:00
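
Commit 38cefd88e2 above describes replacing `eval()` with `json.loads()` plus an optional `ast.literal_eval()` fallback, then filtering the result down to `dict[str, finite number]`. A minimal sketch of that idea follows. The function names `parse_tag_feas` and `_is_finite_number` are hypothetical and are not taken from `rag/nlp/search.py`; the real implementation may differ.

```python
import ast
import json
import math


def _is_finite_number(value):
    """True for int/float values that are finite; bools are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    try:
        return math.isfinite(float(value))
    except OverflowError:  # e.g. an integer too large to fit in a float
        return False


def parse_tag_feas(raw):
    """Parse stored rank-feature data as data only (hypothetical helper).

    Tries JSON first, then falls back to ast.literal_eval for legacy
    Python-dict strings. Nothing is ever executed.
    """
    if isinstance(raw, dict):
        parsed = raw
    elif isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except ValueError:
            try:
                parsed = ast.literal_eval(raw)
            except (ValueError, SyntaxError, TypeError):
                return {}
    else:
        return {}

    if not isinstance(parsed, dict):
        return {}
    # Keep only string keys mapped to finite numbers.
    return {
        key: float(value)
        for key, value in parsed.items()
        if isinstance(key, str) and _is_finite_number(value)
    }
```

With this sketch, a string such as `__import__('os').system('echo hi')` is rejected by both parsers and yields an empty dict, while a legacy string like `{'a': 3}` is still accepted through the `ast.literal_eval()` fallback.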

26 Commits
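
The most recent commit in this log (cb1ea5a47f) reorders chunk writes so that `image_base64` is decoded and stored before the doc-store insert, and `img_id` is set only after storage succeeds. The ordering can be sketched as below. Everything here is an assumption for illustration: `ChunkImageError`, `decode_image_base64`, and `add_chunk_with_image` are hypothetical names, and the `store_image` / `insert_chunk` callables stand in for the project's real storage and doc-store calls, whose signatures are not shown in the commit message.

```python
import base64
import binascii


class ChunkImageError(ValueError):
    """Hypothetical error type standing in for a structured 4xx response."""


def decode_image_base64(value):
    """Strictly decode a base64 string, rejecting malformed input."""
    if not isinstance(value, str) or not value.strip():
        raise ChunkImageError("`image_base64` must be a non-empty string")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ChunkImageError("`image_base64` is not valid base64") from exc


def add_chunk_with_image(chunk, image_base64, store_image, insert_chunk):
    """Validate and store the image first; index the chunk only afterwards.

    store_image(data: bytes) -> str and insert_chunk(chunk: dict) are
    caller-supplied stand-ins (assumptions, not the project's API).
    """
    if image_base64 is not None:
        data = decode_image_base64(image_base64)  # may raise before any write
        chunk = {**chunk, "img_id": store_image(data)}
    insert_chunk(chunk)
    return chunk
```

Because decoding happens before `insert_chunk` is called, an invalid payload raises without leaving an indexed chunk behind, which is the orphan-chunk case the commit message describes.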