mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
…
### What problem does this PR solve?

Closes #13857

Parent-child chunking was introduced in v0.23.0, but it can only be configured through the web UI. Users who manage datasets programmatically cannot enable it through the HTTP API or the Python SDK. This is because `ParserConfig` is declared with `extra="forbid"`, so it rejects the `children_delimiter` field during validation.

### What does this PR change?

Adds a nested `parent_child` config to `ParserConfig`, following the same pattern as `raptor` and `graphrag`:

```json
{
  "parser_config": {
    "parent_child": {
      "use_parent_child": true,
      "children_delimiter": "\n"
    }
  }
}
```

- `api/utils/validation_utils.py`: new `ParentChildConfig` model, added to `ParserConfig`
- `api/utils/api_utils.py`: defaults for the naive chunk method, plus flattening to `children_delimiter` for the execution layer
- `api/apps/services/dataset_api_service.py`: the same flattening on the update path
- `test/testcases/configs.py`: updated `DEFAULT_PARSER_CONFIG`
- `test/testcases/test_http_api/test_dataset_management/test_create_dataset.py`: four valid and two invalid test cases

There are no changes to the execution layer (`rag/app/naive.py`, `rag/nlp/search.py`). The existing UI flow via ext is unaffected.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

## Summary by CodeRabbit

* **New Features**
  * Added a parent-child chunking configuration for dataset creation and updates. It includes a new `use_parent_child` toggle and a customizable `children_delimiter` setting that specifies how parent chunks are split into child chunks.
* **Documentation**
  * Updated the HTTP and Python API references with parent-child chunking configuration details and examples.
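The request body above can also be sent programmatically. The following is a minimal standard-library sketch with several assumptions. It targets RAGFlow's `POST /api/v1/datasets` endpoint with Bearer-token authentication and sets `chunk_method` to `"naive"`. It also assumes the server fills in every other `parser_config` key from its defaults. `build_create_dataset_request` is a hypothetical helper, not part of RAGFlow or its SDK.

```python
import json
import urllib.request


def build_create_dataset_request(
    base_url: str, api_key: str, name: str, children_delimiter: str = "\n"
) -> urllib.request.Request:
    """Build (but do not send) a dataset-creation request that enables
    parent-child chunking via the nested parser_config.parent_child block.

    Hypothetical helper; endpoint path and chunk_method value are assumptions.
    """
    body = {
        "name": name,
        "chunk_method": "naive",  # assumption: the method that gets parent-child defaults
        "parser_config": {
            "parent_child": {
                "use_parent_child": True,
                "children_delimiter": children_delimiter,  # "\n" splits parents on newlines
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To actually send the request, pass the result to `urllib.request.urlopen(req)`. Separating construction from sending keeps the payload easy to inspect before anything reaches the server.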
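The change has two parts. The nested block is validated strictly. It is then flattened back to the single `children_delimiter` key that the execution layer already reads. The sketch below illustrates that idea with plain dictionaries rather than pydantic. `validate_parent_child` and `flatten_parent_child` are hypothetical names. The newline fallback is an assumption, and so is leaving the key unset when `use_parent_child` is false. The actual code in `api/utils/api_utils.py` may differ.

```python
from typing import Any

# Hypothetical field table; the names come from the PR, the types are assumptions.
PARENT_CHILD_FIELDS: dict[str, type] = {"use_parent_child": bool, "children_delimiter": str}


def validate_parent_child(cfg: dict[str, Any]) -> dict[str, Any]:
    """Reject unknown keys and wrong types, similar to extra="forbid"."""
    unknown = set(cfg) - set(PARENT_CHILD_FIELDS)
    if unknown:
        raise ValueError(f"unexpected keys in parent_child: {sorted(unknown)}")
    for key, expected in PARENT_CHILD_FIELDS.items():
        if key in cfg and not isinstance(cfg[key], expected):
            raise ValueError(f"parent_child.{key} must be of type {expected.__name__}")
    return cfg


def flatten_parent_child(parser_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy in which the nested parent_child block is replaced by the
    flat children_delimiter key (only when parent-child chunking is enabled)."""
    flat = dict(parser_config)  # do not mutate the caller's dict
    nested = validate_parent_child(flat.pop("parent_child", None) or {})
    if nested.get("use_parent_child"):
        # assumption: fall back to a newline if no delimiter was given
        flat["children_delimiter"] = nested.get("children_delimiter", "\n")
    return flat
```

Keeping the nested shape at the API boundary and flattening it afterwards is what allows the execution layer (`rag/app/naive.py`) to remain untouched.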
(1). Deploy RAGFlow services and images
https://ragflow.io/docs/build_docker_image
(2). Configure the required environment for testing
Install the Python test dependency group:
uv sync --python 3.12 --only-group test --no-default-groups --frozen
Activate the environment:
source .venv/bin/activate
Install the Python SDK:
uv pip install sdk/python
Modify the .env file by adding the following lines:
COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.24.0 # Replace with the image you are using
Start the containers (wait about two minutes for the services to come up):
docker compose -f docker/docker-compose.yml up -d
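Instead of a fixed two-minute wait, you could poll the API port until the server answers. This is a minimal sketch that assumes the API is reachable at http://127.0.0.1:9380, the same address used for HOST_ADDRESS in step (3). It also assumes that any HTTP reply below 500 means the server is up. wait_for_server is a hypothetical helper, not part of RAGFlow.

```python
import time
import urllib.error
import urllib.request
from typing import Any, Callable


def wait_for_server(
    url: str,
    timeout: float = 180.0,
    interval: float = 5.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `url` until the HTTP server answers or `timeout` seconds pass.

    Hypothetical helper: opener/sleep/clock are injectable so the loop can be
    exercised without a running server.
    """
    deadline = clock() + timeout
    while True:
        try:
            with opener(url, timeout=interval):
                return True  # 2xx/3xx response: the server is up
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # e.g. a 404 on "/" still proves the server answered
            # assumption: a 5xx response means the server is still starting
        except OSError:
            pass  # connection refused/reset or socket timeout: not up yet
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for_server("http://127.0.0.1:9380")` returns True once the port responds and returns False after about three minutes otherwise.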
(3). Test against Elasticsearch
a) Run SDK tests against Elasticsearch:
export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380 # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Elasticsearch:
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
(4). Test against Infinity
Modify the .env file to use Infinity as the document engine:
DOC_ENGINE=${DOC_ENGINE:-infinity}
Recreate the containers (down -v also removes the existing volumes, so any previously indexed data is lost):
docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d
a) Run SDK tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api