Files
Harsh Kashyap ebd4f4e633 fix(rag/nlp): handle non-numbered DOCX heading styles (#16219)
## What problem does this PR solve?

DOCX parsing could crash when a paragraph used a `Heading`-prefixed
style without a trailing numeric level, such as `Heading`, `Heading1`,
or `Heading Title`.

`docx_question_level()` assumed every heading style looked like `Heading
N` and called `int(p.style.name.split(" ")[-1])`. For non-numbered
heading styles, that raises `ValueError` and breaks Manual, Q&A, and
Laws chunking.

This PR parses heading levels safely and falls back to level 1 for
Heading-prefixed styles without an explicit numeric suffix.

Closes #16163.

## Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Test Case (non-breaking change which adds test coverage)
2026-06-29 15:21:17 +08:00
..


(1). Deploy RAGFlow services and images

https://ragflow.io/docs/build_docker_image

(2). Configure the required environment for testing

Install Python dependencies (including test dependencies):

uv sync --python 3.13 --only-group test --no-default-groups --frozen

Activate the environment:

source .venv/bin/activate

Install SDK:

uv pip install sdk/python 

Modify the .env file: Add the following code:

COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.26.2 #Replace with the image you are using

Start the containerwait two minutes:

docker compose -f docker/docker-compose.yml up -d


(3). Test Elasticsearch

a) Run sdk tests against Elasticsearch:

export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380  # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Elasticsearch:

pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 


(4). Test Infinity

Modify the .env file:

DOC_ENGINE=${DOC_ENGINE:-infinity}

Start the container:

docker compose -f docker/docker-compose.yml down -v 
docker compose -f docker/docker-compose.yml up -d

a) Run sdk tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api