mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
fix: restore TitleChunker output for json/chunks upstream formats
## Summary
The refactor commit e194027b (#14247) introduced two regressions that
caused `TitleChunker` to produce zero chunks when the upstream Parser
node outputs `json` or `chunks` format (e.g. PDF parsing).
## Root Cause
### 1. Dead code in `extract_line_records` (critical)
After refactor, when `payload` is `None` (which is the case for `json`
and `chunks` output formats), the method returns an empty list
immediately via `return []`, so no records are ever extracted from
structured upstream output. The original `json`/`chunks` handling code
became unreachable dead code.
### 2. Unconditional overwrite in `build_chunks_from_record_groups`
The `chunks` variable assigned in the `if` branch for markdown/text/html
formats was unconditionally overwritten by the statement below it, due
to a missing `else` keyword.
## Fix
- Remove the premature `return []` so the `json`/`chunks` branch is
reachable again.
- Add `else` branch in `build_chunks_from_record_groups` so the two
format families are handled independently.
## Test Plan
- [x] Verified no lint errors on the changed file
- [ ] Tested with a PDF document parsed via DeepDOC → TitleChunker
pipeline
- [ ] Tested with markdown input through TitleChunker
- [ ] Tested hierarchy and group chunking modes
## Impact
- Fixes the regression where documents parsed with `json`/`chunks`
output format produced no chunks from `TitleChunker`.
- No API or configuration changes. Fully backward compatible.
Signed-off-by: noob <yixiao121314@outlook.com>