fix: restore TitleChunker output for json/chunks upstream formats
## Summary
The refactor commit e194027b (#14247) introduced two regressions that
caused `TitleChunker` to produce zero chunks when the upstream Parser
node outputs `json` or `chunks` format (e.g. PDF parsing).
## Root Cause
### 1. Dead code in `extract_line_records` (critical)
After the refactor, when `payload` is `None` (which is the case for the
`json` and `chunks` output formats), the method hits `return []` and
exits immediately, so no records are ever extracted from structured
upstream output. The original `json`/`chunks` handling code became
unreachable dead code.
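
For illustration, a minimal sketch of this failure mode. It is a hypothetical reduction, not the actual `extract_line_records` body: the function names, the `structured_items` parameter, and the record shape are assumptions made for the example.

```python
def extract_line_records_buggy(payload, structured_items):
    """Hypothetical reduction of the regression (not the real method body)."""
    if payload is None:
        # Premature exit: json/chunks inputs always arrive with payload=None,
        # so the structured branch below can never run for them.
        return []
    if payload is not None:
        return [{"text": line} for line in payload.splitlines()]
    # Dead code: both possible values of `payload` have already returned.
    return [{"text": item.get("text", "")} for item in structured_items]


def extract_line_records_fixed(payload, structured_items):
    """Same sketch with the premature return removed."""
    if payload is not None:
        # markdown/text/html: split the flat payload into line records.
        return [{"text": line} for line in payload.splitlines()]
    # json/chunks: build records from the structured upstream items.
    return [{"text": item.get("text", "")} for item in (structured_items or [])]
```

With `payload=None` and one structured item, the buggy variant returns an empty list while the fixed variant returns one record.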
### 2. Unconditional overwrite in `build_chunks_from_record_groups`
The `chunks` variable assigned in the `if` branch for markdown/text/html
formats was unconditionally overwritten by the statement below it, due
to a missing `else` keyword.
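
A minimal sketch of the missing-`else` pattern, again as a hypothetical reduction: `TEXT_FORMATS`, the function names, and the `family` field are assumptions used only to make the overwrite visible, not the real `build_chunks_from_record_groups` code.

```python
TEXT_FORMATS = {"markdown", "text", "html"}  # assumed format names


def build_chunks_buggy(output_format, record_groups):
    """Hypothetical reduction: the second assignment always wins."""
    if output_format in TEXT_FORMATS:
        chunks = [{"text": "\n".join(g), "family": "text"} for g in record_groups]
    # No `else` here, so this also runs for text formats and overwrites `chunks`.
    chunks = [{"text": "\n".join(g), "family": "structured"} for g in record_groups]
    return chunks


def build_chunks_fixed(output_format, record_groups):
    """Same sketch with the two format families handled independently."""
    if output_format in TEXT_FORMATS:
        chunks = [{"text": "\n".join(g), "family": "text"} for g in record_groups]
    else:
        chunks = [{"text": "\n".join(g), "family": "structured"} for g in record_groups]
    return chunks
```

In the buggy variant, a `markdown` input is silently processed by the structured branch; the fixed variant keeps the result of the branch that matches the format.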
## Fix
- Remove the premature `return []` so the `json`/`chunks` branch is
reachable again.
- Add an `else` branch in `build_chunks_from_record_groups` so the two
format families are handled independently.
## Test Plan
- [x] Verified no lint errors on the changed file
- [ ] Tested with a PDF document parsed via DeepDOC → TitleChunker
pipeline
- [ ] Tested with markdown input through TitleChunker
- [ ] Tested hierarchy and group chunking modes
## Impact
- Fixes the regression where documents parsed with `json`/`chunks`
output format produced no chunks from `TitleChunker`.
- No API or configuration changes. Fully backward compatible.
Signed-off-by: noob <yixiao121314@outlook.com>
## RAG Optimization Description
Optimize the core `BaseTitleChunker` in
`rag/flow/chunker/title_chunker/common.py` to improve RAG document
chunking quality and retrieval accuracy.
## Key Changes
1. **Format-branched text processing**: Preserve original whitespace &
indentation for Markdown/HTML payloads to maintain document semantics
and chunk fidelity; only perform full whitespace cleaning on plain text
content.
2. **Empty chunk filtering**: Filter out blank and whitespace-only lines
to reduce noisy data in the vector database.
3. **Code deduplication**: Unify the markdown/text/html payload
extraction logic and remove repeated code blocks.
4. **None serialization fix**: Avoid converting a `None` value into the
literal string `"None"` in chunk text fields.
5. **Production logging**: Log input/output line counts for the filter
logic so its behavior is observable in production.
6. **100% backward compatible**: No changes to chunking hierarchy rules,
output format, or existing workflows.
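
A rough sketch of how changes 1, 2, 4, and 5 could fit together. It is not the code in `common.py`: `PRESERVE_WHITESPACE_FORMATS`, `safe_text`, and `clean_lines` are hypothetical names, and the exact cleaning rules are assumptions.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumption: formats whose whitespace and indentation carry meaning.
PRESERVE_WHITESPACE_FORMATS = {"markdown", "html"}


def safe_text(value):
    """Return "" for None instead of the literal string "None"."""
    return "" if value is None else str(value)


def clean_lines(lines, output_format):
    """Drop blank lines; normalize whitespace only for plain-text payloads."""
    kept = []
    for raw in lines:
        text = safe_text(raw)
        if not text.strip():
            continue  # skip empty and whitespace-only lines
        if output_format not in PRESERVE_WHITESPACE_FORMATS:
            # Plain text: collapse whitespace runs and trim both ends.
            text = re.sub(r"\s+", " ", text).strip()
        kept.append(text)
    # Log input/output counts so the filter's effect is visible in production.
    logger.info("line filter: %d in, %d out", len(lines), len(kept))
    return kept
```

For a Markdown payload the leading indentation of a nested list item survives, while the same lines treated as plain text are collapsed to single spaces.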
## RAG Business Value
- Preserves document format fidelity for structured Markdown/HTML files
- Reduces invalid noisy chunks → improves RAG retrieval precision
- Cleans plain text data → optimizes vector embedding quality
- Improves code maintainability with no breaking changes
- Provides observable logging for chunk filtering behavior
## Compatibility
- ✅ No API changes
- ✅ No chunk logic modifications
- ✅ All document parsing/chunking workflows unaffected
- ✅ All pre-checks passed, no code conflicts
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Feat: add a button to turn off VLM parsing
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: chanx <1243304602@qq.com>
### What problem does this PR solve?
Feat: update templates && add resume template
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: pipeline supports the ONE chunking method
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
Fix broken imports
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
### What problem does this PR solve?
Dataflow supports spreadsheet and word processor documents.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refine dataflow and initialize dataflow app.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)