fix(html-parser): correct h4 heading mapping from ##### to #### (#13833)

## Summary

- Fix the incorrect Markdown heading mapping for `h4` in the `TITLE_TAGS`
dictionary
- `h4` was mapped to `"#####"` (the h5 level) instead of `"####"` (the
correct h4 level)

Closes #13819

## Details

In `deepdoc/parser/html_parser.py`, the `TITLE_TAGS` dictionary had a
typo: `h4` was assigned five `#` characters instead of four. As a result,
h4 headings were converted to h5-level Markdown headings during HTML
parsing and could not be told apart from real h5 headings in the output.

## Test plan

- [ ] Parse an HTML document containing `<h4>` tags and verify the
output uses `####` (4 hashes)
- [ ] Verify other heading levels remain correct
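
The checks above could be sketched roughly as follows. This is a minimal, standalone illustration, not the project's test suite: it copies the corrected `TITLE_TAGS` mapping from the diff rather than importing `deepdoc`, and `check_title_tags` and `heading_to_markdown` are hypothetical helper names introduced only for this example.

```python
import re

# Copy of the corrected mapping from the diff (not imported from deepdoc).
TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}


def check_title_tags(mapping):
    """Hypothetical helper: return tags whose prefix length differs from the heading level."""
    mismatched = []
    for tag, prefix in mapping.items():
        level = int(tag[1:])  # "h4" -> 4
        if prefix != "#" * level:
            mismatched.append(tag)
    return mismatched


def heading_to_markdown(html_fragment, mapping=TITLE_TAGS):
    """Hypothetical helper: convert a single plain <hN>text</hN> fragment to Markdown.

    This is a simplified stand-in for the real parser, used only to
    illustrate the expected output for an <h4> tag.
    """
    match = re.fullmatch(r"\s*<(h[1-6])>(.*?)</\1>\s*", html_fragment, flags=re.S)
    if match is None:
        raise ValueError("expected a single <hN>...</hN> fragment")
    tag, text = match.groups()
    return f"{mapping[tag]} {text.strip()}"


if __name__ == "__main__":
    print(check_title_tags(TITLE_TAGS))
    print(heading_to_markdown("<h4>Section title</h4>"))
```

Running the same consistency check against the pre-fix mapping (with `h4` set to `"#####"`) should flag only `h4`, which covers both items of the test plan in one pass.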

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
黄圣祺
2026-03-30 13:17:32 +08:00
committed by GitHub
parent 2faaa9f9ce
commit 534729546e

@@ -33,7 +33,7 @@ BLOCK_TAGS = [
"table", "pre", "code", "blockquote",
"figure", "figcaption"
]
-TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
+TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}
class RAGFlowHtmlParser: