mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
fix(html-parser): correct h4 heading mapping from ##### to #### (#13833)
## Summary

- Fix incorrect Markdown heading mapping for `h4` in `TITLE_TAGS` dictionary
- `h4` was mapped to `"#####"` (h5 level) instead of `"####"` (correct h4 level)

Closes #13819

## Details

In `deepdoc/parser/html_parser.py`, the `TITLE_TAGS` dictionary had a typo where `h4` was assigned 5 `#` characters instead of 4, causing h4 headings to be converted to h5-level Markdown headings during HTML parsing.

## Test plan

- [ ] Parse an HTML document containing `<h4>` tags and verify the output uses `####` (4 hashes)
- [ ] Verify other heading levels remain correct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@@ -33,7 +33,7 @@ BLOCK_TAGS = [
     "table", "pre", "code", "blockquote",
     "figure", "figcaption"
 ]
-TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
+TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}


 class RAGFlowHtmlParser:
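The test plan in the commit message could be sketched roughly as follows. This is a minimal, standalone check, not the project's actual test suite: `TITLE_TAGS` is copied from the corrected line in the diff, while `mismatched_title_tags` and `heading_to_markdown` are hypothetical helpers introduced only for illustration and do not exist in `deepdoc/parser/html_parser.py`.

```python
# Corrected mapping, copied from the "+" line of the diff.
TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}


def mismatched_title_tags(title_tags):
    """Hypothetical helper: return tags whose '#' prefix does not match the heading level."""
    bad = []
    for tag, prefix in title_tags.items():
        level = int(tag[1:])  # "h4" -> 4
        if prefix != "#" * level:
            bad.append(tag)
    return bad


def heading_to_markdown(tag, text, title_tags=TITLE_TAGS):
    """Hypothetical helper: render a heading line; not the parser's real rendering code."""
    return f"{title_tags[tag]} {text}"


if __name__ == "__main__":
    # Pre-fix mapping from the "-" line of the diff, kept for comparison.
    old_tags = dict(TITLE_TAGS, h4="#####")
    print(mismatched_title_tags(old_tags))
    print(mismatched_title_tags(TITLE_TAGS))
    print(heading_to_markdown("h4", "Section title"))
```

Deriving the expected prefix from the tag name, rather than hard-coding six separate assertions, means the same check would also catch a similar typo at any other heading level.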