Feat: add button to remove header & footer in pipeline (#14486)

### What problem does this PR solve?

Feat: add a button to remove headers and footers in the pipeline

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
Magicbook1108
2026-04-30 12:30:41 +08:00
committed by GitHub
parent 2932b65da6
commit bb3b99f0a5
13 changed files with 135 additions and 82 deletions


@@ -52,7 +52,7 @@ class RAGFlowHtmlParser:
raise TypeError("txt type should be string!")
temp_sections = []
-soup = BeautifulSoup(txt, "html5lib")
+soup = BeautifulSoup(txt, "html.parser")
# delete <style> tag
for style_tag in soup.find_all(["style", "script"]):
style_tag.decompose()
@@ -210,4 +210,3 @@ class RAGFlowHtmlParser:
chunks.append(current_block)
return chunks
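
The first hunk above selects the parser BeautifulSoup uses and then removes `<style>` and `<script>` tags before any text is extracted. As a rough, standalone illustration of that stripping step, here is a minimal sketch that uses only Python's standard-library `html.parser` module. It is not RAGFlow's code: the class `StyleScriptStripper` and the function `strip_style_and_script` are hypothetical names introduced for this example, and the real parser works on a BeautifulSoup tree rather than a streaming parser.

```python
from html.parser import HTMLParser


class StyleScriptStripper(HTMLParser):
    """Hypothetical helper: collect text, skipping <style> and <script> bodies."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._parts = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return "\n".join(self._parts)


def strip_style_and_script(txt):
    """Return the visible text of `txt`, one fragment per line."""
    # Mirrors the type check shown in the diff context.
    if not isinstance(txt, str):
        raise TypeError("txt type should be string!")
    parser = StyleScriptStripper()
    parser.feed(txt)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Hello</p><script>var a = 1;</script><p>World</p></body></html>"
    )
    print(strip_style_and_script(sample))
```

Because the sketch depends only on the standard library, it runs without `html5lib` or BeautifulSoup installed. That makes it a convenient way to check which text should survive once style and script content is dropped.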