ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
bitloi	01a5598aa5	Fix: markdown fenced code block extraction (#15630) ### What problem does this PR solve? Markdown extraction currently applies custom delimiters before respecting fenced code blocks. When a delimiter such as a newline is configured, fenced code can be split into separate chunks, and longer outer fences can be closed incorrectly by shorter nested fences. This PR keeps the fix intentionally narrow for the Markdown chunking discussion in #15482: - preserve fenced code blocks when delimiter-based extraction is used - support both backtick and tilde fences - respect fence length so longer outer fences can contain shorter inner fences - keep delimiter splitting unchanged outside fenced blocks Refs #15482 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Testing - `ruff check deepdoc/parser/markdown_parser.py test/unit_test/deepdoc/parser/test_markdown_parser.py` - `python3 run_tests.py -t test/unit_test/deepdoc/parser/test_markdown_parser.py`	2026-06-04 13:33:46 +08:00
Idriss Sbaaoui	dd529137eb	Fix: markdown table double extraction in parser (#13892) ### What problem does this PR solve? Fixes markdown tables being parsed twice (once as markdown and again as generated HTML), which caused duplicate table chunks in the chunk list UI. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-02 13:31:56 +08:00
Yongteng Lei	7c20c964b4	Fix: incorrect image merging for naive markdown parser (#11520) ### What problem does this PR solve? Fix incorrect image merging for naive markdown parser. #9349 [ragflow_readme.webm](https://github.com/user-attachments/assets/ca3f1e18-72b6-4a4c-80db-d03da9adf8dc) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-25 19:54:06 +08:00
Billy Bao	121c51661d	Fix: Markdown table extractor (#11018) ### What problem does this PR solve? The Markdown table extractor now supports opening tags of the form <table ...>. #10966 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-05 16:10:21 +08:00
buua436	bb9504d1cc	Fix: enhance delimiters in markdown parser (#10896) ### What problem does this PR solve? Issue: [#10890](https://github.com/infiniflow/ragflow/issues/10890). Change: enhance delimiters in the Markdown parser. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-30 17:36:51 +08:00
Yongteng Lei	5200711441	Feat: add support for multi-column PDF parsing (#10475) ### What problem does this PR solve? Add support for multi-column PDF parsing. #9878, #9919. Two-column sample: ![image](https://github.com/user-attachments/assets/0270c028-2db8-4ca6-a4b7-cd5830882d28) Three-column sample: ![image](https://github.com/user-attachments/assets/9ee88844-d5b1-4927-9e4e-3bd810d6e03a) Single-column sample: ![image](https://github.com/user-attachments/assets/e93d3d18-43c3-4067-b5fa-e454ed0ab093) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-10-11 18:46:09 +08:00
Yongteng Lei	382458ace7	Feat: advanced markdown parsing (#9607) ### What problem does this PR solve? Using AST parsing to handle markdown more accurately, preventing components from being cut off by chunking. #9564 ![image](https://github.com/user-attachments/assets/4aaf4bf6-5714-4d48-a9cf-864f59633f7f) ![image](https://github.com/user-attachments/assets/dc00233f-7a55-434f-bbb7-74ce7f57a6cf) ![image](https://github.com/user-attachments/assets/4a556b5b-d9c6-4544-a486-8ac342bd504e) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-21 09:36:18 +08:00
Yongteng Lei	51a8604dcb	Fix: fixed context loss caused by separating markdown tables from original text (#8844) ### What problem does this PR solve? Fix context loss caused by separating markdown tables from original text. #6871, #8804. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-15 13:03:01 +08:00
liwenju0	5b0e38060a	Feat: Optimize the table extraction logic in the Markdown parser (#5663) ### What problem does this PR solve? Optimize the table extraction logic in the Markdown parser: Enhance the recognition of both borderless and bordered Markdown tables. Add support for extracting HTML tables, including various scenarios with nested HTML tags. Improve performance by using conditional checks to reduce unnecessary regular expression matching. ### Type of change - [x] Performance Improvement Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-03-07 17:02:35 +08:00
Jin Hai	3894de895b	Update comments (#4569) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Zhedong Cen	a95c1d45f0	Support table for markdown file in general parser (#1278) ### What problem does this PR solve? Support extracting tables from Markdown files in the general parser. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-06-27 14:38:35 +08:00

11 Commits
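
The behaviour described in commit 01a5598aa5 (keep fenced code blocks whole, accept both backtick and tilde fences, respect fence length, split on the delimiter only outside fences) can be illustrated with a minimal sketch. This is not the code in `deepdoc/parser/markdown_parser.py`; the function name `split_preserving_fences`, its parameters, and the regular expression are hypothetical and chosen only for illustration. It assumes a non-empty delimiter and treats an unclosed fence as running to the end of the text.

```python
import re

# Fence line: up to 3 spaces of indent, then 3+ backticks or 3+ tildes.
# Group 1 is the fence run, group 2 is whatever follows it (info string).
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def split_preserving_fences(text: str, delimiter: str = "\n") -> list[str]:
    """Split text on a delimiter, but emit each fenced block as one chunk.

    Hypothetical helper for illustration only; assumes delimiter is non-empty.
    """
    chunks: list[str] = []
    plain: list[str] = []    # lines collected outside any fence
    fenced: list[str] = []   # lines of the fence that is currently open
    open_fence = None        # (fence character, fence length) while inside a fence

    def flush_plain() -> None:
        # Delimiter splitting is applied only to text outside fences.
        joined = "\n".join(plain)
        chunks.extend(part for part in joined.split(delimiter) if part.strip())
        plain.clear()

    for line in text.split("\n"):
        match = FENCE_RE.match(line)
        if open_fence is None:
            if match:
                flush_plain()
                run = match.group(1)
                open_fence = (run[0], len(run))
                fenced.append(line)
            else:
                plain.append(line)
        else:
            fenced.append(line)
            # A closing fence uses the same character, is at least as long
            # as the opening fence, and carries no info string.
            if (
                match
                and match.group(1)[0] == open_fence[0]
                and len(match.group(1)) >= open_fence[1]
                and not match.group(2).strip()
            ):
                chunks.append("\n".join(fenced))
                fenced.clear()
                open_fence = None

    if fenced:
        # Unclosed fence: keep the remainder together as one chunk.
        chunks.append("\n".join(fenced))
    flush_plain()
    return chunks


nested = "intro\n````md\n```py\nprint(1)\n```\n````\noutro"
for chunk in split_preserving_fences(nested):
    print(repr(chunk))
```

With a newline delimiter, the four-backtick outer fence in `nested` stays a single chunk even though it contains a three-backtick inner fence, while `intro` and `outro` are split as usual.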