Commit Graph

3 Commits

Author SHA1 Message Date
Wang Qi
0aff6a3f32 Feature: Allow page_size max value 100 (#15292)
Feature: Allow page_size max value 100
2026-05-28 11:13:01 +08:00
Jonathan Chang
9d1006e4ec fix: The output of the parser in the ingestion pipeline contains HTML tags (#14920)
## Summary
This change fixes ingestion quality issues where MinerU parser output
may contain HTML fragments (for example, table-related tags like `<tr>`,
`<td>`, `<br>`), which were previously passed directly into
chunking/tokenization and degraded chunk quality.

The fix adds a sanitization step in the MinerU parser path so parsed
sections are normalized to clean text before chunking.

## Change Type (select all)
- [x] Bug fix
- [x] Ingestion pipeline improvement
- [x] Parser/chunking quality fix

## Related Issue
- https://github.com/infiniflow/ragflow/issues/14831
2026-05-25 16:06:36 +08:00
buua436
aa4526266f Refa: migrate MCP APIs to RESTful api (#14317)
### What problem does this PR solve?

migrate MCP APIs to RESTful api

### Type of change

- [x] Refactoring
2026-04-23 12:51:27 +08:00