ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-01 00:05:43 +08:00

Files

Paras Sondhi eeb89d604e feat: route docling parsing through native chunking endpoints (#14218)

Resolves #14211

**Background:** Currently, RAGFlow routes all Docling parsing through
the standard `/convert/source` endpoint. For large documents, this
returns a single large, unchunked text that can exceed the context limit
of RAGFlow's embedding model, causing pipeline failures.

**Solution:**
This PR updates the `_parse_pdf_remote` ingestion logic in
`docling_parser.py` to prioritize `docling-serve`'s native chunking
endpoints (`/v1/chunk/source` and `/v1alpha/chunk/source`).
- Because Docling returns pre-split chunk objects, RAGFlow no longer
has to embed one oversized text block, which avoids token-limit overflows.
- Adds a fallback to the standard `/convert/source` endpoints for
backward compatibility with older Docling server versions, which
return 404 on the new routes.
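The chunk-first, convert-on-404 ordering described above can be sketched as follows. This is a minimal illustration, not the code in `docling_parser.py`: the function name `parse_remote`, the injected `post` transport, and the return shape are hypothetical. The chunk paths come from the PR text. The versioned convert paths are an assumption, because the PR only names `/convert/source`.

```python
from typing import Any, Callable, Tuple

# Hypothetical transport: takes (url, payload) and returns (status_code, json_body).
PostFn = Callable[[str, dict], Tuple[int, Any]]

# Chunking routes named in the PR, tried first and in this order.
CHUNK_ENDPOINTS = ("/v1/chunk/source", "/v1alpha/chunk/source")
# Assumption: versioned convert routes; the PR text only says "/convert/source".
CONVERT_ENDPOINTS = ("/v1/convert/source", "/v1alpha/convert/source")


def parse_remote(base_url: str, payload: dict, post: PostFn) -> Tuple[str, Any]:
    """Return ("chunks", body) from a chunking route if the server has one,
    otherwise ("document", body) from a convert route."""
    for mode, endpoints in (("chunks", CHUNK_ENDPOINTS), ("document", CONVERT_ENDPOINTS)):
        for path in endpoints:
            status, body = post(base_url.rstrip("/") + path, payload)
            if status == 404:
                # Route missing on this docling-serve version: try the next one.
                continue
            if status >= 400:
                # Any other error is a real failure, not a version mismatch.
                raise RuntimeError(f"{path} failed with HTTP {status}")
            return mode, body
    raise RuntimeError("no compatible docling-serve endpoint found")
```

Injecting `post` keeps the routing logic testable without a live server. A real caller would wrap an HTTP client, and would then split a `"document"` result itself, since only the `"chunks"` path arrives pre-split.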

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

2026-04-24 19:03:19 +08:00

resume

fix: close file handles in json.load() calls in resume parser (#14061)

2026-04-14 11:43:58 +08:00
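The commit title above names a standard Python idiom: open the file in a `with` block rather than passing `open(path)` straight into `json.load()`. This is a minimal sketch of that pattern, not the resume parser's actual code. The helper name `load_json` and the UTF-8 encoding are assumptions.

```python
import json
from typing import Any


def load_json(path: str) -> Any:
    # The with-block closes the file as soon as the block exits, even if
    # json.load raises. A bare json.load(open(path)) leaves closing the
    # handle to the garbage collector.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```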

__init__.py

Feat: support epub parsing (#13650)

2026-03-17 20:14:06 +08:00

docling_parser.py

feat: route docling parsing through native chunking endpoints (#14218)

2026-04-24 19:03:19 +08:00

docx_parser.py

refactor: let excel use lazy image loader (#13558)