Resolves #14211

**Background:** Currently, RAGFlow routes all Docling parsing through the standard `/convert/source` endpoint. For large documents, this endpoint returns one massive block of unchunked text that exceeds the context limits of RAGFlow's internal embedding model, causing pipeline failures.

**Solution:** This PR updates the `_parse_pdf_remote` ingestion logic in `docling_parser.py` to prefer `docling-serve`'s native chunking endpoints (`/v1/chunk/source` and `/v1alpha/chunk/source`).

- Because RAGFlow receives pre-sliced chunk objects directly from Docling, it avoids token-limit overflows.
- A graceful fallback to the standard `/convert/source` endpoints maintains backwards compatibility for users running older versions of the Docling server, which return 404 on the new routes.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
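
For reviewers, the route preference described under **Solution** can be sketched roughly as follows. This is a minimal illustration, not the code in `docling_parser.py`: the `post` callable, `first_available`, and `parse_remote` are hypothetical names, and the versioned convert routes are an assumption (the description above only names `/convert/source`).

```python
from typing import Callable, Optional, Sequence, Tuple

# Chunk routes named in the PR description.
CHUNK_ROUTES = ("/v1/chunk/source", "/v1alpha/chunk/source")
# Assumption: the convert routes carry the same version prefixes.
CONVERT_ROUTES = ("/v1/convert/source", "/v1alpha/convert/source")

# Hypothetical transport: takes a route, returns (status_code, json_body).
Post = Callable[[str], Tuple[int, dict]]


def first_available(post: Post, routes: Sequence[str]) -> Optional[Tuple[str, dict]]:
    """Return (route, body) for the first route that does not answer 404."""
    for route in routes:
        status, body = post(route)
        if status == 404:
            continue  # older server: route missing, try the next one
        if status >= 400:
            raise RuntimeError(f"{route} failed with HTTP {status}")
        return route, body
    return None


def parse_remote(post: Post) -> Tuple[str, str, dict]:
    """Prefer chunked output; fall back to whole-document conversion."""
    hit = first_available(post, CHUNK_ROUTES)
    if hit is not None:
        return "chunks", hit[0], hit[1]
    hit = first_available(post, CONVERT_ROUTES)
    if hit is not None:
        return "document", hit[0], hit[1]
    raise RuntimeError("no Docling endpoint available")
```

Only a 404 triggers the fallback in this sketch; any other error status is raised, so a genuine server failure is not mistaken for an older Docling version.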