Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-06-29 15:31:05 +08:00
feat(parser): support external Docling server via DOCLING_SERVER_URL (#13527)
### What problem does this PR solve?

This PR adds support for parsing PDFs through an external Docling server. With it, RAGFlow can connect to remote `docling serve` deployments instead of relying only on in-process local Docling. It addresses the feature request in [#13426](https://github.com/infiniflow/ragflow/issues/13426) and follows the external-server pattern that MinerU already uses.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### What is changed?

- Add external Docling server support in `DoclingParser`:
  - Use `DOCLING_SERVER_URL` to enable remote parsing mode.
  - Try `POST /v1/convert/source` first, and fall back to `/v1alpha/convert/source`.
  - Keep the existing local Docling behavior when `DOCLING_SERVER_URL` is not set.
- Wire the Docling environment settings into the parser invocation paths:
  - `rag/app/naive.py`
  - `rag/flow/parser/parser.py`
- Add Docling environment hints to the constants and update the docs:
  - `docs/guides/dataset/select_pdf_parser.md`
  - `docs/guides/agent/agent_component_reference/parser.md`
  - `docs/faq.mdx`

### Why this approach?

This keeps the change focused on a single issue and a single capability (external Docling connectivity). It does not introduce unrelated provider-model plumbing.
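The remote-mode flow described under **What is changed?** can be sketched in plain Python. If a server URL is configured, the sketch POSTs to `/v1/convert/source` and retries against `/v1alpha/convert/source` when that fails. It is a minimal sketch, not the `DoclingParser` code, and it rests on these assumptions:

- `convert_via_server`, `http_post_json`, and `Poster` are hypothetical names.
- The request payload is passed through untouched, because its schema depends on the `docling serve` version.
- Falling back on *any* non-2xx status is an assumption. The real parser may fall back only on specific errors.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Endpoint paths named in the PR: the v1 API is tried first, then v1alpha.
CONVERT_PATHS = ("/v1/convert/source", "/v1alpha/convert/source")

# Hypothetical poster type: takes (url, json_payload), returns (status_code, json_body).
Poster = Callable[[str, dict], Tuple[int, dict]]


def http_post_json(url: str, payload: dict, timeout: float = 600.0) -> Tuple[int, dict]:
    """POST `payload` as JSON and return (status, body).

    HTTP error statuses are returned rather than raised. Connection failures
    (urllib.error.URLError) still propagate to the caller.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # e.g. a 404 from an older server that has no /v1 route
        return err.code, {}


def convert_via_server(base_url: str, payload: dict, post: Poster = http_post_json) -> dict:
    """Send `payload` to each convert endpoint in turn and return the first 2xx body."""
    failures: List[str] = []
    for path in CONVERT_PATHS:
        url = base_url.rstrip("/") + path
        status, body = post(url, payload)
        if 200 <= status < 300:
            return body
        # Assumption: any non-2xx status triggers the fallback to the next path.
        failures.append(f"{url} -> HTTP {status}")
    raise RuntimeError("Docling server conversion failed: " + "; ".join(failures))
```

Injecting `post` keeps the fallback logic testable without a running server. A caller would check `DOCLING_SERVER_URL` first and keep the local Docling path when it is unset, as the PR describes.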
### Validation

- Static checks:
  - `python -m py_compile` on the changed Python files
  - `python -m ruff check` on the changed Python files
- Functional checks:
  - The remote v1 endpoint path works
  - The v1alpha fallback works
  - The local Docling path remains available when `DOCLING_SERVER_URL` is unset

### Related links

- Feature request: [Support external Docling server (issue #13426)](https://github.com/infiniflow/ragflow/issues/13426)
- Compare view for this branch: [main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1)

##### Fixes [#13426](https://github.com/infiniflow/ragflow/issues/13426)
```diff
@@ -219,6 +219,9 @@ class ForgettingPolicy(StrEnum):
 # ENV_MINERU_OUTPUT_DIR = "MINERU_OUTPUT_DIR"
 # ENV_MINERU_BACKEND = "MINERU_BACKEND"
 # ENV_MINERU_DELETE_OUTPUT = "MINERU_DELETE_OUTPUT"
+# ENV_DOCLING_SERVER_URL = "DOCLING_SERVER_URL"
+# ENV_DOCLING_OUTPUT_DIR = "DOCLING_OUTPUT_DIR"
+# ENV_DOCLING_DELETE_OUTPUT = "DOCLING_DELETE_OUTPUT"
 # ENV_TCADP_OUTPUT_DIR = "TCADP_OUTPUT_DIR"
 # ENV_LM_TIMEOUT_SECONDS = "LM_TIMEOUT_SECONDS"
 # ENV_LLM_MAX_RETRIES = "LLM_MAX_RETRIES"
```
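The constants hunk lists the Docling variable names only as commented hints. A minimal sketch of reading them into one settings object follows. `DoclingSettings` and `load_docling_settings` are hypothetical names, not RAGFlow APIs. This commit does not state the semantics of two of the variables, so the sketch assumes them by analogy with the MinerU variables:

- `DOCLING_OUTPUT_DIR` is treated as a directory for intermediate parser output.
- `DOCLING_DELETE_OUTPUT` is treated as a boolean flag (`1`/`true`/`yes`/`on`) that says whether that output is removed afterward. It defaults to deleting, which is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class DoclingSettings:
    server_url: Optional[str]  # None means: use local, in-process Docling
    output_dir: Optional[str]  # assumed: where intermediate output is written
    delete_output: bool        # assumed: remove intermediate output after parsing

    @property
    def remote(self) -> bool:
        """True when a Docling server URL is configured (remote parsing mode)."""
        return self.server_url is not None


def load_docling_settings(env: Mapping[str, str] = os.environ) -> DoclingSettings:
    """Read the DOCLING_* variables from `env` (defaults to the process environment)."""
    # Blank or whitespace-only values count as unset; trailing slashes are trimmed.
    server_url = env.get("DOCLING_SERVER_URL", "").strip().rstrip("/") or None
    output_dir = env.get("DOCLING_OUTPUT_DIR", "").strip() or None
    # Assumed default "1": delete output unless explicitly set to a non-truthy value.
    raw_delete = env.get("DOCLING_DELETE_OUTPUT", "1").strip().lower()
    return DoclingSettings(
        server_url=server_url,
        output_dir=output_dir,
        delete_output=raw_delete in _TRUTHY,
    )
```

Accepting any mapping lets the loader be exercised with a plain `dict`. According to the PR description, the real wiring of these settings lives in `rag/app/naive.py` and `rag/flow/parser/parser.py`.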