ragflow/api/db/services
euvre fe46244d30 fix: paginate non-DeepDOC PDF parsing tasks to prevent OOM (#16106)
Parser pods were being OOM-killed when processing large PDF
documents. The root cause is in api/db/services/task_service.py: when
layout_recognize was not DeepDOC (e.g. Plain Text), page_size was set to
MAXIMUM_TASK_PAGE_NUMBER (100 million). As a result, the entire PDF was
processed as a single task, with all of its pages loaded into memory at
once.

This PR fixes the issue by paginating non-DeepDOC PDF parsing tasks the
same way DeepDOC already does.
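The effect of page_size on task splitting can be illustrated with a small sketch. It assumes that a PDF's pages are cut into half-open (from_page, to_page) ranges of at most page_size pages, one range per parsing task. The helper names split_into_page_tasks and choose_page_size, the task_page_size parameter, and the example value of 12 pages per task are hypothetical and are not RagFlow's actual task_service.py code. Only MAXIMUM_TASK_PAGE_NUMBER, its value, layout_recognize, and DeepDOC come from the commit message.

```python
from typing import List, Tuple

# Value quoted in the commit message: 100 million pages, i.e. effectively "no limit".
MAXIMUM_TASK_PAGE_NUMBER = 100_000_000


def split_into_page_tasks(total_pages: int, page_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split pages [0, total_pages) into half-open ranges.

    Each (from_page, to_page) range stands for one parsing task, so a worker
    only needs to hold at most page_size pages in memory at a time.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        (start, min(start + page_size, total_pages))
        for start in range(0, total_pages, page_size)
    ]


def choose_page_size(layout_recognize: str, task_page_size: int, *, after_fix: bool) -> int:
    """Hypothetical model of the page_size decision before and after the fix."""
    if after_fix or layout_recognize == "DeepDOC":
        # Paginate every PDF, whatever layout recognizer is configured.
        return task_page_size
    # Pre-fix behaviour: non-DeepDOC layouts got one task covering every page.
    return MAXIMUM_TASK_PAGE_NUMBER


if __name__ == "__main__":
    pages = 500  # illustrative document size
    before = split_into_page_tasks(pages, choose_page_size("Plain Text", 12, after_fix=False))
    after = split_into_page_tasks(pages, choose_page_size("Plain Text", 12, after_fix=True))
    print(len(before), len(after))  # 1 42
```

Before the fix, a 500-page Plain Text PDF became a single task spanning pages 0 to 500. With pagination, it becomes 42 tasks of at most 12 pages each, which bounds per-task memory by the page size rather than by the document length.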
2026-06-17 09:33:53 +08:00