## Summary
This PR fixes two issues discovered during testing of the PaddleOCR
async API refactoring:
### 1. PP-OCRv6 returns `ocrResults` instead of `layoutParsingResults`
Models like PP-OCRv6 are pure text recognition models that return
results in `ocrResults.prunedResult.rec_texts` format rather than the
`layoutParsingResults.prunedResult.parsing_res_list` format used by
layout-aware models (PaddleOCR-VL series).
**Changes:**
- `deepdoc/parser/paddleocr_parser.py`: Extract `ocrResults` alongside
`layoutParsingResults` in `_send_request()`, add fallback logic in
`_transfer_to_sections()` and `parse_image()`
- `internal/entity/models/paddleocr.go`: Add `ocrResults` struct and
fallback extraction in Go OCR handler
### 2. Image parsing not integrated into picture chunker
The `parse_image()` method existed in PaddleOCRParser but was never
called from `rag/app/picture.py` (the module that handles image file
uploads). Users configuring PaddleOCR as their layout recognizer would
still get local deepdoc OCR for images.
**Changes:**
- `rag/app/picture.py`: When `layout_recognize` is set to PaddleOCR, use
`PaddleOCROcrModel.parse_image()` instead of local OCR. Falls back
gracefully to local OCR on failure.
## Testing
Verified end-to-end in Docker:
- PaddleOCR-VL-1.6 PDF parsing: ✅ (10 text blocks with bbox)
- PaddleOCR-VL-1.6 image parsing: ✅ (219 chars)
- PP-OCRv6 PDF parsing with ocrResults fallback: ✅ (10 text blocks)
- PP-OCRv6 image parsing with ocrResults fallback: ✅ (136 chars)
## Related PRs
- #15967 (merged) - PaddleOCR async Job API refactoring + new models
- #16086 (merged) - PaddleOCR image parsing support
## Summary
Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.
## Changes
### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)
### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`
### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)
### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config
## Backward Compatibility
- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups
## Why not use the `paddleocr` SDK package directly?
RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly via
HTTP, following the same logic as the SDK internally.
### What problem does this PR solve?
The Go model-driver layer () has ~38,700 lines across 109 files. Roughly
74% of that is boilerplate duplicated into every driver: identical HTTP
client setup, the same 65-line SSE scanner loop, and 10-11 one-line "not
supported" stub methods per driver. Any fix must be manually propagated
to every file. Closes#15820.
This PR establishes the three shared utility files that form the
foundation for incremental driver migration:
---
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Haruko386 <tryeverypossible@163.com>
## What problem does this PR solve?
Go test files are never compiled in CI — only production binaries via
`go build`. This allowed a missing `"sort"` import in
`metadata_filter_test.go` to be merged without detection.
## Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
## Changes
- Add `go test -count=1 ./internal/...` step after Go build in CI
workflow
- Fix missing `"sort"` import in `metadata_filter_test.go` (pre-existing
compile error)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
### What problem does this PR solve?
1. Add license announcement
2. Add sanity check on API config
3. Add base class: BaseModel
4. Add GetBaseURL
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Closes#15379
Around 29 Go model providers in `internal/entity/models/` share an
`http.Client` configured with `Timeout: 120 * time.Second`, and reuse
that same client for `ChatStreamlyWithSender`. Go's
`http.Client.Timeout` is a hard ceiling on the whole request that also
covers reading the response body, so it behaves as a wall clock on
streaming. Any streamed chat response that lasts longer than 120 seconds
gets cut off in the middle with a timeout error. Long generations,
reasoning model outputs, and slow or overloaded upstreams are the common
victims.
The providers that already behave correctly (`groq`, `mistral`,
`voyage`, `anthropic`) set no client `Timeout` and instead wrap each
request in a `context.WithTimeout`. This change converges the affected
providers onto that same pattern.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### Summary
Closes#15381
Every provider in `internal/entity/models/` reads its streaming response
with `bufio.NewScanner(resp.Body)` and iterates over `scanner.Scan()`.
The default `bufio.Scanner` maximum token size is 64KB, so when an
upstream sends a single SSE `data:` line larger than 64KB (long content
deltas, large tool or function call argument blobs, bundled
`reasoning_content`, or providers that emit a whole message in one
event) `scanner.Scan()` returns `false` and `scanner.Err()` returns
`bufio.ErrTooLong`. Streaming chat then ends with an error partway
through the response.
This change adds `scanner.Buffer(make([]byte, 64*1024), 1024*1024)`
immediately after every SSE scanner that was still bare, raising the cap
to 1MB. 1MB is the value already used for streaming chat in `openai.go`,
`modelscope.go`, `groq.go`, `mistral.go`, `xai.go` and the other already
patched providers (the 8MB cap in the repo is reserved for TTS and
embedding paths), so this simply converges the remaining providers onto
the established pattern. Nothing else changes: line parsing, `data:`
prefix handling, `[DONE]` detection, JSON unmarshalling, error handling,
and the existing `scanner.Err()` checks all stay the same.
Providers covered (23 scanners across 22 files): 302ai, aliyun,
baichuan, baidu, cohere, deepinfra, deepseek, gitee, huggingface,
lmstudio, minimax (the chat scanner, whose TTS scanner was already
bumped), moonshot, nvidia, ollama, openrouter, orcarouter, paddleocr,
siliconflow, tokenhub, vllm, volcengine, xunfei, zhipu-ai. `jiekouai.go`
is excluded because it is covered by the in flight #15337.
A table driven regression test (`sse_scanner_buffer_test.go`) streams a
single 128KB `data:` content delta followed by `data: [DONE]` through an
`httptest` server and asserts that `ChatStreamlyWithSender` delivers the
full content with no error across a representative subset of providers.
Without the buffer fix the test fails with `bufio.Scanner: token too
long`.
This PR also removes three duplicate declarations of the package level
`roundTripperFunc` test helper that several recently merged provider PRs
each added independently, which had left the `internal/entity/models`
test package unable to compile. The helper now lives in a single place
and is shared.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
Closes#15089.
Adds PPIO support to the Go model-provider layer so PPIO instances can
be routed through the Go API server with the same OpenAI-compatible
chat, streaming, model listing, and connection-check flow used by other
SaaS providers.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Added a PPIO Go model driver.
- Added the PPIO provider catalog and default OpenAI-compatible API URL.
- Registered PPIO in the model factory.
- Added focused provider and provider-manager tests.
## What changed
- Implemented chat completions, SSE streaming, ListModels, and
CheckConnection for PPIO.
- Covered request shape, stream termination, reasoning fallback, model
listing, custom base URLs, safe transport setup, unsupported methods,
and provider config loading.
- Kept the provider catalog aligned with the existing RAGFlow PPIO
factory model set.
- Cleaned up pre-existing Go model package validation blockers so the
scoped provider tests can run normally with vet enabled.
## Why
The existing Python/provider catalog path includes PPIO, but the Go
model-provider layer did not have a PPIO driver, so the Go API server
could not instantiate or use PPIO as requested in #15089.
### What problem does this PR solve?
This PR implement implement OCR for Baidu and Mistral, implement
PaddleOCR provider and implement ASR for CoHere
**Verified examples from the CLI:**
```
RAGFlow(user)> ocr with 'mistral-ocr-2512@test@mistral' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
RAGFlow(user)> ocr with 'paddleocr-vl-0.9b@test@baidu' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# PaddleOCR
RAGFlow(user)> ocr with 'PaddleOCR-VL-1.5@test@paddleocr' file './internal/test.pdf'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke
Nando Metzger
Photogra
Anton Obukhov
Rodrigo Caye Daudt
netry and Remote Sensing,
Shengyu Huang
Konrad Schindler
ETH Zürich
<div style="text-align: c... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# Cohere
RAGFlow(user)> asr with 'cohere-transcribe-03-2026@test@cohere' audio './internal/test.wav' param '{"language": "en"}'
+-----------------------------------------------------------------------------------------------------------------------+
| text |
+-----------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired. |
+-----------------------------------------------------------------------------------------------------------------------+
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring