ragflow/internal/entity at 919f596066a721bd6cfa384e77d5f35e0dcf355d - ragflow - GetSkill.work

zlei6/ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Rander 017adf841f fix(paddleocr): support PP-OCRv6 ocrResults fallback and integrate image parsing (#16150)

## Summary

This PR fixes two issues discovered during testing of the PaddleOCR
async API refactoring:

### 1. PP-OCRv6 returns `ocrResults` instead of `layoutParsingResults`

Models like PP-OCRv6 are pure text recognition models that return
results in `ocrResults.prunedResult.rec_texts` format rather than the
`layoutParsingResults.prunedResult.parsing_res_list` format used by
layout-aware models (PaddleOCR-VL series).

**Changes:**
- `deepdoc/parser/paddleocr_parser.py`: Extract `ocrResults` alongside
`layoutParsingResults` in `_send_request()`, add fallback logic in
`_transfer_to_sections()` and `parse_image()`
- `internal/entity/models/paddleocr.go`: Add `ocrResults` struct and
fallback extraction in Go OCR handler
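The fallback described above could look roughly like the following sketch. It is not the code from `paddleocr_parser.py`: the function name `extract_texts` is hypothetical, the response is assumed to hold one entry per page under each top-level key, and the `block_content` field name inside `parsing_res_list` is an assumption about the layout result format.

```python
from typing import Any


def extract_texts(result: dict[str, Any]) -> list[str]:
    """Collect recognized text, preferring layout results over plain OCR results.

    Hypothetical helper: the per-page list shape and the `block_content`
    field name are assumptions, not confirmed by the commit message.
    """
    texts: list[str] = []

    # Layout-aware models (e.g. the PaddleOCR-VL series): parsed blocks per page.
    for page in result.get("layoutParsingResults") or []:
        blocks = (page.get("prunedResult") or {}).get("parsing_res_list") or []
        for block in blocks:
            content = (block.get("block_content") or "").strip()
            if content:
                texts.append(content)
    if texts:
        return texts

    # Fallback for pure text recognition models (e.g. PP-OCRv6): a flat
    # list of recognized lines under ocrResults.prunedResult.rec_texts.
    for page in result.get("ocrResults") or []:
        rec_texts = (page.get("prunedResult") or {}).get("rec_texts") or []
        texts.extend(t.strip() for t in rec_texts if t and t.strip())
    return texts
```

Checking `layoutParsingResults` first keeps the behaviour for layout-aware models unchanged, and the `ocrResults` branch only runs when no layout blocks were found.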

### 2. Image parsing not integrated into picture chunker

The `parse_image()` method existed in PaddleOCRParser but was never
called from `rag/app/picture.py` (the module that handles image file
uploads). Users configuring PaddleOCR as their layout recognizer would
still get local deepdoc OCR for images.

**Changes:**
- `rag/app/picture.py`: When `layout_recognize` is set to PaddleOCR, use
`PaddleOCROcrModel.parse_image()` instead of local OCR. Falls back
gracefully to local OCR on failure.
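A minimal sketch of that dispatch, under stated assumptions: `ocr_image`, `remote_ocr` and `local_ocr` are hypothetical names standing in for the PaddleOCR call and the local deepdoc OCR, the substring match on `layout_recognize` is illustrative, and treating an empty remote result as a failure is an assumption rather than something the commit message states.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def ocr_image(
    image_bytes: bytes,
    layout_recognize: str,
    remote_ocr: Callable[[bytes], str],
    local_ocr: Callable[[bytes], str],
) -> str:
    """Use the remote PaddleOCR path when configured, else (or on failure) local OCR."""
    # Assumption: the configured recognizer name contains "paddleocr".
    if "paddleocr" in layout_recognize.lower():
        try:
            text = remote_ocr(image_bytes)
            if text.strip():
                return text
            # Assumption: an empty remote result is treated like a failure.
            logger.warning("Remote OCR returned no text; falling back to local OCR")
        except Exception:
            # Any remote error is logged, then the local path takes over.
            logger.exception("Remote OCR failed; falling back to local OCR")
    return local_ocr(image_bytes)
```

Passing both OCR paths in as callables keeps the fallback logic testable without a running PaddleOCR service.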

## Testing

Verified end-to-end in Docker:
- PaddleOCR-VL-1.6 PDF parsing: ✅ (10 text blocks with bbox)
- PaddleOCR-VL-1.6 image parsing: ✅ (219 chars)
- PP-OCRv6 PDF parsing with ocrResults fallback: ✅ (10 text blocks)
- PP-OCRv6 image parsing with ocrResults fallback: ✅ (136 chars)

## Related PRs

- #15967 (merged) - PaddleOCR async Job API refactoring + new models
- #16086 (merged) - PaddleOCR image parsing support

2026-06-23 22:02:54 +08:00

..

fix(paddleocr): support PP-OCRv6 ocrResults fallback and integrate image parsing (#16150)

2026-06-23 22:02:54 +08:00

| File | Last commit | Updated |
| --- | --- | --- |
| api_token.go | feat[Go]: implement `/api/v1/agents/<agent_id>/sessions` (#15705) | 2026-06-08 16:26:27 +08:00 |
| base.go | Go: use NATS as the message queue (#15327) | 2026-06-12 14:56:44 +08:00 |
| canvas.go | feat[Go]: implement Agent/Workflow PUT `/api/v1/agents/<canvas_id>/tags` (#15641) | 2026-06-05 13:22:23 +08:00 |
| chat_channel.go | feat(go/dao): migrate chat channel database entity and DAO to Go (#16055) | 2026-06-17 11:26:13 +08:00 |
| chat.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| connector.go | feat[Go]: implement delete/ rebuild/ listlog api for connector (#15300) | 2026-05-28 16:44:35 +08:00 |
| document.go | Go: align document list response (#14982) | 2026-05-18 20:00:11 +08:00 |
| evaluation.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| file_commit.go | Add git-like file commit API (#15978) | 2026-06-15 11:19:56 +08:00 |
| file.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| ingestion_task.go | Go: use NATS as the message queue (#15327) | 2026-06-12 14:56:44 +08:00 |
| kb.go | Implement retrieval_test in GO (#14231) | 2026-04-24 15:30:14 +08:00 |
| license.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| llm.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| mcp.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| memory.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| pipeline.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| search.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| skill_search.go | GO: align time units with Python and centralize timestamp injection in BaseModel (#14875) | 2026-05-14 13:46:46 +08:00 |
| skill_space.go | GO: align time units with Python and centralize timestamp injection in BaseModel (#14875) | 2026-05-14 13:46:46 +08:00 |
| system.go | GO: align time units with Python and centralize timestamp injection in BaseModel (#14875) | 2026-05-14 13:46:46 +08:00 |
| task.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| tenant_llm.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| tenant_model_group_mapping.go | Refactor model provider and command (#13887) | 2026-04-02 20:20:35 +08:00 |
| tenant_model_group.go | Refactor model provider and command (#13887) | 2026-04-02 20:20:35 +08:00 |
| tenant_model_instance.go | Fix: model-provider bugs (#15460) | 2026-06-02 13:24:53 +08:00 |
| tenant_model_provider.go | Refactor model provider and command (#13887) | 2026-04-02 20:20:35 +08:00 |
| tenant_model.go | Go: update db model (#14423) | 2026-04-28 16:04:55 +08:00 |
| tenant.go | Go: fix register user (#16058) | 2026-06-16 14:03:53 +08:00 |
| time_record.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| types.go | Use GetChatModel, remove duplicate functions in model_service.go (#14546) | 2026-05-06 11:33:32 +08:00 |
| user_tenant.go | Add rename model directory to entity to avoid name misunderstanding (#13829) | 2026-03-27 19:25:18 +08:00 |
| user.go | Fix auto migration issue (#16081) | 2026-06-16 17:02:35 +08:00 |