mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Jack 304d9e02bb Refactor: migrate pdf_parser.py to golang (#16323 )

### What problem does this PR solve?

Http API based on onnx model.
pdf_parser.py to golang

### Type of change

- [x] Refactoring

2026-06-25 20:16:16 +08:00

OSS DeepDoc HTTP API Service

Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) models via a unified HTTP API using LitServe and OSS ONNX Runtime models.

Quick Start

# Build
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .

# Run (CPU only; no GPU required)
docker run -p 9390:9390 deepdoc_oss:latest

# Or via docker compose
docker compose -f docker/docker-compose.yml up -d

The service listens on port 9390 by default. Pass --port to change it:

python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models

Endpoints

All prediction endpoints accept JPEG images via `multipart/form-data`. The uploaded file must be sent in the form field named `request`.

Method	Path	Description
`GET`	`/health`	Liveness probe. Returns `ok`.
`GET`	`/model`	Model metadata. Returns `{"model":"oss","version":"1.0"}`.
`POST`	`/predict/dla`	Document Layout Analysis.
`POST`	`/predict/tsr`	Table Structure Recognition.
`POST`	`/predict/ocr`	OCR — use form field `operator=det` for detection or `operator=rec` for recognition.
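
For callers that prefer Python over curl, the upload convention above can be sketched with the standard library alone. This is a minimal sketch, not part of the service: `build_multipart` and `predict` are illustrative names, and the base URL assumes the default port from Quick Start.

```python
import json
import urllib.request
import uuid


def build_multipart(image_bytes, fields=None, filename="page.jpg"):
    """Encode a JPEG plus optional text fields as multipart/form-data.

    Returns (body, content_type). The file part uses the field name
    "request", as the endpoints expect.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields first, e.g. {"operator": "det"} for the OCR endpoint.
    for name, value in (fields or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    # The image itself, declared as image/jpeg.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="request"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    )
    parts.append(file_header.encode() + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def predict(path, image_bytes, fields=None, base_url="http://localhost:9390"):
    """POST an image to one of the /predict/* paths and return the decoded JSON."""
    body, content_type = build_multipart(image_bytes, fields)
    req = urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

With a running server, `predict("/predict/dla", open("page.jpg", "rb").read())` would mirror the DLA curl call, and passing `{"operator": "det"}` as `fields` would mirror the OCR detection call.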

`POST /predict/dla`

Analyzes a full page image and returns labelled layout regions.

Request

curl -X POST http://localhost:9390/predict/dla \
  -F "request=@page.jpg;type=image/jpeg"

Response

{
  "bboxes": [
    [x0, y0, x1, y1, score, class_id],
    ...
  ]
}

class_id	Label
0	title
1	text
2	reference
3	figure
4	figure caption
5	table
6	table caption
8	equation

The OSS model uses 8 unique class IDs. IDs 7 and 9 are reserved for compatibility with the SaaS label scheme but are never produced by the OSS model.
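
To make the response easier to consume, a client can map each `class_id` to its label from the table above. The following is a minimal sketch; `Region`, `parse_dla`, and the `min_score` threshold are illustrative names, and sorting regions top to bottom is an assumption about what callers usually want, not something the service does.

```python
from typing import NamedTuple

# Labels copied from the class_id table; IDs 7 and 9 are never produced.
DLA_LABELS = {
    0: "title",
    1: "text",
    2: "reference",
    3: "figure",
    4: "figure caption",
    5: "table",
    6: "table caption",
    8: "equation",
}


class Region(NamedTuple):
    box: tuple  # (x0, y0, x1, y1)
    score: float
    label: str


def parse_dla(response, min_score=0.0):
    """Convert a /predict/dla response into labelled regions.

    Rows scoring below min_score are dropped. Unknown class IDs are kept
    with an "unknown(<id>)" label instead of raising.
    """
    regions = []
    for x0, y0, x1, y1, score, class_id in response["bboxes"]:
        if score < min_score:
            continue
        cid = int(class_id)
        label = DLA_LABELS.get(cid, f"unknown({cid})")
        regions.append(Region((x0, y0, x1, y1), float(score), label))
    # Assumed reading order: by top edge, then by left edge.
    return sorted(regions, key=lambda r: (r.box[1], r.box[0]))
```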

`POST /predict/tsr`

Recognizes table structure from a cropped table image.

Request

curl -X POST http://localhost:9390/predict/tsr \
  -F "request=@table_crop.jpg;type=image/jpeg"

Response

{
  "bboxes": [
    [x0, y0, x1, y1, score, class_id],
    ...
  ]
}

class_id	Label
0	table
1	table column
2	table row
3	table column header
4	table projected row header
5	table spanning cell
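
One common way to turn these boxes into a cell grid is to intersect every detected row with every detected column. The sketch below illustrates that idea only; it is not confirmed to be how RAGFlow post-processes TSR output, it ignores headers and spanning cells, and `cell_grid` and its `min_score` default are illustrative.

```python
# Labels copied from the TSR class_id table.
TSR_LABELS = {
    0: "table",
    1: "table column",
    2: "table row",
    3: "table column header",
    4: "table projected row header",
    5: "table spanning cell",
}


def cell_grid(response, min_score=0.5):
    """Build a rows x columns grid of cell boxes from a /predict/tsr response.

    Each cell takes its horizontal extent from a column box and its
    vertical extent from a row box. Returns a list of rows, each a list
    of (x0, y0, x1, y1) tuples.
    """
    rows, cols = [], []
    for x0, y0, x1, y1, score, class_id in response["bboxes"]:
        if score < min_score:
            continue
        label = TSR_LABELS.get(int(class_id))
        if label == "table row":
            rows.append((x0, y0, x1, y1))
        elif label == "table column":
            cols.append((x0, y0, x1, y1))
    rows.sort(key=lambda b: b[1])  # top to bottom
    cols.sort(key=lambda b: b[0])  # left to right
    return [[(c[0], r[1], c[2], r[3]) for c in cols] for r in rows]
```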

`POST /predict/ocr`

This endpoint has two modes, selected by the `operator` form field.

Detection (`operator=det`)

Returns quadrilateral bounding boxes for detected text regions.

curl -X POST "http://localhost:9390/predict/ocr" \
  -F "operator=det" \
  -F "request=@page.jpg;type=image/jpeg"

Response (5-level nested array):

{
  "output": [
    [
      [
        [
          [[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
          ...
        ]
      ]
    ]
  ]
}
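
Because the quadrilaterals sit under several wrapper arrays, a client may find it simpler to walk the structure than to hard-code indices. The sketch below does that and also derives an axis-aligned box per quadrilateral; `is_quad`, `iter_quads`, and `quad_to_bbox` are illustrative helpers, not part of the service.

```python
def is_quad(node):
    """True if node looks like [[x0,y0],[x1,y1],[x2,y2],[x3,y3]]."""
    return (
        isinstance(node, list)
        and len(node) == 4
        and all(
            isinstance(p, list)
            and len(p) == 2
            and all(isinstance(v, (int, float)) for v in p)
            for p in node
        )
    )


def iter_quads(node):
    """Yield every quadrilateral found anywhere inside a nested list."""
    if is_quad(node):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_quads(child)


def quad_to_bbox(quad):
    """Axis-aligned (x0, y0, x1, y1) box that encloses the four corners."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))
```

Calling `list(iter_quads(response["output"]))` on a detection response yields the quadrilaterals in the order they appear in the response.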

Recognition (`operator=rec`)

Recognizes text within a cropped region.

curl -X POST "http://localhost:9390/predict/ocr" \
  -F "operator=rec" \
  -F "request=@char_crop.jpg;type=image/jpeg"

Response (4-level nested array):

{
  "output": [
    [
      [
        ["recognized text", 1.0],
        ...
      ]
    ]
  ]
}

Confidence is always 1.0 — the OSS recognition model does not return per-character confidence scores.
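
Given the shape shown above, the recognized strings can be pulled out with two index steps. This is a minimal sketch that assumes the response always has exactly the documented nesting; `recognized_texts` is an illustrative name.

```python
def recognized_texts(response):
    """Return the recognized strings from an operator=rec response.

    Assumes the documented shape: output -> [ [ [text, confidence], ... ] ].
    The confidence value is discarded because it is always 1.0.
    """
    pairs = response["output"][0][0]
    return [text for text, _confidence in pairs]
```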

Error Responses

Scenario	HTTP Status
Missing `operator` field (OCR)	400
Invalid `operator` value	400
Empty or corrupt image	400
Image exceeds 4096×4096	400
Internal inference error	500
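
A client can catch most of the 400 cases before sending anything. The sketch below mirrors the table; `precheck` is an illustrative helper, and treating the size limit as a per-side, inclusive bound of 4096 pixels is an assumption about how the server applies it.

```python
MAX_SIDE = 4096  # assumed per-side limit, taken from the table above
VALID_OPERATORS = {"det", "rec"}


def precheck(path, width, height, n_bytes, operator=None):
    """Return a reason string if the request would likely get a 400, else None.

    width/height are the image dimensions in pixels and n_bytes is the
    size of the encoded file. Corrupt images cannot be detected here.
    """
    if path == "/predict/ocr":
        if operator is None:
            return "missing operator field"
        if operator not in VALID_OPERATORS:
            return "invalid operator value"
    if n_bytes == 0:
        return "empty image"
    if width > MAX_SIDE or height > MAX_SIDE:
        return "image exceeds 4096x4096"
    return None
```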

Models

All ONNX models are from the InfiniFlow/deepdoc HuggingFace repository (Apache 2.0 license):

File	Size	Purpose
`layout.onnx`	75.7 MB	DLA (YOLOv10)
`det.onnx`	4.7 MB	OCR text detection (PP-OCRv4)
`rec.onnx`	10.8 MB	OCR text recognition (PP-OCRv4)
`tsr.onnx`	12.2 MB	TSR (PaddleDetection)
`ocr.res`	26 KB	OCR character dictionary
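
Before starting the server with `--model-dir`, it can help to confirm that all five files are present. The following is a minimal sketch that assumes the files sit directly inside the model directory under the names in the table; `missing_models` is an illustrative helper, and the server may organise or load the files differently.

```python
from pathlib import Path

# File names copied from the table above.
MODEL_FILES = ("layout.onnx", "det.onnx", "rec.onnx", "tsr.onnx", "ocr.res")


def missing_models(model_dir):
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in MODEL_FILES if not (root / name).is_file()]
```

An empty list means every expected file was found.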

Architecture

deepdoc/server/
├── deepdoc_server.py     # LitServe entry point
├── endpoints/            # LitAPI endpoints (HTTP layer)
│   ├── dla_endpoint.py
│   ├── tsr_endpoint.py
│   └── ocr_endpoint.py
└── adapters/             # Model wrappers (inference + format conversion)
    ├── dla_adapter.py
    ├── tsr_adapter.py
    └── ocr_adapter.py

Request flow: Endpoints → Adapters → deepdoc/vision/ (reused OSS model classes) → ONNX Runtime.
