Files
ragflow/deepdoc/server/README.md
Jack 304d9e02bb Refactor: migrate pdf_parser.py to golang (#16323)
### What problem does this PR solve?

Http API based on onnx model.
pdf_parser.py to golang

### Type of change

- [x] Refactoring
2026-06-25 20:16:16 +08:00

205 lines
4.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# OSS DeepDoc HTTP API Service
Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and
TSR (Table Structure Recognition) models via a unified HTTP API using
[LitServe](https://github.com/Lightning-AI/litserve) and OSS ONNX Runtime models.
## Quick Start
```bash
# Build
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .
# Run (CPU only; no GPU required)
docker run -p 9390:9390 deepdoc_oss:latest
# Or via docker compose
docker compose -f docker/docker-compose.yml up -d
```
The service listens on port **9390** by default. Pass `--port` to change it:
```bash
python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models
```
## Endpoints
All prediction endpoints accept JPEG images via `multipart/form-data`. The form
field for file uploads is named `request`.
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Liveness probe. Returns `ok`. |
| `GET` | `/model` | Model metadata. Returns `{"model":"oss","version":"1.0"}`. |
| `POST` | `/predict/dla` | Document Layout Analysis. |
| `POST` | `/predict/tsr` | Table Structure Recognition. |
| `POST` | `/predict/ocr` | OCR — use form field `operator=det` for detection or `operator=rec` for recognition. |
### `POST /predict/dla`
Analyzes a full page image and returns labelled layout regions.
**Request**
```
curl -X POST http://localhost:9390/predict/dla \
-F "request=@page.jpg;type=image/jpeg"
```
**Response**
```json
{
"bboxes": [
[x0, y0, x1, y1, score, class_id],
...
]
}
```
| class_id | Label |
|:--------:|-------|
| 0 | title |
| 1 | text |
| 2 | reference |
| 3 | figure |
| 4 | figure caption |
| 5 | table |
| 6 | table caption |
| 8 | equation |
> The OSS model uses 8 unique class IDs. IDs 7 and 9 are reserved for
> compatibility with the SaaS label scheme but are never produced by the
> OSS model.
### `POST /predict/tsr`
Recognizes table structure from a cropped table image.
**Request**
```
curl -X POST http://localhost:9390/predict/tsr \
-F "request=@table_crop.jpg;type=image/jpeg"
```
**Response**
```json
{
"bboxes": [
[x0, y0, x1, y1, score, class_id],
...
]
}
```
| class_id | Label |
|:--------:|-------|
| 0 | table |
| 1 | table column |
| 2 | table row |
| 3 | table column header |
| 4 | table projected row header |
| 5 | table spanning cell |
### `POST /predict/ocr`
Two modes controlled by the `operator` form field.
#### Detection (`operator=det`)
Returns quadrilateral bounding boxes for detected text regions.
```
curl -X POST "http://localhost:9390/predict/ocr" \
-F "operator=det" \
-F "request=@page.jpg;type=image/jpeg"
```
**Response** (5-level nested array):
```json
{
"output": [
[
[
[
[[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
...
]
]
]
]
}
```
#### Recognition (`operator=rec`)
Recognizes text within a cropped region.
```
curl -X POST "http://localhost:9390/predict/ocr" \
-F "operator=rec" \
-F "request=@char_crop.jpg;type=image/jpeg"
```
**Response** (4-level nested array):
```json
{
"output": [
[
[
["recognized text", 1.0],
...
]
]
]
}
```
> Confidence is always `1.0` — the OSS recognition model does not return
> per-character confidence scores.
## Error Responses
| Scenario | HTTP Status |
|----------|:-----------:|
| Missing `operator` field (OCR) | 400 |
| Invalid `operator` value | 400 |
| Empty or corrupt image | 400 |
| Image exceeds 4096×4096 | 400 |
| Internal inference error | 500 |
## Models
All ONNX models are from the [InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc)
HuggingFace repository (Apache 2.0 license):
| File | Size | Purpose |
|------|------|---------|
| `layout.onnx` | 75.7 MB | DLA (YOLOv10) |
| `det.onnx` | 4.7 MB | OCR text detection (PP-OCRv4) |
| `rec.onnx` | 10.8 MB | OCR text recognition (PP-OCRv4) |
| `tsr.onnx` | 12.2 MB | TSR (PaddleDetection) |
| `ocr.res` | 26 KB | OCR character dictionary |
## Architecture
```
deepdoc/server/
├── deepdoc_server.py # LitServe entry point
├── endpoints/ # LitAPI endpoints (HTTP layer)
│ ├── dla_endpoint.py
│ ├── tsr_endpoint.py
│ └── ocr_endpoint.py
└── adapters/ # Model wrappers (inference + format conversion)
├── dla_adapter.py
├── tsr_adapter.py
└── ocr_adapter.py
```
Endpoints → Adapters → `deepdoc/vision/` (reused OSS model classes) → ONNX Runtime.