### What problem does this PR solve? Http API based on onnx model. pdf_parser.py to golang ### Type of change - [x] Refactoring
4.4 KiB
OSS DeepDoc HTTP API Service
Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and TSR (Table Structure Recognition) models via a unified HTTP API using LitServe and OSS ONNX Runtime models.
Quick Start
# Build
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .
# Run (CPU only; no GPU required)
docker run -p 9390:9390 deepdoc_oss:latest
# Or via docker compose
docker compose -f docker/docker-compose.yml up -d
The service listens on port 9390 by default. Pass --port to change it:
python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models
Endpoints
All prediction endpoints accept JPEG images via multipart/form-data. The form
field for file uploads is named request.
| Method | Path | Description |
|---|---|---|
GET |
/health |
Liveness probe. Returns ok. |
GET |
/model |
Model metadata. Returns {"model":"oss","version":"1.0"}. |
POST |
/predict/dla |
Document Layout Analysis. |
POST |
/predict/tsr |
Table Structure Recognition. |
POST |
/predict/ocr |
OCR — use form field operator=det for detection or operator=rec for recognition. |
POST /predict/dla
Analyzes a full page image and returns labelled layout regions.
Request
curl -X POST http://localhost:9390/predict/dla \
-F "request=@page.jpg;type=image/jpeg"
Response
{
"bboxes": [
[x0, y0, x1, y1, score, class_id],
...
]
}
| class_id | Label |
|---|---|
| 0 | title |
| 1 | text |
| 2 | reference |
| 3 | figure |
| 4 | figure caption |
| 5 | table |
| 6 | table caption |
| 8 | equation |
The OSS model uses 8 unique class IDs. IDs 7 and 9 are reserved for compatibility with the SaaS label scheme but are never produced by the OSS model.
POST /predict/tsr
Recognizes table structure from a cropped table image.
Request
curl -X POST http://localhost:9390/predict/tsr \
-F "request=@table_crop.jpg;type=image/jpeg"
Response
{
"bboxes": [
[x0, y0, x1, y1, score, class_id],
...
]
}
| class_id | Label |
|---|---|
| 0 | table |
| 1 | table column |
| 2 | table row |
| 3 | table column header |
| 4 | table projected row header |
| 5 | table spanning cell |
POST /predict/ocr
Two modes controlled by the operator form field.
Detection (operator=det)
Returns quadrilateral bounding boxes for detected text regions.
curl -X POST "http://localhost:9390/predict/ocr" \
-F "operator=det" \
-F "request=@page.jpg;type=image/jpeg"
Response (5-level nested array):
{
"output": [
[
[
[
[[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
...
]
]
]
]
}
Recognition (operator=rec)
Recognizes text within a cropped region.
curl -X POST "http://localhost:9390/predict/ocr" \
-F "operator=rec" \
-F "request=@char_crop.jpg;type=image/jpeg"
Response (4-level nested array):
{
"output": [
[
[
["recognized text", 1.0],
...
]
]
]
}
Confidence is always
1.0— the OSS recognition model does not return per-character confidence scores.
Error Responses
| Scenario | HTTP Status |
|---|---|
Missing operator field (OCR) |
400 |
Invalid operator value |
400 |
| Empty or corrupt image | 400 |
| Image exceeds 4096×4096 | 400 |
| Internal inference error | 500 |
Models
All ONNX models are from the InfiniFlow/deepdoc HuggingFace repository (Apache 2.0 license):
| File | Size | Purpose |
|---|---|---|
layout.onnx |
75.7 MB | DLA (YOLOv10) |
det.onnx |
4.7 MB | OCR text detection (PP-OCRv4) |
rec.onnx |
10.8 MB | OCR text recognition (PP-OCRv4) |
tsr.onnx |
12.2 MB | TSR (PaddleDetection) |
ocr.res |
26 KB | OCR character dictionary |
Architecture
deepdoc/server/
├── deepdoc_server.py # LitServe entry point
├── endpoints/ # LitAPI endpoints (HTTP layer)
│ ├── dla_endpoint.py
│ ├── tsr_endpoint.py
│ └── ocr_endpoint.py
└── adapters/ # Model wrappers (inference + format conversion)
├── dla_adapter.py
├── tsr_adapter.py
└── ocr_adapter.py
Endpoints → Adapters → deepdoc/vision/ (reused OSS model classes) → ONNX Runtime.