# OSS DeepDoc HTTP API Service
Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and
TSR (Table Structure Recognition) models via a unified HTTP API using
[LitServe](https://github.com/Lightning-AI/litserve) and OSS ONNX Runtime models.
## Quick Start
```bash
# Build
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .
# Run (CPU only; no GPU required)
docker run -p 9390:9390 deepdoc_oss:latest
# Or via docker compose
docker compose -f docker/docker-compose.yml up -d
```
The service listens on port **9390** by default. Pass `--port` to change it:
```bash
python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models
```
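
When the container or script has just been started, a client may want to wait for the liveness probe before sending images. A minimal sketch using only the standard library; the helper names `health_url` and `wait_until_healthy` are hypothetical, and the check relies only on an HTTP 200 status rather than on the exact response body:

```python
import time
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 9390) -> str:
    """Build the liveness-probe URL for a given host and port."""
    return f"http://{host}:{port}/health"


def wait_until_healthy(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet; retry
        time.sleep(interval_s)
    return False
```

For a server started with `--port 9000`, `wait_until_healthy(health_url(port=9000))` returns `True` once `/health` responds and `False` if the timeout expires first.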
## Endpoints
All prediction endpoints accept JPEG images via `multipart/form-data`. The form
field for file uploads is named `request`.
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Liveness probe. Returns `ok`. |
| `GET` | `/model` | Model metadata. Returns `{"model":"oss","version":"1.0"}`. |
| `POST` | `/predict/dla` | Document Layout Analysis. |
| `POST` | `/predict/tsr` | Table Structure Recognition. |
| `POST` | `/predict/ocr` | OCR — use form field `operator=det` for detection or `operator=rec` for recognition. |
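
For clients that cannot shell out to `curl`, the upload can be assembled by hand. The sketch below builds a `multipart/form-data` body with the image in the `request` field, using only the standard library. The function names `encode_multipart` and `build_request` are illustrative and not part of the service:

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], filename: str, image_bytes: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload.

    Plain text fields (e.g. operator=det) come first; the JPEG bytes go in
    the form field named `request`.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="request"; '
        f'filename="{filename}"\r\nContent-Type: image/jpeg\r\n\r\n'.encode()
        + image_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(base_url: str, path: str, image_bytes: bytes,
                  fields: Optional[Dict[str, str]] = None,
                  filename: str = "image.jpg") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for one prediction endpoint."""
    body, content_type = encode_multipart(fields or {}, filename, image_bytes)
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request; for OCR, supply `fields={"operator": "det"}` or `fields={"operator": "rec"}`.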
### `POST /predict/dla`
Analyzes a full page image and returns labelled layout regions.
**Request**
```bash
curl -X POST http://localhost:9390/predict/dla \
  -F "request=@page.jpg;type=image/jpeg"
```
**Response**
```json
{
  "bboxes": [
    [x0, y0, x1, y1, score, class_id],
    ...
  ]
}
```
| class_id | Label |
|:--------:|-------|
| 0 | title |
| 1 | text |
| 2 | reference |
| 3 | figure |
| 4 | figure caption |
| 5 | table |
| 6 | table caption |
| 8 | equation |
> The OSS model produces 8 distinct class IDs (0 to 6, and 8). IDs 7 and 9 are
> reserved for compatibility with the SaaS label scheme and are never produced
> by the OSS model.
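
On the client side, the `bboxes` rows can be turned into labelled regions with a lookup table built from the class IDs above. This is a sketch: `Region`, `parse_dla`, and the `min_score` filter are hypothetical conveniences, not part of the service:

```python
from typing import Any, Dict, List, NamedTuple

# Mirrors the class_id table above; IDs 7 and 9 are reserved and unmapped.
DLA_LABELS = {
    0: "title", 1: "text", 2: "reference", 3: "figure",
    4: "figure caption", 5: "table", 6: "table caption", 8: "equation",
}


class Region(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float
    score: float
    label: str


def parse_dla(response: Dict[str, Any], min_score: float = 0.0) -> List[Region]:
    """Convert a /predict/dla response into labelled regions.

    Rows scoring below `min_score` are dropped; IDs missing from the table
    are kept and labelled "unknown(<id>)" instead of raising.
    """
    regions = []
    for x0, y0, x1, y1, score, class_id in response["bboxes"]:
        if score < min_score:
            continue
        label = DLA_LABELS.get(int(class_id), f"unknown({int(class_id)})")
        regions.append(Region(x0, y0, x1, y1, score, label))
    return regions
```

Calling `parse_dla(resp, min_score=0.5)` on a decoded JSON response keeps only the regions scored 0.5 or higher.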
### `POST /predict/tsr`
Recognizes table structure from a cropped table image.
**Request**
```bash
curl -X POST http://localhost:9390/predict/tsr \
  -F "request=@table_crop.jpg;type=image/jpeg"
```
**Response**
```json
{
  "bboxes": [
    [x0, y0, x1, y1, score, class_id],
    ...
  ]
}
```
| class_id | Label |
|:--------:|-------|
| 0 | table |
| 1 | table column |
| 2 | table row |
| 3 | table column header |
| 4 | table projected row header |
| 5 | table spanning cell |
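
One common way to use row and column detections is to intersect them into a cell grid. The sketch below assumes each cell is the overlap of one `table row` box (class 2) and one `table column` box (class 1). It ignores headers and spanning cells, and `cell_grid` and `min_score` are hypothetical names:

```python
from typing import Any, Dict, List, Tuple

TSR_LABELS = {
    0: "table", 1: "table column", 2: "table row", 3: "table column header",
    4: "table projected row header", 5: "table spanning cell",
}

Cell = Tuple[float, float, float, float]


def cell_grid(response: Dict[str, Any], min_score: float = 0.5) -> List[List[Cell]]:
    """Build a rows x columns grid of cell boxes from a /predict/tsr response.

    Rows are ordered top to bottom and columns left to right. Each cell takes
    its x-range from the column box and its y-range from the row box.
    """
    boxes = [b for b in response["bboxes"] if b[4] >= min_score]
    rows = sorted((b for b in boxes if int(b[5]) == 2), key=lambda b: b[1])
    cols = sorted((b for b in boxes if int(b[5]) == 1), key=lambda b: b[0])
    return [[(c[0], r[1], c[2], r[3]) for c in cols] for r in rows]
```

The resulting cell boxes are in the coordinate system of the cropped table image, so they can be matched against OCR boxes detected on the same crop.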
### `POST /predict/ocr`
Runs in one of two modes, selected by the `operator` form field.
#### Detection (`operator=det`)
Returns quadrilateral bounding boxes for detected text regions.
```bash
curl -X POST "http://localhost:9390/predict/ocr" \
  -F "operator=det" \
  -F "request=@page.jpg;type=image/jpeg"
```
**Response** (array nested five levels deep down to each quadrilateral; each quadrilateral holds four `[x, y]` corner points):
```json
{
  "output": [
    [
      [
        [
          [[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
          ...
        ]
      ]
    ]
  ]
}
```
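
Because the quadrilaterals sit several levels deep, a client may prefer to flatten the structure before use. The sketch below walks the array recursively and yields anything shaped like four `[x, y]` pairs, so it does not hard-code the nesting depth. `is_quad`, `iter_quads`, and `quad_to_bbox` are hypothetical helper names:

```python
from typing import Any, Iterator, List, Tuple


def is_quad(node: Any) -> bool:
    """True if `node` looks like [[x0,y0],[x1,y1],[x2,y2],[x3,y3]]."""
    return (
        isinstance(node, list) and len(node) == 4
        and all(
            isinstance(p, list) and len(p) == 2
            and all(isinstance(v, (int, float)) for v in p)
            for p in node
        )
    )


def iter_quads(node: Any) -> Iterator[List[List[float]]]:
    """Yield every quadrilateral found anywhere inside a nested list."""
    if is_quad(node):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_quads(child)


def quad_to_bbox(quad: List[List[float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned (x0, y0, x1, y1) box enclosing a quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)
```

With a decoded response `resp`, `[quad_to_bbox(q) for q in iter_quads(resp["output"])]` gives one axis-aligned crop box per detected text region, which can then be cut out and sent to the recognition mode.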
#### Recognition (`operator=rec`)
Recognizes text within a cropped region.
```bash
curl -X POST "http://localhost:9390/predict/ocr" \
  -F "operator=rec" \
  -F "request=@char_crop.jpg;type=image/jpeg"
```
**Response** (4-level nested array):
```json
{
  "output": [
    [
      [
        ["recognized text", 1.0],
        ...
      ]
    ]
  ]
}
```
> Confidence is always `1.0` — the OSS recognition model does not return
> per-character confidence scores.
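
Since the confidence value carries no information, a client usually only needs the strings. A minimal sketch that collects every `[text, confidence]` pair regardless of nesting depth; `extract_texts` is a hypothetical helper, not part of the service:

```python
from typing import Any, Dict, List


def extract_texts(response: Dict[str, Any]) -> List[str]:
    """Collect recognized strings from a /predict/ocr (operator=rec) response."""
    texts: List[str] = []

    def walk(node: Any) -> None:
        # A leaf is a two-element list whose first item is the text.
        if isinstance(node, list) and len(node) == 2 and isinstance(node[0], str):
            texts.append(node[0])
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(response["output"])
    return texts
```

The strings come back in the order they appear in the response.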
## Error Responses
| Scenario | HTTP Status |
|----------|:-----------:|
| Missing `operator` field (OCR) | 400 |
| Invalid `operator` value | 400 |
| Empty or corrupt image | 400 |
| Image exceeds 4096×4096 | 400 |
| Internal inference error | 500 |
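
A client can avoid the size-related 400 by checking dimensions before uploading, and can use the status code to decide whether a retry makes sense. The sketch below assumes the limit applies to each side independently (width and height both at most 4096); all function names are hypothetical:

```python
from typing import Tuple

MAX_SIDE = 4096  # assumed per-side limit, taken from the table above


def fits_size_limit(width: int, height: int, max_side: int = MAX_SIDE) -> bool:
    """True if both sides are positive and within the limit."""
    return 0 < width <= max_side and 0 < height <= max_side


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Target size that preserves aspect ratio and respects the limit."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return max(1, width * max_side // longest), max(1, height * max_side // longest)


def classify_status(status: int) -> str:
    """Map an HTTP status to a coarse outcome for client-side handling."""
    if status == 200:
        return "ok"
    if status == 400:
        return "bad_request"   # fix the request; retrying unchanged will not help
    if status == 500:
        return "server_error"  # inference failed; a retry may or may not succeed
    return "unexpected"
```

`fit_within` only computes the target size; the actual resampling would be done with whatever image library the client already uses.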
## Models
All ONNX models are from the [InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc)
HuggingFace repository (Apache 2.0 license):
| File | Size | Purpose |
|------|------|---------|
| `layout.onnx` | 75.7 MB | DLA (YOLOv10) |
| `det.onnx` | 4.7 MB | OCR text detection (PP-OCRv4) |
| `rec.onnx` | 10.8 MB | OCR text recognition (PP-OCRv4) |
| `tsr.onnx` | 12.2 MB | TSR (PaddleDetection) |
| `ocr.res` | 26 KB | OCR character dictionary |
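
Before starting the server, it can help to confirm that the model directory is complete. The sketch below assumes all five files sit directly inside the directory passed as `--model-dir`; that layout and the helper name `missing_model_files` are assumptions, not confirmed behavior of the server:

```python
from pathlib import Path
from typing import List, Union

# File names taken from the table above.
REQUIRED_MODEL_FILES = ("layout.onnx", "det.onnx", "rec.onnx", "tsr.onnx", "ocr.res")


def missing_model_files(model_dir: Union[str, Path]) -> List[str]:
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]
```

An empty list means every expected file was found; otherwise the list names what still needs to be downloaded.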
## Architecture
```
deepdoc/server/
├── deepdoc_server.py       # LitServe entry point
├── endpoints/              # LitAPI endpoints (HTTP layer)
│   ├── dla_endpoint.py
│   ├── tsr_endpoint.py
│   └── ocr_endpoint.py
└── adapters/               # Model wrappers (inference + format conversion)
    ├── dla_adapter.py
    ├── tsr_adapter.py
    └── ocr_adapter.py
```
Endpoints → Adapters → `deepdoc/vision/` (reused OSS model classes) → ONNX Runtime.
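
The layering can be illustrated with a dependency-free sketch. It does not use LitServe or the real classes: `DlaAdapter`, `DlaEndpoint`, the `box`/`score`/`class_id` keys of the raw detections, and the `(status, payload)` return shape are all hypothetical stand-ins chosen to show the division of responsibilities:

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model signature: image bytes in, raw detection dicts out.
Model = Callable[[bytes], Sequence[Dict[str, Any]]]


class DlaAdapter:
    """Adapter layer: runs the model and converts its output format."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def predict(self, image_bytes: bytes) -> Dict[str, List[List[Any]]]:
        rows = [
            [*det["box"], det["score"], det["class_id"]]
            for det in self._model(image_bytes)
        ]
        return {"bboxes": rows}


class DlaEndpoint:
    """HTTP layer: validates input and maps failures to status codes."""

    def __init__(self, adapter: DlaAdapter) -> None:
        self._adapter = adapter

    def handle(self, image_bytes: bytes) -> Tuple[int, Optional[Dict[str, Any]]]:
        if not image_bytes:
            return 400, None  # empty upload
        try:
            return 200, self._adapter.predict(image_bytes)
        except Exception:
            return 500, None  # internal inference error
```

Keeping format conversion in the adapter means the HTTP layer never needs to know how a particular model represents its detections.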