mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
205 lines
4.4 KiB
Markdown
205 lines
4.4 KiB
Markdown
|
|
# OSS DeepDoc HTTP API Service
|
|||
|
|
|
|||
|
|
Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and
|
|||
|
|
TSR (Table Structure Recognition) models via a unified HTTP API using
|
|||
|
|
[LitServe](https://github.com/Lightning-AI/litserve) and OSS ONNX Runtime models.
|
|||
|
|
|
|||
|
|
## Quick Start
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# Build
|
|||
|
|
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .
|
|||
|
|
|
|||
|
|
# Run (CPU only; no GPU required)
|
|||
|
|
docker run -p 9390:9390 deepdoc_oss:latest
|
|||
|
|
|
|||
|
|
# Or via docker compose
|
|||
|
|
docker compose -f docker/docker-compose.yml up -d
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The service listens on port **9390** by default. Pass `--port` to change it:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Endpoints
|
|||
|
|
|
|||
|
|
All prediction endpoints accept JPEG images via `multipart/form-data`. The form
|
|||
|
|
field for file uploads is named `request`.
|
|||
|
|
|
|||
|
|
| Method | Path | Description |
|
|||
|
|
|--------|------|-------------|
|
|||
|
|
| `GET` | `/health` | Liveness probe. Returns `ok`. |
|
|||
|
|
| `GET` | `/model` | Model metadata. Returns `{"model":"oss","version":"1.0"}`. |
|
|||
|
|
| `POST` | `/predict/dla` | Document Layout Analysis. |
|
|||
|
|
| `POST` | `/predict/tsr` | Table Structure Recognition. |
|
|||
|
|
| `POST` | `/predict/ocr` | OCR — use form field `operator=det` for detection or `operator=rec` for recognition. |
|
|||
|
|
|
|||
|
|
### `POST /predict/dla`
|
|||
|
|
|
|||
|
|
Analyzes a full page image and returns labelled layout regions.
|
|||
|
|
|
|||
|
|
**Request**
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
curl -X POST http://localhost:9390/predict/dla \
|
|||
|
|
-F "request=@page.jpg;type=image/jpeg"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Response**
|
|||
|
|
|
|||
|
|
```json
|
|||
|
|
{
|
|||
|
|
"bboxes": [
|
|||
|
|
[x0, y0, x1, y1, score, class_id],
|
|||
|
|
...
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
| class_id | Label |
|
|||
|
|
|:--------:|-------|
|
|||
|
|
| 0 | title |
|
|||
|
|
| 1 | text |
|
|||
|
|
| 2 | reference |
|
|||
|
|
| 3 | figure |
|
|||
|
|
| 4 | figure caption |
|
|||
|
|
| 5 | table |
|
|||
|
|
| 6 | table caption |
|
|||
|
|
| 8 | equation |
|
|||
|
|
|
|||
|
|
> The OSS model uses 8 unique class IDs. IDs 7 and 9 are reserved for
|
|||
|
|
> compatibility with the SaaS label scheme but are never produced by the
|
|||
|
|
> OSS model.
|
|||
|
|
|
|||
|
|
### `POST /predict/tsr`
|
|||
|
|
|
|||
|
|
Recognizes table structure from a cropped table image.
|
|||
|
|
|
|||
|
|
**Request**
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
curl -X POST http://localhost:9390/predict/tsr \
|
|||
|
|
-F "request=@table_crop.jpg;type=image/jpeg"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Response**
|
|||
|
|
|
|||
|
|
```json
|
|||
|
|
{
|
|||
|
|
"bboxes": [
|
|||
|
|
[x0, y0, x1, y1, score, class_id],
|
|||
|
|
...
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
| class_id | Label |
|
|||
|
|
|:--------:|-------|
|
|||
|
|
| 0 | table |
|
|||
|
|
| 1 | table column |
|
|||
|
|
| 2 | table row |
|
|||
|
|
| 3 | table column header |
|
|||
|
|
| 4 | table projected row header |
|
|||
|
|
| 5 | table spanning cell |
|
|||
|
|
|
|||
|
|
### `POST /predict/ocr`
|
|||
|
|
|
|||
|
|
Two modes controlled by the `operator` form field.
|
|||
|
|
|
|||
|
|
#### Detection (`operator=det`)
|
|||
|
|
|
|||
|
|
Returns quadrilateral bounding boxes for detected text regions.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
curl -X POST "http://localhost:9390/predict/ocr" \
|
|||
|
|
-F "operator=det" \
|
|||
|
|
-F "request=@page.jpg;type=image/jpeg"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Response** (5-level nested array):
|
|||
|
|
|
|||
|
|
```json
|
|||
|
|
{
|
|||
|
|
"output": [
|
|||
|
|
[
|
|||
|
|
[
|
|||
|
|
[
|
|||
|
|
[[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
|
|||
|
|
...
|
|||
|
|
]
|
|||
|
|
]
|
|||
|
|
]
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### Recognition (`operator=rec`)
|
|||
|
|
|
|||
|
|
Recognizes text within a cropped region.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
curl -X POST "http://localhost:9390/predict/ocr" \
|
|||
|
|
-F "operator=rec" \
|
|||
|
|
-F "request=@char_crop.jpg;type=image/jpeg"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Response** (4-level nested array):
|
|||
|
|
|
|||
|
|
```json
|
|||
|
|
{
|
|||
|
|
"output": [
|
|||
|
|
[
|
|||
|
|
[
|
|||
|
|
["recognized text", 1.0],
|
|||
|
|
...
|
|||
|
|
]
|
|||
|
|
]
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
> Confidence is always `1.0` — the OSS recognition model does not return
|
|||
|
|
> per-character confidence scores.
|
|||
|
|
|
|||
|
|
## Error Responses
|
|||
|
|
|
|||
|
|
| Scenario | HTTP Status |
|
|||
|
|
|----------|:-----------:|
|
|||
|
|
| Missing `operator` field (OCR) | 400 |
|
|||
|
|
| Invalid `operator` value | 400 |
|
|||
|
|
| Empty or corrupt image | 400 |
|
|||
|
|
| Image exceeds 4096×4096 | 400 |
|
|||
|
|
| Internal inference error | 500 |
|
|||
|
|
|
|||
|
|
## Models
|
|||
|
|
|
|||
|
|
All ONNX models are from the [InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc)
|
|||
|
|
HuggingFace repository (Apache 2.0 license):
|
|||
|
|
|
|||
|
|
| File | Size | Purpose |
|
|||
|
|
|------|------|---------|
|
|||
|
|
| `layout.onnx` | 75.7 MB | DLA (YOLOv10) |
|
|||
|
|
| `det.onnx` | 4.7 MB | OCR text detection (PP-OCRv4) |
|
|||
|
|
| `rec.onnx` | 10.8 MB | OCR text recognition (PP-OCRv4) |
|
|||
|
|
| `tsr.onnx` | 12.2 MB | TSR (PaddleDetection) |
|
|||
|
|
| `ocr.res` | 26 KB | OCR character dictionary |
|
|||
|
|
|
|||
|
|
## Architecture
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
deepdoc/server/
|
|||
|
|
├── deepdoc_server.py # LitServe entry point
|
|||
|
|
├── endpoints/ # LitAPI endpoints (HTTP layer)
|
|||
|
|
│ ├── dla_endpoint.py
|
|||
|
|
│ ├── tsr_endpoint.py
|
|||
|
|
│ └── ocr_endpoint.py
|
|||
|
|
└── adapters/ # Model wrappers (inference + format conversion)
|
|||
|
|
├── dla_adapter.py
|
|||
|
|
├── tsr_adapter.py
|
|||
|
|
└── ocr_adapter.py
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Endpoints → Adapters → `deepdoc/vision/` (reused OSS model classes) → ONNX Runtime.
|