mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
### What problem does this PR solve? Http API based on onnx model. pdf_parser.py to golang ### Type of change - [x] Refactoring
205 lines
4.4 KiB
Markdown
205 lines
4.4 KiB
Markdown
# OSS DeepDoc HTTP API Service
|
||
|
||
Serves DLA (Document Layout Analysis), OCR (Optical Character Recognition), and
|
||
TSR (Table Structure Recognition) models via a unified HTTP API using
|
||
[LitServe](https://github.com/Lightning-AI/litserve) and OSS ONNX Runtime models.
|
||
|
||
## Quick Start
|
||
|
||
```bash
|
||
# Build
|
||
docker build -f Dockerfile_deepdoc_oss -t deepdoc_oss:latest .
|
||
|
||
# Run (CPU only; no GPU required)
|
||
docker run -p 9390:9390 deepdoc_oss:latest
|
||
|
||
# Or via docker compose
|
||
docker compose -f docker/docker-compose.yml up -d
|
||
```
|
||
|
||
The service listens on port **9390** by default. Pass `--port` to change it:
|
||
|
||
```bash
|
||
python deepdoc/server/deepdoc_server.py --port 9000 --model-dir /path/to/models
|
||
```
|
||
|
||
## Endpoints
|
||
|
||
All prediction endpoints accept JPEG images via `multipart/form-data`. The form
|
||
field for file uploads is named `request`.
|
||
|
||
| Method | Path | Description |
|
||
|--------|------|-------------|
|
||
| `GET` | `/health` | Liveness probe. Returns `ok`. |
|
||
| `GET` | `/model` | Model metadata. Returns `{"model":"oss","version":"1.0"}`. |
|
||
| `POST` | `/predict/dla` | Document Layout Analysis. |
|
||
| `POST` | `/predict/tsr` | Table Structure Recognition. |
|
||
| `POST` | `/predict/ocr` | OCR — use form field `operator=det` for detection or `operator=rec` for recognition. |
|
||
|
||
### `POST /predict/dla`
|
||
|
||
Analyzes a full page image and returns labelled layout regions.
|
||
|
||
**Request**
|
||
|
||
```
|
||
curl -X POST http://localhost:9390/predict/dla \
|
||
-F "request=@page.jpg;type=image/jpeg"
|
||
```
|
||
|
||
**Response**
|
||
|
||
```json
|
||
{
|
||
"bboxes": [
|
||
[x0, y0, x1, y1, score, class_id],
|
||
...
|
||
]
|
||
}
|
||
```
|
||
|
||
| class_id | Label |
|
||
|:--------:|-------|
|
||
| 0 | title |
|
||
| 1 | text |
|
||
| 2 | reference |
|
||
| 3 | figure |
|
||
| 4 | figure caption |
|
||
| 5 | table |
|
||
| 6 | table caption |
|
||
| 8 | equation |
|
||
|
||
> The OSS model uses 8 unique class IDs. IDs 7 and 9 are reserved for
|
||
> compatibility with the SaaS label scheme but are never produced by the
|
||
> OSS model.
|
||
|
||
### `POST /predict/tsr`
|
||
|
||
Recognizes table structure from a cropped table image.
|
||
|
||
**Request**
|
||
|
||
```
|
||
curl -X POST http://localhost:9390/predict/tsr \
|
||
-F "request=@table_crop.jpg;type=image/jpeg"
|
||
```
|
||
|
||
**Response**
|
||
|
||
```json
|
||
{
|
||
"bboxes": [
|
||
[x0, y0, x1, y1, score, class_id],
|
||
...
|
||
]
|
||
}
|
||
```
|
||
|
||
| class_id | Label |
|
||
|:--------:|-------|
|
||
| 0 | table |
|
||
| 1 | table column |
|
||
| 2 | table row |
|
||
| 3 | table column header |
|
||
| 4 | table projected row header |
|
||
| 5 | table spanning cell |
|
||
|
||
### `POST /predict/ocr`
|
||
|
||
Two modes controlled by the `operator` form field.
|
||
|
||
#### Detection (`operator=det`)
|
||
|
||
Returns quadrilateral bounding boxes for detected text regions.
|
||
|
||
```
|
||
curl -X POST "http://localhost:9390/predict/ocr" \
|
||
-F "operator=det" \
|
||
-F "request=@page.jpg;type=image/jpeg"
|
||
```
|
||
|
||
**Response** (5-level nested array):
|
||
|
||
```json
|
||
{
|
||
"output": [
|
||
[
|
||
[
|
||
[
|
||
[[x0,y0],[x1,y1],[x2,y2],[x3,y3]],
|
||
...
|
||
]
|
||
]
|
||
]
|
||
]
|
||
}
|
||
```
|
||
|
||
#### Recognition (`operator=rec`)
|
||
|
||
Recognizes text within a cropped region.
|
||
|
||
```
|
||
curl -X POST "http://localhost:9390/predict/ocr" \
|
||
-F "operator=rec" \
|
||
-F "request=@char_crop.jpg;type=image/jpeg"
|
||
```
|
||
|
||
**Response** (4-level nested array):
|
||
|
||
```json
|
||
{
|
||
"output": [
|
||
[
|
||
[
|
||
["recognized text", 1.0],
|
||
...
|
||
]
|
||
]
|
||
]
|
||
}
|
||
```
|
||
|
||
> Confidence is always `1.0` — the OSS recognition model does not return
|
||
> per-character confidence scores.
|
||
|
||
## Error Responses
|
||
|
||
| Scenario | HTTP Status |
|
||
|----------|:-----------:|
|
||
| Missing `operator` field (OCR) | 400 |
|
||
| Invalid `operator` value | 400 |
|
||
| Empty or corrupt image | 400 |
|
||
| Image exceeds 4096×4096 | 400 |
|
||
| Internal inference error | 500 |
|
||
|
||
## Models
|
||
|
||
All ONNX models are from the [InfiniFlow/deepdoc](https://huggingface.co/InfiniFlow/deepdoc)
|
||
HuggingFace repository (Apache 2.0 license):
|
||
|
||
| File | Size | Purpose |
|
||
|------|------|---------|
|
||
| `layout.onnx` | 75.7 MB | DLA (YOLOv10) |
|
||
| `det.onnx` | 4.7 MB | OCR text detection (PP-OCRv4) |
|
||
| `rec.onnx` | 10.8 MB | OCR text recognition (PP-OCRv4) |
|
||
| `tsr.onnx` | 12.2 MB | TSR (PaddleDetection) |
|
||
| `ocr.res` | 26 KB | OCR character dictionary |
|
||
|
||
## Architecture
|
||
|
||
```
|
||
deepdoc/server/
|
||
├── deepdoc_server.py # LitServe entry point
|
||
├── endpoints/ # LitAPI endpoints (HTTP layer)
|
||
│ ├── dla_endpoint.py
|
||
│ ├── tsr_endpoint.py
|
||
│ └── ocr_endpoint.py
|
||
└── adapters/ # Model wrappers (inference + format conversion)
|
||
├── dla_adapter.py
|
||
├── tsr_adapter.py
|
||
└── ocr_adapter.py
|
||
```
|
||
|
||
Endpoints → Adapters → `deepdoc/vision/` (reused OSS model classes) → ONNX Runtime.
|