mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
main
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bf41d35729 |
Go: implement PaddleOCR provider and implement ASR for CoHere (#14954)
### What problem does this PR solve?
This PR implements OCR for Baidu and Mistral, adds a PaddleOCR
provider, and implements ASR for Cohere.
**Verified examples from the CLI:**
```
RAGFlow(user)> ocr with 'mistral-ocr-2512@test@mistral' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
RAGFlow(user)> ocr with 'paddleocr-vl-0.9b@test@baidu' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# PaddleOCR
RAGFlow(user)> ocr with 'PaddleOCR-VL-1.5@test@paddleocr' file './internal/test.pdf'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke
Nando Metzger
Photogra
Anton Obukhov
Rodrigo Caye Daudt
netry and Remote Sensing,
Shengyu Huang
Konrad Schindler
ETH Zürich
<div style="text-align: c... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# Cohere
RAGFlow(user)> asr with 'cohere-transcribe-03-2026@test@cohere' audio './internal/test.wav' param '{"language": "en"}'
+-----------------------------------------------------------------------------------------------------------------------+
| text |
+-----------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired. |
+-----------------------------------------------------------------------------------------------------------------------+
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
|
||
|
|
c2863173b0 |
Go: implement TTS, ASR for Siliconflow and TTs for StepFun (#14944)
### What problem does this PR solve?
This PR implements TTS and ASR for SiliconFlow and TTS for StepFun.
**The following functionalities are now supported:**
**SiliconFlow:**
- [x] Text To Speech
- [x] Audio To Text
- [x] Stream Audio To Text
**StepFun:**
- [x] Audio To Text
- [x] Stream Audio To Text
**Verified examples from the CLI:**
```plaintext
# SiliconFlow
RAGFlow(user)> tts with 'FunAudioLLM/CosyVoice2-0.5B@test@Siliconflow' text 'hello? show yourself' play format 'wav' param '{"voice": "fnlp/MOSS-TTSD-v0.5:alex"}'
SUCCESS
RAGFlow(user)> asr with 'FunAudioLLM/SenseVoiceSmall@test@siliconflow' audio './internal/test.wav' param ''
+----------------------------------------------------------------------------------------------------------------------+
| text |
+----------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. |
+----------------------------------------------------------------------------------------------------------------------+
RAGFlow(user)> stream asr with 'FunAudioLLM/SenseVoiceSmall@test@siliconflow' audio './internal/test.wav' param ''
+----------------------------------------------------------------------------------------------------------------------+
| text |
+----------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. |
+----------------------------------------------------------------------------------------------------------------------+
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
|
||
|
|
558ea51a0f |
Go: implement provider: StepFun (#14815)
### What problem does this PR solve?
Add a Go driver for StepFun (阶跃星辰), one of the unchecked providers on the umbrella tracking issue #14736.
Until this PR, a tenant who configured `stepfun` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. Chat, list models, and check connection all returned `"not implemented"` instead of reaching the StepFun API.
The Python side has had StepFun registered in `rag/llm/__init__.py` as a `SupportedLiteLLMProvider` with base URL `https://api.stepfun.com/v1`, plus `StepFunCV` for vision and `StepFunSeq2txt` for ASR, but no Go path. StepFun's chat API is OpenAI-compatible, so the implementation pattern is the same as the merged Moonshot driver (#14433) and OpenAI driver (#14605).
### What this PR includes
- New file `internal/entity/models/stepfun.go` with a `StepFunModel` that implements the `ModelDriver` interface.
- `factory.go`: route the `"stepfun"` provider name to `NewStepFunModel`.
- New `conf/models/stepfun.json` with the public StepFun chat models (step-2-16k, step-1 family in 8k/32k/128k/256k context lengths, step-1-flash, and the step-1v / step-1o vision models) and `url_suffix` entries for `chat` and `models`.
### How the driver works
- StepFun exposes the OpenAI-compatible API at `https://api.stepfun.com/v1`.
- `ChatWithMessages` and `ChatStreamlyWithSender` post to `/chat/completions` in the same shape as the merged moonshot, openrouter, and openai drivers.
- `ListModels` and `CheckConnection` call `/models` to list available ids and confirm the API key works.
- `Embed` is left as `"not implemented"`. StepFun has not advertised a public embeddings endpoint in the API reference linked from the umbrella issue (`https://platform.stepfun.com/docs/en/api-reference/chat/chat-completion-create` is the chat endpoint), so any real implementation belongs in a separate follow-up only after the endpoint is verified.
- `Rerank` and `Balance` return `"no such method"` because StepFun does not expose either.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### How was this tested?
- `go build ./internal/entity/models/...` returns exit 0 with no errors on go 1.25 (the `go.mod` minimum).
- Method set of `StepFunModel` matches the `ModelDriver` interface: `NewInstance`, `Name`, `ChatWithMessages`, `ChatStreamlyWithSender`, `Embed`, `Rerank`, `ListModels`, `Balance`, `CheckConnection`.
- Pattern parity with the merged moonshot (#14433), openai (#14605), openrouter (#14652), and xai (#14550) drivers.

Closes #14814
Tracking: #14736 |
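The routing and request shape described for the StepFun driver can be illustrated with a minimal, self-contained sketch. This is not the code from `stepfun.go` or `factory.go`: the `chatDriver` interface, `dummyDriver`, `stepFunDriver`, `newDriver`, and `NewChatRequest` are hypothetical names invented for the example, and the `Authorization: Bearer` header is an assumption based on the usual OpenAI-compatible convention. Only the provider name `"stepfun"`, the base URL, the `/chat/completions` suffix, the `step-2-16k` model id, and the `"not implemented"` fallback come from the PR description.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// chatDriver is a hypothetical, trimmed-down stand-in for the real
// ModelDriver interface; it only covers building a chat request.
type chatDriver interface {
	Name() string
	NewChatRequest(model, prompt string) (*http.Request, error)
}

// dummyDriver mimics the fallback behaviour described in the PR:
// unknown providers answer every call with "not implemented".
type dummyDriver struct{}

func (dummyDriver) Name() string { return "dummy" }

func (dummyDriver) NewChatRequest(string, string) (*http.Request, error) {
	return nil, errors.New("not implemented")
}

// stepFunDriver holds the two values an OpenAI-compatible call needs.
type stepFunDriver struct {
	baseURL string
	apiKey  string
}

func (stepFunDriver) Name() string { return "stepfun" }

// chatMessage and chatPayload follow the OpenAI-style chat request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatPayload struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// NewChatRequest builds (but does not send) a POST to <baseURL>/chat/completions.
func (d stepFunDriver) NewChatRequest(model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatPayload{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(d.baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Assumption: bearer-token auth, as in other OpenAI-compatible APIs.
	req.Header.Set("Authorization", "Bearer "+d.apiKey)
	return req, nil
}

// newDriver routes a provider name to a driver; anything unrecognised
// falls through to the dummy driver, as the default branch did for
// "stepfun" before this PR.
func newDriver(provider, apiKey string) chatDriver {
	switch provider {
	case "stepfun":
		return stepFunDriver{baseURL: "https://api.stepfun.com/v1", apiKey: apiKey}
	default:
		return dummyDriver{}
	}
}

func main() {
	req, err := newDriver("stepfun", "sk-example").NewChatRequest("step-2-16k", "hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Building the request without sending it keeps the sketch runnable offline. A real driver would additionally execute the request, decode the response, and handle streaming.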