ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-02 16:55:42 +08:00

Author	SHA1	Message	Date
JPette1783	daa3811165	feat(models): add shared HTTP client, SSE parser, and stub helpers for Go model drivers (#15821 ) ### What problem does this PR solve? The Go model-driver layer () has ~38,700 lines across 109 files. Roughly 74% of that is boilerplate duplicated into every driver: identical HTTP client setup, the same 65-line SSE scanner loop, and 10-11 one-line "not supported" stub methods per driver. Any fix must be manually propagated to every file. Closes #15820. This PR establishes the three shared utility files that form the foundation for incremental driver migration: --- ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Haruko386 <tryeverypossible@163.com>	2026-06-11 19:20:12 +08:00
Haruko386	84edf539e7	Go: Refactor list-models func (#15900 ) ### What problem does this PR solve? As title Issue: #15853 ### Type of change - [x] Refactoring	2026-06-11 13:32:50 +08:00
Jin Hai	719ce15c95	Go CLI: update list supported models (#15845) ### What problem does this PR solve? The `list supported models` command now shows more information for each model: dimension, max_tokens, model types, and thinking options.
```
RAGFlow(api/default)> list supported models from 'gitee' 'test';
+-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+
| dimension | max_tokens | model_types | name                                                     | thinking                                    |
+-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+
|           |            |             | Wan2.7                                                   |                                             |
|           |            |             | HappyHorse-1.0                                           |                                             |
|           |            |             | Qwen3.6-27B@Qwen                                         |                                             |
|           |            |             | Qwen3.6-35B-A3B@Qwen                                     |                                             |
|           | 1048576    | [chat]      | DeepSeek-V4-Flash@deepseek-ai                            | map[clear_thinking:true default_value:true] |
|           | 1048576    | [chat]      | DeepSeek-V4-Pro@deepseek-ai                              | map[clear_thinking:true default_value:true] |
+-----------+------------+-------------+----------------------------------------------------------+---------------------------------------------+
```
### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-09 19:01:00 +08:00
Haruko386	baeb0c0431	Refactor[Go Model Provider]: refactor baseURL and modelConfig (#15627 ) ### What problem does this PR solve? As Title ### Type of change - [x] Refactoring	2026-06-04 17:50:22 +08:00
Jin Hai	d736f358ba	Go: refactor model provider (#15568 ) ### What problem does this PR solve? 1. Add license announcement 2. Add sanity check on API config 3. Add base class: BaseModel 4. Add GetBaseURL ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-03 16:33:58 +08:00
Jin Hai	dbebc66ba8	Go: refactor provider code (#15564 ) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-06-03 14:09:07 +08:00
Dexterity	2819d0ea24	fix(go-models): use per call context timeouts so long streaming responses are not truncated (#15380 ) ### What problem does this PR solve? Closes #15379 Around 29 Go model providers in `internal/entity/models/` share an `http.Client` configured with `Timeout: 120 * time.Second`, and reuse that same client for `ChatStreamlyWithSender`. Go's `http.Client.Timeout` is a hard ceiling on the whole request that also covers reading the response body, so it behaves as a wall clock on streaming. Any streamed chat response that lasts longer than 120 seconds gets cut off in the middle with a timeout error. Long generations, reasoning model outputs, and slow or overloaded upstreams are the common victims. The providers that already behave correctly (`groq`, `mistral`, `voyage`, `anthropic`) set no client `Timeout` and instead wrap each request in a `context.WithTimeout`. This change converges the affected providers onto that same pattern. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-06-02 15:27:26 +08:00
Dexterity	04aa8d04e8	fix(go-models): raise SSE scanner buffer so large stream chunks are not dropped (#15382) ### Summary Closes #15381 Every provider in `internal/entity/models/` reads its streaming response with `bufio.NewScanner(resp.Body)` and iterates over `scanner.Scan()`. The default `bufio.Scanner` maximum token size is 64KB, so when an upstream sends a single SSE `data:` line larger than 64KB (long content deltas, large tool or function call argument blobs, bundled `reasoning_content`, or providers that emit a whole message in one event) `scanner.Scan()` returns `false` and `scanner.Err()` returns `bufio.ErrTooLong`. Streaming chat then ends with an error partway through the response. This change adds `scanner.Buffer(make([]byte, 64*1024), 1024*1024)` immediately after every SSE scanner that was still bare, raising the cap to 1MB. 1MB is the value already used for streaming chat in `openai.go`, `modelscope.go`, `groq.go`, `mistral.go`, `xai.go` and the other already patched providers (the 8MB cap in the repo is reserved for TTS and embedding paths), so this simply converges the remaining providers onto the established pattern. Nothing else changes: line parsing, `data:` prefix handling, `[DONE]` detection, JSON unmarshalling, error handling, and the existing `scanner.Err()` checks all stay the same. Providers covered (23 scanners across 22 files): 302ai, aliyun, baichuan, baidu, cohere, deepinfra, deepseek, gitee, huggingface, lmstudio, minimax (the chat scanner, whose TTS scanner was already bumped), moonshot, nvidia, ollama, openrouter, orcarouter, paddleocr, siliconflow, tokenhub, vllm, volcengine, xunfei, zhipu-ai. `jiekouai.go` is excluded because it is covered by the in flight #15337. 
A table driven regression test (`sse_scanner_buffer_test.go`) streams a single 128KB `data:` content delta followed by `data: [DONE]` through an `httptest` server and asserts that `ChatStreamlyWithSender` delivers the full content with no error across a representative subset of providers. Without the buffer fix the test fails with `bufio.Scanner: token too long`. This PR also removes three duplicate declarations of the package level `roundTripperFunc` test helper that several recently merged provider PRs each added independently, which had left the `internal/entity/models` test package unable to compile. The helper now lives in a single place and is shared. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-29 19:34:00 +08:00
Haruko386	bf41d35729	Go: implement PaddleOCR provider and implement ASR for CoHere (#14954) ### What problem does this PR solve? This PR implements OCR for Baidu and Mistral, adds the PaddleOCR provider, and implements ASR for Cohere. Verified examples from the CLI: ``` RAGFlow(user)> ocr with 'mistral-ocr-2512@test@mistral' file './internal/text.jpg' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... 
\| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ RAGFlow(user)> ocr with 'paddleocr-vl-0.9b@test@baidu' file './internal/text.jpg' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... 
\| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ # PaddleOCR RAGFlow(user)> ocr with 'PaddleOCR-VL-1.5@test@paddleocr' file './internal/test.pdf' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation Bingxin Ke Nando Metzger Photogra Anton Obukhov Rodrigo Caye Daudt netry and Remote Sensing, Shengyu Huang Konrad Schindler ETH Zürich <div style="text-align: c... \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ # Cohere RAGFlow(user)> asr with 'cohere-transcribe-03-2026@test@cohere' audio './internal/test.wav' param '{"language": "en"}' +-----------------------------------------------------------------------------------------------------------------------+ \| text \| +-----------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired. 
\| +-----------------------------------------------------------------------------------------------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-15 18:41:43 +08:00
Jin Hai	3a5df08c76	Go: add file parse command (#14892 ) ### What problem does this PR solve? ``` RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png' +----------------------------------------------------------+ \| text \| +----------------------------------------------------------+ \| 生活不是等待风暴过去，而是学会在雨中翩翩起舞。 ——佚名 \| +----------------------------------------------------------+ RAGFlow(user)> list 'test@gitee' tasks; +---------+----------------------------------+ \| status \| task_id \| +---------+----------------------------------+ \| success \| C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5 \| +---------+----------------------------------+ RAGFlow(user)> show 'test@gitee' task 'C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5'; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| content \| index \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| # PDF 1: Purpose of RAGFlow RAGFlow is an open source Retrieval-Augmented Generation (RAG) engine designed to turn raw documents into reliable context for large language models.Its purpose is to make it practical to build an Al assistant that can ans... \| 1 \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-15 12:29:52 +08:00
Jin Hai	b18640d228	Go: fix OCR command (#14891 ) ### What problem does this PR solve? RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png' +----------------------------------------------------------+ \| text \| +----------------------------------------------------------+ \| 生活不是等待风暴过去，而是学会在雨中翩翩起舞。 ——佚名 \| +----------------------------------------------------------+ ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-13 17:29:53 +08:00
Jin Hai	d08bf02d9b	Go: add ASR, TTS, OCR command (#14836 ) ### What problem does this PR solve? ``` RAGFlow(user)> asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> stream asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> stream tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> ocr with 'glm-ocr@test@zhipu-ai' file './test.log'; CLI error: zhipu, no such method ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 17:17:44 +08:00
Renzo	39ee2fb120	Go: implement Rerank in NVIDIA driver (#14778 ) ## Summary - Replaces the `"no such method"` stub on `NvidiaModel.Rerank` (`internal/entity/models/nvidia.go`) with a real implementation against NVIDIA NIM's `/ranking` endpoint. - Mirrors the existing Python `NvidiaRerank` class at `rag/llm/rerank_model.py:149-190` for behavior parity: same `passages`/`query.text`/`logit` payload shape; `top_n` set to `len(documents)` so every input gets a score returned in original order (the issue body's spec omitted `top_n`, which would cause silent data loss). - Adds the `"rerank": "ranking"` URL suffix and two NIM rerank model entries (`nvidia/nv-rerankqa-mistral-4b-v3`, `nvidia/llama-3.2-nv-rerankqa-1b-v2`) to `conf/models/nvidia.json` so the picker exposes them. - Follows the same shape as the recently merged Aliyun (#14676), Gitee (#14656), and ZhipuAI (#14608) Rerank implementations: lowercase per-driver request/response types, conversion to the project-wide `RerankResponse{Data: []RerankResult}`, per-call `context.WithTimeout` of 30s. Closes #14720 ## Test plan - [x] `gofmt -l internal/entity/models/nvidia.go` — clean - [x] `go vet ./internal/entity/models/...` — no new errors introduced (the two pre-existing vet errors in `baidu.go:642` and `openrouter.go:566` are unrelated to this PR) - [x] `go build ./internal/entity/models/...` — succeeds - [x] `python3 -c "import json; json.load(open('conf/models/nvidia.json'))"` — JSON valid - [ ] Live smoke test against NVIDIA NIM with a real API key (requires reviewer with NIM credentials) ## Notes for reviewers - The issue body suggested omitting `top_n`. The Python reference includes it (`top_n: len(texts)`), and without it NVIDIA returns only the default top-K rankings rather than scores for every input. This PR follows the Python. - The URL host is `integrate.api.nvidia.com` (kept consistent with the existing chat/embeddings BaseURL in `nvidia.go`), not the legacy `ai.api.nvidia.com` host the Python uses. 
NIM's unified endpoint accepts the model names as-is, so no per-model URL transform is needed.	2026-05-11 17:21:16 +08:00
Jin Hai	c55e23e7e2	Go: refactor embedding interface (#14757 ) ### What problem does this PR solve? Provide embedding index according to the input text ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 14:45:30 +08:00
BitToby	4b96362092	Go: implement Encode (embeddings) in NVIDIA driver (#14700 ) ### What problem does this PR solve? The NVIDIA Go driver in `internal/entity/models/nvidia.go` shipped with a stub `Encode` method that returned `no such method`. `conf/models/nvidia.json` already lists `nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1` as an embedding model, but the conf had no `embedding` URL suffix, so the picker had nothing wired even if `Encode` worked. A tenant who wanted to use NVIDIA NIM for chat (already working) and embeddings from a single provider could not, even though the upstream endpoint is public at `https://integrate.api.nvidia.com/v1/embeddings` and uses an OpenAI-compatible request body extended with the NVIDIA-specific `input_type` and `truncate` fields. Several other Go drivers already implement `Encode` (siliconflow, zhipu-ai, aliyun), so the interface and the pattern are well-established. This PR fills the gap. ### What this PR includes * `conf/models/nvidia.json`: declare the `embedding` URL suffix alongside the existing `chat` and `models` entries. The embedding model entry was already present, so no model addition is needed. * `internal/entity/models/nvidia.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape NVIDIA NIM returns. No factory change. No interface change. ### How the driver works * Validates `apiConfig` and the API key, validates the model name, resolves the region with a default fallback (matching the pattern the merged `ListModels` and `CheckConnection` paths in this driver already use), and builds the URL from `BaseURL[region] + URLSuffix.Embedding`. * Sends all input texts in one request as the `input` array, with the NVIDIA-specific `input_type: "query"`, `encoding_format: "float"`, and `truncate: "END"` fields, mirroring the Python `NvidiaEmbed` reference. 
* Parses `data[].embedding` and copies each slice into `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. * Handles both `float64` and `float32` element types. * Empty input returns `[][]float64{}` with no HTTP call. * Non-200 responses propagate the upstream status line and body. * A final pass checks every input slot got a vector and returns a clear error if any slot is still nil. * Per-call 30s context deadline so a slow call cannot block forever. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? * `go build ./internal/entity/models/...` returns exit 0. * `go vet ./internal/entity/models/...` is clean. * `gofmt -l internal/entity/models/nvidia.go` is clean. * The full method set on `NvidiaModel` still matches the `ModelDriver` interface. * Pattern parity with the just-merged Aliyun `Encode` (#14647). Closes #14699	2026-05-11 12:50:50 +08:00
Jin Hai	17d71e5d79	Go CLI: embed and rerank (#14735) ### What problem does this PR solve?
```
RAGFlow(user)> embed text 'what is rag' 'who are you' with 'embedding-3@test@zhipu-ai' dimension 16;
+-----------+-------+
| dimension | index |
+-----------+-------+
| 16        | 0     |
| 16        | 1     |
+-----------+-------+
RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank@test@zhipu-ai' top 2;
+-------+-----------------+
| index | relevance_score |
+-------+-----------------+
| 0     | 1               |
| 2     | 0.99999976      |
+-------+-----------------+
```
### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 17:41:54 +08:00
Panda Dev	c7ddc8c039	fix(go): implement ListModels and CheckConnection in NVIDIA driver (#14636) ### What problem does this PR solve? The NVIDIA Go driver added in #14623 has a real chat path, but `ListModels` and `CheckConnection` are stubs that always return `no such method`. So: - The model picker cannot auto-populate available NVIDIA NIM model ids. Users have to type the full id by hand (e.g. `abacusai/dracarys-llama-3.1-70b-instruct`). - The "Check connection" button always fails for NVIDIA, even when the base URL is reachable and the API key is accepted. NVIDIA NIM is OpenAI-compatible. `/v1/models` works with the same Bearer token used for chat. The `conf/models/nvidia.json` file already wires the `models` url_suffix, so no config change is needed. ### What this PR includes - `internal/entity/models/nvidia.go`: - `ListModels` now calls `GET ${BaseURL}/${URLSuffix.Models}`, parses `response.data[*].id`, and returns the list. Same shape as the moonshot, xai, and openai drivers. - `CheckConnection` now calls `ListModels` and returns its error. Same pattern xai, moonshot, deepseek, aliyun, and gitee already use. `Balance`, `Encode`, and `Rerank` are still stubs in this PR and can be added in follow-ups. No JSON change. No factory change. No interface change. ### How the implementation works - Region resolution falls back to `default` when the supplied region is unknown, so a stray region value does not break a valid request. - The Authorization header is only set when `apiConfig` and `ApiKey` are non-nil and non-empty. This avoids a nil-pointer dereference and lets self-hosted NIM deployments without a key still work. - Non-200 responses propagate the upstream status line and body so the user sees a real error message. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. 
- The full method set on `NvidiaModel` still matches the `ModelDriver` interface. - Pattern parity with the existing xai, moonshot, deepseek, aliyun, gitee, and openai drivers. Closes #14635	2026-05-08 12:04:28 +08:00
Haruko386	a377512110	Go: implement provider: OpenRouter (#14652 ) ### What problem does this PR solve? 1. Implement `OpenRouter` Provider: Fully support OpenRouter AI models (e.g., `gemma`, `minimax`). Includes robust handling of Server-Sent Events (SSE) streams, error event interception, and proper parsing of both `reasoning_content` and standard `content`. 2. Fix BaseURL Resolution Bug: Fixed a critical edge case in region configuration parsing. Added a strict empty string check (`*apiConfig.Region != ""`) alongside the `nil` check. This ensures that if the UI passes an empty string, the system correctly falls back to the `"default"` region, preventing `unsupported protocol scheme ""` errors during HTTP requests. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-08 12:02:37 +08:00
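The region check described in commit a377512110 amounts to treating an empty string the same as a missing value. A minimal sketch, assuming a hypothetical `apiConfig` struct with a `*string` region and a region-to-URL map (the URLs below are placeholders):

```go
package main

import (
	"fmt"
)

// apiConfig is a stand-in for the real config type; only Region is modelled.
type apiConfig struct {
	Region *string
}

// resolveBaseURL (hypothetical) picks the base URL for the configured
// region, using "default" when the region is nil or an empty string.
func resolveBaseURL(baseURLs map[string]string, cfg *apiConfig) (string, error) {
	region := "default"
	if cfg != nil && cfg.Region != nil && *cfg.Region != "" {
		region = *cfg.Region
	}
	url, ok := baseURLs[region]
	if !ok || url == "" {
		return "", fmt.Errorf("no base URL configured for region %q", region)
	}
	return url, nil
}

func main() {
	baseURLs := map[string]string{
		"default": "https://example.invalid/api/v1", // placeholder URL
		"eu":      "https://eu.example.invalid/api/v1",
	}

	empty := ""
	eu := "eu"
	for _, cfg := range []*apiConfig{nil, {Region: nil}, {Region: &empty}, {Region: &eu}} {
		url, err := resolveBaseURL(baseURLs, cfg)
		fmt.Println(url, err)
	}
}
```

Without the empty-string check, a region of `""` would look up a missing map key and yield an empty base URL, which is what produces the `unsupported protocol scheme ""` error mentioned in the commit message.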
Haruko386	078ea3bf4a	Go: implement provider: Nvidia (#14623 ) ### What problem does this PR solve? 1. Implement `Nvidia` Provider: Fully support NVIDIA NIM APIs with robust parameter handling (including the `thinking` parameter) and safe URL merging in `NewInstance`. 2. Fix Misleading CLI Errors: Corrected a bug in `common_command.go` where failed chat requests inaccurately reported `failed to list instance models`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-07 14:17:57 +08:00

19 Commits