Commit Graph

177 Commits

Author SHA1 Message Date
tmimmanuel
f78ef328bb Go: implement Bedrock embeddings (#15543)
### What problem does this PR solve?

Fixes #15542.

AWS Bedrock support for the Go model provider layer was added in #15166,
but embedding support was intentionally left out of scope and
`BedrockModel.Embed(...)` still returned the `no such method` sentinel.
This PR implements Bedrock text embeddings under the umbrella provider
tracker #14736.

### What this PR includes

- `internal/entity/models/bedrock.go`: implement
`BedrockModel.Embed(...)` through Bedrock Runtime `InvokeModel` with
existing SigV4 auth, region resolution, and runtime URL helpers.
- Titan embeddings: supports `amazon.titan-embed-text-v1` and
`amazon.titan-embed-text-v2:0`; v2 forwards `EmbeddingConfig.Dimension`
as `dimensions` when provided, while v1 keeps the payload minimal.
- Cohere embeddings: supports `cohere.embed-english-v3`,
`cohere.embed-multilingual-v3`, and `cohere.embed-v4:0`; batches input
texts and maps returned vectors to RAGFlow `EmbeddingData` in input
order.
- `conf/models/bedrock.json`: adds the `embedding` URL suffix (`invoke`)
and Bedrock embedding model entries.
- `internal/entity/models/bedrock_test.go`: adds unit tests for Titan,
Cohere, typed Cohere responses, validation, empty input, unsupported
models, and HTTP error propagation.

Reference docs:

- Bedrock InvokeModel API:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html
- Titan Text Embeddings:
https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
- Cohere Embed models on Bedrock:
https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- [x] `jq empty conf/models/bedrock.json`
- [x] `git diff --check`
- [x] `go test ./internal/entity/models/... -run Bedrock -count=1`
- [x] `go test ./internal/entity/models/... -run '^$' -count=1`
- [x] `go test ./internal/entity/models/... -run Bedrock -race -count=1`

Note: `go test ./internal/entity/models/... -count=1` currently fails in
unrelated existing Astraflow coverage
(`TestAstraflowEmbedReturnsNoSuchMethod` panics in
`internal/entity/models/astraflow.go`). The Bedrock-specific tests and
compile-only package check pass.
2026-06-05 13:26:32 +08:00
Haruko386
4b2af1347c feat[Go]: implement Agent/Workflow PUT /api/v1/agents/<canvas_id>/tags (#15641)
feat[Go]: implement Agent/Workflow PUT /api/v1/agents/<canvas_id>/tags (#15641)
2026-06-05 13:22:23 +08:00
Haruko386
baeb0c0431 Refactor[Go Model Provider]: refactor baseURL and modelConfig (#15627)
### What problem does this PR solve?

As Title

### Type of change

- [x] Refactoring
2026-06-04 17:50:22 +08:00
bitloi
2eed0d4679 refactor(go-models): add unsupported model driver defaults (#15431)
### What problem does this PR solve?

Adds a shared safe default implementation for unsupported Go
model-driver capability methods and migrates the confirmed panic-stub
providers to use it.

The Go `ModelDriver` interface requires providers to implement many
capability methods even when the provider does not support them. XunFei
had unsupported capability methods implemented as `panic("implement
me")`, Mistral still had a panic in `ParseFile`, and HuaweiCloud carried
an unreachable `panic("implement me")` after a normal chat return.

### Type of change

- [x] Refactoring


Co-authored-by: Haruko386 <tryeverypossible@163.com>
2026-06-03 19:16:28 +08:00
Jin Hai
d736f358ba Go: refactor model provider (#15568)
### What problem does this PR solve?

1. Add license announcement
2. Add sanity check on API config
3. Add base class: BaseModel
4. Add GetBaseURL

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-03 16:33:58 +08:00
Haruko386
473d06d1ad feat[Go]: implement add multi_models (#15563) 2026-06-03 15:26:46 +08:00
Jin Hai
dbebc66ba8 Go: refactor provider code (#15564)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-03 14:09:07 +08:00
Jin Hai
e1f19f6679 Go: fix gitee balance api (#15554)
```
RAGFlow(user)> create provider 'gitee' instance 'intl' key 'api-token' url 'https://ai.gitee.com/v1' region 'intl';
SUCCESS
```

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-03 13:23:20 +08:00
Dexterity
2819d0ea24 fix(go-models): use per call context timeouts so long streaming responses are not truncated (#15380)
### What problem does this PR solve?

Closes #15379 

Around 29 Go model providers in `internal/entity/models/` share an
`http.Client` configured with `Timeout: 120 * time.Second`, and reuse
that same client for `ChatStreamlyWithSender`. Go's
`http.Client.Timeout` is a hard ceiling on the whole request that also
covers reading the response body, so it behaves as a wall clock on
streaming. Any streamed chat response that lasts longer than 120 seconds
gets cut off in the middle with a timeout error. Long generations,
reasoning model outputs, and slow or overloaded upstreams are the common
victims.

The providers that already behave correctly (`groq`, `mistral`,
`voyage`, `anthropic`) set no client `Timeout` and instead wrap each
request in a `context.WithTimeout`. This change converges the affected
providers onto that same pattern.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-06-02 15:27:26 +08:00
glorydavid03023
5733e0624c fix(go-models): harden N1N default transport handling (#15351)
## Summary
- Harden `NewN1NModel` to avoid panics when `http.DefaultTransport` is a
custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `n1n_test.go` with coverage for name/factory plus
`TestN1NNewModelWithCustomDefaultTransport`.


Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-02 13:40:10 +08:00
Lynn
3bc5ed282e Fix: model-provider bugs (#15460)
### What problem does this PR solve?

Fix:
- Use @ to avoid split  by `_` in model_name.
- Verify api_key when add instance.
- Pop api_key in list intances response.
- Remove useless index.
- Sort providers, instances and models by name.
- Get `is_tools` from llm_factories.json

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-02 13:24:53 +08:00
dripsmvcp
d4f1c2c95c fix(go-models): remove duplicate roundTripperFunc from novita_test.go (#15492)
Remove duplicated function
2026-06-02 13:23:39 +08:00
ちー
e4ef9834da fix: rewrite enable thinking mode for minimax (#15496)
### What problem does this PR solve?

fix the bad thinking mode for minimax

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-06-02 13:22:11 +08:00
oktofeesh
f0e4f2d5d8 fix(go-models): apply custom Google base URLs (#15385)
## Summary
- Add custom `base_url` support to the Google Go model driver.
- Preserve Google URL suffix configuration when creating custom base URL
driver instances.
- Validate Google chat/stream request inputs before constructing the SDK
client.
- Cover Google model listing, connection checks, base URL resolution,
and request validation with focused tests.

## What changed
- `GoogleModel.NewInstance` now returns a Google driver configured with
the supplied base URL map.
- Google SDK client creation now resolves configured base URLs through
`genai.HTTPOptions.BaseURL`.
- Base URL lookup supports configured regions, empty-region keys, and
`default` fallback.
- Google chat, streaming chat, embeddings, and model listing now reject
blank API keys before creating SDK clients.
- Google chat and streaming chat now reject blank model names locally,
and streaming chat rejects a nil sender.
- Existing message handling, embeddings, pagination, and provider errors
are preserved.

## Why
Google custom model instances could not use configured base URLs because
`NewInstance` returned `nil` and the SDK client path ignored the driver
base URL map. The request validation keeps invalid Google calls from
reaching SDK client construction with blank credentials or incomplete
chat inputs.
2026-06-01 19:24:29 +08:00
glorydavid03023
3774916060 Go: implement Embed in GPUStack driver (#15182)
### What problem does this PR solve?

The Go GPUStack driver returned a stub error for `Embed()` even though
GPUStack exposes OpenAI-compatible embeddings on the **v1-openai** route
(not `v1/embeddings`).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-01 11:22:43 +08:00
oktofeesh
be28177955 fix(go-models): harden Hunyuan embedding validation (#15249)
## Summary
- Validate Hunyuan embedding model name and API key before building
requests.
- Reuse region-aware base URL validation for embedding requests.
- Replace the stale unsupported Embed test with happy-path and
validation coverage.

## What changed
- Added early Hunyuan Embed validation for missing model names and API
keys.
- Routed Embed through the same base URL region guard used by the other
Hunyuan methods.
- Updated Hunyuan tests to configure the embedding suffix and cover
Embed success plus invalid inputs.

## Why
Hunyuan Embed is implemented, but the existing test still expected it to
be unsupported and could panic before returning a normal validation
error. This keeps the implemented embedding path aligned with the
current driver behavior and prevents nil input panics.

Closes #15087
Refs #14736
2026-05-29 19:50:01 +08:00
Dexterity
04aa8d04e8 fix(go-models): raise SSE scanner buffer so large stream chunks are not dropped (#15382)
### Summary

Closes #15381 

Every provider in `internal/entity/models/` reads its streaming response
with `bufio.NewScanner(resp.Body)` and iterates over `scanner.Scan()`.
The default `bufio.Scanner` maximum token size is 64KB, so when an
upstream sends a single SSE `data:` line larger than 64KB (long content
deltas, large tool or function call argument blobs, bundled
`reasoning_content`, or providers that emit a whole message in one
event) `scanner.Scan()` returns `false` and `scanner.Err()` returns
`bufio.ErrTooLong`. Streaming chat then ends with an error partway
through the response.

This change adds `scanner.Buffer(make([]byte, 64*1024), 1024*1024)`
immediately after every SSE scanner that was still bare, raising the cap
to 1MB. 1MB is the value already used for streaming chat in `openai.go`,
`modelscope.go`, `groq.go`, `mistral.go`, `xai.go` and the other already
patched providers (the 8MB cap in the repo is reserved for TTS and
embedding paths), so this simply converges the remaining providers onto
the established pattern. Nothing else changes: line parsing, `data:`
prefix handling, `[DONE]` detection, JSON unmarshalling, error handling,
and the existing `scanner.Err()` checks all stay the same.

Providers covered (23 scanners across 22 files): 302ai, aliyun,
baichuan, baidu, cohere, deepinfra, deepseek, gitee, huggingface,
lmstudio, minimax (the chat scanner, whose TTS scanner was already
bumped), moonshot, nvidia, ollama, openrouter, orcarouter, paddleocr,
siliconflow, tokenhub, vllm, volcengine, xunfei, zhipu-ai. `jiekouai.go`
is excluded because it is covered by the in flight #15337.

A table driven regression test (`sse_scanner_buffer_test.go`) streams a
single 128KB `data:` content delta followed by `data: [DONE]` through an
`httptest` server and asserts that `ChatStreamlyWithSender` delivers the
full content with no error across a representative subset of providers.
Without the buffer fix the test fails with `bufio.Scanner: token too
long`.

This PR also removes three duplicate declarations of the package level
`roundTripperFunc` test helper that several recently merged provider PRs
each added independently, which had left the `internal/entity/models`
test package unable to compile. The helper now lives in a single place
and is shared.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-29 19:34:00 +08:00
呆萌闷油瓶
658ff06ca4 feat: add 4 new models for siliconflow (#15383)
### What problem does this PR solve?

Added 4 new models:
deepseek-ai/DeepSeek-V4-Pro
deepseek-ai/DeepSeek-V4-Flash
Pro/moonshotai/Kimi-K2.6
Pro/zai-org/GLM-5.1

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-29 19:28:29 +08:00
glorydavid03023
b79f79d9b9 fix(go-models): harden Novita default transport handling (#15350)
## Summary
- Harden `NewNovitaModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-29 14:28:46 +08:00
oktofeesh
58eb957c30 fix(go-models): harden JieKouAI driver requests (#15337)
## Summary
- Harden JieKouAI request validation before outbound provider calls
- Force non-streaming and streaming chat methods to use their expected
stream modes
- Make model listing use a bodyless GET and parse model responses
without panics

Closes #14736

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-29 10:09:27 +08:00
glorydavid03023
7fc909acc9 fix(go-models): harden ModelScope default transport handling (#15339)
## Summary
- Harden `NewModelScopeModel` to avoid panics when
`http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestModelScopeNewModelWithCustomDefaultTransport` regression
coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-28 19:41:11 +08:00
glorydavid03023
b7d88f0b09 fix(go-models): harden Voyage default transport handling (#15341)
## Summary
- Harden `NewVoyageModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestVoyageNewModelWithCustomDefaultTransport` regression
coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:46:58 +08:00
glorydavid03023
ff9aa4e2c7 fix(go-models): harden LongCat default transport handling (#15340)
## Summary
- Harden `NewLongCatModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestLongCatNewModelWithCustomDefaultTransport` regression
coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 16:45:59 +08:00
Haruko386
ed878930fb feat[Go]: implement delete/ rebuild/ listlog api for connector (#15300)
### What problem does this PR solve?

implement delete, rebuild api for connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-28 16:44:35 +08:00
oktofeesh
8468227a1a fix(go-models): harden 302.AI driver requests (#15289)
## Summary
- Harden the 302.AI model driver request validation and response parsing
paths.
- Add focused tests for chat request mode, model listing, malformed
provider responses, and input validation.

## What changed
- Validate API keys, model names, rerank queries, ASR file paths, OCR
inputs, parse URLs, task IDs, and model-list IDs before use.
- Keep chat and streaming methods from accepting conflicting `stream`
values in request payloads.
- Send `ListModels` as a bodyless GET and parse the response with typed
JSON structs instead of unchecked assertions.
- Remove raw SSE event logging from stream handling.

## Why
The driver could panic or send inconsistent requests when optional
config fields were nil, empty, malformed, or contradicted the method
path. This keeps provider-driver behavior explicit while preserving the
existing supported 302.AI flows.

Closes #14736
2026-05-28 13:33:01 +08:00
Haruko386
82318dee5d feat[Go]: implement create_connector API (#15285)
### What problem does this PR solve?

implement create_connector API

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-27 15:54:11 +08:00
oktofeesh
7fb9a26623 fix(go-models): validate TokenHub chat requests (#15283)
## Summary
- centralize TokenHub chat request validation for chat and streaming
calls
- reject blank TokenHub model names before sending provider requests
- send TokenHub model listing requests as bodyless GET requests

## What changed
- Added shared TokenHub chat request validation for API key, model name,
and messages.
- Updated `ListModels` to call `GET /models` without a request body.
- Added focused tests for blank model names and accidental GET request
bodies.
- Replaced an httptest handler callback `t.Fatalf` with `t.Errorf` plus
an HTTP error and return.

## Why
TokenHub chat requests should fail locally for invalid model names
instead of sending avoidable malformed requests upstream. Model listing
should also match normal GET semantics and avoid sending an empty JSON
body.

Closes #14736

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-27 14:39:41 +08:00
Haruko386
ae88578451 Go: implement TTS and ASR for X.AI (#15247)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-05-27 14:08:35 +08:00
sxxtony
17b5b33574 Go: implement Rerank in Replicate driver (#15278)
### What problem does this PR solve?

`ReplicateModel.Rerank` in `internal/entity/models/replicate.go` was a
`"replicate, no such method"` stub. The chat path landed in #14958 and
the embed path in #15073; rerank is the last major retrieval surface
still missing on this provider.

Until this PR, a tenant who selected a Replicate reranker model got the
sentinel error on every rerank call.

Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-27 14:07:00 +08:00
Hz_
47626bbe63 go: add Qiniu model provider (#15280)
### What problem does this PR solve?

This PR adds Qiniu provider integration for the Go model driver layer in
RAGFlow.

  Supported capabilities:

  - [X] Chat
  - [X] Think Chat
  - [X] Stream Chat
  - [X] Stream Think Chat
  - [X] Model listing
  - [X] Provider configuration and factory registration

  Verified examples from the CLI:

  ```
  login user '***' password '***';

  ADD PROVIDER 'qiniu';

  CREATE PROVIDER 'qiniu' INSTANCE 'test' KEY '***';

chat with 'deepseek/deepseek-v3.1-terminus-thinking@test@qiniu' message
'hello';

think chat with 'deepseek/deepseek-v3.1-terminus-thinking@test@qiniu'
message 'hello';

stream chat with 'deepseek/deepseek-v3.1-terminus-thinking@test@qiniu'
message 'hello, what are you';

stream think chat with
'deepseek/deepseek-v3.1-terminus-thinking@test@qiniu' message 'hello,
what are you';

stream think chat with 'qwen3-max-2026-01-23@test@qiniu' message 'hello,
what are you';

  LIST MODELS FROM 'qiniu' 'test';

```

  ### Type of change

  - [X] New Feature
  - [X] Provider integration
2026-05-27 13:19:39 +08:00
oktofeesh
a3c6e075f6 fix(go-models): add VolcEngine model listing suffix (#15234)
## Summary
- add the VolcEngine `models` URL suffix used by the existing Go
`ListModels` implementation
- return a clear error when the VolcEngine models suffix is missing
- add focused VolcEngine model-listing regression tests

## What changed
- Added `url_suffix.models` to `conf/models/volcengine.json`.
- Normalized the configured models suffix before building the request
URL.
- Covered config loading, successful model listing, upstream errors, and
missing suffix handling.

## Why
`VolcEngine.ListModels` already builds requests from `URLSuffix.Models`,
but the bundled VolcEngine config did not define that suffix. That left
the model-listing path unable to call the documented `/models` endpoint
from the existing provider config.

Fixes #14701

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-27 13:14:56 +08:00
oktofeesh
5ae41dc1eb fix(go-models): route hosted OCR providers through drivers (#15233)
## Summary
- route hosted MinerU.Net and PaddleOCR.Net provider names to their
existing Go drivers
- add regression coverage for loading the hosted OCR provider configs
through ProviderManager

## What changed
- Added canonical provider-name aliases for the hosted OCR provider
display names.
- Covered both bundled configs with a focused provider-manager test.

## Why
The hosted provider configs use display names with `.Net`, while model
factory dispatch lowercases the provider name. Without aliases, those
configs fall through to `DummyModel` instead of using the existing
MinerU and PaddleOCR drivers.

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 20:40:40 +08:00
oktofeesh
22a3b8cdf9 feat(go-models): list LongCat models (#15241)
## Summary
- Add LongCat model-list support through the documented
OpenAI-compatible models endpoint.

## What changed
- Add the LongCat `models` URL suffix for `/openai/v1/models`.
- Implement `ListModels` for the LongCat Go driver.
- Delegate `CheckConnection` to the lightweight model-list request.
- Add focused regression coverage for successful, malformed, oversized,
and missing-key responses.

## Why
LongCat documents a models endpoint under the OpenAI-compatible API
surface, but the Go driver still returned `no such method` for model
listing and connection checks.

## Validation
- `go test ./internal/entity/models -run TestLongCat -count=1`
- `go test -race ./internal/entity/models -run TestLongCat -count=1`
- `go test ./internal/entity -count=1`
- `git diff --check`

## Notes
- Related to the broader Go model provider tracking in #14736, but this
PR only handles LongCat model listing.
- `go test ./internal/entity/models -count=1` is currently blocked by an
unrelated Astraflow test panic outside this LongCat change.

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 19:58:53 +08:00
oktofeesh
557024e7d4 fix(go-models): add xAI model listing suffix (#15236)
## Summary
- add the xAI `models` URL suffix used by the existing Go `ListModels`
implementation
- return a clear error when the xAI models suffix is missing
- add focused xAI model-listing and connection-check regression tests

## What changed
- Added `url_suffix.models` to `conf/models/xai.json`.
- Normalized the configured models suffix before building the request
URL.
- Covered config loading, successful model listing, upstream errors,
API-key validation, missing suffix handling, and `CheckConnection`
delegation.

## Why
`XAIModel.ListModels` already builds requests from `URLSuffix.Models`,
and `CheckConnection` delegates to that method. The bundled xAI config
did not define that suffix, which left the model-listing path unable to
call the provider `/models` endpoint from the existing provider config.

## Validation
- `go test ./internal/entity/models -run TestXAI -count=1`
- `go test ./internal/entity -count=1`
- `git diff HEAD~1..HEAD --check`

## Notes
- `go test ./internal/entity/models -count=1` currently fails in
unchanged Astraflow coverage: `TestAstraflowEmbedReturnsNoSuchMethod`
panics before reaching any xAI assertions.

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 19:58:20 +08:00
Haruko386
3619ceca01 Go: implement provider: OrcaRouter (#15235)
### What problem does this PR solve?

implement provider `OrcaRouter`
**The following functionalities are now supported:**

**Cohere:**
- [x] Chat / Think Chat / Stream Chat / Stream Think Chat
- [x] Model listing
- [x] TTS
- [ ] Balance


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 18:20:33 +08:00
dripsmvcp
a48bcf814d Go: implement provider: ModelScope (#15041)
Closes #15040.

ModelScope was listed unchecked in the Go-rewrite tracker #14736 and
already had an llm_factories.json entry (tags: LLM) but no Go driver, so
the new Go API server could not route ModelScope instances. The Python
side has supported it through the OpenAI-compatible base at
rag/llm/chat_model.py:618 (ModelScopeChat), which requires a
user-supplied base URL and appends /v1.

This adds:
- internal/entity/models/modelscope.go: self-hosted OpenAI-compatible
driver with chat (sync + SSE stream with idle-timeout cancellation),
list_models, and check_connection. Auth header is optional, matching the
xinference pattern, so deployments without auth and auth-enabled
deployments both work. Base URL is normalized so users can configure
either the root endpoint or the /v1 endpoint.
- internal/entity/models/modelscope_test.go: 12 tests covering name, URL
normalization, factory routing, chat happy path / auth header /
reasoning_content extraction, stream happy path / stream=false rejection
/ idle cancellation, list_models + check_connection, missing-base-URL
clear error, and the no-such-method sentinels.
- conf/models/modelscope.json: shipped config (class: "local",
url_suffix v1/chat/completions and v1/models).
- internal/entity/models/factory.go: case "modelscope" →
ModelScopeModel.
- internal/service/llm.go: ModelScope added to the selfDeployed map
alongside Ollama, Xinference, LocalAI, LM-Studio, GPUStack — the Python
side requires user-supplied URL with no default, so the Go side
classifies it the same way.

Follow-on issues will add Embed and Rerank, in line with how Novita,
NVIDIA, TogetherAI, and other providers landed method-by-method.

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 18:18:46 +08:00
Hz_
84add43208 Add HuaweiCloud model provider (#15237)
### What problem does this PR solve?

  This PR adds HuaweiCloud provider integration in RAGFlow.

  Supported capabilities:

  - [x] Chat / Think Chat / Stream Chat / Stream Think Chat
  - [x] Embedding
  - [x] Rerank
  - [x] Model listing
  - [x] Provider connection checking

  Verified examples from the CLI:

  ```
  check instance 'test' from 'HuaweiCloud';

  chat with 'deepseek-v4-flash@test@HuaweiCloud' message 'hello';

  think chat with 'deepseek-v4-flash@test@HuaweiCloud' message 'hello';

  stream chat with 'deepseek-v4-flash@test@HuaweiCloud' message 'hello';

stream think chat with 'deepseek-v4-flash@test@HuaweiCloud' message
'hello';

embed text 'what is rag' 'who are you' with 'bge-m3@test@HuaweiCloud'
dimension 1024;

rerank query 'what is rag' document 'rag is retrieval augmented
generation' 'rag need llm' 'famous rag
project includes ragflow' with 'bge-reranker-v2-m3@test@HuaweiCloud' top
3;

  list supported models from 'HuaweiCloud' 'test';

  LIST MODELS FROM 'HuaweiCloud' 'test';
```
  ### Type of change

  - [x] New Feature
  - [x] Provider integration
2026-05-26 17:13:15 +08:00
ghost
a7d25391dc fix(tokenhub): wire Go driver and harden requests (#15224)
## Summary
- Wire the Go TokenHub provider through the model factory.
- Harden TokenHub request handling for chat, streaming, embeddings, and
model listing.
- Add focused TokenHub unit coverage for factory wiring and provider
behavior.

## Notes
- Refs #14736.
- Follows up #15159.

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 17:12:37 +08:00
Jake Armstrong
0fb85a66bc feat(go-models): add AWS Bedrock provider driver (#15166)
## Summary

Closes #15165.

Implements the AWS Bedrock model provider for the Go API server, tracked
under #14736. Adds Converse + Converse-Stream chat and foundation-model
listing, with SigV4 signing over a hand-rolled `net/http` path that
matches the established pattern in `internal/entity/models/` (no new
direct `go.mod` deps).

## Linked tracker

Tracked under #14736 (Implement model providers of RAGFlow API server in
Go). Closes #15165.
2026-05-26 17:10:06 +08:00
glorydavid03023
3dbd874a79 Go: implement Rerank in DeepInfra driver (#15185)
### What problem does this PR solve?

The Go DeepInfra driver returned a stub error for `Rerank()` even though
DeepInfra serves reranker models at `POST /v1/inference/{model}` with
`query`, `documents`, and a `scores[]` response.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-26 10:52:09 +08:00
sxxtony
67f7d87dff Go: implement provider: FuturMix (#15013)
### What problem does this PR solve?

Add a Go driver for **FuturMix** (https://futurmix.ai/docs), one of the
unchecked providers on the umbrella tracking issue #14736. FuturMix is
documented as an "OpenAI-compatible API" aggregator over Claude / GPT /
Gemini / DeepSeek (~22 models per their `/models` page).

Until this PR, a tenant who configured `futurmix` as a model provider in
the Go layer fell through to the default branch of
`internal/entity/models/factory.go` and got the dummy driver.

---------

Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 10:51:29 +08:00
Renzo
806414df43 Go: validate Baidu OCR inputs (#15168)
### What problem does this PR solve?

Closes #15167.

The Baidu Go provider advertises OCR support through
`paddleocr-vl-0.9b`, but `BaiduModel.OCRFile` dereferenced required
inputs before validating them. Calling OCR with a missing API config,
API key, or model name could panic instead of returning a normal error.

This PR adds explicit input validation for those required values.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-26 10:51:05 +08:00
Jake Armstrong
b961810e79 Go: implement OCR in ZhipuAI driver (#15143)
### What problem does this PR solve?

Closes #15142.

ZhipuAI lists `glm-ocr` as an OCR model, but the Go driver still
returned `no such method` from `OCRFile`. This wires the advertised
model to Z.AI's documented `layout_parsing` endpoint and returns the
`md_results` Markdown output through the existing `OCRFileResponse.Text`
field.

This PR also adds focused tests for URL input, raw file-content base64
input, and validation errors.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Test

- [x] `go test -vet=off ./internal/entity/models -run
'TestZhipuAIOCRFile'`
2026-05-26 10:50:06 +08:00
Haruko386
4783ce9951 fix(Go): rewrite chat, listmodels, embed for Ollama (#15213)
### What problem does this PR solve?

IDK how to implement **`Ollama`** on #14580 but it's totally wrong.
This is the rewrite version for **`Ollama`**

**Verified from CLI**
```
# Embed
RAGFlow(user)> embed text 'what is rag' 'who are you' with 'nomic-embed-text:latest@test12@ollama' dimension 1024;
+-----------+-------+
| dimension | index |
+-----------+-------+
| 768       | 0     |
| 768       | 1     |
+-----------+-------+

# Chat
RAGFlow(user)> think chat with 'qwen3:0.6b@test12@ollama' message 'who r u'
Thinking: Okay, the user asked, "Who r u?" I need to respond appropriately. First, I should acknowledge their question. Since I'm an AI, I don't have a physical form, but I can confirm that I'm a large language model. I should keep the response friendly and offer help. Let me make sure I'm not making up any information and that the response is natural. Also, I should check for any typos and ensure clarity. Alright, that should cover it.

Answer: I'm an AI language model, and I don't have a physical form. However, I can tell you that I'm designed to assist with questions and tasks. How can I help you today?
Time: 2.914285


RAGFlow(user)> stream think chat with 'qwen3:0.6b@test12@ollama' message 'who r u'
Thinking: , the user asked, "Who are you?" I need to respond appropriately. Since I'm an AI assistant, I should mention that I don't have a physical form or a mind. I should also clarify that I can help with various tasks like answering questions or providing information. It's important to keep the response friendly and informative while maintaining the correct tone.
Answer:  don't have a physical form or a mind, but I'm here to help with your questions or tasks! What can I do for you today?
Time: 1.740047

# LisyModels
RAGFlow(user)> list supported models from 'ollama' 'test12'
+-------------------------+
| model_name              |
+-------------------------+
| nomic-embed-text:latest |
| qwen3:0.6b              |
+-------------------------+
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-05-25 18:55:03 +08:00
Haruko386
69f301b84a Go: implement embed for Tencent Hunyuan (#15207)
### What problem does this PR solve?

Implement embed for Tencent Hunyuan

**Verified from CLI**
```
RAGFlow(user)> embed text 'what is rag' 'who are you' with 'hunyuan-embedding@test1@hunyuan' dimension 16;
+-----------+-------+
| dimension | index |
+-----------+-------+
| 1024      | 0     |
| 1024      | 1     |
+-----------+-------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-05-25 16:04:17 +08:00
ちー
bb6cfc14e6 feat[go]: implement provider: TokenHub (#15159)
### What problem does this PR solve?

implement provider TokenHub

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-25 16:02:50 +08:00
Jin Hai
f8c626bbc8 Go: add ingestion server (#15094)
### What problem does this PR solve?

1. Go ingestion server will connected with admin server with gRPC stream
2. Go ingestion server will be responsible for ingestion tasks
```

RAGFlow(admin)> list ingestors;
+-----------------+-----------+----------------------------------+---------------------------+----------+------------+--------------+--------+------------+---------------+
| address         | cpu_usage | id                               | last_heartbeat            | name     | process_id | rss_usage    | status | task_count | vms_usage     |
+-----------------+-----------+----------------------------------+---------------------------+----------+------------+--------------+--------+------------+---------------+
| 127.0.0.1:58564 | 0         | bdd1870eea2646e0aacb8a2cd3307aa2 | 2026-05-24T18:16:17+08:00 | ingestor | 680152     | 212.72265625 | active | 0          | 2589.12109375 |
+-----------------+-----------+----------------------------------+---------------------------+----------+------------+--------------+--------+------------+---------------+

RAGFlow(admin)> start ingestion 'abc';
+----------------------------------+
| task_id                          |
+----------------------------------+
| e714777639ca4760ab427b5f211e81ad |
+----------------------------------+

RAGFlow(admin)> stop ingestion 'f7bd39d0a724457eb5fdce6d81699776';
+----------------------------------+
| task_id                          |
+----------------------------------+
| f7bd39d0a724457eb5fdce6d81699776 |
+----------------------------------+

RAGFlow(admin)> list tasks;
+-----+----------------------------------+-------+------+----------------------------------+---------------------------+------------+------------+
| ETA | assign_to                        | error | from | id                               | last_update               | start_time | status     |
+-----+----------------------------------+-------+------+----------------------------------+---------------------------+------------+------------+
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | eae6431da72a40e796cff3a03008091b | 2026-05-24T19:46:03+08:00 |            | COMPLETED  |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 6cccdd174bd049ecb05a774bbb47593f | 2026-05-24T19:46:03+08:00 |            | COMPLETED  |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | ef360d777e57485799adb96b30f2b4b8 | 2026-05-24T19:46:03+08:00 |            | CANCELED   |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | bcc5c5448cb64de48b6b6171c36fb790 | 2026-05-24T19:46:03+08:00 |            | CANCELED   |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | bfc25384c43a443294fe2da979a38ac2 | 2026-05-24T19:46:03+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 84960537b85d413b8990a9efd5952d67 | 2026-05-24T19:46:04+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 3d223c1b51e24b36861a3bfb2f1d58d4 | 2026-05-24T19:46:03+08:00 |            | CANCELED   |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | e433b0e356b846c89c301621a3c54494 | 2026-05-24T19:46:03+08:00 |            | COMPLETED  |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 7c93a3880f074ebd8eca14e6b51bb7ef | 2026-05-24T19:46:03+08:00 |            | COMPLETED  |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | df2e4ef51aaf4390bff9a23f2692486e | 2026-05-24T19:46:04+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 7377c53010194ef7a83aa206698d66ff | 2026-05-24T19:46:05+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | df64d1a1f9d348e3a2f174c4d7d69e73 | 2026-05-24T19:46:05+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | b59834512e2847e1bdf13ace04b8a456 | 2026-05-24T19:46:06+08:00 |            | DISPATCHED |
| 0   | 17937da188b84f23a5c10bb87588944b |       | CLI  | 0064bb0ab69344028d1ecfda053826f4 | 2026-05-24T19:46:03+08:00 |            | QUEUED     |
+-----+----------------------------------+-------+------+----------------------------------+---------------------------+------------+------------+


```


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-25 14:00:08 +08:00
Haruko386
5d022d83e8 Go: implement provider: PaddleOCR_Local (#15158)
### What problem does this PR solve?

Go: implement provider: PaddleOCR_Local

**Verified from CLI**

```
RAGFlow(user)> ocr with 'PaddleOCR-VL@test@paddleocr_local' file './internal/test1.jpg'
+----------------------+
| text                 |
+----------------------+
| ## Parallel to these |
+----------------------+
```

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
- [X] New Feature (non-breaking change which adds functionality)
- [X] Refactoring
2026-05-25 12:12:57 +08:00
dripsmvcp
8d8ea71877 Go: implement provider: Tencent Hunyuan (#15092)
## Summary
- Adds a `Hunyuan` Go driver so the new API server can route Tencent
Hunyuan chat instances (registered in `conf/llm_factories.json:3830` as
`Tencent Hunyuan`). Follows the same SaaS-driver shape used for
Astraflow, Avian, Novita, TogetherAI, Replicate, DeepInfra, Upstage, and
LongCat.

Closes #15087

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-25 11:04:39 +08:00
bitloi
432e966414 fix(go): support OpenAI audio endpoints (#15104)
### What problem does this PR solve?

Closes #15102.

OpenAI's Go provider config advertises `whisper-1` as ASR and `tts-1` as
TTS, but the Go driver returned `openai, no such method` for both audio
paths and did not define `url_suffix.asr` / `url_suffix.tts`.

This PR:

- adds OpenAI audio URL suffixes for `audio/transcriptions` and
`audio/speech`
- implements non-streaming `TranscribeAudio` using multipart form
uploads
- implements non-streaming `AudioSpeech` using the OpenAI speech JSON
request shape
- keeps streaming TTS explicitly unsupported instead of sending binary
audio through the text SSE sender
- adds focused tests for config coverage, ASR/TTS request shape,
required TTS voice validation, and unsupported streaming TTS


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-25 10:25:53 +08:00