### What problem does this PR solve?
Implement OpenAI chat completions in GO
POST /api/v1/openai/<chat_id>/chat/completions
OpenAI chat cli: internal/development.md
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fixes two bugs in the OpenRouter streaming chat request builder
(`internal/entity/models/openrouter.go`, `ChatStreamlyWithSender`):
1. **qwen/glm models streamed to a broken URL.** The code routed any
`qwen`/`glm` model to
`URLSuffix.AsyncChat`, but `conf/models/openrouter.json` defines no
`async_chat` suffix
(empty), so the request was POSTed to `<base>/` instead of
`<base>/chat/completions` —
breaking streaming for every qwen/glm model. The non-stream path has no
such branch.
Fix: all models use the standard `Chat` suffix, consistent with the
non-stream path.
2. **Streaming reasoning was never enabled.** The request set reasoning
via a non-standard
`thinking` key, which OpenRouter ignores. OpenRouter's API — and this
provider's own
non-stream request (line ~110) and its streamed `delta.reasoning` parser
(line ~311) —
use the `reasoning` object. Fix: send `reasoning: {"enabled":
<thinking>}` (and
`{"effort": ...}` when set, taking precedence as in the non-stream
path).
Closes#16110
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Part of #15853 (provider model-list refactor).
Refactors **Ollama** `ListModels` onto the shared `ParseListModel`
pattern and fixes two correctness issues:
- **Endpoint:** switch the models suffix from `api/ps` (only
currently-running models) to `api/tags` (all installed models) — the
latter is what a model picker should show.
- **Parsing:** Ollama returns `{"models":[{"name","model"}]}`, a
non-OpenAI shape. Decode it into a typed struct, map the names into
`ModelList`, then enrich through `ParseListModel`. This removes the
previous unchecked type assertions (`result["models"].([]interface{})` /
`.(map[string]interface{})` / `.(string)`) that **panicked** when the
body was missing the `models` array or any field, and adds a fallback to
the `model` field when `name` is blank.
- Drops the no-op GET request body and a dead base-URL reassignment.
#### Drive-by fix
Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed
in #15900) so the models test package builds; auto-resolves against the
sibling #15853 PRs.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Part of #15853 (provider model-list refactor). Final two providers.
- **voyage:** Voyage AI exposes no live model-list endpoint — its public
API only has `/v1/embeddings` and `/v1/rerank` — so the previous
`ListModels` was a `no such method` stub. Replace it with a
static-catalog listing sourced from the loaded provider definition,
carrying each model's `max_tokens`, `model_types`, and embedding
`dimensions`. `list models from voyage` now returns the 13-model catalog
instead of erroring.
- **fishaudio:** route the existing `/model` voice listing through the
shared `ParseListModel` helper for consistency; keep the human-readable
`title` as the model name and fall back to `_id` when a title is blank.
#### Drive-by fix
Shared gitee_test.go `DSModelList` -> `ModelList` compile fix (renamed
in #15900); auto-resolves against the sibling #15853 PRs.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
Co-authored-by: Haruko386 <tryeverypossible@163.com>
## Summary
Migrate PaddleOCR integration from the deprecated synchronous HTTP API
to the new asynchronous Job API (`submit → poll → fetch`), aligning with
PaddleOCR 3.6.0+ architecture.
## Changes
### Python (`deepdoc/parser/paddleocr_parser.py`)
- Replace synchronous `requests.post()` with async Job API flow (submit
→ poll → fetch)
- Authentication: `token {token}` → `Bearer {token}`
- File transfer: base64 JSON body → multipart file upload
- Polling: exponential backoff (initial 3s, ×1.5, max 15s, timeout
controlled by `request_timeout`)
- Result: fetch full JSONL from result URL, preserving `prunedResult`
with bbox info for crop functionality
- Rename `api_url` → `base_url` (backward compatible: `api_url` still
accepted as fallback)
### Python (`rag/llm/ocr_model.py`)
- Prefer `paddleocr_base_url` / `PADDLEOCR_BASE_URL`, fallback to
`paddleocr_api_url` / `PADDLEOCR_API_URL`
### Go (`internal/entity/models/paddleocr.go`)
- Add `Client-Platform: ragflow` header to submit and poll requests
- Change polling from fixed 3s to exponential backoff (initial 3s, ×1.5,
max 15s)
### Python (`common/constants.py`)
- Add `PADDLEOCR_BASE_URL` to env keys and default config
## Backward Compatibility
- Old env var `PADDLEOCR_API_URL` still works (used as fallback)
- Frontend field `paddleocr_api_url` still works (backend reads it as
fallback)
- No user-facing configuration changes required for existing setups
## Why not use the `paddleocr` SDK package directly?
RAGFlow's `_transfer_to_sections()` relies on `prunedResult` (containing
`block_bbox`, `block_label`, `parsing_res_list`) from the raw API
response for PDF crop functionality. The SDK's public `parse_document()`
API only returns `DocParsingResult` with `markdown_text`, discarding the
bbox data. Therefore we implement the async Job API flow directly via
HTTP, following the same logic as the SDK internally.
### What problem does this PR solve?
The Go model-driver layer () has ~38,700 lines across 109 files. Roughly
74% of that is boilerplate duplicated into every driver: identical HTTP
client setup, the same 65-line SSE scanner loop, and 10-11 one-line "not
supported" stub methods per driver. Any fix must be manually propagated
to every file. Closes#15820.
This PR establishes the three shared utility files that form the
foundation for incremental driver migration:
---
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Haruko386 <tryeverypossible@163.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
### What problem does this PR solve?
- Replace embedding model `dimension` metadata with `max_dimension`.
- Add optional `dimensions` metadata for models with fixed selectable
output dimensions.
- Include `max_dimension` and `dimensions` in model list responses.
- Validate requested embedding dimensions before calling provider
embedding APIs.
- Forward SiliconFlow embedding dimensions with the correct `dimensions`
request field.
- Add unit coverage for embedding dimension validation rules.
### What problem does this PR solve?
`ChatStreamlyWithSender` in two Go model drivers could panic on nil
pointer dereferences when a caller passes a nil model config or omits
the reasoning `Effort`:
- **deepseek.go** - `switch *chatModelConfig.Effort` dereferenced
`Effort` without a nil check. It now defaults to `"high"` when nil.
- **volcengine.go** - the `modelConfig` pointer itself was dereferenced
(`Stream`, `MaxTokens`, `Temperature`, .) with no guard, and `Effort`
was dereferenced unchecked. `modelConfig` now defaults to an empty
`&ChatConfig{}` when nil so the optional-field accesses are safe, and
`Effort` defaults to `"medium"` when nil.
Addresses the CodeRabbit review on `volcengine.go`
`ChatStreamlyWithSender`. Per maintainer feedback ("one PR do one
thing"), the unrelated `handler/auth.go` and
`service/heartbeat_sender.go` changes were removed so this PR is scoped
to the model-provider fixes.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Normalize model alias index keys to lowercase
- Detect lowercase alias collisions during provider manager
initialization
- Fix ListModels metadata mapping for mixed-case provider aliases
This PR expands conf/all_models.json with DeepSeek model entries and
provider aliases.
Changes:
- Added DeepSeek model entries across `V4`, `V3.2`, `V3.1`, `V3`, `R1`,
`Coder`, `Math`, `VL`, `OCR`, `Prover`, `MoE`, and `LLM` series.
- Normalized model name values to lowercase canonical IDs.
- Added alias values for official DeepSeek/Hugging Face names and
provider-specific names from OpenRouter, VolcEngine, SiliconFlow,
HuaweiCloud, and QiniuCloud.
- Preserved model metadata such as max_tokens, model_types, and thinking
where applicable.
- Added Gitee ListModels tests to verify DeepSeek aliases map back to
model metadata from all_models.json.
- Added an optional Gitee integration test gated by
GITEE_LIST_MODELS_INTEGRATION=1.
Test:
/usr/local/go/bin/go clean -cache
/usr/local/go/bin/go test ./internal/entity/models -run
'TestGiteeListModels(MapsAllDeepSeekAliasesToModelMetadata|KeepsOwnedBySuffixAfterAliasMetadataLookup|
Integration)'
## Summary
This PR re-enables the Go test steps in CI that were previously
commented out, and fixes all compilation errors that have accumulated in
`internal/entity/models/` since the `ListModels` return type was changed
from `[]string` to `[]ListModelResponse`.
## Changes
### CI (`.github/workflows/tests.yml`)
- Re-enable **Prepare test resources** step (clones resource repo with
WordNet data)
- Re-enable **Test Go packages** step (runs `go test ./internal/...`)
- Fix resource path race condition by using
`/tmp/resource-${GITHUB_RUN_ID}` instead of `/tmp/resource`
- Exclude `/cli` package from Go tests (contains `main` redeclarations)
### Test fixes (16 model provider test files)
All errors were caused by the upstream change from `[]string` to
`[]ListModelResponse` in the `ListModels` interface:
- Add `joinModelNames` test helper to extract `.Name` from
`[]ListModelResponse` slices
- `strings.Join(models, ",")` → `joinModelNames(models, ",")` (11 files)
- `ids[i] != "..."` → `ids[i].Name != "..."` (cometapi, mistral)
- `got[i] != want[i]` → `got[i].Name != want[i]` (bedrock)
- `[]string` return types → `[]ListModelResponse` (google)
### Pre-existing bugs in model_test.go
Bugs introduced by the upstream `entity/` → `entity/models/` directory
rename:
- Add missing `pm := GetProviderManager()` calls in 3 test functions
- Fix `InitProviderManager` signature (`_, err :=` → `err :=`)
- Fix `MaxTokens` `*int` dereference (6 comparisons)
- Fix `readProviderConfig` relative path (3 levels up instead of 2)
### model.go
- Add `findRepoRoot()` to make `conf/all_models.json` resolution work
from any CWD, fixing `TestSiliconFlowProviderConfigLoadsLatestProModels`
### Test validation
```bash
go build ./internal/... # ✅
go test ./internal/entity/models/... -count=1 # ✅ all pass
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
### What problem does this PR solve?
Fixes four panic / spurious-error paths in the Go model layer. Closes
#15818.
| # | File | Bug | Fix |
|---|------|-----|-----|
| 1 | | Thinking-mode streaming path: accessed unconditionally; Gemini
emits usage-only chunks with an empty slice, causing a runtime panic |
Guard each step: , , before indexing |
| 2 | | is a plain for ordinary requests; the cast to silently returns ,
then panics immediately | Switch on concrete type; handle both and |
| 3 | | Identical panic on the streaming path | Same switch-on-type fix
|
| 4 | | The field is optional (absent for non-thinking models) but the
code returned an error when it was missing, breaking every ordinary
Ollama completion | Change to a silent comma-ok assertion; is empty
string when the field is absent |
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Problem
`TestXiaomiNewModelWithCustomDefaultTransport` panics on Go 1.25:
```
panic: interface conversion: http.RoundTripper is models.roundTripperFunc, not *http.Transport
```
In Go 1.25, `http.DefaultTransport` is no longer `*http.Transport`, so
the unchecked type assertion in `NewXiaomiModel` panics when the test
replaces it with a `roundTripperFunc`.
## Fix
Use a safe type assertion with fallback to a new `http.Transport`,
matching the pattern already used in `modelscope.go`.
## Verification
```bash
go test -run TestXiaomiNewModelWithCustomDefaultTransport ./internal/entity/models/...
# PASS
```
Internal contributors only.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
## What problem does this PR solve?
Go test files are never compiled in CI — only production binaries via
`go build`. This allowed a missing `"sort"` import in
`metadata_filter_test.go` to be merged without detection.
## Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
## Changes
- Add `go test -count=1 ./internal/...` step after Go build in CI
workflow
- Fix missing `"sort"` import in `metadata_filter_test.go` (pre-existing
compile error)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
## Summary
Normalizes Qwen model-family names before reasoning extraction so
provider-prefixed Qwen models use the existing `<think>...</think>`
fallback.
## Summary
- keep MiniMax chat calls in non-streaming mode and streaming calls in
SSE mode
- make MiniMax model listing and connection checks use a bodyless GET
/v1/models
- add focused MiniMax request/response regression tests
### What problem does this PR solve?
**Verified from CLI**
```
RAGFlow(user)> chat with 'mimo-v2.5@test@xiaomi' message 'who r u'
Answer: Hello! I'm MiMo-v2.5, a large language model developed by Xiaomi's LLM Core Team. You can think of me as a friendly AI assistant ready to help you answer questions, have conversations, or work on creative tasks. My context window can handle up to 1 million tokens, so we can dive into pretty long discussions or documents if you'd like. What can I help you with today?
Time: 3.831830
RAGFlow(user)> stream chat with 'mimo-v2.5@test@xiaomi' message 'who r u'
Answer: there! I'm MiMo-v2.5, an AI assistant created by the Xiaomi LLM Core Team. I'm here to chat, help out, answer questions, or just have a friendly conversation. Think of me as a helpful buddy with a pretty big memory (1 million tokens worth!). What can I do for you today?😊
Time: 2.421630
RAGFlow(user)> think chat with 'mimo-v2.5@test@xiaomi' message 'who r u'
Thinking: The user is asking a simple question about who I am. According to my system prompt, I should:
- Identify myself as **MiMo-v2.5**
- State that I was developed by the **Xiaomi LLM Core Team**
- Answer in first person and be warm and conversational
Answer: Hey there! 👋
I'm **MiMo**, an AI assistant created by the **Xiaomi LLM Core Team**. Think of me as a friendly chat buddy who's here to help you with all sorts of questions and tasks!
I love having conversations, answering questions, brainstorming ideas, and helping people figure things out. Whether you want to chat, need help with something specific, or just want to explore ideas together — I'm here for it! 😊
What can I help you with today?
Time: 6.651589
RAGFlow(user)> tts with 'mimo-v2.5-tts@test@xiaomi' text 'hello? show yourself' play format 'wav' param '{"voice": "Chloe"}'
SUCCESS
RAGFlow(user)> asr with 'mimo-v2.5-asr@test@xiaomi' audio './internal/test.wav' param '{"language": "zh"}'
+------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------+
| 1 The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. |
+------------------------------------------------------------------------------------------------------------------------+
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
## Summary
- keep Moonshot chat calls in non-streaming mode and streaming calls in
SSE mode
- make Moonshot model listing and balance checks use bodyless GET
requests
- add focused Moonshot request/response regression tests
### What problem does this PR solve?
Closes#15611.
RAGFlow's fallback reasoning parser only recognized the exact model
family `qwen3`. For provider-prefixed Qwen model names such as
SiliconFlow's `qwen/qwen3-8b`, the derived model class can be
`qwen/qwen3`, so inline `<think>...</think>` content was not split from
the visible answer when `reasoning_content` was absent.
This PR normalizes model-family detection before fallback reasoning
extraction, keeps the parser nil-safe, and adds focused tests for Qwen3
variants plus Gitee and SiliconFlow chat responses.
It also makes SiliconFlow propagate `ChatConfig.Thinking` into the chat
request body, matching the existing Gitee behavior, so Qwen thinking
mode is actually enabled when requested.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### Validation
- `/root/go/bin/gofmt -l internal/entity/models/common.go
internal/entity/models/common_test.go
internal/entity/models/reasoning_family_provider_test.go
internal/entity/models/siliconflow.go`
- `git diff --check`
- `/root/go/bin/go test ./internal/entity/models -run
'Test(NormalizeModelFamily|GetThinkingAndAnswer|GiteeChatExtractsQwenThinkingFromInlineContent|SiliconflowChatExtractsProviderPrefixedQwenThinkingFromInlineContent)'
-vet=off -count=1`
Note: the full package command `/root/go/bin/go test
./internal/entity/models -vet=off -count=1` now runs locally, but it
currently fails on an unrelated existing
`TestAstraflowEmbedReturnsNoSuchMethod` panic in
`internal/entity/models/astraflow.go:482`.
### What problem does this PR solve?
Fixes#15542.
AWS Bedrock support for the Go model provider layer was added in #15166,
but embedding support was intentionally left out of scope and
`BedrockModel.Embed(...)` still returned the `no such method` sentinel.
This PR implements Bedrock text embeddings under the umbrella provider
tracker #14736.
### What this PR includes
- `internal/entity/models/bedrock.go`: implement
`BedrockModel.Embed(...)` through Bedrock Runtime `InvokeModel` with
existing SigV4 auth, region resolution, and runtime URL helpers.
- Titan embeddings: supports `amazon.titan-embed-text-v1` and
`amazon.titan-embed-text-v2:0`; v2 forwards `EmbeddingConfig.Dimension`
as `dimensions` when provided, while v1 keeps the payload minimal.
- Cohere embeddings: supports `cohere.embed-english-v3`,
`cohere.embed-multilingual-v3`, and `cohere.embed-v4:0`; batches input
texts and maps returned vectors to RAGFlow `EmbeddingData` in input
order.
- `conf/models/bedrock.json`: adds the `embedding` URL suffix (`invoke`)
and Bedrock embedding model entries.
- `internal/entity/models/bedrock_test.go`: adds unit tests for Titan,
Cohere, typed Cohere responses, validation, empty input, unsupported
models, and HTTP error propagation.
Reference docs:
- Bedrock InvokeModel API:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html
- Titan Text Embeddings:
https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html
- Cohere Embed models on Bedrock:
https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### How was this tested?
- [x] `jq empty conf/models/bedrock.json`
- [x] `git diff --check`
- [x] `go test ./internal/entity/models/... -run Bedrock -count=1`
- [x] `go test ./internal/entity/models/... -run '^$' -count=1`
- [x] `go test ./internal/entity/models/... -run Bedrock -race -count=1`
Note: `go test ./internal/entity/models/... -count=1` currently fails in
unrelated existing Astraflow coverage
(`TestAstraflowEmbedReturnsNoSuchMethod` panics in
`internal/entity/models/astraflow.go`). The Bedrock-specific tests and
compile-only package check pass.
### What problem does this PR solve?
Adds a shared safe default implementation for unsupported Go
model-driver capability methods and migrates the confirmed panic-stub
providers to use it.
The Go `ModelDriver` interface requires providers to implement many
capability methods even when the provider does not support them. XunFei
had unsupported capability methods implemented as `panic("implement
me")`, Mistral still had a panic in `ParseFile`, and HuaweiCloud carried
an unreachable `panic("implement me")` after a normal chat return.
### Type of change
- [x] Refactoring
Co-authored-by: Haruko386 <tryeverypossible@163.com>
### What problem does this PR solve?
1. Add license announcement
2. Add sanity check on API config
3. Add base class: BaseModel
4. Add GetBaseURL
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Closes#15379
Around 29 Go model providers in `internal/entity/models/` share an
`http.Client` configured with `Timeout: 120 * time.Second`, and reuse
that same client for `ChatStreamlyWithSender`. Go's
`http.Client.Timeout` is a hard ceiling on the whole request that also
covers reading the response body, so it behaves as a wall clock on
streaming. Any streamed chat response that lasts longer than 120 seconds
gets cut off in the middle with a timeout error. Long generations,
reasoning model outputs, and slow or overloaded upstreams are the common
victims.
The providers that already behave correctly (`groq`, `mistral`,
`voyage`, `anthropic`) set no client `Timeout` and instead wrap each
request in a `context.WithTimeout`. This change converges the affected
providers onto that same pattern.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Harden `NewN1NModel` to avoid panics when `http.DefaultTransport` is a
custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `n1n_test.go` with coverage for name/factory plus
`TestN1NNewModelWithCustomDefaultTransport`.
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
- Add custom `base_url` support to the Google Go model driver.
- Preserve Google URL suffix configuration when creating custom base URL
driver instances.
- Validate Google chat/stream request inputs before constructing the SDK
client.
- Cover Google model listing, connection checks, base URL resolution,
and request validation with focused tests.
## What changed
- `GoogleModel.NewInstance` now returns a Google driver configured with
the supplied base URL map.
- Google SDK client creation now resolves configured base URLs through
`genai.HTTPOptions.BaseURL`.
- Base URL lookup supports configured regions, empty-region keys, and
`default` fallback.
- Google chat, streaming chat, embeddings, and model listing now reject
blank API keys before creating SDK clients.
- Google chat and streaming chat now reject blank model names locally,
and streaming chat rejects a nil sender.
- Existing message handling, embeddings, pagination, and provider errors
are preserved.
## Why
Google custom model instances could not use configured base URLs because
`NewInstance` returned `nil` and the SDK client path ignored the driver
base URL map. The request validation keeps invalid Google calls from
reaching SDK client construction with blank credentials or incomplete
chat inputs.
### What problem does this PR solve?
The Go GPUStack driver returned a stub error for `Embed()` even though
GPUStack exposes OpenAI-compatible embeddings on the **v1-openai** route
(not `v1/embeddings`).
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Validate Hunyuan embedding model name and API key before building
requests.
- Reuse region-aware base URL validation for embedding requests.
- Replace the stale unsupported Embed test with happy-path and
validation coverage.
## What changed
- Added early Hunyuan Embed validation for missing model names and API
keys.
- Routed Embed through the same base URL region guard used by the other
Hunyuan methods.
- Updated Hunyuan tests to configure the embedding suffix and cover
Embed success plus invalid inputs.
## Why
Hunyuan Embed is implemented, but the existing test still expected it to
be unsupported and could panic before returning a normal validation
error. This keeps the implemented embedding path aligned with the
current driver behavior and prevents nil input panics.
Closes#15087
Refs #14736
### Summary
Closes#15381
Every provider in `internal/entity/models/` reads its streaming response
with `bufio.NewScanner(resp.Body)` and iterates over `scanner.Scan()`.
The default `bufio.Scanner` maximum token size is 64KB, so when an
upstream sends a single SSE `data:` line larger than 64KB (long content
deltas, large tool or function call argument blobs, bundled
`reasoning_content`, or providers that emit a whole message in one
event) `scanner.Scan()` returns `false` and `scanner.Err()` returns
`bufio.ErrTooLong`. Streaming chat then ends with an error partway
through the response.
This change adds `scanner.Buffer(make([]byte, 64*1024), 1024*1024)`
immediately after every SSE scanner that was still bare, raising the cap
to 1MB. 1MB is the value already used for streaming chat in `openai.go`,
`modelscope.go`, `groq.go`, `mistral.go`, `xai.go` and the other already
patched providers (the 8MB cap in the repo is reserved for TTS and
embedding paths), so this simply converges the remaining providers onto
the established pattern. Nothing else changes: line parsing, `data:`
prefix handling, `[DONE]` detection, JSON unmarshalling, error handling,
and the existing `scanner.Err()` checks all stay the same.
Providers covered (23 scanners across 22 files): 302ai, aliyun,
baichuan, baidu, cohere, deepinfra, deepseek, gitee, huggingface,
lmstudio, minimax (the chat scanner, whose TTS scanner was already
bumped), moonshot, nvidia, ollama, openrouter, orcarouter, paddleocr,
siliconflow, tokenhub, vllm, volcengine, xunfei, zhipu-ai. `jiekouai.go`
is excluded because it is covered by the in flight #15337.
A table driven regression test (`sse_scanner_buffer_test.go`) streams a
single 128KB `data:` content delta followed by `data: [DONE]` through an
`httptest` server and asserts that `ChatStreamlyWithSender` delivers the
full content with no error across a representative subset of providers.
Without the buffer fix the test fails with `bufio.Scanner: token too
long`.
This PR also removes three duplicate declarations of the package level
`roundTripperFunc` test helper that several recently merged provider PRs
each added independently, which had left the `internal/entity/models`
test package unable to compile. The helper now lives in a single place
and is shared.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Harden `NewNovitaModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Harden JieKouAI request validation before outbound provider calls
- Force non-streaming and streaming chat methods to use their expected
stream modes
- Make model listing use a bodyless GET and parse model responses
without panics
Closes#14736
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Harden `NewModelScopeModel` to avoid panics when
`http.DefaultTransport` is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestModelScopeNewModelWithCustomDefaultTransport` regression
coverage.
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Harden `NewVoyageModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestVoyageNewModelWithCustomDefaultTransport` regression
coverage.
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
- Harden `NewLongCatModel` to avoid panics when `http.DefaultTransport`
is a custom non-`*http.Transport` RoundTripper.
- Fallback to a safe transport (`ProxyFromEnvironment`) while preserving
existing pooling/timeout settings.
- Add `TestLongCatNewModelWithCustomDefaultTransport` regression
coverage.
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
- Harden the 302.AI model driver request validation and response parsing
paths.
- Add focused tests for chat request mode, model listing, malformed
provider responses, and input validation.
## What changed
- Validate API keys, model names, rerank queries, ASR file paths, OCR
inputs, parse URLs, task IDs, and model-list IDs before use.
- Keep chat and streaming methods from accepting conflicting `stream`
values in request payloads.
- Send `ListModels` as a bodyless GET and parse the response with typed
JSON structs instead of unchecked assertions.
- Remove raw SSE event logging from stream handling.
## Why
The driver could panic or send inconsistent requests when optional
config fields were nil, empty, malformed, or contradicted the method
path. This keeps provider-driver behavior explicit while preserving the
existing supported 302.AI flows.
Closes#14736
## Summary
- centralize TokenHub chat request validation for chat and streaming
calls
- reject blank TokenHub model names before sending provider requests
- send TokenHub model listing requests as bodyless GET requests
## What changed
- Added shared TokenHub chat request validation for API key, model name,
and messages.
- Updated `ListModels` to call `GET /models` without a request body.
- Added focused tests for blank model names and accidental GET request
bodies.
- Replaced an httptest handler callback `t.Fatalf` with `t.Errorf` plus
an HTTP error and return.
## Why
TokenHub chat requests should fail locally for invalid model names
instead of sending avoidable malformed requests upstream. Model listing
should also match normal GET semantics and avoid sending an empty JSON
body.
Closes#14736
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring