## Summary
Add support for **"New API"** as a model provider, enabling connection
to [New API](https://github.com/QuantumNous/new-api) /
[one-api](https://github.com/songquanpeng/one-api) compatible gateways
that aggregate multiple LLM backends behind a unified OpenAI-compatible
`/v1` endpoint.
### Features
- **All model types**: Chat, Embedding, Rerank, Image2Text, TTS,
Speech2Text
- **List Models discovery**: `NewAPI(OpenAIAPICompatible)` class in
`model_meta.py` queries the gateway's `/v1/models` to auto-discover
available models via the native `GET /api/v1/providers/<name>/models`
endpoint
- **Model parameter editing**: Pencil icon on each discovered model row
to edit `model_type`, `max_tokens`, and `features` (e.g. tool call
support) before submitting
- **Custom model addition**: "Add Custom Model" button at the bottom of
the List Models dropdown for models not returned by the API
- **Gear icon settings**: Enabled the Settings gear button on provider
instances to manage models on existing instances (viewMode)
- **viewMode credential passthrough**: Fixed List Models in viewMode —
merges `initialValues` credentials when `api_key`/`base_url` fields are
hidden by `hideWhenInstanceExists`
### Changes
**Backend** (8 files):
- `rag/llm/chat_model.py` — `NewAPIChat(Base)` class
- `rag/llm/embedding_model.py` — `NewAPIEmbed(OpenAIEmbed)` class (no
auto `/v1` append)
- `rag/llm/rerank_model.py` — `NewAPIRerank(Base)` class (uses `/rerank`
endpoint)
- `rag/llm/cv_model.py` — `NewAPICv(GptV4)` class
- `rag/llm/tts_model.py` — `NewAPITTS(OpenAITTS)` class
- `rag/llm/sequence2txt_model.py` — `NewAPISeq2txt(GPTSeq2txt)` class
- `rag/llm/model_meta.py` — `NewAPI(OpenAIAPICompatible)` class for List
Models discovery
- `conf/llm_factories.json` — New API factory entry with all model type
tags
**Frontend** (8 files + 1 new SVG):
- `web/src/assets/svg/llm/new-api.svg` — New API logo icon
- `web/src/constants/llm.ts` — `LLMFactory.NewAPI` enum + `IconMap`
entry
- `web/src/components/svg-icon.tsx` — `NewAPI` added to `svgIcons`
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/field-config/local-llm-configs.ts`
— New API `buildLocalConfig`
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/constants.ts`
— `LIST_MODEL_PROVIDERS` includes NewAPI
- `web/src/pages/user-setting/setting-model/components/used-model.tsx` —
Enable Settings gear button
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-picker.ts`
— viewMode credential merge + model editing state/handlers
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/hooks/use-list-models-options.tsx`
— Pencil edit icon per model row
-
`web/src/pages/user-setting/setting-model/modal/provider-modal/index.tsx`
— `AddCustomModelDialog` import + edit dialog rendering
**Note on Go implementation**: A Go model driver (`NewAPIModel`
delegating to `OpenAIModel`) has been prepared but is deferred until the
Go runtime is enabled in a future release (current v0.26.0 images use
`API_PROXY_SCHEME=python` and do not compile Go binaries). Will submit
as a follow-up PR.
## Related
- Depends on: #15996 (provider instance API improvements — server-side
credential lookup, idempotent `add_model`, security fixes — required for
viewMode gear icon and batch model submission)
## Test plan
- [ ] Add New API provider with api_key and base_url pointing to an
OpenAI-compatible gateway
- [ ] Click "List Models" — should discover and display available models
from `/v1/models`
- [ ] Click pencil icon on a model — should open edit dialog to change
model_type, max_tokens, features
- [ ] Select multiple models and click OK — should add all selected
models
- [ ] Click gear icon on the added instance — should open viewMode with
List Models working
- [ ] In viewMode, select new models including pre-existing ones, click
OK — should succeed (requires #15996)
- [ ] Verify all model types work: create a Chat assistant, Embedding
KB, Rerank setting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Tim Wang <wanghualoong@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### Summary
Closes#15423
`rag/llm/embedding_model.py` hosts about 40 embedding providers that
shared several defects affecting indexing reliability, cost accounting,
and error visibility. This PR fixes four concrete bugs.
**Masked, inconsistent errors (27 sites).** Nearly every provider ran
`log_exception(_e, res)` followed by `raise Exception(f"Error: {res}")`.
Because `log_exception` always raises, the second line was dead code,
and the surfaced exception varied with whether the SDK response exposed
a `.text` attribute. Every failure path now raises a single
`EmbeddingError` that includes the underlying response detail, so the
cause of a failed embedding is consistent and visible.
**Fabricated token counts.** `LocalAIEmbed` returned a hardcoded `1024`
and `OllamaEmbed` added `128` per text. These values feed `used_tokens`
and therefore billing and usage tracking. Both now report the real count
from the API (Ollama `prompt_eval_count`, LocalAI `usage`) and fall back
to a local token count only when the server omits it.
**Truncation overshoot.** The `8196` limit used by Mistral and Bedrock
exceeded the standard `8192` ceiling and could push boundary sized
inputs past the model limit. Limits are corrected to `8192` and made
intentional per provider, and providers that rely on server side
truncation now request it explicitly (Ollama `truncate=True`, Cohere
`truncate="END"`).
**Missing batching on Zhipu and Ollama.** Both issued one request per
text. They now batch like the other OpenAI compatible providers, turning
N round trips into `ceil(N / batch_size)`. Batched results are realigned
by response `index` so a chunk always keeps its own vector.
A shared `Base._batched_encode` helper owns the batch loop, optional
truncation, result accumulation, and the single error path. It is the
mechanism that lets these fixes live in one place instead of across 27
duplicated sites. The public `encode()` and `encode_queries()` contract
stays the same, so existing callers are unaffected.
Tests covering all four fixes are added under
`test/unit_test/rag/llm/test_embedding_model.py`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fixes#15587
## Problem
`AzureEmbed.__init__` in `rag/llm/embedding_model.py` and
`AzureGptV4.__init__` in `rag/llm/cv_model.py` both call
`json.loads(key)` unconditionally:
```python
api_key = json.loads(key).get("api_key", "")
api_version = json.loads(key).get("api_version", "2024-02-01")
```
When a user stores a plain API key string (not a JSON object) in the
model configuration — which is a valid and common way to configure Azure
OpenAI — `json.loads` raises `JSONDecodeError`. This makes the model
fail to initialize and causes document parsing/embedding to return a 500
error.
## Fix
Wrap `json.loads` in `try/except (json.JSONDecodeError, TypeError)` and
fall back to using the raw string as the `api_key` with the default
`api_version`. This is the same pattern already applied to the Azure
chat model in PR #15604.
## Files changed
- `rag/llm/embedding_model.py` — `AzureEmbed.__init__`
- `rag/llm/cv_model.py` — `AzureGptV4.__init__`
Fixes#15857
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
FIx replicate model provider failing with valid api key
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Wang Qi <wangq8@outlook.com>
### What problem does this PR solve?
Fix:
- VolcEngine adapt to new api_key format
- Save dict api_key as json
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Fixes **Tongyi-Qianwen** (`QWenEmbed`) text embeddings when the
configured `base_url` points at DashScope **international**
(`dashscope-intl.aliyuncs.com`) or **China** (`dashscope.aliyuncs.com`)
hosts, including values copied from Model Studio that use the
**OpenAI-compatible** path (`.../compatible-mode/v1`).
- The `dashscope` Python SDK (`TextEmbedding.call`) expects the
**native** HTTP root (`https://<host>/api/v1`), not the
OpenAI-compatible base URL. Without mapping, international accounts
could hit the wrong host or path.
## Implementation
- Added `_dashscope_native_http_api_url()` to normalize known DashScope
hosts to `.../api/v1`, and wired `QWenEmbed` to set
`dashscope.base_http_api_url` before each embedding call (document and
query).
## Notes
- In-code comments document the Tongyi-Qianwen / DashScope intl vs CN
behavior for future maintainers.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
### What problem does this PR solve?
Multiple `requests.post()` calls across the LLM integration layer lack a
`timeout` parameter. Without a timeout, a single unresponsive upstream
service can block the calling thread **indefinitely**, eventually
exhausting the thread pool and degrading the entire system.
This is a well-known issue — Python's `requests` library defaults to
`timeout=None` (infinite wait), and [the library docs explicitly
recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts)
always setting a timeout.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Change
Added `timeout` to all `requests.post()` calls missing it:
| File | Calls fixed | Timeout |
|------|-------------|---------|
| `rag/llm/rerank_model.py` | 9 | 30s |
| `rag/llm/embedding_model.py` | 8 | 30s |
| `rag/llm/cv_model.py` | 3 | 60s |
| `rag/llm/tts_model.py` | 2 | 60s |
| `rag/llm/sequence2txt_model.py` | 2 | 60s |
Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and
audio transcription use 60s (heavier workloads with file uploads).
Note: other files in the codebase (e.g. `check_minio_alive`,
`check_ragflow_server_alive`) already use `timeout=10`, so this PR
brings the LLM layer in line with existing practice.
Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Both tokenizer (`rag/flow/tokenizer/tokenizer.py`) and
`BuiltinEmbed.encode`
(`rag/llm/embedding_model.py`) currently accumulate embedding batches
via
`np.concatenate` inside the per-batch loop. `np.concatenate` allocates a
new
array and copies all existing data on every call, so accumulating N
batches
is O(N²) in both time and peak memory.
Replacing the incremental concatenate with a list-of-batches + a single
`np.vstack` at the end gives O(N) total work.
For tokenizer the title-vector broadcast `np.concatenate([vts[0]] * N)`
is
also replaced by `np.tile`, which does the same job with a single
contiguous
allocation instead of building a Python list of references.
This is purely a CPU/memory optimisation — output shape and dtype are
unchanged. Measured impact grows with document size:
- 1k chunks (batch 512, 2 iters): ~negligible
- 10k chunks (20 iters): ~10× speedup on this stage
- 100k chunks (195 iters): ~100× speedup, and peak RAM
drops from O(N) extra to near-zero
### Type of change
- [x] Performance Improvement
Co-authored-by: yoan sapienza <Yoan Sapienza yoan.sapienza@orange.fr Yoan Sapienza zappy@macbookpro.home>
## Add Astraflow Provider Support
This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud
/ 优刻得) as a new AI model provider in RAGFlow, with support for both
global and China endpoints.
### About Astraflow
Astraflow is an OpenAI-compatible AI model aggregation platform
supporting 200+ models from major providers including DeepSeek, Qwen,
GPT, Claude, Gemini, Llama, Mistral, and more.
| Variant | Factory Name | Endpoint | Env Var |
|---------|-------------|----------|---------|
| Global | `Astraflow` | `https://api-us-ca.umodelverse.ai/v1` |
`ASTRAFLOW_API_KEY` |
| China | `Astraflow-CN` | `https://api.modelverse.cn/v1` |
`ASTRAFLOW_CN_API_KEY` |
- **API key signup**: https://astraflow.ucloud.cn/
---
### Files Changed
| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Register `Astraflow` and `Astraflow-CN` in
`SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and
`LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `AstraflowChat` and `AstraflowCNChat`
(OpenAI-compatible `Base` subclass) |
| `rag/llm/embedding_model.py` | Add `AstraflowEmbed` and
`AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) |
| `rag/llm/rerank_model.py` | Add `AstraflowRerank` and
`AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) |
| `rag/llm/cv_model.py` | Add `AstraflowCV` and `AstraflowCNCV`
(subclasses of `GptV4`) |
| `rag/llm/tts_model.py` | Add `AstraflowTTS` and `AstraflowCNTTS`
(subclasses of `OpenAITTS`) |
| `rag/llm/sequence2txt_model.py` | Add `AstraflowSeq2txt` and
`AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) |
| `conf/llm_factories.json` | Register `Astraflow` and `Astraflow-CN`
factories with a curated list of popular models |
---
### Supported Model Types
- ✅ **Chat / LLM** — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7,
Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more
- ✅ **Text Embedding** — text-embedding-3-small/large
- ✅ **Image / Vision (IMAGE2TEXT)** — GPT-4o, GPT-4.1, Claude, Gemini,
Llama-4, etc.
- ✅ **Text Re-Rank**
- ✅ **TTS** — tts-1
- ✅ **Speech-to-Text (SPEECH2TEXT)** — whisper-1
### Implementation Notes
- Uses the `openai/` LiteLLM prefix — consistent with other
OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI,
OpenRouter, n1n, Avian, etc.)
- `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249)
are separate factory entries, allowing users to choose the optimal
endpoint based on their region.
- All model classes cleanly subclass existing base classes (`Base`,
`OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`)
with no custom logic needed — the provider is fully OpenAI-compatible.
---------
Co-authored-by: user <user@xzaaaMacBook-Air.local>
### What problem does this PR solve?
Adds Perplexity contextualized embeddings API as a new model provider,
as requested in #13610.
- `PerplexityEmbed` provider in `rag/llm/embedding_model.py` supporting
both standard (`/v1/embeddings`) and contextualized
(`/v1/contextualizedembeddings`) endpoints
- All 4 Perplexity embedding models registered in
`conf/llm_factories.json`: `pplx-embed-v1-0.6b`, `pplx-embed-v1-4b`,
`pplx-embed-context-v1-0.6b`, `pplx-embed-context-v1-4b`
- Frontend entries (enum, icon mapping, API key URL) in
`web/src/constants/llm.ts`
- Updated `docs/guides/models/supported_models.mdx`
- 22 unit tests in `test/unit_test/rag/llm/test_perplexity_embed.py`
Perplexity's API returns `base64_int8` encoded embeddings (not
OpenAI-compatible), so this uses a custom `requests`-based
implementation. Contextualized vs standard model is auto-detected from
the model name.
Closes#13610
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
This PR aims to extend the list of possible providers. Adds new Provider
"RAGcon" within the Ollama Modal. It provides all model types except OCR
via Openai-compatible endpoints.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
### What problem does this PR solve?
Refact: switch from oogle-generativeai to google-genai #13132
Refact: commnet out unused pywencai.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Add support `doubao-embedding-vision` model.
`doubao-embedding-large-text` is deprecated.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: bedrock iam authentication #12008
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Only support MinerU-API now, still need to complete frontend for
pipeline to allow the configuration of MinerU options.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fix: presentation parsing #11920
Fix: Embeddin encode exception handling
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: jina embedding issue #11614
Feat: Add jina embedding v4
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Jason <ggbbddjm@gmail.com>
### What problem does this PR solve?
1. Move EMBEDDING_CFG to common.globals
2. Fix error imports
3. Move signal handles to common/signal_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Introduced gpu profile in .env
Added Dockerfile_tei
fix datrie
Removed LIGHTEN flag
### Type of change
- [x] Documentation Update
- [x] Refactoring
### Related issues
#10078
### What problem does this PR solve?
Integrate DeerAPI provider.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
Co-authored-by: DeerAPI <tensor.null@gmail.com>
### What problem does this PR solve?
Rename the CometEmbed and CometSeq2txt classes to CometAPIEmbed and
CometAPISeq2txt, and correct supported_models.mdx.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Related PR:
Feat: add CometAPI to LLMFactory and update related mappings #10119
Change:
Fixes the issue where the embedding model in CometAPI was not being
called correctly
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: TensorNull <tensor.null@gmail.com>
### Related issues
#10078
### What problem does this PR solve?
Integrate CometAPI provider.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
fix text input exceed token num limit when using siliconflow's embedding
model BAAI/bge-large-zh-v1.5 and BAAI/bge-large-en-v1.5, truncate before
input.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Dataflow supports Spreadsheet and Word processor document
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The default value for OpenAI '/v1/embeddings' parameter
'encoding_format' is 'base64'. Use 'float' explicitly to avoid base64
encoding & decoding, larger data size.
https://github.com/openai/openai-python/blob/main/src/openai/resources/embeddings.py
if not is_given(encoding_format):
params["encoding_format"] = "base64"
### Type of change
- [x] Performance Improvement
Updated constructors for base and derived classes in chat, embedding,
rerank, sequence2txt, and tts models to accept **kwargs. This change
improves extensibility and allows passing additional parameters without
breaking existing interfaces.
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: IT: Sop.Son <sop.son@feavn.local>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### What problem does this PR solve?
fix error 429 api rate limit when building knowledge graph for all chat
model and Mistral embedding model.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)