4 Commits

Author SHA1 Message Date
Haruko386
3e90d303e0 Go: implement provider: CoHere and FishAudio (#14790)
### What problem does this PR solve?

This PR completes the Cohere provider integration (upgrading to the new
Cohere V2 API) and enhances the Fish Audio provider in RAGFlow.

**The following functionalities are now supported:**

**Cohere:**
- [x] Chat / Think Chat / Stream Chat / Stream Think Chat
- [x] Embedding
- [x] Rerank
- [x] Model listing
- [x] Provider connection checking
- [ ] Balance

**Fish Audio:**
- [x] Model listing (`ListModels`)
- [x] Balance (`Balance`)

-----

**Verified examples from the CLI:**

```plaintext

# Cohere

RAGFlow(user)> think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho'
Thinking: Okay, the user wrote "jumperwho". Let me try to figure out what they might be asking. First, I'll check if it's a misspelling. "Jumper" ...... Hmm. Since the query is unclear, the best approach is to ask the user to provide more context or correct any possible typos.
Answer: It seems there might be a typo or missing context in your query "jumperwho." Could you clarify what you're referring to? For example:
- Are you asking about a **jumper** (a type of sweater, a person who jumps, or a component in electronics)?
- Is this related to a specific context, like a movie (e.g., the 2008 film *Jumper*) or a game?
- Did you mean to ask about a person ("who") associated with jumping (e.g., a parachutist)?

Let me know so I can provide a helpful response! 😊
Time: 6.710331

RAGFlow(user)> stream think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho'
Thinking: , the user mentioned "jumperwho". Let me try to figure out what they're referring to. First, I'll check if it's a misspelling. "Jumper" could be a typo for "jumper" or maybe a username. Alternatively, it might be a combination of words like "jumper who",....... the best approach is to inform the user that I don't recognize the term and ask if they can provide more context or clarify what they mean by "jumperwho". That way, I can assist them better once I have more information.
Answer:  seems "jumperwho" isn't a widely recognized term, proper noun, or acronym in common usage. Could you provide more context or clarify what you mean by "jumperwho"? This will help me understand your question or request better!
Time: 4.513596

RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'embed-v4.0@test3@cohere' dimension 16;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| embedding                                                                                                                                                                                                                                                        | index |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| [-0.016643638 -0.001957038 0.0055713872 0.009027058 0.05275187 -0.024542313 -0.044006906 0.024119169 0.0014192933 0.006558722 0.0019129605 -0.021016119 -0.026516981 -0.017489925 0.021298215 0.017772019 0.04569948 0.008886009 0.012059584 -0.0014721862 0.... | 0     |
| [0.018778935 -0.0063459855 -0.0006839742 0.0046623563 0.0067668925 -0.018001877 -0.03963003 0.035744734 -0.014246088 -0.0020721585 -0.006313608 0.025124922 -0.010749322 0.01217393 -0.010231283 -0.025254432 0.021498645 -0.028880708 0.019167464 -0.0058279... | 1     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank-v4.0-pro@test@cohere' top 3;
+-------+-----------------+
| index | relevance_score |
+-------+-----------------+
| 0     | 0.91744334      |
| 1     | 0.7458429       |
| 2     | 0.68729424      |
+-------+-----------------+

RAGFlow(user)> list supported models from 'cohere' 'test'
+-------------------------------------+
| model_name                          |
+-------------------------------------+
| c4ai-aya-expanse-32b                |
| c4ai-aya-vision-32b                 |
| cohere-transcribe-03-2026           |
| command-a-03-2025                   |
| command-a-reasoning-08-2025         |
| command-a-translate-08-2025         |
| command-a-vision-07-2025            |
| command-r-08-2024                   |
| command-r-plus-08-2024              |
| command-r7b-12-2024                 |
| command-r7b-arabic-02-2025          |
| embed-english-light-v3.0            |
| embed-english-light-v3.0-image      |
| embed-english-v3.0                  |
| embed-english-v3.0-image            |
| embed-multilingual-light-v3.0       |
| embed-multilingual-light-v3.0-image |
| embed-multilingual-v3.0             |
| embed-multilingual-v3.0-image       |
| embed-v4.0                          |
+-------------------------------------+

RAGFlow(user)> check instance 'test' from 'cohere'
SUCCESS


# FishAudio

RAGFlow(user)> list supported models from 'fishaudio' 'test'
+----------------------------------------+
| model_name                             |
+----------------------------------------+
| Valentino Narración Biblica Fer        |
| Super Smash Bros. 4/Ultimate Announcer |
| Farid Dieck                            |
| عصام الشوالي                           |
| ALEX_CHIKNA                            |
| Energetic Male                         |
| voz de locutor k                       |
| يي                                     |
| ELITE                                  |
| Mortal Kombat                          |
+----------------------------------------+

RAGFlow(user)> show balance from 'fishaudio' 'test'
+----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+
| _id                              | created_at                  | credit | has_free_credit | has_phone_sha256 | updated_at                  | user_id                          |
+----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+
| 82ffec12cf984d88a30ec504d7909812 | 2026-05-09T07:52:16.119000Z | 0      |                 | false            | 2026-05-09T07:52:16.119000Z | 2578ab1126804d6eaa630552400d7ff3 |
+----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-05-11 20:18:38 +08:00
Renzo
39ee2fb120 Go: implement Rerank in NVIDIA driver (#14778)
## Summary

- Replaces the `"no such method"` stub on `NvidiaModel.Rerank`
(`internal/entity/models/nvidia.go`) with a real implementation against
NVIDIA NIM's `/ranking` endpoint.
- Mirrors the existing Python `NvidiaRerank` class at
`rag/llm/rerank_model.py:149-190` for behavior parity: same
`passages`/`query.text`/`logit` payload shape; `top_n` set to
`len(documents)` so every input gets a score returned in original order
(the issue body's spec omitted `top_n`, which would cause silent data
loss).
- Adds the `"rerank": "ranking"` URL suffix and two NIM rerank model
entries (`nvidia/nv-rerankqa-mistral-4b-v3`,
`nvidia/llama-3.2-nv-rerankqa-1b-v2`) to `conf/models/nvidia.json` so
the picker exposes them.
- Follows the same shape as the recently merged Aliyun (#14676), Gitee
(#14656), and ZhipuAI (#14608) Rerank implementations: lowercase
per-driver request/response types, conversion to the project-wide
`RerankResponse{Data: []RerankResult}`, per-call `context.WithTimeout`
of 30s.
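
The payload and conversion described above can be sketched as follows. This is a minimal illustration, not the code in `nvidia.go`: the type and function names (`nvidiaRerankRequest`, `buildRerankRequest`, `toRerankResponse`), the `rankings` response key, and the fields of `RerankResult` are assumptions made for the example. Only the `passages`/`query.text`/`logit` shape and `top_n = len(documents)` come from the summary.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Lowercase per-driver types; names are hypothetical.
type nvidiaText struct {
	Text string `json:"text"`
}

type nvidiaRerankRequest struct {
	Model    string       `json:"model"`
	Query    nvidiaText   `json:"query"`
	Passages []nvidiaText `json:"passages"`
	TopN     int          `json:"top_n"`
}

type nvidiaRanking struct {
	Index int     `json:"index"`
	Logit float64 `json:"logit"`
}

// The "rankings" key is an assumption about the upstream response.
type nvidiaRerankResponse struct {
	Rankings []nvidiaRanking `json:"rankings"`
}

// Stand-ins for the project-wide types; field names are assumed.
type RerankResult struct {
	Index          int
	RelevanceScore float64
}

type RerankResponse struct {
	Data []RerankResult
}

// buildRerankRequest sets top_n to the number of documents so that
// every input receives a score.
func buildRerankRequest(model, query string, documents []string) nvidiaRerankRequest {
	passages := make([]nvidiaText, len(documents))
	for i, d := range documents {
		passages[i] = nvidiaText{Text: d}
	}
	return nvidiaRerankRequest{
		Model:    model,
		Query:    nvidiaText{Text: query},
		Passages: passages,
		TopN:     len(documents),
	}
}

// toRerankResponse places each logit at its original input position.
func toRerankResponse(resp nvidiaRerankResponse, n int) (RerankResponse, error) {
	data := make([]RerankResult, n)
	for i := range data {
		data[i].Index = i
	}
	for _, r := range resp.Rankings {
		if r.Index < 0 || r.Index >= n {
			return RerankResponse{}, fmt.Errorf("ranking index %d out of range [0,%d)", r.Index, n)
		}
		data[r.Index].RelevanceScore = r.Logit
	}
	return RerankResponse{Data: data}, nil
}

func main() {
	docs := []string{"rag is retrieval augment generation", "rag need llm"}
	req := buildRerankRequest("nvidia/nv-rerankqa-mistral-4b-v3", "what is rag", docs)
	payload, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))

	// Hypothetical upstream reply, sorted by score rather than input order.
	raw := []byte(`{"rankings":[{"index":1,"logit":-3.2},{"index":0,"logit":4.1}]}`)
	var upstream nvidiaRerankResponse
	if err := json.Unmarshal(raw, &upstream); err != nil {
		panic(err)
	}
	out, err := toRerankResponse(upstream, len(docs))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```

Writing scores back by `index` keeps the result aligned with the caller's document slice regardless of how the upstream sorts its rankings.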

Closes #14720

## Test plan

- [x] `gofmt -l internal/entity/models/nvidia.go` — clean
- [x] `go vet ./internal/entity/models/...` — no new errors introduced
(the two pre-existing vet errors in `baidu.go:642` and
`openrouter.go:566` are unrelated to this PR)
- [x] `go build ./internal/entity/models/...` — succeeds
- [x] `python3 -c "import json;
json.load(open('conf/models/nvidia.json'))"` — JSON valid
- [ ] Live smoke test against NVIDIA NIM with a real API key (requires
reviewer with NIM credentials)

## Notes for reviewers

- The issue body suggested omitting `top_n`. The Python reference
includes it (`top_n: len(texts)`), and without it NVIDIA returns only
the default top-K rankings rather than scores for every input. This PR
follows the Python reference.
- The URL host is `integrate.api.nvidia.com` (kept consistent with the
existing chat/embeddings BaseURL in `nvidia.go`), not the legacy
`ai.api.nvidia.com` host the Python uses. NIM's unified endpoint accepts
the model names as-is, so no per-model URL transform is needed.
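
As a rough sketch of how a single base URL plus a per-capability suffix could yield the endpoint without any per-model transform: the helper name `resolveEndpoint`, the `"default"` region key, and the `/v1` base path used here are assumptions for illustration, not taken from `nvidia.go` or `conf/models/nvidia.json`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveEndpoint picks the base URL for a region (falling back to a
// hypothetical "default" entry) and appends the suffix with exactly
// one slash between the two parts.
func resolveEndpoint(baseURLs map[string]string, region, suffix string) (string, error) {
	base := baseURLs[region]
	if base == "" {
		base = baseURLs["default"]
	}
	if base == "" {
		return "", errors.New("no base URL configured")
	}
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(suffix, "/"), nil
}

func main() {
	// Assumed configuration values for the example.
	bases := map[string]string{"default": "https://integrate.api.nvidia.com/v1"}
	endpoint, err := resolveEndpoint(bases, "default", "ranking")
	if err != nil {
		panic(err)
	}
	fmt.Println(endpoint)
}
```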
2026-05-11 17:21:16 +08:00
BitToby
4b96362092 Go: implement Encode (embeddings) in NVIDIA driver (#14700)
### What problem does this PR solve?

The NVIDIA Go driver in `internal/entity/models/nvidia.go` shipped with
a stub `Encode` method that returned `no such method`.
`conf/models/nvidia.json` already lists
`nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1` as an embedding model,
but the conf had no `embedding` URL suffix, so nothing was wired up for
the picker even if `Encode` had worked.

A tenant who wanted to use NVIDIA NIM for both chat (already working)
and embeddings from a single provider could not, even though the
upstream endpoint is public at
`https://integrate.api.nvidia.com/v1/embeddings` and uses an
OpenAI-compatible request body extended with the NVIDIA-specific
`input_type` and `truncate` fields. Several other Go drivers already
implement `Encode` (siliconflow, zhipu-ai, aliyun), so the interface and
the pattern are well-established.

This PR fills the gap.

### What this PR includes

* `conf/models/nvidia.json`: declare the `embedding` URL suffix
  alongside the existing `chat` and `models` entries. The embedding
  model entry was already present, so no model addition is needed.
* `internal/entity/models/nvidia.go`: replace the `Encode` stub with a
  real implementation. Adds a small local response type that matches
  the OpenAI-compatible shape NVIDIA NIM returns.

No factory change. No interface change.

### How the driver works

* Validates `apiConfig` and the API key, validates the model name,
  resolves the region with a default fallback (matching the pattern the
  merged `ListModels` and `CheckConnection` paths in this driver already
  use), and builds the URL from `BaseURL[region] + URLSuffix.Embedding`.
* Sends all input texts in one request as the `input` array, with
  `encoding_format: "float"` and the NVIDIA-specific
  `input_type: "query"` and `truncate: "END"` fields, mirroring the
  Python `NvidiaEmbed` reference.
* Parses `data[*].embedding` and copies each slice into `[][]float64`
  indexed by `data[*].index` so the output order matches the input order
  even if the API returns items in a different order.
* Handles both `float64` and `float32` element types.
* Empty input returns `[][]float64{}` with no HTTP call.
* Non-200 responses propagate the upstream status line and body.
* A final pass checks that every input slot got a vector and returns a
  clear error if any slot is still nil.
* Per-call 30s context deadline so a slow call cannot block forever.
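
The index-based ordering and the final nil-slot check listed above could look roughly like this. It is a sketch under assumptions, not the driver's code: the names `embeddingItem`, `embeddingResponse`, and `orderEmbeddings` are hypothetical, and the response is decoded straight into `[]float64` rather than handling `float32` separately.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local response types matching the OpenAI-compatible shape
// (data[*].index, data[*].embedding); names are hypothetical.
type embeddingItem struct {
	Index     int       `json:"index"`
	Embedding []float64 `json:"embedding"`
}

type embeddingResponse struct {
	Data []embeddingItem `json:"data"`
}

// orderEmbeddings decodes a response body and returns one vector per
// input, positioned by the index reported for each item.
func orderEmbeddings(body []byte, n int) ([][]float64, error) {
	if n == 0 {
		// Empty input: nothing to decode.
		return [][]float64{}, nil
	}
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode embedding response: %w", err)
	}
	out := make([][]float64, n)
	for _, item := range resp.Data {
		if item.Index < 0 || item.Index >= n {
			return nil, fmt.Errorf("embedding index %d out of range [0,%d)", item.Index, n)
		}
		vec := make([]float64, len(item.Embedding))
		copy(vec, item.Embedding)
		out[item.Index] = vec
	}
	// Final pass: every input slot must have received a vector.
	for i, vec := range out {
		if vec == nil {
			return nil, fmt.Errorf("no embedding returned for input %d", i)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical body with items returned out of order.
	body := []byte(`{"data":[{"index":1,"embedding":[0.5,0.25]},{"index":0,"embedding":[1,2]}]}`)
	vectors, err := orderEmbeddings(body, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(vectors)
}
```

Allocating the output by input count up front is what makes a missing item detectable: a slot the response never filled stays nil and is reported instead of silently shifting later vectors.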

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

* `go build ./internal/entity/models/...` returns exit 0.
* `go vet ./internal/entity/models/...` is clean.
* `gofmt -l internal/entity/models/nvidia.go` is clean.
* The full method set on `NvidiaModel` still matches the `ModelDriver`
interface.
* Pattern parity with the just-merged Aliyun `Encode` (#14647).

Closes #14699
2026-05-11 12:50:50 +08:00
Haruko386
078ea3bf4a Go: implement provider: Nvidia (#14623)
### What problem does this PR solve?

1. **Implement `Nvidia` Provider:** Fully support NVIDIA NIM APIs with
robust parameter handling (including the `thinking` parameter) and safe
URL merging in `NewInstance`.
2. **Fix Misleading CLI Errors:** Correct a bug in `common_command.go`
where failed chat requests were misreported as `failed to list
instance models`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2026-05-07 14:17:57 +08:00