## Summary
Fixes#15245 — `POST /api/v1/chat/completions` with `stream=true`
intermittently returns 500:
```
data:{"code": 500, "message": "failed to encode response: json:
unsupported value: NaN (status code: 500)", "data": {...}}
```
…even though "the same question" works on retry.
## Root cause
The streaming path serialized the answer with bare `json.dumps(...)`
(`api/apps/restful_apis/chat_api.py:1221`). `json.dumps` defaults to
`allow_nan=True` and emits the literal token `NaN` for NaN /
Infinity float values. That is valid Python-flavored JSON but
**invalid per RFC 8259**, so downstream consumers reject it. The
reporter's gateway is Go-based and the error wording
(`failed to encode response: json: unsupported value: NaN`) is
straight from Go's `encoding/json`.
How NaN gets into the payload: retrieval scoring in
`rag/nlp/search.py` runs `np.mean(...)` over aggregations that can
be empty, and similarity denominators can be zero. Reference chunk
fields like `similarity`, `vector_similarity`, `term_similarity`
can therefore be NaN depending on which chunks a given query
retrieves — which is exactly why the failure is intermittent for
the same question.
The non-streaming branch (`get_json_result(data=answer)`,
`chat_api.py:1243`) has the same vulnerability — Quart's `jsonify`
also defaults to `allow_nan=True` and the same retrieval pipeline
feeds both branches.
`agent/tools/exesql.py:88-102` already has the same NaN/Inf guard
for SQL results. This PR brings the chat completions path up to
parity.
## Fix
Add a small `_sanitize_json_floats(obj)` helper near the top of
`api/apps/restful_apis/chat_api.py`. It walks `dict` / `list` /
`tuple` and replaces any `float` that is `NaN` or `±Infinity` with
`None`. Apply it at the two serialization boundaries:
- **Streaming branch** (`stream()`): sanitize the SSE payload before
`json.dumps`.
- **Non-streaming branch**: sanitize the `answer` dict before
`get_json_result(data=...)`.
The terminal `data:True` frame and the `code:500` error frame carry
no scores and are left untouched.
Added `import math` to the existing alphabetical import block.
No change to retrieval logic — replacing NaN with `null` at the
serialization boundary is conservative: clients still parse the
JSON, a missing-score chunk is a strictly better failure mode than
a 500 that kills the whole reply.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Python implementation of the Go-based model_provider API suite.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: bill <yibie_jingnian@163.com>
### What problem does this PR solve?
The agent API currently does not pass chat_template_kwargs to the
underlying LLM call path, so clients cannot control template-level model
behavior (such as thinking-mode toggles) when invoking
/agents/chat/completion. This PR adds passthrough support for
chat_template_kwargs across agent execution flows (session and
non-session, streaming and non-streaming) by propagating it through
canvas runtime state and into LLM invocation kwargs. This addresses the
feature gap raised in [Issue
#14182](https://github.com/infiniflow/ragflow/issues/14182).
Closes#14182
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
implement ASR and TTS for Xinference
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
`GET /agents/<agent_id>/sessions/<session_id>` crashed with
`AttributeError: 'NoneType' object has no attribute 'to_dict'` when the
session lookup failed: `_, conv =
API4ConversationService.get_by_id(...)` returned `(False, None)`, then
`conv.to_dict()` was called unconditionally.
This is reachable in multi-instance deployments: the session row may not
yet be visible on the node servicing the immediate follow-up GET after a
session is created on a different node.
Add the same `if not exists` guard already used by every other call site
of `API4ConversationService.get_by_id` (see agent_api.py:1147,
sdk/session.py:179, conversation_service.py:248, canvas_service.py:323).
Closes#14989
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
POST /api/v1/dify/retrieval resolved the caller via @apikey_required
(injecting tenant_id) but then fetched the requested knowledge_id with
no tenant filter and ran the full retrieval pipeline against
kb.tenant_id (the owner). Any valid Dify-compatible API key could
retrieve chunks from any tenant whose KB UUID was known. Adds the
missing ownership check.
## Root Cause
api/apps/sdk/dify_retrieval.py line 253:
KnowledgebaseService.get_by_id(kb_id) fetched the KB by id alone, then
the handler used kb.tenant_id (the OWNER) to build the embedding model
and call the retriever. The caller tenant_id was only used downstream at
line 278 for retrieval_by_children, well after cross-tenant data was
already retrieved.
grep confirmed there was no KnowledgebaseService.accessible call
anywhere in the handler.
## Fix
Two-line guard immediately after the existing get_by_id lookup,
mirroring the pattern PR #14749 lands for the sibling sdk/doc.py routes
(download, parse, stop_parsing, retrieval_test):
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
return build_error_result(message="Knowledgebase not found!",
code=RetCode.NOT_FOUND)
+ if not KnowledgebaseService.accessible(kb_id, tenant_id):
+ return build_error_result(message="No authorization.",
code=RetCode.AUTHENTICATION_ERROR)
if kb.tenant_embd_id:
...
KnowledgebaseService.accessible already handles solo-tenant ownership,
team membership via TenantService.get_joined_tenants_by_user_id, and the
permission=ME distinction. No behavior change for legitimate callers;
cross-tenant callers now receive RetCode.AUTHENTICATION_ERROR (109).
## Test Plan
- [x] Regression test added:
test/unit_test/api/apps/sdk/test_dify_retrieval.py
- test_cross_tenant_request_is_rejected -- attacker tenant calling owner
tenant KB gets 109; retriever is not invoked
- test_same_tenant_request_succeeds -- owner tenant gets the records
back
- test_missing_knowledge_base_returns_not_found -- missing KB returns
404 BEFORE the access check fires (legit callers see the clearer
message)
- [x] All 3 tests pass after the fix
- [x] Cross-tenant test FAILS on pre-fix main (KeyError on result[code]
because handler leaks records dict instead of returning auth error)
- [x] ruff check clean on both changed files
- [x] No drive-by reformatting in dify_retrieval.py -- only the 2 added
lines
### Post-fix output
test_cross_tenant_request_is_rejected PASSED [ 33%]
test_same_tenant_request_succeeds PASSED [ 66%]
test_missing_knowledge_base_returns_not_found PASSED [100%]
============================== 3 passed in 0.04s
===============================
Closes#15027