Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-06-29 23:41:12 +08:00

### What problem does this PR solve?

Fixes #15286. When calling `/api/v1/openai/<chat_id>/chat/completions` with `"stream": true`, the response contains the answer **twice**: the final message repeats everything that was already streamed.

#### Root cause

RAGFlow's `async_chat` streams the body as incremental `delta.content` chunks, then emits a terminating `final` event whose `answer` is the **complete** (decorated) message. The handler re-emitted that full answer as one more `delta.content` chunk:

```python
if ans.get("final"):
    if ans.get("answer"):
        full_content = ans["answer"]
        response["choices"][0]["delta"]["content"] = full_content  # <-- whole answer again
        yield ...
```

So a client accumulating `delta.content` ends up with the message duplicated.

#### Fix

Drop the re-emission. The complete answer from the `final` event is now surfaced **only** through the trailing chunk's `final_content` and `reference` fields, which matches OpenAI streaming semantics: deltas are incremental, and the final chunk carries only `finish_reason` / `usage` (plus RAGFlow's `reference` / `final_content` extensions).

This matches the expected behavior described in the issue: "The stream should only yield content chunks once, and the final message should only contain reference, usage, and finish_reason."

#### Testability refactor

The streaming SSE assembly was a closure inside the request handler, so it could only be exercised against a live server + real LLM. I extracted it into a module-level `_stream_chat_completion_sse` async generator (behavior-preserving) so it can be unit-tested with a fake event stream.

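For illustration only, here is a minimal sketch of the idea, not the actual `_stream_chat_completion_sse` implementation. The names `fake_events`, `stream_sse_sketch`, and `collect` are hypothetical. The event shape (incremental text in `answer` for non-final events), the placement of `final_content` / `reference` inside `delta`, and the `[DONE]` sentinel are assumptions. `usage` is omitted for brevity.

```python
import asyncio
import json


async def fake_events():
    """Hypothetical stand-in for the event stream: two deltas, then a final event."""
    yield {"answer": "Hello, "}
    yield {"answer": "world"}
    yield {"final": True, "answer": "Hello, world", "reference": {"chunks": []}}


async def stream_sse_sketch(events):
    """Yield SSE lines; the final event's full answer is never sent as delta.content."""
    final_content, reference = None, {}
    async for ans in events:
        if ans.get("final"):
            # Remember the complete answer, but do not emit it as a delta.
            final_content = ans.get("answer")
            reference = ans.get("reference", {})
            continue
        chunk = {"choices": [{"delta": {"content": ans["answer"]}, "finish_reason": None}]}
        yield f"data:{json.dumps(chunk, ensure_ascii=False)}\n\n"
    # Trailing chunk: no content, only finish_reason plus the extension fields
    # (their placement under "delta" is an assumption of this sketch).
    last = {
        "choices": [{
            "delta": {"content": None, "final_content": final_content, "reference": reference},
            "finish_reason": "stop",
        }]
    }
    yield f"data:{json.dumps(last, ensure_ascii=False)}\n\n"
    yield "data:[DONE]\n\n"


async def collect(events):
    """Parse the SSE lines back into dicts, as a client or unit test would."""
    chunks = []
    async for line in stream_sse_sketch(events):
        payload = line[len("data:"):].strip()
        if payload != "[DONE]":
            chunks.append(json.loads(payload))
    return chunks


chunks = asyncio.run(collect(fake_events()))
# Accumulate delta.content the way a streaming client would.
streamed = "".join(c["choices"][0]["delta"]["content"] or "" for c in chunks)
```

Because the generator takes the event stream as an argument, a test can feed it a fake stream and check that the accumulated `streamed` text equals the answer exactly once, with no live server or LLM involved.
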
#### Tests

Adds `test/unit_test/api/apps/restful_apis/test_openai_stream_no_duplicate.py` (same import-stub pattern as the existing `test_get_agent_session.py`):

- body is streamed exactly once (the regression);
- the complete answer is never re-emitted as a content chunk;
- the terminating chunk has `finish_reason="stop"`, `content=None`, and correct `usage`;
- `final_content` / `reference` are present on the trailing chunk;
- reasoning (`think`) deltas stream separately and are not duplicated.

> Note: this is unrelated to #15442, which only changes the `stream` default; it does not touch the duplication logic.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Added test cases

---------

Co-authored-by: Wang Qi <wangq8@outlook.com>