mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
feat(tts): cache synthesized speech in Redis to avoid redundant calls (#14851)
## What problem does this PR solve?

Closes #12017.

TTS output is deterministic for a given `(model, text)` pair, so re-running the same text through the same TTS model produces the same bytes. Yet `Canvas.tts` and `dialog_service.tts` re-synthesized on every request. That's slow and wastes provider quota whenever the same assistant response is replayed, shared across users, or repeated within a session.

### Change

New helper `rag/utils/tts_cache.py` with `synthesize_with_cache(tts_mdl, cleaned_text)`:

- **Key:** `tts:cache:{model_id}:{sha256(text)}`. Each model gets its own namespace, and identical cleaned text reuses a single entry across both call sites.
- **Value:** the hex-encoded audio blob both call sites already returned. No format change for downstream consumers.
- **TTL:** 7 days by default, configurable via `RAGFLOW_TTS_CACHE_TTL_SECONDS`.
- **Failure modes:** a Redis hiccup falls back to direct synthesis; a failed synthesis still returns `None` (existing contract preserved).

[`Canvas.tts`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py#L683-L724) and [`dialog_service.tts`](https://github.com/infiniflow/ragflow/blob/main/api/db/services/dialog_service.py#L1367-L1380) now route through the helper; the per-file bytes-accumulation/hex-encode loop has been removed in favor of one shared implementation.

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Test plan

- [ ] **Cache hit, chat path:** Configure a dialog with TTS enabled, ask the same question twice with `stream=false`. Verify the second response returns the same `audio_binary` and that the second invocation doesn't hit the TTS provider (e.g., observe provider-side logs / usage counters; check no `LLMBundle.tts can't update token usage` log line on the second run).
- [ ] **Cache hit, agent path:** Same exercise via a Conversational Agent that includes a Message component playing back the answer.
- [ ] **Cache isolation per model:** Switch tenant's `tts_id` between two models, run the same text against each, and confirm the second model's first synthesis still happens (no cross-model hits).
- [ ] **TTL override:** Set `RAGFLOW_TTS_CACHE_TTL_SECONDS=120`, confirm the entry expires after 2 minutes.
- [ ] **Redis unavailable:** Stop Redis (or break the connection). Verify the TTS endpoint still works: synthesis falls back to direct calls, with a `TTS cache lookup failed` / `TTS cache store failed` warning logged.
- [ ] **Failure path:** Configure a TTS model with an invalid API key, ensure the response still returns successfully with `audio_binary=None` (no regression vs. current behavior).
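
The caching scheme listed under "Change" can be sketched roughly as follows. This is a minimal illustration, not the contents of `rag/utils/tts_cache.py`: the function name `synthesize_with_cache_sketch`, the explicit `store` and `model_id` parameters, and the `InMemoryStore` and `CountingTTS` stand-ins are all assumptions made so the example runs without Redis or a TTS provider. Only the key layout, the hex-encoded value, the 7-day default TTL, the environment variable name, and the fallback behavior come from the description above.

```python
import hashlib
import logging
import os

DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, the default stated in the PR


def cache_key(model_id: str, text: str) -> str:
    """Build the key `tts:cache:{model_id}:{sha256(text)}`."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"tts:cache:{model_id}:{digest}"


def cache_ttl() -> int:
    """Read the TTL override; fall back to the default on bad or missing input."""
    raw = os.environ.get("RAGFLOW_TTS_CACHE_TTL_SECONDS", "")
    try:
        ttl = int(raw)
    except ValueError:
        return DEFAULT_TTL_SECONDS
    return ttl if ttl > 0 else DEFAULT_TTL_SECONDS


def synthesize_with_cache_sketch(tts_mdl, cleaned_text, store, model_id):
    """Hypothetical helper: return hex-encoded audio, or None if synthesis fails."""
    key = cache_key(model_id, cleaned_text)
    try:
        cached = store.get(key)
        if cached:
            return cached  # cache hit: the provider is not called
    except Exception as e:
        logging.warning(f"TTS cache lookup failed: {e}")

    try:
        audio = b"".join(tts_mdl.tts(cleaned_text))
    except Exception as e:
        logging.error(f"TTS failed: {e}, text={cleaned_text!r}")
        return None  # existing contract: failed synthesis yields None

    hex_audio = audio.hex()
    try:
        store.set(key, hex_audio, cache_ttl())
    except Exception as e:
        logging.warning(f"TTS cache store failed: {e}")
    return hex_audio


class InMemoryStore:
    """Stand-in for a Redis client (assumption); the TTL is accepted but ignored."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ttl):
        self.data[key] = value
        return True


class CountingTTS:
    """Stand-in TTS model (assumption) that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def tts(self, text):
        self.calls += 1
        yield b"\x01\x02"
        yield text.encode("utf-8")


store = InMemoryStore()
model = CountingTTS()
first = synthesize_with_cache_sketch(model, "hello", store, "model-a")
second = synthesize_with_cache_sketch(model, "hello", store, "model-a")
print(first == second, model.calls)
```

In this sketch the second call is served from the store, so the stand-in model is invoked only once. Passing a different `model_id` produces a different key and triggers a fresh synthesis, which mirrors the "cache isolation per model" item in the test plan.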
@@ -17,7 +17,6 @@ import asyncio
 import base64
 import datetime
 import inspect
-import binascii
 import json
 import logging
 import re
@@ -39,6 +38,7 @@ from common.misc_utils import get_uuid, hash_str2int
 from common.exceptions import TaskCanceledException
 from rag.prompts.generator import chunks_format
 from rag.utils.redis_conn import REDIS_CONN
+from rag.utils.tts_cache import synthesize_with_cache
 
 class Graph:
     """
@@ -714,14 +714,7 @@ class Canvas(Graph):
         text = clean_tts_text(text)
         if not text:
             return None
-        bin = b""
-        try:
-            for chunk in tts_mdl.tts(text):
-                bin += chunk
-        except Exception as e:
-            logging.error(f"TTS failed: {e}, text={text!r}")
-            return None
-        return binascii.hexlify(bin).decode("utf-8")
+        return synthesize_with_cache(tts_mdl, text)
 
     def get_history(self, window_size):
         convs = []