feat(tts): cache synthesized speech in Redis to avoid redundant calls (#14851)

## What problem does this PR solve? Closes #12017. TTS output is deterministic for a given `(model, text)` pair, so re-running the same text through the same TTS model produces the same bytes — yet `Canvas.tts` and `dialog_service.tts` re-synthesized on every request. That's slow and wastes provider quota whenever the same assistant response is replayed, shared across users, or repeated within a session. ### Change New helper `rag/utils/tts_cache.py` with `synthesize_with_cache(tts_mdl, cleaned_text)`: - **Key:** `tts:cache:{model_id}:{sha256(text)}` — separate namespace per model, identical cleaned text reuses a single entry across both call sites. - **Value:** the hex-encoded audio blob both call sites already returned. No format change for downstream consumers. - **TTL:** 7 days by default, configurable via `RAGFLOW_TTS_CACHE_TTL_SECONDS`. - **Failure modes:** a Redis hiccup falls back to direct synthesis; a failed synthesis still returns `None` (existing contract preserved). [`Canvas.tts`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py#L683-L724) and [`dialog_service.tts`](https://github.com/infiniflow/ragflow/blob/main/api/db/services/dialog_service.py#L1367-L1380) now route through the helper; the per-file bytes-accumulation/hex-encode loop has been removed in favor of one shared implementation. ## Type of change - [x] New Feature (non-breaking change which adds functionality) ## Test plan - [ ] **Cache hit, chat path:** Configure a dialog with TTS enabled, ask the same question twice with `stream=false`. Verify the second response returns the same `audio_binary` and that the second invocation doesn't hit the TTS provider (e.g., observe provider-side logs / usage counters; check no `LLMBundle.tts can't update token usage` log line on the second run). - [ ] **Cache hit, agent path:** Same exercise via a Conversational Agent that includes a Message component playing back the answer. - [ ] **Cache isolation per model:** Switch tenant's `tts_id` between two models, run the same text against each — confirm the second model's first synthesis still happens (no cross-model hits). - [ ] **TTL override:** Set `RAGFLOW_TTS_CACHE_TTL_SECONDS=120`, confirm the entry expires after 2 minutes. - [ ] **Redis unavailable:** Stop Redis (or break the connection). Verify the TTS endpoint still works — synthesis falls back to direct calls, with a `TTS cache lookup failed` / `TTS cache store failed` warning logged. - [ ] **Failure path:** Configure a TTS model with an invalid API key, ensure the response still returns successfully with `audio_binary=None` (no regression vs. current behavior).
2026-06-29 15:31:05 +08:00 · 2026-05-18 23:20:40 -07:00
parent f17a66d4f0
commit f169ab4b39
3 changed files with 124 additions and 18 deletions
--- a/agent/canvas.py
+++ b/agent/canvas.py
@@ -17,7 +17,6 @@ import asyncio
 import base64
 import datetime
 import inspect
-import binascii
 import json
 import logging
 import re
@@ -39,6 +38,7 @@ from common.misc_utils import get_uuid, hash_str2int
 from common.exceptions import TaskCanceledException
 from rag.prompts.generator import chunks_format
 from rag.utils.redis_conn import REDIS_CONN
+from rag.utils.tts_cache import synthesize_with_cache

 class Graph:
    """
@@ -714,14 +714,7 @@ class Canvas(Graph):
        text = clean_tts_text(text)
        if not text:
            return None
-        bin = b""
-        try:
-            for chunk in tts_mdl.tts(text):
-                bin += chunk
-        except Exception as e:
-            logging.error(f"TTS failed: {e}, text={text!r}")
-            return None
-        return binascii.hexlify(bin).decode("utf-8")
+        return synthesize_with_cache(tts_mdl, text)

    def get_history(self, window_size):
        convs = []