mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-07-02 08:45:42 +08:00
### Summary

`truncateText` in the `reduction` and `summarization` middlewares truncates with `s[:maxLen]`, which slices by byte. When `maxLen` lands inside a multi-byte character (common with CJK or other non-ASCII content flowing through the agent), the string is cut mid-rune and the tail byte(s) become invalid UTF-8. That broken text then goes into the reduced context / summary prompt.

`TruncateToolResult` in the same `reduction` package already avoids this by slicing on a rune boundary and even notes it in a comment. This PR makes the two `truncateText` helpers do the same, so they stay consistent with the existing helper.

Both functions keep their existing output shape (summarization still appends `...`). Added a small unit test in each package covering ASCII truncation and a CJK string, asserting the result stays valid UTF-8.