diff --git a/rag/nlp/__init__.py b/rag/nlp/__init__.py
index b6b346dbf0..3f5e7f2913 100644
--- a/rag/nlp/__init__.py
+++ b/rag/nlp/__init__.py
@@ -240,7 +240,7 @@ def is_english(texts):
 
     pattern = re.compile(r"[`a-zA-Z0-9\s.,':;/\"?<>!\(\)\-]+")
     if isinstance(texts, str):
-        texts = list(texts)
+        texts = [texts]
     elif isinstance(texts, list):
         texts = [t for t in texts if isinstance(t, str) and t.strip()]
     else:
diff --git a/test/unit_test/rag/test_is_english.py b/test/unit_test/rag/test_is_english.py
index 3b589065f6..6530c6ae67 100644
--- a/test/unit_test/rag/test_is_english.py
+++ b/test/unit_test/rag/test_is_english.py
@@ -55,6 +55,12 @@ def test_is_english_single_english_answer_in_list():
     assert is_english(["This is a normal English answer."]) is True
 
 
+@pytest.mark.p2
+def test_is_english_multi_word_phrase():
+    # Regression: splitting a string into characters made short spaced phrases fail.
+    assert is_english("I am good") is True
+
+
 @pytest.mark.p2
 def test_is_english_chinese_list_is_false():
     assert is_english(["这是中文段落。", "另一个中文段落。"]) is False