From 99d1c9725ca1d85fd32847374e24e597bd586a3e Mon Sep 17 00:00:00 2001
From: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Date: Wed, 25 Feb 2026 09:55:04 +0500
Subject: [PATCH] Bug mysql connector empty content resolved: Semantic ID Issue
 (#13206)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The RDBMS (MySQL/PostgreSQL) connector generates document filenames
from the semantic_identifier, which is the first 100 characters of the
first content column. When that content contains newline characters
(\n), the resulting filename includes them, for example:

    Category: غير صحيح كليًا\nTitle: تفنيد حقائق....txt

RAGFlow's filename_type() function uses re.match(r".*\.txt$", filename)
to detect file types, but "." does not match newline characters in
Python regex unless re.DOTALL is set. The match therefore fails,
filename_type() returns FileType.OTHER, and the upload path raises:

    raise RuntimeError("This type of file has not been supported yet!")

As a result, documents synced via the MySQL/PostgreSQL connector whose
semantic_identifier contains a newline are silently discarded. The sync
logs report success (e.g., "399 docs synchronized"), but the affected
documents never appear in the dataset. This is the root cause of issue
#13001.
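
The regex behaviour described above can be reproduced in isolation. This
is a minimal sketch, not RAGFlow's actual filename_type(): the pattern
is the one quoted in this message, while matches_txt() and the sample
filenames are hypothetical names introduced only for illustration.

```python
import re

# Pattern as quoted in the commit message above.
TXT_PATTERN = r".*\.txt$"


def matches_txt(filename: str, flags: int = 0) -> bool:
    """Hypothetical helper: True if the pattern matches from the start."""
    return re.match(TXT_PATTERN, filename, flags) is not None


single_line = "Category: A Title: B.txt"
multi_line = "Category: A\nTitle: B.txt"

print(matches_txt(single_line))            # True
print(matches_txt(multi_line))             # False: "." stops at "\n"
print(matches_txt(multi_line, re.DOTALL))  # True: "." now matches "\n"
```

Passing re.DOTALL would also make the match succeed, but this patch
takes the other route and removes the newlines at the source.
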
Root cause trace:

1. rdbms_connector.py → _row_to_document() sets semantic_identifier from
   raw content (may contain \n)
2. connector_service.py → duplicate_and_parse() uses semantic_identifier
   as the filename
3. file_service.py → upload_document() calls filename_type(filename)
4. file_utils.py → filename_type() regex .*\.txt$ fails on newlines →
   returns FileType.OTHER
5. upload_document() raises "This type of file has not been supported
   yet!"

Fix: Sanitize the semantic_identifier in _row_to_document() by replacing
newlines and carriage returns with spaces before truncating to 100
characters.
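
A standalone sketch of the sanitization step, assuming the same
replace/strip/truncate order as the one-line change in the diff.
sanitize_semantic_id() is a hypothetical helper name; the patch itself
inlines the expression in _row_to_document().

```python
import re


def sanitize_semantic_id(value: object, max_len: int = 100) -> str:
    """Hypothetical helper mirroring the inline expression in the diff."""
    # Replace line breaks with spaces, trim the ends, then truncate.
    text = str(value).replace("\n", " ").replace("\r", " ").strip()
    return text[:max_len]


raw = "Category: A\nTitle: B\r\n"
semantic_id = sanitize_semantic_id(raw)
print(repr(semantic_id))  # 'Category: A Title: B'

# The pattern quoted earlier in this message now matches the filename.
filename = semantic_id + ".txt"
print(re.match(r".*\.txt$", filename) is not None)  # True
```

Note that a "\r\n" pair inside the text becomes two spaces, since each
character is replaced separately.
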
Relates to: #13001, #12817
Type of change: Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
---
 common/data_source/rdbms_connector.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/common/data_source/rdbms_connector.py b/common/data_source/rdbms_connector.py
index 944bfdb551..2902041bd5 100644
--- a/common/data_source/rdbms_connector.py
+++ b/common/data_source/rdbms_connector.py
@@ -238,7 +238,8 @@ class RDBMSConnector(LoadConnector, PollConnector):
                     doc_updated_at = ts_value
         
         first_content_col = self.content_columns[0] if self.content_columns else "record"
-        semantic_id = str(row_dict.get(first_content_col, "database_record"))[:100]
+        semantic_id = str(row_dict.get(first_content_col, "database_record")).replace("\n", " ").replace("\r", " ").strip()[:100]
+
         
         return Document(
             id=doc_id,