From 99d1c9725ca1d85fd32847374e24e597bd586a3e Mon Sep 17 00:00:00 2001
From: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Date: Wed, 25 Feb 2026 09:55:04 +0500
Subject: [PATCH] Bug mysql connector empty content resolved: Semantic ID Issue (#13206)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The RDBMS (MySQL/PostgreSQL) connector generates document filenames from the first 100 characters of the content column (semantic_identifier). When the content contains newline characters (\n), the resulting filename includes those newlines, for example:

    Category: غير صحيح كليًا\nTitle: تفنيد حقائق....txt

RAGFlow's filename_type() function uses re.match(r".*\.txt$", filename) to detect file types, but in Python regex . does not match a newline unless re.DOTALL is set, so .* stops at the first \n. The match therefore fails and filename_type() returns FileType.OTHER, which triggers:

    raise RuntimeError("This type of file has not been supported yet!")

As a result, every document whose identifier contains a newline is silently discarded when synced via the MySQL/PostgreSQL connector. The sync logs report success (e.g., "399 docs synchronized"), but zero documents actually appear in the dataset. This is the root cause of issue #13001.

Root cause trace:

1. rdbms_connector.py → _row_to_document() sets semantic_identifier from raw content (may contain \n)
2. connector_service.py → duplicate_and_parse() uses semantic_identifier as the filename
3. file_service.py → upload_document() calls filename_type(filename)
4. file_utils.py → filename_type() regex .*\.txt$ fails on newlines → returns FileType.OTHER
5. upload_document() raises "This type of file has not been supported yet!"

Fix: Sanitize the semantic_identifier in _row_to_document() by replacing newlines and carriage returns with spaces and stripping surrounding whitespace before truncating to 100 characters.

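The failure and the fix can be reproduced in isolation. The sketch below is a minimal illustration, not RAGFlow code: looks_like_txt and sanitize_semantic_id are hypothetical helper names, and the sample string is invented. Only the regex pattern and the replace/strip/slice chain come from this patch.

```python
import re


def looks_like_txt(filename: str) -> bool:
    # Hypothetical stand-in for the .txt check quoted in the commit message.
    # Without re.DOTALL, "." does not match "\n", so ".*" cannot cross a newline.
    return re.match(r".*\.txt$", filename) is not None


def sanitize_semantic_id(value: object, limit: int = 100) -> str:
    # Hypothetical helper mirroring the one-line fix in this patch:
    # replace newlines and carriage returns with spaces, strip, then truncate.
    return str(value).replace("\n", " ").replace("\r", " ").strip()[:limit]


raw = "Category: example\nTitle: sample text"

# Old behaviour: the newline survives truncation and the match fails.
print(looks_like_txt(raw[:100] + ".txt"))  # False

# New behaviour: the identifier is a single line, so the match succeeds.
print(looks_like_txt(sanitize_semantic_id(raw) + ".txt"))  # True
```

Sanitizing at the point where the identifier is built keeps the file-type detection untouched; compiling the pattern with re.DOTALL would be another way to make the match succeed, but that is not what this patch does.
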
Relates to: #13001, #12817

Type of change

- Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Ahmad Intisar
---
 common/data_source/rdbms_connector.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/common/data_source/rdbms_connector.py b/common/data_source/rdbms_connector.py
index 944bfdb551..2902041bd5 100644
--- a/common/data_source/rdbms_connector.py
+++ b/common/data_source/rdbms_connector.py
@@ -238,7 +238,8 @@ class RDBMSConnector(LoadConnector, PollConnector):
                 doc_updated_at = ts_value
         first_content_col = self.content_columns[0] if self.content_columns else "record"
-        semantic_id = str(row_dict.get(first_content_col, "database_record"))[:100]
+        semantic_id = str(row_dict.get(first_content_col, "database_record")).replace("\n", " ").replace("\r", " ").strip()[:100]
+
         return Document(
             id=doc_id,