From b83e2ae5a28266dcc30afcbed1d1762c79b2b785 Mon Sep 17 00:00:00 2001
From: VincentLambert
Date: Mon, 11 May 2026 05:55:44 +0200
Subject: [PATCH] fix: handle missing parent chunk in retrieval_by_children (#14556)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What problem does this PR solve?

`retrieval_by_children()` in `rag/nlp/search.py` crashes with a `TypeError: 'NoneType' object is not subscriptable` when a parent ("mom") chunk referenced by child chunks is missing from the index.

This happens when the index is in an inconsistent state, for example after a partial re-index, a document deletion that didn't clean up all children, or a race condition during ingestion. `dataStore.get()` returns `None` for the missing parent, and the subsequent access to `chunk["content_with_weight"]` raises a `TypeError`.

**Stack trace:**
```
TypeError: 'NoneType' object is not subscriptable
  File "rag/nlp/search.py", line 792, in retrieval_by_children
    "content_with_weight": chunk["content_with_weight"],
```

### Type of change

- [x] Bug Fix

### Fix

When `dataStore.get()` returns `None` for a parent chunk, fall back to using the child chunks directly and continue processing the remaining parents. This preserves retrieval results for all other chunks rather than aborting the entire query with an exception.

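The fallback can be tried in isolation with a minimal sketch. Everything in it is a hypothetical stand-in for illustration: `FakeStore`, its single-argument `get()`, the `merge_children` helper, and the sample chunk fields are assumptions, not the real `Dealer` or `dataStore` API.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


class FakeStore:
    """Hypothetical in-memory stand-in for the real data store."""

    def __init__(self, docs):
        self._docs = docs

    def get(self, chunk_id):
        # Returns None when the id is absent, mirroring the failure case.
        return self._docs.get(chunk_id)


def merge_children(store, children):
    """Swap children for their parent chunk; keep the children if the parent is missing."""
    by_parent = defaultdict(list)
    for ck in children:
        by_parent[ck["mom_id"]].append(ck)

    results = []
    for parent_id, cks in by_parent.items():
        parent = store.get(parent_id)
        if parent is None:
            # Missing parent: log it, keep the children, move on to the next parent.
            logger.warning(
                "Parent chunk '%s' not found; falling back to %d child chunk(s).",
                parent_id, len(cks),
            )
            results.extend(cks)
            continue
        results.append({
            "chunk_id": parent_id,
            "content_with_weight": parent["content_with_weight"],
        })
    return results


# Only parent "p1" exists; "p2" is missing from the store.
store = FakeStore({"p1": {"content_with_weight": "parent one"}})
children = [
    {"chunk_id": "c1", "mom_id": "p1", "content_with_weight": "child a"},
    {"chunk_id": "c2", "mom_id": "p2", "content_with_weight": "child b"},
]
merged = merge_children(store, children)
```

In this sketch `merged` holds the parent `p1` in place of `c1`, and `c2` survives unchanged because `p2` is missing. The guard applied in `retrieval_by_children()` itself is:
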
```python
chunk = self.dataStore.get(id, idx_nms[0], [ck["kb_id"] for ck in cks])
if chunk is None:
    chunks.extend(cks)
    continue
```

---------

Co-authored-by: Claude Sonnet 4.6
---
 rag/nlp/search.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/rag/nlp/search.py b/rag/nlp/search.py
index 57b663400e..87c1c6682a 100644
--- a/rag/nlp/search.py
+++ b/rag/nlp/search.py
@@ -781,6 +781,13 @@ class Dealer:
         vector_size = 1024
         for id, cks in mom_chunks.items():
             chunk = self.dataStore.get(id, idx_nms[0], [ck["kb_id"] for ck in cks])
+            if chunk is None:
+                logging.warning(
+                    "Parent chunk '%s' not found in the index; falling back to %d child chunk(s).",
+                    id, len(cks),
+                )
+                chunks.extend(cks)
+                continue
             d = {
                 "chunk_id": id,
                 "content_ltks": " ".join([ck["content_ltks"] for ck in cks]),