From b83e2ae5a28266dcc30afcbed1d1762c79b2b785 Mon Sep 17 00:00:00 2001
From: VincentLambert <v.lambert@eurelis.com>
Date: Mon, 11 May 2026 05:55:44 +0200
Subject: [PATCH] fix: handle missing parent chunk in retrieval_by_children
 (#14556)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What problem does this PR solve?

`retrieval_by_children()` in `rag/nlp/search.py` crashes with a
`TypeError: 'NoneType' object is not subscriptable` when a parent
("mom") chunk referenced by child chunks is missing from the index.

This happens when the index is in an inconsistent state, for example
after a partial re-index, a document deletion that didn't clean up all
children, or a race condition during ingestion. In that case
`dataStore.get()` returns `None` for the missing parent, and the
subsequent access to `chunk["content_with_weight"]` raises a `TypeError`.

**Stack trace:**
```
TypeError: 'NoneType' object is not subscriptable
  File "rag/nlp/search.py", line 792, in retrieval_by_children
    "content_with_weight": chunk["content_with_weight"],
```
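
For reference, the failure mode can be reproduced in isolation. This is
only a minimal sketch: `index` and `read_parent_content` are hypothetical
stand-ins, and a plain dict replaces the real data store, on the
assumption that its `get()` returns `None` for a missing key in the same
way.

```python
# Hypothetical stand-in for the data store: a plain dict whose .get()
# returns None for a missing key, mirroring the behavior described above.
index = {"parent-1": {"content_with_weight": "full parent text"}}


def read_parent_content(parent_id):
    chunk = index.get(parent_id)         # None when the parent is missing
    return chunk["content_with_weight"]  # raises TypeError if chunk is None


error_message = None
try:
    read_parent_content("parent-missing")
except TypeError as exc:
    # Same error text as in the stack trace above.
    error_message = str(exc)
```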

### Type of change

- [x] Bug Fix

### Fix

When `dataStore.get()` returns `None` for a parent chunk, log a warning,
fall back to using that parent's child chunks directly, and continue
with the remaining parents. This preserves retrieval results for all
other chunks rather than aborting the entire query with an exception.

```python
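chunk = self.dataStore.get(id, idx_nms[0], [ck["kb_id"] for ck in cks])
if chunk is None:
    logging.warning(
        "Parent chunk '%s' not found in the index; falling back to %d child chunk(s).",
        id, len(cks),
    )
    chunks.extend(cks)
    continue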
```
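
The effect of the fallback can be illustrated with a self-contained
sketch. Everything here is an assumption made for illustration:
`FakeStore` and `merge_children` are hypothetical names, the store is an
in-memory dict rather than the real `dataStore`, and the merged parent
record is reduced to two fields.

```python
import logging


class FakeStore:
    """Hypothetical in-memory stand-in for dataStore; not the real API."""

    def __init__(self, docs):
        self._docs = docs

    def get(self, chunk_id, index_name, kb_ids):
        # Returns None when the parent chunk is absent from the "index".
        return self._docs.get(chunk_id)


def merge_children(store, mom_chunks, index_name="idx"):
    """Replace children with their parent, or keep them if it is missing."""
    chunks = []
    for parent_id, cks in mom_chunks.items():
        parent = store.get(parent_id, index_name, [ck["kb_id"] for ck in cks])
        if parent is None:
            logging.warning(
                "Parent chunk '%s' not found; keeping %d child chunk(s).",
                parent_id, len(cks),
            )
            chunks.extend(cks)  # fall back to the children unchanged
            continue
        chunks.append({
            "chunk_id": parent_id,
            "content_with_weight": parent["content_with_weight"],
        })
    return chunks


store = FakeStore({"p1": {"content_with_weight": "parent one"}})
mom_chunks = {
    "p1": [{"chunk_id": "c1", "kb_id": "kb", "content_with_weight": "child one"}],
    "p2": [{"chunk_id": "c2", "kb_id": "kb", "content_with_weight": "child two"}],
}
result = merge_children(store, mom_chunks)
```

In this sketch, `p1` is returned as a merged parent record, while the
child `c2` of the missing parent `p2` is passed through as-is instead of
raising.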

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 rag/nlp/search.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/rag/nlp/search.py b/rag/nlp/search.py
index 57b663400e..87c1c6682a 100644
--- a/rag/nlp/search.py
+++ b/rag/nlp/search.py
@@ -781,6 +781,13 @@ class Dealer:
         vector_size = 1024
         for id, cks in mom_chunks.items():
             chunk = self.dataStore.get(id, idx_nms[0], [ck["kb_id"] for ck in cks])
+            if chunk is None:
+                logging.warning(
+                    "Parent chunk '%s' not found in the index; falling back to %d child chunk(s).",
+                    id, len(cks),
+                )
+                chunks.extend(cks)
+                continue
             d = {
                 "chunk_id": id,
                 "content_ltks": " ".join([ck["content_ltks"] for ck in cks]),