fix(opensearch): implement get_scores for KNN second-pass scoring (#15390)

### What problem does this PR solve?

On the OpenSearch backend (`DOC_ENGINE=opensearch`), every retrieval that performs the KNN second-pass scoring crashes with:

    AttributeError: 'OSConnection' object has no attribute 'get_scores'

**Root cause.** #14970 ("Refactor: Drop the vector fetch for ES") added a `get_scores()` helper to `ESConnectionBase` (`common/doc_store/es_conn_base.py`) and introduced `Dealer._knn_scores()` in `rag/nlp/search.py`, which calls `self.dataStore.get_scores(res)`. `search.py` routes Infinity and OceanBase to their own similarity paths via `DOC_ENGINE_INFINITY` / `DOC_ENGINE_OCEANBASE`, but OpenSearch sets neither flag, so it falls into the Elasticsearch branch and calls `get_scores`. `OSConnection` (which subclasses `DocStoreConnection` directly, not `ESConnectionBase`) never received that method, so any vector-search hit triggers the crash. It reproduces with any normal embedding (e.g. 1024-dim mistral-embed) as soon as a KNN query returns hits.

### Fix

Add `OSConnection.get_scores()`, mirroring `ESConnectionBase.get_scores()`. OpenSearch hit headers expose `_score` exactly like Elasticsearch (the existing `OSConnection.__getSource` already reads `d["_score"]`), so the implementation is identical.

Scope note: Infinity and OceanBase deliberately do not use `get_scores` (#14970 routes them elsewhere), so this fix is intentionally limited to the OpenSearch backend, which is the only one reaching the ES KNN-score path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Affected backends

OpenSearch only. Elasticsearch already implements `get_scores`; Infinity / OceanBase are routed away from it.

### How to reproduce

1. `DOC_ENGINE=opensearch` (docker `.env`), restart the stack.
2. Create a knowledge base with any dense embedding model and parse a document.
3. Run a retrieval / chat over that KB -> 500 with the AttributeError above.

### Risk & backward compatibility

None for the default Elasticsearch deployment -- the change only adds a method to `OSConnection`. No default values or ES/Infinity/OceanBase behavior change.

### Test plan

- [ ] With `DOC_ENGINE=opensearch`, retrieval over a KB returns scored chunks (no AttributeError).
- [ ] `DOC_ENGINE=elasticsearch` regression: retrieval unchanged.
- [ ] Empty-result path: `_knn_scores` early-returns `{}` (guarded), `get_scores` handles an empty `hits` list gracefully.
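
The empty-result and missing-field cases in the test plan can be checked without a running cluster. The following is a minimal sketch, not part of the change itself: it copies the body of the new method into a standalone function, and the response payloads (`scored`, `empty`) and chunk IDs are hypothetical, shaped only like the `hits.hits` structure the method reads.

```python
from typing import Any


def get_scores(res: dict[str, Any]) -> dict[str, float]:
    """Standalone copy of the OSConnection.get_scores logic, for illustration."""
    out: dict[str, float] = {}
    for d in res.get("hits", {}).get("hits", []):
        doc_id = d.get("_id")
        if doc_id is None:
            # A hit without an _id cannot be keyed, so it is skipped.
            continue
        score = d.get("_score")
        # A missing or null _score falls back to 0.0.
        out[doc_id] = float(score) if score is not None else 0.0
    return out


# Hypothetical payloads; real responses carry many more fields.
scored = {
    "hits": {
        "hits": [
            {"_id": "chunk-a", "_score": 0.82},
            {"_id": "chunk-b", "_score": None},  # null score
            {"_source": {}},                     # no _id
        ]
    }
}
empty = {"hits": {"hits": []}}

print(get_scores(scored))  # {'chunk-a': 0.82, 'chunk-b': 0.0}
print(get_scores(empty))   # {}
print(get_scores({}))      # {}
```

The last call shows that a response with no `hits` key at all also yields an empty mapping, because both lookups use `.get` with a default.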
2026-06-29 23:41:12 +08:00 · 2026-05-29 22:49:15 +09:00
parent a2500fed43
commit 3dfc16973c
1 changed files with 17 additions and 0 deletions
--- a/rag/utils/opensearch_conn.py
+++ b/rag/utils/opensearch_conn.py
@@ -666,6 +666,23 @@ class OSConnection(DocStoreConnection):
    def get_doc_ids(self, res):
        return [d["_id"] for d in res["hits"]["hits"]]

+    def get_scores(self, res) -> dict[str, float]:
+        """
+        Map hit `_id` to its raw `_score`. Used by rag/nlp/search.py:_knn_scores()
+        to recover the cosine similarity returned by a KNN-only second-pass search
+        without pulling the chunk vectors out of the index. OpenSearch hit headers
+        carry `_score` exactly like Elasticsearch, so this mirrors
+        ESConnectionBase.get_scores.
+        """
+        out = {}
+        for d in res.get("hits", {}).get("hits", []):
+            doc_id = d.get("_id")
+            if doc_id is None:
+                continue
+            score = d.get("_score")
+            out[doc_id] = float(score) if score is not None else 0.0
+        return out
+
    def __getSource(self, res):
        rr = []
        for d in res["hits"]["hits"]: