fix(opensearch): implement get_scores for KNN second-pass scoring (#15390)

### What problem does this PR solve?

On the OpenSearch backend (`DOC_ENGINE=opensearch`), every retrieval that
performs the KNN second-pass scoring crashes with:

    AttributeError: 'OSConnection' object has no attribute 'get_scores'

**Root cause.** #14970 ("Refactor: Drop the vector fetch for ES") added a
`get_scores()` helper to `ESConnectionBase` (`common/doc_store/es_conn_base.py`)
and introduced `Dealer._knn_scores()` in `rag/nlp/search.py`, which calls
`self.dataStore.get_scores(res)`. `search.py` routes Infinity and OceanBase to
their own similarity paths via `DOC_ENGINE_INFINITY` / `DOC_ENGINE_OCEANBASE`,
but OpenSearch sets neither flag, so it falls into the Elasticsearch branch and
calls `get_scores`. `OSConnection` (which subclasses `DocStoreConnection`
directly, not `ESConnectionBase`) never received that method, so any
vector-search hit triggers the crash. It reproduces with any normal embedding
(e.g. 1024-dim mistral-embed) as soon as a KNN query returns hits.
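
The inheritance gap described above can be sketched in isolation. This is a
minimal illustration, not the project's code: the three classes and
`knn_scores` are hypothetical stand-ins that model only the class hierarchy
and the unconditional `get_scores` call, and the body of the ES helper is a
simplified guess.

```python
# Hypothetical stand-ins: only the inheritance shape from the root-cause
# description is modelled, not the real connection classes.
class DocStoreConnection:
    """Stand-in for the base class shared by all doc-store backends."""


class ESConnectionBase(DocStoreConnection):
    def get_scores(self, res) -> dict[str, float]:
        # Simplified guess at the ES helper: map each hit's _id to its _score.
        return {d["_id"]: float(d["_score"]) for d in res["hits"]["hits"]}


class OSConnectionBeforeFix(DocStoreConnection):
    """Subclasses the base directly, so it never inherits get_scores."""


def knn_scores(data_store, res) -> dict[str, float]:
    # Stand-in for Dealer._knn_scores(): the ES branch calls get_scores on
    # whatever data store is configured, without checking the backend type.
    return data_store.get_scores(res)


res = {"hits": {"hits": [{"_id": "chunk-1", "_score": 0.87}]}}

# Works for the ES-derived connection.
es_scores = knn_scores(ESConnectionBase(), res)

# Fails for the connection that bypasses ESConnectionBase.
try:
    knn_scores(OSConnectionBeforeFix(), res)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

In this sketch the failure depends only on the hierarchy, which is consistent
with the report that it reproduces with any embedding model.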

### Fix

Add `OSConnection.get_scores()`, mirroring `ESConnectionBase.get_scores()`.
OpenSearch hit headers expose `_score` exactly like Elasticsearch (the existing
`OSConnection.__getSource` already reads `d["_score"]`), so the implementation
is identical.

Scope note: Infinity and OceanBase deliberately do not use `get_scores`
(#14970 routes them elsewhere), so this fix is intentionally limited to the
OpenSearch backend, which is the only one reaching the ES KNN-score path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Affected backends
OpenSearch only. Elasticsearch already implements `get_scores`; Infinity /
OceanBase are routed away from it.

### How to reproduce
1. `DOC_ENGINE=opensearch` (docker `.env`), restart the stack.
2. Create a knowledge base with any dense embedding model and parse a document.
3. Run a retrieval / chat over that KB -> 500 with the AttributeError above.

### Risk & backward compatibility
None for the default Elasticsearch deployment -- the change only adds a method
to `OSConnection`. No default values or ES/Infinity/OceanBase behavior change.

### Test plan
- [ ] With `DOC_ENGINE=opensearch`, retrieval over a KB returns scored chunks
      (no AttributeError).
- [ ] `DOC_ENGINE=elasticsearch` regression: retrieval unchanged.
- [ ] Empty-result path: `_knn_scores` early-returns `{}` (guarded), and
      `get_scores` handles an empty `hits` list gracefully.
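
The empty-result item can be checked without a running cluster. The sketch
below lifts the logic of the new method into a free function (a hypothetical
arrangement for illustration only; in the PR it is a method on `OSConnection`)
and feeds it hand-built response dicts rather than real OpenSearch output.

```python
def get_scores(res) -> dict[str, float]:
    # Same logic as the method added in this PR, written as a free function
    # so it can be exercised without constructing an OSConnection.
    out = {}
    for d in res.get("hits", {}).get("hits", []):
        doc_id = d.get("_id")
        if doc_id is None:
            continue  # a hit without an _id cannot be keyed, so skip it
        score = d.get("_score")
        out[doc_id] = float(score) if score is not None else 0.0
    return out


# Empty hits list and a response with no "hits" key at all.
empty = get_scores({"hits": {"hits": []}})
missing = get_scores({})

# Hand-built hits: a normal hit, a hit with a null score, a hit with no _id.
mixed = get_scores(
    {
        "hits": {
            "hits": [
                {"_id": "a", "_score": 0.5},
                {"_id": "b", "_score": None},
                {"_score": 0.9},
            ]
        }
    }
)
```

Both empty cases yield an empty dict, a null `_score` falls back to `0.0`, and
the hit without `_id` is dropped.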
This commit is contained in:
Rintaro
2026-05-29 22:49:15 +09:00
committed by GitHub
parent a2500fed43
commit 3dfc16973c

@@ -666,6 +666,23 @@ class OSConnection(DocStoreConnection):
    def get_doc_ids(self, res):
        return [d["_id"] for d in res["hits"]["hits"]]

    def get_scores(self, res) -> dict[str, float]:
        """
        Map hit `_id` to its raw `_score`. Used by rag/nlp/search.py:_knn_scores()
        to recover the cosine similarity returned by a KNN-only second-pass search
        without pulling the chunk vectors out of the index. OpenSearch hit headers
        carry `_score` exactly like Elasticsearch, so this mirrors
        ESConnectionBase.get_scores.
        """
        out = {}
        for d in res.get("hits", {}).get("hits", []):
            doc_id = d.get("_id")
            if doc_id is None:
                continue
            score = d.get("_score")
            out[doc_id] = float(score) if score is not None else 0.0
        return out

    def __getSource(self, res):
        rr = []
        for d in res["hits"]["hits"]: