mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
fix(opensearch): implement get_scores for KNN second-pass scoring (#15390)
### What problem does this PR solve?
On the OpenSearch backend (`DOC_ENGINE=opensearch`), every retrieval that
performs the KNN second-pass scoring crashes with:
AttributeError: 'OSConnection' object has no attribute 'get_scores'
**Root cause.** #14970 ("Refactor: Drop the vector fetch for ES") added a
`get_scores()` helper to `ESConnectionBase` (`common/doc_store/es_conn_base.py`)
and introduced `Dealer._knn_scores()` in `rag/nlp/search.py`, which calls
`self.dataStore.get_scores(res)`. `search.py` routes Infinity and OceanBase to
their own similarity paths via `DOC_ENGINE_INFINITY` / `DOC_ENGINE_OCEANBASE`,
but OpenSearch sets neither flag, so it falls into the Elasticsearch branch and
calls `get_scores`. `OSConnection` (which subclasses `DocStoreConnection`
directly, not `ESConnectionBase`) never received that method, so any
vector-search hit triggers the crash. It reproduces with any normal embedding
(e.g. 1024-dim mistral-embed) as soon as a KNN query returns hits.
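
The failure mode described above can be sketched in isolation. The classes below are hypothetical stand-ins that reuse the real names only to mirror the inheritance shape; the method bodies and the `knn_scores` helper are simplified assumptions, not the actual RAGFlow code.

```python
# Hypothetical stand-ins: only the inheritance shape and the missing
# method are taken from the root-cause description.
class DocStoreConnection:
    """Common base class; does not define get_scores()."""


class ESConnectionBase(DocStoreConnection):
    def get_scores(self, res):
        # Simplified body, assumed for illustration only.
        return {d["_id"]: float(d["_score"]) for d in res["hits"]["hits"]}


class OSConnection(DocStoreConnection):
    """Subclasses the common base directly, so get_scores() is not inherited."""


def knn_scores(data_store, res):
    # Stand-in for Dealer._knn_scores(): the Elasticsearch branch calls
    # get_scores() on whichever store is configured.
    return data_store.get_scores(res)


res = {"hits": {"hits": [{"_id": "chunk-1", "_score": 0.83}]}}

# Works for the Elasticsearch-style connection.
es_scores = knn_scores(ESConnectionBase(), res)

# Fails for the OpenSearch-style connection, which lacks the method.
try:
    knn_scores(OSConnection(), res)
    error = None
except AttributeError as exc:
    error = str(exc)
```

Because the lookup only fails when the method is actually called, the gap stays invisible until a KNN query returns hits and the second-pass scoring runs.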
### Fix
Add `OSConnection.get_scores()`, mirroring `ESConnectionBase.get_scores()`.
OpenSearch hit headers expose `_score` exactly like Elasticsearch (the existing
`OSConnection.__getSource` already reads `d["_score"]`), so the implementation
is identical.
Scope note: Infinity and OceanBase deliberately do not use `get_scores`
(#14970 routes them elsewhere), so this fix is intentionally limited to the
OpenSearch backend, which is the only one reaching the ES KNN-score path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Affected backends
OpenSearch only. Elasticsearch already implements `get_scores`; Infinity /
OceanBase are routed away from it.
### How to reproduce
1. `DOC_ENGINE=opensearch` (docker `.env`), restart the stack.
2. Create a knowledge base with any dense embedding model and parse a document.
3. Run a retrieval / chat over that KB -> 500 with the AttributeError above.
### Risk & backward compatibility
None for the default Elasticsearch deployment: the change only adds a method
to `OSConnection`. No default values change, and ES/Infinity/OceanBase behavior
is unaffected.
### Test plan
- [ ] With `DOC_ENGINE=opensearch`, retrieval over a KB returns scored chunks
  (no AttributeError).
- [ ] `DOC_ENGINE=elasticsearch` regression: retrieval unchanged.
- [ ] Empty-result path: `_knn_scores` early-returns `{}` (guarded), and
  `get_scores` handles an empty `hits` list gracefully.
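
The `get_scores` items in the test plan can be checked without a running cluster. The sketch below copies the method's logic into a standalone function (the module-level `get_scores` and the sample responses are illustrative assumptions, not part of the change) and feeds it hand-written response dictionaries shaped like search hits.

```python
def get_scores(res) -> dict[str, float]:
    """Standalone copy of the logic added to OSConnection, for illustration."""
    out = {}
    for d in res.get("hits", {}).get("hits", []):
        doc_id = d.get("_id")
        if doc_id is None:
            # Hits without an id cannot be keyed, so they are skipped.
            continue
        score = d.get("_score")
        # A missing or null score falls back to 0.0.
        out[doc_id] = float(score) if score is not None else 0.0
    return out


# Made-up sample response: one scored hit, one null score, one hit with no id.
scored = get_scores(
    {
        "hits": {
            "hits": [
                {"_id": "a", "_score": 0.9},
                {"_id": "b", "_score": None},
                {"_score": 0.5},
            ]
        }
    }
)

# Empty-result paths: an empty hit list and a response with no "hits" key.
empty = get_scores({"hits": {"hits": []}})
missing = get_scores({})
```

Both empty-result inputs yield an empty dict rather than raising, which is the behavior the last checklist item asks for.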
@@ -666,6 +666,23 @@ class OSConnection(DocStoreConnection):
    def get_doc_ids(self, res):
        return [d["_id"] for d in res["hits"]["hits"]]

    def get_scores(self, res) -> dict[str, float]:
        """
        Map hit `_id` to its raw `_score`. Used by rag/nlp/search.py:_knn_scores()
        to recover the cosine similarity returned by a KNN-only second-pass search
        without pulling the chunk vectors out of the index. OpenSearch hit headers
        carry `_score` exactly like Elasticsearch, so this mirrors
        ESConnectionBase.get_scores.
        """
        out = {}
        for d in res.get("hits", {}).get("hits", []):
            doc_id = d.get("_id")
            if doc_id is None:
                continue
            score = d.get("_score")
            out[doc_id] = float(score) if score is not None else 0.0
        return out

    def __getSource(self, res):
        rr = []
        for d in res["hits"]["hits"]: