mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
### What problem does this PR solve?

Follow-up to #15393. After #15393 fixed the OpenSearch `search()` signature and the doc-meta mapping, document metadata still renders as **"0 fields"** for every document on the OpenSearch backend (`DOC_ENGINE=opensearch`).

**Root cause.** `OSConnection.insert()` pops `id` out of the document before indexing:

    meta_id = d_copy.pop("id", "")  # id used as _id, then DROPPED from _source

so the stored `_source` never contains an `id` field. But the doc-meta read path filters and sorts on that field:

- `DocMetadataService.get_metadata_for_documents()` builds `condition = {"kb_id": kb_id, "id": doc_ids}` -> `OSConnection.search()` emits `Q("terms", id=doc_ids)` (a term query on the `id` field), and
- `_search_metadata()` sorts with `order_by.asc("id")`.

With `id` absent from `_source`, the terms filter matches nothing, so `get_metadata_for_documents()` returns an empty map and the UI shows "0 fields" -- even though the metadata was written correctly (it is visible via a kb_id-only query).

`ESConnection.insert()` already keeps `id` (`d_copy.get("id", "")`) with the comment *"also keep 'id' as a regular field for sorting"*. This is a plain OpenSearch-only divergence (`pop()` vs `get()`).

### Fix

Mirror Elasticsearch: use `get("id")` instead of `pop("id")` so `id` survives in `_source`. The doc-meta mapping already declares `id` as `keyword`, so the field is searchable/sortable once populated.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Affected backends

OpenSearch only. Elasticsearch already keeps `id`; Infinity / OceanBase unaffected.

### How to reproduce

1. `DOC_ENGINE=opensearch`, create a KB, upload/parse a document, set metadata.
2. Open the document list -> every document shows "0 fields" (the metadata exists in the `ragflow_doc_meta_*` index but its `_source` has no `id` field).
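The `pop()` vs `get()` divergence described under the root cause can be illustrated with a minimal, self-contained sketch. It does not use the real `OSConnection` or `ESConnection` classes: the helper names (`build_action_pop`, `build_action_get`, `terms_filter_matches`) and the `meta_fields` key are hypothetical, and the filter function is only a toy stand-in for a terms query on a keyword field.

```python
from copy import deepcopy


def build_action_pop(doc: dict) -> tuple:
    """Pre-fix shape: `id` is used as the document _id and removed from the body."""
    d_copy = deepcopy(doc)
    meta_id = d_copy.pop("id", "")  # `id` no longer present in what becomes _source
    return meta_id, d_copy


def build_action_get(doc: dict) -> tuple:
    """Post-fix shape: `id` is used as the document _id and also kept in the body."""
    d_copy = deepcopy(doc)
    meta_id = d_copy.get("id", "")  # `id` stays in what becomes _source
    return meta_id, d_copy


def terms_filter_matches(source: dict, field: str, values: list) -> bool:
    """Toy stand-in for a terms filter: the field must exist and equal one of the values."""
    return field in source and source[field] in values


if __name__ == "__main__":
    # `meta_fields` is an illustrative key, not taken from the real mapping.
    doc = {"id": "doc-1", "kb_id": "kb-1", "meta_fields": {"author": "a"}}

    _, popped_source = build_action_pop(doc)
    _, kept_source = build_action_get(doc)

    print(terms_filter_matches(popped_source, "id", ["doc-1"]))  # False
    print(terms_filter_matches(kept_source, "id", ["doc-1"]))    # True
```

Both helpers return the same `_id`; only the stored body differs, which is why the metadata stays visible to a kb_id-only query but not to a filter on `id`.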
### Risk & backward compatibility

`insert()` is shared with the main chunk index; keeping `id` in `_source` brings OpenSearch in line with Elasticsearch (which already does this), so it is parity, not new behavior. No default / ES / Infinity / OceanBase behavior change.

Note: affects new inserts only. Existing `ragflow_doc_meta_*` indices created before this change have no `id` in `_source`; re-sync metadata, or backfill once with `_update_by_query` (`ctx._source.id = ctx._id`).

### Test plan

- [ ] OpenSearch: after the fix the document list shows correct metadata field counts (not "0 fields"); metadata filter/sort by id works.
- [ ] Elasticsearch regression: unchanged.
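For the one-off backfill mentioned under "Risk & backward compatibility", the request body might look like the sketch below. It only builds and prints the JSON and is not part of this PR. The script source is the one quoted in the note; the `must_not` / `exists` filter and the `build_backfill_body` name are assumptions added here, so that a re-run would skip documents that already carry `id`.

```python
import json


def build_backfill_body() -> dict:
    """Build an `_update_by_query` body that copies each document's _id into _source.id."""
    return {
        # Assumption: restrict the update to documents that have no `id` field yet.
        "query": {"bool": {"must_not": {"exists": {"field": "id"}}}},
        # Script source taken from the note in this PR description.
        "script": {"lang": "painless", "source": "ctx._source.id = ctx._id"},
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a POST to
    # <index>/_update_by_query, for each affected `ragflow_doc_meta_*` index.
    print(json.dumps(build_backfill_body(), indent=2))
```

Trying this against a single index first, and comparing document counts before and after, is a reasonable precaution before touching every affected index.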