conf/infinity_mapping.json

{
	"id": {"type": "varchar", "default": ""},
	"doc_id": {"type": "varchar", "default": ""},
	"kb_id": {"type": "varchar", "default": "", "index_type": {"type": "secondary", "cardinality": "low"}},
	"mom_id": {"type": "varchar", "default": ""},
	"mom": {"type": "varchar", "default": ""},
	"create_time": {"type": "varchar", "default": ""},
	"create_timestamp_flt": {"type": "float", "default": 0.0},
	"img_id": {"type": "varchar", "default": ""},
	"docnm": {"type": "varchar", "default": "", "analyzer": ["rag-coarse", "rag-fine"], "comment": "docnm_kwd, title_tks, title_sm_tks"},
	"name_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"tag_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"important_kwd_empty_count": {"type": "integer", "default": 0},
	"important_keywords": {"type": "varchar", "default": "", "analyzer": ["rag-coarse", "rag-fine"], "comment": "important_kwd, important_tks"},
	"questions": {"type": "varchar", "default": "", "analyzer": ["rag-coarse", "rag-fine"], "comment": "question_kwd, question_tks"},
	"content": {"type": "varchar", "default": "", "analyzer": ["rag-coarse", "rag-fine"], "comment": "content_with_weight, content_ltks, content_sm_ltks"},
	"authors": {"type": "varchar", "default": "", "analyzer": ["rag-coarse", "rag-fine"], "comment": "authors_tks, authors_sm_tks"},
	"page_num_int": {"type": "varchar", "default": ""},
	"top_int": {"type": "varchar", "default": ""},
	"position_int": {"type": "varchar", "default": ""},
	"weight_int": {"type": "integer", "default": 0},
	"weight_flt": {"type": "float", "default": 0.0},
	"chunk_order_int": {"type": "integer", "default": 0},
	"rank_int": {"type": "integer", "default": 0},
	"rank_flt": {"type": "float", "default": 0},
	"available_int": {"type": "integer", "default": 1, "index_type": {"type": "secondary", "cardinality": "low"}},
	"knowledge_graph_kwd": {"type": "varchar", "default": ""},
	"entities_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"pagerank_fea": {"type": "integer", "default": 0},
	"tag_feas": {"type": "varchar", "default": "", "analyzer": "rankfeatures"},
	"from_entity_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"to_entity_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"entity_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"entity_type_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"source_id": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"n_hop_with_weight": {"type": "varchar", "default": ""},
	"mom_with_weight": {"type": "varchar", "default": ""},
	"removed_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"doc_type_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"toc_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
	"raptor_layer_int": {"type": "integer", "default": 0},
	"extra": {"type": "varchar", "default": ""}
}
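
A minimal sketch of how the mapping above could be inspected, assuming each entry keeps the `type` / `default` / optional `analyzer` / optional `index_type` shape shown. The helper name `summarize_mapping`, the `PY_TYPES` table, and the embedded `SAMPLE` excerpt are illustrative assumptions, not part of the project or of the Infinity client API. The sketch groups columns by full-text analyzer, lists the columns that carry a secondary index, and flags defaults whose Python type does not match the declared column type.

```python
import json

# Hypothetical excerpt of the mapping above, embedded so the sketch is
# self-contained; in practice the whole JSON file would be loaded instead.
SAMPLE = """
{
  "kb_id": {"type": "varchar", "default": "",
            "index_type": {"type": "secondary", "cardinality": "low"}},
  "content": {"type": "varchar", "default": "",
              "analyzer": ["rag-coarse", "rag-fine"]},
  "tag_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
  "weight_flt": {"type": "float", "default": 0.0},
  "available_int": {"type": "integer", "default": 1,
                    "index_type": {"type": "secondary", "cardinality": "low"}}
}
"""

# Assumed correspondence between declared column types and Python types.
# A float column also accepts an integer literal such as 0.
PY_TYPES = {"varchar": str, "integer": int, "float": (int, float)}


def summarize_mapping(mapping):
    """Return analyzers per column, secondary-indexed columns, and type mismatches."""
    summary = {"fulltext": {}, "secondary": [], "type_errors": []}
    for name, spec in mapping.items():
        expected = PY_TYPES.get(spec["type"])
        default = spec.get("default")
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected is None or isinstance(default, bool) or not isinstance(default, expected):
            summary["type_errors"].append(name)
        analyzers = spec.get("analyzer")
        if analyzers is not None:
            # "analyzer" may be a single name or a list of names.
            summary["fulltext"][name] = [analyzers] if isinstance(analyzers, str) else list(analyzers)
        if spec.get("index_type", {}).get("type") == "secondary":
            summary["secondary"].append(name)
    return summary


summary = summarize_mapping(json.loads(SAMPLE))
print(summary)
```

Normalizing `analyzer` to a list lets single-analyzer columns such as `tag_kwd` and multi-analyzer columns such as `content` be handled by the same code path.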