ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-04 09:39:32 +08:00

Author	SHA1	Message	Date
cleanjunc	88e4d6bddb	Fix: restore GraphRAG entity ranking by indexing pagerank and n-hop paths (#15797) ### Summary Closes #15795 Knowledge-graph queries rank entities by `pagerank * sim` in `KGSearch`, but the entity chunks written at index time stopped carrying the values that ranking depends on. `graph_node_to_chunk` only stored `entity_type`, `description`, and `source_id`, dropping the node `pagerank` and the n-hop neighbour paths, while `search.py` still read them back as `rank_flt` and `n_hop_with_weight`. The producer of these fields, `update_nodes_pagerank_nhop_neighbour`, was removed in #6513, but the read side in `KGSearch` was never updated. The result is that on every knowledge-graph query: - `pagerank` resolves to `0`, so the `pagerank * sim` sort key is `0` for every entity and selection falls back to arbitrary order. - Every displayed entity score is `0.00`. - The n-hop relation-enrichment block is dead code because `n_hop_ents` is always empty, leaving `merge_tuples` and `is_continuous_subsequence` orphaned. This PR restores the missing index-time fields so the documented `P(E|Q) = pagerank * sim` ranking and the n-hop enrichment work again. What changed: - `graph_node_to_chunk` now writes `rank_flt` from the node pagerank and `n_hop_with_weight` from the recomputed n-hop neighbour paths. - Reintroduced the n-hop path computation (`n_neighbor`) in `rag/graphrag/utils.py`, reusing the previously orphaned `merge_tuples` / `is_continuous_subsequence` helpers, with a direction-agnostic edge-weight lookup for undirected graphs. `set_graph` computes the paths per added or updated node and passes them through. - `KGSearch` now selects `n_hop_with_weight` in the entity keyword search so Infinity and OceanBase return it (Elasticsearch and OpenSearch already read it from `_source`), and the read is hardened against missing keys or empty strings before `json.loads`. - Added the `n_hop_with_weight` column to OceanBase, including the `EXTRA_COLUMNS` migration entry so existing tables get it. The other engines already map both fields via dynamic templates or the Infinity mapping. Scope note: pagerank and n-hop are re-indexed for the added or updated nodes in each pass, consistent with the existing incremental indexing design. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Testing Added unit tests in `test/unit_test/rag/graphrag/test_graphrag_utils.py`: - `n_neighbor`: path and weight shape, one-hop vs two-hop, isolated nodes, missing weights, and direction-agnostic lookup. - `graph_node_to_chunk`: `rank_flt` populated from pagerank and defaulting to `0`, `n_hop_with_weight` serialized and defaulting to an empty list. Commands run: `uv run pytest test/unit_test/rag/graphrag/` (106 passed); `uv run ruff check rag/graphrag/ rag/utils/ob_conn.py`	2026-06-09 20:50:45 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
Zhichang Yu	b7744e053e	fix: support dense_vector from ES fields response (ES 9.x compatibility) (#13972) - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Configuration Chore (non-breaking change which updates configuration) ## Summary by CodeRabbit * Bug Fixes * More accurate handling and unwrapping of dense-vector fields so returned values have correct shapes. * Field selection reliably limits returned data and falls back to alternate result locations when needed. * Use of consistent result IDs and tolerant handling when score values are missing. * Chores / Configuration * Increased build memory and adjusted build-time flags for the frontend build. * Simplified runtime model/GPU checks and removed an automated runtime GPU-install attempt. * Build Fixes * `web/vite.config.ts`: make `build.minify` and `build.sourcemap` respect `VITE_MINIFY` and `VITE_BUILD_SOURCEMAP` env vars from Dockerfile instead of hardcoding `terser` and `true`. * Environment * Allow stack version override and default the runtime image tag to "latest". ## Summary by CodeRabbit * Bug Fixes * Correct unwrapping of dense-vector fields and reliable field selection with fallback locations. * Consistent use of hit-level IDs and tolerant handling when score values are missing. * Chores / Configuration * Increased frontend build memory and added build-time minify/sourcemap flags; build minification and sourcemap now configurable. * Removed runtime GPU detection for model initialization; force CPU initialization. * Environment * Allow stack version override and default runtime image tag to "latest". --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 17:44:13 +08:00
yH	757d8d42dd	Fix: use configured OrderByExpr in _community_retrieval_ (#13683) The `odr` variable was configured with `desc("weight_flt")` but a new empty `OrderByExpr()` was passed to `dataStore.search()` instead, causing the descending sort to have no effect. ### What problem does this PR solve? In `_community_retrieval_`, the configured `OrderByExpr` with `desc("weight_flt")` was discarded: a new empty `OrderByExpr()` was passed to `dataStore.search()` instead, so community reports were never sorted by weight. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-19 17:55:40 +08:00
Lynn	62cb292635	Feat/tenant model (#13072) ### What problem does this PR solve? Add an id to the tenant_llm table and apply it in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Kevin Hu	32c0161ff1	Refa: Clean the folders. (#12890) ### Type of change - [x] Refactoring	2026-01-29 14:23:26 +08:00

6 Commits
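
Illustrative sketch for 88e4d6bddb (not code from the repository): the commit describes ranking entities by `pagerank * sim` and hardening the `n_hop_with_weight` read against missing keys or empty strings before `json.loads`. The Python below shows one way that could look. The field names `rank_flt` and `n_hop_with_weight` come from the commit message; the helper names `parse_n_hop` and `rank_entities`, the `sim` key, the shape of `hits`, and the path/weights payload are assumptions made for the example.

```python
import json


def parse_n_hop(raw):
    """Decode the stored n-hop field; return [] if it is missing, empty, or malformed."""
    if not raw:
        return []
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return []
    return value if isinstance(value, list) else []


def rank_entities(hits, top_n=3):
    """Sort entity hits by pagerank * sim, highest first (hypothetical helper)."""
    scored = []
    for name, fields in hits.items():
        # A missing rank_flt counts as 0, which is the failure mode the commit describes.
        pagerank = float(fields.get("rank_flt") or 0)
        sim = float(fields.get("sim") or 0)
        scored.append({
            "entity": name,
            "score": pagerank * sim,
            "n_hop": parse_n_hop(fields.get("n_hop_with_weight")),
        })
    scored.sort(key=lambda e: e["score"], reverse=True)
    return scored[:top_n]


# Assumed example data: "C" lacks rank_flt, "B" has an empty n-hop string.
hits = {
    "A": {"rank_flt": 0.5, "sim": 0.8,
          "n_hop_with_weight": '[{"path": ["A", "B"], "weights": [2.0]}]'},
    "B": {"rank_flt": 0.25, "sim": 0.5, "n_hop_with_weight": ""},
    "C": {"sim": 0.99},
}
ranked = rank_entities(hits)
for entry in ranked:
    print(entry["entity"], round(entry["score"], 3), entry["n_hop"])
```

Entity "C" has the highest similarity but no `rank_flt`, so its score collapses to 0 and it sorts last. When every entity lacks the field, all sort keys are 0 and the order carries no information, which matches the symptom reported in the commit.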