fix: support dense_vector from ES fields response (ES 9.x compatibility) (#13972)

fix: support dense_vector from ES fields response (ES 9.x compatibility) - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Configuration Chore (non-breaking change which updates configuration) ## Summary by CodeRabbit * **Bug Fixes** * More accurate handling and unwrapping of dense-vector fields so returned values have correct shapes. * Field selection reliably limits returned data and falls back to alternate result locations when needed. * Use of consistent result IDs and tolerant handling when score values are missing. * **Chores / Configuration** * Increased build memory and adjusted build-time flags for the frontend build. * Simplified runtime model/GPU checks and removed an automated runtime GPU-install attempt. * **Build Fixes** * `web/vite.config.ts`: make `build.minify` and `build.sourcemap` respect `VITE_MINIFY` and `VITE_BUILD_SOURCEMAP` env vars from Dockerfile instead of hardcoding `terser` and `true`. * **Environment** * Allow stack version override and default the runtime image tag to "latest".  ## Summary by CodeRabbit * **Bug Fixes** * Correct unwrapping of dense-vector fields and reliable field selection with fallback locations. * Consistent use of hit-level IDs and tolerant handling when score values are missing. * **Chores / Configuration** * Increased frontend build memory and added build-time minify/sourcemap flags; build minification and sourcemap now configurable. * Removed runtime GPU detection for model initialization; force CPU initialization. * **Environment** * Allow stack version override and default runtime image tag to "latest".  --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-29 15:31:05 +08:00 · 2026-04-09 17:44:13 +08:00
parent 107fe6cf90
commit b7744e053e
50 changed files with 142 additions and 124 deletions
--- a/deepdoc/parser/pdf_parser.py
+++ b/deepdoc/parser/pdf_parser.py
@@ -38,7 +38,6 @@ from sklearn.cluster import KMeans
 from sklearn.metrics import silhouette_score

 from common.file_utils import get_project_base_directory
-from common.misc_utils import pip_install_torch
 from deepdoc.vision import OCR, AscendLayoutRecognizer, LayoutRecognizer, Recognizer, TableStructureRecognizer
 from rag.nlp import rag_tokenizer
 from rag.prompts.generator import vision_llm_describe_prompt
@@ -91,14 +90,9 @@ class RAGFlowPdfParser:
        self.tbl_det = TableStructureRecognizer()

        self.updown_cnt_mdl = xgb.Booster()
-        try:
-            pip_install_torch()
-            import torch.cuda
-
-            if torch.cuda.is_available():
-                self.updown_cnt_mdl.set_param({"device": "cuda"})
-        except Exception:
-            logging.info("No torch found.")
+        # xgboost model is very small; using CPU explicitly
+        self.updown_cnt_mdl.set_param({"device": "cpu"})
+        logging.info("updown_cnt_mdl initialized on CPU")
        try:
            model_dir = os.path.join(get_project_base_directory(), "rag/res/deepdoc")
            self.updown_cnt_mdl.load_model(os.path.join(model_dir, "updown_concat_xgb.model"))