mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
## Summary

`get_model_config_from_provider_instance()` was not including `max_tokens` in its returned dict, causing all downstream consumers (dialog truncation, message fitting, knowledge base trimming, embedding, graphrag, RAPTOR) to fall back to the hardcoded default of **8192 tokens** regardless of the actual model context window size (e.g., GPT-4o 128K, Claude 200K).

Closes #15944

## Root Cause

The function builds `model_config` with only: `llm_factory`, `api_key`, `llm_name`, `api_base`, `model_type`, `is_tools`. `max_tokens` is never included. Yet the data exists in four independent sources:

1. `TenantModel.extra` JSON field, written by `provider_api_service.py:659`
2. `conf/llm_factories.json`, where every model entry has `max_tokens`
3. `rag/llm/model_meta.py`, where 9 provider classes fetch real context windows from APIs
4. `TenantLLM.max_tokens` database column

None of them are read by this function.

## Fix

Two lines added, one per return path:

- **Path B** (model_obj exists → provider-instance model): reads `max_tokens` from `model_obj.extra` JSON
- **Path C** (fallback → factory config): reads `max_tokens` from `llm_info` (sourced from `llm_factories.json`)

Both fall back to 8192 when the value is absent, preserving backward compatibility.
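
The lookup-with-fallback described in the fix can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: `resolve_max_tokens` and `_as_mapping` are hypothetical helper names, and the assumption that `extra` may arrive either as a dict or as a JSON string is mine rather than something the description states.

```python
import json
from typing import Any, Mapping, Optional

DEFAULT_MAX_TOKENS = 8192  # the fallback value named in the description


def _as_mapping(extra: Any) -> Mapping[str, Any]:
    """Normalize a JSON field that may be a dict, a JSON string, or None (assumed shapes)."""
    if isinstance(extra, Mapping):
        return extra
    if isinstance(extra, str) and extra.strip():
        try:
            parsed = json.loads(extra)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}


def resolve_max_tokens(
    extra: Any = None,
    llm_info: Optional[Mapping[str, Any]] = None,
    default: int = DEFAULT_MAX_TOKENS,
) -> int:
    """Hypothetical helper: return the first positive integer `max_tokens` found.

    Path B would pass `extra` (the provider-instance JSON field);
    Path C would pass `llm_info` (the factory config entry).
    """
    for source in (_as_mapping(extra), llm_info or {}):
        value = source.get("max_tokens")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return default
```

Under these assumptions, Path B would set `model_config["max_tokens"] = resolve_max_tokens(extra=model_obj.extra)` and Path C would set `model_config["max_tokens"] = resolve_max_tokens(llm_info=llm_info)`. A missing, malformed, or non-positive value yields 8192 in both cases, which matches the backward-compatible fallback described above.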
## Impact

This single 5-line change fixes the context window budget for all **78+ call sites** across **20 files** that construct `LLMBundle` or read `max_tokens` from the config dict, including:

| Consumer | File | Effect |
|---|---|---|
| Dialog chat truncation | `dialog_service.py:562` | `message_fit_in(msg, max_tokens * 0.95)` now uses real context window |
| Knowledge base trimming | `dialog_service.py:752` | `kb_prompt(kbinfos, max_tokens)` now fits more retrieved content |
| Agent message fitting | `agent/component/llm.py:322` | Agent prompts no longer truncated at 7946 tokens |
| Embedding truncation | `task_executor.py:704` | Embedding input uses actual model limit |
| GraphRAG extraction | `graphrag/*/extractor.py` | Entity extraction gets full context budget |
| LLM4Tenant.max_length | `tenant_llm_service.py:513` | Chat model wrapper exposes real context window |
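
To make the size of the effect concrete, here is a small sketch of the dialog budget arithmetic, using the 0.95 factor from the `message_fit_in(msg, max_tokens * 0.95)` call in the table. `prompt_budget` is a hypothetical name, and integer arithmetic is used here only to keep the results exact; the real call passes a float.

```python
def prompt_budget(max_tokens: int, percent: int = 95) -> int:
    """Hypothetical helper: token budget left for the prompt at a given percentage."""
    return max_tokens * percent // 100


# Context window sizes taken from the examples in the summary (128K, 200K)
# plus the old hardcoded default.
windows = {
    "old default": 8192,
    "128K model": 128_000,
    "200K model": 200_000,
}

for label, window in windows.items():
    print(f"{label}: {prompt_budget(window)} prompt tokens")
```

With the old default the dialog budget is 7782 tokens; with a 128K window it becomes 121600, and with a 200K window 190000.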