ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

少卿 9614605bf9 fix: propagate max_tokens from model config to downstream consumers (#15945)

## Summary

`get_model_config_from_provider_instance()` was not including
`max_tokens` in its returned dict, causing all downstream consumers
(dialog truncation, message fitting, knowledge base trimming, embedding,
graphrag, RAPTOR) to fall back to the hardcoded default of **8192
tokens** regardless of the actual model context window size (e.g.,
GPT-4o 128K, Claude 200K).

Closes #15944

## Root Cause

The function builds `model_config` with only: `llm_factory`, `api_key`,
`llm_name`, `api_base`, `model_type`, `is_tools`. `max_tokens` is never
included.

Yet the data exists in four independent sources:
1. `TenantModel.extra` JSON field — written by
`provider_api_service.py:659`
2. `conf/llm_factories.json` — every model entry has `max_tokens`
3. `rag/llm/model_meta.py` — 9 provider classes fetch real context
windows from APIs
4. `TenantLLM.max_tokens` database column

None of them are read by this function.

## Fix

Two lines added, one per return path:

- **Path B** (model_obj exists → provider-instance model): reads
`max_tokens` from `model_obj.extra` JSON
- **Path C** (fallback → factory config): reads `max_tokens` from
`llm_info` (sourced from `llm_factories.json`)

Both fall back to 8192 when the value is absent, preserving backward
compatibility.
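The two read paths and their shared fallback can be sketched as below. This is an illustration only, not the code from the commit: the helper names `max_tokens_from_extra` and `max_tokens_from_factory` are hypothetical, and the assumption that `extra` may arrive as a dict, a JSON string, or `None` is ours.

```python
import json

# Fallback value named in the commit message.
DEFAULT_MAX_TOKENS = 8192


def _valid_or_default(value, default):
    """Return value if it is a positive int (bools excluded), else default."""
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return default


def max_tokens_from_extra(extra, default=DEFAULT_MAX_TOKENS):
    """Path B sketch (hypothetical helper): read max_tokens from an `extra` field.

    Assumption: `extra` is a dict, a JSON-encoded string, or None.
    """
    if isinstance(extra, str):
        try:
            extra = json.loads(extra)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return default
    if not isinstance(extra, dict):
        return default
    return _valid_or_default(extra.get("max_tokens"), default)


def max_tokens_from_factory(llm_info, default=DEFAULT_MAX_TOKENS):
    """Path C sketch (hypothetical helper): read max_tokens from a factory entry."""
    if not isinstance(llm_info, dict):
        return default
    return _valid_or_default(llm_info.get("max_tokens"), default)
```

With these helpers, a missing, malformed, or non-positive value yields 8192, which matches the backward-compatible behaviour described above.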

## Impact

This single 5-line change fixes the context window budget for all **78+
call sites** across **20 files** that construct `LLMBundle` or read
`max_tokens` from the config dict, including:

| Consumer | File | Effect |
|---|---|---|
| Dialog chat truncation | `dialog_service.py:562` | `message_fit_in(msg, max_tokens * 0.95)` now uses real context window |
| Knowledge base trimming | `dialog_service.py:752` | `kb_prompt(kbinfos, max_tokens)` now fits more retrieved content |
| Agent message fitting | `agent/component/llm.py:322` | Agent prompts no longer truncated at 7946 tokens |
| Embedding truncation | `task_executor.py:704` | Embedding input uses actual model limit |
| GraphRAG extraction | `graphrag/*/extractor.py` | Entity extraction gets full context budget |
| `LLM4Tenant.max_length` | `tenant_llm_service.py:513` | Chat model wrapper exposes real context window |
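To make the effect on prompt budgets concrete, here is a small arithmetic sketch. The helper `prompt_budget` is hypothetical, and it uses integer percentages rather than the `max_tokens * 0.95` float expression quoted in the table, purely to avoid rounding noise. The 97% ratio is an inference: it reproduces the 7946 figure from a default of 8192, but the commit message does not state that factor.

```python
def prompt_budget(max_tokens, percent=95):
    """Hypothetical helper: tokens available to the prompt at a given percentage."""
    return max_tokens * percent // 100


# Old behaviour: every model was treated as having the 8192-token default.
old_dialog_budget = prompt_budget(8192)       # 7782
old_agent_budget = prompt_budget(8192, 97)    # 7946 (97% is an inferred ratio)

# New behaviour with a 128K-token context window (128000 taken as an example).
new_dialog_budget = prompt_budget(128000)     # 121600

print(old_dialog_budget, old_agent_budget, new_dialog_budget)
```

Under these assumptions, the dialog budget for a 128000-token model grows from 7782 to 121600 tokens once the real `max_tokens` value is propagated.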

2026-06-11 17:24:58 +08:00

joint_services

fix: propagate max_tokens from model config to downstream consumers (#15945)

2026-06-11 17:24:58 +08:00

services

fix(dialog): guard async_ask() against empty or invalid kb_ids (#15530)

2026-06-11 15:52:59 +08:00

__init__.py

fix: propagate memory tenant id in task collect (#15837)

2026-06-09 17:47:48 +08:00

db_models.py

fix(db): drop Peewee-auto-named unique index on tenant_model_instance (#15699) (#15879)