ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-04 01:29:35 +08:00

Author	SHA1	Message	Date
buua436	093eec3105	fix: handle qwen rerank error response (#15881 ) ### What problem does this PR solve? Fix QWen rerank error handling so DashScope error responses without a text attribute do not raise a secondary KeyError and hide the real provider error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-10 13:05:24 +08:00
cleanjunc	38f9ea5fec	fix(rerank): normalize reranker scores onto a single scale before hybrid blend (#15429 ) ### What problem does this PR solve? Closes #15428 The hybrid score in `rag/nlp/search.py` (`rerank_by_model`) blends reranker similarity with token similarity on a fixed `[0, 1]` scale: ```python return tkweight * np.array(tksim) + vtweight * vtsim + rank_fea # tkweight=0.3, vtweight=0.7 ``` The reranker implementations did not agree on that scale. Only three of roughly 17 providers normalized their output, and `NvidiaRerank` returned raw, unbounded logits. Weighted at `0.7`, a negative logit could push a genuinely relevant chunk below pure keyword matches, and its magnitude swamped `tksim`, which lives in `[0, 1]`. The practical effect was that the same query produced differently scaled scores depending on the configured reranker, and logit based providers degraded retrieval quality instead of improving it. This PR enforces a single scoring contract in one place: - `Base.similarity` is now the only public entry point. It short-circuits empty input and guarantees a normalized result. Each provider implements its raw scoring in `_compute_rank`, which removes sixteen duplicated empty input guards and the three scattered normalization calls. - Normalization is range aware. Providers that already return calibrated `[0, 1]` relevance scores (Cohere, Jina, Voyage, and others) keep their absolute magnitudes, so `similarity_threshold` filtering and the reported `vector_similarity` stay meaningful. Only out-of-range output such as NVIDIA logits is min-max rescaled into `[0, 1]`. - The twelve leftover `[DEBUG ...]` prints in `rerank_by_model`, introduced in #14231, are removed. They ran on every retrieval, added per chunk overhead, and leaked queries, keywords, and document content to stdout and logs. 
A new regression suite in `test/unit_test/rag/llm/test_rerank_normalization.py` covers logit rescaling (positive, negative, and flat batches), preservation of already calibrated scores, ordering, empty input handling, and the per provider HTTP path. It also asserts that no provider overrides `similarity()`, so the contract cannot silently drift. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-08 11:53:22 +08:00
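The scoring contract described in #15429 above can be sketched roughly as follows. This is a minimal illustration, not the project's `Base.similarity` or `_compute_rank`: the names `normalize_scores` and `hybrid_score` are hypothetical, the `rank_fea` term from the original formula is left out, and mapping a flat out-of-range batch to `0.0` is an assumption of this sketch.

```python
from typing import List, Sequence


def normalize_scores(scores: Sequence[float]) -> List[float]:
    """Hypothetical helper: range-aware normalization onto [0, 1]."""
    values = [float(s) for s in scores]
    if not values:
        return []  # short-circuit empty input
    lo, hi = min(values), max(values)
    if lo >= 0.0 and hi <= 1.0:
        # Already calibrated relevance scores: keep absolute magnitudes.
        return values
    if hi == lo:
        # A flat out-of-range batch carries no ordering information.
        # Returning 0.0 for every item is an assumption of this sketch.
        return [0.0] * len(values)
    # Out-of-range output (for example raw logits): min-max rescale.
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_score(
    tksim: Sequence[float],
    vtsim: Sequence[float],
    tkweight: float = 0.3,
    vtweight: float = 0.7,
) -> List[float]:
    """Blend token similarity with normalized reranker similarity."""
    vt = normalize_scores(vtsim)
    return [tkweight * t + vtweight * v for t, v in zip(tksim, vt)]
```

With logits such as `[-2.0, 2.0]`, the rescaled reranker scores stay on the same `[0, 1]` scale as `tksim`, so a negative logit can no longer push a chunk below pure keyword matches through its magnitude alone.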
07heco	8dc5b1b42d	fix: optimize reranking module robustness and bug fixes (#14264 ) ## Description This PR fixes critical bugs and improves the robustness of the RAG reranking module while maintaining 100% backward compatibility with all existing functionality and providers. ## Key Changes 1. Network Stability: Added 30s timeout to all API requests to prevent service blocking 2. Boundary Protection: Added empty query/text validation for all rerank models 3. Response Fault Tolerance: Replaced hardcoded key access with `.get()` to avoid KeyError crashes 4. Bug Fixes: - Fixed `Ai302Rerank` (completely non-functional before) - Fixed `GPUStackRerank` incorrect exception catching - Fixed `_normalize_rank` empty array crash 5. Code Specification: Added type annotations, standardized unimplemented class prompts ## Compatibility - ✅ No changes to any class/method names - ✅ All rerank providers (Jina/Cohere/NVIDIA/HuggingFace etc.) work as before - ✅ No breaking changes, zero impact on existing workflows ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-14 11:56:09 +08:00
07heco	e46989832e	fix: complete robustness fixes for rerank module addressing all review comments (#14265 ) ## Summary This PR fully addresses all CodeRabbit review feedback and enhances the robustness of the reranking module with 100% backward compatibility. ## Key Fixes 1. Fixed JinaRerank hardcoded base_url to support subclass endpoint overrides 2. Corrected GPUStackRerank exception handling to use proper requests exceptions and preserve stack traces 3. Added 30s timeout to all API calls to prevent service hanging 4. Added empty input validation for all rerank providers 5. Replaced direct dict key access with .get() to eliminate KeyError crashes 6. Fixed _normalize_rank edge case for empty arrays 7. Implemented missing functionality for Ai302Rerank 8. Standardized type hints and fixed typo issues ## Compatibility - No breaking changes to any existing functionality - All rerank providers work as originally intended - Fully compatible with existing configurations and workflows ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 12:40:41 +08:00
Ricardo-M-L	13922209e6	fix(llm): add timeout to HTTP requests in LLM integration layer (#14313 ) ### What problem does this PR solve? Multiple `requests.post()` calls across the LLM integration layer lack a `timeout` parameter. Without a timeout, a single unresponsive upstream service can block the calling thread indefinitely, eventually exhausting the thread pool and degrading the entire system. This is a well-known issue — Python's `requests` library defaults to `timeout=None` (infinite wait), and [the library docs explicitly recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts) always setting a timeout. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Change Added `timeout` to all `requests.post()` calls missing it: \| File \| Calls fixed \| Timeout \| \|------\|-------------\|---------\| \| `rag/llm/rerank_model.py` \| 9 \| 30s \| \| `rag/llm/embedding_model.py` \| 8 \| 30s \| \| `rag/llm/cv_model.py` \| 3 \| 60s \| \| `rag/llm/tts_model.py` \| 2 \| 60s \| \| `rag/llm/sequence2txt_model.py` \| 2 \| 60s \| Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and audio transcription use 60s (heavier workloads with file uploads). Note: other files in the codebase (e.g. `check_minio_alive`, `check_ragflow_server_alive`) already use `timeout=10`, so this PR brings the LLM layer in line with existing practice. Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 11:19:07 +08:00
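The timeout policy from #14313 above could be centralized along these lines. This is a sketch under assumptions: `TIMEOUTS_SECONDS` and `post_with_timeout` are hypothetical names that do not appear in the commit, and the injectable `post` argument exists only so the example can be exercised without a network. The values mirror the table in the commit message.

```python
from typing import Any, Callable, Dict, Optional

# Per-model-type timeouts taken from the commit message table.
TIMEOUTS_SECONDS: Dict[str, int] = {
    "rerank": 30,
    "embedding": 30,
    "cv": 60,
    "tts": 60,
    "sequence2txt": 60,
}


def post_with_timeout(
    kind: str,
    url: str,
    payload: Dict[str, Any],
    post: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical wrapper that never issues a POST without a timeout."""
    if post is None:
        # Third-party dependency; imported lazily so the sketch loads without it.
        import requests

        post = requests.post
    return post(url, json=payload, timeout=TIMEOUTS_SECONDS[kind])
```

Because `requests` defaults to `timeout=None`, routing every call through one helper makes it harder to add a new call site that can block indefinitely.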
Igor Ilinskii	889aba6a32	fix base_url handling in HuggingfaceRerank (#14555 ) ### What problem does this PR solve? HuggingfaceRerank.post() unconditionally prepends `http://` to base_url, which already contains a protocol. This creates invalid URLs like http://http://127.0.0.1:8080/rerank, breaking all requests. The fix normalizes URL handling to match the rest of the codebase, removing the redundant `http://`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Related Issues - #7318 - #7796 --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-05-11 10:04:40 +08:00
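The double-protocol bug fixed in #14555 above can be avoided with a small normalization step. The helper below is a hypothetical sketch, not the code from the PR: `build_rerank_url` is an invented name, and adding `http://` when no scheme is present is an assumption about the desired behavior.

```python
def build_rerank_url(base_url: str) -> str:
    """Hypothetical helper: build a rerank endpoint without doubling the scheme."""
    url = base_url.strip().rstrip("/")
    # Only add a scheme when the caller did not supply one (assumption).
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    # Avoid appending the path twice if it is already part of base_url.
    if not url.endswith("/rerank"):
        url += "/rerank"
    return url
```

For example, `build_rerank_url("http://127.0.0.1:8080")` yields `http://127.0.0.1:8080/rerank` rather than the invalid `http://http://127.0.0.1:8080/rerank`.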
FuturMix	2548c28d65	feat: add FuturMix as model provider (#14419 ) ## Summary Add [FuturMix](https://futurmix.ai) as a new model provider. FuturMix is an OpenAI-compatible unified AI gateway that provides access to 22+ models (GPT, Claude, Gemini, DeepSeek, and more) through a single API endpoint and key. - API Base: `https://futurmix.ai/v1` (OpenAI-compatible) - Supported capabilities: Chat, Embedding, Image2Text, TTS, Speech2Text, Rerank ### Changes \| File \| Change \| \|------\|--------\| \| `rag/llm/__init__.py` \| Add `FuturMix` to `SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` \| \| `rag/llm/chat_model.py` \| Add `FuturMixChat(Base)` — follows Astraflow/Avian pattern \| \| `rag/llm/embedding_model.py` \| Add `FuturMixEmbed(OpenAIEmbed)` — follows Astraflow pattern \| \| `rag/llm/cv_model.py` \| Add `FuturMixCV(GptV4)` — follows SILICONFLOW/OpenRouter pattern \| \| `rag/llm/tts_model.py` \| Add `FuturMixTTS(OpenAITTS)` — follows CometAPI/DeerAPI pattern \| \| `rag/llm/sequence2txt_model.py` \| Add `FuturMixSeq2txt(GPTSeq2txt)` — follows StepFun pattern \| \| `rag/llm/rerank_model.py` \| Add `FuturMixRerank(OpenAI_APIRerank)` \| \| `conf/llm_factories.json` \| Add factory config with 8 chat, 2 embedding, 1 image2text, 2 TTS, 1 speech2text models \| \| `docs/guides/models/supported_models.mdx` \| Add FuturMix to supported models table \| ### Models included - Chat: claude-sonnet-4-20250514, claude-3.5-haiku, gpt-4o, gpt-4o-mini, gemini-2.5-flash, gemini-2.0-flash, deepseek-chat, deepseek-reasoner - Embedding: text-embedding-3-small, text-embedding-3-large - Image2Text: gpt-4o - TTS: tts-1, tts-1-hd - Speech2Text: whisper-1 ## Test plan - [ ] Verify FuturMix appears in the model provider list in RAGFlow UI - [ ] Configure FuturMix with API key and test chat completion - [ ] Test embedding model with document indexing - [ ] Test image2text with a sample image 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-30 10:59:37 +08:00
Stephen Hu	345bec812d	refactor: improve QwenRerank logic (#14388 ) ### What problem does this PR solve? improve QwenRerank logic ### Type of change - [x] Refactoring	2026-04-28 20:17:34 +08:00
qinling0210	1473000135	Implement retrieval_test in GO (#14231 ) ### What problem does this PR solve? Implement retrieval_test in GO ### Type of change - [x] Refactoring	2026-04-24 15:30:14 +08:00
rhinoceros.xn	4e992de91f	Add tongyi gte-rerank-v2 (#14215 ) https://bailian.console.aliyun.com/cn-beijing?tab=api#/api/?type=model&url=2780056 ### What problem does this PR solve? ### Type of change - [x] Other (please describe): add gte-rerank-v2, qwen3-rerank	2026-04-20 11:39:17 +08:00
Jonah Hartmann	6023eb27ac	feat: add Ragcon provider (#13425 ) ### What problem does this PR solve? This PR aims to extend the list of possible providers. Adds new Provider "RAGcon" within the Ollama Modal. It provides all model types except OCR via Openai-compatible endpoints. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>	2026-03-06 09:37:27 +08:00
Magicbook1108	5fc3bd38b0	Feat: Support siliconflow.com (#13308 ) ### What problem does this PR solve? Feat: Support siliconflow.com ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 15:37:42 +08:00
Yongteng Lei	48591cb1e7	Refa: boost OpenAI-compatible reranker UX (#13087 ) ### What problem does this PR solve? boost OpenAI-compatible reranker UX. ### Type of change - [x] Refactoring	2026-02-10 16:13:21 +08:00
Stephen Hu	638c510468	refactor: introduce common normalize method in rerank base class (#12550 ) ### What problem does this PR solve? introduce common normalize method in rerank base class ### Type of change - [x] Refactoring	2026-01-12 11:07:11 +08:00
Philipp Heyken Soares	1c06ec39ca	fix cohere rerank base_url default (#11353 ) ### What problem does this PR solve? Cohere rerank base_url default handling - Background: When no rerank base URL is configured, the settings pipeline was passing an empty string through RERANK_CFG → TenantLLMService → CoHereRerank, so the Cohere client received base_url="" and produced “missing protocol” errors during rerank calls. - What changed: The CoHereRerank constructor now only forwards base_url to the Cohere client when it isn’t empty/whitespace, causing the client to fall back to its default API endpoint otherwise. - Why it matters: This prevents invalid URL construction in the rerank workflow and keeps tests/sanity checks that rely on the default Cohere endpoint from failing when no custom base URL is specified. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Philipp Heyken Soares <philipp.heyken-soares@am.ai>	2025-11-20 09:46:39 +08:00
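The constructor change described in #11353 above amounts to forwarding `base_url` only when it carries a real value. A minimal sketch, assuming a hypothetical helper named `cohere_client_kwargs` (the actual `CoHereRerank` constructor may be structured differently):

```python
from typing import Dict, Optional


def cohere_client_kwargs(api_key: str, base_url: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build client kwargs, omitting an empty base_url."""
    kwargs: Dict[str, str] = {"api_key": api_key}
    # Empty or whitespace-only values are dropped so the client
    # falls back to its default API endpoint.
    if base_url and base_url.strip():
        kwargs["base_url"] = base_url.strip()
    return kwargs
```

The resulting dictionary would then be unpacked into the client constructor, so an unset rerank base URL no longer reaches the client as `base_url=""`.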
cnJasonZ	3fcf2ee54c	feat: add new LLM provider Jiekou.AI (#11300 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Jason <ggbbddjm@gmail.com>	2025-11-17 19:47:46 +08:00
Jin Hai	378bdfccfc	Refactor log utils (#10973 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 20:25:02 +08:00
Jin Hai	360f5c1179	Move token related functions to common (#10942 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 08:50:05 +08:00
Stephen Hu	0ecccd27eb	Refactor:improve the logic for rerank models to cal the total token count (#10882 ) ### What problem does this PR solve? improve the logic for rerank models to cal the total token count ### Type of change - [x] Refactoring	2025-10-31 09:46:16 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654 ) ### What problem does this PR solve? Introduced gpu profile in .env Added Dockerfile_tei fix datrie Removed LIGHTEN flag ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
Stephen Hu	94dbd4aac9	Refactor: use the same implement for total token count from res (#10197 ) ### What problem does this PR solve? use the same implement for total token count from res ### Type of change - [x] Refactoring	2025-09-22 17:17:06 +08:00
buua436	6c24ad7966	fix: correct rerank_model condition logic (#10174 ) ### What problem does this PR solve? fix the rerank_model condition logic by correcting the np.isclose check. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-19 16:02:10 +08:00
Stephen Hu	ca320a8c30	Refactor: for total_token_count method use if to check first. (#9707 ) ### What problem does this PR solve? for total_token_count method use if to check first, to improve the performance when we need to handle exception cases ### Type of change - [x] Refactoring	2025-08-26 10:47:20 +08:00
Stephen Hu	a0d630365c	Refactor:Improve VoyageRerank not texts handling (#9539 ) ### What problem does this PR solve? Improve VoyageRerank not texts handling ### Type of change - [x] Refactoring	2025-08-19 10:31:04 +08:00
Stephen Hu	fb77f9917b	Refactor: Use Input Length In DefaultRerank (#9516 ) ### What problem does this PR solve? 1. Use input length to prepare res 2. Adjust torch_empty_cache code location ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-08-18 10:00:27 +08:00
Stephen Hu	da5cef0686	Refactor:Improve the float compare for LocalAIRerank (#9428 ) ### What problem does this PR solve? Improve the float compare for LocalAIRerank ### Type of change - [x] Refactoring	2025-08-13 10:26:42 +08:00
so95	35539092d0	Add kwargs to model base class constructors (#9252 ) Updated constructors for base and derived classes in chat, embedding, rerank, sequence2txt, and tts models to accept kwargs. This change improves extensibility and allows passing additional parameters without breaking existing interfaces. - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: IT: Sop.Son <sop.son@feavn.local> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-07 09:45:37 +08:00
JI4JUN	aeaeb169e4	Feat/support 302ai provider (#8742 ) ### What problem does this PR solve? Support 302.AI provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-31 14:48:30 +08:00
Kevin Hu	d9fe279dde	Feat: Redesign and refactor agent module (#9113 ) ### What problem does this PR solve? #9082 #6365 WARNING: it's not compatible with the older version of `Agent` module, which means that `Agent` from older versions can not work anymore. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 19:41:09 +08:00
Stephen Hu	95b9208b13	Fix:Improve float operation when rerank (#8963 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8915 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-22 10:04:00 +08:00
Stephen Hu	46caf6ae72	Refactor improve codes for ranker (#8936 ) ### What problem does this PR solve? Use the normalize method directly ### Type of change - [x] Refactoring	2025-07-21 10:22:20 +08:00
Stephen Hu	38b34116dd	Refa: Remove useless conver and fix a bug for DefaultRerank (#8887 ) ### What problem does this PR solve? 1. bug when re-try, we need to reset i. 2. remove useless convert ### Type of change - [x] Refactoring	2025-07-17 12:09:50 +08:00
Yongteng Lei	f8a6987f1e	Refa: automatic LLMs registration (#8651 ) ### What problem does this PR solve? Support automatic LLMs registration. ### Type of change - [x] Refactoring	2025-07-03 19:05:31 +08:00
Kevin Hu	d46c24045f	Feat: add GiteeAI as a llm provider. (#8572 ) ### What problem does this PR solve? #1853 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-30 11:22:11 +08:00
Kevin Hu	aafeffa292	Feat: add gitee as LLM provider. (#8545 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-30 09:22:31 +08:00
Kevin Hu	65d5268439	Feat: implement novitaAI embedding and reranking. (#8250 ) ### What problem does this PR solve? Close #8227 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 15:42:17 +08:00
Kevin Hu	d36c8d18b1	Refa: make exception more clear. (#8224 ) ### What problem does this PR solve? #8156 ### Type of change - [x] Refactoring	2025-06-12 17:53:59 +08:00
Kevin Hu	156290f8d0	Fix: url path join issue. (#8013 ) ### What problem does this PR solve? Close #7980 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 14:18:40 +08:00
Kevin Hu	60c3a253ad	Fix: api-key issue for xinference. (#6490 ) ### What problem does this PR solve? #2792 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-25 15:01:13 +08:00
zhou	a6aed0da46	Fix: rerank with YoudaoRerank issue. (#6396 ) ### What problem does this PR solve? Fix rerank with YoudaoRerank issue, "'YoudaoRerank' object has no attribute '_dynamic_batch_size'" ![17425412353825](https://github.com/user-attachments/assets/9ed304c7-317a-440e-acff-fe895fc20f07) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 10:09:16 +08:00
Kevin Hu	d83911b632	Fix: huggingface rerank model issue. (#6385 ) ### What problem does this PR solve? #6348 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 12:43:32 +08:00
Kevin Hu	5b04b7d972	Fix: rerank with vllm issue. (#6306 ) ### What problem does this PR solve? #6301 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-20 11:52:42 +08:00
Edouard Hur	b29539b442	Fix: CoHereRerank not respecting base_url when provided (#5784 ) ### What problem does this PR solve? The vLLM provider with a reranking model does not work: vLLM uses the [CoHereRerank provider](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/__init__.py#L250) under the hood with a `base_url`, so if this URL [is not passed to the Cohere client](https://github.com/infiniflow/ragflow/blob/v0.17.0/rag/llm/rerank_model.py#L379-L382), any attempt will end up at the Cohere SaaS (sending your private API key in the process) instead of your vLLM instance. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-03-10 11:22:06 +08:00
Kevin Hu	df9b7b2fe9	Fix: rerank issue. (#5696 ) ### What problem does this PR solve? #5673 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-06 15:05:19 +08:00
Kevin Hu	b8da2eeb69	Feat: support huggingface re-rank model. (#5684 ) ### What problem does this PR solve? #5658 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-03-06 10:44:04 +08:00
Kevin Hu	4e2afcd3b8	Fix FlagRerank max_length issue. (#5366 ) ### What problem does this PR solve? #5352 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-26 11:01:13 +08:00
liwenju0	569e40544d	Refactor rerank model with dynamic batch processing and memory manage… (#5273 ) …ment ### What problem does this PR solve? Issue：https://github.com/infiniflow/ragflow/issues/5262 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-02-24 11:32:08 +08:00
Kevin Hu	4776fa5e4e	Refactor for total_tokens. (#4652 ) ### What problem does this PR solve? #4567 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-26 13:54:26 +08:00
Kevin Hu	3805621564	Fix xinference rerank issue. (#4499 ) ### What problem does this PR solve? #4495 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-16 11:35:51 +08:00
Alex Chen	7944aacafa	Feat: add gpustack model provider (#4469 ) ### What problem does this PR solve? Add GPUStack as a new model provider. [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running LLMs. Currently, locally deployed models in GPUStack cannot integrate well with RAGFlow. GPUStack provides both OpenAI compatible APIs (Models / Chat Completions / Embeddings / Speech2Text / TTS) and other APIs like Rerank. We would like to use GPUStack as a model provider in ragflow. [GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/) Related issue: https://github.com/infiniflow/ragflow/issues/4064. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Testing Instructions 1. Install GPUStack and deploy the `llama-3.2-1b-instruct` llm, `bge-m3` text embedding model, `bge-reranker-v2-m3` rerank model, `faster-whisper-medium` Speech-to-Text model, `cosyvoice-300m-sft` in GPUStack. 2. Add provider in ragflow settings. 3. Testing in ragflow.	2025-01-15 14:15:58 +08:00

98 Commits