ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
plind	7edabdf7c3	fix(retrieval): keep manual metadata filter reusable inside Iteration (#14849 ) ## What problem does this PR solve? Closes #12582. When a Retrieval component sits inside an Iteration with a manual metadata filter that references the iteration variable (e.g. `{IterationItem:abc@item}`), every iteration reuses the value resolved on the first pass. Root cause: [`_resolve_manual_filter` in `agent/tools/retrieval.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/retrieval.py#L144-L171) mutated `flt["value"]` in place. The `filters` list passed in is the live `self._param.meta_data_filter["manual"]` (see [`apply_meta_data_filter` in `common/metadata_utils.py:257-261`](https://github.com/infiniflow/ragflow/blob/main/common/metadata_utils.py#L257-L261)), so after the first iteration the param dict permanently held the resolved string instead of the original variable reference. ```text iter #1: flt["value"] = "{IterationItem:abc@item}" → resolved to "AI" after mutation: flt["value"] = "AI" ← written back into _param iter #2: flt["value"] = "AI" ← no {…} matches retrieval keeps filtering by "AI" forever ``` This PR returns a shallow copy with the resolved value instead, leaving the original filter (and its variable reference) intact for the next iteration. ## Type of change - [x] Bug fix (non-breaking change which fixes an issue) ## Test plan - [ ] Build an agent: `Agent (structured output → list of areas) → Iteration → Retrieval (manual filter: Area = {IterationItem/Item}) → Message`. Run with a multi-area query and confirm each iteration's Retrieval result matches its own item, not the first item. - [ ] Regression: Retrieval with a manual metadata filter outside an Iteration still resolves the variable correctly on each request. - [ ] Regression: Retrieval with no metadata filter and with `auto` / `semi_auto` filters behave unchanged.	2026-05-19 15:08:31 +08:00
yingjianzh	4c68a6b86c	fix(agent): pass top_k and fix similarity weight slider behavior (#14760 ) ### What problem does this PR solve? This PR fixes two issues in Agent Retrieval behavior and configuration UX: 1. `top_k` configured in Agent Retrieval was not passed down to the backend retriever call, so retrieval could ignore the configured vector recall limit. 2. Similarity weight slider semantics were confusing in Agent forms because the Agent field stores `keywords_similarity_weight` while UI interactions were interpreted as vector weight. This could cause displayed values and actual behavior to diverge. This PR ensures Agent retrieval uses configured `top_k`, and makes the slider behavior consistent and explicit for both vector and keyword weight modes. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-15 10:49:14 +08:00
sxxtony	59c35100c5	Perf: push metadata filters down to Elasticsearch (#14576 ) ### What problem does this PR solve? Fixes #14412. `common.metadata_utils.meta_filter` evaluates user-defined metadata conditions in Python after `DocMetadataService.get_flatted_meta_by_kbs` loads the entire `meta_fields` table into memory. Past a few thousand documents per knowledge base this becomes a memory bottleneck and a wasted ES round-trip — every filter request currently fetches up to 10000 metadata rows even when the resulting `doc_ids` list is tiny. This PR adds an ES push-down path that translates the same filter language into a `bool` query and returns just the matching document IDs. Changes - `common/metadata_es_filter.py` (new): pure-Python translator from the RAGflow filter list to ES DSL. Covers every operator the in-memory path supports (`=`, `≠`, `>`, `<`, `≥`, `≤`, `in`, `not in`, `contains`, `not contains`, `start with`, `end with`, `empty`, `not empty`) with `case_insensitive: true` on `prefix` and `wildcard` for parity with the existing lower-cased Python comparisons. User wildcard metacharacters are escaped before being injected into `wildcard` patterns. Negative operators (`≠`, `not in`, `not contains`, ranges) are wrapped with an `exists` guard so they do not accidentally match documents missing the key, matching the legacy `if k not in metas` behaviour. - `api/db/services/doc_metadata_service.py`: new `DocMetadataService.filter_doc_ids_by_meta_pushdown(kb_ids, filters, logic)` that returns the doc IDs ES matched, or `None` to signal the caller should fall back to the in-memory path. Returns `None` when the active doc store is Infinity (`meta_fields` is a JSON column, not a dotted-object mapping), when any filter cannot be expressed in DSL (`UnsupportedMetaFilter`), or when the ES request or metadata index lookup errors. - `common/metadata_utils.py`: `apply_meta_data_filter` accepts an optional `kb_ids` argument. 
When supplied, conditions go through push-down first via a new `_try_meta_pushdown` helper; on `None` the function falls back to the original `meta_filter` call. Default behaviour is unchanged for callers that don't pass `kb_ids`. - Updated all four callers (`agent/tools/retrieval.py`, `api/db/services/dialog_service.py` ×2, `api/apps/services/dataset_api_service.py`, `api/apps/sdk/session.py`) to forward `kb_ids` so the push-down path is exercised in production. - `test/unit_test/common/test_metadata_es_filter.py` (new): 35 unit tests covering every operator's DSL shape, value coercion (`ast.literal_eval`, lowercasing, ISO-date pass-through), wildcard escaping, OR-logic wrapping that protects negative clauses, and the doc-ID extractor. Behaviour preserved - The in-memory `meta_filter` is untouched and still services every fallback case (Infinity backend, unknown operators, ES outages). - The eligibility / credibility / issue-multiplier semantics described in the LLM-driven `auto` and `semi_auto` modes still hand the LLM the full in-memory `metas` dict to choose conditions from. Only the evaluation of those generated conditions is pushed down. - Existing tests in `test/unit_test/common/test_metadata_filter_operators.py` continue to pass (14/14). Test plan - `pytest test/unit_test/common/test_metadata_es_filter.py` — 35 passed. - `pytest test/unit_test/common/test_metadata_filter_operators.py` — 14 passed. - `ruff check` clean on every modified file. - Reviewer please validate the ES query shapes against a live cluster — particularly `case_insensitive` on `wildcard` and `prefix` (requires ES 7.10+) and the `exists` + `must_not` pairing for `≠`. Notes - The first cut caps each push-down request at 10000 results, matching the existing `get_flatted_meta_by_kbs` limit, and logs a warning when the cap is hit. A `search_after` follow-up would let us drop the cap entirely once the push-down path is validated. 
- Operator parity with the in-memory path is exact for the canonical unicode operators (`≥`, `≤`, `≠`) used internally; the ASCII aliases (`>=`, `<=`, `!=`) are normalised by `convert_conditions` before they reach the translator. ### Type of change - [x] Performance Improvement --------- Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>	2026-05-07 21:23:43 +08:00
akie	3911d90993	Fix: agent application can not show Cite (#14047 ) Close #14018 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Problem In Agent applications, even with the cite option enabled, only inline [ID: x] citation markers are visible (showing chunk content on hover). The Agent does not display the referenced file cards below the response, unlike Chat applications. ### Root Cause The Agent's Retrieval tool (agent/tools/retrieval.py) calls retriever.retrieval() with aggs=False, which means the retrieval results do not include doc_aggs (document aggregation) data. Without doc_aggs, the frontend ReferenceDocumentList component has no data to render the file cards. In contrast, the Chat application (api/db/services/dialog_service.py) calls the same retriever.retrieval() method with aggs=True. ### Fix Changed aggs=False to aggs=True in agent/tools/retrieval.py so that document aggregation data is returned along with the retrieved chunks.	2026-04-13 11:06:14 +08:00
balibabu	38acf34724	Fix: The agent selected a knowledge base, but the API returned the error: "No dataset is selected". (#13950 ) ### What problem does this PR solve? Fix: The agent selected a knowledge base, but the API returned the error: "No dataset is selected". ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-07 14:16:37 +08:00
Lynn	db57155b30	Fix: get user_id from variables (#13716 ) ### What problem does this PR solve? Get user_id from the canvas variable when the input value is a `{}` pattern. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-20 23:39:34 +08:00
Lynn	02070bab2a	Feat: record user_id in memory (#13585 ) ### What problem does this PR solve? Get user_id from canvas and record it. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-13 15:38:35 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. Index name of meta data: ragflow_doc_meta_{tenant_id} ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
Kevin Hu	9a10558f80	Refa: async retrieval process. (#12629 ) ### Type of change - [x] Refactoring - [x] Performance Improvement	2026-01-15 12:28:49 +08:00
Kevin Hu	23a9544b73	Fix: toc async issue. (#12485 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-07 15:35:30 +08:00
Kevin Hu	461c81e14a	Fix: KG search issue. (#12364 ) ### What problem does this PR solve? Close #12347 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-31 14:40:27 +08:00
Lynn	7498bc63a3	Fix: judge retrieval from (#12223 ) ### What problem does this PR solve? Judge retrieval from in retrieval component, and fix bug in message component ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-26 13:01:46 +08:00
Lynn	6e9691a419	Feat: message manage (#12196 ) ### What problem does this PR solve? Manage message and use in agent. Issue #4213 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-25 21:18:13 +08:00
Yongteng Lei	0f0fb53256	Refa: refactor metadata filter (#11907 ) ### What problem does this PR solve? Refactor metadata filter. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-12 17:12:38 +08:00
Kevin Hu	ea4a5cd665	Fix: tokenizer issue. (#11902 ) #11786 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-11 17:38:17 +08:00
TeslaZY	c610bb605a	Added semi-automatic mode to the metadata filter (#11886 ) ### What problem does this PR solve? Adds a semi-automatic mode to retrieval metadata filtering: users manually select the metadata keys that the LLM may use when generating filter conditions. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-11 10:45:21 +08:00
Kevin Hu	b5ad7b7062	Feat: support TOC transformer. (#11685 ) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 12:27:50 +08:00
Kevin Hu	820934fc77	Fix: no result if metadata returns none. (#11412 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-20 19:51:25 +08:00
Kevin Hu	06cef71ba6	Feat: add or logic operations for meta data filters. (#11404 ) ### What problem does this PR solve? #11376 #11387 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-20 14:31:12 +08:00
Yongteng Lei	9213568692	Feat: add mechanism to check cancellation in Agent (#10766 ) ### What problem does this PR solve? Add mechanism to check cancellation in Agent. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-11 17:36:48 +08:00
buua436	83ff8e8009	Fix:update agent variable name rule (#11124 ) ### What problem does this PR solve? change: 1. update agent variable name rule. 2. reset() in Canvas doesn't reset the env var. 3. correct log input binding in message component ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-11 11:18:30 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Jin Hai	1a9215bc6f	Move some vars to globals (#11017 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 14:14:38 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Jin Hai	1e45137284	Move 'timeout' to common folder (#10983 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:51:12 +08:00
buua436	ac465ba2a6	Feat:add variables to the metadata filtering function of the knowledg… (#10967 ) …e retrieval component. ### What problem does this PR solve? issue: #10861 change: add variables to the metadata filtering function of the knowledge retrieval component ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 19:19:09 +08:00
buua436	866098634b	Feat:setting metadata in the retrieval (#10682 ) ### What problem does this PR solve? issue: [#9272](https://github.com/infiniflow/ragflow/issues/9272) change: setting metadata in the retrieval ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-21 09:52:26 +08:00
Kevin Hu	0d8791936e	Feat: TOC retrieval (#10456 ) ### What problem does this PR solve? #10436 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-10 17:07:55 +08:00
Jin Hai	d931c33ced	Fix typos: retrievaler -> retriever (#10372 ) ### What problem does this PR solve? Fix typos ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-10 09:17:36 +08:00
Yongteng Lei	daea357940	Fix: invalid COMPONENT_EXEC_TIMEOUT (#10278 ) ### What problem does this PR solve? Fix invalid COMPONENT_EXEC_TIMEOUT. #10273 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-09-25 14:11:09 +08:00
Jin Hai	4eb7659499	Fix bug: broken import from rag.prompts.prompts (#10217 ) ### What problem does this PR solve? Fix broken imports ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: jinhai <haijin.chn@gmail.com>	2025-09-23 10:19:25 +08:00
Wilmer	c8b79dfed4	The retrieval component needs to support returning JSON data(#10170 ) (#10171 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-22 17:28:29 +08:00
湛露先生	6ff7cfe005	Fix bugs for agent/tools. (#9930 ) ### What problem does this PR solve? 1. Fix typos. 2. Fix agent/tools/crawler.py return bug. 3. Fix agent/tools/deepl.py component_name bug. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring - [x] Performance Improvement Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-09-05 12:31:44 +08:00
天海蒼灆	ccb9f0b0d7	Feature (agent): Allow Retrieval kb_ids param use kb_id,and allow list kb_name or kb_id (#9531 ) ### What problem does this PR solve? Allow the Retrieval kb_ids param to accept a kb_id, and allow a list of kb_name or kb_id values. - Check whether the knowledge base name is a list and support batch queries - When the knowledge base name does not exist, try querying by ID - If both query methods fail, throw an exception ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-19 09:42:39 +08:00
Kevin Hu	b6e34e3aa7	Fix: PyPDF's Manipulated FlateDecode streams can exhaust RAM (#9469 ) ### What problem does this PR solve? #3951 #8463 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-14 13:45:19 +08:00
Kevin Hu	5749aa30b0	Fix: model type error. (#9308 ) ### What problem does this PR solve? #9240 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-07 16:14:47 +08:00
Kevin Hu	3f6177b5e5	Feat: Add thought info to every component. (#9134 ) ### What problem does this PR solve? #9082 #6365 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-31 15:13:45 +08:00
Kevin Hu	d9fe279dde	Feat: Redesign and refactor agent module (#9113 ) ### What problem does this PR solve? #9082 #6365 WARNING: this is not compatible with older versions of the `Agent` module, so an `Agent` created in an older version will no longer work. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 19:41:09 +08:00
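
Commit 7edabdf7c3 in the list above attributes the Iteration bug to resolving a manual filter by mutating `flt["value"]` in place, and fixes it by returning a shallow copy. The following is a minimal sketch of that difference, not the actual `_resolve_manual_filter` code: the regex, the function names `resolve_in_place` and `resolve_copy`, the `variables` lookup dict, and the filter dict shape are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for variable references such as "{IterationItem:abc@item}".
VAR_REF = re.compile(r"\{([^{}]+)\}")


def _substitute(text, variables):
    # Replace every {name} with the current value of that variable.
    return VAR_REF.sub(lambda m: str(variables[m.group(1)]), text)


def resolve_in_place(filters, variables):
    """Buggy variant: overwrites the live filter, losing the variable reference."""
    for flt in filters:
        flt["value"] = _substitute(flt["value"], variables)
    return filters


def resolve_copy(filters, variables):
    """Fixed variant: returns shallow copies and leaves the originals untouched."""
    resolved = []
    for flt in filters:
        new_flt = dict(flt)  # shallow copy of a single filter dict
        new_flt["value"] = _substitute(flt["value"], variables)
        resolved.append(new_flt)
    return resolved


def run_iterations(resolver, items):
    # Simulate an Iteration that reuses the same stored filter list on each pass.
    stored = [{"key": "Area", "op": "=", "value": "{IterationItem:abc@item}"}]
    seen = []
    for item in items:
        out = resolver(stored, {"IterationItem:abc@item": item})
        seen.append(out[0]["value"])
    return seen, stored[0]["value"]
```

With the in-place variant, the second pass finds no `{…}` left to resolve and keeps filtering by the first item; with the copying variant, each pass resolves against the original reference.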

40 Commits
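
Commit 59c35100c5 in the list above describes translating the metadata filter list into an Elasticsearch `bool` query, with wildcard escaping and an `exists` guard on negative operators. The sketch below illustrates that idea for a handful of operators only. It is not the code in `common/metadata_es_filter.py`: the `meta_fields.<key>` field path, the `{"key", "op", "value"}` filter shape, and the helper names `escape_wildcard`, `translate_condition`, and `build_query` are assumptions, and `UnsupportedMetaFilter` is redefined locally under the name the commit message mentions.

```python
class UnsupportedMetaFilter(Exception):
    """Raised when a condition cannot be expressed in ES DSL (caller would fall back)."""


def escape_wildcard(text):
    # Escape the backslash first, then the wildcard metacharacters * and ?.
    return text.replace("\\", "\\\\").replace("*", "\\*").replace("?", "\\?")


def translate_condition(flt, prefix="meta_fields"):
    # Assumed layout: metadata keys live under a dotted object path.
    field = f"{prefix}.{flt['key']}"
    op, value = flt["op"], flt.get("value")
    exists = {"exists": {"field": field}}
    if op == "=":
        return {"term": {field: value}}
    if op == "≠":
        # The exists guard keeps documents that lack the key from matching.
        return {"bool": {"must": [exists], "must_not": [{"term": {field: value}}]}}
    if op == "contains":
        pattern = f"*{escape_wildcard(str(value))}*"
        return {"wildcard": {field: {"value": pattern, "case_insensitive": True}}}
    if op == "start with":
        return {"prefix": {field: {"value": str(value), "case_insensitive": True}}}
    raise UnsupportedMetaFilter(op)


def build_query(filters, logic="and"):
    clauses = [translate_condition(f) for f in filters]
    if logic == "or":
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    return {"bool": {"filter": clauses}}
```

Raising on an unknown operator mirrors the fallback behaviour the commit message describes: the caller can catch the exception and evaluate the filter with the in-memory path instead.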