ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-02 16:55:42 +08:00

Author	SHA1	Message	Date
Muhammad Furqan	828c5789f6	fix(agent/tools): GoogleScholar empty json output and ignored top_n (#16419 ) ### What problem does this PR solve? Closes #16418. `scholarly.search_pubs(...)` returns a lazy generator, but `agent/tools/googlescholar.py` treated it as a re-iterable, bounded list: ```python scholar_client = scholarly.search_pubs(kwargs["query"], ...) # lazy generator self._retrieve_chunks(scholar_client, ...) # (1) iterates -> exhausts it self.set_output("json", list(scholar_client)) # (2) already empty -> [] ``` 1. `json` output was always empty. `_retrieve_chunks` iterates `scholar_client`, exhausting the generator; `list(scholar_client)` then returns `[]`. 2. `top_n` was never applied. Unlike `ArXiv` (`max_results=self._param.top_n`), the unbounded generator was passed straight to `_retrieve_chunks`, which has no internal limit — so the tool kept paginating well past Top N (until an error, rate-limit/block, or `COMPONENT_EXEC_TIMEOUT`). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Changes - Materialize at most `top_n` results once with `itertools.islice`, and reuse that list for both `_retrieve_chunks` and the `json` output. - Add regression tests (`test/unit_test/agent/component/test_googlescholar.py`, stubbing `scholarly.search_pubs`) covering the `top_n` bound, the non-empty `json` output, and the empty-query short-circuit. Verified: against `main` the new tests fail with `assert 30 == 5` (top_n ignored) and `assert 0 == 5` (empty json); with this fix all pass. Backend-only. --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-07-01 10:47:39 +08:00
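The generator-exhaustion bug and the `itertools.islice` fix described in the commit above can be sketched as follows. This is a minimal illustration, not the code in `agent/tools/googlescholar.py`: `fake_search_pubs` and `retrieve` are hypothetical names, and the stub only imitates the lazy, unbounded behaviour attributed to `scholarly.search_pubs`.

```python
from itertools import islice


def fake_search_pubs(query):
    """Hypothetical stand-in for scholarly.search_pubs: a lazy, unbounded generator."""
    i = 0
    while True:
        yield {"title": f"{query} paper {i}"}
        i += 1


def retrieve(query, top_n):
    """Materialize at most top_n results once and reuse the list for both consumers."""
    results = list(islice(fake_search_pubs(query), top_n))
    chunks = [r["title"] for r in results]  # first consumer (stands in for _retrieve_chunks)
    json_output = list(results)             # second consumer still sees every result
    return chunks, json_output


# A generator can only be consumed once; a second pass yields nothing.
gen = (n for n in range(3))
first_pass = list(gen)
second_pass = list(gen)
```

Because `results` is a plain list, it can be iterated any number of times, and `islice` stops pulling from the generator after `top_n` items, so no further pages are requested.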
Taranum Wasu	e23f63bd93	fix(agent): prevent empty LLM user message after prompt fitting (#16413 ) ## Summary - Treat `max_tokens=0` as unset (`or 8192`) when building model context budgets, fixing agents that silently zeroed prompts when a vLLM model had `max_tokens: 0` in tenant config - Replace trailing same-role canvas history in `LLM._sys_prompt_and_msg` instead of skipping the current user prompt - Add `LLM.fit_messages()` validation after `message_fit_in` on agent paths so empty user content fails fast with a clear error instead of reaching vLLM Fixes #16411 ## Root cause Agent canvas workflow called `message_fit_in` with `int(max_length * 0.97)`. When `max_length` was `0`, both system and user content were trimmed to empty strings. The `[HISTORY STREAMLY]` log showing only `{"role":"user","content":""}` matches this. A secondary bug skipped appending the formatted user prompt when history ended with a `user` role message. ## Test plan - [x] Added `test/unit_test/agent/component/test_llm_prompt.py` for role-replace, validation, and zero-budget fitting - [x] Added `test_message_fit_in_zero_budget_preserves_non_empty_messages` in `test_generator_message_fit_in.py` - [ ] CI unit tests - [ ] Manual: agent canvas `begin → Retrieval → Agent → Message` with vLLM Qwen3; confirm user message reaches LLM Made with [Cursor](https://cursor.com) --------- Co-authored-by: Taranum Wasu <taranumwasu@Taranums-MacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-07-01 09:30:54 +08:00
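The zero-budget fallback and the fail-fast check from the commit above might look roughly like this. It is a sketch under stated assumptions: `context_budget` and `validate_user_message` are hypothetical helpers, not the actual `LLM.fit_messages()` implementation; only the `or 8192` fallback and the `0.97` factor come from the commit message.

```python
DEFAULT_MAX_TOKENS = 8192  # fallback value named in the commit message


def context_budget(max_tokens):
    """Treat 0 (or None) as unset before scaling, so the budget is never zero."""
    return int((max_tokens or DEFAULT_MAX_TOKENS) * 0.97)


def validate_user_message(messages):
    """Hypothetical check: raise early if any user message is empty after fitting."""
    for message in messages:
        if message["role"] == "user" and not str(message.get("content", "")).strip():
            raise ValueError("empty user message after prompt fitting")
    return messages
```

Without the `or` fallback, `int(0 * 0.97)` is `0`, which is the budget that trimmed both system and user content to empty strings.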
Rene Arredondo	dc8b6d767c	fix(agent): inject uploaded attachments into LLM context (#15215 ) (#15220 ) ## Summary Fixes #15215 — attachments uploaded to an agent were not reaching the LLM. When a user uploads a file in an agent chat, `canvas.run` parses it into the `sys.files` global (text content for documents, `data:image/...` URIs for images — see `agent/canvas.py:752-768`). But the LLM/Agent component's `_prepare_prompt_variables` only substitutes variables the user's prompt template explicitly references via `{var}` placeholders. The default prompt is `[{"role": "user", "content": "{sys.query}"}]` with no `{sys.files}`, so the parsed attachment content never reaches the model. In the reporter's logs, this is why the agent saw only the bare query `附件摘要 attachment summary` and went searching the dataset instead of reading the uploaded PDF. ## Fix `agent/component/llm.py` — added `_collect_sys_files()` and an auto-injection step in `_prepare_prompt_variables`: - If `sys.files` is non-empty and neither `sys_prompt` nor any entry in `prompts` already contains `{sys.files}` (no double-injection), split the entries into text vs. `data:image/...` URIs. - Image URIs are merged into `self.imgs`, which the existing logic uses to switch the chat model to `IMAGE2TEXT` and pass `images=...` to `async_chat`. - Text content is appended to the last `user` role message in `msg`, mirroring how `dialog_service.async_chat_solo` handles attachments for the non-agent chat path (`api/db/services/dialog_service.py:318-321`). Both `LLM._invoke_async` and `Agent._invoke_async` (tool-using) go through `_prepare_prompt_variables`, so plain LLM nodes and Agent nodes are fixed in both streaming and non-streaming paths. ## Test plan - [ ] Upload a PDF attachment to an agent with the default `{sys.query}` prompt and ask "summarize the attachment" — the model should answer from the file content rather than searching the knowledge base. 
- [ ] Upload an image attachment to an agent and ask about its contents — the model should switch to the vision-capable LLM and answer from the image. - [ ] Verify that an agent whose prompt does include `{sys.files}` still works and does not include the file content twice. - [ ] Verify that an agent run with no attachments behaves unchanged. - [ ] Run `uv run pytest` to make sure no existing tests regress. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: yzc <yuzhichang@gmail.com>	2026-06-30 15:48:59 +08:00
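The auto-injection rule described in the commit above (skip when `{sys.files}` is already referenced, route `data:image/...` URIs to the image list, append text to the last `user` message) can be sketched as below. `inject_files` is a hypothetical function written for illustration; it is not `_collect_sys_files()` or `_prepare_prompt_variables`, whose real signatures are not shown in the log.

```python
def inject_files(messages, sys_prompt, files):
    """Return (messages, images) with attachment text appended to the last user message."""
    if not files:
        return messages, []
    templates = [sys_prompt] + [m["content"] for m in messages]
    if any("{sys.files}" in t for t in templates):
        # The template already references the files, so avoid double-injection.
        return messages, []
    images = [f for f in files if f.startswith("data:image/")]
    texts = [f for f in files if not f.startswith("data:image/")]
    out = [dict(m) for m in messages]  # copy so the caller's list is not mutated
    if texts:
        for message in reversed(out):
            if message["role"] == "user":
                message["content"] += "\n\n" + "\n\n".join(texts)
                break
    return out, images
```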
seekmistar01	608fc5df4d	fix(agent): Switch no longer matches an empty condition (all([]) is True) (#15644 ) ## Summary Fixes the agent `Switch` component matching an empty/all-skipped condition unconditionally because `all([]) is True`. ## Root cause `res` only accumulates for items with a non-empty `cpn_id` (blank ones `continue`). For a condition with empty `items` (or all-blank `cpn_id`), `res == []`, and `if all(res):` is `True`, so the Switch routes to that condition's `to` target before reaching the else/`end_cpn_ids` branch. ## Fix ```diff - if all(res): + if res and all(res): ``` An empty result set no longer counts as a match; genuinely-satisfied "and" conditions still route (the real `all(res)` path is preserved). ## Files changed - `agent/component/switch.py` - `test/unit_test/agent/component/test_switch_empty_condition.py` (new) ## Verification - `ruff check` / `ruff format --check` — clean - Added unit tests (mirroring the existing `_FakeCanvas` component-test pattern): an empty/all-skipped "and" condition now falls through to `end_cpn_ids`; a genuinely-satisfied "and" condition still routes to its target. - Local full pytest not run (heavy RAG deps); CI validates. ## Note Implemented with LLM assistance (model: claude-opus-4-8). Closes #15643 --------- Co-authored-by: seekmistar01 <seekmistar01@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
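The `all([])` pitfall behind the commit above is standard Python behaviour and is easy to reproduce in isolation. In this small sketch, `condition_matches` is a hypothetical helper that mirrors the `if res and all(res):` guard from the diff.

```python
def condition_matches(results):
    """An empty result list is not a match; otherwise every item must be truthy."""
    return bool(results) and all(results)


# all() over an empty iterable is vacuously True, which caused the false match.
vacuous = all([])
```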
Harsh Kashyap	6a4de82a80	fix(agent): restore be_output and test DeepL error return (#16363 ) ## Summary #16332 fixed the missing `return` in DeepL's except branch, but `ComponentBase.be_output` was removed during the agent refactor (#9113) while several components still call it. DeepL (and other tools) would raise `AttributeError` before any error message could be returned. - Restore `ComponentBase.be_output` as `pd.DataFrame([{"content": v}])` (same as pre-refactor behavior) - Add regression test that `_run` returns the `Error:` message when translation fails Related to #16329 ## Test plan - [x] `test_run_returns_error_on_translation_failure` - [x] Existing `test_deepl.py` check() tests still pass --------- Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:16 +08:00
Muhammad Furqan	fe14cc35cf	fix(agent/tools): DeepL component fails validation and drops errors (#16332 ) ### What problem does this PR solve? `DeepLParam.check()` validated `self.top_n`, but DeepL has no such parameter (it is not defined on the param class or its base), so `check()` always raised `AttributeError` and a DeepL component could never pass validation. Removed the bogus `top_n` check. Also fixed the `_run` except branch, which computed `be_output("Error...")` but never returned it, silently dropping the error message. Closes #16329 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Add test cases ### Testing Added `test/unit_test/agent/component/test_deepl.py` covering `DeepLParam.check()` with valid defaults and rejection of invalid source/target languages.	2026-06-25 14:40:56 +08:00
Harsh Kashyap	b9445c67e2	fix(agent): coerce None Switch inputs before string operators (#16320 ) ## Summary - Coerce `None` canvas values to `""` before string comparison operators in `Switch.process_operator`. - Prevents `AttributeError` when upstream components yield `None` and the Switch uses contains/start with/end with. ## Test plan - [x] `.v/bin/python -m ruff check agent/component/switch.py test/unit_test/agent/component/test_switch.py` - [x] `.v/bin/python -m pytest test/unit_test/agent/component/test_switch.py -q` (3 passed) Fixes #16315 --------- Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>	2026-06-25 14:18:24 +08:00
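The `None` coercion in the commit above can be illustrated with a simplified operator dispatcher. This is a sketch only: the function below is a hypothetical reduction of `Switch.process_operator`, and the operator names are taken from the commit message (contains, start with, end with).

```python
def process_operator(value, operator, target):
    """Compare value against target after coercing None to an empty string."""
    # Coerce None to "" so the string methods below never receive None.
    text = "" if value is None else str(value)
    if operator == "contains":
        return target in text
    if operator == "start with":
        return text.startswith(target)
    if operator == "end with":
        return text.endswith(target)
    raise ValueError(f"unsupported operator: {operator}")
```

Without the coercion, calling `None.startswith(...)` raises `AttributeError`, which is the failure the commit reports.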
天海蒼灆	17f27b9df2	fix(browser): show resolved variables in workflow run log input (#15325 ) ### What problem does this PR solve? Browser parsed sys.query from prompts but never called set_input_value, so node_finished inputs displayed null in the agent orchestration run log. Additionally, Browser’s tenant-model path could trigger unsupported structured-output modes (response_format/tool_choice) for some OpenAI-compatible providers (notably DeepSeek thinking models), causing step failures. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-08 18:12:56 +08:00
Yufeng He	6cba5a544a	fix(agent): skip empty switch conditions (#15691 ) ## What - make `Switch` ignore conditions that have no evaluable items - add a regression for blank `cpn_id` items falling through to the else branch - keep the existing non-empty `and` condition behavior covered Fixes #15643. ## Verified - `python -m py_compile agent\component\switch.py test\unit_test\agent\component\test_switch.py` - `python -m pytest test\unit_test\agent\component\test_switch.py -q` -> `2 passed` - `python -m ruff check agent\component\switch.py test\unit_test\agent\component\test_switch.py` - `git diff --check` I also checked `python -m ruff format --check` on the touched files. It would reformat pre-existing style in `agent/component/switch.py` beyond this bug fix, so I kept the patch scoped instead of reformatting the whole file.	2026-06-05 17:20:44 +08:00
天海蒼灆	3e5b11a523	Feat(browser control): Add new agent component 'browser' to control a browser with AI (#14888) ### What problem does this PR solve? This PR adds a new `Browser` operator to Agent workflows, enabling prompt-driven browser automation in RAGFlow. It is based on Browser-Use. It includes: - Backend browser component execution with tenant LLM integration - Upload source support (file IDs, URLs, variables, CSV/JSON array) - Downloaded file persistence to RAGFlow storage - Frontend node/operator integration, form config, icon, and i18n updates - Unit tests for upload/download and ID parsing logic - Dependency and Docker updates for browser-use runtime support ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-21 15:32:32 +08:00

10 Commits