### What problem does this PR solve?
Closes #16418.
`scholarly.search_pubs(...)` returns a **lazy generator**, but
`agent/tools/googlescholar.py` treated it as a re-iterable, bounded
list:
```python
scholar_client = scholarly.search_pubs(kwargs["query"], ...) # lazy generator
self._retrieve_chunks(scholar_client, ...) # (1) iterates -> exhausts it
self.set_output("json", list(scholar_client)) # (2) already empty -> []
```
1. **`json` output was always empty.** `_retrieve_chunks` iterates
`scholar_client`, exhausting the generator; `list(scholar_client)` then
returns `[]`.
2. **`top_n` was never applied.** Unlike `ArXiv`
(`max_results=self._param.top_n`), the unbounded generator was passed
straight to `_retrieve_chunks`, which has no internal limit — so the
tool kept paginating well past Top N (until an error, rate-limit/block,
or `COMPONENT_EXEC_TIMEOUT`).
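Point 1 is ordinary Python generator behaviour, so it can be reproduced without `scholarly`. In this minimal sketch, `fake_search_pubs` is a hypothetical stand-in for `scholarly.search_pubs` and `consume` is a hypothetical stand-in for `_retrieve_chunks`. Neither is the real code.

```python
from typing import Iterator


def fake_search_pubs(query: str, total: int = 30) -> Iterator[dict]:
    """Hypothetical stand-in for scholarly.search_pubs: lazily yields results."""
    for i in range(total):
        yield {"title": f"{query} paper {i}"}


def consume(results) -> int:
    """Hypothetical stand-in for _retrieve_chunks: iterates every result it gets."""
    return sum(1 for _ in results)


results = fake_search_pubs("rag")
chunks_seen = consume(results)  # (1) iterates all 30 items and exhausts the generator
json_output = list(results)     # (2) nothing is left to yield

print(chunks_seen, len(json_output))  # 30 0
```

Because `consume` has no limit of its own, it also pulls every available result. That is the unbounded pagination described in point 2.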
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
- Materialize at most `top_n` results once with `itertools.islice`, and
reuse that list for both `_retrieve_chunks` and the `json` output.
- Add regression tests
(`test/unit_test/agent/component/test_googlescholar.py`, stubbing
`scholarly.search_pubs`) covering the `top_n` bound, the non-empty
`json` output, and the empty-query short-circuit.
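The shape of the fix can be sketched with a hypothetical unbounded generator in place of the real `scholarly` client. `itertools.islice` stops requesting items once it has `top_n` of them, and the resulting list can be iterated as many times as needed. The names `fake_search_pubs` and `produced` are illustrative only.

```python
import itertools
from typing import Iterator

produced: list[int] = []  # records every index the generator actually yields


def fake_search_pubs(query: str) -> Iterator[dict]:
    """Hypothetical unbounded result stream standing in for scholarly.search_pubs."""
    for i in itertools.count():
        produced.append(i)
        yield {"title": f"{query} paper {i}"}


top_n = 5
# Materialize at most top_n results once; islice never asks for item top_n + 1.
results = list(itertools.islice(fake_search_pubs("rag"), top_n))

chunks = [r["title"] for r in results]  # first pass (stand-in for _retrieve_chunks)
json_output = list(results)             # second pass still sees every result
```

Materializing once keeps the network-bound generator from being walked twice, and the two consumers are guaranteed to see the same `top_n` items.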
Verified: against `main` the new tests fail with `assert 30 == 5` (top_n
ignored) and `assert 0 == 5` (empty json); with this fix all pass.
Backend-only.
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
- Treat `max_tokens=0` as unset (`or 8192`) when building model context
budgets, fixing agents that silently zeroed prompts when a vLLM model
had `max_tokens: 0` in tenant config
- Replace trailing same-role canvas history in `LLM._sys_prompt_and_msg`
instead of skipping the current user prompt
- Add `LLM.fit_messages()` validation after `message_fit_in` on agent
paths so empty user content fails fast with a clear error instead of
reaching vLLM
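The `or 8192` fallback works because `0` (like `None`) is falsy in Python, so `max_tokens or 8192` treats a configured `0` the same as an unset value. The minimal sketch below uses hypothetical helpers `context_budget` and `check_fitted`, not the actual RAGFlow functions. The `0.97` factor is taken from the root-cause description of the agent path. The validation only illustrates the "fail fast on empty user content" idea behind `LLM.fit_messages()`, and the real method may check more.

```python
def context_budget(max_tokens) -> int:
    """Hypothetical helper: budget passed to message_fit_in on the agent path."""
    # 0 and None are both falsy, so both fall back to the 8192 default.
    # Without the fallback, int(0 * 0.97) == 0 and every message is trimmed to "".
    max_length = max_tokens or 8192
    return int(max_length * 0.97)


def check_fitted(messages: list[dict]) -> None:
    """Hypothetical post-fit validation: raise instead of sending an empty prompt."""
    users = [m for m in messages if m["role"] == "user"]
    if not users or not users[-1]["content"].strip():
        raise ValueError("User message is empty after fitting to the context budget")
```

Raising a clear error locally is easier to diagnose than a provider-side failure on an empty `user` message.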
Fixes #16411
## Root cause
The agent canvas workflow called `message_fit_in` with
`int(max_length * 0.97)`. When `max_length` was `0`, the budget was also
`0`, so both system and user content were
trimmed to empty strings. The `[HISTORY STREAMLY]` log showing only
`{"role":"user","content":""}` matches this. A secondary bug skipped
appending the formatted user prompt when history ended with a `user`
role message.
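The secondary bug can be illustrated with a small sketch, assuming messages are plain `{"role", "content"}` dicts. `append_user_prompt` is a hypothetical helper that shows the replace-instead-of-skip behaviour from the summary. It is not the actual body of `LLM._sys_prompt_and_msg`.

```python
def append_user_prompt(history: list[dict], user_prompt: str) -> list[dict]:
    """Hypothetical helper: add the formatted user prompt to canvas history.

    If the history already ends with a user message, replace it instead of
    skipping the new prompt, so the formatted prompt always reaches the model
    and roles still alternate.
    """
    msgs = [dict(m) for m in history]  # copy so the caller's history is untouched
    new_msg = {"role": "user", "content": user_prompt}
    if msgs and msgs[-1]["role"] == "user":
        msgs[-1] = new_msg  # replace the trailing same-role message
    else:
        msgs.append(new_msg)
    return msgs
```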
## Test plan
- [x] Added `test/unit_test/agent/component/test_llm_prompt.py` for
role-replace, validation, and zero-budget fitting
- [x] Added
`test_message_fit_in_zero_budget_preserves_non_empty_messages` in
`test_generator_message_fit_in.py`
- [ ] CI unit tests
- [ ] Manual: agent canvas `begin → Retrieval → Agent → Message` with
vLLM Qwen3; confirm user message reaches LLM
Made with [Cursor](https://cursor.com)
---------
Co-authored-by: Taranum Wasu <taranumwasu@Taranums-MacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
Fixes #15215: attachments uploaded to an agent were not reaching the
LLM.
When a user uploads a file in an agent chat, `canvas.run` parses it into
the `sys.files` global (text content for documents, `data:image/...` URIs
for images; see `agent/canvas.py:752-768`). But the LLM/Agent component's
`_prepare_prompt_variables` only substitutes variables that the user's
prompt template explicitly references via `{var}` placeholders. The
default prompt is `[{"role": "user", "content": "{sys.query}"}]` with no
`{sys.files}`, so the parsed attachment content never reaches the model.
This explains the reporter's logs: the agent saw only the bare query
`附件 摘要 attachment summary` ("附件 摘要" is Chinese for "attachment
summary") and searched the dataset instead of reading the uploaded PDF.
## Fix
`agent/component/llm.py` — added `_collect_sys_files()` and an
auto-injection step in `_prepare_prompt_variables`:
- If `sys.files` is non-empty **and** neither `sys_prompt` nor any entry
in `prompts` already contains `{sys.files}` (no double-injection),
split the entries into text vs. `data:image/...` URIs.
- Image URIs are merged into `self.imgs`, which the existing logic uses
to switch the chat model to `IMAGE2TEXT` and pass `images=...` to
`async_chat`.
- Text content is appended to the last `user` role message in `msg`,
mirroring how `dialog_service.async_chat_solo` handles attachments for
the non-agent chat path (`api/db/services/dialog_service.py:318-321`).
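The injection steps above can be sketched as follows. This is a minimal illustration, not the code in `agent/component/llm.py`: `inject_sys_files` and `_is_image_uri` are hypothetical names, and the `"\n\n"` separator used when appending text is an assumption.

```python
def _is_image_uri(entry) -> bool:
    """True for data:image/... URIs, which are routed to the vision model."""
    return isinstance(entry, str) and entry.startswith("data:image/")


def inject_sys_files(files, sys_prompt, prompts, msg, imgs):
    """Hypothetical sketch of auto-injecting parsed attachments.

    files:      parsed sys.files entries (document text or data:image/... URIs)
    sys_prompt: system prompt template string
    prompts:    list of {"role", "content"} prompt templates
    msg:        list of {"role", "content"} messages about to be sent (edited in place)
    imgs:       list of image URIs (extended in place)
    """
    if not files:
        return msg
    placeholder = "{sys.files}"
    # No double-injection: leave templates that already reference {sys.files} alone.
    if placeholder in (sys_prompt or "") or any(
        placeholder in p.get("content", "") for p in prompts
    ):
        return msg

    # Image URIs go to imgs; a non-empty imgs lets the caller switch to IMAGE2TEXT.
    imgs.extend(f for f in files if _is_image_uri(f))
    texts = [str(f) for f in files if not _is_image_uri(f)]
    if texts:
        # Append document text to the last user message.
        for m in reversed(msg):
            if m["role"] == "user":
                m["content"] = m["content"] + "\n\n" + "\n\n".join(texts)
                break
    return msg
```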
Both `LLM._invoke_async` and `Agent._invoke_async` (tool-using) go
through `_prepare_prompt_variables`, so plain LLM nodes and Agent nodes
are fixed in both streaming and non-streaming paths.
## Test plan
- [ ] Upload a PDF attachment to an agent with the default `{sys.query}`
prompt and ask "summarize the attachment" — the model should answer
from the file content rather than searching the knowledge base.
- [ ] Upload an image attachment to an agent and ask about its contents:
the model should switch to the vision-capable LLM and answer from
the image.
- [ ] Verify that an agent whose prompt **does** include `{sys.files}`
still works and does **not** include the file content twice.
- [ ] Verify that an agent run with no attachments behaves unchanged.
- [ ] Run `uv run pytest` to make sure no existing tests regress.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: yzc <yuzhichang@gmail.com>
## Summary
Fixes the agent `Switch` component matching an **empty/all-skipped
condition** unconditionally because `all([]) is True`.
## Root cause
`res` only accumulates for items with a non-empty `cpn_id` (blank ones
`continue`). For a condition with empty `items` (or all-blank `cpn_id`),
`res == []`, and `if all(res):` is `True`, so the Switch routes to that
condition's `to` target before reaching the else/`end_cpn_ids` branch.
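The root cause is plain Python semantics and can be checked in isolation: `all()` over an empty iterable is `True`, while a guarded check such as `res and all(res)` short-circuits to the falsy empty list. `condition_matches` is a hypothetical helper for illustration, not the actual `Switch` code.

```python
def condition_matches(res: list[bool]) -> bool:
    """Hypothetical helper: an "and" condition matches only if something was evaluated."""
    return bool(res and all(res))


print(all([]))                          # True: vacuous truth, the source of the bug
print(condition_matches([]))            # False: empty/all-skipped condition falls through
print(condition_matches([True, True]))  # True: satisfied "and" condition still routes
print(condition_matches([True, False])) # False
```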
## Fix
```diff
- if all(res):
+ if res and all(res):
```
An empty result set no longer counts as a match; genuinely-satisfied
"and" conditions still route (the real `all(res)` path is preserved).
## Files changed
- `agent/component/switch.py`
- `test/unit_test/agent/component/test_switch_empty_condition.py` (new)
## Verification
- `ruff check` / `ruff format --check` — clean
- Added unit tests (mirroring the existing `_FakeCanvas` component-test
pattern): an empty/all-skipped "and" condition now falls through to
`end_cpn_ids`; a genuinely-satisfied "and" condition still routes to its
target.
- Local full pytest not run (heavy RAG deps); CI validates.
## Note
Implemented with LLM assistance (model: claude-opus-4-8).
Closes #15643
---------
Co-authored-by: seekmistar01 <seekmistar01@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
#16332 fixed the missing `return` in DeepL's except branch, but
`ComponentBase.be_output` was removed during the agent refactor (#9113)
while several components still call it. DeepL (and other tools) would
raise `AttributeError` before any error message could be returned.
- Restore `ComponentBase.be_output` as `pd.DataFrame([{"content": v}])`
(same as pre-refactor behavior)
- Add regression test that `_run` returns the `**Error**:` message when
translation fails
Related to #16329
## Test plan
- [x] `test_run_returns_error_on_translation_failure`
- [x] Existing `test_deepl.py` check() tests still pass
---------
Co-authored-by: Harsh Kashyap <harshkashyap@Harshs-MacBook-Pro.local>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
`DeepLParam.check()` validated `self.top_n`, but DeepL has no such
parameter (it is not defined on the param class or its base), so
`check()` always raised `AttributeError` and a DeepL component could
never pass validation. Removed the bogus `top_n` check.
Also fixed the `_run` except branch, which computed
`be_output("**Error**...")` but never returned it, silently dropping the
error message.
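The dropped error message is the common "computed but not returned" pattern. In this minimal sketch, `translate` is a hypothetical always-failing call, and plain strings stand in for the DataFrame that `be_output` builds.

```python
def translate(text: str) -> str:
    """Hypothetical translation call that always fails."""
    raise RuntimeError("quota exceeded")


def run_buggy(text: str):
    try:
        return translate(text)
    except Exception as e:
        "**Error**: " + str(e)  # value is computed, then discarded; returns None


def run_fixed(text: str):
    try:
        return translate(text)
    except Exception as e:
        return "**Error**: " + str(e)  # the error message now reaches the caller
```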
Closes #16329
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Add test cases
### Testing
Added `test/unit_test/agent/component/test_deepl.py` covering
`DeepLParam.check()` with valid defaults and rejection of invalid
source/target languages.
### What problem does this PR solve?
`Browser` parsed `sys.query` from prompts but never called
`set_input_value`, so `node_finished` inputs displayed `null` in the
agent orchestration run log.
Additionally, Browser’s tenant-model path could trigger unsupported
structured-output modes (response_format/tool_choice) for some
OpenAI-compatible providers (notably DeepSeek thinking models), causing
step failures.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
## What
- make `Switch` ignore conditions that have no evaluable items
- add a regression for blank `cpn_id` items falling through to the else
branch
- keep the existing non-empty `and` condition behavior covered
Fixes #15643.
## Verified
- `python -m py_compile agent\component\switch.py
test\unit_test\agent\component\test_switch.py`
- `python -m pytest test\unit_test\agent\component\test_switch.py -q` ->
`2 passed`
- `python -m ruff check agent\component\switch.py
test\unit_test\agent\component\test_switch.py`
- `git diff --check`
I also checked `python -m ruff format --check` on the touched files. It
would reformat pre-existing style in `agent/component/switch.py` beyond
this bug fix, so I kept the patch scoped instead of reformatting the
whole file.
### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow. It is built on
Browser-Use.
It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support
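As a rough illustration of the input normalization that the upload-source bullet implies, here is a hypothetical `parse_upload_sources`. It accepts a JSON array string, a comma- or newline-separated string, or an already-split list, and returns a clean list of file IDs or URLs. The function name and the accepted formats are assumptions, not the component's actual parser.

```python
import json


def parse_upload_sources(value) -> list[str]:
    """Hypothetical normalizer: turn an upload-source value into non-empty strings."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        items = list(value)
    else:
        text = str(value).strip()
        items = None
        if text.startswith("["):
            try:
                parsed = json.loads(text)
                if isinstance(parsed, list):
                    items = parsed
            except json.JSONDecodeError:
                items = None  # not valid JSON; fall through to CSV-style parsing
        if items is None:
            # Comma- or newline-separated values.
            items = text.replace("\n", ",").split(",")
    return [str(i).strip() for i in items if str(i).strip()]
```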
### Type of change
- [x] New Feature (non-breaking change which adds functionality)