Files
ragflow/test
Muhammad Furqan 828c5789f6 fix(agent/tools): GoogleScholar empty json output and ignored top_n (#16419)
### What problem does this PR solve?

Closes #16418.

`scholarly.search_pubs(...)` returns a **lazy generator**, but
`agent/tools/googlescholar.py` treated it as a re-iterable, bounded
list:

```python
scholar_client = scholarly.search_pubs(kwargs["query"], ...)   # lazy generator
self._retrieve_chunks(scholar_client, ...)                     # (1) iterates -> exhausts it
self.set_output("json", list(scholar_client))                  # (2) already empty -> []
```

1. **`json` output was always empty.** `_retrieve_chunks` iterates
`scholar_client`, exhausting the generator; `list(scholar_client)` then
returns `[]`.
2. **`top_n` was never applied.** Unlike `ArXiv`
(`max_results=self._param.top_n`), the unbounded generator was passed
straight to `_retrieve_chunks`, which has no internal limit — so the
tool kept paginating well past Top N (until an error, rate-limit/block,
or `COMPONENT_EXEC_TIMEOUT`).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Materialize at most `top_n` results once with `itertools.islice`, and
reuse that list for both `_retrieve_chunks` and the `json` output.
- Add regression tests
(`test/unit_test/agent/component/test_googlescholar.py`, stubbing
`scholarly.search_pubs`) covering the `top_n` bound, the non-empty
`json` output, and the empty-query short-circuit.

Verified: against `main` the new tests fail with `assert 30 == 5` (top_n
ignored) and `assert 0 == 5` (empty json); with this fix all pass.
Backend-only.

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-07-01 10:47:39 +08:00
..


(1). Deploy RAGFlow services and images

https://ragflow.io/docs/build_docker_image

(2). Configure the required environment for testing

Install Python dependencies (including test dependencies):

uv sync --python 3.13 --only-group test --no-default-groups --frozen

Activate the environment:

source .venv/bin/activate

Install SDK:

uv pip install sdk/python 

Modify the .env file: Add the following code:

COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.26.2 #Replace with the image you are using

Start the containerwait two minutes:

docker compose -f docker/docker-compose.yml up -d


(3). Test Elasticsearch

a) Run sdk tests against Elasticsearch:

export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380  # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Elasticsearch:

pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 


(4). Test Infinity

Modify the .env file:

DOC_ENGINE=${DOC_ENGINE:-infinity}

Start the container:

docker compose -f docker/docker-compose.yml down -v 
docker compose -f docker/docker-compose.yml up -d

a) Run sdk tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api