ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-02 08:45:42 +08:00


Muhammad Furqan 828c5789f6 fix(agent/tools): GoogleScholar empty json output and ignored top_n (#16419)

### What problem does this PR solve?

Closes #16418.

`scholarly.search_pubs(...)` returns a **lazy generator**, but
`agent/tools/googlescholar.py` treated it as a re-iterable, bounded
list:

```python
scholar_client = scholarly.search_pubs(kwargs["query"], ...)   # lazy generator
self._retrieve_chunks(scholar_client, ...)                     # (1) iterates -> exhausts it
self.set_output("json", list(scholar_client))                  # (2) already empty -> []
```

1. **`json` output was always empty.** `_retrieve_chunks` iterates
`scholar_client`, exhausting the generator; `list(scholar_client)` then
returns `[]`.
2. **`top_n` was never applied.** Unlike `ArXiv`
(`max_results=self._param.top_n`), the unbounded generator was passed
straight to `_retrieve_chunks`, which has no internal limit — so the
tool kept paginating well past Top N (until an error, rate-limit/block,
or `COMPONENT_EXEC_TIMEOUT`).
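
The first failure mode can be reproduced without `scholarly` at all. The sketch below is illustrative only: `fake_search_pubs` and `consume` are hypothetical stand-ins for the lazy search generator and for a step that iterates every result; neither is part of RAGFlow or `scholarly`.

```python
from typing import Iterable, Iterator


def fake_search_pubs(query: str, total: int = 30) -> Iterator[dict]:
    """Hypothetical stand-in for a lazy search generator."""
    for i in range(total):
        yield {"title": f"{query} paper {i}"}


def consume(results: Iterable[dict]) -> int:
    """Hypothetical stand-in for a step that iterates every result."""
    return sum(1 for _ in results)


client = fake_search_pubs("rag")
seen = consume(client)    # first pass drains the generator
leftover = list(client)   # second pass finds nothing left
print(seen, leftover)     # prints: 30 []
```

A generator can only be walked once, so whatever reads it second sees an empty sequence.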

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Materialize at most `top_n` results once with `itertools.islice`, and
reuse that list for both `_retrieve_chunks` and the `json` output.
- Add regression tests
(`test/unit_test/agent/component/test_googlescholar.py`, stubbing
`scholarly.search_pubs`) covering the `top_n` bound, the non-empty
`json` output, and the empty-query short-circuit.
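
The materialize-once approach described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `search_top_n` and `fake_search_pubs` are hypothetical names, and returning an empty list for an empty query is an assumption about what the short-circuit looks like.

```python
from itertools import islice
from typing import Callable, Iterator

pulled = 0  # counts how many items the fake generator actually produced


def fake_search_pubs(query: str) -> Iterator[dict]:
    """Hypothetical stand-in for a lazy, effectively unbounded search."""
    global pulled
    for i in range(30):
        pulled += 1
        yield {"title": f"{query} paper {i}"}


def search_top_n(
    search_fn: Callable[[str], Iterator[dict]], query: str, top_n: int
) -> list[dict]:
    """Materialize at most top_n results exactly once."""
    if not query:
        # Assumed short-circuit: do not call the search at all.
        return []
    return list(islice(search_fn(query), top_n))


results = search_top_n(fake_search_pubs, "rag", top_n=5)
chunk_input = results   # the same list can feed chunk retrieval ...
json_output = results   # ... and the json output
print(len(results), pulled)  # prints: 5 5
```

Because `islice` stops pulling after `top_n` items, the underlying generator never paginates past the limit, and the resulting list can be read as many times as needed.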

Verified: against `main` the new tests fail with `assert 30 == 5` (top_n
ignored) and `assert 0 == 5` (empty json); with this fix all pass.
Backend-only.

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>

2026-07-01 10:47:39 +08:00

| Name | Last commit | Date |
|---|---|---|
| benchmark | Fix: replace tenant_llm apis (#16131) | 2026-06-18 16:38:32 +08:00 |
| fixtures/mineru | fix(mineru): skip page chrome blocks to prevent duplicate chunks (#15387) | 2026-06-01 20:15:04 +08:00 |
| playwright | Feature: Allow page_size max value 100 (#15292) | 2026-05-28 11:13:01 +08:00 |
| testcases | fix(api): gate sandbox artifact download on agent session ownership (#16169) | 2026-06-29 09:45:16 +08:00 |
| unit_test | fix(agent/tools): GoogleScholar empty json output and ignored top_n (#16419) | 2026-07-01 10:47:39 +08:00 |
| __init__.py | Feat: UI testing automation with playwright (#12749) | 2026-03-02 13:04:08 +08:00 |
| README.md | Docs: Update version references to v0.26.2 in READMEs and docs (#16387) | 2026-06-29 09:45:16 +08:00 |
| test_cajal_template_unit.py | Add CAJAL scientific paper agent template (#14641) | 2026-07-01 09:35:37 +08:00 |

README.md

(1). Deploy RAGFlow services and images

https://ragflow.io/docs/build_docker_image

(2). Configure the required environment for testing

Install Python dependencies (including test dependencies):

uv sync --python 3.13 --only-group test --no-default-groups --frozen

Activate the environment:

source .venv/bin/activate

Install SDK:

uv pip install sdk/python

Modify the .env file by adding the following lines:

COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.26.2  # Replace with the image you are using

Start the containers (allow about two minutes for them to come up):

docker compose -f docker/docker-compose.yml up -d
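
As an alternative to waiting a fixed amount of time, the API port can be polled until it accepts connections. The helper below is a hypothetical sketch and not part of the RAGFlow test suite. It assumes the API is mapped to `127.0.0.1:9380`, as in the `HOST_ADDRESS` example in this README. An open port only shows that something is listening, not that every service has finished initializing.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 120.0, interval: float = 2.0
) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("127.0.0.1", 9380)` before running pytest would block for up to two minutes and report whether the port opened.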

(3). Test Elasticsearch

a) Run SDK tests against Elasticsearch:

export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380  # Must be the RAGFlow API port mapped on your local machine
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api

b) Run HTTP API tests against Elasticsearch:

pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api

(4). Test Infinity

Modify the .env file:

DOC_ENGINE=${DOC_ENGINE:-infinity}

Restart the containers (`down -v` also removes the existing volumes):

docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d

a) Run SDK tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api

b) Run HTTP API tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
