Files
ragflow/test
Muhammad Furqan 3cba34d67f fix(agent/tools): port Crawler to ToolBase so it can load and run (#16415)
### What problem does this PR solve?

Closes #16414.

The **Crawler** agent tool (`agent/tools/crawler.py`) was never ported
to the modern `ToolBase`/`_invoke` interface during the agent module
redesign, so it was broken in three independent ways:

1. **Crashed on construction.** `CrawlerParam` extends `ToolParamBase`,
whose `__init__` reads `self.meta["parameters"]`, but `CrawlerParam`
defined no `meta`. Constructing it raised `AttributeError:
'CrawlerParam' object has no attribute 'meta'`. Because
`agent/canvas.py` instantiates `component_class(component_name +
"Param")()` while loading a canvas, **any agent containing a Crawler
node failed to load.**
2. **`_invoke` missing.** It extends `ToolBase` (whose `invoke()`
dispatches to `self._invoke`) but only implemented the legacy `_run`, so
`_invoke` resolved to `ComponentBase._invoke` → `NotImplementedError`.
3. **`be_output` removed.** `_run` called `Crawler.be_output(...)`,
which no longer exists on the base classes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Add a `ToolMeta` to `CrawlerParam` (defined before
`super().__init__()`, matching every other ported tool such as
`ArXivParam`/`TavilyExtractParam`) advertising a required `query`
parameter — the URL to crawl, default `{sys.query}`, consistent with the
`{sys.query}` convention shared by the other tools.
- Replace the legacy `_run`/`be_output` with `_invoke`/`set_output`,
writing the extracted page content to `formalized_content` (errors
surfaced via `_ERROR`), consistent with the other tools.
- Preserve the existing SSRF guard (`assert_url_is_safe` +
`pin_dns_global`).
- Add regression tests
(`test/unit_test/agent/component/test_crawler.py`) covering param
construction, validation, and the tool descriptor.

Same class of defect as #16329 (DeepL). Backend-only; no frontend
changes.

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-07-03 17:15:48 +08:00
..


(1). Deploy RAGFlow services and images

https://ragflow.io/docs/build_docker_image

(2). Configure the required environment for testing

Install Python dependencies (including test dependencies):

uv sync --python 3.13 --only-group test --no-default-groups --frozen

Activate the environment:

source .venv/bin/activate

Install SDK:

uv pip install sdk/python 

Modify the .env file: Add the following code:

COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.26.3 #Replace with the image you are using

Start the containerwait two minutes:

docker compose -f docker/docker-compose.yml up -d


(3). Test Elasticsearch

a) Run sdk tests against Elasticsearch:

export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380  # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Elasticsearch:

pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 


(4). Test Infinity

Modify the .env file:

DOC_ENGINE=${DOC_ENGINE:-infinity}

Start the container:

docker compose -f docker/docker-compose.yml down -v 
docker compose -f docker/docker-compose.yml up -d

a) Run sdk tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 

b) Run http api tests against Infinity:

DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api