Mirror of https://github.com/infiniflow/ragflow.git (synced 2026-07-04 01:29:35 +08:00)
### What problem does this PR solve?

Closes #16414.

The **Crawler** agent tool (`agent/tools/crawler.py`) was never ported to the modern `ToolBase`/`_invoke` interface during the agent module redesign, so it was broken in three independent ways:

1. **Crashed on construction.** `CrawlerParam` extends `ToolParamBase`, whose `__init__` reads `self.meta["parameters"]`, but `CrawlerParam` defined no `meta`. Constructing it raised `AttributeError: 'CrawlerParam' object has no attribute 'meta'`. Because `agent/canvas.py` instantiates `component_class(component_name + "Param")()` while loading a canvas, **any agent containing a Crawler node failed to load.**
2. **`_invoke` missing.** It extends `ToolBase` (whose `invoke()` dispatches to `self._invoke`) but only implemented the legacy `_run`, so `_invoke` resolved to `ComponentBase._invoke` → `NotImplementedError`.
3. **`be_output` removed.** `_run` called `Crawler.be_output(...)`, which no longer exists on the base classes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Add a `ToolMeta` to `CrawlerParam` (defined before `super().__init__()`, matching every other ported tool such as `ArXivParam`/`TavilyExtractParam`) advertising a required `query` parameter: the URL to crawl, default `{sys.query}`, consistent with the `{sys.query}` convention shared by the other tools.
- Replace the legacy `_run`/`be_output` with `_invoke`/`set_output`, writing the extracted page content to `formalized_content` (errors surfaced via `_ERROR`), consistent with the other tools.
- Preserve the existing SSRF guard (`assert_url_is_safe` + `pin_dns_global`).
- Add regression tests (`test/unit_test/agent/component/test_crawler.py`) covering param construction, validation, and the tool descriptor.

Same class of defect as #16329 (DeepL). Backend-only; no frontend changes.

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
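The first two failure modes described above can be reproduced in isolation. The following is a minimal sketch, not the RAGFlow code: every class name ending in `Sketch`, along with `BrokenCrawlerParam`, `FixedCrawlerParam`, `LegacyCrawler`, and `PortedCrawler`, is a hypothetical stand-in, and the `meta` dictionary layout is an assumption made for illustration. It only shows why `meta` has to exist before the base `__init__` runs, and why a tool that implements `_run` alone ends in `NotImplementedError`.

```python
class ToolParamBaseSketch:
    """Hypothetical stand-in for a param base class whose __init__ reads self.meta."""

    def __init__(self):
        # The base constructor reads self.meta immediately, so a subclass
        # must have defined it before calling super().__init__().
        self.inputs = dict(self.meta["parameters"])


class BrokenCrawlerParam(ToolParamBaseSketch):
    def __init__(self):
        # No meta defined: the base constructor raises AttributeError.
        super().__init__()


class FixedCrawlerParam(ToolParamBaseSketch):
    def __init__(self):
        # Assumed layout: a required "query" parameter defaulting to {sys.query}.
        self.meta = {
            "name": "crawler",
            "parameters": {
                "query": {"type": "string", "default": "{sys.query}", "required": True},
            },
        }
        super().__init__()


class ComponentBaseSketch:
    def _invoke(self, **kwargs):
        raise NotImplementedError


class ToolBaseSketch(ComponentBaseSketch):
    def __init__(self):
        self.outputs = {}

    def invoke(self, **kwargs):
        # invoke() dispatches to _invoke(); a legacy _run() is never called.
        self._invoke(**kwargs)
        return self.outputs

    def set_output(self, key, value):
        self.outputs[key] = value


class LegacyCrawler(ToolBaseSketch):
    def _run(self, **kwargs):
        # Legacy entry point only, so invoke() falls through to the base _invoke().
        return None


class PortedCrawler(ToolBaseSketch):
    def _invoke(self, **kwargs):
        url = kwargs.get("query", "")
        if not url:
            self.set_output("_ERROR", "no URL given")
            return
        # Placeholder for the real fetch; the actual tool crawls the page here.
        self.set_output("formalized_content", f"<content of {url}>")
```

In this sketch, `BrokenCrawlerParam()` fails at construction and `LegacyCrawler().invoke()` fails at dispatch, while the fixed and ported variants succeed, which mirrors the shape of the fix without claiming to match its implementation.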
(1). Deploy RAGFlow services and images
https://ragflow.io/docs/build_docker_image
(2). Configure the required environment for testing
Install Python dependencies (including test dependencies):
uv sync --python 3.13 --only-group test --no-default-groups --frozen
Activate the environment:
source .venv/bin/activate
Install SDK:
uv pip install sdk/python
Modify the .env file by adding the following lines:
COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.26.3 # Replace with the image you are using
Start the container (wait two minutes):
docker compose -f docker/docker-compose.yml up -d
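Instead of waiting a fixed two minutes, the API port can be polled until it accepts connections. This is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of RAGFlow, and the host and port in the example comment assume the `http://127.0.0.1:9380` address used for `HOST_ADDRESS` in the test steps. An open port does not guarantee that every service has finished initializing, so treat a positive result as a lower bound on readiness.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example (assumes the API port is mapped to localhost:9380):
# wait_for_port("127.0.0.1", 9380)
```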
(3). Test Elasticsearch
a) Run SDK tests against Elasticsearch:
export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380 # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Elasticsearch:
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
(4). Test Infinity
Modify the .env file:
DOC_ENGINE=${DOC_ENGINE:-infinity}
Recreate the containers (this removes the existing volumes):
docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d
a) Run SDK tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run HTTP API tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
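The four pytest invocations above differ only in the test suite and the `DOC_ENGINE` variable, so they can be generated from one helper. This is a sketch under stated assumptions: `build_run` and `run_suite` are hypothetical names, and the default host address simply repeats the value exported earlier. The pytest flags and suite paths are copied from the commands above.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def build_run(
    suite: str,
    level: str = "p2",
    doc_engine: Optional[str] = None,
    host_address: str = "http://127.0.0.1:9380",
) -> Tuple[List[str], Dict[str, str]]:
    """Build the pytest command and environment for one test suite."""
    cmd = ["pytest", "-s", "--tb=short", f"--level={level}", f"test/testcases/{suite}"]
    env = dict(os.environ)
    env["HTTP_API_TEST_LEVEL"] = level
    env["HOST_ADDRESS"] = host_address
    if doc_engine is not None:
        # Only set DOC_ENGINE when a non-default engine such as "infinity" is requested.
        env["DOC_ENGINE"] = doc_engine
    return cmd, env


def run_suite(suite: str, doc_engine: Optional[str] = None) -> int:
    """Run one suite and return pytest's exit code."""
    cmd, env = build_run(suite, doc_engine=doc_engine)
    return subprocess.run(cmd, env=env).returncode
```

For example, `run_suite("test_sdk_api", doc_engine="infinity")` corresponds to step (4) a), assuming the containers have already been restarted with the Infinity document engine.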