### What problem does this PR solve?
Closes #16414.
The **Crawler** agent tool (`agent/tools/crawler.py`) was never ported
to the modern `ToolBase`/`_invoke` interface during the agent module
redesign, so it was broken in three independent ways:
1. **Crashed on construction.** `CrawlerParam` extends `ToolParamBase`,
whose `__init__` reads `self.meta["parameters"]`, but `CrawlerParam`
defined no `meta`. Constructing it raised `AttributeError:
'CrawlerParam' object has no attribute 'meta'`. Because
`agent/canvas.py` instantiates `component_class(component_name +
"Param")()` while loading a canvas, **any agent containing a Crawler
node failed to load.**
2. **`_invoke` missing.** `Crawler` extends `ToolBase` (whose `invoke()`
dispatches to `self._invoke`) but only implemented the legacy `_run`, so
`_invoke` resolved to `ComponentBase._invoke`, which raises
`NotImplementedError`.
3. **`be_output` removed.** `_run` called `Crawler.be_output(...)`,
which no longer exists on the base classes.
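
The construction crash in item 1 can be reproduced in isolation. The following is a minimal sketch using hypothetical stand-in classes (`ToolParamBaseSketch`, `BrokenParam`, `FixedParam`), not the actual RAGFlow code. It assumes only what is described above: the base `__init__` reads `self.meta["parameters"]`. The dict used for `meta` is illustrative and is not the real `ToolMeta` shape.

```python
class ToolParamBaseSketch:
    """Hypothetical stand-in for ToolParamBase: __init__ reads self.meta."""

    def __init__(self):
        # Mirrors the behaviour described above: meta must already exist here.
        self.inputs = dict(self.meta["parameters"])


class BrokenParam(ToolParamBaseSketch):
    """Defines no meta, like the unported CrawlerParam."""

    def __init__(self):
        super().__init__()


class FixedParam(ToolParamBaseSketch):
    """Sets meta before calling super().__init__(), like the ported tools."""

    def __init__(self):
        # Illustrative dict only; the real code uses a ToolMeta structure.
        self.meta = {
            "name": "crawler",
            "parameters": {
                "query": {
                    "type": "string",
                    "default": "{sys.query}",
                    "required": True,
                }
            },
        }
        super().__init__()


def try_construct(cls):
    """Return 'ok', or the AttributeError message raised during construction."""
    try:
        cls()
        return "ok"
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling `try_construct(BrokenParam)` returns an `AttributeError` message that names the missing `meta` attribute, while `try_construct(FixedParam)` returns `"ok"`. The only difference between the two classes is whether `meta` is assigned before the base initializer runs.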
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
- Add a `ToolMeta` to `CrawlerParam` (defined before
`super().__init__()`, matching every other ported tool such as
`ArXivParam`/`TavilyExtractParam`) advertising a required `query`
parameter — the URL to crawl, default `{sys.query}`, consistent with the
`{sys.query}` convention shared by the other tools.
- Replace the legacy `_run`/`be_output` with `_invoke`/`set_output`,
writing the extracted page content to `formalized_content` (errors
surfaced via `_ERROR`), consistent with the other tools.
- Preserve the existing SSRF guard (`assert_url_is_safe` +
`pin_dns_global`).
- Add regression tests
(`test/unit_test/agent/component/test_crawler.py`) covering param
construction, validation, and the tool descriptor.
Same class of defect as #16329 (DeepL). Backend-only; no frontend
changes.
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
## Summary
Use `DocumentService.RemoveDocumentKeepFile` when deleting files that
are linked to documents.
## Change
- inject `DocumentService` into `FileService`
- replace direct document deletion in `deleteSingleFile`
- remove the obsolete file-local engine deletion helper
## Result
Deleting a file now cleans up linked documents through the same service
path used elsewhere, keeping KB counters and document engine cleanup
consistent.
### What problem does this PR solve?
Closes #16416.
The **AkShare** agent tool (`agent/tools/akshare.py`) was never ported
to the modern `ToolBase`/`_invoke` interface during the agent module
redesign and was still written against the removed legacy
`_run`/`be_output` API, so it was non-functional:
1. **Adding it to an Agent raised `AttributeError`.** `AkShare` extended
`ComponentBase` (not `ToolBase`) and `AkShareParam` defined no `meta`,
so it had no `get_meta()`. `agent/component/agent_with_tools.py` builds
each tool's function descriptor via `cpn.get_meta()`, so constructing an
Agent that includes the AkShare tool raised `AttributeError: 'AkShare'
object has no attribute 'get_meta'`.
2. **It could never run.** `invoke()` dispatches to `self._invoke`, but
`AkShare` only implemented the legacy `_run`, so `_invoke` fell through
to `ComponentBase._invoke` → `NotImplementedError`. `_run` also called
`be_output(...)`, which no longer exists on the base classes.
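
The dispatch failure in item 2 can be sketched as follows. The classes here (`ComponentBaseSketch`, `LegacyTool`, `PortedTool`) are hypothetical stand-ins, and the sketch simplifies `invoke()` to a plain call into `_invoke` with no error handling, which may differ from the real base class.

```python
class ComponentBaseSketch:
    """Hypothetical stand-in for the base component described above."""

    def __init__(self):
        self._outputs = {}

    def invoke(self, **kwargs):
        # Simplified: the entry point only ever calls _invoke, never _run.
        self._invoke(**kwargs)

    def _invoke(self, **kwargs):
        raise NotImplementedError()

    def set_output(self, key, value):
        self._outputs[key] = value

    def output(self, key):
        return self._outputs.get(key)


class LegacyTool(ComponentBaseSketch):
    """Implements only the legacy hook, so invoke() hits the base _invoke."""

    def _run(self, history, **kwargs):
        return "never reached through invoke()"


class PortedTool(ComponentBaseSketch):
    """Implements _invoke and reports results through set_output."""

    def _invoke(self, **kwargs):
        query = kwargs.get("query", "")
        if not query:
            # Errors are surfaced through the _ERROR output key.
            self.set_output("_ERROR", "empty query")
            return
        self.set_output("formalized_content", f"result for {query}")
```

`LegacyTool().invoke(query="x")` raises `NotImplementedError` because nothing ever calls `_run`, whereas `PortedTool` populates `formalized_content` (or `_ERROR` when the query is empty).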
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
- Port `AkShareParam` to `ToolParamBase` with a `ToolMeta` (defined
before `super().__init__()`, matching `ArXivParam`/`TavilyExtractParam`)
exposing a required `query` parameter — the stock symbol to look up,
default `{sys.query}`. `query` matches the `{sys.query}` convention
shared by the other tools.
- Rewrite the component with `_invoke`/`set_output("formalized_content",
...)` (errors surfaced via `_ERROR`), keeping `top_n` and importing
`akshare` lazily.
- Add regression tests
(`test/unit_test/agent/component/test_akshare.py`) covering param
construction, validation, and the tool descriptor.
Same class of defect as #16329 (DeepL) and #16414 (Crawler).
Backend-only; no frontend changes.
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### Summary
Plan to start `api_server`, `admin_server`, and `ingestor` from a single
binary, selected by flag:
- `./ragflow_main --admin`
- `./ragflow_main --api`
- `./ragflow_main --ingestor`
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Decrement document and knowledgebase chunk counts after chunks are
deleted
- Keep token counts unchanged because deleted chunk token totals are not
available
- Add tests for stats update, zero-delete behavior, error handling, and
transaction rollback
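
The behaviour listed above can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database rather than the project's actual storage layer, and the table names, column names (`chunk_num`, `token_num`), and the helper `apply_chunk_deletion` are assumptions made for illustration only.

```python
import sqlite3


def apply_chunk_deletion(conn, doc_id, kb_id, deleted):
    """Decrement chunk counters for a document and its knowledge base.

    Token counters are deliberately left alone: the token totals of the
    deleted chunks are not known at this point.
    """
    if deleted <= 0:
        return False  # zero-delete: leave the stats untouched
    # "with conn" commits on success and rolls back if a statement raises.
    with conn:
        conn.execute(
            "UPDATE document SET chunk_num = chunk_num - ? WHERE id = ?",
            (deleted, doc_id),
        )
        conn.execute(
            "UPDATE knowledgebase SET chunk_num = chunk_num - ? WHERE id = ?",
            (deleted, kb_id),
        )
    return True


def make_db():
    """Build a throwaway database with one document and one knowledge base."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document (id TEXT PRIMARY KEY, chunk_num INTEGER, token_num INTEGER)"
    )
    conn.execute(
        "CREATE TABLE knowledgebase (id TEXT PRIMARY KEY, chunk_num INTEGER, token_num INTEGER)"
    )
    conn.execute("INSERT INTO document VALUES ('doc1', 10, 500)")
    conn.execute("INSERT INTO knowledgebase VALUES ('kb1', 40, 2000)")
    conn.commit()
    return conn


def read_stats(conn, table, row_id):
    """Return (chunk_num, token_num) for one row; table is a trusted literal."""
    query = f"SELECT chunk_num, token_num FROM {table} WHERE id = ?"
    return conn.execute(query, (row_id,)).fetchone()
```

Both updates run inside one transaction, so a failure in the second statement undoes the first and the document and knowledge base counters cannot drift apart.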
### Summary
1. The `MINIO_PORT` environment variable is for external MinIO access,
so the Go config should not use it.
2. After the RAGFlow 1.0 release, `MINIO_PORT` will be used for internal
Docker Compose access, and a new environment variable,
`MINIO_EXTERNAL_PORT`, will be used for external access.
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### Summary
Per discussion with @yuzhichang, disable the agent test for now.
https://github.com/infiniflow/ragflow/actions/runs/28562749273/job/84704079689?pr=16521
[0.094ms] [rows:0] SELECT * FROM `tenant_model_instance` WHERE
provider_id = "provider-1" AND instance_name = "default" ORDER BY
`tenant_model_instance`.`id` LIMIT 1
--- FAIL: TestInvoke_ProxyDNSPin (2.00s)
invoke_test.go:375: dial error = Invoke: do: Get "http://8.8.8.8/api":
context deadline exceeded; want pinned proxy IP 192.88.99.1:9999
(connection-refused is acceptable; an absent IP means the dialer fell
through to the default resolver and the pinning regression went
undetected)
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.358ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.283ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.523ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58 ExpectPing will have no effect as monitoring pings
is disabled. Use MonitorPingsOption to enable.
FAIL
FAIL ragflow/internal/agent/component 2.759s
ok ragflow/internal/agent/component/io 0.026s
### What problem does this PR solve?
Part of #15240 (rewriting the RAGFlow API server in Go).
Implements the two public bot endpoints from
`api/apps/restful_apis/bot_api.py`:
- **`GET /api/v1/chatbots/<dialog_id>/info`** (`chatbots_inputs`) —
returns `{title, avatar, prologue, has_tavily_key}` for a dialog the
authenticated tenant owns (tenant match + `status == VALID`), otherwise
`"Authentication error: no access to this chatbot!"`.
- **`GET /api/v1/searchbots/detail`** (`detail_share_embedded`) —
returns search-app detail for a `search_id` the tenant can access.
Permission is checked across the tenant's joined tenants; denial returns
`"Has no permission for this operation."` (operating error, `data:
false`) and a missing app returns `"Can't find this Search App!"`.
Both endpoints authenticate with an SDK **beta token** (`Authorization:
Bearer <beta>`) rather than a session — the token is resolved to a
tenant via `APIToken.query(beta=token)`, backed by a new
`APITokenDAO.GetByBeta`. Because they perform their own token-based
auth, the routes are registered on the unauthenticated route group
(mirroring the Python blueprint, which has no `@login_required`).
Both live in a new `internal/handler/bot.go` + `internal/service/bot.go`
since they share the same source module. Handler unit tests cover the
auth, success, and error-mapping paths.
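
The token-based flow for the chatbot info endpoint can be sketched roughly as below. This is an illustration in Python, not the Go handler from this PR: the function names, the in-memory dictionaries standing in for the token and dialog tables, the `"1"` value for a valid status, and the placeholder message for a bad token are all assumptions. Only the no-access message is taken from the description above. The `has_tavily_key` field is omitted for brevity.

```python
VALID = "1"  # assumed value of the VALID status
NO_ACCESS = "Authentication error: no access to this chatbot!"
BAD_TOKEN = "invalid or missing beta token"  # placeholder, not the real message


def extract_beta_token(authorization):
    """Return the token from 'Bearer <beta>', or None if the header is malformed."""
    parts = (authorization or "").split()
    if len(parts) != 2 or parts[0] != "Bearer":
        return None
    return parts[1]


def chatbot_info(authorization, dialog_id, tenants_by_beta, dialogs):
    """Resolve the tenant from the beta token, then check dialog ownership.

    tenants_by_beta: dict mapping beta token -> tenant id (stand-in for a DAO).
    dialogs: dict mapping dialog id -> dialog record (stand-in for a DAO).
    """
    token = extract_beta_token(authorization)
    tenant_id = tenants_by_beta.get(token) if token else None
    if tenant_id is None:
        return {"error": BAD_TOKEN}

    dialog = dialogs.get(dialog_id)
    # Ownership requires both a tenant match and a valid status.
    if (
        dialog is None
        or dialog["tenant_id"] != tenant_id
        or dialog["status"] != VALID
    ):
        return {"error": NO_ACCESS}

    return {
        "data": {
            "title": dialog["title"],
            "avatar": dialog["avatar"],
            "prologue": dialog["prologue"],
        }
    }
```

Because the handler performs this check itself, the route does not need the session-based middleware, which is why it can sit on the unauthenticated route group.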
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ling Qin <qinling0210@163.com>
## Summary
Fix error messages in `build.sh` and add documentation in
`internal/development.md` for downloading native static libraries
(pdfium, pdf_oxide, office_oxide).
## Changes
- `build.sh`: change error hint from `uv run download_deps.py` to `uv
run ragflow_deps/download_deps.py` (correct path from project root)
- `internal/development.md`: add section 2.1 documenting how to download
native libs and install lld
## Summary
- use the project-standard 32-character ID generator when creating
shared chatbot sessions
- fix MySQL insert failures caused by writing 36-character UUID strings
into `api_4_conversation.id`
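
The length mismatch behind this fix is easy to check. A canonical UUID string is 36 characters long (32 hexadecimal digits plus 4 hyphens), while its hyphen-free hex form is 32. The sketch below assumes the project-standard generator produces the hex form; `new_id` is a hypothetical name used for illustration, not the project's actual function.

```python
import uuid


def new_id():
    """Return a 32-character, hyphen-free hex ID (assumed project convention)."""
    return uuid.uuid1().hex


def canonical_uuid():
    """Return the canonical 36-character form that overflowed the column."""
    return str(uuid.uuid4())
```

Here `len(new_id())` is 32 and `len(canonical_uuid())` is 36, so only the former fits an ID column sized for 32 characters.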