mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
## Summary
Aligns the **Go agent runtime/canvas/components/tools** behavior with
the **Python `agent/` implementation** so the same stored canvas DSL
produces the same execution result on either side. Every component,
tool, and runtime primitive in `internal/agent/` is now driven by the
same semantics as its Python counterpart — variable resolution, template
substitution, control flow, error reporting, retry/cancel, and stream
event shapes.
The **retrieval component is the one explicit exception** in this PR. It
is being reworked in a separate change and is excluded from this
alignment pass; the wrapper slot (`universe_a_wrappers.go →
newRetrievalComponent`) is preserved.
## Scope of alignment
### Components (all aligned with `agent/component/`)
`Begin` · `Message` · `LLM` (incl. ChatTemplateKwargs,
MessageHistoryWindowSize, VisualFiles, Cite, OutputStructure,
JSONOutput, TopP, MaxRetries, DelayAfterError, credentials) · `Agent`
(react + tool artifact capture + `Reset()` interface-assert) · `Switch`
(12/12 operators, Python-equivalent semantics) · `Categorize` · `Invoke`
· `Iteration` · `Loop` (macro-expansion through `workflowx.AddLoopNode`)
· `UserFillUp` (Python-equivalent interrupt/resume via eino
`compose.Interrupt`/`ResumeWithData`) · `FillUp` · `DataOperations` ·
`ListOperations` · `StringTransform` · `VariableAggregator` ·
`VariableAssigner` · `Browser` (full stagehand runtime parity) ·
`DocsGenerator` · `ExcelProcessor`.
### Tools (all aligned with `agent/tools/`)
`Retrieval` (wrapper slot only — logic out of scope) · `MCPToolAdapter`
(streamable-HTTP) · `CodeExec` (sandbox bridge with
`code_exec_contract.go` matching Python contract) · `AkShare` · `ArXiv`
· `Crawler` · `DeepL` · `DuckDuckGo` · `Email` · `ExeSQL` · `GitHub` ·
`Google` · `GoogleScholar` · `Jin10` · `PubMed` · `QWeather` · `SearXNG`
· `Tavily` · `Tushare` · `Wencai` · `Wikipedia` · `YahooFinance` —
uniform `eino tool.InvokableTool` interface, SSRF protection, shared
HTTP client.
### Canvas execution engine (`internal/agent/canvas/`)
Aligned with Python's `agent/canvas.py`:
- **Scheduler** (`scheduler.go`): state pre/post handlers, node lambdas,
per-component timeout resolver (4-level: per-class env → per-class table
→ uniform env → 600s fallback), `legacyNoOpNames`.
- **Loop subgraph** (`loop_subgraph.go`): Python-equivalent
`AddLoopNode` macro expansion + condition translation.
- **Multibranch** (`multibranch.go`): `Switch` / `Categorize` routing
via `compose.NewGraphMultiBranch` — same branch selection semantics as
Python.
- **Parallel subgraph** (`parallel_subgraph.go`): matches Python's
parallel fan-out contract.
- **Interrupt/Resume** (`interrupt_resume.go`): `UserFillUpNodeBody` /
`IsInterruptError` / `ExtractInterruptContexts` — replaces the
deprecated Python sentinel chain with eino's native interrupt API,
preserving the same external behavior.
- **Checkpoint** (`checkpoint_store.go`): `RedisCheckPointStore`
Get/Set/Delete, with business metadata (status / canvas_id /
parent_run_id) on a parallel Redis Hash.
- **RunTracker** (`run_tracker.go`): Start / MarkSucceeded / MarkFailed
/ MarkCancelled / AttachCheckpoint — same lifecycle as the Python run
record.
- **Cancel** (`cancel.go`): Redis pub/sub watch.
- **Stream** (`stream.go`): SSE channel with `messages` / `waiting` /
`errors` / `done` events, same shape as Python's `agent.canvas.RunEvent`
payload.
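The scheduler's 4-level timeout lookup can be illustrated with a minimal Python sketch. It follows the order stated above (per-class env, per-class table, uniform env, 600s fallback). The environment variable names (`COMPONENT_TIMEOUT_<CLASS>`, `COMPONENT_TIMEOUT`) and the table contents are assumptions made for illustration, not the names used in `scheduler.go`.

```python
import os

DEFAULT_TIMEOUT_S = 600.0  # final fallback named in the description

# Hypothetical per-class table; the real values live in the Go scheduler.
CLASS_TIMEOUTS = {"LLM": 300.0, "Browser": 900.0}


def _env_seconds(env, name):
    """Return a positive float read from env[name], or None if unset/invalid."""
    raw = env.get(name)
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_timeout(component_class, env=None, table=None):
    """Resolve a timeout in seconds using the 4-level precedence."""
    env = os.environ if env is None else env
    table = CLASS_TIMEOUTS if table is None else table

    # 1. Per-class environment override (variable name is an assumption).
    per_class = _env_seconds(env, f"COMPONENT_TIMEOUT_{component_class.upper()}")
    if per_class is not None:
        return per_class
    # 2. Per-class table entry.
    if component_class in table:
        return table[component_class]
    # 3. Uniform environment override (variable name is an assumption).
    uniform = _env_seconds(env, "COMPONENT_TIMEOUT")
    if uniform is not None:
        return uniform
    # 4. Hard-coded fallback.
    return DEFAULT_TIMEOUT_S
```

Passing `env` and `table` explicitly keeps the lookup testable without touching the process environment.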
### DSL bridge (`internal/agent/dsl/`)
- `normalize.go`: v1↔v2 collapsed into a single wire format — Python and
Go consume the same stored JSON.
- `reset.go`: per-run state reset matches Python's `Canvas.reset()`
semantics.
- Testdata mirrors Python's `agent_msg.json` / `all.json` / etc.
### Runtime (`internal/agent/runtime/`)
- `CanvasState` / `NewCanvasState` / `GetVar` / `SetVar` / `ReadVars`:
same `{{cpn_id@param}}` resolution model.
- `ResolveTemplate` (regex fast path + gonja fallback) — Python
Jinja-style semantics.
- `selector.go`, `metrics.go`, `component.go`: shared runtime contracts.
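As a rough illustration of the `{{cpn_id@param}}` resolution model, the sketch below implements only a regex fast path in Python. The function name `resolve_template`, the shape of the `outputs` and `globals_` mappings, and the choice to raise `KeyError` on an unresolvable reference are assumptions; the gonja/Jinja fallback is not modelled.

```python
import re

# Matches {{cpn_id@param}} and bare globals such as {{sys.query}}.
_REF = re.compile(r"\{\{\s*([^{}@\s]+)(?:@([^{}\s]+))?\s*\}\}")


def resolve_template(template, outputs, globals_):
    """Substitute references in `template`.

    outputs:  {component_id: {param: value}}  (hypothetical shape)
    globals_: {"sys.query": value, ...}       (hypothetical shape)
    """

    def replace(match):
        head, param = match.group(1), match.group(2)
        if param is None:
            # Bare reference: look it up among the global variables.
            if head not in globals_:
                raise KeyError(f"unresolvable reference: {head}")
            return str(globals_[head])
        try:
            return str(outputs[head][param])
        except KeyError:
            raise KeyError(f"unresolvable reference: {head}@{param}") from None

    return _REF.sub(replace, template)
```

For example, `resolve_template("Q: {{sys.query}}", {}, {"sys.query": "hello"})` returns `"Q: hello"`.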
## Out of scope (intentionally)
- **`Retrieval` component logic** — wrapped only; full parity lands in a
follow-up PR.
- **Frontend** — only minor dsl-bridge / canvas UX fixes ride along.
- **CLI / admin / model registry** — orthogonal to agent behavior.
## How alignment is verified
`internal/service/agent_run_e2e_test.go` exercises the **full production
chain** against real Python-shaped DSL fixtures:
```
loadCanvasForUser → versionDAO.GetLatest → decodeCanvasFromDSL →
canvas.Compile → cc.Workflow.Invoke → answer extraction
```
using in-memory SQLite + miniredis (no Docker). Covers:
- `TestRunAgent_RealCanvas_BeginMessage` — happy path, `{{sys.query}}`
resolution
- `TestRunAgent_RealCanvas_WaitForUserResume` — two-run resume cycle
(Python-equivalent)
- `TestRunAgent_RealCanvas_CompileFails` — unknown component name →
sanitized error (Python-equivalent)
- `TestRunAgent_RealCanvas_InvokeFails` — unresolvable template ref
(Python-equivalent)
- `TestRunAgent_RunTracker_AttachCheckpoint_CallSequence` —
Start→AttachCheckpoint→MarkSucceeded lifecycle
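The call-sequence check in the last test can be pictured with a small recording fake. This is a Python sketch only: `RecordingTracker`, `run_with_tracking`, and the checkpoint id format are hypothetical names, and attaching the checkpoint before invocation is an assumption. The Start, AttachCheckpoint, MarkSucceeded order is taken from the test name above.

```python
class RecordingTracker:
    """Fake run tracker that records which lifecycle methods were called."""

    def __init__(self):
        self.calls = []

    def start(self, run_id):
        self.calls.append("Start")

    def attach_checkpoint(self, run_id, checkpoint_id):
        self.calls.append("AttachCheckpoint")

    def mark_succeeded(self, run_id):
        self.calls.append("MarkSucceeded")

    def mark_failed(self, run_id, error):
        self.calls.append("MarkFailed")


def run_with_tracking(tracker, run_id, invoke):
    """Wrap `invoke` with the tracker lifecycle and return its result."""
    tracker.start(run_id)
    tracker.attach_checkpoint(run_id, f"ckpt:{run_id}")  # hypothetical id format
    try:
        result = invoke()
    except Exception as exc:
        tracker.mark_failed(run_id, str(exc))
        raise
    tracker.mark_succeeded(run_id)
    return result
```

A test can then assert that `tracker.calls` equals the expected sequence after a successful run.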
`internal/handler/agent_test.go` — SSE streaming parity (`Content-Type:
text/event-stream`, `data: {…}\n\n`, trailing `data: [DONE]\n\n`,
OpenAI-compatible non-stream `choices`).
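The wire framing checked by the handler test (`data: {…}\n\n` frames followed by `data: [DONE]\n\n`) can be sketched in Python as follows. The helper names `sse_frames` and `parse_sse` and the example payload fields are assumptions for illustration; only the framing itself comes from the description above.

```python
import json


def sse_frames(events):
    """Yield one `data:` frame per event, then the trailing [DONE] frame."""
    for event in events:
        # json.dumps escapes newlines, so a payload cannot break the framing.
        yield f"data: {json.dumps(event, ensure_ascii=False)}\n\n"
    yield "data: [DONE]\n\n"


def parse_sse(body):
    """Decode the JSON payloads from a stream body, stopping at [DONE]."""
    payloads = []
    for frame in body.split("\n\n"):
        if not frame.startswith("data: "):
            continue
        data = frame[len("data: "):]
        if data == "[DONE]":
            break
        payloads.append(json.loads(data))
    return payloads
```

A round trip through both helpers should return the original event list, which is a convenient way to compare the Go and Python stream bodies.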
`internal/agent/canvas/fixture_compile_test.go` + per-component tests
pin the Python-equivalent outputs.
```
go test -count=1 -v -run 'TestRunAgent_RealCanvas|TestRunAgent_RunTracker' ./internal/service/
```
## Design reference
`docs/develop/agent-go-port-design.md` (1329 lines, last cross-checked
2026-06-17) — module layout, per-component / per-tool inventory,
corner-case catalogue, and the actionable backlog (Section 14, including
the retrieval alignment follow-up).
---------
Co-authored-by: Claude <noreply@anthropic.com>
147 lines
7.4 KiB
Python
#!/usr/bin/env python3

# PEP 723 metadata
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "nltk",
#   "huggingface-hub"
# ]
# ///

# This script downloads every artifact that the `infiniflow/ragflow_deps`
# Docker image bakes in. Run it from anywhere — the `__main__` block
# chdir's into this file's own directory, so all outputs land under
# `ragflow_deps/` regardless of the caller's CWD.
#
# Build-context relationship: `ragflow_deps/Dockerfile` is built with
# `ragflow_deps/` as its build context, so the files written here MUST
# sit at the top of `ragflow_deps/`. The Dockerfile's COPY lines assume
# top-level paths (`huggingface.co`, `nltk_data`, `cl100k_base.tiktoken`,
# `*.deb`, `*.jar`, `*.tar.gz`, `stagehand-server-v3-linux-<arch>`).
#
# Typical workflow:
#
#   uv run ragflow_deps/download_deps.py   # download
#   cd ragflow_deps
#   docker build -f Dockerfile -t infiniflow/ragflow_deps .
#
# The main `Dockerfile` (built from the project root) pulls this image
# via `--mount=type=bind,from=infiniflow/ragflow_deps:latest,...` and
# is unaffected by where these files live locally.

import argparse
import os
import urllib.request
from typing import Union

import nltk
from huggingface_hub import snapshot_download


def get_urls(use_china_mirrors=False) -> list[Union[str, list[str]]]:
    if use_china_mirrors:
        return [
            "http://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb",
            "http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb",
            "https://repo.huaweicloud.com/repository/maven/org/apache/tika/tika-server-standard/3.3.0/tika-server-standard-3.3.0.jar",
            "https://repo.huaweicloud.com/repository/maven/org/apache/tika/tika-server-standard/3.3.0/tika-server-standard-3.3.0.jar.md5",
            "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken",
            ["https://registry.npmmirror.com/-/binary/chrome-for-testing/121.0.6167.85/linux64/chrome-linux64.zip", "chrome-linux64-121-0-6167-85"],
            ["https://registry.npmmirror.com/-/binary/chrome-for-testing/121.0.6167.85/linux64/chromedriver-linux64.zip", "chromedriver-linux64-121-0-6167-85"],
            "https://github.com/astral-sh/uv/releases/download/0.9.16/uv-x86_64-unknown-linux-gnu.tar.gz",
            "https://github.com/astral-sh/uv/releases/download/0.9.16/uv-aarch64-unknown-linux-gnu.tar.gz",
            # stagehand-server-v3 Node.js SEA binaries (used by Browser
            # component in local mode).
            #
            # The stagehand-go Go module (pinned in go.mod) and the
            # stagehand-server binary (this release) are LOOSELY
            # MATCHED — both stay on the v3.x line and remain
            # protocol-compatible. The two version numbers do NOT
            # track each other: the Go SDK is at v3.21.0 while the
            # current latest server release is v3.7.2.
            #
            # On every go.mod bump, refresh this URL to the current
            # latest server release. There is no version
            # correspondence to maintain; "both on v3.x" is the
            # compatibility contract.
            "https://github.com/browserbase/stagehand/releases/download/stagehand-server-v3/v3.7.2/stagehand-server-v3-linux-x64",
            "https://github.com/browserbase/stagehand/releases/download/stagehand-server-v3/v3.7.2/stagehand-server-v3-linux-arm64",
        ]
    else:
        return [
            "http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb",
            "http://ports.ubuntu.com/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_arm64.deb",
            "https://repo1.maven.org/maven2/org/apache/tika/tika-server-standard/3.3.0/tika-server-standard-3.3.0.jar",
            "https://repo1.maven.org/maven2/org/apache/tika/tika-server-standard/3.3.0/tika-server-standard-3.3.0.jar.md5",
            "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken",
            ["https://storage.googleapis.com/chrome-for-testing-public/121.0.6167.85/linux64/chrome-linux64.zip", "chrome-linux64-121-0-6167-85"],
            ["https://storage.googleapis.com/chrome-for-testing-public/121.0.6167.85/linux64/chromedriver-linux64.zip", "chromedriver-linux64-121-0-6167-85"],
            "https://github.com/astral-sh/uv/releases/download/0.9.16/uv-x86_64-unknown-linux-gnu.tar.gz",
            "https://github.com/astral-sh/uv/releases/download/0.9.16/uv-aarch64-unknown-linux-gnu.tar.gz",
            # stagehand-server-v3 Node.js SEA binaries (used by Browser
            # component in local mode).
            #
            # The stagehand-go Go module (pinned in go.mod) and the
            # stagehand-server binary (this release) are LOOSELY
            # MATCHED — both stay on the v3.x line and remain
            # protocol-compatible. The two version numbers do NOT
            # track each other: the Go SDK is at v3.21.0 while the
            # current latest server release is v3.7.2.
            #
            # On every go.mod bump, refresh this URL to the current
            # latest server release. There is no version
            # correspondence to maintain; "both on v3.x" is the
            # compatibility contract.
            "https://github.com/browserbase/stagehand/releases/download/stagehand-server-v3/v3.7.2/stagehand-server-v3-linux-x64",
            "https://github.com/browserbase/stagehand/releases/download/stagehand-server-v3/v3.7.2/stagehand-server-v3-linux-arm64",
        ]


repos = [
    "InfiniFlow/text_concat_xgb_v1.0",
    "InfiniFlow/deepdoc",
]


def download_model(repository_id):
    local_directory = os.path.abspath(os.path.join("huggingface.co", repository_id))
    os.makedirs(local_directory, exist_ok=True)
    snapshot_download(repo_id=repository_id, local_dir=local_directory)


if __name__ == "__main__":
    # Anchor CWD to this file's directory so all relative outputs
    # (huggingface.co/, nltk_data/, *.deb, *.jar, *.tar.gz, etc.) land
    # at the top of ragflow_deps/ regardless of where the user invokes
    # the script from. This is the build context for `ragflow_deps/Dockerfile`.
    os.chdir(os.path.dirname(os.path.abspath(__file__)))

    parser = argparse.ArgumentParser(description="Download dependencies with optional China mirror support")
    parser.add_argument("--china-mirrors", action="store_true", help="Use China-accessible mirrors for downloads")
    args = parser.parse_args()

    urls = get_urls(args.china_mirrors)

    # Some mirrors (e.g. archive.ubuntu.com) reject the default urllib
    # User-Agent with HTTP 403, so install an opener with a browser-like UA.
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    urllib.request.install_opener(opener)

    for url in urls:
        download_url = url[0] if isinstance(url, list) else url
        filename = url[1] if isinstance(url, list) else url.split("/")[-1]
        print(f"Downloading {filename} from {download_url}...")
        if not os.path.exists(filename):
            urllib.request.urlretrieve(download_url, filename)

    local_dir = os.path.abspath("nltk_data")
    for data in ["wordnet", "punkt", "punkt_tab"]:
        print(f"Downloading nltk {data}...")
        nltk.download(data, download_dir=local_dir)

    for repo_id in repos:
        print(f"Downloading huggingface repo {repo_id}...")
        download_model(repo_id)