Replaces the Python agent canvas runtime with a Go implementation that runs inside `cmd/server_main`. The canvas compiles into an eino Workflow that pauses on wait-for-user via native Interrupt/Resume (no sentinel flag) and resumes from a Redis-backed CheckPointStore. All 21 Python agent components and ~35 tools are ported with functional parity. Sandbox providers now read their JSON config from the admin-panel system_settings table with env fallback. 234 files / +35,413 / -6,111. All Go files are gofmt-clean (CI gate added); drops the v2 DSL E2E step and the gap-analysis plan (both redundant after the port ships). ## Type of change - [x] Refactoring - [x] New feature - [x] Bug fix 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>
7.3 KiB
CodeExec sandbox — Phase 5d design decision
Status: decision recorded, implementation pending. This file captures the trade-offs so the choice can be revisited without re-deriving it.
Context (what the Python side actually does)
The Python agent's code_exec delegates to a provider-based
sandbox subsystem under agent/sandbox/. It is NOT a single
SDK and NOT a local subprocess — it is a thin abstraction over
several execution backends.
Provider interface (agent/sandbox/providers/base.py)
class SandboxProvider(ABC):
def initialize(self, config) -> bool
def create_instance(self, template: str) -> SandboxInstance
def execute_code(self, instance_id, code, language,
timeout=10, arguments=None) -> ExecutionResult
def destroy_instance(self, instance_id) -> bool
def health_check(self) -> bool
def get_supported_languages(self) -> List[str]
Providers shipped in the Python repo
| Provider | File | Backend |
|---|---|---|
SelfManagedProvider |
self_managed.py |
HTTP at localhost:9385 (the executor_manager, which runs a Docker pool with gVisor) |
AliyunCodeInterpreterProvider |
aliyun_codeinterpreter.py |
Alibaba Cloud sandbox (uses agentrun SDK / Function Compute) |
E2BProvider |
e2b.py |
e2b cloud sandbox (SaaS) |
ProviderManager (manager.py) selects one provider at startup
based on configuration; the CodeExec tool talks only to the
manager, never to a specific provider.
Subprocess flow on the Python side
A CodeExec call goes through agent/sandbox/client.execute_code(...)
which is the public entry point the CodeExec component uses
(agent/tools/code_exec.py:365):
from agent.sandbox.client import execute_code as sandbox_execute_code
result = sandbox_execute_code(
code=code, language=language,
timeout=timeout_seconds, arguments=arguments,
)
That function:
- Resolves the active provider via
ProviderManager(which readsSystemSettingsService.get_by_name("sandbox.provider_type")— i.e. the choice is driven by the system admin panel, not the caller). - Calls
provider.create_instance(template=language)→provider.execute_code(...)→provider.destroy_instance(...).
So the CodeExec component does support all three providers —
the provider choice is invisible to it. If the provider system
is not configured, the CodeExec component falls back to a direct
HTTP POST to http://{SANDBOX_HOST}:9385/run (the executor_manager
endpoint) for backward compatibility. Both paths land in the
same _process_execution_result handler.
Options for the Go port
A. Shell out to a Python subprocess
Go spawns python3 -c "..." that:
- Imports
agent.sandbox.providers.ProviderManager - Picks up the same configuration the Python agent uses
- Returns the ExecutionResult over stdout (JSON)
- Pros
- Reuses the full provider surface (self-managed + Aliyun + e2b). A single Python subprocess call covers all three.
- Plan §2.11.4 ("don't rewrite the sandbox") honored literally.
- Operators that already deploy Python RAGFlow have the provider configuration in place; Go inherits it for free.
- Cons
- Per-call latency = Python interpreter startup + provider import
- dispatch. ~hundreds of ms for the first call, similar for subsequent calls (no interpreter caching yet).
- Adds a Python dependency on the Go host.
- Pipe / stdout JSON serialisation is awkward for binary output (matplotlib plots, files written to the sandbox). Can be mitigated with file-based handoff for large payloads, but adds operational complexity.
B. Reimplement the ProviderManager in Go
Read the three provider implementations and write Go equivalents:
SelfManagedProvider→ Go HTTP client tolocalhost:9385(the executor_manager). Smallest of the three.AliyunCodeInterpreterProvider→ Go reimplementation of theagentrunSDK client. ~Vendor SDK surface to maintain.E2BProvider→ Go reimplementation of the e2b SDK client.
- Pros
- No Python dependency on the Go side.
- Lower per-call latency.
- Clean integration with the rest of the Go agent runtime.
- Cons
- Three SDK / API surfaces to maintain in parallel with the Python ones. Every vendor release requires Go updates.
- Plan §2.11.4's intent is to avoid duplicating sandbox logic; reimplementing three providers arguably violates the spirit even if it doesn't violate the letter.
- The
agentrunande2bSDKs include auth, retry, pagination, and connection management — real ongoing work.
Decision
Option A (shell out to a Python subprocess that uses
ProviderManager). Reasoning:
- The Python-side flow already supports all three providers via a single entry point. The Go port's job is the orchestrator + agent runtime, not duplicating three vendor SDKs.
- The latency cost is real but acceptable — CodeExec is called sparingly (a script per LLM turn at most).
- Plan §2.11.4 commits to NOT rewriting the sandbox. Option B pushes against that intent; option A doesn't.
The Python subprocess must go through ProviderManager, not
directly to any one provider, so configuration stays in one place.
"Shell out to system python3 directly" (without the agentrun
SDK or any sandbox) is NOT a valid implementation — it would
execute user-supplied code with the agent process's privileges,
violating the security model.
Implementation sketch
-
Add a
PythonProviderManagerClientimplementingSandboxClient(tool/code_exec_client.go). -
The subprocess command mirrors the Python CodeExec flow:
cmd := exec.CommandContext(ctx, "python3", "-c", ` import json, sys from agent.sandbox.client import execute_code result = execute_code( code=sys.argv[1], language=sys.argv[2], timeout=int(sys.argv[3]), arguments=json.loads(sys.argv[4]) if sys.argv[4] else None, ) json.dump({ "stdout": result.stdout, "stderr": result.stderr, "returned": "", # provider returns stdout; no REPL value "artifacts": (result.metadata or {}).get("artifacts", []), }, sys.stdout) `) cmd.Args = append(cmd.Args, code, language, strconv.Itoa(timeout), argsJSON) -
Parse the JSON response and map to
SandboxResponse. -
Add config knobs:
code_exec_python_bin(defaultpython3)code_exec_provider_type(read by Python — let the admin panel setsandbox.provider_typeas today)
Capability parity note: by going through
agent.sandbox.client.execute_code, the Go CodeExec tool inherits
all three providers (self_managed / aliyun_codeinterpreter /
e2b) for the cost of one Python subprocess call. The provider
choice happens inside Python based on SystemSettingsService,
invisible to the Go side. This matches what the Python
agent/tools/code_exec.py does today (lines 358-381 of that
file).
What this file is not
This is not an implementation task. It records the agreed-upon
direction so any future contributor (or this file's author, six
months from now) doesn't accidentally land a "shell out to
system python3" stub that bypasses the sandbox. If you find
yourself writing exec.CommandContext("python3", "-c", ...)
without going through ProviderManager, stop — you're
working against the plan and the security model.