Files
ragflow/agent/component/list_operations.py
Zhichang Yu 3f805a64f1 feat(agent): align Go agent behavior with Python (except retrieval component) (#16225)
## Summary

Aligns the **Go agent runtime/canvas/components/tools** behavior with
the **Python `agent/` implementation** so the same stored canvas DSL
produces the same execution result on either side. Every component,
tool, and runtime primitive in `internal/agent/` is now driven by the
same semantics as its Python counterpart — variable resolution, template
substitution, control flow, error reporting, retry/cancel, and stream
event shapes.

The **retrieval component is the one explicit exception** in this PR. It
is being reworked in a separate change and is excluded from this
alignment pass; the wrapper slot (`universe_a_wrappers.go →
newRetrievalComponent`) is preserved.

## Scope of alignment

### Components (all aligned with `agent/component/`)
`Begin` · `Message` · `LLM` (incl. ChatTemplateKwargs,
MessageHistoryWindowSize, VisualFiles, Cite, OutputStructure,
JSONOutput, TopP, MaxRetries, DelayAfterError, credentials) · `Agent`
(react + tool artifact capture + `Reset()` interface-assert) · `Switch`
(12/12 operators, Python-equivalent semantics) · `Categorize` · `Invoke`
· `Iteration` · `Loop` (macro-expansion through `workflowx.AddLoopNode`)
· `UserFillUp` (Python-equivalent interrupt/resume via eino
`compose.Interrupt`/`ResumeWithData`) · `FillUp` · `DataOperations` ·
`ListOperations` · `StringTransform` · `VariableAggregator` ·
`VariableAssigner` · `Browser` (full stagehand runtime parity) ·
`DocsGenerator` · `ExcelProcessor`.

### Tools (all aligned with `agent/tools/`)
`Retrieval` (wrapper slot only — logic out of scope) · `MCPToolAdapter`
(streamable-HTTP) · `CodeExec` (sandbox bridge with
`code_exec_contract.go` matching Python contract) · `AkShare` · `ArXiv`
· `Crawler` · `DeepL` · `DuckDuckGo` · `Email` · `ExeSQL` · `GitHub` ·
`Google` · `GoogleScholar` · `Jin10` · `PubMed` · `QWeather` · `SearXNG`
· `Tavily` · `Tushare` · `Wencai` · `Wikipedia` · `YahooFinance` —
uniform `eino tool.InvokableTool` interface, SSRF protection, shared
HTTP client.

### Canvas execution engine (`internal/agent/canvas/`)
Aligned with Python's `agent/canvas.py`:
- **Scheduler** (`scheduler.go`): state pre/post handlers, node lambdas,
per-component timeout resolver (4-level: per-class env → per-class table
→ uniform env → 600s fallback), `legacyNoOpNames`.
- **Loop subgraph** (`loop_subgraph.go`): Python-equivalent
`AddLoopNode` macro expansion + condition translation.
- **Multibranch** (`multibranch.go`): `Switch` / `Categorize` routing
via `compose.NewGraphMultiBranch` — same branch selection semantics as
Python.
- **Parallel subgraph** (`parallel_subgraph.go`): matches Python's
parallel fan-out contract.
- **Interrupt/Resume** (`interrupt_resume.go`): `UserFillUpNodeBody` /
`IsInterruptError` / `ExtractInterruptContexts` — replaces the
deprecated Python sentinel chain with eino's native interrupt API,
preserving the same external behavior.
- **Checkpoint** (`checkpoint_store.go`): `RedisCheckPointStore`
Get/Set/Delete, with business metadata (status / canvas_id /
parent_run_id) on a parallel Redis Hash.
- **RunTracker** (`run_tracker.go`): Start / MarkSucceeded / MarkFailed
/ MarkCancelled / AttachCheckpoint — same lifecycle as the Python run
record.
- **Cancel** (`cancel.go`): Redis pub/sub watch.
- **Stream** (`stream.go`): SSE channel with `messages` / `waiting` /
`errors` / `done` events, same shape as Python's `agent.canvas.RunEvent`
payload.

### DSL bridge (`internal/agent/dsl/`)
- `normalize.go`: v1↔v2 collapsed into a single wire format — Python and
Go consume the same stored JSON.
- `reset.go`: per-run state reset matches Python's `Canvas.reset()`
semantics.
- Testdata mirrors Python's `agent_msg.json` / `all.json` / etc.

### Runtime (`internal/agent/runtime/`)
- `CanvasState` / `NewCanvasState` / `GetVar` / `SetVar` / `ReadVars`:
same `{{cpn_id@param}}` resolution model.
- `ResolveTemplate` (regex fast path + gonja fallback) — Python
Jinja-style semantics.
- `selector.go`, `metrics.go`, `component.go`: shared runtime contracts.

## Out of scope (intentionally)

- **`Retrieval` component logic** — wrapped only; full parity lands in a
follow-up PR.
- **Frontend** — only minor dsl-bridge / canvas UX fixes ride along.
- **CLI / admin / model registry** — orthogonal to agent behavior.

## How alignment is verified

`internal/service/agent_run_e2e_test.go` exercises the **full production
chain** against real Python-shaped DSL fixtures:
```
loadCanvasForUser → versionDAO.GetLatest → decodeCanvasFromDSL →
canvas.Compile → cc.Workflow.Invoke → answer extraction
```
using in-memory SQLite + miniredis (no Docker). Covers:
- `TestRunAgent_RealCanvas_BeginMessage` — happy path, `{{sys.query}}`
resolution
- `TestRunAgent_RealCanvas_WaitForUserResume` — two-run resume cycle
(Python-equivalent)
- `TestRunAgent_RealCanvas_CompileFails` — unknown component name →
sanitized error (Python-equivalent)
- `TestRunAgent_RealCanvas_InvokeFails` — unresolvable template ref
(Python-equivalent)
- `TestRunAgent_RunTracker_AttachCheckpoint_CallSequence` —
Start→AttachCheckpoint→MarkSucceeded lifecycle

`internal/handler/agent_test.go` — SSE streaming parity (`Content-Type:
text/event-stream`, `data: {…}\n\n`, trailing `data: [DONE]\n\n`,
OpenAI-compatible non-stream `choices`).

`internal/agent/canvas/fixture_compile_test.go` + per-component tests
pin the Python-equivalent outputs.

```
go test -count=1 -v -run 'TestRunAgent_RealCanvas|TestRunAgent_RunTracker' ./internal/service/
```

## Design reference

`docs/develop/agent-go-port-design.md` (1329 lines, last cross-checked
2026-06-17) — module layout, per-component / per-tool inventory,
corner-case catalogue, and the actionable backlog (Section 14, including
the retrieval alignment follow-up).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-22 11:58:29 +08:00

228 lines
7.0 KiB
Python

from abc import ABC
import os
from agent.component.base import ComponentBase, ComponentParamBase
from api.utils.api_utils import timeout
class ListOperationsParam(ComponentParamBase):
"""
Define the List Operations component parameters.
"""
def __init__(self):
super().__init__()
self.query = ""
self.operations = "nth"
self.n = 0
self.strict = False
self.sort_method = "asc"
# Comma-separated list of map keys to sort by (primary,
# tiebreak, ...). Empty / unset falls back to the legacy
# full-hashable-key behaviour (sort by the lexicographically
# first field). Mirrors internal/agent/component/list_operations.go
# parseSortByFieldList + opSort's SortBy path.
self.sort_by = ""
self.filter = {
"operator": "=",
"value": ""
}
self.outputs = {
"result": {
"value": [],
"type": "Array of ?"
},
"first": {
"value": "",
"type": "?"
},
"last": {
"value": "",
"type": "?"
}
}
def check(self):
self.check_empty(self.query, "query")
self.check_valid_value(
self.operations,
"Support operations",
["nth", "head", "tail", "filter", "sort", "drop_duplicates"],
)
def get_input_form(self) -> dict[str, dict]:
return {}
class ListOperations(ComponentBase,ABC):
component_name = "ListOperations"
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
self.input_objects=[]
inputs = getattr(self._param, "query", None)
self.inputs = self._canvas.get_variable_value(inputs)
if not isinstance(self.inputs, list):
raise TypeError("The input of List Operations should be an array.")
self.set_input_value(inputs, self.inputs)
if self._param.operations == "nth":
self._nth()
elif self._param.operations == "head":
self._head()
elif self._param.operations == "tail":
self._tail()
elif self._param.operations == "filter":
self._filter()
elif self._param.operations == "sort":
self._sort()
elif self._param.operations == "drop_duplicates":
self._drop_duplicates()
def _coerce_n(self):
try:
return int(getattr(self._param, "n", 0))
except Exception:
return 0
def _is_strict(self):
strict = getattr(self._param, "strict", False)
if isinstance(strict, str):
return strict.strip().lower() in {"1", "true", "yes", "on"}
return bool(strict)
def _set_outputs(self, outputs):
self._param.outputs["result"]["value"] = outputs
self._param.outputs["first"]["value"] = outputs[0] if outputs else None
self._param.outputs["last"]["value"] = outputs[-1] if outputs else None
def _raise_strict_range_error(self, operation, n):
raise ValueError(
f"{operation} requires n to be within the valid range in strict mode, got {n}."
)
def _nth(self):
n = self._coerce_n()
strict = self._is_strict()
if n == 0:
if strict:
self._raise_strict_range_error("nth", n)
outputs = []
elif n > 0:
if n <= len(self.inputs):
outputs = [self.inputs[n - 1]]
elif strict:
self._raise_strict_range_error("nth", n)
else:
outputs = []
else:
if abs(n) <= len(self.inputs):
outputs = [self.inputs[n]]
elif strict:
self._raise_strict_range_error("nth", n)
else:
outputs = []
self._set_outputs(outputs)
def _head(self):
n = self._coerce_n()
strict = self._is_strict()
if strict:
if 1 <= n <= len(self.inputs):
outputs = self.inputs[:n]
else:
self._raise_strict_range_error("head", n)
else:
if n < 1:
outputs = []
else:
outputs = self.inputs[:n]
self._set_outputs(outputs)
def _tail(self):
n = self._coerce_n()
strict = self._is_strict()
if strict:
if 1 <= n <= len(self.inputs):
outputs = self.inputs[-n:]
else:
self._raise_strict_range_error("tail", n)
else:
if n < 1:
outputs = []
else:
outputs = self.inputs[-n:]
self._set_outputs(outputs)
def _filter(self):
self._set_outputs([i for i in self.inputs if self._eval(self._norm(i),self._param.filter["operator"],self._param.filter["value"])])
def _norm(self,v):
s = "" if v is None else str(v)
return s
def _eval(self, v, operator, value):
if operator == "=":
return v == value
elif operator == "":
return v != value
elif operator == "contains":
return value in v
elif operator == "start with":
return v.startswith(value)
elif operator == "end with":
return v.endswith(value)
else:
return False
def _sort(self):
items = self.inputs or []
method = getattr(self._param, "sort_method", "asc") or "asc"
reverse = method == "desc"
if not items:
self._set_outputs([])
return
first = items[0]
if isinstance(first, dict):
sort_by_raw = getattr(self._param, "sort_by", "") or ""
sort_by = [k.strip() for k in sort_by_raw.split(",") if k.strip()]
if sort_by:
outputs = sorted(
items,
key=lambda x: tuple(x.get(k) for k in sort_by),
reverse=reverse,
)
else:
outputs = sorted(
items,
key=lambda x: self._hashable(x),
reverse=reverse,
)
else:
outputs = sorted(items, reverse=reverse)
self._set_outputs(outputs)
def _drop_duplicates(self):
seen = set()
outs = []
for item in self.inputs:
k = self._hashable(item)
if k in seen:
continue
seen.add(k)
outs.append(item)
self._set_outputs(outs)
def _hashable(self,x):
if isinstance(x, dict):
return tuple(sorted((k, self._hashable(v)) for k, v in x.items()))
if isinstance(x, (list, tuple)):
return tuple(self._hashable(v) for v in x)
if isinstance(x, set):
return tuple(sorted(self._hashable(v) for v in x))
return x
def thoughts(self) -> str:
return "ListOperation in progress"