mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
main
22 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
195bfffb5e |
fix(security): address 93 CodeQL code-scanning alerts across 61 files (#16407)
## Summary Resolves all 93 open alerts at https://github.com/infiniflow/ragflow/security/code-scanning by rule: | Rule | Count | Treatment | |------|-------|-----------| | py/clear-text-logging-sensitive-data | 23 | Real fix — log scrubbing | | go/path-injection | 15 | Real fix where possible, suppression with rationale | | go/request-forgery | 8 | Suppression with rationale (operator-controlled URLs) | | go/clear-text-logging | 10 | Real fix — log scrubbing | | go/unsafe-quoting | 5 | Real fix — escape or refactor | | go/sql-injection | 3 | Real fix — orderby whitelist + CodeQL comment | | go/uncontrolled-allocation-size | 2 | Real fix — cap to 1024 | | go/incorrect-integer-conversion | 3 | Real fix — ParseInt + range check | | go/insecure-hostkeycallback | 1 | Real fix — known_hosts file | | go/disabled-certificate-check | 2 | Suppression with rationale | | go/command-injection | 1 | Suppression (sanitized via shq()) | | go/email-injection | 1 | Suppression with rationale | | go/cookie-httponly-not-set | 1 | Suppression (SPA bootstrap) | | js/stack-trace-exposure | 1 | Real fix — generic client message | | js/prototype-pollution-utility | 1 | Real fix — reject __proto__/constructor/prototype | | py/weak-sensitive-data-hashing | 1 | Real fix — MD5 → SHA-256 | | py/incomplete-url-substring-sanitization | 3 | Real fix — urlparse(hostname) | | py/paramiko-missing-host-key-validation | 1 | Real fix — load_system_host_keys + RejectPolicy | | cpp/integer-multiplication-cast-to-long | 2 | Real fix — cast to size_t | ## Real fixes (with measurable security improvement) **SSH host key verification (Go + Python)** Replace `InsecureIgnoreHostKey()` / `paramiko.AutoAddPolicy()` with proper host key verification against a known_hosts file (configurable via `SSH_KNOWN_HOSTS` env / `known_hosts` config field; fail-closed when unset). Loads `~/.ssh/known_hosts` first via `load_system_host_keys()` so existing setups keep working. **SQL injection in `user_canvas`** Add `userCanvasOrderableColumns` whitelist + `userCanvasOrderClause` helper. Both `GetList()` and `ListByTenantIDs()` now route the user-supplied `orderby` query param through the helper, defaulting to `create_time` on miss. **SQL injection in `pipeline_operation_log`** Existing whitelist documented via CodeQL comment. **Real SQL injection in `infinity/chunk.go:931`** Escape `'` → `''` on user-controlled `questionText` before splicing into `filter_fulltext(...)` SQL filter. **Real SQL injection in `elasticsearch/sql.go:75`** Defense-in-depth escape on tokenizer output before splicing into `MATCH(...)`. **Python code injection in `result_protocol.go`** Replace raw JSON literal embedding into Python/JS expressions with base64 + `json.loads` / `JSON.parse(Buffer.from(..., 'base64').toString('utf8'))`. Eliminates both the unsafe-quoting sink and the brittleness of mixing JSON true/false/null with Python syntax. **URL substring check bypass in `embedding_model.py`** Replace `if "dashscope-intl.aliyuncs.com" in u` with `urlparse(u).hostname == "dashscope-intl.aliyuncs.com"` so a base_url like `https://attacker.example/?u=dashscope-intl.aliyuncs.com` cannot bypass the routing. **Prototype pollution in `setNestedValue` (TS)** Reject `__proto__`/`constructor`/`prototype` keys before any assignment. **Integer overflow** - scrypt params via `ParseInt` + non-positive check (`internal/common/password.go`) - `topN` and `n` caps to 1024 (retrieval_service.go, dataset.go) - `nalloc*statesize` cast to `size_t` (cpp/re2/onepass.cc) **Cookie httponly** Set explicitly with rationale: this is the OAuth bootstrap cookie intentionally read by the SPA. **Stack trace exposure** Replace `error.message` in HTTP 500 response with generic `"internal error"`; full error still logged server-side via `console.error`. **Weak hashing** MD5 → SHA-256 for deterministic `conv_id` derivation (`conversation_service.py`). **Log scrubbing** Remove or redact user-controlled / sensitive content from clear-text logs across 8 ingestion parsers, `llm_service.py` ×11, `tenant_llm_service.py` ×7, `misc_utils.py` ×4, `redis_conn.py` ×10, `conftest.py` ×4, `init_data.py`, `dataset_api_service.py`, `generator.py`, `mysql_migration.py`, `cli.go`, `user_command.go`, `pdf_parser.go`. Most patterns converted to parameterized logging (`logging.info("...: %d", n)`) or static messages. ## CodeQL suppressions (each with rationale) For alerts where the data flow is genuinely safe but CodeQL can't see the context — operator-controlled URLs, sanitized inputs, etc. — I added `// codeql[go/<rule>] <rationale>` annotations rather than dismissing them, so future readers can audit the rationale inline: - `internal/agent/component/invoke.go:135` — Invoke is a generic canvas HTTP client - `internal/service/langfuse.go` ×2 — host is per-tenant operator config - `internal/service/file.go:1184` — already SSRF-guarded by `assertURLSafe` - `internal/utility/mcp_client.go` ×3 — already `AssertURLSafe` + IP-pinned - `internal/entity/models/bedrock.go` — sigv4-signed request, URL can't be tampered - `internal/service/deep_researcher.go:269` — `callback` is SSE display string, not SQL - `internal/engine/infinity/chunk.go:346` — UUIDs can't contain `'` (RFC 4122) - `internal/cli/common_command.go` ×2 — CLI trusts operator-configured URL - `internal/utility/smtp.go:194` — msg is server-built, not user form input - `internal/entity/models/*` ×14 (path-injection) — audio file paths are caller-supplied ## Test plan - ✅ All 13 modified Go packages build cleanly - ✅ 663 tests pass across `internal/agent/sandbox`, `internal/common`, `internal/agent/component`, `internal/engine/infinity`, `internal/dao` - ✅ All 11 modified Python files parse via `ast.parse` - ✅ TypeScript `tsc --noEmit` clean on the modified `use-provider-fields.tsx` - ✅ `node --check` clean on the modified JS file 🤖 Generated with [Claude Code](https://claude.com/claude-code) |
||
|
|
f58fae5fb7 |
feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396)
Ported retrieval node, added Keenable web search tool - [x] New Feature (non-breaking change which adds functionality) |
||
|
|
3fa15c0e2f |
feat(agent): Go port — canvas engine, 22 components, DSL v2, 13 endpoints (#15952)
Ports the agent canvas subsystem from Python to Go.
## What's included
### Canvas Engine (Phase 0/1)
- State engine, scheduler, variable resolver, Redis checkpoint store,
cancel protocol
- **209 tests** across canvas / component / io packages
### 22 Components (P0–P4)
| Tier | Components |
|---|---|
| P0 T1+T2+T3 | LLM, Agent, ExitLoop, Switch, Categorize, Begin,
Message, Invoke |
| P1 T3 | VariableAggregator, VariableAssigner, StringTransform,
ListOperations, DataOperations |
| P2 T3 | Iteration, IterationItem, Loop, LoopItem |
| P3 T3 | UserFillUp, Fillup |
| P4 T5 | Browser, ExcelProcessor, DocsGenerator |
### DSL v2 Schema (Phase 2.5)
- Typed v2 in-memory model with v1-to-v2 auto-detect converter
- v1 legacy field stripping per plan §2.11.7
### HTTP Endpoints & Bug Fixes (Plans PR1–PR3)
- **DELETE SQL bug fix**: gorm v2 `Where("id = ?", id).Delete(...)`
pattern
- **CreateAgent validation**: title/DSL required, duplicate check, 103
envelope
- **13 new endpoints**: templates, prompts, tags, sessions CRUD,
chat/completions (SSE + non-stream stubs), rerun, test_db_connection,
logs, webhook/logs
- **756 Go unit tests** (745 → 756, +18)
- **17 → 0 Python integration test failures** (test_agents.py +
test_session_management/)
### Tools
21 eino tools: HTTPHelper, search tools, financial/data tools, mandatory
stubs
### Infrastructure
OTel observability, NATS message queue, DeepDoc gRPC client, SSRF
guards, IDOR mitigation
|
||
|
|
b28e134944 |
Feat: add local & ssh provider in admin panel (#15039)
### What problem does this PR solve? Feat: add local & ssh provider in admin panel ### Type of change - [x] New Feature (non-breaking change which adds functionality) |
||
|
|
f85e18afbc |
Refact: sandbox quickstart.md & add tutorial for code exec component (#14786)
### What problem does this PR solve? Refact: sandbox quickstart.md && add tutorial for code exec component ### Type of change - [x] Refactoring <img width="700" alt="img_v3_0211j_dcff835b-e3bb-4c77-9bc5-3b31a983229g" src="https://github.com/user-attachments/assets/7842fc0f-639a-458f-b164-bc81a99ce4a5" /> --------- Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> |
||
|
|
139b76d2b1 |
Chore(deps): Bump urllib3 from 2.6.3 to 2.7.0 in /agent/sandbox (#14824)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/releases">urllib3's releases</a>.</em></p> <blockquote> <h2>2.7.0</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Security</h2> <p>Addressed high-severity security issues. Impact was limited to specific use cases detailed in the accompanying advisories; overall user exposure was estimated to be marginal.</p> <ul> <li> <p>Decompression-bomb safeguards of the streaming API were bypassed:</p> <ol> <li>When <code>HTTPResponse.drain_conn()</code> was called after the response had been read and decompressed partially. (Reported by <a href="https://github.com/Cycloctane"><code>@Cycloctane</code></a>)</li> <li>During the second <code>HTTPResponse.read(amt=N)</code> or <code>HTTPResponse.stream(amt=N)</code> call when the response was decompressed using the official <a href="https://pypi.org/project/brotli/">Brotli</a> library. (Reported by <a href="https://github.com/kimkou2024"><code>@kimkou2024</code></a>)</li> </ol> <p>See GHSA-mf9v-mfxr-j63j for details.</p> </li> <li> <p>HTTP pools created using <code>ProxyManager.connection_from_url</code> did not strip sensitive headers specified in <code>Retry.remove_headers_on_redirect</code> when redirecting to a different host. (GHSA-qccp-gfcp-xxvc reported by <a href="https://github.com/christos-spearbit"><code>@christos-spearbit</code></a>)</p> </li> </ul> <h2>Deprecations and Removals</h2> <ul> <li>Used <code>FutureWarning</code> instead of <code>DeprecationWarning</code> for better visibility of existing deprecation notices. Rescheduled the removal of deprecated features to version 3.0. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3763">urllib3/urllib3#3763</a>)</li> <li>Removed support for end-of-life Python 3.9. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3720">urllib3/urllib3#3720</a>)</li> <li>Removed support for end-of-life PyPy3.10. (<a href="https://redirect.github.com/urllib3/urllib3/issues/4979">urllib3/urllib3#4979</a>)</li> <li>Bumped the minimum supported pyOpenSSL version to 19.0.0. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3777">urllib3/urllib3#3777</a>)</li> </ul> <h2>Bugfixes</h2> <ul> <li>Fixed a bug where <code>HTTPResponse.read(amt=None)</code> was ignoring decompressed data buffered from previous partial reads. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3636">urllib3/urllib3#3636</a>)</li> <li>Fixed a bug where <code>HTTPResponse.read()</code> could cache only part of the response after a partial read when <code>cache_content=True</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/4967">urllib3/urllib3#4967</a>)</li> <li>Fixed <code>HTTPResponse.stream()</code> and <code>HTTPResponse.read_chunked()</code> to handle <code>amt=0</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3793">urllib3/urllib3#3793</a>)</li> <li>Updated <code>_TYPE_BODY</code> type alias to include missing <code>Iterable[str]</code>, matching the documented and runtime behavior of chunked request bodies. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3798">urllib3/urllib3#3798</a>)</li> <li>Fixed <code>LocationParseError</code> when paths resembling schemeless URIs were passed to <code>HTTPConnectionPool.urlopen()</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3352">urllib3/urllib3#3352</a>)</li> <li>Fixed <code>BaseHTTPResponse.readinto()</code> type annotation to accept <code>memoryview</code> in addition to <code>bytearray</code>, matching the <code>io.RawIOBase.readinto</code> contract and enabling use with <code>io.BufferedReader</code> without type errors. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3764">urllib3/urllib3#3764</a>)</li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's changelog</a>.</em></p> <blockquote> <h1>2.7.0 (2026-05-07)</h1> <h2>Security</h2> <p>Addressed high-severity security issues. Impact was limited to specific use cases detailed in the accompanying advisories; overall user exposure was estimated to be marginal.</p> <ul> <li> <p>Decompression-bomb safeguards of the streaming API were bypassed:</p> <ol> <li>When <code>HTTPResponse.drain_conn()</code> was called after the response had been read and decompressed partially.</li> <li>During the second <code>HTTPResponse.read(amt=N)</code> or <code>HTTPResponse.stream(amt=N)</code> call when the response was decompressed using the official <code>Brotli <https://pypi.org/project/brotli/></code>__ library.</li> </ol> <p>See <code>GHSA-mf9v-mfxr-j63j <https://github.com/urllib3/urllib3/security/advisories/GHSA-mf9v-mfxr-j63j></code>__ for details.</p> </li> <li> <p>HTTP pools created using <code>ProxyManager.connection_from_url</code> did not strip sensitive headers specified in <code>Retry.remove_headers_on_redirect</code> when redirecting to a different host. (<code>GHSA-qccp-gfcp-xxvc <https://github.com/urllib3/urllib3/security/advisories/GHSA-qccp-gfcp-xxvc></code>__)</p> </li> </ul> <h2>Deprecations and Removals</h2> <ul> <li>Used <code>FutureWarning</code> instead of <code>DeprecationWarning</code> for better visibility of existing deprecation notices. Rescheduled the removal of deprecated features to version 3.0. (<code>[#3763](https://github.com/urllib3/urllib3/issues/3763) <https://github.com/urllib3/urllib3/issues/3763></code>__)</li> <li>Removed support for end-of-life Python 3.9. (<code>[#3720](https://github.com/urllib3/urllib3/issues/3720) <https://github.com/urllib3/urllib3/issues/3720></code>__)</li> <li>Removed support for end-of-life PyPy3.10. (<code>[#4979](https://github.com/urllib3/urllib3/issues/4979) <https://github.com/urllib3/urllib3/issues/4979></code>__)</li> <li>Bumped the minimum supported pyOpenSSL version to 19.0.0. (<code>[#3777](https://github.com/urllib3/urllib3/issues/3777) <https://github.com/urllib3/urllib3/issues/3777></code>__)</li> </ul> <h2>Bugfixes</h2> <ul> <li>Fixed a bug where <code>HTTPResponse.read(amt=None)</code> was ignoring decompressed data buffered from previous partial reads. (<code>[#3636](https://github.com/urllib3/urllib3/issues/3636) <https://github.com/urllib3/urllib3/issues/3636></code>__)</li> <li>Fixed a bug where <code>HTTPResponse.read()</code> could cache only part of the response after a partial read when <code>cache_content=True</code>.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
e6cb9faace |
fix: close two security analyzer bypass paths in sandbox executor (#14690)
## Summary
Two bypass vectors in the sandbox code security analyzer allowed
malicious code to pass the safety check undetected and reach the Docker
executor.
### 1. JavaScript: template-literal bypass of `require()` block
The `SecureJavaScriptAnalyzer` regex patterns used `['"]` to match
module names, covering only single and double quotes. An attacker could
use ES6 template literals to bypass all three `require` checks:
`javascript
const cp = require(`child_process`);
async function main() {
return cp.execSync('cat /etc/passwd').toString();
}
`
The same bypass applied to `fs` and `worker_threads`.
**Fix:** Updated all three `require` patterns from `['"]` to `['"\]` to
also match backtick template literals.
### 2. Python: `builtins` not blocked + attribute-call blind spot in
`visit_Call`
`visit_Call` only checked `ast.Name` nodes, so attribute-style calls
like `module.func()` were invisible to the analyzer. Additionally,
`builtins` was absent from `DANGEROUS_IMPORTS`. Combined, this allowed:
`python
import builtins
def main():
builtins.exec('import os; os.system("id")')
`
Neither the import nor the exec call triggered any flag.
**Fix:** Added `builtins` to `DANGEROUS_IMPORTS` and added an
`ast.Attribute` branch to `visit_Call` so that `module.dangerous_func()`
style calls are caught alongside bare `dangerous_func()` calls.
## Tests
Added four regression tests covering each new bypass vector:
- `test_javascript_child_process_template_literal_is_rejected`
- `test_javascript_fs_template_literal_is_rejected`
- `test_python_builtins_import_is_rejected`
- `test_python_attribute_eval_call_is_rejected`
---------
Co-authored-by: bounty-hunter <bounty@hunter.local>
|
||
|
|
51b73850e1 |
feat: make sandbox Dockerfile mirrors optional with ARG (#14553)
### What problem does this PR solve? Resolves #14447. *(Note: This supersedes stalled PR #14448 and implements the requested CodeRabbitAI fixes).* Currently, the Dockerfiles inside `agent/sandbox/sandbox_base_image` (both Python and Node.js) have hardcoded Chinese package mirrors. This forces the mirrors on all users globally, which causes build network timeouts for contributors outside of China. This PR introduces an enhancement to fix the issue by: 1. Implementing the `NEED_MIRROR` build argument in the sandbox Dockerfiles. 2. Replacing static `ENV` instructions with conditional shell logic inside `RUN` blocks to dynamically set the package registries. 3. Allowing the build to cleanly fall back to default global registries (`pypi.org` and `npmjs.org`) when `--build-arg NEED_MIRROR=0` is passed. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com> |
||
|
|
c29335cbff |
Feat: support local provider for code exec component & remove some outdated models (#14637)
### What problem does this PR solve? Feat: support local provider for code exec component & remove some outdated models ### Type of change - [x] New Feature (non-breaking change which adds functionality) |
||
|
|
f554f6ae85 |
chore(docs): tips for installing CN fonts (#14189)
### What problem does this PR solve? Add tips for installing Chinse fonts under code sandbox. Otherwise, `matplotlib `won't render Chinese correctly. <img width="2082" height="1186" alt="sales_analysis" src="https://github.com/user-attachments/assets/57e675ab-1e92-4662-9aeb-ad72a6121eb5" /> ### Type of change - [x] Documentation Update |
||
|
|
2b6c50734f |
Sync code from EE (#14080)
### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> |
||
|
|
1638083e18 |
Fix: sandbox cannot accept large args list (#14063)
### What problem does this PR solve? Sandbox cannot accept large args list. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) |
||
|
|
3064895bbb |
Fix: import error in sandbox provider (#13971)
### What problem does this PR solve? Fix import error in sandbox provider. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated internal configuration import mechanism for sandbox provider initialization. No end-user impact. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
112007243d |
Refa: refine code_exec component (#13925)
### What problem does this PR solve? Refine code_exec component. ### Type of change - [x] Refactoring |
||
|
|
dd839f30e8 |
Fix: code supports matplotlib (#13724)
### What problem does this PR solve? Code as "final" node:  Code as "mid" node:  ### Type of change - [x] New Feature (non-breaking change which adds functionality) |
||
|
|
c35b210c3a |
fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424)
### What problem does this PR solve? This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox component by upgrading the requests library from version 2.32.3 to 2.32.5. The vulnerability allows .netrc credentials to leak via malicious URLs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) |
||
|
|
8c9b080499 |
fix: update axios to 1.13.5+ to remediate CVE-2026-25639 DoS vulnerability (#13380)
### What problem does this PR solve? This PR remediates CVE-2026-25639, a HIGH severity Denial of Service vulnerability in axios caused by __proto__ pollution in the mergeConfig function. The vulnerability affects both the web frontend and the sandbox nodejs environment. Trivy security scan identified axios versions below 1.13.5 as vulnerable. This PR updates axios to secure versions (1.13.6 in web, 1.13.5 in sandbox) to eliminate the security risk. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) |
||
|
|
33ba955b02 |
Translate Chinese text to English in agent/sandbox (#13356)
Chinese text remained in generated code comments, log messages, field descriptions, and documentation files under `agent/sandbox/`. ### Changes - **`tests/MIGRATION_GUIDE.md`** — Full EN translation (migration guide from OpenSandbox → Code Interpreter) - **`tests/QUICKSTART.md`** — Full EN translation (quick test guide for Aliyun sandbox provider) - **`providers/aliyun_codeinterpreter.py`** — Removed `(主账号ID)` from docstring, error log, and config field description - **`sandbox_spec.md`** — Removed `(主账号ID)` from `account_id` field description - **`tests/test_aliyun_codeinterpreter_integration.py`** — Removed `(主账号ID)` from inline comment ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): <!-- START COPILOT CODING AGENT TIPS --> --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuzhichang <153784+yuzhichang@users.noreply.github.com> |
||
|
|
ba95167e13 |
Clean directories (#13061)
### Type of change - [x] Documentation Update |
||
|
|
47e55ab324 |
Chore(deps): Bump starlette from 0.46.2 to 0.49.1 in /agent/sandbox (#12878)
Bumps [starlette](https://github.com/Kludex/starlette) from 0.46.2 to 0.49.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/Kludex/starlette/releases">starlette's releases</a>.</em></p> <blockquote> <h2>Version 0.49.1</h2> <p>This release fixes a security vulnerability in the parsing logic of the <code>Range</code> header in <code>FileResponse</code>.</p> <p>You can view the full security advisory: <a href="https://github.com/Kludex/starlette/security/advisories/GHSA-7f5h-v6xp-fcq8">GHSA-7f5h-v6xp-fcq8</a></p> <h2>Fixed</h2> <ul> <li>Optimize the HTTP ranges parsing logic <a href=" |
||
|
|
82b932dbc7 |
Chore(deps): Bump urllib3 from 2.4.0 to 2.6.3 in /agent/sandbox (#12877)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.4.0 to 2.6.3. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/releases">urllib3's releases</a>.</em></p> <blockquote> <h2>2.6.3</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Fixed a security issue where decompression-bomb safeguards of the streaming API were bypassed when HTTP redirects were followed. (CVE-2026-21441 reported by <a href="https://github.com/D47A"><code>@D47A</code></a>, 8.9 High, GHSA-38jv-5279-wg99)</li> <li>Started treating <code>Retry-After</code> times greater than 6 hours as 6 hours by default. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3743">urllib3/urllib3#3743</a>)</li> <li>Fixed <code>urllib3.connection.VerifiedHTTPSConnection</code> on Emscripten. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3752">urllib3/urllib3#3752</a>)</li> </ul> <h2>2.6.2</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Fixed <code>HTTPResponse.read_chunked()</code> to properly handle leftover data in the decoder's buffer when reading compressed chunked responses. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3734">urllib3/urllib3#3734</a>)</li> </ul> <h2>2.6.1</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Restore previously removed <code>HTTPResponse.getheaders()</code> and <code>HTTPResponse.getheader()</code> methods. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3731">#3731</a>)</li> </ul> <h2>2.6.0</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Security</h2> <ul> <li>Fixed a security issue where streaming API could improperly handle highly compressed HTTP content ("decompression bombs") leading to excessive resource consumption even when a small amount of data was requested. Reading small chunks of compressed data is safer and much more efficient now. (CVE-2025-66471 reported by <a href="https://github.com/Cycloctane"><code>@Cycloctane</code></a>, 8.9 High, GHSA-2xpw-w6gg-jr37)</li> <li>Fixed a security issue where an attacker could compose an HTTP response with virtually unlimited links in the <code>Content-Encoding</code> header, potentially leading to a denial of service (DoS) attack by exhausting system resources during decoding. The number of allowed chained encodings is now limited to 5. (CVE-2025-66418 reported by <a href="https://github.com/illia-v"><code>@illia-v</code></a>, 8.9 High, GHSA-gm62-xv2j-4w53)</li> </ul> <blockquote> <p>[!IMPORTANT]</p> <ul> <li>If urllib3 is not installed with the optional <code>urllib3[brotli]</code> extra, but your environment contains a Brotli/brotlicffi/brotlipy package anyway, make sure to upgrade it to at least Brotli 1.2.0 or brotlicffi 1.2.0.0 to benefit from the security fixes and avoid warnings. Prefer using <code>urllib3[brotli]</code> to install a compatible Brotli package automatically.</li> </ul> </blockquote> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's changelog</a>.</em></p> <blockquote> <h1>2.6.3 (2026-01-07)</h1> <ul> <li>Fixed a high-severity security issue where decompression-bomb safeguards of the streaming API were bypassed when HTTP redirects were followed. (<code>GHSA-38jv-5279-wg99 <https://github.com/urllib3/urllib3/security/advisories/GHSA-38jv-5279-wg99></code>__)</li> <li>Started treating <code>Retry-After</code> times greater than 6 hours as 6 hours by default. (<code>[#3743](https://github.com/urllib3/urllib3/issues/3743) <https://github.com/urllib3/urllib3/issues/3743></code>__)</li> <li>Fixed <code>urllib3.connection.VerifiedHTTPSConnection</code> on Emscripten. (<code>[#3752](https://github.com/urllib3/urllib3/issues/3752) <https://github.com/urllib3/urllib3/issues/3752></code>__)</li> </ul> <h1>2.6.2 (2025-12-11)</h1> <ul> <li>Fixed <code>HTTPResponse.read_chunked()</code> to properly handle leftover data in the decoder's buffer when reading compressed chunked responses. (<code>[#3734](https://github.com/urllib3/urllib3/issues/3734) <https://github.com/urllib3/urllib3/issues/3734></code>__)</li> </ul> <h1>2.6.1 (2025-12-08)</h1> <ul> <li>Restore previously removed <code>HTTPResponse.getheaders()</code> and <code>HTTPResponse.getheader()</code> methods. (<code>[#3731](https://github.com/urllib3/urllib3/issues/3731) <https://github.com/urllib3/urllib3/issues/3731></code>__)</li> </ul> <h1>2.6.0 (2025-12-05)</h1> <h2>Security</h2> <ul> <li>Fixed a security issue where streaming API could improperly handle highly compressed HTTP content ("decompression bombs") leading to excessive resource consumption even when a small amount of data was requested. Reading small chunks of compressed data is safer and much more efficient now. (<code>GHSA-2xpw-w6gg-jr37 <https://github.com/urllib3/urllib3/security/advisories/GHSA-2xpw-w6gg-jr37></code>__)</li> <li>Fixed a security issue where an attacker could compose an HTTP response with virtually unlimited links in the <code>Content-Encoding</code> header, potentially leading to a denial of service (DoS) attack by exhausting system resources during decoding. The number of allowed chained encodings is now limited to 5. (<code>GHSA-gm62-xv2j-4w53 <https://github.com/urllib3/urllib3/security/advisories/GHSA-gm62-xv2j-4w53></code>__)</li> </ul> <p>.. caution::</p> <ul> <li>If urllib3 is not installed with the optional <code>urllib3[brotli]</code> extra, but your environment contains a Brotli/brotlicffi/brotlipy package anyway, make sure to upgrade it to at least Brotli 1.2.0 or brotlicffi 1.2.0.0 to benefit from the security fixes and avoid warnings. Prefer using</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
fd11aca8e5 |
feat: Implement pluggable multi-provider sandbox architecture (#12820)
## Summary Implement a flexible sandbox provider system supporting both self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for secure code execution in agent workflows. **Key Changes:** - ✅ Aliyun Code Interpreter provider using official `agentrun-sdk>=0.0.16` - ✅ Self-managed provider with gVisor (runsc) security - ✅ Arguments parameter support for dynamic code execution - ✅ Database-only configuration (removed fallback logic) - ✅ Configuration scripts for quick setup Issue #12479 ## Features ### 🔌 Provider Abstraction Layer **1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`) - Wraps existing executor_manager HTTP API - gVisor (runsc) for secure container isolation - Configurable pool size, timeout, retry logic - Languages: Python, Node.js, JavaScript - ⚠️ **Requires**: gVisor installation, Docker, base images **2. Aliyun Code Interpreter** (`agent/sandbox/providers/aliyun_codeinterpreter.py`) - SaaS integration using official agentrun-sdk - Serverless microVM execution with auto-authentication - Hard timeout: 30 seconds max - Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`, `AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION` - Automatically wraps code to call `main()` function **3. E2B Provider** (`agent/sandbox/providers/e2b.py`) - Placeholder for future integration ### ⚙️ Configuration System - `conf/system_settings.json`: Default provider = `aliyun_codeinterpreter` - `agent/sandbox/client.py`: Enforces database-only configuration - Admin UI: `/admin/sandbox-settings` - Configuration validation via `validate_config()` method - Health checks for all providers ### 🎯 Key Capabilities **Arguments Parameter Support:** All providers support passing arguments to `main()` function: ```python # User code def main(name: str, count: int) -> dict: return {"message": f"Hello {name}!" * count} # Executed with: arguments={"name": "World", "count": 3} # Result: {"message": "Hello World!Hello World!Hello World!"} ``` **Self-Describing Providers:** Each provider implements `get_config_schema()` returning form configuration for Admin UI **Error Handling:** Structured `ExecutionResult` with stdout, stderr, exit_code, execution_time ## Configuration Scripts Two scripts for quick Aliyun sandbox setup: **Shell Script (requires jq):** ```bash source scripts/configure_aliyun_sandbox.sh ``` **Python Script (interactive):** ```bash python3 scripts/configure_aliyun_sandbox.py ``` ## Testing ```bash # Unit tests uv run pytest agent/sandbox/tests/test_providers.py -v # Aliyun provider tests uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v # Integration tests (requires credentials) uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v # Quick SDK validation python3 agent/sandbox/tests/verify_sdk.py ``` **Test Coverage:** - 30 unit tests for provider abstraction - Provider-specific tests for Aliyun - Integration tests with real API - Security tests for executor_manager ## Documentation - `docs/develop/sandbox_spec.md` - Complete architecture specification - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy sandbox - `agent/sandbox/tests/QUICKSTART.md` - Quick start guide - `agent/sandbox/tests/README.md` - Testing documentation ## Breaking Changes ⚠️ **Migration Required:** 1. **Directory Move**: `sandbox/` → `agent/sandbox/` - Update imports: `from sandbox.` → `from agent.sandbox.` 2. **Mandatory Configuration**: - SystemSettings must have `sandbox.provider_type` configured - Removed fallback default values - Configuration must exist in database (from `conf/system_settings.json`) 3. **Aliyun Credentials**: - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`) - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID) 4. **Self-Managed Provider**: - gVisor (runsc) must be installed for security - Install: `go install gvisor.dev/gvisor/runsc@latest` ## Database Schema Changes ```python # SystemSettings.value: CharField → TextField api/db/db_models.py: Changed for unlimited config length # SystemSettingsService.get_by_name(): Fixed query precision api/db/services/system_settings_service.py: startswith → exact match ``` ## Files Changed ### Backend (Python) - `agent/sandbox/providers/base.py` - SandboxProvider ABC interface - `agent/sandbox/providers/manager.py` - ProviderManager - `agent/sandbox/providers/self_managed.py` - Self-managed provider - `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider - `agent/sandbox/providers/e2b.py` - E2B provider (placeholder) - `agent/sandbox/client.py` - Unified client (enforces DB-only config) - `agent/tools/code_exec.py` - Updated to use provider system - `admin/server/services.py` - SandboxMgr with registry & validation - `admin/server/routes.py` - 5 sandbox API endpoints - `conf/system_settings.json` - Default: aliyun_codeinterpreter - `api/db/db_models.py` - TextField for SystemSettings.value - `api/db/services/system_settings_service.py` - Exact match query ### Frontend (TypeScript/React) - `web/src/pages/admin/sandbox-settings.tsx` - Settings UI - `web/src/services/admin-service.ts` - Sandbox service functions - `web/src/services/admin.service.d.ts` - Type definitions - `web/src/utils/api.ts` - Sandbox API endpoints ### Documentation - `docs/develop/sandbox_spec.md` - Architecture spec - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide - `agent/sandbox/tests/QUICKSTART.md` - Quick start - `agent/sandbox/tests/README.md` - Testing guide ### Configuration Scripts - `scripts/configure_aliyun_sandbox.sh` - Shell script (jq) - `scripts/configure_aliyun_sandbox.py` - Python script ### Tests - `agent/sandbox/tests/test_providers.py` - 30 unit tests - `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests - `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` - Integration tests - `agent/sandbox/tests/verify_sdk.py` - SDK validation ## Architecture ``` Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B] ↓ SystemSettings ``` ## Usage ### 1. Configure Provider **Via Admin UI:** 1. Navigate to `/admin/sandbox-settings` 2. Select provider (Aliyun Code Interpreter / Self-Managed) 3. Fill in configuration 4. Click "Test Connection" to verify 5. Click "Save" to apply **Via Configuration Scripts:** ```bash # Aliyun provider export AGENTRUN_ACCESS_KEY_ID="xxx" export AGENTRUN_ACCESS_KEY_SECRET="yyy" export AGENTRUN_ACCOUNT_ID="zzz" export AGENTRUN_REGION="cn-shanghai" source scripts/configure_aliyun_sandbox.sh ``` ### 2. Restart Service ```bash cd docker docker compose restart ragflow-server ``` ### 3. Execute Code in Agent ```python from agent.sandbox.client import execute_code result = execute_code( code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}', language="python", timeout=30, arguments={"name": "World"} ) print(result.stdout) # {"message": "Hello World!"} ``` ## Troubleshooting ### "Container pool is busy" (Self-Managed) - **Cause**: Pool exhausted (default: 1 container in `.env`) - **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+ ### "Sandbox provider type not configured" - **Cause**: Database missing configuration - **Fix**: Run config script or set via Admin UI ### "gVisor not found" - **Cause**: runsc not installed - **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp ~/go/bin/runsc /usr/local/bin/` ### Aliyun authentication errors - **Cause**: Wrong environment variable names - **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`) ## Checklist - [x] All tests passing (30 unit tests + integration tests) - [x] Documentation updated (spec, migration guide, quickstart) - [x] Type definitions added (TypeScript) - [x] Admin UI implemented - [x] Configuration validation - [x] Health checks implemented - [x] Error handling with structured results - [x] Breaking changes documented - [x] Configuration scripts created - [x] gVisor requirements documented Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> |