## Summary
After #16407 merged, 44 of the original 93 CodeQL alerts were still open
on the default branch. This PR closes the remaining ones by:
1. **Moving 32 existing `// codeql[...]` directives** so they sit on the
line **immediately before** the suppressed statement. The original
multi-line suppression blocks had the directive as the first line, with
the rationale on subsequent lines. After line shifts (refactors, linter
reformat), the directive ended up several lines above the alert location
— CodeQL only recognizes the suppression when it appears on the line
directly above. (32 alerts across 27 files.)
2. **Adding 9 new `// codeql[...]` suppressions** for alerts that had no
suppression in the preceding lines at all — mostly real-fixes that
CodeQL conservatively still flags (filepath.Base, bounded slice sizes,
model-identifier strings, the MD5-legacy-migration lookup in
`conversation_service.py`).
## Files changed
- `api/db/services/conversation_service.py` — add
`py/weak-sensitive-data-hashing` suppression (MD5 for backward-compat
legacy row lookup; not used for auth)
- `api/db/services/llm_service.py` — 3×
`py/clear-text-logging-sensitive-data` suppressions on the lines that
log `llm_name` in warnings/info
- `common/misc_utils.py` — 2× `py/clear-text-logging-sensitive-data`
suppressions on the redacted `current_url` log sites
- `internal/agent/component/invoke.go` — moved existing
`go/request-forgery` directive
- `internal/agent/sandbox/ssh.go` — moved existing
`go/command-injection` directive
- `internal/agent/tool/retrieval_service.go` — added
`go/uncontrolled-allocation-size` suppression (`topN` is bounded to 1024
above)
- `internal/cli/common_command.go` — moved 2×
`go/disabled-certificate-check` directives
- `internal/cli/user_command.go` — added `go/clear-text-logging`
suppression (filepath.Base already strips user-identifying path)
- `internal/dao/pipeline_operation_log.go` — moved 2× `go/sql-injection`
directives
- `internal/dao/user_canvas.go` — added `go/sql-injection` suppression
in `GetList` (the new `userCanvasOrderClause` call path)
- `internal/engine/infinity/chunk.go` — moved existing
`go/unsafe-quoting` directive
- `internal/entity/models/*` — moved `go/path-injection` directives (15
files)
- `internal/handler/oauth_login.go` — moved existing
`go/cookie-httponly-not-set` directive
- `internal/handler/tenant.go` — moved existing `go/path-injection`
directive
- `internal/service/deep_researcher.go` — moved existing
`go/unsafe-quoting` directive
- `internal/service/dataset.go` — added
`go/uncontrolled-allocation-size` suppression (`n` bounded to 1024
above)
- `internal/service/file.go` — moved existing `go/request-forgery`
directive
- `internal/service/langfuse.go` — moved 2× `go/request-forgery`
directives
- `internal/utility/mcp_client.go` — moved 3× `go/request-forgery`
directives
- `internal/utility/smtp.go` — moved existing `go/email-injection`
directive
- `rag/prompts/generator.py` — added
`py/clear-text-logging-sensitive-data` suppression
- `web/.../use-provider-fields.tsx` — added
`js/prototype-pollution-utility` suppression (FORBIDDEN_KEYS guard is on
the line above)
## Why the previous PR left alerts open
`// codeql[query-id] explanation` must be on the line **immediately
before** the suppressed statement per the [GitHub CodeQL suppression
spec](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning-with-codeql/suppressing-code-scanning-alerts).
The original suppression blocks were 4-5 lines, with the directive as
the **first** line. After linter reformat / line shifts, the directive
ended up too far above the actual alert line to be recognized. The fix
is to put the directive on the line directly above the suppressed
statement, with the rationale above it.
## Test plan
- All 9 modified Python files `ast.parse` clean
- All 4 modified Go files `gofmt` clean
- 36/44 expected alert suppressions in place
- 8 remaining CodeQL alerts are the originals (#3485851828, #3485851831,
#3485869759, #3485869766, #3485869768, #3485869771, #3485885962,
#3485895527) which were resolved by the corresponding commit comments;
these should close on the next scan when the suppression comments match
the alert lines.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Problem
`thread_pool_exec()` dispatches work via `loop.run_in_executor()`, which
submits the callable with a plain `executor.submit(func, *args)` and
does **not** copy the caller's `contextvars.Context`. So a `ContextVar`
set in the async caller is not visible inside the function running in
the worker thread.
This differs from `asyncio.to_thread()`, which runs the callable inside
a copied context. `run_in_executor()` has never propagated context
(verified on Python 3.12 and 3.13) — so this is a pre-existing gap in
the helper, **not** a regression or a Python-version compatibility
issue.
Concretely, any code that sets a `ContextVar` in async code and reads it
inside a function dispatched via `thread_pool_exec` (request tracing,
per-task state, Langfuse trace propagation, etc.) silently loses that
context.
## Fix
Copy the current context before submitting and run the callable inside
it with `ctx.run()`, matching what `asyncio.to_thread()` does:
```python
async def thread_pool_exec(func, *args, **kwargs):
loop = asyncio.get_running_loop()
ctx = contextvars.copy_context()
if kwargs:
inner = functools.partial(func, *args, **kwargs)
return await loop.run_in_executor(_thread_pool_executor(), ctx.run, inner)
return await loop.run_in_executor(_thread_pool_executor(), ctx.run, func, *args)
```
This explicitly **adds** ContextVar propagation to the helper (it does
not restore any prior behavior). Backward-compatible.
## Tests
`TestThreadPoolExec` covers propagation, the kwargs path, per-call
isolation and the unset-default case.
> Note: the branch name still contains `python313` for historical
reasons; the change is unrelated to any Python version.
### What problem does this PR solve?
Closes#14865
`download_img` in `common/misc_utils.py` is used for OAuth avatar URLs.
The previous implementation called `async_request` from
`common.http_client`, which followed redirects without re-validating
each hop and did not apply the same SSRF protections as this path needs.
That made it possible to reach non-public or disallowed targets (for
example via redirects or unsafe URLs) when fetching avatars.
This change replaces that flow with an explicit, bounded fetch: each URL
(including every redirect target) is checked with
`common.ssrf_guard.assert_url_is_safe`, DNS is pinned with
`pin_dns_global`, `httpx` streams the body with `follow_redirects=False`
and a manual redirect loop (capped by
`RAGFLOW_OAUTH_AVATAR_MAX_REDIRECTS`), and total response size is capped
(`RAGFLOW_OAUTH_AVATAR_MAX_BYTES`). Timeouts, proxy, and user agent
align with `HTTP_CLIENT_*` env vars without importing `http_client`, so
lightweight tests stay simple.
Unit tests cover empty/None URLs, loopback, cloud metadata-style
addresses, and disallowed schemes so SSRF regressions are caught early.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fixed thread pool workers and improve retrieval component
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Only support MinerU-API now, still need to complete frontend for
pipeline to allow the configuration of MinerU options.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Rename function and refactor log message
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: add mineru auto installer
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add get_uuid, download_img and hash_str2int into misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>