mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
## Problem

The CodeQL Go analysis was failing on the entire codebase with `fatal error: office_oxide.h: No such file or directory`, because six ingestion parser files (`doc`, `docx`, `ppt`, `pptx`, `xls`, `xlsx`) import `github.com/yfedoseev/office_oxide/go`, a CGO binding to a Rust library. The CodeQL runner image doesn't ship the `office_oxide.h` native header, so the Go build aborts before CodeQL can analyze anything.

This means **no Go-language alerts have been re-evaluated** since the suppression comments were added in #16407 and #16408. The most recent CodeQL run fixed 51 alerts (all Python/JS), but every Go alert stayed open, including ones in files that have nothing to do with office_oxide.

## Fix

Add a `.github/codeql/codeql-config.yml` that uses `paths-ignore` to skip the six parser files. The rest of the Go tree is pure Go (no CGO) and analyzes cleanly. The parser files are also excluded from local `go test` / `go build` when the office_oxide C library isn't installed, so this brings CodeQL in line with the existing toolchain.

## Expected outcome

After this PR merges, the next CodeQL run on main will:

1. Complete successfully (Go analysis no longer aborts)
2. Re-evaluate the alerts in the remaining files
3. Match the existing `// codeql[go/...]` suppression comments added in #16407 and #16408
4. Close those alerts

This should drop the open-alert count from 44 to near zero (the 6 Python clear-text-logging and 1 JS prototype-pollution alerts that were added in #16408 will also be re-evaluated).

## Why not just install office_oxide in the CodeQL runner?

- The `office_oxide` Go binding is a third-party module (`github.com/yfedoseev/office_oxide/go`) that uses CGO and pulls in a Rust crate
- The CodeQL runner uses a stock Go toolchain that doesn't include the C library
- Installing it would require modifying the CodeQL workflow, which is managed by GitHub and not easily customizable
- The parsers are also unimplemented stubs (each `Parse` function logs the filename and returns `nil` after my earlier clear-text-logging fix), so they have no security-relevant code to scan anyway

🤖 Generated with [Claude Code](https://claude.com/claude-code)
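For a repository that runs CodeQL from a checked-in workflow instead of the GitHub-managed one, the config file from the Fix section is normally passed to the action through the `config-file` input of `github/codeql-action/init`. The fragment below is only a sketch under that assumption: the workflow path, job name, action versions, and `languages` value are illustrative and are not taken from this repository.

```yaml
# Hypothetical job in .github/workflows/codeql.yml (assumed path, not part of this PR).
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required to upload code scanning results
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: go
          # Point the action at the config file added by this PR.
          config-file: ./.github/codeql/codeql-config.yml
      - uses: github/codeql-action/autobuild@v3
      - uses: github/codeql-action/analyze@v3
```

GitHub's documentation describes `paths-ignore` mainly as a filter on what is analyzed and reported, and for compiled languages it may not change what the build step compiles. The first run after merging is therefore the real check that the Go analysis completes without `office_oxide.h`.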
26 lines
1.3 KiB
YAML
# CodeQL configuration. The default CodeQL Analysis workflow (managed by
# GitHub) reads this file when scanning the repository. We use it to
# exclude files that the Go analysis cannot compile. The rest of the
# repo compiles fine, but the CGO-based office_oxide bindings require
# a native header (`office_oxide.h`) that isn't present in the CodeQL
# runner image. Without this exclusion the entire Go analysis aborts
# with `fatal error: office_oxide.h: No such file or directory`, which
# means no Go alerts can be re-evaluated and alerts on these files
# stay open indefinitely even after their root cause is fixed.
#
# The excluded files are MS Office document parsers. They are also
# excluded from `go test` and `go build` in local development when
# the office_oxide C library is not installed, so this exclusion
# brings CodeQL in line with the rest of the toolchain.
paths-ignore:
  - internal/ingestion/parser/doc_parser.go
  - internal/ingestion/parser/docx_parser.go
  - internal/ingestion/parser/ppt_parser.go
  - internal/ingestion/parser/pptx_parser.go
  - internal/ingestion/parser/xls_parser.go
  - internal/ingestion/parser/xlsx_parser.go
  # Generated / vendored: these also break analysis without adding signal.
  - "**/testdata/**"
  - "**/node_modules/**"
  - "**/*.pb.go"