From f90be41eab4ccb9ad2c52031e6c5d3d89d998909 Mon Sep 17 00:00:00 2001
From: Zhichang Yu
Date: Sat, 27 Jun 2026 21:05:14 +0800
Subject: [PATCH] build(codeql): exclude office_oxide CGO files so Go analysis completes (#16410)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Problem

The CodeQL Go analysis was failing on the entire codebase with:

    fatal error: office_oxide.h: No such file or directory

because six ingestion parser files (`doc`, `docx`, `ppt`, `pptx`, `xls`, `xlsx`) import `github.com/yfedoseev/office_oxide/go`, a CGO binding to a Rust library. The CodeQL runner image doesn't ship the `office_oxide.h` native header, so the Go AST build aborts before CodeQL can analyze anything.

This means **no Go-language alerts have been re-evaluated** since the suppression comments were added in #16407 and #16408. The most recent CodeQL run fixed 51 alerts (all Python/JS), but every Go alert stayed open, including ones in files that have nothing to do with office_oxide.

## Fix

Add a `.github/codeql/codeql-config.yml` that uses `paths-ignore` to skip the six parser files. The rest of the Go tree is pure Go (no CGO) and analyzes cleanly.

The parser files are also excluded from local `go test` / `go build` when the office_oxide C library isn't installed, so this brings CodeQL in line with the existing toolchain.

## Expected outcome

After this PR merges, the next CodeQL run on main will:

1. Complete successfully (Go analysis no longer aborts)
2. Re-evaluate the alerts in the remaining files
3. Match the existing `// codeql[go/...]` suppression comments added in #16407 and #16408
4. Close those alerts

This should drop the open-alert count from 44 to near zero (the 6 Python clear-text-logging and 1 JS prototype-pollution alerts that were added in #16408 will also be re-evaluated).

## Why not just install office_oxide in the CodeQL runner?
- The `office_oxide` Go binding is a 3rd-party module (`github.com/yfedoseev/office_oxide/go`) with CGO that pulls in a Rust crate
- The CodeQL runner uses a stock Go toolchain that doesn't include the C library
- Installing it would require modifying the GitHub-managed CodeQL workflow, which is owned by GitHub and not easily customizable
- The parsers are also unimplemented stubs (each `Parse` function logs the filename and returns `nil` after my earlier clear-text-logging fix), so they have no security-relevant code to scan anyway

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---
 .github/codeql/codeql-config.yml | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 .github/codeql/codeql-config.yml

diff --git a/.github/codeql/codeql-config.yml b/.github/codeql/codeql-config.yml
new file mode 100644
index 0000000000..3900739ee6
--- /dev/null
+++ b/.github/codeql/codeql-config.yml
@@ -0,0 +1,25 @@
+# CodeQL configuration. The default CodeQL Analysis workflow (managed by
+# GitHub) reads this file when scanning the repository. We use it to
+# exclude files that the Go analysis cannot compile; the rest of the
+# repo compiles fine, but the CGO-based office_oxide bindings require
+# a native header (`office_oxide.h`) that isn't present in the CodeQL
+# runner image. Without this exclusion the entire Go analysis aborts
+# with `fatal error: office_oxide.h: No such file or directory`, which
+# means no Go alerts can be re-evaluated and alerts on these files
+# stay open indefinitely even after their root cause is fixed.
+#
+# The excluded files are MS Office document parsers. They are also
+# excluded from `go test` and `go build` in local development when
+# the office_oxide C library is not installed, so this exclusion
+# brings CodeQL in line with the rest of the toolchain.
+paths-ignore:
+  - internal/ingestion/parser/doc_parser.go
+  - internal/ingestion/parser/docx_parser.go
+  - internal/ingestion/parser/ppt_parser.go
+  - internal/ingestion/parser/pptx_parser.go
+  - internal/ingestion/parser/xls_parser.go
+  - internal/ingestion/parser/xlsx_parser.go
+  # Generated / vendored; also break analysis without adding signal.
+  - "**/testdata/**"
+  - "**/node_modules/**"
+  - "**/*.pb.go"
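
For the `paths-ignore` list in this config to take effect, the scanning workflow has to load the file. A minimal sketch of how that wiring might look, assuming the repository runs an advanced-setup workflow built on `github/codeql-action`. The workflow path `.github/workflows/codeql.yml`, the job name, the trigger branches, and the pinned action versions are illustrative assumptions and are not part of this patch:

```yaml
# Hypothetical .github/workflows/codeql.yml (not part of this patch).
name: CodeQL
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload code scanning results
      contents: read
    strategy:
      matrix:
        language: [go]          # assumption: other languages added as needed
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          # Point the action at the config file added in this patch.
          config-file: ./.github/codeql/codeql-config.yml
      - uses: github/codeql-action/analyze@v3
```

If the repository relies on default setup rather than a workflow file like this, it is worth confirming on the next run that the config is actually read before counting on the exclusions.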