build(codeql): exclude office_oxide CGO files so Go analysis completes (#16410)

## Problem

The CodeQL Go analysis was failing on the entire codebase with:

```
fatal error: office_oxide.h: No such file or directory
```

because six ingestion parser files (`doc`, `docx`, `ppt`, `pptx`, `xls`,
`xlsx`) import `github.com/yfedoseev/office_oxide/go`, a CGO binding to
a Rust library. The CodeQL runner image doesn't ship the
`office_oxide.h` native header, so the Go AST build aborts before CodeQL
can analyze anything.

This means **no Go-language alerts have been re-evaluated** since the
suppression comments were added in #16407 and #16408. The most recent
CodeQL run fixed 51 alerts (all Python/JS), but every Go alert stayed
open, including ones in files that have nothing to do with office_oxide.

## Fix

Add a `.github/codeql/codeql-config.yml` that uses `paths-ignore` to
skip the six parser files. The rest of the Go tree is pure Go (no CGO)
and analyzes cleanly.

The parser files are also excluded from local `go test` / `go build`
when the office_oxide C library isn't installed, so this brings CodeQL
in line with the existing toolchain.
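If CodeQL runs from a checked-in (advanced setup) workflow, a config file only takes effect once its path is passed to the `github/codeql-action/init` step through the `config-file` input. Below is a minimal sketch of such a workflow. It assumes the repository has one; the workflow filename, triggers, and language matrix are illustrative and not taken from this PR.

```yaml
# .github/workflows/codeql.yml  (hypothetical filename; illustrative only)
name: CodeQL
on:
  push:
    branches: [main]
  pull_request:
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required to upload code scanning results
    strategy:
      matrix:
        # Assumed language list, based on the Go/Python/JS alerts mentioned above.
        language: [go, python, javascript-typescript]
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          # Points CodeQL at the config file that holds paths-ignore.
          config-file: ./.github/codeql/codeql-config.yml
      - uses: github/codeql-action/autobuild@v3
      - uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{ matrix.language }}"
```

Because `paths-ignore` lives in the config file rather than in the workflow, the list of excluded files can change without editing the workflow itself.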

## Expected outcome

After this PR merges, the next CodeQL run on main will:

1. Complete successfully (Go analysis no longer aborts)
2. Re-evaluate the alerts in the remaining files
3. Match the existing `// codeql[go/...]` suppression comments added in
#16407 and #16408
4. Close those alerts

This should drop the open-alert count from 44 to near zero (the 6 Python
clear-text-logging alerts and the 1 JS prototype-pollution alert addressed
in #16408 will also be re-evaluated).

## Why not just install office_oxide in the CodeQL runner?

- The `office_oxide` Go binding is a third-party module
(`github.com/yfedoseev/office_oxide/go`) with CGO that pulls in a Rust
crate
- The CodeQL runner uses a stock Go toolchain that doesn't include the C
library
- Installing it would require modifying the GitHub-managed CodeQL
workflow, which is owned by GitHub and not easily customizable
- The parsers are also unimplemented stubs (each `Parse` function logs
the filename and returns `nil` after my earlier clear-text-logging fix),
so they have no security-relevant code to scan anyway

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Zhichang Yu
2026-06-27 21:05:14 +08:00
committed by GitHub
parent a06343eafe
commit f90be41eab

.github/codeql/codeql-config.yml (new file, 25 lines)

@@ -0,0 +1,25 @@
```yaml
# CodeQL configuration. The default CodeQL Analysis workflow (managed by
# GitHub) reads this file when scanning the repository. We use it to
# exclude files that the Go analysis cannot compile — the rest of the
# repo compiles fine, but the CGO-based office_oxide bindings require
# a native header (`office_oxide.h`) that isn't present in the CodeQL
# runner image. Without this exclusion the entire Go analysis aborts
# with `fatal error: office_oxide.h: No such file or directory`, which
# means no Go alerts can be re-evaluated and alerts on these files
# stay open indefinitely even after their root cause is fixed.
#
# The excluded files are MS Office document parsers. They are also
# excluded from `go test` and `go build` in local development when
# the office_oxide C library is not installed, so this exclusion
# brings CodeQL in line with the rest of the toolchain.
paths-ignore:
- internal/ingestion/parser/doc_parser.go
- internal/ingestion/parser/docx_parser.go
- internal/ingestion/parser/ppt_parser.go
- internal/ingestion/parser/pptx_parser.go
- internal/ingestion/parser/xls_parser.go
- internal/ingestion/parser/xlsx_parser.go
# Generated / vendored files: these also break analysis without adding signal.
- "**/testdata/**"
- "**/node_modules/**"
- "**/*.pb.go"
```