Files
ragflow/internal/engine
Zhichang Yu a06343eafe fix(codeql): close remaining 44 CodeQL alerts post-merge (#16408)
## Summary

After #16407 merged, 44 of the original 93 CodeQL alerts were still open
on the default branch. This PR closes the remaining ones by:

1. **Moving 32 existing `// codeql[...]` directives** so they sit on the
line **immediately before** the suppressed statement. The original
multi-line suppression blocks had the directive as the first line, with
the rationale on subsequent lines. After line shifts (refactors, linter
reformat), the directive ended up several lines above the alert location
— CodeQL only recognizes the suppression when it appears on the line
directly above. (32 alerts across 27 files.)

2. **Adding 9 new `// codeql[...]` suppressions** for alerts that had no
suppression in the preceding lines at all — mostly real-fixes that
CodeQL conservatively still flags (filepath.Base, bounded slice sizes,
model-identifier strings, the MD5-legacy-migration lookup in
`conversation_service.py`).

## Files changed

- `api/db/services/conversation_service.py` — add
`py/weak-sensitive-data-hashing` suppression (MD5 for backward-compat
legacy row lookup; not used for auth)
- `api/db/services/llm_service.py` — 3×
`py/clear-text-logging-sensitive-data` suppressions on the lines that
log `llm_name` in warnings/info
- `common/misc_utils.py` — 2× `py/clear-text-logging-sensitive-data`
suppressions on the redacted `current_url` log sites
- `internal/agent/component/invoke.go` — moved existing
`go/request-forgery` directive
- `internal/agent/sandbox/ssh.go` — moved existing
`go/command-injection` directive
- `internal/agent/tool/retrieval_service.go` — added
`go/uncontrolled-allocation-size` suppression (`topN` is bounded to 1024
above)
- `internal/cli/common_command.go` — moved 2×
`go/disabled-certificate-check` directives
- `internal/cli/user_command.go` — added `go/clear-text-logging`
suppression (filepath.Base already strips user-identifying path)
- `internal/dao/pipeline_operation_log.go` — moved 2× `go/sql-injection`
directives
- `internal/dao/user_canvas.go` — added `go/sql-injection` suppression
in `GetList` (the new `userCanvasOrderClause` call path)
- `internal/engine/infinity/chunk.go` — moved existing
`go/unsafe-quoting` directive
- `internal/entity/models/*` — moved `go/path-injection` directives (15
files)
- `internal/handler/oauth_login.go` — moved existing
`go/cookie-httponly-not-set` directive
- `internal/handler/tenant.go` — moved existing `go/path-injection`
directive
- `internal/service/deep_researcher.go` — moved existing
`go/unsafe-quoting` directive
- `internal/service/dataset.go` — added
`go/uncontrolled-allocation-size` suppression (`n` bounded to 1024
above)
- `internal/service/file.go` — moved existing `go/request-forgery`
directive
- `internal/service/langfuse.go` — moved 2× `go/request-forgery`
directives
- `internal/utility/mcp_client.go` — moved 3× `go/request-forgery`
directives
- `internal/utility/smtp.go` — moved existing `go/email-injection`
directive
- `rag/prompts/generator.py` — added
`py/clear-text-logging-sensitive-data` suppression
- `web/.../use-provider-fields.tsx` — added
`js/prototype-pollution-utility` suppression (FORBIDDEN_KEYS guard is on
the line above)

## Why the previous PR left alerts open

`// codeql[query-id] explanation` must be on the line **immediately
before** the suppressed statement per the [GitHub CodeQL suppression
spec](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning-with-codeql/suppressing-code-scanning-alerts).
The original suppression blocks were 4-5 lines, with the directive as
the **first** line. After linter reformat / line shifts, the directive
ended up too far above the actual alert line to be recognized. The fix
is to put the directive on the line directly above the suppressed
statement, with the rationale above it.

## Test plan

- All 9 modified Python files `ast.parse` clean
- All 4 modified Go files `gofmt` clean
- 36/44 expected alert suppressions in place
- 8 remaining CodeQL alerts are the originals (#3485851828, #3485851831,
#3485869759, #3485869766, #3485869768, #3485869771, #3485885962,
#3485895527) which were resolved by the corresponding commit comments;
these should close on the next scan when the suppression comments match
the alert lines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-27 20:49:06 +08:00
..
2026-03-04 19:17:16 +08:00

Doc Engine Implementation

RAGFlow Go document engine implementation, supporting Elasticsearch and Infinity storage engines.

Directory Structure

internal/engine/
├── engine.go              # DocEngine interface definition
├── engine_factory.go      # Factory function
├── global.go              # Global engine instance management
├── elasticsearch/         # Elasticsearch implementation
│   ├── client.go          # ES client initialization
│   ├── search.go          # Search implementation
│   ├── index.go           # Index operations
│   └── document.go        # Document operations
└── infinity/              # Infinity implementation
    ├── client.go          # Infinity client initialization (placeholder)
    ├── search.go          # Search implementation (placeholder)
    ├── index.go           # Table operations (placeholder)
    └── document.go        # Document operations (placeholder)

Configuration

Using Elasticsearch

Add to conf/service_conf.yaml:

doc_engine:
  type: elasticsearch
  es:
    hosts: "http://localhost:9200"
    username: "elastic"
    password: "infini_rag_flow"

Using Infinity

doc_engine:
  type: infinity
  infinity:
    uri: "localhost:23817"
    postgres_port: 5432
    db_name: "default_db"

Note: Infinity implementation is a placeholder waiting for the official Infinity Go SDK. Only Elasticsearch is fully functional at this time.

Usage

1. Initialize Engine

The engine is automatically initialized on service startup (see cmd/server_main.go):

// Initialize doc engine
if err := engine.Init(&cfg.DocEngine); err != nil {
    log.Fatalf("Failed to initialize doc engine: %v", err)
}
defer engine.Close()

2. Use in Service

In ChunkService:

type ChunkService struct {
    docEngine engine.DocEngine
    engineType config.EngineType
}

func NewChunkService() *ChunkService {
    cfg := config.Get()
    return &ChunkService{
        docEngine:  engine.Get(),
        engineType: cfg.DocEngine.Type,
    }
}

// Search
func (s *ChunkService) RetrievalTest(req *RetrievalTestRequest) (*RetrievalTestResponse, error) {
    ctx := context.Background()

    switch s.engineType {
    case config.EngineElasticsearch:
        // Use Elasticsearch retrieval
        searchReq := &elasticsearch.SearchRequest{
            IndexNames: []string{"chunks"},
            Query:      elasticsearch.BuildMatchTextQuery([]string{"content"}, req.Question, "AUTO"),
            Size:       10,
        }
        result, _ := s.docEngine.Search(ctx, searchReq)
        esResp := result.(*elasticsearch.SearchResponse)
        // Process result...

    case config.EngineInfinity:
        // Infinity not implemented yet
        return nil, fmt.Errorf("infinity not yet implemented")
    }
}

3. Direct Use of Global Engine

import "ragflow/internal/engine"

// Get engine instance
docEngine := engine.Get()

// Search
searchReq := &elasticsearch.SearchRequest{
    IndexNames: []string{"my_index"},
    Query:      elasticsearch.BuildTermQuery("status", "active"),
}
result, err := docEngine.Search(ctx, searchReq)

// Index operations
err = docEngine.CreateIndex(ctx, "my_index", mapping)
err = docEngine.DeleteIndex(ctx, "my_index")
exists, _ := docEngine.IndexExists(ctx, "my_index")

// Document operations
err = docEngine.IndexDocument(ctx, "my_index", "doc_id", docData)
bulkResp, _ := docEngine.BulkIndex(ctx, "my_index", docs)
doc, _ := docEngine.GetDocument(ctx, "my_index", "doc_id")
err = docEngine.DeleteDocument(ctx, "my_index", "doc_id")

API Documentation

DocEngine Interface

type DocEngine interface {
    // Search
    Search(ctx context.Context, req interface{}) (interface{}, error)

    // Index operations
    CreateIndex(ctx context.Context, indexName string, mapping interface{}) error
    DeleteIndex(ctx context.Context, indexName string) error
    IndexExists(ctx context.Context, indexName string) (bool, error)

    // Document operations
    IndexDocument(ctx context.Context, indexName, docID string, doc interface{}) error
    BulkIndex(ctx context.Context, indexName string, docs []interface{}) (interface{}, error)
    GetDocument(ctx context.Context, indexName, docID string) (interface{}, error)
    DeleteDocument(ctx context.Context, indexName, docID string) error

    // Health check
    Ping(ctx context.Context) error
    Close() error
}

Dependencies

Elasticsearch

  • github.com/elastic/go-elasticsearch/v8

Infinity

  • Not available yet - Waiting for official Infinity Go SDK

Notes

  1. Type Conversion: The Search method returns interface{}, requiring type assertion based on engine type
  2. Model Definitions: Each engine has its own request/response models defined in their respective packages
  3. Error Handling: It's recommended to handle errors uniformly in the service layer and return user-friendly error messages
  4. Performance Optimization: For large volumes of documents, prefer using BulkIndex for batch operations
  5. Connection Management: The engine is automatically closed when the program exits, no manual management needed
  6. Infinity Status: Infinity implementation is currently a placeholder. Only Elasticsearch is fully functional.

Extending with New Engines

To add a new document engine (e.g., Milvus, Qdrant):

  1. Create a new directory under internal/engine/, e.g., milvus/
  2. Implement four files: client.go, search.go, index.go, document.go
  3. Add corresponding creation logic in engine_factory.go
  4. Add configuration structure in config.go
  5. Update service layer code to support the new engine

Correspondence with Python Project

Python Module Go Module
common/doc_store/doc_store_base.py internal/engine/engine.go
rag/utils/es_conn.py internal/engine/elasticsearch/
rag/utils/infinity_conn.py internal/engine/infinity/ (placeholder)
common/settings.py internal/config/config.go

Current Status

  • Elasticsearch: Fully implemented and functional
  • Infinity: Placeholder implementation, waiting for official Go SDK
  • 📋 OceanBase: Not implemented (removed from requirements)