Files
ragflow/internal/engine
Jin Hai e96bc37d06 Go: use NATS as the message queue (#15327)
### What problem does this PR solve?

```
RAGFlow(admin)> mq publish 'msg2';
SUCCESS
RAGFlow(admin)> mq publish 'msg3';
SUCCESS
RAGFlow(admin)> mq list;
+---------+---------------+
| message | subject       |
+---------+---------------+
| msg1    | tasks.RAGFLOW |
| msg2    | tasks.RAGFLOW |
| msg3    | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq pull 2;
+---------+---------------+
| message | subject       |
+---------+---------------+
| msg1    | tasks.RAGFLOW |
| msg2    | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq pull noack;
+---------+---------------+
| message | subject       |
+---------+---------------+
| abc     | tasks.RAGFLOW |
+---------+---------------+
RAGFlow(admin)> mq show
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+
| ack_pending_count | consumer_count | memory | message_count | pending_count | redelivered_count | waiting_count |
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+
| 2                 | 1              | 0      | 2             | 0             | 1                 | 0             |
+-------------------+----------------+--------+---------------+---------------+-------------------+---------------+

RAGFlow(admin)> list ingestors;
+--------------+-------------------------------------------+--------+
| host         | name                                      | status |
+--------------+-------------------------------------------+--------+
| 192.168.1.38 | ingestor-8f0e4bd5650a4ac58b0151969fbf6935 | alive  |
+--------------+-------------------------------------------+--------+

RAGFlow(admin)> list ingestion tasks;
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+
| document_id                      | id                               | status    | step | user        | user_id                          |
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+
| ffe64fae423411f1a2d938a74640adcc | 90d3d0f6528941c1ac8eb0360effccc4 | COMPLETED | 5    | aaa@aaa.com | 2ba4881420fa11f19e9c38a74640adcc |
+----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+

RAGFlow(admin)> remove ingestion tasks '90d3d0f6528941c1ac8eb0360effccc4';
+---------+----------------------------------+
| delete  | task_id                          |
+---------+----------------------------------+
| success | 90d3d0f6528941c1ac8eb0360effccc4 |
+---------+----------------------------------+

RAGFlow(admin)> stop ingestion tasks 'e89e20d9a25848a1b79bd9345ddbfe1d';
+----------+----------------------------------+
| status   | task_id                          |
+----------+----------------------------------+
| STOPPING | e89e20d9a25848a1b79bd9345ddbfe1d |
+----------+----------------------------------+

# Publish a message
RAGFlow(admin)> mq publish 'cdd';
SUCCESS

# List current tasks in the message queue
RAGFlow(admin)> mq list
+----------------------------------+---------------+
| message                          | subject       |
+----------------------------------+---------------+
| 7ce392a3c1624cd2be4b5276e8825059 | tasks.RAGFLOW |
+----------------------------------+---------------+

# Consume a task from the message queue
RAGFlow(admin)> mq pull
+------+-----+----------------+
| ack  | id  | type           |
+------+-----+----------------+
| true | cdd | ingestion_test |
+------+-----+----------------+

# User mode
# List ingestion tasks, followed by dataset id
RAGFlow(user)> list ingestion tasks from '0abe79f9423311f1ad8d38a74640adcc';
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d |        | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

RAGFlow(user)> list ingestion tasks;
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-06-02T19:02:31+08:00 | 1780398151417 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | e89e20d9a25848a1b79bd9345ddbfe1d |        | COMPLETED | 2026-06-02T19:02:52+08:00 | 1780398172208 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

# Create an ingestion task
# First argument is document id, second argument is dataset id
RAGFlow(user)> start ingestion 'ffe64fae423411f1a2d938a74640adcc' from '0abe79f9423311f1ad8d38a74640adcc';
+----------------------------------+-------------------------------------------+
| document_id                      | result                                    |
+----------------------------------+-------------------------------------------+
| ffe64fae423411f1a2d938a74640adcc | task_id: 8d758cd14a8b4ba8ab505003fb52017d |
+----------------------------------+-------------------------------------------+

# Pause an ingestion task, first argument is ingestion id
RAGFlow(user)> stop ingestion '8d758cd14a8b4ba8ab505003fb52017d';
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| create_date               | create_time   | dataset_id                       | document_id                      | id                               | schema | status    | update_date               | update_time   | user_id                          |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+
| 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d |        | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc |
+---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+

# Delete an ingestion task
RAGFlow(api/default)> remove ingestion tasks 'f366450a27d54677aec1c7090add30f0';
+---------+----------------------------------+
| remove  | task_id                          |
+---------+----------------------------------+
| success | f366450a27d54677aec1c7090add30f0 |
+---------+----------------------------------+

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-12 14:56:44 +08:00
..
2026-03-04 19:17:16 +08:00

Doc Engine Implementation

RAGFlow Go document engine implementation, supporting Elasticsearch and Infinity storage engines.

Directory Structure

internal/engine/
├── engine.go              # DocEngine interface definition
├── engine_factory.go      # Factory function
├── global.go              # Global engine instance management
├── elasticsearch/         # Elasticsearch implementation
│   ├── client.go          # ES client initialization
│   ├── search.go          # Search implementation
│   ├── index.go           # Index operations
│   └── document.go        # Document operations
└── infinity/              # Infinity implementation
    ├── client.go          # Infinity client initialization (placeholder)
    ├── search.go          # Search implementation (placeholder)
    ├── index.go           # Table operations (placeholder)
    └── document.go        # Document operations (placeholder)

Configuration

Using Elasticsearch

Add to conf/service_conf.yaml:

doc_engine:
  type: elasticsearch
  es:
    hosts: "http://localhost:9200"
    username: "elastic"
    password: "infini_rag_flow"

Using Infinity

doc_engine:
  type: infinity
  infinity:
    uri: "localhost:23817"
    postgres_port: 5432
    db_name: "default_db"

Note: Infinity implementation is a placeholder waiting for the official Infinity Go SDK. Only Elasticsearch is fully functional at this time.

Usage

1. Initialize Engine

The engine is automatically initialized on service startup (see cmd/server_main.go):

// Initialize doc engine
if err := engine.Init(&cfg.DocEngine); err != nil {
    log.Fatalf("Failed to initialize doc engine: %v", err)
}
defer engine.Close()

2. Use in Service

In ChunkService:

type ChunkService struct {
    docEngine engine.DocEngine
    engineType config.EngineType
}

func NewChunkService() *ChunkService {
    cfg := config.Get()
    return &ChunkService{
        docEngine:  engine.Get(),
        engineType: cfg.DocEngine.Type,
    }
}

// Search
func (s *ChunkService) RetrievalTest(req *RetrievalTestRequest) (*RetrievalTestResponse, error) {
    ctx := context.Background()

    switch s.engineType {
    case config.EngineElasticsearch:
        // Use Elasticsearch retrieval
        searchReq := &elasticsearch.SearchRequest{
            IndexNames: []string{"chunks"},
            Query:      elasticsearch.BuildMatchTextQuery([]string{"content"}, req.Question, "AUTO"),
            Size:       10,
        }
        result, _ := s.docEngine.Search(ctx, searchReq)
        esResp := result.(*elasticsearch.SearchResponse)
        // Process result...

    case config.EngineInfinity:
        // Infinity not implemented yet
        return nil, fmt.Errorf("infinity not yet implemented")
    }
}

3. Direct Use of Global Engine

import "ragflow/internal/engine"

// Get engine instance
docEngine := engine.Get()

// Search
searchReq := &elasticsearch.SearchRequest{
    IndexNames: []string{"my_index"},
    Query:      elasticsearch.BuildTermQuery("status", "active"),
}
result, err := docEngine.Search(ctx, searchReq)

// Index operations
err = docEngine.CreateIndex(ctx, "my_index", mapping)
err = docEngine.DeleteIndex(ctx, "my_index")
exists, _ := docEngine.IndexExists(ctx, "my_index")

// Document operations
err = docEngine.IndexDocument(ctx, "my_index", "doc_id", docData)
bulkResp, _ := docEngine.BulkIndex(ctx, "my_index", docs)
doc, _ := docEngine.GetDocument(ctx, "my_index", "doc_id")
err = docEngine.DeleteDocument(ctx, "my_index", "doc_id")

API Documentation

DocEngine Interface

type DocEngine interface {
    // Search
    Search(ctx context.Context, req interface{}) (interface{}, error)

    // Index operations
    CreateIndex(ctx context.Context, indexName string, mapping interface{}) error
    DeleteIndex(ctx context.Context, indexName string) error
    IndexExists(ctx context.Context, indexName string) (bool, error)

    // Document operations
    IndexDocument(ctx context.Context, indexName, docID string, doc interface{}) error
    BulkIndex(ctx context.Context, indexName string, docs []interface{}) (interface{}, error)
    GetDocument(ctx context.Context, indexName, docID string) (interface{}, error)
    DeleteDocument(ctx context.Context, indexName, docID string) error

    // Health check
    Ping(ctx context.Context) error
    Close() error
}

Dependencies

Elasticsearch

  • github.com/elastic/go-elasticsearch/v8

Infinity

  • Not available yet - Waiting for official Infinity Go SDK

Notes

  1. Type Conversion: The Search method returns interface{}, requiring type assertion based on engine type
  2. Model Definitions: Each engine has its own request/response models defined in their respective packages
  3. Error Handling: It's recommended to handle errors uniformly in the service layer and return user-friendly error messages
  4. Performance Optimization: For large volumes of documents, prefer using BulkIndex for batch operations
  5. Connection Management: The engine is automatically closed when the program exits, no manual management needed
  6. Infinity Status: Infinity implementation is currently a placeholder. Only Elasticsearch is fully functional.

Extending with New Engines

To add a new document engine (e.g., Milvus, Qdrant):

  1. Create a new directory under internal/engine/, e.g., milvus/
  2. Implement four files: client.go, search.go, index.go, document.go
  3. Add corresponding creation logic in engine_factory.go
  4. Add configuration structure in config.go
  5. Update service layer code to support the new engine

Correspondence with Python Project

Python Module Go Module
common/doc_store/doc_store_base.py internal/engine/engine.go
rag/utils/es_conn.py internal/engine/elasticsearch/
rag/utils/infinity_conn.py internal/engine/infinity/ (placeholder)
common/settings.py internal/config/config.go

Current Status

  • Elasticsearch: Fully implemented and functional
  • Infinity: Placeholder implementation, waiting for official Go SDK
  • 📋 OceanBase: Not implemented (removed from requirements)