mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 23:41:12 +08:00
### What problem does this PR solve? ``` RAGFlow(admin)> mq publish 'msg2'; SUCCESS RAGFlow(admin)> mq publish 'msg3'; SUCCESS RAGFlow(admin)> mq list; +---------+---------------+ | message | subject | +---------+---------------+ | msg1 | tasks.RAGFLOW | | msg2 | tasks.RAGFLOW | | msg3 | tasks.RAGFLOW | +---------+---------------+ RAGFlow(admin)> mq pull 2; +---------+---------------+ | message | subject | +---------+---------------+ | msg1 | tasks.RAGFLOW | | msg2 | tasks.RAGFLOW | +---------+---------------+ RAGFlow(admin)> mq pull noack; +---------+---------------+ | message | subject | +---------+---------------+ | abc | tasks.RAGFLOW | +---------+---------------+ RAGFlow(admin)> mq show +-------------------+----------------+--------+---------------+---------------+-------------------+---------------+ | ack_pending_count | consumer_count | memory | message_count | pending_count | redelivered_count | waiting_count | +-------------------+----------------+--------+---------------+---------------+-------------------+---------------+ | 2 | 1 | 0 | 2 | 0 | 1 | 0 | +-------------------+----------------+--------+---------------+---------------+-------------------+---------------+ RAGFlow(admin)> list ingestors; +--------------+-------------------------------------------+--------+ | host | name | status | +--------------+-------------------------------------------+--------+ | 192.168.1.38 | ingestor-8f0e4bd5650a4ac58b0151969fbf6935 | alive | +--------------+-------------------------------------------+--------+ RAGFlow(admin)> list ingestion tasks; +----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+ | document_id | id | status | step | user | user_id | +----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+ | ffe64fae423411f1a2d938a74640adcc | 90d3d0f6528941c1ac8eb0360effccc4 | COMPLETED | 5 | aaa@aaa.com | 2ba4881420fa11f19e9c38a74640adcc | +----------------------------------+----------------------------------+-----------+------+-------------+----------------------------------+ RAGFlow(admin)> remove ingestion tasks '90d3d0f6528941c1ac8eb0360effccc4'; +---------+----------------------------------+ | delete | task_id | +---------+----------------------------------+ | success | 90d3d0f6528941c1ac8eb0360effccc4 | +---------+----------------------------------+ RAGFlow(admin)> stop ingestion tasks 'e89e20d9a25848a1b79bd9345ddbfe1d'; +----------+----------------------------------+ | status | task_id | +----------+----------------------------------+ | STOPPING | e89e20d9a25848a1b79bd9345ddbfe1d | +----------+----------------------------------+ # Publish a message RAGFlow(admin)> mq publish 'cdd'; SUCCESS # List current tasks in the message queue RAGFlow(admin)> mq list +----------------------------------+---------------+ | message | subject | +----------------------------------+---------------+ | 7ce392a3c1624cd2be4b5276e8825059 | tasks.RAGFLOW | +----------------------------------+---------------+ # Consume a task from the message queue RAGFlow(admin)> mq pull +------+-----+----------------+ | ack | id | type | +------+-----+----------------+ | true | cdd | ingestion_test | +------+-----+----------------+ # User mode # List ingestion tasks, followed by dataset id RAGFlow(user)> list ingestion tasks from '0abe79f9423311f1ad8d38a74640adcc'; +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | create_date | create_time | dataset_id | document_id | id | schema | status | update_date | update_time | user_id | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d | | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ RAGFlow(user)> list ingestion tasks; +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | create_date | create_time | dataset_id | document_id | id | schema | status | update_date | update_time | user_id | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | 2026-06-02T19:02:31+08:00 | 1780398151417 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | e89e20d9a25848a1b79bd9345ddbfe1d | | COMPLETED | 2026-06-02T19:02:52+08:00 | 1780398172208 | 2ba4881420fa11f19e9c38a74640adcc | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ # Create an ingestion task # First argument is document id, second argument is dataset id RAGFlow(user)> start ingestion 'ffe64fae423411f1a2d938a74640adcc' from '0abe79f9423311f1ad8d38a74640adcc'; +----------------------------------+-------------------------------------------+ | document_id | result | +----------------------------------+-------------------------------------------+ | ffe64fae423411f1a2d938a74640adcc | task_id: 8d758cd14a8b4ba8ab505003fb52017d | +----------------------------------+-------------------------------------------+ # Pause an ingestion task, first argument is ingestion id RAGFlow(user)> stop ingestion '8d758cd14a8b4ba8ab505003fb52017d'; +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | create_date | create_time | dataset_id | document_id | id | schema | status | update_date | update_time | user_id | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ | 2026-05-30T20:21:06+08:00 | 1780143666289 | 0abe79f9423311f1ad8d38a74640adcc | ffe64fae423411f1a2d938a74640adcc | 8d758cd14a8b4ba8ab505003fb52017d | | COMPLETED | 2026-05-30T20:21:26+08:00 | 1780143686431 | 2ba4881420fa11f19e9c38a74640adcc | +---------------------------+---------------+----------------------------------+----------------------------------+----------------------------------+--------+-----------+---------------------------+---------------+----------------------------------+ # Delete an ingestion task RAGFlow(api/default)> remove ingestion tasks 'f366450a27d54677aec1c7090add30f0'; +---------+----------------------------------+ | remove | task_id | +---------+----------------------------------+ | success | f366450a27d54677aec1c7090add30f0 | +---------+----------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Doc Engine Implementation
RAGFlow Go document engine implementation, supporting Elasticsearch and Infinity storage engines.
Directory Structure
internal/engine/
├── engine.go # DocEngine interface definition
├── engine_factory.go # Factory function
├── global.go # Global engine instance management
├── elasticsearch/ # Elasticsearch implementation
│ ├── client.go # ES client initialization
│ ├── search.go # Search implementation
│ ├── index.go # Index operations
│ └── document.go # Document operations
└── infinity/ # Infinity implementation
├── client.go # Infinity client initialization (placeholder)
├── search.go # Search implementation (placeholder)
├── index.go # Table operations (placeholder)
└── document.go # Document operations (placeholder)
Configuration
Using Elasticsearch
Add to conf/service_conf.yaml:
doc_engine:
type: elasticsearch
es:
hosts: "http://localhost:9200"
username: "elastic"
password: "infini_rag_flow"
Using Infinity
doc_engine:
type: infinity
infinity:
uri: "localhost:23817"
postgres_port: 5432
db_name: "default_db"
Note: Infinity implementation is a placeholder waiting for the official Infinity Go SDK. Only Elasticsearch is fully functional at this time.
Usage
1. Initialize Engine
The engine is automatically initialized on service startup (see cmd/server_main.go):
// Initialize doc engine
if err := engine.Init(&cfg.DocEngine); err != nil {
log.Fatalf("Failed to initialize doc engine: %v", err)
}
defer engine.Close()
2. Use in Service
In ChunkService:
type ChunkService struct {
docEngine engine.DocEngine
engineType config.EngineType
}
func NewChunkService() *ChunkService {
cfg := config.Get()
return &ChunkService{
docEngine: engine.Get(),
engineType: cfg.DocEngine.Type,
}
}
// Search
func (s *ChunkService) RetrievalTest(req *RetrievalTestRequest) (*RetrievalTestResponse, error) {
ctx := context.Background()
switch s.engineType {
case config.EngineElasticsearch:
// Use Elasticsearch retrieval
searchReq := &elasticsearch.SearchRequest{
IndexNames: []string{"chunks"},
Query: elasticsearch.BuildMatchTextQuery([]string{"content"}, req.Question, "AUTO"),
Size: 10,
}
result, _ := s.docEngine.Search(ctx, searchReq)
esResp := result.(*elasticsearch.SearchResponse)
// Process result...
case config.EngineInfinity:
// Infinity not implemented yet
return nil, fmt.Errorf("infinity not yet implemented")
}
}
3. Direct Use of Global Engine
import "ragflow/internal/engine"
// Get engine instance
docEngine := engine.Get()
// Search
searchReq := &elasticsearch.SearchRequest{
IndexNames: []string{"my_index"},
Query: elasticsearch.BuildTermQuery("status", "active"),
}
result, err := docEngine.Search(ctx, searchReq)
// Index operations
err = docEngine.CreateIndex(ctx, "my_index", mapping)
err = docEngine.DeleteIndex(ctx, "my_index")
exists, _ := docEngine.IndexExists(ctx, "my_index")
// Document operations
err = docEngine.IndexDocument(ctx, "my_index", "doc_id", docData)
bulkResp, _ := docEngine.BulkIndex(ctx, "my_index", docs)
doc, _ := docEngine.GetDocument(ctx, "my_index", "doc_id")
err = docEngine.DeleteDocument(ctx, "my_index", "doc_id")
API Documentation
DocEngine Interface
type DocEngine interface {
// Search
Search(ctx context.Context, req interface{}) (interface{}, error)
// Index operations
CreateIndex(ctx context.Context, indexName string, mapping interface{}) error
DeleteIndex(ctx context.Context, indexName string) error
IndexExists(ctx context.Context, indexName string) (bool, error)
// Document operations
IndexDocument(ctx context.Context, indexName, docID string, doc interface{}) error
BulkIndex(ctx context.Context, indexName string, docs []interface{}) (interface{}, error)
GetDocument(ctx context.Context, indexName, docID string) (interface{}, error)
DeleteDocument(ctx context.Context, indexName, docID string) error
// Health check
Ping(ctx context.Context) error
Close() error
}
Dependencies
Elasticsearch
github.com/elastic/go-elasticsearch/v8
Infinity
- Not available yet - Waiting for official Infinity Go SDK
Notes
- Type Conversion: The
Searchmethod returnsinterface{}, requiring type assertion based on engine type - Model Definitions: Each engine has its own request/response models defined in their respective packages
- Error Handling: It's recommended to handle errors uniformly in the service layer and return user-friendly error messages
- Performance Optimization: For large volumes of documents, prefer using
BulkIndexfor batch operations - Connection Management: The engine is automatically closed when the program exits, no manual management needed
- Infinity Status: Infinity implementation is currently a placeholder. Only Elasticsearch is fully functional.
Extending with New Engines
To add a new document engine (e.g., Milvus, Qdrant):
- Create a new directory under
internal/engine/, e.g.,milvus/ - Implement four files:
client.go,search.go,index.go,document.go - Add corresponding creation logic in
engine_factory.go - Add configuration structure in
config.go - Update service layer code to support the new engine
Correspondence with Python Project
| Python Module | Go Module |
|---|---|
common/doc_store/doc_store_base.py |
internal/engine/engine.go |
rag/utils/es_conn.py |
internal/engine/elasticsearch/ |
rag/utils/infinity_conn.py |
internal/engine/infinity/ (placeholder) |
common/settings.py |
internal/config/config.go |
Current Status
- ✅ Elasticsearch: Fully implemented and functional
- ⏳ Infinity: Placeholder implementation, waiting for official Go SDK
- 📋 OceanBase: Not implemented (removed from requirements)