Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-06-29 23:41:12 +08:00
Feat: mysql sync (#14200)
### What problem does this PR solve?

Add a script to sync the database schema with peewee_migrate.

### Type of change

- [x] Other (please describe): tool script
@@ -1,4 +1,13 @@
# MySQL Data Migration Script
# Database Scripts

This directory contains database-related utility scripts for RAGFlow.

- **mysql_migration.py**: Data migration between tables with stage-based execution
- **db_schema_sync.py**: Database schema synchronization using peewee-migrate

---

# mysql_migration.py

A flexible MySQL data migration tool for migrating data between tables with stage-based execution.

@@ -208,3 +217,130 @@ Stage Details:
| `[DRY RUN] Target table does not exist` | Target table missing; use `--execute` or `--create-table-only` to create it |
| `Dependency table does not exist` | Required table from previous stage missing |
| `Inserted batch X: Y records` | Successfully inserted batch of records |

---

# db_schema_sync.py

A database schema synchronization tool that uses peewee-migrate to detect and manage schema changes.

## Overview

This script:
1. Reads model definitions from `api/db/db_models.py`
2. Compares them with the existing tables in the database specified on the command line
3. Generates migration files in `tools/migrate/{version}/`

### Detected Change Types

| Change Type | Description | Auto-included? |
|-------------|-------------|----------------|
| New table | Model class with no corresponding DB table | Yes |
| New field | Model field not present in DB table | Yes |
| Field type change | Model field type differs from DB column type | Yes |
| Removed field | DB column not present in model definition | No (requires `--drop`) |

> **Warning**: Removed fields are **not** included in migrations by default. You must explicitly use `--drop` to generate `DROP COLUMN` statements, as this operation permanently deletes data.

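The detection rules in the table can be illustrated with a small, self-contained sketch. It is not the script's actual implementation: the `detect_changes` function, the dictionary-based schema representation, and the table and column names are all assumptions made for illustration. It only shows how removed fields stay out of the result unless dropping is requested explicitly.

```python
def detect_changes(model_fields, db_columns, include_drops=False):
    """Compare model definitions with DB columns and list schema changes.

    Both arguments map table name -> {column name -> type string}.
    Hypothetical helper; the real script works on peewee models.
    """
    changes = []
    for table, fields in model_fields.items():
        if table not in db_columns:
            # Model class with no corresponding DB table
            changes.append(("new_table", table, None))
            continue
        columns = db_columns[table]
        for name, field_type in fields.items():
            if name not in columns:
                changes.append(("new_field", table, name))
            elif columns[name] != field_type:
                changes.append(("type_change", table, name))
        if include_drops:
            # Destructive: only reported when explicitly requested (like --drop)
            for name in columns:
                if name not in fields:
                    changes.append(("removed_field", table, name))
    return changes


# Illustrative schemas (made-up tables and columns)
models = {
    "user": {"id": "INT", "email": "VARCHAR(255)", "nickname": "VARCHAR(100)"},
    "api_token": {"id": "INT"},
}
database = {
    "user": {"id": "INT", "email": "VARCHAR(128)", "legacy_flag": "TINYINT"},
}

default_changes = detect_changes(models, database)
with_drops = detect_changes(models, database, include_drops=True)
```

With these inputs, `legacy_flag` appears only in `with_drops`, mirroring the "No (requires `--drop`)" row.
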
## Prerequisites

Install peewee-migrate:
```bash
pip install peewee-migrate
```

## Usage

### Command Line Arguments

```
python db_schema_sync.py [OPTIONS]
```

| Option | Short | Description |
|--------|-------|-------------|
| `--host` | - | MySQL host (required) |
| `--port` | - | MySQL port (default: 3306) |
| `--user` | - | MySQL user (required) |
| `--password` | - | MySQL password (required) |
| `--database` | - | MySQL database name (required) |
| `--version` | `-v` | Version number in format `vxx.xx.xx` (required) |
| `--list` | `-l` | List all migrations |
| `--create` | - | Create a new migration (auto-detect changes) |
| `--migrate` | `-m` | Run pending migrations |
| `--diff` | `-d` | Show schema differences |
| `--name` | `-n` | Migration name (default: auto) |
| `--drop` | - | Include `DROP COLUMN` for fields removed from models (destructive - permanently deletes data!) |

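For readers who want to see how such an option table maps onto a parser, here is a minimal sketch using Python's standard `argparse` module. Whether the script actually uses `argparse` is an assumption, as are the `build_parser` name, the help strings, and reading "default: auto" as the literal string `"auto"`.

```python
import argparse


def build_parser():
    """Build a parser mirroring the option table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="db_schema_sync.py")
    # Connection options
    parser.add_argument("--host", required=True, help="MySQL host")
    parser.add_argument("--port", type=int, default=3306, help="MySQL port")
    parser.add_argument("--user", required=True, help="MySQL user")
    parser.add_argument("--password", required=True, help="MySQL password")
    parser.add_argument("--database", required=True, help="MySQL database name")
    parser.add_argument("-v", "--version", required=True,
                        help="Version number, e.g. v0.24.0")
    # Actions
    parser.add_argument("-l", "--list", action="store_true",
                        help="List all migrations")
    parser.add_argument("--create", action="store_true",
                        help="Create a new migration")
    parser.add_argument("-m", "--migrate", action="store_true",
                        help="Run pending migrations")
    parser.add_argument("-d", "--diff", action="store_true",
                        help="Show schema differences")
    # Modifiers
    parser.add_argument("-n", "--name", default="auto", help="Migration name")
    parser.add_argument("--drop", action="store_true",
                        help="Include DROP COLUMN for removed fields")
    return parser


args = build_parser().parse_args([
    "--create",
    "--host", "localhost",
    "--user", "root",
    "--password", "xxx",
    "--database", "rag_flow",
    "--version", "v0.24.0",
])
```

Here `args.port` falls back to `3306` and `args.drop` stays `False`, so a destructive migration is never generated unless `--drop` is passed.
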
### Version Format

Version must be in format `vxx.xx.xx`, where each `xx` is one or more digits:
- Valid: `v0.24.0`, `v1.0.0`, `v10.20.30`
- Invalid: `0.24.0`, `v0.24`, `v0.24.0.1`

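The format rule can be checked with a short regular expression. This is a sketch consistent with the valid and invalid examples listed, not necessarily the check the script performs; `VERSION_PATTERN` and `is_valid_version` are hypothetical names.

```python
import re

# "v" followed by exactly three dot-separated groups of ASCII digits
VERSION_PATTERN = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def is_valid_version(version):
    """Return True if the whole string matches the vxx.xx.xx format."""
    return VERSION_PATTERN.fullmatch(version) is not None


valid = [v for v in ("v0.24.0", "v1.0.0", "v10.20.30") if is_valid_version(v)]
invalid = [v for v in ("0.24.0", "v0.24", "v0.24.0.1") if not is_valid_version(v)]
```

Using `fullmatch` rather than `match` is what rejects `v0.24.0.1`, since a prefix match alone would accept it.
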
### Migration File Location

Migration files are stored in:
```
tools/migrate/{version_dir}/
```

Where `{version_dir}` is the version with `.` replaced by `_`.

Example: Version `v0.24.0` → Directory `tools/migrate/v0_24_0/`

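The version-to-directory mapping is a plain string substitution. A minimal sketch follows; the `migration_dir` helper and its `root` parameter are hypothetical names, not taken from the script.

```python
from pathlib import PurePosixPath


def migration_dir(version, root="tools/migrate"):
    """Map a version string to its migration directory ('.' becomes '_')."""
    return PurePosixPath(root) / version.replace(".", "_")


example_dir = str(migration_dir("v0.24.0"))
```
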
### Examples

```bash
# List all migrations
python db_schema_sync.py --list \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0

# Create a new auto-detected migration (new tables, new fields, type changes only)
python db_schema_sync.py --create \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0

# Create a migration including dropped fields (destructive!)
python db_schema_sync.py --create --drop \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0

# Create a named migration
python db_schema_sync.py --create --name add_user_table \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0

# Run all pending migrations
python db_schema_sync.py --migrate \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0

# Show schema differences (including removed fields)
python db_schema_sync.py --diff \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.24.0
```

## How It Works

1. **Load Models**: Imports all model classes from `api/db/db_models.py`
2. **Connect Database**: Creates a MySQL connection from the command line arguments
3. **Detect Changes**: Compares model definitions with the actual database schema:
   - New tables → `create_model`
   - New fields → `ALTER TABLE ADD COLUMN`
   - Field type changes → `ALTER TABLE MODIFY COLUMN`
   - Removed fields → `ALTER TABLE DROP COLUMN` (only with `--drop`)
4. **Generate Migration**: Creates a Python migration file with `migrate()` and `rollback()` functions

### Rollback Behavior

| Forward Operation | Rollback Operation |
|-------------------|--------------------|
| `CREATE TABLE` | `remove_model` |
| `ADD COLUMN` | `DROP COLUMN` |
| `MODIFY COLUMN` | `MODIFY COLUMN` (restore original type) |
| `DROP COLUMN` | `ADD COLUMN` (restore column definition; **data is lost**) |

> **Note**: Rolling back a `DROP COLUMN` will re-add the column structure, but the data that was in it cannot be recovered.

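To make the forward/rollback pairing concrete, here is a sketch of what a generated migration file for a single added column might look like. The real generated files may differ: the `user` table, the `nickname` column, and its type are hypothetical, and the `migrate(migrator, database, fake=False, **kwargs)` signature follows the convention commonly used by peewee-migrate migration modules rather than anything confirmed by this README. The `RecordingMigrator` class is only a stand-in so the example runs without a database.

```python
# Hypothetical statements for one "new field" change
ADD_NICKNAME = "ALTER TABLE `user` ADD COLUMN `nickname` VARCHAR(100) NULL"
DROP_NICKNAME = "ALTER TABLE `user` DROP COLUMN `nickname`"


def migrate(migrator, database, fake=False, **kwargs):
    """Forward step: add the new column."""
    migrator.sql(ADD_NICKNAME)


def rollback(migrator, database, fake=False, **kwargs):
    """Reverse step: drop the column that migrate() added."""
    migrator.sql(DROP_NICKNAME)


class RecordingMigrator:
    """Stand-in that records SQL instead of executing it (illustration only)."""

    def __init__(self):
        self.statements = []

    def sql(self, statement, *params):
        self.statements.append(statement)


recorder = RecordingMigrator()
migrate(recorder, database=None)
rollback(recorder, database=None)
```

After both calls, `recorder.statements` holds the `ADD COLUMN` statement followed by its `DROP COLUMN` inverse, matching the second row of the rollback table.
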