Feat: mysql sync (#14200)

### What problem does this PR solve?

Add a script that syncs the database schema using peewee-migrate.

### Type of change

- [x] Other (please describe): tool script
Lynn
2026-04-20 11:40:01 +08:00
committed by GitHub
parent 4e992de91f
commit 0f806dc3ca
2 changed files with 1088 additions and 1 deletions


@@ -1,4 +1,13 @@
# MySQL Data Migration Script
# Database Scripts
This directory contains database-related utility scripts for RAGFlow.
- **mysql_migration.py**: Data migration between tables with stage-based execution
- **db_schema_sync.py**: Database schema synchronization using peewee-migrate
---
# mysql_migration.py
A flexible MySQL data migration tool for migrating data between tables with stage-based execution.
@@ -208,3 +217,130 @@ Stage Details:
| `[DRY RUN] Target table does not exist` | Target table is missing; use `--execute` or `--create-table-only` to create it |
| `Dependency table does not exist` | Required table from previous stage missing |
| `Inserted batch X: Y records` | Successfully inserted batch of records |
---
# db_schema_sync.py
A database schema synchronization tool that uses peewee-migrate to detect and manage schema changes.
## Overview
This script:
1. Reads model definitions from `api/db/db_models.py`
2. Compares them with the existing tables in the database specified on the command line
3. Generates migration files in `tools/migrate/{version}/`
### Detected Change Types
| Change Type | Description | Auto-included? |
|-------------|-------------|----------------|
| New table | Model class with no corresponding DB table | Yes |
| New field | Model field not present in DB table | Yes |
| Field type change | Model field type differs from DB column type | Yes |
| Removed field | DB column not present in model definition | No (requires `--drop`) |
> **Warning**: Removed fields are **not** included in migrations by default. You must explicitly use `--drop` to generate `DROP COLUMN` statements, as this operation permanently deletes data.
## Prerequisites
Install peewee-migrate:
```bash
pip install peewee-migrate
```
## Usage
### Command Line Arguments
```
python db_schema_sync.py [OPTIONS]
```
| Option | Short | Description |
|--------|-------|-------------|
| `--host` | - | MySQL host (required) |
| `--port` | - | MySQL port (default: 3306) |
| `--user` | - | MySQL user (required) |
| `--password` | - | MySQL password (required) |
| `--database` | - | MySQL database name (required) |
| `--version` | `-v` | Version number in format `vxx.xx.xx` (required) |
| `--list` | `-l` | List all migrations |
| `--create` | - | Create a new migration (auto-detect changes) |
| `--migrate` | `-m` | Run pending migrations |
| `--diff` | `-d` | Show schema differences |
| `--name` | `-n` | Migration name (default: auto) |
| `--drop` | - | Include `DROP COLUMN` for fields removed from models (destructive - permanently deletes data!) |
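
The option table above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the `build_parser` helper is hypothetical, and treating the default of `--name` as the literal string `auto` is an assumption drawn from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="db_schema_sync.py")
    # Connection settings
    parser.add_argument("--host", required=True, help="MySQL host")
    parser.add_argument("--port", type=int, default=3306, help="MySQL port")
    parser.add_argument("--user", required=True, help="MySQL user")
    parser.add_argument("--password", required=True, help="MySQL password")
    parser.add_argument("--database", required=True, help="MySQL database name")
    parser.add_argument("-v", "--version", required=True, help="Version, e.g. v0.24.0")
    # Actions
    parser.add_argument("-l", "--list", action="store_true", help="List all migrations")
    parser.add_argument("--create", action="store_true", help="Create a new migration")
    parser.add_argument("-m", "--migrate", action="store_true", help="Run pending migrations")
    parser.add_argument("-d", "--diff", action="store_true", help="Show schema differences")
    # Modifiers
    parser.add_argument("-n", "--name", default="auto", help="Migration name (assumed default)")
    parser.add_argument("--drop", action="store_true", help="Include DROP COLUMN (destructive)")
    return parser


args = build_parser().parse_args([
    "--create",
    "--host", "localhost", "--user", "root", "--password", "xxx",
    "--database", "rag_flow", "--version", "v0.24.0",
])
print(args.port, args.create, args.drop)
```

Because `--drop` is a plain boolean flag that defaults to off, a destructive migration in this sketch is only produced when the flag is passed explicitly.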
### Version Format
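Version must be in the format `vxx.xx.xx`: a leading `v` followed by exactly three dot-separated numeric components (each `xx` stands for digits, not necessarily two of them):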
- Valid: `v0.24.0`, `v1.0.0`, `v10.20.30`
- Invalid: `0.24.0`, `v0.24`, `v0.24.0.1`
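
The rule above can be checked with a regular expression. The following is a minimal sketch under the assumption that each component is one or more digits; the pattern and the `is_valid_version` helper are illustrative and may differ from what the script actually does.

```python
import re

# Assumed pattern: "v" followed by exactly three dot-separated numeric parts.
VERSION_PATTERN = re.compile(r"v\d+\.\d+\.\d+")


def is_valid_version(version: str) -> bool:
    """Return True if the whole string matches v<digits>.<digits>.<digits>."""
    return VERSION_PATTERN.fullmatch(version) is not None


for candidate in ["v0.24.0", "v1.0.0", "v10.20.30", "0.24.0", "v0.24", "v0.24.0.1"]:
    print(candidate, is_valid_version(candidate))
```

Using `fullmatch` rather than `match` is what rejects trailing components such as `v0.24.0.1`.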
### Migration File Location
Migration files are stored in:
```
tools/migrate/{version_dir}/
```
Where `{version_dir}` is the version with `.` replaced by `_`.
Example: Version `v0.24.0` → Directory `tools/migrate/v0_24_0/`
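
As an illustration of this mapping, the directory name can be derived with a single string replacement. The `migration_dir` helper below is hypothetical, and its `root` default simply reuses the path shown above.

```python
from pathlib import PurePosixPath


def migration_dir(version: str, root: str = "tools/migrate") -> str:
    """Map a version such as v0.24.0 to its migration directory."""
    # Replace every "." in the version with "_" to form the directory name.
    return str(PurePosixPath(root) / version.replace(".", "_"))


print(migration_dir("v0.24.0"))  # tools/migrate/v0_24_0
```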
### Examples
```bash
# List all migrations
python db_schema_sync.py --list \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
# Create a new auto-detected migration (new tables, new fields, type changes only)
python db_schema_sync.py --create \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
# Create a migration including dropped fields (destructive!)
python db_schema_sync.py --create --drop \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
# Create a named migration
python db_schema_sync.py --create --name add_user_table \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
# Run all pending migrations
python db_schema_sync.py --migrate \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
# Show schema differences (including removed fields)
python db_schema_sync.py --diff \
--host localhost --port 3306 --user root --password xxx --database rag_flow \
--version v0.24.0
```
## How It Works
1. **Load Models**: Imports all model classes from `api/db/db_models.py`
2. **Connect Database**: Creates MySQL connection from command line arguments
3. **Detect Changes**: Compares model definitions with actual database schema:
- New tables → `create_model`
- New fields → `ALTER TABLE ADD COLUMN`
- Field type changes → `ALTER TABLE MODIFY COLUMN`
- Removed fields → `ALTER TABLE DROP COLUMN` (only with `--drop`)
4. **Generate Migration**: Creates Python migration file with `migrate()` and `rollback()` functions
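
The change-detection step can be pictured as a comparison of two mappings. The sketch below is a simplification that assumes both the models and the database have already been reduced to `{table: {column: type}}` dictionaries; the `detect_changes` function, the operation names, and the sample schemas are hypothetical and are not taken from the script.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, Dict[str, str]]  # table name -> {column name: column type}


def detect_changes(models: Schema, database: Schema,
                   include_drops: bool = False) -> List[Tuple[str, str, str]]:
    """Return (operation, table, column) tuples describing the schema diff."""
    changes = []
    for table, fields in models.items():
        if table not in database:
            # Whole table is missing: create it from the model.
            changes.append(("create_table", table, ""))
            continue
        columns = database[table]
        for name, field_type in fields.items():
            if name not in columns:
                changes.append(("add_column", table, name))
            elif columns[name].lower() != field_type.lower():
                changes.append(("modify_column", table, name))
        if include_drops:
            # Destructive: only reported when explicitly requested.
            for name in columns:
                if name not in fields:
                    changes.append(("drop_column", table, name))
    return changes


models = {
    "user": {"id": "INT", "email": "VARCHAR(255)", "nickname": "VARCHAR(64)"},
    "api_token": {"id": "INT"},
}
database = {
    "user": {"id": "INT", "email": "VARCHAR(128)", "legacy_flag": "TINYINT"},
}

print(detect_changes(models, database))
print(detect_changes(models, database, include_drops=True))
```

With the sample data, the first call reports the type change, the new column, and the new table, while `legacy_flag` appears only in the second call. This mirrors how removed fields are left out unless `--drop` is given.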
### Rollback Behavior
| Forward Operation | Rollback Operation |
|-------------------|--------------------|
| `CREATE TABLE` | `remove_model` |
| `ADD COLUMN` | `DROP COLUMN` |
| `MODIFY COLUMN` | `MODIFY COLUMN` (restore original type) |
| `DROP COLUMN` | `ADD COLUMN` (restore column definition; **data is lost**) |
> **Note**: Rolling back a `DROP COLUMN` will re-add the column structure, but the data that was in it cannot be recovered.
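
The rollback table can also be read as a lookup from each forward operation to its inverse. The sketch below builds a rollback plan from a list of forward steps; the `rollback_plan` helper and the operation names are hypothetical, and undoing steps in reverse order is an assumption rather than documented behavior of the script.

```python
from typing import List, Tuple

# Forward operation -> rollback operation, following the table above.
INVERSE = {
    "create_table": "remove_model",
    "add_column": "drop_column",
    "modify_column": "modify_column",  # restores the original type
    "drop_column": "add_column",       # restores structure only, not data
}


def rollback_plan(forward: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Invert each forward step and undo them last-applied-first (assumed order)."""
    return [(INVERSE[operation], target) for operation, target in reversed(forward)]


forward_steps = [
    ("create_table", "api_token"),
    ("add_column", "user.nickname"),
    ("drop_column", "user.legacy_flag"),
]
print(rollback_plan(forward_steps))
```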