
4 Commits

Author SHA1 Message Date
VictorECDSA
ff5971448b [Fix] naive: force-merge short markdown headers to prevent separate chunks (#15488)
## Problem

When uploading `.md` files with `parser=naive` and `delimiter="\n"`,
markdown headers (e.g., `## Quick Travel`) become separate chunks with
very short content (16-18 characters). This causes retrieval issues:
when the header is matched, the corresponding body text is not included
in the chunk.
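
The title describes the fix as force-merging short headers into the content that follows them. The sketch below shows that idea. It assumes the naive parser has already split the text into a list of strings on the delimiter. `merge_short_headers`, `HEADER_RE`, and the `max_header_len` threshold are hypothetical names, not RAGFlow's actual parser code.

```python
import re

# Hypothetical pattern for ATX markdown headers such as "## Quick Travel".
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def merge_short_headers(chunks: list[str], max_header_len: int = 64) -> list[str]:
    """Prepend each short header-only chunk to the chunk that follows it."""
    merged: list[str] = []
    pending = ""  # header text still waiting for a body chunk
    for chunk in chunks:
        text = chunk.strip()
        if not text:
            continue
        if HEADER_RE.match(text) and len(text) <= max_header_len:
            # Consecutive headers (e.g. "#" then "##") accumulate together.
            pending = f"{pending}\n{text}" if pending else text
            continue
        merged.append(f"{pending}\n{text}" if pending else text)
        pending = ""
    if pending:
        # A trailing header with no body still becomes its own chunk.
        merged.append(pending)
    return merged


chunks = ["## Quick Travel", "Book a train ticket in two taps.",
          "## Refunds", "Refunds take 3-5 days."]
print(merge_short_headers(chunks))
# ['## Quick Travel\nBook a train ticket in two taps.', '## Refunds\nRefunds take 3-5 days.']
```

With this approach, a retrieval hit on the header text also returns the body sentence that was merged into the same chunk.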

## Related Issues

Closes #15487

## Checklist

- [x] Code changes are minimal and focused
- [x] Unit tests added (12/12 passed)
- [x] No breaking changes
2026-06-03 10:49:28 +08:00
Carve_
7b230aadf4 chore(tests): move oceanbase peewee test under test/ and fix enum check (#12969)
### What problem does this PR solve?

The misplaced test was introduced by PR #12926.
This PR makes the OceanBase peewee unit test discoverable by the default
unit test runner/CI by moving it under `test/`, so it’s included in the
unified unit test suite.
It also fixes `test_database_lock_enum_values` to correctly handle Enum
alias members (DatabaseLock uses the same value for MYSQL and
OCEANBASE).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Screenshots
The original `test_oceanbase_peewee.py` was placed under `tests/`, which
isn’t in the default unit test runner’s testpaths, so the unit test suite
never picked it up. This PR moves it to the correct path.
<img width="670" height="540" alt="image"
src="https://github.com/user-attachments/assets/69d39346-450f-46dc-8965-29c3d7b32bc9"
/>

The old version of the check in `test_oceanbase_peewee.py`:
```python
    def test_database_lock_enum_values(self):
        """Test DatabaseLock enum has all expected values."""
        expected = {'MYSQL', 'OCEANBASE', 'POSTGRES'}
        actual = {e.name for e in DatabaseLock}
        assert expected.issubset(actual), f"Missing: {expected - actual}"
```
The old check iterated over the Enum. Iteration yields only canonical members and skips aliases, so only `MYSQL` and `POSTGRES` were seen and `OCEANBASE` appeared to be missing.

<img width="1998" height="931" alt="65e2837f23b7b298980a410c7d5c2f09"
src="https://github.com/user-attachments/assets/d8e98c5a-2cfa-4182-ae35-a3ef03554a27"
/>

The new version uses `DatabaseLock.__members__`, which includes alias names, and passes:
<img width="2024" height="1170" alt="1aa8c6facb28d24149270fe1bc4a9dd9"
src="https://github.com/user-attachments/assets/d8688936-ccac-4a39-a389-23dc6f0fe276"
/>
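
The alias behaviour can be reproduced with a small stand-in enum. This sketch uses string values in place of the real lock classes. In the real `DatabaseLock`, both `MYSQL` and `OCEANBASE` map to `MysqlDatabaseLock`. The test function mirrors the fixed check but is not copied from the PR.

```python
from enum import Enum


class DatabaseLock(Enum):
    # Hypothetical stand-in: string values replace the real lock classes.
    MYSQL = "mysql_lock"
    OCEANBASE = "mysql_lock"  # same value as MYSQL, so Enum makes it an alias
    POSTGRES = "postgres_lock"


# Iteration yields canonical members only; the OCEANBASE alias is skipped.
print(sorted(e.name for e in DatabaseLock))  # ['MYSQL', 'POSTGRES']
# __members__ maps every defined name, aliases included, to its member.
print(sorted(DatabaseLock.__members__))  # ['MYSQL', 'OCEANBASE', 'POSTGRES']
# The alias resolves to the canonical member object.
print(DatabaseLock.OCEANBASE is DatabaseLock.MYSQL)  # True


def test_database_lock_enum_values():
    """Test DatabaseLock defines all expected names, aliases included."""
    expected = {"MYSQL", "OCEANBASE", "POSTGRES"}
    actual = set(DatabaseLock.__members__)
    assert expected.issubset(actual), f"Missing: {expected - actual}"


test_database_lock_enum_values()
```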
2026-02-03 17:28:53 +08:00
Liu An
1b587013d8 Fix: remove unused imports and f-string formatting (#12935)
### What problem does this PR solve?

- Remove unused imports (Mock, patch, MagicMock, json, os,
RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files
- Replace f-string formatting with regular strings for console output
messages in cli.py
- Clean up unnecessary imports that were no longer being used in the
codebase
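
The description does not say which f-strings were replaced. This sketch assumes they were f-strings with no `{}` placeholders, which Pyflakes and Ruff report as F541. A small `ast`-based helper can locate them. `find_placeholderless_fstrings` is a hypothetical name and is not part of the PR.

```python
import ast


def find_placeholderless_fstrings(source: str) -> list[int]:
    """Return line numbers of f-strings that contain no {placeholders}."""
    tree = ast.parse(source)
    # Format specs (the ">10" in f"{x:>10}") are also JoinedStr nodes;
    # collect them so they are not mistaken for standalone f-strings.
    format_specs = {
        id(node.format_spec)
        for node in ast.walk(tree)
        if isinstance(node, ast.FormattedValue) and node.format_spec is not None
    }
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.JoinedStr) and id(node) not in format_specs:
            # No FormattedValue means the f-string never interpolates anything.
            if not any(isinstance(v, ast.FormattedValue) for v in node.values):
                lines.append(node.lineno)
    return sorted(lines)


sample = 'name = "x"\nprint(f"Done")\nprint(f"Hello {name:>10}")\n'
print(find_placeholderless_fstrings(sample))  # [2]
```

Each flagged literal can be rewritten as a plain string (for example, `f"Done"` becomes `"Done"`) without changing the output.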

### Type of change

- [x] Refactoring
2026-02-02 12:11:39 +08:00
NTLx
c4c3f744c0 feat: add Peewee ORM support for OceanBase as primary database (#12769) (#12926)
## Summary

This PR adds Peewee ORM support for OceanBase as the primary database in
RAGFlow, as requested in issue #12769.

## Changes

### Core Implementation

1. **RetryingPooledOceanBaseDatabase Class**
   - Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible)
   - Implements retry mechanism for connection issues
   - Handles MySQL-specific error codes (2013, 2006 for connection loss)
   - Provides connection pool management

2. **PooledDatabase Enum**
   - Added `OCEANBASE = RetryingPooledOceanBaseDatabase`

3. **DatabaseLock Enum**
   - Added `OCEANBASE = MysqlDatabaseLock`
   - OceanBase uses MySQL-style locking

4. **TextFieldType Enum**
   - Added `OCEANBASE = "LONGTEXT"`
   - OceanBase uses same text field type as MySQL

5. **DatabaseMigrator Enum**
   - Added `OCEANBASE = MySQLMigrator`
   - OceanBase uses MySQL migration tools
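
The retry behaviour in item 1 can be sketched without a database. This illustration makes several assumptions. `DriverOperationalError` stands in for the driver's error type and carries the MySQL error code in `args[0]`. The names `run_with_retry`, `max_retries`, and `retry_delay`, and the exponential backoff, are hypothetical choices, not the PR's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# MySQL client error codes the PR treats as connection loss:
# 2006 = "MySQL server has gone away", 2013 = "Lost connection to MySQL server during query".
CONNECTION_LOST_CODES = {2006, 2013}


class DriverOperationalError(Exception):
    """Hypothetical stand-in for a MySQL driver error whose args[0] is the error code."""


def run_with_retry(
    operation: Callable[[], T],
    reconnect: Callable[[], None],
    max_retries: int = 5,
    retry_delay: float = 1.0,
) -> T:
    """Run operation; on a connection-loss error, back off, reconnect, and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except DriverOperationalError as exc:
            code = exc.args[0] if exc.args else None
            # Any other error, or running out of retries, propagates unchanged.
            if code not in CONNECTION_LOST_CODES or attempt >= max_retries:
                raise
            # Exponential backoff: retry_delay, 2*retry_delay, 4*retry_delay, ...
            time.sleep(retry_delay * (2 ** attempt))
            reconnect()
            attempt += 1


# Usage: a query that loses its connection twice, then succeeds.
calls = {"query": 0, "reconnect": 0}


def flaky_query() -> str:
    calls["query"] += 1
    if calls["query"] < 3:
        raise DriverOperationalError(2013, "Lost connection to MySQL server during query")
    return "ok"


def fake_reconnect() -> None:
    calls["reconnect"] += 1


print(run_with_retry(flaky_query, fake_reconnect, retry_delay=0.0))  # ok
print(calls)  # {'query': 3, 'reconnect': 2}
```

Only the two connection-loss codes trigger a retry. Errors such as SQL syntax errors surface on the first attempt instead of being masked by reconnects.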

### Usage

```bash
# Set environment variable to use OceanBase
export DB_TYPE=oceanbase

# Configure connection (in docker/.env or environment)
OCEANBASE_HOST=localhost
OCEANBASE_PORT=2881
OCEANBASE_USER=root
OCEANBASE_PASSWORD=password
OCEANBASE_DATABASE=ragflow
```
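
This sketch uses stand-in classes to show how `DB_TYPE` could select the pool class. It assumes two things: the value is looked up by enum member name, and connection settings follow the `<DB_TYPE>_*` naming shown above. `open_database` and the stand-in classes are hypothetical. RAGFlow's actual wiring in `api/db/db_models.py` and its settings may differ.

```python
import os
from collections.abc import Mapping
from enum import Enum
from typing import Optional


class PooledMySQLDatabase:
    """Hypothetical stand-in for playhouse.pool.PooledMySQLDatabase (not the real class)."""

    def __init__(self, database: str, **connect_kwargs):
        self.database = database
        self.connect_kwargs = connect_kwargs


class RetryingPooledMySQLDatabase(PooledMySQLDatabase):
    """Stand-in for the existing MySQL pool class."""


class RetryingPooledOceanBaseDatabase(PooledMySQLDatabase):
    """Stand-in for the new OceanBase pool class (MySQL-compatible)."""


class PooledDatabase(Enum):
    MYSQL = RetryingPooledMySQLDatabase
    OCEANBASE = RetryingPooledOceanBaseDatabase


def open_database(env: Optional[Mapping[str, str]] = None) -> PooledMySQLDatabase:
    """Choose the pool class from DB_TYPE and read <DB_TYPE>_* connection settings."""
    env = os.environ if env is None else env
    db_type = env.get("DB_TYPE", "mysql").upper()
    # Enum lookup by member name: "oceanbase" -> PooledDatabase.OCEANBASE.
    db_class = PooledDatabase[db_type].value
    return db_class(
        env[f"{db_type}_DATABASE"],
        host=env[f"{db_type}_HOST"],
        port=int(env[f"{db_type}_PORT"]),
        user=env[f"{db_type}_USER"],
        password=env.get(f"{db_type}_PASSWORD", ""),
    )


# Usage with the settings from the example above.
db = open_database({
    "DB_TYPE": "oceanbase",
    "OCEANBASE_HOST": "localhost",
    "OCEANBASE_PORT": "2881",
    "OCEANBASE_USER": "root",
    "OCEANBASE_PASSWORD": "password",
    "OCEANBASE_DATABASE": "ragflow",
})
print(type(db).__name__, db.connect_kwargs["port"])  # RetryingPooledOceanBaseDatabase 2881
```

Because the class is chosen by enum name, supporting OceanBase only requires adding members to the enums, which matches the PR's list of changes.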

### Technical Details

- **Location**: `api/db/db_models.py`
- **Dependencies**: No new dependencies (uses existing Peewee MySQL
support)
- **Code Size**: ~90 lines
- **Difficulty**: Simple

### Testing

- Added comprehensive unit tests in
`tests/unit/test_oceanbase_peewee.py`
- Tests cover:
  - OceanBase database class existence and inheritance
  - Enum values for PooledDatabase, DatabaseLock, TextFieldType
  - Initialization with custom retry settings
  - Environment variable configuration

### Acceptance Criteria

- Can switch to the OceanBase database via the `DB_TYPE=oceanbase` environment variable
- All database operations work normally in an OceanBase environment
- OceanBase uses MySQL compatibility mode (no additional dependencies)

### Background

This is part of the RAGFlow + OceanBase Hackathon to allow users to
choose OceanBase as RAGFlow's primary database, leveraging OceanBase's
high availability and scalability.

---

## Related Issues
- **Primary**: https://github.com/infiniflow/ragflow/issues/12769
- **Context**: https://github.com/oceanbase/seekdb/issues/123 (OceanBase
Developer Challenge)

---

Closes infiniflow/ragflow#12769
2026-01-31 15:45:20 +08:00