Files
ragflow/web/src/interfaces/request/document.ts
CaptainTimon 2717ee283f feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679)
### What problem does this PR solve?

Closes #14674.

This PR improves RAPTOR configuration and tree construction while
preserving the existing RAPTOR behavior as the default.

RAPTOR currently builds summary layers with the original UMAP + GMM
clustering path. This PR keeps that default path, and adds:

- A hidden backend tree-builder option:
  - `tree_builder="raptor"`: default, existing RAPTOR behavior.
- `tree_builder="psi"`: rank-aware Psi-style tree builder using original
embedding-space cosine ranking.
- A user-facing clustering method option for the default RAPTOR builder:
  - `clustering_method="gmm"`: existing default.
- `clustering_method="ahc"`: agglomerative hierarchical clustering path.
- A RAPTOR UI setting for `Clustering method` and `Max cluster`.

### What changed

#### Backend

- Added `tree_builder` support for RAPTOR/Psi.
- Added `clustering_method` support for GMM/AHC.
- Kept existing RAPTOR + GMM as the default.
- Added Psi tree building from original-space cosine similarity.
- Added bucketed Psi building controls for large inputs:
  - `raptor.ext.psi_exact_max_leaves`
  - `raptor.ext.psi_bucket_size`
- Added method-aware RAPTOR summary metadata using existing
`extra.raptor_method`.
- Avoided adding a dedicated DB schema field for experimental method
tracking.
- Added cleanup/migration logic to avoid mixing stale RAPTOR summary
trees.
- Added defensive checks for Psi tree construction and summary failures.

#### Frontend/UI

- Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`.
- Added/kept `Max cluster` in RAPTOR settings.
- Enlarged max cluster UI limit to `1024`, matching backend validation.
- Kept AHC editable even when a RAPTOR task has already finished.
- Fixed the UI save payload so `clustering_method` and `tree_builder`
are serialized through `parser_config.raptor.ext`, avoiding backend
validation errors for extra top-level RAPTOR fields.

Example saved RAPTOR config:

```json
{
  "raptor": {
    "max_cluster": 317,
    "ext": {
      "clustering_method": "ahc",
      "tree_builder": "raptor"
    }
  }
}

Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>
2026-05-12 09:42:31 +08:00

50 lines
1.1 KiB
TypeScript

export interface IChangeParserConfigRequestBody {
pages?: number[][];
chunk_token_num?: number;
layout_recognize?: string;
task_page_size?: number;
delimiter?: string;
auto_keywords?: number;
auto_questions?: number;
html4excel?: boolean;
toc_extraction?: boolean;
image_table_context_window?: number;
image_context_size?: number;
table_context_size?: number;
raptor?: {
use_raptor?: boolean;
prompt?: string;
max_token?: number;
threshold?: number;
max_cluster?: number;
random_seed?: number;
scope?: string;
clustering_method?: 'gmm' | 'ahc';
tree_builder?: 'raptor' | 'psi';
};
// Metadata fields
metadata?: Array<{
key?: string;
description?: string;
enum?: string[];
}>;
built_in_metadata?: Array<{
key?: string;
description?: string;
enum?: string[];
}>;
enable_metadata?: boolean;
}
export interface IChangeParserRequestBody {
parser_id: string;
pipeline_id?: string;
doc_id?: string;
parser_config: IChangeParserConfigRequestBody;
}
export interface IDocumentMetaRequestBody {
documentId: string;
meta: string; // json format string
}