mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-07-04 18:45:38 +08:00
### What problem does this PR solve? ## Problem During the REST API refactoring (#13690), the `/api/v2/kb/check_embedding` endpoint was removed and never migrated to the new RESTful structure. The frontend was pointed to the `/api/v1/datasets/{id}/embedding` endpoint (which is `run_embedding` — a completely different function). Additionally, a hard guard was introduced that rejects any `embd_id` change when `chunk_num > 0`, making it impossible to switch embedding models on datasets with existing chunks. ## Root Cause 1. **Missing endpoint**: The old `check_embedding` logic (sample random chunks, re-embed with the new model, compare cosine similarity) was not carried over to the new REST API service layer. 2. **Wrong frontend URL**: `checkEmbedding` in `api.ts` pointed to `/datasets/{id}/embedding` (`run_embedding`) instead of a dedicated check endpoint. 3. **Overly restrictive guard**: `dataset_api_service.py` line 310 blocked all `embd_id` updates when `chunk_num > 0`. This check did not exist in the pre-refactor code — it was incorrectly introduced during the refactor. ## Changes ### Backend - **`api/apps/services/dataset_api_service.py`** - Remove the `chunk_num > 0` hard guard on `embd_id` updates - Add `check_embedding()` service function: samples random chunks, re-embeds them with the candidate model, computes cosine similarity, returns compatibility result (avg ≥ 0.9 = compatible) - Add `import re` for the `_clean()` helper - **`api/apps/restful_apis/dataset_api.py`** - Add `POST /datasets/<dataset_id>/embedding/check` endpoint following the new REST API conventions - Clean up unused top-level imports (`random`, `re`, `numpy`) ### Frontend - **`web/src/utils/api.ts`** - Fix `checkEmbedding` URL from `/datasets/${datasetId}/embedding` → `/datasets/${datasetId}/embedding/check` ### Tests - **`test/testcases/test_http_api/test_dataset_management/test_update_dataset.py`** - Update `test_embedding_model_with_existing_chunks` to assert success (`code == 0`) instead of expecting the old `102` error - **`test/testcases/test_web_api/test_dataset_management/test_dataset_sdk_routes_unit.py`** - Update `test_update_route_branch_matrix_unit` to assert `RetCode.SUCCESS` when updating `embd_id` on a chunked dataset, replacing the old `chunk_num` error assertion ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com>
(1). Deploy RAGFlow services and images
https://ragflow.io/docs/build_docker_image
(2). Configure the required environment for testing
Install Python dependencies (including test dependencies):
uv sync --python 3.12 --only-group test --no-default-groups --frozen
Activate the environment:
source .venv/bin/activate
Install SDK:
uv pip install sdk/python
Modify the .env file: Add the following code:
COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
TEI_MODEL=BAAI/bge-small-en-v1.5
RAGFLOW_IMAGE=infiniflow/ragflow:v0.25.1 #Replace with the image you are using
Start the container(wait two minutes):
docker compose -f docker/docker-compose.yml up -d
(3). Test Elasticsearch
a) Run sdk tests against Elasticsearch:
export HTTP_API_TEST_LEVEL=p2
export HOST_ADDRESS=http://127.0.0.1:9380 # Ensure that this port is the API port mapped to your localhost
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run http api tests against Elasticsearch:
pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
(4). Test Infinity
Modify the .env file:
DOC_ENGINE=${DOC_ENGINE:-infinity}
Start the container:
docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d
a) Run sdk tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
b) Run http api tests against Infinity:
DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api