Commit Graph

1495 Commits

Author SHA1 Message Date
buua436
aa4526266f Refa: migrate MCP APIs to RESTful api (#14317)
### What problem does this PR solve?

migrate MCP APIs to RESTful api

### Type of change

- [x] Refactoring
2026-04-23 12:51:27 +08:00
Jack
dbf8c6ed90 Refactor: Doc metadata update (#14289)
### What problem does this PR solve?

Before migration
Web API: POST /v1/document/metadata/update

After migration, Restful API
PATCH /api/v2/datasets/<dataset_id>/documents/metadatas 

### Type of change

- [x] Refactoring
2026-04-23 12:04:34 +08:00
Wang Qi
aae45b959b Refactor: API file2document (#14306)
Refactor: API file2document
2026-04-23 11:40:45 +08:00
Wang Qi
e79b896637 Refactor: REST API langfuse api-key (#14315)
REST API langfuse api-key
2026-04-23 11:36:16 +08:00
Wang Qi
01753b8f31 Refactor: API connectors (#14228)
### What problem does this PR solve?

Refactor /api/v1/connectors to be more RESTful.

### Type of change
- [x] Refactoring
2026-04-22 20:42:41 +08:00
Jack
c08cd8e090 Refactor: Migrate document metadata config update API (#14286)
### What problem does this PR solve?

Before migration
Web API: POST /v1/document/update_metadata_setting

After consolidation, Restful API
PUT
/api/v1/datasets/<dataset_id>/documents/<document_id>/metadata/config

### Type of change

- [x] Refactoring
2026-04-22 20:01:31 +08:00
Magicbook1108
d1c62fc19d Refact: Tenant api (#14288)
### What problem does this PR solve?

Refact: Tenant api

### Type of change

- [x] Refactoring
2026-04-22 20:00:32 +08:00
Wang Qi
61d756e1b5 Fix #14213 create folder does not accept FOLDER (#14276)
### What problem does this PR solve?

As description.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-22 11:55:10 +08:00
buua436
ff29484d42 fix: normalize think tags in final chat answer (#14271)
### What problem does this PR solve?

normalize think tags in final chat answer

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-22 11:15:08 +08:00
Jack
3d8a82c0aa Refactor: Consolidation WEB API & HTTP API for document delete api (#14254)
### What problem does this PR solve?

Before consolidation
Web API: POST /v1/document/rm
Http API - DELETE /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- DELETE
/api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-22 10:49:52 +08:00
buua436
6baf74afc1 Refa: align chat and search restful APIs (#14229)
### What problem does this PR solve?

Refactor /api/v1/chats to be more RESTful.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-22 10:49:11 +08:00
Jack
2d05475693 Refactor: Consolidation WEB API & HTTP API for document infos (#14239)
### What problem does this PR solve?

Before consolidation
Web API: POST /v1/document/infos
Http API - GET /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2

### Type of change

- [ ] Refactoring
2026-04-21 19:35:11 +08:00
Jack
009e538a4e Refactor: Consolidation WEB API & HTTP API for document get_filter (#14248)
### What problem does this PR solve?

Before consolidation
Web API: POST /v1/document/filter
Http API - GET /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents?type=filter
### Type of change

- [x] Refactoring
2026-04-21 18:55:30 +08:00
Liu An
6e33d8722f Revert "Fix: forwarding highlight param" (#14249)
Reverts infiniflow/ragflow#14112
2026-04-21 15:23:18 +08:00
NeedmeFordev
78c3583964 Fix memory resolution regression for multimodal Gemini models (#14209)
### What problem does this PR solve?

Fixes #14206.

This issue is a regression. PR #9520 previously changed Gemini models
from `image2text` to `chat` to fix chat-side resolution, but PR #13073
later restored those Gemini entries to `image2text` during model-list
updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and
advertise both `CHAT` and `IMAGE2TEXT`, while tenant model resolution
still depends on a single stored `model_type`. That makes chat-only
flows such as memory extraction fragile when a compatible model is
stored as `image2text`.

This PR fixes the issue at the model resolution layer instead of
changing `llm_factories.json` again:
- keep the stored tenant model type unchanged
- try exact `model_type` lookup first
- if no exact match is found, fall back only when the model metadata
shows the requested capability is supported
- coerce the runtime config to the requested type for chat callers
- fail fast in memory creation instead of silently persisting
`tenant_llm_id=0`

This preserves existing multimodal and `image2text` behavior while
restoring chat compatibility for memory-related flows.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Testing

- Re-checked the current memory creation and memory message extraction
paths against the updated resolution logic
- Verified locally that a Gemini-style tenant model stored as
`image2text` but tagged with `CHAT` can still be resolved for `chat`
- Verified `get_model_config_by_type_and_name(..., CHAT, ...)` returns a
chat-compatible runtime config
- Verified `get_model_config_by_id(..., CHAT)` also returns a
chat-compatible runtime config
- Verified strict resolution still fails when the model metadata does
not advertise chat capability
2026-04-20 16:37:36 +08:00
Jack
939933649a Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176)
### What problem does this PR solve?

Before consolidation
Web API: POST /v1/document/list
Http API - GET /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-20 14:54:40 +08:00
Lynn
c3387cd5b8 Fix: parent child config (#14199)
### What problem does this PR solve?

Correctly set and display parent-child config in parser_config, and
allow to pass `tenant_id` in PATCH `/api/v1/chats`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:02:42 +08:00
Daniil Sivak
22c6648348 Fix: forwarding highlight param (#14112)
Closes #9078

### What problem does this PR solve?

The `retrieval_test` endpoint in `chunk_app.py` never forwarded the
`highlight` request parameter to `retriever.retrieval()`, so the search
engine never produced highlight snippets. Additionally, the frontend
always rendered `content_with_weight` instead of preferring the
`highlight` field, and the CSS rule color `var(--accent-primary)` didn't
work because the variable stores an RGB triplet `(45,212,191)` requiring
the `rgb()` wrapper.

### Before

- Search page: displayed raw content_with_weight as a wall of plain
white text with no term highlighting, including markdown headings
rendered as literal text
- Retrieval testing page: showed `content_with_weight` in a plain `<p>`
tag, no `<em>` tags rendered, no highlight coloring
- Children chunks: when child chunks were consolidated into a parent via
`retrieval_by_children`, any highlight data from children was discarded
- TOC chunks: chunks fetched via `retrieval_by_toc` had no `highlight`
field, appearing as plain text while other chunks had highlights

**Retrieval testing**:
<img width="1449" height="1178"
alt="before-retrieval-no-highlight-cropped"
src="https://github.com/user-attachments/assets/5c6f5a5e-6c11-461a-bdb4-049d7dfb7a33"
/>

**Search**:
<img width="1378" height="711" alt="before-search-no-highlight-cropped"
src="https://github.com/user-attachments/assets/be7b5152-72ef-40da-a8fd-921e997ae7d3"
/>

### After

- Search page: displays the highlight field with search terms rendered
in teal/cyan color (`rgb(var(--accent-primary))`)
- Retrieval testing page: sends highlight: true in the request, uses
`HighLightMarkdown` component to render `<em>` tags with proper coloring
- Children chunks: highlights from child chunks are joined and preserved
on the parent
- TOC chunks: when other chunks have highlights, TOC-fetched chunks use
`content_with_weight` as a highlight fallback

**Retrieval testing**:
<img width="1410" height="1015" alt="05-retrieval-testing-results"
src="https://github.com/user-attachments/assets/f0cff8cf-0962-4320-b559-cd5037f622d2"
/>

**Search**:
<img width="1294" height="455" alt="03-search-highlight-results"
src="https://github.com/user-attachments/assets/a90e0e3e-3837-46be-8ddd-2412ff7cbc19"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 20:59:20 +08:00
Wang Qi
28d8b1c883 [Fix] trivial fix log creation (#14181)
### What problem does this PR solve?

Trivial fix log creation, follow on PR:
https://github.com/infiniflow/ragflow/pull/14136

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 13:13:41 +08:00
Magicbook1108
797aa6076a Fix: keyword extraction (#14177)
### What problem does this PR solve?

Fix: keyword extraction

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 11:32:48 +08:00
Magicbook1108
ea8de1bb47 Fix: different llm in chat (#14162)
### What problem does this PR solve?

Fix: different llm in chat

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:01 +08:00
Lynn
655dd2f8c6 Fix: simplify _load_user (#14154)
### What problem does this PR solve?

Simplify _load_user, remove unused fallback.

### Type of change

- [x] Refactoring
2026-04-16 18:47:43 +08:00
euvre
9a785b26bd fix: change file size column from IntegerField to BigIntegerField to support files > 2GB (#14148)
### What problem does this PR solve?

Fixes #6034

Changes the `size` field in both `Document` and `File` models from
`IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max
~9.2EB), and adds corresponding database migrations.

## Problem

When uploading a file larger than 2GB, the `size` value overflows a
32-bit signed integer (max 2,147,483,647). This causes:

- The stored `size` wraps around to an incorrect value (e.g., a 3GB file
shows as 2,097,152 KB in File Management).
- Subsequent file operations (e.g., download) fail because the corrupted
size leads to invalid storage lookups.

## Changes

- `Document.size`: `IntegerField` → `BigIntegerField`
- `File.size`: `IntegerField` → `BigIntegerField`
- Added `alter_db_column_type` migrations in `migrate_db()` for both
`document.size` and `file.size` columns to ensure existing deployments
are upgraded automatically.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-16 15:43:29 +08:00
Qi Wang
969ce3a79f [Bug fix #14133] fix graph rag, raptor, mindmap log cannot show correctly in UI (#14136)
### What problem does this PR solve?
Fix #14133, knowledge graph, raptor, mindmap log cannot show correctly
in UI
<img width="1930" height="982" alt="Image"
src="https://github.com/user-attachments/assets/d2f8e6c1-d82d-4b00-a377-949aada545ca"
/>
After Fix:
<img width="2108" height="805" alt="image"
src="https://github.com/user-attachments/assets/b37426c1-83d3-4a32-a83c-9d340d69e0e6"
/>
<img width="2173" height="1067" alt="image"
src="https://github.com/user-attachments/assets/30105222-3310-43a0-9f83-1e320d05e413"
/>

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 13:08:36 +08:00
Daniil Sivak
c93ec0a1f3 Fix: reject empty/space-only content in update_chunk API (#14082)
Closes #6541

### What problem does this PR solve?

Add content validation to `update_chunk` (SDK and non-SDK) to reject
empty or whitespace-only content before it reaches the embedding model.

**Before:** Calling `update_chunk` with space-only content (like `" "`,
`""`, `"\n"`) bypassed validation and was sent directly to the embedding
model, which returned an error. This was the same bug previously fixed
for `add_chunk` in #6390, but `update_chunk` was missed.

**After:** Empty/whitespace-only content is caught by validation and
returns an error: `` `content` is required ``

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-15 18:43:53 +08:00
euvre
3364d86e6b Auto-inject knowledge parameter in async_chat when prompt_config is missing it (#14121)
### What problem does this PR solve?

Resolve #14115 .

## Problem

On the shared chat link page (`/chats/share?shared_id=...`), querying
the knowledge base returns "no relevant information was found", while
the same query works correctly on the editor chat page.

## Root Cause

Knowledge base retrieval in `async_chat()` is gated by the check `if
"knowledge" in param_keys` (line 598), where `param_keys` is derived
from `prompt_config["parameters"]`. If `parameters` is empty or missing
the `{"key": "knowledge", "optional": false}` entry, retrieval is
entirely skipped.

This can happen because `_apply_prompt_defaults()` — which ensures
`parameters` contains the `knowledge` entry — is only called in the
`create` (POST) and `update_chat` (PUT) handlers, but **not** in
`patch_chat` (PATCH). If a chat's `prompt_config` was updated via PATCH
without including `parameters`, the `knowledge` entry would be absent.
Additionally, `prompt_config["parameters"]` would raise a `KeyError` if
the key was missing entirely.

## Fix

Added a defensive safety net in `async_chat()`
(`api/db/services/dialog_service.py`) that auto-injects the `knowledge`
parameter when:
- `dialog.kb_ids` is set (knowledge bases are configured)
- `"knowledge"` is not already in `param_keys`
- `{knowledge}` placeholder exists in the system prompt

Also changed `prompt_config["parameters"]` to
`prompt_config.get("parameters", [])` to prevent `KeyError` when the key
is absent.

## Files Changed

- `api/db/services/dialog_service.py` — added auto-injection of
`knowledge` parameter and safe `.get()` access for `parameters`


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-15 17:31:31 +08:00
Ea001
38cefd88e2 Fix tag_feas code injection in retrieval ranking (#13923)
## Summary
- remove eval-based parsing from retrieval rank feature scoring
- validate `tag_feas` at write time in chunk APIs and SDK routes
- add regression tests for safe parsing and malicious payload rejection

## Details
`tag_feas` is intended to be structured rank-feature data, but the
retrieval ranking path was evaluating stored values as Python
expressions. This change treats `tag_feas` strictly as data.

### What changed
- replace `eval()` in `rag/nlp/search.py` with safe parsing via
`json.loads()` and optional `ast.literal_eval()` compatibility for
legacy Python-dict strings
- strictly filter parsed values down to `dict[str, finite number]`
- reject invalid `tag_feas` payloads at write time in web chunk routes
and SDK document chunk routes
- add focused regression tests to prove executable strings are ignored
and invalid payloads are rejected

## Validation
- `python -m pytest test/unit_test/common/test_tag_feature_utils.py
test/unit_test/rag/test_rank_feature_scores.py -q`

---------

Co-authored-by: unknown <zhenglinkai@CCN.Local>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-04-15 16:31:11 +08:00
Eden
1f33ca1099 fix(dialog): restore decorated answer in async_ask final SSE event (#13917)
## What's the problem

Both `async_chat()` and `async_ask()` call `decorate_answer()` to build
the final SSE payload — it inserts citation markers (`##N$$`) into the
answer text and prunes `doc_aggs` to only the cited documents.
Immediately after, both functions overwrite `final["answer"]` with `""`:

```python
# async_chat(), line ~774  (issue #13828)
final = decorate_answer(thought + full_answer)
final["final"] = True
final["audio_binary"] = None
final["answer"] = ""   # discards decorated text
yield final

# async_ask(), line ~1444  (same bug, different path)
final = decorate_answer(full_answer)
final["final"] = True
final["answer"] = ""   # discards decorated text
yield final
```

The client receives filtered references (built for a citation-decorated
answer it never sees) while displaying the raw, undecorated streaming
text. Citations can never match.

## Root cause

`final["answer"] = ""` was left over from an earlier design where
clients were meant to reconstruct the full answer purely from delta
events. Once `decorate_answer()` started placing citation markers, this
blank-out broke the contract: the final event is where the decorated
answer should land.

## Fix

Remove the two blank-override lines — one in `async_chat()`, one in
`async_ask()`:

```diff
-    final["answer"] = ""
```

`decorate_answer()` already sets `final["answer"]` to the correct
decorated string; there is nothing to override.

## Relation to #13828

Issue #13828 and PR #13835 identify the bug in `async_chat()`. This PR
absorbs that fix and also corrects the identical pattern in
`async_ask()` (used by the `/retrieval` route in `chat_api.py`), which
PR #13835 does not touch.

## Regression test

Added
`test/unit_test/api/db/services/test_dialog_service_final_answer.py`
with three tests:

| Test | Purpose |
|------|---------|
| `test_buggy_pattern_drops_answer` | Documents the old behaviour:
blank-override empties the final answer |
| `test_fixed_pattern_preserves_decorated_answer` | Core invariant:
final event carries the decorated text from `decorate_answer()` |
| `test_final_event_reference_matches_decorated_result` | Citation
markers in the answer must match the pruned `doc_aggs` in the same event
|

Local run result:

```
test_dialog_service_final_answer.py::test_buggy_pattern_drops_answer         PASSED
test_dialog_service_final_answer.py::test_fixed_pattern_preserves_decorated_answer PASSED
test_dialog_service_final_answer.py::test_final_event_reference_matches_decorated_result PASSED

3 passed in 0.04s
```

`ruff check` passes with no issues on all changed files.

---------

Co-authored-by: edenfunf <edenfunf@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-15 14:10:36 +08:00
Jack
bc5f78996b Consolidateion of document upload API (#14106)
### What problem does this PR solve?

Consolidation WEB API & HTTP API for document upload

Before consolidation
Web API: POST /v1/document/upload
Http API - POST /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- POST
/api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-15 11:27:43 +08:00
akie
a98b64326c Add warning log when metadata query hits 10000 result limit (#14109)
## What problem does this PR solve?

Add a warning log when `get_flatted_meta_by_kbs` returns 10,000 results,
which indicates the query limit has been reached and metadata may be
silently truncated.


## Type of change
- [x] Improvement (non-breaking change which improves observability)
2026-04-14 20:04:32 +08:00
NeedmeFordev
1a1b5aa53e Fix: respect the internet toggle before running Tavily web search (#14051) (#14052)
### What problem does this PR solve?

Fixes #14051.

The chat UI already sends an `internet` flag with each request, but the
backend previously triggered Tavily web retrieval whenever
`prompt_config.tavily_api_key` was configured. As a result, web search
could still run even when the internet toggle was off.

This PR makes web search an explicit opt-in at request time:
- `tavily_api_key` only indicates that web search is available
- Tavily retrieval runs only when `internet` is explicitly enabled
- the same behavior now applies to both the normal retrieval path and
the deep-research / reasoning path

This also fixes the no-KB fallback case so chats without KBs fall back
to normal solo chat when `internet` is off.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 19:55:20 +08:00
Jin Hai
8e9cef3687 Remove unused API (#14046)
### What problem does this PR solve?

1. Remove unused token related API
2. Fix typo

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 19:32:16 +08:00
Jack
576431de99 Refactor: Change update doc from PUT to patch (#14067)
### What problem does this PR solve?

Before change, update_document in api/apps/restful_apis/document_api.py
is using "PUT".
After change, it will use "PATCH" which is more suitable.

### Type of change

- [x] Refactoring
2026-04-14 17:12:23 +08:00
Qi Wang
57aec2e65d Fix bug: run Knowledge graph or RAPTOR, it will update an existing task (#14102)
### What problem does this PR solve?

It fixed the bug: https://github.com/infiniflow/ragflow/issues/14101
When run Knowledge graph or RAPTOR, the last document running status
will be wrongly set, see below:
It should never touch existing document result.

![Image](https://github.com/user-attachments/assets/14fe1f9e-0541-4093-8111-ed0bd25b87ba)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 16:37:41 +08:00
Magicbook1108
1376c004a9 Fix: update docs generator (#14070)
### What problem does this PR solve?

Refactor: update docs generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

1. Support multiple document generator components and correctly display
messages in the message component. The document generator will not
overwrite other messages.

<img width="700" alt="Screenshot from 2026-04-13 13-56-17"
src="https://github.com/user-attachments/assets/3f3e06e8-33ce-4df1-8b05-510c86af70a4"
/>

2. Support Chinese content and ensure correct Markdown rendering in PDF
and DOCX
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/69bf1f7b-261d-48e5-a9f3-8e94462b90ed"
/>

3. Simplify configuration page and support more output format
 
<img height="700" alt="image"
src="https://github.com/user-attachments/assets/8647374c-c055-4daa-ad71-cd9052eb138e"
/>

4. Hide download from other components except for message 
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a723dfcb-b60d-4eb5-b2f6-d41ca5955eb4"
/>

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a8762ac4-807b-4f0b-9287-65f82f7c9c98"
/>

5. Sanitize filename
 
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/df49509f-37c0-40f9-b03d-bd6ce7fdefa8"
/>


6. And more changes on usability
2026-04-14 15:24:43 +08:00
Jin Hai
2b6c50734f Sync code from EE (#14080)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 15:03:46 +08:00
bitloi
853021ff2a feat: support multiple canvas_types for agent templates and remove duplicate files (#14030)
### What problem does this PR solve?

Closes #13907

The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.

This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.

What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
  - accepts legacy `canvas_type`
  - accepts new `canvas_types`
  - merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
  - `deep_search_r.json`
  - `reflective_academic_paper_generator_r.json`
  - `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-04-13 20:26:30 +08:00
Jack
51ce6aab01 Consolidate set_meta into update_document (#14045)
### What problem does this PR solve?

Consolidate "set_meta" API into "update_document" .

Before consolidation
Web API: POST /api/v1/document/set_meta
Http API - PUT /v1/datasets/<dataset_id>/document/<document_id>

After consolidation, Restful API -- PUT
/v1/datasets/<dataset_id>/document/<document_id>

### Type of change

- [x] Refactoring
2026-04-13 12:47:17 +08:00
Jack
4046a4cfb6 Consolidateion metadata summary API (#14031)
### What problem does this PR solve?

Consolidation WEB API & HTTP API for document metadata summary

Before consolidation
Web API: POST /api/v1/document/metadata/summary
Http API - GET /v1/datasets/<dataset_id>/metadata/summary

After consolidation, Restful API -- GET
/v1/datasets/<dataset_id>/metadata/summary

### Type of change

- [x] Refactoring
2026-04-10 18:41:30 +08:00
Zhichang Yu
a9ca4ea1a1 Disable flask and quart debug (#14042)
### What problem does this PR solve?

Visit
`http://127.0.0.1:9381/?__debugger__=yes&cmd=resource&f=debugger.js`
will expose the flask code:
```
docReady(() => {
  if (!EVALEX_TRUSTED) {
    initPinBox();
  }
  // if we are in console mode, show the console.
  if (CONSOLE_MODE && EVALEX) {
    createInteractiveConsole();
  }

  const frames = document.querySelectorAll("div.traceback div.frame");
  if (EVALEX) {
    addConsoleIconToFrames(frames);
  }
  addEventListenersToElements(document.querySelectorAll("div.detail"), "click", () =>
    document.querySelector("div.traceback").scrollIntoView(false)
  );
  addToggleFrameTraceback(frames);
  addToggleTraceTypesOnClick(document.querySelectorAll("h2.traceback"));
  addInfoPrompt(document.querySelectorAll("span.nojavascript"));
  wrapPlainTraceback();
});

function addToggleFrameTraceback(frames) {
  frames.forEach((frame) => {
    frame.addEventListener("click", () => {
      frame.getElementsByTagName("pre")[0].parentElement.classList.toggle("expanded");
    });
  })
}

```

### Type of change

- [x] Other (please describe): Fix security risk
2026-04-10 18:01:49 +08:00
Jin Hai
cfc2928de2 Go: remove unused API route (#14028)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-10 18:00:41 +08:00
eason
aa92abe73c fix: close file handles properly in json.load() calls (#13997)
## Summary

Fixes #13996

Replace `json.load(open(...))` with `with open(...) as f: json.load(f)`
in two files to ensure file descriptors are properly closed.

**Affected files:**
- `common/doc_store/infinity_conn_base.py` — schema loading for Infinity
doc store
- `api/db/init_data.py` — agent template loading at startup

## Why this matters

In a long-running server process like RAGFlow, leaked file descriptors
from `json.load(open(...))` can accumulate over time. While CPython's
refcounting usually cleans these up, it's not guaranteed (especially
under memory pressure or with alternative Python runtimes), and can lead
to `OSError: [Errno 24] Too many open files`.

## Test plan

- [ ] Verify Infinity doc store schema loading still works correctly
- [ ] Verify agent templates load correctly on startup

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved file handling in internal data processing to ensure proper
resource cleanup.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:16:49 +08:00
Jin Hai
e2b879b258 Fix tiny issues (#14006)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Improved authentication error logging to better distinguish between
JWT and API token failures.
* Enhanced code documentation with clarifying comments for better
maintainability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-09 19:01:36 +08:00
Magicbook1108
8d52ef2893 Feat: enable sync deleted files for connector (#14000)
### What problem does this PR solve?

Feat: enable sync deleted files for connector
1. first comes with github

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added "sync deleted files" feature for data sources, enabling
automatic removal of files deleted from the source system.
* Added multilingual support for the new sync deleted files setting
across multiple languages.

* **UI Improvements**
  * Improved checkbox form field rendering and layout.
  * Enhanced full-width display for authentication token input fields.
2026-04-09 16:40:14 +08:00
Jack
577c96bf2a Refactor: Merge document update API (#13962)
### What problem does this PR solve?

Refactor: merge document.rename into document.update_document

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a unified document update API (PUT) supporting name, metadata,
parser/chunk settings, and status changes.

* **Breaking Changes**
* Legacy single-parameter rename endpoint removed; renames now require
dataset + document identifiers.
  * `/list` now reads dataset id from a different query parameter.

* **Validation / Bug Fixes**
* Stricter meta_fields and parser-config validation; unauthenticated
requests return 401.

* **Frontend**
  * UI now sends dataset id when saving document names.

* **Tests**
* Numerous unit and HTTP tests adjusted or removed to match new API and
validations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
Co-authored-by: Qi Wang <wangq8@outlook.com>
Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
2026-04-09 11:17:38 +08:00
Jin Hai
fa75aee3b9 Refactor system API (#13958)
### What problem does this PR solve?

- ping
- token
- log level

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* System endpoints consolidated under /api/v1/system: ping, health
check, and token management moved to the centralized API surface.
* Token management unified at /api/v1/system/tokens with
list/create/delete behavior.

* **Documentation**
  * API reference updated to reflect the new /api/v1/system paths.

* **Tests**
* Client fixtures and test utilities updated to use
/api/v1/system/tokens; one unit test for health/oceanbase status
removed.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 15:26:18 +08:00
Jin Hai
ad789f5c43 Fix list files (#13960)
### What problem does this PR solve?

As title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Standardized the query parameter used when listing documents so
listings behave consistently across the web and client interfaces.
* Clarified the error message shown when a required dataset ID is
missing to give clearer guidance to users.

* **Tests**
* Updated test coverage to reflect the standardized dataset identifier
usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 13:38:30 +08:00
dataCenter430
62a1333cf2 Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940)
…
### What problem does this PR solve?

Closes #13857

Parent-child chunking was introduced in v0.23.0 but is only configurable
through the web UI. Users managing datasets programmatically cannot
enable it via the HTTP API or Python SDK because `ParserConfig` uses
`extra="forbid"`, rejecting the `children_delimiter` field at
validation.

### What does this PR change?

Adds a `parent_child` nested config to `ParserConfig`, following the
same pattern as `raptor` and `graphrag`:

```json
"parser_config": {
  "parent_child": {
    "use_parent_child": true,
    "children_delimiter": "\n"
  }
}
```

- api/utils/validation_utils.py — new ParentChildConfig model, added to
ParserConfig
- api/utils/api_utils.py — naive defaults + flatten to
children_delimiter for the execution layer
- api/apps/services/dataset_api_service.py — flatten on the update path
- test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG
-
test/testcases/test_http_api/test_dataset_management/test_create_dataset.py
— 4 valid + 2 invalid test cases

No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py).
Existing UI flow via ext is unaffected.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added parent-child chunking configuration for dataset creation and
updates with new `use_parent_child` toggle and customizable
`children_delimiter` setting to specify how parent chunks are split into
child chunks.

* **Documentation**
* Updated HTTP and Python API references with parent-child chunking
configuration details and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:36:57 +08:00
MkDev11
cfee2bc9db feat: Auto-adjust chunk recall weights based on user feedback (#12689)
### What problem does this PR solve?

Implements automatic adjustment of knowledge base chunk recall weights
based on user feedback (upvotes/downvotes). When users upvote or
downvote a response, the system locates the corresponding knowledge
snippets and adjusts their recall weight to improve future retrieval
quality.

**Closes #12670**

**How it works:**
1. User upvotes/downvotes a response via `POST /thumbup`
2. System extracts chunk IDs from the conversation reference
3. For each referenced chunk:
   - Reads current `pagerank_fea` value from document store
   - Increments (+1) for upvote or decrements (-1) for downvote
   - Clamps weight to [0, 100] range
   - Updates chunk in ES/Infinity/OceanBase
4. Future retrievals score these chunks higher/lower based on
accumulated feedback

**Files changed:**
- `api/db/services/chunk_feedback_service.py` - New service for updating
chunk pagerank weights
- `api/apps/conversation_app.py` - Integrated feedback service into
thumbup endpoint
- `test/testcases/test_web_api/test_chunk_feedback/` - Unit tests

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Chat message feedback now updates per-chunk relevance weights
(feature-flag gated), with configurable weighting and atomic updates
across storage backends.

* **Bug Fixes**
* Stricter validation for message feedback inputs and more robust
handling of feedback transitions.

* **Tests**
* Expanded test coverage for chunk-feedback behavior, weighting
strategies, storage backends, and thumb-flip scenarios.

* **Chores**
  * CI workflow extended to run the new chunk-feedback web API tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
2026-04-08 09:52:18 +08:00
Jin Hai
931021875a Refactor system/version API to RESTful style (#13956)
### What problem does this PR solve?

Refactor version API to RESTful style. Python and go server API also
updated.
### Type of change

- [x] Refactoring



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **Refactor**
* Migrated core API endpoints to the `/api/v1/` namespace for improved
consistency and organization.
* Standardized system version, search, and chat list endpoints under the
new API versioning structure.

* **New Features**
* Added MinIO region configuration support, allowing specification of
storage engine regional settings via environment variables or
configuration files.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 19:07:47 +08:00