Commit Graph

5803 Commits

Author SHA1 Message Date
Liu An
d5c306de30 Fix: remove unit test checkpoint resume (#14216)
### What problem does this PR solve?

Remove the checkpoint resume unit test.

### Type of change

- [x] Performance Improvement
2026-04-20 11:27:40 +08:00
euvre
84b6069ec7 fix: escape single quotes in Infinity SQL filter conditions (#14186)
### What problem does this PR solve?

## Summary

Fixes #5939

Entity names containing single quotes (e.g., `投影直线L'`) caused SQL syntax
errors when building filter conditions for Infinity queries, due to
unescaped string interpolation in `equivalent_condition_to_str`.

## Changes

In `common/doc_store/infinity_conn_base.py`, added `.replace("'", "''")`
escaping for string values in two branches of
`equivalent_condition_to_str` where it was missing:

1. **`field_keyword` branch with non-list value** (line 190): The list
branch already escaped single quotes on line 183, but the single-string
branch did not.
2. **Plain string value branch** (line 209): Direct f-string
interpolation `{k}='{v}'` was vulnerable to unescaped quotes.

Both fixes use the same SQL-standard escape pattern (`'` → `''`) already
applied elsewhere in this method.
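
The escape pattern described above can be sketched as follows. This is a minimal illustration and not the code in `infinity_conn_base.py`: the helper names `escape_sql_literal` and `build_equality_condition` and the field name `entity_kwd` are hypothetical.

```python
def escape_sql_literal(value: str) -> str:
    """Double every single quote, the SQL-standard way to embed ' in a string literal."""
    return value.replace("'", "''")


def build_equality_condition(field: str, value: str) -> str:
    """Build a `field='value'` filter with the value escaped (hypothetical helper)."""
    return f"{field}='{escape_sql_literal(value)}'"


# "entity_kwd" is a made-up field name used only for this example.
print(build_equality_condition("entity_kwd", "投影直线L'"))
# entity_kwd='投影直线L'''
```

Without the escape, the trailing quote in the entity name would terminate the string literal early, which matches the syntax error described in the summary.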

## How to Test

1. Upload a document containing entity names with single quotes.
2. Enable Knowledge Graph (GraphRAG) in the parsing configuration.
3. Initiate document parsing — it should complete without SQL syntax
errors.

## Note

The original issue also reported a typo (`dge_graph_kwd` instead of
`knowledge_graph_kwd`), which has already been fixed in the current
codebase.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-20 10:04:07 +08:00
balibabu
6712b504e6 Fix: Clicking on the empty dialog box on the agent exploration page will result in an error. (#14198)
### What problem does this PR solve?

Fix: Clicking the empty dialog box on the agent exploration page caused
an error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:52:13 +08:00
Lynn
c3387cd5b8 Fix: parent child config (#14199)
### What problem does this PR solve?

Correctly set and display the parent-child config in `parser_config`, and
allow `tenant_id` to be passed in PATCH `/api/v1/chats`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:02:42 +08:00
balibabu
09622c6353 Fix: Spaces cannot be entered in the code editor of the code operator. (#14183)
### What problem does this PR solve?

Fix: Spaces cannot be entered in the code editor of the code operator.

[Monaco Editor with XYFlow fails to accept most space bar keypresses,
who is at fault?
#5204](https://github.com/microsoft/monaco-editor/discussions/5204)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:41:39 +08:00
balibabu
fa644c5a15 Fix: The embedded page for search is inaccessible. (#14194)
### What problem does this PR solve?

Fix: The embedded page for search is inaccessible.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:37:34 +08:00
chanx
60506ef7a5 fix: Add internationalization configurations related to text segmentation identifiers. (#14201)
### What problem does this PR solve?

fix: Add internationalization configurations related to text
segmentation identifiers.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:37:14 +08:00
balibabu
3a4d17cb0d Fix: The placeholder in PromptEditor is obscured. (#14179)
### What problem does this PR solve?

Fix: The placeholder in PromptEditor is obscured.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:02:41 +08:00
Daniil Sivak
22c6648348 Fix: forwarding highlight param (#14112)
Closes #9078

### What problem does this PR solve?

The `retrieval_test` endpoint in `chunk_app.py` never forwarded the
`highlight` request parameter to `retriever.retrieval()`, so the search
engine never produced highlight snippets. Additionally, the frontend
always rendered `content_with_weight` instead of preferring the
`highlight` field, and the CSS color rule `var(--accent-primary)` didn't
work because the variable stores an RGB triplet `(45,212,191)`, which
must be wrapped in `rgb()`.

### Before

- Search page: displayed raw `content_with_weight` as a wall of plain
white text with no term highlighting, with Markdown headings
rendered as literal text
- Retrieval testing page: showed `content_with_weight` in a plain `<p>`
tag, no `<em>` tags rendered, no highlight coloring
- Children chunks: when child chunks were consolidated into a parent via
`retrieval_by_children`, any highlight data from children was discarded
- TOC chunks: chunks fetched via `retrieval_by_toc` had no `highlight`
field, appearing as plain text while other chunks had highlights

**Retrieval testing**:
<img width="1449" height="1178"
alt="before-retrieval-no-highlight-cropped"
src="https://github.com/user-attachments/assets/5c6f5a5e-6c11-461a-bdb4-049d7dfb7a33"
/>

**Search**:
<img width="1378" height="711" alt="before-search-no-highlight-cropped"
src="https://github.com/user-attachments/assets/be7b5152-72ef-40da-a8fd-921e997ae7d3"
/>

### After

- Search page: displays the `highlight` field with search terms rendered
in teal/cyan color (`rgb(var(--accent-primary))`)
- Retrieval testing page: sends `highlight: true` in the request and uses
the `HighLightMarkdown` component to render `<em>` tags with proper coloring
- Children chunks: highlights from child chunks are joined and preserved
on the parent
- TOC chunks: when other chunks have highlights, TOC-fetched chunks use
`content_with_weight` as a highlight fallback

**Retrieval testing**:
<img width="1410" height="1015" alt="05-retrieval-testing-results"
src="https://github.com/user-attachments/assets/f0cff8cf-0962-4320-b559-cd5037f622d2"
/>

**Search**:
<img width="1294" height="455" alt="03-search-highlight-results"
src="https://github.com/user-attachments/assets/a90e0e3e-3837-46be-8ddd-2412ff7cbc19"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 20:59:20 +08:00
Yongteng Lei
fac46ef67f Refa: change Minimax base url to mainland by default to align with UI (#14195)
### What problem does this PR solve?

Change the default Minimax base URL to the mainland endpoint to align with the UI.

### Type of change

- [x] Refactoring
2026-04-17 19:08:57 +08:00
dependabot[bot]
b34a726acd Build(deps): Bump pypdf from 6.9.2 to 6.10.2 (#14184)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.2 to 6.10.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/releases">pypdf's
releases</a>.</em></p>
<blockquote>
<h2>Version 6.10.2, 2026-04-15</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.1...6.10.2">Full
Changelog</a></p>
<h2>Version 6.10.1, 2026-04-14</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Use new parameter names for compress_identical_objects by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.0...6.10.1">Full
Changelog</a></p>
<h2>Version 6.10.0, 2026-04-10</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Disallow custom XML entity declarations for XMP metadata (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3724">#3724</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>New Features (ENH)</h3>
<ul>
<li>Skip MD5 key derivation for AES-256 encrypted PDFs (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3694">#3694</a>)
by <a href="https://github.com/Ygnas"><code>@​Ygnas</code></a></li>
</ul>
<h3>Bug Fixes (BUG)</h3>
<ul>
<li>Use remove_orphans in compress_identical_objects (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3310">#3310</a>)
by <a href="https://github.com/j-t-1"><code>@​j-t-1</code></a></li>
<li>Fix PdfReadError when xref table contains comments before trailer
(<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3710">#3710</a>)
by <a href="https://github.com/rassie"><code>@​rassie</code></a></li>
<li>Correctly verify AES padding during decryption (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3699">#3699</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Fix stale object cache from non-authoritative object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3698">#3698</a>)
by <a
href="https://github.com/astahlman"><code>@​astahlman</code></a></li>
<li>Fix extract_links pairing when annotations include non-links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3687">#3687</a>)
by <a
href="https://github.com/ReinerBRO"><code>@​ReinerBRO</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Add AI policy (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3717">#3717</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.0">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c476b4f293"><code>c476b4f</code></a>
REL: 6.10.2</li>
<li><a
href="c50a0104cf"><code>c50a010</code></a>
SEC: Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)</li>
<li><a
href="ac734dab4e"><code>ac734da</code></a>
SEC: Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)</li>
<li><a
href="b49e7eb454"><code>b49e7eb</code></a>
REL: 6.10.1</li>
<li><a
href="62338e9d36"><code>62338e9</code></a>
SEC: Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)</li>
<li><a
href="5dcc0aebaa"><code>5dcc0ae</code></a>
DEV: Update pytest-benchmark to 5.2.3</li>
<li><a
href="b42e4aa98a"><code>b42e4aa</code></a>
DEV: Update pinned pillow and pytest where possible (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3732">#3732</a>)</li>
<li><a
href="717446b121"><code>717446b</code></a>
ROB: Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)</li>
<li><a
href="9e461d361b"><code>9e461d3</code></a>
DEV: Bump softprops/action-gh-release from 2 to 3 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3730">#3730</a>)</li>
<li><a
href="500d09d92f"><code>500d09d</code></a>
TST: Update <code>test_embedded_file__basic</code> to use
<code>tmp_path</code> fixture (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3726">#3726</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.2">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 18:43:19 +08:00
Jin Hai
94106646e7 Go: set and list default models (#14191)
### What problem does this PR solve?

```
RAGFlow(user)> set default vlm "zhipu-ai" "ccc" "glm-4.6v-flash";
SUCCESS
RAGFlow(user)> list default models;
+--------+----------------+----------------+----------------+------------+
| enable | model_instance | model_name     | model_provider | model_type |
+--------+----------------+----------------+----------------+------------+
| true   | ccc            | glm-4.6v-flash | zhipu-ai       | llm        |
| true   | ccc            | glm-4.6v-flash | zhipu-ai       | image2text |
+--------+----------------+----------------+----------------+------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-17 18:05:33 +08:00
Wang Qi
28d8b1c883 [Fix] trivial fix log creation (#14181)
### What problem does this PR solve?

Trivial fix for log creation; a follow-up to PR:
https://github.com/infiniflow/ragflow/pull/14136

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 13:13:41 +08:00
Magicbook1108
797aa6076a Fix: keyword extraction (#14177)
### What problem does this PR solve?

Fix: keyword extraction

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 11:32:48 +08:00
LeonTung
c3bf8d9d60 feat(templates): add a data analysis agent template (#14130)
### What problem does this PR solve?

Add a new agent template that demonstrates how to use the
`CodeExec` component for data analysis.

### Type of change

- [x] Other (please describe): Agent template
2026-04-17 11:32:04 +08:00
writinwaters
0df5d830d4 Refact: Updated agent template descriptions. (#14175)
### What problem does this PR solve?

Updated ingestion pipeline template descriptions for better technical
accuracy and readability.

### Type of change

- [x] Refactoring
2026-04-17 10:46:06 +08:00
Lynn
f194a09cd6 Fix: dataset update parent child (#14167)
### What problem does this PR solve?

Correctly set the parent-child config in `parser_config`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 10:41:50 +08:00
Jin Hai
e03212fd7a Fix go cli models command and api (#14166)
### What problem does this PR solve?

```
RAGFlow(user)> list providers;
+--------------------------------------+----------+-------------------------------------------+--------------+
| base_url                             | name     | tags                                      | total_models |
+--------------------------------------+----------+-------------------------------------------+--------------+
| https://open.bigmodel.cn/api/paas/v4 | ZHIPU-AI | LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION | 21           |
| https://api.x.ai/v1                  | xAI      | LLM                                       | 6            |
+--------------------------------------+----------+-------------------------------------------+--------------+
RAGFlow(user)> show provider 'zhipu-ai';
+--------------------------------------+----------+-------------------------------------------+--------------+
| base_url                             | name     | tags                                      | total_models |
+--------------------------------------+----------+-------------------------------------------+--------------+
| https://open.bigmodel.cn/api/paas/v4 | ZHIPU-AI | LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION | 21           |
+--------------------------------------+----------+-------------------------------------------+--------------+
RAGFlow(user)> delete provider 'zhipu-ai';
SUCCESS
RAGFlow(user)> add provider 'zhipu-ai';
SUCCESS
RAGFlow(user)> create provider 'zhipu-ai' instance 'ccc' 'ccxxccxx';
SUCCESS
RAGFlow(user)> list instances from 'zhipu-ai';
+---------------------------------------------------+----------------------------------+--------------+----------------------------------+--------+
| apiKey                                            | id                               | instanceName | providerID                       | status |
+---------------------------------------------------+----------------------------------+--------------+----------------------------------+--------+
| ccxxccxx | 640dd7ee398711f1bdd838a74640adcc | ccc          | d1d59de5398411f1bdd838a74640adcc | active |
+---------------------------------------------------+----------------------------------+--------------+----------------------------------+--------+
RAGFlow(user)> list models from 'zhipu-ai';
+----------+------------+---------------+---------------+
| features | max_tokens | model_types   | name          |
+----------+------------+---------------+---------------+
| map[]    | 128000     | [chat]        | glm-4.7       |
| map[]    | 128000     | [chat]        | glm-4.5       |
| map[]    | 128000     | [chat]        | glm-4.5-x     |
| map[]    | 128000     | [chat]        | glm-4.5-air   |
| map[]    | 128000     | [chat]        | glm-4.5-airx  |
| map[]    | 128000     | [chat]        | glm-4.5-flash |
| map[]    | 64000      | [image2text]  | glm-4.5v      |
| map[]    | 128000     | [chat]        | glm-4-plus    |
| map[]    | 128000     | [chat]        | glm-4-0520    |
| map[]    | 128000     | [chat]        | glm-4         |
| map[]    | 8000       | [chat]        | glm-4-airx    |
| map[]    | 128000     | [chat]        | glm-4-air     |
| map[]    | 128000     | [chat]        | glm-4-flash   |
| map[]    | 128000     | [chat]        | glm-4-flashx  |
| map[]    | 1000000    | [chat]        | glm-4-long    |
| map[]    | 128000     | [chat]        | glm-3-turbo   |
| map[]    | 2000       | [image2text]  | glm-4v        |
| map[]    | 8192       | [chat]        | glm-4-9b      |
| map[]    | 512        | [embedding]   | embedding-2   |
| map[]    | 512        | [embedding]   | embedding-3   |
| map[]    | 4096       | [speech2text] | glm-asr       |
+----------+------------+---------------+---------------+
RAGFlow(user)> disable model 'glm-4.5-flash' from 'zhipu-ai' 'ccc';
SUCCESS
RAGFlow(user)> drop instance 'ccc' from 'zhipu-ai';
SUCCESS
RAGFlow(user)> list instances from 'zhipu-ai';
No data to print
```

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-17 09:55:25 +08:00
Wang Qi
96a23d2fd0 [Bug fix] fix bug found in regression when view chunks for document that not parsed in infinity, it would fail in UI (#14168)
### What problem does this PR solve?
See the title. Screenshot of the failure:
<img width="2667" height="915" alt="20260416-205718"
src="https://github.com/user-attachments/assets/0c564237-5ed0-49af-bf4c-d3b5519abc6e"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 09:51:23 +08:00
Magicbook1108
f906a203bb Fix doc generator (#14160)
### What problem does this PR solve?

Fix doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:38 +08:00
balibabu
4a9bfd18bc Fix: The PromptEditor's placeholder is only half displayed. (#14161)
### What problem does this PR solve?

Fix: The PromptEditor's placeholder is only half displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:16 +08:00
Magicbook1108
ea8de1bb47 Fix: different llm in chat (#14162)
### What problem does this PR solve?

Fix: different llm in chat

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:01 +08:00
writinwaters
8a874c7a09 Doc: Added Integrating Notion connector (#14163)
### What problem does this PR solve?

Added a guide on how to integrate Notion with RAGFlow.

### Type of change

- [x] Documentation Update
2026-04-16 20:06:02 +08:00
Lynn
655dd2f8c6 Fix: simplify _load_user (#14154)
### What problem does this PR solve?

Simplify _load_user, remove unused fallback.

### Type of change

- [x] Refactoring
2026-04-16 18:47:43 +08:00
balibabu
4cf4d444d2 Fix: Login page type error. (#14156)
### What problem does this PR solve?

Fix: Login page type error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 18:46:52 +08:00
Magicbook1108
901023a80a Fix: literal eval http request input (#14145)
### What problem does this PR solve?

Fix: literal eval http request input

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

<img width="700" alt="img_v3_0210q_f4b49ff7-e670-4054-ab0e-9443a09215fg"
src="https://github.com/user-attachments/assets/089300be-06f9-4bb6-97af-61bf5f4a5e8c"
/>


<img width="700" alt="img_v3_0210q_398cd52a-2ad9-42be-8d5b-4e6e68a7d22g"
src="https://github.com/user-attachments/assets/239b43cd-a2a5-49d8-9200-991bb26336c8"
/>
2026-04-16 16:52:34 +08:00
euvre
9a785b26bd fix: change file size column from IntegerField to BigIntegerField to support files > 2GB (#14148)
### What problem does this PR solve?

Fixes #6034

Changes the `size` field in both `Document` and `File` models from
`IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max
~9.2EB), and adds corresponding database migrations.

## Problem

When uploading a file larger than 2GB, the `size` value overflows a
32-bit signed integer (max 2,147,483,647). This causes:

- The stored `size` wraps around to an incorrect value (e.g., a 3GB file
shows as 2,097,152 KB in File Management).
- Subsequent file operations (e.g., download) fail because the corrupted
size leads to invalid storage lookups.
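
The size limits behind this bug can be checked with a few lines of arithmetic. This is only a sketch of the integer ranges involved; `fits_signed` is a hypothetical helper, and how an out-of-range value is actually stored depends on the database backend and its settings.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


INT32_MAX = (1 << 31) - 1  # 2,147,483,647 (just under 2 GiB when counting bytes)
INT64_MAX = (1 << 63) - 1  # 9,223,372,036,854,775,807 (about 9.2 EB)

three_gib = 3 * 1024**3    # 3,221,225,472 bytes

print(fits_signed(three_gib, 32))  # False: too large for a 32-bit column
print(fits_signed(three_gib, 64))  # True: fits in a 64-bit column
```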

## Changes

- `Document.size`: `IntegerField` → `BigIntegerField`
- `File.size`: `IntegerField` → `BigIntegerField`
- Added `alter_db_column_type` migrations in `migrate_db()` for both
`document.size` and `file.size` columns to ensure existing deployments
are upgraded automatically.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-16 15:43:29 +08:00
euvre
0cd49e14dd fix: make Infinity connection pool size configurable and add retry logic for GraphRAG write bursts (#14143)
### What problem does this PR solve?

Resolves #14137.

### Problem

Graph resolution succeeds (nodes/edges merged, pagerank updated), but
the subsequent burst of Infinity write operations in `set_graph`
exhausts the connection pool with `TOO_MANY_CONNECTIONS` errors. Root
causes:

1. **Hardcoded pool size** — `infinity_conn_pool.py` hardcoded
`ConnectionPool(max_size=4)` on initial creation and `max_size=32` on
refresh. Operators cannot tune this without patching code.
2. **No retry on transient failures** — a single `TOO_MANY_CONNECTIONS`
on edge deletes or chunk inserts kills the entire resolution+community
pipeline with no retry.

### Changes

#### `common/doc_store/infinity_conn_pool.py`

- Read `ConnectionPool` `max_size` from the `INFINITY_POOL_MAX_SIZE`
environment variable (default: `4`), applied consistently to both
initial creation and refresh paths.
- Log the actual pool size on startup for easier debugging.
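
Reading the pool size from the environment might look like the sketch below. Only the variable name `INFINITY_POOL_MAX_SIZE` and the default of `4` come from this PR description; the function name `read_pool_max_size` and the fallback to the default on invalid input are assumptions made for illustration.

```python
import os

DEFAULT_POOL_MAX_SIZE = 4  # default stated in the PR description


def read_pool_max_size(env=os.environ) -> int:
    """Return the configured pool size, or the default when unset or invalid.

    Falling back on invalid input is an assumption of this sketch.
    """
    raw = env.get("INFINITY_POOL_MAX_SIZE")
    if raw is None or not raw.strip():
        return DEFAULT_POOL_MAX_SIZE
    try:
        size = int(raw)
    except ValueError:
        return DEFAULT_POOL_MAX_SIZE
    return size if size > 0 else DEFAULT_POOL_MAX_SIZE


print(read_pool_max_size({"INFINITY_POOL_MAX_SIZE": "16"}))
```

Using one helper for both the initial creation and the refresh path keeps the two pool sizes from drifting apart.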

#### `rag/graphrag/utils.py` — `set_graph()`

- **Edge deletes**: add exponential-backoff retry (3 attempts, 1s/2s/4s
delays) so transient `TOO_MANY_CONNECTIONS` errors are retried instead
of failing the entire job. Concurrency continues to be gated by the
existing `chat_limiter`.
- **Batch inserts**: add exponential-backoff retry (3 attempts, 1s/2s/4s
delays) for the same reason.
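
The retry behaviour described above can be sketched as a small helper. This is not the code in `set_graph()`: the name `retry_with_backoff` is hypothetical, and the sketch assumes each listed delay (1s, 2s, 4s) precedes one retry, so the operation runs at most four times.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    delays: tuple[float, ...] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation; on failure, wait for the next delay and try again.

    Once every delay has been used, the final attempt's exception propagates.
    """
    for delay in delays:
        try:
            return operation()
        except Exception:
            # A real implementation would likely retry only transient errors.
            sleep(delay)
    return operation()
```

The `sleep` parameter is injectable so that tests can record the delays instead of waiting for them.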


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-16 15:40:54 +08:00
Qi Wang
969ce3a79f [Bug fix #14133] fix graph rag, raptor, mindmap log cannot show correctly in UI (#14136)
### What problem does this PR solve?
Fixes #14133: knowledge graph, RAPTOR, and mind map logs were not
displayed correctly in the UI.
<img width="1930" height="982" alt="Image"
src="https://github.com/user-attachments/assets/d2f8e6c1-d82d-4b00-a377-949aada545ca"
/>
After Fix:
<img width="2108" height="805" alt="image"
src="https://github.com/user-attachments/assets/b37426c1-83d3-4a32-a83c-9d340d69e0e6"
/>
<img width="2173" height="1067" alt="image"
src="https://github.com/user-attachments/assets/30105222-3310-43a0-9f83-1e320d05e413"
/>

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 13:08:36 +08:00
Yongteng Lei
356ba5650a Fix: sandbox don't attach attachment metadata (#14135)
### What problem does this PR solve?

Sandbox don't attach attachment metadata

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 12:08:54 +08:00
balibabu
53154b2cc3 Feat: Add a title prefix to the testid on the login page. (#14129)
### What problem does this PR solve?

Feat: Add a title prefix to the testid on the login page.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-16 12:08:44 +08:00
Magicbook1108
944a90d645 Feat: add button to turn off vlm parsing (#14125)
### What problem does this PR solve?

Feat: add button to turn off vlm parsing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chanx <1243304602@qq.com>
2026-04-15 19:06:00 +08:00
chanx
dce0b1c030 Fix: Pipeline page style optimizations (#14128)
### What problem does this PR solve?

Fix: Pipeline page style optimizations

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-15 19:05:54 +08:00
Daniil Sivak
c93ec0a1f3 Fix: reject empty/space-only content in update_chunk API (#14082)
Closes #6541

### What problem does this PR solve?

Add content validation to `update_chunk` (SDK and non-SDK) to reject
empty or whitespace-only content before it reaches the embedding model.

**Before:** Calling `update_chunk` with empty or whitespace-only content
(such as `" "`, `""`, or `"\n"`) bypassed validation and was sent directly
to the embedding model, which returned an error. This was the same bug
previously fixed for `add_chunk` in #6390, but `update_chunk` was missed.

**After:** Empty/whitespace-only content is caught by validation and
returns an error: `` `content` is required ``
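
A check of this kind can be sketched in a few lines. The function name `validate_chunk_content` and the use of `ValueError` are assumptions for illustration; only the error message text comes from this PR description.

```python
def validate_chunk_content(content) -> str:
    """Reject missing, empty, or whitespace-only content before embedding."""
    if not isinstance(content, str) or not content.strip():
        raise ValueError("`content` is required")
    return content


for candidate in [" ", "", "\n", "real text"]:
    try:
        validate_chunk_content(candidate)
        print(repr(candidate), "accepted")
    except ValueError as err:
        print(repr(candidate), "rejected:", err)
```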

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-15 18:43:53 +08:00
Magicbook1108
d51789e2be Feat: update templates && add resume template (#14124)
### What problem does this PR solve?

Feat: update templates  && add resume template

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-15 18:42:29 +08:00
balibabu
c56a7f99d1 Fix: The pop-up menu of the PromptEditor will be blocked. #14126 (#14127)
### What problem does this PR solve?

Fix: The pop-up menu of the PromptEditor will be blocked.  #14126

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-15 18:42:02 +08:00
writinwaters
2520065c5a Doc: Added Integrate Confluence (#14131)
### What problem does this PR solve?

Added a guide on integrating Confluence as a connector.

### Type of change

- [x] Documentation Update
2026-04-15 18:38:36 +08:00
Minal Mahala
f930389311 Refact: improve task resume mechanism for graphrag (#14096)
### What problem does this PR solve?

Addresses review feedback on #14074 (Checkpoint mechanism for
long-running workflow jobs, issue #12494).

**Changes based on @yuzhichang's review:**

1. **Renamed `checkpoint_service.py` → `task_checkpoint.py`** as
suggested.
2. **Replaced Redis with direct docEngine queries** as suggested — the
subgraph already gets persisted to the doc store by
`generate_subgraph()`, so we just query for it instead of maintaining a
separate checkpoint in Redis. This is simpler, has no extra dependency,
and uses a single source of truth.

**Changes based on CodeRabbit review:**

3. **Fixed `source_id` query format mismatch** — subgraphs are stored
with `source_id: [doc_id]` (list), but the original query used
`source_id: doc_id` (string). Now follows the same pattern as
`does_graph_contains()` in `rag/graphrag/utils.py`: filter by
`knowledge_graph_kwd` only, then match `source_id` in Python. This
avoids ambiguity across Elasticsearch / Infinity / OceanBase backends.

### Changes

| File | Change |
|---|---|
| `api/db/services/task_checkpoint.py` (new) |
`load_subgraph_from_store()` and `has_raptor_chunks()` — docEngine-based
checkpoint queries |
| `rag/graphrag/general/index.py` | `build_one()` calls
`load_subgraph_from_store()` before running LLM extraction |
| `rag/svr/task_executor.py` | RAPTOR per-doc loop calls
`has_raptor_chunks()` before processing |
| `test/unit_test/rag/graphrag/test_checkpoint_resume.py` (new) | 10
unit tests covering subgraph loading, source_id filtering, edge cases |

### How it works

- **GraphRAG:** Before running expensive LLM entity/relation extraction
for a doc, checks the doc store for an existing subgraph (saved by a
previous interrupted run). If found, loads it directly and skips LLM
calls.
- **RAPTOR:** Before processing a doc, checks if RAPTOR chunks
(`raptor_kwd="raptor"`) already exist for it. If yes, skips.
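
The "filter by `knowledge_graph_kwd`, then match `source_id` in Python" step can be sketched as below. This is only an illustration under stated assumptions: `rows` stands for whatever the doc store returned after the keyword filter, represented here as plain dicts, and `find_subgraph_for_doc` is a hypothetical name rather than a function from the PR.

```python
def find_subgraph_for_doc(rows, doc_id):
    """Return the first stored subgraph row whose source_id covers doc_id.

    `rows` is assumed to be an iterable of dicts already filtered by
    knowledge_graph_kwd on the backend side (hypothetical shape).
    """
    for row in rows:
        source_id = row.get("source_id", [])
        # Subgraphs are stored with a list-valued source_id; tolerate a bare
        # string too, in case a backend hands it back in that form.
        if isinstance(source_id, str):
            source_id = [source_id]
        if doc_id in source_id:
            return row
    # No checkpoint found: the caller would fall back to LLM extraction.
    return None
```

Doing the membership test in Python avoids relying on how each backend compares a string against a list-valued field.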

### Testing

- 10 new unit tests — all passing
- Full existing suite: 617 passed

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-04-15 17:37:28 +08:00
euvre
3364d86e6b Auto-inject knowledge parameter in async_chat when prompt_config is missing it (#14121)
### What problem does this PR solve?

Resolves #14115.

## Problem

On the shared chat link page (`/chats/share?shared_id=...`), querying
the knowledge base returns "no relevant information was found", while
the same query works correctly on the editor chat page.

## Root Cause

Knowledge base retrieval in `async_chat()` is gated by the check `if
"knowledge" in param_keys` (line 598), where `param_keys` is derived
from `prompt_config["parameters"]`. If `parameters` is empty or missing
the `{"key": "knowledge", "optional": false}` entry, retrieval is
entirely skipped.

This can happen because `_apply_prompt_defaults()` — which ensures
`parameters` contains the `knowledge` entry — is only called in the
`create` (POST) and `update_chat` (PUT) handlers, but **not** in
`patch_chat` (PATCH). If a chat's `prompt_config` was updated via PATCH
without including `parameters`, the `knowledge` entry would be absent.
Additionally, `prompt_config["parameters"]` would raise a `KeyError` if
the key was missing entirely.

## Fix

Added a defensive safety net in `async_chat()`
(`api/db/services/dialog_service.py`) that auto-injects the `knowledge`
parameter when:
- `dialog.kb_ids` is set (knowledge bases are configured)
- `"knowledge"` is not already in `param_keys`
- `{knowledge}` placeholder exists in the system prompt

Also changed `prompt_config["parameters"]` to
`prompt_config.get("parameters", [])` to prevent `KeyError` when the key
is absent.
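
A minimal sketch of the three conditions above, written as a standalone function for illustration. The function name `resolve_parameters` and the `"system"` key used to look up the system prompt are assumptions; the actual code in `async_chat()` operates on the dialog object directly.

```python
def resolve_parameters(prompt_config, kb_ids):
    """Return (parameters, param_keys) with `knowledge` injected when needed.

    Hypothetical helper; the "system" key for the system prompt is an
    assumption made for this sketch.
    """
    # .get() with a default avoids a KeyError when "parameters" is missing.
    parameters = list(prompt_config.get("parameters", []))
    param_keys = [p["key"] for p in parameters]
    system_prompt = prompt_config.get("system", "")

    if kb_ids and "knowledge" not in param_keys and "{knowledge}" in system_prompt:
        # Same entry that _apply_prompt_defaults() is described as ensuring.
        parameters.append({"key": "knowledge", "optional": False})
        param_keys.append("knowledge")
    return parameters, param_keys
```

Requiring the `{knowledge}` placeholder keeps the injection from enabling retrieval for prompts that have nowhere to put the retrieved text.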

## Files Changed

- `api/db/services/dialog_service.py` — added auto-injection of
`knowledge` parameter and safe `.get()` access for `parameters`


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-15 17:31:31 +08:00
Ea001
38cefd88e2 Fix tag_feas code injection in retrieval ranking (#13923)
## Summary
- remove eval-based parsing from retrieval rank feature scoring
- validate `tag_feas` at write time in chunk APIs and SDK routes
- add regression tests for safe parsing and malicious payload rejection

## Details
`tag_feas` is intended to be structured rank-feature data, but the
retrieval ranking path was evaluating stored values as Python
expressions. This change treats `tag_feas` strictly as data.

### What changed
- replace `eval()` in `rag/nlp/search.py` with safe parsing via
`json.loads()` and optional `ast.literal_eval()` compatibility for
legacy Python-dict strings
- strictly filter parsed values down to `dict[str, finite number]`
- reject invalid `tag_feas` payloads at write time in web chunk routes
and SDK document chunk routes
- add focused regression tests to prove executable strings are ignored
and invalid payloads are rejected
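
The parsing strategy listed above might look roughly like this. It is a sketch under assumptions, not the code from `rag/nlp/search.py`: the function name `parse_tag_feas` and the choice to return an empty dict on any failure are illustrative.

```python
import ast
import json
import math


def parse_tag_feas(raw):
    """Parse tag_feas as data only and keep finite numeric values.

    Hypothetical helper: tries JSON first, then ast.literal_eval for legacy
    Python-dict strings. Nothing here executes the input.
    """
    if isinstance(raw, dict):
        parsed = raw
    elif isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            try:
                # literal_eval only accepts literals; calls and names raise.
                parsed = ast.literal_eval(raw)
            except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
                return {}
    else:
        return {}

    if not isinstance(parsed, dict):
        return {}

    result = {}
    for key, value in parsed.items():
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(key, str) or isinstance(value, bool):
            continue
        if not isinstance(value, (int, float)):
            continue
        try:
            number = float(value)
        except OverflowError:
            continue
        if math.isfinite(number):
            result[key] = number
    return result
```

A payload such as `__import__('os').system(...)` fails both parsers and yields an empty dict, whereas `eval()` would have run it.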

## Validation
- `python -m pytest test/unit_test/common/test_tag_feature_utils.py
test/unit_test/rag/test_rank_feature_scores.py -q`

---------

Co-authored-by: unknown <zhenglinkai@CCN.Local>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-04-15 16:31:11 +08:00
Eden
1f33ca1099 fix(dialog): restore decorated answer in async_ask final SSE event (#13917)
## What's the problem

Both `async_chat()` and `async_ask()` call `decorate_answer()` to build
the final SSE payload — it inserts citation markers (`##N$$`) into the
answer text and prunes `doc_aggs` to only the cited documents.
Immediately after, both functions overwrite `final["answer"]` with `""`:

```python
# async_chat(), line ~774  (issue #13828)
final = decorate_answer(thought + full_answer)
final["final"] = True
final["audio_binary"] = None
final["answer"] = ""   # discards decorated text
yield final

# async_ask(), line ~1444  (same bug, different path)
final = decorate_answer(full_answer)
final["final"] = True
final["answer"] = ""   # discards decorated text
yield final
```

The client receives filtered references (built for a citation-decorated
answer it never sees) while displaying the raw, undecorated streaming
text. Citations can never match.

## Root cause

`final["answer"] = ""` was left over from an earlier design where
clients were meant to reconstruct the full answer purely from delta
events. Once `decorate_answer()` started placing citation markers, this
blank-out broke the contract: the final event is where the decorated
answer should land.

## Fix

Remove the two blank-override lines — one in `async_chat()`, one in
`async_ask()`:

```diff
-    final["answer"] = ""
```

`decorate_answer()` already sets `final["answer"]` to the correct
decorated string; there is nothing to override.

## Relation to #13828

Issue #13828 and PR #13835 identify the bug in `async_chat()`. This PR
absorbs that fix and also corrects the identical pattern in
`async_ask()` (used by the `/retrieval` route in `chat_api.py`), which
PR #13835 does not touch.

## Regression test

Added
`test/unit_test/api/db/services/test_dialog_service_final_answer.py`
with three tests:

| Test | Purpose |
|------|---------|
| `test_buggy_pattern_drops_answer` | Documents the old behaviour:
blank-override empties the final answer |
| `test_fixed_pattern_preserves_decorated_answer` | Core invariant:
final event carries the decorated text from `decorate_answer()` |
| `test_final_event_reference_matches_decorated_result` | Citation
markers in the answer must match the pruned `doc_aggs` in the same event
|

Local run result:

```
test_dialog_service_final_answer.py::test_buggy_pattern_drops_answer         PASSED
test_dialog_service_final_answer.py::test_fixed_pattern_preserves_decorated_answer PASSED
test_dialog_service_final_answer.py::test_final_event_reference_matches_decorated_result PASSED

3 passed in 0.04s
```

`ruff check` passes with no issues on all changed files.

---------

Co-authored-by: edenfunf <edenfunf@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-15 14:10:36 +08:00
balibabu
f08d13287a Feat: Edit the code of the code operator from a broad perspective. (#14116)
### What problem does this PR solve?

Feat: Edit the code of the code operator from a broad perspective.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-15 11:51:17 +08:00
chanx
2d291cd841 fix(flow): Fix text descriptions for multi-column layout options. (#14107)
### What problem does this PR solve?

fix(flow): Fix text descriptions for multi-column layout options.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-15 11:50:58 +08:00
Jin Hai
a0a4029f01 Fix document (#14118)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-15 11:35:16 +08:00
Jack
bc5f78996b Consolidation of document upload API (#14106)
### What problem does this PR solve?

Consolidates the web API and the HTTP API for document upload.

Before consolidation:
- Web API: POST /v1/document/upload
- HTTP API: POST /api/v1/datasets/<dataset_id>/documents

After consolidation:
- RESTful API: POST /api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-15 11:27:43 +08:00
xinmotlanthua
e1dede1366 fix(web): replace hardcoded English strings with i18n in floating chat widget (#14095)
## Summary
- Replace 3 hardcoded English strings in `floating-chat-widget.tsx` with
`react-i18next` `t()` calls so the widget respects the `locale` query
parameter
- Add `useTranslation` hook to the component
- Add translation keys (`chat.chatSupport`, `chat.replyInstantly`,
`chat.typeYourMessage`) to all 14 locale files

## Strings replaced
| Original | i18n key |
|---|---|
| `'Chat Support'` | `t('chat.chatSupport')` |
| `'We typically reply instantly'` | `t('chat.replyInstantly')` |
| `'Type your message...'` | `t('chat.typeYourMessage')` |

Closes #14072

Co-authored-by: khanhkhanhlele <namkhanh2172@gmail.com>
2026-04-14 20:12:56 +08:00
akie
a98b64326c Add warning log when metadata query hits 10000 result limit (#14109)
## What problem does this PR solve?

Add a warning log when `get_flatted_meta_by_kbs` returns 10,000 results,
which indicates the query limit has been reached and metadata may be
silently truncated.
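
As an illustration of the idea, a limit check of this kind could be written as below. The helper name `warn_if_truncated`, the constant name, and the log wording are assumptions; only the 10,000 figure comes from the description.

```python
import logging

logger = logging.getLogger(__name__)

# Limit taken from the PR description; the constant name is hypothetical.
METADATA_QUERY_LIMIT = 10_000


def warn_if_truncated(rows, limit=METADATA_QUERY_LIMIT):
    """Log a warning and return True when the result count reaches the limit."""
    hit_limit = len(rows) >= limit
    if hit_limit:
        # Reaching the cap does not prove truncation, but it is the only
        # signal available without issuing a second query.
        logger.warning(
            "Metadata query returned %d results (limit %d); results may be truncated",
            len(rows),
            limit,
        )
    return hit_limit
```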


## Type of change
- [x] Improvement (non-breaking change which improves observability)
2026-04-14 20:04:32 +08:00
NeedmeFordev
1a1b5aa53e Fix: respect the internet toggle before running Tavily web search (#14051) (#14052)
### What problem does this PR solve?

Fixes #14051.

The chat UI already sends an `internet` flag with each request, but the
backend previously triggered Tavily web retrieval whenever
`prompt_config.tavily_api_key` was configured. As a result, web search
could still run even when the internet toggle was off.

This PR makes web search an explicit opt-in at request time:
- `tavily_api_key` only indicates that web search is available
- Tavily retrieval runs only when `internet` is explicitly enabled
- the same behavior now applies to both the normal retrieval path and
the deep-research / reasoning path

This also fixes the no-KB fallback case so chats without KBs fall back
to normal solo chat when `internet` is off.
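
The opt-in rule above reduces to a two-part condition, sketched here. The function name and the shape of `request_kwargs` are assumptions for illustration, and treating only a literal `True` as "explicitly enabled" is one possible reading of the description.

```python
def should_run_web_search(prompt_config, request_kwargs):
    """Return True only when web search is both available and requested.

    Hypothetical helper: `request_kwargs` stands for the per-request options
    that carry the `internet` flag sent by the chat UI.
    """
    # A configured key only means web search is available.
    available = bool(prompt_config.get("tavily_api_key"))
    # The request must opt in explicitly; a missing or falsy flag means off.
    requested = request_kwargs.get("internet") is True
    return available and requested
```

Under this sketch, a chat with a Tavily key configured but the toggle off never reaches the web retrieval step.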

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 19:55:20 +08:00
Jin Hai
8e9cef3687 Remove unused API (#14046)
### What problem does this PR solve?

1. Remove unused token related API
2. Fix typo

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 19:32:16 +08:00
chanx
912fedc9b9 Fix: metadata bug (#14105)
### What problem does this PR solve?

Fix: metadata bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 18:45:09 +08:00