558 Commits

Author SHA1 Message Date
Liu An
a33d0737cd Docs: Update version references to v0.25.0 in READMEs and docs (#14257)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.24.0 to v0.25.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-04-21 17:26:50 +08:00
Lynn
afdf0814d7 Fix: get metadata conf (#14250)
### What problem does this PR solve?

Get metadata configuration from union of custom metadata and
built_in_metadata.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 17:22:42 +08:00
writinwaters
0db2d544a9 Docs: 0.25.0 agent apps can be published. (#14252)
### What problem does this PR solve?

Agent apps can be published.

### Type of change

- [x] Documentation Update
2026-04-21 16:56:11 +08:00
balibabu
4841ce4239 Fix: Component definition is missing display name. (#14255)
### What problem does this PR solve?

Fix: Component definition is missing display name.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 16:53:08 +08:00
Jin Hai
e48d75987c Go: add stream / think chat (#14242)
### What problem does this PR solve?

1. Supports stream and non-stream chat
2. Supports think and non-think chat
3. List supported models from DeepSeek service. (This command can be
used to verify the API validity)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-21 16:52:32 +08:00
balibabu
a2bea30749 Fix: Editing an empty response in the retrieval operator will cause the focus to shift to the metadata input box. (#14253)
### What problem does this PR solve?

Fix: Editing an empty response in the retrieval operator will cause the
focus to shift to the metadata input box.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 16:19:55 +08:00
chanx
05c39b90a8 Fix: pipeline parser log not display (#14251)
### What problem does this PR solve?

Fix: the pipeline parser log is not displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 15:24:13 +08:00
Liu An
6e33d8722f Revert "Fix: forwarding highlight param" (#14249)
Reverts infiniflow/ragflow#14112
2026-04-21 15:23:18 +08:00
balibabu
78b800e685 Fix: Fix: The minimum value for the "Suggested text block size" input box is set to 1. (#14246)
### What problem does this PR solve?

Fix: The minimum value for the "Suggested text block size" input box is
set to 1.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 14:06:36 +08:00
Magicbook1108
b3891ba6a4 Fix audio/video in pipeline (#14241)
### What problem does this PR solve?

Fix audio/video in pipeline

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 12:17:57 +08:00
Wang Qi
8aab158942 OpenSource Resume is supported only with Elasticsearch. (#14233)
### What problem does this PR solve?

OpenSource Resume is supported only with Elasticsearch.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-21 10:05:47 +08:00
Jin Hai
f269ee9739 Go: add thinking features to zhipu-ai (#14234)
### What problem does this PR solve?

```
RAGFlow(user)> list models from 'zhipu-ai';
+------------+------------+---------------+----------------+
| features   | max_tokens | model_types   | name           |
+------------+------------+---------------+----------------+
| [thinking] | 128000     | [chat]        | glm-4.7        |
| [thinking] | 128000     | [chat]        | glm-4.5        |
| [thinking] | 128000     | [chat vision] | glm-4.6v-Flash |
| [thinking] | 128000     | [chat]        | glm-4.5-x      |
| [thinking] | 128000     | [chat]        | glm-4.5-air    |
| [thinking] | 128000     | [chat]        | glm-4.5-airx   |
| [thinking] | 128000     | [chat]        | glm-4.5-flash  |
| [thinking] | 64000      | [vision]      | glm-4.5v       |
|            | 128000     | [chat]        | glm-4-plus     |
|            | 128000     | [chat]        | glm-4-0520     |
|            | 128000     | [chat]        | glm-4          |
|            | 8000       | [chat]        | glm-4-airx     |
|            | 128000     | [chat]        | glm-4-air      |
|            | 128000     | [chat]        | glm-4-flash    |
|            | 128000     | [chat]        | glm-4-flashx   |
|            | 1000000    | [chat]        | glm-4-long     |
|            | 128000     | [chat]        | glm-3-turbo    |
|            | 2000       | [vision]      | glm-4v         |
|            | 8192       | [chat]        | glm-4-9b       |
|            | 512        | [embedding]   | embedding-2    |
|            | 512        | [embedding]   | embedding-3    |
|            | 4096       | [asr]         | glm-asr        |
|            | 0          | [tts]         | glm-tts        |
|            | 0          | [ocr]         | glm-ocr        |
|            | 0          | [rerank]      | glm-rerank     |
+------------+------------+---------------+----------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-20 21:53:27 +08:00
balibabu
c43367eca3 Fix: The number of chunks in the file list is not displayed. (#14232)
### What problem does this PR solve?

Fix: The number of chunks in the file list is not displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 19:24:20 +08:00
balibabu
5265def967 Fix: The mind map on the search page does not display completely upon initial loading. (#14226)
### What problem does this PR solve?

Fix: The mind map on the search page does not display completely upon
initial loading.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 19:24:13 +08:00
Julian
ba7d3f6c31 Add debugpy dependency to pyproject.toml (#14225)
To attach the debugger to a running Docker container, debugpy has to be
installed inside the Docker image.
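
A minimal sketch of how a service inside the container might open a debugpy listener once the package is present in the image. The `ENABLE_DEBUGPY` flag, the helper name, and port 5678 are assumptions for illustration and are not taken from this PR; only `debugpy.listen` is a real debugpy call.

```python
import os


def maybe_enable_debugger(env=os.environ, port: int = 5678) -> bool:
    """Start a debugpy listener only when explicitly requested.

    ENABLE_DEBUGPY is a hypothetical flag used for this sketch.
    """
    if env.get("ENABLE_DEBUGPY") != "1":
        return False
    # Imported lazily: this only works if debugpy is installed in the image.
    import debugpy

    # Listen on all interfaces so a debugger outside the container can attach
    # through a published port.
    debugpy.listen(("0.0.0.0", port))
    return True


if __name__ == "__main__":
    print("debugger enabled:", maybe_enable_debugger())
```

With the port published (for example via the container's port mapping), an IDE can then attach to the running process.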

### What problem does this PR solve?

[#14224](https://github.com/infiniflow/ragflow/issues/14224)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 18:05:17 +08:00
NeedmeFordev
78c3583964 Fix memory resolution regression for multimodal Gemini models (#14209)
### What problem does this PR solve?

Fixes #14206.

This issue is a regression. PR #9520 previously changed Gemini models
from `image2text` to `chat` to fix chat-side resolution, but PR #13073
later restored those Gemini entries to `image2text` during model-list
updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and
advertise both `CHAT` and `IMAGE2TEXT`, while tenant model resolution
still depends on a single stored `model_type`. That makes chat-only
flows such as memory extraction fragile when a compatible model is
stored as `image2text`.

This PR fixes the issue at the model resolution layer instead of
changing `llm_factories.json` again:
- keep the stored tenant model type unchanged
- try exact `model_type` lookup first
- if no exact match is found, fall back only when the model metadata
shows the requested capability is supported
- coerce the runtime config to the requested type for chat callers
- fail fast in memory creation instead of silently persisting
`tenant_llm_id=0`

This preserves existing multimodal and `image2text` behavior while
restoring chat compatibility for memory-related flows.
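
The resolution order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `TenantModel`, `resolve_model`, `require_chat_model`, and the model names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class TenantModel:
    """Hypothetical stand-in for a stored tenant model record."""
    name: str
    model_type: str  # the single stored type, e.g. "image2text"
    capabilities: Set[str] = field(default_factory=set)  # types the metadata advertises


def resolve_model(models: Iterable[TenantModel], name: str, requested_type: str) -> Optional[dict]:
    """Return a runtime config for `requested_type`, or None if unsupported."""
    candidates = [m for m in models if m.name == name]
    # 1. Exact model_type lookup first.
    for m in candidates:
        if m.model_type == requested_type:
            return {"name": m.name, "model_type": requested_type}
    # 2. Fall back only when the metadata advertises the requested capability.
    #    The stored record is left unchanged; only the runtime config is coerced.
    for m in candidates:
        if requested_type in m.capabilities:
            return {"name": m.name, "model_type": requested_type}
    return None


def require_chat_model(models: Iterable[TenantModel], name: str) -> dict:
    """Fail fast instead of silently continuing with an unresolved model."""
    config = resolve_model(models, name, "chat")
    if config is None:
        raise ValueError(f"no chat-capable model named {name!r}")
    return config


stored = [
    TenantModel("gemini-example", "image2text", {"chat", "image2text"}),
    TenantModel("vision-only-example", "image2text", {"image2text"}),
]
print(require_chat_model(stored, "gemini-example"))
```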

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Testing

- Re-checked the current memory creation and memory message extraction
paths against the updated resolution logic
- Verified locally that a Gemini-style tenant model stored as
`image2text` but tagged with `CHAT` can still be resolved for `chat`
- Verified `get_model_config_by_type_and_name(..., CHAT, ...)` returns a
chat-compatible runtime config
- Verified `get_model_config_by_id(..., CHAT)` also returns a
chat-compatible runtime config
- Verified strict resolution still fails when the model metadata does
not advertise chat capability
2026-04-20 16:37:36 +08:00
Magicbook1108
9c7c105007 Fix: Doc generator (#14223)
### What problem does this PR solve?

Doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 16:37:33 +08:00
Jin Hai
af2ed416a7 Add extra field to model instance (#14203)
### What problem does this PR solve?

Each model now supports regions with different URLs.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-20 15:31:12 +08:00
Jack
939933649a Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176)
### What problem does this PR solve?

Before consolidation:
- Web API: POST /v1/document/list
- HTTP API: GET /api/v1/datasets/<dataset_id>/documents

After consolidation:
- RESTful API: GET /api/v1/datasets/<dataset_id>/documents
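
As a rough illustration, a client could build a call to the consolidated endpoint like this. The base URL, the dataset ID, the API key, and the `Authorization: Bearer` header are assumptions for the sketch; only the path and method come from the description above. The request is constructed but not sent.

```python
from urllib.request import Request


def build_list_documents_request(base_url: str, dataset_id: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for a dataset's document list."""
    url = f"{base_url.rstrip('/')}/api/v1/datasets/{dataset_id}/documents"
    # Bearer-token authentication is assumed here for illustration.
    return Request(url, method="GET", headers={"Authorization": f"Bearer {api_key}"})


# Placeholder values; replace them with a real server address, dataset ID, and key.
req = build_list_documents_request("http://localhost:9380", "my_dataset_id", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would issue the call against a running server.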

### Type of change

- [x] Refactoring
2026-04-20 14:54:40 +08:00
Magicbook1108
d053317c4d Fix: variable in doc generator (#14180)
### What problem does this PR solve?

Fix: variable in doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 14:19:42 +08:00
Magicbook1108
19eedeec61 Fix: accept empty value as 0 chunk (#14220)
### What problem does this PR solve?

Fix: accept empty value as 0 chunk
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 12:53:47 +08:00
LeonTung
f554f6ae85 chore(docs): tips for installing CN fonts (#14189)
### What problem does this PR solve?
Add tips for installing Chinese fonts in the code sandbox. Otherwise,
`matplotlib` won't render Chinese text correctly.
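
Once a CJK-capable font is installed, matplotlib still has to be told to prefer it. A minimal sketch follows; the font family names are assumptions (they depend on what is installed in the sandbox image and are not taken from this PR), and `cjk_rc_overrides` is a hypothetical helper.

```python
# Candidate CJK font families. These names are assumptions; use whichever
# fonts were actually installed in the sandbox.
CJK_FONT_CANDIDATES = ["Noto Sans CJK SC", "WenQuanYi Micro Hei", "SimHei"]


def cjk_rc_overrides(candidates=CJK_FONT_CANDIDATES) -> dict:
    """Return matplotlib rcParams overrides that prefer CJK-capable fonts."""
    return {
        "font.family": "sans-serif",
        # matplotlib tries these families in order and uses the first one found.
        "font.sans-serif": list(candidates) + ["DejaVu Sans"],
        # Many CJK fonts lack the Unicode minus glyph; use the ASCII hyphen instead.
        "axes.unicode_minus": False,
    }


overrides = cjk_rc_overrides()
print(overrides)

# Usage inside the sandbox, where matplotlib is available:
#   import matplotlib
#   matplotlib.rcParams.update(cjk_rc_overrides())
```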

<img width="2082" height="1186" alt="sales_analysis"
src="https://github.com/user-attachments/assets/57e675ab-1e92-4662-9aeb-ad72a6121eb5"
/>



### Type of change

- [x] Documentation Update
2026-04-20 12:11:23 +08:00
Lynn
0f806dc3ca Feat: mysql sync (#14200)
### What problem does this PR solve?

Add a script to sync db schema with peewee_migrate.

### Type of change

- [x] Other (please describe): tool script
2026-04-20 11:40:01 +08:00
rhinoceros.xn
4e992de91f Add tongyi gte-rerank-v2 (#14215)
https://bailian.console.aliyun.com/cn-beijing?tab=api#/api/?type=model&url=2780056

### What problem does this PR solve?


### Type of change
- [x] Other (please describe): add gte-rerank-v2 and qwen3-rerank
2026-04-20 11:39:17 +08:00
Liu An
d5c306de30 Fix: remove unit test checkpoint resume (#14216)
### What problem does this PR solve?

remove unit test checkpoint resume

### Type of change

- [x] Performance Improvement
2026-04-20 11:27:40 +08:00
euvre
84b6069ec7 fix: escape single quotes in Infinity SQL filter conditions (#14186)
### What problem does this PR solve?

## Summary

Fixes #5939

Entity names containing single quotes (e.g., `投影直线L'`) caused SQL syntax
errors when building filter conditions for Infinity queries, due to
unescaped string interpolation in `equivalent_condition_to_str`.

## Changes

In `common/doc_store/infinity_conn_base.py`, added `.replace("'", "''")`
escaping for string values in two branches of
`equivalent_condition_to_str` where it was missing:

1. **`field_keyword` branch with non-list value** (line 190): The list
branch already escaped single quotes on line 183, but the single-string
branch did not.
2. **Plain string value branch** (line 209): Direct f-string
interpolation `{k}='{v}'` was vulnerable to unescaped quotes.

Both fixes use the same SQL-standard escape pattern (`'` → `''`) already
applied elsewhere in this method.
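
The escape pattern can be illustrated in isolation. This is a sketch, not the code from `equivalent_condition_to_str`: the helper names and the `entity_kwd` field name are hypothetical.

```python
def escape_sql_string(value: str) -> str:
    """Double every single quote, the SQL-standard way to embed ' in a literal."""
    return value.replace("'", "''")


def equals_condition(field_name: str, value: str) -> str:
    """Build a `field='value'` filter, escaping the value before interpolation."""
    return f"{field_name}='{escape_sql_string(value)}'"


# Without escaping, the trailing quote in the value would end the literal early.
print(equals_condition("entity_kwd", "投影直线L'"))
```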

## How to Test

1. Upload a document containing entity names with single quotes.
2. Enable Knowledge Graph (GraphRAG) in the parsing configuration.
3. Initiate document parsing — it should complete without SQL syntax
errors.

## Note

The original issue also reported a typo (`dge_graph_kwd` instead of
`knowledge_graph_kwd`), which has already been fixed in the current
codebase.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-20 10:04:07 +08:00
balibabu
6712b504e6 Fix: Clicking on the empty dialog box on the agent exploration page will result in an error. (#14198)
### What problem does this PR solve?

Fix: Clicking on the empty dialog box on the agent exploration page will
result in an error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:52:13 +08:00
Lynn
c3387cd5b8 Fix: parent child config (#14199)
### What problem does this PR solve?

Correctly set and display the parent-child config in parser_config, and
allow passing `tenant_id` in PATCH `/api/v1/chats`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:02:42 +08:00
balibabu
09622c6353 Fix: Spaces cannot be entered in the code editor of the code operator. (#14183)
### What problem does this PR solve?

Fix: Spaces cannot be entered in the code editor of the code operator.

[Monaco Editor with XYFlow fails to accept most space bar keypresses,
who is at fault?
#5204](https://github.com/microsoft/monaco-editor/discussions/5204)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:41:39 +08:00
balibabu
fa644c5a15 Fix: The embedded page for search is inaccessible. (#14194)
### What problem does this PR solve?

Fix: The embedded page for search is inaccessible.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:37:34 +08:00
chanx
60506ef7a5 fix: Add internationalization configurations related to text segmentation identifiers. (#14201)
### What problem does this PR solve?

fix: Add internationalization configurations related to text
segmentation identifiers.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:37:14 +08:00
balibabu
3a4d17cb0d Fix: The placeholder in PromptEditor is obscured. (#14179)
### What problem does this PR solve?

Fix: The placeholder in PromptEditor is obscured.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 21:02:41 +08:00
Daniil Sivak
22c6648348 Fix: forwarding highlight param (#14112)
Closes #9078

### What problem does this PR solve?

The `retrieval_test` endpoint in `chunk_app.py` never forwarded the
`highlight` request parameter to `retriever.retrieval()`, so the search
engine never produced highlight snippets. Additionally, the frontend
always rendered `content_with_weight` instead of preferring the
`highlight` field, and the CSS rule `color: var(--accent-primary)` didn't
work because the variable stores an RGB triplet `(45,212,191)`, which
requires the `rgb()` wrapper.

### Before

- Search page: displayed raw content_with_weight as a wall of plain
white text with no term highlighting, including markdown headings
rendered as literal text
- Retrieval testing page: showed `content_with_weight` in a plain `<p>`
tag, no `<em>` tags rendered, no highlight coloring
- Children chunks: when child chunks were consolidated into a parent via
`retrieval_by_children`, any highlight data from children was discarded
- TOC chunks: chunks fetched via `retrieval_by_toc` had no `highlight`
field, appearing as plain text while other chunks had highlights

**Retrieval testing**:
<img width="1449" height="1178"
alt="before-retrieval-no-highlight-cropped"
src="https://github.com/user-attachments/assets/5c6f5a5e-6c11-461a-bdb4-049d7dfb7a33"
/>

**Search**:
<img width="1378" height="711" alt="before-search-no-highlight-cropped"
src="https://github.com/user-attachments/assets/be7b5152-72ef-40da-a8fd-921e997ae7d3"
/>

### After

- Search page: displays the highlight field with search terms rendered
in teal/cyan color (`rgb(var(--accent-primary))`)
- Retrieval testing page: sends highlight: true in the request, uses
`HighLightMarkdown` component to render `<em>` tags with proper coloring
- Children chunks: highlights from child chunks are joined and preserved
on the parent
- TOC chunks: when other chunks have highlights, TOC-fetched chunks use
`content_with_weight` as a highlight fallback
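
The children and TOC behavior listed above might look roughly like this. It is a sketch under assumptions: chunks are modeled as plain dicts, the newline separator is a guess, and both helper names are hypothetical; only the `highlight` and `content_with_weight` field names come from the description.

```python
def merge_child_highlights(children, separator="\n"):
    """Join the highlights of child chunks so the parent keeps them."""
    parts = [c["highlight"] for c in children if c.get("highlight")]
    return separator.join(parts) if parts else None


def apply_toc_fallback(chunks):
    """If any chunk has a highlight, give the others a plain-text fallback."""
    if any(c.get("highlight") for c in chunks):
        for c in chunks:
            if not c.get("highlight"):
                c["highlight"] = c.get("content_with_weight", "")
    return chunks


children = [{"highlight": "<em>foo</em> bar"}, {}, {"highlight": "baz"}]
print(merge_child_highlights(children))
```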

**Retrieval testing**:
<img width="1410" height="1015" alt="05-retrieval-testing-results"
src="https://github.com/user-attachments/assets/f0cff8cf-0962-4320-b559-cd5037f622d2"
/>

**Search**:
<img width="1294" height="455" alt="03-search-highlight-results"
src="https://github.com/user-attachments/assets/a90e0e3e-3837-46be-8ddd-2412ff7cbc19"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 20:59:20 +08:00
Yongteng Lei
fac46ef67f Refa: change Minimax base url to mainland by default to align with UI (#14195)
### What problem does this PR solve?

Change Minimax base url to mainland by default to align with UI.

### Type of change

- [x] Refactoring
2026-04-17 19:08:57 +08:00
dependabot[bot]
b34a726acd Build(deps): Bump pypdf from 6.9.2 to 6.10.2 (#14184)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.2 to 6.10.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/releases">pypdf's
releases</a>.</em></p>
<blockquote>
<h2>Version 6.10.2, 2026-04-15</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.1...6.10.2">Full
Changelog</a></p>
<h2>Version 6.10.1, 2026-04-14</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Use new parameter names for compress_identical_objects by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.0...6.10.1">Full
Changelog</a></p>
<h2>Version 6.10.0, 2026-04-10</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Disallow custom XML entity declarations for XMP metadata (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3724">#3724</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>New Features (ENH)</h3>
<ul>
<li>Skip MD5 key derivation for AES-256 encrypted PDFs (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3694">#3694</a>)
by <a href="https://github.com/Ygnas"><code>@​Ygnas</code></a></li>
</ul>
<h3>Bug Fixes (BUG)</h3>
<ul>
<li>Use remove_orphans in compress_identical_objects (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3310">#3310</a>)
by <a href="https://github.com/j-t-1"><code>@​j-t-1</code></a></li>
<li>Fix PdfReadError when xref table contains comments before trailer
(<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3710">#3710</a>)
by <a href="https://github.com/rassie"><code>@​rassie</code></a></li>
<li>Correctly verify AES padding during decryption (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3699">#3699</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Fix stale object cache from non-authoritative object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3698">#3698</a>)
by <a
href="https://github.com/astahlman"><code>@​astahlman</code></a></li>
<li>Fix extract_links pairing when annotations include non-links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3687">#3687</a>)
by <a
href="https://github.com/ReinerBRO"><code>@​ReinerBRO</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Add AI policy (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3717">#3717</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.0">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c476b4f293"><code>c476b4f</code></a>
REL: 6.10.2</li>
<li><a
href="c50a0104cf"><code>c50a010</code></a>
SEC: Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)</li>
<li><a
href="ac734dab4e"><code>ac734da</code></a>
SEC: Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)</li>
<li><a
href="b49e7eb454"><code>b49e7eb</code></a>
REL: 6.10.1</li>
<li><a
href="62338e9d36"><code>62338e9</code></a>
SEC: Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)</li>
<li><a
href="5dcc0aebaa"><code>5dcc0ae</code></a>
DEV: Update pytest-benchmark to 5.2.3</li>
<li><a
href="b42e4aa98a"><code>b42e4aa</code></a>
DEV: Update pinned pillow and pytest where possible (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3732">#3732</a>)</li>
<li><a
href="717446b121"><code>717446b</code></a>
ROB: Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)</li>
<li><a
href="9e461d361b"><code>9e461d3</code></a>
DEV: Bump softprops/action-gh-release from 2 to 3 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3730">#3730</a>)</li>
<li><a
href="500d09d92f"><code>500d09d</code></a>
TST: Update <code>test_embedded_file__basic</code> to use
<code>tmp_path</code> fixture (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3726">#3726</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.2">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 18:43:19 +08:00
Jin Hai
94106646e7 Go: set and list default models (#14191)
### What problem does this PR solve?

```
RAGFlow(user)> set default vlm "zhipu-ai" "ccc" "glm-4.6v-flash";
SUCCESS
RAGFlow(user)> list default models;
+--------+----------------+----------------+----------------+------------+
| enable | model_instance | model_name     | model_provider | model_type |
+--------+----------------+----------------+----------------+------------+
| true   | ccc            | glm-4.6v-flash | zhipu-ai       | llm        |
| true   | ccc            | glm-4.6v-flash | zhipu-ai       | image2text |
+--------+----------------+----------------+----------------+------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-17 18:05:33 +08:00
Wang Qi
28d8b1c883 [Fix] trivial fix log creation (#14181)
### What problem does this PR solve?

Trivial fix to log creation, following up on PR:
https://github.com/infiniflow/ragflow/pull/14136

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 13:13:41 +08:00
Magicbook1108
797aa6076a Fix: keyword extraction (#14177)
### What problem does this PR solve?

Fix: keyword extraction

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 11:32:48 +08:00
LeonTung
c3bf8d9d60 feat(templates): add a data analysis agent template (#14130)
### What problem does this PR solve?

Add a new agent template that demonstrates how to use the
`CodeExec` component for data analysis.

### Type of change

- [x] Other (please describe): Agent template
2026-04-17 11:32:04 +08:00
writinwaters
0df5d830d4 Refact: Updated agent template descriptions. (#14175)
### What problem does this PR solve?

Updated ingestion pipeline template descriptions for better technical
accuracy and readability.

### Type of change

- [x] Refactoring
2026-04-17 10:46:06 +08:00
Lynn
f194a09cd6 Fix: dataset update parent child (#14167)
### What problem does this PR solve?

Correctly set parent child config in parser_config.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 10:41:50 +08:00
Jin Hai
e03212fd7a Fix go cli models command and api (#14166)
### What problem does this PR solve?

```
RAGFlow(user)> list providers;
+--------------------------------------+----------+-------------------------------------------+--------------+
| base_url                             | name     | tags                                      | total_models |
+--------------------------------------+----------+-------------------------------------------+--------------+
| https://open.bigmodel.cn/api/paas/v4 | ZHIPU-AI | LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION | 21           |
| https://api.x.ai/v1                  | xAI      | LLM                                       | 6            |
+--------------------------------------+----------+-------------------------------------------+--------------+
RAGFlow(user)> show provider 'zhipu-ai';
+--------------------------------------+----------+-------------------------------------------+--------------+
| base_url                             | name     | tags                                      | total_models |
+--------------------------------------+----------+-------------------------------------------+--------------+
| https://open.bigmodel.cn/api/paas/v4 | ZHIPU-AI | LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION | 21           |
+--------------------------------------+----------+-------------------------------------------+--------------+
RAGFlow(user)> delete provider 'zhipu-ai';
SUCCESS
RAGFlow(user)> add provider 'zhipu-ai';
SUCCESS
RAGFlow(user)> create provider 'zhipu-ai' instance 'ccc' 'ccxxccxx';
SUCCESS
RAGFlow(user)> list instances from 'zhipu-ai';
+----------+----------------------------------+--------------+----------------------------------+--------+
| apiKey   | id                               | instanceName | providerID                       | status |
+----------+----------------------------------+--------------+----------------------------------+--------+
| ccxxccxx | 640dd7ee398711f1bdd838a74640adcc | ccc          | d1d59de5398411f1bdd838a74640adcc | active |
+----------+----------------------------------+--------------+----------------------------------+--------+
RAGFlow(user)> list models from 'zhipu-ai';
+----------+------------+---------------+---------------+
| features | max_tokens | model_types   | name          |
+----------+------------+---------------+---------------+
| map[]    | 128000     | [chat]        | glm-4.7       |
| map[]    | 128000     | [chat]        | glm-4.5       |
| map[]    | 128000     | [chat]        | glm-4.5-x     |
| map[]    | 128000     | [chat]        | glm-4.5-air   |
| map[]    | 128000     | [chat]        | glm-4.5-airx  |
| map[]    | 128000     | [chat]        | glm-4.5-flash |
| map[]    | 64000      | [image2text]  | glm-4.5v      |
| map[]    | 128000     | [chat]        | glm-4-plus    |
| map[]    | 128000     | [chat]        | glm-4-0520    |
| map[]    | 128000     | [chat]        | glm-4         |
| map[]    | 8000       | [chat]        | glm-4-airx    |
| map[]    | 128000     | [chat]        | glm-4-air     |
| map[]    | 128000     | [chat]        | glm-4-flash   |
| map[]    | 128000     | [chat]        | glm-4-flashx  |
| map[]    | 1000000    | [chat]        | glm-4-long    |
| map[]    | 128000     | [chat]        | glm-3-turbo   |
| map[]    | 2000       | [image2text]  | glm-4v        |
| map[]    | 8192       | [chat]        | glm-4-9b      |
| map[]    | 512        | [embedding]   | embedding-2   |
| map[]    | 512        | [embedding]   | embedding-3   |
| map[]    | 4096       | [speech2text] | glm-asr       |
+----------+------------+---------------+---------------+
RAGFlow(user)> disable model 'glm-4.5-flash' from 'zhipu-ai' 'ccc';
SUCCESS
RAGFlow(user)> drop instance 'ccc' from 'zhipu-ai';
SUCCESS
RAGFlow(user)> list instances from 'zhipu-ai';
No data to print
```

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-17 09:55:25 +08:00
Wang Qi
96a23d2fd0 [Bug fix] Fix a regression: viewing chunks of a document not yet parsed in Infinity failed in the UI (#14168)
### What problem does this PR solve?
See title. Screenshot of the failure:
<img width="2667" height="915" alt="20260416-205718"
src="https://github.com/user-attachments/assets/0c564237-5ed0-49af-bf4c-d3b5519abc6e"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 09:51:23 +08:00
Magicbook1108
f906a203bb Fix doc generator (#14160)
### What problem does this PR solve?

Fix doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:38 +08:00
balibabu
4a9bfd18bc Fix: The PromptEditor's placeholder is only half displayed. (#14161)
### What problem does this PR solve?

Fix: The PromptEditor's placeholder is only half displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:16 +08:00
Magicbook1108
ea8de1bb47 Fix: different llm in chat (#14162)
### What problem does this PR solve?

Fix: different llm in chat

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 20:37:01 +08:00
writinwaters
8a874c7a09 Doc: Added Integrating Notion connector (#14163)
### What problem does this PR solve?

Added a guide on how to integrate Notion with RAGFlow.

### Type of change

- [x] Documentation Update
2026-04-16 20:06:02 +08:00
Lynn
655dd2f8c6 Fix: simplify _load_user (#14154)
### What problem does this PR solve?

Simplify _load_user, remove unused fallback.

### Type of change

- [x] Refactoring
2026-04-16 18:47:43 +08:00
balibabu
4cf4d444d2 Fix: Login page type error. (#14156)
### What problem does this PR solve?

Fix: Login page type error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 18:46:52 +08:00
Magicbook1108
901023a80a Fix: literal eval http request input (#14145)
### What problem does this PR solve?

Fix: literal eval http request input

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

<img width="700" alt="img_v3_0210q_f4b49ff7-e670-4054-ab0e-9443a09215fg"
src="https://github.com/user-attachments/assets/089300be-06f9-4bb6-97af-61bf5f4a5e8c"
/>


<img width="700" alt="img_v3_0210q_398cd52a-2ad9-42be-8d5b-4e6e68a7d22g"
src="https://github.com/user-attachments/assets/239b43cd-a2a5-49d8-9200-991bb26336c8"
/>
2026-04-16 16:52:34 +08:00
euvre
9a785b26bd fix: change file size column from IntegerField to BigIntegerField to support files > 2GB (#14148)
### What problem does this PR solve?

Fixes #6034

Changes the `size` field in both `Document` and `File` models from
`IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max
~9.2EB), and adds corresponding database migrations.

## Problem

When uploading a file larger than 2GB, the `size` value overflows a
32-bit signed integer (max 2,147,483,647). This causes:

- The stored `size` wraps around to an incorrect value (e.g., a 3GB file
shows as 2,097,152 KB in File Management).
- Subsequent file operations (e.g., download) fail because the corrupted
size leads to invalid storage lookups.

## Changes

- `Document.size`: `IntegerField` → `BigIntegerField`
- `File.size`: `IntegerField` → `BigIntegerField`
- Added `alter_db_column_type` migrations in `migrate_db()` for both
`document.size` and `file.size` columns to ensure existing deployments
are upgraded automatically.
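
As a rough illustration of the range problem described above, the following sketch checks whether a byte count fits in a signed 32-bit or 64-bit integer. The helper name `fits_signed` is hypothetical and not part of the PR; it only demonstrates why a 3 GiB size cannot be stored in a 32-bit column.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest signed 32-bit value


def fits_signed(value: int, bits: int) -> bool:
    """Return True if `value` is representable as a signed integer of `bits` width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


three_gib = 3 * 1024**3  # size in bytes of a 3 GiB upload

print(fits_signed(three_gib, 32))  # a 32-bit column cannot hold this size
print(fits_signed(three_gib, 64))  # a 64-bit column can
```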

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-16 15:43:29 +08:00
euvre
0cd49e14dd fix: make Infinity connection pool size configurable and add retry logic for GraphRAG write bursts (#14143)
### What problem does this PR solve?

Resolve #14137.

### Problem

Graph resolution succeeds (nodes/edges merged, pagerank updated), but
the subsequent burst of Infinity write operations in `set_graph`
exhausts the connection pool with `TOO_MANY_CONNECTIONS` errors. Root
causes:

1. **Hardcoded pool size** — `infinity_conn_pool.py` hardcoded
`ConnectionPool(max_size=4)` on initial creation and `max_size=32` on
refresh. Operators cannot tune this without patching code.
2. **No retry on transient failures** — a single `TOO_MANY_CONNECTIONS`
on edge deletes or chunk inserts kills the entire resolution+community
pipeline with no retry.

### Changes

#### `common/doc_store/infinity_conn_pool.py`

- Read `ConnectionPool` `max_size` from the `INFINITY_POOL_MAX_SIZE`
environment variable (default: `4`), applied consistently to both
initial creation and refresh paths.
- Log the actual pool size on startup for easier debugging.

#### `rag/graphrag/utils.py` — `set_graph()`

- **Edge deletes**: add exponential-backoff retry (3 attempts, 1s/2s/4s
delays) so transient `TOO_MANY_CONNECTIONS` errors are retried instead
of failing the entire job. Concurrency continues to be gated by the
existing `chat_limiter`.
- **Batch inserts**: add exponential-backoff retry (3 attempts, 1s/2s/4s
delays) for the same reason.
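
The two changes above could look roughly like the sketch below. It is not the PR's code: `TransientError`, `pool_max_size`, and `retry_with_backoff` are hypothetical names, and the sketch assumes the 1s/2s/4s delays are waits before up to three retries that follow the initial attempt.

```python
import os
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as TOO_MANY_CONNECTIONS."""


def pool_max_size(default: int = 4) -> int:
    """Read the pool size from INFINITY_POOL_MAX_SIZE, falling back to `default`."""
    return int(os.environ.get("INFINITY_POOL_MAX_SIZE", default))


def retry_with_backoff(op, delays=(1, 2, 4), sleep=time.sleep):
    """Call op(); after each transient failure wait 1s, 2s, then 4s before retrying."""
    for delay in delays:
        try:
            return op()
        except TransientError:
            sleep(delay)
    return op()  # last attempt: any error now propagates to the caller
```

Passing `sleep` as a parameter is only a convenience so that the waits can be replaced in tests.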


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-16 15:40:54 +08:00
Qi Wang
969ce3a79f [Bug fix #14133] fix graph rag, raptor, mindmap log cannot show correctly in UI (#14136)
### What problem does this PR solve?
Fixes #14133: the knowledge graph, RAPTOR, and mind map logs are not
displayed correctly in the UI.
<img width="1930" height="982" alt="Image"
src="https://github.com/user-attachments/assets/d2f8e6c1-d82d-4b00-a377-949aada545ca"
/>
After Fix:
<img width="2108" height="805" alt="image"
src="https://github.com/user-attachments/assets/b37426c1-83d3-4a32-a83c-9d340d69e0e6"
/>
<img width="2173" height="1067" alt="image"
src="https://github.com/user-attachments/assets/30105222-3310-43a0-9f83-1e320d05e413"
/>

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 13:08:36 +08:00
Yongteng Lei
356ba5650a Fix: sandbox doesn't attach attachment metadata (#14135)
### What problem does this PR solve?

The sandbox doesn't attach attachment metadata.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 12:08:54 +08:00
balibabu
53154b2cc3 Feat: Add a title prefix to the testid on the login page. (#14129)
### What problem does this PR solve?

Feat: Add a title prefix to the testid on the login page.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-16 12:08:44 +08:00
Magicbook1108
944a90d645 Feat: add button to turn off vlm parsing (#14125)
### What problem does this PR solve?

Feat: add button to turn off vlm parsing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chanx <1243304602@qq.com>
2026-04-15 19:06:00 +08:00
chanx
dce0b1c030 Fix: Pipeline page style optimizations (#14128)
### What problem does this PR solve?

Fix: Pipeline page style optimizations

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-15 19:05:54 +08:00
Daniil Sivak
c93ec0a1f3 Fix: reject empty/space-only content in update_chunk API (#14082)
Closes #6541

### What problem does this PR solve?

Add content validation to `update_chunk` (SDK and non-SDK) to reject
empty or whitespace-only content before it reaches the embedding model.

**Before:** Calling `update_chunk` with space-only content (like `" "`,
`""`, `"\n"`) bypassed validation and was sent directly to the embedding
model, which returned an error. This was the same bug previously fixed
for `add_chunk` in #6390, but `update_chunk` was missed.

**After:** Empty/whitespace-only content is caught by validation and
returns an error: `` `content` is required ``
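
A minimal sketch of this kind of check, assuming a hypothetical helper named `validate_chunk_content` (the actual validation code in the PR may be structured differently):

```python
def validate_chunk_content(content) -> str:
    """Reject missing, empty, or whitespace-only content; return it unchanged otherwise."""
    if not isinstance(content, str) or not content.strip():
        raise ValueError("`content` is required")
    return content
```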

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-15 18:43:53 +08:00
Magicbook1108
d51789e2be Feat: update templates && add resume template (#14124)
### What problem does this PR solve?

Feat: update templates  && add resume template

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-15 18:42:29 +08:00
balibabu
c56a7f99d1 Fix: The pop-up menu of the PromptEditor will be blocked. #14126 (#14127)
### What problem does this PR solve?

Fix: The pop-up menu of the PromptEditor will be blocked.  #14126

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-15 18:42:02 +08:00
writinwaters
2520065c5a Doc: Added Integrate Confluence (#14131)
### What problem does this PR solve?

Added a guide on integrating Confluence as connector.

### Type of change

- [x] Documentation Update
2026-04-15 18:38:36 +08:00
Minal Mahala
f930389311 Refact: improve task resume mechanism for graphrag (#14096)
### What problem does this PR solve?

Addresses review feedback on #14074 (Checkpoint mechanism for
long-running workflow jobs, issue #12494).

**Changes based on @yuzhichang's review:**

1. **Renamed `checkpoint_service.py` → `task_checkpoint.py`** as
suggested.
2. **Replaced Redis with direct docEngine queries** as suggested — the
subgraph already gets persisted to the doc store by
`generate_subgraph()`, so we just query for it instead of maintaining a
separate checkpoint in Redis. This is simpler, has no extra dependency,
and uses a single source of truth.

**Changes based on CodeRabbit review:**

3. **Fixed `source_id` query format mismatch** — subgraphs are stored
with `source_id: [doc_id]` (list), but the original query used
`source_id: doc_id` (string). Now follows the same pattern as
`does_graph_contains()` in `rag/graphrag/utils.py`: filter by
`knowledge_graph_kwd` only, then match `source_id` in Python. This
avoids ambiguity across Elasticsearch / Infinity / OceanBase backends.

### Changes

| File | Change |
|---|---|
| `api/db/services/task_checkpoint.py` (new) | `load_subgraph_from_store()` and `has_raptor_chunks()`: docEngine-based checkpoint queries |
| `rag/graphrag/general/index.py` | `build_one()` calls `load_subgraph_from_store()` before running LLM extraction |
| `rag/svr/task_executor.py` | RAPTOR per-doc loop calls `has_raptor_chunks()` before processing |
| `test/unit_test/rag/graphrag/test_checkpoint_resume.py` (new) | 10 unit tests covering subgraph loading, source_id filtering, edge cases |

### How it works

- **GraphRAG:** Before running expensive LLM entity/relation extraction
for a doc, checks the doc store for an existing subgraph (saved by a
previous interrupted run). If found, loads it directly and skips LLM
calls.
- **RAPTOR:** Before processing a doc, checks if RAPTOR chunks
(`raptor_kwd="raptor"`) already exist for it. If yes, skips.
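
The "match `source_id` in Python" step can be pictured with the sketch below. It assumes the rows have already been fetched from the doc store with a `knowledge_graph_kwd` filter; the function name `find_subgraph_for_doc` and the row shape are illustrative only.

```python
def find_subgraph_for_doc(rows: list, doc_id: str):
    """Return the first stored row whose source_id list contains doc_id, else None."""
    for row in rows:
        source_ids = row.get("source_id") or []
        if isinstance(source_ids, str):  # tolerate a bare string as well as a list
            source_ids = [source_ids]
        if doc_id in source_ids:
            return row
    return None
```

Doing the match in Python keeps the store query identical across backends, at the cost of fetching a few more rows.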

### Testing

- 10 new unit tests — all passing
- Full existing suite: 617 passed

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-04-15 17:37:28 +08:00
euvre
3364d86e6b Auto-inject knowledge parameter in async_chat when prompt_config is missing it (#14121)
### What problem does this PR solve?

Resolve #14115.

## Problem

On the shared chat link page (`/chats/share?shared_id=...`), querying
the knowledge base returns "no relevant information was found", while
the same query works correctly on the editor chat page.

## Root Cause

Knowledge base retrieval in `async_chat()` is gated by the check `if
"knowledge" in param_keys` (line 598), where `param_keys` is derived
from `prompt_config["parameters"]`. If `parameters` is empty or missing
the `{"key": "knowledge", "optional": false}` entry, retrieval is
entirely skipped.

This can happen because `_apply_prompt_defaults()` — which ensures
`parameters` contains the `knowledge` entry — is only called in the
`create` (POST) and `update_chat` (PUT) handlers, but **not** in
`patch_chat` (PATCH). If a chat's `prompt_config` was updated via PATCH
without including `parameters`, the `knowledge` entry would be absent.
Additionally, `prompt_config["parameters"]` would raise a `KeyError` if
the key was missing entirely.

## Fix

Added a defensive safety net in `async_chat()`
(`api/db/services/dialog_service.py`) that auto-injects the `knowledge`
parameter when:
- `dialog.kb_ids` is set (knowledge bases are configured)
- `"knowledge"` is not already in `param_keys`
- `{knowledge}` placeholder exists in the system prompt

Also changed `prompt_config["parameters"]` to
`prompt_config.get("parameters", [])` to prevent `KeyError` when the key
is absent.
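
The three conditions and the safe `.get()` access can be sketched as follows. This is an illustration, not the code in `dialog_service.py`: the function name is hypothetical, and reading the system prompt from a `"system"` key is an assumption.

```python
def resolve_param_keys(prompt_config: dict, kb_ids: list) -> list:
    """Collect parameter keys, adding "knowledge" when the safety-net conditions hold."""
    parameters = prompt_config.get("parameters", [])  # no KeyError if the key is absent
    param_keys = [p["key"] for p in parameters]
    system_prompt = prompt_config.get("system", "")  # assumed key for the system prompt
    if kb_ids and "knowledge" not in param_keys and "{knowledge}" in system_prompt:
        param_keys.append("knowledge")
    return param_keys
```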

## Files Changed

- `api/db/services/dialog_service.py` — added auto-injection of
`knowledge` parameter and safe `.get()` access for `parameters`


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-15 17:31:31 +08:00
Ea001
38cefd88e2 Fix tag_feas code injection in retrieval ranking (#13923)
## Summary
- remove eval-based parsing from retrieval rank feature scoring
- validate `tag_feas` at write time in chunk APIs and SDK routes
- add regression tests for safe parsing and malicious payload rejection

## Details
`tag_feas` is intended to be structured rank-feature data, but the
retrieval ranking path was evaluating stored values as Python
expressions. This change treats `tag_feas` strictly as data.

### What changed
- replace `eval()` in `rag/nlp/search.py` with safe parsing via
`json.loads()` and optional `ast.literal_eval()` compatibility for
legacy Python-dict strings
- strictly filter parsed values down to `dict[str, finite number]`
- reject invalid `tag_feas` payloads at write time in web chunk routes
and SDK document chunk routes
- add focused regression tests to prove executable strings are ignored
and invalid payloads are rejected
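
For illustration, a parser along the lines described above might look like this sketch. The names `parse_tag_feas` and `_is_finite_number` are hypothetical, and the real implementation in `rag/nlp/search.py` may differ in detail; the point is that stored values are parsed as data and never evaluated.

```python
import ast
import json
import math


def _is_finite_number(value) -> bool:
    """True for int/float values that are finite; bools are not treated as numbers."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    try:
        return math.isfinite(value)
    except OverflowError:  # int too large to convert to float
        return False


def parse_tag_feas(raw) -> dict:
    """Parse tag_feas as data only: JSON first, then Python-literal syntax as a fallback."""
    parsed = raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except ValueError:
            try:
                parsed = ast.literal_eval(raw)  # literals only, never executes code
            except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
                return {}
    if not isinstance(parsed, dict):
        return {}
    return {k: v for k, v in parsed.items() if isinstance(k, str) and _is_finite_number(v)}
```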

## Validation
- `python -m pytest test/unit_test/common/test_tag_feature_utils.py
test/unit_test/rag/test_rank_feature_scores.py -q`

---------

Co-authored-by: unknown <zhenglinkai@CCN.Local>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-04-15 16:31:11 +08:00
Eden
1f33ca1099 fix(dialog): restore decorated answer in async_ask final SSE event (#13917)
## What's the problem

Both `async_chat()` and `async_ask()` call `decorate_answer()` to build
the final SSE payload — it inserts citation markers (`##N$$`) into the
answer text and prunes `doc_aggs` to only the cited documents.
Immediately after, both functions overwrite `final["answer"]` with `""`:

```python
# async_chat(), line ~774  (issue #13828)
final = decorate_answer(thought + full_answer)
final["final"] = True
final["audio_binary"] = None
final["answer"] = ""   # discards decorated text
yield final

# async_ask(), line ~1444  (same bug, different path)
final = decorate_answer(full_answer)
final["final"] = True
final["answer"] = ""   # discards decorated text
yield final
```

The client receives filtered references (built for a citation-decorated
answer it never sees) while displaying the raw, undecorated streaming
text. Citations can never match.

## Root cause

`final["answer"] = ""` was left over from an earlier design where
clients were meant to reconstruct the full answer purely from delta
events. Once `decorate_answer()` started placing citation markers, this
blank-out broke the contract: the final event is where the decorated
answer should land.

## Fix

Remove the two blank-override lines — one in `async_chat()`, one in
`async_ask()`:

```diff
-    final["answer"] = ""
```

`decorate_answer()` already sets `final["answer"]` to the correct
decorated string; there is nothing to override.

## Relation to #13828

Issue #13828 and PR #13835 identify the bug in `async_chat()`. This PR
absorbs that fix and also corrects the identical pattern in
`async_ask()` (used by the `/retrieval` route in `chat_api.py`), which
PR #13835 does not touch.

## Regression test

Added
`test/unit_test/api/db/services/test_dialog_service_final_answer.py`
with three tests:

| Test | Purpose |
|------|---------|
| `test_buggy_pattern_drops_answer` | Documents the old behaviour: blank-override empties the final answer |
| `test_fixed_pattern_preserves_decorated_answer` | Core invariant: final event carries the decorated text from `decorate_answer()` |
| `test_final_event_reference_matches_decorated_result` | Citation markers in the answer must match the pruned `doc_aggs` in the same event |

Local run result:

```
test_dialog_service_final_answer.py::test_buggy_pattern_drops_answer         PASSED
test_dialog_service_final_answer.py::test_fixed_pattern_preserves_decorated_answer PASSED
test_dialog_service_final_answer.py::test_final_event_reference_matches_decorated_result PASSED

3 passed in 0.04s
```

`ruff check` passes with no issues on all changed files.

---------

Co-authored-by: edenfunf <edenfunf@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-15 14:10:36 +08:00
balibabu
f08d13287a Feat: Edit the code of the code operator from a broad perspective. (#14116)
### What problem does this PR solve?

Feat: Edit the code of the code operator from a broad perspective.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-15 11:51:17 +08:00
chanx
2d291cd841 fix(flow): Fix text descriptions for multi-column layout options. (#14107)
### What problem does this PR solve?

fix(flow): Fix text descriptions for multi-column layout options.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-15 11:50:58 +08:00
Jin Hai
a0a4029f01 Fix document (#14118)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-15 11:35:16 +08:00
Jack
bc5f78996b Consolidation of document upload API (#14106)
### What problem does this PR solve?

Consolidate the web API and the HTTP API for document upload.

Before consolidation:
- Web API: POST /v1/document/upload
- HTTP API: POST /api/v1/datasets/<dataset_id>/documents

After consolidation:
- RESTful API: POST /api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-15 11:27:43 +08:00
xinmotlanthua
e1dede1366 fix(web): replace hardcoded English strings with i18n in floating chat widget (#14095)
## Summary
- Replace 3 hardcoded English strings in `floating-chat-widget.tsx` with
`react-i18next` `t()` calls so the widget respects the `locale` query
parameter
- Add `useTranslation` hook to the component
- Add translation keys (`chat.chatSupport`, `chat.replyInstantly`,
`chat.typeYourMessage`) to all 14 locale files

## Strings replaced
| Original | i18n key |
|---|---|
| `'Chat Support'` | `t('chat.chatSupport')` |
| `'We typically reply instantly'` | `t('chat.replyInstantly')` |
| `'Type your message...'` | `t('chat.typeYourMessage')` |

Closes #14072

Co-authored-by: khanhkhanhlele <namkhanh2172@gmail.com>
2026-04-14 20:12:56 +08:00
akie
a98b64326c Add warning log when metadata query hits 10000 result limit (#14109)
## What problem does this PR solve?

Add a warning log when `get_flatted_meta_by_kbs` returns 10,000 results,
which indicates the query limit has been reached and metadata may be
silently truncated.
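
A small sketch of such a check, assuming a hypothetical helper `warn_if_truncated` and a module-level constant for the limit quoted above (the PR adds the warning inside the existing query path instead):

```python
import logging

META_QUERY_LIMIT = 10_000  # result limit quoted in the description above


def warn_if_truncated(results: list, logger=None) -> bool:
    """Log a warning and return True when the result count reaches the query limit."""
    logger = logger or logging.getLogger(__name__)
    if len(results) >= META_QUERY_LIMIT:
        logger.warning(
            "metadata query returned %d results; limit reached, metadata may be truncated",
            len(results),
        )
        return True
    return False
```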


## Type of change
- [x] Improvement (non-breaking change which improves observability)
2026-04-14 20:04:32 +08:00
NeedmeFordev
1a1b5aa53e Fix: respect the internet toggle before running Tavily web search (#14051) (#14052)
### What problem does this PR solve?

Fixes #14051.

The chat UI already sends an `internet` flag with each request, but the
backend previously triggered Tavily web retrieval whenever
`prompt_config.tavily_api_key` was configured. As a result, web search
could still run even when the internet toggle was off.

This PR makes web search an explicit opt-in at request time:
- `tavily_api_key` only indicates that web search is available
- Tavily retrieval runs only when `internet` is explicitly enabled
- the same behavior now applies to both the normal retrieval path and
the deep-research / reasoning path

This also fixes the no-KB fallback case so chats without KBs fall back
to normal solo chat when `internet` is off.
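
The opt-in rule can be summarized by the sketch below. The function name is hypothetical, and treating only the literal `True` as "explicitly enabled" is an assumption about how the flag is interpreted.

```python
def should_run_web_search(prompt_config: dict, internet) -> bool:
    """Web search needs both a configured key and an explicitly enabled internet flag."""
    has_key = bool(prompt_config.get("tavily_api_key"))  # search is merely available
    return has_key and internet is True  # search actually runs only on explicit opt-in
```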

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 19:55:20 +08:00
Jin Hai
8e9cef3687 Remove unused API (#14046)
### What problem does this PR solve?

1. Remove unused token related API
2. Fix typo

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 19:32:16 +08:00
chanx
912fedc9b9 Fix: metadata bug (#14105)
### What problem does this PR solve?

Fix: metadata bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 18:45:09 +08:00
writinwaters
1c0c1f27ef Doc: Updated FAQ (#14108)
### What problem does this PR solve?

Updated frequently asked questions.

### Type of change


- [x] Documentation Update
2026-04-14 18:42:16 +08:00
balibabu
1bc4868abe Fix: The file count in the file header did not change after uploading or deleting files. (#14034)
### What problem does this PR solve?
Fix: The file count in the file header did not change after uploading or
deleting files.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-14 18:07:32 +08:00
Jack
576431de99 Refactor: Change update doc from PUT to patch (#14067)
### What problem does this PR solve?

Before this change, `update_document` in
api/apps/restful_apis/document_api.py used PUT.
It now uses PATCH, which is more suitable.

### Type of change

- [x] Refactoring
2026-04-14 17:12:23 +08:00
Qi Wang
57aec2e65d Fix bug: running knowledge graph or RAPTOR updates an existing task (#14102)
### What problem does this PR solve?

Fixes https://github.com/infiniflow/ragflow/issues/14101.
When running knowledge graph or RAPTOR, the running status of the last
document was set incorrectly (see below).
These tasks should never touch an existing document's result.

![Image](https://github.com/user-attachments/assets/14fe1f9e-0541-4093-8111-ed0bd25b87ba)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 16:37:41 +08:00
balibabu
27ebc64ec0 Feat: Adapted for the upgraded knowledge graph of @antv/g6. (#14103)
### What problem does this PR solve?

Feat: Adapted for the upgraded knowledge graph of @antv/g6.

### Type of change

- [x] Refactoring
2026-04-14 16:33:52 +08:00
Magicbook1108
1376c004a9 Fix: update docs generator (#14070)
### What problem does this PR solve?

Refactor: update docs generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

1. Support multiple document generator components and correctly display
messages in the message component. The document generator will not
overwrite other messages.

<img width="700" alt="Screenshot from 2026-04-13 13-56-17"
src="https://github.com/user-attachments/assets/3f3e06e8-33ce-4df1-8b05-510c86af70a4"
/>

2. Support Chinese content and ensure correct Markdown rendering in PDF
and DOCX
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/69bf1f7b-261d-48e5-a9f3-8e94462b90ed"
/>

3. Simplify the configuration page and support more output formats
 
<img height="700" alt="image"
src="https://github.com/user-attachments/assets/8647374c-c055-4daa-ad71-cd9052eb138e"
/>

4. Hide the download option in all components except the message component
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a723dfcb-b60d-4eb5-b2f6-d41ca5955eb4"
/>

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a8762ac4-807b-4f0b-9287-65f82f7c9c98"
/>

5. Sanitize filename
 
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/df49509f-37c0-40f9-b03d-bd6ce7fdefa8"
/>


6. Other usability improvements
2026-04-14 15:24:43 +08:00
chanx
1031aebc8f feat(file): Add file ancestor directory lookup feature by go (#14037)
### What problem does this PR solve?

feat(file): Add file ancestor directory lookup feature by go

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-14 15:22:03 +08:00
chanx
6aec8058bb refactor: Remove knowledge base-related API handlers that are already included in the dataset. (#14094)
### What problem does this PR solve?

refactor: Remove knowledge base-related API handlers that are already
included in the dataset.

### Type of change

- [x] Refactoring
2026-04-14 15:19:31 +08:00
Jin Hai
2b6c50734f Sync code from EE (#14080)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 15:03:46 +08:00
Ricardo-M-L
c22811f096 fix: close file handles in json.load() calls in resume parser (#14061)
## Summary
- Replace `json.load(open(...))` with `with open(...) as f:
json.load(f)` in 2 resume parser files
- Fixes 4 leaked file descriptors in `corporations.py` (3) and
`schools.py` (1)

## Why
In a long-running server process like RAGFlow, leaked file handles can
accumulate and hit the OS file descriptor limit (`OSError: [Errno 24]
Too many open files`). The other instances mentioned in the issue
(`infinity_conn_base.py` and `init_data.py`) have already been fixed.
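
The pattern change amounts to the following sketch. `load_json` is a hypothetical helper used for illustration; the PR applies the `with` statement directly at each call site.

```python
import json


def load_json(path: str):
    """Load JSON with a context manager so the file handle is closed even on errors."""
    # Leaky form being replaced: json.load(open(path))
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```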

## Test plan
- [x] Verified affected files use `with` statement after fix
- [x] Grep confirms no remaining `json.load(open(` patterns in codebase

Fixes #13996

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:43:58 +08:00
Idriss Sbaaoui
de6a8e789a Fix: rerank overflow by enforcing top_k and 64 cap (#14084)
### What problem does this PR solve?

This fixes rerank overflow where retrieval could send more documents
than allowed (for example 66 when `page_size=6`), causing provider 400
errors and bypassing the user’s `top_k` intent in rerank-enabled paths.
This PR fixes #14081.
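
As a rough sketch of the clamping idea, the number of documents sent to the reranker can be bounded by both `top_k` and the cap of 64 named in the title. The function and constant names are hypothetical, and the actual fix may compute the candidate count differently.

```python
RERANK_MAX_DOCS = 64  # cap named in the commit title


def rerank_limit(candidates: int, top_k: int, cap: int = RERANK_MAX_DOCS) -> int:
    """Number of documents to send to the reranker: never above top_k or the cap."""
    return max(0, min(candidates, top_k, cap))
```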

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 10:47:25 +08:00
Idriss Sbaaoui
d6987b4d8f Fix p3 ci fails (#14069)
### What problem does this PR solve?

Fix stale tests at the P3 level.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 10:47:07 +08:00
balibabu
d2b744facd Fix: The indented tree text generated on the search page overlaps. #14077 (#14078)
### What problem does this PR solve?

Fix: The indented tree text generated on the search page overlaps.
#14077

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-14 10:02:00 +08:00
Magicbook1108
8723c3aa86 Feat: more templates (#14075)
### What problem does this PR solve?

Feat: more templates
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/533e88f1-fc56-4337-a026-6623fc978893"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2026-04-14 10:00:55 +08:00
chanx
6ffa566ec3 Refactor: Standardize naming convention to camelCase (#14079)
### What problem does this PR solve?

Refactor: Standardize naming convention to camelCase

### Type of change

- [x] Refactoring
2026-04-13 21:07:07 +08:00
balibabu
9a38af7cbf Feat: Hide the download button embedded in the agent page. (#14083)
### What problem does this PR solve?

Feat: Hide the download button embedded in the agent page.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-13 21:06:41 +08:00
Syed Shahmeer Ali
c7ce062ea8 Fix: model_type not passed in ensure_tenant_model_id_for_params causing wrong tenant model lookup (#13782)
## Summary

When setting a default model for an OpenAI-API-Compatible provider,
`ensure_tenant_model_id_for_params` called `get_api_key` without a
`model_type` filter. If the same model name was registered under
multiple types (e.g., both chat and embedding), it could return the
wrong `tenant_llm_id`, leading to `Model(@None) not authorized` errors
during chat.

This applies the same type-scoped fix that PR #13569 introduced in
`get_model_config_by_type_and_name`, now consistently in
`tenant_utils.py` as well.

## Changes

- Added `_KEY_TO_MODEL_TYPE` mapping in `tenant_utils.py`
- Each model key (`llm_id`, `embd_id`, etc.) now passes its correct
`LLMType` to `get_api_key`

Fixes #13775
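
To illustrate why the type filter matters, the sketch below looks up a model by name and type in a toy registry. Only the `llm_id` and `embd_id` keys and the chat/embedding types come from the description above; the registry, the string type values, and `lookup_tenant_model` are hypothetical stand-ins for the real `LLMType` and `get_api_key`.

```python
# Hypothetical mapping from model key to model type, limited to the two keys named above.
_KEY_TO_MODEL_TYPE = {"llm_id": "chat", "embd_id": "embedding"}


def lookup_tenant_model(registry: dict, model_name: str, key: str):
    """Resolve a model by (name, type) so same-named models of other types are ignored."""
    model_type = _KEY_TO_MODEL_TYPE[key]
    return registry.get((model_name, model_type))


# The same model name registered under two types, as in the bug report.
registry = {
    ("my-model", "chat"): "tenant-llm-1",
    ("my-model", "embedding"): "tenant-llm-2",
}
```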
2026-04-13 20:57:28 +08:00
天海蒼灆
356d45fda1 Feat: add cell type coercion for Excel export (#13808)
### What problem does this PR solve?

- Implemented a helper function to convert markdown cell text to native
numeric types for Excel output.
- Ensured that leading zeros are preserved and handled various numeric
formats, including those with thousand separators and scientific
notation.
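
One way to picture the coercion rules is the sketch below. It is an assumption-laden illustration, not the PR's helper: the name `coerce_cell`, the regular expression, and the choice to leave leading-zero values such as `007` as text are all illustrative.

```python
import re

# Optional sign, digits with optional thousand separators, optional fraction and exponent.
_NUMERIC = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?([eE][+-]?\d+)?$")


def coerce_cell(text: str):
    """Convert numeric-looking cell text to int or float; return anything else unchanged."""
    s = text.strip()
    if not _NUMERIC.match(s):
        return text
    digits = s.lstrip("+-")
    if len(digits) > 1 and digits[0] == "0" and digits[1].isdigit():
        return text  # leading zero (an ID or code): keep it as text
    cleaned = s.replace(",", "")
    if re.fullmatch(r"[+-]?\d+", cleaned):
        return int(cleaned)
    return float(cleaned)
```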

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-13 20:54:57 +08:00
Lynn
47d3741dcc Feat: migrate script (#14076)
### What problem does this PR solve?

Add command-line arguments for the MySQL config.

### Type of change

- [x] Other (please describe): tool scripts.
2026-04-13 20:45:11 +08:00
bitloi
853021ff2a feat: support multiple canvas_types for agent templates and remove duplicate files (#14030)
### What problem does this PR solve?

Closes #13907

The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.

This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.

What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
  - accepts legacy `canvas_type`
  - accepts new `canvas_types`
  - merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
  - `deep_search_r.json`
  - `reflective_academic_paper_generator_r.json`
  - `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.
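
The normalization rules listed above could be sketched as follows. The function name is hypothetical, and placing the legacy `canvas_type` value first in the merged list is an assumption; the PR only states that `canvas_type` is kept as the first normalized value.

```python
def normalize_canvas_types(template: dict) -> dict:
    """Merge legacy canvas_type and new canvas_types into one deduplicated list."""
    values = []
    legacy = template.get("canvas_type")
    if isinstance(legacy, str) and legacy.strip():
        values.append(legacy.strip())
    for item in template.get("canvas_types") or []:
        if isinstance(item, str) and item.strip():
            values.append(item.strip())
    merged = list(dict.fromkeys(values))  # deduplicate while keeping first-seen order
    normalized = dict(template)
    normalized["canvas_types"] = merged
    normalized["canvas_type"] = merged[0] if merged else None  # backward compatibility
    return normalized
```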

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-04-13 20:26:30 +08:00
writinwaters
ef07faea80 Doc: Updated frequently asked questions and answers. (#14085)
### What problem does this PR solve?

Updated frequently asked questions. 

### Type of change

- [x] Documentation Update
2026-04-13 20:26:16 +08:00
Tong Liu
6fdca2d212 [Security] Fix jinja2 SSTI vulnerability using SandboxedEnvironment (#14068) 2026-04-13 19:24:13 +08:00
balibabu
a023305b96 Fix: The chat page is not displaying the meta tags. (#14071)
### What problem does this PR solve?

Fix: The chat page is not displaying the meta tags.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-13 16:18:25 +08:00
Krishna Chaitanya
5ece2d8aa8 Fix: upgrade Apache Tika from 3.2.3 to 3.3.0 to address GHSA-72hv-8253-57qq (#13769)
### What problem does this PR solve?

Upgrades Apache Tika from 3.2.3 to 3.3.0 to address the security
vulnerability GHSA-72hv-8253-57qq (TIKA-4687).

Closes #13601

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- `Dockerfile`: Updated tika JAR filename and `TIKA_SERVER_JAR` env var
from 3.2.3 to 3.3.0
- `Dockerfile.deps`: Updated tika JAR filename in COPY instruction from
3.2.3 to 3.3.0
- `download_deps.py`: Updated both Maven Central and Huawei Cloud mirror
download URLs from 3.2.3 to 3.3.0

### References

- Apache Tika 3.3.0 release:
https://www.apache.org/dyn/closer.lua/tika/3.3.0/tika-app-3.3.0.jar
- TIKA-4687: https://issues.apache.org/jira/browse/TIKA-4687
- GHSA-72hv-8253-57qq
2026-04-13 16:01:08 +08:00
Jin Hai
3e787b3b09 Go: update search (#14023)
### What problem does this PR solve?

Update search

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-13 15:07:04 +08:00
Yongteng Lei
1638083e18 Fix: sandbox cannot accept large args list (#14063)
### What problem does this PR solve?

Sandbox cannot accept large args list.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-13 14:14:08 +08:00
Jack
51ce6aab01 Consolidate set_meta into update_document (#14045)
### What problem does this PR solve?

Consolidate the "set_meta" API into "update_document".

Before consolidation:
- Web API: POST /api/v1/document/set_meta
- HTTP API: PUT /v1/datasets/<dataset_id>/document/<document_id>

After consolidation:
- RESTful API: PUT /v1/datasets/<dataset_id>/document/<document_id>

### Type of change

- [x] Refactoring
2026-04-13 12:47:17 +08:00
akie
3911d90993 Fix: agent application can not show Cite (#14047)
Close #14018

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Problem
In Agent applications, even with the cite option enabled, only inline
[ID: x] citation markers are visible (showing chunk content on hover).
The Agent does not display the referenced file cards below the response,
unlike Chat applications.

### Root Cause
The Agent's Retrieval tool (agent/tools/retrieval.py) calls
retriever.retrieval() with aggs=False, which means the retrieval results
do not include doc_aggs (document aggregation) data. Without doc_aggs,
the frontend ReferenceDocumentList component has no data to render the
file cards.

In contrast, the Chat application (api/db/services/dialog_service.py)
calls the same retriever.retrieval() method with aggs=True.

### Fix
Changed aggs=False to aggs=True in agent/tools/retrieval.py so that
document aggregation data is returned along with the retrieved chunks.
2026-04-13 11:06:14 +08:00
writinwaters
52442c8eb5 Docs: Added a guide on adding Github repo as data source (#14048)
### What problem does this PR solve?

Added a guide on adding a GitHub repo as a data source.

### Type of change


- [x] Documentation Update
2026-04-10 21:32:26 +08:00
balibabu
462be53b76 Fix: When creating a dataset, if no chunk_method is selected, there is no indication that this is a required field. (#14039)
### What problem does this PR solve?
Fix: When creating a dataset, if no `chunk_method` is selected, there is
no indication that this is a required field.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-10 19:05:14 +08:00
Magicbook1108
82d74fd276 Refact: update pipeline template (#14036)
### What problem does this PR solve?

Refact: update pipeline template

### Type of change

- [x] Refactoring
2026-04-10 19:04:52 +08:00
Jack
4046a4cfb6 Consolidateion metadata summary API (#14031)
### What problem does this PR solve?

Consolidate the web API and HTTP API for the document metadata summary.

Before consolidation:
- Web API: POST /api/v1/document/metadata/summary
- HTTP API: GET /v1/datasets/<dataset_id>/metadata/summary

After consolidation:
- RESTful API: GET /v1/datasets/<dataset_id>/metadata/summary

### Type of change

- [x] Refactoring
2026-04-10 18:41:30 +08:00
balibabu
11c89d87da Fix: The dataset on the search page is not displaying the required field error message. (#14041)
### What problem does this PR solve?

Fix: The dataset on the search page is not displaying the required field
error message.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-10 18:20:50 +08:00
Zhichang Yu
a9ca4ea1a1 Disable flask and quart debug (#14042)
### What problem does this PR solve?

Visiting
`http://127.0.0.1:9381/?__debugger__=yes&cmd=resource&f=debugger.js`
exposes the Flask debugger code:
```
docReady(() => {
  if (!EVALEX_TRUSTED) {
    initPinBox();
  }
  // if we are in console mode, show the console.
  if (CONSOLE_MODE && EVALEX) {
    createInteractiveConsole();
  }

  const frames = document.querySelectorAll("div.traceback div.frame");
  if (EVALEX) {
    addConsoleIconToFrames(frames);
  }
  addEventListenersToElements(document.querySelectorAll("div.detail"), "click", () =>
    document.querySelector("div.traceback").scrollIntoView(false)
  );
  addToggleFrameTraceback(frames);
  addToggleTraceTypesOnClick(document.querySelectorAll("h2.traceback"));
  addInfoPrompt(document.querySelectorAll("span.nojavascript"));
  wrapPlainTraceback();
});

function addToggleFrameTraceback(frames) {
  frames.forEach((frame) => {
    frame.addEventListener("click", () => {
      frame.getElementsByTagName("pre")[0].parentElement.classList.toggle("expanded");
    });
  })
}

```

### Type of change

- [x] Other (please describe): Fix security risk
2026-04-10 18:01:49 +08:00
Jin Hai
cfc2928de2 Go: remove unused API route (#14028)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-10 18:00:41 +08:00
Jin Hai
3d59448b0d Go: add parameter parsing of list chats (#14026)
### What problem does this PR solve?

As title.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-10 14:33:32 +08:00
Magicbook1108
18cafff790 Fix: markdown parser in pipeline (#14032)
### What problem does this PR solve?

Fix: markdown parser in pipeline

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-10 14:11:14 +08:00
Magicbook1108
9ce293a736 Refact: update exesql notification (#14027)
### What problem does this PR solve?

Refact: update exesql notification

### Type of change


- [x] Refactoring
2026-04-10 13:42:57 +08:00
Magicbook1108
87a87a7122 Feat: pipeline support ONE chunking method (#14024)
### What problem does this PR solve?

Feat: pipeline support ONE chunking method

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-10 13:11:22 +08:00
Jin Hai
a37605cbd2 Go: add get chat (#14025)
### What problem does this PR solve?

As title

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-10 13:06:51 +08:00
eason
aa92abe73c fix: close file handles properly in json.load() calls (#13997)
## Summary

Fixes #13996

Replace `json.load(open(...))` with `with open(...) as f: json.load(f)`
in two files to ensure file descriptors are properly closed.

**Affected files:**
- `common/doc_store/infinity_conn_base.py` — schema loading for Infinity
doc store
- `api/db/init_data.py` — agent template loading at startup

## Why this matters

In a long-running server process like RAGFlow, leaked file descriptors
from `json.load(open(...))` can accumulate over time. While CPython's
refcounting usually cleans these up, it's not guaranteed (especially
under memory pressure or with alternative Python runtimes), and can lead
to `OSError: [Errno 24] Too many open files`.
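
For illustration, the pattern looks like this. The helper name `load_json_file` is hypothetical and not taken from the affected files.

```python
import json


def load_json_file(path):
    """Load JSON from `path`, closing the file handle deterministically."""
    # Before: json.load(open(path)) leaves closing the file to the garbage
    # collector. The context manager closes it as soon as the block exits,
    # even if json.load raises.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```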

## Test plan

- [ ] Verify Infinity doc store schema loading still works correctly
- [ ] Verify agent templates load correctly on startup

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved file handling in internal data processing to ensure proper
resource cleanup.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:16:49 +08:00
chanx
4538910b52 feat: Implement file-related functionality (#14011)
### What problem does this PR solve?

feat: Implement file-related functionality

- Implement file deletion API and business logic
- Add context support for file deletion operations and prevent root
folder deletion
- Implement file move functionality
- Add file download API endpoints and utility functions

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-10 12:15:27 +08:00
corevibe555
e7d044413f Fix: Google Drive connector missing new files after initial sync (#13943)
Closes https://github.com/infiniflow/ragflow/issues/13939

## What problem does this PR solve?

The Google Drive connector fails to detect new files after the initial
sync (#13939). The root cause is that `generate_time_range_filter()`
applies a strict `modifiedTime > poll_range_start` cutoff when querying
the Google Drive API. Files uploaded to Google Drive that retain their
original `modifiedTime` (common behavior) get silently excluded if their
timestamp predates the last sync's cutoff.

Unlike the Confluence and Jira connectors which use a configurable time
buffer (`CONFLUENCE_SYNC_TIME_BUFFER_SECONDS`) to offset
`poll_range_start` backward, the Google Drive connector had no such
mechanism — resulting in a razor-sharp timestamp boundary with zero
tolerance for overlap.
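
A rough sketch of how a buffered filter might be built, under stated assumptions: the `buffer_seconds` parameter and the `createdTime` clause are illustrative (the latter follows the summary's note about including recently created files), and the real `generate_time_range_filter()` signature may differ. `modifiedTime` and `createdTime` are Drive API query terms; timestamps without an offset are treated as UTC.

```python
from datetime import datetime, timezone


def _to_drive_time(epoch_seconds):
    """Format epoch seconds as the timestamp form used in Drive queries."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def generate_time_range_filter(start=None, end=None, buffer_seconds=0):
    """Build a Drive `q` fragment whose lower bound is shifted back by a buffer."""
    clauses = []
    if start is not None:
        # Move the cutoff backward so files near the boundary are re-checked
        # instead of being silently skipped.
        buffered = _to_drive_time(max(start - buffer_seconds, 0))
        clauses.append(
            f"(modifiedTime > '{buffered}' or createdTime > '{buffered}')"
        )
    if end is not None:
        clauses.append(f"modifiedTime <= '{_to_drive_time(end)}'")
    return " and ".join(clauses)
```

The overlap means some files are fetched twice, so this approach relies on the indexing side being idempotent for documents it has already seen.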

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


## Summary

* **New Features**
* Added a configurable time buffer for Google Drive synchronization to
address timing delays and improve sync reliability.
* Improved file detection logic to include recently created files
alongside modified ones, reducing missed synchronizations.
2026-04-10 11:39:19 +08:00
NeedmeFordev
7315d25cbc Fix retrieval API handling for omitted dataset IDs (#13990)
### What problem does this PR solve?

This PR fixes a mismatch between the MCP retrieval contract and the
backend retrieval API.

`ragflow_retrieval` already describes `dataset_ids` as optional, but
`/api/v1/retrieval` still rejected omitted or empty `dataset_ids` with
`` `dataset_ids` is required. ``. That made MCP retrieval fail even
though the tool schema promised that the request could search across all
available datasets.

This change updates `/api/v1/retrieval` to accept missing or empty
`dataset_ids`, resolve all accessible datasets for the authenticated
user, and keep the route schema aligned with the new runtime behavior.
It also adds focused unit coverage for the fallback resolution path and
the no-accessible-datasets case.
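
The fallback path could be sketched as follows. `resolve_dataset_ids` and the `list_accessible_page` callback are hypothetical names standing in for whatever the route uses to page through the user's datasets, and the error type is likewise an assumption.

```python
def resolve_dataset_ids(requested_ids, list_accessible_page, page_size=100):
    """Return the requested IDs, or every accessible dataset ID if none given.

    `list_accessible_page(page, page_size)` is a hypothetical callback that
    returns one page of dataset IDs for the authenticated user.
    """
    if requested_ids:
        return list(requested_ids)

    resolved = []
    page = 1
    while True:
        batch = list_accessible_page(page, page_size)
        resolved.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is nothing left to fetch
        page += 1

    if not resolved:
        # Surface a clear error instead of running retrieval over nothing.
        raise LookupError("No accessible datasets found for retrieval.")
    return resolved
```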

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Fixes: #13981

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved dataset resolution to reliably discover all accessible
datasets through proper pagination, replacing the previous parsing
method.
* Enhanced error handling with clearer messaging when no datasets are
available for retrieval.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-10 11:34:15 +08:00
Magicbook1108
27329b40ed Refact: refact on parser structure (#14012)
### What problem does this PR solve?

Refact: refact on parser structure

### Type of change

- [x] Refactoring
2026-04-10 10:03:44 +08:00
Jin Hai
cd04467b9b Go: add delete search (#14014)
### What problem does this PR solve?

As title.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-10 09:42:37 +08:00
balibabu
56810ec5a3 Fix: The knowledge base selected by the retrieval node is not displayed. (#14013)
### What problem does this PR solve?

Fix: The knowledge base selected by the retrieval node is not displayed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-10 09:40:35 +08:00
Magicbook1108
52f5880d21 Fix: support vlm fall back in pipeline (#14007)
### What problem does this PR solve?

Fix: support VLM fallback in the pipeline for image/table parsing

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-09 20:20:11 +08:00
Jin Hai
5951e2b564 Go: Add create search (#13998)
### What problem does this PR solve?

As title

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-09 20:04:06 +08:00
Yongteng Lei
b33d2fdea5 Refa: GraphRAG to use async chat methods instead of thread pool execution (#14002)
### What problem does this PR solve?

Switch GraphRAG to the `_async_chat` methods instead of thread pool
execution.

### Type of change

- [x] Refactoring
- [x] Performance Improvement


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Unified chat calls to an async invocation across extractors, improving
timeout handling and ensuring task IDs propagate reliably.
* **Tests**
* Added and expanded unit tests and mocks to cover extractor behavior,
timeout scenarios, and safe test-package imports, reducing regression
risk.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 19:57:35 +08:00
Octopus
c2ce49e037 fix: strip single quotes from synonym terms to prevent Infinity TokenError (#13969)
Fixes #13823

## Problem

When querying with words like `cat`, RAGFlow's query expansion system
looks up synonyms via WordNet, which can return terms containing single
quotes (e.g., `cat-o'-nine-tails`). When using Infinity as the document
store, these unescaped single quotes in the query string cause a
`TokenError` because Infinity's lexer treats `'` as a string delimiter.

```
TokenError: Error tokenizing ' OR "big cat" OR "computerized tomography")^0.7)': Missing ' from 1:531
```

## Solution

Strip single quotes from synonym terms before they are inserted into
query expressions, consistent with how single quotes are already
stripped from the input query text (line 51 of `query.py`):

- **`common/query_base.py`**: In `sub_special_char()`, strip `'` before
escaping other special characters. This fixes the Chinese text
processing path and the `paragraph()` method.
- **`rag/nlp/query.py`**: In the English text path, strip `'` from
tokenized synonym terms.
- **`memory/services/query.py`**: Same fix for the memory query English
text path.
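
As a sketch of the idea only: the helper names and the clause format below are hypothetical, loosely modeled on the expression visible in the error message, and are not the code in the three files listed above.

```python
def strip_single_quotes(term):
    """Remove single quotes that Infinity's lexer would read as delimiters."""
    return term.replace("'", "")


def synonym_clause(synonyms, boost=0.7):
    """Build a boosted OR-clause from sanitized synonym terms."""
    cleaned = [strip_single_quotes(s).strip() for s in synonyms]
    quoted = [f'"{s}"' for s in cleaned if s]
    if not quoted:
        return ""
    return "(" + " OR ".join(quoted) + f")^{boost}"
```

Stripping the character, rather than escaping it, mirrors how the input query text is already handled, so both sides of the match are normalized the same way.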

## Testing

The fix can be verified by:
1. Using Infinity as the document store (`DOC_ENGINE=infinity`)
2. Creating a dataset and running a retrieval test with the keyword
`cat`
3. Confirming no `TokenError` is raised and results are returned
normally

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Enhanced special character handling in query processing and synonym
expansion by properly sanitizing single quotes before text processing.
* Simplified OCR detection output by removing timing metadata while
preserving core detection accuracy.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ximi <octo-patch@github.com>
2026-04-09 19:10:34 +08:00
Jin Hai
e2b879b258 Fix tiny issues (#14006)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Improved authentication error logging to better distinguish between
JWT and API token failures.
* Enhanced code documentation with clarifying comments for better
maintainability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-09 19:01:36 +08:00
balibabu
3c5a3e5fb4 Feat: Integrate the name, avatar, and description of chat and search into a single component. (#14008)
### What problem does this PR solve?

Feat: Integrate the name, avatar, and description of chat and search
into a single component.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Inline-editable avatar, name, and description fields
  * Expandable content blocks in search results
  * New RAGFlow heading/logo component

* **Refactor**
* Replaced scattered form fields with a composed Avatar/Name/Description
component
  * Mindmap drawer converted to a sheet-based drawer and layout cleanup
* Simplified search page controls and layout; improved scroll viewport
handling

* **Chores**
* Added/updated English and Chinese localization keys (placeholders,
view more/less)
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-09 18:51:45 +08:00
eviaaaaa
1e83c8c051 Fix: align MCP tool call timeout and handle empty content (#13899)
### What problem does this PR solve?

Resolves #12105

This PR fixes two MCP tool call issues in
`common/mcp_tool_call_conn.py`.

First, the timeout passed to `tool_call(..., timeout=...)` was only
applied to the outer `future.result(...)` wait, but was not forwarded to
the internal MCP request. As a result, callers could pass a longer
timeout while the actual MCP request still failed after the default
internal timeout.

Second, the MCP tool call result handling assumed `result.content[0]`
always existed. If an MCP server returned an empty content list, this
could raise an exception unexpectedly.

This PR fixes both issues by:
- forwarding the external `timeout` value to the internal MCP request
timeout
- returning a clear message when the MCP server returns empty content
instead of indexing into an empty list
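
Both fixes can be sketched in a few lines. This is an illustration under assumptions, not the code in `common/mcp_tool_call_conn.py`: `call_tool`, `request_fn`, and the message text are hypothetical, and `request_fn(arguments, timeout)` stands in for the internal MCP request.

```python
from concurrent.futures import ThreadPoolExecutor

EMPTY_CONTENT_MESSAGE = "MCP server returned empty content."


def call_tool(request_fn, arguments, timeout=10.0):
    """Run a tool request with one timeout applied to both layers."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Forward the caller's timeout to the inner request as well, so the
        # inner default cannot expire before the outer wait does.
        future = pool.submit(request_fn, arguments, timeout)
        result = future.result(timeout=timeout)

    content = getattr(result, "content", None) or []
    if not content:
        # Avoid indexing into an empty list.
        return EMPTY_CONTENT_MESSAGE
    first = content[0]
    return getattr(first, "text", str(first))
```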

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe)
2026-04-09 18:44:04 +08:00
Zhichang Yu
b7744e053e fix: support dense_vector from ES fields response (ES 9.x compatibility) (#13972)
fix: support dense_vector from ES fields response (ES 9.x compatibility)

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Configuration Chore (non-breaking change which updates
configuration)


## Summary by CodeRabbit

* **Bug Fixes**
* More accurate handling and unwrapping of dense-vector fields so
returned values have correct shapes.
* Field selection reliably limits returned data and falls back to
alternate result locations when needed.
* Use of consistent result IDs and tolerant handling when score values
are missing.

* **Chores / Configuration**
* Increased build memory and adjusted build-time flags for the frontend
build.
* Simplified runtime model/GPU checks and removed an automated runtime
GPU-install attempt.

* **Build Fixes**
* `web/vite.config.ts`: make `build.minify` and `build.sourcemap`
respect `VITE_MINIFY` and `VITE_BUILD_SOURCEMAP` env vars from
Dockerfile instead of hardcoding `terser` and `true`.

* **Environment**
* Allow stack version override and default the runtime image tag to
"latest".

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Correct unwrapping of dense-vector fields and reliable field selection
with fallback locations.
* Consistent use of hit-level IDs and tolerant handling when score
values are missing.

* **Chores / Configuration**
* Increased frontend build memory and added build-time minify/sourcemap
flags; build minification and sourcemap now configurable.
* Removed runtime GPU detection for model initialization; force CPU
initialization.

* **Environment**
* Allow stack version override and default runtime image tag to
"latest".

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 17:44:13 +08:00
Magicbook1108
107fe6cf90 Feat: support doc for pipeline parser in word (#14005)
### What problem does this PR solve?

Feat: support `.doc` files in the pipeline Word parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added support for processing legacy Word `.doc` file formats,
extending document compatibility.

* **Bug Fixes**
* Enhanced error handling during document parsing to improve reliability
and prevent processing failures.
2026-04-09 16:40:42 +08:00
Magicbook1108
8d52ef2893 Feat: enable sync deleted files for connector (#14000)
### What problem does this PR solve?

Feat: enable syncing of deleted files for connectors
1. GitHub is the first connector to support it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added "sync deleted files" feature for data sources, enabling
automatic removal of files deleted from the source system.
* Added multilingual support for the new sync deleted files setting
across multiple languages.

* **UI Improvements**
  * Improved checkbox form field rendering and layout.
  * Enhanced full-width display for authentication token input fields.
2026-04-09 16:40:14 +08:00
Jack
577c96bf2a Refactor: Merge document update API (#13962)
### What problem does this PR solve?

Refactor: merge document.rename into document.update_document

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a unified document update API (PUT) supporting name, metadata,
parser/chunk settings, and status changes.

* **Breaking Changes**
* Legacy single-parameter rename endpoint removed; renames now require
dataset + document identifiers.
  * `/list` now reads dataset id from a different query parameter.

* **Validation / Bug Fixes**
* Stricter meta_fields and parser-config validation; unauthenticated
requests return 401.

* **Frontend**
  * UI now sends dataset id when saving document names.

* **Tests**
* Numerous unit and HTTP tests adjusted or removed to match new API and
validations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
Co-authored-by: Qi Wang <wangq8@outlook.com>
Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
2026-04-09 11:17:38 +08:00
Ricardo-M-L
c13f8856a1 fix: correct typos in agent component filename and templates (#13930)
## Summary
- Rename misspelled file `varaiable_aggregator.py` →
`variable_aggregator.py`
- Fix `unkown` → `unknown` in template and frontend constant (3
instances)
- Fix `Finale` → `Final` in customer feedback template (2 instances)

## Test plan
- [ ] Verify variable aggregator component loads correctly
- [ ] Verify agent templates render properly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-09 11:06:01 +08:00
Lynn
dbfb439239 Feat: migrate script (#13976)
### What problem does this PR solve?

Add a stage that migrates tenant_llm data into the tenant_model_instance
and tenant_model tables.

### Type of change

- [x] Other (please describe): tool script


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added two new migration stages to move tenant model and instance
records into new target tables, with dry-run, full-execute, and "create
table only" modes; migration skips already-migrated rows to avoid
duplicates.
* **Bug Fixes**
  * Cleaned up migration header logging for clearer output.
* **Documentation**
* Added usage guide describing stages, options, modes, config format,
examples, and expected logs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 11:03:39 +08:00
Magicbook1108
c5871c1078 Fix: dsl import/export (#13992)
### What problem does this PR solve?

Fix: dsl import/export
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Enhanced JSON import functionality for agents to automatically
populate components from imported graph structures.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 10:55:22 +08:00
qinling0210
82fa85c837 Implement Delete in GO and refactor functions (#13974)
### What problem does this PR solve?

Implement Delete in Go and refactor functions

### Type of change

- [x] Refactoring

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a remove_chunks command to delete specific or all chunks from a
document.
  * Added new endpoints for chunk removal and chunk update.

* **Refactor**
* Renamed index commands to dataset/metadata table terminology and
updated REST routes accordingly.
* Updated chunk update flow to a JSON POST style and improved metadata
error messages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-04-09 09:52:31 +08:00
Jack
3b7723855c Fix: revert xgboost version to 1.6.0 (#13984)
### What problem does this PR solve?

Revert xgboost version to 1.6.0

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated xgboost dependency from version 3.2.0 to 1.6.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 19:53:47 +08:00
Jin Hai
5fe6f7c9ac Go CLI: Add list configs and set log level command (#13983)
### What problem does this PR solve?

1. list configs
2. set log level debug/info/warn/error/fatal/panic

```

RAGFlow(user)> list configs;
+--------------------+-----------------------+
| key                | value                 |
+--------------------+-----------------------+
| redis_host         | localhost:6379        |
| doc_engine         | elasticsearch         |
| elasticsearch_host | http://localhost:1200 |
| log_level          | info                  |
| database           | mysql                 |
| database_host      | localhost:3306        |
| admin              | 0.0.0.0:9383          |
| storage_engine     | minio                 |
| minio_host         | localhost:9000        |
+--------------------+-----------------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added `LIST CONFIGS` command to view system configuration details
(Redis, database, log level, storage engine, and host settings).
* Added `SET LOG LEVEL` command to adjust logging verbosity at runtime.

* **Improvements**
* Enhanced log level configuration defaults and runtime state
management.
* Reorganized token management and system endpoints under `/system/`
routes for better API organization.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 19:32:53 +08:00
balibabu
86900dca99 Refactor: Remove unused API code (#13978)
### What problem does this PR solve?

Refactor: Remove unused API code

### Type of change


- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Style**
* Updated table header styling in dataset settings by removing a
hard-coded background color class, allowing the header to use default or
inherited component styling instead.

* **Refactor**
* Removed token management endpoints from the API service. Token
creation, listing, and removal functions are no longer available.
  * Removed the statistics data endpoint from available API routes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 18:46:08 +08:00
balibabu
c0c3287af4 Fix: Error message: Use 'const' instead. (#13982)
### What problem does this PR solve?

Fix: Linter error message: Use 'const' instead.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Updated variable declarations across form components, agent utilities,
memory management hooks, and data handling functions to enhance code
consistency and maintainability throughout the application codebase.

* **Style**
* Added ESLint suppressions to document intentional constant-condition
patterns in asynchronous event streaming operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 18:13:14 +08:00
Yongteng Lei
3064895bbb Fix: import error in sandbox provider (#13971)
### What problem does this PR solve?

Fix import error in sandbox provider.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated internal configuration import mechanism for sandbox provider
initialization. No end-user impact.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 15:35:30 +08:00
Jin Hai
fa75aee3b9 Refactor system API (#13958)
### What problem does this PR solve?

- ping
- token
- log level

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* System endpoints consolidated under /api/v1/system: ping, health
check, and token management moved to the centralized API surface.
* Token management unified at /api/v1/system/tokens with
list/create/delete behavior.

* **Documentation**
  * API reference updated to reflect the new /api/v1/system paths.

* **Tests**
* Client fixtures and test utilities updated to use
/api/v1/system/tokens; one unit test for health/oceanbase status
removed.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 15:26:18 +08:00
Jin Hai
ad789f5c43 Fix list files (#13960)
### What problem does this PR solve?

As title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Standardized the query parameter used when listing documents so
listings behave consistently across the web and client interfaces.
* Clarified the error message shown when a required dataset ID is
missing to give clearer guidance to users.

* **Tests**
* Updated test coverage to reflect the standardized dataset identifier
usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 13:38:30 +08:00
balibabu
b8764cfa11 Fix: The document management table cannot be displayed. (#13967)
### What problem does this PR solve?

Fix: The document management table cannot be displayed.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved table layout and overflow behavior in the files view to
ensure proper scrolling and display.

* **Chores**
* Removed unused system status functionality and cleaned up service
methods.
  * Updated TypeScript configuration for compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:37:27 +08:00
dataCenter430
62a1333cf2 Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940)
…
### What problem does this PR solve?

Closes #13857

Parent-child chunking was introduced in v0.23.0 but is only configurable
through the web UI. Users managing datasets programmatically cannot
enable it via the HTTP API or Python SDK because `ParserConfig` uses
`extra="forbid"`, rejecting the `children_delimiter` field at
validation.

### What does this PR change?

Adds a `parent_child` nested config to `ParserConfig`, following the
same pattern as `raptor` and `graphrag`:

```json
"parser_config": {
  "parent_child": {
    "use_parent_child": true,
    "children_delimiter": "\n"
  }
}
```

- api/utils/validation_utils.py — new ParentChildConfig model, added to
ParserConfig
- api/utils/api_utils.py — naive defaults + flatten to
children_delimiter for the execution layer
- api/apps/services/dataset_api_service.py — flatten on the update path
- test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG
- test/testcases/test_http_api/test_dataset_management/test_create_dataset.py
— 4 valid + 2 invalid test cases

No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py).
Existing UI flow via ext is unaffected.
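
As a rough illustration of the flattening step described above, the sketch below converts the nested `parent_child` block into the flat `children_delimiter` key. The function name `flatten_parent_child`, the `"\n"` fallback, and the choice to omit the key when the toggle is off are assumptions made for illustration; the PR's actual helper in `api/utils/api_utils.py` may behave differently.

```python
def flatten_parent_child(parser_config: dict) -> dict:
    """Return a copy of parser_config with the nested parent_child block flattened.

    Hypothetical sketch: the real helper and its edge-case behavior may differ.
    """
    # Copy every key except the nested block so the input is not mutated.
    flat = {k: v for k, v in parser_config.items() if k != "parent_child"}
    nested = parser_config.get("parent_child") or {}
    # Assumption: the flat key is emitted only when the toggle is enabled.
    if nested.get("use_parent_child"):
        flat["children_delimiter"] = nested.get("children_delimiter", "\n")
    return flat
```

Keeping the nested form at the API boundary and the flat form internally would let the execution layer stay unchanged, which matches the PR's stated scope.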

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added parent-child chunking configuration for dataset creation and
updates with new `use_parent_child` toggle and customizable
`children_delimiter` setting to specify how parent chunks are split into
child chunks.

* **Documentation**
* Updated HTTP and Python API references with parent-child chunking
configuration details and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:36:57 +08:00
Qi Wang
0ced071a0b Use uv run python3 x.py instead of uv run x.py (#13966)
### Use uv run python3 x.py instead of uv run x.py

Calling `uv run x.py` directly uses the Python interpreter named in the
script's shebang, which fails if that default Python lacks some required
packages. Use `uv run python3 x.py` instead, which is the recommended
practice.

### Type of change

- [x] Documentation Update

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Documentation**
* Updated development setup instructions across all README files
(English and multiple language translations) to use explicit Python
interpreter invocation for the dependency download command.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 10:33:46 +08:00
MkDev11
cfee2bc9db feat: Auto-adjust chunk recall weights based on user feedback (#12689)
### What problem does this PR solve?

Implements automatic adjustment of knowledge base chunk recall weights
based on user feedback (upvotes/downvotes). When users upvote or
downvote a response, the system locates the corresponding knowledge
snippets and adjusts their recall weight to improve future retrieval
quality.

**Closes #12670**

**How it works:**
1. User upvotes/downvotes a response via `POST /thumbup`
2. System extracts chunk IDs from the conversation reference
3. For each referenced chunk:
   - Reads current `pagerank_fea` value from document store
   - Increments (+1) for upvote or decrements (-1) for downvote
   - Clamps weight to [0, 100] range
   - Updates chunk in ES/Infinity/OceanBase
4. Future retrievals score these chunks higher/lower based on
accumulated feedback
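
The steps above can be sketched as follows. This is a minimal in-memory illustration, not the code in `chunk_feedback_service.py`: the function names are hypothetical, a plain dict stands in for the document store, and treating a chunk with no stored weight as 0 is an assumption.

```python
def adjust_weight(current: int, upvote: bool, lo: int = 0, hi: int = 100) -> int:
    """Return the new recall weight after one vote, clamped to [lo, hi]."""
    delta = 1 if upvote else -1
    return max(lo, min(hi, current + delta))


def apply_feedback(weights: dict[str, int], chunk_ids: list[str], upvote: bool) -> dict[str, int]:
    """Apply one vote to every referenced chunk and return the updated mapping."""
    updated = dict(weights)  # leave the caller's mapping untouched
    for chunk_id in chunk_ids:
        # Assumption: a chunk without a stored weight starts at 0.
        updated[chunk_id] = adjust_weight(updated.get(chunk_id, 0), upvote)
    return updated
```

Clamping keeps repeated votes from pushing a weight outside the `[0, 100]` range, so a heavily downvoted chunk stops at 0 rather than going negative.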

**Files changed:**
- `api/db/services/chunk_feedback_service.py` - New service for updating
chunk pagerank weights
- `api/apps/conversation_app.py` - Integrated feedback service into
thumbup endpoint
- `test/testcases/test_web_api/test_chunk_feedback/` - Unit tests

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Chat message feedback now updates per-chunk relevance weights
(feature-flag gated), with configurable weighting and atomic updates
across storage backends.

* **Bug Fixes**
* Stricter validation for message feedback inputs and more robust
handling of feedback transitions.

* **Tests**
* Expanded test coverage for chunk-feedback behavior, weighting
strategies, storage backends, and thumb-flip scenarios.

* **Chores**
  * CI workflow extended to run the new chunk-feedback web API tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
2026-04-08 09:52:18 +08:00
Jin Hai
4a2a17c27a Fix typos (#13961)
### What problem does this PR solve?

as title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Internal code quality improvements with no user-facing changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 23:16:52 +08:00
Jin Hai
931021875a Refactor system/version API to RESTful style (#13956)
### What problem does this PR solve?

Refactor the version API to RESTful style. The Python and Go server APIs
are also updated.
### Type of change

- [x] Refactoring



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **Refactor**
* Migrated core API endpoints to the `/api/v1/` namespace for improved
consistency and organization.
* Standardized system version, search, and chat list endpoints under the
new API versioning structure.

* **New Features**
* Added MinIO region configuration support, allowing specification of
storage engine regional settings via environment variables or
configuration files.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 19:07:47 +08:00
Yang_Ming
bc8d67ce78 feat: add region parameter support to MinIO connection (#13954)
## Summary
- Add optional `region` parameter to `Minio()` client constructor in
`rag/utils/minio_conn.py`
- Reads from `MINIO.region` in settings, defaults to `None` when not
configured
- Required by some S3-compatible storage services (e.g., AWS S3, Tencent
COS) for proper bucket access

## Motivation
When using RAGFlow with S3-compatible storage that requires a region
(such as AWS S3 or Tencent Cloud COS), the MinIO client fails to access
buckets because the `region` parameter is not passed through.

The `Minio()` Python client already supports the `region` parameter
natively — this PR simply wires it up from the RAGFlow configuration.

## Changes
- `rag/utils/minio_conn.py`: Pass `region=settings.MINIO.get("region",
None) or None` to `Minio()` constructor

## Backward Compatibility
- No breaking changes. When `region` is not configured, it defaults to
`None`, preserving the existing behavior exactly.
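
To show why the expression ends in `or None`, here is a small sketch of the lookup in isolation. The helper name `resolve_region` and the plain-dict settings are illustrative assumptions; only the `.get("region", None) or None` expression comes from the PR description.

```python
from typing import Optional


def resolve_region(minio_settings: dict) -> Optional[str]:
    """Return the configured region, or None when it is missing or empty."""
    # `or None` normalizes falsy values such as "" to None, so an empty
    # config entry behaves the same as an absent one.
    return minio_settings.get("region", None) or None
```

With this normalization, a blank `region:` entry in a config file would not be passed to the client as an empty string.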

## Test Plan
- [ ] Verified with MinIO (no region set) — works as before
- [x] Verified with S3-compatible storage requiring region — bucket
access succeeds

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Enhanced MinIO client initialization with regional configuration
support for improved compatibility with region-specific deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Jarry Wang <code-better-life@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 16:38:23 +08:00
Jin Hai
68f665be7a CLI: Add float parsing (#13955)
### What problem does this PR solve?

Add float parsing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 15:09:45 +08:00
Jin Hai
393efa9b7c Refactor variable of front end (#13953)
### What problem does this PR solve?

api_host -> webAPI
ExternalApi -> restAPIv1

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Updated internal API endpoint configuration to use consolidated base
URL constants for improved maintainability and consistency across the
application.

* **Chores**
* Updated server-side protocol validation for admin connectivity checks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 15:08:11 +08:00
balibabu
38acf34724 Fix: The agent selected a knowledge base, but the API returned the error: "No dataset is selected". (#13950)
### What problem does this PR solve?

Fix: The agent selected a knowledge base, but the API returned the
error: "No dataset is selected".

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-07 14:16:37 +08:00
auyua9
fa08fa2a17 docs: fix broken internal links in guides (#13935)
### What problem does this PR solve?

This fixes two broken internal documentation links in the guides:

- `docs/develop/mcp/launch_mcp_server.md` linked
`./acquire_ragflow_api_key.md`, but the target page lives one level up
as `../acquire_ragflow_api_key.md`.
- `docs/guides/dataset/run_retrieval_test.md` linked
`./construct_knowledge_graph.md`, but the actual page lives under
`./advanced/construct_knowledge_graph.md`.

These broken links make it harder to follow the MCP and retrieval-test
docs from the local docs tree.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-04-07 14:01:12 +08:00
Jin Hai
9ac5d28f06 Refactor context command (#13952)
### What problem does this PR solve?

Refactor context search command

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 13:59:27 +08:00
Ricardo-M-L
424aee5bec fix: correct typos in code comments, docstrings and docs (#13931)
## Summary
- Fix `a image` → `an image` in README and log message
- Fix `colomn` → `column` in table structure recognizer comment
- Fix `formated` → `formatted` in confluence connector docstring
- Fix `tabel of content` → `table of contents` in TOC prompt

## Test plan
- [ ] Documentation and comment changes, no functional impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 13:05:39 +08:00
Ricardo-M-L
29cf8aba48 fix: correct typos in locale files and search hooks (#13932)
## Summary
- Fix `Refrence` → `Reference` in zh, id, zh-traditional locale files
(en.ts already correct)
- Fix `from from` → `from` and `this files` → `this file` in en.ts
- Fix variable name `reponse` → `response` in search hooks

## Test plan
- [ ] Verify UI strings display correctly
- [ ] Verify search functionality works with renamed variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 12:26:25 +08:00
Yongteng Lei
112007243d Refa: refine code_exec component (#13925)
### What problem does this PR solve?

Refine code_exec component.

### Type of change

- [x] Refactoring
2026-04-07 11:48:29 +08:00
Jack
c4b0aaa874 Fix: #6098 - Add validation logic for parser_config when update document (#13911)
### What problem does this PR solve?

Add validation logic for parser_config.
Refactor the processing flow. Before this change, validation and update
logic were interleaved: some validation ran, then some updates, then
another round of validation followed by more updates. After this change,
all validation runs first, and update logic executes only after ALL
validation has passed.
Parameters that come from the front end are validated with Pydantic.
Validation that depends on data from the DB lives in separate methods.
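
The validate-first flow described above might look roughly like this. Everything here is a hypothetical sketch: plain functions stand in for the Pydantic model, and the `run == "RUNNING"` rule is an invented example of a check that depends on stored data.

```python
def validate_payload(payload: dict) -> list[str]:
    """Checks that need only the request itself (Pydantic's role in the PR)."""
    errors = []
    if not isinstance(payload.get("parser_config", {}), dict):
        errors.append("parser_config must be an object")
    return errors


def validate_against_stored(stored: dict, payload: dict) -> list[str]:
    """Checks that need the stored document (hypothetical rule)."""
    errors = []
    if stored.get("run") == "RUNNING" and "parser_config" in payload:
        errors.append("cannot change parser_config while the document is parsing")
    return errors


def update_document(stored: dict, payload: dict) -> dict:
    """Run every validation first; apply the update only if all of them pass."""
    errors = validate_payload(payload) + validate_against_stored(stored, payload)
    if errors:
        raise ValueError("; ".join(errors))
    return {**stored, **payload}
```

Because nothing is written until every check has passed, a failed request cannot leave the document half-updated.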

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-04-07 11:33:05 +08:00
Jin Hai
5673245134 Refactor context command (#13948)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 11:30:09 +08:00
Idriss Sbaaoui
ff27ce86d6 fix: gpt-5 name-based config clearing from base chat path (#13949)
### What problem does this PR solve?

Fix #13944, where OpenAI-compatible custom endpoints failed verification
when model names contained `gpt-5` because of incorrect name-based
handling in the Base/backend=`base` path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-07 11:24:47 +08:00
buildearth
a0be7c7ca7 Fix(connector): expose id_column, timestamp_column, metadata_columns for MySQL/PostgreSQL incremental sync (#13849)
### What problem does this PR solve?
The MySQL and PostgreSQL sync classes in `sync_data_source.py` were not
passing `id_column`, `timestamp_column`, and `metadata_columns` to
`RDBMSConnector`, making incremental sync and document update impossible
even when configured.

- Without `id_column`: updated records generate new documents instead of
overwriting existing ones (doc ID is derived from content hash, so any
change produces a new ID).
- Without `timestamp_column`: `poll_source` always falls back to full
sync, ignoring the configured time range.
- The three fields existed in the frontend default values but had no
form inputs, so users had no way to fill them in.
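
The first point can be illustrated with a small sketch of how a document ID might be derived. The function `document_id` and its hashing scheme are assumptions for illustration only; the actual ID logic in `RDBMSConnector` is not shown in this PR description.

```python
import hashlib
from typing import Optional


def document_id(source: str, row: dict, content: str, id_column: Optional[str] = None) -> str:
    """Derive a document ID from a stable key when available, else from content.

    Hypothetical scheme: the real connector may build IDs differently.
    """
    if id_column and row.get(id_column) is not None:
        key = f"{source}:{row[id_column]}"  # stable across content edits
    else:
        key = f"{source}:{content}"  # changes whenever the content changes
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

With an ID column, an edited row maps to the same ID and can overwrite the existing document; without one, every edit yields a new ID and therefore a new document.
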
### Type of change
  - [x] Bug Fix (non-breaking change which fixes an issue)        
  - [x] New Feature (non-breaking change which adds functionality)

### Changes
   
- **Backend** (`rag/svr/sync_data_source.py`): pass `id_column`,
`timestamp_column`, and `metadata_columns` from `self.conf` to
`RDBMSConnector` for both `MySQL` and `PostgreSQL` sync classes.
- **Frontend**
(`web/src/pages/user-setting/data-source/constant/index.tsx`): add
`ID Column`, `Timestamp Column`, and `Metadata Columns` form fields to
the MySQL and PostgreSQL data source configuration UI, with tooltips.

Signed-off-by: lixintao <lixintao@uniontech.com>
Co-authored-by: lixintao <lixintao@uniontech.com>
2026-04-07 10:24:30 +08:00
qinling0210
49386bc1b5 Implement UpdateDataset and UpdateMetadata in GO (#13928)
### What problem does this PR solve?

Implement UpdateDataset and UpdateMetadata in Go.

Add CLI commands:

- `UPDATE CHUNK <chunk_id> OF DATASET <dataset_name> SET <update_fields>`
- `REMOVE TAGS 'tag1', 'tag2' from DATASET 'dataset_name';`
- `SET METADATA OF DOCUMENT <doc_id> TO <meta>`


### Type of change

- [ ] Refactoring
2026-04-07 09:44:51 +08:00
Lynn
60ec5880e5 Feat: mysql data migrate script (#13927)
### What problem does this PR solve?

Add a script to migrate data in tenant_llm into tenant_model_provider.

### Type of change

- [x] Other (please describe): tool script.
2026-04-03 20:01:37 +08:00
Magicbook1108
69264b3a70 Feat: Refact pipeline (#13826)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 19:26:45 +08:00
Jin Hai
6d9430a125 Add think chat to CLI (#13922)
### What problem does this PR solve?

Users can now use 'think mode' to chat with the LLM.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-03 18:11:23 +08:00
Yingfeng
e518c20736 Update README (#13924)
### Type of change

- [x] Documentation Update
2026-04-03 17:29:48 +08:00
akie
35b2a714f9 Fix: tag datasets not visible in tag sets dropdown (#13921)
## Problem Description

When a user creates Dataset A using the **Tag parser** (for CSV/Excel
files with tag definitions), and then creates Dataset B, the Tag Sets
dropdown in Dataset B's Configuration page cannot display Dataset A.

### Steps to Reproduce
1. Create Dataset A with **Tag** as the chunking method
2. Upload a CSV file to Dataset A to generate tags
3. Create Dataset B
4. Navigate to Dataset B → Configuration → Tag Sets
5. **Expected**: Dataset A should appear in the dropdown
6. **Actual**: The dropdown is empty, Dataset A is not visible

---

## Root Cause Analysis

After thorough code review, **the original code logic is correct**. The
`chunk_method` field flows properly through the system:

### Data Flow

```mermaid
sequenceDiagram
    participant Frontend
    participant Pydantic
    participant API
    participant Database

    Note over Frontend,Database: Creating a Tag Dataset
    Frontend->>Pydantic: POST {chunk_method: "tag"}
    Pydantic->>API: serialization_alias converts<br/>chunk_method → parser_id
    API->>Database: INSERT {parser_id: "tag"}

    Note over Frontend,Database: Querying Datasets
    Frontend->>API: GET /api/v1/datasets
    API->>Database: SELECT parser_id, ...
    Database-->>API: Returns {parser_id: "tag"}
    API->>API: remap_dictionary_keys()<br/>parser_id → chunk_method
    API-->>Frontend: {chunk_method: "tag"}

    Note over Frontend: Filter: x.chunk_method === 'tag'
    Note over Frontend:  Match found!
```

### Field Mapping

**Location**: `api/utils/api_utils.py:657-662`
```python
DEFAULT_KEY_MAP = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "chunk_method",  # Maps DB field to API response
    "embd_id": "embedding_model",
}
```
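
A key map like the one above could be applied with a one-line dictionary comprehension. This is only a sketch: `remap_keys` is a hypothetical stand-in, and the real `remap_dictionary_keys()` may have a different signature or handle nested values.

```python
# Copied from the mapping shown above so the example is self-contained.
DEFAULT_KEY_MAP = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "chunk_method",
    "embd_id": "embedding_model",
}


def remap_keys(record: dict, key_map: dict = DEFAULT_KEY_MAP) -> dict:
    """Rename keys found in key_map; keys not in the map pass through unchanged."""
    return {key_map.get(k, k): v for k, v in record.items()}
```

Applied to a database row containing `parser_id: "tag"`, this yields the `chunk_method: "tag"` field that the frontend filter expects.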

### Frontend Filtering (Already Correct)

**Location**:
`web/src/pages/dataset/dataset-setting/components/tag-item.tsx:24`
```typescript
const knowledgeOptions = knowledgeList
  .filter((x) => x.chunk_method === 'tag')  //  Correct field
  .map((x) => ({...}));
```

---

## Actual Issue

The most likely causes for the "bug" are:

1. **Browser Cache**: Old data cached before proper deployment
2. **Stale Data**: Datasets created before the code was fully deployed
3. **Container Not Restarted**: Changes not applied to running container

---

## Resolution

**No code changes are needed.** The existing code correctly:

1. Accepts `chunk_method` from frontend
2. Converts to `parser_id` via Pydantic serialization_alias
3. Stores in database as `parser_id`
4. Maps back to `chunk_method` in API response
5. Frontend filters by `chunk_method === 'tag'`
2026-04-03 17:29:10 +08:00
LeonTung
0b724be521 chore(templates): Update the customer feedback dispatcher template (#13919)
### What problem does this PR solve?
Update the customer feedback dispatcher template and introduce a new
operator `Variable Aggregator`.

### Type of change

- [x] Other (please describe): Template change

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-03 16:51:39 +08:00
balibabu
5b43c7cf16 Feat: Place the language configuration in web/.env for easy user configuration. (#13920)
### What problem does this PR solve?

Feat: Place the language configuration in web/.env for easy user
configuration.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-03 16:50:18 +08:00
Ricardo-M-L
354108922b fix: use f-string with separator in switch operator error message (#13915)
`switch.py` line 137 concatenates the operator directly after the text
without separator:
`'Not supported operator' + operator` → produces `"Not supported
operatorXXX"`

Changed to: `f'Not supported operator: {operator}'`
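
The difference between the two forms can be checked in isolation. The two helper functions below are illustrative wrappers, not code from `switch.py`.

```python
def message_before(operator: str) -> str:
    """Old form: plain concatenation, no separator between text and operator."""
    return 'Not supported operator' + operator


def message_after(operator: str) -> str:
    """New form: f-string with a colon and a space before the operator."""
    return f'Not supported operator: {operator}'
```

For an operator named `XXX`, the old form runs the words together while the new form keeps the operator readable in logs.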
2026-04-03 16:49:28 +08:00
chanx
21af67f6f9 feat(File Management): Refactor File List API and Add Knowledge Base Document Initialization (#13914)
### What problem does this PR solve?

feat(File Management): Refactor File List API and Add Knowledge Base
Document Initialization

- Migrate the file list API endpoint from `/v1/file/list` to
`/api/v1/files` to align with the Python implementation.
- Add logic for initializing knowledge base documents; automatically
create the `.knowledgebase` folder and associated documents when
retrieving the root directory.
- Enhance parameter validation and error handling, including the
introduction of a new `CodeParamError` error code.
- Optimize the file list response structure to match the implementation
on the Python side.
- Update the Vite configuration to support proxying the new
`/api/v1/files` endpoint.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-03 15:08:43 +08:00
writinwaters
6263857c1e Agent templates regrouped and renamed (#13873)
### What problem does this PR solve?

Regrouped and renamed agent templates to increase user engagement.

### Type of change


- [x] Refactoring
2026-04-03 13:43:25 +08:00
Zhichang Yu
ab358fe949 feat: make Azure cloud authority configurable for SPN auth (#13898)
## Summary
- The Azure SPN storage handler hardcoded
`AzureAuthorityHosts.AZURE_CHINA`, preventing users in Azure Public
Cloud regions (UK-South, EU, US, etc.) from authenticating
- Add a `cloud` config option (env: `AZURE_CLOUD`) supporting all four
Azure sovereignties: `public`, `china`, `government`, `germany`
- Defaults to `public` (global Azure) — the most common international
use case
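
A possible shape for the lookup described above is sketched here. The function `resolve_authority` and the precedence logic are assumptions based on this description; the host strings are the standard Microsoft Entra login hosts for each sovereign cloud, written as plain strings so the example needs no Azure SDK.

```python
import os
from typing import Mapping, Optional

# Login hosts per sovereign cloud (the same values the Azure SDK exposes
# as authority-host constants).
AUTHORITY_HOSTS = {
    "public": "login.microsoftonline.com",
    "china": "login.chinacloudapi.cn",
    "government": "login.microsoftonline.us",
    "germany": "login.microsoftonline.de",
}


def resolve_authority(config: Mapping, env: Optional[Mapping] = None) -> str:
    """Pick the authority host: AZURE_CLOUD env var, then config, then 'public'."""
    env = os.environ if env is None else env
    cloud = (env.get("AZURE_CLOUD") or config.get("cloud") or "public").lower()
    if cloud not in AUTHORITY_HOSTS:
        raise ValueError(f"Unsupported Azure cloud: {cloud}")
    return AUTHORITY_HOSTS[cloud]
```

Rejecting unknown values explicitly is a design choice in this sketch; it surfaces a typo in the setting at startup instead of at the first failed authentication.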

Closes #13259

## Test plan
- [ ] Verify default (`cloud: public`) connects to Azure Public Cloud
endpoints
- [ ] Verify `cloud: china` retains existing behavior for Azure China
users
- [ ] Verify `AZURE_CLOUD` env var overrides the config file value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 12:51:26 +08:00
Zhichang Yu
384fa6fc6e Replace MinIO official image with pgsty/minio fork (#13896)
## Summary

- Replace `quay.io/minio/minio` with `pgsty/minio` community fork in
`docker/docker-compose-base.yml`

MinIO stopped distributing pre-built Docker images and changed its
license. The pgsty/minio fork provides drop-in compatible images under
AGPLv3.

Closes #13840

## Test plan

- [x] Verify `docker compose -f docker/docker-compose-base.yml up -d`
pulls the pgsty/minio image successfully
- [ ] Verify MinIO console accessible on port 9001
- [ ] Verify RAGFlow backend can connect to MinIO and perform file
operations normally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 22:03:02 +08:00
Yongteng Lei
b7daf6285b Refa: Chat conversations /convsersation API to RESTFul (#13893)
### What problem does this PR solve?

Refactor the chat conversations /conversation API to RESTful.

### Type of change

- [x] Refactoring
2026-04-02 20:49:23 +08:00
chanx
bbb9b1df85 feat: Implement file upload and folder creation features by GO (#13903)
### What problem does this PR solve?

feat: Implement file upload and folder creation features

- Add file upload route in router.go
- Add file operation methods in dao/file.go
- Add util/file.go for file type detection and filename handling
- Implement file upload and folder creation endpoints in handler/file.go
- Implement file upload and folder creation logic in service/file.go
- Modify response message format in memory.go
- Add document count method in dao/document.go

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-02 20:21:04 +08:00
Jin Hai
6c29128de1 Refactor model provider and command (#13887)
### What problem does this PR solve?

Introduce 5 new tables, including model groups and provider instance.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-02 20:20:35 +08:00
qinling0210
f02f5fa435 Get ROW_ID from search() in Infinity (#13901)
### What problem does this PR solve?

1. `search()` in Infinity can now return `row_id`.

2. To get `ROW_ID` from `search()`, refer to how `retrieval_test`
handles it.

Example:
```
$ curl -s -X POST "http://localhost:$PORT/v1/chunk/retrieval_test" -H "Authorization: $TOKEN" -H "Content-Type: application/json" -d '{"kb_id": "4fcd01582ca911f1954184ba59049aa3", "question": "曹操"}'
```


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-02 18:56:43 +08:00
Idriss Sbaaoui
ee1bb8a8b5 Fix: overlapping document parse race that can clear chunks (#13900)
### What problem does this PR solve?

This PR fixes a race in batch document parsing where overlapping parse
requests for the same document could clear/rewrite chunk state and make
previously parsed content appear lost. It adds an atomic per-document
parse guard so only one parse can run at a time for that document (Fixes
#13864 ).
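
The idea of a per-document parse guard can be sketched with an in-process lock. This is an assumption-laden illustration: the class `ParseGuard` is hypothetical, and the PR may instead rely on an atomic database or cache operation so that the guard also holds across processes.

```python
import threading


class ParseGuard:
    """Allow at most one running parse per document ID (single-process sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: set[str] = set()

    def try_acquire(self, doc_id: str) -> bool:
        """Atomically claim doc_id; return False if a parse is already running."""
        with self._lock:
            if doc_id in self._running:
                return False
            self._running.add(doc_id)
            return True

    def release(self, doc_id: str) -> None:
        """Mark the parse for doc_id as finished."""
        with self._lock:
            self._running.discard(doc_id)
```

A caller would skip or reject the request when `try_acquire` returns `False`, and call `release` in a `finally` block so a failed parse does not block the document forever.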

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-02 18:50:56 +08:00
writinwaters
3b96cedece Docs: Updated chat-specific APIs (#13888)
### What problem does this PR solve?

Chat-specific API descriptions updated.

### Type of change

- [x] Documentation Update
2026-04-02 14:15:09 +08:00
NeedmeFordev
6b7989b4b4 Add file type validation (#13802)
### What problem does this PR solve?

This PR fixes WebDAV sync behavior for unsupported file types
([#13795](https://github.com/infiniflow/ragflow/issues/13795)).

Previously, the WebDAV connector selected files primarily by modified
time (and size threshold) and could still pass unsupported extensions
into the download/document-generation path. This caused unnecessary
processing and inconsistent behavior compared with connectors that
validate file type earlier.

This change adds extension validation in two places:

1. **Early filter during recursive listing** to skip unsupported files
before they enter the download flow.
2. **Defensive filter before download/document creation** to prevent
unsupported files from being processed if any listing edge case slips
through.

It also wires `allow_images` into the WebDAV sync path so image
extension handling follows connector policy.

Scope is intentionally limited to WebDAV for a focused bug-fix PR.
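
The extension check described above might look like the following sketch. The extension sets and the function name `is_supported` are hypothetical; the connector's real allow-list is not given in this description.

```python
from pathlib import PurePosixPath

# Hypothetical allow-lists for illustration; the real connector's lists differ.
DOCUMENT_EXTENSIONS = {".pdf", ".txt", ".md"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_supported(path: str, allow_images: bool = False) -> bool:
    """Return True if the file extension may enter the download flow."""
    ext = PurePosixPath(path).suffix.lower()  # case-insensitive match
    if ext in DOCUMENT_EXTENSIONS:
        return True
    return allow_images and ext in IMAGE_EXTENSIONS
```

Calling the same predicate both while listing and again just before download gives the two layers of filtering the PR describes, at the cost of one cheap repeated check.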

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### How was this tested?

- Manual verification with mixed file types under the configured WebDAV
path:
  - supported: `.pdf`, `.txt`, `.md`
  - unsupported: `.exe`, `.bin`, `.dat`
- Triggered full sync and polling sync.
- Confirmed unsupported files are skipped before download.
- Confirmed supported files are still indexed normally.
- Confirmed image handling follows `allow_images` setting.

Fixes: #13795
2026-04-02 14:12:27 +08:00
Idriss Sbaaoui
dd529137eb Fix: markdown table double extraction in parser (#13892)
### What problem does this PR solve?

Fixes markdown tables being parsed twice (once as markdown and again as
generated HTML), which caused duplicate table chunks in the chunk list
UI.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-02 13:31:56 +08:00
Sank
1c2c4b337e [RU] Add schema synchronization and translate (#13891)
### What problem does this PR solve?

Add schema synchronization and translate

### Type of change

- [x] Translation into Russian
2026-04-02 11:18:27 +08:00
Ricardo-M-L
09a09a5b20 fix: correct typo in IterationItem name check and incomplete error message (#13890)
Two small fixes:

1. **iterationitem.py line 72**: Typo "interationitem" → "iterationitem"
(missing 't'). The component name check never matched IterationItem
components.

2. **raptor.py line 94**: Error message "Embedding error: " had a
trailing colon with no details. Changed to "Embedding error: empty
embeddings returned".
2026-04-02 10:35:28 +08:00
balibabu
af40be68c3 Fix: The dataset on the list page cannot be renamed. (#13886)
### What problem does this PR solve?

Fix: The dataset on the list page cannot be renamed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-01 20:53:05 +08:00
Yongteng Lei
b622c47ed6 Refa: Chats /chat API to RESTFul (#13881)
### What problem does this PR solve?

Refactor the Chats /chat API to RESTful.

### Type of change

- [x] Refactoring
2026-04-01 20:10:37 +08:00
qinling0210
bb4a06f759 Implement InsertDataset and InsertMetadata in GO (#13883)
### What problem does this PR solve?

Implement InsertDataset and InsertMetadata in Go.

New internal CLI commands for Go:

- `INSERT DATASET FROM FILE "file_name"`
- `INSERT METADATA FROM FILE "file_name"`

### Type of change

- [x] Refactoring
2026-04-01 16:16:25 +08:00
Liu An
b1d28b5898 Revert "Refa: Chats /chat API to RESTFul (#13871)" (#13877)
### What problem does this PR solve?

This reverts commit 1a608ac411.

### Type of change

- [x] Other (please describe):
2026-04-01 11:05:29 +08:00
Yongteng Lei
1a608ac411 Refa: Chats /chat API to RESTFul (#13871)
### What problem does this PR solve?

Chats /chat API to RESTFul.

### Type of change

- [x] Refactoring
2026-04-01 10:50:22 +08:00
balibabu
00b62dd587 Feat: If a model configured in the agent is deleted from the user center, a notification will be displayed on the canvas with a red border. (#13872)
### What problem does this PR solve?

Feat: If a model configured in the agent is deleted from the user
center, a notification will be displayed on the canvas with a red
border.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-31 18:43:24 +08:00
Jin Hai
efd6ecc3e5 New provider and models API and CLI (#13865)
### What problem does this PR solve?

As title.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-31 18:42:12 +08:00
Sank
68b4287892 Add translate [RU] for MinerU (#13832)
Add Russian translation for MinerU in knowledgeConfiguration.

### Type of change

- [X] Other (please describe):
2026-03-31 17:03:31 +08:00
balibabu
36513313f8 Fix: The agent form sheet will be obscured by the message log sheet. (#13870)
### What problem does this PR solve?

Fix: The agent form sheet will be obscured by the message log sheet.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-31 16:18:43 +08:00
Paul Y Hui
3e702c6265 fix: guard against missing/malformed Authorization header in apikey_required (#13860)
### What problem does this PR solve?

Previously, `apikey_required` called
`request.headers.get('Authorization').split()[1]` without checking for
None or insufficient parts, causing an unhandled AttributeError or
IndexError (500) instead of a proper 403 JSON response.

This applies the same guarding pattern already used by `token_required`
in the same file.
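
The guard described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `extract_api_key` is a hypothetical helper name, and where this sketch returns `None` the real decorator would presumably return the 403 JSON response.

```python
from typing import Optional


def extract_api_key(authorization: Optional[str]) -> Optional[str]:
    """Return the token part of an 'Authorization' header value.

    Hypothetical helper: returns None instead of raising when the header
    is missing or does not contain two whitespace-separated parts.
    """
    if not authorization:
        # Header absent or empty: calling .split() on None raises AttributeError.
        return None
    parts = authorization.split()
    if len(parts) < 2:
        # Only one part present: indexing parts[1] raises IndexError.
        return None
    return parts[1]
```

A caller can then map a `None` result to an authorization error rather than letting the exception surface as a 500.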

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-03-31 15:25:00 +08:00
balibabu
4f27090289 Fix: Unable to reconnect after deleting the connection between begin and parser #13868 (#13869)
### What problem does this PR solve?

Fix: Unable to reconnect after deleting the connection between begin and
parser #13868
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-31 14:44:47 +08:00
writinwaters
db5ab7bbe8 Docs: Image2text is supported by GPUStack. (#13856)
### What problem does this PR solve?

Image2text is supported by GPUStack. #9515

### Type of change

- [x] Documentation Update
2026-03-30 20:39:02 +08:00
balibabu
3a4f0d1a83 Fix: The chat settings are not displayed correctly on the first page load. (#13855)
### What problem does this PR solve?
Fix: The chat settings are not displayed correctly on the first page
load.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-30 20:16:52 +08:00
qinling0210
620fe215a4 Fix python metadata search (#13727)
### What problem does this PR solve?

Fix python metadata search

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-30 19:37:19 +08:00
qinling0210
0462c20113 Fix special characters in matching text of search() (#13852)
### What problem does this PR solve?

Fix special characters in the matching text of search(). Some special
characters (such as ?, * and :) should be escaped before being passed to
matching_text of search().
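
A minimal sketch of this kind of escaping, under stated assumptions: the PR names `?`, `*` and `:` only as examples, so the character set below is a guess, backslash escaping is assumed, and `escape_matching_text` is a hypothetical name.

```python
import re

# Assumed set of special characters; the PR lists ?, * and : only as examples.
_SPECIAL = re.compile(r"([?*:])")


def escape_matching_text(text: str) -> str:
    """Prefix each assumed-special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)
```

Escaping in one place, just before the text reaches the search call, keeps the rest of the query-building code unaware of the engine's query syntax.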

Fix https://github.com/infiniflow/ragflow/issues/13729

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-30 18:47:10 +08:00
Zhichang Yu
0d85a8e7aa feat: add dynamic log level adjustment APIs (#13850)
Add REST APIs to dynamically query and modify log levels at runtime for
both Python (Flask) and Go servers.

Changes:
- common/log_utils.py: add set_log_level() and get_log_levels()
functions
- admin/server/routes.py: add GET/PUT /api/v1/admin/log_levels endpoints
- api/apps/system_app.py: add GET/PUT /api/{version}/system/log_levels
endpoints
- internal/logger/logger.go: add GetLevel() and SetLevel() with atomic
level support
- internal/handler/system.go: add GetLogLevel, SetLogLevel, Health
handlers
- internal/router/router.go: route /health to systemHandler
- internal/admin/handler.go: add GetLogLevel, SetLogLevel handlers
- internal/admin/router.go: add /api/v1/admin/log_level routes
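
For the Python side, runtime level changes can be sketched with the standard `logging` module. The function names match those listed for `common/log_utils.py`, but the signatures and return shape here are assumptions, not the PR's actual code.

```python
import logging
from typing import Dict


def set_log_level(logger_name: str, level: str) -> None:
    """Set the level of a named logger at runtime (e.g. 'DEBUG', 'INFO')."""
    numeric = logging.getLevelName(level.upper())
    if not isinstance(numeric, int):
        # getLevelName returns a string such as 'Level LOUD' for unknown names.
        raise ValueError(f"unknown log level: {level!r}")
    logging.getLogger(logger_name).setLevel(numeric)


def get_log_levels() -> Dict[str, str]:
    """Return the effective level name of the root logger and all known loggers."""
    levels = {"root": logging.getLevelName(logging.getLogger().getEffectiveLevel())}
    # Copy the keys first so the registry is not mutated while iterating.
    for name in list(logging.root.manager.loggerDict):
        logger = logging.getLogger(name)
        levels[name] = logging.getLevelName(logger.getEffectiveLevel())
    return levels
```

A GET handler could return the dictionary from `get_log_levels()` as JSON, and a PUT handler could call `set_log_level()` and translate `ValueError` into a client error.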

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 18:40:58 +08:00
黄圣祺
534729546e fix(html-parser): correct h4 heading mapping from ##### to #### (#13833)
## Summary

- Fix incorrect Markdown heading mapping for `h4` in `TITLE_TAGS`
dictionary
- `h4` was mapped to `"#####"` (h5 level) instead of `"####"` (correct
h4 level)

Closes #13819

## Details

In `deepdoc/parser/html_parser.py`, the `TITLE_TAGS` dictionary had a
typo where `h4` was assigned 5 `#` characters instead of 4, causing h4
headings to be converted to h5-level Markdown headings during HTML
parsing.
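
One way to rule out this class of typo is to derive the prefix from the heading level instead of typing each string by hand. The sketch below is an alternative construction for illustration; the real `TITLE_TAGS` in `html_parser.py` may be written differently, and `heading_to_markdown` is a hypothetical helper.

```python
# Build the tag -> Markdown prefix table from the heading level, so the
# number of '#' characters cannot drift from the number in the tag name.
TITLE_TAGS = {f"h{level}": "#" * level for level in range(1, 7)}


def heading_to_markdown(tag: str, text: str) -> str:
    """Render an HTML heading tag name and its text as a Markdown heading."""
    return f"{TITLE_TAGS[tag]} {text}"
```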

## Test plan

- [ ] Parse an HTML document containing `<h4>` tags and verify the
output uses `####` (4 hashes)
- [ ] Verify other heading levels remain correct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 13:17:32 +08:00
Jin Hai
2faaa9f9ce Update docker container start printout (#13847)
### What problem does this PR solve?

Print out the RAGFlow version when the Docker container starts.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-30 12:59:42 +08:00
Jin Hai
e20cf39735 Refactor Go server model provider reading and access (#13831)
### What problem does this PR solve?

1. Refactor model provider json file format
2. Use memory data structure to replace database
3. Add CLI command to access

```
RAGFlow(user)> list pool models from 'xai';
+-------------------------------------------------------------------------------------+------------+-------------+-----------------------+
| features                                                                            | max_tokens | model_types | name                  |
+-------------------------------------------------------------------------------------+------------+-------------+-----------------------+
| map[]                                                                               | 256000     | [llm]       | grok-4                |
| map[]                                                                               | 131072     | [llm]       | grok-3                |
| map[]                                                                               | 131072     | [llm]       | grok-3-fast           |
| map[]                                                                               | 131072     | [llm]       | grok-3-mini           |
| map[]                                                                               | 131072     | [llm]       | grok-3-mini-mini-fast |
| map[multimodal:map[enabled:true input_modalities:[image] output_modalities:[text]]] | 32768      | [vlm]       | grok-2-vision         |
+-------------------------------------------------------------------------------------+------------+-------------+-----------------------+
RAGFlow(user)> show pool model 'grok-2-vision' from 'xai';
+-------------------------------------------------------------------------------------+------------+-------------+---------------+
| features                                                                            | max_tokens | model_types | name          |
+-------------------------------------------------------------------------------------+------------+-------------+---------------+
| map[multimodal:map[enabled:true input_modalities:[image] output_modalities:[text]]] | 32768      | [vlm]       | grok-2-vision |
+-------------------------------------------------------------------------------------+------------+-------------+---------------+
RAGFlow(user)> list pool providers;
+--------+------------------------------------------------------------+---------------------------+
| name   | tags                                                       | url                       |
+--------+------------------------------------------------------------+---------------------------+
| OpenAI | LLM,TEXT EMBEDDING,TTS,TEXT RE-RANK,SPEECH2TEXT,MODERATION | https://api.openai.com/v1 |
| xAI    | LLM                                                        | https://api.x.ai/v1       |
+--------+------------------------------------------------------------+---------------------------+
RAGFlow(user)> show pool provider 'openai';
+---------------------------+--------+------------------------------------------------------------+--------------+
| base_url                  | name   | tags                                                       | total_models |
+---------------------------+--------+------------------------------------------------------------+--------------+
| https://api.openai.com/v1 | OpenAI | LLM,TEXT EMBEDDING,TTS,TEXT RE-RANK,SPEECH2TEXT,MODERATION | 27           |
+---------------------------+--------+------------------------------------------------------------+--------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-30 12:00:49 +08:00
qinling0210
a8bbe167a9 Bump to infinity v0.7.0-dev5 (#13846)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev5

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-30 10:19:06 +08:00
Heyang Wang
641b319647 feat: support reading tags via API (#12891) (#13732)
### What problem does this PR solve?

Enable reading Tag Set tags via API (expose tag_kwd field). The result
of the queried list chunks is as shown below:

<img width="1422" height="818" alt="image"
src="https://github.com/user-attachments/assets/abd1960a-fe34-489e-9d72-525f8e574938"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
2026-03-29 20:17:01 +08:00
KeJun
cb78ce0a7b feat: support rss datasource (#13721)
### What problem does this PR solve?

Support public RSS/Atom feed URLs as data sources for RAGFlow.

Related issue: https://github.com/infiniflow/ragflow/issues/12313

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-27 22:58:44 +08:00
Jin Hai
f32a832f92 Add rename model directory to entity to avoid name misunderstanding (#13829)
### What problem does this PR solve?

Rename the model directory to entity.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-27 19:25:18 +08:00
balibabu
308b3a1299 Feat: Remove antd-related code and upgrade lucide-react to the latest version. (#13830)
### What problem does this PR solve?

Feat: Remove antd-related code and upgrade lucide-react to the latest
version.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-27 19:24:52 +08:00
Jin Hai
1fff48b656 Add minio go test (#13800)
### What problem does this PR solve?

1. Add go test
2. Update CI process

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-27 18:12:56 +08:00
Liu An
2240fc778c Fix: add missing "mom" field to infinity_mapping.json for parent-child chunker (#13821)
### What problem does this PR solve?

When using Infinity as DOC_ENGINE with parent-child chunker enabled,
vector insertion fails because the "mom" field is missing from the index
mapping. This fix adds the required field to resolve the issue.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-27 13:06:18 +08:00
Krishna Chaitanya
cdbbd2620c Fix: upgrade pyasn1 from 0.6.2 to 0.6.3 to address CVE-2026-30922 (#13773)
## Summary

- Adds `pyasn1>=0.6.3` as a `[tool.uv.constraint-dependencies]` entry to
mitigate **CVE-2026-30922** (CVSS 7.5 HIGH)
- Regenerates `uv.lock` so the resolved pyasn1 version moves from
**0.6.2 to 0.6.3**

## Details

**CVE-2026-30922** is a Denial of Service vulnerability in pyasn1 caused
by unbounded recursion when decoding ASN.1 data with deeply nested
structures. An attacker can send crafted payloads with thousands of
nested SEQUENCE or SET tags to trigger a `RecursionError` crash or
memory exhaustion.

- **Severity:** HIGH (CVSS 7.5)
- **Affected versions:** pyasn1 < 0.6.3
- **Fixed in:** pyasn1 >= 0.6.3
- **NVD:** https://nvd.nist.gov/vuln/detail/CVE-2026-25769

`pyasn1` is not a direct dependency of RAGFlow but is pulled in
transitively via `google-auth` -> `rsa` -> `pyasn1-modules` -> `pyasn1`.
The `constraint-dependencies` mechanism in uv is the correct way to
enforce a minimum version for transitive dependencies without polluting
the direct dependency list.

## Test plan

- [x] `pyproject.toml` passes TOML validation
- [x] `uv lock` resolves successfully with the new constraint
- [x] pyasn1 version in `uv.lock` is now 0.6.3
- [ ] Existing CI/CD tests continue to pass

Closes #13686
2026-03-27 10:37:34 +08:00
chanx
8a9bbf3d6d Feat: add memory function by go (#13754)
### What problem does this PR solve?

Feat: Add Memory function by go

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-27 09:49:50 +08:00
黄圣祺
406339af1f Fix(paddleocr): load all PDF pages for image cropping instead of first 100 (#13811)
## Summary

Closes #13803

The `__images__` method in `paddleocr_parser.py` defaulted to
`page_to=100`, only loading the first 100 pages for image cropping.
However, the PaddleOCR API processes **all** pages of the PDF. For PDFs
with more than 100 pages, page indices beyond 99 were rejected as out of
range during crop validation, causing content loss.

## Root Cause

```
__images__(page_to=100) → loads pages 0-99 → page_images has 100 entries
PaddleOCR API → processes all 226 pages → tags reference pages 1-226
extract_positions() → converts tag "101" to index 100
crop() validation → 0 <= 100 < 100 → False → "All page indices [100] out of range"
```

## Fix

Changed `page_to` default from `100` to `10**9`, so all PDF pages are
loaded for cropping. Python's list slicing safely handles oversized
indices.
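
The root cause and the fix can be reproduced with plain lists. The helper names below are hypothetical stand-ins for the parser's methods; only the slicing behaviour and the `0 <= index < len(...)` check are taken from the description above.

```python
def load_page_images(all_pages, page_from=0, page_to=10**9):
    """Return the pages kept for cropping; slicing clamps an oversized end index."""
    return all_pages[page_from:page_to]


def tag_to_index(tag: str) -> int:
    """Convert a 1-based page tag such as '101' to a 0-based index."""
    return int(tag) - 1


def index_in_range(index: int, page_images) -> bool:
    """Mirror of the crop validation quoted above."""
    return 0 <= index < len(page_images)
```

With `page_to=100` and a 226-page document, the index for tag `"101"` fails the range check; with the large default, every page index passes.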

## Test plan

- [ ] Parse a PDF with >100 pages using PaddleOCR — no more "out of
range" warnings
- [ ] Parse a PDF with <100 pages — behavior unchanged
- [ ] Verify cropped images are generated correctly for all pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 09:33:11 +08:00
Sank
992a15146d Add translate autoMetadata (#13807)
### What problem does this PR solve?

Add Russian translation for autoMetadata.

### Type of change

- [x] Other
2026-03-27 09:31:20 +08:00
Renzo
f3aa3381a2 Fix username line break in SharedBadge component (#13794)
## Summary
- Added Tailwind truncation classes (`inline-block max-w-[120px]
truncate align-middle`) to the username `<span>` in `SharedBadge` to
prevent long usernames from wrapping onto multiple lines
- Added `title` attribute to show the full username on hover when
truncated


![ragflow](https://github.com/user-attachments/assets/8b3d8c03-d605-4957-bcf0-8b4d81fc4e70)


## Test plan
- [x] Verify long usernames display truncated with ellipsis (`...`)
- [x] Verify hovering over a truncated username shows the full name as a
tooltip
- [x] Verify short usernames display normally without truncation

Closes #13748
2026-03-27 09:31:08 +08:00
Yingfeng
6e309f9d0a Feat: Initialize context engine CLI (#13776)
### What problem does this PR solve?

- Add multiple output formats to ragflow_cli
- Initialize contextengine to Go module
  - ls datasets/ls files
  - cat file
  - search -d dir -q query

issue: #13714

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-26 21:07:06 +08:00
Idriss Sbaaoui
3b1e77a6d4 Fix: shared KB embedding authorization for team members (#13809)
### What problem does this PR solve?

Fixes issue #13799, where team members get a "model not authorized" error
when running RAG on an admin-shared knowledge base after the admin changes
the KB embedding model (for example, to bge-m3).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-26 21:01:07 +08:00
Lynn
8d4a3d0dfe Fix: create dataset with chunk_method or pipeline (#13814)
### What problem does this PR solve?

Allow creating datasets with parse_type == 1/None and chunk_method, or
with parse_type == 2 and pipeline_id.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-26 20:43:53 +08:00
Lynn
6a4a9debd2 Fix: allow create dataset with resume chunk_method (#13798)
### What problem does this PR solve?

Allow creating a dataset with the resume chunk_method.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-26 19:06:51 +08:00
balibabu
8402fcac6b Fix: The chunk method of the knowledge base cannot be saved. (#13813)
### What problem does this PR solve?

Fix: The chunk method of the knowledge base cannot be saved.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-26 19:05:49 +08:00
Syed Shahmeer Ali
ff92b5575b Fix: /file2document/convert blocks event loop on large folders causing 504 timeout (#13784)
Problem

The /file2document/convert endpoint ran all file lookups, document
deletions, and insertions synchronously inside the request cycle.
Linking a large folder (~1.7GB with many files) caused a 504 Gateway
Timeout because the blocking DB loop held the HTTP connection open for
too long.

Fix

- Extracted the heavy DB work into a plain sync function, _convert_files
- Inputs are validated and folder file IDs expanded upfront (fast path)
- The blocking work is dispatched to a thread pool via
get_running_loop().run_in_executor(), and the endpoint returns 200
immediately
- The frontend only checks data.code === 0, so the response change
(file2documents list → True) has no impact

Fixes #13781
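
The dispatch pattern from the fix can be sketched with `asyncio` alone. Everything except `_convert_files` and `run_in_executor` is an assumption made for illustration: the handler name `convert`, the error code, the `_background` set, and the simulated work are not taken from the PR.

```python
import asyncio
import time

# Keep references so pending background jobs are not garbage-collected.
_background = set()


def _convert_files(file_ids):
    """Stand-in for the blocking DB work (lookups, deletions, insertions)."""
    time.sleep(0.05)  # simulate slow, synchronous work
    return [f"doc-for-{fid}" for fid in file_ids]


async def convert(file_ids):
    """Validate quickly, hand the slow part to a thread, and return at once."""
    if not file_ids:
        return {"code": 101, "data": False}  # hypothetical error code
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, _convert_files, list(file_ids))
    _background.add(future)
    future.add_done_callback(_background.discard)
    return {"code": 0, "data": True}


async def demo():
    """Show that the response is ready before the background job finishes."""
    response = await convert(["f1", "f2"])
    results = await asyncio.gather(*list(_background))
    return response, results
```

The trade-off is that the caller no longer learns whether the conversion succeeded; failures in the background job have to be logged or surfaced through some other channel.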

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 16:45:10 +08:00
Jin Hai
e705ac6643 Add logout (#13796)
### What problem does this PR solve?

Add command: logout

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-26 11:54:23 +08:00
qinling0210
ebf36950e4 Implement Create/Drop Index/Metadata index in GO (#13791)
### What problem does this PR solve?

Implement Create/Drop Index/Metadata index in GO

New API handling in GO:
POST /kb/index
DELETE /kb/index
POST /tenant/doc_meta_index
DELETE /tenant/doc_meta_index

CREATE INDEX FOR DATASET 'dataset_name' VECTOR_SIZE 1024;
DROP INDEX FOR DATASET 'dataset_name';
CREATE INDEX DOC_META;
DROP INDEX DOC_META;

### Type of change

- [x] Refactoring
2026-03-26 11:54:10 +08:00
Yongteng Lei
d19ca71b43 Refa: Searches /search API to RESTFul (#13770)
### What problem does this PR solve?

Refactor the Searches /search API to be RESTful.

### Type of change

- [x] Documentation Update
- [x] Refactoring

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-26 01:07:41 +08:00
Yongteng Lei
ea1430bec5 Security: do not use litellm 1.82.7 and 1.82.8 (#13768)
### What problem does this PR solve?

See [issue](https://github.com/BerriAI/litellm/issues/24518) from
Litellm.

Upgraded from `1.81.15` to `1.82.6`, so RAGFlow is safe as always. 

### Type of change

- [x] Security

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-25 22:39:33 +08:00
Jin Hai
61cc5ffef2 Add api tokens commands of go admin cli (#13765)
### What problem does this PR solve?

- GENERATE TOKENS OF USER 'xxx@xxx.com'
- DROP KEY 'ragflow-yyyyy' OF 'xxx@xxx.com'
- LIST KEYS OF 'xxx@xxx.com'

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-25 21:39:14 +08:00
balibabu
33948b9dd8 Fix: Fix the issue of errors when creating datasets. (#13787)
### What problem does this PR solve?

Fix: Fix the issue of errors when creating datasets.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-25 21:37:58 +08:00
balibabu
8f45398422 Fix: Using AvatarUpload in a dialog and pressing Enter will cause a file selection pop-up to appear. #13779 (#13780)
### What problem does this PR solve?
Fix: Using AvatarUpload in a dialog and pressing Enter will cause a file
selection pop-up to appear. #13779

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-03-25 19:02:51 +08:00
Jin Hai
24fcd6bbc7 Update CI (#13774)
### What problem does this PR solve?

CI isn't stable, try to fix it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-25 18:17:52 +08:00
Idriss Sbaaoui
f3b4d6ab0e Fix: ci fails (#13778)
### What problem does this PR solve?

fix tests failing at p2 and p3

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-25 17:56:13 +08:00
Liu An
543d164e9b Fix: add build-essential for Python C extension packages (#13772)
### What problem does this PR solve?

The removal of cargo in commit f59d96f87 also removed build-essential
which was needed to compile C extension packages like datrie.

Use aliyun mirror for coverage pip install

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-25 17:14:48 +08:00
chanx
a2e6daa8d6 Fix: Metadata,chunk,dataset Related bugs (#13760)
### What problem does this PR solve?

Fix: metadata, chunk, and dataset related bugs
- Metadata does not show the add button #13731
- Chunk edit question style
- Dataset modified chunk method bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-25 10:47:34 +08:00
Yongteng Lei
1b29522279 Fix: migrate_add_unique_email silently skips unique constraint (#13744)
### What problem does this PR solve?

Fix `migrate_add_unique_email` silently skipping the unique constraint
when a non-unique `user_email` index exists.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-24 20:24:24 +08:00
NeedmeFordev
840cc8fbe9 fix(asana): use project memberships endpoint for project IDs in connector (#13746)
### What problem does this PR solve?

Fixes a bug in the Asana connector where providing `Project IDs` caused
sync to fail with:

`project_membership: Not a recognized ID: <PROJECT_GID>`

Root cause: the connector called `get_project_membership(project_gid)`,
but that API expects a **project membership gid**, not a **project
gid**.
This PR switches to the correct project-scoped API and adds regression
tests.

Fixes: [#13669](https://github.com/infiniflow/ragflow/issues/13669)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes made

- Updated `common/data_source/asana_connector.py`:
  - Replaced `get_project_membership(pid, ...)` with
    `get_project_memberships_for_project(pid, ...)`
  - Trimmed and filtered `asana_project_ids` parsing to avoid
    empty/whitespace IDs
  - Normalized `asana_team_id` by trimming whitespace
  - Used safer access for membership email extraction (`m.get("user")`)
- Added `test/unit_test/common/test_asana_connector.py`:
  - Verifies the correct project-membership API method is called
  - Verifies empty `project_ids` path returns workspace emails
  - Verifies project/team input normalization behavior
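
The input normalization listed above might look like the sketch below. It assumes the project IDs arrive as one comma-separated string, which the PR text does not state; both function names are hypothetical.

```python
from typing import List, Optional


def parse_project_ids(raw: Optional[str]) -> List[str]:
    """Split a comma-separated ID string, dropping empty and whitespace-only parts."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def normalize_team_id(raw: Optional[str]) -> Optional[str]:
    """Trim whitespace; treat an empty result as 'no team configured'."""
    if raw is None:
        return None
    trimmed = raw.strip()
    return trimmed or None
```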

### Compatibility / risk

- Non-breaking bug fix
- No API contract changes
- Existing behavior for empty `Project IDs` remains unchanged
2026-03-24 20:21:31 +08:00
qinling0210
7c8927c4fb Implement GetChunk() in Infinity in GO (#13758)
### What problem does this PR solve?

Implement GetChunk() in Infinity in GO

Add cli:
GET CHUNK 'XXX';
LIST CHUNKS OF DOCUMENT 'XXX';

### Type of change

- [x] Refactoring
2026-03-24 20:10:21 +08:00
Jin Hai
b308cd3a02 Update go cli (#13717)
### What problem does this PR solve?

Go cli

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-24 20:08:36 +08:00
balibabu
d84b688b91 Fix: This resolves the issue where selecting a knowledge base in chat could not differentiate between different users. (#13764)
### What problem does this PR solve?

Fix: This resolves the issue where selecting a knowledge base in chat
could not differentiate between different users.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-24 20:07:06 +08:00
Yongteng Lei
3d10e2075c Refa: files /file API to RESTFul style (#13741)
### What problem does this PR solve?

Refactor the Files /file API to RESTful style.

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-24 19:24:41 +08:00
Idriss Sbaaoui
10a36d6443 Tests : add tests for dataset settings (#13747)
### What problem does this PR solve?

add tests

### Type of change

- [x] Other (please describe): test

Co-authored-by: Liu An <asiro@qq.com>
2026-03-24 19:04:04 +08:00
Yongteng Lei
1b1f1bc69f Fix: minor fix of refacotr excel parser use lazy image loader (#13752)
### What problem does this PR solve?

Minor fix.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Hu Di <812791840@qq.com>
2026-03-24 19:03:54 +08:00
Baki Burak Öğün
8a4da41406 docs: add Turkish README translation (README_tr.md) (#13750)
## Summary
Add a complete Turkish translation of the README and include a Turkish
language badge across all existing README files.

## Changes
- **New file**: `README_tr.md` - Full Turkish translation of README.md,
covering all sections (What is RAGFlow, Demo, Latest Updates, Key
Features, System Architecture, Get Started, Configurations, Docker
Image, Development from Source, Documentation, Roadmap, Community,
Contributing)
- **Updated 9 existing README files** (README.md, README_zh.md,
README_tzh.md, README_ja.md, README_ko.md, README_id.md,
README_pt_br.md, README_fr.md, README_ar.md) to include the Turkish
language badge in the language selector

## Impact
- 10 files changed, 417 insertions
- Follows the same structure and conventions as other language-specific
README files (README_ja.md, README_ko.md, etc.)
- Turkish badge uses the same styling pattern (highlighted with DBEDFA
in README_tr.md, standard DFE0E5 in others)

---------

Co-authored-by: bakiburakogun <bakiburakogun@users.noreply.github.com>
2026-03-24 19:00:48 +08:00
Baki Burak Öğün
1319a25416 feat: complete Turkish localization (#13749)
## Summary
Complete and improve the existing Turkish (tr.ts) localization to fully
match the English (en.ts) reference file.

## Changes
- **Translate 6 English model tips** in the setting section
(chatModelTip, embeddingModelTip, img2txtModelTip, sequence2txtModelTip,
rerankModelTip, ttsModelTip) to Turkish
- **Expand all 13 truncated parser HTML descriptions** (book, laws,
manual, naive, paper, presentation, qa, resume, table, picture, one,
knowledgeGraph, tag) to match the full en.ts structure
- **Expand shortened tooltips** across knowledgeDetails,
knowledgeConfiguration, chat, and setting sections (~40+ tooltips
expanded)
- **Add missing translation details** for data source connectors
(SeaFile, Jira, Gmail, Moodle, Dropbox, Google Drive, etc.)

## Impact
- 182 insertions, 71 deletions in web/src/locales/tr.ts
- No structural changes, only translation content improvements
- All application terminology maintained consistently

Co-authored-by: bakiburakogun <bakiburakogun@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-24 18:58:58 +08:00
Jin Hai
f59d96f879 Remove rust/cargo install in docker (#13739)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-24 17:04:57 +08:00
balibabu
48c60b8ce5 Fix: Fixed the issue where agent log time could not be selected. (#13756)
### What problem does this PR solve?
Fix: Fixed the issue where agent log time could not be selected.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-24 16:02:26 +08:00
Jin Hai
9eb11bf65d Fix ping response (#13757)
### What problem does this PR solve?

As the title says: fix the ping response to be compatible with the Go server.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-24 15:15:21 +08:00
Stephen Hu
d32967eda8 refactor: let excel use lazy image loader (#13558)
### What problem does this PR solve?

let excel use lazy image loader

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-23 21:24:40 +08:00
Magicbook1108
f991cd362e Fix: type check in resume parsing method (#13740)
### What problem does this PR solve?

Fix: type check in resume parsing method
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-23 21:19:09 +08:00
Idriss Sbaaoui
df2cc32f51 Fix: dataset settings save (#13745)
### What problem does this PR solve?

Saving dataset settings failed with validation error 101 (Extra inputs
are not permitted)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-23 17:46:41 +08:00
qinling0210
ac542da505 Fix tokenizer in cpp (#13735)
### What problem does this PR solve?

The tokenizer in Infinity was modified in
https://github.com/infiniflow/infinity/pull/3330; sync the code change
to the C++ files in RAGFlow.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-23 15:40:35 +08:00
qinling0210
7b86f577be Implement metadata search in Infinity in GO (#13706)
### What problem does this PR solve?

Add CLI commands:

LIST DOCUMENTS OF DATASET quoted_string ";"
LIST METADATA OF DATASETS quoted_string ("," quoted_string)* ";"
LIST METADATA SUMMARY OF DATASET quoted_string (DOCUMENTS quoted_string
("," quoted_string)*)? ";"

### Type of change

- [x] Refactoring
2026-03-21 18:10:00 +08:00
Lynn
db57155b30 Fix: get user_id from variables (#13716)
### What problem does this PR solve?

Get user_id from the canvas variable when the input is a {} pattern value.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-20 23:39:34 +08:00
Yongteng Lei
dd839f30e8 Fix: code supports matplotlib (#13724)
### What problem does this PR solve?

Code as "final" node: 

![img_v3_02vs_aece4caf-8403-4939-9e68-9845a22c2cfg](https://github.com/user-attachments/assets/9d87b8df-da6b-401c-bf6d-8b807fe92c22)

Code as "mid" node:

![img_v3_02vv_f74f331f-d755-44ab-a18c-96fff8cbd34g](https://github.com/user-attachments/assets/c94ef3f9-2a6c-47cb-9d2b-19703d2752e4)


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-20 20:32:00 +08:00
balibabu
0507463f4e Fix: The retrieval_test interface is continuously requested when the user enters a question. #13719 (#13720)
### What problem does this PR solve?

Fix: The retrieval_test interface is continuously requested when the
user enters a question. #13719

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-20 15:46:41 +08:00
Jin Hai
9ce766192f Init storage engine (#13707)
### What problem does this PR solve?

1. Init Minio / S3 / OSS
2. Fix minio / s3 / oss config

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-20 13:15:41 +08:00
Jin Hai
04a60a41e0 Allow default admin user login ragflow user of go server (#13715)
### What problem does this PR solve?

1. Allow admin@ragflow.io to log in to the Go RAGFlow server
2. Fix go server start error.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-20 12:02:44 +08:00
tmimmanuel
13d0df1562 feat: add Perplexity contextualized embeddings API as a new model provider (#13709)
### What problem does this PR solve?

Adds Perplexity contextualized embeddings API as a new model provider,
as requested in #13610.

- `PerplexityEmbed` provider in `rag/llm/embedding_model.py` supporting
both standard (`/v1/embeddings`) and contextualized
(`/v1/contextualizedembeddings`) endpoints
- All 4 Perplexity embedding models registered in
`conf/llm_factories.json`: `pplx-embed-v1-0.6b`, `pplx-embed-v1-4b`,
`pplx-embed-context-v1-0.6b`, `pplx-embed-context-v1-4b`
- Frontend entries (enum, icon mapping, API key URL) in
`web/src/constants/llm.ts`
- Updated `docs/guides/models/supported_models.mdx`
- 22 unit tests in `test/unit_test/rag/llm/test_perplexity_embed.py`

Perplexity's API returns `base64_int8` encoded embeddings (not
OpenAI-compatible), so this uses a custom `requests`-based
implementation. Contextualized vs standard model is auto-detected from
the model name.
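
As a rough illustration of the decoding step, the sketch below assumes a `base64_int8` embedding is a base64 string whose decoded bytes are signed 8-bit integers, one per dimension. The helper names and the `-context-` naming rule are assumptions for illustration, not the code in `rag/llm/embedding_model.py`.

```python
import base64
from array import array


def decode_base64_int8(payload: str) -> list[int]:
    """Decode a base64 string into a list of signed 8-bit integers."""
    raw = base64.b64decode(payload)
    # Type code "b" is a signed char, so each byte maps to a value in [-128, 127].
    return array("b", raw).tolist()


def is_contextualized(model_name: str) -> bool:
    """Guess the endpoint family from the model name (assumed naming rule)."""
    return "-context-" in model_name


# Round-trip a tiny hand-made payload to show the signed interpretation.
encoded = base64.b64encode(bytes([1, 255, 128, 127])).decode("ascii")
print(decode_base64_int8(encoded))
print(is_contextualized("pplx-embed-context-v1-4b"))
```

An OpenAI-compatible client would expect a list of floats here, which is why a custom decoding step is needed at all.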

Closes #13610

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2026-03-20 10:47:48 +08:00
Zhicheng Wu
456b1bbf66 fix: row selection leaks across pages in dataset and file list tables (#13668)
### What problem does this PR solve?

When using pagination in the Dataset file list or File Manager,
selecting row N on page 1 would incorrectly cause row N on page 2 (and
subsequent pages) to also appear selected. This is a state pollution
bug.

### Root Cause

TanStack React Table defaults to using array indices (0, 1, 2...) as
`rowSelection` keys. With server-side (manual) pagination, each page's
rows start from index 0, so a selection like `{2: true}` on page 1 also
matches index 2 on every other page.

### Fix

- Added `getRowId: (row) => row.id` to `useReactTable` in both
`DatasetTable` and `FilesTable`, so selection state is keyed by unique
document/file IDs instead of positional indices.
- Updated the `useSelectedIds` helper to support ID-based selection keys
while maintaining backward compatibility with index-based keys.
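
The root cause can be reproduced outside of React. The following is a language-neutral sketch written in Python, not the TypeScript from this PR; `selected_rows`, `by_index`, and `by_id` are hypothetical names that stand in for the table's selection lookup with and without `getRowId`.

```python
def selected_rows(page_rows, selection, key_fn):
    """Return the ids of rows on this page that the selection state marks as selected."""
    return [
        row["id"]
        for index, row in enumerate(page_rows)
        if selection.get(key_fn(index, row))
    ]


def by_index(index, row):
    # Default behaviour: the key is the row's position on the current page.
    return index


def by_id(index, row):
    # Behaviour with getRowId: the key is the row's unique id.
    return row["id"]


page1 = [{"id": "doc-a"}, {"id": "doc-b"}, {"id": "doc-c"}]
page2 = [{"id": "doc-d"}, {"id": "doc-e"}, {"id": "doc-f"}]

# The user selects the third row on page 1.
index_selection = {by_index(2, page1[2]): True}
id_selection = {by_id(2, page1[2]): True}

print(selected_rows(page2, index_selection, by_index))  # leaks onto page 2
print(selected_rows(page2, id_selection, by_id))        # stays on page 1
```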

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Files Changed

| File | Change |
|------|--------|
| `web/src/pages/dataset/dataset/dataset-table.tsx` | Added `getRowId`
to table config |
| `web/src/pages/files/files-table.tsx` | Added `getRowId` to table
config |
| `web/src/hooks/logic-hooks/use-row-selection.ts` | Updated
`useSelectedIds` to handle ID-based selection |
2026-03-19 21:08:09 +08:00
chanx
e1dbfb8a9c fix(dao): Remove unnecessary status filter conditions in user queries (#13698)
### What problem does this PR solve?

Fix: Enhanced the user deletion function to return detailed deletion
information.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-19 21:05:15 +08:00
Magicbook1108
cfe6ea6f56 Feat: CREATE / DELETE / LIST dataset api in Go (#13695)
### What problem does this PR solve?

Feat: CREATE / DELETE / LIST dataset api in Go

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Lynn <lynn_inf@hotmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-19 20:48:32 +08:00
Lynn
f06e332c44 Fix: allow on (#13704)
### What problem does this PR solve?

Allow `on`/`ON` as a status input value.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-19 20:41:02 +08:00
writinwaters
b5e0b37d69 Refact: Renamed 'Agent flow' to 'Workflow' (#13705)
### What problem does this PR solve?

'Agent flow' rebranded.

### Type of change

- [x] Refactoring
2026-03-19 20:17:25 +08:00
Jin Hai
8d50ee632d Add environments reading (#13701)
### What problem does this PR solve?

Environment variables take precedence over values in the config file.
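
A minimal sketch of this precedence rule, written in Python for brevity although the PR targets the Go server. `resolve_setting` and the key names are hypothetical and only illustrate the lookup order.

```python
import os


def resolve_setting(name, config, default=None, environ=None):
    """Look up a setting, preferring the environment over the config file."""
    env = os.environ if environ is None else environ
    if name in env:
        # An environment variable always wins when it is set.
        return env[name]
    # Otherwise fall back to the config file, then to the default.
    return config.get(name, default)


config_file = {"SERVER_PORT": "9380", "LOG_LEVEL": "info"}
fake_env = {"SERVER_PORT": "9999"}

print(resolve_setting("SERVER_PORT", config_file, environ=fake_env))
print(resolve_setting("LOG_LEVEL", config_file, environ=fake_env))
```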

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-19 18:50:28 +08:00
yH
757d8d42dd Fix: use configured OrderByExpr in _community_retrieval_ (#13683)
The `odr` variable was configured with `desc("weight_flt")` but a new
empty `OrderByExpr()` was passed to `dataStore.search()` instead,
causing the descending sort to have no effect.

### What problem does this PR solve?

In `_community_retrieval_`, the configured `OrderByExpr` with
`desc("weight_flt")` was discarded — a new empty `OrderByExpr()` was
passed to `dataStore.search()` instead, so community reports were never
sorted by weight.
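
The bug pattern is easy to miss in review, so here is a small self-contained sketch of it. The `OrderByExpr` class and `search` function below are simplified stand-ins written for this example, not the real RAGFlow classes.

```python
class OrderByExpr:
    """Stand-in for the real OrderByExpr; it only records sort fields."""

    def __init__(self):
        self.fields = []

    def desc(self, field):
        self.fields.append((field, True))
        return self


def search(rows, order_by):
    """Stand-in for dataStore.search(): apply the recorded sort fields, if any."""
    for field, descending in reversed(order_by.fields):
        rows = sorted(rows, key=lambda r: r[field], reverse=descending)
    return rows


reports = [{"weight_flt": 0.2}, {"weight_flt": 0.9}, {"weight_flt": 0.5}]

odr = OrderByExpr()
odr.desc("weight_flt")

buggy = search(reports, OrderByExpr())  # configured `odr` is silently discarded
fixed = search(reports, odr)            # configured `odr` is actually used

print([r["weight_flt"] for r in buggy])
print([r["weight_flt"] for r in fixed])
```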

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-19 17:55:40 +08:00
Lynn
e12147f5b9 Fix: admin client (#13699)
### What problem does this PR solve?

Define a crypt function in admin directory, remove import from
api.utils. And move requests-toolbelt to dependency.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-19 17:06:54 +08:00
Lynn
4bb1acaa5b Refactor: dataset / kb API to RESTFul style (#13690)
### What problem does this PR solve?

1. Split the dataset API into gateway and service layers, and modify the
web UI to use the RESTful HTTP API.
2. Old KB-related APIs are commented out.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-19 14:41:36 +08:00
Idriss Sbaaoui
7827f0fce5 fix : empty mind map (#13693)
### What problem does this PR solve?

Fix graphrag extractor chat response parsing and skip truncated cache
values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-19 13:53:06 +08:00
Jin Hai
7ebe1d2722 Fix docker building (#13681)
### What problem does this PR solve?

1. Refactor go server log
2. Update docker building, since nginx config should be set according to
the deployment.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-19 10:25:35 +08:00
NeedmeFordev
c3f79dbcb0 fix(jira): prevent missed incremental updates after issue edits (#13674)
### What problem does this PR solve?

Fixes [#13505](https://github.com/infiniflow/ragflow/issues/13505): Jira
incremental sync could miss updated issues after initial sync,
especially near time boundaries.

Root cause:
- Jira JQL uses minute-level precision for `updated` filters.
- Incremental windows had no overlap buffer, so boundary updates could
be skipped.
- Sync log cursor tracking used a backward-facing update for
`poll_range_start`.
- Existing-doc updates in `upload_document` lacked a KB ownership guard
for doc-id collisions.

What changed:
- Added Jira incremental overlap buffer (`time_buffer_seconds`,
defaulting to `JIRA_SYNC_TIME_BUFFER_SECONDS`) when building JQL
lower-bound time.
- Preserved second-level post-filtering to avoid duplicate reprocessing
while still catching boundary updates.
- Improved Jira sync logging to include start/end window and overlap
configuration.
- Updated sync cursor tracking in `increase_docs` to keep
`poll_range_start` moving forward with max update time.
- Added KB ID safety check before updating existing document records in
`upload_document`.
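
The overlap-plus-post-filter idea can be sketched as follows. This is an illustration under stated assumptions, not the connector's code: the function names are hypothetical, the lower bound is floored to the minute because JQL compares `updated` at minute precision, and the post-filter keeps issues updated at or after the real cursor.

```python
from datetime import datetime, timedelta, timezone


def jql_lower_bound(start: datetime, buffer_seconds: int) -> str:
    """Return a minute-precision lower bound, shifted back by the overlap buffer."""
    buffered = start - timedelta(seconds=buffer_seconds)
    return buffered.replace(second=0, microsecond=0).strftime("%Y-%m-%d %H:%M")


def post_filter(issues, start: datetime):
    """Keep only issues whose second-level timestamp is at or after the cursor."""
    return [issue for issue in issues if issue["updated"] >= start]


cursor = datetime(2026, 3, 18, 10, 0, 30, tzinfo=timezone.utc)
lower = jql_lower_bound(cursor, buffer_seconds=120)
jql = f'updated >= "{lower}"'

# Pretend Jira returned these two issues for the widened window.
fetched = [
    {"key": "A-1", "updated": datetime(2026, 3, 18, 10, 0, 10, tzinfo=timezone.utc)},
    {"key": "A-2", "updated": datetime(2026, 3, 18, 10, 0, 45, tzinfo=timezone.utc)},
]
print(jql)
print([issue["key"] for issue in post_filter(fetched, cursor)])
```

The widened query deliberately over-fetches; the second-level filter then discards anything already processed before the cursor.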

Verification performed:
- Python syntax compile checks passed for modified files.
- Manual verification flow:
  1. Run full Jira sync.
  2. Edit an already-indexed Jira issue.
  3. Run next incremental sync.
  4. Confirm updated content is re-ingested into KB.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-18 23:31:05 +08:00
Daniil Sivak
dee68c571b Feat: support variable interpolation in headers (#13680)
Closes #13277

### What problem does this PR solve?

Adds `{variable_name}` (and `{component@variable}`) interpolation
support to HTTP header values in the `Invoke` component, matching the
existing URL interpolation behavior.
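
A minimal sketch of this kind of interpolation, assuming placeholders are plain identifiers optionally prefixed with `component@` and that unknown placeholders are left untouched. The regular expression and the `interpolate_headers` helper are hypothetical and do not mirror the `Invoke` component's implementation.

```python
import re

# Matches {name} or {component@name}; the allowed characters are an assumption.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*(?:@[A-Za-z_][\w.]*)?)\}")


def interpolate_headers(headers: dict, variables: dict) -> dict:
    """Replace {variable} placeholders in header values with known variables."""

    def substitute(match):
        key = match.group(1)
        # Leave the placeholder as-is when the variable is unknown.
        return str(variables[key]) if key in variables else match.group(0)

    return {name: _PLACEHOLDER.sub(substitute, value) for name, value in headers.items()}


headers = {
    "Authorization": "Bearer {api_token}",
    "X-User": "{begin@user_id}",
    "X-Raw": "{missing}",
}
variables = {"api_token": "abc", "begin@user_id": 42}
print(interpolate_headers(headers, variables))
```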

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

<img width="1280" height="867" alt="image"
src="https://github.com/user-attachments/assets/8ab7b4e9-7cc0-4a7f-8a5f-f838a15a5fda"
/>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-18 22:38:20 +08:00
Mustafa YILDIZ
e4d8cdaff3 feat: add Turkish language support (#13670)
### What problem does this PR solve?
RAGFlow had no Turkish language support. This PR adds Turkish (tr)
locale translations to the UI.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Mustafa YILDIZ <mustafa.yildiz@cilek.com>
2026-03-18 21:09:32 +08:00
writinwaters
bbd0cd80e4 Docs: Updated Add Google Drive as data source (#13684)
### What problem does this PR solve?

Gave an editorial pass to the Add Google Drive document.

### Type of change

- [x] Documentation Update
2026-03-18 21:05:25 +08:00
Octopus
f171554c0a feat: upgrade MiniMax default model to M2.7 (#13676)
## Summary
Upgrade MiniMax model configuration to include the latest M2.7 model.

## Changes
- Add `MiniMax-M2.7` and `MiniMax-M2.7-highspeed` to the model selection
list in `conf/llm_factories.json`
- Place M2.7 models at the top of the list as the recommended default
- Retain all previous models (M2.5, M2.5-highspeed, M2.1, M2) as
available alternatives

## Why
MiniMax-M2.7 is the latest flagship model with enhanced reasoning and
coding capabilities. This update ensures RAGFlow users can access the
newest model while maintaining backward compatibility with existing
configurations.

## Testing
- JSON config validated (well-formed)
- No existing MiniMax-specific unit tests affected
- Model entries follow the same structure as existing entries

Co-authored-by: PR Bot <pr-bot@minimaxi.com>
2026-03-18 19:20:10 +08:00
Idriss Sbaaoui
9070408b04 Fix : model-specific handling (#13675)
### What problem does this PR solve?

Add a handler that drops the parameters GPT-5 models do not accept, and
centralize all model-specific parameter handling in a single helper
function.
Solves issue #13639.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-03-18 17:28:20 +08:00
Yongteng Lei
53e395ca2e Fix: cannot debug invoke component (#13649)
### What problem does this PR solve?

Cannot debug invoke component.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-18 14:22:13 +08:00
Jin Hai
74866371ef Fix compatibility issue (#13667)
### What problem does this PR solve?

1. Change the Go admin server port from 9385 to 9383 to avoid conflicts.
2. In entrypoint.sh, start the Go server only after the Python servers
have fully started.
3. Fix some database migration issues.
4. Add more API routes in web for compatibility with EE.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-18 11:51:03 +08:00
Daniil Sivak
60ad32a0c2 Feat: support epub parsing (#13650)
Closes #1398

### What problem does this PR solve?

Adds native support for EPUB files. EPUB content is extracted in spine
(reading) order and parsed using the existing HTML parser. No new
dependencies required.
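
To make "spine order" concrete, the sketch below lists an EPUB's content documents in reading order using only the standard library. It is an illustration of the EPUB container layout (`META-INF/container.xml` pointing to an OPF package with a manifest and a spine), not the `EpubParser` code; `spine_documents` is a hypothetical helper.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_bytes: bytes) -> list[str]:
    """Return the content document paths of an EPUB in spine (reading) order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # container.xml names the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    base = posixpath.dirname(opf_path)
    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        posixpath.join(base, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

Each returned path can then be read from the archive and handed to an HTML parser, which matches the approach the description outlines.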

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

To check this parser manually:

```bash
uv run --python 3.12 python -c "
from deepdoc.parser import EpubParser

with open('$HOME/some_epub_book.epub', 'rb') as f:
  data = f.read()

sections = EpubParser()(None, binary=data, chunk_token_num=512)
print(f'Got {len(sections)} sections')
for i, s in enumerate(sections[:5]):
  print(f'\n--- Section {i} ---')
  print(s[:200])
"
```
2026-03-17 20:14:06 +08:00
Idriss Sbaaoui
1399c60164 fix builtin model fail when parsing (#13657)
### What problem does this PR solve?

Using a builtin model during parsing raised an error because the code
expects `fid == builtin`, while `split_model_name_and_factory` returns
`id=None`. This PR allows the model to be accepted with or without the
`@Builtin` suffix.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-17 19:38:54 +08:00
balibabu
6cae364ac2 Feat: Export Agent Logs. (#13658)
### What problem does this PR solve?
Feat: Export Agent Logs.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-03-17 18:51:26 +08:00
balibabu
fc4f1e2488 Fix: The dataset description should not be a required field. (#13655)
### What problem does this PR solve?

Fix: The dataset description should not be a required field.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-17 18:51:18 +08:00
Idriss Sbaaoui
ad6bdb5bfe Fix: left preview containment regression for file previews (#13652)
### What problem does this PR solve?

Fix left preview containment regression for file previews

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-17 17:21:13 +08:00
Yongteng Lei
ca6c3218c3 Refa: follow-up expose agent structured outputs in non-stream completions (#13524)
### What problem does this PR solve?

Follow-up expose agent structured outputs in non-stream completions
#13389.

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-17 17:11:27 +08:00
qinling0210
ca182dc188 Implement Search() in Infinity in GO (#13645)
### What problem does this PR solve?

Implement Search() in Infinity in GO.

The function can handle the following requests:
"search '曹操' on datasets 'infinity'" 
"search '常胜将军' on datasets 'infinity'"
"search '卓越儒雅' on datasets 'infinity'"
"search '辅佐刘禅北伐中原' on datasets 'infinity'"

The output is exactly the same as that of the Python Search().

### Type of change

- [ ] New Feature (non-breaking change which adds functionality)
2026-03-17 16:45:45 +08:00
balibabu
549833b8a4 Fix: Fixed an issue where agent template titles were not displayed in Chinese mode. (#13647)
### What problem does this PR solve?

Fix: Fixed an issue where agent template titles were not displayed in
Chinese mode.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-17 15:56:57 +08:00
Stephen Hu
77483b1e58 refactor: remove useless variable in raptor (#13648)
### What problem does this PR solve?

remove useless variable in raptor

### Type of change


- [x] Refactoring
2026-03-17 15:56:51 +08:00
Jin Hai
986dcf1cc8 Revert "Refactor: dataset / kb API to RESTFul style" (#13646)
Reverts infiniflow/ragflow#13619
2026-03-17 12:09:48 +08:00
balibabu
fdf2d84ffc Fix: Fixed an issue where the agent could not publish. (#13644)
### What problem does this PR solve?

Fix: Fixed an issue where the agent could not publish.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-17 11:44:01 +08:00
Lynn
1db5409d82 Refactor: dataset / kb API to RESTFul style (#13619)
### What problem does this PR solve?

1. Split the dataset API into gateway and service layers, and modify the
web UI to use the RESTful HTTP API.
2. Old KB-related APIs are commented out.

### Type of change

- [x] Refactoring
2026-03-16 22:51:34 +08:00
Yingfeng
73bc9b91de Limit max recursion depth for rag analyzer#3318 (#13637)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 22:49:56 +08:00
chanx
5403f142ae Feat: Add chunk also supports uploading image. (#13628)
### What problem does this PR solve?

Feat: Add chunk also supports uploading image.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-16 20:15:49 +08:00
Yongteng Lei
af7e24ba8c Feat: add_chunk supports add image (#13629)
### What problem does this PR solve?

`add_chunk` now supports adding an image.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-16 20:15:36 +08:00
Magicbook1108
09ff1bc2b0 Fix: paddle ocr coordinate lower > upper (#13630)
### What problem does this PR solve?

Fix: paddle ocr coordinate lower > upper #13618 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-16 20:15:26 +08:00
Jin Hai
0545801251 Update CI process (#13632)
### What problem does this PR solve?

This pull request updates the GitHub Actions workflow for testing,
primarily to simplify Docker Compose usage and environment file
management. The main changes focus on removing unnecessary subdirectory
references, updating environment file handling, and streamlining the
workflow steps.


### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-16 19:00:28 +08:00
Idriss Sbaaoui
d5ed179d15 Playwright : add test ids and chat test (#13432)
### What problem does this PR solve?


### Type of change

- [x] Other
2026-03-16 16:39:05 +08:00
balibabu
f4d126acb0 Fix: Shared chat link triggers infinite POST loop with empty question, input disabled #13606 (#13625)
### What problem does this PR solve?

Fix: Shared chat link triggers infinite POST loop with empty question,
input disabled #13606

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-16 15:43:18 +08:00
balibabu
fa48ffe5de Feat: Translate embedded dialog text. (#13623)
### What problem does this PR solve?

Feat: Translate embedded dialog text.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-16 15:43:13 +08:00
Idriss Sbaaoui
c98ad2f0d8 fix multimodel input bars slowly being pushed offscreen (#13620)
### What problem does this PR solve?

When the conversation gets long in multi-model chat, it pushes the
input bar offscreen.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 15:03:50 +08:00
Idriss Sbaaoui
bb962e67b0 fix : build error (#13622)
### What problem does this PR solve?

Add a timeout to fix the build failure during the `uv sync` step.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 15:03:25 +08:00
Yingfeng
b686a60713 Switch from demo.ragflow.io to cloud.ragflow.io (#13624)
### What problem does this PR solve?

Switch from demo.ragflow.io to cloud.ragflow.io

### Type of change

- [x] Documentation Update
2026-03-16 14:44:39 +08:00
Liu An
5b3bb25010 Fix: switch Python package mirror from Tsinghua to Aliyun (#13617)
### What problem does this PR solve?

Replace pypi.tuna.tsinghua.edu.cn with mirrors.aliyun.com to resolve
issues with missing packages on the Tsinghua mirror.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 12:12:25 +08:00
Jin Hai
a2d72202cf Revert "Refactor dataset / kb API to RESTFul style" (#13614)
Reverts infiniflow/ragflow#13263
2026-03-16 10:44:38 +08:00
Ram Mourya
ae9b1c7f6a Docs : Fixed the links for user and developer guide in readme files (#13609)
Fixed the links for user and developer guide in readme files.
2026-03-16 10:23:52 +08:00
Yongteng Lei
287637162c Revert "fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416" (#13613)
Reverts infiniflow/ragflow#13583, which causes `uv sync` to fail.
2026-03-16 10:19:29 +08:00
Lynn
7c32e206be Refactor dataset / kb API to RESTFul style (#13263)
### What problem does this PR solve?

1. Split the dataset API into gateway and service layers, and modify the
web UI to use the RESTful HTTP API.
2. Old KB-related APIs are commented out.

### Type of change

- [x] Refactoring
2026-03-13 20:02:35 +08:00
apps-lycusinc
8b984c9d5f Fixing WordNetCorpusReader object has no attribute _LazyCorpusLoader_… (#13600)
### What problem does this PR solve?

Forces NLTK to load the corpus synchronously once, preventing concurrent
tasks from triggering the lazy-loading race condition that causes the
error `WordNetCorpusReader object has no attribute _LazyCorpusLoader_…`
(#13590).


### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: shakeel <shakeel@lollylaw.com>
2026-03-13 19:55:01 +08:00
Magicbook1108
161659becc Fix: model selection rule in get_model_config_by_type_and_name (#13569)
### What problem does this PR solve?

Fix: model selection rule in `get_model_config_by_type_and_name`

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 19:46:13 +08:00
balibabu
cb49cd30c4 Feat: Add the user_id field to the agent log table and the embedded page. (#13596)
### What problem does this PR solve?

Feat: Add the `user_id` field to the agent log table and the embedded
page.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-13 19:06:18 +08:00
Jin Hai
cc7e94ffb6 Use different API route according the ENV (#13597)
### What problem does this PR solve?

1. Fix go server date precision
2. Use API_SCHEME_PROXY to control the web API route

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-13 19:05:30 +08:00
balibabu
1569ed82f8 Refactor: Delete flow.ts (#13498)
### What problem does this PR solve?

Feat: Delete flow.ts

### Type of change


- [x] Refactoring
2026-03-13 18:04:54 +08:00
chanx
a3e6c2e84a Fix: Enhanced user management functionality and cascading data deletion. (#13594)
### What problem does this PR solve?

Fix: Enhanced user management functionality and cascading data deletion.

- Added tenant and related data initialization functionality during user
creation, including tenants, user-tenant relationships, LLM
configuration, and root folder.
- Added cascading deletion logic for user deletion, ensuring that all
associated data is cleaned up simultaneously when a user is deleted.
- Implemented a Werkzeug-compatible password hash algorithm (scrypt) and
verification functionality.
- Added multiple DAO methods to support batch data operations and
cascading deletion.
- Improved user login processing and added token signing functionality.

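
For readers unfamiliar with the hash format, here is a hedged Python sketch of scrypt hashing and verification, assuming Werkzeug's `scrypt:N:r:p$salt$hexdigest` string layout with default parameters `N=32768, r=8, p=1`. The function names are hypothetical and this is not the code added by the PR.

```python
import hashlib
import hmac


def scrypt_hash(password: str, salt: str, n: int = 2**15, r: int = 8, p: int = 1) -> str:
    """Build a `scrypt:N:r:p$salt$hexdigest` string (assumed Werkzeug layout)."""
    digest = hashlib.scrypt(
        password.encode(), salt=salt.encode(), n=n, r=r, p=p, maxmem=132 * n * r * p
    )
    return f"scrypt:{n}:{r}:{p}${salt}${digest.hex()}"


def verify(stored: str, password: str) -> bool:
    """Recompute the hash with the stored parameters and compare in constant time."""
    method, salt, _ = stored.split("$", 2)
    _, n, r, p = method.split(":")
    candidate = scrypt_hash(password, salt, int(n), int(r), int(p))
    return hmac.compare_digest(candidate, stored)
```

Storing the parameters inside the string is what lets a second implementation verify hashes it did not create.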
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 16:53:54 +08:00
Sank
a67fa03584 fix CVE-2026-28804 CVE-2026-31826 (#13592)
### What problem does this PR solve?

Fix CVE-2026-28804 and CVE-2026-31826.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 16:34:28 +08:00
balibabu
717f1f1362 Feat: Modify the style of the release confirmation box. (#13542)
### What problem does this PR solve?

Feat: Modify the style of the release confirmation box.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: balibabu <assassin_cike@163.com>
Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
2026-03-13 16:31:17 +08:00
Lynn
02070bab2a Feat: record user_id in memory (#13585)
### What problem does this PR solve?

Get user_id from canvas and record it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-13 15:38:35 +08:00
Jin Hai
5c955a31cc Update go server (#13589)
### What problem does this PR solve?

1. Add more CLI commands
2. Add some license hooks

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-13 14:41:02 +08:00
Idriss Sbaaoui
ef94a9c291 Fix : remove min value for description field (#13587)
### What problem does this PR solve?

The minimum-length rule and its message forced users to enter a
description for datasets. The rule also had a wrong error message.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 13:11:54 +08:00
Ethan T.
71804bf5bc fix(db_models): guard MySQL-specific SQL in migration with DB_TYPE check (fixes #13544) (#13582)
## Summary

Fixes #13544: PostgreSQL startup crash because
`update_tenant_llm_to_id_primary_key()` unconditionally uses
MySQL-specific SQL.

- Split `update_tenant_llm_to_id_primary_key()` into
`_update_tenant_llm_to_id_primary_key_mysql()` and
`_update_tenant_llm_to_id_primary_key_postgres()`, dispatching on
`settings.DATABASE_TYPE`
- MySQL path: unchanged (existing `DATABASE()`, `SET @row = 0`,
`AUTO_INCREMENT`, `DROP PRIMARY KEY` logic)
- PostgreSQL path: uses `current_database()`, `ROW_NUMBER() OVER (ORDER
BY ...)` for sequential IDs, `CREATE SEQUENCE` + `nextval()` for
auto-increment, and `information_schema.table_constraints` to find the
PK constraint name
- Also fix `migrate_add_unique_email()`: MySQL-only
`information_schema.statistics` is replaced with `pg_indexes` on
PostgreSQL
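
The dispatch itself can be pictured with a small sketch. The helpers below are hypothetical and only return the dialect-specific pieces named in the summary above; they do not reproduce the migration functions in `db_models`.

```python
def current_database_sql(database_type: str) -> str:
    """Return the dialect-specific query for the current database name."""
    dialect = database_type.lower()
    if dialect == "mysql":
        return "SELECT DATABASE()"
    if dialect in ("postgres", "postgresql"):
        return "SELECT current_database()"
    raise ValueError(f"unsupported database type: {database_type}")


def index_lookup_table(database_type: str) -> str:
    """Return the catalog table used to look up existing indexes."""
    if database_type.lower().startswith("postgres"):
        return "pg_indexes"
    return "information_schema.statistics"


print(current_database_sql("postgres"))
print(index_lookup_table("mysql"))
```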

## Test plan

- [ ] Start RAGFlow with `DB_TYPE=postgres` — startup should complete
without `function database() does not exist` error
- [ ] Start RAGFlow with `DB_TYPE=mysql` (default) — existing behaviour
unchanged, migration runs as before
- [ ] Fresh PostgreSQL install: verify `tenant_llm.id` column is created
as a serial primary key after migration
- [ ] Idempotency: running migration twice on PostgreSQL should be a
no-op (column already exists check passes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: gambletan <gambletan@github>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 11:53:01 +08:00
Sank
e90f0e8910 fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416 (#13583)
### What problem does this PR solve?

Fix CVE-2026-26216, CVE-2026-26217, and CVE-2025-66416.

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 11:17:39 +08:00
Liu An
667f9c1c3a fix: remove duplicate "arabic" key in French translations (#13529)
### What problem does this PR solve?

Removed duplicate key that caused build warning during Vite build.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 10:57:19 +08:00
Idriss Sbaaoui
810692dfa3 fix: restore cross_languages default chat-model fallback for retrieval (#13471)
### What problem does this PR solve?

issue #13465 
POST /api/v1/retrieval failed with
{"code":100,...,"message":"Exception('Model Name is required')"} when
cross_languages was provided and no explicit llm_id was passed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 10:52:37 +08:00
Jimmy Ben Klieve
1a4dee4313 refactor(ui): unify top level pages structure, use standard language codes and time zones (#13573)
### What problem does this PR solve?

- Unify top level pages structure
- Standardize locale language codes (BCP 47) and time zones (IANA tz)


> **Note:** 
> Newly created user info brings non-standard default values `timezone:
"UTC+8\tAsia/Shanghai"` and `language: "English"`.


### Type of change

- [x] Refactoring
2026-03-12 21:01:09 +08:00
Ethan Clarke
35cd56f990 feat: add MiniMax-M2.5 and M2.5-highspeed models (#13557)
## Summary

Add MiniMax's latest M2.5 model family to the model registry and update
the default API base URL to the international endpoint for broader
accessibility.

## Changes

- **Add MiniMax-M2.5 models** to `conf/llm_factories.json`:
  - `MiniMax-M2.5`: Peak Performance. Ultimate Value. Master the Complex.
  - `MiniMax-M2.5-highspeed`: Same performance, faster and more agile.
  - Both support 204,800 token context window and tool calling (`is_tools:
    true`).
- **Update default MiniMax API base URL** in `rag/llm/__init__.py`:
  - From `https://api.minimaxi.com/v1` (domestic) to
    `https://api.minimax.io/v1` (international).
  - Chinese users can still override via the Base URL field in the UI
    settings (as documented in existing i18n strings).

## Supported Models

| Model | Context Window | Tool Calling | Description |
|-------|---------------|-------------|-------------|
| `MiniMax-M2.5` | 204,800 tokens | Yes | Peak Performance. Ultimate
Value. |
| `MiniMax-M2.5-highspeed` | 204,800 tokens | Yes | Same performance,
faster and more agile. |

## API Documentation

- OpenAI Compatible API:
https://platform.minimax.io/docs/api-reference/text-openai-api

## Testing

- [x] JSON validation passes
- [x] Python syntax validation passes
- [x] Ruff lint passes
- [x] MiniMax-M2.5 API call verified (returns valid response)
- [x] MiniMax-M2.5-highspeed API call verified (returns valid response)

Co-authored-by: PR Bot <pr-bot@minimaxi.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-12 20:41:46 +08:00
Jin Hai
3fbf8bc3d4 Expose go version server and admin server port out of docker in CI (#13572)
### What problem does this PR solve?

- Print Go version log when start server
- Expose the server port in CI docker container

### Type of change

- [x] Other (please describe): For CI

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 20:39:57 +08:00
Jin Hai
d688b72dff Go: Add admin server status checking (#13571)
### What problem does this PR solve?

RAGFlow server isn't available when admin server isn't connected.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 20:02:50 +08:00
chanx
1df804a14a Feature (System Settings): Implemented system settings management functionality (#13556)
### What problem does this PR solve?

Feature (System Settings): Implemented system settings management
functionality

- Added a new SystemSettings model, including creation and update time
fields.

- Implemented SystemSettingsDAO, providing CRUD operations and
transaction support.

- Implemented management interfaces for variables, configurations, and
environment variables in the admin service.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-12 19:06:20 +08:00
guptas6est
7c79602c77 fix(web): upgrade lodash to 4.17.23 and dompurify to 3.3.2 to fix CVE-2026-0540 and CVE-2025-13465 (#13488)
### What problem does this PR solve?

This PR fixes two security vulnerabilities in web dependencies
identified by Trivy:

1. CVE-2025-13465 (lodash): Prototype pollution vulnerability in _.unset
and _.omit functions
2. CVE-2026-0540 (dompurify): Cross-site scripting (XSS) vulnerability

**Changes:**
- Upgraded lodash from 4.17.21 to 4.17.23
- Upgraded dompurify from 3.3.1 to 3.3.2
- Added npm override to force monaco-editor's transitive dependency on
dompurify to use 3.3.2 (monaco-editor still depends on vulnerable 3.2.7)

Both upgrades are backward-compatible patch versions. Build verified
successfully with no breaking changes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 19:04:26 +08:00
Ray Zhang
375f62a6c3 docs(migration): add project name (-p) usage to backup & migration guide (#13565)
## Summary

- Add documentation for the `-p project_name` flag in the migration
script, covering all steps (stop, backup, restore, start)
- Add a note explaining how Docker volume name prefixes relate to the
Compose project name
- Update `docker-compose` to `docker compose` (Compose V2 syntax) for
consistency
- Fix `sh` to `bash` to match the script's shebang line

This is the documentation follow-up to #12187 which added `-p` project
name support to `docker/migration.sh`.

## Test plan

- [ ] Verify the documentation renders correctly on the docs site
- [ ] Confirm all example commands are accurate against the current
`migration.sh`
2026-03-12 19:01:25 +08:00
qinling0210
1be07a0a34 Fix "Result window is too large" during meta data search (#13521)
### What problem does this PR solve?

Fix
https://github.com/infiniflow/ragflow/issues/13210#issuecomment-3982878498

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 18:59:56 +08:00
Jin Hai
cebf5892ec Create go version storage component, but not used (#13561)
### What problem does this PR solve?

Implement: minio, s3, oss, azure_sas, azure_spn, gcs, opendal

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 18:58:25 +08:00
Jinghan Xu
f6b06fab72 Fix: allow document parsing status recovery after transient errors (#13341)
### What problem does this PR solve?

Fixes #13285

When an LLM returns a transient error (e.g. overloaded) during parsing,
the task progress is set to -1. Previously, the progress could never be
updated again, leaving the document permanently stuck in FAIL status
even after the task successfully recovered and completed.

Three coordinated changes address this:

1. task_service.update_progress: relax the progress update guard to
accept prog >= 1 even when current progress is -1, so a task that
recovers from a transient failure can report completion.

2. document_service.get_unfinished_docs: include documents that are
marked FAIL (progress == -1) but still have at least one non-failed task
(task.progress >= 0) in the polling set, so their status can be
re-synced once a task recovers. Documents where all tasks have
permanently failed are excluded to avoid unnecessary polling.

3. document_service.update_progress: explicitly set document status to
RUNNING when not all tasks have finished, instead of preserving whatever
stale status (potentially FAIL) the document previously had.
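
The first two changes can be sketched roughly as follows. This is a minimal illustration, not the code in `task_service` or `document_service`: the names `next_progress` and `should_poll`, the definition of "unfinished" as `0 <= progress < 1`, and the rule that progress otherwise never moves backwards are all assumptions made for the example.

```python
FAILED = -1  # progress value that marks a failed task or document


def next_progress(current: float, reported: float) -> float:
    """Return the progress to store after a task reports `reported`."""
    if current == FAILED:
        # Relaxed guard: a failed task may still report completion (>= 1).
        return reported if reported >= 1 else current
    if reported == FAILED:
        return FAILED
    # Assumption for this sketch: progress never moves backwards otherwise.
    return max(current, reported)


def should_poll(doc_progress: float, task_progresses: list) -> bool:
    """Decide whether a document belongs in the polling set."""
    if 0 <= doc_progress < 1:
        return True  # still running (assumed definition of "unfinished")
    # FAIL documents are polled only while at least one task has not failed.
    return doc_progress == FAILED and any(p >= 0 for p in task_progresses)
```

With this guard, a task stuck at -1 that later reports 1.0 is stored as complete, while a document whose tasks have all failed drops out of the polling set.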

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 18:02:12 +08:00
Yongteng Lei
13a34d7689 Feat: inject sys.date into canvas (#13567)
### What problem does this PR solve?

Inject sys.date into canvas.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-12 17:49:13 +08:00
Magicbook1108
eda7835d47 Fix: image pdf in ingestion pipeline (#13563)
### What problem does this PR solve?

Fix: image pdf in ingestion pipeline #13550


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 17:49:02 +08:00
NeedmeFordev
387b0b27c4 feat(parser): support external Docling server via DOCLING_SERVER_URL (#13527)
### What problem does this PR solve?

This PR adds support for parsing PDFs through an external Docling
server, so RAGFlow can connect to remote `docling serve` deployments
instead of relying only on local in-process Docling.

It addresses the feature request in
[#13426](https://github.com/infiniflow/ragflow/issues/13426) and aligns
with the external-server usage pattern already used by MinerU.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### What is changed?

- Add external Docling server support in `DoclingParser`:
  - Use `DOCLING_SERVER_URL` to enable remote parsing mode.
  - Try `POST /v1/convert/source` first, and fall back to
    `/v1alpha/convert/source`.
  - Keep existing local Docling behavior when `DOCLING_SERVER_URL` is not
    set.
- Wire Docling env settings into parser invocation paths:
  - `rag/app/naive.py`
  - `rag/flow/parser/parser.py`
- Add Docling env hints in constants and update docs:
  - `docs/guides/dataset/select_pdf_parser.md`
  - `docs/guides/agent/agent_component_reference/parser.md`
  - `docs/faq.mdx`
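
The endpoint fallback described above might look roughly like this. It is a sketch only: `convert_with_fallback`, the injected `post` callable, `fake_post`, and the host `http://docling:5001` are hypothetical names chosen for illustration, and the real request payload is deliberately left out because it is not described here.

```python
from typing import Callable, Optional, Sequence

# Endpoint order taken from the description: v1 first, v1alpha second.
ENDPOINTS = ("/v1/convert/source", "/v1alpha/convert/source")


def convert_with_fallback(
    base_url: str,
    post: Callable[[str], dict],
    endpoints: Sequence[str] = ENDPOINTS,
) -> dict:
    """Call `post` on each endpoint in order and return the first success."""
    last_error: Optional[Exception] = None
    for path in endpoints:
        url = base_url.rstrip("/") + path
        try:
            return post(url)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all endpoints failed for {base_url}") from last_error


def fake_post(url: str) -> dict:
    """Hypothetical transport that simulates a server without the v1 route."""
    if url.endswith("/v1/convert/source"):
        raise ConnectionError("404 Not Found")
    return {"url": url}


result = convert_with_fallback("http://docling:5001/", fake_post)
```

Injecting the transport keeps the fallback order testable without a running Docling server.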

### Why this approach?

This keeps the change focused on one issue and one capability (external
Docling connectivity), without introducing unrelated provider-model
plumbing.

### Validation

- Static checks:
  - `python -m py_compile` on changed Python files
  - `python -m ruff check` on changed Python files
- Functional checks:
  - Remote v1 endpoint path works
  - v1alpha fallback works
  - Local Docling path remains available when server URL is unset

### Related links

- Feature request: [Support external Docling server (issue
#13426)](https://github.com/infiniflow/ragflow/issues/13426)
- Compare view for this branch:
[main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1)

##### Fixes [#13426](https://github.com/infiniflow/ragflow/issues/13426)
2026-03-12 17:09:03 +08:00
Josh
a353c7bdd7 Fix: avoid empty doc filter in knowledge retrieval (#13484)
## Summary
Fix knowledge-base chat retrieval when no individual document IDs are
selected.

## Root Cause
`async_chat()` initialized `doc_ids` as an empty list when the request
did not explicitly select documents. That empty list was then forwarded
into retrieval as an active `doc_id` filter, effectively becoming
`doc_id IN []` and suppressing all chunk matches.

## Changes
- treat missing selected document IDs as `None` instead of `[]`
- keep explicit document filtering when IDs are actually provided
- add regression coverage for the shared chat retrieval path
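
The difference between "no filter" and "empty filter" can be illustrated with a small sketch. The helpers `normalize_doc_ids` and `chunk_matches` are hypothetical and do not mirror the actual `dialog_service` code; they only show why `None` and `[]` must be treated differently.

```python
from typing import List, Optional


def normalize_doc_ids(selected: Optional[List[str]]) -> Optional[List[str]]:
    """Map a missing or empty selection to None, meaning "no filter"."""
    return list(selected) if selected else None


def chunk_matches(chunk_doc_id: str, doc_ids: Optional[List[str]]) -> bool:
    """Apply the document filter only when one was actually provided."""
    if doc_ids is None:
        return True
    return chunk_doc_id in doc_ids
```

Passing `[]` straight into `chunk_matches` rejects every chunk, which is the `doc_id IN []` behavior described under Root Cause; normalizing first restores unfiltered retrieval.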

## Validation
- `python3 -m py_compile api/db/services/dialog_service.py
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py`
- `.venv/bin/python -m pytest
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py`
- manually verified that chat completions again inject retrieved
knowledge into the prompt

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-12 16:03:30 +08:00
cambrianlee
227c852e67 Fix typo: documnet_keyword -> document_keyword in Chunk class (#13531)
### What problem does this PR solve?
The Chunk class had a typo in the attribute name 'documnet_keyword',
which caused the document_name field to remain empty when retrieving
chunks via the SDK. This fix corrects the spelling to
'document_keyword'.

Changes:
- Line 36: Changed self.documnet_keyword to self.document_keyword
- Line 52: Updated backward compatibility code to use
self.document_keyword


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 15:23:55 +08:00
Jin Hai
e78938c72c Update go admin server default port to 9383 (#13559)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 13:41:08 +08:00
Jimmy Ben Klieve
31a8184f63 refactor(ui): update ui for user settings, etc. (#13532)
### What problem does this PR solve?

Update UI styles:
- **User settings**
- Component styles: 
   - `ui/button.tsx`
   - `ui/checkbox.tsx`
   - `avatar-upload.tsx`
   - `file-uploader.tsx`
   - `icon-font.tsx`

### Type of change

- [x] Refactoring
2026-03-12 13:33:36 +08:00
chanx
0da9c4618d feat(cli): Enhance CLI functionality and add administrator mode support (#13539)
### What problem does this PR solve?

feat(cli): Enhance CLI functionality and add administrator mode support

- Modify `parseActivateUser` in `parser.go` to support 'on'/'off' states
- Add administrator mode switching and host port settings functionality
to `cli.go`
- Implement user management API calls in `client.go`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-12 13:33:13 +08:00
chanx
4bd5bb141d Fix: data-source-detail page style (#13507)
### What problem does this PR solve?

Fix: data-source-detail page style

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 13:32:39 +08:00
Jin Hai
5cbdfc5f17 Fix Gitee embedding model URL error (#13553)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 13:13:06 +08:00
Yongteng Lei
375a910bcf Fix: add deadlock retry (#13552)
### What problem does this PR solve?

Add deadlock retry.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 12:39:01 +08:00
Jin Hai
90afce192c Add license and fingerprint API hook (#13548)
### What problem does this PR solve?

For EE

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 11:52:39 +08:00
Jin Hai
2fb1360d9d Add command line parameter and fix error message (#13526)
### What problem does this PR solve?

`./server_main -p 9380`

`./server_main -h`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 09:50:57 +08:00
Yongteng Lei
e1b632a7bb Feat: add delete all support for delete operations (#13530)
### What problem does this PR solve?

Add delete all support for delete operations.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-12 09:47:42 +08:00
qinling0210
d201a81db7 Add command history in ragflow cli (#13538)
### What problem does this PR solve?

In the RAGFlow CLI, use the Up/Down arrow keys to navigate command history.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-11 19:14:18 +08:00
Liu An
852393c114 Test: Lower priority of chat assistant and chunk list API tests (#13540)
### What problem does this PR solve?

Mark test cases as lower priority (p3) for:
- Creating chat assistants
- Deleting chat assistants
- Listing chat assistants
- Listing chunks within datasets

### Type of change

- [x] Update testcases
2026-03-11 19:00:18 +08:00
foyou
f75dc6a452 Docs: Fix normalization of case and some code blocks (#13520)
### What problem does this PR solve?

Standardize term capitalization in `deploy_local_llm.mdx` and improve
code block formatting.

### Type of change

- [x] Documentation Update
2026-03-11 17:51:13 +08:00
Ethan T.
1cee8b1a7b fix: use context managers for file handles to prevent resource leaks (#13514)
## Summary
- Convert bare `open()` calls to `with` context managers or
`Path.read_text()`
- File handles leak if not properly closed, especially on exceptions
- Fixes in crypt.py, sequence2txt_model.py, term_weight.py,
deepdoc/vision/__init__.py
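
For reference, the two replacement patterns named above look like this in isolation. The function names and the temporary file are illustrative only and are not taken from the changed modules.

```python
import os
import tempfile
from pathlib import Path


def read_with_context_manager(path: str) -> str:
    """The handle is closed when the block exits, even if read() raises."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def read_with_pathlib(path: str) -> str:
    """Path.read_text opens, reads, and closes the file in one call."""
    return Path(path).read_text(encoding="utf-8")


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    Path(sample).write_text("hello", encoding="utf-8")
    via_with = read_with_context_manager(sample)
    via_pathlib = read_with_pathlib(sample)
```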

## Test plan
- [x] File operations work correctly with context managers
- [x] Resources properly cleaned up on exceptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:47:06 +08:00
Attili-sys
6afd13ff29 Feat/arabic language support (#13516)
### What problem does this PR solve?

This PR implements comprehensive Arabic language support for the RAGFlow
application. The changes include:
- Complete Arabic translation of all UI text elements in the web
interface
- RTL (right-to-left) layout support for Arabic content
- Localization updates for all supported languages (ar, bg, de, en, es,
fr, id, it, ja, pt-br, ru, vi, zh-traditional, zh)
- UI component adjustments to properly display Arabic text and support
RTL layout

The implementation ensures that Arabic-speaking users can fully interact
with the application in their native language with proper text rendering
and layout direction.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

<img width="2866" height="1617" alt="image"
src="https://github.com/user-attachments/assets/f2751b34-1b65-4867-b81d-a1068c17b9b7"
/>

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-11 15:06:07 +08:00
chanx
9ca2bac984 Feat: Implement user creation, deletion, and permission management functionality. (#13519)
### What problem does this PR solve?

Feat: Implement user creation, deletion, and permission management
functionality.

- Added the `ListByEmail` method to `user.go` to query users by email
address.

- Updated the user activation status handling logic in `handler.go`,
adding input validation.

- Added RSA password decryption functionality to `password.go`.

- Implemented complete user management functionality in `service.go`,
including user creation, deletion, password modification, activation
status, and permission management.

- Added input validation and error handling logic.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-11 14:04:00 +08:00
Jin Hai
2028e895fd Add license and time record DAO (#13522)
### What problem does this PR solve?

1. Change go server default port to 9382
2. Compatible with EE data model.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-11 14:02:24 +08:00
qinling0210
1815f5950b Call get_flatted_meta_by_kbs in dify retrieval (#13509)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

Call get_flatted_meta_by_kbs in dify retrieval. Remove get_meta_by_kbs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-11 13:42:24 +08:00
Josh
2d2d3cdbcf Fix document metadata loading for paged listings (#13515)
## Summary
- scope normal document-list metadata lookups to the current page's
document IDs
- keep the `return_empty_metadata=True` path dataset-wide because it
needs full knowledge of docs that already have metadata
- add unit tests for both paged listing paths and the unchanged
empty-metadata behavior

## Why
`DocumentService.get_list()` and the normal `get_by_kb_id()` path were
calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`,
which loads metadata for the entire dataset on every page request.

That becomes especially problematic on large datasets. The metadata scan
path paginates through the full metadata index without an explicit sort,
while the ES helper only switches to `search_after` beyond `10000`
results when a sort is present. In practice this can lead to unnecessary
full-dataset metadata work, slower document-list loading, and unreliable
`meta_fields` in list responses for large KBs.

This change keeps the existing empty-metadata filter behavior intact,
but scopes normal list responses to metadata for the current page only.
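
A rough sketch of the page-scoped lookup, under stated assumptions: `page_doc_ids`, `metadata_for_page`, `toy_lookup`, and the in-memory `STORE` are hypothetical stand-ins, not the `DocumentService` or `DocMetadataService` APIs.

```python
from typing import Dict, List


def page_doc_ids(doc_ids: List[str], page: int, page_size: int) -> List[str]:
    """Return the document IDs shown on a 1-based page."""
    start = (page - 1) * page_size
    return doc_ids[start:start + page_size]


def metadata_for_page(lookup, doc_ids: List[str]) -> Dict[str, dict]:
    """Ask the metadata store only about the given IDs.

    `lookup` is a hypothetical callable standing in for the metadata service.
    """
    return lookup(doc_ids) if doc_ids else {}


# Toy in-memory "index" used only for this illustration.
STORE = {"d1": {"author": "a"}, "d3": {"author": "c"}}


def toy_lookup(ids: List[str]) -> Dict[str, dict]:
    return {i: STORE[i] for i in ids if i in STORE}


ids = page_doc_ids(["d1", "d2", "d3", "d4"], page=2, page_size=2)
page_meta = metadata_for_page(toy_lookup, ids)
```

The cost of each page request then depends on the page size rather than on the size of the dataset.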
2026-03-11 13:42:16 +08:00
Jimmy Ben Klieve
507ba4ea20 refactor(ui): update knowledge graph, chunk, metadata, agent log styles (#13518)
### What problem does this PR solve?

Update UI styles:
- **Dataset** > **Knowledge graph** tooltip
- **Dataset** > **Files** > **Manage metadata** modal
- **Dataset** > **Files** > **Modify Chunking Method** > **Auto
metadata** > **Manage generation settings** modal
- **Agent** > **Canvas (Ingestion pipeline)** > **Dataflow result**

### Type of change

- [x] Refactoring
2026-03-11 11:27:20 +08:00
Jin Hai
2133fd76a8 Add auth middleware (#13506)
### What problem does this PR solve?

Use auth middleware to check authorization.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-11 11:23:13 +08:00
eviaaaaa
d0ca388bec Refa: implement unified lazy image loading for Docx parsers (qa/manual) (#13329)
## Summary
This PR is the direct successor to the previous `docx` lazy-loading
implementation. It addresses the technical debt intentionally left out
in the last PR by fully migrating the `qa` and `manual` parsing
strategies to the new lazy-loading model.

Additionally, this PR comprehensively refactors the underlying `docx`
parsing pipeline to eliminate significant code redundancy and introduces
robust fallback mechanisms to handle completely corrupted image streams
safely.


## What's Changed

* **Centralized Abstraction (`docx_parser.py`)**: Moved the
`get_picture` extraction logic up to the `RAGFlowDocxParser` base class.
Previously, `naive`, `qa`, and `manual` parsers maintained separate,
redundant copies of this method. All downstream strategies now natively
gather raw blobs and return `LazyDocxImage` objects automatically.
* **Robust Corrupted Image Fallback (`docx_parser.py`)**: Handled edge
cases where `python-docx` encounters critically malformed magic headers.
Implemented an explicit `try-except` structure that safely intercepts
`UnrecognizedImageError` (and similar exceptions) and seamlessly falls
back to retrieving the raw binary via `getattr(related_part, "blob",
None)`, preventing parser crashes on damaged documents.

* **Legacy Code & Redundancy Purge**:
  * Removed the duplicate `get_picture` methods from `naive.py`, `qa.py`,
    and `manual.py`.
  * Removed the standalone, immediate-decoding `concat_img` method in
    `manual.py`. It has been completely replaced by the globally unified,
    lazy-loading-compatible `rag.nlp.concat_img`.
  * Cleaned up unused legacy imports (e.g., `PIL.Image`, docx exception
    packages) across all updated strategy files.
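
The corrupted-image fallback can be sketched as below. This is an illustration under assumptions, not the code in `docx_parser.py`: `UnrecognizedImageError` is redefined locally so the example runs without `python-docx`, and `get_image_blob` and `BrokenPart` are hypothetical names.

```python
from types import SimpleNamespace


class UnrecognizedImageError(Exception):
    """Local stand-in so the sketch runs without python-docx installed."""


def get_image_blob(related_part):
    """Return image bytes, falling back to the part's raw blob on failure."""
    try:
        return related_part.image.blob
    except UnrecognizedImageError:
        # Malformed header: skip decoding and use the raw binary if present.
        return getattr(related_part, "blob", None)


class BrokenPart:
    """Test double whose `image` property fails like a corrupted stream."""

    def __init__(self, blob=None):
        if blob is not None:
            self.blob = blob

    @property
    def image(self):
        raise UnrecognizedImageError("bad magic header")


good = SimpleNamespace(image=SimpleNamespace(blob=b"decoded"))
```

Returning `None` when no raw blob exists lets the caller skip the image instead of aborting the whole document.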

## Scope
To keep this PR focused, I have restricted these changes strictly to the
unification of `docx` extraction logic and the lazy-load migration of
`qa` and `manual`.

## Validation & Testing
I've tested this to ensure no regressions and validated the fallback
logic:

* **Output Consistency**: Compared identical `.docx` inputs using `qa`
and `manual` strategies before and after this branch: chunk counts,
extracted text, table HTML, and attached images match perfectly.
* **Memory Footprint Drop**: Confirmed a noticeable drop in peak memory
usage when processing image-dense documents through the `qa` and
`manual` pipelines, bringing them up to parity with the `naive`
strategy's performance gains.

## Breaking Changes
* None.
2026-03-11 10:00:07 +08:00
balibabu
d36e3c97d1 Feat: Add a user_id field to the message and retrieval operators. (#13508)
### What problem does this PR solve?

Feat: Add a user_id field to the message and retrieval operators.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-10 22:18:27 +08:00
Yongteng Lei
3c80a0ae09 Fix: support vLLM's new reasoning field (#13493)
### What problem does this PR solve?

Support vLLM's new reasoning field

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 21:13:14 +08:00
yzy
07c9cf6cbe Fix: return structured JSON output for non-streaming agent API (#13389)
### What problem does this PR solve?

Previously, when an Agent component was configured with structured
output, the non-streaming /agents/{agent_id}/completions API never
returned the structured field in its response.

The root cause: the non-streaming code path only collected message
events to build full_content, then returned the workflow_finished
payload — which only contains the output of the last component in the
execution path (typically a Message component).
Any structured output set by upstream components (e.g., Agent or LLM)
was silently discarded.

This PR fixes the non-streaming handler to iterate node_finished events
and collect structured output from intermediate components.
If any component produced a non-empty structured value, it is included
in the final response under data.structured. The streaming path is
unaffected, as it already exposes node_finished events to the caller.
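
A minimal sketch of the collection step follows. The event layout used here (`event`, `data`, `component_id`, `outputs`) is an assumption made for the example and may differ from the actual payloads; `collect_structured` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def collect_structured(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Gather non-empty structured outputs from node_finished events."""
    structured: Dict[str, Any] = {}
    for event in events:
        if event.get("event") != "node_finished":
            continue
        data = event.get("data", {})
        value = data.get("outputs", {}).get("structured")
        if value:  # skip missing or empty structured values
            structured[data.get("component_id", "unknown")] = value
    return structured


# Assumed event stream: one Agent node with structured output, one without.
events = [
    {"event": "message", "data": {"content": "hi"}},
    {"event": "node_finished",
     "data": {"component_id": "Agent:0", "outputs": {"structured": {"score": 1}}}},
    {"event": "node_finished",
     "data": {"component_id": "Message:0", "outputs": {"structured": {}}}},
    {"event": "workflow_finished", "data": {"outputs": {"content": "hi"}}},
]
```

Only the upstream Agent node contributes here; the final Message node has an empty structured value and is ignored.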

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 19:22:04 +08:00
Heyang Wang
08f83ff331 Feat: Support get aggregated parsing status to dataset via the API (#13481)
### What problem does this PR solve?

Support getting the aggregated parsing status of a dataset via the API

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
2026-03-10 18:05:45 +08:00
Liu An
68a623154a Fix: bin directory cannot be copied to docker image introduced by #13444 (#13502)
### What problem does this PR solve?

bin directory cannot be copied to docker image introduced by

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:31:20 +08:00
chanx
f14b53c764 feat(admin): Implemented default administrator initialization and login functionality. (#13504)
### What problem does this PR solve?

feat(admin): Implemented default administrator initialization and login
functionality.

Added support for default administrator configuration, including super
user nickname, email, and password.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:30:21 +08:00
balibabu
81461b4505 Fix: The number of deleted session prompts is displayed incorrectly. #13499 (#13500)
### What problem does this PR solve?

Fix: The number of deleted session prompts is displayed incorrectly.
#13499
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 16:01:31 +08:00
Magicbook1108
675810e0cf Refact: optimize confluence performance (#13497)
### What problem does this PR solve?

Refact: optimize confluence performance #13494

### Type of change

- [x] Refactoring
2026-03-10 15:02:24 +08:00
Alexander Vostres
9ba43ae4ee Fix "Coordinate lower is less than upper" error with MinerU (#13483)
### What problem does this PR solve?

Fixes #6004 #7142 #11959

Unlike #9207 we actually normalize the coordinates here
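
One plausible form of such normalization, shown only as an assumption about the general technique and not as the code in this PR, is to reorder each coordinate pair before cropping. `normalize_box` is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, upper, right, lower)


def normalize_box(left: float, upper: float, right: float, lower: float) -> Box:
    """Reorder coordinates so that left <= right and upper <= lower."""
    x0, x1 = sorted((left, right))
    y0, y1 = sorted((upper, lower))
    return (x0, y0, x1, y1)
```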

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 15:02:01 +08:00
balibabu
aaf900cf16 Feat: Display release status in agent version history. (#13479)
### What problem does this PR solve?
Feat: Display release status in agent version history.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-03-10 14:25:27 +08:00
Idriss Sbaaoui
249b78561b Fix mismatch docnm_kwd in raptor chunks (#13451)
### What problem does this PR solve?

issue #13393 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 14:24:33 +08:00
qinling0210
185ab0d4ef Fix delete_document_metadata (#13496)
### What problem does this PR solve?

Avoid getting doc in function delete_document_metadata as the doc might
have been removed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:24 +08:00
Magicbook1108
7143954b48 Fix: chats_openai in none stream condition (#13495)
### What problem does this PR solve?

Fix: chats_openai in none stream condition #13453

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:17 +08:00
qinling0210
7c92f51133 Fix retrieval function when metadata_condition is specified in retrieval API (#13473)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

The following command returns an empty result even though a document with the matching metadata exists:
```
curl --request POST \
     --url http://localhost:9222/api/v1/retrieval \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \
     --data '{
          "question": "any question",
          "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"],
           "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "character",
                "comparison_operator": "is",
                "value": "刘备"
              }
            ]
          }
     }'
```

When `metadata_condition` is specified in the retrieval API, it is
converted to `doc_ids`, and `doc_ids` is passed to the retrieval function.
In the retrieval function, when `doc_ids` is explicitly provided, we should
bypass the threshold.
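
The intended behavior can be sketched as follows. `select_chunks` and the chunk fields `doc_id` and `similarity` are hypothetical names used for illustration, not the real retrieval function.

```python
from typing import Dict, List, Optional


def select_chunks(
    chunks: List[Dict],
    threshold: float,
    doc_ids: Optional[List[str]] = None,
) -> List[Dict]:
    """Filter by explicit doc_ids if given; otherwise by similarity."""
    if doc_ids:
        allowed = set(doc_ids)
        # Explicit document selection: the similarity threshold is bypassed.
        return [c for c in chunks if c["doc_id"] in allowed]
    return [c for c in chunks if c["similarity"] >= threshold]


chunks = [
    {"doc_id": "d1", "similarity": 0.05},
    {"doc_id": "d2", "similarity": 0.90},
]
```

A low-scoring chunk from an explicitly selected document is kept, so a metadata match no longer yields an empty result just because its similarity falls under the threshold.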


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 11:57:32 +08:00
tunsuy
292a1a8566 fix: detect and fallback garbled PDF text to OCR (#13366) (#13404)
## Problem

When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer)
cannot map CIDs to correct Unicode characters, outputting PUA characters
(U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully
trusted pdfplumber text without any garbled detection, causing garbled
output in the final parsed result.

Relates to #13366

## Solution

### 1. Garbled text detection functions
- `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16),
replacement character U+FFFD, control characters, and
unassigned/surrogate codepoints
- `_is_garbled_text(text, threshold)`: Calculates garbled ratio and
detects `(cid:xxx)` patterns

### 2. Box-level fallback (in `__ocr()`)
When a text box has ≥50% garbled characters, discard pdfplumber text and
fallback to OCR recognition.

### 3. Page-level detection (in `__images__()`)
Sample characters from each page; if garbled rate ≥30%, clear all
pdfplumber characters for that page, forcing full OCR.

### 4. Layout recognizer CID filtering
Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text
processing to prevent them from polluting layout analysis.
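
The detection logic in section 1 might be sketched like this. It is an approximation under assumptions: the helper names drop the leading underscore, the exemption of tab, newline, and carriage return from the control-character check is a choice made for this example, and whitespace is excluded from the ratio.

```python
import re
import unicodedata

CID_PATTERN = re.compile(r"\(cid:\d+\)")


def is_garbled_char(ch: str) -> bool:
    """True for PUA, replacement, control, unassigned, or surrogate chars."""
    cp = ord(ch)
    # Private Use Areas: BMP, Plane 15, and Plane 16.
    if (0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD):
        return True
    if cp == 0xFFFD:  # replacement character
        return True
    category = unicodedata.category(ch)
    if category == "Cc" and ch not in "\t\n\r":  # assumed whitespace exemption
        return True
    return category in ("Cn", "Cs")  # unassigned or surrogate


def is_garbled_text(text: str, threshold: float = 0.5) -> bool:
    """True if a (cid:N) placeholder appears or the garbled ratio is high."""
    if not text.strip():
        return False
    if CID_PATTERN.search(text):
        return True
    chars = [c for c in text if not c.isspace()]
    garbled = sum(is_garbled_char(c) for c in chars)
    return garbled / len(chars) >= threshold
```

In this sketch the box-level check would use the default threshold of 0.5, and the page-level check would call `is_garbled_text(sample, threshold=0.3)` on the sampled characters.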

## Testing
- 29 unit tests covering: normal CJK/English text, PUA characters, CID
patterns, mixed text, boundary thresholds, edge cases
- All 85 existing project unit tests pass without regression
2026-03-10 11:20:31 +08:00
Jin Hai
7f6a9e8ee9 Update ext field type of heartbeat message (#13490)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:49:39 +08:00
chanx
02108772d8 refactor: Moves the LLM factory initialization logic to the dao package. (#13476)
### What problem does this PR solve?

refactor: Moves the LLM factory initialization logic to the `dao`
package.

Removes the `init_data` package and integrates the LLM factory
initialization functionality into the `dao` package.
Adds a `utility` package to provide general utility functions.
Updates `server_main.go` to use the new initialization path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:35:55 +08:00
atian8179
88a40b95a2 fix: include missing modules in ragflow-cli PyPI package (#13457)
## Problem

The `ragflow-cli` PyPI package (v0.24.0) is missing `http_client.py`,
`ragflow_client.py`, and `user.py`, causing import errors when installed
from PyPI.

## Root Cause

`pyproject.toml` only lists `ragflow_cli` and `parser` in
`[tool.setuptools] py-modules`.

## Fix

Add the three missing modules to `py-modules`.

Fixes #13456

Co-authored-by: atian8179 <atian8179@users.noreply.github.com>
2026-03-10 10:02:21 +08:00
Jin Hai
4fe706876c Service list and minio status (#13480)
### What problem does this PR solve?

1. Prevent standard users from accessing the admin service
2. Get RAGFlow service status
3. Fix minio status fetching

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 09:56:43 +08:00
writinwaters
4f507c0058 Docs: Updated Switch chunk availability (#13482)
### What problem does this PR solve?

A quick editorial pass.

### Type of change

- [x] Documentation Update
2026-03-09 21:14:45 +08:00
Yongteng Lei
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
Jin Hai
52bcd98d29 Add scheduled tasks (#13470)
### What problem does this PR solve?

1. RAGFlow server will send heartbeat periodically.
2. This PR includes:
   - Scheduled task
   - API server message sending
   - Admin server API to receive the message.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 17:48:29 +08:00
Jin Hai
c732a1c8e0 Refactor the go_binding to binding (#13469)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 15:52:31 +08:00
chanx
25ace613b0 feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472)
### What problem does this PR solve?

feat: Added LLM factory initialization functionality and knowledge base
related API interfaces

refactor(dao): Refactored the TenantLLMDAO query method
feat(handler): Implemented knowledge base related API endpoints
feat(service): Added LLM API key setting functionality
feat(model): Extended the knowledge base model definition
feat(config): Added default user LLM configuration

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 15:52:14 +08:00
Stephen Hu
d0465ba909 refactor: improve paddle ocr logic (#13467)
### What problem does this PR solve?

improve paddle ocr logic

### Type of change
- [x] Refactoring
2026-03-09 14:16:57 +08:00
天海蒼灆
3ce236c4e3 Feat: add switch_chunks endpoint to manage chunk availability (#13435)
### What problem does this commit solve?

This commit introduces a new API endpoint
`/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that
allows users to switch the availability status of specified chunks in a
document, in the same way as chunk_app.py

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 12:36:45 +08:00
guptas6est
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves both vulnerabilities.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
JiangNan
2634cfc06f Fix: undefined variable and wrong method name in agent components (#13462)
## Summary

This PR fixes two runtime bugs in agent components:

**Bug 1: `agent/component/invoke.py` — `NameError` in POST +
`clean_html` path**

The POST method's `clean_html` branch uses the variable `sections`
without ever defining it. Both the GET and PUT branches correctly call
`sections = HtmlParser()(None, response.content)` before referencing
`sections`, but this line was missing from the POST branch (copy-paste
omission). This causes a `NameError` whenever a user configures an
Invoke component with `method="post"` and `clean_html=True`.

**Bug 2: `agent/component/data_operations.py` — `AttributeError` in
`_recursive_eval`**

The `_recursive_eval` method recursively calls `self.recursive_eval()`
(without the leading underscore) instead of `self._recursive_eval()`.
Since the method is defined as `_recursive_eval`, this causes an
`AttributeError` at runtime when the `literal_eval` operation processes
nested dicts or lists.
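
The second bug is easiest to see in a stripped-down version of the recursion. The class below is a hypothetical stand-in for the component, not the code in `data_operations.py`; the point is that every recursive call uses the same underscored name as the definition.

```python
import ast


class DataOps:
    """Minimal stand-in showing a self-consistent recursive evaluator."""

    def _recursive_eval(self, value):
        if isinstance(value, dict):
            # Recurse with the same (underscored) method name.
            return {k: self._recursive_eval(v) for k, v in value.items()}
        if isinstance(value, list):
            return [self._recursive_eval(v) for v in value]
        if isinstance(value, str):
            try:
                return ast.literal_eval(value)
            except (ValueError, SyntaxError, TypeError):
                return value  # leave non-literal strings untouched
        return value


evaluated = DataOps()._recursive_eval(
    {"a": "[1, 2]", "b": ["{'x': 1}", "plain text"]}
)
```

Calling `self.recursive_eval(...)` inside this class would raise `AttributeError` on the first nested dict or list, which is the failure described above.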

## Test plan

- [ ] Configure an Invoke node with `method=post` and `clean_html=True`,
verify HTML is parsed correctly without `NameError`
- [ ] Configure a DataOperations node with `operations=literal_eval` on
nested data, verify no `AttributeError`

---------

Signed-off-by: JiangNan <1394485448@qq.com>
2026-03-09 11:09:47 +08:00
Jin Hai
610c1b507d Add more API of admin server of go (#13403)
### What problem does this PR solve?

Add APIs to admin server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 10:44:53 +08:00
Eden
ab6ca75245 fix(agent): ensure database connections are properly closed in ExeSQL tool (#13427)
## Summary

Fix a database connection and cursor resource leak in the ExeSQL agent
tool.

When SQL execution raises an exception (for example, a syntax error or a
missing table), the existing code path skips `cursor.close()` and
`db.close()`, causing database connections to accumulate over time.

This can eventually lead to connection exhaustion in long-running agent
workflows.
## Root Cause

The cleanup logic for database cursors and connections is placed after
the SQL execution loop without `try/finally` protection. If an exception
occurs during `cursor.execute()`, `fetchmany()`, or result processing,
the cleanup code is not reached and the connection remains open.

The same issue also exists in the IBM DB2 execution path, where
`ibm_db.close(conn)` may be skipped when exceptions occur.

## Fix

- Wrap SQL execution logic in `try/finally` blocks to guarantee resource
cleanup.
- Ensure `cursor.close()` and `db.close()` are always executed.
- Add explicit `db.close()` when `db.cursor()` creation fails.
- Remove redundant close calls in early-return branches since `finally`
now handles cleanup.
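
The cleanup pattern described above can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not the code in `agent/tools/exesql.py`: the function name `run_statements` and its parameters are hypothetical.

```python
import sqlite3


def run_statements(db_path, statements, max_rows=100):
    """Run each statement; always release the cursor and the connection."""
    db = sqlite3.connect(db_path)
    try:
        cursor = db.cursor()
    except Exception:
        db.close()  # cursor creation failed: close the connection explicitly
        raise
    try:
        results = []
        for sql in statements:
            cursor.execute(sql)
            results.append(cursor.fetchmany(max_rows))
        return results
    finally:
        # Runs on success, on error, and on early return.
        cursor.close()
        db.close()
```

Because the `finally` block covers every exit path, early-return branches no longer need their own close calls.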

## Impact

- No change to normal execution behavior.
- Ensures database resources are always released when errors occur.
- Prevents connection leaks in long-running workflows.
- Only affects `agent/tools/exesql.py`.

## Testing

Manual test scenarios:

1. Valid SQL execution
2. SQL syntax error
3. Query against a non-existing table
4. Execution cancellation during query

In all scenarios the database cursor and connection are properly closed.

Code quality checks:

- `ruff check` passed
- No new warnings introduced
2026-03-09 10:36:02 +08:00
Liu An
89e495e1bc Chore: update release workflow configuration (#13466)
### What problem does this PR solve?

update release workflow configuration

### Type of change

- [x] Update CI
2026-03-09 10:32:51 +08:00
Heyang Wang
c217b8f3d8 Feat: add DingTalk AI Table connector and integration for data synch… (#13413)
### What problem does this PR solve?

Add DingTalk AI Table connector and integration for data synchronization

Issue #13400

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: wangheyang <wangheyang@corp.netease.com>
2026-03-06 21:13:23 +08:00
Jimmy Ben Klieve
094eae3cf5 refactor(ui): adjust dataset page styles (#13452)
### What problem does this PR solve?

- Adjust UI styles in **Dataset** pages.
- Adjust several shared components styles
- Modify files and directory structure in `src/layouts`

### Type of change

- [x] Refactoring
2026-03-06 21:13:14 +08:00
Liu An
7166a7e50e Test: adjust test priority markers for API tests (#13450)
### What problem does this PR solve?

Changed test priority markers from p1/p2 to p3 in three test files:
- test_table_parser_dataset_chat.py: Adjusted priority for table parser
dataset chat test
- test_delete_chunks.py: Updated priority for chunk deletion test with
invalid IDs
- test_retrieval_chunks.py: Modified priority for chunks retrieval
pagination test

These changes demote the priority of specific test cases to p3,
indicating they are lower priority tests that can run later in the test
suite execution.

### Type of change

- [x] Test update
2026-03-06 20:17:39 +08:00
chanx
ae4645e01b Fix: Add folder upload #9743 (#13448)
### What problem does this PR solve?

Fix: Add folder upload  #9743

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 20:17:29 +08:00
balibabu
82a616589b Feat: Add PublishConfirmDialog (#13447)
### What problem does this PR solve?

Feat: Add PublishConfirmDialog

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:21 +08:00
Achieve3318
45cf24cd2f feat(memory): implement get_highlight for OceanBase memory (#13449)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:11 +08:00
Jin Hai
01a100bb29 Fix data models (#13444)
### What problem does this PR solve?

Since the database model was updated in the Python version, the Go
server also needs to be updated.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 20:05:10 +08:00
OliverW
3ed91345aa fix(auth): return HTTP 401 for token-auth failures (#13420)
Follow-up to #12488 #13386

### What problem does this PR solve?

Previously, token authentication failures returned HTTP 200 with an
error code in the response body.

This PR updates `token_required` to raise `Unauthorized` and relies on
the global error handler to return a structured JSON response with HTTP
401 status.

The response body structure (`code`, `message`, `data`) remains
unchanged to preserve compatibility with the official SDK.

Frontend logic has been updated to handle HTTP 401 responses in addition
to checking `data.code`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 18:18:14 +08:00
Yongteng Lei
51be1f1442 Refa: empty ids means no-op operation (#13439)
### What problem does this PR solve?

Empty ids means no-op operation.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-06 18:16:42 +08:00
Zhichang Yu
7781c51a21 Revert aliyun registry to registry.cn-hangzhou.aliyuncs.com (#13445)
## Summary
- Revert aliyun registry from
`infiniflow-registry.cn-shanghai.cr.aliyuncs.com` back to
`registry.cn-hangzhou.aliyuncs.com`

## Test plan
- [ ] Verify the docker/.env file contains the correct registry URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:03:35 +08:00
Magicbook1108
826af383b4 Fix: paddle ocr missing outlines (#13441)
### What problem does this PR solve?

Fix: paddle ocr missing outlines #13422

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 17:19:51 +08:00
Jin Hai
2504c3adde Fix docker file (#13438)
### What problem does this PR solve?

Copy `infinity/resource` into the Docker images.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 16:56:12 +08:00
chanx
81fd1811b8 Feat:Using Go to implement user registration logic (#13431)
### What problem does this PR solve?

Feat:Using Go to implement user registration logic

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 16:42:49 +08:00
Achieve3318
37eb533fea Feat(memory): implement get_aggregation for OceanBase memory (#13428)
### What problem does this PR solve?

- Add aggregation_utils.aggregate_by_field for pure aggregation logic
- Wire OBConnection.get_aggregation to use it (unwrap tuple, pass
messages)
- Add unit tests for aggregate_by_field (no DB/heavy deps)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 12:51:22 +08:00
BitToby
383986dc5f fix: re-chunk documents when data source content is updated (#12918)
Closes: #12889 

### What problem does this PR solve?

When syncing external data sources (e.g., Jira, Confluence, Google
Drive), updated documents were not being re-chunked. The raw content was
correctly updated in blob storage, but the vector database retained
stale chunks, causing search results to return outdated information.

**Root cause:** The task digest used for chunk reuse optimization was
calculated only from parser configuration fields (`parser_id`,
`parser_config`, `kb_id`, etc.), without any content-dependent fields.
When a document's content changed but the parser configuration remained
the same, the system incorrectly reused old chunks instead of
regenerating new ones.

**Example scenario:**
1. User syncs a Jira issue: "Meeting scheduled for Monday"
2. User updates the Jira issue to: "Meeting rescheduled to Friday"
3. User triggers sync again
4. Raw content panel shows updated text ✓
5. Chunk panel still shows old text "Monday" ✗

**Solution:**
1. Include `update_time` and `size` in the chunking config, so the task
digest changes when document content is updated
2. Track updated documents separately in `upload_document()` and return
them for processing
3. Process updated documents through the re-parsing pipeline to
regenerate chunks
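
Why adding content-dependent fields changes the digest can be sketched as below. This is a minimal illustration: the function name `task_digest`, the use of SHA-256 over sorted JSON, and the sample values are assumptions, not the project's actual digest code.

```python
import hashlib
import json


def task_digest(chunking_config: dict) -> str:
    """Hash the config deterministically; any changed field changes the digest."""
    payload = json.dumps(chunking_config, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"parser_id": "naive", "parser_config": {"chunk_token_num": 512}, "kb_id": "kb1"}

# Without content-dependent fields, an edited document hashes identically,
# so old chunks would be reused.
same_config_digest = task_digest(dict(base))

# With update_time and size included, an edit yields a new digest.
before = task_digest({**base, "update_time": 1000, "size": 29})
after = task_digest({**base, "update_time": 2000, "size": 30})
```

Here `before` and `after` differ, so the chunk-reuse check no longer matches the stale task.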


[1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 12:48:47 +08:00
Lynn
0214257886 Fix: init func (#13430)
### What problem does this PR solve?

Fix update_cnt add error in init_data.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 11:42:31 +08:00
balibabu
6849d35bf5 Feat: Optimize the style of the chat page. (#13429)
### What problem does this PR solve?

Feat: Optimize the style of the chat page.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 11:42:25 +08:00
Jonah Hartmann
6023eb27ac feat: add Ragcon provider (#13425)
### What problem does this PR solve?

This PR extends the list of available providers by adding a new provider,
"RAGcon", within the Ollama modal. It provides all model types except OCR
via OpenAI-compatible endpoints.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
2026-03-06 09:37:27 +08:00
guptas6est
c35b210c3a fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424)
### What problem does this PR solve?

This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox
component by upgrading the requests library from version 2.32.3 to
2.32.5. The vulnerability allows .netrc credentials to leak via
malicious URLs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:18 +08:00
guptas6est
aa57bcf92a fix: upgrade urllib3 to 2.6.3 to resolve CVE-2025-66418, CVE-2025-66471, CVE-2026-21441 (#13423)
### What problem does this PR solve?

This PR remediates three HIGH severity vulnerabilities in urllib3
affecting the admin client and Python SDK:
- **CVE-2025-66418**: Unbounded decompression chain leads to resource
exhaustion
- **CVE-2025-66471**: Streaming API improperly handles highly compressed
data
- **CVE-2026-21441**: Decompression-bomb safeguard bypass when following
HTTP redirects
Trivy security scan identified urllib3 v2.5.0 as vulnerable in both
`admin/client/uv.lock` and `sdk/python/uv.lock`. This PR updates urllib3
to v2.6.3 to eliminate these security risks.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:10 +08:00
Jimmy Ben Klieve
ef4cbe72a3 refactor(ui): adjust global navigation bar style (#13419)
### What problem does this PR solve?

Renovate the global navigation bar and align its styles with the design.
(This may cause minor layout issues in sub-pages; they will be checked and fixed soon.)

### Type of change

- [x] Refactoring
2026-03-05 20:47:29 +08:00
leonardlin
9e0e128ce5 Add checksum/values annotation to ragflow.yaml (#13409)
Add checksum annotation for values in ragflow.yaml

### What problem does this PR solve?

This PR is about this ticket: #13408

The RAGFlow Helm chart does not include Values.yaml in the list of
watched changes.
If you update Values.yaml for an existing deployment, Helm will not
detect the change and will not update the deployment.

This PR fixes that.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 20:27:38 +08:00
writinwaters
963e31e9b5 Refact: Updated the doc structure. (#13414)
### What problem does this PR solve?

Updated the doc structure.

### Type of change


- [x] Documentation Update
2026-03-05 19:04:56 +08:00
Idriss Sbaaoui
d90d6026af Playwright : new chat multi model test (#13402)
### What problem does this PR solve?

Adds a new Playwright test for chat with multiple models and other chat
parameters.

### Type of change

- [x] Other (please describe): new test/ data-testid
2026-03-05 18:51:57 +08:00
Yongteng Lei
d9785ea2ce Fix: Alibaba cloud OSS config issue (#13406)
### What problem does this PR solve?

Alibaba Cloud OSS config issue #13390.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 18:13:45 +08:00
chanx
8b534c895e Fix: UI Placeholder and Hint Optimization (#13416)
### What problem does this PR solve?

Fix: UI Placeholder and Hint Optimization

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 18:13:19 +08:00
chanx
35fc5edc93 feat: Adds the tenant model ID field to the interface definition. (#13274)
### What problem does this PR solve?

feat: Adds the tenant model ID field to the interface definition

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-05 17:27:34 +08:00
Lynn
62cb292635 Feat/tenant model (#13072)
### What problem does this PR solve?

Add id for table tenant_llm and apply in LLMBundle.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-05 17:27:17 +08:00
Magicbook1108
47540a4147 Feat: published agent version control (#13410)
### What problem does this PR solve?

Feat: published agent version control

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-05 17:26:39 +08:00
guptas6est
8c9b080499 fix: update axios to 1.13.5+ to remediate CVE-2026-25639 DoS vulnerability (#13380)
### What problem does this PR solve?

This PR remediates CVE-2026-25639, a HIGH severity Denial of Service
vulnerability in axios caused by __proto__ pollution in the mergeConfig
function. The vulnerability affects both the web frontend and the
sandbox nodejs environment.

Trivy security scan identified axios versions below 1.13.5 as
vulnerable. This PR updates axios to secure versions (1.13.6 in web,
1.13.5 in sandbox) to eliminate the security risk.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 17:26:04 +08:00
Yongteng Lei
f13a1fb007 Refa: improve model verification ux (#13392)
### What problem does this PR solve?

Improve model verification UX. #13395 

### Type of change

- [x] Refactoring

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-05 17:23:47 +08:00
Liu An
3124fa955e chore: add bin and internal dirs to .gitignore for Go server build output (#13407)
### What problem does this PR solve?

add bin and internal dirs to .gitignore for Go server build output
2026-03-05 15:52:01 +08:00
tunsuy
e1f1184b01 test: add unit tests for graphrag/utils.py (87 test cases) (#13328)
Add comprehensive unit tests for `graphrag/utils.py`, covering 15
functions/classes with 87 test cases.

Tested functions:
- clean_str, dict_has_keys_with_types, perform_variable_replacements
- get_from_to, compute_args_hash, is_float_regex
- GraphChange dataclass
- handle_single_entity_extraction, handle_single_relationship_extraction
- graph_merge, tidy_graph
- split_string_by_multi_markers, pack_user_ass_to_openai_messages
- is_continuous_subsequence, merge_tuples, flat_uniq_list

All 327 existing + new tests pass with no regressions.
2026-03-05 15:30:43 +08:00
Jin Hai
3e3b665b89 RAGFlow admin server go version (#13394)
### What problem does this PR solve?

1. Initialize the Go admin server
2. Refactor the API server router
3. Set the benchmark CI time limit to 450s
4. Remove the Docker builder container after building

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-05 15:18:40 +08:00
Liu An
6f5bd4d2e9 feat: add bin and internal dirs to .gitignore for Go server build output (#13391)
### What problem does this PR solve?

add bin and internal dirs to .gitignore for Go server build output
2026-03-05 14:26:40 +08:00
天海蒼灆
118f737b3a Feat:Enhance chunk management by adding support for 'available', 'tag_kwd' and 'tag_feas' (#13383)
### What problem does this PR solve?

Enhance chunk management by adding support for the 'available', 'tag_kwd'
and 'tag_feas' fields in the list, add, and update chunk functions, just
as in chunk_app.py. This improves data handling and flexibility in chunk
processing.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-05 13:45:39 +08:00
orbcom-pedroferreira
61209ff3bf Feat: File uploads for future conversations on SDK API (#13378)
### What problem does this PR solve?

This PR aims to:

1. Enable file uploads for the public API, similarly to what
/document/upload_info accomplishes for the frontend;
2. Enable files sent to the /chat/:chat_id/completions endpoint to be
used within the conversation.

We classify the first item as a new feature and the second one as a
bug fix.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

*The work related to this PR was co-authored by*

[Bruno Ferreira](https://github.com/brunopferreira): Custom Solutions
Manager @ [Orbcom](https://orbcom.pt/)
[Pedro Ferreira](https://github.com/sirj0k3r): Lead Software Developer @
[Orbcom](https://orbcom.pt/)
[Pedro Cardoso](https://github.com/pedromiguel4560): Associate Software
Developer @ [Orbcom](https://orbcom.pt/)

*This PR replaces #13248*

---------

Co-authored-by: Pedro Cardoso <pedrocardoso@orbcom.pt>
Co-authored-by: Pedro Ferreira <pedroferreira@orbcom.pt>
2026-03-04 22:26:58 +08:00
tunsuy
020068dd16 Fix: preserve field boundaries in chunked documents from MySQL… (#13369)
### What problem does this PR solve?

When multiple columns are used as content columns in RDBMS connector,
the generated document text gets chunked by TxtParser which strips
newline delimiters during merge. This causes field names and values from
different columns to be concatenated without any separator, making the
content unreadable.

Changes:
- txt_parser.py: restore newline separator when merging adjacent text
segments within a chunk, so that split sections are not directly
concatenated
- rdbms_connector.py: use double newline between fields and place field
value on a new line after the field name bracket, giving TxtParser
clearer boundaries to work with
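
The field layout described for `rdbms_connector.py` can be sketched as follows. This is an illustration only: the function name `format_row`, the square-bracket style, and the skipping of `None` values are assumptions.

```python
def format_row(row: dict, content_columns: list[str]) -> str:
    """Render selected columns as text with clear field boundaries."""
    parts = []
    for column in content_columns:
        value = row.get(column)
        if value is None:
            continue  # assumption: empty fields are skipped
        # Field name in brackets, value on its own line.
        parts.append(f"[{column}]\n{value}")
    # A blank line between fields gives a text splitter clear boundaries.
    return "\n\n".join(parts)


text = format_row({"title": "Q3 plan", "body": "Ship the connector"}, ["title", "body"])
```

With this layout, each field name and its value stay together even if a parser splits the text on newlines.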

Closes #13001

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: tunsuytang <tunsuytang@tencent.com>
2026-03-04 21:42:02 +08:00
writinwaters
9deb3a6249 Refact: Fine tweaks to the doc structure. (#13379)
### What problem does this PR solve?

Fine tweaks to the doc structure.

### Type of change


- [x] Documentation Update
2026-03-04 21:30:28 +08:00
balibabu
be231faec0 Feat: Write the row and column numbers into the element's data attribute for easy code location. (#13368)
### What problem does this PR solve?

Feat: Write the row and column numbers into the element's data attribute
for easy code location.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Liu An <asiro@qq.com>
2026-03-04 20:50:58 +08:00
Idriss Sbaaoui
b3a7332c08 playwright : add data-testids for new test (#13364)
### What problem does this PR solve?

add data-testids for new test

### Type of change

- [x] Other (please describe): add data-testids for new test
2026-03-04 19:28:36 +08:00
Yao Wei
c99b53064d fix: remove company info from resume_summary to prevent over-retrieval (#13358)
### What problem does this PR solve?

Problem: When searching for a specific company name such as "Daofeng
Technology", the search would incorrectly return unrelated resumes
containing generic terms such as "Technology" in their company names.

Root Cause: The `corporation_name_tks` field was included in the
identity fields that are redundantly written to every chunk. This caused
common words like "科技" ("Technology") to match across all chunks,
leading to over-retrieval of irrelevant resumes.

Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS`
list. Company information is still preserved in the "Work Overview"
chunk where it belongs, allowing proper company-based searches while
preventing false positives from generic terms.

---------

Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-04 19:24:49 +08:00
Jin Hai
70e9743ef1 RAGFlow go API server (#13240)
# RAGFlow Go Implementation Plan 🚀

This repository tracks the progress of porting RAGFlow to Go. We'll
implement core features and provide performance comparisons between
Python and Go versions.

## Implementation Checklist

- [x] User Management APIs
- [x] Dataset Management Operations
- [x] Retrieval Test
- [x] Chat Management Operations
- [x] Infinity Go SDK

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-03-04 19:17:16 +08:00
Idriss Sbaaoui
2508c46c8f Playwright : add new test for configuration tab in datasets (#13365)
### What problem does this PR solve?

This PR adds new tests for the full configuration tab in datasets.

### Type of change

- [x] Other (please describe): new tests
2026-03-04 19:10:06 +08:00
Idriss Sbaaoui
88e8509159 benchmark fail in ci (#13377)
### What problem does this PR solve?
CI fails on Elasticsearch because of the benchmark.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 19:01:41 +08:00
Stephen Hu
c7d17c84b2 Refa:improve excel parser logic (#13372)
### What problem does this PR solve?

improve excel parser logic

### Type of change
- [x] Refactoring
2026-03-04 18:00:17 +08:00
Jin Hai
6bb00e2762 Update graspologic to gitee (#13362)
### What problem does this PR solve?

Accelerate python module downloading

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-04 17:48:47 +08:00
Good0987
8a7272f423 Test: add scenario for embedding_model update when chunk_count > 0 (#13351)
### What problem does this PR solve?

Guard embedding_model change when dataset has existing chunks. API must
return code 102 with message 'When chunk_num (N) > 0, embedding_model
must remain <current_model>' to prevent silent embedding drift.

### Type of change

- [x] Add Testcases

Co-authored-by: Liu An <asiro@qq.com>
2026-03-04 17:41:35 +08:00
Jin Hai
f47c47df99 Disable benchmark (#13370)
### What problem does this PR solve?

The benchmark always fails on the new CI machine. Please re-enable it
after the issue is fixed.

### Type of change

- [x] Other (please describe): disable benchmark

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-04 16:36:42 +08:00
yiminghub2024
5eb602166c Enhance local model deployment documentation support gpustack guide (#13339)
### Type of change

- [X] Documentation Update:Enhance local model deployment documentation
support gpustack guide
2026-03-04 13:54:20 +08:00
少卿
54ae5b4a27 Fix Dify external retrieval by providing metadata.document_id (#13337)
### What problem does this PR solve?

## Summary
Dify’s external retrieval expects `records[].metadata.document_id` to
be a non-empty string.
RAGFlow currently only sets `metadata.doc_id`, which causes Dify
validation to fail.

This PR adds `metadata.document_id` (mapped from `doc_id`) in the
Dify-compatible retrieval response.

## Changes
- Add `meta["document_id"] = c["doc_id"]` in
`api/apps/sdk/dify_retrieval.py`

## Testing
- Not run (logic-only change).
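
The mapping can be sketched as below. Only the `meta["document_id"] = c["doc_id"]` assignment comes from this PR; the function name `to_dify_record` and the `content` field are hypothetical.

```python
def to_dify_record(c: dict) -> dict:
    """Build one Dify-style record from a retrieved chunk."""
    meta = {"doc_id": c["doc_id"]}
    # Dify validates metadata.document_id, so mirror doc_id under that key.
    meta["document_id"] = c["doc_id"]
    return {"content": c.get("content", ""), "metadata": meta}


record = to_dify_record({"doc_id": "d-42", "content": "hello"})
```

Keeping `doc_id` alongside `document_id` leaves existing consumers of the old key unaffected.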

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 13:23:37 +08:00
Jin Hai
b9ad014f63 Supports login cross multiple RAGFlow servers (#13322)
### What problem does this PR solve?

1. Use Redis to store the secret key.
2. During startup, the API server reads the secret key from Redis. If no
secret key exists, it generates one and stores it in Redis atomically.
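
The read-or-create step can be sketched with an in-memory stand-in for Redis. Everything here is an assumption for illustration: `InMemoryStore`, `set_if_absent` (which mimics Redis `SET key value NX`), and the key name `ragflow:secret_key` are hypothetical.

```python
import secrets
import threading


class InMemoryStore:
    """Stand-in for Redis; set_if_absent mimics `SET key value NX`."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key, value):
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def load_or_create_secret(store, key="ragflow:secret_key"):
    existing = store.get(key)
    if existing is not None:
        return existing
    candidate = secrets.token_hex(32)
    # Only one server wins the write; every server then reads the winner's value.
    store.set_if_absent(key, candidate)
    return store.get(key)
```

Reading the key back after the conditional write, rather than returning the local candidate, is what lets servers that start at the same time agree on one secret.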

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-04 13:07:45 +08:00
balibabu
5f8966608d Fix: The dropdown menu for large models does not automatically focus on the search box. #13313 (#13360)
### What problem does this PR solve?

Fix: The dropdown menu for large models does not automatically focus on
the search box. #13313

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 12:48:35 +08:00
Magicbook1108
93d621a666 Fix: Correct PDF chunking parameter name in naive (#13357)
### What problem does this PR solve?

Fix: Correct PDF chunking parameter name in naive #13325

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 11:51:10 +08:00
balibabu
733a64f0d6 Fix: Change the background color of the message notification button. (#13344)
### What problem does this PR solve?

Fix: Change the background color of the message notification button.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 11:10:05 +08:00
statxc
839b603768 feat: Add PDF parser selection to Agent Begin and Await Response comp… (#13325)
### Issue: #12756

### What problem does this PR solve?

When users upload files through Agent's Begin or Await Response
components, the parsing is hardcoded to "Plain Text", ignoring all other
available parsers (DeepDOC, TCADP, Docling, MinerU, PaddleOCR). This PR
adds a PDF parser dropdown to these components so users can select the
appropriate parser for their file inputs.


### Changes

**Backend**
- `agent/component/fillup.py` - Added `layout_recognize` param to
`UserFillUpParam`, forwarded to `FileService.get_files()`
- `agent/component/begin.py` - Same forwarding in `Begin._invoke()`
- `agent/canvas.py` - Extract Begin's `layout_recognize` for `sys.files`
parsing, added param to `get_files_async()` / `get_files()`
- `api/db/services/file_service.py` - Added `layout_recognize` param to
`parse()` and `get_files()`, replacing hardcoded `"Plain Text"`
- `rag/app/naive.py` - Added `"plain text"` and `"tcadp parser"` aliases
to PARSERS dict to match dropdown values after `.lower()`

**Frontend**
- `web/src/pages/agent/form/begin-form/index.tsx` - Show
`LayoutRecognizeFormField` dropdown when file inputs exist
- `web/src/pages/agent/form/begin-form/schema.ts` - Added
`layout_recognize` to Zod schema
- `web/src/pages/agent/form/user-fill-up-form/index.tsx` - Same dropdown
for Await Response component


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-04 11:09:33 +08:00
Liu An
7715bad04e refactor: reorganize unit test files into appropriate directories (#13343)
### What problem does this PR solve?

Move test files from utils/ to their corresponding functional
directories:
- api/db/ for database related tests
- api/utils/ for API utility tests
- rag/utils/ for RAG utility tests

### Type of change

- [x] Refactoring
2026-03-04 11:02:56 +08:00
Copilot
33ba955b02 Translate Chinese text to English in agent/sandbox (#13356)
Chinese text remained in generated code comments, log messages, field
descriptions, and documentation files under `agent/sandbox/`.

### Changes

- **`tests/MIGRATION_GUIDE.md`** — Full EN translation (migration guide
from OpenSandbox → Code Interpreter)
- **`tests/QUICKSTART.md`** — Full EN translation (quick test guide for
Aliyun sandbox provider)
- **`providers/aliyun_codeinterpreter.py`** — Removed `(主账号ID)` from
docstring, error log, and config field description
- **`sandbox_spec.md`** — Removed `(主账号ID)` from `account_id` field
description
- **`tests/test_aliyun_codeinterpreter_integration.py`** — Removed
`(主账号ID)` from inline comment

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuzhichang <153784+yuzhichang@users.noreply.github.com>
2026-03-04 10:49:38 +08:00
wyou
0a4c0c38c7 Feat: expose admin service in helm configuration (#13345)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
For Helm deployments, there is also a requirement to enable the Admin
Service for administrative operations.
This PR exposes the ability to enable or disable it through the Helm
configuration.
When it is enabled (the default):
<img width="486" height="190" alt="image"
src="https://github.com/user-attachments/assets/4db0dc3d-bd94-4ad9-bb5d-a240aac5e1c5"
/>
Admin access and operations are then available, as shown below:
<img width="2530" height="876" alt="image"
src="https://github.com/user-attachments/assets/3e948e1b-7522-4f8d-8dc0-c80a22242022"
/>
Features such as user management are very important for RAGFlow
users and owners to control their clients.
2026-03-04 10:26:10 +08:00
Idriss Sbaaoui
2f4ca38adf Fix : make playwright tests idempotent (#13332)
### What problem does this PR solve?

Playwright tests previously depended on cross-file execution order
(`auth -> provider -> dataset -> chat`).
This change makes setup explicit and idempotent via fixtures so tests
can run independently.

- Added/standardized prerequisite fixtures in
`test/playwright/conftest.py`:
- `ensure_auth_context`, `ensure_model_provider_configured`,
`ensure_dataset_ready`, `ensure_chat_ready`
- Made provisioning reusable/idempotent with `RUN_ID`-based resource
naming.
- Synced auth envs (`E2E_ADMIN_EMAIL`, `E2E_ADMIN_PASSWORD`) into seeded
creds.
- Fixed provider cache freshness (`auth_header`/`page` refresh on cache
hit).

Also included minimal stability fixes:
- dataset create stale-element click handling,
- search wait logic for results/empty-state,
- agent create-menu handling,
- agent run-step retry when the run UI doesn’t open on the first click.

### Type of change

- [x] Test fix
- [x] Refactoring

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-04 10:07:14 +08:00
writinwaters
1c87f97dde Docs: Minor document structure tweak. (#13346)
### What problem does this PR solve?

Refactored the document architecture.

### Type of change

- [x] Documentation Update
2026-03-03 20:09:34 +08:00
writinwaters
f7c808383f Docs: Refactored documentation (#13340)
### What problem does this PR solve?

Refactored documentation. 

### Type of change

- [x] Documentation Update
2026-03-03 17:48:48 +08:00
Yao Wei
48755a3352 Fix: (resume) Cross-verify project experience and work experience, and remove duplicate text (#13323)
Cross-verify project experience and work experience, and remove
duplicate text

---------

Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
2026-03-03 14:53:46 +08:00
balibabu
eca60208e3 Fix: The document generation node cannot generate the output content of a large model to a file. #13321 (#13326)
### What problem does this PR solve?

Fix: The document generation node cannot generate the output content of
a large model to a file. #13321
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-03 11:05:24 +08:00
Magicbook1108
4f09b3e2a4 Fix: pipeline canvas category (#13319)
### What problem does this PR solve?

Fix: pipeline canvas category

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 20:27:36 +08:00
Yongteng Lei
707de2461a Fix: use async_chat with sync wrapper in resume parser (#13320)
### What problem does this PR solve?

Fix AttributeError when calling llm.chat() in resume parser. LLMBundle
only has async_chat method, not chat method. Use `_run_coroutine_sync`
wrapper to call async_chat synchronously.
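
A sync wrapper of this kind might look roughly like the sketch below. It is not the project's `_run_coroutine_sync`; the function `run_coroutine_sync` and the `FakeBundle` class are hypothetical, and the helper-thread fallback is one assumed way to handle an already running event loop:

```python
import asyncio
import threading


def run_coroutine_sync(coro):
    """Run a coroutine to completion from synchronous code.

    If no event loop is running in this thread, asyncio.run() is enough.
    Otherwise the coroutine is run on a fresh loop in a helper thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)

    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


class FakeBundle:
    """Hypothetical stand-in for an object that exposes only async_chat."""

    async def async_chat(self, system, history):
        await asyncio.sleep(0)
        return f"{system}:{len(history)}"
```

With this sketch, `run_coroutine_sync(FakeBundle().async_chat("sys", []))` can be called from ordinary synchronous code.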

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 19:51:06 +08:00
chanx
ef264b52c7 Fix: Fixed some errors in the console (#13317)
### What problem does this PR solve?

Fix: Fixed some errors in the console

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 19:19:15 +08:00
Yingfeng
a806f7b707 Potential fix for code scanning alert no. 71: Incomplete URL substring sanitization (#13318)
Potential fix for
[https://github.com/infiniflow/ragflow/security/code-scanning/71](https://github.com/infiniflow/ragflow/security/code-scanning/71)

In general, instead of using `String.prototype.includes` on the entire
URL string, parse the URL and make decisions based on its `host` (or
`hostname`) field. This avoids cases where the trusted domain appears in
the path, query, or as part of a different hostname.

Here, `payload.source_fid` is set to `'siliconflow_intl'` if
`postBody.base_url` “contains” `api.siliconflow.com`. To keep behavior
for correct inputs but close the hole, we should:

1. Safely parse `postBody.base_url` using the standard `URL` class.
2. Extract the hostname (`url.hostname`).
3. Compare it appropriately:
- If we only want the exact host `api.siliconflow.com`, use strict
equality.
- If international endpoints may include subdomains like
`foo.api.siliconflow.com`, allow those via suffix check on the hostname.
4. Fall back to `LLMFactory.SILICONFLOW` if parsing fails or the host
does not match.

Concretely, in `web/src/pages/user-setting/setting-model/hooks.tsx`, in
the `onApiKeySavingOk` callback where `payload.source_fid` is set,
replace the `toLowerCase().includes('api.siliconflow.com')` logic with a
small block that:

- Initializes a local `let sourceFid = LLMFactory.SILICONFLOW;`
- If `postBody.base_url` is present, attempts `new
URL(postBody.base_url)` inside a `try/catch`, lowercases `url.hostname`,
and checks whether it equals `api.siliconflow.com` or ends with
`.api.siliconflow.com`.
- Assigns `payload.source_fid = sourceFid`.

No new external dependencies are required; `URL` is available in modern
browsers and Node, and TypeScript understands it.
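
The fix itself lives in TypeScript; the same hostname-based check can be sketched in Python for illustration. The function name `is_trusted_host` is hypothetical, and the sketch assumes subdomains of the trusted host should be accepted:

```python
from urllib.parse import urlsplit

TRUSTED_HOST = "api.siliconflow.com"


def is_trusted_host(base_url: str) -> bool:
    """Return True only when the URL's hostname is the trusted host or a subdomain of it."""
    try:
        hostname = urlsplit(base_url).hostname
    except ValueError:
        return False  # malformed URL: treat as untrusted
    if not hostname:
        return False
    hostname = hostname.lower()
    return hostname == TRUSTED_HOST or hostname.endswith("." + TRUSTED_HOST)
```

Unlike a substring test on the whole URL, this rejects inputs where the trusted domain only appears in the path or as a prefix of a different hostname.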


_Suggested fixes powered by Copilot Autofix. Review carefully before
merging._

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-03-02 19:11:52 +08:00
Idriss Sbaaoui
b0ace2c5d0 feat: enable Arabic in production UI and add complete Arabic documentation (#13315)
### What problem does this PR solve?

This PR adds end-to-end Arabic support in production. It also adds a
full Arabic README

### Type of change

 - [x] New Feature (non-breaking change which adds functionality)
 - [x] Documentation Update
2026-03-02 19:10:11 +08:00
Yao Wei
f8c91e8854 Refa: Resume parsing module (architectural optimizations based on SmartResume Pipeline) (#13255)
Core optimizations (refer to arXiv:2510.09722):

1. PDF text fusion: Metadata + OCR dual-path extraction and fusion

2. Page-aware reconstruction: YOLOv10 page segmentation + hierarchical
sorting + line number indexing

3. Parallel task decomposition: Basic information/work
experience/educational background three-way parallel LLM extraction

4. Index pointer mechanism: The LLM returns a range of line numbers instead
of generating the full text, which reduces hallucination in the extracted text.
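
The index pointer idea in item 4 can be sketched as follows. This is an illustration under assumed conventions (0-based, inclusive line ranges); `number_lines` and `resolve_range` are hypothetical names, not functions from the resume parser:

```python
def number_lines(text: str) -> list[str]:
    """Prefix every line with its index so a model can refer to it."""
    return [f"[{i}] {line}" for i, line in enumerate(text.splitlines())]


def resolve_range(text: str, start: int, end: int) -> str:
    """Copy lines start..end (inclusive) verbatim from the source text."""
    lines = text.splitlines()
    if not (0 <= start <= end < len(lines)):
        raise ValueError(f"line range {start}-{end} is outside 0-{len(lines) - 1}")
    return "\n".join(lines[start : end + 1])
```

The model only has to answer with two integers; the text itself is copied from the source, so it cannot be altered by generation.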

---------

Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-02 19:05:50 +08:00
balibabu
7d6f20585f Feat: Modify the style of the classification operator and fix some console errors. (#13314)
### What problem does this PR solve?

Feat: Modify the style of the classification operator and fix some
console errors.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 16:53:24 +08:00
Magicbook1108
5fc3bd38b0 Feat: Support siliconflow.com (#13308)
### What problem does this PR solve?

Feat: Support siliconflow.com

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 15:37:42 +08:00
Magicbook1108
1db221f19e Feat: add more models for siliconflow and tongyi-qwen (#13311)
### What problem does this PR solve?

Feat: add more models for siliconflow and tongyi-qwen

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 15:37:08 +08:00
liuxiaoyusky
8ba66dd62a Fix: respect user-configured chunk_token_num for MinerU/docling/paddleocr parsers (#13234)
## Summary

When using MinerU, docling, TCADP, or paddleocr as the PDF parser with
the General (naive) chunk method, the user-configured `chunk_token_num`
is **unconditionally overwritten to 0** at
[rag/app/naive.py#L858-L859](https://github.com/infiniflow/ragflow/blob/main/rag/app/naive.py#L858-L859),
effectively disabling chunk merging regardless of what the user sets in
the UI.

### Problem

A user sets `chunk_token_num = 2048` in the dataset configuration UI,
expecting small parser blocks to be merged into larger chunks. However,
this line:

```python
if name in ["tcadp", "docling", "mineru", "paddleocr"]:
    parser_config["chunk_token_num"] = 0
```

silently overrides the user's setting. As a result, every MinerU output
block becomes its own chunk. For short documents (e.g. a 3-page PDF fund
factsheet parsed by MinerU), this produces **47 tiny chunks** — some as
small as 11 characters (`"July 2025"`) or 15 characters (`"CIES
Eligible"`).

This severely degrades retrieval quality: vector embeddings of such
short fragments have minimal semantic value, and keyword search produces
excessive noise.

### Fix

Only apply the `chunk_token_num = 0` override when the user has **not**
explicitly configured a positive value:

```python
if name in ["tcadp", "docling", "mineru", "paddleocr"]:
    if int(parser_config.get("chunk_token_num", 0)) <= 0:
        parser_config["chunk_token_num"] = 0
```

This preserves the original default behavior (no merging) while
respecting the user's explicit configuration.
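
The intended behavior can be restated as a small pure function that is easy to check in isolation. This is a sketch only; `effective_chunk_token_num` is a hypothetical helper and does not exist in `rag/app/naive.py`:

```python
EXTERNAL_PARSERS = {"tcadp", "docling", "mineru", "paddleocr"}


def effective_chunk_token_num(parser_name: str, parser_config: dict) -> int:
    """Return the chunk size to use, keeping a positive user-configured value."""
    configured = int(parser_config.get("chunk_token_num", 0))
    if parser_name in EXTERNAL_PARSERS and configured <= 0:
        return 0  # default for these parsers: no merging
    return configured
```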

### Before / After (MinerU, 3-page PDF, chunk_token_num=2048)

| | Before | After |
|---|---|---|
| Chunks produced | 47 | ~8 (merged by token limit) |
| Smallest chunk | 11 chars | ~500 chars |
| User setting respected | No | Yes |

## Test plan

- [ ] Parse a PDF with MinerU and `chunk_token_num = 2048` → verify
chunks are merged up to token limit
- [ ] Parse a PDF with MinerU and `chunk_token_num = 0` (or default) →
verify original behavior (no merging)
- [ ] Parse a PDF with DeepDOC parser → verify no change in behavior
(not affected by this code path)
- [ ] Repeat with docling/paddleocr if available
2026-03-02 15:31:40 +08:00
少卿
d430446e69 fix:absolute page index mix-up in DeepDoc PDF parser (#12848)
### What problem does this PR solve?

Summary:
This PR addresses critical indexing issues in
deepdoc/parser/pdf_parser.py that occur when parsing long PDFs with
chunk-based pagination:

- Normalize rotated table page numbering: Rotated-table re-OCR now writes
`page_number` in chunk-local 1-based form, eliminating the double addition
of the `page_from` offset that caused misalignment between table positions
and document boxes.
- Convert absolute positions to chunk-local coordinates: When inserting
tables/figures extracted via `_extract_table_figure`, positions are now
converted from absolute (0-based) to chunk-local indices before distance
matching and box insertion. This prevents `IndexError` and out-of-range
accesses during paged parsing of long documents.

Root Cause:
The parser mixed absolute (0-based, document-global) and relative
(1-based, chunk-local) page numbering systems. Table/figure positions
from layout extraction carried absolute page numbers, but insertion
logic expected chunk-local coordinates aligned with self.boxes and
page_cum_height.
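
The conversion between the two numbering systems can be sketched as below. The helper `to_chunk_local` is hypothetical, and the sketch assumes `page_from` is the 0-based document-global index of the chunk's first page:

```python
def to_chunk_local(absolute_page: int, page_from: int, chunk_pages: int) -> int:
    """Convert a 0-based document-global page index to a 1-based chunk-local page number."""
    local = absolute_page - page_from + 1
    if not (1 <= local <= chunk_pages):
        raise IndexError(
            f"page {absolute_page} is outside the chunk starting at {page_from}"
        )
    return local
```

Applying the offset exactly once, at a single boundary, is what avoids the double addition described in the summary.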


Testing (done by the author):

Manual verification: Parse a 200+ page PDF with from_page > 0 and table
rotation enabled. Confirm that:

- Tables and figures appear on correct pages
- No IndexError or position mismatches occur
- Page numbers in output match expected chunk-local offsets


Automated testing: not done.


## Separate Discussion: Memory Optimization Strategy (from codex-5.2-max, claude 4.5 opus, and me)

### Context

The current implementation loads entire page ranges into memory
(`__images__`, `page_chars`, intermediates), which can cause RAM
exhaustion on large documents. While the page numbering fix resolves
correctness issues, scalability remains a concern.

### Proposed Architecture

**Pipeline-Driven Chunking with Explicit Resource Management:**

1. **Authoritative chunk planning**: Accept page-range specifications
from upstream pipeline as the single source of truth. The parser should
be a stateless worker that processes assigned chunks without making
independent pagination decisions.

2. **Granular memory lifecycle**:
   ```python
   for chunk_spec in chunk_plan:
       # Load only chunk_spec.pages into __images__
       page_images = load_page_range(chunk_spec.start, chunk_spec.end)
       
       # Process with offset tracking
       results = process_chunk(page_images, offset=chunk_spec.start)
       
       # Explicit cleanup before next iteration
       del page_images, page_chars, layout_intermediates
       gc.collect()  # Force collection of large objects
   ```

3. **Persistent lightweight state**: Keep model instances (layout
detector, OCR engine), document metadata (outlines, PDF structure), and
configuration across chunks to avoid reinitialization overhead (~2-5s
per chunk for model loading).

4. **Adaptive fallback**: Provide `max_pages_per_chunk` (default: 50)
only when pipeline doesn't supply a plan. Never exceed
pipeline-specified ranges to maintain predictable memory bounds.

5. **Optional: Dynamic budgeting**: Expose a memory budget parameter
that adjusts chunk size based on observed image dimensions and format
(e.g., reduce chunk size for high-DPI scanned documents).

### Benefits

- **Predictable memory footprint**: RAM usage bounded by `chunk_size ×
avg_page_size` rather than total document size
- **Horizontal scalability**: Enables parallel chunk processing across
workers
- **Failure isolation**: Page extraction errors affect only current
chunk, not entire document
- **Cloud-friendly**: Works within container memory limits (e.g., 2-4GB
per worker)

### Trade-offs

- **Increased I/O**: Re-opening PDF for each chunk vs. keeping file
handle (mitigated by page-range seeks)
- **Complexity**: Requires careful offset tracking and stateful
coordination between pipeline and parser
- **Warmup cost**: Model initialization overhead amortized across chunks
(acceptable for documents >100 pages)

### Implementation Priority

This optimization should be **deferred to a separate PR** after the
current correctness fix is merged, as:
1. It requires broader architectural changes across the pipeline
2. Current fix is critical for correctness and can be backported
3. Memory optimization needs comprehensive benchmarking on
representative document corpus


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 14:58:37 +08:00
Ahmad Intisar
184388879d feat: Add disable_password_login configuration to support SSO-only authentication (#13151)
### What problem does this PR solve?

Enterprise deployments that use an external Identity Provider (e.g.,
Microsoft Entra ID, Okta, Keycloak) need the ability to enforce SSO-only
authentication by hiding the email/password login form. Currently, the
login page always shows the password form alongside OAuth buttons, with
no way to disable it.

This PR adds a `disable_password_login` configuration option under the
existing `authentication` section in `service_conf.yaml`. When set to
`true`, the login page only displays configured OAuth/SSO buttons and
hides the email/password form, "Remember me" checkbox, and "Sign up"
link.

The flag can be set via:
- `service_conf.yaml` (`authentication.disable_password_login: true`)
- Environment variable (`DISABLE_PASSWORD_LOGIN=true`)

Default behavior is unchanged (`false`).

### Behavior

| `disable_password_login` | OAuth configured | Result |
|---|---|---|
| `false` (default) | No | Standard email/password form |
| `false` | Yes | Email/password form + SSO buttons below |
| `true` | Yes | **SSO buttons only** (no form, no sign up link) |
| `true` | No | Empty card (admin should configure OAuth first) |
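
The behavior table can be expressed as a small decision function. This is a sketch, not the code in `web/src/pages/login/index.tsx` or `common/settings.py`; `env_flag` and `login_view` are hypothetical names, and the accepted truthy spellings are an assumption:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean from the environment; the accepted spellings are an assumption."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def login_view(disable_password_login: bool, oauth_configured: bool) -> dict:
    """Mirror the behavior table: which parts of the login card are rendered."""
    return {
        "password_form": not disable_password_login,
        "sso_buttons": oauth_configured,
    }
```

The last table row corresponds to both values being `False`, which is why an administrator should configure OAuth before enabling the flag.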

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Files changed (5)

1. `docker/service_conf.yaml.template` — added `disable_password_login:
false` under authentication
2. `common/settings.py` — added `DISABLE_PASSWORD_LOGIN` global variable
and loader in `init_settings()`
3. `common/config_utils.py` — fixed `TypeError` in `show_configs()` when
authentication section contains non-dict values (e.g., booleans)
4. `api/apps/system_app.py` — exposed `disablePasswordLogin` flag in
`/config` endpoint
5. `web/src/pages/login/index.tsx` — conditionally render password form
based on config flag; OAuth buttons always render when channels exist

---------

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
2026-03-02 14:06:03 +08:00
Magicbook1108
daec36e935 Fix: add soft limit for graph rag size (#13252)
### What problem does this PR solve?

Fix: add soft limit for graph rag size #13258 Q2

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-02 14:02:36 +08:00
huber
8a6b5ced6b fix: add missing chunk_data column to OceanBase schema migration (#13306)
### What problem does this PR solve?

When using OceanBase as the document storage engine, parsing and
inserting chunks with chunk_data (e.g., table parser row data) fails
with the following error:
```
[ERROR][Exception]: Insert chunk error: ['Unconsumed column names: chunk_data']
```
This happens because the `chunk_data` column was recently introduced but was
omitted from the `EXTRA_COLUMNS` list in `rag/utils/ob_conn.py`.
As a result, the automatic schema migration for existing OceanBase
tables does not append the missing chunk_data column, causing the
underlying pyobvector or SQLAlchemy to raise an unconsumed column names
error during data insertion.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### What is the solution?
Added `column_chunk_data` to the `EXTRA_COLUMNS` list in `rag/utils/ob_conn.py`.
This ensures that the OceanBase connection wrapper can correctly detect
the missing column and automatically alter existing chunk tables to
include the chunk_data field during initialization.
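
The "detect the missing column, then alter the table" pattern can be sketched with SQLite as a stand-in for OceanBase. The `EXTRA_COLUMNS` mapping and `add_missing_columns` below are hypothetical and do not reproduce the contents of `rag/utils/ob_conn.py`:

```python
import sqlite3

# Hypothetical column list; the real EXTRA_COLUMNS in rag/utils/ob_conn.py differs.
EXTRA_COLUMNS = {"chunk_data": "TEXT"}


def add_missing_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Add every column from EXTRA_COLUMNS that the table does not have yet."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in EXTRA_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added
```

A column that is absent from the list is never added to existing tables, which matches the failure mode described above.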
2026-03-02 13:25:11 +08:00
Magicbook1108
f0dd12289c Feat: add preprocess parameters for ingestion pipeline (#13300)
### What problem does this PR solve?
Feat: add preprocess parameters for ingestion pipeline

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 13:18:57 +08:00
Yihang Wang
7fc97da610 security: Adopt Jinja2 SandboxedEnvironment for template rendering. (#13305) 2026-03-02 13:17:29 +08:00
Idriss Sbaaoui
860c4bd0bb Feat: UI testing automation with playwright (#12749)
### What problem does this PR solve?

This PR automates testing of the UI using pytest with Playwright.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test automation infrastructure

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:04:08 +08:00
Attili-sys
21bc1ab7ec Feature rtl support (#13118)
### What problem does this PR solve?

This PR adds comprehensive **Right-to-Left (RTL) language support**,
primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu,
etc.).

Previously, RTL content had multiple rendering issues:

- Incorrect sentence splitting for Arabic punctuation in citation logic
- Misaligned text in chat messages and markdown components  
- Improper positioning of blockquotes and “think” sections  
- Incorrect table alignment  
- Citation placement ambiguity in RTL prompts  
- UI layout inconsistencies when mixing LTR and RTL text  

This PR introduces backend and frontend improvements to properly detect,
render, and style RTL content while preserving existing LTR behavior.

#### Backend
- Updated sentence boundary regex in `rag/nlp/search.py` to include
Arabic punctuation:
  - `،` (comma)
  - `؛` (semicolon)
  - `؟` (question mark)
  - `۔` (Arabic full stop)
- Ensures citation insertion works correctly in RTL sentences.
- Updated citation prompt instructions to clarify citation placement
rules for RTL languages.
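
As an illustration of the backend change, a sentence splitter that treats the four Arabic marks as boundaries might look like this. The pattern is a simplified assumption and is not the regex used in `rag/nlp/search.py`; `split_sentences` is a hypothetical name:

```python
import re

# Boundary marks: a few Latin ones plus Arabic comma (U+060C), semicolon (U+061B),
# question mark (U+061F), and full stop (U+06D4).
BOUNDARY = re.compile("([.!?;\u060c\u061b\u061f\u06d4])")


def split_sentences(text: str) -> list[str]:
    """Split text after each boundary mark, keeping the mark with its sentence."""
    parts = BOUNDARY.split(text)  # [text, mark, text, mark, ..., tail]
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Keeping the mark attached to its sentence gives citation insertion a well-defined position at the end of each sentence.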

#### Frontend
- Introduced a new utility: `text-direction.ts`
  - Detects text direction based on Unicode ranges.
  - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts.
  - Provides `getDirAttribute()` for automatic `dir` assignment.

- Applied dynamic `dir` attributes across:
  - Markdown rendering
  - Chat messages
  - Search results
  - Tables
  - Hover cards and reference popovers

- Added proper RTL styling in LESS:
  - Text alignment adjustments
  - Blockquote border flipping
  - Section indentation correction
  - Table direction switching
  - Use of `<bdi>` for figure labels to prevent bidirectional conflicts
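
Direction detection based on Unicode ranges, as in the `text-direction.ts` utility described above, can be sketched in Python. The range list is partial, the first-letter heuristic is an assumption, and `get_dir` is a hypothetical analogue of `getDirAttribute()`:

```python
# Unicode blocks treated as right-to-left in this sketch (an assumed, partial list).
RTL_RANGES = (
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0700, 0x074F),  # Syriac
    (0x0750, 0x077F),  # Arabic Supplement
    (0x0780, 0x07BF),  # Thaana
)


def is_rtl_char(ch: str) -> bool:
    """Check whether a single character falls in one of the RTL blocks."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in RTL_RANGES)


def get_dir(text: str) -> str:
    """Return 'rtl' when the first alphabetic character is RTL, else 'ltr'."""
    for ch in text:
        if ch.isalpha():
            return "rtl" if is_rtl_char(ch) else "ltr"
    return "ltr"
```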

#### DevOps / Environment
- Added Windows backend launch script with retry handling.
- Updated dependency metadata.
- Adjusted development-only React debugging behavior.

---

### Type of change

- [x] Bug Fix (non-breaking change which fixes RTL rendering and
citation issues)
- [x] New Feature (non-breaking change which adds RTL detection and
dynamic direction handling)

---------

Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:03:44 +08:00
balibabu
a897aedea9 Feat: Modify the form styles for retrieval and conditional operators. (#13299)
### What problem does this PR solve?

Feat: Modify the form styles for retrieval and conditional operators.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 12:05:27 +08:00
chanx
0cdddea59a feat: pipeline add preprocess (#13302)
### What problem does this PR solve?

feat: pipeline add preprocess

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-02 11:50:48 +08:00
balibabu
cf3d3c7c89 Feat: When exporting the agent DSL, the tailkey, password, and history fields need to be cleared. #13281 (#13282)
### What problem does this PR solve?
Feat: When exporting the agent DSL, the tailkey, password, and history
fields need to be cleared. #13281

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-02 11:41:38 +08:00
dependabot[bot]
b956ad180c Build(deps): Bump pypdf from 6.7.3 to 6.7.4 (#13298)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.7.3 to 6.7.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/releases">pypdf's
releases</a>.</em></p>
<blockquote>
<h2>Version 6.7.4, 2026-02-27</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Allow limiting output length for RunLengthDecode filter (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3664">#3664</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Deal with invalid annotations in extract_links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3659">#3659</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.7.3...6.7.4">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md">pypdf's
changelog</a>.</em></p>
<blockquote>
<h2>Version 6.7.4, 2026-02-27</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Allow limiting output length for RunLengthDecode filter (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3664">#3664</a>)</li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Deal with invalid annotations in extract_links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3659">#3659</a>)</li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.7.3...6.7.4">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1650bc31e8"><code>1650bc3</code></a>
REL: 6.7.4</li>
<li><a
href="f309c60037"><code>f309c60</code></a>
SEC: Allow limiting output length for RunLengthDecode filter (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3664">#3664</a>)</li>
<li><a
href="993f052748"><code>993f052</code></a>
DEV: Bump actions/upload-artifact from 6 to 7 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3662">#3662</a>)</li>
<li><a
href="a3c996bffc"><code>a3c996b</code></a>
DEV: Bump actions/download-artifact from 7 to 8 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3663">#3663</a>)</li>
<li><a
href="37de32022e"><code>37de320</code></a>
ROB: Deal with invalid annotations in extract_links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3659">#3659</a>)</li>
<li>See full diff in <a
href="https://github.com/py-pdf/pypdf/compare/6.7.3...6.7.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pypdf&package-manager=uv&previous-version=6.7.3&new-version=6.7.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-02 11:32:12 +08:00
Idriss Sbaaoui
9d78d3ddb1 Tests: fix failling http in CI (#13301)
### What problem does this PR solve?
test_doc_sdk_routes_unit had two flaky/incorrect branch assumptions:

1. parse/stop_parsing production logic gates on doc.run, but tests used
progress, causing branch mismatch and unintended fallthrough into
mutation/DB paths.
2. stop_parsing invalid-state test asserted an outdated message
fragment, making the contract brittle.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 10:44:33 +08:00
Jimmy Ben Klieve
7e0dd906f2 refactor: update admin ui (#13280)
### What problem does this PR solve?

Update for Admin UI:
- Update file picker input in **Registration whitelist** > **Import from
Excel** modal
- Modify DOM structure of **Sandbox Settings** and move several
hardcoded texts into translation files

### Type of change

- [x] Refactoring
2026-02-28 19:21:51 +08:00
Idriss Sbaaoui
e62552d482 Added some React IDs for playwright e2e tests (#13265)
### What problem does this PR solve?

Necessary ids for implementing the new testing suite with playwright for
UI

### Type of change

- [x] Other (please describe): Testing IDs

Co-authored-by: Liu An <asiro@qq.com>
2026-02-28 15:13:47 +08:00
Magicbook1108
1027916bfe Fix: inconsistent state handling for multi-user single-canvas access (#13267)
### What problem does this PR solve?

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-02-28 15:09:21 +08:00
Yongteng Lei
c91e803a38 Fix: close detached PIL image on JPEG save failure in encode_image (#13278)
### What problem does this PR solve?

Properly close detached PIL image on JPEG save failure in encode_image.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-28 14:43:35 +08:00
天海蒼灆
983150b936 Fix (api): fix the document parsing status check logic (#12504)
### What problem does this PR solve?
When the original code terminates a parsing task midway, the progress
may be neither 0 nor 1, which makes it impossible to call the
interface to parse the document again.

- Change the document parsing progress check to a task status check that
uses `TaskStatus.RUNNING.value`.
- Update the condition for stopping document parsing to check
whether the task is running instead.
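
The difference between gating on progress and gating on task status can be sketched as follows. Only `TaskStatus.RUNNING` is named in this PR; the other members, all member values, and both helper functions are hypothetical:

```python
from enum import Enum


class TaskStatus(Enum):
    # Member values here are placeholders, not the project's actual values.
    UNSTART = "0"
    RUNNING = "1"
    CANCEL = "2"
    DONE = "3"


def can_start_parsing(doc: dict) -> bool:
    """A document can be parsed again unless its task is currently running."""
    return doc["run"] != TaskStatus.RUNNING.value


def can_stop_parsing(doc: dict) -> bool:
    """Only a running task can be stopped."""
    return doc["run"] == TaskStatus.RUNNING.value
```

A task cancelled at 42% progress is then restartable, because the decision no longer depends on the progress value.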


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-28 14:38:55 +08:00
Jin Hai
32ec950ca8 Fix create / drop chat session syntax (#13279)
### What problem does this PR solve?

This pull request refactors the chat session creation and deletion logic
in both the parser and client code to use unique session IDs instead of
session names. It also updates the corresponding command syntax and
payloads, ensuring more robust and unambiguous session management.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-02-28 14:18:21 +08:00
Jin Hai
d9d4825079 Add chat sessions related command (#13268)
### What problem does this PR solve?

1. Create / Drop / List chat sessions
2. Chat with LLM and datasets

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-02-28 12:52:45 +08:00
Jin Hai
54094771a3 Fix streaming chat on web API (#13275)
### What problem does this PR solve?

This pull request makes a small but important fix to how streaming
requests are handled in the `completion` endpoint of
`conversation_app.py`. The main change ensures that the `stream`
argument is not passed twice, which could cause errors.
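
The class of error being avoided can be reproduced in a few lines. This is a generic sketch, not the code in `conversation_app.py`; `chat` and `complete` are hypothetical functions:

```python
def chat(messages, stream=True, **kwargs):
    """Hypothetical downstream function with an explicit `stream` parameter."""
    return {"stream": stream, "extra": kwargs}


def complete(req: dict):
    """Forward a request dict without passing `stream` twice."""
    req = dict(req)  # do not mutate the caller's dict
    messages = req.pop("messages")
    stream = req.pop("stream", True)  # remove it so **req cannot repeat it
    return chat(messages, stream=stream, **req)
```

Calling `chat(messages, stream=False, **req)` while `req` still contains a `stream` key raises `TypeError`, since Python rejects multiple values for one keyword argument.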

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-02-28 12:16:38 +08:00
Yongteng Lei
0110151e12 Fix: document remove race condition (#13242)
### What problem does this PR solve?

Fix document remove race condition.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-28 11:23:24 +08:00
eviaaaaa
fa71f8d0c7 refactor(word): lazy-load DOCX images to reduce peak memory without changing output (#13233)
**Summary**
This PR tackles a significant memory bottleneck when processing
image-heavy Word documents. Previously, our pipeline eagerly decoded
DOCX images into `PIL.Image` objects, which caused high peak memory
usage. To solve this, I've introduced a **lazy-loading approach**:
images are now stored as raw blobs and only decoded exactly when and
where they are consumed.

This successfully reduces the memory footprint while keeping the parsing
output completely identical to before.

**What's Changed**
Instead of a dry file-by-file list, here is the logical breakdown of the
updates:

* **The Core Abstraction (`lazy_image.py`)**: Introduced `LazyDocxImage`
along with helper APIs to handle lazy decoding, image-type checks, and
NumPy compatibility. It also supports `.close()` and detached PIL access
to ensure safe lifecycle management and prevent memory leaks.
* **Pipeline Integration (`naive.py`, `figure_parser.py`, etc.)**:
Updated the general DOCX picture extraction to return these new lazy
images. Downstream consumers (like the figure/VLM flow and base64
encoding paths) now decode images right at the use site using detached
PIL instances, avoiding shared-instance side effects.
* **Compatibility Hooks (`operators.py`, `book.py`, etc.)**: Added
necessary compatibility conversions so these lazy images flow smoothly
through existing merging, filtering, and presentation steps without
breaking.
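
The lazy-loading idea can be sketched without any imaging library. `LazyBlob` below is a hypothetical, simplified analogue of `LazyDocxImage`: it stores raw bytes and takes the decoder as a parameter, which is an assumption made to keep the example self-contained:

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyBlob(Generic[T]):
    """Hold raw bytes and decode them only on first access."""

    def __init__(self, blob: bytes, decoder: Callable[[bytes], T]):
        self._blob = blob
        self._decoder = decoder
        self._decoded: Optional[T] = None

    def get(self) -> T:
        """Decode on demand and cache the result."""
        if self._decoded is None:
            self._decoded = self._decoder(self._blob)
        return self._decoded

    def close(self) -> None:
        """Drop the decoded object; the raw bytes stay available."""
        decoded, self._decoded = self._decoded, None
        if hasattr(decoded, "close"):
            decoded.close()
```

With Pillow, the decoder could plausibly be something like `lambda b: Image.open(io.BytesIO(b))`, so that the compressed bytes are the only thing held in memory until an image is actually consumed.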

**Scope & What is Intentionally Left Out**
To keep this PR focused, I have restricted these changes strictly to the
**general Word pipeline** and its downstream consumers.
The `QA` and `manual` Word parsing pipelines are explicitly **not
modified** in this PR. They can be safely migrated to this new lazy-load
model in a subsequent, standalone PR.

**Design Considerations**
I briefly considered adding image compression during processing, but
decided against it to avoid any potential quality degradation in the
derived outputs. I also held off on a massive pipeline re-architecture
to avoid overly invasive changes right now.

**Validation & Testing**
I've tested this to ensure no regressions:

* Compared identical DOCX inputs before and after this branch: chunk
counts, extracted text, table HTML, and image descriptions match
perfectly.
* **Confirmed a noticeable drop in peak memory usage when processing
image-dense documents.** For a 30MB Word document containing 243 1080p
screenshots, memory consumption is reduced by approximately 1.5GB.

**Breaking Changes**
None.
2026-02-28 11:22:31 +08:00
SFL79
4f0c892b32 feat(ui): add individual model delete buttons across all providers (#13271)
### What problem does this PR solve?

Added the option to delete models individually from providers.
For additional context, see
[issue-13184](https://github.com/infiniflow/ragflow/issues/13184)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Note: when deleting a selected model, it leaves the full model name as
text as seen here:
<img width="676" height="90" alt="image"
src="https://github.com/user-attachments/assets/c11c7c1b-3f2a-4119-b20c-bb8148a8ad16"
/>

If a user attempts to use RAGFlow with the deleted model, RAGFlow throws
an unauthorized model error, as expected.
I left it that way on purpose so that users can easily see what they
deleted and that they need to replace it with another
model.

Co-authored-by: Shahar Flumin <shahar@Shahars-MacBook-Air.local>
2026-02-28 10:51:39 +08:00
Yesid Cano Castro
d1afcc9e71 feat(seafile): add library and directory sync scope support (#13153)
### What problem does this PR solve?

The SeaFile connector currently synchronises the entire account: every
library visible to the authenticated user. This is impractical for users
who only need a subset of their data indexed, especially on large SeaFile
instances with many shared libraries.

This PR introduces granular sync scope support, allowing users to choose
between syncing their entire account, a single library, or a specific
directory within a library. It also adds support for SeaFile library-scoped
API tokens (`/api/v2.1/via-repo-token/` endpoints), enabling tighter access
control without exposing account-level credentials.


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Test

```python
from seafile_connector import SeaFileConnector
import logging
import os

logging.basicConfig(level=logging.DEBUG)

URL = os.environ.get("SEAFILE_URL", "https://seafile.example.com")
TOKEN = os.environ.get("SEAFILE_TOKEN", "")
REPO_ID = os.environ.get("SEAFILE_REPO_ID", "")
SYNC_PATH = os.environ.get("SEAFILE_SYNC_PATH", "/Documents")
REPO_TOKEN = os.environ.get("SEAFILE_REPO_TOKEN", "")

def _test_scope(scope, repo_id=None, sync_path=None):
    print(f"\n{'='*50}")
    print(f"Testing scope: {scope}")
    print(f"{'='*50}")

    creds = {"seafile_token": TOKEN} if TOKEN else {}
    if REPO_TOKEN and scope in ("library", "directory"):
        creds["repo_token"] = REPO_TOKEN

    connector = SeaFileConnector(
        seafile_url=URL,
        batch_size=5,
        sync_scope=scope,
        include_shared=False,
        repo_id=repo_id,
        sync_path=sync_path,
    )
    connector.load_credentials(creds)
    connector.validate_connector_settings()

    count = 0
    for batch in connector.load_from_state():
        for doc in batch:
            count += 1
            print(f"  [{count}] {doc.semantic_identifier} "
                  f"({doc.size_bytes} bytes, {doc.extension})")

    print(f"\n-> {scope} scope: {count} document(s) found.\n")

# 1. Account scope
if TOKEN:
    _test_scope("account")
else:
    print("\nSkipping account scope (set SEAFILE_TOKEN)")

# 2. Library scope
if REPO_ID and (TOKEN or REPO_TOKEN):
    _test_scope("library", repo_id=REPO_ID)
else:
    print("\nSkipping library scope (set SEAFILE_REPO_ID + token)")

# 3. Directory scope
if REPO_ID and SYNC_PATH and (TOKEN or REPO_TOKEN):
    _test_scope("directory", repo_id=REPO_ID, sync_path=SYNC_PATH)
else:
    print("\nSkipping directory scope (set SEAFILE_REPO_ID + SEAFILE_SYNC_PATH + token)")
```
2026-02-28 10:24:28 +08:00
Stephen Hu
aec2ef4232 refactor:improve tts model's codes (#13137)
### What problem does this PR solve?

Improve the TTS model code.

### Type of change

- [x] Refactoring
2026-02-28 10:18:00 +08:00
Stephen Hu
9577753c10 Refactor: improve the logic about docling parser extract box (#13215)
### What problem does this PR solve?
Improve the box-extraction logic in the Docling parser.

### Type of change
- [x] Refactoring
2026-02-28 10:05:24 +08:00
chanx
510ff89661 Fix: remove unused files (#13232)
### What problem does this PR solve?

Fix: remove unused files

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 23:05:40 +08:00
Jimmy Ben Klieve
c0823e8d6d refactor: update chat ui (#13269)
### What problem does this PR solve?

Update **Chat** UI:
- Align to the design.
- Update `<AudioButton>` visualizer logic.
- Fix keyboard navigation issue.

### Type of change

- [x] Refactoring
2026-02-27 22:26:19 +08:00
Enes Delibalta
4e48aba5c4 fix: update DoclingParser return type hint (#13243)
### What problem does this PR solve?

The _transfer_to_sections method was throwing a type hint violation
because it occasionally returns 3-item tuples instead of 2. Adjusted to
list[tuple[str, ...]] to prevent runtime crashes.
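
The difference between the two hints can be sketched with a small hand-rolled check. This is only an illustration: `matches` is a hypothetical helper, not the runtime type checker that produced the error below, and the sample tuples are made up.

```python
from typing import get_args


def matches(item: tuple, hint) -> bool:
    """Hypothetical helper: does `item` satisfy a tuple[...] hint?"""
    args = get_args(hint)
    # tuple[str, ...] is variadic: any length, every element must be a str.
    if len(args) == 2 and args[1] is Ellipsis:
        return all(isinstance(x, args[0]) for x in item)
    # tuple[str, str] is fixed-length: the arity must match exactly.
    return len(item) == len(args) and all(
        isinstance(x, t) for x, t in zip(item, args)
    )


two_items = ("heading", "text")
three_items = ("heading", "text", "@@1\t70.8\t194.9##")

print(matches(three_items, tuple[str, str]))  # fixed arity rejects 3 items
print(matches(three_items, tuple[str, ...]))  # variadic hint accepts them
print(matches(two_items, tuple[str, ...]))    # and still accepts 2 items
```

The variadic form trades precision for tolerance: it no longer documents how many fields a section has, only that all of them are strings.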

Error: 

20:53:21 Page(1~10): [ERROR]Internal server error while chunking:
Method
deepdoc.parser.docling_parser.DoclingParser._transfer_to_sections()
return [(1. JIRA Nasıl Kullanılır?, text,
@@1\t70.8\t194.9\t70.9\t85.5##), (1.1. Proje O...##)] violates type
hint list[tuple[str, str]], as list index
15 item tuple tuple (Gelen ekran
üzerinden alanları isterlerine göre doldurduğunuz taktirde Create
düğmesi i...##) length 3 != 2.
20:53:21 [ERROR][Exception]: Method
deepdoc.parser.docling_parser.DoclingParser._transfer_to_sections()
return [('1. JIRA Nasıl Kullanılır?', 'text',
'@@1\t70.8\t194.9\t70.9\t85.5##'), ('1.1. Proje O...##')] violates
type hint list[tuple[str, str]], as list index
15 item tuple tuple ('Gelen ekran
üzerinden alanları isterlerine göre doldurduğunuz taktirde Create
düğmesi i...##') length 3 != 2.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Enes Delibalta <enes.delibalta@pentanom.com>
2026-02-27 20:13:50 +08:00
Yuxing Deng
51b180d991 fix: adding GPUStack chat model requires v1 suffix (#13237)
### What problem does this PR solve?

See issue #13236.
The base URL for a GPUStack chat model currently requires a `/v1`
suffix. For the other model types, such as `Embedding` or `Rerank`, the
`/v1` suffix is not required because it is appended in code.
This PR applies the same logic to chat models.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 20:13:07 +08:00
as-ondewo
194e076e26 Fix: init superuser can create duplicate users (#13221)
### What problem does this PR solve?

This PR fixes 2 bugs related to RAGFlow's init superuser functionality.

#### Bug 1

When the RAGFlow server was started with the `--init-superuser` option
it would always create a new admin user even if it already exists
resulting in duplicate users.

To fix this, I added an additional check before creating the superuser
and added a *unique* constraint to the email column of the database to
mitigate potential TOCTOU race conditions. Since existing databases
could contain duplicate emails, I added email de-duplication to the
database migration.
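
Why the constraint matters in addition to the check can be sketched with an in-memory SQLite database. This is a minimal illustration, not the RAGFlow schema or migration: the `users` table and `create_superuser` helper are hypothetical.

```python
import sqlite3


def create_superuser(conn: sqlite3.Connection, email: str) -> bool:
    """Hypothetical helper: insert the admin user, return False if it exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejects the duplicate even if two processes
        # both passed an "does this user exist?" check at the same time.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)

first = create_superuser(conn, "admin@example.com")
second = create_superuser(conn, "admin@example.com")
rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(first, second, rows)
```

A check-then-insert alone leaves a window between the check and the insert; the constraint closes that window at the database level.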

#### Bug 2

When the RAGFlow server was started with the `--init-superuser` option
but without configured default LLM and embedding models, it would fail to
start because the `init_superuser` function always sent a test request
to the models, even if they were not set.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 19:55:51 +08:00
balibabu
6d0100ca67 Fix: The output content of the multi-model comparison will disappear. #13227 (#13241)
### What problem does this PR solve?

Fix: The output content of the multi-model comparison will disappear.
#13227
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 19:18:40 +08:00
balibabu
861ebfc6e1 Feat: Make the embedded page of chat compatible with mobile devices. (#13262)
### What problem does this PR solve?
Feat: Make the embedded page of chat compatible with mobile devices.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-02-27 19:17:41 +08:00
avianion
5f53fbe0f1 feat: Add Avian as an LLM provider (#13256)
### What problem does this PR solve?

This PR adds [Avian](https://avian.io) as a new LLM provider to RAGFlow.
Avian provides an OpenAI-compatible API with competitive pricing,
offering access to models like DeepSeek V3.2, Kimi K2.5, GLM-5, and
MiniMax M2.5.

**Provider details:**
- API Base URL: `https://api.avian.io/v1`
- Auth: Bearer token via API key
- OpenAI-compatible (chat completions, streaming, function calling)
- Models:
  - `deepseek/deepseek-v3.2` — 164K context, $0.26/$0.38 per 1M tokens
  - `moonshotai/kimi-k2.5` — 131K context, $0.45/$2.20 per 1M tokens
  - `z-ai/glm-5` — 131K context, $0.30/$2.55 per 1M tokens
  - `minimax/minimax-m2.5` — 1M context, $0.30/$1.10 per 1M tokens

**Changes:**
- `rag/llm/chat_model.py` — Add `AvianChat` class extending `Base`
- `rag/llm/__init__.py` — Register in `SupportedLiteLLMProvider`,
`FACTORY_DEFAULT_BASE_URL`, `LITELLM_PROVIDER_PREFIX`
- `conf/llm_factories.json` — Add Avian factory with model definitions
- `web/src/constants/llm.ts` — Add to `LLMFactory` enum, `IconMap`,
`APIMapUrl`
- `web/src/components/svg-icon.tsx` — Register SVG icon
- `web/src/assets/svg/llm/avian.svg` — Provider icon
- `docs/references/supported_models.mdx` — Add to supported models table

This follows the same pattern as other OpenAI-compatible providers
(e.g., n1n #12680, TokenPony).

cc @KevinHuSh @JinHai-CN

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2026-02-27 17:36:55 +08:00
6ba3i
bb59a27e55 Doc : Add french Readme (#13254)
### What problem does this PR solve?

Add a French README.

### Type of change

- [x] Documentation Update
2026-02-27 11:34:13 +08:00
qinling0210
8b6d363a98 Use pagination in _search_metadata (#13238)
### What problem does this PR solve?

Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210)

Remove the limit in `_search_metadata` and use pagination instead.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 11:24:49 +08:00
Jin Hai
a1549c0fdc Fix UI (#13239)
### What problem does this PR solve?

This pull request makes a minor update to the English locale strings for
the Table of Contents toggle buttons, changing the labels from "Show
TOC"/"Hide TOC" to "Show content"/"Hide content" for improved clarity.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-02-26 19:21:08 +08:00
Magicbook1108
c03c537bf8 Feat: optimize gmail/google-drive (#13230)
### What problem does this PR solve?

Feat: optimize gmail/google-drive

Now:
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/0c4b6044-7209-4c4f-ac0c-32070b79daf7"
/>
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/406f93d8-9b0f-4f5a-b8bb-3936990f558c"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 19:19:40 +08:00
6ba3i
22c4d72891 tests: improve RAGFlow coverage based on Codecov report (#13219)
### What problem does this PR solve?

Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.

### Type of change

- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
2026-02-26 19:03:26 +08:00
Magicbook1108
1aa49a11f0 Feat: support AWS SES smtp (#13195)
### What problem does this PR solve?

Support AWS SES smtp #13179

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 13:49:53 +08:00
writinwaters
74dc43406f Docs: After careful consideration, the RAGFlow team decided to hold o… (#13226)
…ff publishing this guide.

### What problem does this PR solve?

Removed the failure mode checklist per your request. @JinHai-CN

### Type of change


- [x] Documentation Update
2026-02-26 12:39:58 +08:00
balibabu
d2dd0b7e50 Fix: The agent is embedded in the webpage; interrupting its operation will redirect to the login page. #12697 (#13224)
### What problem does this PR solve?

Fix: The agent is embedded in the webpage; interrupting its operation
will redirect to the login page. #12697

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-26 12:39:28 +08:00
chanx
8bce212284 Fix: error in retrieval testing page (#13225)
### What problem does this PR solve?

Fix: error in retrieval testing page

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-26 12:39:09 +08:00
Angel98518
024edba1b8 fix(web): prevent duplicate i18n languageChanged listeners (#13218)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring
2026-02-26 10:45:50 +08:00
PandaMan
d43aebe701 Fix/13142 auto metadata (#13217)
### What problem does this PR solve?

Close #13142

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-26 10:25:48 +08:00
Angel98518
b54260bcd7 fix(web): correct initial chat variable enabled state (#13214)
## Summary

Fixes the initial enabled/disabled state of chat variable checkboxes by
correcting a helper function that previously always returned .

## Problem

 in  had two  statements:



Because of the early , the function always returned , so all chat
variable checkboxes were initially disabled regardless of the field.
This also made the helper inconsistent with , which enables all fields
by default except .

## Fix

Update  to use the same condition as :



This ensures:
- All chat variable checkboxes are enabled by default
-  remains the only field disabled by default
- Behavior is consistent between the helper and the checkbox map
initialization in .

No API or backend changes are involved; this is a small, isolated
frontend bugfix.
2026-02-26 10:25:14 +08:00
Magicbook1108
158503a1aa Feat: optimize ingestion pipeline with preprocess (#13211)
### What problem does this PR solve?

Feat: optimize ingestion pipeline with preprocess

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 10:24:13 +08:00
PSBigBig × MiniPS
b7eca981d4 docs: add RAG failure modes checklist guide (refs #13138) (#13204)
### What problem does this PR solve?

This PR adds a new guide: **"RAG failure modes checklist"**.

RAG systems often fail in ways that are not immediately visible from a
single metric like accuracy or latency. In practice, debugging
production RAG applications requires identifying recurring failure
patterns across retrieval, routing, evaluation, and deployment stages.

This guide introduces a structured, pattern-based checklist (P01–P12) to
help users interpret traces, evaluation results, and dataset behavior
within RAGFlow. The goal is to provide a practical way to classify
incidents (e.g., retrieval hallucination, chunking issues, index
staleness, routing misalignment) and reason about minimal structural
fixes rather than ad-hoc prompt changes.

The change is documentation-only and does not modify any code or
configuration.

Refs #13138


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-02-25 19:35:15 +08:00
writinwaters
f9e0eb38ec Refact: Updated ingestion pipeline UI. (#13216)
### What problem does this PR solve?

Updated ingestion pipeline-specific UI tips.

### Type of change

- [x] Refactoring
2026-02-25 19:29:04 +08:00
6ba3i
38011f2c16 tests: improve RAGFlow coverage based on Codecov report (#13200)
### What problem does this PR solve?

Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.

### Type of change

- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
2026-02-25 19:12:11 +08:00
balibabu
2a5ddf064d Fix: Note component text area does not resize with component #13065 (#13212)
### What problem does this PR solve?

Fix: Note component text area does not resize with component #13065

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 17:24:07 +08:00
Jimmy Ben Klieve
220e611e33 refactor: ux improvements for variable picker in prompt editor (#13213)
### What problem does this PR solve?

User experience enhancement for variable picker in prompt editor:

- Add case-insensitive string search for variables.
- Add basic keyboard navigation in variable picker:
   - Hit <kbd>UpArrow</kbd> and <kbd>DownArrow</kbd> for navigating.
- Hit <kbd>Tab</kbd> or <kbd>Enter</kbd> for selecting focused item into
editor.
- Fix unexpectedly inserting invalid variable into editor by hitting
<kbd>Tab</kbd>.

_Note: you still need to pick variables inside secondary menu (agent
structured output, etc.) by using your pointing device. May finish these
later._

### Type of change

- [x] Refactoring
2026-02-25 17:22:48 +08:00
He Wang
394ff16b66 fix: OceanBase metadata not returned in document list API (#13209)
### What problem does this PR solve?

Fix #13144.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 15:29:17 +08:00
Phives
4ceb668d40 feat(api/utils): Harden file_utils for robustness and edge cases (#12915)
## Summary
Improves robustness and edge-case handling in `api.utils.file_utils` to
avoid crashes, DoS/OOM risks, and timeouts when processing user-provided
filenames, paths, and file blobs.

## Changes

### Resource limits & timeouts
- **`MAX_BLOB_SIZE_THUMBNAIL`** (50 MiB) and **`MAX_BLOB_SIZE_PDF`**
(100 MiB) to reject oversized inputs before thumbnail/PDF processing.
- **`GHOSTSCRIPT_TIMEOUT_SEC`** (120 s) for
`repair_pdf_with_ghostscript` subprocess to avoid hangs on malicious or
broken PDFs.

### `filename_type`
- Handles `None`, empty string, non-string (e.g. int/list), and
path-only input via new **`_normalize_filename_for_type()`**.
- Uses basename for type detection (e.g. `a/b/c.pdf` → PDF).
- Enforces **`FILE_NAME_LEN_LIMIT`**; invalid input returns
`FileType.OTHER`.

### `thumbnail_img`
- Rejects `None`/empty/oversized blob and invalid filename; returns
`None` instead of raising.
- Wraps PDF, image, and PPT handling in try/except so corrupt or
malformed files return `None`.
- Ensures PDF has pages and PPT has slides before use.
- Normalizes PIL image mode (RGBA/P/LA → RGB) for safe PNG export.

### `repair_pdf_with_ghostscript`
- Handles `None`/empty input; skips repair when input size exceeds
limit.
- Uses `subprocess.run(..., timeout=GHOSTSCRIPT_TIMEOUT_SEC)` and
catches `TimeoutExpired`.
- Returns original bytes when Ghostscript output is empty.

### `read_potential_broken_pdf`
- `None` → `b""`; non–sequence-like (no `len`) → `b""`; empty → return
as-is.
- Oversized blob returned as-is (no repair) to avoid DoS.

### `sanitize_path`
- Explicit `None` and non-string check; strips whitespace before
normalizing.
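
The timeout and fallback behavior listed under `repair_pdf_with_ghostscript` can be sketched as follows. This is an illustration under stated assumptions, not the code in `api.utils.file_utils`: `run_repair` is a hypothetical helper, the command reads from stdin and writes to stdout for simplicity, and the demo uses the Python interpreter as a stand-in for Ghostscript.

```python
import subprocess
import sys

TIMEOUT_SEC = 120  # same value the description gives for GHOSTSCRIPT_TIMEOUT_SEC


def run_repair(cmd: list[str], original: bytes, timeout: float = TIMEOUT_SEC) -> bytes:
    """Hypothetical helper: pipe `original` through `cmd`, fall back on failure."""
    if not original:
        return b""
    try:
        proc = subprocess.run(
            cmd, input=original, capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return original
    if proc.returncode != 0 or not proc.stdout:
        # Empty or failed output: keep the bytes we were given.
        return original
    return proc.stdout


# Stand-in "repair" command: upper-cases whatever it receives on stdin.
echo_upper = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]
print(run_repair(echo_upper, b"pdf bytes"))
```

Returning the original bytes on timeout keeps the caller's control flow simple: it always gets bytes back and never has to handle a hung subprocess.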

## Testing
- **`test/unit_test/utils/test_api_file_utils.py`** added with 36 unit
tests covering the above behavior (filename_type, sanitize_path,
read_potential_broken_pdf, thumbnail_img, thumbnail,
repair_pdf_with_ghostscript, constants).
- All tests pass.

---------

Co-authored-by: Gittensor Miner <miner@gittensor.io>
2026-02-25 14:34:47 +08:00
PentaFDevs
8ad47bf242 feat: add 'Open in new tab' button for agents (#13044)
- Add new button in agent management dropdown to open agent in new tab
- Implement token-based authentication for shared agent access
- Add translations for 9 languages (en, zh, zh-tw, de, fr, it, ru,
pt-br, vi)
- Keep existing 'Embed into webpage' functionality intact

### What problem does this PR solve?

This allows users to open agents in a separate tab, so an agent can keep
working in the background while they use other parts of the application.

<img width="1920" height="1080" alt="image"
src="https://github.com/user-attachments/assets/ca1719c8-2f00-4570-a730-1321fa0bfd57"
/>
<img width="254" height="222" alt="image"
src="https://github.com/user-attachments/assets/b3dd6d9f-b7e7-46b0-83e7-f0ea86e7b156"
/>
<img width="1920" height="1080" alt="image"
src="https://github.com/user-attachments/assets/e94e99f9-9039-43f7-b2d9-862b9448630c"
/>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-25 13:39:02 +08:00
Yao Wei
cf6fd6f115 fix: When using OceanBase as storage, the list_chunk sorting is abnormal. #13198 (#13208)
Actual behavior
When using OceanBase as storage, the list_chunk sorting is abnormal. The
following is the SQL statement.

SELECT id, content_with_weight, important_kwd, question_kwd, img_id,
available_int, position_int, doc_type_kwd, create_timestamp_flt,
create_time, array_to_string(page_num_int, ',') AS page_num_int_sort,
array_to_string(top_int, ',') AS top_int_sort FROM
rag_store_284250730805059584 WHERE doc_id = '' AND kb_id IN ('') ORDER
BY page_num_int_sort ASC, top_int_sort ASC, create_timestamp_flt DESC
LIMIT 0, 20

<img width="1610" height="740" alt="image"
src="https://github.com/user-attachments/assets/84e14c30-a97f-4e8f-8c8c-6ccac915d97d"
/>

Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
2026-02-25 13:36:18 +08:00
Ray Zhang
cbe64402db feat(migration): support docker compose -p project name for backup/restore (#13191)
### What problem does this PR solve?

When users start RAGFlow with `docker compose -p <alias>`, Docker
creates volumes prefixed with the alias (e.g., `myproject_mysql_data`).
The migration script (`docker/migration.sh`) previously hardcoded the
`docker_` prefix in volume names, causing backup/restore to silently
skip all volumes for any non-default project name.

This PR adds a `-p <project_name>` option so the script correctly
targets volumes regardless of the Docker Compose project name used.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Changes

- Add `-p <project_name>` flag (default: `docker`) for specifying Docker
Compose project name
- Build volume names dynamically: `${project_name}_${base_name}`
- Update help text with new option documentation and examples
- Show project-aware `docker compose` commands in error messages
- Fix deprecated `docker-compose` to `docker compose` in hints
- Use dynamic step count instead of hardcoded `4`
- Fully backward compatible — existing usage without `-p` works
unchanged

### Usage

```bash
# Existing usage (unchanged)
./migration.sh backup
./migration.sh restore my_backup

# New: custom project name
./migration.sh -p myproject backup
./migration.sh -p myproject restore my_backup
```
2026-02-25 13:18:47 +08:00
Yongteng Lei
2bf2abfdbc Fix: authorization bypass (IDOR) in /v1/document/web_crawl (#13203)
### What problem does this PR solve?

Fix an authorization bypass (IDOR) in `/v1/document/web_crawl` that
allows cross-tenant dataset modification.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 12:59:41 +08:00
Ahmad Intisar
99d1c9725c Bug mysql connector empty content resolved: Semantic ID Issue (#13206)
The RDBMS (MySQL/PostgreSQL) connector generates document filenames
using the first 100 characters of the content column
(semantic_identifier). When the content contains newline characters
(\n), the resulting filename includes those newlines — for example:
Category: غير صحيح كليًا\nTitle: تفنيد حقائق....txt
RAGFlow's filename_type() function uses re.match(r".*\.txt$", filename)
to detect file types, but .* does not match newline characters by
default in Python regex. This causes the regex to fail, returning
FileType.OTHER, which triggers:
`raise RuntimeError("This type of file has not been supported yet!")`
As a result, all documents synced via the MySQL/PostgreSQL connector are
silently discarded. The sync logs report success (e.g., "399 docs
synchronized"), but zero documents actually appear in the dataset. This
is the root cause of issue #13001.
Root cause trace:

rdbms_connector.py → _row_to_document() sets semantic_identifier from
raw content (may contain \n)
connector_service.py → duplicate_and_parse() uses semantic_identifier as
the filename
file_service.py → upload_document() calls filename_type(filename)
file_utils.py → filename_type() regex .*\.txt$ fails on newlines →
returns FileType.OTHER
upload_document() raises "This type of file has not been supported yet!"

Fix: Sanitize the semantic_identifier in _row_to_document() by replacing
newlines and carriage returns with spaces before truncating to 100
characters.
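
The regex behavior and the fix can be reproduced in a few lines. This is a minimal sketch: `sanitize_identifier` is a hypothetical name for the sanitizing step, and the sample content is made up.

```python
import re

TXT_PATTERN = r".*\.txt$"  # the pattern quoted above from filename_type()


def sanitize_identifier(raw: str, limit: int = 100) -> str:
    """Hypothetical helper: replace CR/LF with spaces, then truncate."""
    return raw.replace("\r", " ").replace("\n", " ")[:limit]


content = "Category: example\nTitle: some row content"
bad_name = content + ".txt"
good_name = sanitize_identifier(content) + ".txt"

# "." does not match "\n" by default, so the match fails on the raw name.
print(re.match(TXT_PATTERN, bad_name))
# After sanitizing, the whole name sits on one line and the pattern matches.
print(re.match(TXT_PATTERN, good_name) is not None)
```

Passing `re.DOTALL` to the match would also make the raw name match, but sanitizing at the source keeps newlines out of every later consumer of the filename as well.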
Relates to: #13001, #12817
Type of change

 Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
2026-02-25 12:55:04 +08:00
Yongteng Lei
72b89304c1 Fix: LFI vulnerability in document parsing API (#13196)
### What problem does this PR solve?

Fix LFI vulnerability in document parsing API.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 09:47:39 +08:00
PandaMan
f4cbdc3a3b fix(api): MinIO health check use dynamic scheme and verify (Closes #13159 and #13158) (#13197)
## Summary

Fixes MinIO SSL/TLS support in two places: the MinIO **client**
connection and the **health check** used by the Admin/Service Health
dashboard. Both now respect the `secure` and `verify` settings from the
MinIO configuration.

Closes #13158
Closes #13159

---

## Problem

**#13158 – MinIO client:** The client in `rag/utils/minio_conn.py` was
hardcoded with `secure=False`, so RAGFlow could not connect to MinIO
over HTTPS even when `secure: true` was set in config. There was also no
way to disable certificate verification for self-signed certs.

**#13159 – MinIO health check:** In `api/utils/health_utils.py`, the
MinIO liveness check always used `http://` for the health URL. When
MinIO was configured with SSL, the health check failed and the dashboard
showed "timeout" even though MinIO was reachable over HTTPS.

---

## Solution

### MinIO client (`rag/utils/minio_conn.py`)

- Read `MINIO.secure` (default `false`) and pass it into the `Minio()`
constructor so HTTPS is used when configured.
- Add `_build_minio_http_client()` that reads `MINIO.verify` (default
`true`). When `verify` is false, return an `urllib3.PoolManager` with
`cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so
self-signed certificates are accepted.
- Support string values for `secure` and `verify` (e.g. `"true"`,
`"false"`).

### MinIO health check (`api/utils/health_utils.py`)

- Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and
the `verify` flag from `MINIO.secure` and `MINIO.verify`.
- Update `check_minio_alive()` to use the correct scheme, pass `verify`
into `requests.get(..., verify=verify)`, and use `timeout=10`.
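
How the scheme and `verify` flag might be derived from the configuration can be sketched as below. This is not the code in `api/utils/health_utils.py`: the helper names, the shape of the config dict, and the exact set of strings treated as true are assumptions made for the illustration.

```python
def as_bool(value, default: bool) -> bool:
    """Hypothetical helper: accept real booleans and strings like "true"."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    # Assumption: these spellings count as true; everything else is false.
    return str(value).strip().lower() in ("true", "1", "yes")


def scheme_and_verify(minio_conf: dict) -> tuple[str, bool]:
    """Hypothetical helper: pick the URL scheme and the TLS verify flag."""
    secure = as_bool(minio_conf.get("secure"), default=False)
    verify = as_bool(minio_conf.get("verify"), default=True)
    return ("https" if secure else "http"), verify


print(scheme_and_verify({}))
print(scheme_and_verify({"secure": "true", "verify": "false"}))
```

The defaults follow the description: plain HTTP unless `secure` is set, and certificate verification on unless `verify` is explicitly disabled.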

### Config template (`docker/service_conf.yaml.template`)

- Add commented optional MinIO keys `secure` and `verify` (and env vars
`MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS
and optional cert verification.

### Tests

- **`test/unit_test/utils/test_health_utils_minio.py`** – Tests for
`_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify,
status codes, timeout, errors).
- **`test/unit_test/utils/test_minio_conn_ssl.py`** – Tests for
`_build_minio_http_client()` (verify true/false/missing, string values,
`CERT_NONE` when verify is false).

---

## Testing

- Unit tests added/updated as above; run with the project's test runner.
- Manually: configure MinIO with HTTPS and `secure: true` (and
optionally `verify: false` for self-signed); confirm client operations
work and the Service Health dashboard shows MinIO as alive instead of
timeout.
2026-02-25 09:47:12 +08:00
Yongteng Lei
c292d617ca Fix: stored XSS via HTML File upload and inline Rendering in file get (#13202)
### What problem does this PR solve?

Fix stored XSS via HTML file upload and inline rendering in
/v1/file/get/<id>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 09:46:48 +08:00
ksufer
5a8fa7cf31 Fix #13119: Use email.utils to fix IMAP parsing for names with commas (#13120)
## Type of Change
- [x] Bug fix

## Description
Closes #13119

The current IMAP connector uses `split(',')` to parse email headers,
which crashes when a sender's display name contains a comma inside
quotes (e.g., `"Doe, John" <john@example.com>`).

This PR replaces the manual string splitting with Python's standard
`email.utils.getaddresses`. This correctly handles RFC 5322 quoted
strings and prevents the `RuntimeError: Expected a singular address`.
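
The difference between the two approaches can be seen on a small header. This is a minimal sketch with a made-up header value; it only shows the standard-library call, not the connector code.

```python
from email.utils import getaddresses

header = '"Doe, John" <john@example.com>, Jane Roe <jane@example.com>'

# Naive splitting cuts the quoted display name in half.
naive = header.split(",")

# getaddresses understands quoted strings and returns (name, address) pairs.
parsed = getaddresses([header])

print(len(naive))
for name, address in parsed:
    print(f"{name!r} -> {address}")
```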

## Checklist
- [x] I have checked the code and it works as expected.

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-02-24 19:18:55 +08:00
as-ondewo
0a7c520579 Fix: empty response from OpenAI chat completion endpoint (#13166)
### What problem does this PR solve?

When using a chat assistant that has a hardcoded `empty_response`, that
response was not returned correctly in streaming mode when no
information is found in the knowledge base. In this case only one
response with `"content": null` was yielded. If `"references": true`,
then the `empty_response` is still put into the `final_content` so there
is technically some content returned, but when `"references": false` no
content at all is returned.

I updated the OpenAI chat completion endpoint to yield an additional
response with the `empty_response` in the content.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:18:12 +08:00
Magicbook1108
5de92e57d3 Fix: 'None None' in log (#13192)
### What problem does this PR solve?

Fix: 'None None' in log

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:15:20 +08:00
Magicbook1108
46dec98f52 Fix: Chat/Agent embedded page (#13199)
### What problem does this PR solve?

Fix: Chat/Agent embedded page #13190

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:14:24 +08:00
tuandang-diag
d89ad8b79d fix: handle null response in LLM and improve JSON parsing in agent (#13187)
Fixes AttributeError in _remove_reasoning_content() when LLM returns
None, and improves JSON parsing regex for markdown code fences in
agent_with_tools.py
2026-02-24 13:15:09 +08:00
Lynn
67befc9119 Fix: add back MCP tool custom header (#13188)
### What problem does this PR solve?

Restore the custom header when using MCP.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 13:14:21 +08:00
chanx
7db2fb200c Fix: Metadata mult-selected display error (#13189)
### What problem does this PR solve?

Fix: Metadata multi-select display error

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 13:10:32 +08:00
PandaMan
f462a9d85a fix(web): prevent LaTeX from being cut at \right] or \big) in agent o… (#13155)
### What problem does this PR solve?

- Use negative lookbehind (?<![a-zA-Z]) so \] and \) inside commands
(e.g. \right], \big)) are not treated as block/inline delimiters
- Use greedy matching to capture up to the last valid delimiter, fixing
truncated formulas (e.g. C_{seq}(y|x) = \frac{1}{|y|} ...)
- Add unit tests for preprocessLaTeX

Closes #13134


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 12:30:06 +08:00
Bradley Boveinis
3280772934 fix(helm): exclude password keys from env range loop to prevent duplicate YAML keys (#13136)
## Summary

- Fix duplicate YAML mapping keys in `helm/templates/env.yaml` that
cause deployment failures with strict YAML parsers

## Problem

The `range` loop in `env.yaml` iterates over all `.Values.env` keys and
emits them into a Secret. The exclusion filter skips host/port/user
keys, but does **not** skip password keys (`MYSQL_PASSWORD`,
`REDIS_PASSWORD`, `MINIO_PASSWORD`, `ELASTIC_PASSWORD`,
`OPENSEARCH_PASSWORD`). These same keys are then explicitly defined
again later in the template, producing duplicate YAML mapping keys.

Go's `yaml.v3` (used by Flux's helm-controller for post-rendering)
rejects duplicate keys per the YAML spec:

```
Helm install failed: yaml: unmarshal errors:
  mapping key "MINIO_PASSWORD" already defined
  mapping key "MYSQL_PASSWORD" already defined
  mapping key "REDIS_PASSWORD" already defined
```

Plain `helm install` does not surface this because Helm's internal
parser (`yaml.v2`) silently accepts duplicate keys (last value wins).

## Fix

Add password keys to the exclusion filter on line 12 so they are only
emitted by their explicit definitions later in the template.

Note: `MINIO_ROOT_USER` is intentionally **not** excluded — it is only
emitted by the range loop and has no explicit definition elsewhere.
Excluding it causes MinIO to crash with `Missing credential environment
variable, "MINIO_ROOT_USER"`.
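
The interaction between the range loop, the exclusion filter, and the explicit definitions can be modelled in a few lines of Python. This is only an illustrative sketch, not the Helm template itself: the `env` values, the single stand-in host key, and the helper names `render_keys` and `duplicates` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical chart values (assumption); a real values file has many more keys.
env = {
    "MYSQL_HOST": "mysql",
    "MYSQL_PASSWORD": "a",
    "REDIS_PASSWORD": "b",
    "MINIO_PASSWORD": "c",
    "MINIO_ROOT_USER": "rag",
    "TIMEZONE": "UTC",
}

# Stand-in for the host/port/user keys the filter already skipped (assumption).
OLD_EXCLUDED = {"MYSQL_HOST"}
PASSWORD_KEYS = {
    "MYSQL_PASSWORD",
    "REDIS_PASSWORD",
    "MINIO_PASSWORD",
    "ELASTIC_PASSWORD",
    "OPENSEARCH_PASSWORD",
}
# Keys that the template defines explicitly after the loop.
EXPLICIT_KEYS = sorted(OLD_EXCLUDED | PASSWORD_KEYS)


def render_keys(values, excluded):
    """Return emitted keys in order: range loop first, explicit definitions after."""
    emitted = [key for key in values if key not in excluded]
    emitted += [key for key in EXPLICIT_KEYS if key in values]
    return emitted


def duplicates(keys):
    """Return the keys that were emitted more than once, sorted by name."""
    return sorted(key for key, count in Counter(keys).items() if count > 1)


before = render_keys(env, OLD_EXCLUDED)
after = render_keys(env, OLD_EXCLUDED | PASSWORD_KEYS)
print("duplicates before fix:", duplicates(before))
print("duplicates after fix:", duplicates(after))
print("MINIO_ROOT_USER still emitted:", "MINIO_ROOT_USER" in after)
```

In this model, excluding the password keys from the loop removes the duplicates, while `MINIO_ROOT_USER` is still emitted because it is not in the exclusion set.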

## Test plan

- [ ] Deploy with Flux helm-controller (uses yaml.v3) — no duplicate key
errors
- [ ] Verify all passwords are present in the rendered Secret
- [ ] Verify `MINIO_ROOT_USER` is present in the rendered Secret
- [ ] Test with `DOC_ENGINE=elasticsearch` (ELASTIC_PASSWORD)
- [ ] Test with `DOC_ENGINE=opensearch` (OPENSEARCH_PASSWORD)

Fixes #13135
2026-02-24 11:09:31 +08:00
as-ondewo
91d1a81937 fix: error during admin tenant creation when using Postgres (#13164)
### What problem does this PR solve?

This fixes the bug described in #13130. When starting RAGFlow with
Postgres, admin tenant creation failed because the rerank model was not
set.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 10:57:31 +08:00
chanx
59e9e77061 fix: Add admin proxy (#13186)
### What problem does this PR solve?

fix: Add admin proxy

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 10:29:58 +08:00
Magicbook1108
98e1d5aa5c Refact: switch from google-generativeai to google-genai (#13140)
### What problem does this PR solve?

Refact: switch from google-generativeai to google-genai #13132
Refact: comment out unused pywencai.

### Type of change

- [x] Refactoring
2026-02-24 10:28:33 +08:00
zagnaan
45aa3a0e89 fix(compose): use official opensearch image instead of hub.icert.top mirror (#13131)
### What problem does this PR solve?

The Docker Compose configuration was using hub.icert.top as the registry
for the OpenSearch image. That registry is not reachable in our
environment, which causes podman pull and docker compose pull to fail
with a connection refused error. As a result, the application cannot
start because the OpenSearch image cannot be downloaded.

This PR updates the image reference to use the official Docker Hub image
(opensearchproject/opensearch:2.19.1) instead of the hub.icert.top
mirror. After this change, the image pulls successfully and the services
start as expected.


![photo_2026-02-12_15-11-56](https://github.com/user-attachments/assets/6db736a5-b701-450f-96c1-9c23f092c3ab)


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Shynggys Samarkhanov <shynggys.samarkhanov@nixs.com>
2026-02-24 09:50:02 +08:00
Trifon
ce71d87867 Add Bulgarian language support (#13147)
### What problem does this PR solve?

RAGFlow supports 12 UI languages but does not include Bulgarian. This PR
adds Bulgarian (`bg` / `Български`) as the 13th supported language,
covering the full UI translation (2001 keys across all 26 sections) and
OCR/PDF parser language mapping.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Changes

- **`web/src/constants/common.ts`** — Registered Bulgarian in all 5
language data structures (`LanguageList`, `LanguageMap`,
`LanguageAbbreviation` enum, `LanguageAbbreviationMap`,
`LanguageTranslationMap`)
- **`web/src/locales/config.ts`** — Added lazy-loading dynamic import
for the `bg` locale
- **`web/src/locales/bg.ts`** *(new)* — Full Bulgarian translation file
with all 26 sections, matching the English source (`en.ts`). All
interpolation placeholders, HTML tags, and technical terms are preserved
as-is
- **`deepdoc/parser/mineru_parser.py`** — Mapped `'Bulgarian'` to
`'cyrillic'` in `LANGUAGE_TO_MINERU_MAP` for OCR/PDF parser support

### How it works

The language selector automatically picks up the new entry. When a user
selects "Български", the translation bundle is lazy-loaded on demand.
The preference is persisted to the database and localStorage across
sessions.
2026-02-14 16:51:29 +08:00
chanx
f612d2254d Refactor: i18n language pack for on-demand import (#13139)
### What problem does this PR solve?

Refactor: i18n language pack for on-demand import

### Type of change

- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-02-13 18:42:48 +08:00
chanx
f2a1d59c36 Refactor: Remove ant design component (#13143)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Refactoring
2026-02-13 18:40:41 +08:00
writinwaters
bc9ed24a85 Docs: Updated v0.24.0 release notes. (#13129)
### What problem does this PR solve?

Added more details to v0.24.0 release notes.

### Type of change

- [x] Documentation Update
2026-02-12 20:14:05 +08:00
Levi
6d6c54db19 fix(metadata): handle unhashable list values in metadata split (#13116)
### What problem does this PR solve?

This PR fixes missing metadata on documents synced from the Moodle
connector, especially for **Book** modules.

Background:
- Moodle Book metadata includes fields like `chapters`, which is a
`list[dict]`.
- During metadata normalization in
`DocMetadataService._split_combined_values`, list deduplication used
`dict.fromkeys(...)`.
- `dict.fromkeys(...)` fails for unhashable values (like `dict`),
causing metadata update to fail.
- Result: documents were imported, but metadata was not saved for
affected module types (notably Books).

What this PR changes:
- Replaces hash-based list deduplication with `dedupe_list(...)`, which
safely handles unhashable list items while preserving order.
- This allows Book metadata (and other complex list metadata) to be
persisted correctly.
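
As a rough illustration of the failure and the fix, the sketch below shows why hash-based deduplication breaks on `list[dict]` values and one way an order-preserving, equality-based helper could look. The name `dedupe_list` comes from this PR, but the body here is an assumption written for the example, not the project's actual implementation, and the `chapters` data is made up.

```python
from typing import Any


def dedupe_list(items: list[Any]) -> list[Any]:
    """Remove duplicates by equality while keeping first-seen order.

    Works for unhashable items such as dicts, at O(n^2) cost.
    """
    seen: list[Any] = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


# Hypothetical Moodle Book metadata (assumption): a list of dicts.
chapters = [{"title": "Intro"}, {"title": "Intro"}, {"title": "Setup"}]

try:
    dict.fromkeys(chapters)  # dict keys must be hashable, and dicts are not
except TypeError as exc:
    print(f"hash-based dedup fails: {exc}")

print(dedupe_list(chapters))
```

The quadratic scan is a deliberate trade-off in this sketch: metadata lists are typically short, and comparing by equality avoids any hashing requirement.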

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Contribution during my time at RAGcon GmbH.
2026-02-12 19:48:51 +08:00
chanx
b922a5cbdf Fix: replace session page icons and fix nested list search functionality in filters (#13127)
### What problem does this PR solve?

Fix: replace session page icons and fix nested list search functionality
in filters

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-12 19:48:35 +08:00
Ahmad Intisar
5885f150ab fix: register WebDAVConnector in data_source __init__.py (#13121)
### What problem does this PR solve?

The `sync_data_source.py` module imports `WebDAVConnector` from
`common.data_source`, but `WebDAVConnector` was never registered in the
package's `__init__.py`. This causes an `ImportError` at startup, crashing
the data sync service:

`ImportError: cannot import name 'WebDAVConnector' from 'common.data_source'`

The `webdav_connector.py` file already exists in the `common/data_source/`
directory; it just wasn't exported. This PR adds the import and
registers it in `__all__`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
2026-02-12 16:05:58 +08:00
Magicbook1108
e89fd686e2 Improve: optimize file name (with path) in box container. (#13124)
### What problem does this PR solve?

Refact: optimize file name (with path) in box container. 

### Type of change

- [x] Performance Improvement

<img width="2357" height="1258" alt="image"
src="https://github.com/user-attachments/assets/f4c5c90b-d885-4514-b7bc-f17ab62b045f"
/>
2026-02-12 15:40:55 +08:00
疯癫
e72291bc9a Fix the bug where the mcp service tools/list does not return knowledge base IDs information. (#13123)
### What problem does this PR solve?

Fix the issue where the server-side parameter validation fails when the
id parameter is None in the asynchronous list_datasets method.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-12 15:40:15 +08:00
Lynn
6e7bcf58bc Refactor: split message apis to gateway and service (#13126)
### What problem does this PR solve?

Split message apis to gateway and service

### Type of change

- [x] Refactoring
2026-02-12 14:43:52 +08:00
chanx
7210178620 Fix: Bugs fixed (#13109) (#13122)
### What problem does this PR solve?

Fix: Bugs fixed (#13109)
- chat pdf preview error
- data source add box error
- change route next-chat -> chat , next-search->search ...

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-12 13:42:12 +08:00
Liu An
65ebc06956 Refa: test file location for better organization (#13107)
### What problem does this PR solve?

Renamed test/unit/test_delete_query_construction.py to
test/unit_test/common/test_delete_query_construction.py to align with
the project's directory structure and improve test categorization.

### Type of change

- [x] Refactoring
2026-02-12 10:15:09 +08:00
Lynn
30d5fc1a07 Refactor: split memory API into gateway and service layers (#13111)
### What problem does this PR solve?

Decouple the memory API into a gateway layer (for routing/param parse)
and a service layer (for business logic).

### Type of change

- [x] Refactoring
2026-02-12 10:11:50 +08:00
Levi
4b50b8c579 Fix: persist SSO auth token on root route loader (#12784)
### What problem does this PR solve?

This PR fixes SSO/OIDC login persistence after the Vite migration
#12568. Because wrappers are ignored by React Router, the OAuth callback
never stored the auth token in localStorage, causing auth to only work
while ?auth= stayed in the URL. We move that logic into a route loader
and remove the Bearer prefix for the signed token so the backend accepts
it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Contribution during my time at RAGcon GmbH.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2026-02-12 10:09:35 +08:00
TheoG
67937a668e Fix graphrag extraction (#13113)
### What problem does this PR solve?

Fix error when extracting the graph.
A string is expected, but a tuple was provided.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-11 20:11:56 +08:00
writinwaters
57dc25f150 Docs: Updated v0.24.0 release notes (#13115)
### What problem does this PR solve?

Updated v0.24.0 release notes.

### Type of change

- [x] Documentation Update
2026-02-11 18:08:56 +08:00
writinwaters
a9272c26eb Docs: Updated sandbox reference (#13114)
### What problem does this PR solve?

Updated sandbox reference.

### Type of change

- [x] Documentation Update
2026-02-11 17:58:08 +08:00
Kevin Hu
88920d23e6 Refa: Change aliyun repo. (#13103)
### Type of change

- [x] Refactoring
2026-02-11 11:21:03 +08:00
Jim Smith
7029b8ca81 Fix: Make time_utils tests timezone-independent (#13100)
## Summary
- Replace hardcoded CST (UTC+8) expected values in `test_time_utils.py`
with dynamically computed local-time expectations using
`time.localtime()` and `time.mktime()`
- Tests previously failed in any timezone other than UTC+8; they now
pass regardless of the system's local timezone
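
A minimal sketch of the idea, assuming a formatter that renders an epoch timestamp as local wall-clock time: instead of hardcoding the string a UTC+8 machine would produce, the test derives its timestamp from a local time tuple with `time.mktime()`, so the expectation holds in any timezone. The function `format_local` and the test name are hypothetical stand-ins, not the project's actual `time_utils` API.

```python
import time

FORMAT = "%Y-%m-%d %H:%M:%S"


def format_local(timestamp: float) -> str:
    """Hypothetical function under test: render epoch seconds as local time."""
    return time.strftime(FORMAT, time.localtime(timestamp))


def test_format_local_is_timezone_independent() -> None:
    # Describe the wall-clock time we expect to see, in local time.
    # The trailing -1 lets the C library decide whether DST applies.
    local_tuple = (2024, 1, 15, 12, 30, 45, 0, 0, -1)
    # mktime() converts that local wall-clock time to epoch seconds,
    # whatever the machine's timezone is.
    timestamp = time.mktime(local_tuple)
    assert format_local(timestamp) == "2024-01-15 12:30:45"


test_format_local_is_timezone_independent()
print("timezone-independent expectation holds")
```

The same pattern works in the other direction: compute the expected string with `time.localtime()` from a fixed epoch value when the test must start from a known timestamp.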

## Test plan
- [x] `uv run pytest test/unit_test/ -v` — 317 passed, 25 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Jim Smith <jhsmith0@me.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:51:53 +08:00
Ahmad Intisar
99ed8e759d Fix: Correct Gemini embedding model name in llm_factories.json (#13051)
## Problem
RAGFlow was using incorrect model names for Google Gemini embeddings:
- `embedding-001` (missing `gemini-` prefix)
- `text-embedding-004` (OpenAI model name, not Gemini)

This caused API errors when users tried to use Gemini embeddings.

## Solution
- Updated `conf/llm_factories.json` to use the correct model name:
`gemini-embedding-001`
- Removed the incorrect `text-embedding-004` entry
- Added volume mount in `docker-compose.yml` to ensure config changes
persist

## Testing
Tested with a valid Gemini API key and confirmed embeddings now work
correctly.

## Changes
- Modified `conf/llm_factories.json`
- Modified `docker/docker-compose.yml`

---------

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-02-11 09:49:48 +08:00
Magicbook1108
109441628b Fix: upload image files (#13071)
### What problem does this PR solve?

Fix: upload image files

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-11 09:47:33 +08:00
writinwaters
630f05b8a1 Docs: Added v0.24.0 release notes (#13096)
### What problem does this PR solve?

Added v0.24.0 release notes.

### Type of change


- [x] Documentation Update
2026-02-10 17:38:27 +08:00
1538 changed files with 228300 additions and 41923 deletions

.agents/rules/named.md Normal file

@@ -0,0 +1,192 @@
# Go Naming Best Practices
## 1. Package Naming
- **All lowercase, no underscores**: `package user`, not `package userService` or `package user_service`
- **Short and meaningful**: `package http`, `package json`, `package dao`
- **Avoid plurals**: `package user` not `package users`
- **Avoid generic names**: Avoid `package util`, `package common`, `package base`
```go
// Recommended
package user
package handler
package service
// Not recommended
package UserService
package user_service
package utils
```
## 2. File Naming
- **All lowercase, underscore separated**: `user_handler.go`, `user_service.go`
- **Test files**: `user_handler_test.go`
- **Platform-specific**: `user_linux.go`, `user_windows.go`
```
user/
├── user_handler.go
├── user_service.go
├── user_dao.go
└── user_test.go
```
## 3. Directory Naming
- **All lowercase, no underscores or hyphens**: `internal/`, `pkg/`, `cmd/`
- **Short and descriptive**: `handler/`, `service/`, `dao/`
```
project/
├── cmd/             # Main entry point
│   └── server_main.go
├── internal/        # Private code
│   ├── handler/
│   ├── service/
│   ├── dao/
│   ├── model/
│   └── middleware/
├── pkg/             # Public code
└── api/             # API definitions
```
## 4. Interface Naming
- **Single-method interfaces end with "-er"**: `Reader`, `Writer`, `Handler`
- **Agent-noun form derived from the verb**: `Reader`, `Executor`, `Validator`
```go
// Recommended
type Reader interface {
	Read(p []byte) (n int, err error)
}
type UserService interface {
	Register(req *RegisterRequest) (*User, error)
	Login(req *LoginRequest) (*User, error)
}
// Not recommended
type UserInterface interface {}
type IUserService interface {}
```
## 5. Struct Naming
- **CamelCase**: `UserService`, `UserHandler`
- **Avoid redundant prefixes**: `User` not `UserModel`
```go
// Recommended
type UserService struct {}
type UserHandler struct {}
type RegisterRequest struct {}
// Not recommended
type user_service struct {}
type SUserService struct {}
type UserModel struct {}
```
## 6. Method/Function Naming
- **CamelCase**
- **Start with verb**: `GetUser`, `CreateUser`, `DeleteUser`
- **Boolean returns use Is/Has/Can prefix**: `IsValid`, `HasPermission`
```go
// Recommended
func (s *UserService) Register(req *RegisterRequest) (*User, error)
func (s *UserService) GetUserByID(id uint) (*User, error)
func (s *UserService) IsEmailExists(email string) bool
// Not recommended
func (s *UserService) register_user()
func (s *UserService) get_user_by_id()
func (s *UserService) CheckEmailExists() // Should use Is/Has
```
## 7. Constant Naming
- **CamelCase**: `const MaxRetryCount = 3`
- **Enum constants**: `const StatusActive = "active"`
```go
// Recommended
const (
	StatusActive   = "1"
	StatusInactive = "0"
	MaxRetryCount  = 3
)
// Not recommended
const (
	STATUS_ACTIVE = "1" // Not all uppercase
	status_active = "1" // Not all lowercase
)
```
## 8. Error Variable Naming
- **Start with "Err"**: `ErrNotFound`, `ErrInvalidInput`
```go
// Recommended
var (
	ErrNotFound     = errors.New("not found")
	ErrInvalidInput = errors.New("invalid input")
	ErrUnauthorized = errors.New("unauthorized")
)
```
## 9. Acronyms Keep Consistent Case
```go
// Recommended
type HTTPHandler struct {}
var URL string
func GetHTTPClient() {}
func ParseJSON() {}
// Not recommended
type HttpHandler struct {}
var Url string
func GetHttpClient() {}
```
## 10. Project Structure Naming
```
project-name/
├── cmd/                 # Main programs
│   └── app_name/
│       └── main.go
├── internal/            # Private code
│   ├── handler/         # HTTP handlers
│   ├── service/         # Business logic
│   ├── repository/      # Data access
│   ├── model/           # Data models
│   └── config/          # Configuration
├── pkg/                 # Public code
├── api/                 # API definitions
├── configs/             # Config files
├── scripts/             # Scripts
├── docs/                # Documentation
├── go.mod
└── go.sum
```
## Summary Table
| Type | Rule | Example |
| -------------- | ----------------------------------- | ------------------- |
| Package | All lowercase, no underscores | `package user` |
| File | All lowercase, underscore separated | `user_service.go` |
| Directory | All lowercase, no separators | `internal/handler/` |
| Struct | CamelCase, capitalized first letter | `UserService` |
| Interface | CamelCase, -er suffix | `Reader`, `Writer` |
| Method | CamelCase, verb prefix | `GetUserByID` |
| Constant | CamelCase | `MaxRetryCount` |
| Error Variable | Err prefix | `ErrNotFound` |

@@ -0,0 +1,6 @@
---
name: go-naming
description: Go naming conventions and best practices. Use this skill when working with Go code and need to name packages, files, directories, structs, interfaces, functions, variables, or constants. Provides comprehensive naming guidelines following Go community standards.
---
Strictly follow the naming conventions in [rules/named.md](rules/named.md)

@@ -23,7 +23,7 @@ concurrency:
jobs:
release:
runs-on: [ "self-hosted", "ragflow-test" ]
runs-on: [ "self-hosted", "ragflow-release" ]
steps:
- name: Ensure workspace ownership
run: echo "chown -R ${USER} ${GITHUB_WORKSPACE}" && sudo chown -R ${USER} ${GITHUB_WORKSPACE}

@@ -129,20 +129,24 @@ jobs:
fi
fi
- name: Run unit test
- name: Build ragflow go server
run: |
uv sync --python 3.12 --group test --frozen
source .venv/bin/activate
which pytest || echo "pytest not in PATH"
echo "Start to run unit test"
python3 run_tests.py
BUILDER_CONTAINER=ragflow_build_$(od -An -N4 -tx4 /dev/urandom | tr -d ' ')
echo "BUILDER_CONTAINER=${BUILDER_CONTAINER}" >> ${GITHUB_ENV}
TZ=${TZ:-$(readlink -f /etc/localtime | awk -F '/zoneinfo/' '{print $2}')}
sudo docker run --privileged -d --name ${BUILDER_CONTAINER} -e TZ=${TZ} -e UV_INDEX=https://mirrors.aliyun.com/pypi/simple -v ${PWD}:/ragflow -v ${PWD}/internal/cpp/resource:/usr/share/infinity/resource infiniflow/infinity_builder:ubuntu22_clang20
sudo docker exec ${BUILDER_CONTAINER} bash -c "git config --global safe.directory \"*\" && cd /ragflow && ./build.sh --cpp"
./build.sh --go
if [[ -n "${BUILDER_CONTAINER}" ]]; then
sudo docker rm -f -v "${BUILDER_CONTAINER}"
fi
- name: Build ragflow:nightly
run: |
RUNNER_WORKSPACE_PREFIX=${RUNNER_WORKSPACE_PREFIX:-${HOME}}
RAGFLOW_IMAGE=infiniflow/ragflow:${GITHUB_RUN_ID}
echo "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> ${GITHUB_ENV}
sudo docker pull ubuntu:22.04
sudo docker pull ubuntu:24.04
sudo DOCKER_BUILDKIT=1 docker build --build-arg NEED_MIRROR=1 --build-arg HTTPS_PROXY=${HTTPS_PROXY} --build-arg HTTP_PROXY=${HTTP_PROXY} -f Dockerfile -t ${RAGFLOW_IMAGE} .
if [[ ${GITHUB_EVENT_NAME} == "schedule" ]]; then
export HTTP_API_TEST_LEVEL=p3
@@ -152,85 +156,297 @@ jobs:
echo "HTTP_API_TEST_LEVEL=${HTTP_API_TEST_LEVEL}" >> ${GITHUB_ENV}
echo "RAGFLOW_CONTAINER=${GITHUB_RUN_ID}-ragflow-cpu-1" >> ${GITHUB_ENV}
- name: Start ragflow:nightly
- name: Run unit test
run: |
uv sync --python 3.12 --group test --frozen
source .venv/bin/activate
which pytest || echo "pytest not in PATH"
echo "Start to run unit test"
python3 run_tests.py -i
- name: Prepare function test environment
working-directory: docker
run: |
# Determine runner number (default to 1 if not found)
RUNNER_NUM=$(sudo docker inspect $(hostname) --format '{{index .Config.Labels "com.docker.compose.container-number"}}' 2>/dev/null || true)
RUNNER_NUM=${RUNNER_NUM:-1}
RUNNER_NUM=$(sudo docker inspect $(hostname) --format '{{index .Config.Labels "com.docker.compose.container-number"}}' 2>/dev/null || true)
RUNNER_NUM=${RUNNER_NUM:-1}
# Compute port numbers using bash arithmetic
ES_PORT=$((1200 + RUNNER_NUM * 10))
OS_PORT=$((1201 + RUNNER_NUM * 10))
INFINITY_THRIFT_PORT=$((23817 + RUNNER_NUM * 10))
INFINITY_HTTP_PORT=$((23820 + RUNNER_NUM * 10))
INFINITY_PSQL_PORT=$((5432 + RUNNER_NUM * 10))
EXPOSE_MYSQL_PORT=$((5455 + RUNNER_NUM * 10))
MINIO_PORT=$((9000 + RUNNER_NUM * 10))
MINIO_CONSOLE_PORT=$((9001 + RUNNER_NUM * 10))
REDIS_PORT=$((6379 + RUNNER_NUM * 10))
TEI_PORT=$((6380 + RUNNER_NUM * 10))
KIBANA_PORT=$((6601 + RUNNER_NUM * 10))
SVR_HTTP_PORT=$((9380 + RUNNER_NUM * 10))
ADMIN_SVR_HTTP_PORT=$((9381 + RUNNER_NUM * 10))
SVR_MCP_PORT=$((9382 + RUNNER_NUM * 10))
SANDBOX_EXECUTOR_MANAGER_PORT=$((9385 + RUNNER_NUM * 10))
SVR_WEB_HTTP_PORT=$((80 + RUNNER_NUM * 10))
SVR_WEB_HTTPS_PORT=$((443 + RUNNER_NUM * 10))
ES_PORT=$((1200 + RUNNER_NUM * 10))
OS_PORT=$((1201 + RUNNER_NUM * 10))
INFINITY_THRIFT_PORT=$((23817 + RUNNER_NUM * 10))
INFINITY_HTTP_PORT=$((23820 + RUNNER_NUM * 10))
INFINITY_PSQL_PORT=$((5432 + RUNNER_NUM * 10))
EXPOSE_MYSQL_PORT=$((5455 + RUNNER_NUM * 10))
MINIO_PORT=$((9000 + RUNNER_NUM * 10))
MINIO_CONSOLE_PORT=$((9001 + RUNNER_NUM * 10))
REDIS_PORT=$((6379 + RUNNER_NUM * 10))
TEI_PORT=$((6380 + RUNNER_NUM * 10))
KIBANA_PORT=$((6601 + RUNNER_NUM * 10))
SVR_HTTP_PORT=$((9380 + RUNNER_NUM * 10))
ADMIN_SVR_HTTP_PORT=$((9381 + RUNNER_NUM * 10))
SVR_MCP_PORT=$((9382 + RUNNER_NUM * 10))
GO_HTTP_PORT=$((9384 + RUNNER_NUM * 10))
GO_ADMIN_PORT=$((9383 + RUNNER_NUM * 10))
SANDBOX_EXECUTOR_MANAGER_PORT=$((9385 + RUNNER_NUM * 10))
SVR_WEB_HTTP_PORT=$((80 + RUNNER_NUM * 10))
SVR_WEB_HTTPS_PORT=$((443 + RUNNER_NUM * 10))
# Persist computed ports into docker/.env so docker-compose uses the correct host bindings
echo "" >> docker/.env
echo -e "ES_PORT=${ES_PORT}" >> docker/.env
echo -e "OS_PORT=${OS_PORT}" >> docker/.env
echo -e "INFINITY_THRIFT_PORT=${INFINITY_THRIFT_PORT}" >> docker/.env
echo -e "INFINITY_HTTP_PORT=${INFINITY_HTTP_PORT}" >> docker/.env
echo -e "INFINITY_PSQL_PORT=${INFINITY_PSQL_PORT}" >> docker/.env
echo -e "EXPOSE_MYSQL_PORT=${EXPOSE_MYSQL_PORT}" >> docker/.env
echo -e "MINIO_PORT=${MINIO_PORT}" >> docker/.env
echo -e "MINIO_CONSOLE_PORT=${MINIO_CONSOLE_PORT}" >> docker/.env
echo -e "REDIS_PORT=${REDIS_PORT}" >> docker/.env
echo -e "TEI_PORT=${TEI_PORT}" >> docker/.env
echo -e "KIBANA_PORT=${KIBANA_PORT}" >> docker/.env
echo -e "SVR_HTTP_PORT=${SVR_HTTP_PORT}" >> docker/.env
echo -e "ADMIN_SVR_HTTP_PORT=${ADMIN_SVR_HTTP_PORT}" >> docker/.env
echo -e "SVR_MCP_PORT=${SVR_MCP_PORT}" >> docker/.env
echo -e "SANDBOX_EXECUTOR_MANAGER_PORT=${SANDBOX_EXECUTOR_MANAGER_PORT}" >> docker/.env
echo -e "SVR_WEB_HTTP_PORT=${SVR_WEB_HTTP_PORT}" >> docker/.env
echo -e "SVR_WEB_HTTPS_PORT=${SVR_WEB_HTTPS_PORT}" >> docker/.env
# Persist computed ports into .env so docker-compose uses the correct host bindings
echo "" >> .env
echo -e "ES_PORT=${ES_PORT}" >> .env
echo -e "OS_PORT=${OS_PORT}" >> .env
echo -e "INFINITY_THRIFT_PORT=${INFINITY_THRIFT_PORT}" >> .env
echo -e "INFINITY_HTTP_PORT=${INFINITY_HTTP_PORT}" >> .env
echo -e "INFINITY_PSQL_PORT=${INFINITY_PSQL_PORT}" >> .env
echo -e "EXPOSE_MYSQL_PORT=${EXPOSE_MYSQL_PORT}" >> .env
echo -e "MINIO_PORT=${MINIO_PORT}" >> .env
echo -e "MINIO_CONSOLE_PORT=${MINIO_CONSOLE_PORT}" >> .env
echo -e "REDIS_PORT=${REDIS_PORT}" >> .env
echo -e "TEI_PORT=${TEI_PORT}" >> .env
echo -e "KIBANA_PORT=${KIBANA_PORT}" >> .env
echo -e "SVR_HTTP_PORT=${SVR_HTTP_PORT}" >> .env
echo -e "ADMIN_SVR_HTTP_PORT=${ADMIN_SVR_HTTP_PORT}" >> .env
echo -e "SVR_MCP_PORT=${SVR_MCP_PORT}" >> .env
echo -e "GO_HTTP_PORT=${GO_HTTP_PORT}" >> .env
echo -e "GO_ADMIN_PORT=${GO_ADMIN_PORT}" >> .env
echo -e "SANDBOX_EXECUTOR_MANAGER_PORT=${SANDBOX_EXECUTOR_MANAGER_PORT}" >> .env
echo -e "SVR_WEB_HTTP_PORT=${SVR_WEB_HTTP_PORT}" >> .env
echo -e "SVR_WEB_HTTPS_PORT=${SVR_WEB_HTTPS_PORT}" >> .env
echo -e "COMPOSE_PROFILES=\${COMPOSE_PROFILES},tei-cpu" >> .env
echo -e "TEI_MODEL=BAAI/bge-small-en-v1.5" >> .env
echo -e "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> .env
echo "HOST_ADDRESS=http://host.docker.internal:${SVR_HTTP_PORT}" >> ${GITHUB_ENV}
echo -e "COMPOSE_PROFILES=\${COMPOSE_PROFILES},tei-cpu" >> docker/.env
echo -e "TEI_MODEL=BAAI/bge-small-en-v1.5" >> docker/.env
echo -e "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> docker/.env
echo "HOST_ADDRESS=http://host.docker.internal:${SVR_HTTP_PORT}" >> ${GITHUB_ENV}
# Patch entrypoint.sh for coverage
sed -i '/"\$PY" api\/ragflow_server.py \${INIT_SUPERUSER_ARGS} &/c\ echo "Ensuring coverage is installed..."\n "$PY" -m pip install coverage -i https://mirrors.aliyun.com/pypi/simple\n export COVERAGE_FILE=/ragflow/logs/.coverage\n echo "Starting ragflow_server with coverage..."\n "$PY" -m coverage run --source=./api/apps --omit="*/tests/*,*/migrations/*" -a api/ragflow_server.py ${INIT_SUPERUSER_ARGS} &' ./entrypoint.sh
cd ..
uv sync --python 3.12 --group test --frozen && uv pip install -e sdk/python
# Patch entrypoint.sh for coverage
sed -i '/"\$PY" api\/ragflow_server.py \${INIT_SUPERUSER_ARGS} &/c\ echo "Ensuring coverage is installed..."\n "$PY" -m pip install coverage\n export COVERAGE_FILE=/ragflow/logs/.coverage\n echo "Starting ragflow_server with coverage..."\n "$PY" -m coverage run --source=./api/apps --omit="*/tests/*,*/migrations/*" -a api/ragflow_server.py ${INIT_SUPERUSER_ARGS} &' docker/entrypoint.sh
- name: Start ragflow:nightly for Infinity
run: |
sed -i 's/^DOC_ENGINE=.*$/DOC_ENGINE=infinity/' docker/.env
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} up -d
- name: Run sdk tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
echo "Start to run test sdk on Infinity"
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-infinity-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-infinity-sdk.xml test/testcases/test_sdk_api 2>&1 | tee infinity_sdk_test.log
- name: Run web api tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_web_api/test_chunk_feedback 2>&1 | tee infinity_web_api_test.log
- name: Run http api tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 2>&1 | tee infinity_http_api_test.log
- name: RAGFlow CLI retrieval test Infinity
env:
PYTHONPATH: ${{ github.workspace }}
run: |
set -euo pipefail
source .venv/bin/activate
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
EMAIL="ci-${GITHUB_RUN_ID}@example.com"
PASS="ci-pass-${GITHUB_RUN_ID}"
DATASET="ci_dataset_${GITHUB_RUN_ID}"
CLI="python admin/client/ragflow_cli.py"
LOG_FILE="infinity_cli_test.log"
: > "${LOG_FILE}"
ERROR_RE='Traceback|ModuleNotFoundError|ImportError|Parse error|Bad response|Fail to|code:\\s*[1-9]'
run_cli() {
local logfile="$1"
shift
local allow_re=""
if [[ "${1:-}" == "--allow" ]]; then
allow_re="$2"
shift 2
fi
local cmd_display="$*"
echo "===== $(date -u +\"%Y-%m-%dT%H:%M:%SZ\") CMD: ${cmd_display} =====" | tee -a "${logfile}"
local tmp_log
tmp_log="$(mktemp)"
set +e
timeout 500s "$@" 2>&1 | tee "${tmp_log}"
local status=${PIPESTATUS[0]}
set -e
cat "${tmp_log}" >> "${logfile}"
if grep -qiE "${ERROR_RE}" "${tmp_log}"; then
if [[ -n "${allow_re}" ]] && grep -qiE "${allow_re}" "${tmp_log}"; then
echo "Allowed CLI error markers in ${logfile}"
rm -f "${tmp_log}"
return 0
fi
echo "Detected CLI error markers in ${logfile}"
rm -f "${tmp_log}"
exit 1
fi
rm -f "${tmp_log}"
return ${status}
}
set -a
source docker/.env
set +a
HOST_ADDRESS="http://host.docker.internal:${SVR_HTTP_PORT}"
USER_HOST="$(echo "${HOST_ADDRESS}" | sed -E 's#^https?://([^:/]+).*#\1#')"
USER_PORT="${SVR_HTTP_PORT}"
ADMIN_HOST="${USER_HOST}"
ADMIN_PORT="${ADMIN_SVR_HTTP_PORT}"
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
admin_ready=0
for i in $(seq 1 30); do
if run_cli "${LOG_FILE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "ping"; then
admin_ready=1
break
fi
sleep 1
done
if [[ "${admin_ready}" -ne 1 ]]; then
echo "Admin service did not become ready"
exit 1
fi
run_cli "${LOG_FILE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "show version"
ALLOW_USER_EXISTS_RE='already exists|already exist|duplicate|already.*registered|exist(s)?'
run_cli "${LOG_FILE}" --allow "${ALLOW_USER_EXISTS_RE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "create user '$EMAIL' '$PASS'"
user_ready=0
for i in $(seq 1 30); do
if run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "ping"; then
user_ready=1
break
fi
sleep 1
done
if [[ "${user_ready}" -ne 1 ]]; then
echo "User service did not become ready"
exit 1
fi
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "show version"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "create dataset '$DATASET' with embedding 'BAAI/bge-small-en-v1.5@Builtin' parser 'auto'"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "import 'test/benchmark/test_docs/Doc1.pdf,test/benchmark/test_docs/Doc2.pdf' into dataset '$DATASET'"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "parse dataset '$DATASET' sync"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "Benchmark 16 100 search 'what are these documents about' on datasets '$DATASET'"
- name: Stop ragflow to save coverage Infinity
if: ${{ !cancelled() }}
run: |
# Send SIGINT to ragflow_server.py to trigger coverage save
PID=$(sudo docker exec ${RAGFLOW_CONTAINER} ps aux | grep "ragflow_server.py" | grep -v grep | awk '{print $2}' | head -n 1)
if [ -n "$PID" ]; then
echo "Sending SIGINT to ragflow_server.py (PID: $PID)..."
sudo docker exec ${RAGFLOW_CONTAINER} kill -INT $PID
# Wait for process to exit and coverage file to be written
sleep 10
else
echo "ragflow_server.py not found!"
fi
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} stop
- name: Generate server coverage report Infinity
if: ${{ !cancelled() }}
run: |
# .coverage file should be in docker/ragflow-logs/.coverage
if [ -f docker/ragflow-logs/.coverage ]; then
echo "Found .coverage file"
cp docker/ragflow-logs/.coverage .coverage
source .venv/bin/activate
# Create .coveragerc to map container paths to host paths
echo "[paths]" > .coveragerc
echo "source =" >> .coveragerc
echo " ." >> .coveragerc
echo " /ragflow" >> .coveragerc
coverage xml -o coverage-infinity-server.xml
rm .coveragerc
else
echo ".coverage file not found!"
fi
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
if: ${{ !cancelled() }}
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: false
- name: Collect ragflow log Infinity
if: ${{ !cancelled() }}
run: |
if [ -d docker/ragflow-logs ]; then
cp -r docker/ragflow-logs ${ARTIFACTS_DIR}/ragflow-logs-infinity
echo "ragflow log" && tail -n 200 docker/ragflow-logs/ragflow_server.log || true
else
echo "No docker/ragflow-logs directory found; skipping log collection"
fi
sudo rm -rf docker/ragflow-logs || true
- name: Stop ragflow:nightly for Infinity
if: always() # always run this step even if previous steps failed
run: |
# Sometimes `docker compose down` fails due to hung containers, heavy load, etc. Such containers must be removed to release resources (for example, listening ports).
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v || true
sudo docker ps -a --filter "label=com.docker.compose.project=${GITHUB_RUN_ID}" -q | xargs -r sudo docker rm -f
- name: Start ragflow:nightly for Elasticsearch
run: |
sed -i 's/^DOC_ENGINE=.*$/DOC_ENGINE=elasticsearch/' docker/.env
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} up -d
uv sync --python 3.12 --group test --frozen && uv pip install -e sdk/python
- name: Run sdk tests against Elasticsearch
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
echo "Start to run test sdk on Elasticsearch"
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-infinity-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-es-sdk.xml test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
- name: Run web api tests against Elasticsearch
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_web_api 2>&1 | tee es_web_api_test.log
- name: Run http api tests against Elasticsearch
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 2>&1 | tee es_http_api_test.log
@@ -267,7 +483,7 @@ jobs:
local tmp_log
tmp_log="$(mktemp)"
set +e
timeout 180s "$@" 2>&1 | tee "${tmp_log}"
timeout 500s "$@" 2>&1 | tee "${tmp_log}"
local status=${PIPESTATUS[0]}
set -e
cat "${tmp_log}" >> "${logfile}"
@@ -295,8 +511,8 @@ jobs:
ADMIN_HOST="${USER_HOST}"
ADMIN_PORT="${ADMIN_SVR_HTTP_PORT}"
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null 2>&1; do
echo "Waiting for service to be available... (last exit code: $?)"
sleep 5
done
@@ -383,198 +599,7 @@ jobs:
fi
sudo rm -rf docker/ragflow-logs || true
- name: Stop ragflow:nightly
if: always() # always run this step even if previous steps failed
run: |
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} down -v || true
sudo docker ps -a --filter "label=com.docker.compose.project=${GITHUB_RUN_ID}" -q | xargs -r sudo docker rm -f
- name: Start ragflow:nightly
run: |
sed -i '1i DOC_ENGINE=infinity' docker/.env
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} up -d
- name: Run sdk tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} --junitxml=pytest-infinity-sdk.xml --cov=sdk/python/ragflow_sdk --cov-branch --cov-report=xml:coverage-infinity-sdk.xml test/testcases/test_sdk_api 2>&1 | tee infinity_sdk_test.log
- name: Run web api tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_web_api/test_api_app 2>&1 | tee infinity_web_api_test.log
- name: Run http api tests against Infinity
run: |
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 2>&1 | tee infinity_http_api_test.log
- name: RAGFlow CLI retrieval test Infinity
env:
PYTHONPATH: ${{ github.workspace }}
run: |
set -euo pipefail
source .venv/bin/activate
export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
EMAIL="ci-${GITHUB_RUN_ID}@example.com"
PASS="ci-pass-${GITHUB_RUN_ID}"
DATASET="ci_dataset_${GITHUB_RUN_ID}"
CLI="python admin/client/ragflow_cli.py"
LOG_FILE="infinity_cli_test.log"
: > "${LOG_FILE}"
ERROR_RE='Traceback|ModuleNotFoundError|ImportError|Parse error|Bad response|Fail to|code:\\s*[1-9]'
run_cli() {
local logfile="$1"
shift
local allow_re=""
if [[ "${1:-}" == "--allow" ]]; then
allow_re="$2"
shift 2
fi
local cmd_display="$*"
echo "===== $(date -u +\"%Y-%m-%dT%H:%M:%SZ\") CMD: ${cmd_display} =====" | tee -a "${logfile}"
local tmp_log
tmp_log="$(mktemp)"
set +e
timeout 180s "$@" 2>&1 | tee "${tmp_log}"
local status=${PIPESTATUS[0]}
set -e
cat "${tmp_log}" >> "${logfile}"
if grep -qiE "${ERROR_RE}" "${tmp_log}"; then
if [[ -n "${allow_re}" ]] && grep -qiE "${allow_re}" "${tmp_log}"; then
echo "Allowed CLI error markers in ${logfile}"
rm -f "${tmp_log}"
return 0
fi
echo "Detected CLI error markers in ${logfile}"
rm -f "${tmp_log}"
exit 1
fi
rm -f "${tmp_log}"
return ${status}
}
set -a
source docker/.env
set +a
HOST_ADDRESS="http://host.docker.internal:${SVR_HTTP_PORT}"
USER_HOST="$(echo "${HOST_ADDRESS}" | sed -E 's#^https?://([^:/]+).*#\1#')"
USER_PORT="${SVR_HTTP_PORT}"
ADMIN_HOST="${USER_HOST}"
ADMIN_PORT="${ADMIN_SVR_HTTP_PORT}"
until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
echo "Waiting for service to be available..."
sleep 5
done
admin_ready=0
for i in $(seq 1 30); do
if run_cli "${LOG_FILE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "ping"; then
admin_ready=1
break
fi
sleep 1
done
if [[ "${admin_ready}" -ne 1 ]]; then
echo "Admin service did not become ready"
exit 1
fi
run_cli "${LOG_FILE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "show version"
ALLOW_USER_EXISTS_RE='already exists|already exist|duplicate|already.*registered|exist(s)?'
run_cli "${LOG_FILE}" --allow "${ALLOW_USER_EXISTS_RE}" $CLI --type admin --host "$ADMIN_HOST" --port "$ADMIN_PORT" --username "admin@ragflow.io" --password "admin" command "create user '$EMAIL' '$PASS'"
user_ready=0
for i in $(seq 1 30); do
if run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "ping"; then
user_ready=1
break
fi
sleep 1
done
if [[ "${user_ready}" -ne 1 ]]; then
echo "User service did not become ready"
exit 1
fi
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "show version"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "create dataset '$DATASET' with embedding 'BAAI/bge-small-en-v1.5@Builtin' parser 'auto'"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "import 'test/benchmark/test_docs/Doc1.pdf,test/benchmark/test_docs/Doc2.pdf' into dataset '$DATASET'"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "parse dataset '$DATASET' sync"
run_cli "${LOG_FILE}" $CLI --type user --host "$USER_HOST" --port "$USER_PORT" --username "$EMAIL" --password "$PASS" command "Benchmark 16 100 search 'what are these documents about' on datasets '$DATASET'"
- name: Stop ragflow to save coverage Infinity
if: ${{ !cancelled() }}
run: |
# Send SIGINT to ragflow_server.py to trigger coverage save
PID=$(sudo docker exec ${RAGFLOW_CONTAINER} ps aux | grep "ragflow_server.py" | grep -v grep | awk '{print $2}' | head -n 1)
if [ -n "$PID" ]; then
echo "Sending SIGINT to ragflow_server.py (PID: $PID)..."
sudo docker exec ${RAGFLOW_CONTAINER} kill -INT $PID
# Wait for process to exit and coverage file to be written
sleep 10
else
echo "ragflow_server.py not found!"
fi
sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} stop
- name: Generate server coverage report Infinity
if: ${{ !cancelled() }}
run: |
# .coverage file should be in docker/ragflow-logs/.coverage
if [ -f docker/ragflow-logs/.coverage ]; then
echo "Found .coverage file"
cp docker/ragflow-logs/.coverage .coverage
source .venv/bin/activate
# Create .coveragerc to map container paths to host paths
echo "[paths]" > .coveragerc
echo "source =" >> .coveragerc
echo " ." >> .coveragerc
echo " /ragflow" >> .coveragerc
coverage xml -o coverage-infinity-server.xml
rm .coveragerc
else
echo ".coverage file not found!"
fi
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
if: ${{ !cancelled() }}
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: false
- name: Collect ragflow log
if: ${{ !cancelled() }}
run: |
if [ -d docker/ragflow-logs ]; then
cp -r docker/ragflow-logs ${ARTIFACTS_DIR}/ragflow-logs-infinity
echo "ragflow log" && tail -n 200 docker/ragflow-logs/ragflow_server.log || true
else
echo "No docker/ragflow-logs directory found; skipping log collection"
fi
sudo rm -rf docker/ragflow-logs || true
- name: Stop ragflow:nightly
- name: Stop ragflow:nightly for Elasticsearch
if: always() # always run this step even if previous steps failed
run: |
# Sometimes `docker compose down` fails due to hung containers, heavy load, etc. Such containers must be removed to release resources (for example, listening ports).
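
The readiness loops in the workflow above poll `/v1/system/ping` with `until ... do sleep 5; done` and have no upper bound, so a service that never starts holds the job until the runner times out. A minimal sketch of a bounded variant follows. The helper name `wait_for` and its attempt and delay parameters are assumptions for illustration and are not part of the workflow.

```bash
# Hypothetical helper (not in the workflow): run a command until it succeeds,
# giving up after max_attempts tries and pausing delay seconds between tries.
wait_for() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # Discard the probe's output; only its exit status matters here.
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    echo "Waiting for service to be available... (attempt ${attempt}/${max_attempts})"
    # Skip the pause after the final failed attempt.
    if [ "${attempt}" -lt "${max_attempts}" ]; then
      sleep "${delay}"
    fi
    attempt=$((attempt + 1))
  done
  echo "Service did not become ready after ${max_attempts} attempts" >&2
  return 1
}
```

Under these assumptions, a step could call `wait_for 60 5 sudo docker exec "${RAGFLOW_CONTAINER}" curl -s --connect-timeout 5 "${HOST_ADDRESS}/v1/system/ping"` and fail fast with a clear message instead of waiting indefinitely.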

.gitignore vendored

@@ -7,7 +7,7 @@ hudet/
cv/
layout_app.py
api/flask_session
venv/
# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock
@@ -205,9 +205,29 @@ ragflow_cli.egg-info
backup
*huqie.txt
.hypothesis
# Added by cargo
/target
# Do not include in PR (local dev / build artifacts)
ragflow.egg-info/
uv-aarch64*.tar.gz
uv-aarch64-unknown-linux-gnu.tar.gz
docker/launch_backend_service_windows.sh
# C++ build directories
internal/cpp/build/
internal/cpp/cmake-build-release/
internal/cpp/cmake-build-debug/
# Trae IDE config
.trae/
# Go server build output
bin/*
!bin/.gitkeep


@@ -5,14 +5,16 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It's a full-stack application with:
- Python backend (Flask-based API server)
- React/TypeScript frontend (built with UmiJS)
- React/TypeScript frontend (built with vitejs)
- Microservices architecture with Docker deployment
- Multiple data stores (MySQL, Elasticsearch/Infinity, Redis, MinIO)
## Architecture
### Backend (`/api/`)
- **Main Server**: `api/ragflow_server.py` - Flask application entry point
- **Apps**: Modular Flask blueprints in `api/apps/` for different functionalities:
- `kb_app.py` - Knowledge base management
@@ -24,25 +26,29 @@ RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on d
- **Models**: Database models in `api/db/db_models.py`
### Core Processing (`/rag/`)
- **Document Processing**: `deepdoc/` - PDF parsing, OCR, layout analysis
- **LLM Integration**: `rag/llm/` - Model abstractions for chat, embedding, reranking
- **RAG Pipeline**: `rag/flow/` - Chunking, parsing, tokenization
- **Graph RAG**: `rag/graphrag/` - Knowledge graph construction and querying
### Agent System (`/agent/`)
- **Components**: Modular workflow components (LLM, retrieval, categorize, etc.)
- **Templates**: Pre-built agent workflows in `agent/templates/`
- **Tools**: External API integrations (Tavily, Wikipedia, SQL execution, etc.)
### Frontend (`/web/`)
- React/TypeScript with UmiJS framework
- Ant Design + shadcn/ui components
- React/TypeScript with vitejs framework
- shadcn/ui components
- State management with Zustand
- Tailwind CSS for styling
## Common Development Commands
### Backend Development
```bash
# Install Python dependencies
uv sync --python 3.12 --all-extras
@@ -66,6 +72,7 @@ ruff format
```
### Frontend Development
```bash
cd web
npm install
@@ -76,6 +83,7 @@ npm run test # Jest tests
```
### Docker Development
```bash
# Full stack with Docker
cd docker
@@ -104,6 +112,7 @@ docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly
## Database Engines
RAGFlow supports switching between Elasticsearch (default) and Infinity:
- Set `DOC_ENGINE=infinity` in `docker/.env` to use Infinity
- Requires container restart: `docker compose down -v && docker compose up -d`
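
The engine switch described above can be sketched as a small shell helper. This is an illustration only: the function name `set_doc_engine` and its optional second argument are assumptions, not something the repository provides. It rewrites an existing `DOC_ENGINE=` line in the env file, or appends one if none exists.

```bash
# Hypothetical helper: set DOC_ENGINE in an env file (default: docker/.env).
set_doc_engine() {
  local engine="$1"
  local env_file="${2:-docker/.env}"
  local tmp
  if grep -q '^DOC_ENGINE=' "${env_file}"; then
    # Rewrite the existing assignment via a temp file (avoids relying on sed -i).
    tmp="$(mktemp)"
    sed "s/^DOC_ENGINE=.*\$/DOC_ENGINE=${engine}/" "${env_file}" > "${tmp}"
    cat "${tmp}" > "${env_file}"
    rm -f "${tmp}"
  else
    # No assignment yet: append one.
    echo "DOC_ENGINE=${engine}" >> "${env_file}"
  fi
}
```

After something like `set_doc_engine infinity`, the containers still need the restart shown in the list above. Note that `docker compose down -v` also removes the project's named volumes, so existing data in them is discarded.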
@@ -114,3 +123,12 @@ RAGFlow supports switching between Elasticsearch (default) and Infinity:
- Docker & Docker Compose
- uv package manager
- 16GB+ RAM, 50GB+ disk space
1. Think before acting. Read existing files before writing code.
2. Be concise in output but thorough in reasoning.
3. Prefer editing over rewriting whole files.
4. Do not re-read files you have already read.
5. Test your code before declaring done.
6. No sycophantic openers or closing fluff.
7. Keep solutions simple and direct.
8. User instructions always override this file.


@@ -7,7 +7,7 @@ ARG NEED_MIRROR=0
WORKDIR /ragflow
# Copy models downloaded via download_deps.py
# copy models downloaded via download_deps.py
RUN mkdir -p /ragflow/rag/res/deepdoc /root/.ragflow
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
tar --exclude='.*' -cf - \
@@ -19,49 +19,49 @@ RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
cp -r /deps/nltk_data /root/ && \
cp /deps/tika-server-standard-3.2.3.jar /deps/tika-server-standard-3.2.3.jar.md5 /ragflow/ && \
cp /deps/tika-server-standard-3.3.0.jar /deps/tika-server-standard-3.3.0.jar.md5 /ragflow/ && \
cp /deps/cl100k_base.tiktoken /ragflow/9b5ad71b2ce5302211f9c61530b329a4922fc6a4
ENV TIKA_SERVER_JAR="file:///ragflow/tika-server-standard-3.2.3.jar"
ENV TIKA_SERVER_JAR="file:///ragflow/tika-server-standard-3.3.0.jar"
ENV DEBIAN_FRONTEND=noninteractive
# Setup apt
# Python package and implicit dependencies:
# opencv-python: libglib2.0-0 libglx-mesa0 libgl1
# python-pptx: default-jdk tika-server-standard-3.2.3.jar
# python-pptx: default-jdk tika-server-standard-3.3.0.jar
# selenium: libatk-bridge2.0-0 chrome-linux64-121-0-6167-85
# Building C extensions: libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
apt update && \
apt --no-install-recommends install -y ca-certificates; \
if [ "$NEED_MIRROR" == "1" ]; then \
sed -i 's|http://archive.ubuntu.com/ubuntu|https://mirrors.tuna.tsinghua.edu.cn/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
sed -i 's|http://security.ubuntu.com/ubuntu|https://mirrors.tuna.tsinghua.edu.cn/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
sed -i 's|http://archive.ubuntu.com/ubuntu|https://mirrors.aliyun.com/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
sed -i 's|http://security.ubuntu.com/ubuntu|https://mirrors.aliyun.com/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
fi; \
rm -f /etc/apt/apt.conf.d/docker-clean && \
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache && \
chmod 1777 /tmp && \
apt update && \
apt install -y libglib2.0-0 libglx-mesa0 libgl1 && \
apt install -y pkg-config libicu-dev libgdiplus && \
apt install -y default-jdk && \
apt install -y libatk-bridge2.0-0 && \
apt install -y libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev && \
apt install -y libjemalloc-dev && \
apt install -y gnupg unzip curl wget git vim less && \
apt install -y ghostscript && \
apt install -y pandoc && \
apt install -y texlive && \
apt install -y fonts-freefont-ttf fonts-noto-cjk && \
apt install -y postgresql-client
apt install -y \
build-essential libglib2.0-0 libglx-mesa0 libgl1 pkg-config libicu-dev libgdiplus default-jdk libatk-bridge2.0-0 libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev libjemalloc-dev gnupg unzip curl wget git vim less ghostscript pandoc texlive texlive-latex-extra texlive-xetex texlive-lang-chinese fonts-freefont-ttf fonts-noto-cjk postgresql-client
# Download resource from GitHub to /usr/share/infinity
RUN mkdir -p /usr/share/infinity/resource && \
if [ "$NEED_MIRROR" == "1" ]; then \
git clone --depth 1 --single-branch https://gitee.com/infiniflow/resource /tmp/resource; \
else \
git clone --depth 1 --single-branch https://github.com/infiniflow/resource.git /tmp/resource; \
fi && \
cp -r /tmp/resource/* /usr/share/infinity/resource && \
rm -rf /tmp/resource
ARG NGINX_VERSION=1.29.5-1~noble
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
mkdir -p /etc/apt/keyrings && \
curl -fsSL https://nginx.org/keys/nginx_signing.key | gpg --dearmor -o /etc/apt/keyrings/nginx-archive-keyring.gpg && \
curl --retry 5 --retry-delay 2 --retry-all-errors -fsSL https://nginx.org/keys/nginx_signing.key | gpg --dearmor -o /etc/apt/keyrings/nginx-archive-keyring.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nginx-archive-keyring.gpg] https://nginx.org/packages/mainline/ubuntu/ noble nginx" > /etc/apt/sources.list.d/nginx.list && \
apt update && \
apt install -y nginx=${NGINX_VERSION} && \
apt -o Acquire::Retries=5 update && \
apt -o Acquire::Retries=5 install -y nginx=${NGINX_VERSION} && \
apt-mark hold nginx
# Install uv
@@ -70,7 +70,7 @@ RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps
mkdir -p /etc/uv && \
echo 'python-install-mirror = "https://registry.npmmirror.com/-/binary/python-build-standalone/"' > /etc/uv/uv.toml && \
echo '[[index]]' >> /etc/uv/uv.toml && \
echo 'url = "https://pypi.tuna.tsinghua.edu.cn/simple"' >> /etc/uv/uv.toml && \
echo 'url = "https://mirrors.aliyun.com/pypi/simple"' >> /etc/uv/uv.toml && \
echo 'default = true' >> /etc/uv/uv.toml; \
fi; \
arch="$(uname -m)"; \
@@ -80,33 +80,19 @@ RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps
&& rm -rf "uv-${uv_arch}-unknown-linux-gnu" \
&& uv python install 3.12
ENV PYTHONDONTWRITEBYTECODE=1 DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
ENV PYTHONDONTWRITEBYTECODE=1 DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 \
UV_HTTP_TIMEOUT=200 \
UV_HTTP_RETRIES=3
ENV PATH=/root/.local/bin:$PATH
# nodejs 12.22 on Ubuntu 22.04 is too old
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
apt purge -y nodejs npm cargo && \
apt purge -y nodejs npm && \
apt autoremove -y && \
apt update && \
apt install -y nodejs
# A modern version of cargo is needed for the latest version of the Rust compiler.
RUN apt update && apt install -y curl build-essential \
&& if [ "$NEED_MIRROR" == "1" ]; then \
# Use TUNA mirrors for rustup/rust dist files \
export RUSTUP_DIST_SERVER="https://mirrors.tuna.tsinghua.edu.cn/rustup"; \
export RUSTUP_UPDATE_ROOT="https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup"; \
echo "Using TUNA mirrors for Rustup."; \
fi; \
# Force curl to use HTTP/1.1 \
curl --proto '=https' --tlsv1.2 --http1.1 -sSf https://sh.rustup.rs | bash -s -- -y --profile minimal \
&& echo 'export PATH="/root/.cargo/bin:${PATH}"' >> /root/.bashrc
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo --version && rustc --version
# Add mssql ODBC driver
# macOS ARM64 environment, install msodbcsql18.
# general x86_64 environment, install msodbcsql17.
@@ -157,9 +143,9 @@ COPY pyproject.toml uv.lock ./
# uv records the index URL in uv.lock but doesn't fail over among multiple indexes
RUN --mount=type=cache,id=ragflow_uv,target=/root/.cache/uv,sharing=locked \
if [ "$NEED_MIRROR" == "1" ]; then \
sed -i 's|pypi.org|pypi.tuna.tsinghua.edu.cn|g' uv.lock; \
sed -i 's|pypi.org|mirrors.aliyun.com/pypi|g' uv.lock; \
else \
sed -i 's|pypi.tuna.tsinghua.edu.cn|pypi.org|g' uv.lock; \
sed -i 's|mirrors.aliyun.com/pypi|pypi.org|g' uv.lock; \
fi; \
uv sync --python 3.12 --frozen && \
# Ensure pip is available in the venv for runtime package installation (fixes #12651)
@@ -168,8 +154,8 @@ RUN --mount=type=cache,id=ragflow_uv,target=/root/.cache/uv,sharing=locked \
COPY web web
COPY docs docs
RUN --mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
export NODE_OPTIONS="--max-old-space-size=4096" && \
cd web && npm install && npm run build
cd web && NODE_OPTIONS="--max-old-space-size=8192" npm install && \
NODE_OPTIONS="--max-old-space-size=8192" VITE_BUILD_SOURCEMAP=false VITE_MINIFY=esbuild npm run build
COPY .git /ragflow/.git
@@ -202,11 +188,19 @@ COPY pyproject.toml uv.lock ./
COPY mcp mcp
COPY common common
COPY memory memory
COPY bin bin
COPY docker/service_conf.yaml.template ./conf/service_conf.yaml.template
COPY docker/entrypoint.sh ./
RUN chmod +x ./entrypoint*.sh
# Copy nginx configuration for frontend serving
COPY docker/nginx/ragflow.conf.golang docker/nginx/ragflow.conf.python docker/nginx/ragflow.conf.hybrid docker/nginx/nginx.conf docker/nginx/proxy.conf /etc/nginx/
RUN mv /etc/nginx/ragflow.conf.golang /etc/nginx/conf.d/ragflow.conf.golang && \
mv /etc/nginx/ragflow.conf.python /etc/nginx/conf.d/ragflow.conf.python && \
mv /etc/nginx/ragflow.conf.hybrid /etc/nginx/conf.d/ragflow.conf.hybrid && \
rm -f /etc/nginx/sites-enabled/default
# Copy compiled web pages
COPY --from=builder /ragflow/web/dist /ragflow/web/dist


@@ -3,7 +3,7 @@
FROM scratch
# Copy resources downloaded via download_deps.py
COPY chromedriver-linux64-121-0-6167-85 chrome-linux64-121-0-6167-85 cl100k_base.tiktoken libssl1.1_1.1.1f-1ubuntu2_amd64.deb libssl1.1_1.1.1f-1ubuntu2_arm64.deb tika-server-standard-3.2.3.jar tika-server-standard-3.2.3.jar.md5 libssl*.deb uv-x86_64-unknown-linux-gnu.tar.gz uv-aarch64-unknown-linux-gnu.tar.gz /
COPY chromedriver-linux64-121-0-6167-85 chrome-linux64-121-0-6167-85 cl100k_base.tiktoken libssl1.1_1.1.1f-1ubuntu2_amd64.deb libssl1.1_1.1.1f-1ubuntu2_arm64.deb tika-server-standard-3.3.0.jar tika-server-standard-3.3.0.jar.md5 libssl*.deb uv-x86_64-unknown-linux-gnu.tar.gz uv-aarch64-unknown-linux-gnu.tar.gz /
COPY nltk_data /nltk_data


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -76,7 +79,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -85,6 +88,7 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
## 🔥 Latest Updates
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill) — Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports 'Memory' for AI agent.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, Google Drive.
@@ -188,12 +192,12 @@ releases! 🌟
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.24.0` edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from `v0.24.0`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
> The command below downloads the `v0.25.0` edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from `v0.25.0`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the **entrypoint.sh** file in the code matches the Docker image version.
@@ -325,7 +329,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
@@ -389,8 +393,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)

README_ar.md Normal file

@@ -0,0 +1,414 @@
<div align="center">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
<p align="center">
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DBEDFA"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
<a href="https://deepwiki.com/infiniflow/ragflow">
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Document</a> |
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/ragflow-octoverse.png" width="1200"/>
</div>
<div align="center">
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
<details open>
<summary><b>📕 Table of Contents</b></summary>
- 💡 [What is RAGFlow?](#-what-is-ragflow)
- 🎮 [Demo](#-demo)
- 📌 [Latest Updates](#-latest-updates)
- 🌟 [Key Features](#-key-features)
- 🔎 [System Architecture](#-system-architecture)
- 🎬 [Get Started](#-get-started)
- 🔧 [Configurations](#-configurations)
- 🔧 [Build a Docker image](#-build-a-docker-image)
- 🔨 [Launch service from source for development](#-launch-service-from-source-for-development)
- 📚 [Documentation](#-documentation)
- 📜 [Roadmap](#-roadmap)
- 🏄 [Community](#-community)
- 🙌 [Contributing](#-contributing)
</details>
## 💡 What is RAGFlow?
[RAGFlow](https://ragflow.io/) is a leading open-source Retrieval-Augmented Generation (RAG) engine that combines cutting-edge RAG techniques with Agent capabilities to build an advanced context layer for LLMs. It offers a streamlined RAG workflow that adapts to enterprises of any size. Powered by a [unified context engine](https://ragflow.io/basics/what-is-agent-context-engine) and pre-built agent templates, RAGFlow lets developers turn complex data into high-fidelity, production-ready AI systems efficiently and reliably.
## 🎮 Demo
Try the demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>
## 🔥 Latest Updates
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports "Memory" for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
- 2025-10-23 Supports MinerU and Docling as document parsing methods.
- 2025-10-15 Supports an orchestrable ingestion pipeline.
- 2025-08-08 Supports OpenAI's latest GPT-5 series models.
- 2025-08-01 Supports agentic workflow and MCP.
- 2025-05-23 Adds a Python/JavaScript code executor component to Agent.
- 2025-05-05 Supports cross-language query.
- 2025-03-19 Supports using a multi-modal model to make sense of images within PDF or DOCX files.
## 🎉 Stay Tuned
⭐️ Star our repository to stay up to date with exciting new features and improvements! Get instant notifications for new releases! 🌟
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Key Features
### 🍭 **"Quality in, quality out"**
- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a data haystack" of literally unlimited tokens.
### 🍱 **Template-based chunking**
- Intelligent and explainable.
- Plenty of template options to choose from.
### 🌱 **Grounded citations with reduced hallucinations**
- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.
### 🍔 **Compatibility with heterogeneous data sources**
- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.
### 🛀 **Automated and effortless RAG workflow**
- Streamlined RAG orchestration that caters to both individuals and large businesses.
- Configurable LLMs as well as embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
## 🔎 System Architecture
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/31b0dd6f-ca4f-445a-9457-70cb44a381b2" width="1000"/>
</div>
## 🎬 Get Started
### 📝 Prerequisites
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use the code executor (sandbox) feature of RAGFlow.
> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Start up the server
1. Ensure `vm.max_map_count` >= 262144:
> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> Reset `vm.max_map_count` to a value of at least 262144 if it is lower.
>
> ```bash
> # In this case, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change will be reset after a system reboot. To make it permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
>
> ```bash
> vm.max_map_count=262144
> ```
>
2. Clone the repo:
```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
3. Start up the server using the pre-built Docker images:
> [!CAUTION]
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.25.0` edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the **entrypoint.sh** file in the code matches the Docker image version.
# Use CPU for DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate DeepDoc tasks:
# sed -i '1i DEVICE=gpu' .env
# docker compose -f docker-compose.yml up -d
```
> Note: Prior to `v0.22.0`, we provided both images with embedding models and slim images without embedding models. Details are as follows:
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|-------------------|-----------------|-----------------------|----------------|
| v0.21.1 | &approx;9 | ✔️ | Stable release |
| v0.21.1-slim | &approx;2 | ❌ | Stable release |
> Starting with `v0.22.0`, we ship only the slim edition and no longer append the **-slim** suffix to the image tag.
4. Check the server status after the server is up and running:
```bash
$ docker logs -f docker-ragflow-cpu-1
```
_The following output confirms a successful launch of the system:_
```bash
____ ___ ______ ______ __
/ __ \ / | / ____// ____// /____ _ __
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
* Running on all addresses (0.0.0.0)
```
> If you skip this confirmation step and log in to RAGFlow directly, your browser may report a `network abnormal` error because, at that moment, RAGFlow may not be fully initialized.
>
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**without** a port number), because the default HTTP serving port `80` can be omitted.
>
6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.
>
_The show is on!_
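Step 1 above asks for a persistent `vm.max_map_count` entry in **/etc/sysctl.conf**. A minimal sketch of an idempotent way to write it is shown below. The helper name `persist_sysctl` is hypothetical and not part of RAGFlow; the file path is a parameter so the function can be tried on a scratch file before touching the real one.

```bash
#!/usr/bin/env bash
# Persist a sysctl key in a config file: update the line if the key is
# already assigned, append a new line otherwise.
# persist_sysctl is a hypothetical helper, not a RAGFlow command.
persist_sysctl() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  if grep -qE "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    # Replace the existing assignment in place (portable sed -i form).
    sed -i.bak -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    rm -f "${file}.bak"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run as root, `persist_sysctl /etc/sysctl.conf vm.max_map_count 262144` followed by `sysctl -p` would apply the setting; running it again leaves a single entry rather than a duplicate.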
## 🔧 Configurations
When it comes to system configurations, you will need to manage the following files:
- [.env](./docker/.env): Keeps the fundamental setups for the system, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services. The environment variables in this file are populated automatically when the Docker container starts. Any environment variables set within the Docker container are available for use, allowing you to customize service behavior based on the deployment environment.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.
> The [./docker/README](./docker/README.md) file provides a detailed description of the environment settings and service configurations that can be used as `${ENV_VARS}` in the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.
To update the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.
Updates to the above configurations require a restart of all containers to take effect:
> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
### Switch doc engine from Elasticsearch to Infinity
RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:
1. Stop all running containers:
```bash
$ docker compose -f docker/docker-compose.yml down -v
```
> [!WARNING]
> `-v` will delete the Docker container volumes, and the existing data will be cleared.
2. Set `DOC_ENGINE` in **docker/.env** to `infinity`.
3. Start the containers:
```bash
$ docker compose -f docker-compose.yml up -d
```
> [!WARNING]
> Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
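Before restarting the containers in step 3, it can help to confirm what `DOC_ENGINE` is actually set to. The following is a minimal sketch, assuming **docker/.env** holds plain `KEY=value` lines; the helper name `env_value` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the value assigned to KEY in an env-style file.
# If the key is assigned more than once, the last assignment is printed.
# Assumes plain KEY=value lines; env_value is a hypothetical helper.
env_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}
```

For example, `[ "$(env_value docker/.env DOC_ENGINE)" = "infinity" ] || echo "DOC_ENGINE is not set to infinity"` would flag a missed edit before the containers come back up.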
## 🔧 Build a Docker image
This image is approximately 2 GB in size and relies on external LLM and embedding services.
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
Or if you are behind a proxy, you can pass proxy arguments:
```bash
docker build --platform linux/amd64 \
--build-arg http_proxy=http://YOUR_PROXY:PORT \
--build-arg https_proxy=http://YOUR_PROXY:PORT \
-f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 Launch service from source for development
1. Install `uv` and `pre-commit`, or skip this step if they are already installed:
```bash
pipx install uv pre-commit
```
2. Clone the source code and install Python dependencies:
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
```bash
docker compose -f docker/docker-compose-base.yml up -d
```
Add the following line to `/etc/hosts` to resolve all hosts specified in **docker/.env** to `127.0.0.1`:
```
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
```
4. If you cannot access HuggingFace, set the `HF_ENDPOINT` environment variable to use a mirror site:
```bash
export HF_ENDPOINT=https://hf-mirror.com
```
5. If your operating system does not have jemalloc, install it as follows:
```bash
# Ubuntu
sudo apt-get install libjemalloc-dev
# CentOS
sudo yum install jemalloc
# OpenSUSE
sudo zypper install jemalloc
# macOS
brew install jemalloc
```
6. Launch the backend service:
```bash
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
```
7. Install frontend dependencies:
```bash
cd web
npm install
```
8. Launch the frontend service:
```bash
npm run dev
```
_The following output confirms a successful launch of the system:_
![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187)
9. Stop the RAGFlow frontend and backend services after development is complete:
```bash
pkill -f "ragflow_server.py|task_executor.py"
```
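Step 3 above appends a line to `/etc/hosts`; repeating the setup would add the same entry again. A minimal sketch of an idempotent append follows. The helper name `add_line_once` is hypothetical, and the file path is a parameter so it can be tested on a scratch file.

```bash
#!/usr/bin/env bash
# Append LINE to FILE only if an identical line is not already present.
# add_line_once is a hypothetical helper, not a RAGFlow command.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Run as root, `add_line_once /etc/hosts '127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager'` would add the entry on the first call and do nothing on later calls.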
## 📚 Documentation
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
See the [RAGFlow Roadmap 2026](https://github.com/infiniflow/ragflow/issues/12241)
## 🏄 Community
- [Discord](https://discord.gg/NjYzJD3GM3)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
## 🙌 Contributing
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community.
If you would like to take part, review our [Contribution Guidelines](https://ragflow.io/docs/dev/contributing) first.

405
README_fr.md Normal file

@@ -0,0 +1,405 @@
<div align="center">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
<p align="center">
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DBEDFA"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="suivre sur X(Twitter)">
</a>
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Badge statique" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Dernière%20version" alt="Dernière version">
</a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="licence">
</a>
<a href="https://deepwiki.com/infiniflow/ragflow">
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Documentation</a> |
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://cloud.ragflow.io">Démo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/ragflow-octoverse.png" width="1200"/>
</div>
<div align="center">
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
<details open>
<summary><b>📕 Table of Contents</b></summary>
- 💡 [What is RAGFlow?](#-quest-ce-que-ragflow)
- 🎮 [Demo](#-démo)
- 📌 [Latest Updates](#-dernières-mises-à-jour)
- 🌟 [Key Features](#-fonctionnalités-clés)
- 🔎 [System Architecture](#-architecture-du-système)
- 🎬 [Get Started](#-démarrage)
- 🔧 [Configurations](#-configurations)
- 🔧 [Build a Docker image](#-construire-une-image-docker)
- 🔨 [Launch service from source for development](#-lancer-le-service-depuis-les-sources-pour-le-développement)
- 📚 [Documentation](#-documentation)
- 📜 [Roadmap](#-feuille-de-route)
- 🏄 [Community](#-communauté)
- 🙌 [Contributing](#-contribuer)
## 💡 What is RAGFlow?
[RAGFlow](https://ragflow.io/) is a leading open-source [RAG](https://ragflow.io/basics/what-is-rag) (Retrieval-Augmented Generation) engine that fuses cutting-edge RAG technologies with Agent capabilities to create a superior context layer for LLMs. It offers a streamlined RAG workflow adaptable to enterprises of any size. Powered by a converged [context engine](https://ragflow.io/basics/what-is-agent-context-engine) and pre-built agent templates, RAGFlow enables developers to transform complex data into high-fidelity, production-ready AI systems with exceptional efficiency and precision.
## 🎮 Demo
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>
## 🔥 Latest Updates
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports "Memory" for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
- 2025-10-23 Supports MinerU and Docling as document parsing methods.
- 2025-10-15 Supports an orchestrable ingestion pipeline.
- 2025-08-08 Supports OpenAI's latest GPT-5 series models.
- 2025-08-01 Supports agentic workflow and MCP.
- 2025-05-23 Adds a Python/JavaScript code executor component to Agent.
- 2025-05-05 Supports cross-language query.
- 2025-03-19 Supports using a multi-modal model to analyze images within PDF or DOCX files.
## 🎉 Stay Tuned
⭐️ Star our repository to stay up to date with exciting new features and improvements! Get instant notifications for new releases! 🌟
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Key Features
### 🍭 **"Quality in, quality out"**
- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a data haystack" of literally unlimited tokens.
### 🍱 **Template-based chunking**
- Intelligent and explainable.
- Plenty of template options available.
### 🌱 **Grounded citations with reduced hallucinations**
- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.
### 🍔 **Compatibility with heterogeneous data sources**
- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.
### 🛀 **Automated and effortless RAG workflow**
- Streamlined RAG orchestration that caters to both individuals and large businesses.
- Configurable LLMs and embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
## 🔎 System Architecture
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/31b0dd6f-ca4f-445a-9457-70cb44a381b2" width="1000"/>
</div>
## 🎬 Get Started
### 📝 Prerequisites
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use the code executor (sandbox) feature of RAGFlow.
> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Start up the server
1. Ensure `vm.max_map_count` >= 262144:
> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> Reset `vm.max_map_count` to a value of at least 262144 if it is lower.
>
> ```bash
> # In this case, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change will be reset after a system reboot. To make it permanent, add or update the `vm.max_map_count` value in **/etc/sysctl.conf**:
>
> ```bash
> vm.max_map_count=262144
> ```
>
2. Clone the repo:
```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
3. Start up the server using the pre-built Docker images:
> [!CAUTION]
> All Docker images are built for x86 platforms. We don't currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.25.0` edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before using `docker compose` to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the **entrypoint.sh** file in the code matches the Docker image version.
# Use CPU for DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate DeepDoc tasks:
# sed -i '1i DEVICE=gpu' .env
# docker compose -f docker-compose.yml up -d
```
> Note: Prior to `v0.22.0`, we provided both images with embedding models and slim images without embedding models. Details are as follows:
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|-------------------|-----------------|-----------------------|----------------|
| v0.21.1 | &approx;9 | ✔️ | Stable release |
| v0.21.1-slim | &approx;2 | ❌ | Stable release |
> Starting with `v0.22.0`, we ship only the slim edition and no longer append the **-slim** suffix to the image tag.
4. Check the server status after the server is up and running:
```bash
$ docker logs -f docker-ragflow-cpu-1
```
_The following output confirms a successful launch of the system:_
```bash
____ ___ ______ ______ __
/ __ \ / | / ____// ____// /____ _ __
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
* Running on all addresses (0.0.0.0)
```
> If you skip this confirmation step and log in to RAGFlow directly, your browser may report a `network abnormal` error because, at that moment, your RAGFlow may not be fully initialized.
>
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**without** a port number), because the default HTTP serving port `80` can be omitted when using the default configurations.
>
6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM factory in `user_default_llm` and update the `API_KEY` field with the corresponding API key.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.
>
_The show is on!_
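Step 4 above relies on spotting the `Running on all addresses` line in the container log by eye. A minimal sketch of automating that check is shown below; the helper name `wait_for_ready` is hypothetical, and it only assumes that the readiness line appears in the log stream as in the sample output.

```bash
#!/usr/bin/env bash
# Read log lines from stdin and succeed as soon as the readiness marker
# appears. Returns 1 if the stream ends without the marker.
# wait_for_ready is a hypothetical helper, not a RAGFlow command.
wait_for_ready() {
  local marker="${1:-Running on all addresses}"
  local line
  while IFS= read -r line; do
    case "$line" in
      *"$marker"*) return 0 ;;
    esac
  done
  return 1
}
```

A possible use is `docker logs -f docker-ragflow-cpu-1 2>&1 | wait_for_ready && echo "RAGFlow is up"`, which blocks until the marker shows up and only then suggests logging in.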
## 🔧 Configurations
When it comes to system configurations, you will need to manage the following files:
- [.env](./docker/.env): Keeps the fundamental setups for the system, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services. The environment variables in this file are populated automatically when the Docker container starts. Any environment variables set within the Docker container are available for use, allowing you to customize service behavior based on the deployment environment.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.
> The [./docker/README](./docker/README.md) file provides a detailed description of the environment settings and service configurations that can be used as `${ENV_VARS}` in the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.
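To make the `${ENV_VARS}` idea concrete, here is a minimal sketch of how a template line with a shell-style placeholder could be rendered from the environment. It is an illustration only, not RAGFlow's actual startup logic: the helper name `render_line`, the sample `password:` line, and the `changeme` default are all hypothetical.

```bash
#!/usr/bin/env bash
# Expand shell-style placeholders such as ${VAR} or ${VAR:-default} in a
# single template line, using the current environment.
# Uses eval, so it is only suitable for trusted template content.
# render_line is a hypothetical helper for illustration.
render_line() {
  eval "printf '%s\n' \"$1\""
}

# Example template line (hypothetical): falls back to "changeme"
# when MYSQL_PASSWORD is unset.
sample_line='password: ${MYSQL_PASSWORD:-changeme}'
```

With `MYSQL_PASSWORD` unset, `render_line "$sample_line"` prints the line with the default; after `export MYSQL_PASSWORD=...`, the same call prints the line with the exported value instead.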
To update the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.
Updates to the above configurations require a restart of all containers to take effect:
> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
### Switch doc engine from Elasticsearch to Infinity
RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:
1. Stop all running containers:
```bash
$ docker compose -f docker/docker-compose.yml down -v
```
> [!WARNING]
> `-v` will delete the Docker container volumes, and the existing data will be cleared.
2. Set `DOC_ENGINE` in **docker/.env** to `infinity`.
3. Start the containers:
```bash
$ docker compose -f docker-compose.yml up -d
```
> [!WARNING]
> Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
## 🔧 Build a Docker image
This image is approximately 2 GB in size and relies on external LLM and embedding services.
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
Or if you are behind a proxy, you can pass proxy arguments:
```bash
docker build --platform linux/amd64 \
--build-arg http_proxy=http://YOUR_PROXY:PORT \
--build-arg https_proxy=http://YOUR_PROXY:PORT \
-f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 Launch service from source for development
1. Install `uv` and `pre-commit`, or skip this step if they are already installed:
```bash
pipx install uv pre-commit
```
2. Clone the source code and install Python dependencies:
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
```bash
docker compose -f docker/docker-compose-base.yml up -d
```
Add the following line to `/etc/hosts` to resolve all hosts specified in **docker/.env** to `127.0.0.1`:
```
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
```
4. If you cannot access HuggingFace, set the `HF_ENDPOINT` environment variable to use a mirror site:
```bash
export HF_ENDPOINT=https://hf-mirror.com
```
5. If your operating system does not have jemalloc, install it as follows:
```bash
# Ubuntu
sudo apt-get install libjemalloc-dev
# CentOS
sudo yum install jemalloc
# OpenSUSE
sudo zypper install jemalloc
# macOS
brew install jemalloc
```
6. Launch the backend service:
```bash
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
```
7. Install frontend dependencies:
```bash
cd web
npm install
```
8. Launch the frontend service:
```bash
npm run dev
```
_The following output confirms a successful launch of the system:_
![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187)
9. Stop the RAGFlow frontend and backend services once development is complete:
```bash
pkill -f "ragflow_server.py|task_executor.py"
```
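Step 5 above lists one jemalloc install command per platform. A minimal sketch of picking the matching command automatically is shown below; the helper name `jemalloc_install_cmd` is hypothetical, and it assumes the package manager present on the machine identifies the platform. It only prints the command, so nothing is installed until you run the printed line yourself.

```bash
#!/usr/bin/env bash
# Print the jemalloc install command for the first known package manager
# found on PATH. Prints a hint and returns 1 when none is available.
# jemalloc_install_cmd is a hypothetical helper, not a RAGFlow command.
jemalloc_install_cmd() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install libjemalloc-dev"
  elif command -v yum >/dev/null 2>&1; then
    echo "sudo yum install jemalloc"
  elif command -v zypper >/dev/null 2>&1; then
    echo "sudo zypper install jemalloc"
  elif command -v brew >/dev/null 2>&1; then
    # Homebrew is meant to run as a regular user, so no sudo here.
    echo "brew install jemalloc"
  else
    echo "no supported package manager found; install jemalloc manually"
    return 1
  fi
}
```

Calling `jemalloc_install_cmd` and reviewing its output before running it keeps the step explicit while avoiding a copy-paste of the wrong platform's command.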
## 📚 Documentation
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
See the [RAGFlow Roadmap 2026](https://github.com/infiniflow/ragflow/issues/12241)
## 🏄 Community
- [Discord](https://discord.gg/NjYzJD3GM3)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
## 🙌 Contributing
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community.
If you would like to take part, review our [Contribution Guidelines](https://ragflow.io/docs/dev/contributing) first.


@@ -1,5 +1,5 @@
<div align="center">
-<a href="https://demo.ragflow.io/">
+<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="Logo ragflow">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DBEDFA"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="Ikuti di X (Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Lencana Daring" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Rilis%20Terbaru" alt="Rilis Terbaru">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Peta Jalan</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -76,7 +79,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -85,6 +88,7 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
## 🔥 Pembaruan Terbaru
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports 'Memory' for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
@@ -188,12 +192,12 @@ Coba demo kami di [https://demo.ragflow.io](https://demo.ragflow.io).
> All Docker images are built for x86 platforms. We do not currently offer Docker images for ARM64.
> If you are on an ARM64 platform, [please use this guide to build a Docker image compatible with your system](https://ragflow.io/docs/dev/build_docker_image).
> The command below downloads the v0.24.0 edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than v0.24.0, update the RAGFLOW_IMAGE variable in docker/.env before using docker compose to start the server.
> The command below downloads the v0.25.0 edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than v0.25.0, update the RAGFLOW_IMAGE variable in docker/.env before using docker compose to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the **entrypoint.sh** file in the code matches the Docker image version.
@@ -299,7 +303,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
3. Launch the required services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
@@ -361,8 +365,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="350" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -57,7 +60,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -66,6 +69,7 @@
## 🔥 最新情報
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports the 'Memory' feature for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
@@ -168,12 +172,12 @@
> All officially provided Docker images are currently built for the x86 architecture; no Docker images are provided for ARM64.
> If you are using an operating system on the ARM64 architecture, refer to [this document](https://ragflow.io/docs/dev/build_docker_image) to build the Docker image yourself.
> The command below downloads the v0.24.0 edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download an edition other than v0.24.0, update the RAGFLOW_IMAGE variable in the docker/.env file accordingly, then use docker compose to start the server.
> The command below downloads the v0.25.0 edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download an edition other than v0.25.0, update the RAGFLOW_IMAGE variable in the docker/.env file accordingly, then use docker compose to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable tag (list: https://github.com/infiniflow/ragflow/releases)
# This step ensures the entrypoint.sh file in the code matches the Docker image version.
@@ -299,7 +303,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
@@ -361,8 +365,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DBEDFA"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -58,7 +61,7 @@
## 🎮 데모
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -67,6 +70,7 @@
## 🔥 업데이트
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Supports the 'Memory' feature for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
@@ -170,12 +174,12 @@
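> All Docker images are built for x86 platforms. We do not currently offer Docker images for ARM64.
> If you are on an ARM64 platform, [please use this guide to build a Docker image compatible with your system](https://ragflow.io/docs/dev/build_docker_image).
> The command below downloads the v0.24.0 version of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow versions. To download a RAGFlow version other than v0.24.0, update the RAGFLOW_IMAGE variable in the docker/.env file accordingly, then use docker compose to start the server.
> The command below downloads the v0.25.0 version of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow versions. To download a RAGFlow version other than v0.25.0, update the RAGFLOW_IMAGE variable in the docker/.env file accordingly, then use docker compose to start the server.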
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the entrypoint.sh file in the code matches the Docker image version.
@@ -294,7 +298,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
@@ -365,8 +369,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DBEDFA"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="seguir no X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Badge Estático" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Última%20Release" alt="Última Versão">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -77,7 +80,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -86,6 +89,7 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
## 🔥 Últimas Atualizações
- 24-03-2026 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 26-12-2025 Supports the 'Memory' feature for AI agents.
- 19-11-2025 Supports Gemini 3 Pro.
- 12-11-2025 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
@@ -188,12 +192,12 @@ Experimente nossa demo em [https://demo.ragflow.io](https://demo.ragflow.io).
> All Docker images are built for x86 platforms. We do not currently offer Docker images for ARM64.
> If you are on an ARM64 platform, please use [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.24.0` edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than `v0.24.0`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
> The command below downloads the `v0.25.0` edition of the RAGFlow Docker image. See the following table for descriptions of the different RAGFlow editions. To download a RAGFlow edition other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable accordingly in **docker/.env** before using `docker compose` to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the entrypoint.sh file in the code matches the Docker image version.
@@ -316,7 +320,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
@@ -378,8 +382,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)

409
README_tr.md Normal file

@@ -0,0 +1,409 @@
<div align="center">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="520" alt="ragflow logo">
</a>
</div>
<p align="center">
<a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-DFE0E5"></a>
<a href="./README_zh.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/简体中文-DFE0E5"></a>
<a href="./README_tzh.md"><img alt="繁體版中文自述文件" src="https://img.shields.io/badge/繁體中文-DFE0E5"></a>
<a href="./README_ja.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-DFE0E5"></a>
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DBEDFA"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="X(Twitter)'da takip et">
</a>
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Çevrimiçi Demo" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Son%20Sürüm" alt="Son Sürüm">
</a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/Lisans-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="lisans">
</a>
<a href="https://deepwiki.com/infiniflow/ragflow">
<img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg">
</a>
</p>
<h4 align="center">
<a href="https://ragflow.io/docs/dev/">Dokümantasyon</a> |
<a href="https://github.com/infiniflow/ragflow/issues/12241">Yol Haritası</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/ragflow-octoverse.png" width="1200"/>
</div>
<div align="center">
<a href="https://trendshift.io/repositories/9064" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9064" alt="infiniflow%2Fragflow | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
<details open>
<summary><b>📕 İçindekiler</b></summary>
- 💡 [RAGFlow Nedir?](#-ragflow-nedir)
- 🎮 [Demo](#-demo)
- 📌 [Son Güncellemeler](#-son-güncellemeler)
- 🌟 [Temel Özellikler](#-temel-özellikler)
- 🔎 [Sistem Mimarisi](#-sistem-mimarisi)
- 🎬 [Başlarken](#-başlarken)
- 🔧 [Yapılandırmalar](#-yapılandırmalar)
- 🔧 [Docker İmajı Oluşturma](#-docker-i̇majı-oluşturma)
- 🔨 [Geliştirme İçin Kaynaktan Hizmet Başlatma](#-geliştirme-i̇çin-kaynaktan-hizmet-başlatma)
- 📚 [Dokümantasyon](#-dokümantasyon)
- 📜 [Yol Haritası](#-yol-haritası)
- 🏄 [Topluluk](#-topluluk)
- 🙌 [Katkıda Bulunma](#-katkıda-bulunma)
</details>
## 💡 RAGFlow Nedir?
[RAGFlow](https://ragflow.io/) is a leading open-source Retrieval-Augmented Generation ([RAG](https://ragflow.io/basics/what-is-rag)) engine based on deep document understanding. It fuses cutting-edge RAG technology with Agent capabilities to create a superior context layer for LLMs. It offers a streamlined RAG workflow adaptable to organizations of any scale. Equipped with a converged [context engine](https://ragflow.io/basics/what-is-agent-context-engine) and ready-made agent templates, RAGFlow enables developers to transform complex data into high-fidelity, production-ready AI systems with exceptional efficiency and precision.
## 🎮 Demo
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/agentic-dark.gif" width="1200"/>
</div>
## 🔥 Son Güncellemeler
- 2026-03-24 [RAGFlow Skill on OpenClaw](https://clawhub.ai/yingfeng/ragflow-skill): Provides an official skill for accessing RAGFlow datasets via OpenClaw.
- 2025-12-26 Added 'Memory' support for AI agents.
- 2025-11-19 Added support for Gemini 3 Pro.
- 2025-11-12 Added support for data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
- 2025-10-23 Added support for MinerU and Docling as document parsing methods.
- 2025-10-15 Added support for an orchestrable data ingestion pipeline.
- 2025-08-08 Added support for OpenAI's latest GPT-5 series models.
- 2025-08-01 Added support for agentic workflow and MCP.
- 2025-05-23 Added a Python/JavaScript code executor component to Agent.
- 2025-05-05 Added support for cross-language queries.
- 2025-03-19 Added support for using a multi-modal model to interpret images within PDF or DOCX files.
## 🎉 Stay Tuned
⭐️ Star our repository to stay up to date with exciting new features and improvements! Get instant notifications for new releases! 🌟
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/18c9707e-b8aa-4caf-a154-037089c105ba" width="1200"/>
</div>
## 🌟 Temel Özellikler
### 🍭 **"Quality in, quality out"**
- [Deep document understanding](./deepdoc/README.md)-based knowledge extraction from unstructured data with complicated formats.
- Finds the "needle in a haystack" among literally unlimited tokens.
### 🍱 **Template-based chunking**
- Intelligent and explainable.
- Plenty of template options to choose from.
### 🌱 **Grounded citations with reduced hallucinations**
- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.
### 🍔 **Compatibility with heterogeneous data sources**
- Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.
### 🛀 **Automated and effortless RAG workflow**
- Streamlined RAG orchestration tailored to both individuals and large businesses.
- Configurable LLMs and embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business processes.
## 🔎 Sistem Mimarisi
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/user-attachments/assets/31b0dd6f-ca4f-445a-9457-70cb44a381b2" width="1000"/>
</div>
## 🎬 Başlarken
### 📝 Prerequisites
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
- [gVisor](https://gvisor.dev/docs/user_guide/install/): Required only if you intend to use the code executor (sandbox) feature of RAGFlow.
> [!TIP]
> If you have not installed Docker on your local machine (Windows, Mac, or Linux), see [Install Docker Engine](https://docs.docker.com/engine/install/).
### 🚀 Start up the server
1. Ensure `vm.max_map_count` >= 262144:
> To check the value of `vm.max_map_count`:
>
> ```bash
> $ sysctl vm.max_map_count
> ```
>
> If the value is lower than 262144, set it to at least 262144.
>
> ```bash
> # In this case, we set it to 262144:
> $ sudo sysctl -w vm.max_map_count=262144
> ```
>
> This change will be reset after a system reboot. To make the change permanent,
> add or update the `vm.max_map_count` value in **/etc/sysctl.conf** accordingly:
>
> ```bash
> vm.max_map_count=262144
> ```
>
2. Clone the repo:
```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
3. Start up the server using the pre-built Docker images:
> [!CAUTION]
> All Docker images are built for x86 platforms. We do not currently offer Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below downloads the `v0.25.0` edition of the RAGFlow Docker image. See the following table for the different RAGFlow editions. To download an edition other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before using `docker compose` to start the server.
```bash
$ cd ragflow/docker
# git checkout v0.25.0
# Optional: use a stable tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures the **entrypoint.sh** file in the code matches the Docker image version.
# Use CPU for DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d
# To use GPU to accelerate DeepDoc tasks:
# sed -i '1i DEVICE=gpu' .env
# docker compose -f docker-compose.yml up -d
```
> Note: Prior to `v0.22.0`, we provided both images with embedding models and slim images without embedding models. Details are as follows:
| RAGFlow image tag | Image size (GB) | Has embedding models? | Stable? |
|-----------------------|-------------------|-------------------------|-----------------|
| v0.21.1 | &approx;9 | ✔️ | Stable release |
| v0.21.1-slim | &approx;2 | ❌ | Stable release |
> Starting with `v0.22.0`, we ship only the slim edition and no longer append the **-slim** suffix to the image tag.
4. Check the server status once the server is up and running:
```bash
$ docker logs -f docker-ragflow-cpu-1
```
_The following output confirms a successful launch of the system:_
```bash
____ ___ ______ ______ __
/ __ \ / | / ____// ____// /____ _ __
/ /_/ // /| | / / __ / /_ / // __ \| | /| / /
/ _, _// ___ |/ /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_|/_/ |_|\____//_/ /_/ \____/ |__/|__/
* Running on all addresses (0.0.0.0)
```
> If you skip this confirmation step and log in to RAGFlow directly, your browser may report a `network error`
> because RAGFlow may not be fully initialized at that moment.
>
5. In your web browser, enter the IP address of your server and log in to RAGFlow.
> With the default settings, you only need to enter `http://IP_OF_YOUR_MACHINE` (**without** a port number),
> because the default HTTP serving port `80` can be omitted when using the default configurations.
>
6. In [service_conf.yaml.template](./docker/service_conf.yaml.template), select the desired LLM provider in `user_default_llm` and
update the `API_KEY` field with the corresponding API key.
> See [llm_api_key_setup](https://ragflow.io/docs/dev/llm_api_key_setup) for more information.
>
_The show is on!_
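The `vm.max_map_count` check from step 1 can be scripted so that the value is raised only when needed. A minimal sketch follows; the variable `REQUIRED_MAP_COUNT` and the function `map_count_too_low` are names invented here for illustration, and the threshold is the 262144 stated above.

```bash
# Minimum vm.max_map_count named in step 1.
REQUIRED_MAP_COUNT=262144

# Hypothetical helper: succeeds (exit status 0) when the given value is
# below the required minimum, and fails otherwise.
map_count_too_low() {
  [ "$1" -lt "$REQUIRED_MAP_COUNT" ]
}
```

On a Linux host this could be combined with the commands from step 1, for example `map_count_too_low "$(sysctl -n vm.max_map_count)" && sudo sysctl -w vm.max_map_count=262144`.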
## 🔧 Yapılandırmalar
When it comes to system configurations, you will need to manage the following files:
- [.env](./docker/.env): Keeps the fundamental system settings, such as `SVR_HTTP_PORT`, `MYSQL_PASSWORD`, and `MINIO_PASSWORD`.
- [service_conf.yaml.template](./docker/service_conf.yaml.template): Configures the back-end services. The environment variables in this file are populated automatically when the Docker container starts. Any environment variables set within the Docker container are available for use, allowing you to customize service behavior based on the deployment environment.
- [docker-compose.yml](./docker/docker-compose.yml): The system relies on [docker-compose.yml](./docker/docker-compose.yml) to start up.
> The [./docker/README](./docker/README.md) file provides a detailed description of the environment settings and service configurations that can be used as `${ENV_VARS}` in the [service_conf.yaml.template](./docker/service_conf.yaml.template) file.
To change the default HTTP serving port (80), go to [docker-compose.yml](./docker/docker-compose.yml) and change `80:80` to `<YOUR_SERVING_PORT>:80`.
Updates to the above configurations require a restart of all containers to take effect:
> ```bash
> $ docker compose -f docker-compose.yml up -d
> ```
### Switch the document engine from Elasticsearch to Infinity
By default, RAGFlow uses Elasticsearch to store full text and vectors. To switch to [Infinity](https://github.com/infiniflow/infinity/), follow these steps:
1. Stop all running containers:
```bash
$ docker compose -f docker/docker-compose.yml down -v
```
> [!WARNING]
> The `-v` option deletes the Docker container volumes, and the existing data will be cleared.
2. Set `DOC_ENGINE` to `infinity` in **docker/.env**.
3. Start the containers:
```bash
$ docker compose -f docker-compose.yml up -d
```
> [!WARNING]
> Switching to Infinity on a Linux/arm64 machine is not yet officially supported.
## 🔧 Build a Docker image
This image is approximately 2 GB in size and relies on external LLM and embedding services.
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .
```
Or, if you are behind a proxy, you can pass proxy arguments:
```bash
docker build --platform linux/amd64 \
--build-arg http_proxy=http://YOUR_PROXY_ADDRESS:PORT \
--build-arg https_proxy=http://YOUR_PROXY_ADDRESS:PORT \
-f Dockerfile -t infiniflow/ragflow:nightly .
```
## 🔨 Launch the service from source for development
1. Install `uv` and `pre-commit`, or skip this step if they are already installed:
```bash
pipx install uv pre-commit
```
2. Clone the source code and install the Python dependencies:
```bash
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
```bash
docker compose -f docker/docker-compose-base.yml up -d
```
Add the following line to `/etc/hosts` to resolve all host names specified in **docker/.env** to `127.0.0.1`:
```
127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
```
4. If you cannot access HuggingFace, set the `HF_ENDPOINT` environment variable to use a mirror site:
```bash
export HF_ENDPOINT=https://hf-mirror.com
```
5. If your operating system does not have jemalloc, install it as follows:
```bash
# Ubuntu
sudo apt-get install libjemalloc-dev
# CentOS
sudo yum install jemalloc
# OpenSUSE
sudo zypper install jemalloc
# macOS
brew install jemalloc
```
6. Launch the back-end service:
```bash
source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh
```
7. Install the front-end dependencies:
```bash
cd web
npm install
```
8. Launch the front-end service:
```bash
npm run dev
```
_The following output confirms that the system launched successfully:_
![](https://github.com/user-attachments/assets/0daf462c-a24d-4496-a66f-92533534e187)
9. Once development is done, stop the RAGFlow front-end and back-end services:
```bash
pkill -f "ragflow_server.py|task_executor.py"
```
## 📚 Documentation
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)
## 📜 Roadmap
See the [RAGFlow Roadmap 2026](https://github.com/infiniflow/ragflow/issues/12241).
## 🏄 Community
- [Discord](https://discord.gg/NjYzJD3GM3)
- [Twitter](https://twitter.com/infiniflowai)
- [GitHub Discussions](https://github.com/orgs/infiniflow/discussions)
## 🙌 Contributing
RAGFlow flourishes through open-source collaboration. In this spirit, we embrace diverse contributions from the community.
If you would like to take part, review our [Contribution Guidelines](https://ragflow.io/docs/dev/contributing) first.


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="350" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -76,7 +79,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -85,6 +88,7 @@
## 🔥 Latest Updates
- 2026-03-24 Released the [official RAGFlow Skill](https://clawhub.ai/yingfeng/ragflow-skill), which provides access to RAGFlow datasets through OpenClaw.
- 2025-12-26 Supports the "Memory" feature for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
@@ -187,12 +191,12 @@
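> All Docker images are built for x86 platforms. We do not currently provide Docker images for ARM64.
> If you are on an ARM64 platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a Docker image compatible with your system.
> The command below automatically downloads the RAGFlow Docker image `v0.24.0`. See the following table for descriptions of the different Docker editions. To download an image other than `v0.24.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before running `docker compose` to start the service.
> The command below automatically downloads the RAGFlow Docker image `v0.25.0`. See the following table for descriptions of the different Docker editions. To download an image other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before running `docker compose` to start the service.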
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable release tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures that the entrypoint.sh file in the code matches the Docker image version.
@@ -326,7 +330,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
3. Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:
@@ -392,8 +396,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)


@@ -1,5 +1,5 @@
<div align="center">
<a href="https://demo.ragflow.io/">
<a href="https://cloud.ragflow.io/">
<img src="web/src/assets/logo-with-text.svg" width="350" alt="ragflow logo">
</a>
</div>
@@ -12,17 +12,20 @@
<a href="./README_ko.md"><img alt="한국어" src="https://img.shields.io/badge/한국어-DFE0E5"></a>
<a href="./README_id.md"><img alt="Bahasa Indonesia" src="https://img.shields.io/badge/Bahasa Indonesia-DFE0E5"></a>
<a href="./README_pt_br.md"><img alt="Português(Brasil)" src="https://img.shields.io/badge/Português(Brasil)-DFE0E5"></a>
<a href="./README_fr.md"><img alt="README en Français" src="https://img.shields.io/badge/Français-DFE0E5"></a>
<a href="./README_ar.md"><img alt="README in Arabic" src="https://img.shields.io/badge/Arabic-DFE0E5"></a>
<a href="./README_tr.md"><img alt="Türkçe README" src="https://img.shields.io/badge/Türkçe-DFE0E5"></a>
</p>
<p align="center">
<a href="https://x.com/intent/follow?screen_name=infiniflowai" target="_blank">
<img src="https://img.shields.io/twitter/follow/infiniflow?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)">
</a>
<a href="https://demo.ragflow.io" target="_blank">
<a href="https://cloud.ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/Online-Demo-4e6b99">
</a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.24.0">
<img src="https://img.shields.io/docker/pulls/infiniflow/ragflow?label=Docker%20Pulls&color=0db7ed&logo=docker&logoColor=white&style=flat-square" alt="docker pull infiniflow/ragflow:v0.25.0">
</a>
<a href="https://github.com/infiniflow/ragflow/releases/latest">
<img src="https://img.shields.io/github/v/release/infiniflow/ragflow?color=blue&label=Latest%20Release" alt="Latest Release">
@@ -40,7 +43,7 @@
<a href="https://github.com/infiniflow/ragflow/issues/12241">Roadmap</a> |
<a href="https://twitter.com/infiniflowai">Twitter</a> |
<a href="https://discord.gg/NjYzJD3GM3">Discord</a> |
<a href="https://demo.ragflow.io">Demo</a>
<a href="https://cloud.ragflow.io">Demo</a>
</h4>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -76,7 +79,7 @@
## 🎮 Demo
Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
Try our demo at [https://cloud.ragflow.io](https://cloud.ragflow.io).
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://raw.githubusercontent.com/infiniflow/ragflow-docs/refs/heads/image/image/chunking.gif" width="1200"/>
@@ -85,7 +88,8 @@
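## 🔥 Latest Updates
- 2025-12-26 Supports the “Memory” feature for AI agents
- 2026-03-24 Released the [official RAGFlow Skill](https://clawhub.ai/yingfeng/ragflow-skill), which provides access to RAGFlow datasets through OpenClaw
- 2025-12-26 Supports the "Memory" feature for AI agents.
- 2025-11-19 Supports Gemini 3 Pro.
- 2025-11-12 Supports data synchronization from Confluence, S3, Notion, Discord, and Google Drive.
- 2025-10-23 Supports MinerU and Docling as document parsing methods.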
@@ -188,12 +192,12 @@
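> Note that all official Docker images are currently built for the x86 architecture; no ARM64-based Docker images are provided.
> If your operating system runs on ARM64, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build the Docker image yourself.
> The command below automatically downloads the RAGFlow Docker image `v0.24.0`. See the following table for descriptions of the different Docker editions. To download an image other than `v0.24.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before running `docker compose` to start the service.
> The command below automatically downloads the RAGFlow Docker image `v0.25.0`. See the following table for descriptions of the different Docker editions. To download an image other than `v0.25.0`, update the `RAGFLOW_IMAGE` variable in **docker/.env** before running `docker compose` to start the service.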
```bash
$ cd ragflow/docker
# git checkout v0.24.0
# git checkout v0.25.0
# Optional: use a stable release tag (see releases: https://github.com/infiniflow/ragflow/releases)
# This step ensures that the entrypoint.sh file in the code matches the Docker image version.
@@ -326,7 +330,7 @@ docker build --platform linux/amd64 \
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.12 # install RAGFlow dependent python modules
uv run download_deps.py
uv run python3 download_deps.py
pre-commit install
```
@@ -395,8 +399,8 @@ docker build --platform linux/amd64 \
- [Quickstart](https://ragflow.io/docs/dev/)
- [Configuration](https://ragflow.io/docs/dev/configurations)
- [Release notes](https://ragflow.io/docs/dev/release_notes)
- [User guides](https://ragflow.io/docs/dev/category/guides)
- [Developer guides](https://ragflow.io/docs/dev/category/developers)
- [User guides](https://ragflow.io/docs/category/user-guides)
- [Developer guides](https://ragflow.io/docs/category/developer-guides)
- [References](https://ragflow.io/docs/dev/category/references)
- [FAQs](https://ragflow.io/docs/dev/faq)

admin/client/COMMAND.md

@@ -0,0 +1,779 @@
# RAGFlow CLI User Command Reference
This document describes the user commands available in RAGFlow CLI. All commands must end with a semicolon (`;`).
## Command List
### ping_server
**Description**
Tests the connection status to the server.
**Usage**
```
PING;
```
**Parameters**
No parameters.
**Example**
```
ragflow> PING;
```
**Display Effect**
(Sample output will be provided by the user)
---
### show_current_user
**Description**
Displays information about the currently logged-in user.
**Usage**
```
SHOW CURRENT USER;
```
**Parameters**
No parameters.
**Example**
```
ragflow> SHOW CURRENT USER;
```
**Display Effect**
(Sample output will be provided by the user)
---
### create_model_provider
**Description**
Creates a new model provider.
**Usage**
```
CREATE MODEL PROVIDER <provider_name> <provider_key>;
```
**Parameters**
- `provider_name`: Provider name, quoted string.
- `provider_key`: Provider key, quoted string.
**Example**
```
ragflow> CREATE MODEL PROVIDER 'openai' 'sk-...';
```
**Display Effect**
(Sample output will be provided by the user)
---
### drop_model_provider
**Description**
Deletes a model provider.
**Usage**
```
DROP MODEL PROVIDER <provider_name>;
```
**Parameters**
- `provider_name`: Name of the provider to delete, quoted string.
**Example**
```
ragflow> DROP MODEL PROVIDER 'openai';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_llm
**Description**
Sets the default LLM (Large Language Model).
**Usage**
```
SET DEFAULT LLM <llm_id>;
```
**Parameters**
- `llm_id`: LLM identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT LLM 'gpt-4';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_vlm
**Description**
Sets the default VLM (Vision Language Model).
**Usage**
```
SET DEFAULT VLM <vlm_id>;
```
**Parameters**
- `vlm_id`: VLM identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT VLM 'clip-vit-large';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_embedding
**Description**
Sets the default embedding model.
**Usage**
```
SET DEFAULT EMBEDDING <embedding_id>;
```
**Parameters**
- `embedding_id`: Embedding model identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT EMBEDDING 'text-embedding-ada-002';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_reranker
**Description**
Sets the default reranker model.
**Usage**
```
SET DEFAULT RERANKER <reranker_id>;
```
**Parameters**
- `reranker_id`: Reranker model identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT RERANKER 'bge-reranker-large';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_asr
**Description**
Sets the default ASR (Automatic Speech Recognition) model.
**Usage**
```
SET DEFAULT ASR <asr_id>;
```
**Parameters**
- `asr_id`: ASR model identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT ASR 'whisper-large';
```
**Display Effect**
(Sample output will be provided by the user)
---
### set_default_tts
**Description**
Sets the default TTS (Text-to-Speech) model.
**Usage**
```
SET DEFAULT TTS <tts_id>;
```
**Parameters**
- `tts_id`: TTS model identifier, quoted string.
**Example**
```
ragflow> SET DEFAULT TTS 'tts-1';
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_llm
**Description**
Resets the default LLM to the system default.
**Usage**
```
RESET DEFAULT LLM;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT LLM;
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_vlm
**Description**
Resets the default VLM to the system default.
**Usage**
```
RESET DEFAULT VLM;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT VLM;
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_embedding
**Description**
Resets the default embedding model to the system default.
**Usage**
```
RESET DEFAULT EMBEDDING;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT EMBEDDING;
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_reranker
**Description**
Resets the default reranker model to the system default.
**Usage**
```
RESET DEFAULT RERANKER;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT RERANKER;
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_asr
**Description**
Resets the default ASR model to the system default.
**Usage**
```
RESET DEFAULT ASR;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT ASR;
```
**Display Effect**
(Sample output will be provided by the user)
---
### reset_default_tts
**Description**
Resets the default TTS model to the system default.
**Usage**
```
RESET DEFAULT TTS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> RESET DEFAULT TTS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### create_user_dataset_with_parser
**Description**
Creates a user dataset with the specified parser.
**Usage**
```
CREATE DATASET <dataset_name> WITH EMBEDDING <embedding> PARSER <parser_type>;
```
**Parameters**
- `dataset_name`: Dataset name, quoted string.
- `embedding`: Embedding model name, quoted string.
- `parser_type`: Parser type, quoted string.
**Example**
```
ragflow> CREATE DATASET 'my_dataset' WITH EMBEDDING 'text-embedding-ada-002' PARSER 'pdf';
```
**Display Effect**
(Sample output will be provided by the user)
---
### create_user_dataset_with_pipeline
**Description**
Creates a user dataset with the specified pipeline.
**Usage**
```
CREATE DATASET <dataset_name> WITH EMBEDDING <embedding> PIPELINE <pipeline>;
```
**Parameters**
- `dataset_name`: Dataset name, quoted string.
- `embedding`: Embedding model name, quoted string.
- `pipeline`: Pipeline name, quoted string.
**Example**
```
ragflow> CREATE DATASET 'my_dataset' WITH EMBEDDING 'text-embedding-ada-002' PIPELINE 'standard';
```
**Display Effect**
(Sample output will be provided by the user)
---
### drop_user_dataset
**Description**
Deletes a user dataset.
**Usage**
```
DROP DATASET <dataset_name>;
```
**Parameters**
- `dataset_name`: Name of the dataset to delete, quoted string.
**Example**
```
ragflow> DROP DATASET 'my_dataset';
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_datasets
**Description**
Lists all datasets for the current user.
**Usage**
```
LIST DATASETS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> LIST DATASETS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_dataset_files
**Description**
Lists all files in the specified dataset.
**Usage**
```
LIST FILES OF DATASET <dataset_name>;
```
**Parameters**
- `dataset_name`: Dataset name, quoted string.
**Example**
```
ragflow> LIST FILES OF DATASET 'my_dataset';
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_agents
**Description**
Lists all agents for the current user.
**Usage**
```
LIST AGENTS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> LIST AGENTS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_chats
**Description**
Lists all chat sessions for the current user.
**Usage**
```
LIST CHATS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> LIST CHATS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### create_user_chat
**Description**
Creates a new chat session.
**Usage**
```
CREATE CHAT <chat_name>;
```
**Parameters**
- `chat_name`: Chat session name, quoted string.
**Example**
```
ragflow> CREATE CHAT 'my_chat';
```
**Display Effect**
(Sample output will be provided by the user)
---
### drop_user_chat
**Description**
Deletes a chat session.
**Usage**
```
DROP CHAT <chat_name>;
```
**Parameters**
- `chat_name`: Name of the chat session to delete, quoted string.
**Example**
```
ragflow> DROP CHAT 'my_chat';
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_model_providers
**Description**
Lists all model providers for the current user.
**Usage**
```
LIST MODEL PROVIDERS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> LIST MODEL PROVIDERS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### list_user_default_models
**Description**
Lists all default model settings for the current user.
**Usage**
```
LIST DEFAULT MODELS;
```
**Parameters**
No parameters.
**Example**
```
ragflow> LIST DEFAULT MODELS;
```
**Display Effect**
(Sample output will be provided by the user)
---
### import_docs_into_dataset
**Description**
Imports documents into the specified dataset.
**Usage**
```
IMPORT <document_list> INTO DATASET <dataset_name>;
```
**Parameters**
- `document_list`: One or more document paths, given as a single quoted string. Separate multiple paths with commas or spaces.
- `dataset_name`: Target dataset name, quoted string.
**Example**
```
ragflow> IMPORT '/path/to/doc1.pdf,/path/to/doc2.pdf' INTO DATASET 'my_dataset';
```
**Display Effect**
(Sample output will be provided by the user)
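The `document_list` format can be illustrated with a small Python sketch. It assumes the quoted string is split on commas and whitespace; `split_document_list` is a hypothetical helper written for this illustration, not the CLI's actual implementation, and under this assumption a path that itself contains spaces would not survive the split.

```python
import re


def split_document_list(quoted: str) -> list[str]:
    """Split a quoted, comma- or space-separated list into individual paths.

    Hypothetical helper for illustration; the real CLI may split differently.
    """
    # Drop the surrounding single or double quotes that the CLI requires.
    inner = quoted.strip().strip("'\"")
    # Split on runs of commas and/or whitespace and discard empty pieces.
    return [part for part in re.split(r"[,\s]+", inner) if part]


print(split_document_list("'/path/to/doc1.pdf,/path/to/doc2.pdf'"))
```

The same shape applies to the other list-valued parameters in this reference, such as `dataset_list` and `document_names`.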
---
### search_on_datasets
**Description**
Searches in one or more specified datasets.
**Usage**
```
SEARCH <question> ON DATASETS <dataset_list>;
```
**Parameters**
- `question`: Search question, quoted string.
- `dataset_list`: One or more dataset names, given as a single quoted string. Separate multiple names with commas or spaces.
**Example**
```
ragflow> SEARCH 'What is RAG?' ON DATASETS 'dataset1,dataset2';
```
**Display Effect**
(Sample output will be provided by the user)
---
### parse_dataset_docs
**Description**
Parses specified documents in a dataset.
**Usage**
```
PARSE <document_names> OF DATASET <dataset_name>;
```
**Parameters**
- `document_names`: One or more document names, given as a single quoted string. Separate multiple names with commas or spaces.
- `dataset_name`: Dataset name, quoted string.
**Example**
```
ragflow> PARSE 'doc1.pdf,doc2.pdf' OF DATASET 'my_dataset';
```
**Display Effect**
(Sample output will be provided by the user)
---
### parse_dataset_sync
**Description**
Synchronously parses the entire dataset.
**Usage**
```
PARSE DATASET <dataset_name> SYNC;
```
**Parameters**
- `dataset_name`: Dataset name, quoted string.
**Example**
```
ragflow> PARSE DATASET 'my_dataset' SYNC;
```
**Display Effect**
(Sample output will be provided by the user)
---
### parse_dataset_async
**Description**
Asynchronously parses the entire dataset.
**Usage**
```
PARSE DATASET <dataset_name> ASYNC;
```
**Parameters**
- `dataset_name`: Dataset name, quoted string.
**Example**
```
ragflow> PARSE DATASET 'my_dataset' ASYNC;
```
**Display Effect**
(Sample output will be provided by the user)
---
### benchmark
**Description**
Benchmarks the specified user command by running it repeatedly at the given concurrency.
**Usage**
```
BENCHMARK <concurrency> <iterations> <user_command>;
```
**Parameters**
- `concurrency`: Concurrency number, positive integer.
- `iterations`: Number of iterations, positive integer.
- `user_command`: User command to test (must be a valid user command, such as `PING;`).
**Example**
```
ragflow> BENCHMARK 5 10 PING;
```
**Display Effect**
(Sample output will be provided by the user)
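To make the two numeric parameters concrete, here is a minimal Python sketch of a concurrency/iterations benchmark loop. It is not the CLI's implementation: `run_benchmark` is a hypothetical function, the command is modeled as a plain callable, and the sketch assumes `iterations` is the total number of calls shared across `concurrency` workers, which this reference does not specify.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_benchmark(concurrency: int, iterations: int, command: Callable[[], object]) -> dict:
    """Run `command` `iterations` times in total across `concurrency` threads."""

    def timed_call(_: int) -> float:
        # Measure the latency of a single invocation.
        start = time.perf_counter()
        command()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(iterations)))
    wall_seconds = time.perf_counter() - wall_start

    return {
        "count": len(latencies),
        "total_seconds": wall_seconds,
        "mean_latency": sum(latencies) / len(latencies),
    }


# Stand-in for sending `PING;` to the server (assumption: a 10 ms round trip).
stats = run_benchmark(5, 10, lambda: time.sleep(0.01))
print(stats["count"], round(stats["mean_latency"], 3))
```

Reporting both wall-clock time and mean per-call latency separates throughput from response time, which usually diverge as concurrency grows.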
---
**Notes**
- All string parameters (such as names, IDs, paths) must be enclosed in single quotes (`'`) or double quotes (`"`).
- Commands must end with a semicolon (`;`).
- The prompt is `ragflow>`.
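The first two notes can be checked mechanically before a command is sent. The sketch below is an illustrative pre-flight check only; `check_command` is a hypothetical helper and the real CLI relies on its grammar rather than on this logic.

```python
def check_command(command: str) -> list[str]:
    """Return a list of problems found in a CLI command string (empty if none)."""
    problems = []
    text = command.strip()
    if not text.endswith(";"):
        problems.append("command must end with a semicolon")

    # Walk the string and track whether we are inside a quoted parameter.
    open_quote = None
    for ch in text:
        if open_quote is None and ch in "'\"":
            open_quote = ch
        elif ch == open_quote:
            open_quote = None
    if open_quote is not None:
        problems.append(f"unterminated {open_quote} quote")
    return problems


print(check_command("DROP DATASET 'my_dataset';"))
print(check_command("LIST DATASETS"))
```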


@@ -48,7 +48,7 @@ It consists of a server-side Service and a command-line client (CLI), both imple
1. Ensure the Admin Service is running.
2. Install ragflow-cli.
```bash
pip install ragflow-cli==0.24.0
pip install ragflow-cli==0.25.0
```
3. Launch the CLI client:
```bash


@@ -77,10 +77,17 @@ sql_command: login_user
| drop_user_dataset
| list_user_datasets
| list_user_dataset_files
| list_user_dataset_documents
| list_user_datasets_metadata
| list_user_documents_metadata_summary
| list_user_agents
| list_user_chats
| create_user_chat
| drop_user_chat
| create_dataset_table
| drop_dataset_table
| create_metadata_table
| drop_metadata_table
| list_user_model_providers
| list_user_default_models
| parse_dataset_docs
@@ -88,15 +95,35 @@ sql_command: login_user
| parse_dataset_async
| import_docs_into_dataset
| search_on_datasets
| get_chunk
| list_chunks
| insert_dataset_from_file
| insert_metadata_from_file
| update_chunk
| set_metadata
| remove_tags
| remove_chunks
| create_chat_session
| drop_chat_session
| list_chat_sessions
| chat_on_session
| list_server_configs
| show_fingerprint
| set_license
| set_license_config
| show_license
| check_license
| benchmark
// meta command definition
meta_command: "\\" meta_command_name [meta_args]
COMMA: ","
meta_command_name: /[a-zA-Z?]+/
meta_args: (meta_arg)+
meta_arg: /[^\\s"']+/ | quoted_string
meta_arg: /[^\s"',]+/ | quoted_string
// command definition
@@ -117,6 +144,7 @@ ALTER: "ALTER"i
ACTIVE: "ACTIVE"i
ADMIN: "ADMIN"i
PASSWORD: "PASSWORD"i
DATASET_TABLE: "DATASET TABLE"i
DATASET: "DATASET"i
DATASETS: "DATASETS"i
OF: "OF"i
@@ -151,11 +179,18 @@ DEFAULT: "DEFAULT"i
CHATS: "CHATS"i
CHAT: "CHAT"i
FILES: "FILES"i
DOCUMENT: "DOCUMENT"i
DOCUMENTS: "DOCUMENTS"i
METADATA: "METADATA"i
SUMMARY: "SUMMARY"i
AS: "AS"i
PARSE: "PARSE"i
IMPORT: "IMPORT"i
INTO: "INTO"i
IN: "IN"i
WITH: "WITH"i
VECTOR: "VECTOR"i
SIZE: "SIZE"i
PARSER: "PARSER"i
PIPELINE: "PIPELINE"i
SEARCH: "SEARCH"i
@@ -170,8 +205,28 @@ ASYNC: "ASYNC"i
SYNC: "SYNC"i
BENCHMARK: "BENCHMARK"i
PING: "PING"i
SESSION: "SESSION"i
SESSIONS: "SESSIONS"i
SERVER: "SERVER"i
FINGERPRINT: "FINGERPRINT"i
LICENSE: "LICENSE"i
CHECK: "CHECK"i
CONFIG: "CONFIG"i
INDEX: "INDEX"i
TABLE: "TABLE"i
CHUNK: "CHUNK"i
CHUNKS: "CHUNKS"i
GET: "GET"i
INSERT: "INSERT"i
PAGE: "PAGE"i
KEYWORDS: "KEYWORDS"i
AVAILABLE: "AVAILABLE"i
FILE: "FILE"i
UPDATE: "UPDATE"i
REMOVE: "REMOVE"i
TAGS: "TAGS"i
login_user: LOGIN USER quoted_string ";"
login_user: LOGIN USER quoted_string (PASSWORD quoted_string)? ";"
list_services: LIST SERVICES ";"
show_service: SHOW SERVICE NUMBER ";"
startup_service: STARTUP SERVICE NUMBER ";"
@@ -215,6 +270,14 @@ list_variables: LIST VARS ";"
list_configs: LIST CONFIGS ";"
list_environments: LIST ENVS ";"
show_fingerprint: SHOW FINGERPRINT ";"
set_license: SET LICENSE quoted_string ";"
set_license_config: SET LICENSE CONFIG NUMBER NUMBER ";"
show_license: SHOW LICENSE ";"
check_license: CHECK LICENSE ";"
list_server_configs: LIST SERVER CONFIGS ";"
benchmark: BENCHMARK NUMBER NUMBER user_statement
user_statement: ping_server
@@ -246,6 +309,13 @@ user_statement: ping_server
| list_user_default_models
| import_docs_into_dataset
| search_on_datasets
| update_chunk
| set_metadata
| remove_tags
| create_chat_session
| drop_chat_session
| list_chat_sessions
| chat_on_session
ping_server: PING ";"
show_current_user: SHOW CURRENT USER ";"
@@ -270,24 +340,46 @@ create_user_dataset_with_parser: CREATE DATASET quoted_string WITH EMBEDDING quo
create_user_dataset_with_pipeline: CREATE DATASET quoted_string WITH EMBEDDING quoted_string PIPELINE quoted_string ";"
drop_user_dataset: DROP DATASET quoted_string ";"
list_user_dataset_files: LIST FILES OF DATASET quoted_string ";"
list_user_dataset_documents: LIST DOCUMENTS OF DATASET quoted_string ";"
list_user_datasets_metadata: LIST METADATA OF DATASETS quoted_string (COMMA quoted_string)* ";"
list_user_documents_metadata_summary: LIST METADATA SUMMARY OF DATASET quoted_string (DOCUMENTS quoted_string (COMMA quoted_string)*)? ";"
list_user_agents: LIST AGENTS ";"
list_user_chats: LIST CHATS ";"
create_user_chat: CREATE CHAT quoted_string ";"
drop_user_chat: DROP CHAT quoted_string ";"
create_chat_session: CREATE CHAT quoted_string SESSION ";"
drop_chat_session: DROP CHAT quoted_string SESSION quoted_string ";"
list_chat_sessions: LIST CHAT quoted_string SESSIONS ";"
chat_on_session: CHAT quoted_string ON quoted_string SESSION quoted_string ";"
list_user_model_providers: LIST MODEL PROVIDERS ";"
list_user_default_models: LIST DEFAULT MODELS ";"
import_docs_into_dataset: IMPORT quoted_string INTO DATASET quoted_string ";"
search_on_datasets: SEARCH quoted_string ON DATASETS quoted_string ";"
get_chunk: GET CHUNK quoted_string ";"
list_chunks: LIST CHUNKS OF DOCUMENT quoted_string ("PAGE" NUMBER)? ("SIZE" NUMBER)? ("KEYWORDS" quoted_string)? ("AVAILABLE" NUMBER)? ";"
set_metadata: SET METADATA OF DOCUMENT quoted_string TO quoted_string ";"
remove_tags: REMOVE TAGS quoted_string (COMMA quoted_string)* FROM DATASET quoted_string ";"
remove_chunks: REMOVE CHUNKS quoted_string (COMMA quoted_string)* FROM DOCUMENT quoted_string ";"
| REMOVE ALL CHUNKS FROM DOCUMENT quoted_string ";"
parse_dataset_docs: PARSE quoted_string OF DATASET quoted_string ";"
parse_dataset_sync: PARSE DATASET quoted_string SYNC ";"
parse_dataset_async: PARSE DATASET quoted_string ASYNC ";"
identifier_list: identifier ("," identifier)*
// Internal CLI only for GO
create_dataset_table: CREATE DATASET TABLE quoted_string VECTOR SIZE NUMBER ";"
drop_dataset_table: DROP DATASET TABLE quoted_string ";"
create_metadata_table: CREATE METADATA TABLE ";"
drop_metadata_table: DROP METADATA TABLE ";"
insert_dataset_from_file: INSERT DATASET FROM FILE quoted_string ";"
insert_metadata_from_file: INSERT METADATA FROM FILE quoted_string ";"
update_chunk: UPDATE CHUNK quoted_string OF DATASET quoted_string SET quoted_string ";"
identifier_list: identifier (COMMA identifier)*
identifier: WORD
quoted_string: QUOTED_STRING
status: WORD
status: ON | WORD
QUOTED_STRING: /'[^']+'/ | /"[^"]+"/
WORD: /[a-zA-Z0-9_\-\.]+/
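The two terminals above can be explored outside the grammar with Python's `re` module. This sketch assumes they behave like the equivalent `re` patterns; the helper names `is_quoted_string` and `is_word` are made up for the illustration.

```python
import re

# Same patterns as the QUOTED_STRING and WORD terminals in the grammar.
QUOTED_STRING = re.compile(r"'[^']+'" + "|" + r'"[^"]+"')
WORD = re.compile(r"[a-zA-Z0-9_\-\.]+")


def is_quoted_string(text: str) -> bool:
    """True if the whole text is one single- or double-quoted string."""
    return QUOTED_STRING.fullmatch(text) is not None


def is_word(text: str) -> bool:
    """True if the whole text is a bare word (letters, digits, _, -, .)."""
    return WORD.fullmatch(text) is not None


for sample in ["'my_dataset'", "''", "\"it's\"", "'it's'"]:
    print(sample, is_quoted_string(sample))
for sample in ["gpt-4.1", "my dataset"]:
    print(sample, is_word(sample))
```

Two consequences follow from the patterns themselves: an empty quoted string is rejected because `+` requires at least one character, and there is no escape mechanism, so a value containing a single quote has to be wrapped in double quotes and vice versa.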
@@ -307,7 +399,13 @@ class RAGFlowCLITransformer(Transformer):
def login_user(self, items):
email = items[2].children[0].strip("'\"")
return {"type": "login_user", "email": email}
if len(items) == 5:
# With password: LOGIN USER email PASSWORD password
password = items[4].children[0].strip("'\"")
return {"type": "login_user", "email": email, "password": password}
else:
# Without password: LOGIN USER email
return {"type": "login_user", "email": email}
def ping_server(self, items):
return {"type": "ping_server"}
@@ -459,6 +557,27 @@ class RAGFlowCLITransformer(Transformer):
def list_environments(self, items):
return {"type": "list_environments"}
def show_fingerprint(self, items):
return {"type": "show_fingerprint"}
def set_license(self, items):
# Avoid shadowing the built-in name `license`
license_value = items[2].children[0].strip("'\"")
return {"type": "set_license", "license": license_value}
def set_license_config(self, items):
value1: int = int(items[3])
value2: int = int(items[4])
return {"type": "set_license_config", "value1": value1, "value2": value2}
def show_license(self, items):
return {"type": "show_license"}
def check_license(self, items):
return {"type": "check_license"}
def list_server_configs(self, items):
return {"type": "list_server_configs"}
def create_model_provider(self, items):
provider_name = items[3].children[0].strip("'\"")
provider_key = items[4].children[0].strip("'\"")
@@ -538,6 +657,28 @@ class RAGFlowCLITransformer(Transformer):
dataset_name = items[4].children[0].strip("'\"")
return {"type": "list_user_dataset_files", "dataset_name": dataset_name}
def list_user_dataset_documents(self, items):
dataset_name = items[4].children[0].strip("'\"")
return {"type": "list_user_dataset_documents", "dataset_name": dataset_name}
def list_user_datasets_metadata(self, items):
dataset_names = []
dataset_names.append(items[4].children[0].strip("'\""))
for i in range(5, len(items)):
if items[i] and hasattr(items[i], 'children') and items[i].children:
dataset_names.append(items[i].children[0].strip("'\""))
return {"type": "list_user_datasets_metadata", "dataset_names": dataset_names}
def list_user_documents_metadata_summary(self, items):
dataset_name = items[5].children[0].strip("'\"")
doc_ids = []
if len(items) > 6 and items[6] == "DOCUMENTS":
for i in range(7, len(items)):
if items[i] and hasattr(items[i], 'children') and items[i].children:
doc_id = items[i].children[0].strip("'\"")
doc_ids.append(doc_id)
return {"type": "list_user_documents_metadata_summary", "dataset_name": dataset_name, "document_ids": doc_ids}
def list_user_agents(self, items):
return {"type": "list_user_agents"}
@@ -552,6 +693,30 @@ class RAGFlowCLITransformer(Transformer):
chat_name = items[2].children[0].strip("'\"")
return {"type": "drop_user_chat", "chat_name": chat_name}
def create_dataset_table(self, items):
dataset_name = None
vector_size = None
for i, item in enumerate(items):
if hasattr(item, 'data') and item.data == 'quoted_string':
dataset_name = item.children[0].strip("'\"")
if hasattr(item, 'type') and item.type == 'NUMBER':
if i >= 2 and getattr(items[i-1], 'type', None) == 'SIZE' and getattr(items[i-2], 'type', None) == 'VECTOR':
vector_size = int(item)
return {"type": "create_dataset_table", "dataset_name": dataset_name, "vector_size": vector_size}
def drop_dataset_table(self, items):
dataset_name = None
for item in items:
if hasattr(item, 'data') and item.data == 'quoted_string':
dataset_name = item.children[0].strip("'\"")
return {"type": "drop_dataset_table", "dataset_name": dataset_name}
def create_metadata_table(self, items):
return {"type": "create_metadata_table"}
def drop_metadata_table(self, items):
return {"type": "drop_metadata_table"}
def list_user_model_providers(self, items):
return {"type": "list_user_model_providers"}
@@ -575,6 +740,25 @@ class RAGFlowCLITransformer(Transformer):
dataset_name = items[2].children[0].strip("'\"")
return {"type": "parse_dataset", "dataset_name": dataset_name, "method": "async"}
def create_chat_session(self, items):
chat_name = items[2].children[0].strip("'\"")
return {"type": "create_chat_session", "chat_name": chat_name}
def drop_chat_session(self, items):
chat_name = items[2].children[0].strip("'\"")
session_id = items[4].children[0].strip("'\"")
return {"type": "drop_chat_session", "chat_name": chat_name, "session_id": session_id}
def list_chat_sessions(self, items):
chat_name = items[2].children[0].strip("'\"")
return {"type": "list_chat_sessions", "chat_name": chat_name}
def chat_on_session(self, items):
message = items[1].children[0].strip("'\"")
chat_name = items[3].children[0].strip("'\"")
session_id = items[5].children[0].strip("'\"")
return {"type": "chat_on_session", "message": message, "chat_name": chat_name, "session_id": session_id}
def import_docs_into_dataset(self, items):
document_list_str = items[1].children[0].strip("'\"")
document_paths = document_list_str.split(",")
@@ -593,6 +777,103 @@ class RAGFlowCLITransformer(Transformer):
datasets = datasets.split(" ")
return {"type": "search_on_datasets", "datasets": datasets, "question": question}
def get_chunk(self, items):
chunk_id = items[2].children[0].strip("'\"")
return {"type": "get_chunk", "chunk_id": chunk_id}
def insert_dataset_from_file(self, items):
file_path = items[4].children[0].strip("'\"")
return {"type": "insert_dataset_from_file", "file_path": file_path}
def insert_metadata_from_file(self, items):
file_path = items[4].children[0].strip("'\"")
return {"type": "insert_metadata_from_file", "file_path": file_path}
def update_chunk(self, items):
def get_quoted_value(item):
if hasattr(item, 'children') and item.children:
return item.children[0].strip("'\"")
return str(item).strip("'\"")
chunk_id = get_quoted_value(items[2])
dataset_name = get_quoted_value(items[5])
json_body = get_quoted_value(items[7])
return {"type": "update_chunk", "chunk_id": chunk_id, "dataset_name": dataset_name, "json_body": json_body}
def set_metadata(self, items):
doc_id = items[4].children[0].strip("'\"")
meta_json = items[6].children[0].strip("'\"")
return {"type": "set_metadata", "doc_id": doc_id, "meta": meta_json}
def remove_tags(self, items):
# items: REMOVE, TAGS, quoted_string(tag1), quoted_string(tag2), ..., FROM, DATASET, quoted_string(dataset_name), ";"
tags = []
# Start from index 2 (after TAGS keyword) and parse quoted strings until FROM
for i in range(2, len(items)):
item = items[i]
# Check for FROM token to stop
if hasattr(item, 'type') and item.type == 'FROM':
break
if hasattr(item, 'children') and item.children:
tag = item.children[0].strip("'\"")
tags.append(tag)
# Find dataset_name: quoted_string after DATASET
dataset_name = None
for i, item in enumerate(items):
# Check if item is a DATASET token
if hasattr(item, 'type') and item.type == 'DATASET':
# Next item should be quoted_string
dataset_name = items[i + 1].children[0].strip("'\"")
break
return {"type": "remove_tags", "dataset_name": dataset_name, "tags": tags}
def remove_chunks(self, items):
# Handle two cases:
# 1. REMOVE CHUNKS quoted_string (COMMA quoted_string)* FROM DOCUMENT quoted_string ";"
# 2. REMOVE ALL CHUNKS FROM DOCUMENT quoted_string ";"
# Check if it's "REMOVE ALL CHUNKS"
for item in items:
if hasattr(item, 'type') and item.type == 'ALL':
# Find doc_id
for j, inner_item in enumerate(items):
if hasattr(inner_item, 'type') and inner_item.type == 'DOCUMENT':
doc_id = items[j + 1].children[0].strip("'\"")
return {"type": "remove_chunks", "doc_id": doc_id, "delete_all": True}
# Otherwise, we have chunk_ids
chunk_ids = []
doc_id = None
for i, item in enumerate(items):
if hasattr(item, 'type') and item.type == 'DOCUMENT':
doc_id = items[i + 1].children[0].strip("'\"")
# Stop here so the document ID is not also collected as a chunk ID
break
elif hasattr(item, 'children') and item.children:
val = item.children[0].strip("'\"")
# Skip if it's "FROM" or "DOCUMENT"
if val.upper() in ['FROM', 'DOCUMENT']:
continue
chunk_ids.append(val)
return {"type": "remove_chunks", "doc_id": doc_id, "chunk_ids": chunk_ids}
def list_chunks(self, items):
doc_id = items[4].children[0].strip("'\"")
result = {"type": "list_chunks", "doc_id": doc_id}
# Parse optional parameters: PAGE, SIZE, KEYWORDS, AVAILABLE
# items structure varies based on which params are present
for i, item in enumerate(items):
if str(item) == "PAGE":
result["page"] = int(items[i + 1])
elif str(item) == "SIZE":
result["size"] = int(items[i + 1])
elif str(item) == "KEYWORDS":
result["keywords"] = items[i + 1].children[0].strip("'\"")
elif str(item) == "AVAILABLE":
result["available_int"] = int(items[i + 1])
return result
def benchmark(self, items):
concurrency: int = int(items[1])
iterations: int = int(items[2])
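
The remove_tags method in the hunk above walks a flat list of tokens and quoted_string subtrees. The following is a minimal sketch of that walk, not the project's code: `Token` and `Tree` here are hypothetical stand-ins for Lark's classes so the example runs without Lark installed, and the two loops from the diff are folded into a single pass. It assumes, as the diff does, that keyword tokens expose their terminal name through `.type`.

```python
from dataclasses import dataclass, field


class Token(str):
    """Stand-in for lark.Token: a string that also carries a terminal name."""

    def __new__(cls, type_: str, value: str):
        obj = super().__new__(cls, value)
        obj.type = type_
        return obj


@dataclass
class Tree:
    """Stand-in for lark.Tree: a rule name plus its child nodes."""

    data: str
    children: list = field(default_factory=list)


def quoted(value: str) -> Tree:
    # A quoted_string subtree still holds the raw token, quotes included.
    return Tree("quoted_string", [Token("QUOTED_STRING", value)])


def remove_tags(items: list) -> dict:
    tags = []
    dataset_name = None
    for i, item in enumerate(items):
        if getattr(item, "type", None) == "DATASET":
            # The dataset name is the quoted_string right after DATASET.
            dataset_name = items[i + 1].children[0].strip("'\"")
            break
        if isinstance(item, Tree) and item.children:
            # Every quoted_string seen before DATASET is a tag.
            tags.append(item.children[0].strip("'\""))
    return {"type": "remove_tags", "dataset_name": dataset_name, "tags": tags}


# Hypothetical parse result for: REMOVE TAGS 'draft', "internal" FROM DATASET 'kb1';
items = [
    Token("REMOVE", "REMOVE"), Token("TAGS", "TAGS"),
    quoted("'draft'"), Token("COMMA", ","), quoted('"internal"'),
    Token("FROM", "FROM"), Token("DATASET", "DATASET"), quoted("'kb1'"),
]
result = remove_tags(items)
print(result)
```

Breaking out of the loop at DATASET keeps the dataset name from being collected as a tag, which is the same reason the diff's version stops at FROM.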


@@ -1,6 +1,6 @@
[project]
name = "ragflow-cli"
version = "0.24.0"
version = "0.25.0"
description = "Client for the Admin Service of [RAGFlow](https://github.com/infiniflow/ragflow). The Admin Service provides user management and system monitoring."
authors = [{ name = "Lynn", email = "lynn_inf@hotmail.com" }]
license = { text = "Apache License, Version 2.0" }
@@ -11,17 +11,17 @@ dependencies = [
"beartype>=0.20.0,<1.0.0",
"pycryptodomex>=3.10.0",
"lark>=1.1.0",
"requests-toolbelt>=1.0.0",
]
[dependency-groups]
test = [
"pytest>=8.3.5",
"requests>=2.32.3",
"requests-toolbelt>=1.0.0",
]
[tool.setuptools]
py-modules = ["ragflow_cli", "parser"]
py-modules = ["ragflow_cli", "parser", "http_client", "ragflow_client", "user"]
[project.scripts]
ragflow-cli = "ragflow_cli:main"


@@ -18,6 +18,9 @@ import sys
import argparse
import base64
import getpass
import os
import atexit
import readline
from cmd import Cmd
from typing import Any, Dict, List
@@ -61,6 +64,12 @@ class RAGFlowCLI(Cmd):
self.port: int = 0
self.mode: str = "admin"
self.ragflow_client = None
# History file for readline persistence
self.history_file = os.path.expanduser("~/.ragflow_cli_history")
# Load existing history
self._load_history()
# Register cleanup to save history on exit
atexit.register(self._save_history)
intro = r"""Type "\h" for help."""
prompt = "ragflow> "
@@ -99,6 +108,7 @@ class RAGFlowCLI(Cmd):
return {"type": "empty"}
self.command_history.append(command_str)
readline.add_history(command_str)
try:
result = self.parser.parse(command_str)
@@ -210,6 +220,21 @@ class RAGFlowCLI(Cmd):
print(separator)
def _load_history(self):
"""Load command history from file."""
try:
if os.path.exists(self.history_file):
readline.read_history_file(self.history_file)
except Exception:
pass # Ignore errors loading history
def _save_history(self):
"""Save command history to file."""
try:
readline.write_history_file(self.history_file)
except Exception:
pass # Ignore errors saving history
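
The history persistence added above can be exercised on its own. This is a minimal sketch, assuming a platform where the standard `readline` module is available (it is not on stock Windows Python). The function names, the temporary path, and the sample command text are placeholders for illustration; the CLI itself uses `~/.ragflow_cli_history` and registers the save step with `atexit`.

```python
import os
import readline
import tempfile


def load_history(path: str) -> None:
    """Read saved history if the file exists; ignore unreadable files."""
    try:
        if os.path.exists(path):
            readline.read_history_file(path)
    except OSError:
        pass


def save_history(path: str) -> None:
    """Write the in-memory history; a failure here is treated as non-fatal."""
    try:
        readline.write_history_file(path)
    except OSError:
        pass


# Round trip through a temporary file rather than the real history file.
history_path = os.path.join(tempfile.mkdtemp(), "demo_history")
readline.clear_history()
readline.add_history("example command")  # placeholder, not a real CLI statement
save_history(history_path)

readline.clear_history()
load_history(history_path)
restored = [
    readline.get_history_item(i + 1)  # history items are 1-indexed
    for i in range(readline.get_current_history_length())
]
print(restored)
```

Catching `OSError` rather than a bare `Exception` is a choice made for this sketch; the diff swallows every exception so that a broken history file can never stop the CLI from starting or exiting.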
def run_interactive(self, args):
if self.verify_auth(args, single_command=False, auth=args["auth"]):
print(r"""

File diff suppressed because it is too large


@@ -26,7 +26,19 @@ class AuthException(Exception):
def encrypt_password(password_plain: str) -> str:
try:
from api.utils.crypt import crypt
import base64
from Cryptodome.PublicKey import RSA
from Cryptodome.Cipher import PKCS1_v1_5 as Cipher_pkcs1_v1_5
def crypt(line):
"""
Encrypt `line` so that decrypt(crypt(line)) == base64(line). The frontend and ragflow_cli both send passwords in this form.
"""
pub = "-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArq9XTUSeYr2+N1h3Afl/z8Dse/2yD0ZGrKwx+EEEcdsBLca9Ynmx3nIB5obmLlSfmskLpBo0UACBmB5rEjBp2Q2f3AG3Hjd4B+gNCG6BDaawuDlgANIhGnaTLrIqWrrcm4EMzJOnAOI1fgzJRsOOUEfaS318Eq9OVO3apEyCCt0lOQK6PuksduOjVxtltDav+guVAA068NrPYmRNabVKRNLJpL8w4D44sfth5RvZ3q9t+6RTArpEtc5sh5ChzvqPOzKGMXW83C95TxmXqpbK6olN4RevSfVjEAgCydH6HN6OhtOQEcnrU97r9H0iZOWwbw3pVrZiUkuRD1R56Wzs2wIDAQAB\n-----END PUBLIC KEY-----"
rsa_key = RSA.importKey(pub)
cipher = Cipher_pkcs1_v1_5.new(rsa_key)
password_base64 = base64.b64encode(line.encode('utf-8')).decode("utf-8")
encrypted_password = cipher.encrypt(password_base64.encode())
return base64.b64encode(encrypted_password).decode('utf-8')
except Exception as exc:
raise AuthException(
"Password encryption unavailable; install pycryptodomex (uv sync --python 3.12 --group test)."

138
admin/client/uv.lock generated

@@ -1,6 +1,6 @@
version = 1
revision = 3
requires-python = ">=3.10, <3.13"
requires-python = ">=3.12, <3.15"
[[package]]
name = "beartype"
@@ -26,38 +26,6 @@ version = "3.4.4"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/13/69/33ddede1939fdd074bce5434295f38fae7136463422fe4fd3e0e89b98062/charset_normalizer-3.4.4.tar.gz", hash = "sha256:94537985111c35f28720e43603b8e7b43a6ecfb2ce1d3058bbe955b73404e21a", size = 129418, upload-time = "2025-10-14T04:42:32.879Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/1f/b8/6d51fc1d52cbd52cd4ccedd5b5b2f0f6a11bbf6765c782298b0f3e808541/charset_normalizer-3.4.4-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e824f1492727fa856dd6eda4f7cee25f8518a12f3c4a56a74e8095695089cf6d", size = 209709, upload-time = "2025-10-14T04:40:11.385Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/5c/af/1f9d7f7faafe2ddfb6f72a2e07a548a629c61ad510fe60f9630309908fef/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4bd5d4137d500351a30687c2d3971758aac9a19208fc110ccb9d7188fbe709e8", size = 148814, upload-time = "2025-10-14T04:40:13.135Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/79/3d/f2e3ac2bbc056ca0c204298ea4e3d9db9b4afe437812638759db2c976b5f/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:027f6de494925c0ab2a55eab46ae5129951638a49a34d87f4c3eda90f696b4ad", size = 144467, upload-time = "2025-10-14T04:40:14.728Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ec/85/1bf997003815e60d57de7bd972c57dc6950446a3e4ccac43bc3070721856/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f820802628d2694cb7e56db99213f930856014862f3fd943d290ea8438d07ca8", size = 162280, upload-time = "2025-10-14T04:40:16.14Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/3e/8e/6aa1952f56b192f54921c436b87f2aaf7c7a7c3d0d1a765547d64fd83c13/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:798d75d81754988d2565bff1b97ba5a44411867c0cf32b77a7e8f8d84796b10d", size = 159454, upload-time = "2025-10-14T04:40:17.567Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/36/3b/60cbd1f8e93aa25d1c669c649b7a655b0b5fb4c571858910ea9332678558/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9d1bb833febdff5c8927f922386db610b49db6e0d4f4ee29601d71e7c2694313", size = 153609, upload-time = "2025-10-14T04:40:19.08Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/64/91/6a13396948b8fd3c4b4fd5bc74d045f5637d78c9675585e8e9fbe5636554/charset_normalizer-3.4.4-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:9cd98cdc06614a2f768d2b7286d66805f94c48cde050acdbbb7db2600ab3197e", size = 151849, upload-time = "2025-10-14T04:40:20.607Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b7/7a/59482e28b9981d105691e968c544cc0df3b7d6133152fb3dcdc8f135da7a/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:077fbb858e903c73f6c9db43374fd213b0b6a778106bc7032446a8e8b5b38b93", size = 151586, upload-time = "2025-10-14T04:40:21.719Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/92/59/f64ef6a1c4bdd2baf892b04cd78792ed8684fbc48d4c2afe467d96b4df57/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:244bfb999c71b35de57821b8ea746b24e863398194a4014e4c76adc2bbdfeff0", size = 145290, upload-time = "2025-10-14T04:40:23.069Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/6b/63/3bf9f279ddfa641ffa1962b0db6a57a9c294361cc2f5fcac997049a00e9c/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:64b55f9dce520635f018f907ff1b0df1fdc31f2795a922fb49dd14fbcdf48c84", size = 163663, upload-time = "2025-10-14T04:40:24.17Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ed/09/c9e38fc8fa9e0849b172b581fd9803bdf6e694041127933934184e19f8c3/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:faa3a41b2b66b6e50f84ae4a68c64fcd0c44355741c6374813a800cd6695db9e", size = 151964, upload-time = "2025-10-14T04:40:25.368Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d2/d1/d28b747e512d0da79d8b6a1ac18b7ab2ecfd81b2944c4c710e166d8dd09c/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:6515f3182dbe4ea06ced2d9e8666d97b46ef4c75e326b79bb624110f122551db", size = 161064, upload-time = "2025-10-14T04:40:26.806Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/bb/9a/31d62b611d901c3b9e5500c36aab0ff5eb442043fb3a1c254200d3d397d9/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:cc00f04ed596e9dc0da42ed17ac5e596c6ccba999ba6bd92b0e0aef2f170f2d6", size = 155015, upload-time = "2025-10-14T04:40:28.284Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/1f/f3/107e008fa2bff0c8b9319584174418e5e5285fef32f79d8ee6a430d0039c/charset_normalizer-3.4.4-cp310-cp310-win32.whl", hash = "sha256:f34be2938726fc13801220747472850852fe6b1ea75869a048d6f896838c896f", size = 99792, upload-time = "2025-10-14T04:40:29.613Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/eb/66/e396e8a408843337d7315bab30dbf106c38966f1819f123257f5520f8a96/charset_normalizer-3.4.4-cp310-cp310-win_amd64.whl", hash = "sha256:a61900df84c667873b292c3de315a786dd8dac506704dea57bc957bd31e22c7d", size = 107198, upload-time = "2025-10-14T04:40:30.644Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b5/58/01b4f815bf0312704c267f2ccb6e5d42bcc7752340cd487bc9f8c3710597/charset_normalizer-3.4.4-cp310-cp310-win_arm64.whl", hash = "sha256:cead0978fc57397645f12578bfd2d5ea9138ea0fac82b2f63f7f7c6877986a69", size = 100262, upload-time = "2025-10-14T04:40:32.108Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ed/27/c6491ff4954e58a10f69ad90aca8a1b6fe9c5d3c6f380907af3c37435b59/charset_normalizer-3.4.4-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:6e1fcf0720908f200cd21aa4e6750a48ff6ce4afe7ff5a79a90d5ed8a08296f8", size = 206988, upload-time = "2025-10-14T04:40:33.79Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/94/59/2e87300fe67ab820b5428580a53cad894272dbb97f38a7a814a2a1ac1011/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5f819d5fe9234f9f82d75bdfa9aef3a3d72c4d24a6e57aeaebba32a704553aa0", size = 147324, upload-time = "2025-10-14T04:40:34.961Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/07/fb/0cf61dc84b2b088391830f6274cb57c82e4da8bbc2efeac8c025edb88772/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:a59cb51917aa591b1c4e6a43c132f0cdc3c76dbad6155df4e28ee626cc77a0a3", size = 142742, upload-time = "2025-10-14T04:40:36.105Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/62/8b/171935adf2312cd745d290ed93cf16cf0dfe320863ab7cbeeae1dcd6535f/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8ef3c867360f88ac904fd3f5e1f902f13307af9052646963ee08ff4f131adafc", size = 160863, upload-time = "2025-10-14T04:40:37.188Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/09/73/ad875b192bda14f2173bfc1bc9a55e009808484a4b256748d931b6948442/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d9e45d7faa48ee908174d8fe84854479ef838fc6a705c9315372eacbc2f02897", size = 157837, upload-time = "2025-10-14T04:40:38.435Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/6d/fc/de9cce525b2c5b94b47c70a4b4fb19f871b24995c728e957ee68ab1671ea/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:840c25fb618a231545cbab0564a799f101b63b9901f2569faecd6b222ac72381", size = 151550, upload-time = "2025-10-14T04:40:40.053Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/55/c2/43edd615fdfba8c6f2dfbd459b25a6b3b551f24ea21981e23fb768503ce1/charset_normalizer-3.4.4-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ca5862d5b3928c4940729dacc329aa9102900382fea192fc5e52eb69d6093815", size = 149162, upload-time = "2025-10-14T04:40:41.163Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/03/86/bde4ad8b4d0e9429a4e82c1e8f5c659993a9a863ad62c7df05cf7b678d75/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d9c7f57c3d666a53421049053eaacdd14bbd0a528e2186fcb2e672effd053bb0", size = 150019, upload-time = "2025-10-14T04:40:42.276Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/1f/86/a151eb2af293a7e7bac3a739b81072585ce36ccfb4493039f49f1d3cae8c/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:277e970e750505ed74c832b4bf75dac7476262ee2a013f5574dd49075879e161", size = 143310, upload-time = "2025-10-14T04:40:43.439Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b5/fe/43dae6144a7e07b87478fdfc4dbe9efd5defb0e7ec29f5f58a55aeef7bf7/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:31fd66405eaf47bb62e8cd575dc621c56c668f27d46a61d975a249930dd5e2a4", size = 162022, upload-time = "2025-10-14T04:40:44.547Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/80/e6/7aab83774f5d2bca81f42ac58d04caf44f0cc2b65fc6db2b3b2e8a05f3b3/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:0d3d8f15c07f86e9ff82319b3d9ef6f4bf907608f53fe9d92b28ea9ae3d1fd89", size = 149383, upload-time = "2025-10-14T04:40:46.018Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/4f/e8/b289173b4edae05c0dde07f69f8db476a0b511eac556dfe0d6bda3c43384/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:9f7fcd74d410a36883701fafa2482a6af2ff5ba96b9a620e9e0721e28ead5569", size = 159098, upload-time = "2025-10-14T04:40:47.081Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d8/df/fe699727754cae3f8478493c7f45f777b17c3ef0600e28abfec8619eb49c/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ebf3e58c7ec8a8bed6d66a75d7fb37b55e5015b03ceae72a8e7c74495551e224", size = 152991, upload-time = "2025-10-14T04:40:48.246Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/1a/86/584869fe4ddb6ffa3bd9f491b87a01568797fb9bd8933f557dba9771beaf/charset_normalizer-3.4.4-cp311-cp311-win32.whl", hash = "sha256:eecbc200c7fd5ddb9a7f16c7decb07b566c29fa2161a16cf67b8d068bd21690a", size = 99456, upload-time = "2025-10-14T04:40:49.376Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/65/f6/62fdd5feb60530f50f7e38b4f6a1d5203f4d16ff4f9f0952962c044e919a/charset_normalizer-3.4.4-cp311-cp311-win_amd64.whl", hash = "sha256:5ae497466c7901d54b639cf42d5b8c1b6a4fead55215500d2f486d34db48d016", size = 106978, upload-time = "2025-10-14T04:40:50.844Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/7a/9d/0710916e6c82948b3be62d9d398cb4fcf4e97b56d6a6aeccd66c4b2f2bd5/charset_normalizer-3.4.4-cp311-cp311-win_arm64.whl", hash = "sha256:65e2befcd84bc6f37095f5961e68a6f077bf44946771354a28ad434c2cce0ae1", size = 99969, upload-time = "2025-10-14T04:40:52.272Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f3/85/1637cd4af66fa687396e757dec650f28025f2a2f5a5531a3208dc0ec43f2/charset_normalizer-3.4.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:0a98e6759f854bd25a58a73fa88833fba3b7c491169f86ce1180c948ab3fd394", size = 208425, upload-time = "2025-10-14T04:40:53.353Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/9d/6a/04130023fef2a0d9c62d0bae2649b69f7b7d8d24ea5536feef50551029df/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b5b290ccc2a263e8d185130284f8501e3e36c5e02750fc6b6bdeb2e9e96f1e25", size = 148162, upload-time = "2025-10-14T04:40:54.558Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/78/29/62328d79aa60da22c9e0b9a66539feae06ca0f5a4171ac4f7dc285b83688/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74bb723680f9f7a6234dcf67aea57e708ec1fbdf5699fb91dfd6f511b0a320ef", size = 144558, upload-time = "2025-10-14T04:40:55.677Z" },
@@ -74,6 +42,38 @@ wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/a8/ef/89297262b8092b312d29cdb2517cb1237e51db8ecef2e9af5edbe7b683b1/charset_normalizer-3.4.4-cp312-cp312-win32.whl", hash = "sha256:5833d2c39d8896e4e19b689ffc198f08ea58116bee26dea51e362ecc7cd3ed26", size = 99694, upload-time = "2025-10-14T04:41:09.23Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/3d/2d/1e5ed9dd3b3803994c155cd9aacb60c82c331bad84daf75bcb9c91b3295e/charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:a79cfe37875f822425b89a82333404539ae63dbdddf97f84dcbc3d339aae9525", size = 107131, upload-time = "2025-10-14T04:41:10.467Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d0/d9/0ed4c7098a861482a7b6a95603edce4c0d9db2311af23da1fb2b75ec26fc/charset_normalizer-3.4.4-cp312-cp312-win_arm64.whl", hash = "sha256:376bec83a63b8021bb5c8ea75e21c4ccb86e7e45ca4eb81146091b56599b80c3", size = 100390, upload-time = "2025-10-14T04:41:11.915Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/97/45/4b3a1239bbacd321068ea6e7ac28875b03ab8bc0aa0966452db17cd36714/charset_normalizer-3.4.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:e1f185f86a6f3403aa2420e815904c67b2f9ebc443f045edd0de921108345794", size = 208091, upload-time = "2025-10-14T04:41:13.346Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/7d/62/73a6d7450829655a35bb88a88fca7d736f9882a27eacdca2c6d505b57e2e/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b39f987ae8ccdf0d2642338faf2abb1862340facc796048b604ef14919e55ed", size = 147936, upload-time = "2025-10-14T04:41:14.461Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/89/c5/adb8c8b3d6625bef6d88b251bbb0d95f8205831b987631ab0c8bb5d937c2/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:3162d5d8ce1bb98dd51af660f2121c55d0fa541b46dff7bb9b9f86ea1d87de72", size = 144180, upload-time = "2025-10-14T04:41:15.588Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/91/ed/9706e4070682d1cc219050b6048bfd293ccf67b3d4f5a4f39207453d4b99/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:81d5eb2a312700f4ecaa977a8235b634ce853200e828fbadf3a9c50bab278328", size = 161346, upload-time = "2025-10-14T04:41:16.738Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d5/0d/031f0d95e4972901a2f6f09ef055751805ff541511dc1252ba3ca1f80cf5/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5bd2293095d766545ec1a8f612559f6b40abc0eb18bb2f5d1171872d34036ede", size = 158874, upload-time = "2025-10-14T04:41:17.923Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f5/83/6ab5883f57c9c801ce5e5677242328aa45592be8a00644310a008d04f922/charset_normalizer-3.4.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a8a8b89589086a25749f471e6a900d3f662d1d3b6e2e59dcecf787b1cc3a1894", size = 153076, upload-time = "2025-10-14T04:41:19.106Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/75/1e/5ff781ddf5260e387d6419959ee89ef13878229732732ee73cdae01800f2/charset_normalizer-3.4.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc7637e2f80d8530ee4a78e878bce464f70087ce73cf7c1caf142416923b98f1", size = 150601, upload-time = "2025-10-14T04:41:20.245Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d7/57/71be810965493d3510a6ca79b90c19e48696fb1ff964da319334b12677f0/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f8bf04158c6b607d747e93949aa60618b61312fe647a6369f88ce2ff16043490", size = 150376, upload-time = "2025-10-14T04:41:21.398Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/e5/d5/c3d057a78c181d007014feb7e9f2e65905a6c4ef182c0ddf0de2924edd65/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:554af85e960429cf30784dd47447d5125aaa3b99a6f0683589dbd27e2f45da44", size = 144825, upload-time = "2025-10-14T04:41:22.583Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/e6/8c/d0406294828d4976f275ffbe66f00266c4b3136b7506941d87c00cab5272/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:74018750915ee7ad843a774364e13a3db91682f26142baddf775342c3f5b1133", size = 162583, upload-time = "2025-10-14T04:41:23.754Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d7/24/e2aa1f18c8f15c4c0e932d9287b8609dd30ad56dbe41d926bd846e22fb8d/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:c0463276121fdee9c49b98908b3a89c39be45d86d1dbaa22957e38f6321d4ce3", size = 150366, upload-time = "2025-10-14T04:41:25.27Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/e4/5b/1e6160c7739aad1e2df054300cc618b06bf784a7a164b0f238360721ab86/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:362d61fd13843997c1c446760ef36f240cf81d3ebf74ac62652aebaf7838561e", size = 160300, upload-time = "2025-10-14T04:41:26.725Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/7a/10/f882167cd207fbdd743e55534d5d9620e095089d176d55cb22d5322f2afd/charset_normalizer-3.4.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9a26f18905b8dd5d685d6d07b0cdf98a79f3c7a918906af7cc143ea2e164c8bc", size = 154465, upload-time = "2025-10-14T04:41:28.322Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/89/66/c7a9e1b7429be72123441bfdbaf2bc13faab3f90b933f664db506dea5915/charset_normalizer-3.4.4-cp313-cp313-win32.whl", hash = "sha256:9b35f4c90079ff2e2edc5b26c0c77925e5d2d255c42c74fdb70fb49b172726ac", size = 99404, upload-time = "2025-10-14T04:41:29.95Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/c4/26/b9924fa27db384bdcd97ab83b4f0a8058d96ad9626ead570674d5e737d90/charset_normalizer-3.4.4-cp313-cp313-win_amd64.whl", hash = "sha256:b435cba5f4f750aa6c0a0d92c541fb79f69a387c91e61f1795227e4ed9cece14", size = 107092, upload-time = "2025-10-14T04:41:31.188Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/af/8f/3ed4bfa0c0c72a7ca17f0380cd9e4dd842b09f664e780c13cff1dcf2ef1b/charset_normalizer-3.4.4-cp313-cp313-win_arm64.whl", hash = "sha256:542d2cee80be6f80247095cc36c418f7bddd14f4a6de45af91dfad36d817bba2", size = 100408, upload-time = "2025-10-14T04:41:32.624Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/2a/35/7051599bd493e62411d6ede36fd5af83a38f37c4767b92884df7301db25d/charset_normalizer-3.4.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:da3326d9e65ef63a817ecbcc0df6e94463713b754fe293eaa03da99befb9a5bd", size = 207746, upload-time = "2025-10-14T04:41:33.773Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/10/9a/97c8d48ef10d6cd4fcead2415523221624bf58bcf68a802721a6bc807c8f/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8af65f14dc14a79b924524b1e7fffe304517b2bff5a58bf64f30b98bbc5079eb", size = 147889, upload-time = "2025-10-14T04:41:34.897Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/10/bf/979224a919a1b606c82bd2c5fa49b5c6d5727aa47b4312bb27b1734f53cd/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74664978bb272435107de04e36db5a9735e78232b85b77d45cfb38f758efd33e", size = 143641, upload-time = "2025-10-14T04:41:36.116Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ba/33/0ad65587441fc730dc7bd90e9716b30b4702dc7b617e6ba4997dc8651495/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:752944c7ffbfdd10c074dc58ec2d5a8a4cd9493b314d367c14d24c17684ddd14", size = 160779, upload-time = "2025-10-14T04:41:37.229Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/67/ed/331d6b249259ee71ddea93f6f2f0a56cfebd46938bde6fcc6f7b9a3d0e09/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d1f13550535ad8cff21b8d757a3257963e951d96e20ec82ab44bc64aeb62a191", size = 159035, upload-time = "2025-10-14T04:41:38.368Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/67/ff/f6b948ca32e4f2a4576aa129d8bed61f2e0543bf9f5f2b7fc3758ed005c9/charset_normalizer-3.4.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ecaae4149d99b1c9e7b88bb03e3221956f68fd6d50be2ef061b2381b61d20838", size = 152542, upload-time = "2025-10-14T04:41:39.862Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/16/85/276033dcbcc369eb176594de22728541a925b2632f9716428c851b149e83/charset_normalizer-3.4.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:cb6254dc36b47a990e59e1068afacdcd02958bdcce30bb50cc1700a8b9d624a6", size = 149524, upload-time = "2025-10-14T04:41:41.319Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/9e/f2/6a2a1f722b6aba37050e626530a46a68f74e63683947a8acff92569f979a/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c8ae8a0f02f57a6e61203a31428fa1d677cbe50c93622b4149d5c0f319c1d19e", size = 150395, upload-time = "2025-10-14T04:41:42.539Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/60/bb/2186cb2f2bbaea6338cad15ce23a67f9b0672929744381e28b0592676824/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:47cc91b2f4dd2833fddaedd2893006b0106129d4b94fdb6af1f4ce5a9965577c", size = 143680, upload-time = "2025-10-14T04:41:43.661Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/7d/a5/bf6f13b772fbb2a90360eb620d52ed8f796f3c5caee8398c3b2eb7b1c60d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:82004af6c302b5d3ab2cfc4cc5f29db16123b1a8417f2e25f9066f91d4411090", size = 162045, upload-time = "2025-10-14T04:41:44.821Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/df/c5/d1be898bf0dc3ef9030c3825e5d3b83f2c528d207d246cbabe245966808d/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2b7d8f6c26245217bd2ad053761201e9f9680f8ce52f0fcd8d0755aeae5b2152", size = 149687, upload-time = "2025-10-14T04:41:46.442Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/a5/42/90c1f7b9341eef50c8a1cb3f098ac43b0508413f33affd762855f67a410e/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:799a7a5e4fb2d5898c60b640fd4981d6a25f1c11790935a44ce38c54e985f828", size = 160014, upload-time = "2025-10-14T04:41:47.631Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/76/be/4d3ee471e8145d12795ab655ece37baed0929462a86e72372fd25859047c/charset_normalizer-3.4.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:99ae2cffebb06e6c22bdc25801d7b30f503cc87dbd283479e7b606f70aff57ec", size = 154044, upload-time = "2025-10-14T04:41:48.81Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b0/6f/8f7af07237c34a1defe7defc565a9bc1807762f672c0fde711a4b22bf9c0/charset_normalizer-3.4.4-cp314-cp314-win32.whl", hash = "sha256:f9d332f8c2a2fcbffe1378594431458ddbef721c1769d78e2cbc06280d8155f9", size = 99940, upload-time = "2025-10-14T04:41:49.946Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/4b/51/8ade005e5ca5b0d80fb4aff72a3775b325bdc3d27408c8113811a7cbe640/charset_normalizer-3.4.4-cp314-cp314-win_amd64.whl", hash = "sha256:8a6562c3700cce886c5be75ade4a5db4214fda19fede41d9792d100288d8f94c", size = 107104, upload-time = "2025-10-14T04:41:51.051Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/da/5f/6b8f83a55bb8278772c5ae54a577f3099025f9ade59d0136ac24a0df4bde/charset_normalizer-3.4.4-cp314-cp314-win_arm64.whl", hash = "sha256:de00632ca48df9daf77a2c65a484531649261ec9f25489917f09e455cb09ddb2", size = 100743, upload-time = "2025-10-14T04:41:52.122Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
]
@@ -86,18 +86,6 @@ wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
]
[[package]]
name = "exceptiongroup"
version = "1.3.1"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/8a/0e/97c33bf5009bdbac74fd2beace167cab3f978feb69cc36f1ef79360d6c4e/exceptiongroup-1.3.1-py3-none-any.whl", hash = "sha256:a7a39a3bd276781e98394987d3a5701d0c4edffb633bb7a5144577f82c773598", size = 16740, upload-time = "2025-11-21T23:01:53.443Z" },
]
[[package]]
name = "idna"
version = "3.11"
@@ -149,6 +137,17 @@ version = "3.23.0"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/c9/85/e24bf90972a30b0fcd16c73009add1d7d7cd9140c2498a68252028899e41/pycryptodomex-3.23.0.tar.gz", hash = "sha256:71909758f010c82bc99b0abf4ea12012c98962fbf0583c2164f8b84533c2e4da", size = 4922157, upload-time = "2025-05-17T17:23:41.434Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/2e/00/10edb04777069a42490a38c137099d4b17ba6e36a4e6e28bdc7470e9e853/pycryptodomex-3.23.0-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:7b37e08e3871efe2187bc1fd9320cc81d87caf19816c648f24443483005ff886", size = 2498764, upload-time = "2025-05-17T17:22:21.453Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/6b/3f/2872a9c2d3a27eac094f9ceaa5a8a483b774ae69018040ea3240d5b11154/pycryptodomex-3.23.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:91979028227543010d7b2ba2471cf1d1e398b3f183cb105ac584df0c36dac28d", size = 1643012, upload-time = "2025-05-17T17:22:23.702Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/70/af/774c2e2b4f6570fbf6a4972161adbb183aeeaa1863bde31e8706f123bf92/pycryptodomex-3.23.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b8962204c47464d5c1c4038abeadd4514a133b28748bcd9fa5b6d62e3cec6fa", size = 2187643, upload-time = "2025-05-17T17:22:26.37Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/de/a3/71065b24cb889d537954cedc3ae5466af00a2cabcff8e29b73be047e9a19/pycryptodomex-3.23.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a33986a0066860f7fcf7c7bd2bc804fa90e434183645595ae7b33d01f3c91ed8", size = 2273762, upload-time = "2025-05-17T17:22:28.313Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/c9/0b/ff6f43b7fbef4d302c8b981fe58467b8871902cdc3eb28896b52421422cc/pycryptodomex-3.23.0-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c7947ab8d589e3178da3d7cdeabe14f841b391e17046954f2fbcd941705762b5", size = 2313012, upload-time = "2025-05-17T17:22:30.57Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/02/de/9d4772c0506ab6da10b41159493657105d3f8bb5c53615d19452afc6b315/pycryptodomex-3.23.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c25e30a20e1b426e1f0fa00131c516f16e474204eee1139d1603e132acffc314", size = 2186856, upload-time = "2025-05-17T17:22:32.819Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/28/ad/8b30efcd6341707a234e5eba5493700a17852ca1ac7a75daa7945fcf6427/pycryptodomex-3.23.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:da4fa650cef02db88c2b98acc5434461e027dce0ae8c22dd5a69013eaf510006", size = 2347523, upload-time = "2025-05-17T17:22:35.386Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/0f/02/16868e9f655b7670dbb0ac4f2844145cbc42251f916fc35c414ad2359849/pycryptodomex-3.23.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:58b851b9effd0d072d4ca2e4542bf2a4abcf13c82a29fd2c93ce27ee2a2e9462", size = 2272825, upload-time = "2025-05-17T17:22:37.632Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ca/18/4ca89ac737230b52ac8ffaca42f9c6f1fd07c81a6cd821e91af79db60632/pycryptodomex-3.23.0-cp313-cp313t-win32.whl", hash = "sha256:a9d446e844f08299236780f2efa9898c818fe7e02f17263866b8550c7d5fb328", size = 1772078, upload-time = "2025-05-17T17:22:40Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/73/34/13e01c322db027682e00986873eca803f11c56ade9ba5bbf3225841ea2d4/pycryptodomex-3.23.0-cp313-cp313t-win_amd64.whl", hash = "sha256:bc65bdd9fc8de7a35a74cab1c898cab391a4add33a8fe740bda00f5976ca4708", size = 1803656, upload-time = "2025-05-17T17:22:42.139Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/54/68/9504c8796b1805d58f4425002bcca20f12880e6fa4dc2fc9a668705c7a08/pycryptodomex-3.23.0-cp313-cp313t-win_arm64.whl", hash = "sha256:c885da45e70139464f082018ac527fdaad26f1657a99ee13eecdce0f0ca24ab4", size = 1707172, upload-time = "2025-05-17T17:22:44.704Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/dd/9c/1a8f35daa39784ed8adf93a694e7e5dc15c23c741bbda06e1d45f8979e9e/pycryptodomex-3.23.0-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:06698f957fe1ab229a99ba2defeeae1c09af185baa909a31a5d1f9d42b1aaed6", size = 2499240, upload-time = "2025-05-17T17:22:46.953Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/7a/62/f5221a191a97157d240cf6643747558759126c76ee92f29a3f4aee3197a5/pycryptodomex-3.23.0-cp37-abi3-macosx_10_9_x86_64.whl", hash = "sha256:b2c2537863eccef2d41061e82a881dcabb04944c5c06c5aa7110b577cc487545", size = 1644042, upload-time = "2025-05-17T17:22:49.098Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/8c/fd/5a054543c8988d4ed7b612721d7e78a4b9bf36bc3c5ad45ef45c22d0060e/pycryptodomex-3.23.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:43c446e2ba8df8889e0e16f02211c25b4934898384c1ec1ec04d7889c0333587", size = 2186227, upload-time = "2025-05-17T17:22:51.139Z" },
@@ -160,11 +159,6 @@ wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/8d/67/09ee8500dd22614af5fbaa51a4aee6e342b5fa8aecf0a6cb9cbf52fa6d45/pycryptodomex-3.23.0-cp37-abi3-win32.whl", hash = "sha256:189afbc87f0b9f158386bf051f720e20fa6145975f1e76369303d0f31d1a8d7c", size = 1771969, upload-time = "2025-05-17T17:23:07.115Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/69/96/11f36f71a865dd6df03716d33bd07a67e9d20f6b8d39820470b766af323c/pycryptodomex-3.23.0-cp37-abi3-win_amd64.whl", hash = "sha256:52e5ca58c3a0b0bd5e100a9fbc8015059b05cffc6c66ce9d98b4b45e023443b9", size = 1803124, upload-time = "2025-05-17T17:23:09.267Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f9/93/45c1cdcbeb182ccd2e144c693eaa097763b08b38cded279f0053ed53c553/pycryptodomex-3.23.0-cp37-abi3-win_arm64.whl", hash = "sha256:02d87b80778c171445d67e23d1caef279bf4b25c3597050ccd2e13970b57fd51", size = 1707161, upload-time = "2025-05-17T17:23:11.414Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f3/b8/3e76d948c3c4ac71335bbe75dac53e154b40b0f8f1f022dfa295257a0c96/pycryptodomex-3.23.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:ebfff755c360d674306e5891c564a274a47953562b42fb74a5c25b8fc1fb1cb5", size = 1627695, upload-time = "2025-05-17T17:23:17.38Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/6a/cf/80f4297a4820dfdfd1c88cf6c4666a200f204b3488103d027b5edd9176ec/pycryptodomex-3.23.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:eca54f4bb349d45afc17e3011ed4264ef1cc9e266699874cdd1349c504e64798", size = 1675772, upload-time = "2025-05-17T17:23:19.202Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/d1/42/1e969ee0ad19fe3134b0e1b856c39bd0b70d47a4d0e81c2a8b05727394c9/pycryptodomex-3.23.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4f2596e643d4365e14d0879dc5aafe6355616c61c2176009270f3048f6d9a61f", size = 1668083, upload-time = "2025-05-17T17:23:21.867Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/6e/c3/1de4f7631fea8a992a44ba632aa40e0008764c0fb9bf2854b0acf78c2cf2/pycryptodomex-3.23.0-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fdfac7cda115bca3a5abb2f9e43bc2fb66c2b65ab074913643803ca7083a79ea", size = 1706056, upload-time = "2025-05-17T17:23:24.031Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f2/5f/af7da8e6f1e42b52f44a24d08b8e4c726207434e2593732d39e7af5e7256/pycryptodomex-3.23.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:14c37aaece158d0ace436f76a7bb19093db3b4deade9797abfc39ec6cd6cc2fe", size = 1806478, upload-time = "2025-05-17T17:23:26.066Z" },
]
[[package]]
@@ -182,12 +176,10 @@ version = "9.0.1"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "exceptiongroup", marker = "python_full_version < '3.11'" },
{ name = "iniconfig" },
{ name = "packaging" },
{ name = "pluggy" },
{ name = "pygments" },
{ name = "tomli", marker = "python_full_version < '3.11'" },
]
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/07/56/f013048ac4bc4c1d9be45afd4ab209ea62822fb1598f40687e6bf45dcea4/pytest-9.0.1.tar.gz", hash = "sha256:3e9c069ea73583e255c3b21cf46b8d3c56f6e3a1a8f6da94ccb0fcf57b9d73c8", size = 1564125, upload-time = "2025-11-12T13:05:09.333Z" }
wheels = [
@@ -196,7 +188,7 @@ wheels = [
[[package]]
name = "ragflow-cli"
version = "0.24.0"
version = "0.25.0"
source = { virtual = "." }
dependencies = [
{ name = "beartype" },
@@ -254,45 +246,11 @@ wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" },
]
[[package]]
name = "tomli"
version = "2.3.0"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/52/ed/3f73f72945444548f33eba9a87fc7a6e969915e7b1acc8260b30e1f76a2f/tomli-2.3.0.tar.gz", hash = "sha256:64be704a875d2a59753d80ee8a533c3fe183e3f06807ff7dc2232938ccb01549", size = 17392, upload-time = "2025-10-08T22:01:47.119Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b3/2e/299f62b401438d5fe1624119c723f5d877acc86a4c2492da405626665f12/tomli-2.3.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:88bd15eb972f3664f5ed4b57c1634a97153b4bac4479dcb6a495f41921eb7f45", size = 153236, upload-time = "2025-10-08T22:01:00.137Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/86/7f/d8fffe6a7aefdb61bced88fcb5e280cfd71e08939da5894161bd71bea022/tomli-2.3.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:883b1c0d6398a6a9d29b508c331fa56adbcdff647f6ace4dfca0f50e90dfd0ba", size = 148084, upload-time = "2025-10-08T22:01:01.63Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/47/5c/24935fb6a2ee63e86d80e4d3b58b222dafaf438c416752c8b58537c8b89a/tomli-2.3.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d1381caf13ab9f300e30dd8feadb3de072aeb86f1d34a8569453ff32a7dea4bf", size = 234832, upload-time = "2025-10-08T22:01:02.543Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/89/da/75dfd804fc11e6612846758a23f13271b76d577e299592b4371a4ca4cd09/tomli-2.3.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a0e285d2649b78c0d9027570d4da3425bdb49830a6156121360b3f8511ea3441", size = 242052, upload-time = "2025-10-08T22:01:03.836Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/70/8c/f48ac899f7b3ca7eb13af73bacbc93aec37f9c954df3c08ad96991c8c373/tomli-2.3.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:0a154a9ae14bfcf5d8917a59b51ffd5a3ac1fd149b71b47a3a104ca4edcfa845", size = 239555, upload-time = "2025-10-08T22:01:04.834Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ba/28/72f8afd73f1d0e7829bfc093f4cb98ce0a40ffc0cc997009ee1ed94ba705/tomli-2.3.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:74bf8464ff93e413514fefd2be591c3b0b23231a77f901db1eb30d6f712fc42c", size = 245128, upload-time = "2025-10-08T22:01:05.84Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/b6/eb/a7679c8ac85208706d27436e8d421dfa39d4c914dcf5fa8083a9305f58d9/tomli-2.3.0-cp311-cp311-win32.whl", hash = "sha256:00b5f5d95bbfc7d12f91ad8c593a1659b6387b43f054104cda404be6bda62456", size = 96445, upload-time = "2025-10-08T22:01:06.896Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/0a/fe/3d3420c4cb1ad9cb462fb52967080575f15898da97e21cb6f1361d505383/tomli-2.3.0-cp311-cp311-win_amd64.whl", hash = "sha256:4dc4ce8483a5d429ab602f111a93a6ab1ed425eae3122032db7e9acf449451be", size = 107165, upload-time = "2025-10-08T22:01:08.107Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/ff/b7/40f36368fcabc518bb11c8f06379a0fd631985046c038aca08c6d6a43c6e/tomli-2.3.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d7d86942e56ded512a594786a5ba0a5e521d02529b3826e7761a05138341a2ac", size = 154891, upload-time = "2025-10-08T22:01:09.082Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f9/3f/d9dd692199e3b3aab2e4e4dd948abd0f790d9ded8cd10cbaae276a898434/tomli-2.3.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:73ee0b47d4dad1c5e996e3cd33b8a76a50167ae5f96a2607cbe8cc773506ab22", size = 148796, upload-time = "2025-10-08T22:01:10.266Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/60/83/59bff4996c2cf9f9387a0f5a3394629c7efa5ef16142076a23a90f1955fa/tomli-2.3.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:792262b94d5d0a466afb5bc63c7daa9d75520110971ee269152083270998316f", size = 242121, upload-time = "2025-10-08T22:01:11.332Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/45/e5/7c5119ff39de8693d6baab6c0b6dcb556d192c165596e9fc231ea1052041/tomli-2.3.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4f195fe57ecceac95a66a75ac24d9d5fbc98ef0962e09b2eddec5d39375aae52", size = 250070, upload-time = "2025-10-08T22:01:12.498Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/45/12/ad5126d3a278f27e6701abde51d342aa78d06e27ce2bb596a01f7709a5a2/tomli-2.3.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e31d432427dcbf4d86958c184b9bfd1e96b5b71f8eb17e6d02531f434fd335b8", size = 245859, upload-time = "2025-10-08T22:01:13.551Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/fb/a1/4d6865da6a71c603cfe6ad0e6556c73c76548557a8d658f9e3b142df245f/tomli-2.3.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7b0882799624980785240ab732537fcfc372601015c00f7fc367c55308c186f6", size = 250296, upload-time = "2025-10-08T22:01:14.614Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/a0/b7/a7a7042715d55c9ba6e8b196d65d2cb662578b4d8cd17d882d45322b0d78/tomli-2.3.0-cp312-cp312-win32.whl", hash = "sha256:ff72b71b5d10d22ecb084d345fc26f42b5143c5533db5e2eaba7d2d335358876", size = 97124, upload-time = "2025-10-08T22:01:15.629Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/06/1e/f22f100db15a68b520664eb3328fb0ae4e90530887928558112c8d1f4515/tomli-2.3.0-cp312-cp312-win_amd64.whl", hash = "sha256:1cb4ed918939151a03f33d4242ccd0aa5f11b3547d0cf30f7c74a408a5b99878", size = 107698, upload-time = "2025-10-08T22:01:16.51Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/77/b8/0135fadc89e73be292b473cb820b4f5a08197779206b33191e801feeae40/tomli-2.3.0-py3-none-any.whl", hash = "sha256:e95b1af3c5b07d9e643909b5abbec77cd9f1217e6d0bca72b0234736b9fb1f1b", size = 14408, upload-time = "2025-10-08T22:01:46.04Z" },
]
[[package]]
name = "typing-extensions"
version = "4.15.0"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
]
[[package]]
name = "urllib3"
version = "2.5.0"
version = "2.6.3"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/15/22/9ee70a2574a4f4599c47dd506532914ce044817c7752a79b6a51286319bc/urllib3-2.5.0.tar.gz", hash = "sha256:3fc47733c7e419d4bc3f6b3dc2b4f890bb743906a30d56ba4a5bfa4bbff92760", size = 393185, upload-time = "2025-06-18T14:07:41.644Z" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl", hash = "sha256:e6b01673c0fa6a13e374b50871808eb3bf7046c4b125b216f6bf1cc604cff0dc", size = 129795, upload-time = "2025-06-18T14:07:40.39Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" },
]


@@ -21,7 +21,6 @@ import os
import signal
import logging
import threading
import traceback
import faulthandler
from flask import Flask
@@ -58,7 +57,7 @@ if __name__ == '__main__':
os.environ.get("MAX_CONTENT_LENGTH", 1024 * 1024 * 1024)
)
Session(app)
logging.info(f'RAGFlow version: {get_ragflow_version()}')
logging.info(f'RAGFlow admin version: {get_ragflow_version()}')
show_configs()
login_manager = LoginManager()
login_manager.init_app(app)
@@ -75,10 +74,10 @@ if __name__ == '__main__':
application=app,
threaded=True,
use_reloader=False,
use_debugger=True,
use_debugger=False,
)
except Exception:
traceback.print_exc()
except Exception as e:
logging.exception(f"Unhandled exception: {e}")
stop_event.set()
time.sleep(1)
os.kill(os.getpid(), signal.SIGKILL)


@@ -22,7 +22,6 @@ from datetime import datetime
from flask import jsonify, request
from flask_login import current_user, login_user
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
from api.common.exceptions import AdminException, UserNotFoundError
from api.common.base64 import encode_to_base64
@@ -40,18 +39,34 @@ from common import settings
def setup_auth(login_manager):
@login_manager.request_loader
def load_user(web_request):
jwt = Serializer(secret_key=settings.SECRET_KEY)
# Authorization header contains JWT-encoded access token
# First decode JWT to get the UUID, then query database
from itsdangerous.url_safe import URLSafeTimedSerializer as Serializer
from common import settings
authorization = web_request.headers.get("Authorization")
if authorization:
try:
access_token = str(jwt.loads(authorization))
# Strip "Bearer " prefix if present
jwt_token = authorization
if jwt_token.startswith("Bearer "):
jwt_token = jwt_token[7:]
if not access_token or not access_token.strip():
logging.warning("Authentication attempt with empty access token")
jwt_token = jwt_token.strip()
if not jwt_token:
logging.warning("Authentication attempt with empty JWT token")
return None
# Access tokens should be UUIDs (32 hex characters)
if len(access_token.strip()) < 32:
# Decode JWT to get the UUID access_token
jwt = Serializer(secret_key=settings.SECRET_KEY)
access_token = str(jwt.loads(jwt_token))
if not access_token or not access_token.strip():
logging.warning("Authentication attempt with empty access token after JWT decode")
return None
# Access tokens stored in database are UUIDs (32 hex characters)
if len(access_token) < 32:
logging.warning(f"Authentication attempt with invalid token format: {len(access_token)} chars")
return None
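The header handling in the `load_user` hunk above can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `extract_access_token` and `MIN_TOKEN_LENGTH` are hypothetical names, and the `decode` argument stands in for `Serializer(secret_key=settings.SECRET_KEY).loads` so the sketch runs without `itsdangerous` or a real secret key. Treating a decode failure as "not authenticated" is an assumption, since the matching `except` clause is not visible in the hunk.

```python
from typing import Callable, Optional

# The diff treats stored access tokens as UUIDs of 32 hex characters.
MIN_TOKEN_LENGTH = 32


def extract_access_token(authorization: Optional[str],
                         decode: Callable[[str], object]) -> Optional[str]:
    """Return the decoded access token, or None if the header is unusable.

    `decode` is a stand-in (assumption) for the serializer's `loads` method.
    """
    if not authorization:
        return None

    # Strip the optional "Bearer " prefix, then surrounding whitespace.
    token = authorization
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    token = token.strip()
    if not token:
        return None

    try:
        access_token = str(decode(token))
    except Exception:
        # Assumption: a bad signature or malformed payload means "no user".
        return None

    # Reject empty or too-short tokens before any database lookup.
    if not access_token.strip() or len(access_token) < MIN_TOKEN_LENGTH:
        return None
    return access_token
```

Stripping the prefix before decoding means clients may send either the raw signed token or the conventional `Bearer <token>` form.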
@@ -110,7 +125,8 @@ def add_tenant_for_admin(user_info: dict, role: str):
"embd_id": settings.EMBEDDING_MDL,
"asr_id": settings.ASR_MDL,
"parser_ids": settings.PARSERS,
"img2txt_id": settings.IMAGE2TEXT_MDL
"img2txt_id": settings.IMAGE2TEXT_MDL,
"rerank_id": settings.RERANK_MDL,
}
usr_tenant = {
"tenant_id": user_info["id"],


@@ -264,6 +264,19 @@ def load_configurations(config_path: str) -> list[BaseConfig]:
db_name=database, detail_func_name="get_infinity_status")
configurations.append(config)
id_count += 1
case "minio_0":
name: str = 'minio_0'
url = v['host']
parts = url.split(':', 1)
host = parts[0]
port = int(parts[1])
user = v.get('user')
password = v.get('password')
config = MinioConfig(id=id_count, name=name, host=host, port=port, user=user, password=password,
service_type="file_store",
store_type="minio", detail_func_name="check_minio_alive")
configurations.append(config)
id_count += 1
case "minio":
name: str = 'minio'
url = v['host']
@@ -310,6 +323,14 @@ def load_configurations(config_path: str) -> list[BaseConfig]:
service_type="task_executor", detail_func_name="check_task_executor_alive")
configurations.append(config)
id_count += 1
case "rabbitmq":
name: str = 'rabbitmq'
host: str = v.get('host')
port: int = v.get('port')
config = RabbitMQConfig(id=id_count, name=name, host=host, port=port,
service_type="message_queue", mq_type="rabbitmq", detail_func_name="check_rabbitmq_alive")
configurations.append(config)
id_count += 1
case _:
logging.warning(f"Unknown configuration key: {k}")
continue
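The `minio_0` branch above splits `v['host']` on the first colon and converts the second part with `int(...)`, which raises `IndexError` when the configured value has no port. A more defensive variant might look like the sketch below. `split_host_port` is a hypothetical helper, not part of the diff, and falling back to 9000 (MinIO's default API port) is an assumption about the desired behaviour.

```python
def split_host_port(address: str, default_port: int = 9000) -> tuple[str, int]:
    """Split "host:port" into (host, port).

    Falls back to `default_port` (an assumption) when no port is given.
    Raises ValueError for an empty host or a non-numeric port.
    """
    host, sep, port_text = address.partition(":")
    if not host:
        raise ValueError(f"missing host in address: {address!r}")
    if not sep or not port_text:
        # No colon, or nothing after it: use the fallback port.
        return host, default_port
    return host, int(port_text)
```

For example, `split_host_port("minio:9000")` and `split_host_port("minio")` both yield `("minio", 9000)`.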


@@ -30,13 +30,14 @@ from roles import RoleMgr
from api.common.exceptions import AdminException
from common.versions import get_ragflow_version
from api.utils.api_utils import generate_confirmation_token
from common.log_utils import get_log_levels, set_log_level
admin_bp = Blueprint("admin", __name__, url_prefix="/api/v1/admin")
@admin_bp.route("/ping", methods=["GET"])
def ping():
return success_response("PONG")
return success_response(message="pong")
@admin_bp.route("/login", methods=["POST"])
@@ -652,3 +653,39 @@ def test_sandbox_connection():
return error_response(str(e), 400)
except Exception as e:
return error_response(str(e), 500)
@admin_bp.route("/log_levels", methods=["GET"])
@login_required
@check_admin_auth
def get_logger_levels():
"""Get current log levels for all packages."""
try:
res = get_log_levels()
return success_response(res, "Get log levels", 0)
except Exception as e:
return error_response(str(e), 500)
@admin_bp.route("/log_levels", methods=["PUT"])
@login_required
@check_admin_auth
def set_logger_level():
"""Set log level for a package."""
try:
data = request.get_json()
if not data or "pkg_name" not in data or "level" not in data:
return error_response("pkg_name and level are required", 400)
pkg_name = data["pkg_name"]
level = data["level"]
if not isinstance(pkg_name, str) or not isinstance(level, str):
return error_response("pkg_name and level must be strings", 400)
success = set_log_level(pkg_name, level)
if success:
return success_response({"pkg_name": pkg_name, "level": level}, "Log level updated successfully")
else:
return error_response(f"Invalid log level: {level}", 400)
except Exception as e:
return error_response(str(e), 500)
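The two `/log_levels` routes above delegate to `get_log_levels` and `set_log_level` from `common.log_utils`, whose bodies are not part of this diff. The sketch below shows one way such helpers could be written with the standard `logging` module. It is an assumption about their behaviour, inferred only from how the routes use them (a mapping for GET, a boolean for PUT), and not the project's implementation.

```python
import logging

# Level names accepted by this sketch (assumption: the five standard levels).
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def get_log_levels() -> dict[str, str]:
    """Return the configured level name of the root logger and every named logger."""
    levels = {"root": logging.getLevelName(logging.getLogger().level)}
    for name, logger in logging.root.manager.loggerDict.items():
        # loggerDict also holds PlaceHolder objects for intermediate names.
        if isinstance(logger, logging.Logger):
            levels[name] = logging.getLevelName(logger.level)
    return levels


def set_log_level(pkg_name: str, level: str) -> bool:
    """Set the level of logger `pkg_name`; return False for an unknown level name."""
    numeric = _LEVELS.get(level.upper())
    if numeric is None:
        return False
    logging.getLogger(pkg_name).setLevel(numeric)
    return True
```

Returning `False` rather than raising lets the PUT handler map an unknown level to a 400 response, as the route above does.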


@@ -15,6 +15,7 @@
#
import asyncio
import base64
import datetime
import inspect
import binascii
import json
@@ -28,9 +29,11 @@ from typing import Any, Union, Tuple
from agent.component import component_class
from agent.component.base import ComponentBase
from agent.dsl_migration import normalize_chunker_dsl
from api.db.services.file_service import FileService
from api.db.services.llm_service import LLMBundle
from api.db.services.task_service import has_canceled
from api.db.joint_services.tenant_model_service import get_tenant_default_model_by_type
from common.constants import LLMType
from common.misc_utils import get_uuid, hash_str2int
from common.exceptions import TaskCanceledException
@@ -82,7 +85,8 @@ class Graph:
self.path = []
self.components = {}
self.error = ""
self.dsl = json.loads(dsl)
# Accept legacy DSL on read, but keep the in-memory canvas in the latest schema.
self.dsl = normalize_chunker_dsl(json.loads(dsl))
self._tenant_id = tenant_id
self.task_id = task_id if task_id else get_uuid()
self.custom_header = custom_header
@@ -286,7 +290,8 @@ class Canvas(Graph):
"sys.user_id": tenant_id,
"sys.conversation_turns": 0,
"sys.files": [],
"sys.history": []
"sys.history": [],
"sys.date": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
}
self.variables = {}
super().__init__(dsl, tenant_id, task_id, custom_header=custom_header)
@@ -299,13 +304,16 @@ class Canvas(Graph):
self.globals = self.dsl["globals"]
if "sys.history" not in self.globals:
self.globals["sys.history"] = []
if "sys.date" not in self.globals:
self.globals["sys.date"] = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
else:
self.globals = {
"sys.query": "",
"sys.user_id": "",
"sys.conversation_turns": 0,
"sys.files": [],
"sys.history": []
"sys.history": [],
"sys.date": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
}
if "variables" in self.dsl:
self.variables = self.dsl["variables"]
@@ -367,6 +375,7 @@ class Canvas(Graph):
self.globals[k] = ""
async def run(self, **kwargs):
self.globals["sys.date"] = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
st = time.perf_counter()
self._loop = asyncio.get_running_loop()
self.message_id = get_uuid()
@@ -386,10 +395,16 @@ class Canvas(Graph):
continue
self.components[k]["obj"].set_output(kk, vv)
layout_recognize = None
for cpn in self.components.values():
if cpn["obj"].component_name.lower() == "begin":
layout_recognize = getattr(cpn["obj"]._param, "layout_recognize", None)
break
for k in kwargs.keys():
if k in ["query", "user_id", "files"] and kwargs[k]:
if k == "files":
self.globals[f"sys.{k}"] = await self.get_files_async(kwargs[k])
self.globals[f"sys.{k}"] = await self.get_files_async(kwargs[k], layout_recognize)
else:
self.globals[f"sys.{k}"] = kwargs[k]
if not self.globals["sys.conversation_turns"] :
@@ -502,7 +517,8 @@ class Canvas(Graph):
cpn_obj = self.get_component_obj(self.path[i])
if cpn_obj.component_name.lower() == "message":
if cpn_obj.get_param("auto_play"):
tts_mdl = LLMBundle(self._tenant_id, LLMType.TTS)
tts_model_config = get_tenant_default_model_by_type(self._tenant_id, LLMType.TTS)
tts_mdl = LLMBundle(self._tenant_id, tts_model_config)
if isinstance(cpn_obj.output("content"), partial):
_m = ""
buff_m = ""
@@ -547,18 +563,10 @@ class Canvas(Graph):
yield decorate("message", {"content": "", "audio_binary": self.tts(tts_mdl, buff_m)})
buff_m = ""
cpn_obj.set_output("content", _m)
cite = re.search(r"\[ID:[ 0-9]+\]", _m)
else:
yield decorate("message", {"content": cpn_obj.output("content")})
cite = re.search(r"\[ID:[ 0-9]+\]", cpn_obj.output("content"))
message_end = {}
if cpn_obj.get_param("status"):
message_end["status"] = cpn_obj.get_param("status")
if isinstance(cpn_obj.output("attachment"), dict):
message_end["attachment"] = cpn_obj.output("attachment")
if cite:
message_end["reference"] = self.get_reference()
message_end = self._build_message_end(cpn_obj)
yield decorate("message_end", message_end)
while partials:
@@ -748,7 +756,7 @@ class Canvas(Graph):
def get_component_input_elements(self, cpnnm):
return self.components[cpnnm]["obj"].get_input_elements()
async def get_files_async(self, files: Union[None, list[dict]]) -> list[str]:
async def get_files_async(self, files: Union[None, list[dict]], layout_recognize: str = None) -> list[str]:
if not files:
return []
def image_to_base64(file):
@@ -756,7 +764,7 @@ class Canvas(Graph):
base64.b64encode(FileService.get_blob(file["created_by"], file["id"])).decode("utf-8"))
def parse_file(file):
blob = FileService.get_blob(file["created_by"], file["id"])
return FileService.parse(file["name"], blob, True, file["created_by"])
return FileService.parse(file["name"], blob, True, file["created_by"], layout_recognize)
loop = asyncio.get_running_loop()
tasks = []
for file in files:
@@ -766,15 +774,15 @@ class Canvas(Graph):
tasks.append(loop.run_in_executor(self._thread_pool, parse_file, file))
return await asyncio.gather(*tasks)
def get_files(self, files: Union[None, list[dict]]) -> list[str]:
def get_files(self, files: Union[None, list[dict]], layout_recognize: str = None) -> list[str]:
"""
Synchronous wrapper for get_files_async, used by sync component invoke paths.
"""
loop = getattr(self, "_loop", None)
if loop and loop.is_running():
return asyncio.run_coroutine_threadsafe(self.get_files_async(files), loop).result()
return asyncio.run_coroutine_threadsafe(self.get_files_async(files, layout_recognize), loop).result()
return asyncio.run(self.get_files_async(files))
return asyncio.run(self.get_files_async(files, layout_recognize))
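Review note: the sync wrapper reaches the async implementation by one of two routes, depending on whether a loop is already running. A minimal sketch of that bridging, assuming a hypothetical stand-in coroutine `fake_get_files_async` in place of the real file parsing (the `loop` argument is also an illustration; the diff reads it from `self._loop`):

```python
import asyncio


async def fake_get_files_async(files, layout_recognize=None):
    # Hypothetical stand-in for Canvas.get_files_async: no real parsing happens here.
    if not files:
        return []
    return [f"{f['name']}|{layout_recognize}" for f in files]


def get_files_sync(files, layout_recognize=None, loop=None):
    # A loop running in another thread: schedule onto it and block for the result.
    if loop and loop.is_running():
        future = asyncio.run_coroutine_threadsafe(
            fake_get_files_async(files, layout_recognize), loop
        )
        return future.result()
    # No running loop: start a fresh one just for this call.
    return asyncio.run(fake_get_files_async(files, layout_recognize))


result = get_files_sync([{"name": "a.pdf"}], "DeepDOC")
```

Note that the first branch is only safe when called from a thread other than the one running `loop`; blocking on `.result()` from the loop's own thread would wait on work that thread can no longer run.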
def tool_use_callback(self, agent_id: str, func_name: str, params: dict, result: Any, elapsed_time=None):
agent_ids = agent_id.split("-->")
@@ -820,6 +828,22 @@ class Canvas(Graph):
return {"chunks": {}, "doc_aggs": {}}
return self.retrieval[-1]
def _has_reference(self) -> bool:
ref = self.get_reference()
if not isinstance(ref, dict):
return False
return bool(ref.get("chunks") or ref.get("doc_aggs"))
def _build_message_end(self, cpn_obj) -> dict:
message_end = {}
if cpn_obj.get_param("status"):
message_end["status"] = cpn_obj.get_param("status")
if isinstance(cpn_obj.output("attachment"), dict):
message_end["attachment"] = cpn_obj.output("attachment")
if self._has_reference():
message_end["reference"] = self.get_reference()
return message_end
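Review note: this hunk stops keying the `reference` field on a `[ID:n]` regex match in the message text and instead checks whether retrieval actually produced anything. A minimal sketch of the new gating, assuming flattened hypothetical signatures (the real methods read `status` and `attachment` from `cpn_obj` and the reference from the canvas):

```python
def has_reference(ref) -> bool:
    # Only a dict with non-empty "chunks" or "doc_aggs" counts as a reference.
    if not isinstance(ref, dict):
        return False
    return bool(ref.get("chunks") or ref.get("doc_aggs"))


def build_message_end(status, attachment, ref) -> dict:
    # Hypothetical flattened signature, for illustration only.
    message_end = {}
    if status:
        message_end["status"] = status
    if isinstance(attachment, dict):
        message_end["attachment"] = attachment
    if has_reference(ref):
        message_end["reference"] = ref
    return message_end


empty = build_message_end(None, None, {"chunks": {}, "doc_aggs": {}})
full = build_message_end(200, {"doc_id": "d1"}, {"chunks": {"c1": {}}, "doc_aggs": {}})
```

With this check an answer that has retrieved chunks gets a `reference` payload even when the text carries no citation marker, and an empty retrieval result never does.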
def add_memory(self, user:str, assist:str, summ: str):
self.memory.append((user, assist, summ))


@@ -20,19 +20,20 @@ import os
import re
from copy import deepcopy
from functools import partial
from timeit import default_timer as timer
from typing import Any
import json_repair
from timeit import default_timer as timer
from agent.tools.base import LLMToolPluginCallSession, ToolParamBase, ToolBase, ToolMeta
from agent.component.llm import LLM, LLMParam
from agent.tools.base import LLMToolPluginCallSession, ToolBase, ToolMeta, ToolParamBase
from api.db.joint_services.tenant_model_service import get_model_config_by_type_and_name
from api.db.services.llm_service import LLMBundle
from api.db.services.tenant_llm_service import TenantLLMService
from api.db.services.mcp_server_service import MCPServerService
from api.db.services.tenant_llm_service import TenantLLMService
from common.connection_utils import timeout
from rag.prompts.generator import next_step_async, COMPLETE_TASK, \
citation_prompt, kb_prompt, citation_plus, full_question, message_fit_in, structured_output_prompt
from common.mcp_tool_call_conn import MCPToolCallSession, mcp_tool_metadata_to_openai_tool
from agent.component.llm import LLMParam, LLM
from rag.prompts.generator import citation_plus, citation_prompt, full_question, kb_prompt, message_fit_in, structured_output_prompt
class AgentParam(LLMParam, ToolParamBase):
@@ -41,35 +42,25 @@ class AgentParam(LLMParam, ToolParamBase):
"""
def __init__(self):
self.meta:ToolMeta = {
"name": "agent",
"description": "This is an agent for a specific task.",
"parameters": {
"user_prompt": {
"type": "string",
"description": "This is the order you need to send to the agent.",
"default": "",
"required": True
},
"reasoning": {
"type": "string",
"description": (
"Supervisor's reasoning for choosing the this agent. "
"Explain why this agent is being invoked and what is expected of it."
),
"required": True
},
"context": {
"type": "string",
"description": (
"All relevant background information, prior facts, decisions, "
"and state needed by the agent to solve the current query. "
"Should be as detailed and self-contained as possible."
),
"required": True
},
}
}
self.meta: ToolMeta = {
"name": "agent",
"description": "This is an agent for a specific task.",
"parameters": {
"user_prompt": {"type": "string", "description": "This is the order you need to send to the agent.", "default": "", "required": True},
"reasoning": {
"type": "string",
"description": ("Supervisor's reasoning for choosing the this agent. Explain why this agent is being invoked and what is expected of it."),
"required": True,
},
"context": {
"type": "string",
"description": (
"All relevant background information, prior facts, decisions, and state needed by the agent to solve the current query. Should be as detailed and self-contained as possible."
),
"required": True,
},
},
}
super().__init__()
self.function_name = "agent"
self.tools = []
@@ -79,7 +70,6 @@ class AgentParam(LLMParam, ToolParamBase):
self.custom_header = {}
class Agent(LLM, ToolBase):
component_name = "Agent"
@@ -91,13 +81,15 @@ class Agent(LLM, ToolBase):
original_name = cpn.get_meta()["function"]["name"]
indexed_name = f"{original_name}_{idx}"
self.tools[indexed_name] = cpn
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id), self._param.llm_id,
max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error,
max_rounds=self._param.max_rounds,
verbose_tool_use=True
)
chat_model_config = get_model_config_by_type_and_name(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id), self._param.llm_id)
self.chat_mdl = LLMBundle(
self._canvas.get_tenant_id(),
chat_model_config,
max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error,
max_rounds=self._param.max_rounds,
verbose_tool_use=False,
)
self.tool_meta = []
for indexed_name, tool_obj in self.tools.items():
original_meta = tool_obj.get_meta()
@@ -114,10 +106,30 @@ class Agent(LLM, ToolBase):
self.tools[tnm] = tool_call_session
self.callback = partial(self._canvas.tool_use_callback, id)
self.toolcall_session = LLMToolPluginCallSession(self.tools, self.callback)
#self.chat_mdl.bind_tools(self.toolcall_session, self.tool_metas)
if self.tool_meta:
self.chat_mdl.bind_tools(self.toolcall_session, self.tool_meta)
def _fit_messages(self, prompt: str, msg: list[dict]) -> list[dict]:
_, fitted_messages = message_fit_in(
[{"role": "system", "content": prompt}, *msg],
int(self.chat_mdl.max_length * 0.97),
)
return fitted_messages
@staticmethod
def _append_system_prompt(msg: list[dict], extra_prompt: str) -> None:
if extra_prompt and msg and msg[0]["role"] == "system":
msg[0]["content"] += "\n" + extra_prompt
@staticmethod
def _clean_formatted_answer(ans: str) -> str:
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
ans = re.sub(r"^.*```json", "", ans, flags=re.DOTALL)
return re.sub(r"```\n*$", "", ans, flags=re.DOTALL)
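Review note: the three substitutions in `_clean_formatted_answer` strip a reasoning preamble and a Markdown code fence before the JSON is parsed. A minimal standalone sketch, with two labeled deviations from the diff: the fence marker is built from a `FENCE` constant (equivalent to the literal pattern above, written this way only to keep the example readable), and the standard-library `json` stands in for `json_repair`:

```python
import json
import re

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def clean_formatted_answer(ans: str) -> str:
    # Drop everything up to and including the last </think> tag.
    ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
    # Drop everything up to and including the opening json fence.
    ans = re.sub(r"^.*" + FENCE + "json", "", ans, flags=re.DOTALL)
    # Drop a trailing closing fence and any newlines that follow it.
    return re.sub(FENCE + r"\n*$", "", ans, flags=re.DOTALL)


raw = (
    "<think>plan the reply</think>Here you go:\n"
    + FENCE + "json\n"
    + '{"ok": true}\n'
    + FENCE + "\n"
)
parsed = json.loads(clean_formatted_answer(raw))
```

Because the first two patterns are greedy and anchored at the start of the string, they cut up to the last `</think>` and the last opening fence, so only the final fenced payload survives.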
def _load_tool_obj(self, cpn: dict) -> object:
from agent.component import component_class
tool_name = cpn["component_name"]
param = component_class(tool_name + "Param")()
param.update(cpn["params"])
@@ -130,7 +142,7 @@ class Agent(LLM, ToolBase):
return component_class(cpn["component_name"])(self._canvas, cpn_id, param)
def get_meta(self) -> dict[str, Any]:
self._param.function_name= self._id.split("-->")[-1]
self._param.function_name = self._id.split("-->")[-1]
m = super().get_meta()
if hasattr(self._param, "user_prompt") and self._param.user_prompt:
m["function"]["parameters"]["properties"]["user_prompt"] = self._param.user_prompt
@@ -139,10 +151,7 @@ class Agent(LLM, ToolBase):
def get_input_form(self) -> dict[str, dict]:
res = {}
for k, v in self.get_input_elements().items():
res[k] = {
"type": "line",
"name": v["name"]
}
res[k] = {"type": "line", "name": v["name"]}
for cpn in self._param.tools:
if not isinstance(cpn, LLM):
continue
@@ -175,7 +184,7 @@ class Agent(LLM, ToolBase):
def _invoke(self, **kwargs):
return asyncio.run(self._invoke_async(**kwargs))
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 20*60)))
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 20 * 60)))
async def _invoke_async(self, **kwargs):
if self.check_if_canceled("Agent processing"):
return
@@ -204,19 +213,17 @@ class Agent(LLM, ToolBase):
schema = json.dumps(output_schema, ensure_ascii=False, indent=2)
schema_prompt = structured_output_prompt(schema)
downstreams = self._canvas.get_component(self._id)["downstream"] if self._canvas.get_component(self._id) else []
component = self._canvas.get_component(self._id)
downstreams = component["downstream"] if component else []
ex = self.exception_handler()
if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) and not (ex and ex["goto"]) and not output_schema:
has_message_downstream = any(self._canvas.get_component_obj(cid).component_name.lower() == "message" for cid in downstreams)
if has_message_downstream and not (ex and ex["goto"]) and not output_schema:
self.set_output("content", partial(self.stream_output_with_tools_async, prompt, deepcopy(msg), user_defined_prompt))
return
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
use_tools = []
ans = ""
async for delta_ans, _tk in self._react_with_tools_streamly_async_simple(prompt, msg, use_tools, user_defined_prompt,schema_prompt=schema_prompt):
if self.check_if_canceled("Agent processing"):
return
ans += delta_ans
msg = self._fit_messages(prompt, msg)
self._append_system_prompt(msg, schema_prompt)
ans = await self._generate_async(msg)
if ans.find("**ERROR**") >= 0:
logging.error(f"Agent._chat got error. response: {ans}")
@@ -230,14 +237,8 @@ class Agent(LLM, ToolBase):
error = ""
for _ in range(self._param.max_retries + 1):
try:
def clean_formated_answer(ans: str) -> str:
ans = re.sub(r"^.*</think>", "", ans, flags=re.DOTALL)
ans = re.sub(r"^.*```json", "", ans, flags=re.DOTALL)
return re.sub(r"```\n*$", "", ans, flags=re.DOTALL)
obj = json_repair.loads(clean_formated_answer(ans))
obj = json_repair.loads(self._clean_formatted_answer(ans))
self.set_output("structured", obj)
if use_tools:
self.set_output("use_tools", use_tools)
return obj
except Exception:
error = "The answer cannot be parsed as JSON"
@@ -248,330 +249,92 @@ class Agent(LLM, ToolBase):
self.set_output("_ERROR", error)
return
artifact_md = self._collect_tool_artifact_markdown(existing_text=ans)
if artifact_md:
ans += "\n\n" + artifact_md
self.set_output("content", ans)
if use_tools:
self.set_output("use_tools", use_tools)
return ans
async def stream_output_with_tools_async(self, prompt, msg, user_defined_prompt={}):
_, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
answer_without_toolcall = ""
use_tools = []
async for delta_ans, _ in self._react_with_tools_streamly_async_simple(prompt, msg, use_tools, user_defined_prompt):
if len(msg) > 3:
st = timer()
user_request = await full_question(messages=msg, chat_mdl=self.chat_mdl)
self.callback("Multi-turn conversation optimization", {}, user_request, elapsed_time=timer() - st)
msg = [*msg[:-1], {"role": "user", "content": user_request}]
msg = self._fit_messages(prompt, msg)
need2cite = self._param.cite and self._canvas.get_reference()["chunks"] and self._id.find("-->") < 0
cited = False
if need2cite and len(msg) < 7:
self._append_system_prompt(msg, citation_prompt())
cited = True
answer = ""
async for delta in self._generate_streamly(msg):
if self.check_if_canceled("Agent streaming"):
return
if delta_ans.find("**ERROR**") >= 0:
if delta.find("**ERROR**") >= 0:
if self.get_exception_default_value():
self.set_output("content", self.get_exception_default_value())
yield self.get_exception_default_value()
else:
self.set_output("_ERROR", delta_ans)
return
answer_without_toolcall += delta_ans
yield delta_ans
self.set_output("content", answer_without_toolcall)
if use_tools:
self.set_output("use_tools", use_tools)
async def _react_with_tools_streamly_async_simple(self, prompt, history: list[dict], use_tools, user_defined_prompt={}, schema_prompt: str = ""):
token_count = 0
tool_metas = self.tool_meta
hist = deepcopy(history)
last_calling = ""
if len(hist) > 3:
st = timer()
user_request = await full_question(messages=history, chat_mdl=self.chat_mdl)
self.callback("Multi-turn conversation optimization", {}, user_request, elapsed_time=timer()-st)
else:
user_request = history[-1]["content"]
def build_task_desc(prompt: str, user_request: str, user_defined_prompt: dict | None = None) -> str:
"""Build a minimal task_desc by concatenating prompt, query, and tool schemas."""
user_defined_prompt = user_defined_prompt or {}
task_desc = (
"### Agent Prompt\n"
f"{prompt}\n\n"
"### User Request\n"
f"{user_request}\n\n"
)
if user_defined_prompt:
udp_json = json.dumps(user_defined_prompt, ensure_ascii=False, indent=2)
task_desc += "\n### User Defined Prompts\n" + udp_json + "\n"
return task_desc
async def use_tool_async(name, args):
nonlocal hist, use_tools, last_calling
logging.info(f"{last_calling=} == {name=}")
last_calling = name
tool_response = await self.toolcall_session.tool_call_async(name, args)
use_tools.append({
"name": name,
"arguments": args,
"results": tool_response
})
return name, tool_response
async def complete():
nonlocal hist
need2cite = self._param.cite and self._canvas.get_reference()["chunks"] and self._id.find("-->") < 0
if schema_prompt:
need2cite = False
cited = False
if hist and hist[0]["role"] == "system":
if schema_prompt:
hist[0]["content"] += "\n" + schema_prompt
if need2cite and len(hist) < 7:
hist[0]["content"] += citation_prompt()
cited = True
yield "", token_count
_hist = hist
if len(hist) > 12:
_hist = [hist[0], hist[1], *hist[-10:]]
entire_txt = ""
async for delta_ans in self._generate_streamly(_hist):
if not need2cite or cited:
yield delta_ans, 0
entire_txt += delta_ans
if not need2cite or cited:
self.set_output("_ERROR", delta)
return
if not need2cite or cited:
yield delta
answer += delta
st = timer()
txt = ""
async for delta_ans in self._gen_citations_async(entire_txt):
if self.check_if_canceled("Agent streaming"):
return
yield delta_ans, 0
txt += delta_ans
self.callback("gen_citations", {}, txt, elapsed_time=timer()-st)
def build_observation(tool_call_res: list[tuple]) -> str:
"""
Build a Observation from tool call results.
No LLM involved.
"""
if not tool_call_res:
return ""
lines = ["Observation:"]
for name, result in tool_call_res:
lines.append(f"[{name} result]")
lines.append(str(result))
return "\n".join(lines)
def append_user_content(hist, content):
if hist[-1]["role"] == "user":
hist[-1]["content"] += content
else:
hist.append({"role": "user", "content": content})
if not need2cite or cited:
artifact_md = self._collect_tool_artifact_markdown(existing_text=answer)
if artifact_md:
yield "\n\n" + artifact_md
answer += "\n\n" + artifact_md
self.set_output("content", answer)
return
st = timer()
task_desc = build_task_desc(prompt, user_request, user_defined_prompt)
self.callback("analyze_task", {}, task_desc, elapsed_time=timer()-st)
for _ in range(self._param.max_rounds + 1):
cited_answer = ""
async for delta in self._gen_citations_async(answer):
if self.check_if_canceled("Agent streaming"):
return
response, tk = await next_step_async(self.chat_mdl, hist, tool_metas, task_desc, user_defined_prompt)
# self.callback("next_step", {}, str(response)[:256]+"...")
token_count += tk or 0
hist.append({"role": "assistant", "content": response})
try:
functions = json_repair.loads(re.sub(r"```.*", "", response))
if not isinstance(functions, list):
raise TypeError(f"List should be returned, but `{functions}`")
for f in functions:
if not isinstance(f, dict):
raise TypeError(f"An object type should be returned, but `{f}`")
tool_tasks = []
for func in functions:
name = func["name"]
args = func["arguments"]
if name == COMPLETE_TASK:
append_user_content(hist, f"Respond with a formal answer. FORGET(DO NOT mention) about `{COMPLETE_TASK}`. The language for the response MUST be as the same as the first user request.\n")
async for txt, tkcnt in complete():
yield txt, tkcnt
return
tool_tasks.append(asyncio.create_task(use_tool_async(name, args)))
results = await asyncio.gather(*tool_tasks) if tool_tasks else []
st = timer()
reflection = build_observation(results)
append_user_content(hist, reflection)
self.callback("reflection", {}, str(reflection), elapsed_time=timer()-st)
except Exception as e:
logging.exception(msg=f"Wrong JSON argument format in LLM ReAct response: {e}")
e = f"\nTool call error, please correct the input parameter of response format and call it again.\n *** Exception ***\n{e}"
append_user_content(hist, str(e))
logging.warning( f"Exceed max rounds: {self._param.max_rounds}")
final_instruction = f"""
{user_request}
IMPORTANT: You have reached the conversation limit. Based on ALL the information and research you have gathered so far, please provide a DIRECT and COMPREHENSIVE final answer to the original request.
Instructions:
1. SYNTHESIZE all information collected during this conversation
2. Provide a COMPLETE response using existing data - do not suggest additional research
3. Structure your response as a FINAL DELIVERABLE, not a plan
4. If information is incomplete, state what you found and provide the best analysis possible with available data
5. DO NOT mention conversation limits or suggest further steps
6. Focus on delivering VALUE with the information already gathered
Respond immediately with your final comprehensive answer.
"""
if self.check_if_canceled("Agent final instruction"):
return
append_user_content(hist, final_instruction)
async for txt, tkcnt in complete():
yield txt, tkcnt
# async def _react_with_tools_streamly_async(self, prompt, history: list[dict], use_tools, user_defined_prompt={}, schema_prompt: str = ""):
# token_count = 0
# tool_metas = self.tool_meta
# hist = deepcopy(history)
# last_calling = ""
# if len(hist) > 3:
# st = timer()
# user_request = await full_question(messages=history, chat_mdl=self.chat_mdl)
# self.callback("Multi-turn conversation optimization", {}, user_request, elapsed_time=timer()-st)
# else:
# user_request = history[-1]["content"]
# async def use_tool_async(name, args):
# nonlocal hist, use_tools, last_calling
# logging.info(f"{last_calling=} == {name=}")
# last_calling = name
# tool_response = await self.toolcall_session.tool_call_async(name, args)
# use_tools.append({
# "name": name,
# "arguments": args,
# "results": tool_response
# })
# # self.callback("add_memory", {}, "...")
# #self.add_memory(hist[-2]["content"], hist[-1]["content"], name, args, str(tool_response), user_defined_prompt)
# return name, tool_response
# async def complete():
# nonlocal hist
# need2cite = self._param.cite and self._canvas.get_reference()["chunks"] and self._id.find("-->") < 0
# if schema_prompt:
# need2cite = False
# cited = False
# if hist and hist[0]["role"] == "system":
# if schema_prompt:
# hist[0]["content"] += "\n" + schema_prompt
# if need2cite and len(hist) < 7:
# hist[0]["content"] += citation_prompt()
# cited = True
# yield "", token_count
# _hist = hist
# if len(hist) > 12:
# _hist = [hist[0], hist[1], *hist[-10:]]
# entire_txt = ""
# async for delta_ans in self._generate_streamly(_hist):
# if not need2cite or cited:
# yield delta_ans, 0
# entire_txt += delta_ans
# if not need2cite or cited:
# return
# st = timer()
# txt = ""
# async for delta_ans in self._gen_citations_async(entire_txt):
# if self.check_if_canceled("Agent streaming"):
# return
# yield delta_ans, 0
# txt += delta_ans
# self.callback("gen_citations", {}, txt, elapsed_time=timer()-st)
# def append_user_content(hist, content):
# if hist[-1]["role"] == "user":
# hist[-1]["content"] += content
# else:
# hist.append({"role": "user", "content": content})
# st = timer()
# task_desc = await analyze_task_async(self.chat_mdl, prompt, user_request, tool_metas, user_defined_prompt)
# self.callback("analyze_task", {}, task_desc, elapsed_time=timer()-st)
# for _ in range(self._param.max_rounds + 1):
# if self.check_if_canceled("Agent streaming"):
# return
# response, tk = await next_step_async(self.chat_mdl, hist, tool_metas, task_desc, user_defined_prompt)
# # self.callback("next_step", {}, str(response)[:256]+"...")
# token_count += tk or 0
# hist.append({"role": "assistant", "content": response})
# try:
# functions = json_repair.loads(re.sub(r"```.*", "", response))
# if not isinstance(functions, list):
# raise TypeError(f"List should be returned, but `{functions}`")
# for f in functions:
# if not isinstance(f, dict):
# raise TypeError(f"An object type should be returned, but `{f}`")
# tool_tasks = []
# for func in functions:
# name = func["name"]
# args = func["arguments"]
# if name == COMPLETE_TASK:
# append_user_content(hist, f"Respond with a formal answer. FORGET(DO NOT mention) about `{COMPLETE_TASK}`. The language for the response MUST be as the same as the first user request.\n")
# async for txt, tkcnt in complete():
# yield txt, tkcnt
# return
# tool_tasks.append(asyncio.create_task(use_tool_async(name, args)))
# results = await asyncio.gather(*tool_tasks) if tool_tasks else []
# st = timer()
# reflection = await reflect_async(self.chat_mdl, hist, results, user_defined_prompt)
# append_user_content(hist, reflection)
# self.callback("reflection", {}, str(reflection), elapsed_time=timer()-st)
# except Exception as e:
# logging.exception(msg=f"Wrong JSON argument format in LLM ReAct response: {e}")
# e = f"\nTool call error, please correct the input parameter of response format and call it again.\n *** Exception ***\n{e}"
# append_user_content(hist, str(e))
# logging.warning( f"Exceed max rounds: {self._param.max_rounds}")
# final_instruction = f"""
# {user_request}
# IMPORTANT: You have reached the conversation limit. Based on ALL the information and research you have gathered so far, please provide a DIRECT and COMPREHENSIVE final answer to the original request.
# Instructions:
# 1. SYNTHESIZE all information collected during this conversation
# 2. Provide a COMPLETE response using existing data - do not suggest additional research
# 3. Structure your response as a FINAL DELIVERABLE, not a plan
# 4. If information is incomplete, state what you found and provide the best analysis possible with available data
# 5. DO NOT mention conversation limits or suggest further steps
# 6. Focus on delivering VALUE with the information already gathered
# Respond immediately with your final comprehensive answer.
# """
# if self.check_if_canceled("Agent final instruction"):
# return
# append_user_content(hist, final_instruction)
# async for txt, tkcnt in complete():
# yield txt, tkcnt
yield delta
cited_answer += delta
artifact_md = self._collect_tool_artifact_markdown(existing_text=cited_answer)
if artifact_md:
yield "\n\n" + artifact_md
cited_answer += "\n\n" + artifact_md
self.callback("gen_citations", {}, cited_answer, elapsed_time=timer() - st)
self.set_output("content", cited_answer)
async def _gen_citations_async(self, text):
retrievals = self._canvas.get_reference()
retrievals = {"chunks": list(retrievals["chunks"].values()), "doc_aggs": list(retrievals["doc_aggs"].values())}
formated_refer = kb_prompt(retrievals, self.chat_mdl.max_length, True)
async for delta_ans in self._generate_streamly([{"role": "system", "content": citation_plus("\n\n".join(formated_refer))},
{"role": "user", "content": text}
]):
async for delta_ans in self._generate_streamly([{"role": "system", "content": citation_plus("\n\n".join(formated_refer))}, {"role": "user", "content": text}]):
yield delta_ans
def _collect_tool_artifact_markdown(self, existing_text: str = "") -> str:
md_parts = []
for tool_obj in self.tools.values():
if not hasattr(tool_obj, "_param") or not hasattr(tool_obj._param, "outputs"):
continue
artifacts_meta = tool_obj._param.outputs.get("_ARTIFACTS", {})
artifacts = artifacts_meta.get("value") if isinstance(artifacts_meta, dict) else None
if not artifacts:
continue
for art in artifacts:
if not isinstance(art, dict):
continue
url = art.get("url", "")
if url and (f"![]({url})" in existing_text or f"![{art.get('name', '')}]({url})" in existing_text):
continue
if art.get("mime_type", "").startswith("image/"):
md_parts.append(f"![{art['name']}]({url})")
else:
md_parts.append(f"[Download {art['name']}]({url})")
return "\n\n".join(md_parts)
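Review note: `_collect_tool_artifact_markdown` appends links for tool-produced files unless the answer already embeds them. A minimal standalone sketch over a plain list of artifacts; the function name and flat argument list are hypothetical, and unlike the diff it reads `name` with `.get`, so a missing name does not raise:

```python
def collect_artifact_markdown(artifacts, existing_text=""):
    md_parts = []
    for art in artifacts:
        if not isinstance(art, dict):
            continue
        url = art.get("url", "")
        name = art.get("name", "")  # deviation: the diff indexes art["name"] directly
        # Skip artifacts the answer already embeds, with or without alt text.
        if url and (f"![]({url})" in existing_text or f"![{name}]({url})" in existing_text):
            continue
        if art.get("mime_type", "").startswith("image/"):
            md_parts.append(f"![{name}]({url})")
        else:
            md_parts.append(f"[Download {name}]({url})")
    return "\n\n".join(md_parts)


artifacts = [
    {"name": "plot.png", "url": "/f/1", "mime_type": "image/png"},
    {"name": "report.csv", "url": "/f/2", "mime_type": "text/csv"},
    "not-a-dict",
]
appended = collect_artifact_markdown(artifacts)
deduped = collect_artifact_markdown(artifacts, existing_text="see ![](/f/1)")
```

The duplicate check only covers image-style embeds, so a non-image artifact whose download link is already in the text would still be appended again.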
def reset(self, only_output=False):
"""
Reset all tools if they have a reset method. This avoids errors for tools like MCPToolCallSession.


@@ -41,6 +41,7 @@ class Begin(UserFillUp):
if self.check_if_canceled("Begin processing"):
return
layout_recognize = self._param.layout_recognize or None
for k, v in kwargs.get("inputs", {}).items():
if self.check_if_canceled("Begin processing"):
return
@@ -52,7 +53,7 @@ class Begin(UserFillUp):
file_value = v["value"]
# Support both single file (backward compatibility) and multiple files
files = file_value if isinstance(file_value, list) else [file_value]
v = FileService.get_files(files)
v = FileService.get_files(files, layout_recognize=layout_recognize)
else:
v = v.get("value")
self.set_output(k, v)


@@ -21,6 +21,7 @@ from abc import ABC
from common.constants import LLMType
from api.db.services.llm_service import LLMBundle
from api.db.joint_services.tenant_model_service import get_model_config_by_type_and_name
from agent.component.llm import LLMParam, LLM
from common.connection_utils import timeout
from rag.llm.chat_model import ERROR_PREFIX
@@ -122,7 +123,8 @@ class Categorize(LLM, ABC):
msg[-1]["content"] = query_value
self.set_input_value(query_key, msg[-1]["content"])
self._param.update_prompt()
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
chat_model_config = get_model_config_by_type_and_name(self._canvas.get_tenant_id(), LLMType.CHAT, self._param.llm_id)
chat_mdl = LLMBundle(self._canvas.get_tenant_id(), chat_model_config)
user_prompt = """
---- Real Data ----


@@ -94,9 +94,9 @@ class DataOperations(ComponentBase,ABC):
def _recursive_eval(self, data):
if isinstance(data, dict):
return {k: self.recursive_eval(v) for k, v in data.items()}
return {k: self._recursive_eval(v) for k, v in data.items()}
if isinstance(data, list):
return [self.recursive_eval(item) for item in data]
return [self._recursive_eval(item) for item in data]
if isinstance(data, str):
try:
if (

File diff suppressed because it is too large.


@@ -27,6 +27,7 @@ class UserFillUpParam(ComponentParamBase):
super().__init__()
self.enable_tips = True
self.tips = "Please fill up the form"
self.layout_recognize = ""
def check(self) -> bool:
return True
@@ -61,6 +62,7 @@ class UserFillUp(ComponentBase):
content = re.sub(r"\{%s\}"%k, ans, content)
self.set_output("tips", content)
layout_recognize = self._param.layout_recognize or None
for k, v in kwargs.get("inputs", {}).items():
if self.check_if_canceled("UserFillUp processing"):
return
@@ -71,7 +73,7 @@ class UserFillUp(ComponentBase):
file_value = v["value"]
# Support both single file (backward compatibility) and multiple files
files = file_value if isinstance(file_value, list) else [file_value]
v = FileService.get_files(files)
v = FileService.get_files(files, layout_recognize=layout_recognize)
else:
v = v.get("value")
self.set_output(k, v)


@@ -19,6 +19,7 @@ import os
import re
import time
from abc import ABC
from functools import partial
import requests
@@ -29,7 +30,7 @@ from deepdoc.parser import HtmlParser
class InvokeParam(ComponentParamBase):
"""
Define the Crawler component parameters.
Define the Invoke component parameters.
"""
def __init__(self):
@@ -41,7 +42,7 @@ class InvokeParam(ComponentParamBase):
self.url = ""
self.timeout = 60
self.clean_html = False
self.datatype = "json" # New parameter to determine data posting type
self.datatype = "json"
def check(self):
self.check_valid_value(self.method.lower(), "Type of content from the crawler", ["get", "post", "put"])
@@ -53,92 +54,199 @@ class InvokeParam(ComponentParamBase):
class Invoke(ComponentBase, ABC):
component_name = "Invoke"
header_variable_ref_patt = r"\{([a-zA-Z_][a-zA-Z0-9_.@-]*)\}"
@staticmethod
def _coerce_json_arg_if_possible(key, value):
raw_value = value
if isinstance(value, str):
try:
value = json.loads(value)
logging.debug(
"Invoke JSON arg coercion succeeded. key=%s parsed_type=%s",
key,
type(value).__name__,
)
except json.JSONDecodeError as exc:
logging.info(
"Invoke JSON arg coercion skipped; value is not valid JSON. key=%s raw=%r error=%s",
key,
raw_value,
exc,
)
return raw_value
try:
json.dumps(value, allow_nan=False)
except (TypeError, ValueError) as exc:
logging.warning(
"Invoke JSON arg is not JSON-serializable. key=%s value_type=%s value=%r error=%s",
key,
type(value).__name__,
value,
exc,
)
raise ValueError(f"Invoke JSON argument '{key}' is not JSON-serializable.") from exc
return value
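Review note: in JSON mode a string argument is parsed if it happens to be valid JSON and passed through unchanged otherwise; whatever results must then survive strict serialization. A minimal standalone sketch of that behavior with the logging calls left out (the function name is hypothetical):

```python
import json


def coerce_json_arg(key, value):
    raw_value = value
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # Not JSON: keep the original string untouched.
            return raw_value
    try:
        # allow_nan=False rejects NaN and Infinity, which strict JSON does not allow.
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Invoke JSON argument '{key}' is not JSON-serializable.") from exc
    return value


as_list = coerce_json_arg("ids", "[1, 2]")
as_text = coerce_json_arg("q", "hello")
as_number = coerce_json_arg("n", "42")
```

One consequence worth keeping in mind when reviewing: numeric-looking strings such as `"42"` reach the endpoint as numbers in JSON mode, not as strings.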
def get_input_form(self) -> dict[str, dict]:
res = {}
for item in self._param.variables or []:
if not isinstance(item, dict):
continue
ref = (item.get("ref") or "").strip()
if not ref or ref in res:
continue
elements = self.get_input_elements_from_text("{" + ref + "}")
element = elements.get(ref, {})
res[ref] = {
"type": "line",
"name": element.get("name") or item.get("key") or ref,
}
return res
def _resolve_variable_value(self, variable_name: str, kwargs: dict | None = None):
kwargs = kwargs or {}
value = kwargs.get(variable_name, self._canvas.get_variable_value(variable_name))
if isinstance(value, partial):
value = "".join(value())
self.set_input_value(variable_name, value)
return "" if value is None else value
def _render_template(self, content: str, pattern: str, kwargs: dict | None = None, *, flags: int = 0) -> str:
content = content or ""
if not content:
return content
def replace_variable(match_obj):
return str(self._resolve_variable_value(match_obj.group(1), kwargs))
return re.sub(pattern, replace_variable, content, flags=flags)
def _resolve_template_text(self, content: str, kwargs: dict | None = None) -> str:
return self._render_template(content, self.variable_ref_patt, kwargs, flags=re.DOTALL)
def _resolve_header_text(self, content: str, kwargs: dict | None = None) -> str:
# Headers support plain {token} placeholders, so they cannot reuse the canvas variable regex.
return self._render_template(content, self.header_variable_ref_patt, kwargs)
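Review note: header templating uses its own `{name}` pattern and replaces each match with the resolved variable, or an empty string when nothing resolves. A minimal sketch assuming a plain dict `variables` as a hypothetical stand-in for the `kwargs` plus canvas lookup done by `_resolve_variable_value`:

```python
import re

HEADER_VARIABLE_REF_PATT = r"\{([a-zA-Z_][a-zA-Z0-9_.@-]*)\}"


def render_template(content, pattern, variables, flags=0):
    content = content or ""
    if not content:
        return content

    def replace_variable(match_obj):
        value = variables.get(match_obj.group(1))
        # Unresolved variables render as an empty string.
        return str("" if value is None else value)

    return re.sub(pattern, replace_variable, content, flags=flags)


auth = render_template("Bearer {token}", HEADER_VARIABLE_REF_PATT, {"token": "abc"})
missing = render_template("X-{missing}", HEADER_VARIABLE_REF_PATT, {})
```

A placeholder must start with a letter or underscore, so text such as `{1abc}` is left as written.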
def _resolve_arg_value(self, para: dict, kwargs: dict) -> object:
ref = (para.get("ref") or "").strip()
if ref and (ref in kwargs or self._canvas.get_variable_value(ref) is not None):
return self._resolve_variable_value(ref, kwargs)
if para.get("value") is not None:
value = para["value"]
if isinstance(value, str):
return self._resolve_template_text(value, kwargs)
return value
if ref:
return self._resolve_variable_value(ref, kwargs)
return ""
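Review note: `_resolve_arg_value` changes the precedence of the old loop. A resolvable `ref` now wins over a literal `value`, and the literal is used whenever it is not `None`, so falsy literals such as `0` are no longer skipped. A minimal sketch assuming a hypothetical dict `canvas_vars` in place of the canvas lookup and omitting template expansion of string literals:

```python
def resolve_arg_value(para, kwargs, canvas_vars):
    ref = (para.get("ref") or "").strip()
    # 1. A ref that is supplied in kwargs or resolves on the canvas takes priority.
    if ref and (ref in kwargs or canvas_vars.get(ref) is not None):
        value = kwargs.get(ref, canvas_vars.get(ref))
        return "" if value is None else value
    # 2. Otherwise fall back to the literal value, including falsy ones like 0.
    if para.get("value") is not None:
        return para["value"]
    # 3. Nothing resolvable: an unresolved ref and a missing value both yield "".
    return ""


from_ref = resolve_arg_value({"ref": "sys.query", "value": "fallback"}, {}, {"sys.query": "hi"})
from_literal = resolve_arg_value({"ref": "sys.query", "value": "fallback"}, {}, {})
zero_literal = resolve_arg_value({"value": 0}, {}, {})
```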
def _is_json_mode(self) -> bool:
return self._param.datatype.lower() == "json"
def _build_request_args(self, kwargs: dict) -> dict:
args = {}
for para in self._param.variables:
key = para["key"]
value = self._resolve_arg_value(para, kwargs)
if self._is_json_mode():
# JSON mode accepts stringified JSON so complex payloads can be passed through variables.
value = self._coerce_json_arg_if_possible(key, value)
args[key] = value
if para.get("ref"):
self.set_input_value(para["ref"], value)
return args
def _build_url(self, kwargs: dict) -> str:
url = self._resolve_template_text(self._param.url.strip(), kwargs)
if not url.startswith(("http://", "https://")):
url = "http://" + url
return url
def _build_headers(self, kwargs: dict) -> dict:
if not self._param.headers:
return {}
headers = json.loads(self._param.headers)
if not isinstance(headers, dict):
raise ValueError("Invoke headers must be a JSON object.")
return {
key: self._resolve_header_text(value, kwargs) if isinstance(value, str) else value
for key, value in headers.items()
}
def _build_proxies(self) -> dict | None:
if not re.sub(r"https?:?/?/?", "", self._param.proxy):
return None
return {"http": self._param.proxy, "https": self._param.proxy}
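Review note: the URL and proxy helpers are small but change edge-case behavior. A minimal standalone sketch of both checks, with template resolution left out and hypothetical function names:

```python
import re


def build_url(url: str) -> str:
    url = url.strip()
    # Prefix a scheme unless the URL already starts with a full http(s) scheme.
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url


def build_proxies(proxy: str):
    # A proxy that is empty or only a scheme prefix counts as "not configured".
    if not re.sub(r"https?:?/?/?", "", proxy):
        return None
    return {"http": proxy, "https": proxy}


prefixed = build_url("httpbin.org/get")
no_proxy = build_proxies("http://")
with_proxy = build_proxies("http://127.0.0.1:7890")
```

The removed check, `url.find("http") != 0`, would have left a host such as `httpbin.org` without a scheme because the host name itself begins with `http`; the `startswith` tuple avoids that.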
def _send_request(self, url: str, args: dict, headers: dict, proxies: dict | None):
method = self._param.method.lower()
request = getattr(requests, method)
request_kwargs = {
"url": url,
"headers": headers,
"proxies": proxies,
"timeout": self._param.timeout,
}
# GET sends query params; POST/PUT send either JSON or form data based on datatype.
if method == "get":
request_kwargs["params"] = args
return request(**request_kwargs)
body_key = "json" if self._is_json_mode() else "data"
request_kwargs[body_key] = args
return request(**request_kwargs)
def _format_response(self, response) -> str:
if not self._param.clean_html:
return response.text
# HtmlParser keeps the Invoke output text-focused when the endpoint returns HTML.
sections = HtmlParser()(None, response.content)
return "\n".join(sections)
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 3)))
def _invoke(self, **kwargs):
if self.check_if_canceled("Invoke processing"):
return
args = {}
for para in self._param.variables:
if para.get("value"):
args[para["key"]] = para["value"]
else:
args[para["key"]] = self._canvas.get_variable_value(para["ref"])
args = self._build_request_args(kwargs)
url = self._build_url(kwargs)
headers = self._build_headers(kwargs)
proxies = self._build_proxies()
url = self._param.url.strip()
def replace_variable(match):
var_name = match.group(1)
try:
value = self._canvas.get_variable_value(var_name)
return str(value or "")
except Exception:
return ""
# {base_url} or {component_id@variable_name}
url = re.sub(r"\{([a-zA-Z_][a-zA-Z0-9_.@-]*)\}", replace_variable, url)
if url.find("http") != 0:
url = "http://" + url
method = self._param.method.lower()
headers = {}
if self._param.headers:
headers = json.loads(self._param.headers)
proxies = None
if re.sub(r"https?:?/?/?", "", self._param.proxy):
proxies = {"http": self._param.proxy, "https": self._param.proxy}
last_e = ""
last_error = None
for _ in range(self._param.max_retries + 1):
if self.check_if_canceled("Invoke processing"):
return
try:
if method == "get":
response = requests.get(url=url, params=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
if self._param.clean_html:
sections = HtmlParser()(None, response.content)
self.set_output("result", "\n".join(sections))
else:
self.set_output("result", response.text)
if method == "put":
if self._param.datatype.lower() == "json":
response = requests.put(url=url, json=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
else:
response = requests.put(url=url, data=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
if self._param.clean_html:
sections = HtmlParser()(None, response.content)
self.set_output("result", "\n".join(sections))
else:
self.set_output("result", response.text)
if method == "post":
if self._param.datatype.lower() == "json":
response = requests.post(url=url, json=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
else:
response = requests.post(url=url, data=args, headers=headers, proxies=proxies, timeout=self._param.timeout)
if self._param.clean_html:
self.set_output("result", "\n".join(sections))
else:
self.set_output("result", response.text)
return self.output("result")
response = self._send_request(url, args, headers, proxies)
result = self._format_response(response)
self.set_output("result", result)
return result
except Exception as e:
if self.check_if_canceled("Invoke processing"):
return
last_e = e
last_error = e
logging.exception(f"Http request error: {e}")
time.sleep(self._param.delay_after_error)
if last_e:
self.set_output("_ERROR", str(last_e))
return f"Http request error: {last_e}"
assert False, self.output()
if last_error:
self.set_output("_ERROR", str(last_error))
return f"Http request error: {last_error}"
def thoughts(self) -> str:
return "Waiting for the server to respond..."
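The URL and proxy helpers introduced by this refactor can be exercised on their own. The following is a minimal sketch, assuming standalone functions (`build_url` and `build_proxies` are illustrative names, not part of the component) that mirror the logic of `_build_url` and `_build_proxies` with template resolution left out:

```python
from __future__ import annotations

import re


def build_url(raw_url: str) -> str:
    """Prepend http:// when the configured URL carries no scheme."""
    url = raw_url.strip()
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url


def build_proxies(proxy: str) -> dict | None:
    """Return a requests-style proxies mapping, or None when the setting
    is empty or contains nothing but a scheme prefix."""
    if not re.sub(r"https?:?/?/?", "", proxy):
        return None
    return {"http": proxy, "https": proxy}


if __name__ == "__main__":
    print(build_url("example.com/api"))
    print(build_proxies("http://127.0.0.1:7890"))
```

Compared with the older `url.find("http") != 0` check, `startswith` with an explicit tuple of schemes does not treat a host such as `httpbin.org` as already having a scheme.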

View File

@@ -69,7 +69,7 @@ class IterationItem(ComponentBase, ABC):
if p._id != pid:
continue
if p.component_name.lower() in ["categorize", "message", "switch", "userfillup", "interationitem"]:
if p.component_name.lower() in ["categorize", "message", "switch", "userfillup", "iterationitem"]:
continue
for k, o in p._param.outputs.items():

View File

@@ -25,6 +25,7 @@ from functools import partial
from common.constants import LLMType
from api.db.services.llm_service import LLMBundle
from api.db.services.tenant_llm_service import TenantLLMService
from api.db.joint_services.tenant_model_service import get_model_config_by_type_and_name
from agent.component.base import ComponentBase, ComponentParamBase
from common.connection_utils import timeout
from rag.prompts.generator import tool_call_summary, message_fit_in, citation_prompt, structured_output_prompt
@@ -84,10 +85,10 @@ class LLM(ComponentBase):
def __init__(self, canvas, component_id, param: ComponentParamBase):
super().__init__(canvas, component_id, param)
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id),
self._param.llm_id, max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error
)
chat_model_config = get_model_config_by_type_and_name(self._canvas.get_tenant_id(), TenantLLMService.llm_id2llm_type(self._param.llm_id), self._param.llm_id)
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), chat_model_config,
max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error)
self.imgs = []
def get_input_form(self) -> dict[str, dict]:
@@ -125,23 +126,119 @@ class LLM(ComponentBase):
msg.append(p)
return msg, self.string_format(self._param.sys_prompt, args)
def _prepare_prompt_variables(self):
if self._param.visual_files_var:
self.imgs = self._canvas.get_variable_value(self._param.visual_files_var)
if not self.imgs:
self.imgs = []
self.imgs = [img for img in self.imgs if img[:len("data:image/")] == "data:image/"]
if self.imgs and TenantLLMService.llm_id2llm_type(self._param.llm_id) == LLMType.CHAT.value:
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.IMAGE2TEXT.value,
self._param.llm_id, max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error
)
@staticmethod
def _extract_data_images(value) -> list[str]:
imgs = []
def walk(v):
if v is None:
return
if isinstance(v, str):
v = v.strip()
if v.startswith("data:image/"):
imgs.append(v)
return
if isinstance(v, (list, tuple, set)):
for item in v:
walk(item)
return
if isinstance(v, dict):
if "content" in v:
walk(v.get("content"))
else:
for item in v.values():
walk(item)
walk(value)
return imgs
@staticmethod
def _uniq_images(images: list[str]) -> list[str]:
seen = set()
uniq = []
for img in images:
if not isinstance(img, str):
continue
if not img.startswith("data:image/"):
continue
if img in seen:
continue
seen.add(img)
uniq.append(img)
return uniq
@classmethod
def _remove_data_images(cls, value):
if value is None:
return None
if isinstance(value, str):
return None if value.strip().startswith("data:image/") else value
if isinstance(value, list):
cleaned = []
for item in value:
v = cls._remove_data_images(item)
if v is None:
continue
if isinstance(v, (list, tuple, set, dict)) and not v:
continue
cleaned.append(v)
return cleaned
if isinstance(value, tuple):
cleaned = []
for item in value:
v = cls._remove_data_images(item)
if v is None:
continue
if isinstance(v, (list, tuple, set, dict)) and not v:
continue
cleaned.append(v)
return tuple(cleaned)
if isinstance(value, set):
cleaned = []
for item in value:
v = cls._remove_data_images(item)
if v is None:
continue
if isinstance(v, (list, tuple, set, dict)) and not v:
continue
cleaned.append(v)
return cleaned
if isinstance(value, dict):
if value.get("type") in {"image_url", "input_image", "image"} and cls._extract_data_images(value):
return None
cleaned = {}
for k, item in value.items():
v = cls._remove_data_images(item)
if v is None:
continue
if isinstance(v, (list, tuple, set, dict)) and not v:
continue
cleaned[k] = v
return cleaned
return value
def _prepare_prompt_variables(self):
self.imgs = []
if self._param.visual_files_var:
visual_val = self._canvas.get_variable_value(self._param.visual_files_var)
self.imgs.extend(self._extract_data_images(visual_val))
args = {}
vars = self.get_input_elements() if not self._param.debug_inputs else self._param.debug_inputs
extracted_imgs = []
for k, o in vars.items():
args[k] = o["value"]
raw_value = o["value"]
extracted_imgs.extend(self._extract_data_images(raw_value))
args[k] = self._remove_data_images(raw_value)
if args[k] is None:
args[k] = ""
if not isinstance(args[k], str):
try:
args[k] = json.dumps(args[k], ensure_ascii=False)
@@ -149,6 +246,13 @@ class LLM(ComponentBase):
args[k] = str(args[k])
self.set_input_value(k, args[k])
self.imgs = self._uniq_images(self.imgs + extracted_imgs)
if self.imgs and TenantLLMService.llm_id2llm_type(self._param.llm_id) == LLMType.CHAT.value:
self.chat_mdl = LLMBundle(self._canvas.get_tenant_id(), LLMType.IMAGE2TEXT.value,
self._param.llm_id, max_retries=self._param.max_retries,
retry_interval=self._param.delay_after_error
)
msg, sys_prompt = self._sys_prompt_and_msg(self._canvas.get_history(self._param.message_history_window_size)[:-1], args)
user_defined_prompt, sys_prompt = self._extract_prompts(sys_prompt)
if self._param.cite and self._canvas.get_reference()["chunks"]:
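The image handling added to the LLM component walks arbitrarily nested inputs and collects `data:image/` URIs. A minimal sketch of that traversal and of the de-duplication step, written as free functions (`extract_data_images` and `uniq_images` are stand-ins for the static methods in the diff, and the sample payload is invented for illustration):

```python
from __future__ import annotations


def extract_data_images(value) -> list[str]:
    """Collect every data:image/ URI found in a nested structure."""
    imgs: list[str] = []

    def walk(v):
        if v is None:
            return
        if isinstance(v, str):
            v = v.strip()
            if v.startswith("data:image/"):
                imgs.append(v)
            return
        if isinstance(v, (list, tuple, set)):
            for item in v:
                walk(item)
            return
        if isinstance(v, dict):
            # A "content" key takes priority; other keys are then ignored.
            if "content" in v:
                walk(v.get("content"))
            else:
                for item in v.values():
                    walk(item)

    walk(value)
    return imgs


def uniq_images(images: list) -> list[str]:
    """Drop non-image entries and duplicates while keeping first-seen order."""
    seen = set()
    uniq = []
    for img in images:
        if isinstance(img, str) and img.startswith("data:image/") and img not in seen:
            seen.add(img)
            uniq.append(img)
    return uniq


if __name__ == "__main__":
    # Hypothetical message payload in an OpenAI-like shape.
    payload = {
        "content": [
            {"type": "image_url", "image_url": {"url": " data:image/png;base64,AAA "}},
            "hello",
        ]
    }
    print(uniq_images(extract_data_images(payload)))
```

Because strings are stripped before the prefix check, URIs padded with whitespace are still picked up, and the stripped form is what ends up in the list.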

View File

@@ -14,8 +14,11 @@
# limitations under the License.
#
import asyncio
import nest_asyncio
nest_asyncio.apply()
try:
import nest_asyncio
nest_asyncio.apply()
except Exception:
pass
import inspect
import json
import os
@@ -27,7 +30,9 @@ from functools import partial
from typing import Any
from agent.component.base import ComponentBase, ComponentParamBase
from jinja2 import Template as Jinja2Template
from jinja2.sandbox import SandboxedEnvironment
_jinja2_sandbox = SandboxedEnvironment()
from common.connection_utils import timeout
from common.misc_utils import get_uuid
@@ -49,6 +54,9 @@ class MessageParam(ComponentParamBase):
self.outputs = {
"content": {
"type": "str"
},
"downloads": {
"type": "list"
}
}
@@ -61,10 +69,66 @@ class MessageParam(ComponentParamBase):
class Message(ComponentBase):
component_name = "Message"
@staticmethod
def _is_download_info(value: Any) -> bool:
return isinstance(value, dict) and all(
key in value for key in ("doc_id", "filename", "mime_type")
)
def _extract_downloads(self, value: Any) -> list[dict[str, Any]]:
if isinstance(value, str):
try:
value = json.loads(value)
except Exception:
return []
if self._is_download_info(value):
return [value]
if isinstance(value, list) and all(self._is_download_info(item) for item in value):
return value
return []
def _stringify_message_value(
self,
value: Any,
delimiter: str = None,
downloads: list[dict[str, Any]] | None = None,
fallback_to_str: bool = False,
) -> str:
extracted_downloads = self._extract_downloads(value)
if extracted_downloads:
if downloads is not None:
downloads.extend(extracted_downloads)
return ""
if value is None:
return ""
if isinstance(value, list) and delimiter:
return delimiter.join([str(vv) for vv in value])
if isinstance(value, str):
return value
try:
return json.dumps(value, ensure_ascii=False)
except Exception:
if fallback_to_str:
return str(value)
return ""
def get_input_elements(self) -> dict[str, Any]:
return self.get_input_elements_from_text("".join(self._param.content))
def get_kwargs(self, script:str, kwargs:dict = {}, delimiter:str=None) -> tuple[str, dict[str, str | list | Any]]:
def get_kwargs(
self,
script: str,
kwargs: dict = {},
delimiter: str = None,
downloads: list[dict[str, Any]] | None = None,
) -> tuple[str, dict[str, str | list | Any]]:
for k,v in self.get_input_elements_from_text(script).items():
if k in kwargs:
continue
@@ -79,15 +143,8 @@ class Message(ComponentBase):
else:
for t in iter_obj:
ans += t
elif isinstance(v, list) and delimiter:
ans = delimiter.join([str(vv) for vv in v])
elif not isinstance(v, str):
try:
ans = json.dumps(v, ensure_ascii=False)
except Exception:
pass
else:
ans = v
ans = self._stringify_message_value(v, delimiter, downloads)
if not ans:
ans = ""
kwargs[k] = ans
@@ -110,6 +167,7 @@ class Message(ComponentBase):
s = 0
all_content = ""
cache = {}
downloads = []
for r in re.finditer(self.variable_ref_patt, rand_cnt, flags=re.DOTALL):
if self.check_if_canceled("Message streaming"):
return
@@ -149,11 +207,9 @@ class Message(ComponentBase):
continue
elif inspect.isawaitable(v):
v = await v
elif not isinstance(v, str):
try:
v = json.dumps(v, ensure_ascii=False)
except Exception:
v = str(v)
v = self._stringify_message_value(
v, downloads=downloads, fallback_to_str=True
)
yield v
self.set_input_value(exp, v)
all_content += v
@@ -166,6 +222,7 @@ class Message(ComponentBase):
all_content += rand_cnt[s: ]
yield rand_cnt[s: ]
self.set_output("downloads", downloads)
self.set_output("content", all_content)
self._convert_content(all_content)
await self._save_to_memory(all_content)
@@ -186,12 +243,14 @@ class Message(ComponentBase):
self.set_output("content", partial(self._stream, rand_cnt))
return
rand_cnt, kwargs = self.get_kwargs(rand_cnt, kwargs)
template = Jinja2Template(rand_cnt)
downloads = []
rand_cnt, kwargs = self.get_kwargs(rand_cnt, kwargs, downloads=downloads)
template = _jinja2_sandbox.from_string(rand_cnt)
try:
content = template.render(kwargs)
except Exception:
pass
except Exception as e:
logging.warning(f"Jinja2 template rendering failed: {e}")
content = rand_cnt # fallback to unrendered content
if self.check_if_canceled("Message processing"):
return
@@ -199,6 +258,7 @@ class Message(ComponentBase):
for n, v in kwargs.items():
content = re.sub(n, v, content)
self.set_output("downloads", downloads)
self.set_output("content", content)
self._convert_content(content)
self._save_to_memory(content)
@@ -224,6 +284,38 @@ class Message(ComponentBase):
rows = []
headers = None
def _coerce_excel_cell_type(cell: str):
# Convert markdown cell text to native numeric types when safe, so Excel writes numeric cells instead of text.
if not isinstance(cell, str):
return cell
value = cell.strip()
if value == "":
return ""
# Keep values like "00123" as text to avoid losing leading zeros.
if re.match(r"^[+-]?0\d+$", value):
return cell
# Support thousand separators like 1,234 or 1,234.56
numeric_candidate = value
if re.match(r"^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$", value):
numeric_candidate = value.replace(",", "")
if re.match(r"^[+-]?\d+$", numeric_candidate):
try:
return int(numeric_candidate)
except ValueError:
return cell
if re.match(r"^[+-]?(\d+\.\d+|\d+\.|\.\d+)([eE][+-]?\d+)?$", numeric_candidate) or re.match(r"^[+-]?\d+[eE][+-]?\d+$", numeric_candidate):
try:
return float(numeric_candidate)
except ValueError:
return cell
return cell
for line in table_lines:
# Split by | and clean up
@@ -234,6 +326,7 @@ class Message(ComponentBase):
if headers is None:
headers = cells
else:
cells = [_coerce_excel_cell_type(c) for c in cells]
rows.append(cells)
if headers and rows:
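The cell coercion applied to table rows above can be checked in isolation. This sketch copies the regular expressions from `_coerce_excel_cell_type` into a free function (`coerce_excel_cell` is an illustrative name); the sample inputs are made up:

```python
import re


def coerce_excel_cell(cell):
    """Turn numeric-looking markdown cell text into int or float,
    leaving everything else untouched."""
    if not isinstance(cell, str):
        return cell
    value = cell.strip()
    if value == "":
        return ""
    # Keep values such as "00123" as text so leading zeros survive.
    if re.match(r"^[+-]?0\d+$", value):
        return cell
    numeric_candidate = value
    # Accept thousand separators such as 1,234 or 1,234.56.
    if re.match(r"^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$", value):
        numeric_candidate = value.replace(",", "")
    if re.match(r"^[+-]?\d+$", numeric_candidate):
        return int(numeric_candidate)
    if re.match(r"^[+-]?(\d+\.\d+|\d+\.|\.\d+)([eE][+-]?\d+)?$", numeric_candidate) or re.match(
        r"^[+-]?\d+[eE][+-]?\d+$", numeric_candidate
    ):
        return float(numeric_candidate)
    return cell


if __name__ == "__main__":
    for sample in ["42", "1,234.56", "00123", "1e3", "n/a"]:
        print(repr(sample), "->", repr(coerce_excel_cell(sample)))
```

Only data rows pass through the coercion; the header row is stored as-is, so a header that happens to look numeric stays text.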
@@ -430,8 +523,15 @@ class Message(ComponentBase):
if not hasattr(self._param, "memory_ids") or not self._param.memory_ids:
return True, "No memory selected."
user_id = self._param.user_id if hasattr(self._param, "user_id") else ""
if user_id:
import re
# is variable
if re.match(r"^{.*}$", user_id):
user_id = self._canvas.get_variable_value(user_id)
message_dict = {
"user_id": self._canvas._tenant_id,
"user_id": user_id,
"agent_id": self._canvas._id,
"session_id": self._canvas.task_id,
"user_input": self._canvas.get_sys_query(),

View File

@@ -18,7 +18,9 @@ import re
from abc import ABC
from typing import Any
from jinja2 import Template as Jinja2Template
from jinja2.sandbox import SandboxedEnvironment
_jinja2_sandbox = SandboxedEnvironment()
from agent.component.base import ComponentParamBase
from common.connection_utils import timeout
from .message import Message
@@ -96,7 +98,7 @@ class StringTransform(Message, ABC):
script, kwargs = self.get_kwargs(script, kwargs, self._param.delimiters[0])
if self._is_jinjia2(script):
template = Jinja2Template(script)
template = _jinja2_sandbox.from_string(script)
try:
script = template.render(kwargs)
except Exception:

View File

@@ -134,7 +134,7 @@ class Switch(ComponentBase, ABC):
except Exception:
return True if input <= value else False
raise ValueError('Not supported operator' + operator)
raise ValueError(f'Not supported operator: {operator}')
def thoughts(self) -> str:
return "I'm weighing a few options and will pick the next step shortly."

178
agent/dsl_migration.py Normal file
View File

@@ -0,0 +1,178 @@
#
# Copyright 2026 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import copy
import re
# Keep all legacy chunker renames in one place so the migration rule stays readable.
COMPONENT_RENAMES = {
"Splitter": "TokenChunker",
"HierarchicalMerger": "TitleChunker",
"PDFGenerator": "DocGenerator",
}
NODE_TYPE_RENAMES = {
"splitterNode": "chunkerNode",
}
VARIABLE_REF_PATTERN = re.compile(r"(\{+\s*)([A-Za-z0-9:_-]+)(@[A-Za-z0-9_.-]+)(\s*\}+)")
def normalize_chunker_dsl(dsl: dict) -> dict:
"""
Rewrite legacy chunker component names and ids into the current DSL schema.
This is intentionally a pure migration step:
- it does not change business params
- it only rewrites structural identifiers used by the canvas/runtime
- custom human-authored names are preserved unless they are still the exact
built-in legacy operator name
"""
if not isinstance(dsl, dict):
return dsl
normalized = copy.deepcopy(dsl)
components = normalized.get("components")
if not isinstance(components, dict):
return normalized
component_id_map: dict[str, str] = {}
for component_id in components.keys():
new_component_id = component_id
for old_name, new_name in COMPONENT_RENAMES.items():
prefix = f"{old_name}:"
if component_id.startswith(prefix):
new_component_id = f"{new_name}:{component_id[len(prefix):]}"
break
component_id_map[component_id] = new_component_id
def rewrite_variable_refs(text: str) -> str:
if text in component_id_map:
return component_id_map[text]
def repl(match: re.Match[str]) -> str:
component_id = match.group(2)
return (
match.group(1)
+ component_id_map.get(component_id, component_id)
+ match.group(3)
+ match.group(4)
)
return VARIABLE_REF_PATTERN.sub(repl, text)
def rewrite_value(value):
if isinstance(value, str):
return rewrite_variable_refs(value)
if isinstance(value, list):
return [rewrite_value(item) for item in value]
if isinstance(value, dict):
return {key: rewrite_value(item) for key, item in value.items()}
return value
rewritten_components = {}
for old_component_id, component in components.items():
new_component_id = component_id_map[old_component_id]
new_component = rewrite_value(component)
if isinstance(new_component, dict):
obj = new_component.get("obj")
if isinstance(obj, dict):
component_name = obj.get("component_name")
obj["component_name"] = COMPONENT_RENAMES.get(component_name, component_name)
if isinstance(new_component.get("downstream"), list):
new_component["downstream"] = [
component_id_map.get(component_id, component_id)
for component_id in new_component["downstream"]
]
if isinstance(new_component.get("upstream"), list):
new_component["upstream"] = [
component_id_map.get(component_id, component_id)
for component_id in new_component["upstream"]
]
parent_id = new_component.get("parent_id")
if isinstance(parent_id, str):
new_component["parent_id"] = component_id_map.get(parent_id, parent_id)
rewritten_components[new_component_id] = new_component
normalized["components"] = rewritten_components
if isinstance(normalized.get("path"), list):
normalized["path"] = [
component_id_map.get(component_id, component_id)
for component_id in normalized["path"]
]
graph = normalized.get("graph")
if isinstance(graph, dict):
nodes = graph.get("nodes")
if isinstance(nodes, list):
for node in nodes:
if not isinstance(node, dict):
continue
node_id = node.get("id")
if isinstance(node_id, str):
node["id"] = component_id_map.get(node_id, node_id)
parent_id = node.get("parentId")
if isinstance(parent_id, str):
node["parentId"] = component_id_map.get(parent_id, parent_id)
node_type = node.get("type")
if isinstance(node_type, str):
node["type"] = NODE_TYPE_RENAMES.get(node_type, node_type)
data = node.get("data")
if not isinstance(data, dict):
continue
label = data.get("label")
if isinstance(label, str):
data["label"] = COMPONENT_RENAMES.get(label, label)
name = data.get("name")
if isinstance(name, str) and name in COMPONENT_RENAMES:
data["name"] = COMPONENT_RENAMES[name]
if "form" in data:
data["form"] = rewrite_value(data["form"])
edges = graph.get("edges")
if isinstance(edges, list):
replacements = sorted(component_id_map.items(), key=lambda item: len(item[0]), reverse=True)
for edge in edges:
if not isinstance(edge, dict):
continue
for key in ("source", "target"):
value = edge.get(key)
if isinstance(value, str):
edge[key] = component_id_map.get(value, value)
edge_id = edge.get("id")
if isinstance(edge_id, str):
for old_component_id, new_component_id in replacements:
edge_id = edge_id.replace(old_component_id, new_component_id)
edge["id"] = edge_id
for key in ("history", "messages", "reference"):
if key in normalized:
normalized[key] = rewrite_value(normalized[key])
return normalized
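To see what the migration does to ids and variable references, the two core steps can be reproduced without the surrounding DSL traversal. This is a reduced sketch: `build_id_map` and `rewrite_refs` are illustrative names for logic that lives inline in `normalize_chunker_dsl`, and the component ids are invented:

```python
import re

COMPONENT_RENAMES = {
    "Splitter": "TokenChunker",
    "HierarchicalMerger": "TitleChunker",
    "PDFGenerator": "DocGenerator",
}
VARIABLE_REF_PATTERN = re.compile(r"(\{+\s*)([A-Za-z0-9:_-]+)(@[A-Za-z0-9_.-]+)(\s*\}+)")


def build_id_map(component_ids):
    """Map each component id to its renamed form (identity when no rule applies)."""
    id_map = {}
    for component_id in component_ids:
        new_id = component_id
        for old_name, new_name in COMPONENT_RENAMES.items():
            prefix = f"{old_name}:"
            if component_id.startswith(prefix):
                new_id = f"{new_name}:{component_id[len(prefix):]}"
                break
        id_map[component_id] = new_id
    return id_map


def rewrite_refs(text, id_map):
    """Rewrite a bare component id or any {component_id@variable} reference in text."""
    if text in id_map:
        return id_map[text]

    def repl(match):
        component_id = match.group(2)
        return match.group(1) + id_map.get(component_id, component_id) + match.group(3) + match.group(4)

    return VARIABLE_REF_PATTERN.sub(repl, text)


if __name__ == "__main__":
    # Hypothetical ids; real canvases use generated suffixes.
    id_map = build_id_map(["Splitter:abc", "Parser:0"])
    print(rewrite_refs("Use {Splitter:abc@chunks} and {Parser:0@text}", id_map))
```

Only the id portion of a reference is rewritten; the `@variable` suffix and the surrounding braces are copied through unchanged.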

99
agent/plugin/README_tr.md Normal file
View File

@@ -0,0 +1,99 @@
[English](./README.md) | [简体中文](./README_zh.md) | Türkçe
# Eklentiler
Bu klasör, RAGFlow'un eklenti mekanizmasını içerir.
RAGFlow, `embedded_plugins` alt klasöründen eklentileri özyinelemeli olarak yükleyecektir.
## Desteklenen eklenti türleri
Şu anda desteklenen tek eklenti türü `llm_tools`'dur.
- `llm_tools`: LLM'nin çağırması için bir araç.
## Eklenti nasıl eklenir
Bir LLM araç eklentisi eklemek basittir: bir eklenti dosyası oluşturun, içine `LLMToolPlugin` sınıfından türetilmiş bir sınıf koyun, ardından `get_metadata` ve `invoke` metodlarını uygulayın.
- `get_metadata` metodu: Bu metod, aracın açıklamasını içeren bir `LLMToolMetadata` nesnesi döndürür.
Açıklama, LLM'ye çağrı için ve RAGFlow web ön yüzüne görüntüleme amacıyla sağlanacaktır.
- `invoke` metodu: Bu metod, LLM tarafından üretilen parametreleri kabul eder ve aracın yürütme sonucunu içeren bir `str` döndürür.
Bu aracın tüm yürütme mantığı bu metoda konulmalıdır.
RAGFlow'u başlattığınızda, günlükte eklentinizin yüklendiğini göreceksiniz:
```
2025-05-15 19:29:08,959 INFO 34670 Recursively importing plugins from path `/some-path/ragflow/agent/plugin/embedded_plugins`
2025-05-15 19:29:08,960 INFO 34670 Loaded llm_tools plugin BadCalculatorPlugin version 1.0.0
```
Veya eklentinizi düzeltmeniz gereken hatalar da içerebilir.
### Örnek
Yanlış cevaplar veren bir hesap makinesi aracı ekleyerek eklenti ekleme sürecini göstereceğiz.
Önce, `embedded_plugins/llm_tools` klasörü altında `bad_calculator.py` adında bir eklenti dosyası oluşturun.
Ardından, `LLMToolPlugin` temel sınıfından türetilmiş bir `BadCalculatorPlugin` sınıfı oluşturuyoruz:
```python
class BadCalculatorPlugin(LLMToolPlugin):
_version_ = "1.0.0"
```
`_version_` alanı zorunludur ve eklentinin sürüm numarasını belirtir.
Hesap makinemizin girdileri olarak `a` ve `b` olmak üzere iki sayısı vardır, bu yüzden `BadCalculatorPlugin` sınıfımıza aşağıdaki `invoke` metodunu ekliyoruz:
```python
def invoke(self, a: int, b: int) -> str:
return str(a + b + 100)
```
`invoke` metodu LLM tarafından çağrılacaktır. Birçok parametreye sahip olabilir, ancak dönüş tipi `str` olmalıdır.
Son olarak, LLM'ye `bad_calculator` aracımızı nasıl kullanacağını anlatmak için bir `get_metadata` metodu eklememiz gerekiyor:
```python
@classmethod
def get_metadata(cls) -> LLMToolMetadata:
return {
# Bu aracın adı, LLM'ye sağlanır
"name": "bad_calculator",
# Bu aracın görüntüleme adı, RAGFlow ön yüzüne sağlanır
"displayName": "$t:bad_calculator.name",
# Bu aracın kullanım açıklaması, LLM'ye sağlanır
"description": "A tool to calculate the sum of two numbers (will give wrong answer)",
# Bu aracın açıklaması, RAGFlow ön yüzüne sağlanır
"displayDescription": "$t:bad_calculator.description",
# Bu aracın parametreleri
"parameters": {
# Birinci parametre - a
"a": {
# Parametre tipi, seçenekler: number, string veya LLM'nin tanıyabileceği herhangi bir tip
"type": "number",
# Bu parametrenin açıklaması, LLM'ye sağlanır
"description": "The first number",
# Bu parametrenin açıklaması, RAGFlow ön yüzüne sağlanır
"displayDescription": "$t:bad_calculator.params.a",
# Bu parametrenin zorunlu olup olmadığı
"required": True
},
# İkinci parametre - b
"b": {
"type": "number",
"description": "The second number",
"displayDescription": "$t:bad_calculator.params.b",
"required": True
}
}
}
```
`get_metadata` metodu bir `classmethod`'dur. Bu aracın açıklamasını LLM'ye sağlayacaktır.
`display` ile başlayan alanlar özel bir gösterim kullanabilir: `$t:xxx`, bu gösterim RAGFlow ön yüzündeki uluslararasılaştırma (i18n) mekanizmasını kullanarak `llmTools` kategorisinden metin alır. Bu gösterimi kullanmazsanız, ön yüz buraya yazdığınız metni doğrudan gösterecektir.
Artık aracımız hazırdır. `Yanıt Üret` bileşeninde seçip deneyebilirsiniz.

View File

@@ -189,7 +189,19 @@ Currently, the following languages are officially supported:
### 🐍 Python
To add Python dependencies, simply edit the following file:
Pre-installed packages: `requests`, `numpy`, `pandas`, `matplotlib`.
> `matplotlib` uses the `Agg` (non-interactive) backend by default in the sandbox (`MPLBACKEND=Agg`). No display server is available, so always save figures to files (e.g. `fig.savefig("artifacts/chart.png")`) rather than calling `plt.show()`.
>
> Tip: if Chinese text renders as missing boxes/squares in `matplotlib`, install the Debian package `fonts-noto-cjk` in your custom image. It is not preinstalled, to keep the base image small. The sandbox base image ships a `matplotlibrc` that already lists common CJK fonts in the `font.sans-serif` fallback chain, so no code-level font configuration is needed; just install the font package and rebuild the image.
>
> Example:
>
> ```dockerfile
> RUN apt-get update && apt-get install -y --no-install-recommends fonts-noto-cjk && rm -rf /var/lib/apt/lists/*
> ```
To add more dependencies, edit:
```bash
sandbox_base_image/python/requirements.txt
@@ -199,6 +211,8 @@ Add any additional packages you need, one per line (just like a normal pip requi
### 🟨 Node.js
Pre-installed packages: `axios`.
To add Node.js dependencies:
1. Navigate to the Node.js base image directory:

View File

@@ -7,7 +7,7 @@ services:
runtime: runc
privileged: true
ports:
- "${EXECUTOR_PORT:-9385}:9385"
- "${SANDBOX_EXECUTOR_MANAGER_PORT:-9385}:9385"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
networks:

View File

@@ -19,6 +19,7 @@ from api.handlers import healthz_handler, run_code_handler
router = APIRouter()
router.get("/")(healthz_handler)
router.get("/healthz")(healthz_handler)
router.post("/run")(run_code_handler)

View File

@@ -14,13 +14,26 @@
# limitations under the License.
#
import base64
from typing import Optional
from typing import Any, Optional
from pydantic import BaseModel, Field, field_validator
from models.enums import ResourceLimitType, ResultStatus, RuntimeErrorType, SupportLanguage, UnauthorizedAccessType
class ArtifactItem(BaseModel):
name: str
mime_type: str
size: int
content_b64: str
class ExecutionStructuredResult(BaseModel):
present: bool
value: Any = None
type: str = "json"
class CodeExecutionResult(BaseModel):
status: ResultStatus
stdout: str
@@ -37,6 +50,12 @@ class CodeExecutionResult(BaseModel):
unauthorized_access_type: Optional[UnauthorizedAccessType] = None
runtime_error_type: Optional[RuntimeErrorType] = None
# File artifacts produced by code execution (images, PDFs, CSVs, etc.)
artifacts: list[ArtifactItem] = []
# Structured return value produced by main()
result: Optional[ExecutionStructuredResult] = None
class CodeExecutionRequest(BaseModel):
code_b64: str = Field(..., description="Base64 encoded code string")

View File

@@ -19,17 +19,181 @@ import json
import os
import time
import uuid
from core.config import TIMEOUT
from core.container import allocate_container_blocking, release_container
from core.logger import logger
from models.enums import ResourceLimitType, ResultStatus, RuntimeErrorType, SupportLanguage, UnauthorizedAccessType
from models.schemas import CodeExecutionRequest, CodeExecutionResult
from models.schemas import ArtifactItem, CodeExecutionRequest, CodeExecutionResult, ExecutionStructuredResult
from utils.common import async_run_command
RESULT_MARKER_PREFIX = "__RAGFLOW_RESULT__:"
def _extract_result_envelope(stdout: str) -> tuple[str, ExecutionStructuredResult | None]:
if not stdout:
return "", None
cleaned_lines: list[str] = []
envelope: ExecutionStructuredResult | None = None
for line in str(stdout).splitlines():
if line.startswith(RESULT_MARKER_PREFIX):
payload_b64 = line[len(RESULT_MARKER_PREFIX) :].strip()
if not payload_b64:
continue
try:
payload = base64.b64decode(payload_b64).decode("utf-8")
envelope = ExecutionStructuredResult.model_validate_json(payload)
except Exception as exc:
logger.warning(f"Failed to decode structured result marker: {exc}")
cleaned_lines.append(line)
continue
cleaned_lines.append(line)
cleaned_stdout = "\n".join(cleaned_lines)
if stdout.endswith("\n") and cleaned_stdout and not cleaned_stdout.endswith("\n"):
cleaned_stdout += "\n"
return cleaned_stdout, envelope
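The marker protocol can be tried end to end with the standard library alone. In this sketch the envelope is a plain `dict` rather than the `ExecutionStructuredResult` model, logging is omitted, and `encode_result` / `extract_result` are illustrative names that pair the runner's `emit_result` with the parser above:

```python
import base64
import json

RESULT_MARKER_PREFIX = "__RAGFLOW_RESULT__:"


def encode_result(value) -> str:
    """Build the marker line a runner would print for a return value."""
    payload = json.dumps(
        {"present": True, "value": value, "type": "json"},
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return RESULT_MARKER_PREFIX + base64.b64encode(payload.encode("utf-8")).decode("ascii")


def extract_result(stdout: str):
    """Split stdout into user-visible text and the decoded envelope (or None)."""
    if not stdout:
        return "", None
    cleaned_lines = []
    envelope = None
    for line in stdout.splitlines():
        if line.startswith(RESULT_MARKER_PREFIX):
            payload_b64 = line[len(RESULT_MARKER_PREFIX):].strip()
            if not payload_b64:
                continue
            try:
                envelope = json.loads(base64.b64decode(payload_b64).decode("utf-8"))
            except Exception:
                # Undecodable markers stay visible in stdout.
                cleaned_lines.append(line)
            continue
        cleaned_lines.append(line)
    cleaned_stdout = "\n".join(cleaned_lines)
    if stdout.endswith("\n") and cleaned_stdout and not cleaned_stdout.endswith("\n"):
        cleaned_stdout += "\n"
    return cleaned_stdout, envelope


if __name__ == "__main__":
    stdout = "hello\n" + encode_result({"answer": 42}) + "\n"
    print(extract_result(stdout))
```

Base64-encoding the payload keeps the marker on a single line even when the returned value contains newlines, which is what lets the parser work line by line.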
def _build_execution_bundle(req: CodeExecutionRequest, workdir: str) -> dict[str, str | bytes]:
arguments = req.arguments or {}
args_source = json.dumps(arguments, ensure_ascii=False)
args_name = "args.json"
code_bytes = base64.b64decode(req.code_b64)
if req.language == SupportLanguage.PYTHON:
code_name = "main.py"
runner_name = "runner.py"
runner_source = f"""import base64
import json
import os
import sys
os.makedirs(os.path.join(os.getcwd(), "artifacts"), exist_ok=True)
sys.path.insert(0, os.path.dirname(__file__))
from main import main
RESULT_MARKER_PREFIX = {RESULT_MARKER_PREFIX!r}
def emit_result(value):
payload = json.dumps(
{{
"present": True,
"value": value,
"type": "json",
}},
ensure_ascii=False,
separators=(",", ":"),
)
print(RESULT_MARKER_PREFIX + base64.b64encode(payload.encode("utf-8")).decode("ascii"))
if __name__ == "__main__":
with open(os.path.join(os.path.dirname(__file__), "args.json"), encoding="utf-8") as f:
args = json.load(f)
result = main(**args)
emit_result(result)
"""
    elif req.language == SupportLanguage.NODEJS:
        code_name = "main.js"
        runner_name = "runner.js"
        runner_source = """
const fs = require('fs');
const path = require('path');

const args = JSON.parse(fs.readFileSync(path.join(__dirname, 'args.json'), 'utf8'));
const mainPath = path.join(__dirname, 'main.js');
const RESULT_MARKER_PREFIX = '__RESULT_MARKER_PREFIX__';

function isPromise(value) {
    return Boolean(value && typeof value.then === 'function');
}

function emitResult(value) {
    if (typeof value === 'undefined') {
        console.error('Error: main() must return a value. Use null for an empty result.');
        process.exit(1);
    }
    const payload = JSON.stringify({ present: true, value, type: 'json' });
    if (typeof payload === 'undefined') {
        console.error('Error: main() returned a non-JSON-serializable value.');
        process.exit(1);
    }
    console.log(RESULT_MARKER_PREFIX + Buffer.from(payload, 'utf8').toString('base64'));
}

if (fs.existsSync(mainPath)) {
    const mod = require(mainPath);
    const main = typeof mod === 'function' ? mod : mod.main;
    if (typeof main !== 'function') {
        console.error('Error: main is not a function');
        process.exit(1);
    }
    if (typeof args === 'object' && args !== null) {
        try {
            const result = Promise.resolve(main(args));
            if (isPromise(result)) {
                result.then(output => {
                    emitResult(output);
                }).catch(err => {
                    console.error('Error in async main function:', err);
                    process.exit(1);
                });
            } else {
                emitResult(result);
            }
        } catch (err) {
            console.error('Error when executing main:', err);
            process.exit(1);
        }
    } else {
        console.error('Error: args is not a valid object:', args);
        process.exit(1);
    }
} else {
    console.error('main.js not found in the current directory');
    process.exit(1);
}
"""
        runner_source = runner_source.replace("__RESULT_MARKER_PREFIX__", RESULT_MARKER_PREFIX)
    else:
        assert False, "Will never reach here"

    return {
        "code_name": code_name,
        "code_bytes": code_bytes,
        "runner_name": runner_name,
        "runner_source": runner_source,
        "args_name": args_name,
        "args_source": args_source,
    }
def _build_container_run_args(language: SupportLanguage, task_id: str, container: str, runner_name: str) -> list[str]:
    run_args = [
        "docker",
        "exec",
        "--workdir",
        f"/workspace/{task_id}",
        container,
        "timeout",
        str(TIMEOUT),
        language,
    ]
    if language == SupportLanguage.PYTHON:
        run_args.extend(["-I", "-B"])
    run_args.append(runner_name)
    return run_args
async def execute_code(req: CodeExecutionRequest):
"""Fully asynchronous execution logic"""
language = req.language
container = await allocate_container_blocking(language)
if not container:
@@ -46,93 +210,31 @@ async def execute_code(req: CodeExecutionRequest):
os.makedirs(workdir, mode=0o700, exist_ok=True)
try:
if language == SupportLanguage.PYTHON:
code_name = "main.py"
# code
code_path = os.path.join(workdir, code_name)
with open(code_path, "wb") as f:
f.write(base64.b64decode(req.code_b64))
# runner
runner_name = "runner.py"
runner_path = os.path.join(workdir, runner_name)
with open(runner_path, "w") as f:
f.write("""import json
import os
import sys
sys.path.insert(0, os.path.dirname(__file__))
from main import main
if __name__ == "__main__":
args = json.loads(sys.argv[1])
result = main(**args)
if result is not None:
print(result)
""")
bundle = _build_execution_bundle(req, workdir)
code_name = str(bundle["code_name"])
runner_name = str(bundle["runner_name"])
elif language == SupportLanguage.NODEJS:
code_name = "main.js"
code_path = os.path.join(workdir, "main.js")
with open(code_path, "wb") as f:
f.write(base64.b64decode(req.code_b64))
code_path = os.path.join(workdir, code_name)
with open(code_path, "wb") as f:
f.write(bundle["code_bytes"])
runner_name = "runner.js"
runner_path = os.path.join(workdir, "runner.js")
with open(runner_path, "w") as f:
f.write("""
const fs = require('fs');
const path = require('path');
runner_path = os.path.join(workdir, runner_name)
with open(runner_path, "w", encoding="utf-8") as f:
f.write(str(bundle["runner_source"]))
const args = JSON.parse(process.argv[2]);
const mainPath = path.join(__dirname, 'main.js');
args_path = os.path.join(workdir, str(bundle["args_name"]))
with open(args_path, "w", encoding="utf-8") as f:
f.write(str(bundle["args_source"]))
function isPromise(value) {
return Boolean(value && typeof value.then === 'function');
}
if (fs.existsSync(mainPath)) {
const mod = require(mainPath);
const main = typeof mod === 'function' ? mod : mod.main;
if (typeof main !== 'function') {
console.error('Error: main is not a function');
process.exit(1);
}
if (typeof args === 'object' && args !== null) {
try {
const result = main(args);
if (isPromise(result)) {
result.then(output => {
if (output !== null) {
console.log(output);
}
}).catch(err => {
console.error('Error in async main function:', err);
});
} else {
if (result !== null) {
console.log(result);
}
}
} catch (err) {
console.error('Error when executing main:', err);
}
} else {
console.error('Error: args is not a valid object:', args);
}
} else {
console.error('main.js not found in the current directory');
}
""")
# dirs
returncode, _, stderr = await async_run_command("docker", "exec", container, "mkdir", "-p", f"/workspace/{task_id}", timeout=5)
if returncode != 0:
raise RuntimeError(f"Directory creation failed: {stderr}")
# archive
tar_proc = await asyncio.create_subprocess_exec("tar", "czf", "-", "-C", workdir, code_name, runner_name, stdout=asyncio.subprocess.PIPE)
tar_proc = await asyncio.create_subprocess_exec(
"tar", "czf", "-", "-C", workdir, code_name, runner_name, str(bundle["args_name"]), stdout=asyncio.subprocess.PIPE
)
tar_stdout, _ = await tar_proc.communicate()
# unarchive
docker_proc = await asyncio.create_subprocess_exec(
"docker", "exec", "-i", container, "tar", "xzf", "-", "-C", f"/workspace/{task_id}", stdin=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
)
@@ -141,29 +243,11 @@ if (fs.existsSync(mainPath)) {
if docker_proc.returncode != 0:
raise RuntimeError(stderr.decode())
# exec
start_time = time.time()
try:
logger.info(f"Passed in args: {req.arguments}")
args_json = json.dumps(req.arguments or {})
run_args = [
"docker",
"exec",
"--workdir",
f"/workspace/{task_id}",
container,
"timeout",
str(TIMEOUT),
language,
]
# flags
if language == SupportLanguage.PYTHON:
run_args.extend(["-I", "-B"])
elif language == SupportLanguage.NODEJS:
run_args.extend([])
else:
assert False, "Will never reach here"
run_args.extend([runner_name, args_json])
arguments = req.arguments or {}
logger.info("Passed in args keys=%s size_bytes=%s", list(arguments.keys()), len(json.dumps(arguments, ensure_ascii=False).encode("utf-8")))
run_args = _build_container_run_args(language=language, task_id=task_id, container=container, runner_name=runner_name)
returncode, stdout, stderr = await async_run_command(
*run_args,
@@ -177,15 +261,18 @@ if (fs.existsSync(mainPath)) {
logger.info(f"{returncode=}")
logger.info(f"{stdout=}")
logger.info(f"{stderr=}")
logger.info(f"{args_json=}")
if returncode == 0:
clean_stdout, structured_result = _extract_result_envelope(stdout)
artifacts = await _collect_artifacts(container, task_id, workdir)
return CodeExecutionResult(
status=ResultStatus.SUCCESS,
stdout=str(stdout),
stdout=clean_stdout,
stderr=stderr,
exit_code=0,
time_used_ms=time_used_ms,
artifacts=artifacts,
result=structured_result,
)
elif returncode == 124:
return CodeExecutionResult(
@@ -223,12 +310,89 @@ if (fs.existsSync(mainPath)) {
return CodeExecutionResult(status=ResultStatus.PROGRAM_RUNNER_ERROR, stdout="", stderr=str(e), exit_code=-3, detail="internal_error")
finally:
# cleanup
cleanup_tasks = [async_run_command("docker", "exec", container, "rm", "-rf", f"/workspace/{task_id}"), async_run_command("rm", "-rf", workdir)]
await asyncio.gather(*cleanup_tasks, return_exceptions=True)
await release_container(container, language)
ALLOWED_ARTIFACT_EXTENSIONS = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".svg": "image/svg+xml",
    ".pdf": "application/pdf",
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}
MAX_ARTIFACT_COUNT = 10
MAX_ARTIFACT_SIZE = 10 * 1024 * 1024  # 10MB per file


async def _collect_artifacts(container: str, task_id: str, host_workdir: str) -> list[ArtifactItem]:
    artifacts_path = f"/workspace/{task_id}/artifacts"
    # List files in the artifacts directory inside the container
    returncode, stdout, _ = await async_run_command(
        "docker", "exec", container, "find", artifacts_path,
        "-maxdepth", "1", "-type", "f", timeout=5,
    )
    if returncode != 0 or not stdout.strip():
        return []
    raw_names = [line.split("/")[-1] for line in stdout.strip().splitlines() if line.strip()]
    # Sanitize: reject names containing path separators or "..", and hidden files (leading dot)
    filenames = [n for n in raw_names if n and "/" not in n and "\\" not in n and ".." not in n and not n.startswith(".")]
    if not filenames:
        return []
    items: list[ArtifactItem] = []
    for fname in filenames[:MAX_ARTIFACT_COUNT]:
        ext = os.path.splitext(fname)[1].lower()
        mime_type = ALLOWED_ARTIFACT_EXTENSIONS.get(ext)
        if not mime_type:
            logger.warning(f"Skipping artifact with disallowed extension: {fname}")
            continue
        file_path = f"{artifacts_path}/{fname}"
        # Check file size inside the container
        returncode, size_str, _ = await async_run_command(
            "docker", "exec", container, "stat", "-c", "%s", file_path, timeout=5,
        )
        if returncode != 0:
            logger.warning(f"Failed to stat artifact {fname}")
            continue
        file_size = int(size_str.strip())
        if file_size > MAX_ARTIFACT_SIZE:
            logger.warning(f"Artifact {fname} too large ({file_size} bytes), skipping")
            continue
        if file_size == 0:
            continue
        # Read file content via docker exec (docker cp doesn't work with gVisor tmpfs)
        returncode, content_b64, stderr = await async_run_command(
            "docker", "exec", container, "base64", file_path, timeout=30,
        )
        if returncode != 0:
            logger.warning(f"Failed to read artifact {fname}: {stderr}")
            continue
        content_b64 = content_b64.replace("\n", "").strip()
        items.append(ArtifactItem(
            name=fname,
            mime_type=mime_type,
            size=file_size,
            content_b64=content_b64,
        ))
        logger.info(f"Collected artifact: {fname} ({file_size} bytes, {mime_type})")
    return items
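
The name and extension checks in `_collect_artifacts` can be exercised without a container. The following is a minimal sketch, not code from this change: `select_artifact_names` is a hypothetical helper, and the allowlist is shortened for brevity. It mirrors the order used above, where the count cap is applied before the extension filter.

```python
import os

# Shortened copy of the allowlist; the helper name below is hypothetical.
ALLOWED_ARTIFACT_EXTENSIONS = {
    ".png": "image/png",
    ".csv": "text/csv",
    ".json": "application/json",
}
MAX_ARTIFACT_COUNT = 10


def select_artifact_names(find_output: str) -> list[tuple[str, str]]:
    """Return (filename, mime_type) pairs that pass the name and extension checks."""
    raw_names = [line.split("/")[-1] for line in find_output.strip().splitlines() if line.strip()]
    # Reject backslashes, ".." sequences, and hidden files.
    safe = [n for n in raw_names if n and "\\" not in n and ".." not in n and not n.startswith(".")]
    selected = []
    for name in safe[:MAX_ARTIFACT_COUNT]:
        mime = ALLOWED_ARTIFACT_EXTENSIONS.get(os.path.splitext(name)[1].lower())
        if mime:
            selected.append((name, mime))
    return selected


listing = (
    "/workspace/t1/artifacts/plot.PNG\n"
    "/workspace/t1/artifacts/.hidden.csv\n"
    "/workspace/t1/artifacts/run.sh\n"
)
print(select_artifact_names(listing))  # [('plot.PNG', 'image/png')]
```

Because the cap is applied first, ten disallowed files listed ahead of an allowed one would crowd it out; whether that is intended is not stated in the change.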
def analyze_error_result(stderr: str, exit_code: int) -> CodeExecutionResult:
"""Analyze the error result and classify it"""
if "Permission denied" in stderr:


@@ -14,6 +14,7 @@
# limitations under the License.
#
import ast
import re
from typing import List, Tuple
from core.logger import logger
@@ -151,6 +152,26 @@ class SecurePythonAnalyzer(ast.NodeVisitor):
self.generic_visit(node)
class SecureJavaScriptAnalyzer:
    DANGEROUS_PATTERNS = [
        (re.compile(r"""require\s*\(\s*['"]child_process['"]\s*\)"""), "Require: child_process"),
        (re.compile(r"""require\s*\(\s*['"]fs['"]\s*\)"""), "Require: fs"),
        (re.compile(r"""require\s*\(\s*['"]worker_threads['"]\s*\)"""), "Require: worker_threads"),
        (re.compile(r"""\beval\s*\("""), "Call: eval"),
        (re.compile(r"""\bFunction\s*\("""), "Call: Function"),
        (re.compile(r"""\bprocess\s*\.\s*binding\s*\("""), "Call: process.binding"),
    ]

    @classmethod
    def analyze(cls, code: str) -> List[Tuple[str, int]]:
        issues: List[Tuple[str, int]] = []
        for pattern, description in cls.DANGEROUS_PATTERNS:
            for match in pattern.finditer(code):
                lineno = code.count("\n", 0, match.start()) + 1
                issues.append((description, lineno))
        return issues
def analyze_code_security(code: str, language: SupportLanguage) -> Tuple[bool, List[Tuple[str, int]]]:
"""
Analyze the provided code string and return whether it's safe and why.
@@ -168,6 +189,9 @@ def analyze_code_security(code: str, language: SupportLanguage) -> Tuple[bool, L
except Exception as e:
logger.error(f"[SafeCheck] Python parsing failed: {str(e)}")
return False, [(f"Parsing Error: {str(e)}", -1)]
else:
logger.warning(f"[SafeCheck] Unsupported language for security analysis: {language} — defaulting to SAFE (manual review recommended)")
return True, [(f"Unsupported language for security analysis: {language} — defaulted to SAFE, manual review recommended", -1)]
if language == SupportLanguage.NODEJS:
issues = SecureJavaScriptAnalyzer.analyze(code)
return len(issues) == 0, issues
logger.warning(f"[SafeCheck] Unsupported language for security analysis: {language}")
return False, [(f"Unsupported language for security analysis: {language}", -1)]
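
To illustrate how the pattern scan reports line numbers, here is a small standalone sketch. It reuses two of the regular expressions from `SecureJavaScriptAnalyzer`; the `scan` function is a hypothetical stand-in for `analyze`, written only for this example.

```python
import re

# Two patterns copied from the analyzer; `scan` is a hypothetical stand-in for `analyze`.
PATTERNS = [
    (re.compile(r"""require\s*\(\s*['"]child_process['"]\s*\)"""), "Require: child_process"),
    (re.compile(r"""\beval\s*\("""), "Call: eval"),
]


def scan(code: str) -> list[tuple[str, int]]:
    issues = []
    for pattern, description in PATTERNS:
        for match in pattern.finditer(code):
            # Line number = newlines before the match, plus one.
            lineno = code.count("\n", 0, match.start()) + 1
            issues.append((description, lineno))
    return issues


sample = "const x = 1;\nconst cp = require('child_process');\neval('x + 1');\n"
print(scan(sample))  # [('Require: child_process', 2), ('Call: eval', 3)]

# The scan is textual, so an indirectly built module name is not matched.
indirect = "const name = 'child_' + 'process';\nrequire(name);\n"
print(scan(indirect))  # []
```

The second call shows the limit of a regex-based check: it flags literal spellings only, so it complements the container isolation and does not replace it.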


@@ -30,6 +30,8 @@ https://api.aliyun.com/api/AgentRun/2025-09-10/CreateSandbox?lang=PYTHON
import logging
import os
import time
import base64
import json
from typing import Dict, Any, List, Optional
from datetime import datetime, timezone
@@ -40,6 +42,7 @@ from agentrun.utils.exception import ServerError
from .base import SandboxProvider, SandboxInstance, ExecutionResult
logger = logging.getLogger(__name__)
RESULT_MARKER_PREFIX = "__RAGFLOW_RESULT__:"
class AliyunCodeInterpreterProvider(SandboxProvider):
@@ -51,9 +54,9 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
"""
def __init__(self):
self.access_key_id: Optional[str] = None
self.access_key_secret: Optional[str] = None
self.account_id: Optional[str] = None
self.access_key_id: Optional[str] = ""
self.access_key_secret: Optional[str] = ""
self.account_id: Optional[str] = ""
self.region: str = "cn-hangzhou"
self.template_name: str = ""
self.timeout: int = 30
@@ -68,7 +71,7 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
config: Configuration dictionary with keys:
- access_key_id: Aliyun AccessKey ID
- access_key_secret: Aliyun AccessKey Secret
- account_id: Aliyun primary account ID
- region: Region (default: "cn-hangzhou")
- template_name: Optional sandbox template name
- timeout: Request timeout in seconds (default: 30, max 30)
@@ -97,7 +100,7 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
return False
if not self.account_id:
logger.error("Aliyun Code Interpreter: Missing account_id (primary account ID)")
return False
# Create SDK configuration
@@ -146,8 +149,6 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
try:
# Get or create template
from agentrun.sandbox import Sandbox
if self.template_name:
# Use existing template
template_name = self.template_name
@@ -226,48 +227,17 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
# Connect to existing sandbox instance
sandbox = Sandbox.connect(sandbox_id=instance_id, config=self._config)
# Convert language string to CodeLanguage enum
code_language = CodeLanguage.PYTHON if normalized_lang == "python" else CodeLanguage.JAVASCRIPT
# agentrun-sdk 0.0.26 only exposes CodeLanguage.PYTHON; keep JS as string fallback.
code_language = CodeLanguage.PYTHON if normalized_lang == "python" else "javascript"
# Wrap code to call main() function
# Matches self_managed provider behavior: call main(**arguments)
if normalized_lang == "python":
# Build arguments string for main() call
if arguments:
import json as json_module
args_json = json_module.dumps(arguments)
wrapped_code = f'''{code}
if __name__ == "__main__":
import json
result = main(**{args_json})
print(json.dumps(result) if isinstance(result, dict) else result)
'''
else:
wrapped_code = f'''{code}
if __name__ == "__main__":
import json
result = main()
print(json.dumps(result) if isinstance(result, dict) else result)
'''
else: # javascript
if arguments:
import json as json_module
args_json = json_module.dumps(arguments)
wrapped_code = f'''{code}
// Call main and output result
const result = main({args_json});
console.log(typeof result === 'object' ? JSON.stringify(result) : String(result));
'''
else:
wrapped_code = f'''{code}
// Call main and output result
const result = main();
console.log(typeof result === 'object' ? JSON.stringify(result) : String(result));
'''
args_json = json.dumps(arguments or {})
wrapped_code = (
self._build_python_wrapper(code, args_json)
if normalized_lang == "python"
else self._build_javascript_wrapper(code, args_json)
)
logger.debug(f"Aliyun Code Interpreter: Wrapped code (first 200 chars): {wrapped_code[:200]}")
start_time = time.time()
@@ -314,6 +284,7 @@ console.log(typeof result === 'object' ? JSON.stringify(result) : String(result)
stdout = "\n".join(stdout_parts)
stderr = "\n".join(stderr_parts)
stdout, structured_result = self._extract_structured_result(stdout)
logger.info(f"Aliyun Code Interpreter: stdout length={len(stdout)}, stderr length={len(stderr)}, exit_code={exit_code}")
if stdout:
@@ -331,6 +302,9 @@ console.log(typeof result === 'object' ? JSON.stringify(result) : String(result)
"language": normalized_lang,
"context_id": result.get("contextId") if isinstance(result, dict) else None,
"timeout": timeout,
"result_present": structured_result.get("present", False),
"result_value": structured_result.get("value"),
"result_type": structured_result.get("type"),
},
)
@@ -390,6 +364,71 @@ console.log(typeof result === 'object' ? JSON.stringify(result) : String(result)
# If we get any response (even an error), the service is reachable
return "connection" not in str(e).lower()
    @staticmethod
    def _build_python_wrapper(code: str, args_json: str) -> str:
        marker = RESULT_MARKER_PREFIX
        # Embed the arguments as a JSON string literal and parse it at runtime:
        # raw JSON is not valid Python source when it contains true, false, or null.
        return f'''{code}

if __name__ == "__main__":
    import base64
    import json
    result = main(**json.loads({args_json!r}))
    payload = json.dumps({{"present": True, "value": result, "type": "json"}}, ensure_ascii=False, separators=(",", ":"))
    print("{marker}" + base64.b64encode(payload.encode("utf-8")).decode("ascii"))
'''

    @staticmethod
    def _build_javascript_wrapper(code: str, args_json: str) -> str:
        marker = RESULT_MARKER_PREFIX
        return f'''{code}

const __ragflowArgs = {args_json};
(async () => {{
    try {{
        const output = await Promise.resolve(main(__ragflowArgs));
        if (typeof output === 'undefined') {{
            throw new Error('main() must return a value. Use null for an empty result.');
        }}
        const payload = JSON.stringify({{ present: true, value: output, type: 'json' }});
        if (typeof payload === 'undefined') {{
            throw new Error('main() returned a non-JSON-serializable value.');
        }}
        console.log('{marker}' + Buffer.from(payload, 'utf8').toString('base64'));
    }} catch (err) {{
        console.error(err instanceof Error ? err.stack || err.message : String(err));
    }}
}})();
'''
    @staticmethod
    def _extract_structured_result(stdout: str) -> tuple[str, Dict[str, Any]]:
        if not stdout:
            return "", {}

        cleaned_lines: list[str] = []
        structured_result: Dict[str, Any] = {}
        for line in str(stdout).splitlines():
            if line.startswith(RESULT_MARKER_PREFIX):
                payload_b64 = line[len(RESULT_MARKER_PREFIX) :].strip()
                if not payload_b64:
                    continue
                try:
                    payload = base64.b64decode(payload_b64).decode("utf-8")
                    structured_result = json.loads(payload)
                except Exception as exc:
                    logger.warning(f"Aliyun Code Interpreter: failed to decode structured result marker: {exc}")
                    cleaned_lines.append(line)
                continue
            cleaned_lines.append(line)

        cleaned_stdout = "\n".join(cleaned_lines)
        if stdout.endswith("\n") and cleaned_stdout and not cleaned_stdout.endswith("\n"):
            cleaned_stdout += "\n"
        return cleaned_stdout, structured_result
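
The marker protocol is easiest to follow as a round trip. The sketch below is a simplified illustration, not the provider code: `encode_result` and `split_stdout` are hypothetical names, error handling is omitted, and only the marker value is taken from this change.

```python
import base64
import json

RESULT_MARKER_PREFIX = "__RAGFLOW_RESULT__:"  # value taken from this change


def encode_result(value) -> str:
    """Build the marker line a wrapper would print (hypothetical helper name)."""
    payload = json.dumps({"present": True, "value": value, "type": "json"}, ensure_ascii=False, separators=(",", ":"))
    return RESULT_MARKER_PREFIX + base64.b64encode(payload.encode("utf-8")).decode("ascii")


def split_stdout(stdout: str) -> tuple[str, dict]:
    """Separate ordinary output from the structured result (simplified, no error handling)."""
    kept, result = [], {}
    for line in stdout.splitlines():
        if line.startswith(RESULT_MARKER_PREFIX):
            encoded = line[len(RESULT_MARKER_PREFIX):]
            result = json.loads(base64.b64decode(encoded).decode("utf-8"))
        else:
            kept.append(line)
    return "\n".join(kept), result


stdout = "debug: loading\n" + encode_result({"rows": 3}) + "\n"
clean, result = split_stdout(stdout)
print(clean)   # debug: loading
print(result)  # {'present': True, 'value': {'rows': 3}, 'type': 'json'}
```

Base64 keeps the payload on a single line even if the value contains newlines, so a line-based scan of stdout can find it reliably.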
def get_supported_languages(self) -> List[str]:
"""
Get list of supported programming languages.
@@ -429,7 +468,7 @@ console.log(typeof result === 'object' ? JSON.stringify(result) : String(result)
"required": True,
"label": "Account ID",
"placeholder": "1234567890...",
"description": "Aliyun primary account ID, required for API calls",
},
"region": {
"type": "string",


@@ -70,7 +70,7 @@ class SelfManagedProvider(SandboxProvider):
# Try to fall back to SANDBOX_HOST from settings if we are using localhost
if "localhost" in self.endpoint or "127.0.0.1" in self.endpoint:
try:
from api import settings
from common import settings
if settings.SANDBOX_HOST and settings.SANDBOX_HOST not in self.endpoint:
original_endpoint = self.endpoint
self.endpoint = f"http://{settings.SANDBOX_HOST}:9385"
@@ -187,6 +187,7 @@ class SelfManagedProvider(SandboxProvider):
)
result = response.json()
structured_result = result.get("result") or {}
return ExecutionResult(
stdout=result.get("stdout", ""),
@@ -199,6 +200,10 @@ class SelfManagedProvider(SandboxProvider):
"memory_used_kb": result.get("memory_used_kb"),
"detail": result.get("detail"),
"instance_id": instance_id,
"artifacts": result.get("artifacts", []),
"result_present": structured_result.get("present", False),
"result_value": structured_result.get("value"),
"result_type": structured_result.get("type"),
}
)
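
A caller can use the new `result_present` flag to tell "no structured result" apart from "the result is `None`". The following is a minimal sketch under that assumption; `unpack_result` and its fallback to raw stdout are hypothetical and not part of this change.

```python
from typing import Any


def unpack_result(metadata: dict[str, Any], stdout: str) -> Any:
    """Prefer the structured result; fall back to raw stdout when none was emitted."""
    if metadata.get("result_present", False):
        return metadata.get("result_value")
    return stdout


# main() returned None: the flag is set, so None is a real result.
explicit_none = unpack_result({"result_present": True, "result_value": None, "result_type": "json"}, "noise\n")
# No marker was found: fall back to whatever was printed.
fallback = unpack_result({}, "42\n")
print(repr(explicit_none), repr(fallback))  # None '42\n'
```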


@@ -8,7 +8,7 @@ dependencies = [
"fastapi>=0.115.12",
"httpx>=0.28.1",
"pydantic>=2.11.4",
"requests>=2.32.3",
"requests>=2.32.4",
"slowapi>=0.1.9",
"uvicorn>=0.34.2",
]


@@ -19,13 +19,13 @@
"license": "MIT"
},
"node_modules/axios": {
"version": "1.12.0",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.12.0.tgz",
"integrity": "sha512-oXTDccv8PcfjZmPGlWsPSwtOJCZ/b6W5jAMCNcfwJbCzDckwG0jrYJFaWH1yvivfCXjVzV/SPDEhMB3Q+DSurg==",
"version": "1.13.6",
"resolved": "https://registry.npmjs.org/axios/-/axios-1.13.6.tgz",
"integrity": "sha512-ChTCHMouEe2kn713WHbQGcuYrr6fXTBiu460OTwWrWob16g1bXn4vtz07Ope7ewMozJAnEquLk5lWQWtBig9DQ==",
"license": "MIT",
"dependencies": {
"follow-redirects": "^1.15.6",
"form-data": "^4.0.4",
"follow-redirects": "^1.15.11",
"form-data": "^4.0.5",
"proxy-from-env": "^1.1.0"
}
},
@@ -123,9 +123,9 @@
}
},
"node_modules/follow-redirects": {
"version": "1.15.9",
"resolved": "https://registry.npmmirror.com/follow-redirects/-/follow-redirects-1.15.9.tgz",
"integrity": "sha512-gew4GsXizNgdoRyqmyfMHyAmXsZDk6mHkSxZFCzW9gwlbtOW44CDtYavM+y+72qD/Vq2l550kMF52DT8fOLJqQ==",
"version": "1.15.11",
"resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.11.tgz",
"integrity": "sha512-deG2P0JfjrTxl50XGCDyfI97ZGVCxIpfKYmfyrQ54n5FO/0gfIES8C/Psl6kWVDolizcaaxZJnTS0QSMxvnsBQ==",
"funding": [
{
"type": "individual",
@@ -143,9 +143,9 @@
}
},
"node_modules/form-data": {
"version": "4.0.4",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.4.tgz",
"integrity": "sha512-KrGhL9Q4zjj0kiUt5OO4Mr/A/jlI2jDYs5eHBpYHPcBEVSiipAvn2Ko2HnPe20rmcuuvMHNdZFp+4IlGTMF0Ow==",
"version": "4.0.5",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
"integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
"license": "MIT",
"dependencies": {
"asynckit": "^0.4.0",


@@ -2,12 +2,17 @@ FROM python:3.11-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:0.7.5 /uv /uvx /bin/
ENV UV_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
ENV MPLBACKEND=Agg
ENV MPLCONFIGDIR=/tmp/matplotlib
ENV MATPLOTLIBRC=/usr/local/etc/matplotlibrc
COPY requirements.txt .
COPY matplotlibrc /usr/local/etc/matplotlibrc
RUN grep -rl 'deb.debian.org' /etc/apt/ | xargs sed -i 's|http[s]*://deb.debian.org|https://mirrors.tuna.tsinghua.edu.cn|g' && \
apt-get update && \
apt-get install -y curl gcc && \
mkdir -p /tmp/matplotlib && \
uv pip install --system -r requirements.txt
WORKDIR /workspace


@@ -0,0 +1,11 @@
## RAGFlow sandbox matplotlib defaults
## Only overrides are listed; all other settings use matplotlib built-in defaults.
# Prefer CJK-capable fonts so Chinese / Japanese / Korean text renders correctly.
# matplotlib silently skips fonts that are not installed, falling back to the
# next entry in the list, so this is safe even without any CJK font package.
font.family: sans-serif
font.sans-serif: Noto Sans CJK SC, Noto Sans CJK TC, Noto Sans CJK JP, Noto Sans CJK KR, Source Han Sans SC, Source Han Sans CN, WenQuanYi Zen Hei, Microsoft YaHei, SimHei, PingFang SC, Heiti SC, STHeiti, Arial Unicode MS, DejaVu Sans, Bitstream Vera Sans, Computer Modern Sans Serif, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
# Use ASCII hyphen-minus for the minus sign so it renders correctly with any font.
axes.unicode_minus: False


@@ -1,3 +1,4 @@
numpy
pandas
matplotlib
requests


@@ -654,7 +654,7 @@ class AliyunCodeInterpreterProvider(SandboxProvider):
"type": "string",
"required": True,
"label": "Account ID",
"description": "Aliyun primary account ID, required for API calls"
},
"region": {
"type": "string",
@@ -1739,8 +1739,9 @@ def execute_code(
1. **Self-managed provider** ([self_managed.py:164](agent/sandbox/providers/self_managed.py:164)):
- Passes arguments via HTTP API: `"arguments": arguments or {}`
- executor_manager receives and passes to code via command line
- Runner script: `args = json.loads(sys.argv[1])` then `result = main(**args)`
- executor_manager writes `args.json` into the per-task workspace
- Runner script loads arguments from `args.json`
- Python runner calls `main(**args)` and JavaScript runner calls `main(args)`
2. **Aliyun Code Interpreter** ([aliyun_codeinterpreter.py:260-275](agent/sandbox/providers/aliyun_codeinterpreter.py:260-275)):
- Wraps user code to call `main(**arguments)` or `main()` if no arguments
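
The file-based argument handoff described for the self-managed provider can be sketched as follows. This is an illustration only: it uses a temporary directory in place of the per-task workspace, and the `main` function is a stand-in for user code.

```python
import json
import tempfile
from pathlib import Path


def main(name: str, count: int) -> dict:
    # Stand-in for user code.
    return {"name": name, "doubled": count * 2}


with tempfile.TemporaryDirectory() as workdir:
    args_path = Path(workdir) / "args.json"
    # Manager side: serialize the request arguments into the task workspace.
    args_path.write_text(json.dumps({"name": "ragflow", "count": 2}, ensure_ascii=False), encoding="utf-8")
    # Runner side: load the file and expand it into keyword arguments.
    args = json.loads(args_path.read_text(encoding="utf-8"))
    result = main(**args)

print(result)  # {'name': 'ragflow', 'doubled': 4}
```

The JavaScript runner follows the same flow but passes the parsed object as a single parameter, `main(args)`.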


@@ -1,53 +1,53 @@
# Aliyun Code Interpreter Provider - Using the Official SDK

## Important Changes

### Official Resources

- **Code Interpreter API**: https://help.aliyun.com/zh/functioncompute/fc/sandbox-sandbox-code-interepreter
- **Official SDK**: https://github.com/Serverless-Devs/agentrun-sdk-python
- **SDK Documentation**: https://docs.agent.run

## Advantages of Using the Official SDK

Migrating from manual HTTP requests to the official SDK (`agentrun-sdk`) offers the following benefits:

### 1. **Automatic Signature Authentication**

- The SDK automatically handles Aliyun API signing (no need to manually implement `Authorization` headers)
- Supports multiple authentication methods: AccessKey, STS Token
- Automatically reads environment variables

### 2. **Simplified API**

```python
# Old implementation (manual HTTP requests)
response = requests.post(
    f"{DATA_ENDPOINT}/sandboxes/{sandbox_id}/execute",
    headers={"X-Acs-Parent-Id": account_id},
    json={"code": code, "language": "python"}
)

# New implementation (using SDK)
sandbox = CodeInterpreterSandbox(template_name="python-sandbox", config=config)
result = sandbox.context.execute(code="print('hello')")
```

### 3. **Better Error Handling**

- Structured exception types (`ServerError`)
- Automatic retry mechanism
- Detailed error messages

## Key Changes

### 1. File Renames

| Old Filename | New Filename | Description |
|---------|---------|------|
| `aliyun_opensandbox.py` | `aliyun_codeinterpreter.py` | Provider implementation |
| `test_aliyun_provider.py` | `test_aliyun_codeinterpreter.py` | Unit tests |
| `test_aliyun_integration.py` | `test_aliyun_codeinterpreter_integration.py` | Integration tests |

### 2. Configuration Field Changes

#### Old Configuration (OpenSandbox)

```json
{
"access_key_id": "LTAI5t...",
@@ -57,59 +57,59 @@ result = sandbox.context.execute(code="print('hello')")
}
```

#### New Configuration (Code Interpreter)

```json
{
"access_key_id": "LTAI5t...",
"access_key_secret": "...",
"account_id": "1234567890...", // New: Aliyun primary account ID (required)
"region": "cn-hangzhou",
"template_name": "python-sandbox", // New: sandbox template name
"timeout": 30 // Max 30 seconds (hard limit)
}
```

### 3. Key Differences

| Feature | OpenSandbox | Code Interpreter |
|------|-------------|-----------------|
| **API Endpoint** | `opensandbox.{region}.aliyuncs.com` | `agentrun.{region}.aliyuncs.com` (control plane) |
| **API Version** | `2024-01-01` | `2025-09-10` |
| **Authentication** | AccessKey required | AccessKey + primary account ID required |
| **Request Headers** | Standard signature | Requires `X-Acs-Parent-Id` header |
| **Timeout Limit** | Configurable | **Max 30 seconds** (hard limit) |
| **Context** | Not supported | Supports context (Jupyter kernel) |

### 4. API Call Changes

#### Old Implementation (assumed OpenSandbox)

```python
# Single endpoint
API_ENDPOINT = "https://opensandbox.cn-hangzhou.aliyuncs.com"

# Simple request/response
response = requests.post(
f"{API_ENDPOINT}/execute",
json={"code": "print('hello')", "language": "python"}
)
```

#### New Implementation (Code Interpreter)

```python
# Control plane API - manage sandbox lifecycle
CONTROL_ENDPOINT = "https://agentrun.cn-hangzhou.aliyuncs.com/2025-09-10"

# Data plane API - execute code
DATA_ENDPOINT = "https://{account_id}.agentrun-data.cn-hangzhou.aliyuncs.com"

# Create sandbox (control plane)
response = requests.post(
f"{CONTROL_ENDPOINT}/sandboxes",
headers={"X-Acs-Parent-Id": account_id},
json={"templateName": "python-sandbox"}
)

# Execute code (data plane)
response = requests.post(
f"{DATA_ENDPOINT}/sandboxes/{sandbox_id}/execute",
headers={"X-Acs-Parent-Id": account_id},
@@ -117,13 +117,13 @@ response = requests.post(
)
```

### 5. Migration Steps

#### Step 1: Update Configuration

If you were previously using `aliyun_opensandbox`:

**Old configuration**:
```json
{
"name": "sandbox.provider_type",
@@ -131,7 +131,7 @@ response = requests.post(
}
```

**New configuration**:
```json
{
"name": "sandbox.provider_type",
@@ -139,123 +139,123 @@ response = requests.post(
}
```

#### Step 2: Add the Required account_id

Get your primary account ID from the Aliyun console:

1. Log in to the [Aliyun Console](https://ram.console.aliyun.com/manage/ak)
2. Click on your avatar in the top-right corner
3. Copy the primary account ID (16-digit number)

#### Step 3: Update Environment Variables

```bash
# New required environment variable
export ALIYUN_ACCOUNT_ID="1234567890123456"

# Other environment variables remain unchanged
export ALIYUN_ACCESS_KEY_ID="LTAI5t..."
export ALIYUN_ACCESS_KEY_SECRET="..."
export ALIYUN_REGION="cn-hangzhou"
```
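
These variables can be checked before the provider is configured. The sketch below is illustrative only: `load_aliyun_config` is a hypothetical helper, and the assumption that `ALIYUN_REGION` falls back to `cn-hangzhou` is based on the provider's documented default rather than on any loader in this change.

```python
# Hypothetical helper: validates the variables exported above and builds a config dict.
REQUIRED_VARS = ("ALIYUN_ACCESS_KEY_ID", "ALIYUN_ACCESS_KEY_SECRET", "ALIYUN_ACCOUNT_ID")


def load_aliyun_config(env: dict[str, str]) -> dict[str, object]:
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "access_key_id": env["ALIYUN_ACCESS_KEY_ID"],
        "access_key_secret": env["ALIYUN_ACCESS_KEY_SECRET"],
        "account_id": env["ALIYUN_ACCOUNT_ID"],
        "region": env.get("ALIYUN_REGION", "cn-hangzhou"),  # assumed default
        "timeout": 30,  # documented hard limit
    }


example_env = {
    "ALIYUN_ACCESS_KEY_ID": "LTAI5t...",
    "ALIYUN_ACCESS_KEY_SECRET": "...",
    "ALIYUN_ACCOUNT_ID": "1234567890123456",
}
config = load_aliyun_config(example_env)
print(config["account_id"], config["region"])  # 1234567890123456 cn-hangzhou
```

In practice the function would be called with `dict(os.environ)`; a plain dictionary is used here so the example stays deterministic.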

#### Step 4: Run Tests

```bash
# Unit tests (no real credentials required)
pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (real credentials required)
pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v -m integration
```

## File Change Checklist

### ✅ Completed

- [x] Created `aliyun_codeinterpreter.py` - new provider implementation
- [x] Updated `sandbox_spec.md` - specification document
- [x] Updated `admin/services.py` - service manager
- [x] Updated `providers/__init__.py` - package exports
- [x] Created `test_aliyun_codeinterpreter.py` - unit tests
- [x] Created `test_aliyun_codeinterpreter_integration.py` - integration tests

### 📝 Optional Cleanup

If you want to remove the old OpenSandbox implementation:

```bash
# Remove old files (optional)
rm agent/sandbox/providers/aliyun_opensandbox.py
rm agent/sandbox/tests/test_aliyun_provider.py
rm agent/sandbox/tests/test_aliyun_integration.py
```
**注意**: 保留旧文件不会影响新功能,只是代码冗余。
**Note**: Keeping the old files does not affect the new functionality; it just results in redundant code.
## API 参考
## API Reference
### 控制面 API沙箱管理
### Control Plane API (Sandbox Management)
| 端点 | 方法 | 说明 |
| Endpoint | Method | Description |
|------|------|------|
| `/sandboxes` | POST | 创建沙箱实例 |
| `/sandboxes/{id}/stop` | POST | 停止实例 |
| `/sandboxes/{id}` | DELETE | 删除实例 |
| `/templates` | GET | 列出模板 |
| `/sandboxes` | POST | Create a sandbox instance |
| `/sandboxes/{id}/stop` | POST | Stop an instance |
| `/sandboxes/{id}` | DELETE | Delete an instance |
| `/templates` | GET | List templates |
### Data Plane API (Code Execution)
| Endpoint | Method | Description |
|------|------|------|
| `/sandboxes/{id}/execute` | POST | Execute code (simplified) |
| `/sandboxes/{id}/contexts` | POST | Create a context |
| `/sandboxes/{id}/contexts/{ctx_id}/execute` | POST | Execute within a context |
| `/sandboxes/{id}/health` | GET | Health check |
| `/sandboxes/{id}/files` | GET/POST | Read and write files |
| `/sandboxes/{id}/processes/cmd` | POST | Execute a shell command |
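
The endpoint tables above can be captured as a small lookup. The following is a minimal sketch, assuming a hypothetical `build_path` helper and route names invented here for illustration; only the HTTP methods and path templates come from the tables, and the base URL and authentication are deliberately left out.

```python
# Hypothetical route table: the keys are illustrative names, while the
# methods and path templates are copied from the tables above.
ROUTES = {
    "create_sandbox": ("POST", "/sandboxes"),
    "stop_sandbox": ("POST", "/sandboxes/{id}/stop"),
    "delete_sandbox": ("DELETE", "/sandboxes/{id}"),
    "list_templates": ("GET", "/templates"),
    "execute": ("POST", "/sandboxes/{id}/execute"),
    "create_context": ("POST", "/sandboxes/{id}/contexts"),
    "execute_in_context": ("POST", "/sandboxes/{id}/contexts/{ctx_id}/execute"),
    "health": ("GET", "/sandboxes/{id}/health"),
    "run_command": ("POST", "/sandboxes/{id}/processes/cmd"),
}


def build_path(route: str, **params: str) -> tuple:
    """Return (method, path) for a named route, filling in path parameters."""
    method, template = ROUTES[route]
    # str.format raises KeyError if a required path parameter is missing.
    return method, template.format(**params)
```

For example, `build_path("health", id="sb-1")` yields the method and path for a health check on one sandbox; a missing `id` fails early with a `KeyError` instead of producing a malformed URL.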
## FAQ
### Q: Why is account_id required?
**A**: The Code Interpreter API requires the `X-Acs-Parent-Id` request header (the Aliyun primary account ID) for authentication. It is a mandatory parameter of the Aliyun Code Interpreter API.
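
As a rough illustration of the answer above, the account ID travels as a request header. This is a minimal sketch with a hypothetical `build_headers` helper; the header name comes from the text, while everything else (including any signing or credential headers a real request also needs) is an assumption and is not shown.

```python
def build_headers(account_id: str) -> dict:
    """Attach the Aliyun primary account ID as the X-Acs-Parent-Id header.

    Hypothetical helper: real requests carry further authentication
    headers that are outside the scope of this sketch.
    """
    if not account_id:
        raise ValueError("account_id is required by the Code Interpreter API")
    return {"X-Acs-Parent-Id": account_id}
```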
### Q: Can the 30-second timeout limit be bypassed?
**A**: No. This is a **hard limit** of Aliyun Code Interpreter and cannot be bypassed through configuration or request parameters. If your code takes longer than 30 seconds to run, consider:
1. Optimizing the code logic
2. Processing data in batches
3. Using contexts to maintain state
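
Batch processing (option 2 above) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `batched` helper and a batch size chosen by the caller so that each execution stays well under the 30-second limit; how each batch is submitted to the sandbox is not shown.

```python
from typing import Iterator, Sequence


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each slice would then be sent as its own execution request, with a context (option 3) carrying any state that must survive between batches.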
### Q: Can the old OpenSandbox configuration still be used?
**A**: No. OpenSandbox and Code Interpreter are two different services with incompatible APIs. You must migrate to the new configuration format.
### Q: How do I get the Aliyun primary account ID?
**A**:
1. Log in to the Aliyun console
2. Click your avatar in the top-right corner
3. The primary account ID is shown in the popup
### Q: Will the migration affect existing functionality?
**A**:
- **Self-managed provider (self_managed)**: Not affected
- **E2B provider**: Not affected
- **Aliyun provider**: Configuration update and re-testing required
## Related Documentation
- [Official Documentation](https://help.aliyun.com/zh/functioncompute/fc/sandbox-sandbox-code-interepreter)
- [Sandbox Specification](../docs/develop/sandbox_spec.md)
- [Testing Guide](./README.md)
- [Quick Start](./QUICKSTART.md)
## Support
If you run into problems:
1. Review the official documentation
2. Verify that the configuration is correct
3. Check the error messages in the test output
4. Contact the RAGFlow team

@@ -1,45 +1,45 @@
# Aliyun OpenSandbox Provider - Quick Test Guide
## Test Overview
### 1. Unit Tests (No Credentials Required)
Unit tests use mocks and do **not** require real Aliyun credentials, so they can be run at any time.
```bash
# Run unit tests for the Aliyun provider
pytest agent/sandbox/tests/test_aliyun_provider.py -v
# Expected output:
# test_aliyun_provider.py::TestAliyunOpenSandboxProvider::test_provider_initialization PASSED
# test_aliyun_provider.py::TestAliyunOpenSandboxProvider::test_initialize_success PASSED
# ...
# ========================= 48 passed in 2.34s ==========================
```
### 2. Integration Tests (Real Credentials Required)
Integration tests call the real Aliyun API and require credentials to be configured.
#### Step 1: Configure Environment Variables
```bash
export ALIYUN_ACCESS_KEY_ID="LTAI5t..." # Replace with your real Access Key ID
export ALIYUN_ACCESS_KEY_SECRET="..." # Replace with your real Access Key Secret
export ALIYUN_REGION="cn-hangzhou" # Optional, defaults to cn-hangzhou
```
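
On the Python side, these variables could be read as in the sketch below. It is a minimal illustration, assuming a hypothetical `load_aliyun_settings` helper and dictionary keys invented here; only the variable names and the `cn-hangzhou` default come from the step above, and the actual test fixtures may read them differently.

```python
import os
from typing import Mapping, Optional


def load_aliyun_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read Aliyun credentials from the environment, defaulting the region."""
    env = os.environ if env is None else env
    required = ("ALIYUN_ACCESS_KEY_ID", "ALIYUN_ACCESS_KEY_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "access_key_id": env["ALIYUN_ACCESS_KEY_ID"],
        "access_key_secret": env["ALIYUN_ACCESS_KEY_SECRET"],
        # ALIYUN_REGION is optional and falls back to cn-hangzhou.
        "region": env.get("ALIYUN_REGION", "cn-hangzhou"),
    }
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to exercise in unit tests without touching real credentials.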
#### Step 2: Run Integration Tests
```bash
# Run all integration tests
pytest agent/sandbox/tests/test_aliyun_integration.py -v -m integration
# Run a specific test
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_health_check -v
```
#### Step 3: Expected Output
```
test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_initialize_provider PASSED
@@ -49,130 +49,130 @@ test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_execute_pytho
========================== 10 passed in 15.67s ==========================
```
### 3. Test Scenarios
#### Basic Functionality Tests
```bash
# Health check
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_health_check -v
# Create instance
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_create_python_instance -v
# Execute code
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_execute_python_code -v
# Destroy instance
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_destroy_instance -v
```
#### Error Handling Tests
```bash
# Code execution error
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_execute_python_code_with_error -v
# Timeout handling
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunOpenSandboxIntegration::test_execute_python_code_timeout -v
```
#### Real-World Scenario Tests
```bash
# Data processing workflow
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunRealWorldScenarios::test_data_processing_workflow -v
# String manipulation
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunRealWorldScenarios::test_string_manipulation -v
# Multiple executions
pytest agent/sandbox/tests/test_aliyun_integration.py::TestAliyunRealWorldScenarios::test_multiple_executions_same_instance -v
```
## FAQ
### Q: What if I don't have credentials?
**A:** Run the unit tests only; they need no real credentials:
```bash
pytest agent/sandbox/tests/test_aliyun_provider.py -v
```
### Q: How do I skip integration tests?
**A:** Use pytest markers to deselect them:
```bash
# Run only unit tests, skip integration tests
pytest agent/sandbox/tests/ -v -m "not integration"
```
### Q: What should I do if integration tests fail?
**A:** Check the following:
1. **Are the credentials correct?**
```bash
echo $ALIYUN_ACCESS_KEY_ID
echo $ALIYUN_ACCESS_KEY_SECRET
```
2. **Is the network connection working?**
```bash
curl -I https://opensandbox.cn-hangzhou.aliyuncs.com
```
3. **Do you have OpenSandbox service permissions?**
- Log in to the Aliyun console
- Check whether the OpenSandbox service is enabled
- Verify the AccessKey permissions
4. **View detailed error messages:**
```bash
pytest agent/sandbox/tests/test_aliyun_integration.py -v -s
```
### Q: What should I do if tests time out?
**A:** Increase the timeout or check network connectivity:
```bash
# Use a longer timeout
pytest agent/sandbox/tests/test_aliyun_integration.py -v --timeout=60
```
## Quick Reference: Test Commands
| Command | Description | Credentials Required |
|------|------|---------|
| `pytest agent/sandbox/tests/test_aliyun_provider.py -v` | Unit tests | ❌ |
| `pytest agent/sandbox/tests/test_aliyun_integration.py -v` | Integration tests | ✅ |
| `pytest agent/sandbox/tests/ -v -m "not integration"` | Unit tests only | ❌ |
| `pytest agent/sandbox/tests/ -v -m integration` | Integration tests only | ✅ |
| `pytest agent/sandbox/tests/ -v` | All tests | Partially required |
## Getting Aliyun Credentials
1. Visit the [Aliyun Console](https://ram.console.aliyun.com/manage/ak)
2. Create an AccessKey
3. Save your AccessKey ID and AccessKey Secret
4. Set the environment variables
⚠️ **Security Tips:**
- Do not hardcode credentials in your code
- Use environment variables or configuration files
- Rotate AccessKeys regularly
- Restrict AccessKey permissions
## Next Steps
1. **Run unit tests** - Verify code logic
2. 🔧 **Configure credentials** - Set environment variables
3. 🚀 **Run integration tests** - Test the real API
4. 📊 **Review results** - Ensure all tests pass
5. 🎯 **Integrate into your system** - Configure the provider via the admin API
## Need Help?
- See the [full documentation](README.md)
- Check the [sandbox specification](../../../../../docs/develop/sandbox_spec.md)
- Contact the RAGFlow team

@@ -101,13 +101,15 @@ class TestAliyunCodeInterpreterProvider:
assert provider.region == "cn-hangzhou"
assert provider.template_name == ""
@patch("agent.sandbox.providers.aliyun_codeinterpreter.CodeInterpreterSandbox")
def test_create_instance_python(self, mock_sandbox_class):
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Template")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
def test_create_instance_python(self, mock_sandbox_class, mock_template):
"""Test creating a Python instance."""
# Mock successful instance creation
mock_sandbox = MagicMock()
mock_sandbox.sandbox_id = "01JCED8Z9Y6XQVK8M2NRST5WXY"
mock_sandbox_class.return_value = mock_sandbox
mock_sandbox_class.create.return_value = mock_sandbox
mock_template.get_by_name.return_value = MagicMock()
provider = AliyunCodeInterpreterProvider()
provider._initialized = True
@@ -119,12 +121,14 @@ class TestAliyunCodeInterpreterProvider:
assert instance.status == "READY"
assert instance.metadata["language"] == "python"
@patch("agent.sandbox.providers.aliyun_codeinterpreter.CodeInterpreterSandbox")
def test_create_instance_javascript(self, mock_sandbox_class):
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Template")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
def test_create_instance_javascript(self, mock_sandbox_class, mock_template):
"""Test creating a JavaScript instance."""
mock_sandbox = MagicMock()
mock_sandbox.sandbox_id = "01JCED8Z9Y6XQVK8M2NRST5WXY"
mock_sandbox_class.return_value = mock_sandbox
mock_sandbox_class.create.return_value = mock_sandbox
mock_template.get_by_name.return_value = MagicMock()
provider = AliyunCodeInterpreterProvider()
provider._initialized = True
@@ -141,7 +145,7 @@ class TestAliyunCodeInterpreterProvider:
with pytest.raises(RuntimeError, match="Provider not initialized"):
provider.create_instance("python")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.CodeInterpreterSandbox")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
def test_execute_code_success(self, mock_sandbox_class):
"""Test successful code execution."""
# Mock sandbox instance
@@ -150,7 +154,7 @@ class TestAliyunCodeInterpreterProvider:
"results": [{"type": "stdout", "text": "Hello, World!"}, {"type": "result", "text": "None"}, {"type": "endOfExecution", "status": "ok"}],
"contextId": "kernel-12345-67890",
}
mock_sandbox_class.return_value = mock_sandbox
mock_sandbox_class.connect.return_value = mock_sandbox
provider = AliyunCodeInterpreterProvider()
provider._initialized = True
@@ -163,14 +167,14 @@ class TestAliyunCodeInterpreterProvider:
assert result.exit_code == 0
assert result.execution_time > 0
@patch("agent.sandbox.providers.aliyun_codeinterpreter.CodeInterpreterSandbox")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
def test_execute_code_timeout(self, mock_sandbox_class):
"""Test code execution timeout."""
from agentrun.utils.exception import ServerError
mock_sandbox = MagicMock()
mock_sandbox.context.execute.side_effect = ServerError(408, "Request timeout")
mock_sandbox_class.return_value = mock_sandbox
mock_sandbox_class.connect.return_value = mock_sandbox
provider = AliyunCodeInterpreterProvider()
provider._initialized = True
@@ -179,14 +183,14 @@ class TestAliyunCodeInterpreterProvider:
with pytest.raises(TimeoutError, match="Execution timed out"):
provider.execute_code(instance_id="01JCED8Z9Y6XQVK8M2NRST5WXY", code="while True: pass", language="python", timeout=5)
@patch("agent.sandbox.providers.aliyun_codeinterpreter.CodeInterpreterSandbox")
@patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
def test_execute_code_with_error(self, mock_sandbox_class):
"""Test code execution with error."""
mock_sandbox = MagicMock()
mock_sandbox.context.execute.return_value = {
"results": [{"type": "stderr", "text": "Traceback..."}, {"type": "error", "text": "NameError: name 'x' is not defined"}, {"type": "endOfExecution", "status": "error"}]
}
mock_sandbox_class.return_value = mock_sandbox
mock_sandbox_class.connect.return_value = mock_sandbox
provider = AliyunCodeInterpreterProvider()
provider._initialized = True
@@ -197,6 +201,34 @@ class TestAliyunCodeInterpreterProvider:
        assert result.exit_code != 0
        assert len(result.stderr) > 0

    @patch("agent.sandbox.providers.aliyun_codeinterpreter.Sandbox")
    def test_execute_code_uses_structured_result_marker_for_async_javascript(self, mock_sandbox_class):
        """Test JavaScript wrapper uses the structured result marker and awaits async main."""
        mock_sandbox = MagicMock()
        mock_sandbox.context.execute.return_value = {
            "results": [{"type": "stdout", "text": "__RAGFLOW_RESULT__:eyJwcmVzZW50Ijp0cnVlLCJ2YWx1ZSI6eyJhIjoiYiJ9LCJ0eXBlIjoianNvbiJ9"}],
            "contextId": "kernel-12345-67890",
        }
        mock_sandbox_class.connect.return_value = mock_sandbox

        provider = AliyunCodeInterpreterProvider()
        provider._initialized = True
        provider._config = MagicMock()

        result = provider.execute_code(
            instance_id="01JCED8Z9Y6XQVK8M2NRST5WXY",
            code="async function main(args) { return { a: 'b' }; }",
            language="javascript",
            timeout=10,
        )

        wrapped_code = mock_sandbox.context.execute.call_args.kwargs["code"]
        assert "__RAGFLOW_RESULT__:" in wrapped_code
        assert "await Promise.resolve(main(" in wrapped_code
        assert result.metadata["result_present"] is True
        assert result.metadata["result_value"] == {"a": "b"}
        assert result.metadata["result_type"] == "json"

    def test_get_supported_languages(self):
        """Test getting supported languages."""
        provider = AliyunCodeInterpreterProvider()
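
The mocked stdout line in the new test carries a marker followed by what decodes as base64-encoded JSON. The sketch below shows one way such a line could be parsed; `extract_structured_result` is a hypothetical helper written for illustration, and the provider's real parsing logic is not visible in this diff and may differ.

```python
import base64
import json

# Marker string taken from the test fixture above.
RESULT_MARKER = "__RAGFLOW_RESULT__:"


def extract_structured_result(stdout: str):
    """Return the decoded result envelope from stdout, or None if absent.

    Assumes (as the fixture suggests) that the payload after the marker
    is base64-encoded JSON on a single line.
    """
    for line in stdout.splitlines():
        if line.startswith(RESULT_MARKER):
            payload = line[len(RESULT_MARKER):]
            return json.loads(base64.b64decode(payload))
    return None
```

Applied to the fixture string, this yields an envelope with `present`, `value` and `type` keys, which lines up with the `result_present`, `result_value` and `result_type` metadata asserted in the test.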

@@ -22,7 +22,7 @@ To run these tests, set the following environment variables:
export AGENTRUN_ACCESS_KEY_ID="LTAI5t..."
export AGENTRUN_ACCESS_KEY_SECRET="..."
export AGENTRUN_ACCOUNT_ID="1234567890..." # Aliyun primary account ID
export AGENTRUN_REGION="cn-hangzhou" # Note: AGENTRUN_REGION (SDK will read this)
Then run:

@@ -254,6 +254,41 @@ class TestSelfManagedProvider:
        assert result.metadata["status"] == "success"
        assert result.metadata["instance_id"] == "test-123"

    @patch('requests.post')
    def test_execute_code_maps_structured_result_into_metadata(self, mock_post):
        """Test successful code execution with structured result envelope."""
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "status": "success",
            "stdout": "debug line\n",
            "stderr": "",
            "exit_code": 0,
            "time_used_ms": 100.0,
            "memory_used_kb": 1024.0,
            "result": {
                "present": True,
                "value": {"items": ["a", "b"]},
                "type": "json",
            },
        }
        mock_post.return_value = mock_response

        provider = SelfManagedProvider()
        provider._initialized = True

        result = provider.execute_code(
            instance_id="test-123",
            code="def main(): return {'items': ['a', 'b']}",
            language="python",
            timeout=10
        )

        assert result.stdout == "debug line\n"
        assert result.metadata["result_present"] is True
        assert result.metadata["result_value"] == {"items": ["a", "b"]}
        assert result.metadata["result_type"] == "json"

    @patch('requests.post')
    def test_execute_code_timeout(self, mock_post):
        """Test code execution timeout."""
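
The new test expects the `result` envelope in the response body to surface as three metadata keys. A minimal sketch of that mapping follows; `result_metadata` is a hypothetical helper, and the defaults chosen for a missing envelope are assumptions, since the provider code itself is not part of this hunk.

```python
def result_metadata(response: dict) -> dict:
    """Flatten the optional `result` envelope into result_* metadata keys."""
    envelope = response.get("result") or {}
    return {
        # Assumed default: a response without an envelope has no result.
        "result_present": bool(envelope.get("present", False)),
        "result_value": envelope.get("value"),
        "result_type": envelope.get("type"),
    }
```

Keeping the structured value in metadata leaves `stdout` free for ordinary debug output, which is why the test can assert on both independently.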

@@ -0,0 +1,55 @@
#
# Copyright 2026 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import sys
from pathlib import Path

EXECUTOR_MANAGER_ROOT = Path(__file__).resolve().parents[1] / "executor_manager"
if str(EXECUTOR_MANAGER_ROOT) not in sys.path:
    sys.path.insert(0, str(EXECUTOR_MANAGER_ROOT))

from models.enums import SupportLanguage  # noqa: E402
from services.security import analyze_code_security  # noqa: E402


def test_javascript_child_process_is_rejected():
    is_safe, issues = analyze_code_security(
        "const cp = require('child_process'); async function main() { return 'ok'; }",
        SupportLanguage.NODEJS,
    )
    assert is_safe is False
    assert any("child_process" in issue for issue, _ in issues)


def test_javascript_eval_is_rejected():
    is_safe, issues = analyze_code_security(
        "async function main() { return eval('1+1'); }",
        SupportLanguage.NODEJS,
    )
    assert is_safe is False
    assert any("eval" in issue.lower() for issue, _ in issues)


def test_javascript_safe_code_still_passes():
    is_safe, issues = analyze_code_security(
        "async function main(args) { return { answer: args.value ?? null }; }",
        SupportLanguage.NODEJS,
    )
    assert is_safe is True
    assert issues == []
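
These tests pin down the contract of `analyze_code_security`: it returns an `(is_safe, issues)` pair in which each issue is a two-element tuple whose first element is a message. The sketch below is a deliberately naive deny-list checker that satisfies the same three cases. It is not the project's implementation; the function name, the patterns, and the use of a line number as the second tuple element are all assumptions made for illustration.

```python
import re

# Hypothetical deny-list; the real analyzer may use different rules.
_JS_DENY_PATTERNS = [
    (re.compile(r"require\(\s*['\"]child_process['\"]\s*\)"), "child_process import is not allowed"),
    (re.compile(r"\beval\s*\("), "eval() is not allowed"),
]


def naive_js_security_check(code: str):
    """Return (is_safe, issues), where issues is a list of (message, line_number)."""
    issues = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for pattern, message in _JS_DENY_PATTERNS:
            if pattern.search(line):
                issues.append((message, lineno))
    return (not issues, issues)
```

Pattern matching of this kind is easy to evade (for example through string concatenation), so it should be read as a first filter in front of sandbox isolation, not as a substitute for it.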

agent/sandbox/uv.lock generated

@@ -1,6 +1,6 @@
version = 1
revision = 3
requires-python = ">=3.10"
requires-python = ">=3.12, <3.15"
[[package]]
name = "annotated-doc"
@@ -161,7 +161,7 @@ requires-dist = [
{ name = "fastapi", specifier = ">=0.115.12" },
{ name = "httpx", specifier = ">=0.28.1" },
{ name = "pydantic", specifier = ">=2.11.4" },
{ name = "requests", specifier = ">=2.32.3" },
{ name = "requests", specifier = ">=2.32.4" },
{ name = "slowapi", specifier = ">=0.1.9" },
{ name = "uvicorn", specifier = ">=0.34.2" },
]
@@ -313,7 +313,7 @@ wheels = [
[[package]]
name = "requests"
version = "2.32.3"
version = "2.32.5"
source = { registry = "https://pypi.tuna.tsinghua.edu.cn/simple" }
dependencies = [
{ name = "certifi" },
@@ -321,9 +321,9 @@ dependencies = [
{ name = "idna" },
{ name = "urllib3" },
]
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/63/70/2bf7780ad2d390a8d301ad0b550f1581eadbd9a20f896afe06353c2a2913/requests-2.32.3.tar.gz", hash = "sha256:55365417734eb18255590a9ff9eb97e9e1da868d4ccd6402399eaf68af20a760", size = 131218, upload-time = "2024-05-29T15:37:49.536Z" }
sdist = { url = "https://pypi.tuna.tsinghua.edu.cn/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
wheels = [
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl", hash = "sha256:70761cfe03c773ceb22aa2f671b4757976145175cdfca038c02654d061d6dcc6", size = 64928, upload-time = "2024-05-29T15:37:47.027Z" },
{ url = "https://pypi.tuna.tsinghua.edu.cn/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
]
[[package]]

File diff suppressed because one or more lines are too long


@@ -94,7 +94,7 @@
"type": "integer"
},
"item": {
"type": "unkown"
"type": "unknown"
}
}
}
@@ -252,7 +252,7 @@
"type": "integer"
},
"item": {
"type": "unkown"
"type": "unknown"
}
}
},

File diff suppressed because one or more lines are too long


@@ -1,333 +0,0 @@
{
"id": 21,
"title": {
"en": "Report Agent Using Knowledge Base",
"de": "Berichtsagent mit Wissensdatenbank",
"zh": "知识库检索智能体"},
"description": {
"en": "A report generation assistant using local knowledge base, with advanced capabilities in task planning, reasoning, and reflective analysis. Recommended for academic research paper Q&A",
"de": "Ein Berichtsgenerierungsassistent, der eine lokale Wissensdatenbank nutzt, mit erweiterten Fähigkeiten in Aufgabenplanung, Schlussfolgerung und reflektierender Analyse. Empfohlen für akademische Forschungspapier-Fragen und -Antworten.",
"zh": "一个使用本地知识库的报告生成助手,具备高级能力,包括任务规划、推理和反思性分析。推荐用于学术研究论文问答。"},
"canvas_type": "Recommended",
"dsl": {
"components": {
"Agent:NewPumasLick": {
"downstream": [
"Message:OrangeYearsShine"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "qwen3-235b-a22b-instruct-2507@Tongyi-Qianwen",
"maxTokensEnabled": true,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 128000,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "# User Query\n {sys.query}",
"role": "user"
}
],
"sys_prompt": "## Role & Task\nYou are a **\u201cKnowledge Base Retrieval Q\\&A Agent\u201d** whose goal is to break down the user\u2019s question into retrievable subtasks, and then produce a multi-source-verified, structured, and actionable research report using the internal knowledge base.\n## Execution Framework (Detailed Steps & Key Points)\n1. **Assessment & Decomposition**\n * Actions:\n * Automatically extract: main topic, subtopics, entities (people/organizations/products/technologies), time window, geographic/business scope.\n * Output as a list: N facts/data points that must be collected (*N* ranges from 5\u201320 depending on question complexity).\n2. **Query Type Determination (Rule-Based)**\n * Example rules:\n * If the question involves a single issue but requests \u201cmethod comparison/multiple explanations\u201d \u2192 use **depth-first**.\n * If the question can naturally be split into \u22653 independent sub-questions \u2192 use **breadth-first**.\n * If the question can be answered by a single fact/specification/definition \u2192 use **simple query**.\n3. **Research Plan Formulation**\n * Depth-first: define 3\u20135 perspectives (methodology/stakeholders/time dimension/technical route, etc.), assign search keywords, target document types, and output format for each perspective.\n * Breadth-first: list subtasks, prioritize them, and assign search terms.\n * Simple query: directly provide the search sentence and required fields.\n4. **Retrieval Execution**\n * After retrieval: perform coverage check (does it contain the key facts?) and quality check (source diversity, authority, latest update time).\n * If standards are not met, automatically loop: rewrite queries (synonyms/cross-domain terms) and retry \u22643 times, or flag as requiring external search.\n5. **Integration & Reasoning**\n * Build the answer using a **fact\u2013evidence\u2013reasoning** chain. 
For each conclusion, attach 1\u20132 strongest pieces of evidence.\n---\n## Quality Gate Checklist (Verify at Each Stage)\n* **Stage 1 (Decomposition)**:\n * [ ] Key concepts and expected outputs identified\n * [ ] Required facts/data points listed\n* **Stage 2 (Retrieval)**:\n * [ ] Meets quality standards (see above)\n * [ ] If not met: execute query iteration\n* **Stage 3 (Generation)**:\n * [ ] Each conclusion has at least one direct evidence source\n * [ ] State assumptions/uncertainties\n * [ ] Provide next-step suggestions or experiment/retrieval plans\n * [ ] Final length and depth match user expectations (comply with word count/format if specified)\n---\n## Core Principles\n1. **Strict reliance on the knowledge base**: answers must be **fully bounded** by the content retrieved from the knowledge base.\n2. **No fabrication**: do not generate, infer, or create information that is not explicitly present in the knowledge base.\n3. **Accuracy first**: prefer incompleteness over inaccurate content.\n4. **Output format**:\n * Hierarchically clear modular structure\n * Logical grouping according to the MECE principle\n * Professionally presented formatting\n * Step-by-step cognitive guidance\n * Reasonable use of headings and dividers for clarity\n * *Italicize* key parameters\n * **Bold** critical information\n5. **LaTeX formula requirements**:\n * Inline formulas: start and end with `$`\n * Block formulas: start and end with `$$`, each `$$` on its own line\n * Block formula content must comply with LaTeX math syntax\n * Verify formula correctness\n---\n## Additional Notes (Interaction & Failure Strategy)\n* If the knowledge base does not cover critical facts: explicitly inform the user (with sample wording)\n* For time-sensitive issues: enforce time filtering in the search request, and indicate the latest retrieval date in the answer.\n* Language requirement: answer in the user\u2019s preferred language\n",
"temperature": "0.1",
"temperatureEnabled": true,
"tools": [
{
"component_name": "Retrieval",
"name": "Retrieval",
"params": {
"cross_languages": [],
"description": "",
"empty_response": "",
"kb_ids": [],
"keywords_similarity_weight": 0.7,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
}
},
"rerank_id": "",
"similarity_threshold": 0.2,
"top_k": 1024,
"top_n": 8,
"use_kg": false
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"begin"
]
},
"Message:OrangeYearsShine": {
"downstream": [],
"obj": {
"component_name": "Message",
"params": {
"content": [
"{Agent:NewPumasLick@content}"
]
}
},
"upstream": [
"Agent:NewPumasLick"
]
},
"begin": {
"downstream": [
"Agent:NewPumasLick"
],
"obj": {
"component_name": "Begin",
"params": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "\u4f60\u597d\uff01 \u6211\u662f\u4f60\u7684\u52a9\u7406\uff0c\u6709\u4ec0\u4e48\u53ef\u4ee5\u5e2e\u5230\u4f60\u7684\u5417\uff1f"
}
},
"upstream": []
}
},
"globals": {
"sys.conversation_turns": 0,
"sys.files": [],
"sys.query": "",
"sys.user_id": ""
},
"graph": {
"edges": [
{
"data": {
"isHovered": false
},
"id": "xy-edge__beginstart-Agent:NewPumasLickend",
"source": "begin",
"sourceHandle": "start",
"target": "Agent:NewPumasLick",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:NewPumasLickstart-Message:OrangeYearsShineend",
"markerEnd": "logo",
"source": "Agent:NewPumasLick",
"sourceHandle": "start",
"style": {
"stroke": "rgba(91, 93, 106, 1)",
"strokeWidth": 1
},
"target": "Message:OrangeYearsShine",
"targetHandle": "end",
"type": "buttonEdge",
"zIndex": 1001
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:NewPumasLicktool-Tool:AllBirdsNailend",
"selected": false,
"source": "Agent:NewPumasLick",
"sourceHandle": "tool",
"target": "Tool:AllBirdsNail",
"targetHandle": "end"
}
],
"nodes": [
{
"data": {
"form": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "\u4f60\u597d\uff01 \u6211\u662f\u4f60\u7684\u52a9\u7406\uff0c\u6709\u4ec0\u4e48\u53ef\u4ee5\u5e2e\u5230\u4f60\u7684\u5417\uff1f"
},
"label": "Begin",
"name": "begin"
},
"dragging": false,
"id": "begin",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": -9.569875358221438,
"y": 205.84018385864917
},
"selected": false,
"sourcePosition": "left",
"targetPosition": "right",
"type": "beginNode"
},
{
"data": {
"form": {
"content": [
"{Agent:NewPumasLick@content}"
]
},
"label": "Message",
"name": "Response"
},
"dragging": false,
"id": "Message:OrangeYearsShine",
"measured": {
"height": 56,
"width": 200
},
"position": {
"x": 734.4061285881053,
"y": 199.9706031723009
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "messageNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "qwen3-235b-a22b-instruct-2507@Tongyi-Qianwen",
"maxTokensEnabled": true,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 128000,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "# User Query\n {sys.query}",
"role": "user"
}
],
"sys_prompt": "## Role & Task\nYou are a **\u201cKnowledge Base Retrieval Q\\&A Agent\u201d** whose goal is to break down the user\u2019s question into retrievable subtasks, and then produce a multi-source-verified, structured, and actionable research report using the internal knowledge base.\n## Execution Framework (Detailed Steps & Key Points)\n1. **Assessment & Decomposition**\n * Actions:\n * Automatically extract: main topic, subtopics, entities (people/organizations/products/technologies), time window, geographic/business scope.\n * Output as a list: N facts/data points that must be collected (*N* ranges from 5\u201320 depending on question complexity).\n2. **Query Type Determination (Rule-Based)**\n * Example rules:\n * If the question involves a single issue but requests \u201cmethod comparison/multiple explanations\u201d \u2192 use **depth-first**.\n * If the question can naturally be split into \u22653 independent sub-questions \u2192 use **breadth-first**.\n * If the question can be answered by a single fact/specification/definition \u2192 use **simple query**.\n3. **Research Plan Formulation**\n * Depth-first: define 3\u20135 perspectives (methodology/stakeholders/time dimension/technical route, etc.), assign search keywords, target document types, and output format for each perspective.\n * Breadth-first: list subtasks, prioritize them, and assign search terms.\n * Simple query: directly provide the search sentence and required fields.\n4. **Retrieval Execution**\n * After retrieval: perform coverage check (does it contain the key facts?) and quality check (source diversity, authority, latest update time).\n * If standards are not met, automatically loop: rewrite queries (synonyms/cross-domain terms) and retry \u22643 times, or flag as requiring external search.\n5. **Integration & Reasoning**\n * Build the answer using a **fact\u2013evidence\u2013reasoning** chain. 
For each conclusion, attach 1\u20132 strongest pieces of evidence.\n---\n## Quality Gate Checklist (Verify at Each Stage)\n* **Stage 1 (Decomposition)**:\n * [ ] Key concepts and expected outputs identified\n * [ ] Required facts/data points listed\n* **Stage 2 (Retrieval)**:\n * [ ] Meets quality standards (see above)\n * [ ] If not met: execute query iteration\n* **Stage 3 (Generation)**:\n * [ ] Each conclusion has at least one direct evidence source\n * [ ] State assumptions/uncertainties\n * [ ] Provide next-step suggestions or experiment/retrieval plans\n * [ ] Final length and depth match user expectations (comply with word count/format if specified)\n---\n## Core Principles\n1. **Strict reliance on the knowledge base**: answers must be **fully bounded** by the content retrieved from the knowledge base.\n2. **No fabrication**: do not generate, infer, or create information that is not explicitly present in the knowledge base.\n3. **Accuracy first**: prefer incompleteness over inaccurate content.\n4. **Output format**:\n * Hierarchically clear modular structure\n * Logical grouping according to the MECE principle\n * Professionally presented formatting\n * Step-by-step cognitive guidance\n * Reasonable use of headings and dividers for clarity\n * *Italicize* key parameters\n * **Bold** critical information\n5. **LaTeX formula requirements**:\n * Inline formulas: start and end with `$`\n * Block formulas: start and end with `$$`, each `$$` on its own line\n * Block formula content must comply with LaTeX math syntax\n * Verify formula correctness\n---\n## Additional Notes (Interaction & Failure Strategy)\n* If the knowledge base does not cover critical facts: explicitly inform the user (with sample wording)\n* For time-sensitive issues: enforce time filtering in the search request, and indicate the latest retrieval date in the answer.\n* Language requirement: answer in the user\u2019s preferred language\n",
"temperature": "0.1",
"temperatureEnabled": true,
"tools": [
{
"component_name": "Retrieval",
"name": "Retrieval",
"params": {
"cross_languages": [],
"description": "",
"empty_response": "",
"kb_ids": [],
"keywords_similarity_weight": 0.7,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
}
},
"rerank_id": "",
"similarity_threshold": 0.2,
"top_k": 1024,
"top_n": 8,
"use_kg": false
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Knowledge Base Agent"
},
"dragging": false,
"id": "Agent:NewPumasLick",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 347.00048227952215,
"y": 186.49109364794631
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_10"
},
"dragging": false,
"id": "Tool:AllBirdsNail",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": 220.24819746977118,
"y": 403.31576836482583
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
}
]
},
"history": [],
"memory": [],
"messages": [],
"path": [],
"retrieval": []
},
"avatar": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAH0klEQVR4nO2ZC1BU1wGG/3uRp/IygG+DGK0GOjE1U6cxI4tT03Y0E+kENbaJbKpj60wzgNMwnTjuEtu0miGasY+0krI202kMVEnVxtoOLG00oVa0LajVBDcSEI0REFBgkZv/3GWXfdzdvctuHs7kmzmec9//d+45914XCXc4Xwjk1+59VJGGF7C5QAFSWBvgyWmWLl7IKiny6QNL173B5YjB84bOyrpKA4B1DLySdQpLKAiZGtZ7a/KMVoQJz6UfEZyhTWwaEBmssiLvCueu6BJg8EwFqGTTAC+uvNWC9w82sRWcux/JwaSHstjywcogRt4RG0KExwWG4QsVYCebKSwe3L5lR9OOWjyzfg2WL/0a1/jncO3b2FHxGnKeWYqo+Giu8UEMrWJKWBACPMY/DG+63txhvnKshUu+DF2/hayMDFRsL+VScDb++AVc6OjAuInxXPJl2tfnIikrzUyJMi7qQmLRhOEr2fOFbX/7P6STF7BqoWevfdij4NWGQfx+57OYO2sG1wSnsek8Nm15EU8sikF6ouelXz9ph7JwDqYt+5IIZaGEkauDIrH4wPBmhjexCSEws+VdVG1M4NIoj+2xYzBuJtavWcEl/VS8dggx/ZdQvcGzQwp+cxOXsu5RBQQMVkYJM4LA/Txh+ELFMWFVPARS5kFiabZdx8Olh7l17BzdvhzZmROhdJ3j6D/nIyBgOCMlLAgA9xmF4TMV4BSbrgnrLiBl5rOsRCRRbDUsBzQFiJjY91PCBj9w+yiP1lXWsTLAjc9YQGB9I8+Yx1oTiUWFvW9QgDo2PdASaDp/EQ8/sRnhcPTVcuTMncXwQQVESL9DidscaPW+QEtAICRu9PSxFTpJiePV8AI9AsTvXZBY/Pa+wJ9ApNApIILm8S5Y4QXXQwhYFH6csemDP4G3G5v579i5d04mknknQhDYS4HCrCVr/mC3D305KnbCEpvVIia5Onw6WaWw+KAl0Np+FUXbdiMcyoqfUoeRHoFrJ1uRtnBG1/9Mf/3LtElp+VwF2wcd7woJib1vUPwMH4GWQCQJJtBa/V9cPmFD8uQUpMdNGDhY8bNYrobh8acHu270/l0ImJWRt64Wn6WACN9z5gq2lXwPW8pfweT0icP/fH23vO9QLYq3/QKyLBmFQI3CUcT9NdESEEPItKsSN3r7MBaSJoxHWZERM6ZmMLy2gDP8/pd/og418dTL37hFSUpMUC5f+UiWZcnY9s5+ixCwUiCXx2iiJdDNx6f4pgkH8Q3lbxK7h8+enoHha1cRNdMp8axiHxo6+/5bVdk8DSROYIW1X7QEIom3wHD3gEf4vu1bVYEJZeWQ0zJQvmcfyiv2QZak6raG/QWfK4Ez9mTc5v8xPMJfuojoxXmIX/9DOMe+FCWbcHu4BJJ0YEwCx0824bFNW9HesB+CqYu+jepfPYcHF+aoPXS8sQl/+vU2bgmOU2C+qRc9/YrrPPbGBtzavd0nvCxLxui4pJrBm911PFwak4CYA80cj+JCAiGUzYkmxrSY4N2c3GLi6UEIFL/wRxxqkhmHnTEpDQcrfq6ea+hcE8bNy3GFzyq4H22HW1Kd4WMSkg1jmsSRpKj0Rzhy4gNUv/y8Gjrv8SJK3OWScA+fMn/ysVPPvTmeh6nh1TcxBUJ+jEaKYr7N36x7h+Edj0pB6+WrLokn87+BrTt/p4ZPzZ6MM7/8R2//h33vOcNzdwgBMwVMbGvySQmo4a0NqOZccU7YmGXLEfPQUlUid/XT6B8YdIU/99vjsPcOdEhDsfOd4QVCwKB8yp8SWuG1njbTl83DpMWz1PCKAswuWPDI0e8WebyAJBbxNdrF7cls+hBpAb3h3XtehL/3+4u7D35rQwpP4YFTwMJ91rHpQyQFQgmf9sAMNL9Ur4afv/FBjIuPVj+n4YVTwMD
96tj0IVICoYYXv/q1VJ1Sl8UveQyaRwErvOB6B5SwKhqP00gI6A0vhsycJ7/KIzxhyHqGN0ADbnNAAYOicRfCFdAb/p50Gbfuc/wy5w1D5lOghk0fuG0USlgVr7sQjoDe8C8WxKGKPy2KjzlvAQb02/sCbh+FApngX1QUtyeSuwDi0hxFByV7L+LIf3r5kvpp4PBr07Hqvn71Y85bgOG6WS2ggA1+4D6eUKKQApVsqngI6KSkqh9HzsoM/3zg8Oz5VQ9E8wjf30YFDGdkeAsCwH18oYRZGXk7C4HuYxcwe6rjQsFovzaEvoFxqNkTOPzMjGikJso8wsF77XYkLx6dAwxWxvBmBIH7aUMJi8J3w0DnTVz7dyvX6KPzVBt+kL8cmzesRq9ps2Z48bRJmOIapS7E4zM2lXNt5CcU6ID7+ocSZkqY2NRN6ysnsHbJEpR8ZwV6t5Yg+iuLELf2KVd48VwXQf3BQGUMb4ZOuH9gKFEIYJfiNrEDcXZHHV4q3YRv5i7ikgM94RlETNgihrcgBHhccCiRCf7VhBK5rAPyr9I/Y/WKPEyfksH/9NjQ2dODhsYzwcLXsypkeBtCRGLRDUUMAMyKHxEx4dtrzyP97nQMygripiQiKi4aSbPvQmKW7+OXF69ntYvBa1iPCYklZEZECsGm4ja0Ops7EJsaj4SprlU+8IJiqIjAFga3Ikx4vvAYkTGALxyWFArlsnbBC9Sz6mI5zWKNRGh3JJY7mjte4GOz+r4tkRbxQQAAAABJRU5ErkJggg=="
}
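
The prompts in these templates reference runtime values through placeholders such as `{sys.query}` and `{Agent:ClearRabbitsScream@content}`. The following is a minimal sketch, not part of the change set, of how such references could be listed and checked before a template is shipped. The regular expression and the helper names `find_placeholders` and `unresolved_placeholders` are assumptions made for illustration; the application's own resolver may accept a different syntax.

```python
import re

# Hypothetical pattern: matches {sys.query} and {Agent:SomeId@content}.
# Inferred from the placeholders visible in the templates, not from the app's parser.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*(?::\w+)?(?:@\w+)?)\}")


def find_placeholders(text):
    """Return every placeholder body found in a prompt string, in order."""
    return PLACEHOLDER.findall(text)


def unresolved_placeholders(text, component_ids, global_names):
    """Return placeholders that point at an unknown component or global."""
    missing = []
    for ref in find_placeholders(text):
        if "@" in ref:
            # Form: <component id>@<output name>
            component_id, _, _output = ref.partition("@")
            if component_id not in component_ids:
                missing.append(ref)
        elif ref not in global_names:
            # Form: <global name>, for example sys.query
            missing.append(ref)
    return missing
```

Calling `unresolved_placeholders(prompt, set(dsl["components"]), set(dsl["globals"]))` on each `prompts[].content` string would flag a reference to a component id that was renamed or removed.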

View File

@@ -1,14 +1,15 @@
{
"id": 12,
"title": {
"en": "Generate SEO Blog",
"de": "SEO Blog generieren",
"zh": "生成SEO博客"},
"en": "SEO article writer",
"de": "SEO-Blog-Magnetiseur",
"zh": "SEO 博客写手"},
"description": {
"en": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don't need any writing experience. Just provide a topic or short request — the system will handle the rest.",
"de": "Dieser Workflow generiert automatisch einen vollständigen SEO-optimierten Blogartikel basierend auf einer einfachen Benutzereingabe. Sie benötigen keine Schreiberfahrung. Geben Sie einfach ein Thema oder eine kurze Anfrage ein das System übernimmt den Rest.",
"zh": "此工作流根据简单的用户输入自动生成完整的SEO博客文章。你无需任何写作经验只需提供一个主题或简短请求系统将处理其余部分。"},
"en": "This SEO article writer automatically generates a complete SEO-optimized blog article based on a simple user input. You don't need any writing experience. Just provide a topic or short request — the system will handle the rest.",
"de": "SEO-Blog-Magnetiseur automatisch einen vollständigen SEO-optimierten Blogartikel basierend auf einer einfachen Benutzereingabe. Sie benötigen keine Schreiberfahrung. Geben Sie einfach ein Thema oder eine kurze Anfrage ein das System übernimmt den Rest.",
"zh": "此 SEO 博客写手根据简单的用户输入自动生成完整的SEO博客文章。你无需任何写作经验只需提供一个主题或简短请求系统将处理其余部分。"},
"canvas_type": "Marketing",
"canvas_types": ["Marketing", "Recommended"],
"dsl": {
"components": {
"Agent:BetterSitesSend": {
@@ -918,4 +919,4 @@
"retrieval": []
},
"avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAwADADASIAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAABgkKBwUI/8QAMBAAAAYCAQIEBQQCAwAAAAAAAQIDBAUGBxEhCAkAEjFBFFFhcaETFiKRFyOx8PH/xAAaAQACAwEBAAAAAAAAAAAAAAACAwABBgQF/8QALBEAAgIBAgUCBAcAAAAAAAAAAQIDBBEFEgATITFRIkEGIzJhFBUWgaGx8P/aAAwDAQACEQMRAD8AfF2hez9089t7pvxgQMa1Gb6qZ6oQE9m/NEvCIStyPfJSOF/M1epzMugo/qtMqbiRc1mJjoJKCLMNIxKcsLJedfO1Ct9cI63x9fx6CA/19t+oh4LFA5HfuAgP/A8eOIsnsTBrkBHXA7+v53+Q+ficTgJft9gIgA+/P9/1r342O/YA8A8k3/if+IbAN7+2/f8AAiI6H19PGoPyESTMZQPKUAHkQEN+3r9dh78/YPGUTk2wb/qAZZIugH1OHH5DjkdfbnWw2DsOxPj+xjrnx2H39unBopJGBn9s+PHv1HXjPJtH+J+B40O9a16h/wB/92j/ALrPa/wR104UyAobHlXhuo2HrEtK4qy3CwjKOuJLRHJLSkXWrFKs/gVrJVrE8TUiH8bPrP20UEu8m4hNpMJJuTOfnbUw/kUqyZgMHGjAO9+mtDsQ53sdcB6eMhnpEjhNQxRKICAgHy5+/roOdjr7c+J6O4x07dx484/n7nzw1gexBGfIPkZ/3t39uGpqc6+fP5/Ht8vGFZCzJjWpWuBxvO2yPjrtclUUK7BqmUI4fuASeyhG5FzFI0Bw4aQ0iZNoDgzvRW4qtyFkI4XmwyEk2YNnDp0sVBu3IUyy5iqH8gqKERSIRNIii67hddRJs1at01Xbx2sgzZoLu10UFJR+4V1A5cxF3FqNcLvjwcno43uuLrOxZYjujaClcb4QQfxEizpFiQyM9olcueRnjC2ZMt9iY06zL0qytrMSqSOVGsfHMaGhZ3l4lSRI2MqE74zJvRTveNFWWIh3RWw+XCAM5icKQLrCH57T17FhErSlRXnWvyZXKQwWJ3eraD14p5YuZCFgacskK2oGkVuKO5GYTHzf7DaD12cBD3DgPOIDrWw9Pn
rXPgDkpVsUDGMG+DD6E9gHXIjrYjwUPQTCXYgHPhIV974+F6E1hpC14Yzmzj56YaQEeZhXsayD1zLPW7pygxaMf81Nzu1iJsnIuDIKnaJAkPldqrHaoORZ73tMVEbFdSXT9nVgRQgnBq6j8e/HCIEATpAnH5KlmRVkFRFJwks/bqImSXJ5VFyA3N6Ikh3bCW3YHp5cowOmCfTgA+xJCnrjtwHKcLvJj2ZGcTRFj19kEhckdzgEjKnABGSSzdc1Fe5byXXGNjKdvRcw5NxvLidNZFFCxUa62KrzMaChw8hhYScFJtROAgmuLByq1MsgkZYPaVVuDe0wraRaqAdJwgRQo+YR8xTlAQNx6b49w41vXiJpCalLh1jZhyrTqRM4+jstdRmYryNkydLQRWg1LNGcWd5jIFFvCythlIySa0mNu74sKRQtaWsTmupqPItw0lE52ufpyYzrSkx6cw5bLmBEpkTsz+dt8P5QFuCRtAIkBH9MuwKHICIaDQhnojMs9mKaeGcrMxXlQtAYkdVljimRrE5MqI4zL8oSqQ6wxjodBqK05qdK3Vo3aCSVkBW7bjuC1NFJJBPaqyx6fp6pWkliYLXK2XrukkRu2CCVoSWMgsdMyySKwoLFcIGWSTUMg4IBgTcICoBhRcplMcpFkhIqQp1ClMBTmA0Zfe1zpjvHfXff65bZlzXpB3jjGTgiirmPjAfs16PHqHeQ75Wbj3xxZpOEkV3LRJJSPdomUBZISJLncV2k+8D07dxXp7xsYuTapA9UkJUYWIzNhadnWEZeCXGLQQiJi1ViHfhHL2unWh+mlORsrW0JFpEFnGVfm1mU4kq0FY3eD6corJncv6dr5NLSMNXVaTUksjTiMnaq8uFfSVuDyiJ1iZpy0LOJtpa3YfkcQ5fdozyxI2m5qqcrHN61YYmHsh6v3o9ParYmYJEtlhIx6+gUbjgD23M6oqg92YL0JyF6Bps+qDValVA9h9Lj5SZI3SHXdEQlj1wiQtLLIe6pGzjO3BlBkK1hxpblLVH5wdW0BcFKf/JwRtjsot2z8omaSdxbzzk1iEjsE0AM9rrRZNRIrVyo7dGO6E+oh8axLlJ5H5VaJKx7ePRGFbW6vUeFfHQIWPTI9Tm7HHfuhqY7E6C7JFqUzM6iZXIoncNxX7+bIVdJnTT48x3OQU1krIDW3UeixVhyISzYz6cadY5Xph6TseRNTRsTElzzBn9Vlly0TAERsdgnMYyLROjyFbg5R4ZlsGaMT4yNi2Zlq1GwjZB3jq0PsaJfA3t0jL0W0Y9xf1V41lpWckXMLaZiwxuKYPqc6LlHdkeRF+Qxswx5ASDqBVrsL+2A/N6SiCbYymV2BywJiMZj3GRRMTnL+lVyHCll3R7Szv0vqXMtQ74T+HijljIScLaEpkKCB3rqMBIi0jPs5JeOKTZMZEi5VVnouzy0k3jXjWSMlY6UcVGDxlKMVDqx91SILWSi3D2KdgYy3kP8E9X/AE1SnRXBNdNRMlefT6g7aY6giK+cPLGNg0bY68rcnpsNh9PqIBve/EcPQ3WIq2dR93xpSgk5SAZ9R6MLAOZFUkpLSUDXp6/KPpGUkmTdswlnKnwbl5ITMdGwcXJi7LKsqzUmT5tWYmkXuF9wjBvb76b7dHheazJ9RElUJOCxViuMlUJC0Gtz6PKyjLBY4qMWUe12r1xZ6lOyT6XPEBKN2CkTDOlZd02TBdTMt7Upx2knrkdCv1UKjDKn1A7XBYH6SCOOrWn5Oi/DtRiu+GleRthDL8rXdVjZlcfWrSIxVlGGGCOnH//Z"
}
}
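
Each entry under `dsl.components` in these templates carries both an `upstream` and a `downstream` list, so every link is recorded twice. The sketch below, which is an illustration and not code from this repository, shows one way to confirm that the two lists agree after a template is edited by hand. The function name `check_links` is hypothetical, and the check assumes that every link is meant to be mirrored on both sides, which the templates shown here suggest but do not state.

```python
def check_links(components):
    """Report upstream/downstream entries that are not mirrored by the peer.

    `components` is the mapping stored under dsl["components"]; each value is
    expected to hold "upstream" and "downstream" lists of component ids.
    """
    problems = []
    for cid, comp in components.items():
        for down in comp.get("downstream", []):
            peer = components.get(down)
            if peer is None:
                problems.append(f"{cid}: downstream {down} is not defined")
            elif cid not in peer.get("upstream", []):
                problems.append(f"{down}: upstream list is missing {cid}")
        for up in comp.get("upstream", []):
            peer = components.get(up)
            if peer is None:
                problems.append(f"{cid}: upstream {up} is not defined")
            elif cid not in peer.get("downstream", []):
                problems.append(f"{up}: downstream list is missing {cid}")
    return problems
```

An empty return value means every link is declared on both ends; each string in a non-empty result names the component whose list needs attention.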

View File

@@ -1,13 +1,13 @@
{
"id": 13,
"title": {
"en": "ImageLingo",
"de": "ImageLingo",
"zh": "图片解析"},
"en": "Photo text translator",
"de": "Bild-Dolmetscher",
"zh": "图片文字快译"},
"description": {
"en": "ImageLingo lets you snap any photo containing text—menus, signs, or documents—and instantly recognize and translate it into your language of choice using advanced AI-powered translation technology.",
"de": "ImageLingo ermöglicht es Ihnen, jedes Foto mit Text Menüs, Schilder oder Dokumente zu fotografieren und es sofort in Ihre gewünschte Sprache zu erkennen und zu übersetzen, unter Verwendung fortschrittlicher KI-gestützter Übersetzungstechnologie.",
"zh": "多模态大模型允许您拍摄任何包含文本的照片——菜单、标志或文档——立即识别并转换成您选择的语言。"},
"en": "Photo text translator lets you snap any photo containing text—menus, signs, or documents—and instantly recognize and translate it into your language of choice using advanced AI-powered translation technology.",
"de": "Bild-Dolmetscher ermöglicht es Ihnen, jedes Foto mit Text Menüs, Schilder oder Dokumente zu fotografieren und es sofort in Ihre gewünschte Sprache zu erkennen und zu übersetzen, unter Verwendung fortschrittlicher KI-gestützter Übersetzungstechnologie.",
"zh": "图片文字快译允许您拍摄任何包含文本的照片——菜单、标志或文档——立即识别并转换成您选择的语言。"},
"canvas_type": "Consumer App",
"dsl": {
"components": {

View File

@@ -1,14 +1,15 @@
{
"id": 20,
"title": {
"en": "Report Agent Using Knowledge Base",
"de": "Berichtsagent mit Wissensdatenbank",
"zh": "知识库检索智能体"},
"en": "Reflective academic paper generator",
"de": "Schreibhilfe für Reflexionspapiere",
"zh": "学术论文生成助手"},
"description": {
"en": "A report generation assistant using local knowledge base, with advanced capabilities in task planning, reasoning, and reflective analysis. Recommended for academic research paper Q&A",
"en": "A reflective academic paper generator using local knowledge base, with advanced capabilities in task planning, reasoning, and reflective analysis. Recommended for academic research paper Q&A",
"de": "Ein Berichtsgenerierungsassistent, der eine lokale Wissensdatenbank nutzt, mit erweiterten Fähigkeiten in Aufgabenplanung, Schlussfolgerung und reflektierender Analyse. Empfohlen für akademische Forschungspapier-Fragen und -Antworten.",
"zh": "一个使用本地知识库的报告生成助手,具备高级能力,包括任务规划、推理和反思性分析。推荐用于学术研究论文问答。"},
"zh": "一个使用本地知识库的学术论文生成助手,具备高级能力,包括任务规划、推理和反思性分析。推荐用于学术研究论文问答。"},
"canvas_type": "Agent",
"canvas_types": ["Agent", "Recommended"],
"dsl": {
"components": {
"Agent:NewPumasLick": {
@@ -330,4 +331,4 @@
"retrieval": []
},
"avatar": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAH0klEQVR4nO2ZC1BU1wGG/3uRp/IygG+DGK0GOjE1U6cxI4tT03Y0E+kENbaJbKpj60wzgNMwnTjuEtu0miGasY+0krI202kMVEnVxtoOLG00oVa0LajVBDcSEI0REFBgkZv/3GWXfdzdvctuHs7kmzmec9//d+45914XCXc4Xwjk1+59VJGGF7C5QAFSWBvgyWmWLl7IKiny6QNL173B5YjB84bOyrpKA4B1DLySdQpLKAiZGtZ7a/KMVoQJz6UfEZyhTWwaEBmssiLvCueu6BJg8EwFqGTTAC+uvNWC9w82sRWcux/JwaSHstjywcogRt4RG0KExwWG4QsVYCebKSwe3L5lR9OOWjyzfg2WL/0a1/jncO3b2FHxGnKeWYqo+Giu8UEMrWJKWBACPMY/DG+63txhvnKshUu+DF2/hayMDFRsL+VScDb++AVc6OjAuInxXPJl2tfnIikrzUyJMi7qQmLRhOEr2fOFbX/7P6STF7BqoWevfdij4NWGQfx+57OYO2sG1wSnsek8Nm15EU8sikF6ouelXz9ph7JwDqYt+5IIZaGEkauDIrH4wPBmhjexCSEws+VdVG1M4NIoj+2xYzBuJtavWcEl/VS8dggx/ZdQvcGzQwp+cxOXsu5RBQQMVkYJM4LA/Txh+ELFMWFVPARS5kFiabZdx8Olh7l17BzdvhzZmROhdJ3j6D/nIyBgOCMlLAgA9xmF4TMV4BSbrgnrLiBl5rOsRCRRbDUsBzQFiJjY91PCBj9w+yiP1lXWsTLAjc9YQGB9I8+Yx1oTiUWFvW9QgDo2PdASaDp/EQ8/sRnhcPTVcuTMncXwQQVESL9DidscaPW+QEtAICRu9PSxFTpJiePV8AI9AsTvXZBY/Pa+wJ9ApNApIILm8S5Y4QXXQwhYFH6csemDP4G3G5v579i5d04mknknQhDYS4HCrCVr/mC3D305KnbCEpvVIia5Onw6WaWw+KAl0Np+FUXbdiMcyoqfUoeRHoFrJ1uRtnBG1/9Mf/3LtElp+VwF2wcd7woJib1vUPwMH4GWQCQJJtBa/V9cPmFD8uQUpMdNGDhY8bNYrobh8acHu270/l0ImJWRt64Wn6WACN9z5gq2lXwPW8pfweT0icP/fH23vO9QLYq3/QKyLBmFQI3CUcT9NdESEEPItKsSN3r7MBaSJoxHWZERM6ZmMLy2gDP8/pd/og418dTL37hFSUpMUC5f+UiWZcnY9s5+ixCwUiCXx2iiJdDNx6f4pgkH8Q3lbxK7h8+enoHha1cRNdMp8axiHxo6+/5bVdk8DSROYIW1X7QEIom3wHD3gEf4vu1bVYEJZeWQ0zJQvmcfyiv2QZak6raG/QWfK4Ez9mTc5v8xPMJfuojoxXmIX/9DOMe+FCWbcHu4BJJ0YEwCx0824bFNW9HesB+CqYu+jepfPYcHF+aoPXS8sQl/+vU2bgmOU2C+qRc9/YrrPPbGBtzavd0nvCxLxui4pJrBm911PFwak4CYA80cj+JCAiGUzYkmxrSY4N2c3GLi6UEIFL/wRxxqkhmHnTEpDQcrfq6ea+hcE8bNy3GFzyq4H22HW1Kd4WMSkg1jmsSRpKj0Rzhy4gNUv/y8Gjrv8SJK3OWScA+fMn/ysVPPvTmeh6nh1TcxBUJ+jEaKYr7N36x7h+Edj0pB6+WrLokn87+BrTt/p4ZPzZ6MM7/8R2//h33vOcNzdwgBMwVMbGvySQmo4a0NqOZccU7YmGXLEfPQUlUid/XT6B8YdIU/99vjsPcOdEhDsfOd4QVCwKB8yp8SWuG1njbTl83DpMWz1PCKAswuWPDI0e8WebyAJBbxNdrF7cls+hBpAb3h3XtehL/3+4u7D35rQwpP4YFTwMJ91rHpQyQFQgmf9sAMNL9Ur4afv/FBjIuPVj+n4YVTwMD
96tj0IVICoYYXv/q1VJ1Sl8UveQyaRwErvOB6B5SwKhqP00gI6A0vhsycJ7/KIzxhyHqGN0ADbnNAAYOicRfCFdAb/p50Gbfuc/wy5w1D5lOghk0fuG0USlgVr7sQjoDe8C8WxKGKPy2KjzlvAQb02/sCbh+FApngX1QUtyeSuwDi0hxFByV7L+LIf3r5kvpp4PBr07Hqvn71Y85bgOG6WS2ggA1+4D6eUKKQApVsqngI6KSkqh9HzsoM/3zg8Oz5VQ9E8wjf30YFDGdkeAsCwH18oYRZGXk7C4HuYxcwe6rjQsFovzaEvoFxqNkTOPzMjGikJso8wsF77XYkLx6dAwxWxvBmBIH7aUMJi8J3w0DnTVz7dyvX6KPzVBt+kL8cmzesRq9ps2Z48bRJmOIapS7E4zM2lXNt5CcU6ID7+ocSZkqY2NRN6ysnsHbJEpR8ZwV6t5Yg+iuLELf2KVd48VwXQf3BQGUMb4ZOuH9gKFEIYJfiNrEDcXZHHV4q3YRv5i7ikgM94RlETNgihrcgBHhccCiRCf7VhBK5rAPyr9I/Y/WKPEyfksH/9NjQ2dODhsYzwcLXsypkeBtCRGLRDUUMAMyKHxEx4dtrzyP97nQMygripiQiKi4aSbPvQmKW7+OXF69ntYvBa1iPCYklZEZECsGm4ja0Ops7EJsaj4SprlU+8IJiqIjAFga3Ikx4vvAYkTGALxyWFArlsnbBC9Sz6mI5zWKNRGh3JJY7mjte4GOz+r4tkRbxQQAAAABJRU5ErkJggg=="
}
}

View File

@@ -1,13 +1,13 @@
{
"id": 8,
"title": {
"en": "Generate SEO Blog",
"de": "SEO Blog generieren",
"zh": "生成SEO博客"},
"en": "SEO article writer",
"de": "SEO-Blog-Magnetiseur",
"zh": "SEO 博客写手"},
"description": {
"en": "This is a multi-agent version of the SEO blog generation workflow. It simulates a small team of AI “writers”, where each agent plays a specialized role — just like a real editorial team.",
"de": "Dies ist eine Multi-Agenten-Version des Workflows zur Erstellung von SEO-Blogs. Sie simuliert ein kleines Team von KI-„Autoren“, in dem jeder Agent eine spezielle Rolle übernimmt genau wie in einem echten Redaktionsteam.",
"zh": "多智能体架构可根据简单的用户输入自动生成完整的SEO博客文章。模拟小型“作家”团队其中每个智能体扮演一个专业角色——就像真正的编辑团队。"},
"zh": "SEO 博客写手可根据简单的用户输入自动生成完整的SEO博客文章。模拟小型“作家”团队其中每个智能体扮演一个专业角色——就像真正的编辑团队。"},
"canvas_type": "Agent",
"dsl": {
"components": {

View File

@@ -1,921 +0,0 @@
{
"id": 4,
"title": {
"en": "Generate SEO Blog",
"de": "SEO Blog generieren",
"zh": "生成SEO博客"},
"description": {
"en": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don't need any writing experience. Just provide a topic or short request — the system will handle the rest.",
"de": "Dieser Workflow generiert automatisch einen vollständigen SEO-optimierten Blogartikel basierend auf einer einfachen Benutzereingabe. Sie benötigen keine Schreiberfahrung. Geben Sie einfach ein Thema oder eine kurze Anfrage ein das System übernimmt den Rest.",
"zh": "此工作流根据简单的用户输入自动生成完整的SEO博客文章。你无需任何写作经验只需提供一个主题或简短请求系统将处理其余部分。"},
"canvas_type": "Recommended",
"dsl": {
"components": {
"Agent:BetterSitesSend": {
"downstream": [
"Agent:EagerNailsRemain"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - 
[Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:ClearRabbitsScream"
]
},
"Agent:ClearRabbitsScream": {
"downstream": [
"Agent:BetterSitesSend"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"begin"
]
},
"Agent:EagerNailsRemain": {
"downstream": [
"Agent:LovelyHeadsOwn"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Outline agent output is {Agent:BetterSitesSend@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:BetterSitesSend"
]
},
"Agent:LovelyHeadsOwn": {
"downstream": [
"Message:LegalBeansBet"
],
"obj": {
"component_name": "Agent",
"params": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Outline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
}
},
"upstream": [
"Agent:EagerNailsRemain"
]
},
"Message:LegalBeansBet": {
"downstream": [],
"obj": {
"component_name": "Message",
"params": {
"content": [
"{Agent:LovelyHeadsOwn@content}"
]
}
},
"upstream": [
"Agent:LovelyHeadsOwn"
]
},
"begin": {
"downstream": [
"Agent:ClearRabbitsScream"
],
"obj": {
"component_name": "Begin",
"params": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
}
},
"upstream": []
}
},
"globals": {
"sys.conversation_turns": 0,
"sys.files": [],
"sys.query": "",
"sys.user_id": ""
},
"graph": {
"edges": [
{
"data": {
"isHovered": false
},
"id": "xy-edge__beginstart-Agent:ClearRabbitsScreamend",
"source": "begin",
"sourceHandle": "start",
"target": "Agent:ClearRabbitsScream",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:ClearRabbitsScreamstart-Agent:BetterSitesSendend",
"source": "Agent:ClearRabbitsScream",
"sourceHandle": "start",
"target": "Agent:BetterSitesSend",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:BetterSitesSendtool-Tool:SharpPensBurnend",
"source": "Agent:BetterSitesSend",
"sourceHandle": "tool",
"target": "Tool:SharpPensBurn",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:BetterSitesSendstart-Agent:EagerNailsRemainend",
"source": "Agent:BetterSitesSend",
"sourceHandle": "start",
"target": "Agent:EagerNailsRemain",
"targetHandle": "end"
},
{
"id": "xy-edge__Agent:EagerNailsRemaintool-Tool:WickedDeerHealend",
"source": "Agent:EagerNailsRemain",
"sourceHandle": "tool",
"target": "Tool:WickedDeerHeal",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:EagerNailsRemainstart-Agent:LovelyHeadsOwnend",
"source": "Agent:EagerNailsRemain",
"sourceHandle": "start",
"target": "Agent:LovelyHeadsOwn",
"targetHandle": "end"
},
{
"data": {
"isHovered": false
},
"id": "xy-edge__Agent:LovelyHeadsOwnstart-Message:LegalBeansBetend",
"source": "Agent:LovelyHeadsOwn",
"sourceHandle": "start",
"target": "Message:LegalBeansBet",
"targetHandle": "end"
}
],
"nodes": [
{
"data": {
"form": {
"enablePrologue": true,
"inputs": {},
"mode": "conversational",
"prologue": "Hi! I'm your SEO blog assistant.\n\nTo get started, please tell me:\n1. What topic you want the blog to cover\n2. Who is the target audience\n3. What you hope to achieve with this blog (e.g., SEO traffic, teaching beginners, promoting a product)\n"
},
"label": "Begin",
"name": "begin"
},
"id": "begin",
"measured": {
"height": 48,
"width": 200
},
"position": {
"x": 50,
"y": 200
},
"selected": false,
"sourcePosition": "left",
"targetPosition": "right",
"type": "beginNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 1,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The user query is {sys.query}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Parse_And_Keyword_Agent**, responsible for interpreting a user's blog writing request and generating a structured writing intent summary and keyword strategy for SEO-optimized content generation.\n\n# Goals\n\n1. Extract and infer the user's true writing intent, even if the input is informal or vague.\n\n2. Identify the writing type, target audience, and implied goal.\n\n3. Suggest 3\u20135 long-tail keywords based on the input and context.\n\n4. Output all data in a Markdown format for downstream agents.\n\n# Operating Guidelines\n\n\n- If the user's input lacks clarity, make reasonable and **conservative** assumptions based on SEO best practices.\n\n- Always choose one clear \"Writing Type\" from the list below.\n\n- Your job is not to write the blog \u2014 only to structure the brief.\n\n# Output Format\n\n```markdown\n## Writing Type\n\n[Choose one: Tutorial / Informative Guide / Marketing Content / Case Study / Opinion Piece / How-to / Comparison Article]\n\n## Target Audience\n\n[Try to be specific based on clues in the input: e.g., marketing managers, junior developers, SEO beginners]\n\n## User Intent Summary\n\n[A 1\u20132 sentence summary of what the user wants to achieve with the blog post]\n\n## Suggested Long-tail Keywords\n\n- keyword 1\n\n- keyword 2\n\n- keyword 3\n\n- keyword 4 (optional)\n\n- keyword 5 (optional)\n\n\n\n\n## Input Examples (and how to handle them)\n\nInput: \"I want to write about RAGFlow.\"\n\u2192 Output: Informative Guide, Audience: AI developers, Intent: explain what RAGFlow is and its use cases\n\nInput: \"Need a blog to promote our prompt design tool.\"\n\u2192 Output: Marketing Content, Audience: product managers or tool adopters, Intent: raise awareness and interest in the product\n\n\n\nInput: \"How to get more Google traffic using AI\"\n\u2192 Output: How-to, Audience: SEO marketers, Intent: guide readers on applying AI for SEO growth",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Parse And Keyword Agent"
},
"dragging": false,
"id": "Agent:ClearRabbitsScream",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 344.7766966202233,
"y": 234.82202253184496
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.3,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 3,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Balance",
"presencePenaltyEnabled": false,
"presence_penalty": 0.2,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Outline_Agent**, responsible for generating a clear and SEO-optimized blog outline based on the user's parsed writing intent and keyword strategy.\n\n# Tool Access:\n\n- You have access to a search tool called `Tavily Search`.\n\n- If you are unsure how to structure a section, you may call this tool to search for related blog outlines or content from Google.\n\n- Do not overuse it. Your job is to extract **structure**, not to write paragraphs.\n\n\n# Goals\n\n1. Create a well-structured outline with appropriate H2 and H3 headings.\n\n2. Ensure logical flow from introduction to conclusion.\n\n3. Assign 1\u20132 suggested long-tail keywords to each major section for SEO alignment.\n\n4. Make the structure suitable for downstream paragraph writing.\n\n\n\n\n#Note\n\n- Use concise, scannable section titles.\n\n- Do not write full paragraphs.\n\n- Prioritize clarity, logical progression, and SEO alignment.\n\n\n\n- If the blog type is \u201cTutorial\u201d or \u201cHow-to\u201d, include step-based sections.\n\n\n# Input\n\nYou will receive:\n\n- Writing Type (e.g., Tutorial, Informative Guide)\n\n- Target Audience\n\n- User Intent Summary\n\n- 3\u20135 long-tail keywords\n\n\nUse this information to design a structure that both informs readers and maximizes search engine visibility.\n\n# Output Format\n\n```markdown\n\n## Blog Title (suggested)\n\n[Give a short, SEO-friendly title suggestion]\n\n## Outline\n\n### Introduction\n\n- Purpose of the article\n\n- Brief context\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 1]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 2]\n\n- [Short description of what this section will cover]\n\n- **Suggested keywords**: [keyword1, keyword2]\n\n### H2: [Section Title 3]\n\n- [Optional H3 Subsection Title A]\n\n - [Explanation of sub-point]\n\n- [Optional H3 Subsection Title B]\n\n - 
[Explanation of sub-point]\n\n- **Suggested keywords**: [keyword1]\n\n### Conclusion\n\n- Recap key takeaways\n\n- Optional CTA (Call to Action)\n\n- **Suggested keywords**: [keyword3]\n\n",
"temperature": 0.5,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.85,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Outline Agent"
},
"dragging": false,
"id": "Agent:BetterSitesSend",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 613.4368763415628,
"y": 164.3074269048589
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_0"
},
"dragging": false,
"id": "Tool:SharpPensBurn",
"measured": {
"height": 44,
"width": 200
},
"position": {
"x": 580.1877078861457,
"y": 287.7669662022325
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\n\n\nThe Outline agent output is {Agent:BetterSitesSend@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Body_Agent**, responsible for generating the full content of each section of an SEO-optimized blog based on the provided outline and keyword strategy.\n\n# Tool Access:\n\nYou can use the `Tavily Search` tool to retrieve relevant content, statistics, or examples to support each section you're writing.\n\nUse it **only** when the provided outline lacks enough information, or if the section requires factual grounding.\n\nAlways cite the original link or indicate source where possible.\n\n\n# Goals\n\n1. Write each section (based on H2/H3 structure) as a complete and natural blog paragraph.\n\n2. Integrate the suggested long-tail keywords naturally into each section.\n\n3. When appropriate, use the `Tavily Search` tool to enrich your writing with relevant facts, examples, or quotes.\n\n4. Ensure each section is clear, engaging, and informative, suitable for both human readers and search engines.\n\n\n# Style Guidelines\n\n- Write in a tone appropriate to the audience. Be explanatory, not promotional, unless it's a marketing blog.\n\n- Avoid generic filler content. Prioritize clarity, structure, and value.\n\n- Ensure SEO keywords are embedded seamlessly, not forcefully.\n\n\n\n- Maintain writing rhythm. Vary sentence lengths. Use transitions between ideas.\n\n\n# Input\n\n\nYou will receive:\n\n- Blog title\n\n- Structured outline (including section titles, keywords, and descriptions)\n\n- Target audience\n\n- Blog type and user intent\n\nYou must **follow the outline strictly**. Write content **section-by-section**, based on the structure.\n\n\n# Output Format\n\n```markdown\n\n## H2: [Section Title]\n\n[Your generated content for this section \u2014 500-600 words, using keywords naturally.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [
{
"component_name": "TavilySearch",
"name": "TavilySearch",
"params": {
"api_key": "",
"days": 7,
"exclude_domains": [],
"include_answer": false,
"include_domains": [],
"include_image_descriptions": false,
"include_images": false,
"include_raw_content": true,
"max_results": 5,
"outputs": {
"formalized_content": {
"type": "string",
"value": ""
},
"json": {
"type": "Array<Object>",
"value": []
}
},
"query": "sys.query",
"search_depth": "basic",
"topic": "general"
}
}
],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Body Agent"
},
"dragging": false,
"id": "Agent:EagerNailsRemain",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 889.0614605692713,
"y": 247.00973041799065
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"description": "This is an agent for a specific task.",
"user_prompt": "This is the order you need to send to the agent."
},
"label": "Tool",
"name": "flow.tool_1"
},
"dragging": false,
"id": "Tool:WickedDeerHeal",
"measured": {
"height": 44,
"width": 200
},
"position": {
"x": 853.2006404239659,
"y": 364.37541577229143
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "toolNode"
},
{
"data": {
"form": {
"delay_after_error": 1,
"description": "",
"exception_comment": "",
"exception_default_value": "",
"exception_goto": [],
"exception_method": null,
"frequencyPenaltyEnabled": false,
"frequency_penalty": 0.5,
"llm_id": "deepseek-chat@DeepSeek",
"maxTokensEnabled": false,
"max_retries": 3,
"max_rounds": 5,
"max_tokens": 4096,
"mcp": [],
"message_history_window_size": 12,
"outputs": {
"content": {
"type": "string",
"value": ""
}
},
"parameter": "Precise",
"presencePenaltyEnabled": false,
"presence_penalty": 0.5,
"prompts": [
{
"content": "The parse and keyword agent output is {Agent:ClearRabbitsScream@content}\n\nThe Outline agent output is {Agent:BetterSitesSend@content}\n\nThe Body agent output is {Agent:EagerNailsRemain@content}",
"role": "user"
}
],
"sys_prompt": "# Role\n\nYou are the **Editor_Agent**, responsible for finalizing the blog post for both human readability and SEO effectiveness.\n\n# Goals\n\n1. Polish the entire blog content for clarity, coherence, and style.\n\n2. Improve transitions between sections, ensure logical flow.\n\n3. Verify that keywords are used appropriately and effectively.\n\n4. Conduct a lightweight SEO audit \u2014 checking keyword density, structure (H1/H2/H3), and overall searchability.\n\n\n\n# Style Guidelines\n\n- Be precise. Avoid bloated or vague language.\n\n- Maintain an informative and engaging tone, suitable to the target audience.\n\n- Do not remove keywords unless absolutely necessary for clarity.\n\n- Ensure paragraph flow and section continuity.\n\n\n# Input\n\nYou will receive:\n\n- Full blog content, written section-by-section\n\n- Original outline with suggested keywords\n\n- Target audience and writing type\n\n# Output Format\n\n```markdown\n\n[The revised, fully polished blog post content goes here.]\n\n",
"temperature": 0.2,
"temperatureEnabled": true,
"tools": [],
"topPEnabled": false,
"top_p": 0.75,
"user_prompt": "",
"visual_files_var": ""
},
"label": "Agent",
"name": "Editor Agent"
},
"dragging": false,
"id": "Agent:LovelyHeadsOwn",
"measured": {
"height": 84,
"width": 200
},
"position": {
"x": 1160.3332919804993,
"y": 149.50806732882472
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "agentNode"
},
{
"data": {
"form": {
"content": [
"{Agent:LovelyHeadsOwn@content}"
]
},
"label": "Message",
"name": "Response"
},
"dragging": false,
"id": "Message:LegalBeansBet",
"measured": {
"height": 56,
"width": 200
},
"position": {
"x": 1370.6665839609984,
"y": 267.0323933738015
},
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "messageNode"
},
{
"data": {
"form": {
"text": "This workflow automatically generates a complete SEO-optimized blog article based on a simple user input. You don\u2019t need any writing experience. Just provide a topic or short request \u2014 the system will handle the rest.\n\nThe process includes the following key stages:\n\n1. **Understanding your topic and goals**\n2. **Designing the blog structure**\n3. **Writing high-quality content**\n\n\n"
},
"label": "Note",
"name": "Workflow Overall Description"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 205,
"id": "Note:SlimyGhostsWear",
"measured": {
"height": 205,
"width": 415
},
"position": {
"x": -284.3143151688742,
"y": 150.47632147913419
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 415
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent reads the user\u2019s input and figures out what kind of blog needs to be written.\n\n**What it does**:\n- Understands the main topic you want to write about \n- Identifies who the blog is for (e.g., beginners, marketers, developers) \n- Determines the writing purpose (e.g., SEO traffic, product promotion, education) \n- Suggests 3\u20135 long-tail SEO keywords related to the topic"
},
"label": "Note",
"name": "Parse And Keyword Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 152,
"id": "Note:EmptyChairsShake",
"measured": {
"height": 152,
"width": 340
},
"position": {
"x": 295.04147626768133,
"y": 372.2755718118446
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 340
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent builds the blog structure \u2014 just like writing a table of contents before you start writing the full article.\n\n**What it does**:\n- Suggests a clear blog title that includes important keywords \n- Breaks the article into sections using H2 and H3 headings (like a professional blog layout) \n- Assigns 1\u20132 recommended keywords to each section to help with SEO \n- Follows the writing goal and target audience set in the previous step"
},
"label": "Note",
"name": "Outline Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 146,
"id": "Note:TallMelonsNotice",
"measured": {
"height": 146,
"width": 343
},
"position": {
"x": 598.5644991893463,
"y": 5.801054564756448
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 343
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent is responsible for writing the actual content of the blog \u2014 paragraph by paragraph \u2014 based on the outline created earlier.\n\n**What it does**:\n- Looks at each H2/H3 section in the outline \n- Writes 150\u2013220 words of clear, helpful, and well-structured content per section \n- Includes the suggested SEO keywords naturally (not keyword stuffing) \n- Uses real examples or facts if needed (by calling a web search tool like Tavily)"
},
"label": "Note",
"name": "Body Agent"
},
"dragHandle": ".note-drag-handle",
"dragging": false,
"height": 137,
"id": "Note:RipeCougarsBuild",
"measured": {
"height": 137,
"width": 319
},
"position": {
"x": 860.4854129814981,
"y": 427.2196835690842
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 319
},
{
"data": {
"form": {
"text": "**Purpose**: \nThis agent reviews the entire blog draft to make sure it is smooth, professional, and SEO-friendly. It acts like a human editor before publishing.\n\n**What it does**:\n- Polishes the writing: improves sentence clarity, fixes awkward phrasing \n- Makes sure the content flows well from one section to the next \n- Double-checks keyword usage: are they present, natural, and not overused? \n- Verifies the blog structure (H1, H2, H3 headings) is correct \n- Adds two key SEO elements:\n - **Meta Title** (shows up in search results)\n - **Meta Description** (summary for Google and social sharing)"
},
"label": "Note",
"name": "Editor Agent"
},
"dragHandle": ".note-drag-handle",
"height": 146,
"id": "Note:OpenTurkeysSell",
"measured": {
"height": 146,
"width": 320
},
"position": {
"x": 1129,
"y": -30
},
"resizing": false,
"selected": false,
"sourcePosition": "right",
"targetPosition": "left",
"type": "noteNode",
"width": 320
}
]
},
"history": [],
"messages": [],
"path": [],
"retrieval": []
},
"avatar": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gHYSUNDX1BST0ZJTEUAAQEAAAHIAAAAAAQwAABtbnRyUkdCIFhZWiAH4AABAAEAAAAAAABhY3NwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAA9tYAAQAAAADTLQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlkZXNjAAAA8AAAACRyWFlaAAABFAAAABRnWFlaAAABKAAAABRiWFlaAAABPAAAABR3dHB0AAABUAAAABRyVFJDAAABZAAAAChnVFJDAAABZAAAAChiVFJDAAABZAAAAChjcHJ0AAABjAAAADxtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJYWVogAAAAAAAAb6IAADj1AAADkFhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAwADADASIAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAABgkKBwUI/8QAMBAAAAYCAQIEBQQCAwAAAAAAAQIDBAUGBxEhCAkAEjFBFFFhcaETFiKRFyOx8PH/xAAaAQACAwEBAAAAAAAAAAAAAAACAwABBgQF/8QALBEAAgIBAgUCBAcAAAAAAAAAAQIDBBEFEgATITFRIkEGIzJhFBUWgaGx8P/aAAwDAQACEQMRAD8AfF2hez9089t7pvxgQMa1Gb6qZ6oQE9m/NEvCIStyPfJSOF/M1epzMugo/qtMqbiRc1mJjoJKCLMNIxKcsLJedfO1Ct9cI63x9fx6CA/19t+oh4LFA5HfuAgP/A8eOIsnsTBrkBHXA7+v53+Q+ficTgJft9gIgA+/P9/1r342O/YA8A8k3/if+IbAN7+2/f8AAiI6H19PGoPyESTMZQPKUAHkQEN+3r9dh78/YPGUTk2wb/qAZZIugH1OHH5DjkdfbnWw2DsOxPj+xjrnx2H39unBopJGBn9s+PHv1HXjPJtH+J+B40O9a16h/wB/92j/ALrPa/wR104UyAobHlXhuo2HrEtK4qy3CwjKOuJLRHJLSkXWrFKs/gVrJVrE8TUiH8bPrP20UEu8m4hNpMJJuTOfnbUw/kUqyZgMHGjAO9+mtDsQ53sdcB6eMhnpEjhNQxRKICAgHy5+/roOdjr7c+J6O4x07dx484/n7nzw1gexBGfIPkZ/3t39uGpqc6+fP5/Ht8vGFZCzJjWpWuBxvO2yPjrtclUUK7BqmUI4fuASeyhG5FzFI0Bw4aQ0iZNoDgzvRW4qtyFkI4XmwyEk2YNnDp0sVBu3IUyy5iqH8gqKERSIRNIii67hddRJs1at01Xbx2sgzZoLu10UFJR+4V1A5cxF3FqNcLvjwcno43uuLrOxZYjujaClcb4QQfxEizpFiQyM9olcueRnjC2ZMt9iY06zL0qytrMSqSOVGsfHMaGhZ3l4lSRI2MqE74zJvRTveNFWWIh3RWw+XCAM5icKQLrCH57T17FhErSlRXnWvyZXKQwWJ3eraD14p5YuZCFgacskK2oGkVuKO5GYTHzf7DaD12cBD3DgPOIDrWw9Pn
rXPgDkpVsUDGMG+DD6E9gHXIjrYjwUPQTCXYgHPhIV974+F6E1hpC14Yzmzj56YaQEeZhXsayD1zLPW7pygxaMf81Nzu1iJsnIuDIKnaJAkPldqrHaoORZ73tMVEbFdSXT9nVgRQgnBq6j8e/HCIEATpAnH5KlmRVkFRFJwks/bqImSXJ5VFyA3N6Ikh3bCW3YHp5cowOmCfTgA+xJCnrjtwHKcLvJj2ZGcTRFj19kEhckdzgEjKnABGSSzdc1Fe5byXXGNjKdvRcw5NxvLidNZFFCxUa62KrzMaChw8hhYScFJtROAgmuLByq1MsgkZYPaVVuDe0wraRaqAdJwgRQo+YR8xTlAQNx6b49w41vXiJpCalLh1jZhyrTqRM4+jstdRmYryNkydLQRWg1LNGcWd5jIFFvCythlIySa0mNu74sKRQtaWsTmupqPItw0lE52ufpyYzrSkx6cw5bLmBEpkTsz+dt8P5QFuCRtAIkBH9MuwKHICIaDQhnojMs9mKaeGcrMxXlQtAYkdVljimRrE5MqI4zL8oSqQ6wxjodBqK05qdK3Vo3aCSVkBW7bjuC1NFJJBPaqyx6fp6pWkliYLXK2XrukkRu2CCVoSWMgsdMyySKwoLFcIGWSTUMg4IBgTcICoBhRcplMcpFkhIqQp1ClMBTmA0Zfe1zpjvHfXff65bZlzXpB3jjGTgiirmPjAfs16PHqHeQ75Wbj3xxZpOEkV3LRJJSPdomUBZISJLncV2k+8D07dxXp7xsYuTapA9UkJUYWIzNhadnWEZeCXGLQQiJi1ViHfhHL2unWh+mlORsrW0JFpEFnGVfm1mU4kq0FY3eD6corJncv6dr5NLSMNXVaTUksjTiMnaq8uFfSVuDyiJ1iZpy0LOJtpa3YfkcQ5fdozyxI2m5qqcrHN61YYmHsh6v3o9ParYmYJEtlhIx6+gUbjgD23M6oqg92YL0JyF6Bps+qDValVA9h9Lj5SZI3SHXdEQlj1wiQtLLIe6pGzjO3BlBkK1hxpblLVH5wdW0BcFKf/JwRtjsot2z8omaSdxbzzk1iEjsE0AM9rrRZNRIrVyo7dGO6E+oh8axLlJ5H5VaJKx7ePRGFbW6vUeFfHQIWPTI9Tm7HHfuhqY7E6C7JFqUzM6iZXIoncNxX7+bIVdJnTT48x3OQU1krIDW3UeixVhyISzYz6cadY5Xph6TseRNTRsTElzzBn9Vlly0TAERsdgnMYyLROjyFbg5R4ZlsGaMT4yNi2Zlq1GwjZB3jq0PsaJfA3t0jL0W0Y9xf1V41lpWckXMLaZiwxuKYPqc6LlHdkeRF+Qxswx5ASDqBVrsL+2A/N6SiCbYymV2BywJiMZj3GRRMTnL+lVyHCll3R7Szv0vqXMtQ74T+HijljIScLaEpkKCB3rqMBIi0jPs5JeOKTZMZEi5VVnouzy0k3jXjWSMlY6UcVGDxlKMVDqx91SILWSi3D2KdgYy3kP8E9X/AE1SnRXBNdNRMlefT6g7aY6giK+cPLGNg0bY68rcnpsNh9PqIBve/EcPQ3WIq2dR93xpSgk5SAZ9R6MLAOZFUkpLSUDXp6/KPpGUkmTdswlnKnwbl5ITMdGwcXJi7LKsqzUmT5tWYmkXuF9wjBvb76b7dHheazJ9RElUJOCxViuMlUJC0Gtz6PKyjLBY4qMWUe12r1xZ6lOyT6XPEBKN2CkTDOlZd02TBdTMt7Upx2knrkdCv1UKjDKn1A7XBYH6SCOOrWn5Oi/DtRiu+GleRthDL8rXdVjZlcfWrSIxVlGGGCOnH//Z"
}


@@ -1,14 +1,14 @@
 {
 "id": 22,
 "title": {
-"en": "Ecommerce Customer Service Workflow",
+"en": "Smart customer service specialist",
 "de": "Ecommerce Kundenservice Workflow",
-"zh": "电子商务客户服务工作流程"
+"zh": "智能客户服务专员"
 },
 "description": {
-"en": "This template helps e-commerce platforms address complex customer needs, such as comparing product features, providing usage support, and coordinating home installation services.",
-"de": "Diese Vorlage hilft E-Commerce-Plattformen, komplexe Kundenbedürfnisse zu erfüllen, wie z.B. den Vergleich von Produktmerkmalen, die Bereitstellung von Nutzungsunterstützung und die Koordination von Hausinstallationsdiensten.",
-"zh": "该模板可帮助电子商务平台解决复杂的客户需求,例如比较产品功能、提供使用支持和协调家庭安装服务。"
+"en": "This template helps address complex customer needs, such as comparing product features, providing usage support, and coordinating home installation services.",
+"de": "Diese Vorlage hilft komplexe Kundenbedürfnisse zu erfüllen, wie z.B. den Vergleich von Produktmerkmalen, die Bereitstellung von Nutzungsunterstützung und die Koordination von Hausinstallationsdiensten.",
+"zh": "该模板可帮助解决复杂的客户需求,例如比较产品功能、提供使用支持和协调家庭安装服务。"
 },
 "canvas_type": "Customer Support",
 "dsl": {


@@ -1,12 +1,12 @@
 {
 "id": 17,
 "title": {
-"en": "SQL Assistant",
-"de": "SQL Assistent",
-"zh": "SQL助理"},
+"en": "Text-to-SQL data expert",
+"de": "Text-to-SQL-Datenexperte",
+"zh": "Text-to-SQL 问数专家"},
 "description": {
-"en": "SQL Assistant is an AI-powered tool that lets business users turn plain-English questions into fully formed SQL queries. Simply type your question (e.g., 'Show me last quarter's top 10 products by revenue') and SQL Assistant generates the exact SQL, runs it against your database, and returns the results in seconds. ",
-"de": "SQL-Assistent ist ein KI-gestütztes Tool, mit dem Geschäftsanwender einfache englische Fragen in vollständige SQL-Abfragen umwandeln können. Geben Sie einfach Ihre Frage ein (z.B. 'Zeige mir die Top 10 Produkte des letzten Quartals nach Umsatz') und der SQL-Assistent generiert das exakte SQL, führt es gegen Ihre Datenbank aus und liefert die Ergebnisse in Sekunden.",
+"en": "Text-to-SQL data expert lets business users turn plain-English questions into fully formed SQL queries. Simply type your question (e.g., 'Show me last quarter's top 10 products by revenue') and Text-to-SQL data expert generates the exact SQL, runs it against your database, and returns the results in seconds. ",
+"de": "Text-to-SQL-Datenexperte ist ein KI-gestütztes Tool, mit dem Geschäftsanwender einfache englische Fragen in vollständige SQL-Abfragen umwandeln können. Geben Sie einfach Ihre Frage ein (z.B. 'Zeige mir die Top 10 Produkte des letzten Quartals nach Umsatz') und der SQL-Assistent generiert das exakte SQL, führt es gegen Ihre Datenbank aus und liefert die Ergebnisse in Sekunden.",
 "zh": "用户能够将简单文本问题转化为完整的SQL查询并输出结果。只需输入您的问题例如展示上个季度前十名按收入排序的产品SQL助理就会生成精确的SQL语句对其运行您的数据库并几秒钟内返回结果。"},
 "canvas_type": "Marketing",
 "dsl": {

File diff suppressed because one or more lines are too long


@@ -2,13 +2,13 @@
 {
 "id": 14,
 "title": {
-"en": "Trip Planner",
+"en": "Trip planner",
 "de": "Reiseplaner",
-"zh": "旅行规划"},
+"zh": "旅行规划"},
 "description": {
 "en": "This smart trip planner utilizes LLM technology to automatically generate customized travel itineraries, with optional tool integration for enhanced reliability.",
 "de": "Dieser intelligente Reiseplaner nutzt LLM-Technologie zur automatischen Generierung maßgeschneiderter Reiserouten mit optionaler Tool-Integration für erhöhte Zuverlässigkeit.",
-"zh": "智能旅行规划将利用大模型自动生成定制化的旅行行程,附带可选工具集成,以增强可靠性。"},
+"zh": "智能旅行规划将利用大模型自动生成定制化的旅行行程,附带可选工具集成,以增强可靠性。"},
 "canvas_type": "Consumer App",
 "dsl": {
 "components": {


@@ -2,14 +2,14 @@
 {
 "id": 9,
 "title": {
-"en": "Technical Docs QA",
-"de": "Technische Dokumentation Fragen & Antworten",
-"zh": "技术文档问答"},
+"en": "Your starter dataset chatbot",
+"de": "Dein Starter-Datensatz-Chatbot",
+"zh": "入门级知识库聊天助手"},
 "description": {
 "en": "This is a document question-and-answer system based on a knowledge base. When a user asks a question, it retrieves relevant document content to provide accurate answers.",
 "de": "Dies ist ein dokumentenbasiertes Frage-und-Antwort-System auf Basis einer Wissensdatenbank. Wenn ein Benutzer eine Frage stellt, werden relevante Dokumenteninhalte abgerufen, um genaue Antworten zu liefern.",
-"zh": "基于知识库的文档问答系统,当用户提出问题时,会检索相关本地文档并提供准确回答。"},
-"canvas_type": "Customer Support",
+"zh": "基于知识库的入门级知识库聊天助手,当用户提出问题时,会检索相关本地文档并提供准确回答。"},
+"canvas_type": "Recommended",
 "dsl": {
 "components": {
 "Agent:StalePandasDream": {


@@ -57,17 +57,19 @@ class LLMToolPluginCallSession(ToolCallSession):
     async def tool_call_async(self, name: str, arguments: dict[str, Any]) -> Any:
         assert name in self.tools_map, f"LLM tool {name} does not exist"
+        logging.info(f"[ToolCall] invoke name={name} arguments={str(arguments)[:200]}")
         st = timer()
         tool_obj = self.tools_map[name]
         if isinstance(tool_obj, MCPToolCallSession):
             resp = await thread_pool_exec(tool_obj.tool_call, name, arguments, 60)
+        elif hasattr(tool_obj, "invoke_async") and asyncio.iscoroutinefunction(tool_obj.invoke_async):
+            resp = await tool_obj.invoke_async(**arguments)
         else:
-            if hasattr(tool_obj, "invoke_async") and asyncio.iscoroutinefunction(tool_obj.invoke_async):
-                resp = await tool_obj.invoke_async(**arguments)
-            else:
-                resp = await thread_pool_exec(tool_obj.invoke, **arguments)
+            resp = await thread_pool_exec(tool_obj.invoke, **arguments)
-        self.callback(name, arguments, resp, elapsed_time=timer()-st)
+        elapsed = timer() - st
+        logging.info(f"[ToolCall] done name={name} elapsed={elapsed:.2f}s result={str(resp)[:200]}")
+        self.callback(name, arguments, resp, elapsed_time=elapsed)
         return resp
     def get_tool_obj(self, name):
@@ -101,13 +103,8 @@ class ToolParamBase(ComponentParamBase):
             if "enum" in p:
                 params[k]["enum"] = p["enum"]
-        desc = self.meta["description"]
-        if hasattr(self, "description"):
-            desc = self.description
-        function_name = self.meta["name"]
-        if hasattr(self, "function_name"):
-            function_name = self.function_name
+        desc = getattr(self, "description", None) or self.meta["description"]
+        function_name = getattr(self, "function_name", self.meta["name"])
         return {
             "type": "function",


@@ -18,15 +18,196 @@ import base64
import json
import logging
import os
import uuid
from abc import ABC
from collections.abc import Mapping
from typing import Optional
from pydantic import BaseModel, Field, field_validator
from strenum import StrEnum
from agent.tools.base import ToolBase, ToolMeta, ToolParamBase
from api.db.services.file_service import FileService
from common import settings
from common.connection_utils import timeout
from common.constants import SANDBOX_ARTIFACT_BUCKET, SANDBOX_ARTIFACT_EXPIRE_DAYS
SYSTEM_OUTPUT_KEYS = frozenset(
    {
        "content",
        "actual_type",
        "_ERROR",
        "_ARTIFACTS",
        "_ATTACHMENT_CONTENT",
        "raw_result",
        "_created_time",
        "_elapsed_time",
    }
)


class ContractError(ValueError):
    pass
def _validate_business_output_name(name: str) -> None:
    if not name or not name.strip():
        raise ContractError("CodeExec business output name must not be empty")
    if name in SYSTEM_OUTPUT_KEYS:
        raise ContractError(f"CodeExec reserved output name is not allowed: {name}")
    if "." in name:
        raise ContractError(f"CodeExec business output name must not contain '.': {name}")


def select_business_output(outputs: Mapping[str, object]) -> tuple[str, object]:
    if len(outputs) == 1:
        only_name, only_meta = next(iter(outputs.items()))
        _validate_business_output_name(only_name)
        return only_name, only_meta
    business_outputs = [(name, meta) for name, meta in outputs.items() if name not in SYSTEM_OUTPUT_KEYS]
    if len(business_outputs) != 1:
        raise ContractError(
            f"CodeExec contract must contain exactly one business output, got {len(business_outputs)}"
        )
    _validate_business_output_name(business_outputs[0][0])
    return business_outputs[0]
def normalize_output_value(value):
    if isinstance(value, (tuple, list)):
        return [normalize_output_value(item) for item in value]
    if isinstance(value, dict):
        return {key: normalize_output_value(item) for key, item in value.items()}
    return value


def infer_actual_type(value) -> str:
    value = normalize_output_value(value)
    if value is None:
        return "Null"
    if isinstance(value, bool):
        return "Boolean"
    if _is_number(value):
        return "Number"
    if isinstance(value, str):
        return "String"
    if isinstance(value, dict):
        return "Object"
    if isinstance(value, list):
        if not value:
            return "Array<Any>"
        inferred = {infer_actual_type(item) for item in value}
        if len(inferred) == 1:
            return f"Array<{inferred.pop()}>"
        return "Array<Any>"
    return "Any"


def render_canonical_content(value) -> str:
    value = normalize_output_value(value)
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False, indent=2, sort_keys=True)
    return str(value)


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)
def _validate_top_level_value_domain(value) -> None:
    allowed = value is None or isinstance(value, (bool, str, dict, list)) or _is_number(value)
    if not allowed:
        raise ContractError(
            f"CodeExec unsupported top-level result type: {type(value).__name__}. "
            "Allowed top-level values are String, Number, Boolean, Object, Array, or Null."
        )


def _normalize_expected_type(expected_type: str) -> str:
    etype = expected_type.strip()
    low = etype.lower()
    simple_types = {
        "string": "String",
        "number": "Number",
        "boolean": "Boolean",
        "object": "Object",
        "null": "Null",
        "any": "Any",
    }
    if low in simple_types:
        return simple_types[low]
    if low.startswith("array<") and low.endswith(">"):
        inner = etype[etype.find("<") + 1 : -1].strip()
        if not inner:
            raise ContractError(f"Unsupported expected type: {expected_type}")
        return f"Array<{_normalize_expected_type(inner)}>"
    return etype
def _validate_expected_type(expected_type: str, value, path: str = "") -> None:
etype = _normalize_expected_type(expected_type)
if not etype or etype.lower() == "any":
return
value = normalize_output_value(value)
if etype.startswith("Array<") and etype.endswith(">"):
inner_type = etype[6:-1].strip()
if not isinstance(value, list):
raise ContractError(
f"CodeExec contract mismatch at {path or 'value'}: expected type {etype}, got {infer_actual_type(value)}"
)
for index, item in enumerate(value):
child_path = f"{path}[{index}]" if path else f"[{index}]"
_validate_expected_type(inner_type, item, child_path)
return
actual_type = infer_actual_type(value)
if etype == "String":
valid = isinstance(value, str)
elif etype == "Number":
valid = _is_number(value)
elif etype == "Boolean":
valid = isinstance(value, bool)
elif etype == "Object":
valid = isinstance(value, dict)
elif etype == "Null":
valid = value is None
else:
raise ContractError(f"Unsupported expected type: {expected_type}")
if not valid:
raise ContractError(
f"CodeExec contract mismatch at {path or 'value'}: expected type {etype}, got {actual_type}"
)
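Reviewer note: the recursive check above reports the path of the first offending element. The sketch below is a simplified, standalone approximation of that behaviour. `check` and the local `ContractError` class are hypothetical stand-ins, and the error wording and path format are deliberately shorter than in the diff.

```python
class ContractError(ValueError):
    """Stand-in for the ContractError used in the diff."""


SIMPLE = {"string": str, "object": dict, "boolean": bool}


def check(expected: str, value, path: str = "value") -> None:
    """Hypothetical, simplified version of _validate_expected_type."""
    etype = expected.strip().lower()
    if etype in ("", "any"):
        return  # nothing to enforce
    if etype.startswith("array<") and etype.endswith(">"):
        if not isinstance(value, list):
            raise ContractError(f"{path}: expected {expected}, got {type(value).__name__}")
        for index, item in enumerate(value):
            # Recurse with the inner type and an indexed path.
            check(etype[6:-1], item, f"{path}[{index}]")
        return
    if etype == "number":
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif etype == "null":
        ok = value is None
    elif etype in SIMPLE:
        ok = isinstance(value, SIMPLE[etype])
    else:
        raise ContractError(f"Unsupported expected type: {expected}")
    if not ok:
        raise ContractError(f"{path}: expected {expected}, got {type(value).__name__}")


check("Array<Number>", [1, 2.5])  # passes silently
try:
    check("Array<Number>", [1, "2"])
except ContractError as err:
    print(err)
```

Note that, unlike the removed `_coerce_output_value`, nothing here converts `"2"` to a number: a mismatch is an error, not a coercion.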
def build_code_exec_contract(outputs: Mapping[str, object], raw_result) -> dict[str, object]:
    business_name, business_meta = select_business_output(outputs)
    expected_type = ""
    if isinstance(business_meta, Mapping):
        expected_type = str(business_meta.get("type") or "")
    normalized_value = normalize_output_value(raw_result)
    _validate_top_level_value_domain(normalized_value)
    _validate_expected_type(expected_type, normalized_value)
    return {
        "business_output": business_name,
        "value": normalized_value,
        "actual_type": infer_actual_type(normalized_value),
        "content": render_canonical_content(normalized_value),
    }


def _art_field(art, field: str, default=""):
    return art.get(field, default) if isinstance(art, dict) else getattr(art, field, default)
class Language(StrEnum):
@@ -70,6 +251,7 @@ class CodeExecParam(ToolParamBase):
"name": "execute_code",
"description": """
This tool has a sandbox that can execute code written in 'Python'/'Javascript'. It receives a piece of code and return a Json string.
Here's a code example for Python(`main` function MUST be included):
def main() -> dict:
\"\"\"
@@ -84,6 +266,26 @@ def main() -> dict:
"result": fibonacci_recursive(100),
}
To generate charts or files (images, PDFs, CSVs, etc.), save them to the `artifacts/` directory (relative to the working directory). The sandbox will automatically collect these files and return them. Example:
def main() -> dict:
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 25, 30]})
    fig, ax = plt.subplots()
    ax.plot(df["x"], df["y"])
    ax.set_title("Sample Chart")
    fig.savefig("artifacts/chart.png", dpi=150, bbox_inches="tight")
    plt.close(fig)
    return {"summary": "Chart saved to artifacts/chart.png"}
Available Python packages: pandas, numpy, matplotlib, requests.
Supported artifact file types: .png, .jpg, .jpeg, .svg, .pdf, .csv, .json, .html
Collected artifacts are also parsed automatically and appended to the stable text output `content`. The content includes sections like `attachment1 (image): ...`, `attachment2 (pdf): ...`, so downstream nodes can consume a single text output without depending on unstable attachment-specific variables.
Here's a code example for Javascript(`main` function MUST be included and exported):
const axios = require('axios');
async function main(args) {
@@ -125,6 +327,7 @@ module.exports = { main };
class CodeExec(ToolBase, ABC):
component_name = "CodeExec"
_lifecycle_configured = False
@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)))
def _invoke(self, **kwargs):
@@ -148,6 +351,8 @@ class CodeExec(ToolBase, ABC):
if self.check_if_canceled("CodeExec execution"):
return self.output()
timeout_seconds = int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60))
try:
# Try using the new sandbox provider system first
try:
@@ -157,25 +362,19 @@ class CodeExec(ToolBase, ABC):
return
# Execute code using the provider system
result = sandbox_execute_code(
code=code,
language=language,
timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)),
arguments=arguments
)
result = sandbox_execute_code(code=code, language=language, timeout=timeout_seconds, arguments=arguments)
if self.check_if_canceled("CodeExec execution"):
return
# Process the result
if result.stderr:
self.set_output("_ERROR", result.stderr)
return
parsed_stdout = self._deserialize_stdout(result.stdout)
logging.info(f"[CodeExec]: Provider system -> {parsed_stdout}")
self._populate_outputs(parsed_stdout, result.stdout)
return
artifacts = result.metadata.get("artifacts", []) if result.metadata else []
return self._process_execution_result(
result.stdout,
result.stderr,
"Provider system",
artifacts,
execution_metadata=result.metadata,
)
except (ImportError, RuntimeError) as provider_error:
# Provider system not available or not configured, fall back to HTTP
@@ -196,7 +395,7 @@ class CodeExec(ToolBase, ABC):
self.set_output("_ERROR", "Task has been canceled")
return self.output()
resp = requests.post(url=f"http://{settings.SANDBOX_HOST}:9385/run", json=code_req, timeout=int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10 * 60)))
resp = requests.post(url=f"http://{settings.SANDBOX_HOST}:9385/run", json=code_req, timeout=timeout_seconds)
logging.info(f"http://{settings.SANDBOX_HOST}:9385/run, code_req: {code_req}, resp.status_code {resp.status_code}:")
if self.check_if_canceled("CodeExec execution"):
@@ -206,14 +405,13 @@ class CodeExec(ToolBase, ABC):
resp.raise_for_status()
body = resp.json()
if body:
stderr = body.get("stderr")
if stderr:
self.set_output("_ERROR", stderr)
return self.output()
raw_stdout = body.get("stdout", "")
parsed_stdout = self._deserialize_stdout(raw_stdout)
logging.info(f"[CodeExec]: http://{settings.SANDBOX_HOST}:9385/run -> {parsed_stdout}")
self._populate_outputs(parsed_stdout, raw_stdout)
return self._process_execution_result(
body.get("stdout", ""),
body.get("stderr"),
f"http://{settings.SANDBOX_HOST}:9385/run",
body.get("artifacts", []),
execution_metadata=self._build_http_execution_metadata(body),
)
else:
self.set_output("_ERROR", "There is no response from sandbox")
return self.output()
@@ -226,6 +424,129 @@ class CodeExec(ToolBase, ABC):
return self.output()
    def _process_execution_result(
        self,
        stdout: str,
        stderr: str | None,
        source: str,
        artifacts: list | None = None,
        execution_metadata: dict | None = None,
    ):
        has_structured_result = bool((execution_metadata or {}).get("result_present") is True)
        resolved_value, used_stdout_fallback = self._resolve_execution_result_value(stdout, execution_metadata)
        if stderr and not has_structured_result and not artifacts and not str(stdout or "").strip():
            self.set_output("_ERROR", stderr)
            return self.output()
        # Clear any stale error from previous runs or base class initialization
        self.set_output("_ERROR", "")
        if stderr:
            logging.warning(f"[CodeExec]: stderr (non-fatal): {stderr[:500]}")
        if used_stdout_fallback and str(stdout or "").strip():
            logging.warning("[CodeExec]: Falling back to stdout deserialization because no structured result metadata was provided")
        logging.info(f"[CodeExec]: {source} -> {resolved_value}")
        content_parts = []
        base_content = self._apply_business_output(resolved_value)
        if base_content:
            content_parts.append(base_content)
        if artifacts:
            artifact_urls = self._upload_artifacts(artifacts)
            self.set_output("_ARTIFACTS", artifact_urls or None)
            attachment_text = self._build_attachment_content(artifacts, artifact_urls)
            self.set_output("_ATTACHMENT_CONTENT", attachment_text)
            if attachment_text:
                content_parts.append(attachment_text)
        else:
            self.set_output("_ARTIFACTS", None)
            self.set_output("_ATTACHMENT_CONTENT", "")
        self.set_output("content", "\n\n".join([part for part in content_parts if part]).strip())
        return self.output()

    def _build_http_execution_metadata(self, body: Mapping | None) -> dict:
        if not isinstance(body, Mapping):
            return {}
        structured_result = body.get("result")
        if not isinstance(structured_result, Mapping):
            return {}
        return {
            "result_present": structured_result.get("present", False),
            "result_value": structured_result.get("value"),
            "result_type": structured_result.get("type"),
        }

    def _resolve_execution_result_value(self, stdout: str, execution_metadata: Mapping | None = None):
        metadata = execution_metadata or {}
        if metadata.get("result_present") is True:
            return metadata.get("result_value"), False
        return self._deserialize_stdout(stdout), True
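Reviewer note: the precedence between structured metadata and stdout can be illustrated with a small standalone sketch. `resolve_result` is a hypothetical name, and `json.loads` with a raw-string fallback is only an assumed approximation of `_deserialize_stdout`, whose body is not fully shown in this diff.

```python
import json
from typing import Optional


def resolve_result(stdout: str, metadata: Optional[dict] = None):
    """Return (value, used_stdout_fallback), mirroring the method above."""
    metadata = metadata or {}
    # Only the literal True counts; truthy strings such as "yes" do not.
    if metadata.get("result_present") is True:
        return metadata.get("result_value"), False
    try:
        return json.loads(stdout), True  # assumed stand-in for _deserialize_stdout
    except json.JSONDecodeError:
        return stdout, True


print(resolve_result('{"a": 1}', {"result_present": True, "result_value": None}))
print(resolve_result('{"a": 1}'))
```

The first call shows why the flag exists: a structured result of `None` is a legitimate value and takes precedence over whatever the script printed.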
    @classmethod
    def _ensure_bucket_lifecycle(cls):
        if cls._lifecycle_configured:
            return
        try:
            storage = settings.STORAGE_IMPL
            # Only MinIO/S3 backends expose .conn for lifecycle config
            if not hasattr(storage, "conn") or storage.conn is None:
                cls._lifecycle_configured = True
                return
            if not storage.conn.bucket_exists(SANDBOX_ARTIFACT_BUCKET):
                storage.conn.make_bucket(SANDBOX_ARTIFACT_BUCKET)
            from minio.commonconfig import Filter
            from minio.lifecycleconfig import Expiration, LifecycleConfig, Rule
            rule = Rule(
                rule_id="auto-expire",
                status="Enabled",
                rule_filter=Filter(prefix=""),
                expiration=Expiration(days=SANDBOX_ARTIFACT_EXPIRE_DAYS),
            )
            storage.conn.set_bucket_lifecycle(SANDBOX_ARTIFACT_BUCKET, LifecycleConfig([rule]))
            logging.info(f"[CodeExec]: Set {SANDBOX_ARTIFACT_EXPIRE_DAYS}-day lifecycle on bucket '{SANDBOX_ARTIFACT_BUCKET}'")
            cls._lifecycle_configured = True
        except Exception as e:
            # Do NOT set _lifecycle_configured so we retry next time
            logging.warning(f"[CodeExec]: Failed to set bucket lifecycle: {e}")

    def _upload_artifacts(self, artifacts: list) -> list[dict]:
        self._ensure_bucket_lifecycle()
        uploaded = []
        for art in artifacts:
            try:
                name = _art_field(art, "name")
                content_b64 = _art_field(art, "content_b64")
                mime_type = _art_field(art, "mime_type")
                size = _art_field(art, "size", 0)
                if not content_b64 or not name:
                    continue
                ext = os.path.splitext(name)[1].lower()
                storage_name = f"{uuid.uuid4().hex}{ext}"
                binary = base64.b64decode(content_b64)
                settings.STORAGE_IMPL.put(SANDBOX_ARTIFACT_BUCKET, storage_name, binary)
                url = f"/v1/document/artifact/{storage_name}"
                uploaded.append(
                    {
                        "name": name,
                        "url": url,
                        "mime_type": mime_type,
                        "size": size,
                    }
                )
                logging.info(f"[CodeExec]: Uploaded artifact {name} -> {url}")
            except Exception as e:
                logging.warning(f"[CodeExec]: Failed to upload artifact: {e}")
        return uploaded

    def _encode_code(self, code: str) -> str:
        return base64.b64encode(code.encode("utf-8")).decode("utf-8")
@@ -243,139 +564,84 @@ class CodeExec(ToolBase, ABC):
continue
return text
def _coerce_output_value(self, value, expected_type: Optional[str]):
if expected_type is None:
return value
etype = expected_type.strip().lower()
inner_type = None
if etype.startswith("array<") and etype.endswith(">"):
inner_type = etype[6:-1].strip()
etype = "array"
def _apply_business_output(self, parsed_stdout) -> str:
normalized_result = normalize_output_value(parsed_stdout)
self.set_output("raw_result", normalized_result)
business_output_names = [name for name in self._param.outputs if name not in SYSTEM_OUTPUT_KEYS]
try:
if etype == "string":
return "" if value is None else str(value)
contract = build_code_exec_contract(self._param.outputs, normalized_result)
except ContractError as e:
for output_name in business_output_names:
self.set_output(output_name, None)
self.set_output("actual_type", infer_actual_type(normalized_result))
self.set_output("_ERROR", str(e))
logging.warning(f"[CodeExec]: contract validation failed: {e}")
return render_canonical_content(normalized_result)
if etype == "number":
if value is None or value == "":
return None
if isinstance(value, (int, float)):
return value
if isinstance(value, str):
try:
return float(value)
except Exception:
return value
return float(value)
self.set_output("actual_type", contract["actual_type"])
self.set_output(contract["business_output"], contract["value"])
return contract["content"]
if etype == "boolean":
if isinstance(value, bool):
return value
if isinstance(value, str):
lv = value.lower()
if lv in ("true", "1", "yes", "y", "on"):
return True
if lv in ("false", "0", "no", "n", "off"):
return False
return bool(value)
def _build_attachment_content(self, artifacts: list, artifact_urls: list[dict] | None = None) -> str:
sections = []
artifact_urls = artifact_urls or []
if etype == "array":
candidate = value
if isinstance(candidate, str):
parsed = self._deserialize_stdout(candidate)
candidate = parsed
if isinstance(candidate, tuple):
candidate = list(candidate)
if not isinstance(candidate, list):
candidate = [] if candidate is None else [candidate]
if inner_type == "string":
return ["" if v is None else str(v) for v in candidate]
if inner_type == "number":
coerced = []
for v in candidate:
try:
if v is None or v == "":
coerced.append(None)
elif isinstance(v, (int, float)):
coerced.append(v)
else:
coerced.append(float(v))
except Exception:
coerced.append(v)
return coerced
return candidate
if etype == "object":
if isinstance(value, dict):
return value
if isinstance(value, str):
parsed = self._deserialize_stdout(value)
if isinstance(parsed, dict):
return parsed
return value
except Exception:
return value
return value
def _populate_outputs(self, parsed_stdout, raw_stdout: str):
outputs_items = list(self._param.outputs.items())
logging.info(f"[CodeExec]: outputs schema keys: {[k for k, _ in outputs_items]}")
if not outputs_items:
return
if isinstance(parsed_stdout, dict):
for key, meta in outputs_items:
if key.startswith("_"):
for idx, art in enumerate(artifacts, start=1):
key = f"attachment{idx}"
try:
name = _art_field(art, "name")
content_b64 = _art_field(art, "content_b64")
mime_type = _art_field(art, "mime_type")
if not name or not content_b64:
continue
val = self._get_by_path(parsed_stdout, key)
if val is None and len(outputs_items) == 1:
val = parsed_stdout
coerced = self._coerce_output_value(val, meta.get("type"))
logging.info(f"[CodeExec]: populate dict key='{key}' raw='{val}' coerced='{coerced}'")
self.set_output(key, coerced)
return
if isinstance(parsed_stdout, (list, tuple)):
for idx, (key, meta) in enumerate(outputs_items):
if key.startswith("_"):
continue
val = parsed_stdout[idx] if idx < len(parsed_stdout) else None
coerced = self._coerce_output_value(val, meta.get("type"))
logging.info(f"[CodeExec]: populate list key='{key}' raw='{val}' coerced='{coerced}'")
self.set_output(key, coerced)
return
blob = base64.b64decode(content_b64)
parsed = FileService.parse(
name,
blob,
False,
tenant_id=self._canvas.get_tenant_id(),
)
attachment_type = self._normalize_attachment_type(name, mime_type)
section = self._format_attachment_section(key, attachment_type, name, parsed)
sections.append(section)
logging.info(f"[CodeExec]: parse attachment section key='{key}' from artifact='{name}'")
except Exception as e:
logging.warning(f"[CodeExec]: Failed to parse artifact for content section '{key}': {e}")
fallback_type = self._normalize_attachment_type(name, mime_type)
fallback_name = name
fallback_url = ""
if idx - 1 < len(artifact_urls):
fallback_url = artifact_urls[idx - 1].get("url", "")
fallback_text = "Artifact generated but parse failed."
if fallback_url:
fallback_text += f" Download: {fallback_url}"
sections.append(self._format_attachment_section(key, fallback_type, fallback_name, fallback_text))
default_val = parsed_stdout if parsed_stdout is not None else raw_stdout
for idx, (key, meta) in enumerate(outputs_items):
if key.startswith("_"):
continue
val = default_val if idx == 0 else None
coerced = self._coerce_output_value(val, meta.get("type"))
logging.info(f"[CodeExec]: populate scalar key='{key}' raw='{val}' coerced='{coerced}'")
self.set_output(key, coerced)
if sections:
return f"attachment_count: {len(sections)}\n\n" + "\n\n".join(sections)
return "attachment_count: 0"
def _get_by_path(self, data, path: str):
if not path:
return None
cur = data
for part in path.split("."):
part = part.strip()
if not part:
return None
if isinstance(cur, dict):
cur = cur.get(part)
elif isinstance(cur, list):
try:
idx = int(part)
cur = cur[idx]
except Exception:
return None
else:
return None
if cur is None:
return None
logging.info(f"[CodeExec]: resolve path '{path}' -> {cur}")
return cur
    def _normalize_attachment_type(self, name: str, mime_type: str) -> str:
        mime_type = str(mime_type or "").strip().lower()
        if mime_type.startswith("image/"):
            return "image"
        if mime_type == "application/pdf":
            return "pdf"
        if mime_type == "text/csv":
            return "csv"
        if mime_type == "application/json":
            return "json"
        if mime_type == "text/html":
            return "html"
        ext = os.path.splitext(name or "")[1].lower().lstrip(".")
        return ext or "file"

    def _format_attachment_section(self, key: str, attachment_type: str, name: str, parsed: str) -> str:
        title = f"{key} ({attachment_type})"
        if name:
            title += f": {name}"
        body = parsed if isinstance(parsed, str) else json.dumps(parsed, ensure_ascii=False)
        return f"{title}\n{body}".strip()
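Reviewer note: to make the shape of the attachment text concrete, here is a standalone sketch that assembles two sections the way `_build_attachment_content` and `_format_attachment_section` appear to. `format_section` is a hypothetical free-function copy of the method above, and the parsed bodies are made-up placeholders rather than real `FileService.parse` output.

```python
import json


def format_section(key: str, attachment_type: str, name: str, parsed) -> str:
    """Free-function copy of _format_attachment_section, for illustration."""
    title = f"{key} ({attachment_type})"
    if name:
        title += f": {name}"
    # Non-string parse results are serialized as compact JSON.
    body = parsed if isinstance(parsed, str) else json.dumps(parsed, ensure_ascii=False)
    return f"{title}\n{body}".strip()


sections = [
    format_section("attachment1", "image", "chart.png", "placeholder image description"),
    format_section("attachment2", "csv", "data.csv", [["x", "y"], [1, 10]]),
]
content = f"attachment_count: {len(sections)}\n\n" + "\n\n".join(sections)
print(content)
```

Downstream nodes can then read a single text output and split on the `attachmentN (type): name` headers if they need per-file text.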

Some files were not shown because too many files have changed in this diff.