## Summary
- Add optional `region` parameter to `Minio()` client constructor in
`rag/utils/minio_conn.py`
- Reads from `MINIO.region` in settings, defaults to `None` when not
configured
- Required by some S3-compatible storage services (e.g., AWS S3, Tencent
COS) for proper bucket access
## Motivation
When using RAGFlow with S3-compatible storage that requires a region
(such as AWS S3 or Tencent Cloud COS), the MinIO client fails to access
buckets because the `region` parameter is not passed through.
The `Minio()` Python client already supports the `region` parameter
natively — this PR simply wires it up from the RAGFlow configuration.
## Changes
- `rag/utils/minio_conn.py`: Pass `region=settings.MINIO.get("region",
None) or None` to `Minio()` constructor
## Backward Compatibility
- No breaking changes. When `region` is not configured, it defaults to
`None`, preserving the existing behavior exactly.
## Test Plan
- [ ] Verified with MinIO (no region set) — works as before
- [x] Verified with S3-compatible storage requiring region — bucket
access succeeds
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Enhanced MinIO client initialization with regional configuration
support for improved compatibility with region-specific deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Jarry Wang <code-better-life@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add float parsing
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
api_host -> webAPI
ExternalApi -> restAPIv1
### Type of change
- [x] Refactoring
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Updated internal API endpoint configuration to use consolidated base
URL constants for improved maintainability and consistency across the
application.
* **Chores**
* Updated server-side protocol validation for admin connectivity checks.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: The agent selected a knowledge base, but the API returned the
error: "No dataset is selected".
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
This fixes two broken internal documentation links in the guides:
- `docs/develop/mcp/launch_mcp_server.md` linked
`./acquire_ragflow_api_key.md`, but the target page lives one level up
as `../acquire_ragflow_api_key.md`.
- `docs/guides/dataset/run_retrieval_test.md` linked
`./construct_knowledge_graph.md`, but the actual page lives under
`./advanced/construct_knowledge_graph.md`.
These broken links make it harder to follow the MCP and retrieval-test
docs from the local docs tree.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Refactor context search command
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Fix `a image` → `an image` in README and log message
- Fix `colomn` → `column` in table structure recognizer comment
- Fix `formated` → `formatted` in confluence connector docstring
- Fix `tabel of content` → `table of contents` in TOC prompt
## Test plan
- [ ] Documentation and comment changes, no functional impact
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add validation logic for parser_config.
Refactor the processing flow. Before change, validation logics and
update logics are mixed up - some validation logis executes followed by
some update logic executes and then another such
"validation-and-then-update" which is not good. After change, all
validation logic executes firstly. Update logic will be executed after
ALL validation logic executed.
Validation logic for parameters (that come from front end) will be
checked using Pydantic. For validation logic that depends on data from
DB, they will be in separate methods.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
fix#13944 where OpenAI-compatible custom endpoints failed verification
when model names contained `gpt-5` becauser of incorrect name-based
handling in the Base/backend=`base` path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The MySQL and PostgreSQL sync classes in `sync_data_source.py` were not
passing `id_column`, `timestamp_column`, and `metadata_columns` to
`RDBMSConnector`,
making incremental sync and document update impossible even when
configured.
- Without `id_column`: updated records generate new documents instead of
overwriting existing ones (doc ID is derived from content hash, so any
change produces a new ID).
- Without `timestamp_column`: `poll_source` always falls back to full
sync,
ignoring the configured time range.
- The three fields existed in the frontend default values but had no
form
inputs, so users had no way to fill them in.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### Changes
- **Backend** (`rag/svr/sync_data_source.py`): pass `id_column`,
`timestamp_column`, and `metadata_columns` from `self.conf` to
`RDBMSConnector` for both `MySQL` and `PostgreSQL` sync classes.
- **Frontend**
(`web/src/pages/user-setting/data-source/constant/index.tsx`):
add `ID Column`, `Timestamp Column`, and `Metadata Columns` form fields
to MySQL and PostgreSQL data source configuration UI with tooltips.
Signed-off-by: lixintao <lixintao@uniontech.com>
Co-authored-by: lixintao <lixintao@uniontech.com>
### What problem does this PR solve?
Implement UpdateDataset and UpdateMetadata in GO
Add cli:
UPDATE CHUNK <chunk_id> OF DATASET <dataset_name> SET <update_fields>
REMOVE TAGS 'tag1', 'tag2' from DATASET 'dataset_name';
SET METADATA OF DOCUMENT <doc_id> TO <meta>
### Type of change
- [ ] Refactoring
### What problem does this PR solve?
Add a script to migrate data in tenant_llm into tenant_model_provider.
### Type of change
- [x] Other (please describe): tool script.
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
Now user can use 'think mode' to chat with LLM
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Problem Description
When a user creates Dataset A using the **Tag parser** (for CSV/Excel
files with tag definitions), and then creates Dataset B, the Tag Sets
dropdown in Dataset B's Configuration page cannot display Dataset A.
### Steps to Reproduce
1. Create Dataset A with **Tag** as the chunking method
2. Upload a CSV file to Dataset A to generate tags
3. Create Dataset B
4. Navigate to Dataset B → Configuration → Tag Sets
5. **Expected**: Dataset A should appear in the dropdown
6. **Actual**: The dropdown is empty, Dataset A is not visible
---
## Root Cause Analysis
After thorough code review, **the original code logic is correct**. The
`chunk_method` field flows properly through the system:
### Data Flow
```mermaid
sequenceDiagram
participant Frontend
participant Pydantic
participant API
participant Database
Note over Frontend,Database: Creating a Tag Dataset
Frontend->>Pydantic: POST {chunk_method: "tag"}
Pydantic->>API: serialization_alias converts<br/>chunk_method → parser_id
API->>Database: INSERT {parser_id: "tag"}
Note over Frontend,Database: Querying Datasets
Frontend->>API: GET /api/v1/datasets
API->>Database: SELECT parser_id, ...
Database-->>API: Returns {parser_id: "tag"}
API->>API: remap_dictionary_keys()<br/>parser_id → chunk_method
API-->>Frontend: {chunk_method: "tag"}
Note over Frontend: Filter: x.chunk_method === 'tag'
Note over Frontend: ✅ Match found!
```
### Field Mapping
**Location**: `api/utils/api_utils.py:657-662`
```python
DEFAULT_KEY_MAP = {
"chunk_num": "chunk_count",
"doc_num": "document_count",
"parser_id": "chunk_method", # Maps DB field to API response
"embd_id": "embedding_model",
}
```
### Frontend Filtering (Already Correct)
**Location**:
`web/src/pages/dataset/dataset-setting/components/tag-item.tsx:24`
```typescript
const knowledgeOptions = knowledgeList
.filter((x) => x.chunk_method === 'tag') // ✅ Correct field
.map((x) => ({...}));
```
---
## Actual Issue
The most likely causes for the "bug" are:
1. **Browser Cache**: Old data cached before proper deployment
2. **Stale Data**: Datasets created before the code was fully deployed
3. **Container Not Restarted**: Changes not applied to running container
---
## Resolution
**No code changes are needed.** The existing code correctly:
1. Accepts `chunk_method` from frontend
2. Converts to `parser_id` via Pydantic serialization_alias
3. Stores in database as `parser_id`
4. Maps back to `chunk_method` in API response
5. Frontend filters by `chunk_method === 'tag'`
### What problem does this PR solve?
Update the customer feedback dispatcher template and introduce a new
operator `Variable Aggregator`.
### Type of change
- [x] Other (please describe): Template change
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Place the language configuration in web/.env for easy user
configuration.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
\`switch.py\` line 137 concatenates the operator directly after the text
without separator:
\`'Not supported operator' + operator\` → produces \`"Not supported
operatorXXX"\`
Changed to: \`f'Not supported operator: {operator}'\`
### What problem does this PR solve?
feat(File Management): Refactor File List API and Add Knowledge Base
Document Initialization
- Migrate the file list API endpoint from `/v1/file/list` to
`/api/v1/files` to align with the Python implementation.
- Add logic for initializing knowledge base documents; automatically
create the `.knowledgebase` folder and associated documents when
retrieving the root directory.
- Enhance parameter validation and error handling, including the
introduction of a new `CodeParamError` error code.
- Optimize the file list response structure to match the implementation
on the Python side.
- Update the Vite configuration to support proxying the new
`/api/v1/files` endpoint.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- The Azure SPN storage handler hardcoded
`AzureAuthorityHosts.AZURE_CHINA`, preventing users in Azure Public
Cloud regions (UK-South, EU, US, etc.) from authenticating
- Add a `cloud` config option (env: `AZURE_CLOUD`) supporting all four
Azure sovereignties: `public`, `china`, `government`, `germany`
- Defaults to `public` (global Azure) — the most common international
use case
Closes#13259
## Test plan
- [ ] Verify default (`cloud: public`) connects to Azure Public Cloud
endpoints
- [ ] Verify `cloud: china` retains existing behavior for Azure China
users
- [ ] Verify `AZURE_CLOUD` env var overrides the config file value
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Replace `quay.io/minio/minio` with `pgsty/minio` community fork in
`docker/docker-compose-base.yml`
MinIO stopped distributing pre-built Docker images and changed its
license. The pgsty/minio fork provides drop-in compatible images under
AGPLv3.
Closes#13840
## Test plan
- [x] Verify `docker compose -f docker/docker-compose-base.yml up -d`
pulls the pgsty/minio image successfully
- [ ] Verify MinIO console accessible on port 9001
- [ ] Verify RAGFlow backend can connect to MinIO and perform file
operations normally
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
feat: Implement file upload and folder creation features
- Add file upload route in router.go
- Add file operation methods in dao/file.go
- Add util/file.go for file type detection and filename handling
- Implement file upload and folder creation endpoints in handler/file.go
- Implement file upload and folder creation logic in service/file.go
- Modify response message format in memory.go
- Add document count method in dao/document.go
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Introduce 5 new tables, including model groups and provider instance.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Search() in Infinity can return row_id now
2. To Get ROW_ID from search(), refer to handling of retrieval_test.
example
```
$ curl -s -X POST "http://localhost:$PORT/v1/chunk/retrieval_test" -H "Authorization: $TOKEN" -H "Content-Type: application/json" -d '{"kb_id": "4fcd01582ca911f1954184ba59049aa3", "question": "曹操"}'
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR fixes a race in batch document parsing where overlapping parse
requests for the same document could clear/rewrite chunk state and make
previously parsed content appear lost. It adds an atomic per-document
parse guard so only one parse can run at a time for that document (Fixes
#13864 ).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR fixes WebDAV sync behavior for unsupported file types
([#13795](https://github.com/infiniflow/ragflow/issues/13795)).
Previously, the WebDAV connector selected files primarily by modified
time (and size threshold) and could still pass unsupported extensions
into the download/document-generation path. This caused unnecessary
processing and inconsistent behavior compared with connectors that
validate file type earlier.
This change adds extension validation in two places:
1. **Early filter during recursive listing** to skip unsupported files
before they enter the download flow.
2. **Defensive filter before download/document creation** to prevent
unsupported files from being processed if any listing edge case slips
through.
It also wires `allow_images` into the WebDAV sync path so image
extension handling follows connector policy.
Scope is intentionally limited to WebDAV for a focused bug-fix PR.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### How was this tested?
- Manual verification with mixed file types under the configured WebDAV
path:
- supported: `.pdf`, `.txt`, `.md`
- unsupported: `.exe`, `.bin`, `.dat`
- Triggered full sync and polling sync.
- Confirmed unsupported files are skipped before download.
- Confirmed supported files are still indexed normally.
- Confirmed image handling follows `allow_images` setting.
Fixes: #13795
### What problem does this PR solve?
Fixes markdown tables being parsed twice (once as markdown and again as
generated HTML), which caused duplicate table chunks in the chunk list
UI.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Two small fixes:
1. **iterationitem.py line 72**: Typo "interationitem" → "iterationitem"
(missing 't'). The component name check never matched IterationItem
components.
2. **raptor.py line 94**: Error message "Embedding error: " had a
trailing colon with no details. Changed to "Embedding error: empty
embeddings returned".
### What problem does this PR solve?
Fix: The dataset on the list page cannot be renamed.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Implement InsertDataset and InsertMetadata in GO
new internal cli for go:
INSERT DATASET FROM FILE "file_name"
INSERT METADATA FROM FILE "file_name"
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: If a model configured in the agent is deleted from the user
center, a notification will be displayed on the canvas with a red
border.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
As title.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: The agent form sheet will be obscured by the message log sheet.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Previously, `apikey_required` called
`request.headers.get('Authorization').split()[1]` without checking for
None or insufficient parts, causing an unhandled AttributeError or
IndexError (500) instead of a proper 403 JSON response.
This applies the same guarding pattern already used by `token_required`
in the same file.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Fix: Unable to reconnect after deleting the connection between begin and
parser #13868
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: The chat settings are not displayed correctly on the first page
load.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)