7135 Commits

Author SHA1 Message Date
euvre
e65bac238e fix: preserve existing links when bulk linking files to knowledge bases (#16587) 2026-07-03 13:17:19 +08:00
Wang Qi
6a4b9be426 Refactor: reformat all code for lefthook using ruff and gofmt (#16585) dev-20260703-2 2026-07-03 12:53:39 +08:00
Yingfeng
19fcb4a981 Fix harness DAG slow-branch test caused by nil initialization of pregel engine (#16591) dev-20260703 2026-07-03 12:53:25 +08:00
euvre
918229613a fix: prevent duplicate 'skills' and '.knowledgebase' folders caused by race conditions (#16568) 2026-07-03 12:06:45 +08:00
Muhammad Furqan
83540185e1 fix(agent/tools): port AkShare to ToolBase so it works as an Agent tool (#16417)
### What problem does this PR solve?

Closes #16416.

The **AkShare** agent tool (`agent/tools/akshare.py`) was never ported
to the modern `ToolBase`/`_invoke` interface during the agent module
redesign and was still written against the removed legacy
`_run`/`be_output` API, so it was non-functional:

1. **Adding it to an Agent raised `AttributeError`.** `AkShare` extended
`ComponentBase` (not `ToolBase`) and `AkShareParam` defined no `meta`,
so it had no `get_meta()`. `agent/component/agent_with_tools.py` builds
each tool's function descriptor via `cpn.get_meta()`, so constructing an
Agent that includes the AkShare tool raised `AttributeError: 'AkShare'
object has no attribute 'get_meta'`.
2. **It could never run.** `invoke()` dispatches to `self._invoke`, but
`AkShare` only implemented the legacy `_run`, so `_invoke` fell through
to `ComponentBase._invoke` → `NotImplementedError`. `_run` also called
`be_output(...)`, which no longer exists on the base classes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Port `AkShareParam` to `ToolParamBase` with a `ToolMeta` (defined
before `super().__init__()`, matching `ArXivParam`/`TavilyExtractParam`)
exposing a required `query` parameter — the stock symbol to look up,
default `{sys.query}`. `query` matches the `{sys.query}` convention
shared by the other tools.
- Rewrite the component with `_invoke`/`set_output("formalized_content",
...)` (errors surfaced via `_ERROR`), keeping `top_n` and importing
`akshare` lazily.
- Add regression tests
(`test/unit_test/agent/component/test_akshare.py`) covering param
construction, validation, and the tool descriptor.

Same class of defect as #16329 (DeepL) and #16414 (Crawler).
Backend-only; no frontend changes.

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-07-03 11:39:26 +08:00
Jin Hai
1aa8abe373 Go: file syncer service framework (#16579)
### Summary

Run `./ragflow_main --syncer` to start the file syncer.

The YAML config file has the following settings:
```
file_syncer:
  max_concurrent_syncs: 4 # concurrent file sync threads
  sync_interval: 3 # check interval

```

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-07-03 11:14:02 +08:00
Kevin Hu
62f94cd59b Feat: Add knowledge compilation workflows (#16515)
## Summary
- Add knowledge compilation template APIs, services, and builtin
template seed data
- Add advanced knowledge compile structure/artifact/RAPTOR workflow
support
- Update parsing, dataset/document APIs, and supporting services for
compilation workflows
2026-07-02 23:22:07 +08:00
Jin Hai
7d64a78f83 Go: unify three services into one binary (#16462)
### Summary

Plan to start api_server, admin_server and ingestor in one binary:
- ./ragflow_main --admin
- ./ragflow_main --api
- ./ragflow_main --ingestor
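
The mode selection described above could be sketched with the standard `flag` package. Only the three flag names come from the commit message; `selectMode` and the rule that exactly one mode must be chosen are assumptions made for illustration, not the actual implementation.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// selectMode parses the arguments and reports which service to start.
// Hypothetical helper: the real binary may combine or default modes differently.
func selectMode(args []string) (string, error) {
	fs := flag.NewFlagSet("ragflow_main", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	admin := fs.Bool("admin", false, "start the admin server")
	api := fs.Bool("api", false, "start the API server")
	ingestor := fs.Bool("ingestor", false, "start the ingestor")
	if err := fs.Parse(args); err != nil {
		return "", err
	}

	var selected []string
	if *admin {
		selected = append(selected, "admin")
	}
	if *api {
		selected = append(selected, "api")
	}
	if *ingestor {
		selected = append(selected, "ingestor")
	}
	// Assumption: exactly one mode per process.
	if len(selected) != 1 {
		return "", errors.New("exactly one of --admin, --api, --ingestor is required")
	}
	return selected[0], nil
}

func main() {
	// A real entry point would pass os.Args[1:] instead of fixed examples.
	for _, args := range [][]string{{"--api"}, {"--admin", "--ingestor"}} {
		mode, err := selectMode(args)
		fmt.Printf("args=%v mode=%q err=%v\n", args, mode, err)
	}
}
```

The `flag` package treats one and two leading dashes as equivalent, so `--api` and `-api` select the same mode.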

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-07-02 21:21:10 +08:00
Liu An
32c5cb16e9 Docs: Update version references to v0.26.3 in READMEs and docs (#16574) v0.26.3 2026-07-02 20:55:15 +08:00
Wang Qi
93f6d647d4 Fix the sandbox exec image cannot show and download (#16577) 2026-07-02 20:49:51 +08:00
maoyifeng
4a81b9cfde fix workflow file type Identify (#16576)
2026-07-02 20:41:14 +08:00
Lynn
bc54903bf6 Fix: display model_id in memory_list (#16567) 2026-07-02 20:28:27 +08:00
chanx
9a6d30bfe6 Fix: send agent log date filters as local wall-clock strings (#16575) 2026-07-02 20:23:15 +08:00
qinling0210
dcbd0d260c Port agent PRs to GO - 2 (#16565)
### Summary

Port the following PRs to GO in this PR

https://github.com/infiniflow/ragflow/pull/16420
https://github.com/infiniflow/ragflow/pull/13295
2026-07-02 20:20:11 +08:00
qinling0210
24118ac0d1 Fix chat thinking & Figure issue in GO (#16558)
### Summary

Fix chat thinking & Figure issue
2026-07-02 20:19:50 +08:00
Hz_
42aba36c1b fix(go): chunk stats after chunk deletion (#16553)
## Summary
- Decrement document and knowledgebase chunk counts after chunks are
deleted
- Keep token counts unchanged because deleted chunk token totals are not
available
- Add tests for stats update, zero-delete behavior, error handling, and
transaction rollback
2026-07-02 19:54:42 +08:00
Hz_
dfd95c9c5c fix(go): Add tenant filter to file queries (#16526)
## Summary

- Add `tenant_id` filtering to `FileDAO.Query`.
- Pass tenant IDs through existing file query call sites.
- Prevent cross-tenant filename and folder duplicate checks.
2026-07-02 19:54:22 +08:00
Jin Hai
11dfea489d Fix Go: fix minio port issue (#16552)
### Summary

1. The `MINIO_PORT` environment variable is used for external MinIO access
and shouldn't be used in the Go config.
2. After the RAGFlow 1.0 release, `MINIO_PORT` will be used for internal
Docker Compose access. A new environment variable, `MINIO_EXTERNAL_PORT`,
will be used for external access.

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-07-02 19:15:58 +08:00
euvre
fc9116578c Fix: PDF page count detection for compressed PDFs (#16487) 2026-07-02 19:08:49 +08:00
Wang Qi
f7e39a09dc Fix graphrag generate error - AttributeError: 'RedisDB' object has no attribute 'mget' (#16573) 2026-07-02 19:06:16 +08:00
Jack
6ea95807be Fix: disable agent tests (#16562)
### Summary

Per discussion with @yuzhichang, disable the agent tests for now.


https://github.com/infiniflow/ragflow/actions/runs/28562749273/job/84704079689?pr=16521
[0.094ms] [rows:0] SELECT * FROM `tenant_model_instance` WHERE
provider_id = "provider-1" AND instance_name = "default" ORDER BY
`tenant_model_instance`.`id` LIMIT 1
  --- FAIL: TestInvoke_ProxyDNSPin (2.00s)
invoke_test.go:375: dial error = Invoke: do: Get "http://8.8.8.8/api":
context deadline exceeded; want pinned proxy IP 192.88.99.1:9999
(connection-refused is acceptable; an absent IP means the dialer fell
through to the default resolver and the pinning regression went
undetected)
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.358ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.283ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58
/home/infiniflow/runners_work/tower01-9dd627fd9c44/ragflow/ragflow/internal/dao/kb.go:79
record not found
[0.523ms] [rows:0] SELECT * FROM `knowledgebase` WHERE id = "da1" AND
status = "1" ORDER BY `knowledgebase`.`id` LIMIT 1
2026/07/02 14:34:58 ExpectPing will have no effect as monitoring pings
is disabled. Use MonitorPingsOption to enable.
  FAIL
  FAIL    ragflow/internal/agent/component    2.759s
  ok      ragflow/internal/agent/component/io    0.026s
2026-07-02 18:50:20 +08:00
Renzo
7d422ba67d feat(go): implement chatbots/<dialog_id>/info and searchbots/detail (#15420)
### What problem does this PR solve?

Part of #15240 (rewriting the RAGFlow API server in Go).

Implements the two public bot endpoints from
`api/apps/restful_apis/bot_api.py`:

- **`GET /api/v1/chatbots/<dialog_id>/info`** (`chatbots_inputs`) —
returns `{title, avatar, prologue, has_tavily_key}` for a dialog the
authenticated tenant owns (tenant match + `status == VALID`), otherwise
`"Authentication error: no access to this chatbot!"`.
- **`GET /api/v1/searchbots/detail`** (`detail_share_embedded`) —
returns search-app detail for a `search_id` the tenant can access.
Permission is checked across the tenant's joined tenants; denial returns
`"Has no permission for this operation."` (operating error, `data:
false`) and a missing app returns `"Can't find this Search App!"`.

Both endpoints authenticate with an SDK **beta token** (`Authorization:
Bearer <beta>`) rather than a session — the token is resolved to a
tenant via `APIToken.query(beta=token)`, backed by a new
`APITokenDAO.GetByBeta`. Because they perform their own token-based
auth, the routes are registered on the unauthenticated route group
(mirroring the Python blueprint, which has no `@login_required`).

Both live in a new `internal/handler/bot.go` + `internal/service/bot.go`
since they share the same source module. Handler unit tests cover the
auth, success, and error-mapping paths.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ling Qin <qinling0210@163.com>
2026-07-02 18:46:00 +08:00
Jack
7ae18a45ee Fix: correct download_deps.py path in error messages and add native libs doc (#16557)
## Summary

Fix error messages in `build.sh` and add documentation in
`internal/development.md` for downloading native static libraries
(pdfium, pdf_oxide, office_oxide).

## Changes

- `build.sh`: change error hint from `uv run download_deps.py` to `uv
run ragflow_deps/download_deps.py` (correct path from project root)
- `internal/development.md`: add section 2.1 documenting how to download
native libs and install lld
2026-07-02 18:41:39 +08:00
maoyifeng
3e7e5f4f6a add web and build start steps (#16572)
### Summary

update ci
2026-07-02 18:17:06 +08:00
writinwaters
ce8941ded4 Docs: Added v0.26.3 release notes. (#16566) 2026-07-02 17:50:14 +08:00
chanx
16b8c79a2b Fix: hide model settings button and related functionality (#16563) 2026-07-02 17:49:52 +08:00
chanx
2ef78189ce Fix: pass mcp to useExportMcp for correct JSON export filename (#16564) 2026-07-02 17:49:46 +08:00
chanx
c44d56f1bb Fix: enhance reference handling in SessionChat component (#16571) 2026-07-02 17:48:48 +08:00
Hz_
d31640a7a2 fix(go): shared chatbot session id length (#16559)
## Summary
- use the project-standard 32-character ID generator when creating
shared chatbot sessions
- fix MySQL insert failures caused by writing 36-character UUID strings
into `api_4_conversation.id`
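
The length mismatch can be illustrated as follows. This is a sketch only: `newID32` is a hypothetical generator (16 random bytes, hex-encoded), not the project's actual ID function, and `dashed` merely reproduces the canonical 8-4-4-4-12 UUID layout to show where the extra four characters come from.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newID32 returns 16 random bytes encoded as 32 lowercase hex characters.
// Hypothetical stand-in for the project-standard generator.
func newID32() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

// dashed formats a 32-character hex ID in the 8-4-4-4-12 UUID layout,
// which is 36 characters long because of the four hyphens.
func dashed(id string) string {
	return id[0:8] + "-" + id[8:12] + "-" + id[12:16] + "-" + id[16:20] + "-" + id[20:32]
}

func main() {
	id, err := newID32()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(id), len(dashed(id))) // prints "32 36"
}
```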
2026-07-02 17:42:33 +08:00
Jack
c8cf0c967d Feat: add DOCX parser (#16521)
### Summary

Add DOCX parser - go.
2026-07-02 16:31:09 +08:00
Haruko386
9c8d8c7b83 fix: unable to load pic in chunk result (#16485)
### Summary

As title:
2026-07-02 16:05:49 +08:00
Haruko386
3a5bc1371a fix: unable to build go backend (#16542) 2026-07-02 15:57:51 +08:00
Haruko386
92e8eb5fe7 fix: add search keywords and filter for datasets-search (#16550) 2026-07-02 15:57:07 +08:00
Wang Qi
4130091b69 [Python] 1, Fix to allow single login, 2, update password to force re-login (#16556) 2026-07-02 15:47:51 +08:00
Hz_
cbb24944e8 fix(go): clear task cancel signals and chunk counters on rerunWithDelete (#16544) 2026-07-02 15:46:11 +08:00
Hz_
fa1b52ca74 fix(go): prevent moving folders into themselves (#16522) 2026-07-02 15:45:30 +08:00
maoyifeng
404ef4ce87 workflow steps separated to go or python (#16561)
Add a new workflow YAML file with steps separated into Go and Python.
2026-07-02 15:02:11 +08:00
Jin Hai
0b9ab12c58 Go: fix lint (#16533)
### Summary

as title.

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-07-02 13:44:05 +08:00
grandpig
17e3e34e78 refactor: use WaitGroup.Go to simplify code (#16539)
### Summary

Adopt sync.WaitGroup.Go (Go 1.25) to simplify tracked goroutine
spawning. This replaces the error-prone trio of wg.Add(1), go func(),
and defer wg.Done() with a single, self-contained call.

More info: https://github.com/golang/go/issues/63796

Signed-off-by: grandpig <grandpig@outlook.com>
2026-07-02 13:41:53 +08:00
Hz_
d0d0339428 fix(go): agent settings update clearing DSL (#16495)
### Summary

This PR fixes a Go backend bug where updating agent settings, such as
description, could clear the agent DSL.

Root cause:
PUT /api/v1/agents/:canvas_id only bound the dsl field in Go. When the
frontend submitted settings without dsl, the service still updated the
canvas with an empty DSL value.

Changes:

- Treat agent updates as partial patches.
- Preserve existing DSL when dsl is not present in the request.
- Update only specified user_canvas fields instead of saving the full
row.
- Add a regression test for settings updates preserving DSL.
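
The partial-patch idea can be sketched with pointer fields, which stay `nil` when a key is absent from the request body. The `agentUpdate` type and `buildUpdates` helper are hypothetical; only the `dsl` and `description` field names are taken from the summary above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentUpdate models the PUT body. Pointer fields are nil when the key is absent.
type agentUpdate struct {
	Description *string          `json:"description"`
	DSL         *json.RawMessage `json:"dsl"`
}

// buildUpdates returns only the columns that the request actually specified,
// so an update without "dsl" leaves the stored DSL untouched.
func buildUpdates(body []byte) (map[string]any, error) {
	var req agentUpdate
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	updates := map[string]any{}
	if req.Description != nil {
		updates["description"] = *req.Description
	}
	if req.DSL != nil {
		updates["dsl"] = string(*req.DSL)
	}
	return updates, nil
}

func main() {
	updates, err := buildUpdates([]byte(`{"description":"new text"}`))
	if err != nil {
		panic(err)
	}
	_, hasDSL := updates["dsl"]
	fmt.Println(len(updates), hasDSL) // prints "1 false"
}
```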

Test:

`go test ./internal/service ./internal/handler`

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-07-02 13:41:24 +08:00
Hz_
a67026f714 fix(go): agent explore thumbnail loading for multiple doc_ids (#16514)
## Summary
- align the Go `/api/v1/thumbnails` endpoint with the frontend request
format for repeated `doc_ids`
- return thumbnail mappings for multiple documents instead of failing on
a single missing document
- preserve Python-compatible thumbnail formatting, including base64
thumbnail passthrough
2026-07-02 12:35:10 +08:00
Hz_
cb8012e30b fix(go): accept disabled chunk filter in list chunks handler (#16532)
### Summary

Fixes a bug in the Go chunk list handler where the `available` query
parser rejected `false` and `0` even though they were documented as
supported values.

This caused requests from the "Disabled" chunk filter to return HTTP 400
and broke the chunk list page when filtering disabled chunks.
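
A minimal sketch of a tolerant parser for such a flag, assuming the parameter is optional. `parseAvailable` is a hypothetical helper, not the handler's actual code; it relies on `strconv.ParseBool`, which accepts `1`, `t`, `true`, `0`, `f`, `false` and their capitalized variants, so it is somewhat more permissive than the documented values.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAvailable interprets the optional "available" query value.
// It returns (nil, nil) when the parameter is absent.
func parseAvailable(raw string) (*bool, error) {
	if raw == "" {
		return nil, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid available value %q", raw)
	}
	return &v, nil
}

func main() {
	for _, raw := range []string{"true", "1", "false", "0", "", "maybe"} {
		v, err := parseAvailable(raw)
		if err != nil {
			fmt.Printf("%q -> error: %v\n", raw, err)
			continue
		}
		if v == nil {
			fmt.Printf("%q -> no filter\n", raw)
			continue
		}
		fmt.Printf("%q -> %v\n", raw, *v)
	}
}
```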
2026-07-02 12:07:19 +08:00
Haruko386
b4825166a7 fix: JSONMap scan in dataset index chunking config (#16489)
### Summary

As title

This PR fixes dataset index task creation failing with `unsupported data
type: entity.JSONMap` when loading document chunking config.

#### issues:
```
2026/06/30 15:19:40 /home/infiniflow/Documents/development/ragflow/internal/dao/document.go:162 
[error] unsupported data type: ragflow/internal/entity.JSONMap
```

#### Changes:
+ Adds the missing GORM type:longtext tag to ParserConfig in
DocumentDAO.GetChunkingConfig.
+ Adds a DAO regression test covering GetChunkingConfig joins across
document, knowledgebase, and tenant while scanning parser_config.
2026-07-02 12:06:53 +08:00
Haruko386
d6b1c5937b fix: get duplicate datasetID when get-Chat (#16498)
### Summary

As title

```go
// Resolve kb_ids to kb_names
kbNames, datasetIDs := s.getDatasetNamesAndIDs(chat.KBIDs)

// Removed by this PR: the loop below appended every dataset ID a second time.
for _, kbID := range chat.KBIDs {
	datasetID, ok := kbID.(string)
	if !ok {
		continue
	}
	datasetIDs = append(datasetIDs, datasetID)
}
```
2026-07-02 12:06:29 +08:00
Haruko386
ee45c97b0b fix: unable to add metadata for file in kb (#16523)
### Summary

As title

Previously, it returned `update success` but never inserted or updated
any metadata.

fixed:

```go
	_, err = s.docEngine.InsertMetadata(nil, []map[string]interface{}{
		{
			"id":          docID,
			"kb_id":       doc.KbID,
			"meta_fields": meta,
		},
	}, tenantID)
```
2026-07-02 12:06:05 +08:00
Br1an
27c9a093bd Fix: close MCP sessions after canvas execution to prevent connection leaks (#13295)
### What problem does this PR solve?

Closes #12962

MCPToolCallSessions created during agent execution (in `Agent.__init__`)
are never explicitly closed. Each session starts its own event loop
thread and opens an SSE/HTTP connection to the MCP server. When the
canvas goes out of scope, these threads and connections remain alive
indefinitely, accumulating over time and causing resource exhaustion
after prolonged use.

### Solution

1. Add a `Graph.close()` method that iterates all components, finds
MCPToolCallSessions held by Agent tools, and calls `close_sync()` on
each to properly shut down the event loop, thread, and connection.
2. Call `canvas.close()` in `finally` blocks after `canvas.run()`
completes in `canvas_service.py` and `canvas_app.py`.
3. Move MCP session cleanup to `finally` blocks in `test_tool` endpoint
(`mcp_server_app.py`) and `get_mcp_tools` (`api_utils.py`) to ensure
sessions are closed even on exceptions.
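
The PR itself is Python and uses `finally` blocks. The same close-on-every-exit guarantee can be sketched in Go with `defer`; every type and function below (`session`, `canvas`, `runCanvas`) is hypothetical and only mirrors the shape of the solution, not any actual code from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// session stands in for a tool-call session that owns a connection.
type session struct {
	closed bool
}

// Close releases the session's resources.
func (s *session) Close() { s.closed = true }

// canvas stands in for the graph that holds sessions through its components.
type canvas struct {
	sessions []*session
}

// Close shuts down every session the canvas holds.
func (c *canvas) Close() {
	for _, s := range c.sessions {
		s.Close()
	}
}

// Run simulates execution that may fail.
func (c *canvas) Run(fail bool) error {
	if fail {
		return errors.New("run failed")
	}
	return nil
}

// runCanvas guarantees cleanup on both the success and the error path.
func runCanvas(c *canvas, fail bool) error {
	defer c.Close()
	return c.Run(fail)
}

func main() {
	c := &canvas{sessions: []*session{{}, {}}}
	err := runCanvas(c, true)
	fmt.Println(err, c.sessions[0].closed, c.sessions[1].closed)
}
```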

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: conflict-resolver <conflict-resolver@local>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-07-02 10:57:24 +08:00
Zhichang Yu
ba552f64b9 Stabilize timeout tests with semantic assertions (#16537)
Replace fragile wall-clock timeout assertions with semantic checks for
deadline errors, retry suppression, and event ordering. Keep only
lower-bound timing checks where they prove backoff behavior. This
reduces CPU-load flakes without weakening regression coverage.
2026-07-02 10:56:38 +08:00
euvre
3195d6fa89 fix: improve Normal role badge visibility with proper styling (#16528) 2026-07-02 10:47:01 +08:00
Wang Qi
7abc69434f [Go] Fix to allow duplicate key for provider (#16543) 2026-07-02 10:34:36 +08:00
Hz_
9b83d0f154 fix(go): document count in kb (#16490)
### Summary
This PR fixes incorrect dataset document counters in the Go service.

Several document creation paths inserted document records directly
through documentDAO.Create, bypassing the shared InsertDocument logic
that increments knowledgebase.doc_num. As a result, datasets could
contain documents while doc_num remained 0.
2026-07-02 10:34:14 +08:00