26 Commits

Author SHA1 Message Date
Jin Hai
1087a25f22 Revert "feat(go-api): Add Go chat session message delete and feedback APIs" (#16465)
Reverts infiniflow/ragflow#16442
2026-06-29 21:37:11 +08:00
Hz_
a553886989 feat(go-api): Add Go chat session message delete and feedback APIs (#16442)
### Summary

```
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id> DELETE
/api/v1/chats/<chat_id>/sessions/<session_id>/messages/<msg_id>/feedback PUT
```

Migrates the chat session message delete and feedback APIs to the Go
server, matching the Python behavior for authorization, session
ownership checks, message/reference updates, and feedback validation.

### Testing

- `/usr/local/go/bin/go test ./internal/service ./internal/handler`
- Verified through the frontend page for deleting chat messages and
updating message feedback
2026-06-29 19:05:50 +08:00
Zhichang Yu
195bfffb5e fix(security): address 93 CodeQL code-scanning alerts across 61 files (#16407)
## Summary

Resolves all 93 open alerts at
https://github.com/infiniflow/ragflow/security/code-scanning by rule:

| Rule | Count | Treatment |
|------|-------|-----------|
| py/clear-text-logging-sensitive-data | 23 | Real fix: log scrubbing |
| go/path-injection | 15 | Real fix where possible, suppression with rationale |
| go/request-forgery | 8 | Suppression with rationale (operator-controlled URLs) |
| go/clear-text-logging | 10 | Real fix: log scrubbing |
| go/unsafe-quoting | 5 | Real fix: escape or refactor |
| go/sql-injection | 3 | Real fix: orderby whitelist + CodeQL comment |
| go/uncontrolled-allocation-size | 2 | Real fix: cap to 1024 |
| go/incorrect-integer-conversion | 3 | Real fix: ParseInt + range check |
| go/insecure-hostkeycallback | 1 | Real fix: known_hosts file |
| go/disabled-certificate-check | 2 | Suppression with rationale |
| go/command-injection | 1 | Suppression (sanitized via shq()) |
| go/email-injection | 1 | Suppression with rationale |
| go/cookie-httponly-not-set | 1 | Suppression (SPA bootstrap) |
| js/stack-trace-exposure | 1 | Real fix: generic client message |
| js/prototype-pollution-utility | 1 | Real fix: reject `__proto__`/`constructor`/`prototype` |
| py/weak-sensitive-data-hashing | 1 | Real fix: MD5 → SHA-256 |
| py/incomplete-url-substring-sanitization | 3 | Real fix: urlparse(hostname) |
| py/paramiko-missing-host-key-validation | 1 | Real fix: load_system_host_keys + RejectPolicy |
| cpp/integer-multiplication-cast-to-long | 2 | Real fix: cast to size_t |

## Real fixes (with measurable security improvement)

**SSH host key verification (Go + Python)**  
Replace `InsecureIgnoreHostKey()` / `paramiko.AutoAddPolicy()` with
proper host key verification against a known_hosts file (configurable
via `SSH_KNOWN_HOSTS` env / `known_hosts` config field; fail-closed when
unset). Loads `~/.ssh/known_hosts` first via `load_system_host_keys()`
so existing setups keep working.

**SQL injection in `user_canvas`**  
Add `userCanvasOrderableColumns` whitelist + `userCanvasOrderClause`
helper. Both `GetList()` and `ListByTenantIDs()` now route the
user-supplied `orderby` query param through the helper, defaulting to
`create_time` on miss.
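
A minimal sketch of the whitelist pattern described above, not the code from this PR. The column set and the `orderClause` name are assumptions for illustration; only the fallback to `create_time` comes from the description.

```go
package main

import "fmt"

// orderableColumns is a hypothetical allow-list; the real column set
// (userCanvasOrderableColumns) lives in the PR and may differ.
var orderableColumns = map[string]bool{
	"create_time": true,
	"update_time": true,
	"title":       true,
}

// orderClause maps a user-supplied orderby value to a safe ORDER BY fragment.
// Anything not on the allow-list falls back to create_time, so user input is
// never spliced into the SQL text.
func orderClause(orderby string, desc bool) string {
	col := "create_time"
	if orderableColumns[orderby] {
		col = orderby
	}
	if desc {
		return col + " DESC"
	}
	return col + " ASC"
}

func main() {
	fmt.Println(orderClause("title", false))
	fmt.Println(orderClause("title; DROP TABLE user_canvas", true))
}
```

Because the returned string is always built from a constant or an allow-listed key, the caller can concatenate it into a query without further escaping.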

**SQL injection in `pipeline_operation_log`**  
Existing whitelist documented via CodeQL comment.

**Real SQL injection in `infinity/chunk.go:931`**  
Escape `'` → `''` on user-controlled `questionText` before splicing into
`filter_fulltext(...)` SQL filter.
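
The escaping step can be sketched roughly as follows. The helper names and the argument layout of `filter_fulltext` are assumptions here; only the `'` → `''` doubling is taken from the description, and `field` is assumed to be a trusted constant.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeSingleQuotes doubles every single quote so the text cannot terminate
// the surrounding SQL string literal.
func escapeSingleQuotes(s string) string {
	return strings.ReplaceAll(s, "'", "''")
}

// buildFulltextFilter is a hypothetical helper; the exact filter_fulltext
// argument layout is assumed. field must be a trusted constant.
func buildFulltextFilter(field, questionText string) string {
	return fmt.Sprintf("filter_fulltext('%s', '%s')", field, escapeSingleQuotes(questionText))
}

func main() {
	fmt.Println(buildFulltextFilter("content", "what's new') OR ('1'='1"))
}
```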

**Real SQL injection in `elasticsearch/sql.go:75`**  
Defense-in-depth escape on tokenizer output before splicing into
`MATCH(...)`.

**Python code injection in `result_protocol.go`**  
Replace raw JSON literal embedding into Python/JS expressions with
base64 + `json.loads` / `JSON.parse(Buffer.from(...,
'base64').toString('utf8'))`. Eliminates both the unsafe-quoting sink
and the brittleness of mixing JSON true/false/null with Python syntax.

**URL substring check bypass in `embedding_model.py`**  
Replace `if "dashscope-intl.aliyuncs.com" in u` with
`urlparse(u).hostname == "dashscope-intl.aliyuncs.com"` so a base_url
like `https://attacker.example/?u=dashscope-intl.aliyuncs.com` cannot
bypass the routing.
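
The fix itself is in Python. The same idea, shown in Go for consistency with the rest of this log, might look like the sketch below; `isIntlEndpoint` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const intlHost = "dashscope-intl.aliyuncs.com"

// isIntlEndpoint reports whether the host component of baseURL is exactly
// intlHost. A substring match over the whole URL would also accept the name
// when it appears in a query string or as a prefix of another domain.
func isIntlEndpoint(baseURL string) bool {
	u, err := url.Parse(baseURL)
	if err != nil {
		return false
	}
	// Hostname() strips any port; host names compare case-insensitively.
	return strings.EqualFold(u.Hostname(), intlHost)
}

func main() {
	fmt.Println(isIntlEndpoint("https://dashscope-intl.aliyuncs.com/api/v1"))
	fmt.Println(isIntlEndpoint("https://attacker.example/?u=dashscope-intl.aliyuncs.com"))
}
```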

**Prototype pollution in `setNestedValue` (TS)**  
Reject `__proto__`/`constructor`/`prototype` keys before any assignment.

**Integer overflow**  
- scrypt params via `ParseInt` + non-positive check
(`internal/common/password.go`)
- `topN` and `n` caps to 1024 (retrieval_service.go, dataset.go)
- `nalloc*statesize` cast to `size_t` (cpp/re2/onepass.cc)
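
The Go-side checks in this list follow a common shape, sketched below under assumptions: the helper names are hypothetical, and the 32-bit width passed to `ParseInt` is illustrative rather than taken from the PR. Only the non-positive check and the 1024 cap come from the description.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxTopN is the cap mentioned in the description.
const maxTopN = 1024

// parsePositiveInt parses s as a base-10 integer that fits in 32 bits and is
// strictly positive. ParseInt itself reports out-of-range values as errors,
// so the later conversion to int cannot overflow.
func parsePositiveInt(s string) (int, error) {
	v, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, err
	}
	if v <= 0 {
		return 0, fmt.Errorf("value must be positive, got %d", v)
	}
	return int(v), nil
}

// clampTopN bounds a caller-supplied size before it drives an allocation.
func clampTopN(n int) int {
	if n < 0 {
		return 0
	}
	if n > maxTopN {
		return maxTopN
	}
	return n
}

func main() {
	n, err := parsePositiveInt("16384")
	fmt.Println(n, err)
	fmt.Println(clampTopN(5000))
}
```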

**Cookie httponly**  
Set explicitly with rationale: this is the OAuth bootstrap cookie
intentionally read by the SPA.

**Stack trace exposure**  
Replace `error.message` in HTTP 500 response with generic `"internal
error"`; full error still logged server-side via `console.error`.

**Weak hashing**  
MD5 → SHA-256 for deterministic `conv_id` derivation
(`conversation_service.py`).

**Log scrubbing**  
Remove or redact user-controlled / sensitive content from clear-text
logs across 8 ingestion parsers, `llm_service.py` ×11,
`tenant_llm_service.py` ×7, `misc_utils.py` ×4, `redis_conn.py` ×10,
`conftest.py` ×4, `init_data.py`, `dataset_api_service.py`,
`generator.py`, `mysql_migration.py`, `cli.go`, `user_command.go`,
`pdf_parser.go`. Most patterns converted to parameterized logging
(`logging.info("...: %d", n)`) or static messages.

## CodeQL suppressions (each with rationale)

For alerts where the data flow is genuinely safe but CodeQL can't see
the context — operator-controlled URLs, sanitized inputs, etc. — I added
`// codeql[go/<rule>] <rationale>` annotations rather than dismissing
them, so future readers can audit the rationale inline:

- `internal/agent/component/invoke.go:135` — Invoke is a generic canvas
HTTP client
- `internal/service/langfuse.go` ×2 — host is per-tenant operator config
- `internal/service/file.go:1184` — already SSRF-guarded by
`assertURLSafe`
- `internal/utility/mcp_client.go` ×3 — already `AssertURLSafe` +
IP-pinned
- `internal/entity/models/bedrock.go` — sigv4-signed request, URL can't
be tampered
- `internal/service/deep_researcher.go:269` — `callback` is SSE display
string, not SQL
- `internal/engine/infinity/chunk.go:346` — UUIDs can't contain `'` (RFC
4122)
- `internal/cli/common_command.go` ×2 — CLI trusts operator-configured
URL
- `internal/utility/smtp.go:194` — msg is server-built, not user form
input
- `internal/entity/models/*` ×14 (path-injection) — audio file paths are
caller-supplied

## Test plan

- All 13 modified Go packages build cleanly
- 663 tests pass across `internal/agent/sandbox`, `internal/common`,
`internal/agent/component`, `internal/engine/infinity`, `internal/dao`
- All 11 modified Python files parse via `ast.parse`
- TypeScript `tsc --noEmit` clean on the modified
`use-provider-fields.tsx`
- `node --check` clean on the modified JS file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-29 09:45:16 +08:00
Zhichang Yu
f58fae5fb7 feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396)
Ported retrieval node, added Keenable web search tool
- [x] New Feature (non-breaking change which adds functionality)
2026-06-29 09:45:16 +08:00
Haruko386
a1f1dd5007 feat[Go]: implement Add messages for Go (#16375)
### What problem does this PR solve?

As title

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-26 19:21:52 +08:00
Haruko386
74597b8683 feat[Go]: implement api: Search/Get/Update-Messages (#16307)
### What problem does this PR solve?

As title:
implement:
```
/api/v1/messages/search GET
/api/v1/messages GET
/api/v1/messages/<memory_id>:<message_id>/content GET
/api/v1/memories/<memory_id>/config GET
/api/v1/messages/<memory_id>:<message_id> PUT
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-06-25 19:07:34 +08:00
maoyifeng
643cb4788f Go CLI: add response output (#16263)
### What problem does this PR solve?

Go CLI: add response output
2026-06-23 18:12:15 +08:00
qinling0210
563d855780 Implement OpenAI chat completions in GO (#16177)
### What problem does this PR solve?

Implement OpenAI chat completions in GO

POST /api/v1/openai/<chat_id>/chat/completions

OpenAI chat cli: internal/development.md

### Type of change

- [x] Refactoring
2026-06-18 18:07:27 +08:00
Jonathan Chang
dfcf226ba3 feat: Implement API of ragflow server in Go (#15256)
## Summary
- Implemented the Go API endpoint for Memory message forgetting:
  - `DELETE /api/v1/messages/{memory_id}:{message_id}`
- Added route registration for the Memory message DELETE endpoint only.
- Added request path validation for `memory_id:message_id`.
- Added service logic to mark a message as forgotten by setting
`forget_at`.
- Preserved Python-compatible response behavior:
  - Success returns `code: 0`, `message: true`, `data: null`.
- Added focused unit tests for message path parsing and invalid message
ID handling.
- Fixed Linux cgo linker config to use the installed shared PCRE2
library so Go tests/builds can run in this environment.
## Related Issue
Closes: #15240 
## Change Type
- [x] Feature
- [x] Test
- [x] Build / CI compatibility

## Implemented API
- `DELETE /api/v1/messages/{memory_id}:{message_id}`
## Real Behavior Proof
Validated with targeted Go tests:
```bash
/tmp/go1.25.0/bin/go test ./internal/handler ./internal/router
```
Result:
```text
ok  	ragflow/internal/handler
?   	ragflow/internal/router	[no test files]
```
Validated server entrypoint build:
```bash
/tmp/go1.25.0/bin/go build -o /tmp/ragflow-server-main ./cmd/server_main.go
```

Result:
```text
build succeeded
```
Validated patch formatting:
```bash
git diff --check
```

Result:

```text
no whitespace errors
```
## Checklist
- [x] Implemented only `DELETE /api/v1/messages/{memory_id}:{message_id}`.
- [x] Did not implement unrelated Memory message APIs.
- [x] Added route registration.
- [x] Added handler validation.
- [x] Added service-level memory access check.
- [x] Added tests.
- [x] Ran targeted Go tests.
- [x] Ran server build validation.
- [x] Ran `git diff --check`.
2026-06-10 21:27:35 +08:00
ghost
64b860f771 fix(elasticsearch): complete Go result functions (#15148)
## Summary
- Complete the Go Elasticsearch result functions that remained stubbed
after #15160.
- Add focused unit coverage for field mapping, aggregation, IDs, and
highlighting behavior.
- Update a stale query-builder test type import discovered during
validation.

## What changed
- Keep the Elasticsearch Go implementation merged in #15160 and fill in
`GetFields`, `GetAggregation`, `GetHighlight`, and `GetDocIDs` in
`internal/engine/elasticsearch/chunk.go`.
- Add regression and invariant coverage in
`internal/engine/elasticsearch/chunk_helpers_test.go`.
- Update `internal/service/nlp/query_builder_test.go` to use the current
`types.MatchTextExpr` type.

## Why
- #15160 implemented the main Go Elasticsearch surface, but
retrieval/tag flows still call result functions that returned stubs.
- Completing these functions keeps Elasticsearch result processing
aligned with the expected document-engine behavior for field extraction,
tag aggregation, doc ID extraction, and snippet highlighting.

## Validation
- `go test ./internal/engine/elasticsearch`
- `GOARCH=arm64 CGO_ENABLED=1 go test ./internal/service/nlp -run TestQueryBuilder`
- `git diff --check`
- CodeRabbit review reported 0 issues after follow-up fixes.
- Codex Security diff scan found no reportable issues.

## Notes
- This PR is now a follow-up to #15160 rather than a competing
implementation.
- A full local `go test ./internal/service/nlp` run is blocked by local
WordNet resource prerequisites; the query-builder tests touched by this
PR pass with the arm64 CGO path.
2026-06-09 20:10:11 +08:00
qinling0210
5e0a7ce408 Update Rerank logic in GO (#15755)
### What problem does this PR solve?

Sync the rerank logic from the following PRs to Go:
https://github.com/infiniflow/ragflow/pull/15429
https://github.com/infiniflow/ragflow/pull/15434

### Type of change

- [x] Refactoring
2026-06-08 15:28:10 +08:00
qinling0210
c960dc2a4c Refine handling of POST /api/v1/datasets/search in GO (#15583)
### What problem does this PR solve?

Refine handling of POST /api/v1/datasets/search in GO

### Type of change

- [x] Refactoring
2026-06-08 11:49:37 +08:00
Jack
3b1ae3f829 feat: support SelectFields override in DocEngine for KG-specific queries (#15684)
## Summary

Both ES and Infinity engines now respect `SearchRequest.SelectFields`,
allowing callers to specify output columns for KG
entity/relation/community queries instead of the default chunk columns.

### Changes

- **`internal/engine/elasticsearch/chunk.go`**: Added `SelectFields`
override after default `outputColumns`
- **`internal/engine/infinity/chunk.go`**: Added `SelectFields` override
after default `outputColumns`
- **`internal/engine/elasticsearch/kg_test.go`** (new): Integration test
(skipped unless `ES_TEST=1`)

### Usage

```go
result, err := docEngine.Search(ctx, &types.SearchRequest{
    KbIDs:        kbIDs,
    SelectFields: []string{entity_kwd, entity_type_kwd, rank_flt, n_hop_with_weight},
    Filter:       map[string]interface{}{knowledge_graph_kwd: entity},
})
```
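
On the engine side, the override could be as simple as the sketch below. This is not the code from either `chunk.go`: the trimmed `SearchRequest` type, the default column names, and `outputColumns` as a function are all assumptions for illustration.

```go
package main

import "fmt"

// SearchRequest mirrors only the field relevant here; the real
// types.SearchRequest has more fields.
type SearchRequest struct {
	SelectFields []string
}

// defaultChunkColumns is a hypothetical default; the engines define their own.
var defaultChunkColumns = []string{"id", "doc_id", "content_with_weight"}

// outputColumns returns the caller's SelectFields when provided and the
// default chunk columns otherwise. Copies are returned so callers cannot
// mutate the shared default slice.
func outputColumns(req *SearchRequest) []string {
	if req != nil && len(req.SelectFields) > 0 {
		return append([]string(nil), req.SelectFields...)
	}
	return append([]string(nil), defaultChunkColumns...)
}

func main() {
	fmt.Println(outputColumns(&SearchRequest{SelectFields: []string{"entity_kwd", "rank_flt"}}))
	fmt.Println(outputColumns(&SearchRequest{}))
}
```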

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:41:39 +08:00
qinling0210
af85aa9c7b Implement Elasticsearch functions in GO (#15160)
### What problem does this PR solve?

Implement Elasticsearch functions in GO (except for Search)

### Type of change

- [x] Refactoring
2026-05-25 19:15:07 +08:00
qinling0210
77834870fc Refactor functions in engine in GO (#14981)
### What problem does this PR solve?

Refactor functions in engine in GO
### Type of change

- [x] Refactoring
2026-05-19 17:34:59 +08:00
Jin Hai
3a5df08c76 Go: add file parse command (#14892)
### What problem does this PR solve?

```
RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png'
+----------------------------------------------------------+
| text                                                     |
+----------------------------------------------------------+
| 生活不是等待风暴过去,而是学会在雨中翩翩起舞。
——佚名                                                       |
+----------------------------------------------------------+

RAGFlow(user)> list 'test@gitee' tasks;
+---------+----------------------------------+
| status  | task_id                          |
+---------+----------------------------------+
| success | C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5 |
+---------+----------------------------------+
RAGFlow(user)> show 'test@gitee' task 'C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| content                                                                                                                                                                                                                                                          | index |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| # PDF 1: Purpose of RAGFlow  

RAGFlow is an open source Retrieval-Augmented Generation (RAG) engine designed to turn raw documents into reliable context for large language models.Its purpose is to make it practical to build an Al assistant that can ans... | 1     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-15 12:29:52 +08:00
Jin Hai
aa57b5bd8b Go: move logger to common module (#14545)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-06 10:41:58 +08:00
Yingfeng
4ee0702aed Feat: add skills space to context engine (#13908)
### What problem does this PR solve?

issue #13714

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-30 12:36:03 +08:00
qinling0210
1473000135 Implement retrieval_test in GO (#14231)
### What problem does this PR solve?

Implement retrieval_test in GO

### Type of change

- [x] Refactoring
2026-04-24 15:30:14 +08:00
qinling0210
82fa85c837 Implement Delete in GO and refactor functions (#13974)
### What problem does this PR solve?

Implement Delete in GO and refactor functions

### Type of change

- [x] Refactoring

## Summary by CodeRabbit

* **New Features**
  * Added a remove_chunks command to delete specific or all chunks from a
    document.
  * Added new endpoints for chunk removal and chunk update.

* **Refactor**
  * Renamed index commands to dataset/metadata table terminology and
    updated REST routes accordingly.
  * Updated chunk update flow to a JSON POST style and improved metadata
    error messages.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-04-09 09:52:31 +08:00
qinling0210
49386bc1b5 Implement UpdateDataset and UpdateMetadata in GO (#13928)
### What problem does this PR solve?

Implement UpdateDataset and UpdateMetadata in GO

Add cli:
UPDATE CHUNK <chunk_id> OF DATASET <dataset_name> SET <update_fields>
REMOVE TAGS 'tag1', 'tag2' from DATASET 'dataset_name';
SET METADATA OF DOCUMENT <doc_id> TO <meta>


### Type of change

- [ ] Refactoring
2026-04-07 09:44:51 +08:00
qinling0210
bb4a06f759 Implement InsertDataset and InsertMetadata in GO (#13883)
### What problem does this PR solve?

Implement InsertDataset and InsertMetadata in GO

new internal cli for go:

INSERT DATASET FROM FILE "file_name"
INSERT METADATA FROM FILE "file_name"

### Type of change

- [x] Refactoring
2026-04-01 16:16:25 +08:00
qinling0210
ebf36950e4 Implement Create/Drop Index/Metadata index in GO (#13791)
### What problem does this PR solve?

Implement Create/Drop Index/Metadata index in GO

New API handling in GO:
POST /kb/index
DELETE /kb/index
POST /tenant/doc_meta_index
DELETE /tenant/doc_meta_index

CREATE INDEX FOR DATASET 'dataset_name' VECTOR_SIZE 1024;
DROP INDEX FOR DATASET 'dataset_name';
CREATE INDEX DOC_META;
DROP INDEX DOC_META;

### Type of change

- [x] Refactoring
2026-03-26 11:54:10 +08:00
qinling0210
7c8927c4fb Implement GetChunk() in Infinity in GO (#13758)
### What problem does this PR solve?

Implement GetChunk() in Infinity in GO

Add cli:
GET CHUNK 'XXX';
LIST CHUNKS OF DOCUMENT 'XXX';

### Type of change

- [x] Refactoring
2026-03-24 20:10:21 +08:00
Jin Hai
610c1b507d Add more API of admin server of go (#13403)
### What problem does this PR solve?

Add APIs to admin server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 10:44:53 +08:00
Jin Hai
70e9743ef1 RAGFlow go API server (#13240)
# RAGFlow Go Implementation Plan 🚀

This repository tracks the progress of porting RAGFlow to Go. We'll
implement core features and provide performance comparisons between
Python and Go versions.

## Implementation Checklist

- [x] User Management APIs
- [x] Dataset Management Operations
- [x] Retrieval Test
- [x] Chat Management Operations
- [x] Infinity Go SDK

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-03-04 19:17:16 +08:00