201 Commits

Author SHA1 Message Date
Attili-sys
5fc254eb2e Feature big query connector (#15871)
### What problem does this PR solve?

This PR adds Google BigQuery as a first-class data source connector in
RAGFlow.

It enables users to ingest and sync BigQuery data using the same
row-to-document model used by relational database connectors: selected
content columns become document text, metadata columns become document
metadata, an optional ID column provides stable document IDs, and an
optional timestamp column enables cursor-based incremental sync.

The connector supports service-account JSON credentials, table mode,
custom query mode, GoogleSQL queries, cursor-based incremental sync,
deleted-row pruning support, configurable query limits such as
`maximum_bytes_billed`, dry-run validation, batch loading, stable
document IDs, and BigQuery-aware value serialization.
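
The row-to-document model and cursor-based sync described above can be sketched as follows. This is a minimal illustration over plain dictionaries, not the connector's actual code: `Document`, `row_to_document`, `incremental_sync`, and the hash-based ID fallback are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def row_to_document(row, content_cols, metadata_cols, id_col=None):
    """Map one result row (a plain dict) to a Document."""
    # Content columns become the document text.
    text = "\n".join(f"{c}: {row[c]}" for c in content_cols if row.get(c) is not None)
    # Metadata columns become document metadata.
    metadata = {c: row[c] for c in metadata_cols if c in row}
    if id_col is not None:
        doc_id = str(row[id_col])
    else:
        # Assumption: without an ID column, hash the text so re-syncs yield the same ID.
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return Document(doc_id, text, metadata)


def incremental_sync(rows, cursor, ts_col, **mapping):
    """Return documents for rows newer than `cursor`, plus the advanced cursor."""
    docs, new_cursor = [], cursor
    for row in rows:
        ts = row[ts_col]
        if cursor is not None and ts <= cursor:
            continue  # already ingested in an earlier sync
        docs.append(row_to_document(row, **mapping))
        if new_cursor is None or ts > new_cursor:
            new_cursor = ts
    return docs, new_cursor
```

Passing the returned cursor into the next call skips rows whose timestamp column has not advanced, which is the idea behind the optional timestamp column.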
2026-06-29 22:08:40 +08:00
Zhichang Yu
f58fae5fb7 feat(go-agent): Ported retrieval node, added Keenable web search tool (#16396)
Ported retrieval node, added Keenable web search tool
- [x] New Feature (non-breaking change which adds functionality)
2026-06-29 09:45:16 +08:00
Liu An
f86a0e7386 Docs: Update version references to v0.26.2 in READMEs and docs (#16387) 2026-06-29 09:45:16 +08:00
Liu An
4379269374 Docs: Update version references to v0.26.1 in READMEs and docs (#16158)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.26.0 to v0.26.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-06-17 19:35:32 +08:00
Wang Qi
02ccd35241 Fix RAGFlow cannot start (#16116)
# Summary
- The culprit is commit b4c8711d5 / PR #15415 (fix: upgrade crawl4ai to
0.8.0).
- That upgrade brought in unclecode-litellm, which installs the same
top-level litellm namespace as upstream litellm.
- The crash happens when files from one LiteLLM distribution are mixed
with files from the other: custom_guardrail.py expects
GuardrailTracingDetail, but types/utils.py can come from the older
conflicting package.
2026-06-17 11:27:31 +08:00
dependabot[bot]
b732636546 build(deps): bump aiohttp from 3.13.3 to 3.14.1 (#16090) 2026-06-16 20:07:32 +08:00
Zhichang Yu
3fa15c0e2f feat(agent): Go port — canvas engine, 22 components, DSL v2, 13 endpoints (#15952)
Ports the agent canvas subsystem from Python to Go.

## What's included

### Canvas Engine (Phase 0/1)
- State engine, scheduler, variable resolver, Redis checkpoint store,
cancel protocol
- **209 tests** across canvas / component / io packages

### 22 Components (P0–P4)
| Tier | Components |
|---|---|
| P0 T1+T2+T3 | LLM, Agent, ExitLoop, Switch, Categorize, Begin,
Message, Invoke |
| P1 T3 | VariableAggregator, VariableAssigner, StringTransform,
ListOperations, DataOperations |
| P2 T3 | Iteration, IterationItem, Loop, LoopItem |
| P3 T3 | UserFillUp, Fillup |
| P4 T5 | Browser, ExcelProcessor, DocsGenerator |

### DSL v2 Schema (Phase 2.5)
- Typed v2 in-memory model with v1-to-v2 auto-detect converter
- v1 legacy field stripping per plan §2.11.7

### HTTP Endpoints & Bug Fixes (Plans PR1–PR3)
- **DELETE SQL bug fix**: gorm v2 `Where("id = ?", id).Delete(...)`
pattern
- **CreateAgent validation**: title/DSL required, duplicate check, 103
envelope
- **13 new endpoints**: templates, prompts, tags, sessions CRUD,
chat/completions (SSE + non-stream stubs), rerun, test_db_connection,
logs, webhook/logs
- **756 Go unit tests** (745 → 756, +18)
- **17 → 0 Python integration test failures** (test_agents.py +
test_session_management/)

### Tools
21 eino tools: HTTPHelper, search tools, financial/data tools, mandatory
stubs

### Infrastructure
OTel observability, NATS message queue, DeepDoc gRPC client, SSRF
guards, IDOR mitigation
2026-06-12 22:58:28 +08:00
Kevin Hu
b5a426e6e0 Feat: chat channels — connect assistants to external messaging bots (#15850)
### What problem does this PR solve?

#15844

Adds a **Chat channels** capability so a RAGFlow assistant (Dialog) can
be exposed as a bot on external messaging platforms (Feishu/Lark,
Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot
in the UI, connects it to an assistant, and inbound messages are
answered from that assistant's knowledge base — replies are delivered
back on the channel.

**Feishu/Lark is implemented and tested end-to-end.** Discord, Telegram,
LINE, and WeCom are scaffolded against the same interface; the remaining
listed channels are tracked as follow-ups.

### Design

**Backend**
- New `chat_channel` table (`tenant_id`, `name`, `channel`, `config`
JSON holding `{credential: {...}}`, `dialog_id`, `status`) +
`ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`.
- Channel framework under `api/channels/`: a `core` registry +
per-channel packages that self-register a builder and implement a common
`Channel` interface (`start`/`stop`/`send` + inbound normalization) over
`IncomingMessage`/`OutgoingMessage`.
- Embedded **reconcile loop** in `ragflow_server`
(`api/channels/bootstrap.py`): loads enabled bots, and
starts/stops/restarts them as rows change (no server restart needed).
Inbound messages run the connected dialog via the non-streaming
completion path, keeping per-end-user conversation history.
- Missing optional channel SDKs degrade gracefully (channel skipped with
a warning; others unaffected). Channel-level errors are logged, not
crashed.
- Feishu's WebSocket client runs in a dedicated thread with its own
event loop to avoid cross-loop/contextvars conflicts with the channel
runtime.
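
The reconcile step above boils down to comparing the enabled rows with the bots currently running. A minimal sketch under stated assumptions: `BotRow`, its `config_version` field, and `reconcile` are hypothetical names for this example, not the contents of `api/channels/bootstrap.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotRow:
    """Hypothetical snapshot of one enabled chat_channel row."""
    bot_id: str
    channel: str
    config_version: str  # e.g. a hash of the config JSON


def reconcile(desired, running):
    """Compare enabled rows with running bots.

    desired: dict of bot_id -> BotRow loaded from the table
    running: dict of bot_id -> BotRow the process started earlier
    Returns (to_start, to_stop, to_restart) as sorted lists of bot IDs.
    """
    to_start = sorted(desired.keys() - running.keys())
    to_stop = sorted(running.keys() - desired.keys())
    to_restart = sorted(
        bot_id
        for bot_id in desired.keys() & running.keys()
        if desired[bot_id] != running[bot_id]  # row changed since start
    )
    return to_start, to_stop, to_restart
```

Running this on a timer and acting on the three lists lets bots follow row changes without a server restart.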

**Frontend**
- **Settings → Chat channels** panel: available-channels grid +
configured-bots list with add/edit/delete and a **Connect assistant**
popup that binds a bot to a dialog.
- Brand icons via simple-icons / reused shared data-source assets, with
colored fallbacks for brands not available.
- Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix
so the settings page no longer highlights the Chat tab.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Notes
- DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id`
is also covered by a `migrate_db` `alter_db_add_column` for existing
installs.
- Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`,
`line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies.
- Screenshots / per-channel credential docs to follow.

<img width="1338" height="1290" alt="Image"
src="https://github.com/user-attachments/assets/042cb2f9-0dad-4e6a-bcf7-43ced4bbd704"
/>

<img width="1344" height="738" alt="Image"
src="https://github.com/user-attachments/assets/373cd08e-ec40-4c67-9c51-4d948b1ba617"
/>

<img width="672" height="887" alt="Image"
src="https://github.com/user-attachments/assets/5a34953f-a9a3-4c1e-869e-5eff0dc64c84"
/>

---------
2026-06-12 18:21:30 +08:00
Liu An
92c4b7688b Docs: Update version references to v0.26.0 in READMEs and docs (#15941)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.6 to v0.26.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-06-11 18:34:26 +08:00
Rene Arredondo
a079c08594 fix(deps): exclude litellm 1.82.6 (internal ImportError) — #15916 (#15920)
## Summary

Fixes #15916.

A fresh `docker compose -f docker-compose-macos.yml up -d` against
v0.25.6 errors out on container start with
2026-06-11 11:40:07 +08:00
OrbisAI Security
b4c8711d51 fix: upgrade crawl4ai to 0.8.0 (CVE-2026-26217) (#15415)
## Summary
Upgrade crawl4ai from 0.7.6 to 0.8.0 to fix CVE-2026-26217.

## Vulnerability
| Field | Value |
|-------|-------|
| **ID** | CVE-2026-26217 |
| **Severity** | CRITICAL |
| **Scanner** | trivy |
| **Rule** | `CVE-2026-26217` |
| **File** | `uv.lock` |
| **Assessment** | Likely exploitable |

**Description**: Crawl4AI Has Local File Inclusion in Docker API via
file:// URLs

## Evidence

**Scanner confirmation**: trivy rule `CVE-2026-26217` flagged this
pattern.

**Production code**: This file is in the production codebase, not
test-only code.

## Threat Model Context

This is a web service - vulnerabilities in request handlers are directly
exploitable by remote attackers.

## Changes
- `pyproject.toml`
- `uv.lock`

## Verification
- [x] Build passes
- [x] Scanner re-scan confirms fix
- [x] LLM code review passed

---
*This change addresses a pattern flagged by static analysis. The code
path handles user-influenced input and the fix reduces the attack
surface against both manual and automated exploitation.*

---
*Automated security fix by [OrbisAI Security](https://orbisappsec.com)*
2026-05-29 21:38:41 +08:00
Liu An
0639dba89a Docs: Update version references to v0.25.6 in READMEs and docs (#15248)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.5 to v0.25.6
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-05-26 19:45:43 +08:00
Jin Hai
775ea55679 Docs: update python version to 3.13 (#15103)
### What problem does this PR solve?

1. Update Python version to 3.13
2. Upgrade ormsgpack to 1.6.0

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-21 19:09:19 +08:00
天海蒼灆
3e5b11a523 Feat(browser control):Add new agent component 'browser' to control browser by AI (#14888)
### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow. It is built on
'Browser-Use'.

It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-21 15:32:32 +08:00
Jin Hai
90c76e73d0 Docs: Update version references to v0.25.5 in READMEs and docs (#15059)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-20 20:05:45 +08:00
Magicbook1108
b28e134944 Feat: add local & ssh provider in admin panel (#15039)
### What problem does this PR solve?

Feat: add local & ssh provider in admin panel

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-20 16:56:20 +08:00
qinling0210
9d94527b1d Bump to infinity v0.7.0 (#14968)
### What problem does this PR solve?

Upgrade infinity

### Type of change

- [x] Refactoring
2026-05-18 10:25:59 +08:00
wdeveloper16
14c0985182 feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767)
Closes #14753

## What changed

| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove
`strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix
legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` →
`from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |

`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.

## Why uv.lock is not regenerated

`uv lock --python 3.13` fails because:

1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13

The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.
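
The conflict can be checked by hand as an interval intersection. The sketch below is a simplified stand-in for what a resolver does: it handles only plain `X.Y.Z` versions with an inclusive lower and exclusive upper bound, and `parse` and `intersect` are names invented for this example.

```python
def parse(version):
    """Turn '1.26.4' into (1, 26, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def intersect(lower_a, upper_a, lower_b, upper_b=None):
    """Intersect [lower_a, upper_a) with [lower_b, upper_b).

    An upper bound of None means "no upper bound".
    Returns (lower, upper) or None when no version satisfies both.
    """
    lower = max(parse(lower_a), parse(lower_b))
    uppers = [parse(u) for u in (upper_a, upper_b) if u is not None]
    upper = min(uppers) if uppers else None
    if upper is not None and lower >= upper:
        return None
    return lower, upper


# graspologic fork: numpy>=1.26.4,<2.0.0 ; ml-dtypes: numpy>=2.1.0
print(intersect("1.26.4", "2.0.0", "2.1.0"))  # → None
# With the upper bound removed from the fork, a solution exists.
print(intersect("1.26.4", None, "2.1.0"))     # → ((2, 1, 0), None)
```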

## RFC corrections

Two claims in the original RFC (#14753) did not hold up under code
review:

- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
2026-05-15 14:40:53 +08:00
Liu An
f038a34154 Docs: Update version references to v0.25.4 in READMEs and docs (#14912)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.3 to v0.25.4
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-05-14 11:07:08 +08:00
Jin Hai
87516edadf Bump to infinity v0.7.0-dev7 (#14897)
### What problem does this PR solve?

Upgrade infinity

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-13 19:42:50 +08:00
Liu An
3182fd0789 Docs: Update version references to v0.25.3 in READMEs and docs (#14896)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.2 to v0.25.3
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-05-13 18:42:42 +08:00
Wang Qi
3838770e7a GraphRAG feature - Part 1 - add spacy to extract entity and relation (#14670)
### What problem does this PR solve?

GraphRAG feature - Part 1 - add spacy to extract entity and relation

<img width="1621" height="1288" alt="image"
src="https://github.com/user-attachments/assets/aadeddad-94da-46c6-adad-9c3784181f61"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-11 12:59:59 +08:00
VincentLambert
08bb53bbb1 Feat: add BedrockCV for vision/image2text inference via LiteLLM (#14705)
## Summary

- `CvModel["Bedrock"]` was absent from `rag/llm/cv_model.py`, causing
`model_instance()` to return `None` when a Bedrock model was used as a
PDF parser — even after correct model resolution.
- This PR adds `BedrockCV`, enabling Bedrock vision models (e.g.
`amazon.nova-pro-v1:0`, `anthropic.claude-3-5-sonnet`) to be used as PDF
parsers.

## What problem does this PR solve?

When a Bedrock model is selected as the PDF parser in a knowledge base,
ingestion failed with:

```
'LiteLLMBase' object has no attribute 'describe_with_prompt'
```

The root cause: `LiteLLMBase` (the Bedrock chat implementation) was the
only registered handler for the Bedrock factory. It does not implement
`describe_with_prompt`. `CvModel` had no Bedrock entry, so
`model_instance()` returned `None` for `image2text` requests.

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Changes

**`rag/llm/cv_model.py`**

Adds `BedrockCV(Base)` with `_FACTORY_NAME = "Bedrock"`:

- Uses `litellm.completion` with the `bedrock/` prefix (consistent with
`LiteLLMBase`)
- Parses AWS credentials from the JSON key assembled by `add_llm`
(`auth_mode`, `bedrock_ak`, `bedrock_sk`, `bedrock_region`,
`aws_role_arn`)
- Supports three auth modes: `access_key_secret`, `iam_role` (via STS
`assume_role`), and default credential chain (IRSA, instance profile)
- Implements `describe_with_prompt` and `describe`
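
The three auth modes can be pictured as a small dispatch over the parsed key. This is a sketch, not the code in `rag/llm/cv_model.py`: the input field names come from the list above, while `resolve_bedrock_auth`, the output keys, and the choice to treat any other `auth_mode` as the default credential chain are assumptions made for illustration.

```python
import json


def resolve_bedrock_auth(key_json):
    """Pick an auth strategy from the JSON key string."""
    cfg = json.loads(key_json)
    mode = cfg.get("auth_mode", "")
    region = cfg.get("bedrock_region", "")
    if mode == "access_key_secret":
        return {
            "mode": mode,
            "region": region,
            "access_key": cfg["bedrock_ak"],
            "secret_key": cfg["bedrock_sk"],
        }
    if mode == "iam_role":
        # Real code would call STS assume_role here to obtain temporary
        # credentials for this role; the sketch only records the ARN.
        return {"mode": mode, "region": region, "role_arn": cfg["aws_role_arn"]}
    # Assumption: anything else falls back to the default credential
    # chain (IRSA, instance profile), which needs no explicit secrets.
    return {"mode": "default_chain", "region": region}
```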

## Test plan

- [ ] Configure a Bedrock vision model (e.g. `amazon.nova-pro-v1:0`)
with valid AWS credentials
- [ ] Select it as PDF parser in a knowledge base
- [ ] Verify ingestion of a PDF document completes without errors
- [ ] Verify `CvModel["Bedrock"]` resolves to `BedrockCV`

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 10:29:58 +08:00
Liu An
57b24be6d6 Docs: Update version references to v0.25.2 in READMEs and docs (#14731)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.1 to v0.25.2
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-05-09 19:06:05 +08:00
qinling0210
12f80f170c Bump to infinity v0.7.0-dev6 (#14606)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev6

(uv lock --upgrade-package infinity-sdk)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-07 10:51:17 +08:00
dependabot[bot]
9e4f3614de Chore(deps-dev): Bump pillow from 12.1.1 to 12.2.0 (#14578)
As title
2026-05-06 11:08:38 +08:00
Jin Hai
aa57b5bd8b Go: move logger to common module (#14545)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-06 10:41:58 +08:00
Liu An
ce4c782fd7 Docs: Update version references to v0.25.1 in READMEs and docs (#14488)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.0 to v0.25.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-04-30 10:49:26 +08:00
RazmikGevorgyan
c41b5e8a5d fix: migrate Langfuse integration from start_generation to start_obse… (#14205)
The Langfuse Python SDK v3+ removed the `start_generation()` method.
RAGFlow's code still called this removed method, causing an
`AttributeError` when Langfuse tracing is enabled.

Replace all `start_generation()` calls with
`start_observation(as_type="generation")` which is the correct v4 SDK
API.
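
The PR replaces the calls outright. For a codebase that must run against both SDK generations, the same change could be wrapped in a small shim, sketched here with stand-in client classes so the example runs without Langfuse installed. `start_generation_span`, `OldClient`, `NewClient`, and the `name` keyword are hypothetical; only the two method names and `as_type="generation"` come from the description above.

```python
def start_generation_span(client, **kwargs):
    """Open a generation observation on whichever API the client exposes."""
    if hasattr(client, "start_observation"):
        return client.start_observation(as_type="generation", **kwargs)
    # Older SDKs only expose the dedicated method.
    return client.start_generation(**kwargs)


class OldClient:
    """Stand-in for a client that still has start_generation()."""

    def start_generation(self, **kwargs):
        return ("start_generation", kwargs)


class NewClient:
    """Stand-in for a client that only has start_observation()."""

    def start_observation(self, as_type, **kwargs):
        return ("start_observation", as_type, kwargs)


print(start_generation_span(OldClient(), name="chat"))
print(start_generation_span(NewClient(), name="chat"))
```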

Affected files:
- api/db/services/llm_service.py (12 occurrences)
- api/db/services/dialog_service.py (1 occurrence)

Fixes #14204
Related to #9243

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:03:57 +08:00
Liu An
a33d0737cd Docs: Update version references to v0.25.0 in READMEs and docs (#14257)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.24.0 to v0.25.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-04-21 17:26:50 +08:00
Julian
ba7d3f6c31 Add debugpy dependency to pyproject.toml (#14225)
To attach a debugger to a running Docker container, debugpy has to be
installed in the Docker image.

### What problem does this PR solve?

[#14224](https://github.com/infiniflow/ragflow/issues/14224)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 18:05:17 +08:00
dependabot[bot]
b34a726acd Build(deps): Bump pypdf from 6.9.2 to 6.10.2 (#14184)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.2 to 6.10.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/releases">pypdf's
releases</a>.</em></p>
<blockquote>
<h2>Version 6.10.2, 2026-04-15</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.1...6.10.2">Full
Changelog</a></p>
<h2>Version 6.10.1, 2026-04-14</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Use new parameter names for compress_identical_objects by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.0...6.10.1">Full
Changelog</a></p>
<h2>Version 6.10.0, 2026-04-10</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Disallow custom XML entity declarations for XMP metadata (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3724">#3724</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>New Features (ENH)</h3>
<ul>
<li>Skip MD5 key derivation for AES-256 encrypted PDFs (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3694">#3694</a>)
by <a href="https://github.com/Ygnas"><code>@​Ygnas</code></a></li>
</ul>
<h3>Bug Fixes (BUG)</h3>
<ul>
<li>Use remove_orphans in compress_identical_objects (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3310">#3310</a>)
by <a href="https://github.com/j-t-1"><code>@​j-t-1</code></a></li>
<li>Fix PdfReadError when xref table contains comments before trailer
(<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3710">#3710</a>)
by <a href="https://github.com/rassie"><code>@​rassie</code></a></li>
<li>Correctly verify AES padding during decryption (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3699">#3699</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Fix stale object cache from non-authoritative object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3698">#3698</a>)
by <a
href="https://github.com/astahlman"><code>@​astahlman</code></a></li>
<li>Fix extract_links pairing when annotations include non-links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3687">#3687</a>)
by <a
href="https://github.com/ReinerBRO"><code>@​ReinerBRO</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Add AI policy (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3717">#3717</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.0">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c476b4f293"><code>c476b4f</code></a>
REL: 6.10.2</li>
<li><a
href="c50a0104cf"><code>c50a010</code></a>
SEC: Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)</li>
<li><a
href="ac734dab4e"><code>ac734da</code></a>
SEC: Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)</li>
<li><a
href="b49e7eb454"><code>b49e7eb</code></a>
REL: 6.10.1</li>
<li><a
href="62338e9d36"><code>62338e9</code></a>
SEC: Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)</li>
<li><a
href="5dcc0aebaa"><code>5dcc0ae</code></a>
DEV: Update pytest-benchmark to 5.2.3</li>
<li><a
href="b42e4aa98a"><code>b42e4aa</code></a>
DEV: Update pinned pillow and pytest where possible (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3732">#3732</a>)</li>
<li><a
href="717446b121"><code>717446b</code></a>
ROB: Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)</li>
<li><a
href="9e461d361b"><code>9e461d3</code></a>
DEV: Bump softprops/action-gh-release from 2 to 3 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3730">#3730</a>)</li>
<li><a
href="500d09d92f"><code>500d09d</code></a>
TST: Update <code>test_embedded_file__basic</code> to use
<code>tmp_path</code> fixture (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3726">#3726</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.2">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 18:43:19 +08:00
Jack
3b7723855c Fix: revert xgboost version to 1.6.0 (#13984)
### What problem does this PR solve?

Revert xgboost version to 1.6.0

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated xgboost dependency from version 3.2.0 to 1.6.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 19:53:47 +08:00
Jack
c4b0aaa874 Fix: #6098 - Add validation logic for parser_config when update document (#13911)
### What problem does this PR solve?

Add validation logic for parser_config.
Refactor the processing flow. Before this change, validation and update
logic were interleaved: some validation ran, then some updates, then
another round of validation followed by updates. After this change, all
validation runs first, and updates run only after every validation
check has passed.
Parameters that come from the front end are validated with Pydantic.
Validation that depends on data from the DB lives in separate methods.
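
The validate-first flow can be sketched as below. To stay dependency-free, this example swaps Pydantic for a plain function; the function names, the in-memory `db` dict, and the `chunk_token_num` check are illustrative assumptions, not the PR's code.

```python
def validate_params(parser_config):
    """Checks that need only the request (the PR uses Pydantic for these)."""
    errors = []
    size = parser_config.get("chunk_token_num")
    if not isinstance(size, int) or size <= 0:
        errors.append("chunk_token_num must be a positive integer")
    return errors


def validate_against_db(doc_id, db):
    """Checks that need stored data, kept in their own function."""
    errors = []
    if doc_id not in db:
        errors.append(f"document {doc_id} not found")
    return errors


def update_document(doc_id, parser_config, db):
    """Run every validation first; touch the data only if all pass."""
    errors = validate_params(parser_config) + validate_against_db(doc_id, db)
    if errors:
        raise ValueError("; ".join(errors))
    db[doc_id]["parser_config"] = dict(parser_config)
    return db[doc_id]
```

Because no write happens before the last check, a rejected request leaves the stored document untouched.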

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-04-07 11:33:05 +08:00
Zhichang Yu
ab358fe949 feat: make Azure cloud authority configurable for SPN auth (#13898)
## Summary
- The Azure SPN storage handler hardcoded
`AzureAuthorityHosts.AZURE_CHINA`, preventing users in Azure Public
Cloud regions (UK-South, EU, US, etc.) from authenticating
- Add a `cloud` config option (env: `AZURE_CLOUD`) supporting all four
Azure sovereignties: `public`, `china`, `government`, `germany`
- Defaults to `public` (global Azure) — the most common international
use case
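
The selection logic can be sketched as a lookup with an environment override. This is an illustration, not the storage handler's code: `resolve_authority` and `AUTHORITY_HOSTS` are hypothetical names, and the host strings are the well-known login endpoints of each cloud written out as plain strings rather than taken from `AzureAuthorityHosts`.

```python
import os

# Login authority host per cloud name.
AUTHORITY_HOSTS = {
    "public": "login.microsoftonline.com",
    "china": "login.chinacloudapi.cn",
    "government": "login.microsoftonline.us",
    "germany": "login.microsoftonline.de",
}


def resolve_authority(config, environ=os.environ):
    """Pick the authority host: AZURE_CLOUD env var, then config, then public."""
    cloud = (environ.get("AZURE_CLOUD") or config.get("cloud") or "public").lower()
    try:
        return AUTHORITY_HOSTS[cloud]
    except KeyError:
        raise ValueError(
            f"unknown Azure cloud {cloud!r}; expected one of {sorted(AUTHORITY_HOSTS)}"
        ) from None
```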

Closes #13259

## Test plan
- [ ] Verify default (`cloud: public`) connects to Azure Public Cloud
endpoints
- [ ] Verify `cloud: china` retains existing behavior for Azure China
users
- [ ] Verify `AZURE_CLOUD` env var overrides the config file value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 12:51:26 +08:00
qinling0210
a8bbe167a9 Bump to infinity v0.7.0-dev5 (#13846)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev5

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-30 10:19:06 +08:00
KeJun
cb78ce0a7b feat: support rss datasource (#13721)
### What problem does this PR solve?

Support public RSS/Atom feed URLs as data sources for RAGFlow.

link https://github.com/infiniflow/ragflow/issues/12313

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-27 22:58:44 +08:00
Krishna Chaitanya
cdbbd2620c Fix: upgrade pyasn1 from 0.6.2 to 0.6.3 to address CVE-2026-30922 (#13773)
## Summary

- Adds `pyasn1>=0.6.3` as a `[tool.uv.constraint-dependencies]` entry to
mitigate **CVE-2026-30922** (CVSS 7.5 HIGH)
- Regenerates `uv.lock` so the resolved pyasn1 version moves from
**0.6.2 to 0.6.3**

## Details

**CVE-2026-30922** is a Denial of Service vulnerability in pyasn1 caused
by unbounded recursion when decoding ASN.1 data with deeply nested
structures. An attacker can send crafted payloads with thousands of
nested SEQUENCE or SET tags to trigger a `RecursionError` crash or
memory exhaustion.
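
The failure mode can be illustrated with a toy example. This is not pyasn1 code: nested Python lists stand in for nested SEQUENCE/SET tags, and every function name here is hypothetical. The first walker recurses once per nesting level with no limit; the second rejects input that nests deeper than a fixed bound.

```python
def make_nested(depth):
    """Build a list nested `depth` levels deep, iteratively."""
    node = []
    for _ in range(depth - 1):
        node = [node]
    return node


def nesting_depth(node):
    # Unbounded recursion: one Python frame per nesting level.
    deepest = 0
    for child in node:
        deepest = max(deepest, nesting_depth(child))
    return 1 + deepest


def bounded_depth(node, max_depth=100, _level=1):
    # Same walk, but input nested deeper than max_depth is rejected early.
    if _level > max_depth:
        raise ValueError(f"nesting exceeds {max_depth} levels")
    deepest = 0
    for child in node:
        deepest = max(deepest, bounded_depth(child, max_depth, _level + 1))
    return 1 + deepest
```

Under CPython's default recursion limit, `nesting_depth(make_nested(5000))` fails with `RecursionError`, while `bounded_depth(make_nested(5000))` raises a controlled `ValueError` after 100 levels.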

- **Severity:** HIGH (CVSS 7.5)
- **Affected versions:** pyasn1 < 0.6.3
- **Fixed in:** pyasn1 >= 0.6.3
- **NVD:** https://nvd.nist.gov/vuln/detail/CVE-2026-25769

`pyasn1` is not a direct dependency of RAGFlow but is pulled in
transitively via `google-auth` -> `rsa` -> `pyasn1-modules` -> `pyasn1`.
The `constraint-dependencies` mechanism in uv is the correct way to
enforce a minimum version for transitive dependencies without polluting
the direct dependency list.

## Test plan

- [x] `pyproject.toml` passes TOML validation
- [x] `uv lock` resolves successfully with the new constraint
- [x] pyasn1 version in `uv.lock` is now 0.6.3
- [ ] Existing CI/CD tests continue to pass

Closes #13686
2026-03-27 10:37:34 +08:00
Yongteng Lei
ea1430bec5 Security: do not use litellm 1.82.7 and 1.82.8 (#13768)
### What problem does this PR solve?

See the [issue](https://github.com/BerriAI/litellm/issues/24518) from
LiteLLM.

Upgraded from `1.81.15` to `1.82.6`, so RAGFlow is not affected.

### Type of change

- [x] Security

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-25 22:39:33 +08:00
Liu An
5b3bb25010 Fix: switch Python package mirror from Tsinghua to Aliyun (#13617)
### What problem does this PR solve?

Replace pypi.tuna.tsinghua.edu.cn with mirrors.aliyun.com to resolve
issues with missing packages on the Tsinghua mirror.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 12:12:25 +08:00
Yongteng Lei
287637162c Revert "fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416" (#13613)
Reverts infiniflow/ragflow#13583, which caused `uv sync` to fail.
2026-03-16 10:19:29 +08:00
Sank
a67fa03584 fix CVE-2026-28804 CVE-2026-31826 (#13592)
### What problem does this PR solve?

fix CVE-2026-28804 CVE-2026-31826

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 16:34:28 +08:00
Sank
e90f0e8910 fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416 (#13583)
### What problem does this PR solve?

Fix CVE-2026-26216, CVE-2026-26217, and CVE-2025-66416.

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 11:17:39 +08:00
Yongteng Lei
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
guptas6est
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves both vulnerabilities.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
Heyang Wang
c217b8f3d8 Feat: add DingTalk AI Table connector and integration for data synch… (#13413)
### What problem does this PR solve?

Add DingTalk AI Table connector and integration for data synchronization

Issue #13400

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: wangheyang <wangheyang@corp.netease.com>
2026-03-06 21:13:23 +08:00
Jin Hai
6bb00e2762 Update graspologic to gitee (#13362)
### What problem does this PR solve?

Accelerate Python module downloads.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-04 17:48:47 +08:00
Idriss Sbaaoui
860c4bd0bb Feat: UI testing automation with playwright (#12749)
### What problem does this PR solve?

This PR helps automate testing of the UI using pytest and Playwright.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test automation infrastructure

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:04:08 +08:00
Magicbook1108
158503a1aa Feat: optimize ingestion pipeline with preprocess (#13211)
### What problem does this PR solve?

Feat: optimize ingestion pipeline with preprocess

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 10:24:13 +08:00
Magicbook1108
98e1d5aa5c Refact: switch from google-generativeai to google-genai (#13140)
### What problem does this PR solve?

Refact: switch from google-generativeai to google-genai #13132
Refact: comment out unused pywencai.

### Type of change

- [x] Refactoring
2026-02-24 10:28:33 +08:00