ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-29 23:41:12 +08:00

Author	SHA1	Message	Date
Kevin Hu	15f50e5cb2	fix: rename dialog_id to chat_id in chat_channel (backend + frontend) (#16096 ) ## Summary - The `ChatChannel` DB column was renamed from `dialog_id` to `chat_id` via a migration (added in a prior commit). - Aligns the REST API layer (`chat_channel_api.py`, `chat_channel_service.py`) to use `chat_id` consistently. - Updates the frontend (`interface.ts`, `hooks.ts`, `connect-dialog-modal.tsx`, `added-channel-card.tsx`) to read/write `chat_id` instead of `dialog_id`. - The joined `dialog_name` alias in the list query is unchanged (backend still returns it under that name). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-16 19:02:20 +08:00
Kevin Hu	5a817762fa	Refactor: Change table chat_channel status data type. (#16061 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring	2026-06-16 12:02:12 +08:00
Yingfeng	b5bea72e4b	Add git-like file commit API (#15978)

### What problem does this PR solve?

| # | Method | Endpoint | Description | Git Equivalent |
|---|--------|----------|-------------|----------------|
| 1 | `POST` | `/api/v1/{prefix}/{folder_id}/commits` | Create a snapshot commit with file changes (add/modify/delete/rename) | `git add` + `git commit` |
| 2 | `GET` | `/api/v1/{prefix}/{folder_id}/commits` | List commit history (paginated) | `git log` |
| 3 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}` | Get commit detail with file changes | `git show` |
| 4 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files` | List file changes in a commit | `git show --name-status` |
| 5 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/diff?from=...&to=...` | Compare two commits and return differences | `git diff` |
| 6 | `GET` | `/api/v1/{prefix}/{folder_id}/changes` | Get uncommitted changes (add/modify/delete) | `git status` |
| 7 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/tree` | Get the folder tree snapshot at commit time | `git ls-tree` |
| 8 | `GET` | `/api/v1/{prefix}/{folder_id}/commits/{commit_id}/files/{file_id}/content` | Get a file's content as it existed in a specific commit | `git show HEAD:file` |
| 9 | `GET` | `/api/v1/{prefix}/{file_id}/versions` | Get version history for a specific file across all commits | `git log -- file` |

Where `{prefix}/{id}` can be:

- `folders/{folder_id}`: direct folder access
- `workspaces/{workspace_id}`: alias of `folders/{folder_id}`
- `datasets/{dataset_id}`: resolves to the dataset's folder
- `memories/{memory_id}`: resolves to the memory's folder
- `skills/{skill_id}`: resolves to the skill's folder

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update	2026-06-15 11:19:56 +08:00
Kevin Hu	b5a426e6e0	Feat: chat channels — connect assistants to external messaging bots (#15850 ) ### What problem does this PR solve? #15844 Adds a Chat channels capability so a RAGFlow assistant (Dialog) can be exposed as a bot on external messaging platforms (Feishu/Lark, Discord, Telegram, Slack, WeCom, LINE, etc.). An admin configures a bot in the UI, connects it to an assistant, and inbound messages are answered from that assistant's knowledge base — replies are delivered back on the channel. Feishu/Lark is implemented and tested end-to-end. Discord, Telegram, LINE, and WeCom are scaffolded against the same interface; the remaining listed channels are tracked as follow-ups. ### Design Backend - New `chat_channel` table (`tenant_id`, `name`, `channel`, `config` JSON holding `{credential: {...}}`, `dialog_id`, `status`) + `ChatChannelService` and RESTful CRUD under `/api/v1/chat_channels`. - Channel framework under `api/channels/`: a `core` registry + per-channel packages that self-register a builder and implement a common `Channel` interface (`start`/`stop`/`send` + inbound normalization) over `IncomingMessage`/`OutgoingMessage`. - Embedded reconcile loop in `ragflow_server` (`api/channels/bootstrap.py`): loads enabled bots, and starts/stops/restarts them as rows change (no server restart needed). Inbound messages run the connected dialog via the non-streaming completion path, keeping per-end-user conversation history. - Missing optional channel SDKs degrade gracefully (channel skipped with a warning; others unaffected). Channel-level errors are logged, not crashed. - Feishu's WebSocket client runs in a dedicated thread with its own event loop to avoid cross-loop/contextvars conflicts with the channel runtime. Frontend - Settings → Chat channels panel: available-channels grid + configured-bots list with add/edit/delete and a Connect assistant popup that binds a bot to a dialog. 
- Brand icons via simple-icons / reused shared data-source assets, with colored fallbacks for brands not available. - Route, sidebar entry, i18n (en/zh), and a top-nav segment-boundary fix so the settings page no longer highlights the Chat tab. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Notes - DB: new `chat_channel` table is auto-created; `chat_channel.dialog_id` is also covered by a `migrate_db` `alter_db_add_column` for existing installs. - Channel SDKs (`lark-oapi`, `discord.py`, `python-telegram-bot`, `line-bot-sdk`, `wechatpy`, `aiohttp`) added to dependencies. - Screenshots / per-channel credential docs to follow. <img width="1338" height="1290" alt="Image" src="https://github.com/user-attachments/assets/042cb2f9-0dad-4e6a-bcf7-43ced4bbd704" /> <img width="1344" height="738" alt="Image" src="https://github.com/user-attachments/assets/373cd08e-ec40-4c67-9c51-4d948b1ba617" /> <img width="672" height="887" alt="Image" src="https://github.com/user-attachments/assets/5a34953f-a9a3-4c1e-869e-5eff0dc64c84" /> ---------	2026-06-12 18:21:30 +08:00
Rene Arredondo	b978e26208	fix(db): drop Peewee-auto-named unique index on tenant_model_instance (#15699 ) (#15879 ) ## Summary Fixes #15699. User upgrades to v0.25.6 against an existing MySQL database, tries to add an Ollama provider instance, and gets: ``` MySQL IntegrityError: Duplicate entry 'dbaafbfe608a11f1a5516d6066988224' for key 'tenant_model_instance.tenantmodelinstance_api_key_provider_id' ``` The route at [api/apps/restful_apis/provider_api.py:354](api/apps/restful_apis/provider_api.py#L354) catches it and returns `get_error_data_result(message="Internal server error")` — which by RAGFlow's convention is HTTP 200 with an error `code` on the body — hence the reporter's "200 status code but the database errored" complaint. ### Root cause The provider-instance refactor in [PR #15460](https://github.com/infiniflow/ragflow/pull/15460) dropped the unique-compound-index tuple from `TenantModelInstance`: ```python # Removed in #15460 class Meta: db_table = "tenant_model_instance" indexes = ( (("api_key", "provider_id"), True), # unique ) ``` and added a one-shot drop in `migrate_db()` for existing databases. But the drop targets the wrong index name: ```python # Before this PR — wrong name for table_name, index_name in [ ("tenant_model_instance", "idx_api_key_provider_id"), # ← doesn't exist ("tenant_model", "idx_provider_model_instance"), ]: ``` Peewee's auto-derived index name is `<lowercase classname>_<col1>_<col2>` → `tenantmodelinstance_api_key_provider_id`, which matches the user's error verbatim. The drop raises `OperationalError: 1091 (HY000): Can't DROP …`, the surrounding `except` clause at [db_models.py:1736](api/db/db_models.py#L1736) swallows it as expected-on-fresh-installs, and the legacy unique index lives on indefinitely. ### Why Ollama hits it specifically Ollama doesn't require an API key. The form posts `api_key: ""`. 
The app-layer dedupe at [provider_api_service.py:288-292](api/apps/services/provider_api_service.py#L288-L292): ```python api_key_str = "" if api_key: # ← skipped for "" ... same_key_instance = TenantModelInstanceService.get_by_provider_id_and_api_key(...) if same_key_instance: return False, f"Already exist instance: ... with api_key {api_key}" ``` falls through for empty keys. Control reaches `TenantModelInstanceService.create_instance(..., api_key="")` which inserts a row whose `(api_key, provider_id) = ("", <provider_uuid>)` collides with any prior Ollama row that already shipped that same pair → the still-present unique index throws. (`dbaafbfe608a11f1a5516d6066988224` in the user's error is the duplicated `provider_id` UUID, paired with the empty `api_key`.) ### Fix Add the Peewee auto-name alongside the existing `idx_` entry so the migration finally drops the obsolete index on next restart: ```python legacy_indexes = [ ("tenant_model_instance", "idx_api_key_provider_id"), ("tenant_model_instance", "tenantmodelinstance_api_key_provider_id"), # ← added ("tenant_model", "idx_provider_model_instance"), ] ``` The surrounding `try/except (OperationalError, ProgrammingError)` matches `1091` / `can't DROP` / `does not exist` and treats them as success, so every state is idempotent (see Test plan). 
### Idempotency matrix \| Database state \| First entry (`idx_api_key_provider_id`) \| New entry (`tenantmodelinstance_api_key_provider_id`) \| \| --- \| --- \| --- \| \| Fresh install (≥ #15460) — neither index exists \| `1091` → swallowed \| `1091` → swallowed \| \| Upgraded from before `dc4b82523` (the user's case) — auto-name present \| `1091` → swallowed \| drops the index* \| \| Upgraded after a manual rename to `idx_` \| drops the index \| `1091` → swallowed \| \| Re-run of `migrate_db()` after either of the above \| `1091` → swallowed \| `1091` → swallowed \| No rollback hazard: nothing depends on this unique constraint anymore (`create_instance` dedupes by `instance_name` via `duplicate_name`, see [tenant_model_instance_service.py:27](api/db/services/tenant_model_instance_service.py#L27)). ### What this PR does NOT change - `provider_api_service.create_provider_instance`* — its `if api_key:` gate is correct for the post-migration world: multiple Ollama instances with empty keys under one provider are legitimate, so we shouldn't tighten the app-layer check. - `TenantModelInstance` Peewee model — the `indexes` tuple was already removed in #15460. New databases never get the constraint in the first place. - The `except → get_error_data_result` → HTTP 200 pattern at `provider_api.py:354` — that's a project-wide convention; changing one route to HTTP 500 would be inconsistent and out of scope. ## Test plan - [ ] Reproducer (pre-fix): on a database originally created before #15460, configure an Ollama provider with an empty `api_key`, then try to create a second instance under the same provider — confirm the `Duplicate entry … 'tenantmodelinstance_api_key_provider_id'` error in the server log. - [ ] Verify the index is present pre-restart: `SHOW INDEX FROM tenant_model_instance WHERE Key_name = 'tenantmodelinstance_api_key_provider_id';` — non-empty result. 
- [ ] Restart with the fix applied: server starts cleanly, `migrate_db()` runs, no `Failed to drop index` in critical logs. - [ ] Verify the index is gone post-restart: same `SHOW INDEX` query — empty result. - [ ] Re-run the reproducer: two Ollama instances under the same provider, both `api_key=""`, both succeed. - [ ] Restart a second time — no new errors; the matching `1091` swallow keeps the migration idempotent. - [ ] Fresh install smoke test: drop the DB volume, start clean — no `1091` noise (the new index never existed), no functional regression. ## Files changed - [api/db/db_models.py](api/db/db_models.py) — extend the legacy-index drop list with `tenantmodelinstance_api_key_provider_id`; refactor the inline list to a named `legacy_indexes` local with a comment pointing at #15460 and #15699. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-06-11 15:47:12 +08:00
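The idempotent drop described in #15879 can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: SQLite stands in for MySQL, the "already gone" condition is detected through SQLite's `no such index` message rather than MySQL error `1091`, and `LEGACY_INDEXES` and `drop_legacy_indexes` are hypothetical names, not the code in `api/db/db_models.py`.

```python
import sqlite3

# Hypothetical stand-in for the legacy index list; the index names are the
# two quoted in the commit message for tenant_model_instance.
LEGACY_INDEXES = [
    ("tenant_model_instance", "idx_api_key_provider_id"),
    ("tenant_model_instance", "tenantmodelinstance_api_key_provider_id"),
]

INSERT_EMPTY_KEY = (
    "INSERT INTO tenant_model_instance (api_key, provider_id) VALUES ('', 'p1')"
)


def drop_legacy_indexes(conn):
    """Drop each legacy index, treating 'index is already gone' as success."""
    dropped = []
    for _table, index_name in LEGACY_INDEXES:
        try:
            conn.execute(f'DROP INDEX "{index_name}"')
            dropped.append(index_name)
        except sqlite3.OperationalError as exc:
            # SQLite says "no such index"; MySQL would raise error 1091 here.
            if "no such index" not in str(exc).lower():
                raise
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tenant_model_instance "
    "(id INTEGER PRIMARY KEY, api_key TEXT, provider_id TEXT)"
)
# Recreate the legacy state: a unique index carrying the auto-derived name.
conn.execute(
    "CREATE UNIQUE INDEX tenantmodelinstance_api_key_provider_id "
    "ON tenant_model_instance (api_key, provider_id)"
)
conn.execute(INSERT_EMPTY_KEY)

# A second instance with an empty api_key collides while the index exists.
try:
    conn.execute(INSERT_EMPTY_KEY)
    collided = False
except sqlite3.IntegrityError:
    collided = True

first_run = drop_legacy_indexes(conn)   # drops the auto-named index
second_run = drop_legacy_indexes(conn)  # nothing left to drop, no error
conn.execute(INSERT_EMPTY_KEY)          # now succeeds
row_count = conn.execute(
    "SELECT COUNT(*) FROM tenant_model_instance"
).fetchone()[0]
```

Running the drop twice mirrors the "re-run of `migrate_db()`" row of the idempotency matrix: the second pass finds nothing to drop and raises nothing.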
Lynn	3bc5ed282e	Fix: model-provider bugs (#15460) ### What problem does this PR solve? Fix: - Use @ to avoid splitting on `_` in model_name. - Verify api_key when adding an instance. - Remove api_key from the list-instances response. - Remove the unused index. - Sort providers, instances and models by name. - Get `is_tools` from llm_factories.json ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-06-02 13:24:53 +08:00
Lynn	dc4b82523b	Feat: tenant llm provider (#14595 ) ### What problem does this PR solve? Python implementation of the Go-based model_provider API suite. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: bill <yibie_jingnian@163.com>	2026-05-29 17:39:41 +08:00
dale053	c33d0b8081	fix: prevent sensitive fields from leaking in user API responses (#14792 ) Closes #14789 ### What problem does this PR solve? User API endpoints (`login`, `user_profile`, `user_add`, `forget_reset_password`) were returning full user objects via `to_json()` / `to_dict()`, which included sensitive fields like `password` and `access_token` in the response body. This leaks credentials to the client. This PR adds a `to_safe_dict()` method on the `User` model that strips sensitive fields (`password`, `access_token`) and replaces all affected call sites to use it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-22 15:14:26 +08:00
Magicbook1108	b69a6a5d80	Feat: full optimization on connector dashboard (#14979 ) ### What problem does this PR solve? This PR improves the connector dashboard task management experience and adds better visibility into connector execution logs. ### Overview: #### Before <img width="700" alt="image" src="https://github.com/user-attachments/assets/e4a8ed6f-2e18-4f0f-8528-41a514550052" /> #### Now: <img width="700" alt="Screenshot from 2026-05-18 16-31-30" src="https://github.com/user-attachments/assets/d4ca193b-847a-49ae-9e4f-5fbca60ea627" /> ### 1. Add a new logging page to the connector dashboard A new logging page has been added so users can view connector task execution logs directly from the connector dashboard. ### 2. Merge the Resume button into Confirm The separate Resume button has been removed. The Confirm button now represents different actions depending on the current task state: - Save: Save form changes and reschedule tasks. - Stop: Cancel currently scheduled or running tasks. - Resume: Create new scheduled tasks after the previous tasks have been stopped. - Start: Start tasks when no task has been started yet. ### 3. Separate syncing and pruning tasks Connector tasks are now separated into syncing and pruning. Pruning is controlled by the Sync deleted files option: - When Sync deleted files is disabled, only syncing tasks are shown. - When Sync deleted files is enabled, both syncing and pruning tasks are shown. Now: Sync deleted files disabled <img width="700" alt="Sync deleted files disabled" src="https://github.com/user-attachments/assets/dbd9232e-614a-407f-a0b1-c109e5fa567d" /> Now: Sync deleted files enabled <img width="700" alt="Sync deleted files enabled" src="https://github.com/user-attachments/assets/1f527f48-ccb3-4ee8-97ca-086891489296" /> ### 4. Update logs in backend <img width="700" alt="image" src="https://github.com/user-attachments/assets/10a95a3f-98c1-4e67-8afa-ddf6cda5b0b2" /> ### 5. 
Remove connector resume API - Removed: `POST /v1/connectors/<connector_id>/resume` - Replaced by: `PATCH /v1/connectors/<connector_id>` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-19 10:07:11 +08:00
plind	dd76653dc1	feat: add tag management for Agents with filtering and sorting (#14774 ) (#14799 ) ## Summary Closes #14774. Adds free-form tags on agents (UserCanvas) with full UI + API: - Stored as comma-separated `tags` column on `UserCanvas` with online migration. - New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT /v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query. - "Edit tags" item in agent dropdown opens a chip-style editor dialog; tags render as badges on each agent card. - New "Tags" facet in the agents filter bar, with counts. ## Implementation notes - Tag matching is exact-token: the SQL filter wraps stored tags as `,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`. - Server-side normalization in `UserCanvasService.update_tags`: dedup (case-insensitive), per-tag cap of 64 chars, total length capped at 512 chars to fit the column, commas inside tag values are replaced with spaces. - Tenant authorization: `PUT /v1/agent/<id>/tags` gates on `UserCanvasService.accessible(canvas_id, tenant_id)`. - Tag listing scope: `UserCanvasService.list_tags` follows the same own + team-shared rule as `get_by_tenant_ids`. - i18n: keys added to `en.ts` and `zh.ts` only (per project convention; other locales fall back). - `HomeCard` gets a non-breaking `extra?: ReactNode` slot for the chip row; no `src/components/ui/` files modified. ## Test plan - [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column exists (`DESCRIBE user_canvas`). - [ ] Agents page renders cards normally (no console error from missing field). - [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog was unmounting with the dropdown). - [ ] Typing a tag without pressing Enter and clicking Save persists it (regression: last typed tag was being dropped). - [ ] Chip input supports Enter/comma to commit, Backspace on empty to remove, `×` to remove individual chip. 
- [ ] Tag containing a comma sent via API is stored with the comma replaced by a space. - [ ] 20 long tags sent via API does not error (length cap silently truncates). - [ ] "Tags" filter in the filter bar shows counts and narrows the list. - [ ] Filtering by `ml` does not return agents tagged `ml-ops`. - [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc. - [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or no permission.`	2026-05-13 21:41:32 +08:00
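The tag rules listed in #14799 (case-insensitive dedup, 64-character per-tag cap, 512-character total cap, commas replaced with spaces, exact-token matching) can be sketched in plain Python. The function names are hypothetical, and the way the total cap is enforced here (dropping tags that no longer fit) is an assumption; the commit only says the cap "silently truncates".

```python
def normalize_tags(raw_tags, per_tag_cap=64, total_cap=512):
    """Return a comma-separated tag string following the rules in the commit."""
    seen = set()
    kept = []
    for tag in raw_tags:
        # Commas would break the comma-separated storage, so they become spaces.
        cleaned = " ".join(tag.replace(",", " ").split())[:per_tag_cap].strip()
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue  # skip empty tags and case-insensitive duplicates
        if len(",".join(kept + [cleaned])) > total_cap:
            break  # assumption: stop adding tags once the column would overflow
        seen.add(key)
        kept.append(cleaned)
    return ",".join(kept)


def has_tag(stored, wanted):
    """Exact-token match: wrap both sides in commas so 'ml' != 'ml-ops'."""
    return f",{wanted}," in f",{stored},"


stored = normalize_tags(["ML", "ml", "ml-ops", "a,b"])
long_tags = normalize_tags([f"{i:02d}" + "x" * 100 for i in range(20)])
```

Wrapping the stored string as `,…,` before searching is what keeps `ml` from matching inside `ml-ops`; a SQL `LIKE '%,ml,%'` filter over the wrapped column would follow the same idea.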
Jin Hai	1d0519d025	Fix secret key inconsistency across the RAGFlow servers (#14591) ### What problem does this PR solve? Consider two API servers, A and B, sharing one Redis server. If A and Redis restart, B keeps holding the obsolete secret key, which leads to errors. TODO: app.config['SECRET_KEY'] and app.secret_key still hold the obsolete secret key. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-07 10:10:02 +08:00
euvre	2846a93998	Fix: Remove hardcoded page limits causing parsing failures on large PDFs (>300 pages) (#14382 ) ### What problem does this PR solve? Fixes #14196 ## Problem When using DeepDOC to parse large PDFs (over 1000 pages), the parser silently truncated processing at 300 pages due to a hardcoded default `page_to=299` in `RAGFlowPdfParser.__images__()`. This caused: - Errors on pages beyond the limit - Poor image quality as the parser attempted to compensate with missing page data - Inconsistent chunk splitting between full PDF imports and partial imports Additionally, the codebase scattered magic numbers (`299`, `600`, `10000`, `100000`, `100000000`, `10000000000`, `10*9`) across 22 files as sentinel values for "parse all pages", making future maintenance error-prone. ## Root Cause ```python # deepdoc/parser/pdf_parser.py (before) def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None): # Only the first 300 pages were rendered; everything beyond was silently dropped ``` While most callers in `rag/app/.py` correctly passed `to_page=100000`, the base class `RAGFlowPdfParser.__call__()` and `parse_into_bboxes()` invoked `__images__` without forwarding `page_from`/`page_to`, falling back to the restrictive default of 299. ## Solution ### 1. Define constants in `common/constants.py` ```python MAXIMUM_PAGE_NUMBER = 100000 # Used by the parsing layer MAXIMUM_TASK_PAGE_NUMBER = MAXIMUM_PAGE_NUMBER * 1000 # Used by the task/DB layer ``` ### 2. 
Replace all hardcoded sentinel values

| Layer | Files Changed | Old Values | New Value |
|---|---|---|---|
| Deepdoc parsers | `pdf_parser.py`, `mineru_parser.py`, `docling_parser.py`, `opendataloader_parser.py`, `paddleocr_parser.py`, `docx_parser.py` | `299`, `600`, `109`, `100000000` | `MAXIMUM_PAGE_NUMBER` |
| Chunk parsers | `naive.py`, `book.py`, `qa.py`, `one.py`, `manual.py`, `paper.py`, `presentation.py`, `laws.py`, `resume.py`, `email.py`, `table.py` | `100000`, `10000`, `10000000000` | `MAXIMUM_PAGE_NUMBER` |
| Task/DB layer | `db_models.py`, `task_service.py`, `document_service.py`, `file_service.py` | `100000000` | `MAXIMUM_TASK_PAGE_NUMBER` |

### 3. Fix `parse_into_bboxes()` missing parameters

Added `from_page`/`to_page` parameters to `parse_into_bboxes()` so that the `rag/flow/parser/parser.py` DeepDOC path no longer falls back to the restrictive default.

## Files Changed (22)

- `common/constants.py`
- `deepdoc/parser/pdf_parser.py`
- `deepdoc/parser/mineru_parser.py`
- `deepdoc/parser/docling_parser.py`
- `deepdoc/parser/opendataloader_parser.py`
- `deepdoc/parser/paddleocr_parser.py`
- `deepdoc/parser/docx_parser.py`
- `rag/app/naive.py`
- `rag/app/book.py`
- `rag/app/qa.py`
- `rag/app/one.py`
- `rag/app/manual.py`
- `rag/app/paper.py`
- `rag/app/presentation.py`
- `rag/app/laws.py`
- `rag/app/resume.py`
- `rag/app/email.py`
- `rag/app/table.py`
- `api/db/db_models.py`
- `api/db/services/task_service.py`
- `api/db/services/document_service.py`
- `api/db/services/file_service.py`

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

--------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 14:57:20 +08:00
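The forwarding fix in #14382 can be illustrated with a small sketch. The two constants are the ones quoted in the commit; `render_pages` and `parse_into_bboxes_sketch` are hypothetical stand-ins for the parser methods, and the half-open page range is an assumption made for the example rather than a statement about `pdf_parser.py`.

```python
MAXIMUM_PAGE_NUMBER = 100000  # used by the parsing layer
MAXIMUM_TASK_PAGE_NUMBER = MAXIMUM_PAGE_NUMBER * 1000  # used by the task/DB layer


def render_pages(total_pages, page_from=0, page_to=MAXIMUM_PAGE_NUMBER):
    """Return the page indices to render in the half-open range [page_from, page_to)."""
    return list(range(page_from, min(page_to, total_pages)))


def parse_into_bboxes_sketch(total_pages, from_page=0, to_page=MAXIMUM_PAGE_NUMBER):
    """Forward the caller's page window instead of relying on the callee's default."""
    return render_pages(total_pages, page_from=from_page, page_to=to_page)


all_pages = parse_into_bboxes_sketch(1200)
window = parse_into_bboxes_sketch(1200, from_page=10, to_page=20)
```

Because the default is a named constant shared by every caller, a wrapper that forgets to forward its arguments still parses the whole document instead of stopping at a small hardcoded limit.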
euvre	9a785b26bd	fix: change file size column from IntegerField to BigIntegerField to support files > 2GB (#14148 ) ### What problem does this PR solve? Fixes #6034 Changes the `size` field in both `Document` and `File` models from `IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max ~9.2EB), and adds corresponding database migrations. ## Problem When uploading a file larger than 2GB, the `size` value overflows a 32-bit signed integer (max 2,147,483,647). This causes: - The stored `size` wraps around to an incorrect value (e.g., a 3GB file shows as 2,097,152 KB in File Management). - Subsequent file operations (e.g., download) fail because the corrupted size leads to invalid storage lookups. ## Changes - `Document.size`: `IntegerField` → `BigIntegerField` - `File.size`: `IntegerField` → `BigIntegerField` - Added `alter_db_column_type` migrations in `migrate_db()` for both `document.size` and `file.size` columns to ensure existing deployments are upgraded automatically. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-16 15:43:29 +08:00
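The size limits behind #14148 can be checked with the standard-library `struct` module. This is a sketch of the integer ranges only; how a given database reacts to an out-of-range value (error, wrap, or clamp) depends on the engine and its mode and is not modelled here.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed column can hold
INT64_MAX = 2**63 - 1  # largest value a 64-bit signed column can hold

three_gib = 3 * 1024**3  # size in bytes of a 3 GiB file


def fits(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


fits_32 = fits("<i", three_gib)  # 32-bit signed integer
fits_64 = fits("<q", three_gib)  # 64-bit signed integer
```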
bitloi	853021ff2a	feat: support multiple canvas_types for agent templates and remove duplicate files (#14030 ) ### What problem does this PR solve? Closes #13907 The template catalog had duplicate files (e.g. `*_r.json`) only to place the same template into multiple sidebar groups. This increases maintenance cost and makes template updates error-prone. This PR adds first-class support for multiple template categories in a single file via `canvas_types`, then removes duplicate template files. What changed: - Added `canvas_types` to `CanvasTemplate` model and DB migration. - Added normalization logic when loading templates: - accepts legacy `canvas_type` - accepts new `canvas_types` - merges/deduplicates values - preserves backward compatibility by keeping `canvas_type` as first normalized value. - Updated template import flow to load only `.json` files and in stable sorted order. - Updated frontend template filtering to match on `canvas_types` first, with fallback to legacy `canvas_type`. - Consolidated duplicated template pairs into single files and removed: - `deep_search_r.json` - `reflective_academic_paper_generator_r.json` - `seo_article_writer_r.json` - Added regression/edge-case tests for category normalization and route serialization expectations. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-04-13 20:26:30 +08:00
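The category normalization described in #14030 can be sketched as follows. `normalize_canvas_types` is a hypothetical helper; the ordering (legacy `canvas_type` first, then `canvas_types`) is an assumption chosen so that an existing `canvas_type` stays the first normalized value.

```python
def normalize_canvas_types(template):
    """Merge legacy canvas_type and new canvas_types into one deduplicated list."""
    candidates = []
    legacy = template.get("canvas_type")
    if isinstance(legacy, str):
        candidates.append(legacy)
    new_values = template.get("canvas_types") or []
    if isinstance(new_values, str):
        new_values = [new_values]  # tolerate a single string instead of a list
    candidates.extend(new_values)

    merged = []
    seen = set()
    for value in candidates:
        if not isinstance(value, str):
            continue
        name = value.strip()
        if name and name not in seen:
            seen.add(name)
            merged.append(name)

    result = dict(template)  # leave the caller's dict untouched
    result["canvas_types"] = merged
    # Backward compatibility: canvas_type mirrors the first normalized value.
    result["canvas_type"] = merged[0] if merged else None
    return result


legacy_only = normalize_canvas_types({"canvas_type": "Agent"})
both = normalize_canvas_types(
    {"canvas_type": "Agent", "canvas_types": ["Recommended", "Agent"]}
)
```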
Yongteng Lei	1b29522279	Fix: migrate_add_unique_email silently skips unique constraint (#13744) ### What problem does this PR solve? Fix `migrate_add_unique_email` silently skipping the unique constraint when a non-unique `user_email` index already exists. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-24 20:24:24 +08:00
balibabu	6cae364ac2	Feat: Export Agent Logs. (#13658 ) ### What problem does this PR solve? Feat: Export Agent Logs. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-17 18:51:26 +08:00
Ethan T.	71804bf5bc	fix(db_models): guard MySQL-specific SQL in migration with DB_TYPE check (fixes #13544 ) (#13582 ) ## Summary Fixes #13544: PostgreSQL startup crash because `update_tenant_llm_to_id_primary_key()` unconditionally uses MySQL-specific SQL. - Split `update_tenant_llm_to_id_primary_key()` into `_update_tenant_llm_to_id_primary_key_mysql()` and `_update_tenant_llm_to_id_primary_key_postgres()`, dispatching on `settings.DATABASE_TYPE` - MySQL path: unchanged (existing `DATABASE()`, `SET @row = 0`, `AUTO_INCREMENT`, `DROP PRIMARY KEY` logic) - PostgreSQL path: uses `current_database()`, `ROW_NUMBER() OVER (ORDER BY ...)` for sequential IDs, `CREATE SEQUENCE` + `nextval()` for auto-increment, and `information_schema.table_constraints` to find the PK constraint name - Also fix `migrate_add_unique_email()`: MySQL-only `information_schema.statistics` is replaced with `pg_indexes` on PostgreSQL ## Test plan - [ ] Start RAGFlow with `DB_TYPE=postgres` — startup should complete without `function database() does not exist` error - [ ] Start RAGFlow with `DB_TYPE=mysql` (default) — existing behaviour unchanged, migration runs as before - [ ] Fresh PostgreSQL install: verify `tenant_llm.id` column is created as a serial primary key after migration - [ ] Idempotency: running migration twice on PostgreSQL should be a no-op (column already exists check passes) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: gambletan <gambletan@github> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 11:53:01 +08:00
balibabu	aaf900cf16	Feat: Display release status in agent version history. (#13479 ) ### What problem does this PR solve? Feat: Display release status in agent version history. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-10 14:25:27 +08:00
BitToby	383986dc5f	fix: re-chunk documents when data source content is updated (#12918 ) Closes: #12889 ### What problem does this PR solve? When syncing external data sources (e.g., Jira, Confluence, Google Drive), updated documents were not being re-chunked. The raw content was correctly updated in blob storage, but the vector database retained stale chunks, causing search results to return outdated information. Root cause: The task digest used for chunk reuse optimization was calculated only from parser configuration fields (`parser_id`, `parser_config`, `kb_id`, etc.), without any content-dependent fields. When a document's content changed but the parser configuration remained the same, the system incorrectly reused old chunks instead of regenerating new ones. Example scenario: 1. User syncs a Jira issue: "Meeting scheduled for Monday" 2. User updates the Jira issue to: "Meeting rescheduled to Friday" 3. User triggers sync again 4. Raw content panel shows updated text ✓ 5. Chunk panel still shows old text "Monday" ✗ Solution: 1. Include `update_time` and `size` in the chunking config, so the task digest changes when document content is updated 2. Track updated documents separately in `upload_document()` and return them for processing 3. Process updated documents through the re-parsing pipeline to regenerate chunks [1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 12:48:47 +08:00
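The root cause in #12918 can be illustrated with a minimal digest sketch. `task_digest` is hypothetical and uses SHA-256 over sorted JSON purely for illustration; the hashing RAGFlow actually uses may differ. The field names come from the commit message, and the values are made up.

```python
import hashlib
import json


def task_digest(chunking_config):
    """Hash a chunking config deterministically (sorted keys, UTF-8 JSON)."""
    payload = json.dumps(chunking_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


parser_fields = {
    "parser_id": "naive",
    "parser_config": {"chunk_token_num": 512},
    "kb_id": "kb-1",
}

# Before the fix: the digest ignores content, so an edited document looks identical.
digest_v1_old = task_digest(parser_fields)
digest_v2_old = task_digest(parser_fields)

# After the fix: update_time and size make the digest content-dependent.
digest_v1_new = task_digest({**parser_fields, "update_time": 1000, "size": 28})
digest_v2_new = task_digest({**parser_fields, "update_time": 2000, "size": 31})
```

With identical digests the chunk-reuse path has no way to notice the edit; once `update_time` and `size` are part of the input, any content change that touches either field yields a new digest and triggers re-chunking.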
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Magicbook1108	47540a4147	Feat: published agent version control (#13410 ) ### What problem does this PR solve? Feat: published agent version control ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-05 17:26:39 +08:00
as-ondewo	194e076e26	Fix: init superuser can create duplicate users (#13221) ### What problem does this PR solve? This PR fixes 2 bugs related to RAGFlow's init superuser functionality. #### Bug 1 When the RAGFlow server was started with the `--init-superuser` option, it would always create a new admin user even if one already existed, resulting in duplicate users. To fix this, I added an additional check before creating the superuser and added a unique constraint to the email column of the database to mitigate potential TOCTOU race conditions. Since existing databases could contain duplicate emails, I added email de-duplication to the database migration. #### Bug 2 When the RAGFlow server was started with the `--init-superuser` option but without configured default LLM and embedding models, it would fail to start because the `init_superuser` function would always make a test request to the models even if they were not set. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 19:55:51 +08:00
Kevin Hu	1262533b74	Feat: support verify to set llm key and boost bigrams. (#12980 ) #12863 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-05 19:19:09 +08:00
Liu An	1b587013d8	Fix: remove unused imports and f-string formatting (#12935 ) ### What problem does this PR solve? - Remove unused imports (Mock, patch, MagicMock, json, os, RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files - Replace f-string formatting with regular strings for console output messages in cli.py - Clean up unnecessary imports that were no longer being used in the codebase ### Type of change - [x] Refactoring	2026-02-02 12:11:39 +08:00
NTLx	c4c3f744c0	feat: add Peewee ORM support for OceanBase as primary database (#12769 ) (#12926 ) ## Summary This PR adds Peewee ORM support for OceanBase as the primary database in RAGFlow, as requested in issue #12769. ## Changes ### Core Implementation 1. RetryingPooledOceanBaseDatabase Class - Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible) - Implements retry mechanism for connection issues - Handles MySQL-specific error codes (2013, 2006 for connection loss) - Provides connection pool management 2. PooledDatabase Enum - Added `OCEANBASE = RetryingPooledOceanBaseDatabase` 3. DatabaseLock Enum - Added `OCEANBASE = MysqlDatabaseLock` - OceanBase uses MySQL-style locking 4. TextFieldType Enum - Added `OCEANBASE = "LONGTEXT"` - OceanBase uses same text field type as MySQL 5. DatabaseMigrator Enum - Added `OCEANBASE = MySQLMigrator` - OceanBase uses MySQL migration tools ### Usage ```bash # Set environment variable to use OceanBase export DB_TYPE=oceanbase # Configure connection (in docker/.env or environment) OCEANBASE_HOST=localhost OCEANBASE_PORT=2881 OCEANBASE_USER=root OCEANBASE_PASSWORD=password OCEANBASE_DATABASE=ragflow ``` ### Technical Details - Location: `api/db/db_models.py` - Dependencies: No new dependencies (uses existing Peewee MySQL support) - Code Size: ~90 lines - Difficulty: Simple ### Testing - Added comprehensive unit tests in `tests/unit/test_oceanbase_peewee.py` - Tests cover: - OceanBase database class existence and inheritance - Enum values for PooledDatabase, DatabaseLock, TextFieldType - Initialization with custom retry settings - Environment variable configuration ### Acceptance Criteria ✅ Can switch to OceanBase database via `DB_TYPE=oceanbase` environment variable ✅ All database operations work normally in OceanBase environment ✅ OceanBase uses MySQL compatibility mode (no additional dependencies) ### Background This is part of the RAGFlow + OceanBase Hackathon to allow users to choose OceanBase as RAGFlow's 
primary database, leveraging OceanBase's high availability and scalability. --- ## Related Issues - Primary: https://github.com/infiniflow/ragflow/issues/12769 - Context: https://github.com/oceanbase/seekdb/issues/123 (OceanBase Developer Challenge) --- Closes infiniflow/ragflow#12769	2026-01-31 15:45:20 +08:00
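The enum-based switch described in the commit above can be sketched as follows. This is a minimal illustration, not the code in `api/db/db_models.py`: only the `TextFieldType` values quoted in the commit message are used, and the helper names `text_field_type` and `selected_text_field_type` and the `mysql` default are assumptions.

```python
import os
from enum import Enum


class TextFieldType(Enum):
    MYSQL = "LONGTEXT"
    # Same value as MYSQL, so Python's Enum makes this member an alias of MYSQL.
    OCEANBASE = "LONGTEXT"


def text_field_type(db_type: str) -> str:
    """Look up the text column type by database name (hypothetical helper)."""
    return TextFieldType[db_type.upper()].value


def selected_text_field_type(env=os.environ) -> str:
    """Resolve the column type from DB_TYPE; the "mysql" default is an assumption."""
    return text_field_type(env.get("DB_TYPE", "mysql"))
```

Because both members share the value `"LONGTEXT"`, `TextFieldType.OCEANBASE` and `TextFieldType.MYSQL` are the same object, so lookup by name works but code that iterates over the enum would only see `MYSQL`.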
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. Index name of meta data: ragflow_doc_meta_{tenant_id} ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
Zhichang Yu	fd11aca8e5	feat: Implement pluggable multi-provider sandbox architecture (#12820 ) ## Summary Implement a flexible sandbox provider system supporting both self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for secure code execution in agent workflows. Key Changes: - ✅ Aliyun Code Interpreter provider using official `agentrun-sdk>=0.0.16` - ✅ Self-managed provider with gVisor (runsc) security - ✅ Arguments parameter support for dynamic code execution - ✅ Database-only configuration (removed fallback logic) - ✅ Configuration scripts for quick setup Issue #12479 ## Features ### 🔌 Provider Abstraction Layer 1. Self-Managed Provider (`agent/sandbox/providers/self_managed.py`) - Wraps existing executor_manager HTTP API - gVisor (runsc) for secure container isolation - Configurable pool size, timeout, retry logic - Languages: Python, Node.js, JavaScript - ⚠️ Requires: gVisor installation, Docker, base images 2. Aliyun Code Interpreter (`agent/sandbox/providers/aliyun_codeinterpreter.py`) - SaaS integration using official agentrun-sdk - Serverless microVM execution with auto-authentication - Hard timeout: 30 seconds max - Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`, `AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION` - Automatically wraps code to call `main()` function 3. E2B Provider (`agent/sandbox/providers/e2b.py`) - Placeholder for future integration ### ⚙️ Configuration System - `conf/system_settings.json`: Default provider = `aliyun_codeinterpreter` - `agent/sandbox/client.py`: Enforces database-only configuration - Admin UI: `/admin/sandbox-settings` - Configuration validation via `validate_config()` method - Health checks for all providers ### 🎯 Key Capabilities Arguments Parameter Support: All providers support passing arguments to `main()` function: ```python # User code def main(name: str, count: int) -> dict: return {"message": f"Hello {name}!" 
* count} # Executed with: arguments={"name": "World", "count": 3} # Result: {"message": "Hello World!Hello World!Hello World!"} ``` Self-Describing Providers: Each provider implements `get_config_schema()` returning form configuration for Admin UI Error Handling: Structured `ExecutionResult` with stdout, stderr, exit_code, execution_time ## Configuration Scripts Two scripts for quick Aliyun sandbox setup: Shell Script (requires jq): ```bash source scripts/configure_aliyun_sandbox.sh ``` Python Script (interactive): ```bash python3 scripts/configure_aliyun_sandbox.py ``` ## Testing ```bash # Unit tests uv run pytest agent/sandbox/tests/test_providers.py -v # Aliyun provider tests uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v # Integration tests (requires credentials) uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v # Quick SDK validation python3 agent/sandbox/tests/verify_sdk.py ``` Test Coverage: - 30 unit tests for provider abstraction - Provider-specific tests for Aliyun - Integration tests with real API - Security tests for executor_manager ## Documentation - `docs/develop/sandbox_spec.md` - Complete architecture specification - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy sandbox - `agent/sandbox/tests/QUICKSTART.md` - Quick start guide - `agent/sandbox/tests/README.md` - Testing documentation ## Breaking Changes ⚠️ Migration Required: 1. Directory Move: `sandbox/` → `agent/sandbox/` - Update imports: `from sandbox.` → `from agent.sandbox.` 2. Mandatory Configuration: - SystemSettings must have `sandbox.provider_type` configured - Removed fallback default values - Configuration must exist in database (from `conf/system_settings.json`) 3. Aliyun Credentials: - Requires `AGENTRUN_` environment variables (not `ALIYUN_`) - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID) 4. 
Self-Managed Provider: - gVisor (runsc) must be installed for security - Install: `go install gvisor.dev/gvisor/runsc@latest` ## Database Schema Changes ```python # SystemSettings.value: CharField → TextField api/db/db_models.py: Changed for unlimited config length # SystemSettingsService.get_by_name(): Fixed query precision api/db/services/system_settings_service.py: startswith → exact match ``` ## Files Changed ### Backend (Python) - `agent/sandbox/providers/base.py` - SandboxProvider ABC interface - `agent/sandbox/providers/manager.py` - ProviderManager - `agent/sandbox/providers/self_managed.py` - Self-managed provider - `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider - `agent/sandbox/providers/e2b.py` - E2B provider (placeholder) - `agent/sandbox/client.py` - Unified client (enforces DB-only config) - `agent/tools/code_exec.py` - Updated to use provider system - `admin/server/services.py` - SandboxMgr with registry & validation - `admin/server/routes.py` - 5 sandbox API endpoints - `conf/system_settings.json` - Default: aliyun_codeinterpreter - `api/db/db_models.py` - TextField for SystemSettings.value - `api/db/services/system_settings_service.py` - Exact match query ### Frontend (TypeScript/React) - `web/src/pages/admin/sandbox-settings.tsx` - Settings UI - `web/src/services/admin-service.ts` - Sandbox service functions - `web/src/services/admin.service.d.ts` - Type definitions - `web/src/utils/api.ts` - Sandbox API endpoints ### Documentation - `docs/develop/sandbox_spec.md` - Architecture spec - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide - `agent/sandbox/tests/QUICKSTART.md` - Quick start - `agent/sandbox/tests/README.md` - Testing guide ### Configuration Scripts - `scripts/configure_aliyun_sandbox.sh` - Shell script (jq) - `scripts/configure_aliyun_sandbox.py` - Python script ### Tests - `agent/sandbox/tests/test_providers.py` - 30 unit tests - `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests - 
`agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` - Integration tests - `agent/sandbox/tests/verify_sdk.py` - SDK validation ## Architecture ``` Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged\|Aliyun\|E2B] ↓ SystemSettings ``` ## Usage ### 1. Configure Provider Via Admin UI: 1. Navigate to `/admin/sandbox-settings` 2. Select provider (Aliyun Code Interpreter / Self-Managed) 3. Fill in configuration 4. Click "Test Connection" to verify 5. Click "Save" to apply Via Configuration Scripts: ```bash # Aliyun provider export AGENTRUN_ACCESS_KEY_ID="xxx" export AGENTRUN_ACCESS_KEY_SECRET="yyy" export AGENTRUN_ACCOUNT_ID="zzz" export AGENTRUN_REGION="cn-shanghai" source scripts/configure_aliyun_sandbox.sh ``` ### 2. Restart Service ```bash cd docker docker compose restart ragflow-server ``` ### 3. Execute Code in Agent ```python from agent.sandbox.client import execute_code result = execute_code( code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}', language="python", timeout=30, arguments={"name": "World"} ) print(result.stdout) # {"message": "Hello World!"} ``` ## Troubleshooting ### "Container pool is busy" (Self-Managed) - Cause: Pool exhausted (default: 1 container in `.env`) - Fix: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+ ### "Sandbox provider type not configured" - Cause: Database missing configuration - Fix: Run config script or set via Admin UI ### "gVisor not found" - Cause: runsc not installed - Fix: `go install gvisor.dev/gvisor/runsc@latest && sudo cp ~/go/bin/runsc /usr/local/bin/` ### Aliyun authentication errors - Cause: Wrong environment variable names - Fix: Use `AGENTRUN_` prefix (not `ALIYUN_`) ## Checklist - [x] All tests passing (30 unit tests + integration tests) - [x] Documentation updated (spec, migration guide, quickstart) - [x] Type definitions added (TypeScript) - [x] Admin UI implemented - [x] Configuration validation - [x] Health checks implemented - [x] Error handling with 
structured results - [x] Breaking changes documented - [x] Configuration scripts created - [x] gVisor requirements documented Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-28 13:28:21 +08:00
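The provider abstraction in the commit above can be sketched roughly as below. `SandboxProvider` and the `ExecutionResult` fields come from the commit message; the `execute_code` method signature and the `InProcessProvider` class are assumptions for illustration. The toy provider runs code in the current process with no isolation, so it is not a substitute for the gVisor or Aliyun backends.

```python
import json
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float


class SandboxProvider(ABC):
    @abstractmethod
    def execute_code(self, code: str, arguments: dict) -> ExecutionResult:
        """Run user code and call its main() with the given arguments."""


class InProcessProvider(SandboxProvider):
    """Hypothetical toy provider: no sandboxing, for illustration only."""

    def execute_code(self, code: str, arguments: dict) -> ExecutionResult:
        start = time.perf_counter()
        namespace: dict = {}
        try:
            exec(code, namespace)  # define the user's main()
            result = namespace["main"](**arguments)  # wrap: call main(**arguments)
            return ExecutionResult(json.dumps(result), "", 0, time.perf_counter() - start)
        except Exception as exc:
            # Report failures through the structured result instead of raising.
            return ExecutionResult("", repr(exc), 1, time.perf_counter() - start)
```

Returning errors inside `ExecutionResult` keeps the caller's handling uniform across providers, which is one plausible reading of the "structured results" item in the checklist.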
Mohan	0a8eb11c3d	fix: Add proper error handling for database reconnection attempts (#12650 ) ## Problem When database connection is lost, the reconnection logic had a bug: if the first reconnect attempt failed, the second attempt was not wrapped in error handling, causing unhandled exceptions. ## Solution Added proper try-except blocks around the second reconnect attempt in both MySQL and PostgreSQL database classes to ensure errors are properly logged and handled. ## Changes - Fixed `_handle_connection_loss()` in `RetryingPooledMySQLDatabase` - Fixed `_handle_connection_loss()` in `RetryingPooledPostgresqlDatabase` Fixes #12294 --- Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=158349177 Co-authored-by: SID <158349177+0xsid0703@users.noreply.github.com>	2026-01-19 09:48:10 +08:00
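The fix described in the commit above, wrapping the second reconnect attempt in its own error handling, can be sketched as follows. This is not the body of `_handle_connection_loss()`; the function name, the `close`/`connect` callables, and the choice to re-raise after logging are assumptions.

```python
import logging
import time


def handle_connection_loss(close, connect, retry_delay: float = 0.0) -> None:
    """Try to reconnect twice; log each failure and re-raise the second one."""
    try:
        close()
        connect()
        return
    except Exception as first_error:
        logging.warning("First reconnect attempt failed: %s", first_error)

    time.sleep(retry_delay)
    try:
        # The second attempt is also guarded, so its failure is logged
        # before it propagates instead of surfacing as an unlogged error.
        close()
        connect()
    except Exception as second_error:
        logging.error("Second reconnect attempt failed: %s", second_error)
        raise
```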
Jin Hai	14c250e3d7	Fix adding column error (#12503 ) ### What problem does this PR solve? 1. Fix redundant column adding 2. Refactor the code ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-08 16:44:53 +08:00
Jin Hai	5ebe334a2f	Refactor setting type (#12425 ) ### What problem does this PR solve? Refactor setting type ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-04 20:26:12 +08:00
Jin Hai	ac9113b0ef	feature: add system setting service (#12408 ) ### What problem does this PR solve? #12409 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-04 14:21:39 +08:00
Jin Hai	6044314811	Fix text issue (#12221 ) ### What problem does this PR solve? Fix several text issues. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-26 11:18:08 +08:00
Lynn	a1164b9c89	Feat/memory (#11812 ) ### What problem does this PR solve? Manage and display memory datasets. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 13:34:08 +08:00
hsparks-codes	237a66913b	Feat: RAG evaluation (#11674 ) ### What problem does this PR solve? Feature: This PR implements a comprehensive RAG evaluation framework to address issue #11656. Problem: Developers using RAGFlow lack systematic ways to measure RAG accuracy and quality. They cannot objectively answer: 1. Are RAG results truly accurate? 2. How should configurations be adjusted to improve quality? 3. How to maintain and improve RAG performance over time? Solution: This PR adds a complete evaluation system with: - Dataset & test case management - Create ground truth datasets with questions and expected answers - Automated evaluation - Run RAG pipeline on test cases and compute metrics - Comprehensive metrics - Precision, recall, F1 score, MRR, hit rate for retrieval quality - Smart recommendations - Analyze results and suggest specific configuration improvements (e.g., "increase top_k", "enable reranking") - 20+ REST API endpoints - Full CRUD operations for datasets, test cases, and evaluation runs Impact: Enables developers to objectively measure RAG quality, identify issues, and systematically improve their RAG systems through data-driven configuration tuning. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:00:58 +08:00
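The retrieval metrics named in the commit above can be computed per query roughly as shown below. This is a generic sketch of the standard definitions, not the PR's implementation; the function name and return keys are assumptions, and it assumes `retrieved` contains no duplicate IDs. MRR and hit rate are then the means of `reciprocal_rank` and `hit` over all test cases.

```python
def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
    """Score one ranked result list against its ground-truth relevant IDs."""
    # 1-based ranks at which a relevant document appears.
    hit_ranks = [rank for rank, doc in enumerate(retrieved, start=1) if doc in relevant]
    n_hits = len(hit_ranks)

    precision = n_hits / len(retrieved) if retrieved else 0.0
    recall = n_hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": 1.0 / hit_ranks[0] if hit_ranks else 0.0,
        "hit": 1.0 if hit_ranks else 0.0,
    }
```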
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figures and tables. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` is shown for demonstration purposes. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
Jin Hai	61cf430dbb	Minor tweaks (#11271 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-16 19:29:20 +08:00
Kevin Hu	d207291217	Fix: add download stats to kb logs. (#11112 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-10 13:28:07 +08:00
Kevin Hu	34283d4db4	Feat: add data source to pipeline logs. (#11075 ) ### What problem does this PR solve? #10953 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-07 11:43:59 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Jin Hai	16d2be623c	Minor tweaks (#10987 ) ### What problem does this PR solve? 1. Rename identifier name 2. Fix some return statement 3. Fix some typos ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 14:15:31 +08:00
Kevin Hu	3e5a39482e	Feat: Support multiple data sources synchronizations (#10954 ) ### What problem does this PR solve? #10953 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 19:59:18 +08:00
Jin Hai	6447b737ab	Move singleton to common directory (#10935 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 12:24:08 +08:00
Billy Bao	55eb525fdc	Feat: rename file to avoid package name conflict (#10863 ) ### What problem does this PR solve? Feat: rename file to avoid package name conflict ### Type of change - [x] Refactoring	2025-10-29 12:19:57 +08:00
Jin Hai	5a200f7652	Add time utils (#10849 ) ### What problem does this PR solve? - Add time utilities and unit tests ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-28 19:09:14 +08:00
Andrea Bugeja	8a41057236	Fix: Add RetryingPooledPostgresqlDatabase to handle max_retries param (#10524 ) ## What problem does this PR solve? Fixes the PostgreSQL connection error that prevents RAGFlow from starting: peewee.ProgrammingError: invalid dsn: invalid connection option "max_retries" ## Problem Analysis The `BaseDataBase` class in `api/db/db_models.py` adds `max_retries` and `retry_delay` to the database configuration dict before passing it to the database connection constructor. - MySQL: Has `RetryingPooledMySQLDatabase` class that properly extracts these custom parameters using `kwargs.pop()` before calling the parent constructor - PostgreSQL: Was using the base `PooledPostgresqlDatabase` class which passes all parameters directly to `psycopg2.connect()`, which doesn't recognize `max_retries` as a valid connection option ## Solution Created `RetryingPooledPostgresqlDatabase` class that: - Extracts `max_retries` and `retry_delay` parameters before initialization - Implements retry logic with exponential backoff for connection failures - Handles PostgreSQL-specific connection errors (connection refused, server closed, etc.) - Mirrors the existing `RetryingPooledMySQLDatabase` implementation Updated the `PooledDatabase` enum to use the new retrying class for PostgreSQL. ## Benefits ✅ Prevents invalid connection parameters from being passed to psycopg2 ✅ Adds automatic retry logic for PostgreSQL connection failures ✅ Provides better error logging for PostgreSQL-specific issues ✅ Maintains consistency between MySQL and PostgreSQL database handling ## Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ## Testing Tested with PostgreSQL database configuration and verified: - Server starts without the "invalid dsn" error - Database connections are established successfully - Retry logic works correctly on connection failures Co-authored-by: Andrea Bugeja <andrea.bugeja@gig.com>	2025-10-16 15:08:41 +08:00
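The pattern described in the commit above, popping the custom retry options before the remaining options reach the driver and then retrying with exponential backoff, can be sketched as follows. `RetryingConnector` is a hypothetical stand-in rather than `RetryingPooledPostgresqlDatabase` itself; the injected `connect` callable and the default values are assumptions.

```python
import time


class RetryingConnector:
    """Hypothetical stand-in for a pooled database class with retry support."""

    def __init__(self, connect, **kwargs):
        # Pop the custom options first so they never reach the driver,
        # which would reject them as invalid connection options.
        self.max_retries = kwargs.pop("max_retries", 5)
        self.retry_delay = kwargs.pop("retry_delay", 1.0)
        self.connect_kwargs = kwargs
        self._connect = connect

    def connect(self):
        for attempt in range(self.max_retries + 1):
            try:
                return self._connect(**self.connect_kwargs)
            except ConnectionError:
                if attempt == self.max_retries:
                    raise  # out of retries: surface the last error
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                time.sleep(self.retry_delay * (2 ** attempt))
```

With `RetryingConnector(driver_connect, host="localhost", max_retries=3, retry_delay=0.5)`, the driver would only ever see `host`, while failed attempts would wait 0.5 s, 1 s, and 2 s before the final error is raised.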
Günter Lukas	0283e4098f	Fix #10408 (#10471 ) ### What problem does this PR solve? Google Cloud model does not work correctly with gemini-2.5 models Close #10408 ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-10-10 19:18:24 +08:00
Kevin Hu	cbf04ee470	Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423 ) ### What problem does this PR solve? #9869 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: chanx <1243304602@qq.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com> Co-authored-by: Lynn <lynn_inf@hotmail.com> Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com> Co-authored-by: huangzl <huangzl@shinemo.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Wilmer <33392318@qq.com> Co-authored-by: Adrian Weidig <adrianweidig@gmx.net> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: Liu An <asiro@qq.com> Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com> Co-authored-by: BadwomanCraZY <511528396@qq.com> Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com> Co-authored-by: Russell Valentine <russ@coldstonelabs.org> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Billy Bao <newyorkupperbay@gmail.com> Co-authored-by: Zhedong Cen <cenzhedong2@126.com> Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com> Co-authored-by: TensorNull <tensor.null@gmail.com> Co-authored-by: TeslaZY <TeslaZY@outlook.com> Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com> Co-authored-by: AB <aj@Ajays-MacBook-Air.local> Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com> Co-authored-by: He Wang <wanghechn@qq.com> Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Mohamed Mathari 
<155896313+melmathari@users.noreply.github.com> Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box> Co-authored-by: Stephen Hu <stephenhu@seismic.com> Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com> Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com> Co-authored-by: mxc <mxc@example.com> Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com> Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com> Co-authored-by: mcoder6425 <mcoder64@gmail.com> Co-authored-by: lemsn <lemsn@msn.com> Co-authored-by: lemsn <lemsn@126.com> Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com> Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com> Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>	2025-10-09 12:36:19 +08:00
Jin Hai	b0b866c8fd	Refactor: move some functions out of api/utils/__init__.py (#10216 ) ### What problem does this PR solve? Refactor import modules. ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-09-25 18:04:49 +08:00

140 Commits