ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-05 10:58:34 +08:00

Author	SHA1	Message	Date
Yingfeng	b686a60713	Switch from demo.ragflow.io to cloud.ragflow.io (#13624 ) ### What problem does this PR solve? Switch from demo.ragflow.io to cloud.ragflow.io ### Type of change - [x] Documentation Update	2026-03-16 14:44:39 +08:00
Magicbook1108	675810e0cf	Refact: optimize confluence performance (#13497 ) ### What problem does this PR solve? Refact: optimize confluence performance #13494 ### Type of change - [x] Refactoring	2026-03-10 15:02:24 +08:00
Idriss Sbaaoui	249b78561b	Fix missmatch docnm_kwd in raptor chunks (#13451 ) ### What problem does this PR solve? issue #13393 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 14:24:33 +08:00
Heyang Wang	c217b8f3d8	Feat: add DingTalk AI Table connector and integration for data synch… (#13413 ) ### What problem does this PR solve? Add DingTalk AI Table connector and integration for data synchronization Issue #13400 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: wangheyang <wangheyang@corp.netease.com>	2026-03-06 21:13:23 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Yesid Cano Castro	d1afcc9e71	feat(seafile): add library and directory sync scope support (#13153 ) ### What problem does this PR solve? The SeaFile connector currently synchronises the entire account — every library visible to the authenticated user. This is impractical for users who only need a subset of their data indexed, especially on large SeaFile instances with many shared libraries. This PR introduces granular sync scope support, allowing users to choose between syncing their entire account, a single library, or a specific directory within a library. It also adds support for SeaFile library-scoped API tokens (`/api/v2.1/via-repo-token/` endpoints), enabling tighter access control without exposing account-level credentials. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Test ``` from seafile_connector import SeaFileConnector import logging import os logging.basicConfig(level=logging.DEBUG) URL = os.environ.get("SEAFILE_URL", "https://seafile.example.com") TOKEN = os.environ.get("SEAFILE_TOKEN", "") REPO_ID = os.environ.get("SEAFILE_REPO_ID", "") SYNC_PATH = os.environ.get("SEAFILE_SYNC_PATH", "/Documents") REPO_TOKEN = os.environ.get("SEAFILE_REPO_TOKEN", "") def _test_scope(scope, repo_id=None, sync_path=None): print(f"\n{'='50}") print(f"Testing scope: {scope}") print(f"{'='50}") creds = {"seafile_token": TOKEN} if TOKEN else {} if REPO_TOKEN and scope in ("library", "directory"): creds["repo_token"] = REPO_TOKEN connector = SeaFileConnector( seafile_url=URL, batch_size=5, sync_scope=scope, include_shared = False, repo_id=repo_id, sync_path=sync_path, ) connector.load_credentials(creds) connector.validate_connector_settings() count = 0 for batch in connector.load_from_state(): for doc in batch: count += 1 print(f" [{count}] {doc.semantic_identifier} " f"({doc.size_bytes} bytes, {doc.extension})") print(f"\n-> {scope} scope: {count} document(s) found.\n") # 1. Account scope if TOKEN: _test_scope("account") else: print("\nSkipping account scope (set SEAFILE_TOKEN)") # 2. Library scope if REPO_ID and (TOKEN or REPO_TOKEN): _test_scope("library", repo_id=REPO_ID) else: print("\nSkipping library scope (set SEAFILE_REPO_ID + token)") # 3. Directory scope if REPO_ID and SYNC_PATH and (TOKEN or REPO_TOKEN): _test_scope("directory", repo_id=REPO_ID, sync_path=SYNC_PATH) else: print("\nSkipping directory scope (set SEAFILE_REPO_ID + SEAFILE_SYNC_PATH + token)") ```	2026-02-28 10:24:28 +08:00
Yao Wei	cf6fd6f115	fix: When using OceanBase as storage, the list_chunk sorting is abnormal. #13198 (#13208 ) Actual behavior When using OceanBase as storage, the list_chunk sorting is abnormal. The following is the SQL statement. SELECT id, content_with_weight, important_kwd, question_kwd, img_id, available_int, position_int, doc_type_kwd, create_timestamp_flt, create_time, array_to_string(page_num_int, ',') AS page_num_int_sort, array_to_string(top_int, ',') AS top_int_sort FROM rag_store_284250730805059584 WHERE doc_id = '' AND kb_id IN ('') ORDER BY page_num_int_sort ASC, top_int_sort ASC, create_timestamp_flt DESC LIMIT 0, 20 <img width="1610" height="740" alt="image" src="https://github.com/user-attachments/assets/84e14c30-a97f-4e8f-8c8c-6ccac915d97d" /> Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>	2026-02-25 13:36:18 +08:00
Magicbook1108	301ed76aa4	Fix: task cancel (#13034 ) ### What problem does this PR solve? Fix: task cancel #11745 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-06 14:48:24 +08:00
Magicbook1108	4b0d65f089	Fix: correct llm_id for graphrag (#13032 ) ### What problem does this PR solve? Fix: correct llm_id for graphrag #13030 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-06 14:05:32 +08:00
MkDev11	6f31c5fed2	feat/add MySQL and PostgreSQL data source connectors (#12817 ) ### What problem does this PR solve? This PR adds MySQL and PostgreSQL as data source connectors, allowing users to import data directly from relational databases into RAGFlow for RAG workflows. Many users store their knowledge in databases (product catalogs, documentation, FAQs, etc.) and currently have no way to sync this data into RAGFlow without exporting to files first. This feature lets them connect directly to their databases, run SQL queries, and automatically create documents from the results. Closes #763 Closes #11560 ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### What this PR does New capabilities: - Connect to MySQL and PostgreSQL databases - Run custom SQL queries to extract data - Map database columns to document content (vectorized) and metadata (searchable) - Support incremental sync using a timestamp column - Full frontend UI with connection form and tooltips Files changed: Backend: - `common/constants.py` - Added MYSQL/POSTGRESQL to FileSource enum - `common/data_source/config.py` - Added to DocumentSource enum - `common/data_source/rdbms_connector.py` - New connector (368 lines) - `common/data_source/__init__.py` - Exported the connector - `rag/svr/sync_data_source.py` - Added MySQL and PostgreSQL sync classes - `pyproject.toml` - Added mysql-connector-python dependency Frontend: - `web/src/pages/user-setting/data-source/constant/index.tsx` - Form fields - `web/src/locales/en.ts` - English translations - `web/src/assets/svg/data-source/mysql.svg` - MySQL icon - `web/src/assets/svg/data-source/postgresql.svg` - PostgreSQL icon ### Testing done Tested with MySQL 8.0 and PostgreSQL 16: - Connection validation works correctly - Full sync imports all query results as documents - Incremental sync only fetches rows updated since last sync - Custom SQL queries filter data as expected - Invalid credentials show clear error messages - Lint checks pass (`ruff check` returns no errors) --------- Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>	2026-02-04 10:14:32 +08:00
Yesid Cano Castro	deeae8dba4	feat(connector): add Seafile as data source (#12945 ) ### What problem does this PR solve? This PR adds Seafile as a new data source connector for RAGFlow. [Seafile](https://www.seafile.com/) is an open-source, self-hosted file sync and share platform widely used by enterprises, universities, and organizations that require data sovereignty and privacy. Users who store documents in Seafile currently have no way to index and search their content through RAGFlow. This connector enables RAGFlow users to: - Connect to self-hosted Seafile servers via API token - Index documents from personal and shared libraries - Support incremental polling for updated files - Seamlessly integrate Seafile-stored documents into their RAG pipelines ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Changes included - `SeaFileConnector` implementing `LoadConnector` and `PollConnector` interfaces - Support for API token - Recursive file traversal across libraries - Time-based filtering for incremental updates - Seafile logo (sourced from Simple Icons, CC0) - Connector configuration and registration ### Testing - Tested against self-hosted Seafile Community Edition - Verified authentication (token) - Verified document ingestion from personal and shared libraries - Verified incremental polling with time filters	2026-02-03 13:42:05 +08:00
Kevin Hu	32c0161ff1	Refa: Clean the folders. (#12890 ) ### Type of change - [x] Refactoring	2026-01-29 14:23:26 +08:00
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. Index name of meta data: ragflow_doc_meta_{tenant_id} ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
Kevin Hu	3beb85efa0	Feat: enhance metadata arranging. (#12745 ) ### What problem does this PR solve? #11564 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-22 15:34:08 +08:00
Kevin Hu	927db0b373	Refa: asyncio.to_thread to ThreadPoolExecutor to break thread limitat… (#12716 ) ### Type of change - [x] Refactoring	2026-01-20 13:29:37 +08:00
E.G	f367189703	fix(raptor): handle missing vector fields gracefully (#12713 ) ## Summary This PR fixes a `KeyError` crash when running RAPTOR tasks on documents that don't have the expected vector field. ## Related Issue Fixes https://github.com/infiniflow/ragflow/issues/12675 ## Problem When running RAPTOR tasks, the code assumes all chunks have the vector field `q_<size>_vec` (e.g., `q_1024_vec`). However, chunks may not have this field if: 1. They were indexed with a different embedding model (different vector size) 2. The embedding step failed silently during initial parsing 3. The document was parsed before the current embedding model was configured This caused a crash: ``` KeyError: 'q_1024_vec' ``` ## Solution Added defensive validation in `run_raptor_for_kb()`: 1. Check for vector field existence before accessing it 2. Skip chunks that don't have the required vector field instead of crashing 3. Log warnings for skipped chunks with actionable guidance 4. Provide informative error messages suggesting users re-parse documents with the current embedding model 5. Handle both scopes (`file` and `kb` modes) ## Changes - `rag/svr/task_executor.py`: Added validation and error handling in `run_raptor_for_kb()` ## Testing 1. Create a knowledge base with an embedding model 2. Parse documents 3. Change the embedding model to one with a different vector size 4. Run RAPTOR task 5. Before: Crashes with `KeyError` 6. After: Gracefully skips incompatible chunks with informative warnings --- <!-- Gittensor Contribution Tag: @GlobalStar117 --> Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>	2026-01-20 12:24:20 +08:00
qinling0210	b40d639fdb	Add dataset with table parser type for Infinity and answer question in chat using SQL (#12541 ) ### What problem does this PR solve? 1) Create dataset using table parser for infinity 2) Answer questions in chat using SQL ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-19 19:35:14 +08:00
francisye19	57d189b483	fix: Correct gitlab_url access in sync_data_source.py (#12681 ) ### What problem does this PR solve? Correct gitlab_url access. See https://github.com/infiniflow/ragflow/blob/main/web/src/pages/user-setting/data-source/constant/index.tsx#L660-L666 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-19 11:01:34 +08:00
Magicbook1108	b40a7b2e7d	Feat: Hash doc id to avoid duplicate name. (#12573 ) ### What problem does this PR solve? Feat: Hash doc id to avoid duplicate name. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-15 14:02:15 +08:00
Yongteng Lei	68e5c86e9c	Fix: image not displaying thumbnails when using pipeline (#12574 ) ### What problem does this PR solve? Fix image not displaying thumbnails when using pipeline. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-13 12:54:13 +08:00
Jin Hai	a7dd3b7e9e	Add time cost when start servers (#12552 ) ### What problem does this PR solve? - API server - Ingestion server - Data sync server - Admin server ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-12 12:48:23 +08:00
Magicbook1108	011bbe9556	Feat: support context window for docx (#12455 ) ### What problem does this PR solve? Feat: support context window for docx #12303 Done: - [x] naive.py - [x] one.py TODO: - [ ] book.py - [ ] manual.py Fix: incorrect image position Fix: incorrect chunk type tag ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-01-07 15:08:17 +08:00
Liu An	606f4e6c9e	Refa: improve TOC building with better error handling (#12427 ) ### What problem does this PR solve? Refactor TOC building logic to use enumerate instead of while loop, add comprehensive error handling for missing/invalid chunk_id values, and improve logging with more specific error messages. The changes make the code more robust against malformed TOC data while maintaining the same functionality for valid inputs. ### Type of change - [x] Refactoring	2026-01-05 10:02:42 +08:00
OliverW	d6e006f086	Improve task executor heartbeat handling and cleanup (#12390 ) Improve task executor heartbeat handling and cleanup. ### What problem does this PR solve? - Reduce lock contention during executor cleanup: The cleanup lock is acquired only when removing expired executors, not during regular heartbeat reporting, reducing potential lock contention. - Optimize own heartbeat cleanup: Each executor removes its own expired heartbeat using `zremrangebyscore` instead of `zcount` + `zpopmin`, reducing Redis operations and improving efficiency. - Improve cleanup of other executors' heartbeats: Expired executors are detected by checking their latest heartbeat, and stale entries are removed safely. - Other improvements: IP address and PID are captured once at startup, and unnecessary global declarations are removed. ### Type of change - [x] Performance Improvement Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-01-04 11:24:05 +08:00
Magicbook1108	f56bceb2a9	Fix: remvoe async wrappers (#12405 ) ### What problem does this PR solve? Fix: remvoe async wrappers #12396 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-04 11:19:48 +08:00
Magicbook1108	96810b7d97	Fix: webdav connector (#12380 ) ### What problem does this PR solve? fix webdav #11422 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-31 19:00:00 +08:00
Magicbook1108	7d4d687dde	Feat: Bitbucket connector (#12332 ) ### What problem does this PR solve? Feat: Bitbucket connector NOT READY TO MERGE ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-31 17:18:30 +08:00
buua436	c2ee2bf7fe	Feat: add Zendesk data source integration with configuration and sync capabilities (#12344 ) ### What problem does this PR solve? issue: #12313 change: add Zendesk data source integration with configuration and sync capabilities ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-31 14:40:49 +08:00
Kevin Hu	1a4a7d1705	Fix: apply kb configured llm issue. (#12354 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-31 12:40:28 +08:00
Kevin Hu	52f91c2388	Refine: image/table context. (#12336 ) ### What problem does this PR solve? #12303 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-30 20:24:27 +08:00
buua436	bffdb5fb11	Feat: add IMAP data source integration with configuration and sync capabilities (#12316 ) ### What problem does this PR solve? issue: #12217 [#12313](https://github.com/infiniflow/ragflow/issues/12313) change: add IMAP data source integration with configuration and sync capabilities ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-30 17:09:13 +08:00
Magicbook1108	5903d1c8f1	Feat: GitHub connector (#12314 ) ### What problem does this PR solve? Feat: GitHub connector ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-30 15:09:52 +08:00
Lynn	4a6d37f0e8	Fix: use async task to save memory (#12308 ) ### What problem does this PR solve? Use async task to save memory. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2025-12-30 11:41:38 +08:00
Jin Hai	df3cbb9b9e	Refactor code (#12305 ) ### What problem does this PR solve? as title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-30 11:09:18 +08:00
Magicbook1108	c3ae1aaecd	Feat: Gitlab connector (#12248 ) ### What problem does this PR solve? Feat: Gitlab connector Fix: submit button in darkmode ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-29 17:05:20 +08:00
buua436	a764f0a5b2	Feat: Add Asana data source integration and configuration options (#12239 ) ### What problem does this PR solve? change: Add Asana data source integration and configuration options ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-29 13:28:37 +08:00
Jin Hai	01f0ced1e6	Fix IDE warnings (#12281 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-29 12:01:18 +08:00
Lynn	6e9691a419	Feat: message manage (#12196 ) ### What problem does this PR solve? Manage message and use in agent. Issue #4213 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-25 21:18:13 +08:00
buua436	1812491679	Feat: add Airtable connector and integration for data synchronization (#12211 ) ### What problem does this PR solve? change: add Airtable connector and integration for data synchronization ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-25 17:50:41 +08:00
Kevin Hu	8cbfb5aef6	Fix: toc no chunk found issue. (#12197 ) ### What problem does this PR solve? #12170 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-25 14:06:20 +08:00
Kevin Hu	ce08ee399b	Fix: metadata_obj issue. (#12146 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-25 11:54:09 +08:00
Kevin Hu	8197f9a873	Fix: table tag on chunks. (#12126 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-25 11:25:38 +08:00
Kevin Hu	00bb6fbd28	Fix: metadata issue & graphrag speeding up. (#12113 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Liu An <asiro@qq.com>	2025-12-23 15:57:27 +08:00
Magicbook1108	d5a44e913d	Fix: fix task cancel (#12093 ) ### What problem does this PR solve? Fix: fix task cancel ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-23 09:38:25 +08:00
Kevin Hu	bd76b8ff1a	Fix: Tika server upgrades. (#12073 ) ### What problem does this PR solve? #12037 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-23 09:35:52 +08:00
Magicbook1108	5aea82d9c4	Feat: Separate connectors from s3 (#12045 ) ### What problem does this PR solve? Feat: Separate connectors from s3 #12008 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Overview: <img width="1500" alt="image" src="https://github.com/user-attachments/assets/d54fea7a-7294-4ec0-ab6c-9753b3f03a72" /> Oracle: <img width="350" alt="image" src="https://github.com/user-attachments/assets/bca140c1-33d8-4950-afdc-153407eedc46" />	2025-12-22 09:36:16 +08:00
Jonah Hartmann	7a4044b05f	Feat: use filepath for files with the same name for all data source types (#11819 ) ### What problem does this PR solve? When there are multiple files with the same name the file would just duplicate, making it hard to distinguish between the different files. Now if there are multiple files with the same name, they will be named after their folder path in the storage unit. This was done for the webdav connector and with this PR also for Notion, Confluence and S3 Storage. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Contribution by RAGcon GmbH, visit us [here](https://www.ragcon.ai/)	2025-12-18 17:42:43 +08:00
concertdictate	4dd8cdc38b	task executor issues (#12006 ) ### What problem does this PR solve? Fixes #8706 - `InfinityException: TOO_MANY_CONNECTIONS` when running multiple task executor workers ### Problem Description When running RAGFlow with 8-16 task executor workers, most workers fail to start properly. Checking logs revealed that workers were stuck/hanging during Infinity connection initialization - only 1-2 workers would successfully register in Redis while the rest remained blocked. ### Root Cause The Infinity SDK `ConnectionPool` pre-allocates all connections in `__init__`. With the default `max_size=32` and multiple workers (e.g., 16), this creates 16×32=512 connections immediately on startup, exceeding Infinity's default 128 connection limit. Workers hang while waiting for connections that can never be established. ### Changes 1. Prevent Infinity connection storm (`rag/utils/infinity_conn.py`, `rag/svr/task_executor.py`) - Reduced ConnectionPool `max_size` from 32 to 4 (sufficient since operations are synchronous) - Added staggered startup delay (2s per worker) to spread connection initialization 2. Handle None children_delimiter (`rag/app/naive.py`) - Use `or ""` to handle explicitly set None values from parser config 3. MinerU parser robustness (`deepdoc/parser/mineru_parser.py`) - Use `.get()` for optional output fields that may be missing - Fix DISCARDED block handling: change `pass` to `continue` to skip discarded blocks entirely ### Why `max_size=4` is sufficient \| Workers \| Pool Size \| Total Connections \| Infinity Limit \| \|---------\|-----------\|-------------------\|----------------\| \| 16 \| 32 \| 512 \| 128 ❌ \| \| 16 \| 4 \| 64 \| 128 ✅ \| \| 32 \| 4 \| 128 \| 128 ✅ \| - All RAGFlow operations are synchronous: `get_conn()` → operation → `release_conn()` - No parallel `docStoreConn` operations in the codebase - Maximum 1-2 concurrent connections needed per worker; 4 provides safety margin ### MinerU DISCARDED block bug When MinerU returns blocks with `type: "discarded"` (headers, footers, watermarks, page numbers, artifacts), the previous code used `pass` which left the `section` variable undefined, causing: - UnboundLocalError if DISCARDED is the first block - Duplicate content if DISCARDED follows another block (stale value from previous iteration) Root cause confirmed via MinerU source code: From [`mineru/utils/enum_class.py`](https://github.com/opendatalab/MinerU/blob/main/mineru/utils/enum_class.py#L14): ```python class BlockType: DISCARDED = 'discarded' # VLM 2.5+ also has: HEADER, FOOTER, PAGE_NUMBER, ASIDE_TEXT, PAGE_FOOTNOTE ``` Per [MinerU documentation](https://opendatalab.github.io/MinerU/reference/output_files/), discarded blocks contain content that should be filtered out for clean text extraction. Fix: Changed `pass` to `continue` to skip discarded blocks entirely. ### Testing - Verified all 16 workers now register successfully in Redis - All workers heartbeating correctly - Document parsing works as expected - MinerU parsing with DISCARDED blocks no longer crashes ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: user210 <user210@rt>	2025-12-18 10:03:30 +08:00
Kevin Hu	8e4d011b15	Fix: parent-children chunking method. (#11997 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 16:50:36 +08:00
Yongteng Lei	03f9be7cbb	Refa: only support MinerU-API now (#11977 ) ### What problem does this PR solve? Only support MinerU-API now, still need to complete frontend for pipeline to allow the configuration of MinerU options. ### Type of change - [x] Refactoring	2025-12-17 12:58:48 +08:00

1 2 3 4 5 ...

308 Commits