ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-03 17:21:59 +08:00

Author	SHA1	Message	Date
apps-lycusinc	8b984c9d5f	Fixing WordNetCorpusReader object has no attribute _LazyCorpusLoader_… (#13600 ) ### What problem does this PR solve? Forces NLTK to load the corpus synchronously once, preventing concurrent tasks from triggering the lazy-loading race condition that cause Fixing WordNetCorpusReader object has no attribute _LazyCorpusLoader_… #13590 ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: shakeel <shakeel@lollylaw.com>	2026-03-13 19:55:01 +08:00
Ethan T.	1cee8b1a7b	fix: use context managers for file handles to prevent resource leaks (#13514 ) ## Summary - Convert bare `open()` calls to `with` context managers or `Path.read_text()` - File handles leak if not properly closed, especially on exceptions - Fixes in crypt.py, sequence2txt_model.py, term_weight.py, deepdoc/vision/__init__.py ## Test plan - [x] File operations work correctly with context managers - [x] Resources properly cleaned up on exceptions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 16:47:06 +08:00
eviaaaaa	d0ca388bec	Refa: implement unified lazy image loading for Docx parsers (qa/manual) (#13329 ) ## Summary This PR is the direct successor to the previous `docx` lazy-loading implementation. It addresses the technical debt intentionally left out in the last PR by fully migrating the `qa` and `manual` parsing strategies to the new lazy-loading model. Additionally, this PR comprehensively refactors the underlying `docx` parsing pipeline to eliminate significant code redundancy and introduces robust fallback mechanisms to handle completely corrupted image streams safely. ## What's Changed * Centralized Abstraction (`docx_parser.py`): Moved the `get_picture` extraction logic up to the `RAGFlowDocxParser` base class. Previously, `naive`, `qa`, and `manual` parsers maintained separate, redundant copies of this method. All downstream strategies now natively gather raw blobs and return `LazyDocxImage` objects automatically. * Robust Corrupted Image Fallback (`docx_parser.py`): Handled edge cases where `python-docx` encounters critically malformed magic headers. Implemented an explicit `try-except` structure that safely intercepts `UnrecognizedImageError` (and similar exceptions) and seamlessly falls back to retrieving the raw binary via `getattr(related_part, "blob", None)`, preventing parser crashes on damaged documents. * Legacy Code & Redundancy Purge: * Removed the duplicate `get_picture` methods from `naive.py`, `qa.py`, and `manual.py`. * Removed the standalone, immediate-decoding `concat_img` method in `manual.py`. It has been completely replaced by the globally unified, lazy-loading-compatible `rag.nlp.concat_img`. * Cleaned up unused legacy imports (e.g., `PIL.Image`, docx exception packages) across all updated strategy files. ## Scope To keep this PR focused, I have restricted these changes strictly to the unification of `docx` extraction logic and the lazy-load migration of `qa` and `manual`. ## Validation & Testing I've tested this to ensure no regressions and validated the fallback logic: * Output Consistency: Compared identical `.docx` inputs using `qa` and `manual` strategies before and after this branch: chunk counts, extracted text, table HTML, and attached images match perfectly. * Memory Footprint Drop: Confirmed a noticeable drop in peak memory usage when processing image-dense documents through the `qa` and `manual` pipelines, bringing them up to parity with the `naive` strategy's performance gains. ## Breaking Changes * None.	2026-03-11 10:00:07 +08:00
qinling0210	7c92f51133	Fix retrieval function when metadata_condtion is specified in retrieval API (#13473 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13388 The following command returns empty when there is doc with the meta data ``` curl --request POST \ --url http://localhost:9222/api/v1/retrieval \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \ --data '{ "question": "any question", "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"], "metadata_condition": { "logic": "and", "conditions": [ { "name": "character", "comparison_operator": "is", "value": "刘备" } ] } }' ``` When metadata_condtion is specified in the retrieval API, it is converted to doc_ids and doc_ids is passed to retrieval function. In retrieval funciton, when doc_ids is explicitly provided , we should bypass threshold. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 11:57:32 +08:00
Magicbook1108	daec36e935	Fix: add soft limit for graph rag size (#13252 ) ### What problem does this PR solve? Fix: add soft limit for graph rag size #13258 Q2 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 14:02:36 +08:00
Attili-sys	21bc1ab7ec	Feature rtl support (#13118 ) ### What problem does this PR solve? This PR adds comprehensive Right-to-Left (RTL) language support, primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu, etc.). Previously, RTL content had multiple rendering issues: - Incorrect sentence splitting for Arabic punctuation in citation logic - Misaligned text in chat messages and markdown components - Improper positioning of blockquotes and “think” sections - Incorrect table alignment - Citation placement ambiguity in RTL prompts - UI layout inconsistencies when mixing LTR and RTL text This PR introduces backend and frontend improvements to properly detect, render, and style RTL content while preserving existing LTR behavior. #### Backend - Updated sentence boundary regex in `rag/nlp/search.py` to include Arabic punctuation: - `،` (comma) - `؛` (semicolon) - `؟` (question mark) - `۔` (Arabic full stop) - Ensures citation insertion works correctly in RTL sentences. - Updated citation prompt instructions to clarify citation placement rules for RTL languages. #### Frontend - Introduced a new utility: `text-direction.ts` - Detects text direction based on Unicode ranges. - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts. - Provides `getDirAttribute()` for automatic `dir` assignment. - Applied dynamic `dir` attributes across: - Markdown rendering - Chat messages - Search results - Tables - Hover cards and reference popovers - Added proper RTL styling in LESS: - Text alignment adjustments - Blockquote border flipping - Section indentation correction - Table direction switching - Use of `<bdi>` for figure labels to prevent bidirectional conflicts #### DevOps / Environment - Added Windows backend launch script with retry handling. - Updated dependency metadata. - Adjusted development-only React debugging behavior. --- ### Type of change - [x] Bug Fix (non-breaking change which fixes RTL rendering and citation issues) - [x] New Feature (non-breaking change which adds RTL detection and dynamic direction handling) --------- Co-authored-by: 6ba3i <isbaaoui09@gmail.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-02 13:03:44 +08:00
eviaaaaa	fa71f8d0c7	refactor(word): lazy-load DOCX images to reduce peak memory without changing output (#13233 ) Summary This PR tackles a significant memory bottleneck when processing image-heavy Word documents. Previously, our pipeline eagerly decoded DOCX images into `PIL.Image` objects, which caused high peak memory usage. To solve this, I've introduced a lazy-loading approach: images are now stored as raw blobs and only decoded exactly when and where they are consumed. This successfully reduces the memory footprint while keeping the parsing output completely identical to before. What's Changed Instead of a dry file-by-file list, here is the logical breakdown of the updates: * The Core Abstraction (`lazy_image.py`): Introduced `LazyDocxImage` along with helper APIs to handle lazy decoding, image-type checks, and NumPy compatibility. It also supports `.close()` and detached PIL access to ensure safe lifecycle management and prevent memory leaks. * Pipeline Integration (`naive.py`, `figure_parser.py`, etc.): Updated the general DOCX picture extraction to return these new lazy images. Downstream consumers (like the figure/VLM flow and base64 encoding paths) now decode images right at the use site using detached PIL instances, avoiding shared-instance side effects. * Compatibility Hooks (`operators.py`, `book.py`, etc.): Added necessary compatibility conversions so these lazy images flow smoothly through existing merging, filtering, and presentation steps without breaking. Scope & What is Intentionally Left Out To keep this PR focused, I have restricted these changes strictly to the general Word pipeline and its downstream consumers. The `QA` and `manual` Word parsing pipelines are explicitly not modified in this PR. They can be safely migrated to this new lazy-load model in a subsequent, standalone PR. Design Considerations I briefly considered adding image compression during processing, but decided against it to avoid any potential quality degradation in the derived outputs. I also held off on a massive pipeline re-architecture to avoid overly invasive changes right now. Validation & Testing I've tested this to ensure no regressions: * Compared identical DOCX inputs before and after this branch: chunk counts, extracted text, table HTML, and image descriptions match perfectly. * Confirmed a noticeable drop in peak memory usage when processing image-dense documents. For a 30MB Word document containing 243 1080p screenshots, memory consumption is reduced by approximately 1.5GB. Breaking Changes None.	2026-02-28 11:22:31 +08:00
qinling0210	4bc622b409	Fix parameter of calling self.dataStore.get() and warning info during parser (#13068 ) ### What problem does this PR solve? Fix parameter of calling self.dataStore.get() and warning info during parser https://github.com/infiniflow/ragflow/issues/13036 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 17:56:59 +08:00
6ba3i	fabbfcab90	Fix: failing p3 test for SDK/HTTP APIs (#13062 ) ### What problem does this PR solve? Adjust highlight parsing, add row-count SQL override, tweak retrieval thresholding, and update tests with engine-aware skips/utilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 14:56:10 +08:00
Kevin Hu	1262533b74	Feat: support verify to set llm key and boost bigrams. (#12980 ) #12863 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-05 19:19:09 +08:00
Magicbook1108	f11ca54e0e	Fix: docx parser output consistent (#12965 ) ### What problem does this PR solve? Fix: docx parser output consistent > File "/home/bxy/ragflow/rag/flow/parser/parser.py", line 506, in _word > sections, tbls = docx_parser(name, binary=blob) > ^^^^^^^^^^^^^^ > ValueError: too many values to unpack (expected 2) > ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-03 15:36:58 +08:00
Philipp Heyken Soares	ad06c042c4	Support operator constraints in semi-automatic metadata filtering (#12956 ) ### What problem does this PR solve? #### Summary This PR enhances the Semi-automatic metadata filtering mode by allowing users to explicitly pre-define operators (e.g., contains, =, >, etc.) for selected metadata keys. While the LLM still dynamically extracts the filter value from the user's query, it is now strictly constrained to use the operator specified in the UI configuration. Using this feature is optional. By default the operator selection is set to "automatic" resulting in the LLM choosing the operator (as presently). #### Rationale & Use Case This enhancement was driven by a concrete challenge I encountered while working with technical documentation. In my specific use case, I was trying to filter for software versions within a technical manual. In this dataset, a single document chunk often applies to multiple software versions. These versions are stored as a combined string within the metadata for each chunk. When using the standard semi-automatic filter, the LLM would inconsistently choose between the contains and equals operators. When it chose equals, it would exclude every chunk that applied to more than one version, even if the version I was searching for was clearly included in that metadata string. This led to incomplete and frustrating retrieval results. By extending the semi-automatic filter to allow pre-defining the operator for a specific key, I was able to force the use of contains for the version field. This change immediately led to significantly improved and more reliable results in my case. I believe this functionality will be equally useful for others dealing with "tagged" or multi-value metadata where the relationship between the query and the field is known, but the specific value needs to remain dynamic. #### Key Changes ##### Backend & Core Logic - `common/metadata_utils.py`: Updated apply_meta_data_filter to support a mixed data structure for semi_auto (handling both legacy string arrays and the new object-based format {"key": "...", "op": "..."}). - `rag/prompts/generator.py`: Extended gen_meta_filter to accept and pass operator constraints to the LLM. - `rag/prompts/meta_filter.md`: Updated the system prompt to instruct the LLM to strictly respect provided operator constraints. ##### Frontend - `web/src/components/metadata-filter/metadata-semi-auto-fields.tsx`: Enhanced the UI to include an operator dropdown for each selected metadata key, utilizing existing operator constants. - `web/src/components/metadata-filter/index.tsx`: Updated the validation schema to accommodate the new state structure. #### Test Plan - Backward Compatibility: Verified that existing semi-auto filters stored as simple strings still function correctly. - Prompt Verification: Confirmed that constraints are correctly rendered in the LLM system prompt when specified. - Added unit tests as `test/unit_test/common/test_apply_semi_auto_meta_data_filter.py` - Manual End-to-End: - Configured a "Semi-automatic" filter for a "Version" key with the "contains" operator. - Asked a version-specific query. - Result <img width="1173" height="704" alt="Screenshot 2026-02-02 145359" src="https://github.com/user-attachments/assets/510a6a61-a231-4dc2-a7fe-cdfc07219132" /> ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Philipp Heyken Soares <philipp.heyken-soares@am.ai>	2026-02-03 11:11:34 +08:00
Magicbook1108	7be3dacdaa	Fix: custom delimeter in docx (#12946 ) ### What problem does this PR solve? Fix: custom delimeter in docx ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-03 09:43:18 +08:00
Yongteng Lei	13076bb87b	Fix: Parent chunking fails on DOCX files (#12822 ) ### What problem does this PR solve? Fixes parent chunking fails on DOCX files. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-26 17:55:09 +08:00
Kevin Hu	927db0b373	Refa: asyncio.to_thread to ThreadPoolExecutor to break thread limitat… (#12716 ) ### Type of change - [x] Refactoring	2026-01-20 13:29:37 +08:00
qinling0210	828ae1e82f	Round float value of minimum_should_match (#12688 ) ### What problem does this PR solve? In paragraph() of class FulltextQueryer, "len(keywords) / 10" should be rounded to integer before set to minimum_should_match. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-19 11:39:33 +08:00
Pegasus	d8192f8f17	Fix: validate regex pattern in split_with_pattern to prevent crash (#12633 ) ### What problem does this PR solve? Fix regex pattern validation in split_with_pattern (#12605) - Add try-except block to validate user-provided regex patterns before use - Gracefully fallback to single chunk when invalid regex is provided - Prevent server crash during DOCX parsing with malformed delimiters ## Problem Parsing DOCX files with custom regex delimiters crashes with `re.error: nothing to repeat at position 9` when users provide invalid regex patterns. Closes #12605 ## Solution Validate and compile regex pattern before use. On invalid pattern, log warning and return content as single chunk instead of crashing. ## Changes - `rag/nlp/__init__.py`: Add regex validation in `split_with_pattern()` function ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461	2026-01-15 14:24:51 +08:00
Kevin Hu	9a10558f80	Refa: async retrieval process. (#12629 ) ### Type of change - [x] Refactoring - [x] Performance Improvement	2026-01-15 12:28:49 +08:00
Kevin Hu	44bada64c9	Feat: support tree structured deep-research policy. (#12559 ) ### What problem does this PR solve? #12558 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-13 09:41:35 +08:00
lys1313013	b226e06e2d	refactor: remove debug print statements (#12534 ) ### What problem does this PR solve? refactor: remove debug print statements ### Type of change - [x] Refactoring	2026-01-09 19:23:50 +08:00
Kevin Hu	23a9544b73	Fix: toc async issue. (#12485 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-07 15:35:30 +08:00
Magicbook1108	011bbe9556	Feat: support context window for docx (#12455 ) ### What problem does this PR solve? Feat: support context window for docx #12303 Done: - [x] naive.py - [x] one.py TODO: - [ ] book.py - [ ] manual.py Fix: incorrect image position Fix: incorrect chunk type tag ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-01-07 15:08:17 +08:00
Yongteng Lei	4cd4526492	Feat: PDF vision figure parser supports reading context (#12416 ) ### What problem does this PR solve? PDF vision figure parser supports reading context. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-05 09:55:43 +08:00
Kevin Hu	52f91c2388	Refine: image/table context. (#12336 ) ### What problem does this PR solve? #12303 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-30 20:24:27 +08:00
Jin Hai	01f0ced1e6	Fix IDE warnings (#12281 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-29 12:01:18 +08:00
Yongteng Lei	51bc41b2e8	Refa: improve image table context (#12244 ) ### What problem does this PR solve? Improve image table context. Current strategy in attach_media_context: - Order by position when possible: if any chunk has page/position info, sort by (page, top, left), otherwise keep original order. - Apply only to media chunks: images use image_context_size, tables use table_context_size. - Primary matching: on the same page, choose a text chunk whose vertical span overlaps the media, then pick the one with the closest vertical midpoint. - Fallback matching: if no overlap on that page, choose the nearest text chunk on the same page (page-head uses the next text; page-tail uses the previous text). - Context extraction: inside the chosen text chunk, find a mid-sentence boundary near the text midpoint, then take context_size tokens split before/after (total budget). - No multi-chunk stitching: context comes from a single text chunk to avoid mixing unrelated segments. ### Type of change - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-26 17:55:32 +08:00
Lynn	6e9691a419	Feat: message manage (#12196 ) ### What problem does this PR solve? Manage message and use in agent. Issue #4213 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-25 21:18:13 +08:00
Kevin Hu	8cbfb5aef6	Fix: toc no chunk found issue. (#12197 ) ### What problem does this PR solve? #12170 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-25 14:06:20 +08:00
Kevin Hu	8197f9a873	Fix: table tag on chunks. (#12126 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-25 11:25:38 +08:00
Kevin Hu	8e4d011b15	Fix: parent-children chunking method. (#11997 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-17 16:50:36 +08:00
Kevin Hu	ea4a5cd665	Fix: tokenizer issue. (#11902 ) #11786 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-11 17:38:17 +08:00
buua436	dd046be976	Fix: parent-child chunking method (#11810 ) ### What problem does this PR solve? change: parent-child chunking method ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-09 09:34:01 +08:00
David Eberto Domenech Castillo	3c224c817b	Fix: Correct pagination and early termination bugs in chunk_list() (#11692 ) ## Summary This PR fixes two critical bugs in `chunk_list()` method that prevent processing large documents (>128 chunks) in GraphRAG and other workflows. ## Bugs Fixed ### Bug 1: Incorrect pagination offset calculation Location: `rag/nlp/search.py` lines 530-531 Problem: The loop variable `p` was used directly as offset, causing incorrect pagination: ```python # BEFORE (BUGGY): for p in range(offset, max_count, bs): # p = 0, 128, 256, 384... es_res = self.dataStore.search(..., p, bs, ...) # p used as offset Fix: Use page number multiplied by batch size: # AFTER (FIXED): for page_num, p in enumerate(range(offset, max_count, bs)): es_res = self.dataStore.search(..., page_num * bs, bs, ...) Bug 2: Premature loop termination Location: rag/nlp/search.py lines 538-539 Problem: Loop terminates when any page returns fewer than 128 chunks, even when thousands more remain: # BEFORE (BUGGY): if len(dict_chunks.values()) < bs: # Breaks at 126 chunks even if 3,000+ remain break Fix: Only terminate when zero chunks returned: # AFTER (FIXED): if len(dict_chunks.values()) == 0: break Enhancement: Add max_count parameter to GraphRAG Location: graphrag/general/index.py line 60 Added max_count=10000 parameter to chunk loading for both LightRAG and General GraphRAG paths to ensure all chunks are processed. Testing Validated with a 314-page legal document containing 3,207 chunks: Before fixes: - Only 2-126 chunks processed - GraphRAG generated 25 nodes, 8 edges After fixes: - All 3,209 chunks processed ✅ - GraphRAG processing complete dataset Impact These bugs affect any workflow using chunk_list() with large documents, particularly: - GraphRAG knowledge graph generation - RAPTOR hierarchical summarization - Document processing pipelines with >128 chunks Related Issue Fixes #11687 Checklist - Code follows project style guidelines - Tested with large documents (3,207+ chunks) - Both bugs validated by Dosu bot in issue #11687 - No breaking changes to API --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-12-03 19:44:20 +08:00
qinling0210	2ffe6f7439	Import rag_tokenizer from Infinity (#11647 ) ### What problem does this PR solve? - Original rag/nlp/rag_tokenizer.py is put to Infinity and infinity-sdk via https://github.com/infiniflow/infinity/pull/3117 . Import rag_tokenizer from infinity and inherit from rag_tokenizer.RagTokenizer in new rag/nlp/rag_tokenizer.py. - Bump infinity to 0.6.8 ### Type of change - [x] Refactoring	2025-12-02 14:59:37 +08:00
Kevin Hu	81ae6cf78d	Feat: support uploading in dialog. (#11634 ) ### What problem does this PR solve? #9590 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-01 16:54:57 +08:00
Kevin Hu	6ea4248bdc	Feat: support parent-child in search procedure. (#11629 ) ### What problem does this PR solve? #7996 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-01 14:03:09 +08:00
Kevin Hu	14616cf845	Feat: add child parent chunking method in backend. (#11598 ) ### What problem does this PR solve? #7996 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:25:32 +08:00
Billy Bao	982ed233a2	Fix: doc_aggs not correctly returned when no chunks retrieved. (#11578 ) ### What problem does this PR solve? Fix: doc_aggs not correctly returned when no chunks retrieved. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-28 13:09:05 +08:00
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figure table. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` for demonstrating purpose. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444 ) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Kevin Hu	74e0b58d89	Fix: excel default optimization. (#11519 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-25 19:54:20 +08:00
Yongteng Lei	db0f6840d9	Feat: ignore chunk size when using custom delimiters (#11434 ) ### What problem does this PR solve? Ignore chunk size when using custom delimiter. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-21 14:36:26 +08:00
Yongteng Lei	b846a0f547	Fix: incorrect retrieval total count with pagination enabled (#11400 ) ### What problem does this PR solve? Incorrect retrieval total count with pagination enabled. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-20 15:35:09 +08:00
He Wang	38234aca53	feat: add OceanBase doc engine (#11228 ) ### What problem does this PR solve? Add OceanBase doc engine. Close #5350 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-20 10:00:14 +08:00
Billy Bao	7264fb6978	Fix: concat images in word document. (#11310 ) ### What problem does this PR solve? Fix: concat images in word document. Partially solved issues in #11063 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-17 19:38:26 +08:00
Jin Hai	bd4bc57009	Refactor: move mcp connection utilities to common (#11304 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-17 15:34:17 +08:00
Billy Bao	e27ff8d3d4	Fix: rerank algorithm (#11266 ) ### What problem does this PR solve? Fix: rerank algorithm #11234 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-14 13:59:54 +08:00
Billy Bao	93422fa8cc	Fix: Law parser (#11246 ) ### What problem does this PR solve? Fix: Law parser ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-13 15:19:02 +08:00
Jin Hai	296476ab89	Refactor function name (#11210 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-12 19:00:15 +08:00
Billy Bao	5629fbd2ca	Fix: OpenSearch retrieval no return & Add documentation of /retrieval (#11083 ) ### What problem does this PR solve? Fix: OpenSearch retrieval no return #11006 Add documentation #11072 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2025-11-07 09:28:42 +08:00

1 2 3 4 5 ...

269 Commits