ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-07-03 01:01:56 +08:00

Author	SHA1	Message	Date
Harsh Kashyap	b8e960e6c8	fix(qa): preserve final CSV pair row number (#16433)	2026-07-01 14:52:08 +08:00
Tim Wang	f0f10b6092	Fix: UserFillUp interactive forms not working in agent explore mode (#14589) ## Summary - Backend: `_iter_session_completion_events` in `agent_api.py` was filtering out `user_inputs` and `workflow_finished` SSE events, causing agents with UserFillUp components to silently fail in explore mode: the interactive form never appeared, while the same agent worked correctly in run (editor) mode. - Frontend: `SessionChat` component in explore mode was missing `DebugContent` children rendering inside `MessageItem`, so even if the backend forwarded the events, the form UI would not render. Added `DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and input-disabling logic to match the run mode's `chat/box.tsx` behavior. ## What was changed ### Backend (`api/apps/restful_apis/agent_api.py`) - Line 266: Added `"user_inputs"` and `"workflow_finished"` to the allowed event filter in `_iter_session_completion_events` ### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`) - Added imports: `DebugContent`, `MarkdownContent`, `useAwaitCompentData`, `useParams` - Added `sendFormMessage` from `useSendSessionMessage()` hook - Added `useAwaitCompentData` hook for form state management - Added `DebugContent` as `MessageItem` children for the latest assistant message (renders UserFillUp form) - Added `MarkdownContent` + submitted values display for previous assistant messages - Updated `NextMessageInput` disabled states to respect `isWaitting` (form submission in progress) ## Test plan - [x] Agent with UserFillUp component (e.g., email draft with send/edit/cancel options) shows interactive form in explore mode - [x] Same agent continues to work correctly in run (editor) mode - [x] Form submission sends data back to the agent and workflow continues - [x] Input field is disabled while waiting for form submission - [ ] Agents without UserFillUp components are unaffected in explore mode 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>	2026-06-29 09:45:17 +08:00
galuis116	6bfaa3f21e	Fix: SSRF in markdown parser remote image fetch (#15438) ### What problem does this PR solve? `rag/app/naive.py` `Markdown.load_images_from_urls` fetched image URLs parsed straight out of an untrusted uploaded markdown document via a raw `requests.get`, with no SSRF validation. Markdown chunking always reaches this path (`return_section_images=True`), so any authenticated user who uploads a `.md`/`.markdown`/`.mdx` file to a knowledge base could make the server issue requests to internal services or cloud-metadata endpoints, e.g. `![x](http://169.254.169.254/latest/meta-data/...)`. The `image/` Content-Type check only gates decoding; the outbound request (the SSRF) always fires. This was the one user-controlled fetch site missed by the project's existing SSRF-hardening (`common/ssrf_guard.py`, already applied to the crawler, SearXNG, RSS connector, MCP/document APIs, and OAuth avatar download). The fix validates and DNS-pins every hop with `common.ssrf_guard.assert_url_is_safe` before connecting, and follows redirects manually so each redirect target is re-validated (closing the DNS-rebinding / redirect-bypass window), mirroring `common/data_source/rss_connector.py`. Blocked URLs are skipped and logged like any other unreachable image, so legitimate public images are unaffected. Adds a regression test at `test/unit_test/rag/app/test_markdown_image_ssrf.py`. Closes #15437 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Ubuntu <ubuntu@ubuntu-2204.linuxvmimages.local> Co-authored-by: galuis116 <galuis116@users.noreply.github.com>	2026-06-16 18:54:55 +08:00
Ahmad Intisar	3c4d1da98f	Feature/table parser column roles (#13710) ### What problem does this PR solve? The table file parser (CSV/Excel) currently treats all columns identically: every column is both vectorized (embedded in chunk text) and stored as filterable metadata. There's no way for users to control which columns should be searchable by semantic meaning versus which should only be filterable attributes. For example, when ingesting a news articles CSV with columns like title, content, country, category, source, etc., the embedding includes metadata fields like `country: Brazil` and `source: Reuters` in the chunk text, which dilutes the semantic quality of the embedding without adding retrieval value. The RDBMS connector (MySQL/PostgreSQL) already supports `content_columns` / `metadata_columns`, but this capability was missing for file-based table ingestion. This PR adds column-level control (vectorize / metadata / both) for the table file parser, following RAGFlow's existing patterns. Backward compatible: Datasets without `table_column_roles` or with `table_column_mode: auto` behave exactly as before (all columns = both). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 10:06:04 +08:00
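
The column roles described in commit 3c4d1da98f can be illustrated with a minimal sketch. This is not the parser's actual code: the function name `split_row_by_role` and the shape of the roles mapping (column name to `"vectorize"`, `"metadata"`, or `"both"`) are assumptions made for illustration. Only the three role names and the "all columns = both" default come from the commit message.

```python
from typing import Dict, Optional, Tuple


def split_row_by_role(
    row: Dict[str, str],
    roles: Optional[Dict[str, str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Split one table row into chunk text (to embed) and filterable metadata.

    `roles` is a hypothetical mapping of column name to one of
    "vectorize", "metadata", or "both". Columns without an entry default
    to "both", which matches the backward-compatible behavior.
    """
    roles = roles or {}
    text_parts = []
    metadata = {}
    for column, value in row.items():
        role = roles.get(column, "both")
        if role in ("vectorize", "both"):
            # Only these columns contribute to the embedded chunk text.
            text_parts.append(f"{column}: {value}")
        if role in ("metadata", "both"):
            # Only these columns are stored as filterable attributes.
            metadata[column] = value
    return "; ".join(text_parts), metadata


row = {"title": "Floods hit the south", "country": "Brazil", "source": "Reuters"}
roles = {"title": "vectorize", "country": "metadata", "source": "metadata"}
text, meta = split_row_by_role(row, roles)
```

With these roles, `country` and `source` stay out of the embedded text but remain available for filtering, which is the dilution problem the commit message describes.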

4 Commits
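
The per-hop validation described in commit 6bfaa3f21e can be sketched roughly as follows. This is a standard-library illustration only and does not reproduce `common/ssrf_guard.py`, whose signature is not shown in the log. Every name here (`is_forbidden_ip`, `assert_public_url`, `follow_validated_redirects`) is hypothetical, as are the injected `fetch` and `resolve` callables, and the sketch leaves out the actual connection to the pinned IP.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_forbidden_ip(ip: str) -> bool:
    """Return True for addresses a server-side fetch should not reach."""
    addr = ipaddress.ip_address(ip.split("%")[0])  # drop any IPv6 scope id
    return (
        addr.is_private or addr.is_loopback or addr.is_link_local
        or addr.is_multicast or addr.is_reserved or addr.is_unspecified
    )


def resolve_host(host: str) -> list:
    """Resolve a hostname to a sorted list of unique IP strings."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def assert_public_url(url: str, resolve=resolve_host) -> str:
    """Validate scheme and every resolved IP; return one IP to pin."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url}")
    ips = resolve(parts.hostname)
    if not ips or any(is_forbidden_ip(ip) for ip in ips):
        raise ValueError(f"blocked URL: {url}")
    return ips[0]


def follow_validated_redirects(url, fetch, resolve=resolve_host, max_hops=5):
    """Follow redirects manually, re-validating each hop before fetching.

    `fetch(url, pinned_ip)` is a hypothetical transport callable that
    returns `(status_code, location_header_or_None)` and does not follow
    redirects itself.
    """
    for _ in range(max_hops):
        pinned_ip = assert_public_url(url, resolve)
        status, location = fetch(url, pinned_ip)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # the next hop is validated above
            continue
        return url
    raise ValueError("too many redirects")
```

Because validation runs at the top of every loop iteration, a public URL that redirects to `http://169.254.169.254/...` is rejected before any request to the metadata address is made.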