4 Commits

Author SHA1 Message Date
Harsh Kashyap
b8e960e6c8 fix(qa): preserve final CSV pair row number (#16433) 2026-07-01 14:52:08 +08:00
Tim Wang
f0f10b6092 Fix: UserFillUp interactive forms not working in agent explore mode (#14589)
## Summary

- **Backend**: `_iter_session_completion_events` in `agent_api.py` was
filtering out `user_inputs` and `workflow_finished` SSE events, causing
agents with UserFillUp components to silently fail in explore mode — the
interactive form never appeared, while the same agent worked correctly
in run (editor) mode.
- **Frontend**: `SessionChat` component in explore mode was missing
`DebugContent` children rendering inside `MessageItem`, so even if the
backend forwarded the events, the form UI would not render. Added
`DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and
input-disabling logic to match the run mode's `chat/box.tsx` behavior.

## What was changed

### Backend (`api/apps/restful_apis/agent_api.py`)
- Line 266: Added `"user_inputs"` and `"workflow_finished"` to the
allowed event filter in `_iter_session_completion_events`

### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`)
- Added imports: `DebugContent`, `MarkdownContent`,
`useAwaitCompentData`, `useParams`
- Added `sendFormMessage` from `useSendSessionMessage()` hook
- Added `useAwaitCompentData` hook for form state management
- Added `DebugContent` as `MessageItem` children for the latest
assistant message (renders UserFillUp form)
- Added `MarkdownContent` + submitted values display for previous
assistant messages
- Updated `NextMessageInput` disabled states to respect `isWaitting`
(form submission in progress)

## Test plan

- [x] Agent with UserFillUp component (e.g., email draft with
send/edit/cancel options) shows interactive form in **explore mode**
- [x] Same agent continues to work correctly in **run (editor) mode**
- [x] Form submission sends data back to the agent and workflow
continues
- [x] Input field is disabled while waiting for form submission
- [ ] Agents without UserFillUp components are unaffected in explore
mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-29 09:45:17 +08:00
galuis116
6bfaa3f21e Fix: SSRF in markdown parser remote image fetch (#15438)
### What problem does this PR solve?

`rag/app/naive.py` `Markdown.load_images_from_urls` fetched image URLs
parsed
straight out of an untrusted uploaded markdown document via a raw
`requests.get`,
with no SSRF validation. Markdown chunking always reaches this path
(`return_section_images=True`), so any authenticated user who uploads a
`.md`/`.markdown`/`.mdx` file to a knowledge base could make the server
issue
requests to internal services or cloud-metadata endpoints, e.g.
`![x](http://169.254.169.254/latest/meta-data/...)`. The `image/`
Content-Type
check only gates decoding — the outbound request (the SSRF) always
fires.

This was the one user-controlled fetch site missed by the project's
existing
SSRF-hardening (`common/ssrf_guard.py`, already applied to the crawler,
SearXNG,
RSS connector, MCP/document APIs, and OAuth avatar download).

The fix validates and DNS-pins every hop with
`common.ssrf_guard.assert_url_is_safe`
before connecting, and follows redirects manually so each redirect
target is
re-validated (closing the DNS-rebinding / redirect-bypass window),
mirroring
`common/data_source/rss_connector.py`. Blocked URLs are skipped and
logged like
any other unreachable image, so legitimate public images are unaffected.
Adds a
regression test at `test/unit_test/rag/app/test_markdown_image_ssrf.py`.

Closes #15437 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Ubuntu <ubuntu@ubuntu-2204.linuxvmimages.local>
Co-authored-by: galuis116 <galuis116@users.noreply.github.com>
2026-06-16 18:54:55 +08:00
Ahmad Intisar
3c4d1da98f Feature/table parser column roles (#13710)
### What problem does this PR solve?

The table file parser (CSV/Excel) currently treats all columns
identically — every column is both vectorized (embedded in chunk text)
and stored as filterable metadata. There's no way for users to control
which columns should be searchable by semantic meaning versus which
should only be filterable attributes.

For example, when ingesting a news articles CSV with columns like title,
content, country, category, source, etc., the embedding includes
metadata fields like country: Brazil and source: Reuters in the chunk
text, which dilutes the semantic quality of the embedding without adding
retrieval value.

The RDBMS connector (MySQL/PostgreSQL) already supports content_columns
/ metadata_columns, but this capability was missing for file-based table
ingestion.

This PR adds column-level control (vectorize / metadata / both) for the
table file parser, following RAGFlow's existing patterns.

Backward compatible: Datasets without table_column_roles or with
table_column_mode: auto behave exactly as before (all columns = both).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-11 10:06:04 +08:00