Files
ragflow/deepdoc
Tim Wang f0f10b6092 Fix: UserFillUp interactive forms not working in agent explore mode (#14589)
## Summary

- **Backend**: `_iter_session_completion_events` in `agent_api.py` was
filtering out `user_inputs` and `workflow_finished` SSE events, causing
agents with UserFillUp components to silently fail in explore mode — the
interactive form never appeared, while the same agent worked correctly
in run (editor) mode.
- **Frontend**: `SessionChat` component in explore mode was missing
`DebugContent` children rendering inside `MessageItem`, so even if the
backend forwarded the events, the form UI would not render. Added
`DebugContent`, `MarkdownContent`, `useAwaitCompentData` hook, and
input-disabling logic to match the run mode's `chat/box.tsx` behavior.

## What was changed

### Backend (`api/apps/restful_apis/agent_api.py`)
- Line 266: Added `"user_inputs"` and `"workflow_finished"` to the
allowed event filter in `_iter_session_completion_events`

### Frontend (`web/src/pages/agent/explore/components/session-chat.tsx`)
- Added imports: `DebugContent`, `MarkdownContent`,
`useAwaitCompentData`, `useParams`
- Added `sendFormMessage` from `useSendSessionMessage()` hook
- Added `useAwaitCompentData` hook for form state management
- Added `DebugContent` as `MessageItem` children for the latest
assistant message (renders UserFillUp form)
- Added `MarkdownContent` + submitted values display for previous
assistant messages
- Updated `NextMessageInput` disabled states to respect `isWaitting`
(form submission in progress)

## Test plan

- [x] Agent with UserFillUp component (e.g., email draft with
send/edit/cancel options) shows interactive form in **explore mode**
- [x] Same agent continues to work correctly in **run (editor) mode**
- [x] Form submission sends data back to the agent and workflow
continues
- [x] Input field is disabled while waiting for form submission
- [ ] Agents without UserFillUp components are unaffected in explore
mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
2026-06-29 09:45:17 +08:00
..
2025-01-21 20:52:28 +08:00

English | 简体中文

DeepDoc

1. Introduction

With a bunch of documents from various domains with various formats and along with diverse retrieval requirements, an accurate analysis becomes a very challenge task. DeepDoc is born for that purpose. There are 2 parts in DeepDoc so far: vision and parser. You can run the flowing test programs if you're interested in our results of OCR, layout recognition and TSR.

python deepdoc/vision/t_ocr.py -h
usage: t_ocr.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  --inputs INPUTS       Directory where to store images or PDFs, or a file path to a single image or PDF
  --output_dir OUTPUT_DIR
                        Directory where to store the output images. Default: './ocr_outputs'
python deepdoc/vision/t_recognizer.py -h
usage: t_recognizer.py [-h] --inputs INPUTS [--output_dir OUTPUT_DIR] [--threshold THRESHOLD] [--mode {layout,tsr}]

options:
  -h, --help            show this help message and exit
  --inputs INPUTS       Directory where to store images or PDFs, or a file path to a single image or PDF
  --output_dir OUTPUT_DIR
                        Directory where to store the output images. Default: './layouts_outputs'
  --threshold THRESHOLD
                        A threshold to filter out detections. Default: 0.5
  --mode {layout,tsr}   Task mode: layout recognition or table structure recognition

Our models are served on HuggingFace. If you have trouble downloading HuggingFace models, this might help!!

export HF_ENDPOINT=https://hf-mirror.com

2. Vision

We use vision information to resolve problems as human being.

  • OCR. Since a lot of documents presented as images or at least be able to transform to image, OCR is a very essential and fundamental or even universal solution for text extraction.

        python deepdoc/vision/t_ocr.py --inputs=path_to_images_or_pdfs --output_dir=path_to_store_result
    

    The inputs could be directory to images or PDF, or an image or PDF. You can look into the folder 'path_to_store_result' where has images which demonstrate the positions of results, txt files which contain the OCR text.

  • Layout recognition. Documents from different domain may have various layouts, like, newspaper, magazine, book and résumé are distinct in terms of layout. Only when machine have an accurate layout analysis, it can decide if these text parts are successive or not, or this part needs Table Structure Recognition(TSR) to process, or this part is a figure and described with this caption. We have 10 basic layout components which covers most cases:

    • Text
    • Title
    • Figure
    • Figure caption
    • Table
    • Table caption
    • Header
    • Footer
    • Reference
    • Equation

    Have a try on the following command to see the layout detection results.

       python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=layout --output_dir=path_to_store_result
    

    The inputs could be directory to images or PDF, or an image or PDF. You can look into the folder 'path_to_store_result' where has images which demonstrate the detection results as following:

  • Table Structure Recognition(TSR). Data table is a frequently used structure to present data including numbers or text. And the structure of a table might be very complex, like hierarchy headers, spanning cells and projected row headers. Along with TSR, we also reassemble the content into sentences which could be well comprehended by LLM. We have five labels for TSR task:

    • Column
    • Row
    • Column header
    • Projected row header
    • Spanning cell

    Have a try on the following command to see the layout detection results.

       python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=tsr --output_dir=path_to_store_result
    

    The inputs could be directory to images or PDF, or an image or PDF. You can look into the folder 'path_to_store_result' where has both images and html pages which demonstrate the detection results as following:

  • Table Auto-Rotation. For scanned PDFs where tables may be incorrectly oriented (rotated 90°, 180°, or 270°), the PDF parser automatically detects the best rotation angle using OCR confidence scores before performing table structure recognition. This significantly improves OCR accuracy and table structure detection for rotated tables.

    The feature evaluates 4 rotation angles (0°, 90°, 180°, 270°) and selects the one with highest OCR confidence. After determining the best orientation, it re-performs OCR on the correctly rotated table image.

    This feature is enabled by default. You can control it via environment variable:

    # Disable table auto-rotation
    export TABLE_AUTO_ROTATE=false
    
    # Enable table auto-rotation (default)
    export TABLE_AUTO_ROTATE=true
    

    Or via API parameter:

    from deepdoc.parser import PdfParser
    
    parser = PdfParser()
    # Disable auto-rotation for this call
    boxes, tables = parser(pdf_path, auto_rotate_tables=False)
    

3. Parser

Four kinds of document formats as PDF, DOCX, EXCEL and PPT have their corresponding parser. The most complex one is PDF parser since PDF's flexibility. The output of PDF parser includes:

  • Text chunks with their own positions in PDF(page number and rectangular positions).
  • Tables with cropped image from the PDF, and contents which has already translated into natural language sentences.
  • Figures with caption and text in the figures.

Résumé

The résumé is a very complicated kind of document. A résumé which is composed of unstructured text with various layouts could be resolved into structured data composed of nearly a hundred of fields. We haven't opened the parser yet, as we open the processing method after parsing procedure.