mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
Core optimizations (refer to arXiv:2510.09722):

1. PDF text fusion: dual-path extraction from metadata and OCR, followed by fusion
2. Page-aware reconstruction: YOLOv10 page segmentation + hierarchical sorting + line-number indexing
3. Parallel task decomposition: three parallel LLM extractions for basic information, work experience, and educational background
4. Index pointer mechanism: the LLM returns a range of line numbers instead of generating the full text, reducing hallucination in the extracted text

---------

Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
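The index pointer mechanism in item 4 can be illustrated with a minimal sketch: the model returns only a `[start, end]` pair, and the caller copies the description text from the original lines. The sketch assumes the range is zero-based and inclusive, which the source does not state. `resolve_line_range` is a hypothetical helper name, not a function from the repository.

```python
from typing import List


def resolve_line_range(lines: List[str], line_range: List[int]) -> str:
    """Return the text covered by an inclusive [start, end] line range.

    Assumption: indices are zero-based and inclusive. An empty or
    malformed range yields an empty string.
    """
    if len(line_range) != 2:
        return ""
    start, end = line_range
    if not (isinstance(start, int) and isinstance(end, int)):
        return ""
    # Clamp to the valid range so an off-by-one from the model cannot raise.
    start = max(start, 0)
    end = min(end, len(lines) - 1)
    if start > end:
        return ""
    return "\n".join(lines[start : end + 1])


resume_lines = [
    "Stanford University",
    "M.S. Computer Science, 2018-2020",
    "GPA 3.9/4.0",
    "Coursework: ML, Databases",
]
description = resolve_line_range(resume_lines, [2, 3])
```

Because the description is copied from the source lines rather than regenerated, the model cannot alter its wording. Only the two indices need to be checked.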
31 lines
1010 B
Markdown
Please extract education background from the following line-indexed resume text.

{indexed_text}

Extract into JSON:
{{
    "education": [
        {{
            "school": "",
            "major": "",
            "degree": "",
            "department": "",
            "start_date": "",
            "end_date": "",
            "desc_lines": [start_index, end_index]
        }}
    ]
}}

Field descriptions:
- school: Full school name, e.g. "Stanford University", both Chinese and English are acceptable
- major: Major/field of study, e.g. "Computer Science"
- degree: Degree level - Bachelor/Master/PhD/Associate/High School/Middle School, leave "" if not available
- department: Department/College, e.g. "School of Engineering"
- start_date: Start date, format %Y.%m or %Y
- end_date: End date, use "Present" if still enrolled, "" if not available
- desc_lines: [start_line, end_line], line number range for education description (optional)
  - Includes coursework, research focus, GPA, honors/awards, etc.
  - Use [] if not available

Return JSON only. /no_think
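
The template above uses `{indexed_text}` as its only placeholder and doubles every literal brace, which matches Python's `str.format` escaping. A minimal sketch of how such a template might be filled and its reply parsed follows. The `[i]: ` line prefix, the shortened `TEMPLATE` string, and the helper names `index_lines`, `build_prompt`, and `parse_education` are assumptions for illustration, not code from the repository.

```python
import json
from typing import Any, Dict, List

# Shortened stand-in for the template above. Doubled braces become literal
# braces under str.format, and {indexed_text} is the only placeholder.
TEMPLATE = (
    "Please extract education background from the following "
    "line-indexed resume text.\n\n"
    "{indexed_text}\n\n"
    "Extract into JSON:\n"
    '{{\n    "education": []\n}}\n\n'
    "Return JSON only. /no_think"
)


def index_lines(lines: List[str]) -> str:
    """Prefix each line with its index. The "[i]: " format is an assumption."""
    return "\n".join(f"[{i}]: {line}" for i, line in enumerate(lines))


def build_prompt(lines: List[str]) -> str:
    """Substitute the line-indexed resume text into the template."""
    return TEMPLATE.format(indexed_text=index_lines(lines))


def parse_education(reply: str) -> List[Dict[str, Any]]:
    """Parse the model reply, returning [] when it is not the expected JSON."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    education = data.get("education", [])
    return education if isinstance(education, list) else []


prompt = build_prompt(["Stanford University", "M.S. Computer Science"])
entries = parse_education(
    '{"education": [{"school": "MIT", "desc_lines": [2, 3]}]}'
)
```

Returning an empty list on malformed output keeps a failed extraction from raising in the caller. A stricter pipeline might retry the request instead.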