Commit Graph

5 Commits

Author SHA1 Message Date
Jin Hai
e3cb86d540 Go: parse HTML file (#16018)
### What problem does this PR solve?

```
RAGFlow(api/default)> parse file 'test.html';
Parsing HTML file: test.html
  <html>
......
```

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-15 15:49:17 +08:00
Jin Hai
2846216674 Go: add Markdown parser (#16016)
### What problem does this PR solve?

```
RAGFlow(api/default)> parse file 'README.md';
Parsing Markdown file: README.md
--- AST tree:
HTMLBlock '<div align="center">\n<a href="https:…'
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-15 15:07:29 +08:00
Jin Hai
e89afbae21 Go: file parser config (#15989)
### What problem does this PR solve?

Add configuration for the file parsers.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-13 19:40:43 +08:00
Jin Hai
d32e05d560 Go: add more file parsers (#15979)
### What problem does this PR solve?

Adds parsing support for 'pptx', 'ppt', 'doc', 'xls', and 'xlsx' files.

```
RAGFlow(api/default)> parse file 'test.pptx';
Parsing PPTX file: test.pptx
Document format: pptx
```
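
A minimal sketch of how a `parse file` command might pick a parser from the file extension, for illustration only. The names `supportedFormats` and `detectFormat` are hypothetical and do not come from the RAGFlow code base; the format list is taken from this commit message plus the `docx` support added in #15976.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedFormats is a hypothetical registry of the extensions
// mentioned in this commit and in #15976 (docx).
var supportedFormats = map[string]bool{
	"docx": true, "doc": true,
	"pptx": true, "ppt": true,
	"xlsx": true, "xls": true,
}

// detectFormat returns the lowercase extension without the leading dot,
// or an error when the extension is not in the registry.
func detectFormat(name string) (string, error) {
	ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(name), "."))
	if !supportedFormats[ext] {
		return "", fmt.Errorf("unsupported file format: %q", ext)
	}
	return ext, nil
}

func main() {
	for _, name := range []string{"test.pptx", "Report.XLSX", "notes.txt"} {
		format, err := detectFormat(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Mirror the log lines shown in the session above.
		fmt.Printf("Parsing %s file: %s\n", strings.ToUpper(format), name)
		fmt.Printf("Document format: %s\n", format)
	}
}
```

Keeping the registry in one map means a new format only needs one new entry; the actual parsing backend is out of scope for this sketch.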

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-12 23:28:14 +08:00
Jin Hai
234f1b7cff Go: add office_oxide and parse docx file. (#15976)
### What problem does this PR solve?

As the title says: add office_oxide and parse docx files.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-06-12 20:28:15 +08:00