### What problem does this PR solve?
## Problem
When parsing PDFs containing English figure/table captions (e.g. "Fig.
20", "Figure 20", "Table 20"), the `is_caption` method in
`TableStructureRecognizer` failed to recognize them as captions. This
caused figure numbering gaps in the parsed output (e.g. Fig. 19 → Fig.
21, skipping Fig. 20).
## Root Cause
The `is_caption` regex only matched Chinese caption formats:
```python
patt = [r"[图表]+[ 0-9::]{2,}"]
```
When the layout recognizer also failed to assign a `caption` layout type
to a given text block, English captions were entirely missed.
## Fix
Added three case-insensitive English caption patterns to `is_caption` in
`deepdoc/vision/table_structure_recognizer.py`:
- `(?i)Fig\.?\s*\d+` — matches `Fig. 20`, `Fig 20`, `FIG. 20`, etc.
- `(?i)Figure\s+\d+` — matches `Figure 20`, `FIGURE 20`, etc.
- `(?i)Table\s+\d+` — matches `Table 20`, `TABLE 20`, etc.
## Files Changed
- `deepdoc/vision/table_structure_recognizer.py` — extended `is_caption`
regex patterns
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: noob <yixiao121314@outlook.com>
## Summary
- Fix `a image` → `an image` in README and log message
- Fix `colomn` → `column` in table structure recognizer comment
- Fix `formated` → `formatted` in confluence connector docstring
- Fix `tabel of content` → `table of contents` in TOC prompt
## Test plan
- [ ] Documentation and comment changes, no functional impact
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
issue:
#10825
change:
remove unexpected keyword argument in table_structure_recognizer logging
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add support for the Ascend table structure recognizer.
Use the environment variable `TABLE_STRUCTURE_RECOGNIZER_TYPE=ascend` to
enable the Ascend table structure recognizer.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Optimized Recognizer.sort_X_firstly and Recognizer.sort_Y_firstly
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Edit chunk shall update instead of insert it. Close#3679
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Use consistent log file names, introduced initLogger
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Related source file is in Windows/DOS format, they are format to Unix
format.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>