docling-project/docling-models
License: cdla-permissive-2.0, apache-2.0
Docling Models
This page describes the models that power docling, a PDF document conversion package.
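For orientation, here is a minimal sketch of how these models are typically exercised through the docling package rather than called directly. It assumes `docling` is installed and that the package obtains the model weights it needs on its own; the helper name `convert_to_markdown` is hypothetical and not part of docling.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a PDF (local path or URL) to Markdown using docling.

    Hypothetical helper for illustration; only DocumentConverter, convert()
    and export_to_markdown() come from the docling package itself.
    """
    # Imported lazily so that defining this helper does not require docling.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling the helper with the path or URL of a PDF should return the converted document as a Markdown string; the first call may take a while if model files have to be fetched.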
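Layout Model
The layout model takes a page image as input and applies an RT-DETR model to detect the layout components on it. It currently detects the following labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title. For reference, the table below (from the DocLayNet paper) compares the per-label performance of standard object detection methods on the DocLayNet dataset with that of human annotators: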
| Label | Human | MRCNN R50 | MRCNN R101 | FRCNN R101 | YOLO v5x6 |
|---|---|---|---|---|---|
| Caption | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item | 87-88 | 81.2 | 80.8 | 81.0 | 86.2 |
| Page-footer | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header | 85-89 | 71.9 | 70.0 | 72.0 | 67.9 |
| Picture | 69-71 | 71.7 | 72.7 | 72.0 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |
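One way to read the table above is to compare each detector score with the human range for the same label. The sketch below does this for the YOLO v5x6 column only; the numbers are copied from the table, while the function names are made up for illustration.

```python
# Per-label values copied from the table above:
# (human range low, human range high, YOLO v5x6 score).
SCORES = {
    "Caption": (84, 89, 77.7),
    "Footnote": (83, 91, 77.2),
    "Formula": (83, 85, 66.2),
    "List-item": (87, 88, 86.2),
    "Page-footer": (93, 94, 61.1),
    "Page-header": (85, 89, 67.9),
    "Picture": (69, 71, 77.1),
    "Section-header": (83, 84, 74.6),
    "Table": (77, 81, 86.3),
    "Text": (84, 86, 88.1),
    "Title": (60, 72, 82.7),
}


def labels_above_human(scores):
    """Labels whose model score exceeds the upper end of the human range."""
    return [name for name, (_low, high, model) in scores.items() if model > high]


def largest_shortfall(scores):
    """Label where the model trails the lower end of the human range the most."""
    name = max(scores, key=lambda n: scores[n][0] - scores[n][2])
    low, _high, model = scores[name]
    return name, round(low - model, 1)


print(labels_above_human(SCORES))  # ['Picture', 'Table', 'Text', 'Title']
print(largest_shortfall(SCORES))   # ('Page-footer', 31.9)
```

On these numbers the detector is above the human range for four labels and furthest below it for Page-footer.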
TableFormer
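The TableFormer model recognizes the structure of a table from an image of that table. It locates the tables using the table regions predicted by the layout model. TableFormer reports state-of-the-art results for table structure recognition, measured by TEDS:
| Model (TEDS) | Simple tables | Complex tables | All tables |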
|---|---|---|---|
| Tabula | 78.0 | 57.8 | 67.9 |
| Traprange | 60.8 | 49.9 | 55.4 |
| Camelot | 80.0 | 66.0 | 73.0 |
| Acrobat Pro | 68.9 | 61.8 | 65.3 |
| EDD | 91.2 | 85.4 | 88.3 |
| TableFormer | 95.4 | 90.1 | 93.6 |
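The hand-off described above, where table regions from the layout model tell TableFormer where to look, can be sketched as follows. This is an illustration under stated assumptions, not docling's implementation: the `Detection` class, the `(left, top, right, bottom)` pixel box convention, the score threshold, and the padding are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one layout prediction (not a docling type)."""
    label: str
    score: float
    box: tuple  # (left, top, right, bottom) in page pixels, assumed convention


def table_crop_boxes(detections, page_width, page_height, min_score=0.5, pad=2):
    """Return integer crop boxes for confident "Table" detections.

    Boxes are rounded outward, padded by `pad` pixels, and clamped to the page.
    """
    boxes = []
    for det in detections:
        if det.label != "Table" or det.score < min_score:
            continue
        left, top, right, bottom = det.box
        boxes.append((
            max(0, math.floor(left) - pad),
            max(0, math.floor(top) - pad),
            min(page_width, math.ceil(right) + pad),
            min(page_height, math.ceil(bottom) + pad),
        ))
    return boxes


# Made-up detections on a 612 x 792 pixel page.
detections = [
    Detection("Text", 0.97, (40.0, 30.0, 560.0, 120.0)),
    Detection("Table", 0.91, (38.6, 140.2, 561.3, 420.9)),
    Detection("Table", 0.32, (50.0, 500.0, 300.0, 600.0)),  # below threshold
    Detection("Table", 0.88, (0.5, 700.0, 611.8, 791.5)),
]
print(table_crop_boxes(detections, page_width=612, page_height=792))
# [(36, 138, 564, 423), (0, 698, 612, 792)]
```

Each returned box could then be used to cut the table image out of the page before passing it to a structure recognition model.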
References
@techreport{Docling,
author = {Deep Search Team},
month = {8},
title = {{Docling Technical Report}},
url={https://arxiv.org/abs/2408.09869},
eprint={2408.09869},
doi = {10.48550/arXiv.2408.09869},
version = {1.0.0},
year = {2024}
}
@article{doclaynet2022,
title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
doi = {10.1145/3534678.3539043},
url = {https://arxiv.org/abs/2206.01062},
author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
year = {2022}
}
@InProceedings{TableFormer2022,
author = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
title = {TableFormer: Table Structure Understanding With Transformers},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {4614-4623},
doi = {10.1109/CVPR52688.2022.00457}
}
Tags: arxiv:2408.09869, arxiv:2206.01062, doi:10.57967/hf/3036