docling-models

Introduction

This page hosts the models that power docling, a PDF document conversion package.

Model card

License: cdla-permissive-2.0, apache-2.0

Model details

Docling models

The models described here power docling, the PDF document conversion package.

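Layout model

The layout model takes a page image and applies an RT-DETR model to detect the layout components on it. It currently detects the following labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title. For reference, the table below (from the DocLayNet paper) compares the performance (mAP@0.5-0.95, in percent) of standard object detection methods on the DocLayNet dataset with human evaluation: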

Class            Human   MRCNN R50   MRCNN R101   FRCNN R101   YOLO v5x6
Caption          84-89   68.4        71.5         70.1         77.7
Footnote         83-91   70.9        71.8         73.7         77.2
Formula          83-85   60.1        63.4         63.5         66.2
List-item        87-88   81.2        80.8         81.0         86.2
Page-footer      93-94   61.6        59.3         58.9         61.1
Page-header      85-89   71.9        70.0         72.0         67.9
Picture          69-71   71.7        72.7         72.0         77.1
Section-header   83-84   67.6        69.3         68.4         74.6
Table            77-81   82.2        82.9         82.2         86.3
Text             84-86   84.6        85.8         85.4         88.1
Title            60-72   76.7        80.4         79.9         82.7
All              82-83   72.4        73.5         73.4         76.8
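
To make the layout model's output more concrete, the following is a minimal sketch of how detections might be represented and filtered after inference. The `Detection` class, the `filter_detections` helper, and the 0.5 score threshold are illustrative assumptions and are not part of docling's API; only the label names come from the label list in this section.

```python
from dataclasses import dataclass

# Labels the layout model currently detects (taken from the section text).
LAYOUT_LABELS = (
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
)


@dataclass(frozen=True)
class Detection:
    """One predicted layout component (hypothetical structure)."""
    label: str
    score: float  # model confidence in [0, 1]
    bbox: tuple  # (x0, y0, x1, y1) in page-image pixels


def filter_detections(detections, min_score=0.5):
    """Keep detections that have a known label and a score >= min_score."""
    return [
        d for d in detections
        if d.label in LAYOUT_LABELS and d.score >= min_score
    ]


# Made-up detections for a single page, for illustration only.
detections = [
    Detection("Title", 0.97, (50, 40, 550, 80)),
    Detection("Table", 0.91, (60, 300, 540, 620)),
    Detection("Text", 0.32, (60, 100, 540, 280)),  # below the threshold
    Detection("Sidebar", 0.88, (0, 0, 40, 800)),   # not a known label
]
kept = filter_detections(detections)
print([d.label for d in kept])
```

In this sketch, low-confidence boxes and boxes with unknown labels are dropped before any downstream step consumes them; the threshold is a tunable assumption rather than a documented default.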

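TableFormer

The TableFormer model identifies the structure of a table, starting from an image of that table. It relies on the table regions predicted by the layout model to locate the tables on a page. TableFormer achieves state-of-the-art table structure recognition, as the following TEDS (tree-edit-distance-based similarity) scores show: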

Model (TEDS)   Simple tables   Complex tables   All tables
Tabula         78.0            57.8             67.9
Traprange      60.8            49.9             55.4
Camelot        80.0            66.0             73.0
Acrobat Pro    68.9            61.8             65.3
EDD            91.2            85.4             88.3
TableFormer    95.4            90.1             93.6
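
The handoff described in this section, where TableFormer works on the table regions predicted by the layout model, can be sketched as follows. This is an illustration under stated assumptions: `table_crop_boxes`, the `(label, score, bbox)` tuples, and the padding value are hypothetical and do not reflect docling's internal interfaces.

```python
def table_crop_boxes(detections, page_width, page_height, padding=4):
    """Return integer crop boxes for every detection labeled "Table".

    Each detection is a (label, score, (x0, y0, x1, y1)) tuple. Boxes are
    padded slightly and clamped to the page so a crop never leaves the image.
    """
    boxes = []
    for label, _score, (x0, y0, x1, y1) in detections:
        if label != "Table":
            continue  # only table regions are passed on
        boxes.append((
            max(0, int(x0) - padding),
            max(0, int(y0) - padding),
            min(page_width, int(x1) + padding),
            min(page_height, int(y1) + padding),
        ))
    return boxes


# Made-up layout predictions for one 600 x 800 pixel page image.
page_detections = [
    ("Text", 0.95, (60.0, 100.0, 540.0, 280.0)),
    ("Table", 0.91, (2.0, 300.0, 598.0, 620.0)),
]
crops = table_crop_boxes(page_detections, page_width=600, page_height=800)
print(crops)
```

Each resulting box could then be used to crop the page image before the crop is passed to a table structure model.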

References

@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
  doi = {10.1145/3534678.3539043},
  url = {https://arxiv.org/abs/2206.01062},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022}
}

@inproceedings{TableFormer2022,
  author = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
  title = {TableFormer: Table Structure Understanding With Transformers},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022},
  pages = {4614-4623},
  doi = {10.1109/CVPR52688.2022.00457}
}

Tags

arxiv:2408.09869 arxiv:2206.01062 doi:10.57967/hf/3036 license:cdla-permissive-2.0 license:apache-2.0 eval-results endpoints_compatible region:us

Details

Vendor: docling-project
Framework: transformers