docling-models

docling-project/docling-models

Introduction

This page contains the models that power docling, a package for converting PDF documents.

Model card

License: cdla-permissive-2.0, apache-2.0

Model details

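Layout model

The layout model takes a page image as input and applies an RT-DETR model to identify the different layout components. It currently detects the following labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title. For reference (from the DocLayNet paper), the table below compares the performance of standard object detection methods on the DocLayNet dataset with human evaluation: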

Label	Human	MRCNN R50	MRCNN R101	FRCNN R101	YOLO v5x6
Caption	84-89	68.4	71.5	70.1	77.7
Footnote	83-91	70.9	71.8	73.7	77.2
Formula	83-85	60.1	63.4	63.5	66.2
List-item	87-88	81.2	80.8	81.0	86.2
Page-footer	93-94	61.6	59.3	58.9	61.1
Page-header	85-89	71.9	70.0	72.0	67.9
Picture	69-71	71.7	72.7	72.0	77.1
Section-header	83-84	67.6	69.3	68.4	74.6
Table	77-81	82.2	82.9	82.2	86.3
Text	84-86	84.6	85.8	85.4	88.1
Title	60-72	76.7	80.4	79.9	82.7
All	82-83	72.4	73.5	73.4	76.8
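
The output of a layout model like the one described above can be thought of as a list of labeled, scored boxes per page. The following minimal Python sketch shows one way such detections might be filtered by confidence and grouped by label. The `Detection` class, the `group_by_label` helper, the 0.5 threshold, and the sample boxes are illustrative assumptions and not part of docling's API; only the label names come from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Labels listed on this page for the layout model.
LAYOUT_LABELS = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]


@dataclass
class Detection:
    """Hypothetical container for one predicted layout component."""
    label: str
    score: float  # confidence in [0, 1]
    bbox: Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def group_by_label(detections: List[Detection],
                   min_score: float = 0.5) -> Dict[str, List[Detection]]:
    """Drop unknown-label or low-confidence detections, then group the rest by label."""
    grouped: Dict[str, List[Detection]] = {}
    for det in detections:
        if det.label not in LAYOUT_LABELS or det.score < min_score:
            continue
        grouped.setdefault(det.label, []).append(det)
    return grouped


# Made-up detections for a single page, for illustration only.
page_detections = [
    Detection("Title", 0.97, (72, 60, 540, 96)),
    Detection("Text", 0.93, (72, 110, 540, 280)),
    Detection("Table", 0.88, (72, 300, 540, 520)),
    Detection("Picture", 0.31, (80, 530, 200, 600)),  # below the threshold, dropped
]

grouped = group_by_label(page_detections)
for label, dets in grouped.items():
    print(label, len(dets))
```

Grouping by label makes it straightforward to hand only the `Table` boxes to a downstream table-structure step, while the threshold is a tunable trade-off between missed and spurious components.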

TableFormer

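The TableFormer model identifies the structure of a table, starting from an image of the table. It uses the table regions predicted by the layout model to locate the tables. TableFormer achieves state-of-the-art results in table structure recognition: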

Model (TEDS)	Simple tables	Complex tables	All tables
Tabula	78.0	57.8	67.9
Traprange	60.8	49.9	55.4
Camelot	80.0	66.0	73.0
Acrobat Pro	68.9	61.8	65.3
EDD	91.2	85.4	88.3
TableFormer	95.4	90.1	93.6
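
As described above, TableFormer works on table images located through the layout model's predicted table regions. The sketch below illustrates that hand-off in plain Python: it selects the `Table` boxes, clamps them to the page bounds, and crops the page. The function names, the `padding` parameter, and the list-of-rows image representation are assumptions made for illustration; docling's actual implementation may differ.

```python
def table_crop_boxes(detections, page_width, page_height, padding=4):
    """Return integer crop boxes (left, top, right, bottom) for every detected table.

    `detections` is a list of (label, (left, top, right, bottom)) pairs.
    The padding is a hypothetical margin, clamped to the page bounds.
    """
    boxes = []
    for label, (left, top, right, bottom) in detections:
        if label != "Table":
            continue
        boxes.append((
            max(0, int(left) - padding),
            max(0, int(top) - padding),
            min(page_width, int(right) + padding),
            min(page_height, int(bottom) + padding),
        ))
    return boxes


def crop(image_rows, box):
    """Crop a page image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image_rows[top:bottom]]


# Made-up page, 100 pixels wide and 80 pixels high, with one text block and one table.
page = [[0] * 100 for _ in range(80)]
detections = [
    ("Text", (5.0, 5.0, 95.0, 20.0)),
    ("Table", (2.0, 30.0, 98.0, 70.0)),
]

boxes = table_crop_boxes(detections, page_width=100, page_height=80)
table_images = [crop(page, box) for box in boxes]
```

Each cropped image would then be the input to a table-structure model; the clamping step keeps padded boxes from running past the page edges.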

References

@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
  doi = {10.1145/3534678.3539043},
  url = {https://arxiv.org/abs/2206.01062},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022}
}

@InProceedings{TableFormer2022,
    author    = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
    title     = {TableFormer: Table Structure Understanding With Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4614-4623},
    doi       = {10.1109/CVPR52688.2022.00457}
}