dinov2-small

facebook image-feature-extraction transformers

facebook/dinov2-small

License: apache-2.0

Summary

Vision Transformer (small-sized model) trained using DINOv2

Model card

License: apache-2.0

dino vision

Model configuration

Model type: dinov2

Architecture: Dinov2Model

Model details

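Vision Transformer (small-sized model) trained using DINOv2

Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released in this repository.

Disclaimer: The team releasing DINOv2 did not write a model card for this model, so this model card has been written by the Hugging Face team.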

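Model description

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. A [CLS] token is added to the beginning of the sequence for use in classification tasks. Absolute position embeddings are also added before the sequence is fed to the layers of the Transformer encoder.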
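
To make the sequence layout concrete, the following sketch counts the tokens the encoder receives for a given input size. It assumes a patch size of 14 and a 224x224 input, neither of which is stated in this card, so treat both as assumptions to check against the checkpoint's configuration. `num_tokens` is a hypothetical helper written for illustration and is not part of transformers.

```python
def num_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Return the encoder sequence length: one token per patch plus [CLS]."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    patches = (height // patch_size) * (width // patch_size)
    return patches + 1  # +1 for the [CLS] token prepended to the sequence


# Assumed input size of 224x224 with 14x14 patches: 16 * 16 patches + [CLS].
print(num_tokens(224, 224))
```

Under these assumptions the second dimension of `last_hidden_state` should equal the value printed here.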

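Note that this model does not include any fine-tuned heads.

By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image.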
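
A linear layer on top of the [CLS] token could look roughly like the sketch below. It is an illustration, not the training setup used by the DINOv2 authors: the class name `LinearProbe`, the hidden size of 384 (assumed for the small model), and the 10 output labels are all assumptions, and a random tensor stands in for real encoder output so the snippet runs without downloading weights.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Linear classifier over the [CLS] token of an encoder output."""

    def __init__(self, hidden_size: int = 384, num_labels: int = 10):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        cls_token = last_hidden_state[:, 0]  # [batch, hidden_size]
        return self.classifier(cls_token)    # [batch, num_labels]


# Random stand-in for encoder output: 2 images, 257 tokens, 384 features.
dummy_hidden = torch.randn(2, 257, 384)
probe = LinearProbe()
logits = probe(dummy_hidden)
print(logits.shape)
```

In practice the stand-in tensor would be replaced by the encoder's `last_hidden_state`, typically computed under `torch.no_grad()` so that only the linear layer is trained.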

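Intended uses & limitations

You can use the raw model for feature extraction. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model: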

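import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Download a sample image from the COCO validation set.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessing pipeline and the pretrained encoder.
processor = AutoImageProcessor.from_pretrained('facebook/dinov2-small')
model = AutoModel.from_pretrained('facebook/dinov2-small')

# Preprocess the image and run a forward pass without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One feature vector per token: the [CLS] token followed by the patch tokens.
last_hidden_states = outputs.last_hidden_state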
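
A common next step is to reduce `last_hidden_state` to one vector per image and compare images. The sketch below shows one possible way to do this and is not a method prescribed by the model authors. The helper `pool_features` is hypothetical, the tensor shape (257 tokens, 384 features) is an assumption about this checkpoint, and a random tensor stands in for real model output so the snippet runs without downloading weights.

```python
import torch
import torch.nn.functional as F


def pool_features(last_hidden_state: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Split encoder output into a [CLS] vector and a mean patch vector."""
    cls_embedding = last_hidden_state[:, 0]                  # [batch, hidden]
    patch_embedding = last_hidden_state[:, 1:].mean(dim=1)   # [batch, hidden]
    return cls_embedding, patch_embedding


# Random stand-in for the encoder output of two images.
fake_states = torch.randn(2, 257, 384)
cls_embedding, patch_embedding = pool_features(fake_states)

# Cosine similarity between the two images' [CLS] embeddings, in [-1, 1].
similarity = F.cosine_similarity(cls_embedding[0], cls_embedding[1], dim=0)
print(cls_embedding.shape, patch_embedding.shape, float(similarity))
```

Whether the [CLS] vector, the mean of the patch tokens, or a combination works best depends on the downstream task, so it is worth evaluating the options on your own data.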

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}