Depth-Anything-V2-Small-hf

depth-anything depth-estimation transformers

depth-anything/Depth-Anything-V2-Small-hf




Model card

License: apache-2.0

Task: depth-estimation

Tags: depth, relative depth

Model configuration

Model type: depth_anything

Architecture: DepthAnythingForDepthEstimation

Model details


Depth Anything V2 Small – Transformers version

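Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images. The authors describe it as the most capable monocular depth estimation (MDE) model, with the following features:
- more fine-grained details than Depth Anything V1
- more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard)
- more efficient (10x faster) and more lightweight than SD-based models
- impressive fine-tuned performance with our pre-trained models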

This model checkpoint is compatible with the transformers library.

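Depth Anything V2 was introduced in the paper of the same name by Lihe Yang et al. It uses the same architecture as the original Depth Anything release. However, it is trained with synthetic data and a larger-capacity teacher model, which yields finer and more robust depth predictions. The original Depth Anything model was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al., and was first released in this repository.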

Online demo.

Model description

Depth Anything V2 leverages the DPT architecture with a DINOv2 backbone.
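
DINOv2 splits the input image into 14×14 patches. The size of the token grid that the DPT head reassembles into image-like feature maps therefore follows directly from the input resolution. The sketch below works through that arithmetic. It is a minimal illustration only: the 518-pixel side length, the 690-pixel example width, and the helper names are assumptions, so check the image processor's configuration for the sizes it actually uses.

```python
PATCH_SIZE = 14  # DINOv2 patch size (pixels per patch side)


def round_to_multiple(x: int, multiple: int = PATCH_SIZE) -> int:
    """Round a side length to the nearest multiple of the patch size (at least one patch).

    Hypothetical helper for illustration; not a transformers API.
    """
    return max(multiple, round(x / multiple) * multiple)


def token_grid(height: int, width: int, patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (rows, cols) of patch tokens for an input whose sides are multiples of the patch size."""
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")
    return height // patch, width // patch


# Assumed example input: 518 px tall, width rounded from 690 px to a multiple of 14.
h, w = 518, round_to_multiple(690)
rows, cols = token_grid(h, w)
print(h, w, rows, cols, rows * cols)
```

Rounding to a multiple of 14 ensures that every pixel falls into exactly one patch. Each token then corresponds to one cell of the grid that DPT upsamples back to a dense depth map.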

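The model is trained on about 600K synthetic labeled images and about 62 million real unlabeled images, and achieves state-of-the-art results for both relative and absolute depth estimation.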

Depth Anything overview. Taken from the original paper.
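
A relative depth prediction is defined only up to an unknown scale and shift. Relative-depth evaluations commonly handle this by fitting a per-image scale s and shift t with least squares before computing errors against ground truth. The sketch below shows that alignment under stated assumptions; it is not the paper's evaluation code. It assumes `pred` and `gt` are same-shaped NumPy arrays expressed in the same space (depth or inverse depth) and that `mask` marks pixels with valid ground truth. The function name is hypothetical.

```python
import numpy as np


def align_scale_shift(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of (s, t) minimizing ||s * pred + t - gt||^2 over valid pixels.

    Hypothetical helper for illustration; assumes pred and gt share a space.
    """
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    # Design matrix [p, 1] so that A @ [s, t] approximates g.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(s), float(t)


# Synthetic example: ground truth built from the prediction with a known scale and shift.
rng = np.random.default_rng(0)
pred = rng.random((4, 5))
gt = 2.0 * pred + 3.0
mask = gt > 0  # every pixel is valid in this toy example
s, t = align_scale_shift(pred, gt, mask)
aligned = s * pred + t
print(round(s, 6), round(t, 6))
```

After alignment, errors such as absolute relative error can be computed between `aligned` and `gt` on the masked pixels.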

Intended uses & limitations

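You can use the raw model for tasks such as zero-shot depth estimation. See the model hub for other versions fine-tuned on a task that interests you.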

How to use

Here is how to use this model to perform zero-shot depth estimation:

from transformers import pipeline
from PIL import Image
import requests

# load pipe
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference
depth = pipe(image)["depth"]

Alternatively, you can use the model and processor classes:

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Small-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
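
The snippet above stops at the upsampled `prediction` tensor. One simple way to view it as an image is min-max normalization to 8-bit grayscale. The sketch below is a minimal version using NumPy. It assumes `depth` is a 2-D float array, for example obtained with `prediction.squeeze().cpu().numpy()`. The helper name is hypothetical, and this is not necessarily how the pipeline renders its `depth` output.

```python
import numpy as np


def to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a 2-D float depth map to the 0-255 range as uint8.

    Hypothetical helper for illustration; not a transformers API.
    """
    d = depth.astype(np.float64)
    d_min, d_max = float(d.min()), float(d.max())
    if d_max - d_min < 1e-12:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(d.shape, dtype=np.uint8)
    scaled = (d - d_min) / (d_max - d_min) * 255.0
    return np.round(scaled).astype(np.uint8)


depth = np.array([[0.5, 1.0], [1.5, 2.5]])  # toy relative depth values
img = to_uint8(depth)
print(img)
```

With Pillow installed, `Image.fromarray(to_uint8(depth))` turns the result into a grayscale image that can be saved or displayed.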

For more code examples, refer to the documentation.

Citation

@misc{yang2024depth,
      title={Depth Anything V2}, 
      author={Lihe Yang and Bingyi Kang and Zilong Huang and Zhen Zhao and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
      year={2024},
      eprint={2406.09414},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}