convnextv2_nano.fcmae_ft_in22k_in1k
License: cc-by-nc-4.0
Overview
A ConvNeXt-V2 image classification model. Pretrained with a fully convolutional masked autoencoder framework (FCMAE) and fine-tuned on ImageNet-22k and then ImageNet-1k.
Model card for convnextv2_nano.fcmae_ft_in22k_in1k
Model Details
- Model Type: Image classification / feature backbone
- Model Stats:
  - Params (M): 15.6
  - GMACs: 2.5
  - Activations (M): 8.4
  - Image size: train = 224 x 224, test = 288 x 288
- Papers:
  - ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders: https://arxiv.org/abs/2301.00808
- Original code: https://github.com/facebookresearch/ConvNeXt-V2
- Dataset: ImageNet-1k
- Pretrain Dataset: ImageNet-1k
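The parameter count in the model stats gives a rough lower bound on weight storage. The sketch below is plain arithmetic on the 15.6 M figure; the bytes-per-parameter values are the usual sizes for float32 and float16, and the helper name `weight_megabytes` is made up for illustration. It ignores activations, buffers, and framework overhead.

```python
# Rough weights-only storage estimate from the parameter count in the model stats.
PARAMS_MILLIONS = 15.6  # parameter count (in millions) from the model stats

def weight_megabytes(params_millions: float, bytes_per_param: int) -> float:
    """Weights-only size in MB (10**6 bytes); hypothetical helper."""
    return params_millions * bytes_per_param

fp32_mb = weight_megabytes(PARAMS_MILLIONS, 4)  # float32: 4 bytes per parameter
fp16_mb = weight_megabytes(PARAMS_MILLIONS, 2)  # float16: 2 bytes per parameter
print(f"float32: ~{fp32_mb:.1f} MB, float16: ~{fp16_mb:.1f} MB")
```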
Model Usage
Image Classification
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model('convnextv2_nano.fcmae_ft_in22k_in1k', pretrained=True)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
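The last line turns raw logits into percentages and keeps the five largest. As a plain-Python sketch of that same computation on a single row of logits (no torch required), assuming a made-up list of logits; `softmax_topk` is an illustrative helper, not part of timm or torch:

```python
import math

def softmax_topk(logits, k=5):
    """Return (percentages, indices) of the k most probable classes.

    Illustrative stand-in for torch.topk(output.softmax(dim=1) * 100, k)
    applied to one row of logits.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [100.0 * e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in order], order

# Hypothetical logits for a 6-class toy problem (the real model emits 1000).
toy_logits = [0.5, 2.0, -1.0, 3.5, 0.0, 1.0]
top_probs, top_idx = softmax_topk(toy_logits, k=5)
for p, i in zip(top_probs, top_idx):
    print(f"class {i}: {p:.2f}%")
```

The returned indices are positions in the classifier output; turning them into readable names needs a separate ImageNet-1k label list.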
Feature Map Extraction
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
    'convnextv2_nano.fcmae_ft_in22k_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 80, 56, 56])
    #  torch.Size([1, 160, 28, 28])
    #  torch.Size([1, 320, 14, 14])
    #  torch.Size([1, 640, 7, 7])
    print(o.shape)
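The shapes in the comments follow from the ConvNeXt layout: a stride-4 stem and a 2x downsample before each later stage, giving total strides of 4, 8, 16, and 32. A small sketch of that arithmetic, using the channel widths from the printed shapes; the function name is hypothetical, and the sketch assumes a square input whose side is divisible by 32:

```python
# Channel widths and cumulative strides for the four stages, taken from the
# example shapes above (80/160/320/640 channels at 56/28/14/7 for a 224 input).
CHANNELS = (80, 160, 320, 640)
STRIDES = (4, 8, 16, 32)

def expected_feature_shapes(img_size: int, batch: int = 1):
    """Predict features_only output shapes for a square input (illustrative)."""
    if img_size % 32 != 0:
        raise ValueError("sketch assumes the image side is a multiple of 32")
    return [(batch, c, img_size // s, img_size // s)
            for c, s in zip(CHANNELS, STRIDES)]

print(expected_feature_shapes(224))  # train resolution
print(expected_feature_shapes(288))  # test resolution from the model stats
```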
Image Embeddings
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
    'convnextv2_nano.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0)) # output is (batch_size, num_features) shaped tensor
# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 640, 7, 7) shaped tensor
output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
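Pooled embeddings like the `(1, num_features)` tensor above are often compared with cosine similarity, for example in image retrieval. A minimal plain-Python sketch, assuming each embedding has been converted to a list of floats (for instance via `output[0].tolist()`); the vectors below are made-up placeholders, not real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Placeholder 4-dimensional "embeddings"; real ones have 640 entries here.
emb_query = [0.2, -0.1, 0.7, 0.4]
emb_same = [0.4, -0.2, 1.4, 0.8]    # same direction, twice the length
emb_other = [-0.7, 0.4, 0.2, -0.1]

print(cosine_similarity(emb_query, emb_same))
print(cosine_similarity(emb_query, emb_other))
```

Because cosine similarity ignores vector length, the scaled copy scores as identical in direction while the unrelated vector scores lower.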
Model Comparison
Explore the dataset and runtime metrics of this model in timm model results.
All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.
| model | top1 | top5 | img_size | param_count | gmacs | macts | samples_per_sec | batch_size |
|---|---|---|---|---|---|---|---|---|
| convnextv2_huge.fcmae_ft_in22k_in1k_512 | 88.848 | 98.742 | 512 | 660.29 | 600.81 | 413.07 | 28.58 | 48 |
| convnextv2_huge.fcmae_ft_in22k_in1k_384 | 88.668 | 98.738 | 384 | 660.29 | 337.96 | 232.35 | 50.56 | 64 |
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 | 122.45 | 256 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 | 196.84 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k_384 | 88.196 | 98.532 | 384 | 197.96 | 101.1 | 126.74 | 128.94 | 128 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 | 283.42 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k_384 | 87.75 | 98.556 | 384 | 350.2 | 179.2 | 168.99 | 124.85 | 192 |
| convnextv2_base.fcmae_ft_in22k_in1k_384 | 87.646 | 98.422 | 384 | 88.72 | 45.21 | 84.49 | 209.51 | 256 |
| convnext_large.fb_in22k_ft_in1k_384 | 87.476 | 98.382 | 384 | 197.77 | 101.1 | 126.74 | 194.66 | 256 |
| convnext_large_mlp.clip_laion2b_augreg_ft_in1k | 87.344 | 98.218 | 256 | 200.13 | 44.94 | 56.33 | 438.08 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k | 87.26 | 98.248 | 224 | 197.96 | 34.4 | 43.13 | 376.84 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 | 365.47 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k | 87.002 | 98.208 | 224 | 350.2 | 60.98 | 57.5 | 368.01 | 256 |
| convnext_base.fb_in22k_ft_in1k_384 | 86.796 | 98.264 | 384 | 88.59 | 45.21 | 84.49 | 366.54 | 256 |
| convnextv2_base.fcmae_ft_in22k_in1k | 86.74 | 98.022 | 224 | 88.72 | 15.38 | 28.75 | 624.23 | 256 |
| convnext_large.fb_in22k_ft_in1k | 86.636 | 98.028 | 224 | 197.77 | 34.4 | 43.13 | 581.43 | 256 |
| convnext_base.clip_laiona_augreg_ft_in1k_384 | 86.504 | 97.97 | 384 |
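One way to read the comparison table is to pick the most accurate model that fits a compute budget. The sketch below hard-codes three complete rows copied from the table (top1, gmacs, samples_per_sec); the `pick_best` helper and the budget value are illustrative assumptions:

```python
# (model, top1, gmacs, samples_per_sec) copied from three rows of the table.
ROWS = [
    ("convnextv2_huge.fcmae_ft_in22k_in1k_512", 88.848, 600.81, 28.58),
    ("convnextv2_large.fcmae_ft_in22k_in1k", 87.26, 34.4, 376.84),
    ("convnextv2_base.fcmae_ft_in22k_in1k", 86.74, 15.38, 624.23),
]

def pick_best(rows, max_gmacs):
    """Highest-top1 row whose GMACs fit the budget, or None (illustrative)."""
    fitting = [r for r in rows if r[2] <= max_gmacs]
    return max(fitting, key=lambda r: r[1]) if fitting else None

best = pick_best(ROWS, max_gmacs=50.0)  # hypothetical budget
if best is not None:
    name, top1, gmacs, sps = best
    # Average per-image time implied by the measured batch throughput.
    print(f"{name}: top1={top1}, ~{1000.0 / sps:.2f} ms/image")
```

The per-image figure is an average derived from batched throughput, so it should not be read as single-image latency.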
Tags
dataset:imagenet-1k
arxiv:2301.00808
license:cc-by-nc-4.0
region:us