AI Model Library

Sort by

Vendor

All, 01.AI, Alibaba, Alibaba-NLP, Andycurrent, Anthropic, BAAI, Bingsu, Cohere, Comfy-Org, DeepSeek, E-MIMIC, EleutherAI, FacebookAI, Falconsai, FinLang, Google, Kijai, MahmoudAshraf, Marqo, Meta, Midjourney, Mistral AI, NeoQuasar, OpenAI, Perplexity AI, ProsusAI, Qwen, RedHatAI, ResembleAI, Salesforce, Stability AI, TinyLlama, TostAI, Xenova, amazon, apple, argmaxinc, autogluon, cardiffnlp, colbert-ir, coqui, cross-encoder, cyankiwi, daekeun-ml, deepseek-ai, depth-anything, dima806, distilbert, docling-project, dphn, emilyalsentzer, facebook, google, google-bert, google-t5, hexgrad, hmellor, intfloat, jinaai, jonatasgrosman, k2-fsa, laion, llava-hf, lpiccinelli, meta-llama, microsoft, mistralai, mixedbread-ai, neuralmind, nomic-ai, nvidia, openai, openai-community, patrickjohncyh, prajjwal1, pyannote, rhasspy, sentence-transformers, speechbrain, stabilityai, timm, trl-internal-testing, unsloth, usyd-community, vikhyatk, xAI, zai-org, Beijing Academy of Artificial Intelligence (北京智源研究院), SenseTime (商汤科技), ByteDance (字节跳动), Zhipu AI (智谱AI), Moonshot AI (月之暗面), Baichuan AI (百川智能), Baidu (百度), iFLYTEK (科大讯飞), MiniMax (稀宇科技), Tencent (腾讯), StepFun (阶跃星辰), Alibaba (阿里巴巴)

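Task type

All, Text generation 42, Image-text-to-text 27, Sentence similarity 18, Fill-mask 12, Feature extraction 12, Automatic speech recognition 11, Time series forecasting 8, Image classification 7, Zero-shot image classification 6, Text classification 6, Text-to-speech 4, Image feature extraction 4, Text ranking 3, Voice activity detection 2, Translation 2, Image-to-text 2, Multimodal 2, Zero-shot classification 1, Text-to-image 1, Object detection 1, Mask generation 1, Keypoint detection 1, Image-to-3D 1, Depth estimation 1, Audio classification 1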

Downloads, Favorites, Newest

fairface_age_image_detection

image-classification

dima806 · dima806/fairface_age_image_detection

Detects age group with about 59% accuracy based on an image.

6,233,949 73 transformers

Qwen3.5-4B

image-text-to-text

Qwen · Qwen/Qwen3.5-4B

This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Tra

6,181,955 525 transformers

DFN5B-CLIP-ViT-H-14-378

apple · apple/DFN5B-CLIP-ViT-H-14-378

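A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B. Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 5 billion images filtered from a pool of 43 billion uncurated image-text pairs (12.8 billion images).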

6,071,841 109 open_clip

paraphrase-multilingual-mpnet-base-v2

sentence-similarity

sentence-transformers · sentence-transformers/paraphrase-multilingual-mpnet-base-v2

6,055,304 460 sentence-transformers

Qwen3-32B

text-generation

Qwen · Qwen/Qwen3-32B

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groun

5,968,332 692 transformers

chronos-2

time-series-forecasting

autogluon · autogluon/chronos-2

Chronos-2 is a 120M-parameter, encoder-only time series foundation model for zero-shot forecasting. It supports univariate, multivariate, and covariate-informed tasks within

5,898,361 14 chronos-forecasting

Wan_2.2_ComfyUI_Repackaged

Comfy-Org · Comfy-Org/Wan_2.2_ComfyUI_Repackaged

Examples: https://comfyanonymous.github.io/ComfyUI_examples/wan22/

5,887,459 705 diffusion-single-file

Qwen2.5-0.5B-Instruct

text-generation

Qwen · Qwen/Qwen2.5-0.5B-Instruct

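Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a range of base language models and instruction-tuned language models, from 0.5 billion to 72 billion parameters. Compared with Qwen2, Qwen2.5 brings the following improvements: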

5,858,017 510 transformers

Qwen3-Embedding-0.6B

feature-extraction

Qwen · Qwen/Qwen3-Embedding-0.6B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen

5,778,172 1008 sentence-transformers

WanVideo_comfy

Kijai · Kijai/WanVideo_comfy

Combined and quantized models for WanVideo, originating from here:

5,761,123 2304 diffusion-single-file

gemma-4-E4B-it

any-to-any

google · google/gemma-4-E4B-it

License: Apache 2.0 | Authors: Google DeepMind

5,585,425 971 transformers

Qwen3-VL-8B-Instruct

image-text-to-text

Qwen · Qwen/Qwen3-VL-8B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.

5,445,377 898 transformers

whisper-large-v3

automatic-speech-recognition

openai · openai/whisper-large-v3

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et

4,998,671 5669 transformers

segmentation

voice-activity-detection

pyannote · pyannote/segmentation

Voice activity detection

4,834,943 676 pyannote-audio

vit-base-patch16-224

image-classification

google · google/vit-base-patch16-224

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 22

4,780,326 958 transformers