AI Model Library
gemma-3-4b-it
image-text-to-text
wav2vec2-base
This base model was pre-trained on speech audio sampled at 16 kHz. When using the model, make sure your speech input is also sampled at 16 kHz.
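The 16 kHz requirement means audio recorded at another rate has to be resampled before it reaches the model. The sketch below shows the idea with plain linear interpolation in pure Python. The function name `resample_linear` and its parameters are hypothetical and not part of any model or library API. Linear interpolation applies no anti-aliasing filter, so for real downsampling a dedicated audio resampler is the better choice; this is only an illustration of the rate conversion.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal to dst_rate using linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    real audio this way can introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    # Nothing to do if the rates already match or the signal is too short.
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i, measured in input-sample units.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # Four samples recorded at 32 kHz become two samples at 16 kHz.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 32_000))
```

Checking the source rate first and skipping the conversion when it is already 16 kHz avoids needlessly degrading the input.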
dinov2-small
image-feature-extraction: Vision Transformer (small-sized model) trained using DINOv2.
m2m100_1.2B
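M2M100 is a multilingual encoder-decoder (sequence-to-sequence) model trained for many-to-many multilingual translation. It was introduced in this paper and first released in this repository.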
Llama-3.2-1B-Instruct-FP8-dynamic
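text-generation: Model overview. Model architecture: Meta-Llama-3.2. Input: text. Output: text. Model optimizations: weight quantization FP8, activation quantization FP8. Intended use: commercial and research use in multiple languages. Similar to Lla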
flan-t5-base
0. TL;DR 1. Model Details 2. Usage 3. Uses 4. Bias, Risks, and Limitations 5. Training Details 6. Evaluation 7. Environmental Impact 8. Citation 9. Model Card Authors
siglip-so400m-patch14-384
zero-shot-image-classification: SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.
Qwen3-Embedding-4B
feature-extraction: The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen
stable-diffusion-xl-base-1.0
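text-to-image: SDXL consists of an ensemble-of-experts pipeline for latent diffusion. First, the base model generates (noisy) latents; these are then processed further by a dedicated refiner model (download: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/).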
Qwen2.5-14B-Instruct-AWQ
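text-generation: Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 billion to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2: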
chronos-bolt-tiny
time-series-forecasting: 🚀 Update Feb 14, 2025: Chronos-Bolt models are now available on Amazon SageMaker JumpStart! Check out the tutorial notebook to learn how to deploy Chronos endpoints for production use in a few lin
gpt2-large
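text-generation: Table of Contents - Model Details - How to Get Started with the Model - Uses - Risks, Limitations and Biases - Training - Evaluation - Environmental Impact - Technical Specifications - Citation Information - Model Card Authors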
inclusively-reformulation-it5
This model is an Italian sequence-to-sequence model, fine-tuned from IT5-large for the task of inclusive language rewriting.
vitpose-plus-base
keypoint-detection: ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation and ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation. It obtains 81.1 AP on MS COCO Keypoint test-d
Qwen3-ASR-1.7B
automatic-speech-recognition: The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the str
Depth-Anything-V2-Small-hf
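depth-estimation: Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model, with the following features: - finer-grained details than Depth Anything V1 - more robust than Depth Anything V1 and SD-based models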
pythia-70m-deduped
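text-generation: The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models, with sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size there are two models: one trained on the Pile, and one trained on the P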
Qwen3.5-2B
image-text-to-text: Note: This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Tra
e5-base-v2
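sentence-similarity: Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022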
DeepSeek-V4-Pro
text-generation: DeepSeek-V4: Towards Efficient Million-Token Context Intelligence