AI Model Library
Qwen3-1.7B
text-generation
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groun…
Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking_GGUF
text-generation
License: gemma; language: en; base model: google/gemma-3-1b-it; tags: uncensored, text-generation, reasoning, instruction-tuned, lightweight.
Gemma 3 – 1B IT GLM-4.7 Flash…
Meta-Llama-3-8B
text-generation
pythia-160m
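text-generation
The *Pythia Scaling Suite* is a collection of models developed to facilitate interpretability research (see the paper). It contains two sets of eight models, with sizes of 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B parameters. For each size there are two models: one trained on the Pile, and one trained on the P…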
Mistral-7B-Instruct-v0.2
text-generation
Encoding and decoding with `mistral_common` ```py from mistral_common.tokens.tokenizers.mistral import MistralTokenizer from mistral_common.protocol.instruct.messages import UserMessage from mistral_comm…
distilgpt2
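text-generation
DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained under the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design…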
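DistilGPT2 is trained with GPT-2 acting as a teacher, a setup known as knowledge distillation. The following minimal sketch shows what the loss for a single next-token position could look like under the standard soft-target formulation. That formulation takes the KL divergence between the temperature-softened teacher and student distributions and scales it by T². This is an assumption for illustration. The actual DistilGPT2 objective may combine such a term with other losses. The function names, the temperature of 2.0, and the logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Hypothetical helper: KL(teacher || student) on softened distributions, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student)
             if pt > 0)  # terms with zero teacher probability contribute nothing
    return temperature ** 2 * kl


# Hypothetical next-token logits over a 3-token vocabulary.
teacher = [2.0, 1.0, 0.1]
print(distillation_loss([1.9, 1.1, 0.2], teacher))  # student close to the teacher
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # student far from the teacher
```

Raising the temperature flattens both distributions. As a result, the student also learns the teacher's relative ranking of unlikely tokens. The T² factor keeps gradient magnitudes comparable across temperatures.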
tiny-random-LlamaForCausalLM
text-generation
TinyLlama-1.1B-Chat-v1.0
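text-generation
The TinyLlama project aims to **pretrain a 1.1B-parameter Llama model on 3 trillion tokens**. With proper optimization, we can achieve this within a span of "just" 90 days using only 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.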
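The TinyLlama figures imply a concrete throughput target. Below is a back-of-the-envelope check using only the numbers in the description. It adds one assumption that is not from the project: the common rule of thumb that training costs roughly 6 FLOPs per parameter per token.

```python
# Inputs taken from the TinyLlama description.
TOKENS = 3e12   # 3 trillion training tokens
PARAMS = 1.1e9  # 1.1B parameters
GPUS = 16       # 16x A100-40G
DAYS = 90       # training budget in days

seconds = DAYS * 24 * 3600
gpu_seconds = GPUS * seconds

# Throughput needed to process every token within the budget.
tokens_per_second = TOKENS / seconds
tokens_per_gpu_second = tokens_per_second / GPUS

# Assumption: training compute ~= 6 * parameters * tokens (common rule of thumb).
total_flops = 6 * PARAMS * TOKENS
flops_per_gpu_second = total_flops / gpu_seconds

print(f"cluster throughput: {tokens_per_second:,.0f} tokens/s")
print(f"per-GPU throughput: {tokens_per_gpu_second:,.0f} tokens/s")
print(f"total compute:      {total_flops:.2e} FLOPs")
print(f"per-GPU compute:    {flops_per_gpu_second / 1e12:.0f} TFLOP/s sustained")
```

With these inputs, the target works out to roughly 24,000 tokens per second per GPU. That corresponds to about 159 TFLOP/s of sustained compute per GPU, around half of the A100's 312 TFLOPS dense BF16 peak.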
Qwen3-Coder-30B-A3B-Instruct
text-generation
**Qwen3-Coder** is available in multiple sizes. Today, we're excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring th…
Qwen3-14B
text-generation
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groun…
Qwen2.5-14B-Instruct
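text-generation
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: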
Qwen3Guard-Gen-0.6B
text-generation
**Qwen3Guard** is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.…
Llama-3.2-3B-Instruct
text-generation
Qwen2.5-Coder-7B-Instruct
text-generation
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes: 0.5, 1.5, 3, 7, 14, and 32 billion…
Qwen3-0.6B-FP8
text-generation
Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groun…
Gemma-4-31B-IT-NVFP4
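text-generation
Description: Gemma 4 31B IT is an open multimodal model built by Google DeepMind. It accepts text and image input, can process video as a sequence of frames, and generates text output. The model is designed to deliver frontier performance in reasoning, agentic workflows, coding, and multimodal understanding.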
Qwen2.5-0.5B
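text-generation
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: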
Llama-3.2-1B-Instruct-FP8-dynamic
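text-generation
Model Overview
- **Model Architecture:** Meta-Llama-3.2
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Weight quantization:** FP8
  - **Activation quantization:** FP8
- **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to Lla…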
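The overview lists FP8 for both weights and activations. The "-dynamic" suffix usually means that activation scales are computed on the fly at inference time rather than calibrated ahead of time. Below is a minimal pure-Python sketch of dynamic per-token scaling into the FP8 E4M3 range, assuming E4M3 (largest finite value 448) and one scale per token row. It illustrates the general idea only and is not the code used to produce this checkpoint. All function names are hypothetical.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (3 mantissa bits, min normal exponent -6).

    Out-of-range magnitudes saturate at 448; NaN/inf handling is omitted in this sketch.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # Below 2**-6 the format is subnormal, so the spacing stays fixed at 2**-9.
    exp = max(math.floor(math.log2(a)), -6)
    step = 2.0 ** (exp - 3)  # gap between neighbouring values in this binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_per_token(rows: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Dynamic per-token quantization: each row's scale comes from its own max |value|."""
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0  # maps the row's largest value to +/-448
        q_rows.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[float]], scales: List[float]) -> List[List[float]]:
    """Multiply each quantized row back by the scale that was used for it."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Hypothetical activations for two tokens with very different magnitudes.
activations = [[0.5, -2.0, 1.25], [100.0, 3.0, -0.01]]
q, scales = quantize_per_token(activations)
print(q)
print(dequantize(q, scales))
```

Each token's scale comes from that token's own largest magnitude. Consequently, an outlier in one token does not reduce the precision available to the others. The trade-off is an extra max-reduction over every activation row at inference time.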
Qwen2.5-14B-Instruct-AWQ
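text-generation
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: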
gpt2-large
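text-generation
Table of Contents
- Model Details
- How to Get Started with the Model
- Uses
- Risks, Limitations and Biases
- Training
- Evaluation
- Environmental Impact
- Technical Specifications
- Citation Information
- Model Card Authors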