Qwen3.6-35B-A3B-FP8

Qwen image-text-to-text transformers

Qwen/Qwen3.6-35B-A3B-FP8

3,489,178

下载量

211

收藏数

51

浏览量

apache-2.0

许可

简介

> [!Note] > This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. > > The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model.

模型卡片

许可协议 apache-2.0

框架 transformers

任务 image-text-to-text

模型配置

模型类型 qwen3_5_moe

架构 Qwen3_5MoeForConditionalGeneration

模型详情

已翻译

Qwen3.6-35B-A3B-FP8

[!Note]
本仓库包含 FP8 量化后的模型权重和配置文件，适用于后训练模型，格式为 Hugging Face Transformers。

这些产物与 Hugging Face Transformers、vLLM、SGLang、KTransformers 等框架兼容。

量化方法为细粒度 fp8 量化，block size 为 128，其性能指标与原模型几乎一致。

继二月份发布 Qwen3.5 系列之后，我们很高兴地分享 Qwen3.6 的首个开放权重版本。基于社区的直接反馈，Qwen3.6 优先考虑了稳定性和实际应用价值，为开发者提供了更直观、响应更迅速且真正高效的编码体验。

Qwen3.6 亮点

本次发布带来了重大升级，尤其在以下方面：

Agentic Coding（智能体编程）： 模型现在能够更流畅、更精准地处理前端工作流和仓库级推理任务。
思维保留（Thinking Preservation）： 我们引入了一项新选项，用于保留历史消息中的推理上下文，从而简化迭代开发并降低开销。

Benchmark Results

更多详情，请参阅我们的博客文章 Qwen3.6-35B-A3B。

模型概述

类型： 带视觉编码器的因果语言模型（Causal Language Model with Vision Encoder）
训练阶段： 预训练 & 后训练
语言模型
- 参数数量：总计 35B，激活 3B
- 隐藏层维度：2048
- Token Embedding：248320（已填充）
- 层数：40
- 隐藏层布局：10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
- Gated DeltaNet：
  - 线性注意力头数：V 为 32，QK 为 16
  - 头维度：128
- Gated Attention：
  - 注意力头数：Q 为 16，KV 为 2
  - 头维度：256
  - 旋转位置编码维度：64
- 混合专家模型（Mixture Of Experts）
  - 专家数量：256
  - 激活专家数量：8 个路由专家 + 1 个共享专家
  - 专家中间维度：512
- LM 输出：248320（已填充）
- MTP：经过多步训练
上下文长度： 原生 262,144，可扩展至 1,010,000 个 token。

基准测试结果

语言能力

Qwen3.5-27BGemma4-31BQwen3.5-35BA3BGemma4-26BA4BQwen3.6-35BA3B

编码智能体（Coding Agent）

SWE-bench Verified
75.0
52.0
70.0
17.4
73.4

SWE-bench Multilingual
69.3
51.7
60.3
17.4
73.4

Qwen3.6-35B-A3B-FP8

简介

模型卡片

模型配置

模型详情

Qwen3.6-35B-A3B-FP8

Qwen3.6 亮点

模型概述

基准测试结果

语言能力

标签

操作

详细信息