Qwen3-Coder-30B-A3B-Instruct

Qwen text-generation transformers

Qwen/Qwen3-Coder-30B-A3B-Instruct

2,677,340

下载量

1054

收藏数

7

浏览量

apache-2.0

许可

简介

**Qwen3-Coder** is available in multiple sizes. Today, we're excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

模型卡片

许可协议 apache-2.0

框架 transformers

任务 text-generation

模型配置

模型类型 qwen3_moe

架构 Qwen3MoeForCausalLM

模型详情

已翻译

Qwen3-Coder-30B-A3B-Instruct

亮点

Qwen3-Coder 提供多种尺寸版本。今天，我们很高兴推出 Qwen3-Coder-30B-A3B-Instruct。这款精简模型保持了出色的性能和效率，具备以下关键增强特性：

在 Agentic Coding、Agentic Browser-Use 及其他基础编码任务中，在开源模型中表现 显著优异。
长上下文能力，原生支持 256K tokens，使用 Yarn 可扩展至 1M tokens，针对仓库级理解进行了优化。
Agentic Coding 支持 Qwen Code、CLINE 等大多数平台，并配备专门设计的函数调用格式。

image/jpeg

模型概述

Qwen3-Coder-30B-A3B-Instruct 具有以下特性：
- 类型：因果语言模型
- 训练阶段：预训练与后训练
- 参数数量：总计 30.5B，激活参数 3.3B
- 层数：48
- 注意力头数（GQA）：Q 为 32，KV 为 4
- 专家数量：128
- 激活专家数量：8
- 上下文长度：原生支持 262,144。

注意：该模型仅支持非思考模式，输出中不会生成 ``` 块。同时，不再需要指定enable_thinking=False`。

更多详情，包括基准评估、硬件要求和推理性能，请参阅我们的博客、GitHub 和文档。

快速开始

我们建议您使用最新版本的 transformers。

使用 `transformers dict:
return num ** 2

Define Tools

tools=[
{
"type":"function",
"function":{
"name": "square_the_number",
"description": "output the square of the number.",
"parameters": {
"type": "object",
"required": ["input_num"],
"properties": {
'input_num': {
'type': 'number',
'description': 'input_num is a number that will be squared'
}
},
}
}
}
]

import OpenAI

Define LLM

client = OpenAI(
# Use a custom endpoint compatible with OpenAI API
base_url='http://localhost:8000/v1', # api_base
api_key="EMPTY"
)

messages = [{'role': 'user', 'content': 'square the number 1024'}]

completion = client.chat.completions.create(
messages=messages,
model="Qwen3-Coder-30B-A3B-Instruct",
max_tokens=65536,
tools=tools,
)

print(completion.choice[0])

## 最佳实践

为获得最佳性能，我们建议采用以下设置：

1. **采样参数**：
   - 建议使用 `temperature=0.7`、`top_p=0.8`、`top_k=20`、`repetition_penalty=1.05`。

2. **足够的输出长度**：对于大多数查询，建议使用 65,536 tokens 的输出长度，这对指令模型来说足够。

### 引用

如果您觉得我们的工作有帮助，欢迎引用我们。

@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
```

Qwen3-Coder-30B-A3B-Instruct

简介

模型卡片

模型配置

模型详情

Qwen3-Coder-30B-A3B-Instruct

亮点

模型概述

快速开始

Define Tools

Define LLM

标签

操作

详细信息