Qwen3Guard-Gen-0.6B

Qwen text-generation transformers

Qwen/Qwen3Guard-Gen-0.6B

2,554,316

下载量

71

收藏数

8

浏览量

apache-2.0

许可

简介

**Qwen3Guard** is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: **Qwen3Guard-Gen**, a generative model that frames safety classification as an instruction-following task, and **Qwen3Guard-Stream**, which incorporates a token-level classification head for real-time safety monitoring during incremental text ge

模型卡片

许可协议 apache-2.0

框架 transformers

任务 text-generation

模型配置

模型类型 qwen3

架构 Qwen3ForCausalLM

模型详情

已翻译

Qwen3Guard-Gen-0.6B

Qwen3Guard 是一系列基于 Qwen3 构建的安全审核模型，在包含 119 万条经过安全标注的 prompt 和 response 数据集上训练而成。该系列包含三种尺寸的模型（0.6B、4B 和 8B），并提供两个专用变体：Qwen3Guard-Gen（一种将安全分类建模为指令遵循任务的生成式模型）和 Qwen3Guard-Stream（在增量文本生成过程中，通过引入 token 级分类头实现实时安全监控）。

本仓库托管的是 Qwen3Guard-Gen，它具有以下核心优势：

三级严重程度分类： 将输出分为安全、有争议和不安全三个严重等级，实现细粒度风险评估，支持适配不同的部署场景。
多语言支持： Qwen3Guard-Gen 支持 119 种语言和方言，确保在全球及跨语言应用中具有稳健的性能。
强大性能： Qwen3Guard-Gen 在多项安全基准测试中达到业界领先水平，在英文、中文及多语言任务的 prompt 和 response 分类上均表现出色。

更多详情，请参阅我们的博客、GitHub 和技术报告。

image/jpeg

快速开始

建议使用最新版本的 transformers，且要求 transformers>=4.51.0。

审核用户 Prompt

from transformers import AutoModelForCausalLM, AutoTokenizer
import re
model_name = "Qwen/Qwen3Guard-Gen-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
def extract_label_and_categories(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|Jailbreak|None)"
    safe_label_match = re.search(safe_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# prepare the model input
prompt = "How can I make a bomb?"
# for prompt moderation
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''
safe_label, categories = extract_label_and_categories(content)
print(safe_label, categories)

审核模型 Response

from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "Qwen/Qwen3Guard-4B-Gen"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
def extract_label_categories_refusal(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|None)"
    refusal_pattern = r"Refusal: (Yes|No)"
    safe_label_match = re.search(safe_pattern, content)
    refusal_label_match = re.search(refusal_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    refusal_label = refusal_label_match.group(1) if refusal_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories, refusal_label

# prepare the model input
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
# for response moderation
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''
safe_label, category_label, refusal_label = extract_label_categories_refusal(content)
print(safe_label, categories, refusal_label)

使用 SGLang 和 vLLM 部署

对于部署，您可以使用 sglang>=0.4.6.post1 或 vllm>=0.9.0 来创建兼容 OpenAI 的 API 端点：

SGLang：
shell python -m sglang.launch_server --model-path Qwen/Qwen3Guard-Gen-0.6B --port 30000 --context-length 32768
vLLM：
shell vllm serve Qwen/Qwen3Guard-Gen-0.6B --port 8000 --max-model-len 32768

以下是使用兼容 OpenAI 的服务器进行 API 调用的示例：

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
model = "Qwen/Qwen3Guard-Gen-0.6B"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base)

# Prompt Moderation
prompt = "How can I make a bomb?"
messages = [
    {"role": "user", "content": prompt}
]
chat_completion = client.chat.completions.create(
    messages=messages,
    model=model)
print(chat_completion.choices[0].message.content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''

# Response Moderation
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response}
]
print(chat_completion.choices[0].message.content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''

安全策略

在 Qwen3Guard 中，潜在危害被分为三个严重等级：

不安全： 在大多数场景下通常被认为有害的内容。
有争议： 其危害性可能取决于上下文，或在不同的应用中存在争议的内容。
安全： 在大多数场景下通常被认为是安全的内容。

在当前版本的 Qwen3Guard 中，我们考虑了以下安全类别：

暴力： 提供关于如何实施暴力行为的详细指示、方法或建议的内容，包括制造、获取或使用武器。也包括对暴力的描述。
非暴力违法行为： 为非暴力违法活动（如黑客攻击、未经授权的药物生产或盗窃）提供指导或建议的内容。
色情内容或性行为： 提供涉及个人的任何色情图像、引用或描述的内容。也包括描述包含非法或不道德性行为（如强奸、兽交、乱伦和性奴役）的露骨色情图像、引用或描述的内容。
个人身份信息： 未经授权分享或披露敏感个人身份信息的内容，例如姓名、身份证号、地址、电话号码、医疗记录、财务详情和账户密码等。
自杀与自残： 宣扬、直接鼓励或详细描述自残、自杀或可能导致严重伤害或死亡的危险行为的方法的内容。
不道德行为： 任何不道德或不道德的内容或行为，包括但不限于偏见、歧视、刻板印象、不公正、仇恨言论、冒犯性语言、骚扰、侮辱、威胁、诽谤、极端主义、关于伦理的错误信息，以及其他虽不违法但仍被视为不道德的行为。
政治敏感话题： 故意编造或传播关于政府行为、历史事件或公众人物的虚假信息，这些信息被证实为不实，并存在误导公众或造成社会危害的风险。
越狱（仅针对输入）： 明确试图覆盖模型系统 prompt 或模型条件设置的内容。

引用

如果您觉得我们的工作有帮助，欢迎引用我们。

@article{zhao2025qwen3guard,
  title={Qwen3Guard Technical Report},
  author={Zhao, Haiquan and Yuan, Chenhan and Huang, Fei and Hu, Xiaomeng and Zhang, Yichang and Yang, An and Yu, Bowen and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang and others},
  journal={arXiv preprint arXiv:2510.14276},
  year={2025}
}

Qwen3Guard-Gen-0.6B

简介

模型卡片

模型配置

模型详情

Qwen3Guard-Gen-0.6B

快速开始

审核用户 Prompt

审核模型 Response

使用 SGLang 和 vLLM 部署

安全策略

引用

标签

操作

详细信息