Qwen3-Embedding-0.6B


Qwen/Qwen3-Embedding-0.6B

License: apache-2.0


Model card

Tags: transformers, sentence-transformers, sentence-similarity, feature-extraction, text-embeddings-inference

Model configuration

Model type: qwen3

Architecture: Qwen3ForCausalLM

Model details

Qwen3-Embedding-0.6B

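Highlights

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, designed specifically for text embedding and ranking tasks. Building on the dense foundation models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in several sizes (0.6B, 4B, and 8B). The series inherits the multilingual capabilities, long-text understanding, and reasoning skills of its foundation model. It achieves significant advances on multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Exceptional versatility: The embedding model achieves state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks No. 1 on the MTEB multilingual leaderboard (as of June 5, 2025, with a score of 70.58), while the reranking model excels in a variety of text retrieval scenarios.

Comprehensive flexibility: The Qwen3 Embedding series offers a full range of sizes (from 0.6B to 8B) for both embedding and reranking models, covering use cases that prioritize efficiency as well as those that prioritize effectiveness. Developers can combine the two modules seamlessly. In addition, the embedding model allows flexible vector definitions across all dimensions, and both the embedding and reranking models support user-defined instructions to improve performance for specific tasks, languages, or scenarios.

Multilingual capability: Thanks to the multilingual capabilities of the Qwen3 models, the Qwen3 Embedding series supports more than 100 languages. This includes various programming languages, and the series provides robust multilingual, cross-lingual, and code retrieval capabilities.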

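Model overview

Qwen3-Embedding-0.6B has the following features:

- Model type: Text Embedding
- Supported languages: 100+ languages
- Number of parameters: 0.6B
- Context length: 32k
- Embedding dimension: up to 1024, with support for user-defined output dimensions ranging from 32 to 1024

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog and GitHub.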
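
The user-defined output dimension can be illustrated with the usual Matryoshka-style recipe: keep the first k components of the full vector and re-normalize. The following is a minimal sketch under that assumption, using a synthetic vector in place of real model output; `truncate_embedding` is a hypothetical helper, not part of any library. Sentence Transformers exposes a `truncate_dim` argument intended for the same purpose.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    if not 32 <= dim <= len(vector):
        raise ValueError("dim must be between 32 and the full embedding size")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]

# Stand-in for a 1024-dimensional embedding (real values would come from the model).
full = [math.sin(i + 1) for i in range(1024)]
small = truncate_embedding(full, 256)
print(len(small))
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities.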

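Qwen3 Embedding Series Model List

Model Type	Model	Size	Layers	Sequence Length	Embedding Dimension	MRL Support	Instruction Aware
Text Embedding	Qwen3-Embedding-0.6B	0.6B	28	32K	1024	Yes	Yes
Text Embedding	Qwen3-Embedding-4B	4B	36	32K	2560	Yes	Yes
Text Embedding	Qwen3-Embedding-8B	8B	36	32K	4096	Yes	Yes
Text Reranking	Qwen3-Reranker-0.6B	0.6B	28	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-4B	4B	36	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-8B	8B	36	32K	-	-	Yes

Note:
- MRL Support indicates whether the embedding model supports custom dimensions for the final embedding.
- Instruction Aware indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.
- Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared with not using them. We therefore recommend that developers create instructions tailored to their specific tasks and scenarios. In multilingual contexts, we also advise writing instructions in English, because most instructions used during model training were originally written in English.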
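
As an illustration of the advice on instructions, the sketch below builds instructed queries with the `Instruct: ...\nQuery:...` template that appears in the usage code of this card. The `build_instructed_query` helper and the `code_search` task description are hypothetical and only meant to show the pattern; the instruction stays in English even when the query does not.

```python
def build_instructed_query(task_description: str, query: str) -> str:
    """Prefix a query with a one-sentence English task instruction."""
    return f"Instruct: {task_description}\nQuery:{query}"

# Hypothetical task descriptions; only the web-search one appears in this card.
TASKS = {
    "web_search": "Given a web search query, retrieve relevant passages that answer the query",
    "code_search": "Given a natural-language question, retrieve code snippets that answer it",
}

# The query may be in any language; the instruction is written in English.
query = build_instructed_query(TASKS["web_search"], "中国的首都是哪里？")
print(query)
```

Documents are embedded without any instruction, so only the query side goes through this helper.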

Usage

With Transformers versions earlier than 4.51.0, you may encounter the following error:

KeyError: 'qwen3'
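
One way to catch this before loading the model is to compare the installed version against 4.51.0. This is a minimal sketch using only the standard library; `parse_version` and `supports_qwen3` are hypothetical helpers, and the parsing is deliberately simple (pre-release suffixes such as `rc1` are ignored).

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 51, 0)

def parse_version(text: str) -> tuple:
    """Turn '4.51.0' or '4.52.0.dev0' into a tuple of three leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad short versions such as '4.51'
        parts.append(0)
    return tuple(parts[:3])

def supports_qwen3(installed: str) -> bool:
    """True if the given transformers version is at least 4.51.0."""
    return parse_version(installed) >= MIN_TRANSFORMERS

try:
    installed = version("transformers")
    status = "OK" if supports_qwen3(installed) else "too old, upgrade to >=4.51.0"
    print(installed, status)
except PackageNotFoundError:
    print("transformers is not installed")
```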

Sentence Transformers Usage

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
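
The similarity matrix has one row per query and one column per document, so retrieving the best match is a per-row ranking. The sketch below works on the values printed in the example output, copied into plain lists so that it runs without loading the model; `rank_documents` is a hypothetical helper.

```python
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",  # abbreviated
]
# Similarity values copied from the example output (rows: queries, columns: documents).
similarity = [
    [0.7646, 0.1414],
    [0.1355, 0.6000],
]

def rank_documents(row):
    """Return document indices sorted from most to least similar."""
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)

for i, row in enumerate(similarity):
    best = rank_documents(row)[0]
    print(f"query {i} -> document {best} (score {row[best]:.4f})")
```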

Transformers Usage

# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def last_token_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
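
The pooling step in `last_token_pool` selects the hidden state of the last real token in each sequence. With left padding that is always the final position; with right padding it is the number of real tokens minus one. The sketch below shows the index selection on plain attention-mask rows; `last_token_index` is a hypothetical helper, and unlike the batch-level check in the code above it decides per row.

```python
def last_token_index(attention_mask_row):
    """Index of the last non-padding token in one attention-mask row."""
    if attention_mask_row[-1] == 1:
        # Left padding (or no padding): the last position is a real token.
        return len(attention_mask_row) - 1
    # Right padding: real tokens occupy positions 0 .. count-1.
    return sum(attention_mask_row) - 1

left_padded = [0, 0, 1, 1, 1]   # two pad tokens, then three real tokens
right_padded = [1, 1, 1, 0, 0]  # three real tokens, then two pad tokens

print(last_token_index(left_padded))   # 4
print(last_token_index(right_padded))  # 2
```

Setting `padding_side='left'` on the tokenizer, as in the code above, means the simple `[:, -1]` slice is enough for the whole batch.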

vLLM Usage

# Requires vllm>=0.8.5
import torch
import vllm
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
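
The vLLM scores differ slightly from the Transformers scores shown earlier in this card. Small deviations of this kind are commonly attributed to differences in numeric precision and kernels, although the card does not state the cause. The sketch below checks that the two result matrices agree within a tolerance; the threshold of 5e-3 is an assumption chosen for illustration.

```python
# Scores copied from the Transformers and vLLM example outputs in this card.
transformers_scores = [
    [0.7645568251609802, 0.14142508804798126],
    [0.13549736142158508, 0.5999549627304077],
]
vllm_scores = [
    [0.7620252966880798, 0.14078938961029053],
    [0.1358368694782257, 0.6013815999031067],
]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally shaped matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

TOLERANCE = 5e-3  # assumed threshold, not taken from the model card
diff = max_abs_diff(transformers_scores, vllm_scores)
print(f"max abs diff = {diff:.4f}, within tolerance: {diff < TOLERANCE}")
```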

📌 Tip: We recommend that developers customize the instruct according to their specific scenario, task, and language. Our tests have
