paraphrase-multilingual-mpnet-base-v2
sentence-transformers/paraphrase-multilingual-mpnet-base-v2
Downloads: 6,055,304
Favorites: 460
Views: 20
License: apache-2.0
Model card
Languages: multilingual, ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi
Framework: sentence-transformers
Task: sentence-similarity
Tags: sentence-transformers, feature-extraction, sentence-similarity, transformers, text-embeddings-inference
Model configuration
Model type: xlm-roberta
Architecture: XLMRobertaModel
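Model details
sentence-transformers/paraphrase-multilingual-mpnet-base-v2
This is a sentence-transformers model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
Usage (Sentence-Transformers)
Using this model is straightforward once sentence-transformers is installed: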
pip install -U sentence-transformers
Then you can use the model like this:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
embeddings = model.encode(sentences)
print(embeddings)
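Because the embeddings are plain vectors, semantic search comes down to ranking candidates by cosine similarity. The following is a minimal sketch in pure Python under stated assumptions: the helper names `cosine_similarity` and `rank_by_similarity` are illustrative and not part of sentence-transformers, and the short toy vectors stand in for the 768-dimensional rows that `model.encode` returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for rows of model.encode(sentences).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, corpus)
print(ranking)
```

With the real model, each row of `embeddings` can be passed in place of the toy vectors. For larger corpora a vectorized routine such as `sentence_transformers.util.cos_sim` is likely the better choice.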
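Usage (HuggingFace Transformers)
Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.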
from transformers import AutoTokenizer, AutoModel
import torch
# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
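To see why the attention mask matters for the pooling step, here is a small worked example in pure Python. It is a sketch of the same masked-average idea for a single sentence, not the code used by the model; the function name `masked_mean` and the toy numbers are assumptions made for illustration.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j, value in enumerate(vec):
                totals[j] += value
    # Guard against division by zero, mirroring the clamp in mean_pooling.
    count = max(sum(attention_mask), 1e-9)
    return [t / count for t in totals]


# One sentence with two real tokens and one padding token (2-dimensional toy vectors).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

Averaging all three rows instead would let the padding vector dominate the result, which is what the mask prevents when sentences in a batch are padded to the same length.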
Usage (Text Embeddings Inference (TEI))
Text Embeddings Inference (TEI) is a high-performance inference solution for text embedding models.
- CPU:
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id sentence-transformers/paraphrase-multilingual-mpnet-base-v2 --pooling mean --dtype float16
- NVIDIA GPU:
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id sentence-transformers/paraphrase-multilingual-mpnet-base-v2 --pooling mean --dtype float16
Send a request to /v1/embeddings to generate embeddings via the OpenAI Embeddings API:
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
"input": "This is an example sentence"
}'
Or check the Text Embeddings Inference API specification.
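The same request can be issued from Python with only the standard library. This is a sketch under assumptions: it presumes a TEI container is listening on localhost:8080 as in the docker commands, that the response follows the OpenAI Embeddings layout (a `data` list whose items carry `index` and `embedding`), and that `input` may be a list of strings as well as a single string. The function names are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def build_embeddings_request(texts, base_url="http://localhost:8080"):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(body):
    """Extract vectors ordered by index, assuming an OpenAI-style response body."""
    items = sorted(json.loads(body)["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url="http://localhost:8080"):
    """Send the request to a running TEI server and return the embedding vectors."""
    request = build_embeddings_request(texts, base_url)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings_response(response.read())
```

Calling `embed(["This is an example sentence"])` requires the server to be running; building and parsing are kept in separate functions so they can be checked without a network connection.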
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
Citing & Authors
This model was trained by sentence-transformers.
If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
Tags: tf, onnx, openvino, xlm-roberta, feature-extraction, text-embeddings-inference, multilingual