paraphrase-multilingual-MiniLM-L12-v2

Model card

License: apache-2.0
Languages: multilingual ar bg ca cs da de el en es et fa fi fr gl gu he hi hr hu hy id it ja ka ko ku lt lv mk mn mr ms my nb nl pl pt ro ru sk sl sq sr sv th tr uk ur vi
Framework: sentence-transformers
Task: sentence-similarity
Tags: sentence-transformers feature-extraction sentence-similarity transformers

Model configuration

Model type: bert
Architecture: BertModel

Model details

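This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.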

Usage (Sentence-Transformers)

Using this model is straightforward once sentence-transformers is installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
embeddings = model.encode(sentences)
print(embeddings)
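
For semantic search, the vectors returned by model.encode are usually compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. The helper names cosine_similarity and rank_by_similarity and the toy 3-dimensional vectors are illustrative assumptions, standing in for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for embeddings (real ones from this model have 384 dimensions).
query = [1.0, 0.0, 1.0]
corpus = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, corpus)
print(ranking)
```

With the real model, query and corpus would be rows of the array returned by model.encode. The library also provides sentence_transformers.util.cos_sim, which computes the same quantity on tensors.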

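Usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the correct pooling operation on top of the contextualized word embeddings.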

from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
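
To see why the attention mask matters in mean_pooling, the sketch below redoes the same computation in plain Python on a tiny hand-made example. The function name masked_mean and the token values are illustrative assumptions. The padding positions carry arbitrary values to show that they are ignored.

```python
def masked_mean(token_vectors, attention_mask, eps=1e-9):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    for vec, keep in zip(token_vectors, attention_mask):
        for d in range(dim):
            totals[d] += vec[d] * keep
    # Mirrors torch.clamp(..., min=1e-9): avoids division by zero
    # when every position is masked out.
    count = max(float(sum(attention_mask)), eps)
    return [t / count for t in totals]


# One sentence: two real tokens followed by two padding positions.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]
mask = [1, 1, 0, 0]

pooled = masked_mean(tokens, mask)
# Naive mean over all positions, padding included, for comparison.
naive = [sum(v[d] for v in tokens) / len(tokens) for d in range(2)]

print(pooled)  # [2.0, 3.0]
print(naive)   # [5.5, 6.0]
```

The masked result depends only on the two real tokens, while the naive average is pulled toward the padding values. This is why padded batches need the mask.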

Full model architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Citing & Authors

This model was trained by sentence-transformers.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}

Tags

tf onnx openvino bert feature-extraction multilingual ar bg
