llava-1.5-7b-hf

llava-hf image-text-to-text transformers en

llava-hf/llava-1.5-7b-hf

Downloads: 2,848,725

Favorites: 359

Views: 32

License: llama2

Introduction

This is the model card for the LLaVA 7B model. Its content is copied from the original LLaVA model card.

Model card

License: llama2

Language: en

Task: image-text-to-text

Datasets: LLaVA-Instruct-150K

Tags: vision, image-text-to-text

Model configuration

Model type: llava

Architecture: LlavaForConditionalGeneration

LLaVA Model Card

A Google Colab demo is also available for running LLaVA on a free-tier Google Colab instance, and there is a Spaces demo as well.

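Model details

Model type:
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
It is an auto-regressive language model based on the transformer architecture.

Model date:
LLaVA-v1.5-7B was trained in September 2023.

Paper or resources for more information:
https://llava-vl.github.io/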

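How to use the model

First, make sure you have transformers >= 4.35.3 installed.
The model supports multi-image and multi-prompt generation, meaning you can pass multiple images in a single prompt. Make sure to follow the correct prompt template (USER: xxx\nASSISTANT:) and add the token `<image>` at each position where you want to query an image.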
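To make the template concrete, here is a minimal sketch of assembling such a prompt by hand. The helper name `build_llava_prompt` and its `num_images` parameter are hypothetical and not part of transformers, and the exact whitespace between the parts is an assumption based on the template quoted above. In practice `processor.apply_chat_template` is the safer way to get the formatting the checkpoint expects.

```python
IMAGE_TOKEN = "<image>"  # placeholder marking where an image is queried


def build_llava_prompt(question: str, num_images: int = 1) -> str:
    """Hypothetical helper: format one user turn in the USER/ASSISTANT template.

    One `<image>` token is emitted per image, each on its own line,
    followed by the question and the open ASSISTANT turn.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    image_part = "".join(f"{IMAGE_TOKEN}\n" for _ in range(num_images))
    return f"USER: {image_part}{question}\nASSISTANT:"


single = build_llava_prompt("What are these?")
double = build_llava_prompt("What differs between the two pictures?", num_images=2)
print(repr(single))
print(repr(double))
```

The number of `<image>` tokens in the prompt should match the number of images handed to the processor.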

Using `pipeline`:

Below we use the "llava-hf/llava-1.5-7b-hf" checkpoint.

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")
messages = [
    {
      "role": "user",
      "content": [
          {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
          {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=20)
print(out)
# Example output:
# [{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': 'Lava'}]
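The pipeline returns a list of dicts, so the answer itself sits under the `generated_text` key. A small sketch of pulling it out follows; `generated_texts` is a hypothetical helper, and `sample_out` is a hand-written stand-in for the `out` value shown above with the input messages abbreviated.

```python
def generated_texts(pipeline_output):
    """Return the 'generated_text' field of every result dict in the list."""
    return [item["generated_text"] for item in pipeline_output]


# Stand-in for the pipeline result shown above (input messages abbreviated).
sample_out = [
    {"input_text": [{"role": "user", "content": []}], "generated_text": "Lava"}
]
print(generated_texts(sample_out))  # ['Lava']
```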

Using pure `transformers`:

Below is an example script to run generation in float16 precision on a GPU device:

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True, 
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
# Each value in "content" has to be a list of dicts with types ("text", "image") 
conversation = [
    {

      "role": "user",
      "content": [
          {"type": "text", "text": "What are these?"},
          {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
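Since the model accepts several images per prompt, the conversation passed to `apply_chat_template` can carry more than one image placeholder. The sketch below builds that structure with a hypothetical helper, `make_user_turn`. It assumes one `{"type": "image"}` entry per image and that the images are later given to the processor in the same order.

```python
def make_user_turn(text: str, num_images: int = 1) -> list:
    """Hypothetical helper: one user message with `num_images` image slots."""
    # One placeholder dict per image, followed by the text query.
    content = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


conversation = make_user_turn("What differs between these images?", num_images=2)
print(conversation)
```

With the objects from the script above, this would be used as `prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)`, followed by `processor(images=[image_a, image_b], text=prompt, return_tensors="pt")`, where `image_a` and `image_b` are placeholder names for two loaded PIL images.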

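Starting with transformers >= v4.48, you can also pass an image URL or local path in the conversation history and let the chat template handle the rest.
The chat template loads the image for you and returns the inputs as torch.Tensor, which you can pass directly to model.generate().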

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://www.ilankelman.org/stopsigns/australia.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]

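# Move the inputs to the model's device; floating-point tensors are cast to float16 to match the model
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, torch.float16)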
output = model.generate(**inputs, max_new_tokens=50)
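The snippet above stops at `generate`, whose returned sequence contains the prompt tokens followed by the newly generated ones. A minimal sketch of keeping only the new part before decoding is shown here. `strip_prompt_tokens` is a hypothetical helper, and the token ids are made up purely for illustration.

```python
def strip_prompt_tokens(output_ids, prompt_length: int):
    """Drop the first `prompt_length` ids, keeping only newly generated ones."""
    return output_ids[prompt_length:]


# Made-up token ids: 4 prompt ids followed by 3 generated ids.
fake_output = [1, 3148, 1001, 29901, 319, 5040, 1804]
print(strip_prompt_tokens(fake_output, prompt_length=4))  # [319, 5040, 1804]
```

With the real objects this would read `new_ids = strip_prompt_tokens(output[0], inputs["input_ids"].shape[1])`, followed by `print(processor.decode(new_ids, skip_special_tokens=True))`.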

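Model optimization

4-bit quantization through the `bitsandbytes` library

First make sure to install bitsandbytes (pip install bitsandbytes) and to have access to a CUDA-compatible GPU device. Then change the model-loading snippet above to: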

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True,
    load_in_4bit=True,  # added: load the weights in 4-bit via bitsandbytes
)
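To get a feel for why 4-bit loading helps, the sketch below estimates weight storage at different precisions and shows an alternative way to request 4-bit loading through an explicit `BitsAndBytesConfig`. Several things here are assumptions: the parameter count is taken as roughly 7 billion from the "7b" in the model name, the estimate ignores quantization metadata, activations, and any modules kept in higher precision, and `weight_memory_gb` and `load_llava_4bit` are hypothetical helper names.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


def load_llava_4bit(model_id: str = "llava-hf/llava-1.5-7b-hf"):
    """Sketch: load the checkpoint in 4-bit using an explicit BitsAndBytesConfig.

    Imports are local so the estimate above runs without torch installed.
    Requires a CUDA GPU plus the bitsandbytes and accelerate packages.
    """
    import torch
    from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # compute in float16
    )
    return LlavaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        low_cpu_mem_usage=True,
    )


ASSUMED_PARAMS = 7e9  # assumption: "7b" read as about 7 billion parameters
print(weight_memory_gb(ASSUMED_PARAMS, 16))  # 14.0
print(weight_memory_gb(ASSUMED_PARAMS, 4))   # 3.5
```

Under these assumptions the weights shrink from about 14 GB in float16 to about 3.5 GB in 4-bit. Real memory use will be higher.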

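Use Flash-Attention 2 to further speed up generation

First make sure to install flash-attn. For installation instructions, refer to the original Flash Attention repository. Then change the model-loading snippet above to: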

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True,
    use_flash_attention_2=True,  # added: enable Flash-Attention 2
).to(0)
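More recent transformers releases select the attention backend through the `attn_implementation` argument of `from_pretrained` and treat `use_flash_attention_2` as deprecated. The sketch below picks a value for that argument depending on whether flash-attn is importable. `pick_attn_implementation` is a hypothetical helper, and falling back to `"eager"` is a conservative choice made for this example, not a recommendation from the model card.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash-attn is importable, else "eager"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


attn = pick_attn_implementation()
print(attn)
```

The result can then be passed as `LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16, attn_implementation=attn)`. Omitting the argument lets transformers choose its own default.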
