five

Vividbot/llava-pretrain-vi

收藏
Hugging Face2024-08-11 更新2025-04-19 收录
下载链接:
https://hf-mirror.com/datasets/Vividbot/llava-pretrain-vi
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: mit language: - vi size_categories: - 100K<n<1M --- # Structure Each sample will have a structure as follows: ``` { 'id': Value(dtype='string', id=None), 'images': Value(dtype='binary', id=None), 'conversations': [{'from': Value(dtype='string', id=None), 'value': Value(dtype='string', id=None)}] } { 'id': '004309348', 'image': <image-bytes>, 'conversations': [{'from': 'human', 'value': 'Điều gì được minh họa trong hình ảnh này?\n<image>'}, {'from': 'gpt', 'value': 'bạn nghĩ có bao nhiêu sinh viên ở farbaut sử dụng sản phẩm thuốc lá'}] } ``` # How To Use ## Convert binary objects Because the returned video will be in bytes, here is a way to extract frames and fps: ```python import io import numpy as np from PIL import Image from datasets import load_dataset def extract_image(image_bytes): img = Image.open(io.BytesIO(image_bytes)) arr = np.asarray(img) return arr dataset = load_dataset("Vividbot/instruct500k_vi", name="all", streaming=True) image_bytes = next(iter(dataset["train"]))["image"] image = extract_image(image_bytes) print(f"Image shape: {image.shape}") ```
提供机构:
Vividbot
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作