测试数据集

Name: 测试数据集
Creator: maas
Published: 2024-05-23 11:25:36
License: 暂无描述

魔搭社区2024-05-23 更新2024-05-15 收录

下载链接：

https://modelscope.cn/datasets/inerrupt/inerrupt_test_01

下载链接

链接失效反馈

官方服务：

资源简介：

# coco2017 Image-text pairs from [MS COCO2017](https://cocodataset.org/#download). ## Data origin * Data originates from [cocodataset.org](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) * While `coco-karpathy` uses a dense format (with several sentences and sendids per row), `coco-karpathy-long` uses a long format with one `sentence` (aka caption) and `sendid` per row. `coco-karpathy-long` uses the first five sentences and therefore is five times as long as `coco-karpathy`. * `phiyodr/coco2017`: One row corresponds one image with several sentences. * `phiyodr/coco2017-long`: One row correspond one sentence (aka caption). There are 5 rows (sometimes more) with the same image details. ## Format ```python DatasetDict({ train: Dataset({ features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'], num_rows: 118287 }) validation: Dataset({ features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'], num_rows: 5000 }) }) ``` ## Usage * Download image data and unzip ```bash cd PATH_TO_IMAGE_FOLDER wget http://images.cocodataset.org/zips/train2017.zip wget http://images.cocodataset.org/zips/val2017.zip #wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip # zip not needed: everything you need is in load_dataset("phiyodr/coco2017") unzip train2017.zip unzip val2017.zip ``` * Load dataset in Python ```python import os from datasets import load_dataset PATH_TO_IMAGE_FOLDER = "COCO2017" def create_full_path(example): """Create full path to image using `base_path` to COCO2017 folder.""" example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"]) return example dataset = load_dataset("phiyodr/coco2017") dataset = dataset.map(create_full_path) ```

# COCO2017 本数据集为源自[MS COCO2017](https://cocodataset.org/#download)的图像-文本对。 ## 数据集来源 * 数据集源自[cocodataset.org](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) * 其中`coco-karpathy`采用稠密格式（每行包含多条语句与`sendid`），`coco-karpathy-long`则采用长格式，每行仅包含一条语句（又称图像标题caption）与`sendid`。`coco-karpathy-long`选取前5条语句，因此其数据规模为`coco-karpathy`的5倍。 * `phiyodr/coco2017`：每行对应一张图像，附带多条语句。 * `phiyodr/coco2017-long`：每行对应一条语句（又称图像标题caption）。同一图像的细节会对应5行（有时更多）。 ## 数据集格式 python DatasetDict({ train: Dataset({ features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'], num_rows: 118287 }) validation: Dataset({ features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'], num_rows: 5000 }) }) ## 使用方法 * 下载图像数据并解压 bash cd PATH_TO_IMAGE_FOLDER wget http://images.cocodataset.org/zips/train2017.zip wget http://images.cocodataset.org/zips/val2017.zip #wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip # 无需下载该注释包：`load_dataset("phiyodr/coco2017")` 已包含所需全部内容 unzip train2017.zip unzip val2017.zip * 在Python中加载数据集 python import os from datasets import load_dataset PATH_TO_IMAGE_FOLDER = "COCO2017" def create_full_path(example): """基于COCO2017数据集文件夹路径生成图像的完整本地路径。""" example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"]) return example dataset = load_dataset("phiyodr/coco2017") dataset = dataset.map(create_full_path)

提供机构：

maas

创建时间：

2024-04-16

搜集汇总

数据集介绍