测试数据集
收藏魔搭社区2024-05-23 更新2024-05-15 收录
下载链接:
https://modelscope.cn/datasets/inerrupt/inerrupt_test_01
下载链接
链接失效反馈官方服务:
资源简介:
# coco2017
Image-text pairs from [MS COCO2017](https://cocodataset.org/#download).
## Data origin
* Data originates from [cocodataset.org](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
* While `coco-karpathy` uses a dense format (with several sentences and sendids per row), `coco-karpathy-long` uses a long format with one `sentence` (aka caption) and `sendid` per row. `coco-karpathy-long` uses the first five sentences and therefore is five times as long as `coco-karpathy`.
* `phiyodr/coco2017`: One row corresponds one image with several sentences.
* `phiyodr/coco2017-long`: One row correspond one sentence (aka caption). There are 5 rows (sometimes more) with the same image details.
## Format
```python
DatasetDict({
train: Dataset({
features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
num_rows: 118287
})
validation: Dataset({
features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
num_rows: 5000
})
})
```
## Usage
* Download image data and unzip
```bash
cd PATH_TO_IMAGE_FOLDER
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip # zip not needed: everything you need is in load_dataset("phiyodr/coco2017")
unzip train2017.zip
unzip val2017.zip
```
* Load dataset in Python
```python
import os
from datasets import load_dataset
PATH_TO_IMAGE_FOLDER = "COCO2017"
def create_full_path(example):
"""Create full path to image using `base_path` to COCO2017 folder."""
example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"])
return example
dataset = load_dataset("phiyodr/coco2017")
dataset = dataset.map(create_full_path)
```
# COCO2017
本数据集为源自[MS COCO2017](https://cocodataset.org/#download)的图像-文本对。
## 数据集来源
* 数据集源自[cocodataset.org](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
* 其中`coco-karpathy`采用稠密格式(每行包含多条语句与`sendid`),`coco-karpathy-long`则采用长格式,每行仅包含一条语句(又称图像标题caption)与`sendid`。`coco-karpathy-long`选取前5条语句,因此其数据规模为`coco-karpathy`的5倍。
* `phiyodr/coco2017`:每行对应一张图像,附带多条语句。
* `phiyodr/coco2017-long`:每行对应一条语句(又称图像标题caption)。同一图像的细节会对应5行(有时更多)。
## 数据集格式
python
DatasetDict({
train: Dataset({
features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
num_rows: 118287
})
validation: Dataset({
features: ['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'image_id', 'ids', 'captions'],
num_rows: 5000
})
})
## 使用方法
* 下载图像数据并解压
bash
cd PATH_TO_IMAGE_FOLDER
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip # 无需下载该注释包:`load_dataset("phiyodr/coco2017")` 已包含所需全部内容
unzip train2017.zip
unzip val2017.zip
* 在Python中加载数据集
python
import os
from datasets import load_dataset
PATH_TO_IMAGE_FOLDER = "COCO2017"
def create_full_path(example):
"""基于COCO2017数据集文件夹路径生成图像的完整本地路径。"""
example["image_path"] = os.path.join(PATH_TO_IMAGE_FOLDER, example["file_name"])
return example
dataset = load_dataset("phiyodr/coco2017")
dataset = dataset.map(create_full_path)
提供机构:
maas
创建时间:
2024-04-16
搜集汇总
数据集介绍

背景与挑战
背景概述
该数据集名为coco2017,包含来自MS COCO2017的图像与文本配对数据,源自cocodataset.org。它提供两种格式:一种每行关联一张图像和多个描述句子,另一种每行对应一个句子;数据分为训练集和验证集,用于图像处理和文本分析任务。
以上内容由遇见数据集搜集并总结生成



