five

Leyo/TGIF

收藏
Hugging Face2022-10-25 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/Leyo/TGIF
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - expert-generated language_creators: - crowdsourced language: - en license: - other multilinguality: - monolingual pretty_name: TGIF size_categories: - 100K<n<1M source_datasets: - original task_categories: - question-answering - visual-question-answering task_ids: - closed-domain-qa --- # Dataset Card for [Dataset Name] ## Table of Contents - [Table of Contents](#table-of-contents) - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Fields](#data-fields) - [Data Splits](#data-splits) - [Dataset Creation](#dataset-creation) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) - [Contributions](#contributions) ## Dataset Description - **Homepage:** http://raingo.github.io/TGIF-Release/ - **Repository:** https://github.com/raingo/TGIF-Release - **Paper:** https://arxiv.org/abs/1604.02748 - **Point of Contact:** mailto: yli@cs.rochester.edu ### Dataset Summary The Tumblr GIF (TGIF) dataset contains 100K animated GIFs and 120K sentences describing visual content of the animated GIFs. The animated GIFs have been collected from Tumblr, from randomly selected posts published between May and June of 2015. We provide the URLs of animated GIFs in this release. The sentences are collected via crowdsourcing, with a carefully designed annotation interface that ensures high quality dataset. We provide one sentence per animated GIF for the training and validation splits, and three sentences per GIF for the test split. The dataset shall be used to evaluate animated GIF/video description techniques. ### Languages The captions in the dataset are in English. ## Dataset Structure ### Data Fields - `video_path`: `str` "https://31.media.tumblr.com/001a8b092b9752d260ffec73c0bc29cd/tumblr_ndotjhRiX51t8n92fo1_500.gif" -`video_bytes`: `large_bytes` video file in bytes format - `en_global_captions`: `list_str` List of english captions describing the entire video ### Data Splits | |train |validation| test | Overall | |-------------|------:|---------:|------:|------:| |# of GIFs|80,000 |10,708 |11,360 |102,068 | ### Annotations Quoting [TGIF paper](https://arxiv.org/abs/1604.02748): \ "We annotated animated GIFs with natural language descriptions using the crowdsourcing service CrowdFlower. We carefully designed our annotation task with various quality control mechanisms to ensure the sentences are both syntactically and semantically of high quality. A total of 931 workers participated in our annotation task. We allowed workers only from Australia, Canada, New Zealand, UK and USA in an effort to collect fluent descriptions from native English speakers. Figure 2 shows the instructions given to the workers. Each task showed 5 animated GIFs and asked the worker to describe each with one sentence. To promote language style diversity, each worker could rate no more than 800 images (0.7% of our corpus). We paid 0.02 USD per sentence; the entire crowdsourcing cost less than 4K USD. We provide details of our annotation task in the supplementary material." ### Personal and Sensitive Information Nothing specifically mentioned in the paper. ## Considerations for Using the Data ### Social Impact of Dataset [More Information Needed] ### Discussion of Biases [More Information Needed] ### Other Known Limitations [More Information Needed] ## Additional Information ### Licensing Information This dataset is provided to be used for approved non-commercial research purposes. No personally identifying information is available in this dataset. ### Citation Information ```bibtex @InProceedings{tgif-cvpr2016, author = {Li, Yuncheng and Song, Yale and Cao, Liangliang and Tetreault, Joel and Goldberg, Larry and Jaimes, Alejandro and Luo, Jiebo}, title = "{TGIF: A New Dataset and Benchmark on Animated GIF Description}", booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2016} } ``` ### Contributions Thanks to [@leot13](https://github.com/leot13) for adding this dataset.
提供机构:
Leyo
原始信息汇总

数据集卡片 for TGIF

数据集描述

  • 数据集摘要:TGIF数据集包含100K个动画GIF和120K个描述动画GIF视觉内容的句子。这些动画GIF是从Tumblr上随机选择的2015年5月至6月发布的帖子中收集的。我们提供了动画GIF的URL。这些句子是通过众包收集的,使用了一个精心设计的注释界面,确保了数据集的高质量。我们为训练和验证分割提供了每个动画GIF一个句子,为测试分割提供了每个GIF三个句子。该数据集用于评估动画GIF/视频描述技术。
  • 语言:数据集中的描述是英文的。

数据集结构

数据字段

  • video_path: str "https://31.media.tumblr.com/001a8b092b9752d260ffec73c0bc29cd/tumblr_ndotjhRiX51t8n92fo1_500.gif"
  • video_bytes: large_bytes 视频文件的字节格式
  • en_global_captions: list_str 描述整个视频的英文句子列表

数据分割

train validation test Overall
# of GIFs 80,000 10,708 11,360 102,068

注释

引用TGIF论文: "我们使用众包服务CrowdFlower为动画GIF添加自然语言描述。我们精心设计了注释任务,采用了多种质量控制机制,以确保句子在语法和语义上都是高质量的。共有931名工作者参与了我们的注释任务。我们只允许来自澳大利亚、加拿大、新西兰、英国和美国的工人参与,以收集来自母语为英语的工作者的流畅描述。每个任务显示5个动画GIF,并要求工作者用一个句子描述每个GIF。为了促进语言风格的多样性,每个工作者最多可以评价800张图像(占我们语料库的0.7%)。我们每句话支付0.02美元;整个众包成本不到4000美元。我们在补充材料中提供了我们注释任务的详细信息。"

使用数据的注意事项

数据集的社会影响

[需要更多信息]

偏见的讨论

[需要更多信息]

其他已知限制

[需要更多信息]

附加信息

许可信息

该数据集提供用于批准的非商业研究目的。该数据集中没有个人身份信息。

引用信息

bibtex @InProceedings{tgif-cvpr2016, author = {Li, Yuncheng and Song, Yale and Cao, Liangliang and Tetreault, Joel and Goldberg, Larry and Jaimes, Alejandro and Luo, Jiebo}, title = "{TGIF: A New Dataset and Benchmark on Animated GIF Description}", booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2016} }

贡献

感谢@leot13添加此数据集。

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作