Aunsiels/InfantBooks
Source: Hugging Face. Updated 2022-10-24; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/Aunsiels/InfantBooks
Description:
---
annotations_creators:
- no-annotation
language:
- en
language_creators:
- crowdsourced
license:
- gpl
multilinguality:
- monolingual
pretty_name: InfantBooks
size_categories:
- 1M<n<10M
source_datasets:
- original
tags:
- research paper
- kids
- children
- books
task_categories:
- text-generation
task_ids:
- language-modeling
---
# Dataset Card for InfantBooks
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Additional Information](#additional-information)
- [Citation Information](#citation-information)
## Dataset Description
- **Homepage:** [https://www.mpi-inf.mpg.de/children-texts-for-commonsense](https://www.mpi-inf.mpg.de/children-texts-for-commonsense)
- **Paper:** Do Children Texts Hold The Key To Commonsense Knowledge?
### Dataset Summary
A dataset of books written for infants and children.
### Languages
All the books are in English.
## Dataset Structure
### Data Instances
malis-friend_BookDash-FKB.txt,"Then a taxi driver, hooting around the yard with his wire car. Mali enjoys playing by himself..."
### Data Fields
- title: The title of the book
- content: The content of the book
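The two fields above can be read with standard CSV tooling. The following is a minimal sketch, assuming each record is a single CSV row of the form `title,"content"`, as the sample under Data Instances suggests. The helper name `parse_books` and the in-memory `SAMPLE` string are illustrative and not part of the dataset.

```python
import csv
import io

# Assumption: each record is one CSV row of the form title,"content",
# mirroring the sample under "Data Instances". The real files may differ.
SAMPLE = (
    'malis-friend_BookDash-FKB.txt,'
    '"Then a taxi driver, hooting around the yard with his wire car. '
    'Mali enjoys playing by himself..."\n'
)


def parse_books(text):
    """Parse CSV text into a list of {'title', 'content'} dicts (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    books = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        title, content = row
        books.append({"title": title, "content": content})
    return books


books = parse_books(SAMPLE)
print(books[0]["title"])
print(len(books[0]["content"].split()), "words")
```

The `csv` module is used instead of a plain `split(",")` because the content field is quoted and contains commas of its own. To fetch the full dataset, the `load_dataset` function of the Hugging Face `datasets` library called with `"Aunsiels/InfantBooks"` would be the usual route, assuming the repository can be loaded that way.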
## Dataset Creation
### Curation Rationale
This dataset supports the study of infant books, which are expected to be easier to understand than general-purpose texts. In particular, the original goal was to study whether these texts contain more commonsense knowledge.
### Source Data
#### Initial Data Collection and Normalization
We automatically collected kids' books from the web.
#### Who are the source language producers?
Native speakers.
## Additional Information
### Citation Information
```
Romero, J., & Razniewski, S. (2022).
Do Children Texts Hold The Key To Commonsense Knowledge?
In Proceedings of the 2022 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
```
Provider:
Aunsiels
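Summary of original information
Dataset overview
Dataset description
Dataset summary
- Name: InfantBooks
- Content: A dataset of books for infants and children.
Languages
- Language: English
- Language creators: crowdsourced
Dataset structure
Data instances
- Example: malis-friend_BookDash-FKB.txt, which holds the content of one book.
Data fields
- title: The title of the book
- content: The content of the book
Dataset creation
Curation rationale
- Goal: To study infant books, which are expected to be easier to understand than general-purpose texts, and in particular whether these texts contain more commonsense knowledge.
Source data
- Initial data collection: Kids' books collected automatically from the web
- Source language producers: Native speakers
Additional information
Citation information
- Authors: Romero, J., & Razniewski, S.
- Publication: Proceedings of the 2022 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
- Paper title: Do Children Texts Hold The Key To Commonsense Knowledge?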



