SQUINKY! A Corpus of Sentence-level Formality, Informativeness, and Implicature

Source: arXiv | updated 2016-09-28 | indexed 2024-06-21

Download link:

https://drive.google.com/file/d/0B2Mzhc7popBgdXZmRlg2RUdqdDA/view?usp=sharing


Overview:

SQUINKY! is a dataset of 7,032 sentences created by the Department of Computer Science and Engineering at the University of Michigan. Human annotators on Amazon Mechanical Turk rated each sentence for formality, informativeness, and implicature on a scale of 1 to 7. The sentences are drawn from several document types, including blogs, news articles, academic papers, and online forums, so that language use can be studied across genres and contexts. During construction, the researchers cleaned the data and segmented it into sentences to ensure quality. The dataset is intended for linguistic research, natural language processing, and text analysis, and in particular for work on quantifying and understanding formality, informativeness, and implicature.
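The 1 to 7 rating scheme described above can be illustrated with a small sketch. It assumes, hypothetically, that each rating is available as a (sentence_id, dimension, score) tuple and that several annotators may rate the same sentence; the actual file layout of the download is not described here, and the function name, sentence IDs, and sample values below are invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

# The three rated dimensions named in the dataset description.
DIMENSIONS = ("formality", "informativeness", "implicature")


def aggregate_ratings(ratings):
    """Average ratings per (sentence_id, dimension).

    `ratings` is an iterable of (sentence_id, dimension, score) tuples.
    This tuple layout is an assumption, not the dataset's real format.
    """
    buckets = defaultdict(list)
    for sentence_id, dimension, score in ratings:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        if not 1 <= score <= 7:
            raise ValueError(f"score out of the 1-7 range: {score!r}")
        buckets[(sentence_id, dimension)].append(score)
    # One mean score per sentence and dimension.
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up ratings, not taken from the corpus.
example = [
    ("s1", "formality", 6),
    ("s1", "formality", 4),
    ("s1", "implicature", 2),
]
result = aggregate_ratings(example)
```

Averaging is only one possible way to combine annotators; a median or a per-annotator normalization could be swapped in without changing the rest of the sketch.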

Provider:

Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109

Created:

2015-06-08
