Georgetown University Multilayer (GUM) corpus

SSH Open MarketPlace, updated 2021-07-22, indexed 2024-08-03

Download link:

https://marketplace.sshopencloud.eu/dataset/0Ve968

Description:

GUM is the Georgetown University Multilayer corpus, a collection of richly annotated digital texts used for linguistic research and for building Natural Language Processing applications. The corpus is built and expanded each year by Georgetown students as part of the course LING-367, Computational Corpus Linguistics. The data is chosen by students and currently includes interviews, news, travel guides, how-to guides, biographies, short stories, Reddit forum discussions, and academic writing.

Created:

2021-07-22
