STORYWARS
arXiv · Updated 2023-05-14 · Indexed 2024-06-21
Download link:
https://github.com/ylndu/storywars
Description:
STORYWARS is a dataset created at Columbia University that contains more than 40,000 collaborative stories written by 9,400 different authors on an online platform. It is intended to advance natural language processing (NLP) research on understanding and generating collaborative stories. Besides the story texts, STORYWARS includes each story's title, genre, author information, and user reviews. The dataset was built by scraping stories from the online platform and then cleaning them with language identification and GPT-2 perplexity scoring to ensure data quality. Its main use is research on collaborative story generation and understanding, in particular the difficulty models have in interpreting text written jointly by multiple authors.
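The description names the two cleaning steps but not the language-identification tool, the perplexity threshold, or whether whole stories or individual paragraphs are filtered. The Python sketch below only illustrates the general idea of a language-ID filter followed by a perplexity filter; it is not the authors' code. `detect_language`, `score_tokens`, and `max_perplexity` are hypothetical placeholders: in a real pipeline `score_tokens` would return the per-token natural-log probabilities that GPT-2 assigns to the text, and perplexity is computed as exp(-mean log p).

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("perplexity is undefined for an empty token sequence")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def clean_stories(
    stories: Sequence[str],
    detect_language: Callable[[str], str],          # hypothetical language-ID callable
    score_tokens: Callable[[str], Sequence[float]],  # hypothetical LM scorer (e.g. GPT-2)
    max_perplexity: float,                           # hypothetical threshold
    target_language: str = "en",
) -> List[str]:
    """Keep stories in the target language whose LM perplexity is low enough."""
    kept: List[str] = []
    for text in stories:
        # Step 1: language identification; drop stories not in the target language.
        if detect_language(text) != target_language:
            continue
        # Step 2: perplexity filter; drop text the language model finds implausible.
        if perplexity(score_tokens(text)) > max_perplexity:
            continue
        kept.append(text)
    return kept


# Toy stand-ins for illustration only (not real language-ID or GPT-2 scoring).
def toy_lang(text: str) -> str:
    return "en" if text.isascii() else "other"


def toy_scores(text: str) -> List[float]:
    # Each whitespace token gets probability 0.5, except "zzz", which gets 0.001.
    return [math.log(0.001 if tok == "zzz" else 0.5) for tok in text.split()]


stories = ["Once upon a time", "zzz zzz zzz", "从前有一个"]
print(clean_stories(stories, toy_lang, toy_scores, max_perplexity=10.0))
# ['Once upon a time']
```

Passing the language detector and scorer as callables keeps the filtering logic independent of any particular library, so the toy stubs can be swapped for a real language-ID model and a GPT-2 scorer without changing `clean_stories`.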
Provided by:
Columbia University
Created:
2023-05-14



