Quotation Attribution Dataset for German News Articles

Source: arXiv. Updated: 2024-04-26. Indexed: 2024-06-21.
Download link:
https://github.com/uhh-lt/german-news-quotation-attribution-2024
Description:
This dataset was created by the Institute of Computing and Data Science and the Language Technology Group at the University of Hamburg. It targets quotation attribution in German news articles. It contains 1,000 carefully annotated articles totaling approximately 250,000 words, covering direct, indirect, and other quotation types together with their contextual information. The research team followed a strict annotation process with quality control measures to ensure the quality and accuracy of the data. The dataset supports downstream natural language processing (NLP) tasks such as quotation detection and attribution analysis, and is intended to help researchers and journalists analyze and process large volumes of news data more effectively and with a higher degree of automation.
Provided by:
Institute of Computing and Data Science and Language Technology Group, University of Hamburg
Created:
2024-04-26