DirectQuote
arXiv, updated 2021-10-15, indexed 2024-06-21
Download link:
https://github.com/THUNLP-MT/DirectQuote
Resource description:
DirectQuote is a manually annotated dataset created by Tsinghua University for direct quotation extraction and attribution in news text. It contains 19,760 paragraphs and 10,279 direct quotations, and is presented as the largest dataset to date focused on direct quotations in news. Each speaker can be linked to a specific named entity in Wikidata, which makes the dataset usable for a range of downstream tasks. The data was built by continuously sampling from multiple news sources, so that its text distribution matches real-world usage. Intended applications include fact-checking, media monitoring, and news tracking, with the goal of improving transparency and accountability in news.
Provided by:
Tsinghua University
Created:
2021-10-15



