COUGH
Source: arXiv | Updated: 2021-09-11 | Indexed: 2024-06-21
Download link:
https://github.com/sunlab-osu/covid-faq
Overview:
COUGH is a large, challenging dataset created by The Ohio State University for COVID-19 FAQ retrieval. It has three components: an FAQ bank, a query bank, and a relevance set. The FAQ bank contains about 16,000 FAQ items crawled from 55 credible websites, such as the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO). The query bank contains 1,236 human-paraphrased queries, and the relevance set provides about 32 human-annotated FAQ items per query. The dataset covers a wide range of COVID-19 topics, from basic information about the virus to specific dietary guidance, and is intended as a high-quality evaluation benchmark to support further research on COVID-19 FAQ retrieval models.
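The three components described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, field names, ids, and toy records are hypothetical and do not reflect the actual file layout or schema of the repository, and relevance is treated as binary here even though the description does not say how the annotations are scored. Precision at k is used only as one common way such a benchmark might be evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FAQItem:
    """One entry of the FAQ bank (field names are hypothetical)."""
    faq_id: str
    question: str
    answer: str
    source: str


@dataclass(frozen=True)
class Query:
    """One entry of the query bank (field names are hypothetical)."""
    query_id: str
    text: str


def precision_at_k(ranked_faq_ids, relevant_faq_ids, k):
    """Fraction of the top-k retrieved FAQ ids that are annotated as relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_faq_ids[:k]
    hits = sum(1 for faq_id in top_k if faq_id in relevant_faq_ids)
    return hits / k


# Toy placeholder records, not taken from the dataset.
faq_bank = {
    "f1": FAQItem("f1", "Example question 1", "Example answer 1", "example-source-a"),
    "f2": FAQItem("f2", "Example question 2", "Example answer 2", "example-source-b"),
    "f3": FAQItem("f3", "Example question 3", "Example answer 3", "example-source-a"),
}
query_bank = {"q1": Query("q1", "Example user query")}

# Relevance set: query id -> ids of FAQ items annotated as relevant (assumed binary).
relevance_set = {"q1": {"f1", "f3"}}

# A hypothetical ranking produced by some retrieval model for query q1.
ranking = ["f1", "f2", "f3"]
score = precision_at_k(ranking, relevance_set["q1"], k=2)
print(f"P@2 for q1: {score}")
```

Keeping the relevance set as a mapping from query id to a set of FAQ ids makes the membership check in the metric constant-time, which matters when each query has dozens of annotated items.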
Provided by:
The Ohio State University
Created:
2020-10-24



