allenai/dolma-reddit-to-flashcards-0625
Source: Hugging Face. Updated 2025-07-28. Indexed 2025-08-30.
Download link:
https://hf-mirror.com/datasets/allenai/dolma-reddit-to-flashcards-0625
Description:
Dolma Reddit to Flashcards is a dataset of synthetically generated question-answering (QA) items based on filtered Reddit data. It was built to make better use of Reddit's thread context and the specialized knowledge it contains for knowledge-based QA tasks such as MMLU. Construction involves three basic parts: 1) constructing thread contexts inspired by QA structure, 2) filtering to high-quality subreddits relevant to academic topics, and 3) rewriting the content from those subreddits to reduce noise and increase its resemblance to standard multiple-choice question answering (MCQA).
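The three parts can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `Thread` fields, the `ACADEMIC_SUBREDDITS` allowlist, the sample thread text, and the template-based `to_mcqa` helper are hypothetical and are not taken from the dataset's actual pipeline. In particular, the real rewriting step is described only as synthetic generation, so the fixed template below is a stand-in for it, not a reproduction.

```python
from dataclasses import dataclass

# Hypothetical allowlist; the listing does not say which subreddits were kept.
ACADEMIC_SUBREDDITS = {"askscience", "AskHistorians"}


@dataclass
class Thread:
    subreddit: str
    submission: str   # text of the original post
    top_comment: str  # text of one reply


def build_context(thread: Thread) -> str:
    """Part 1: lay the thread out in a question/answer shape."""
    return (
        f"Question: {thread.submission.strip()}\n"
        f"Answer: {thread.top_comment.strip()}"
    )


def is_academic(thread: Thread) -> bool:
    """Part 2: keep only threads from allowlisted subreddits."""
    return thread.subreddit in ACADEMIC_SUBREDDITS


def to_mcqa(question: str, correct: str, distractors: list) -> str:
    """Part 3 stand-in: format a multiple-choice item with a fixed template.

    The correct option is always placed first here, which a real
    pipeline would not do; this only shows the target shape.
    """
    lines = [question]
    for letter, option in zip("ABCD", [correct] + list(distractors)):
        lines.append(f"{letter}. {option}")
    lines.append("Answer: A")
    return "\n".join(lines)


# Made-up sample data, not drawn from the dataset.
threads = [
    Thread("askscience", " Why is the sky blue? ",
           "Rayleigh scattering favors shorter wavelengths."),
    Thread("funny", "Look at my cat", "lol"),
]

kept = [t for t in threads if is_academic(t)]
contexts = [build_context(t) for t in kept]
item = to_mcqa(
    "Why is the sky blue?",
    "Rayleigh scattering",
    ["Ocean reflection", "Ozone absorption", "Cloud refraction"],
)
```

Filtering before any rewriting keeps the expensive generation step limited to threads that are likely to carry academic content, which matches the order in which the description lists the parts.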
Provided by:
allenai



