thoddnn/OpenDataGen-factuality-en-v0.1

Name: thoddnn/OpenDataGen-factuality-en-v0.1
Creator: thoddnn
Published: 2024-04-01 06:05:09
License: MIT (as declared in the dataset card)

Hugging Face · Updated 2024-04-01 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/thoddnn/OpenDataGen-factuality-en-v0.1

Description:

---
license: mit
task_categories:
- question-answering
language:
- en
tags:
- wikipedia
- synthetic
- synthetic data
size_categories:
- n<1K
---

This synthetic dataset was generated using the Open DataGen Python library (https://github.com/thoddnn/open-datagen).

# Methodology:

1) Retrieve random article content from the HuggingFace Wikipedia English dataset.
2) Construct a Chain of Thought (CoT) to generate a Multiple Choice Question (MCQ).
3) Use a Large Language Model (LLM) to score the results, then filter them.

All these steps are prompted in the 'template.json' file located in the specified code folder.

Code: https://github.com/thoddnn/open-datagen/blob/main/opendatagen/examples/opendata-eval/

Feel free to reach me on LinkedIn (https://www.linkedin.com/in/thomasdordonne/) or Twitter (https://twitter.com/thoDdnn).

Provider:

thoddnn

Summary of original information

Dataset overview

License

MIT

Task category

Question answering

Language

English

Dataset size

Fewer than 1K examples

Generation method

1) Retrieve random article content from the HuggingFace Wikipedia English dataset.
2) Construct a Chain of Thought (CoT) to generate a Multiple Choice Question (MCQ).
3) Use a Large Language Model (LLM) to score the results and filter them.
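
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Open DataGen implementation: the names `MCQ`, `generate_mcq`, `score_mcq`, and `build_dataset`, the prompt wording, the `|` separator, and the numeric `threshold` are all hypothetical. The `llm` and `judge` arguments stand in for any callable that maps a prompt string to a reply string; the real prompts live in the project's 'template.json'.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type alias: any function mapping a prompt to a text reply.
LLM = Callable[[str], str]


@dataclass
class MCQ:
    question: str
    choices: List[str]
    answer_index: int
    reasoning: str      # chain-of-thought text kept alongside the question
    score: float = 0.0  # filled in by the judging step


def generate_mcq(article: str, llm: LLM) -> MCQ:
    """Step 2: reason about the article first, then derive a question from it."""
    reasoning = llm(f"Think step by step about the key fact in: {article}")
    question = llm(f"Using this reasoning, write one question: {reasoning}")
    raw = llm(f"List four options, correct one first, separated by '|': {question}")
    choices = [c.strip() for c in raw.split("|")]
    # Assumption: the prompt asks for the correct option first, hence index 0.
    return MCQ(question, choices, 0, reasoning)


def score_mcq(mcq: MCQ, article: str, judge: LLM) -> float:
    """Step 3a: ask a judge model for a numeric quality score."""
    reply = judge(f"Rate from 0 to 1 how well '{mcq.question}' matches: {article}")
    try:
        return float(reply)
    except ValueError:
        return 0.0  # an unparseable reply counts as a failed item


def build_dataset(articles: List[str], llm: LLM, judge: LLM,
                  n: int, threshold: float, seed: int = 0) -> List[MCQ]:
    """Steps 1 to 3: sample articles, generate MCQs, keep those scoring high enough."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        article = rng.choice(articles)  # Step 1: random article
        mcq = generate_mcq(article, llm)
        mcq.score = score_mcq(mcq, article, judge)
        if mcq.score >= threshold:      # Step 3b: filter
            kept.append(mcq)
    return kept
```

Passing the models in as plain callables keeps the sketch independent of any particular provider and lets the filtering logic be exercised with stub functions.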

Code files

The related code is in the specified code folder: https://github.com/thoddnn/open-datagen/blob/main/opendatagen/examples/opendata-eval/
