thoddnn/OpenDataGen-factuality-en-v0.1
Source: Hugging Face. Updated: 2024-04-01. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/thoddnn/OpenDataGen-factuality-en-v0.1
Resource description:
---
license: mit
task_categories:
- question-answering
language:
- en
tags:
- wikipedia
- synthetic
- synthetic data
size_categories:
- n<1K
---
This synthetic dataset was generated using the Open DataGen Python library (https://github.com/thoddnn/open-datagen).
# Methodology:
1) Retrieve random article content from the HuggingFace Wikipedia English dataset.
2) Construct a Chain of Thought (CoT) to generate a Multiple Choice Question (MCQ).
3) Use a Large Language Model (LLM) to score the results, then filter them.
All of these steps are defined as prompts in the 'template.json' file located in the code folder linked below.
Code: https://github.com/thoddnn/open-datagen/blob/main/opendatagen/examples/opendata-eval/
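The three steps above can be sketched roughly as follows. This is only an illustration of the sample, generate, score, and filter flow, not the Open DataGen API: the `MCQ` class, `build_dataset`, the `generate_mcq` and `score_mcq` callables, and the `min_score` threshold are all hypothetical names, and the two `fake_*` functions stand in for the LLM calls that the real pipeline drives from 'template.json'.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MCQ:
    """Hypothetical container for one generated multiple choice question."""
    question: str
    choices: List[str]
    answer: str
    score: float = 0.0


def build_dataset(
    articles: List[str],
    generate_mcq: Callable[[str], MCQ],
    score_mcq: Callable[[str, MCQ], float],
    n_samples: int,
    min_score: float,
    seed: int = 0,
) -> List[MCQ]:
    """Sample articles, generate one MCQ per article, keep high-scoring ones."""
    rng = random.Random(seed)
    # Step 1: pick random articles (assumed here to be an in-memory list).
    sampled = rng.sample(articles, k=min(n_samples, len(articles)))
    kept = []
    for article in sampled:
        # Step 2: generate a question (an LLM with a CoT prompt in practice).
        mcq = generate_mcq(article)
        # Step 3: score the question, then drop it if it is below the threshold.
        mcq.score = score_mcq(article, mcq)
        if mcq.score >= min_score:
            kept.append(mcq)
    return kept


def fake_generate(article: str) -> MCQ:
    """Stand-in for the LLM generation step (assumption, for the demo only)."""
    first_word = article.split()[0]
    return MCQ(
        question="Which word opens the article?",
        choices=[first_word, "Other"],
        answer=first_word,
    )


def fake_score(article: str, mcq: MCQ) -> float:
    """Stand-in for the LLM scoring step: 1.0 if the answer is in the text."""
    return 1.0 if mcq.answer in article else 0.0


if __name__ == "__main__":
    demo_articles = [
        "Paris is the capital of France.",
        "Water boils at 100 degrees Celsius at sea level.",
    ]
    for item in build_dataset(demo_articles, fake_generate, fake_score, 2, 0.5):
        print(item.question, item.choices, item.answer, item.score)
```

Passing the generator and scorer as callables keeps the sketch independent of any particular model provider; in the actual project those roles are filled by the prompts in the template file.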
Feel free to reach out to me on LinkedIn (https://www.linkedin.com/in/thomasdordonne/) or Twitter (https://twitter.com/thoDdnn).
Provider:
thoddnn
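Summary of original information
Dataset overview
License
- MIT
Task categories
- Question answering
Language
- English
Tags
- Wikipedia
- Synthetic data
- Synthetic dataset
Dataset size
- Fewer than 1K
Generation method
- Retrieve random article content from the Hugging Face English Wikipedia dataset.
- Construct a Chain of Thought (CoT) to generate a Multiple Choice Question (MCQ).
- Use a Large Language Model (LLM) to score the results and filter them.
Code files
- The related code is located in the specified code folder: https://github.com/thoddnn/open-datagen/blob/main/opendatagen/examples/opendata-eval/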



