Ba2han/Reddit-instruct-curated_rated-1.2k
Hugging Face · Updated 2024-02-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Ba2han/Reddit-instruct-curated_rated-1.2k
Resource description:
---
license: mit
language:
- en
size_categories:
- 1K<n<10K
---
This is an LLM-rated version of **euclaise/reddit-instruct-curated**, which is already a good dataset imo.
Only **post titles** and **comment texts** were rated, as post texts can be confusing due to edits and seemingly out-of-context information.
First, **I filtered out examples with a comment score below 250**. Of course, this is not a very effective filter, as some pairs might reference other comments or simply be unhelpful yet upvoted due to the Reddit hivemind.
Next, I sent the example pairs with a rating prompt to Senku-Q2-XS and collected the numeric votes **(out of 10)**.
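
The two steps above (score filter, then LLM rating) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script actually used: the field names `post_title`, `comment_text` and `comment_score`, the prompt wording, and the `ask_model` callable (a stand-in for however Senku-Q2-XS was queried) are all hypothetical.

```python
import re
from typing import Callable, Optional

MIN_COMMENT_SCORE = 250  # threshold stated in the card

# Hypothetical prompt wording; the real rating prompt is not given in the card.
RATING_PROMPT = (
    "Rate how helpful the answer is to the question on a scale of 1 to 10.\n"
    "Question: {title}\nAnswer: {comment}\nRating:"
)


def parse_rating(reply: str) -> Optional[int]:
    """Return the first standalone integer from 0 to 10 in the reply, or None."""
    match = re.search(r"\b(10|[0-9])\b", reply)
    return int(match.group(1)) if match else None


def rate_examples(examples: list, ask_model: Callable[[str], str]) -> list:
    """Drop examples below the score threshold, then attach an LLM rating."""
    rated = []
    for ex in examples:
        # Assumed field name; skip comments scored below the threshold.
        if ex["comment_score"] < MIN_COMMENT_SCORE:
            continue
        prompt = RATING_PROMPT.format(
            title=ex["post_title"], comment=ex["comment_text"]
        )
        rating = parse_rating(ask_model(prompt))
        # Replies without a parsable number are skipped rather than guessed.
        if rating is not None:
            rated.append({**ex, "rating": rating})
    return rated
```

Only the post title and the comment text go into the prompt, matching the note above that post texts were left out of the rating.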
Overall, there aren't many low-rated examples. Here are three "worst" examples:

Only 66 examples have a rating below 6.
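
A count like this, and a search for well-upvoted pairs that the rater disliked, could be reproduced with something like the sketch below. The `rating` and `comment_score` keys are assumed names for illustration and may not match the dataset's actual columns.

```python
def low_rated(rated, threshold=6):
    """Return the examples whose rating is strictly below the threshold."""
    return [ex for ex in rated if ex["rating"] < threshold]


def most_upvoted_low_rated(rated, threshold=6):
    """Return the low-rated example with the highest comment score, or None."""
    return max(
        low_rated(rated, threshold),
        key=lambda ex: ex["comment_score"],
        default=None,
    )
```

Calling `len(low_rated(rows))` on the rated rows gives the number of examples under the cutoff.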
An example of a highly upvoted but poorly rated pair:

**Let me know if I fucked up anything, I still have no idea what I am doing honestly.**
Provided by:
Ba2han
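Summary of original information
Dataset overview
Dataset source
- This dataset is an improved version of euclaise/reddit-instruct-curated.
Data content
- Only post titles and comment texts were rated, because post texts can be confusing due to edits and unclear context.
Data filtering
- First, examples with a comment score below 250 were filtered out.
- Then, the remaining example pairs were sent to Senku-Q2-XS for rating, and numeric votes out of 10 were collected.
Data characteristics
- Overall, there are not many low-rated examples.
- Only 66 examples have a rating below 6.
- Some example pairs are highly upvoted but poorly rated.
Examples
- Image links to three "worst" examples are provided.
- An image link to one highly upvoted but poorly rated example pair is provided.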



