Ba2han/Reddit-instruct-curated_rated-1.2k

Name: Ba2han/Reddit-instruct-curated_rated-1.2k
Creator: Ba2han
Published: 2024-02-16 03:50:26
License: MIT (per the dataset card metadata)

Source: Hugging Face; updated 2024-02-16; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Ba2han/Reddit-instruct-curated_rated-1.2k

Description:

---
license: mit
language:
- en
size_categories:
- 1K<n<10K
---

This is an LLM-rated version of **euclaise/reddit-instruct-curated**, which is already a good dataset imo. Only **post titles** and **comment texts** were rated, as post texts can be confusing due to edits and seemingly out-of-context information.

First, **I filtered out examples with a comment score below 250**. Of course this is not a very effective filter, as some pairs might reference other comments or simply be unhelpful yet upvoted due to the Reddit hivemind.

Next, I sent the example pairs with a rating prompt to Senku-Q2-XS and collected the numeric votes **(out of 10)**.

Overall there aren't many low-rated examples. Here are the three "worst" examples:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/lxj7BGeJXqgRwtx3UoPlU.png)

There are only 66 examples with a rating below 6. An example of a highly upvoted but poorly rated pair:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/u6wsjzeHNnN4OGPWplyXe.png)

**Let me know if I fucked up anything, I still have no idea what I am doing honestly.**

Provider:

Ba2han

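Summary of original information

Dataset overview

Dataset source

This dataset is an LLM-rated version of euclaise/reddit-instruct-curated.

Data content

Only post titles and comment texts were rated, because post texts can be confusing due to edits and seemingly out-of-context information.

Data filtering

First, examples with a comment score below 250 were filtered out.
Then the remaining example pairs were sent to Senku-Q2-XS with a rating prompt, and numeric votes on a 10-point scale were collected.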
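
The two filtering steps above can be sketched roughly as follows. This is only an illustration, not the creator's actual script: the field names (`post_title`, `comment_text`, `comment_score`), the prompt wording, and the helper functions are all assumptions, and the call to the rating model itself is left out.

```python
import re

# Threshold stated in the dataset card; examples below it are dropped.
MIN_COMMENT_SCORE = 250


def keep_example(example: dict) -> bool:
    """Return True if the comment score meets the threshold.

    The key name "comment_score" is an assumed column name.
    """
    return example["comment_score"] >= MIN_COMMENT_SCORE


def build_rating_prompt(example: dict) -> str:
    """Build a rating prompt from the post title and comment text only.

    The wording here is hypothetical; the card does not publish the prompt.
    """
    return (
        "Rate how helpful the comment is as a reply to the post title, "
        "on a scale from 1 to 10. Reply with the number only.\n\n"
        f"Title: {example['post_title']}\n"
        f"Comment: {example['comment_text']}\n"
        "Rating:"
    )


def parse_rating(reply: str):
    """Extract the first standalone integer from 0 to 10 in a model reply.

    Returns None when no such number is found.
    """
    match = re.search(r"\b(10|[0-9])\b", reply)
    return int(match.group(1)) if match else None


# Tiny made-up sample, for illustration only.
examples = [
    {"post_title": "Why is the sky blue?", "comment_text": "Rayleigh scattering.", "comment_score": 812},
    {"post_title": "Best pizza topping?", "comment_text": "lol", "comment_score": 40},
]
kept = [e for e in examples if keep_example(e)]
prompts = [build_rating_prompt(e) for e in kept]
```

Each prompt would then be sent to the rating model (Senku-Q2-XS in the card) and the reply passed through `parse_rating`; replies with no recognizable number would need manual review or a retry.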

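Data characteristics

Overall, there are not many low-rated examples.
Only 66 examples have a rating below 6.
Some example pairs are highly upvoted but poorly rated.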
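
A minimal sketch of how one might inspect these characteristics once ratings are attached to each example. The field names `rating` and `comment_score` and the sample records are assumptions for illustration; the real column names should be checked against the dataset itself.

```python
def low_rated(examples, threshold=6):
    """Return examples whose rating is strictly below the threshold."""
    return [e for e in examples if e["rating"] < threshold]


def upvoted_but_poorly_rated(examples, threshold=6):
    """Return low-rated examples, most upvoted first."""
    return sorted(
        low_rated(examples, threshold),
        key=lambda e: e["comment_score"],
        reverse=True,
    )


# Made-up records, for illustration only.
rated_examples = [
    {"post_title": "A", "comment_score": 300, "rating": 8},
    {"post_title": "B", "comment_score": 5200, "rating": 3},
    {"post_title": "C", "comment_score": 450, "rating": 5},
    {"post_title": "D", "comment_score": 900, "rating": 6},
]

low = low_rated(rated_examples)
ranked = upvoted_but_poorly_rated(rated_examples)
```

On the real data, `len(low_rated(...))` with a threshold of 6 should correspond to the 66 examples mentioned in the card, assuming the ratings are stored as numbers.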

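Examples

Image links are provided for the three "worst" examples.
An image link is provided for one highly upvoted but poorly rated example pair.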
