
Ba2han/Reddit-instruct-curated_rated-1.2k

Hugging Face · Updated 2024-02-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Ba2han/Reddit-instruct-curated_rated-1.2k
Description:
---
license: mit
language:
- en
size_categories:
- 1K<n<10K
---

This is an LLM rated version of **euclaise/reddit-instruct-curated**, which is already a good dataset imo. Only **post titles** and **comment texts** were rated as post texts can be confusing due to edits and seemingly out of context information.

First, **I filtered examples with <250 comment score**. Of course this is not a very efficient filtering as some pairs might have references to other comments or simply be unhelpful, yet upvoted due to Reddit hivemind.

Next I sent the example pairs with a rating prompt to Senku-Q2-XS and collected the numeric votes **(out of 10)**.

Overall there aren't many low rated examples. Here are three "worst" examples:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/lxj7BGeJXqgRwtx3UoPlU.png)

There are only 66 examples with <6 rate.

An example of highly upvoted but poorly rated pair:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/u6wsjzeHNnN4OGPWplyXe.png)

**Let me know if I fucked up anything, I still have no idea what I am doing honestly.**
Provided by:
Ba2han
Summary of original information

Dataset overview

Dataset source

  • This dataset is an LLM-rated version of euclaise/reddit-instruct-curated.

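Data content

  • Only post titles and comment texts were rated, because post texts can be confusing due to edits and seemingly out-of-context information.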

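Data filtering

  • First, examples with a comment score below 250 were filtered out.
  • The remaining example pairs were then sent with a rating prompt to Senku-Q2-XS, and the numeric votes (out of 10) were collected.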
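
The two filtering steps above can be sketched roughly as follows. This is only an illustration, not the author's actual script: the field names `comment_score`, `post_title` and `comment_text`, the prompt wording, and the helper names are all assumptions, and the call to the rating model itself is left out. It also assumes that examples scoring below 250 are the ones dropped.

```python
import re

MIN_COMMENT_SCORE = 250  # threshold stated in the dataset card


def filter_by_score(examples, min_score=MIN_COMMENT_SCORE):
    """Keep only examples whose comment score is at least min_score."""
    return [ex for ex in examples if ex["comment_score"] >= min_score]


def build_rating_prompt(ex):
    """Build a rating prompt from the post title and comment text only.

    Hypothetical wording: the real prompt is not given in the card.
    """
    return (
        "Rate the following answer out of 10. Reply with a single number.\n\n"
        f"Question: {ex['post_title']}\n"
        f"Answer: {ex['comment_text']}\n"
        "Rating:"
    )


def parse_rating(reply):
    """Return the first standalone integer from 0 to 10 in a reply, or None."""
    match = re.search(r"\b(10|[0-9])\b", reply)
    return int(match.group(1)) if match else None
```

Each prompt would be sent to the rating model (Senku-Q2-XS in the card), and the reply passed through `parse_rating` to obtain the numeric vote.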

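Data characteristics

  • Overall, there are not many low-rated examples.
  • Only 66 examples have a rating below 6.
  • Some pairs are highly upvoted but poorly rated.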
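
As a rough illustration of how the observations above could be checked on the rated data, the sketch below counts examples rated below 6 and lists the poorly rated pairs with the highest comment scores. The field names `rating` and `comment_score` are hypothetical; the card does not specify the dataset's column names.

```python
LOW_RATING_CUTOFF = 6  # the card counts examples rated below 6


def low_rated(examples, cutoff=LOW_RATING_CUTOFF):
    """Return examples whose LLM rating is strictly below the cutoff."""
    return [ex for ex in examples if ex["rating"] < cutoff]


def upvoted_but_poorly_rated(examples, cutoff=LOW_RATING_CUTOFF, top_n=3):
    """Return up to top_n low-rated examples, highest comment score first."""
    ranked = sorted(
        low_rated(examples, cutoff),
        key=lambda ex: ex["comment_score"],
        reverse=True,
    )
    return ranked[:top_n]
```

Calling `len(low_rated(rows))` on the full set of rated rows would give the count of low-rated examples that the card reports as 66.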

Examples

  • An image link shows the three "worst" examples.
  • An image link shows one highly upvoted but poorly rated pair.