winddude/reddit_finance_43_250k
Source: Hugging Face. Updated: 2023-05-25. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/winddude/reddit_finance_43_250k
Resource description:
---
license: gpl-3.0
language:
- en
tags:
- finance
- investing
- crypto
- reddit
---
# reddit finance 43 250k
`reddit_finance_43_250k` is a collection of 250k post/comment pairs from 43 financial, investing, and crypto subreddits. Every post had to be a text post with a length of 250 characters and a positive score. Each subreddit is narrowed down to the 70th quantile before its posts are merged with their top 3 comments and then with the other subreddits. Further score-based methods are used to select the top 250k post/comment pairs.
The code to recreate the dataset is here: <https://github.com/getorca/ProfitsBot_V0_OLLM/tree/main/ds_builder>
The trained LoRA model is here: <https://huggingface.co/winddude/pb_lora_7b_v0.1>
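Provider:
winddude
Summary of original information
Dataset overview
Dataset name
reddit_finance_43_250k
Dataset contents
The dataset contains 250,000 Reddit post/comment pairs from 43 finance, investing, and crypto subreddits. Every post is a text post with a length of 250 characters and a positive score.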
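Dataset construction method
- First, the posts of each subreddit are narrowed down to the 70th quantile.
- Then, each remaining post is merged with its top 3 comments.
- Finally, score-based methods select the top 250,000 post/comment pairs across all subreddits.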
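The three steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code from the `ds_builder` repository: the field names (`id`, `subreddit`, `text`, `score`, `post_id`), the helper names `quantile_cutoff` and `build_pairs`, the reading of 250 characters as a minimum length, the choice to keep posts at or above the 70th-quantile score, and the final ranking key (post score plus comment score) are all assumptions made for the example.

```python
import math
from collections import defaultdict

MIN_CHARS = 250    # assumption: treated here as a minimum post length
QUANTILE = 0.70    # assumption: keep posts scoring at or above this quantile
TOP_COMMENTS = 3


def quantile_cutoff(scores, q):
    """Nearest-rank quantile: smallest score with at least a fraction q of scores at or below it."""
    ordered = sorted(scores)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def build_pairs(posts, comments, limit=250_000):
    # Base filter: posts with enough text and a positive score.
    eligible = [p for p in posts if len(p["text"]) >= MIN_CHARS and p["score"] > 0]

    # Step 1: apply the quantile filter separately within each subreddit.
    by_sub = defaultdict(list)
    for p in eligible:
        by_sub[p["subreddit"]].append(p)
    kept = []
    for sub_posts in by_sub.values():
        cutoff = quantile_cutoff([p["score"] for p in sub_posts], QUANTILE)
        kept.extend(p for p in sub_posts if p["score"] >= cutoff)

    # Step 2: pair each kept post with its highest-scoring comments.
    by_post = defaultdict(list)
    for c in comments:
        by_post[c["post_id"]].append(c)
    pairs = []
    for p in kept:
        best = sorted(by_post[p["id"]], key=lambda c: c["score"], reverse=True)
        pairs.extend({"post": p, "comment": c} for c in best[:TOP_COMMENTS])

    # Step 3: rank pairs across all subreddits and keep the top `limit`.
    # Assumption: the ranking key is post score plus comment score.
    pairs.sort(key=lambda pc: pc["post"]["score"] + pc["comment"]["score"], reverse=True)
    return pairs[:limit]
```

Calling `build_pairs(posts, comments)` on lists of dictionaries with the fields named above returns a list of `{"post": ..., "comment": ...}` dictionaries. Filtering per subreddit before merging keeps large subreddits from crowding out small ones at the quantile step, while the final cut is global.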
Dataset language
English
Dataset tags
- finance
- investing
- crypto
Dataset license
GPL-3.0



