
argilla-warehouse/personahub-fineweb-edu-4-raw

Source: Hugging Face. Updated: 2024-09-10. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/argilla-warehouse/personahub-fineweb-edu-4-raw
Description:
This dataset was created with distilabel by applying the proj-persona/PersonaHub pipeline to a subset of the HuggingFaceFW/fineweb-edu dataset. Only rows with an educational content score of 4 or higher (those with the highest educational content) were kept, which yields 22532926 rows. Each row has three features: id, persona, and model_name. The persona is generated by a dedicated task class, TextToPersona, which analyzes a text and assigns the general type of persona associated with its content and manner of expression. The dataset consists of a single training split; its size is 5812272702 bytes, with a download size of 2744862873 bytes. Its main use is generating personas from text content, for applications such as educational content analysis.
Provider:
argilla-warehouse
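
The structure described above (three features, one training split) can be checked with a short Python sketch. It assumes the Hugging Face `datasets` library is installed and that the training split is named "train"; neither is stated on this page. The helper names `stream_rows`, `has_expected_features`, and `bytes_per_row` are hypothetical, introduced only for illustration.

```python
from itertools import islice
from typing import Iterator

# Dataset id taken from the download link above.
DATASET_ID = "argilla-warehouse/personahub-fineweb-edu-4-raw"
# Feature names as listed in the description.
EXPECTED_FEATURES = ("id", "persona", "model_name")


def stream_rows(limit: int = 5) -> Iterator[dict]:
    """Yield the first `limit` rows without downloading the full dataset."""
    # Imported lazily so the pure helpers below work without the library.
    from datasets import load_dataset

    # Assumption: the single split is named "train".
    ds = load_dataset(DATASET_ID, split="train", streaming=True)
    yield from islice(ds, limit)


def has_expected_features(row: dict) -> bool:
    """Return True if the row carries all three documented features."""
    return all(name in row for name in EXPECTED_FEATURES)


def bytes_per_row(total_bytes: int, num_rows: int) -> float:
    """Average size of one row, from the figures in the description."""
    return total_bytes / num_rows
```

With network access, `for row in stream_rows(3): print(has_expected_features(row), row["persona"])` would print a few personas. Streaming is used here because the full download is roughly 2.7 GB; `bytes_per_row(5812272702, 22532926)` gives the average uncompressed row size implied by the listed figures.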