argilla-warehouse/personahub-fineweb-edu-4-raw
Source: Hugging Face. Updated: 2024-09-10. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/argilla-warehouse/personahub-fineweb-edu-4-raw
Description:
This dataset was created with distilabel by applying the proj-persona/PersonaHub pipeline to a subset of the HuggingFaceFW/fineweb-edu dataset. Only rows with an educational-content score of 4 or higher (the highest-scoring content) were kept, which yields 22532926 rows. Each row has three features: id, persona, and model_name, where persona is a user persona generated by analyzing the text content. The data is provided as a single training split; the dataset size is 5812272702 bytes and the download size is 2744862873 bytes. The personas were produced with a dedicated task class, TextToPersona, which analyzes each text and assigns a general type of persona associated with the way the text expresses itself. The dataset is mainly intended for generating personas from text and is suited to areas such as educational content analysis.
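The score filter and the size figures in the description can be illustrated with a minimal Python sketch. It is not the pipeline that built the dataset: the `keep_high_scoring` helper, the sample rows, and the assumption that each source row carries a numeric `score` field are hypothetical and serve only to show the `>= 4` selection. The three constants are the figures quoted in the description.

```python
from typing import Iterable, Iterator

# Figures quoted in the dataset description.
NUM_ROWS = 22_532_926
DATASET_BYTES = 5_812_272_702
DOWNLOAD_BYTES = 2_744_862_873

SCORE_THRESHOLD = 4  # rows with a score of 4 or higher are kept


def keep_high_scoring(
    rows: Iterable[dict], threshold: float = SCORE_THRESHOLD
) -> Iterator[dict]:
    """Yield only the rows whose educational score meets the threshold.

    Assumption: each row is a dict with a numeric "score" key.
    """
    for row in rows:
        if row["score"] >= threshold:
            yield row


# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    {"id": "doc-a", "text": "Photosynthesis converts light into chemical energy.", "score": 4.2},
    {"id": "doc-b", "text": "Buy one, get one free this weekend!", "score": 1.1},
    {"id": "doc-c", "text": "A worked proof of the Pythagorean theorem.", "score": 4.0},
]

# Keeps doc-a and doc-c; doc-b falls below the threshold.
kept_ids = [row["id"] for row in keep_high_scoring(sample_rows)]

# Derived figures: average stored bytes per row, and download size
# as a fraction of the full dataset size.
avg_bytes_per_row = DATASET_BYTES / NUM_ROWS
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(kept_ids)
print(f"{avg_bytes_per_row:.1f} bytes per row on average")
print(f"download is {download_ratio:.1%} of the dataset size")
```

From the quoted figures, the average works out to roughly 258 bytes per row, and the download is a little under half of the full dataset size.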
Provider:
argilla-warehouse



