Menlo/Ichigo-pretrain-tokenized-v0.1
Source: Hugging Face. Updated: 2024-12-28. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Menlo/Ichigo-pretrain-tokenized-v0.1
Summary:
The dataset has four configurations: instruction_speech_v1, libris_r_filtered, mls_eng_10k, and vivoice. Each configuration has the same features: transcript (the transcribed text), prompt, compress_prompt (a compressed form of the prompt), and conversations, a nested list whose entries each carry a content field and a role field. For each configuration the metadata also records the train split, its size in bytes, its number of examples, the download size, and the dataset size.
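As a rough illustration of the schema described above, the following Python sketch lists the four configurations and checks that a record has the expected fields. It rests on assumptions not stated in the listing: that the dataset is read with the Hugging Face `datasets` library, and that each row exposes conversations as a list of dicts with content and role keys. The helper names `check_record` and `load_config` are hypothetical and chosen only for this example.

```python
# Configuration and field names are taken from the summary above.
CONFIGS = ("instruction_speech_v1", "libris_r_filtered", "mls_eng_10k", "vivoice")
TOP_LEVEL_FIELDS = ("transcript", "prompt", "compress_prompt", "conversations")
REPO_ID = "Menlo/Ichigo-pretrain-tokenized-v0.1"


def check_record(record):
    """Return True if a record has the fields described in the summary.

    Assumption: conversations is a list of dicts, each with
    'content' and 'role' keys.
    """
    if any(field not in record for field in TOP_LEVEL_FIELDS):
        return False
    return all(
        isinstance(turn, dict) and "content" in turn and "role" in turn
        for turn in record["conversations"]
    )


def load_config(name, streaming=True):
    """Load the train split of one configuration (hypothetical helper).

    Requires the `datasets` package and network access. Streaming is
    used by default so nothing is downloaded up front.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so check_record works without the package installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

With network access, `next(iter(load_config("vivoice")))` should yield one record that can be passed to `check_record`. If the check fails, the conversations field is probably laid out differently from what is assumed here.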
Provider:
Menlo



