homebrewltd/Ichigo-pretrain-tokenized-v0.1
Source: Hugging Face | Updated: 2024-12-28 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/homebrewltd/Ichigo-pretrain-tokenized-v0.1
Description:
The dataset consists of four sub-datasets: instruction_speech_v1, libris_r_filtered, mls_eng_10k, and vivoice. Every sub-dataset has the same features: transcript, prompt, compress_prompt (a compressed version of the prompt), and conversations, a list of turns that each carry content and role fields. Each sub-dataset provides a train split, documented with its size in bytes and its number of examples, and the card also lists the download size and total size of each sub-dataset.
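The record layout described above can be checked with a small sketch. It assumes that the config names accepted by the Hugging Face `datasets` library match the four sub-dataset names, and that transcript, prompt, and compress_prompt are plain strings while each conversations turn is a dict with content and role keys. The card names these fields but does not state their types. The helper names `validate_record` and `load_subset` are hypothetical and used only for illustration.

```python
from typing import Any

# Sub-dataset names as listed on the card; assumed to double as config names.
SUBSETS = ("instruction_speech_v1", "libris_r_filtered", "mls_eng_10k", "vivoice")

# Top-level fields named on the card; treating them as str is an assumption.
STRING_FIELDS = ("transcript", "prompt", "compress_prompt")


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return the schema problems found in one example (empty list if none)."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected str")
    turns = record.get("conversations")
    if not isinstance(turns, list):
        problems.append("conversations: expected list")
        return problems
    # Each turn is expected to carry both a content and a role field.
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict) or "content" not in turn or "role" not in turn:
            problems.append(f"conversations[{i}]: expected content and role")
    return problems


def load_subset(name: str):
    """Load the train split of one sub-dataset (needs network access and `datasets`)."""
    if name not in SUBSETS:
        raise ValueError(f"unknown sub-dataset: {name}")
    # Imported lazily so the validator above works without the library installed.
    from datasets import load_dataset
    return load_dataset("homebrewltd/Ichigo-pretrain-tokenized-v0.1", name, split="train")
```

For example, `validate_record(load_subset("libris_r_filtered")[0])` should return an empty list if the assumptions about the field types hold. Running the validator on a handful of rows from each sub-dataset is a cheap way to confirm this before training.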
Provided by:
homebrewltd



