Shubhangi7/orpheus-HiEn-data-tokenized
Source: Hugging Face. Updated: 2025-11-10. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/Shubhangi7/orpheus-HiEn-data-tokenized
Description:
The dataset has three feature fields: input_ids, labels, and attention_mask, each stored as a sequence. It provides a single training split of 7326 examples, totaling 67646663 bytes (about 67.6 MB). In the default configuration, the training data files are located through a path pattern.
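As an illustration of how the three fields usually relate in a tokenized language-modeling record, here is a minimal sketch. It is not taken from the dataset's own preprocessing: the helper name make_record, the padding token id 0, the fixed max_len, and the use of -100 as the ignored label value are all assumptions chosen for the example.

```python
from typing import Dict, List

# Assumption: -100 is the label value conventionally skipped by
# PyTorch-style cross-entropy losses; the dataset may use something else.
IGNORE_INDEX = -100


def make_record(token_ids: List[int], max_len: int, pad_id: int = 0) -> Dict[str, List[int]]:
    """Build one hypothetical record with the three fields named in the description.

    Tokens are truncated to max_len, then right-padded with pad_id.
    """
    ids = token_ids[:max_len]
    n_pad = max_len - len(ids)
    return {
        # Token ids fed to the model, padded to a fixed length.
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding.
        "attention_mask": [1] * len(ids) + [0] * n_pad,
        # Training targets; padded positions are excluded from the loss.
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }


def is_consistent(record: Dict[str, List[int]]) -> bool:
    """Check that all three sequences in a record have the same length."""
    lengths = {len(record[k]) for k in ("input_ids", "labels", "attention_mask")}
    return len(lengths) == 1


record = make_record([5, 6, 7], max_len=5)
print(record)
print(is_consistent(record))
```

To inspect the real records instead, the Hugging Face datasets library's load_dataset("Shubhangi7/orpheus-HiEn-data-tokenized", split="train") call should return the training split, assuming the repository is publicly accessible; is_consistent can then be applied to each row to verify the layout.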
Provider:
Shubhangi7



