patrykbart/codeparrot-clean-no-comments-starencoder-small
Source: Hugging Face. Updated: 2025-01-03. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/patrykbart/codeparrot-clean-no-comments-starencoder-small
Description:
The dataset contains text data for training. Each example has four features: input_ids (the index representation of the text), attention_mask (the attention mask), depths (depth information), and sibling_idxs (sibling node indices). It has a single training split with 5,361,365 examples and a total file size of 57,731,178,320 bytes (about 57.7 GB). A default configuration is provided, along with the path to the training data files.
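As a rough sanity check on the figures above, the following Python sketch divides the stated total size by the stated example count, and shows one way the training split might be streamed rather than downloaded in full. The helper names `bytes_per_example` and `stream_train_split` are hypothetical and introduced only for illustration. The streaming helper assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID given in the title.

```python
# Figures copied from the description above.
REPO_ID = "patrykbart/codeparrot-clean-no-comments-starencoder-small"
NUM_EXAMPLES = 5_361_365
TOTAL_BYTES = 57_731_178_320


def bytes_per_example(total_bytes=TOTAL_BYTES, num_examples=NUM_EXAMPLES):
    """Return (quotient, remainder) of total size divided by example count."""
    return divmod(total_bytes, num_examples)


def stream_train_split(repo_id=REPO_ID):
    """Hypothetical helper: open the train split lazily instead of downloading ~57.7 GB.

    Assumes the `datasets` library is installed and network access is available.
    The import is kept local so the arithmetic above runs without the library.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


if __name__ == "__main__":
    quotient, remainder = bytes_per_example()
    print(f"{quotient} bytes per example, remainder {remainder}")
```

The division comes out exact, at 10,768 bytes per example with no remainder. That would be consistent with every example being stored at the same fixed length, although the listing itself does not say so. Iterating over the object returned by `stream_train_split()` should yield dictionaries keyed by the four feature names, assuming the schema matches the description.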
Provided by:
patrykbart



