atharvanighot/hindi-verified-tokenized
Source: Hugging Face. Updated 2024-12-04. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/atharvanighot/hindi-verified-tokenized
Description:
This dataset is a re-upload of the ai4bharat/sangraha dataset, specifically 1.9 million rows of verified Hindi data. The text has been tokenized with the Hindi tokenizer atharvanighot/hindi-tokenizer, so the dataset is pretokenized and can be used for training directly.
Provider:
atharvanighot
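
Because the description says the data is pretokenized and ready for training, a common next step is to pack the token ID sequences into fixed-length blocks. The following is a minimal sketch of that step, not the author's pipeline. The listing does not state the dataset's column names or special token IDs, so the function name `pack_token_ids`, the `eos_id` value, and the synthetic rows are all illustrative assumptions. In practice the rows would come from the dataset itself, for example via the Hugging Face `datasets` library.

```python
from typing import Iterable, Iterator, List


def pack_token_ids(
    rows: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Concatenate tokenized rows, separated by eos_id, into fixed-size blocks.

    Hypothetical helper: eos_id must match whatever end-of-sequence ID
    the tokenizer actually uses, which this listing does not specify.
    """
    buffer: List[int] = []
    for ids in rows:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the boundary between rows
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any leftover tokens shorter than block_size are dropped.


# Synthetic stand-in for pretokenized rows (assumed shape: one list of IDs per row).
sample_rows = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
blocks = list(pack_token_ids(sample_rows, block_size=4, eos_id=0))
print(blocks)
```

Packing avoids padding and keeps every training example the same length, at the cost of letting a block span two source rows.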



