sartifyllc/clean_loal_pretrain
Source: Hugging Face. Updated 2024-10-22. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/sartifyllc/clean_loal_pretrain
Resource description:
---
language:
- sw
tags:
- dataset
- swahili
license: mit
---
# Dataset Card for sartifyllc/clean_loal_pretrain
## Dataset Description
This is a Swahili text dataset for continued pretraining (CPT) of language models.
## Dataset Usage
This dataset can be used for [describe potential uses].
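As a rough illustration of how the data might be pulled in for a pretraining pipeline, here is a minimal sketch using the Hugging Face `datasets` library. The `train` split name and the helper names `load_swahili_corpus` and `iter_texts` are assumptions for illustration and are not stated in this card; the `text` column name is taken from the Features list.

```python
DATASET_ID = "sartifyllc/clean_loal_pretrain"  # repository ID from this card


def load_swahili_corpus(split="train"):
    """Load the dataset from the Hugging Face Hub.

    The split name "train" is an assumption; check the repository
    if loading fails. Requires `pip install datasets` and network access.
    """
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def iter_texts(dataset, column="text"):
    """Yield the raw text of each row (hypothetical helper).

    Works on any iterable of dict-like rows that have the given column.
    """
    for row in dataset:
        yield row[column]
```

Typical use would be `for doc in iter_texts(load_swahili_corpus()): ...`, feeding each document to a tokenizer.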
## Dataset Structure
### Features:
- text: 3,017 rows
- token: 0.003997569B tokens so far (about 4.0 million)
- token per row: 1325.0145840238647 tokens per row on average so far
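The three figures above are mutually consistent, which can be checked with a short calculation. This is only a sanity check on the numbers reported in the card; the variable names are illustrative.

```python
# Figures as reported in the Features list of this card.
ROWS = 3017
TOKENS_BILLIONS = 0.003997569
TOKENS_PER_ROW = 1325.0145840238647

# Convert the token count from billions to an integer count.
total_tokens = round(TOKENS_BILLIONS * 1e9)

# Average tokens per row derived from the two totals.
derived_tokens_per_row = total_tokens / ROWS

print(total_tokens)             # 3997569
print(derived_tokens_per_row)   # agrees with TOKENS_PER_ROW to within rounding
```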
## Dataset Creation
### Source Data
- Homepage: https://huggingface.co/datasets/sartifyllc/clean_loal_pretrain
### Data Preprocessing
[Describe any preprocessing steps here]
## Considerations for Using the Data
[Add any considerations, such as ethical concerns or potential biases]
## Additional Information
[Any other relevant information about the dataset]
Provider:
sartifyllc



