SPEAK-PP/openslr-sinhala-spelling-correction-prediction-reference-60000-modified-v1
Source: Hugging Face. Updated: 2026-03-21. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/SPEAK-PP/openslr-sinhala-spelling-correction-prediction-reference-60000-modified-v1
Resource description:
---
dataset_info:
  features:
  - name: dyslexic_sentence
    dtype: string
  - name: clean_sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 21155146
    num_examples: 60417
  - name: eval
    num_bytes: 2642681
    num_examples: 7676
  download_size: 10995742
  dataset_size: 23797827
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: eval
    path: data/eval-*
---
# OpenSLR Sinhala Spelling Correction Dataset (Modified v1)
## Dataset Description
This dataset contains pairs of misspelled and correctly spelled Sinhala sentences for spelling-correction tasks. It is derived from the OpenSLR Sinhala dataset. The card describes it as 60,000 examples of Sinhala spelling-correction patterns; the metadata lists 60,417 training pairs and 7,676 evaluation pairs.
## Dataset Structure
The dataset contains the following columns:
- **dyslexic_sentence**: Misspelled or incorrectly written Sinhala sentence
- **clean_sentence**: Correctly spelled and formatted Sinhala sentence
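The two columns map directly onto source/target pairs. The following is a minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID in the title. The names `to_pairs` and `load_split` are hypothetical helpers introduced here for illustration; only the column names and split names come from this card.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Repository ID taken from the title of this card.
REPO_ID = (
    "SPEAK-PP/openslr-sinhala-spelling-correction-"
    "prediction-reference-60000-modified-v1"
)


def to_pairs(rows: Iterable[Mapping[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (misspelled, corrected) pairs, skipping rows with an empty field."""
    for row in rows:
        src = (row.get("dyslexic_sentence") or "").strip()
        tgt = (row.get("clean_sentence") or "").strip()
        if src and tgt:
            yield src, tgt


def load_split(split: str = "train"):
    """Load one split from the Hub (requires network access and `datasets`)."""
    # Imported lazily so that to_pairs can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

With network access, `pairs = list(to_pairs(load_split("eval")))` would give the evaluation pairs as plain tuples. Dropping rows with an empty field is a choice made for this sketch, not something the card specifies.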
## Dataset Size
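- **Total Examples**: 60,417 training and 7,676 evaluation sentence pairs according to the metadata (nominally 60,000)
- **Format**: CSV (comma-separated values) with two columns; the hosted splits are listed as `data/train-*` and `data/eval-*`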
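The figures in the `dataset_info` metadata can be cross-checked with a little arithmetic. The sketch below copies the numbers from the metadata block of this card and confirms that the per-split byte counts add up to `dataset_size`; the variable names are illustrative.

```python
# Figures copied from the dataset_info metadata of this card.
SPLITS = {
    "train": {"num_bytes": 21155146, "num_examples": 60417},
    "eval": {"num_bytes": 2642681, "num_examples": 7676},
}
DATASET_SIZE = 23797827

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
eval_fraction = SPLITS["eval"]["num_examples"] / total_examples

# The split sizes should add up to the reported dataset size.
assert total_bytes == DATASET_SIZE

print(total_examples)           # 68093
print(round(eval_fraction, 3))  # 0.113
```

So the evaluation split holds roughly 11% of all pairs.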
## Use Cases
This dataset can be used for:
- Sinhala spelling correction model training
- Language model fine-tuning
- Sequence-to-sequence (seq2seq) model development
- Text normalization and cleaning
- Grammar correction research
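For the spelling-correction use case, one common way to score a model is character error rate (CER): the edit distance between a prediction and its reference, divided by the reference length. The card does not prescribe a metric, so the following is only a sketch in plain Python. It counts Unicode code points, which means a Sinhala base letter and its vowel signs are treated as separate characters; counting grapheme clusters instead would need an additional library. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) over Unicode code points."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,        # deletion
                    curr[j - 1] + 1,    # insertion
                    prev[j - 1] + cost  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate of a prediction against a reference sentence."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return levenshtein(prediction, reference) / len(reference)
```

Computing `cer(dyslexic_sentence, clean_sentence)` over the evaluation split gives a do-nothing baseline: a useful correction model should score lower than the uncorrected input.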
## Language
- **Language**: Sinhala (සිංහල)
- **Script**: Sinhala script (Unicode)
## License
Refer to the original OpenSLR dataset license terms.
## Citation
If you use this dataset, please cite the original OpenSLR project and this modified version:
```
@dataset{speakpp_openslr_sinhala_spelling_2026,
title={OpenSLR Sinhala Spelling Correction Dataset (Modified v1)},
organization={SPEAK-PP},
year={2026}
}
```
## Data Format Example
| dyslexic_sentence | clean_sentence |
|---|---|
| එන වැඩත් එක්ක වංක දේශපාලකයෙකුට ඉතුවෙලා ඉන්නෙ නිච්චර උනාත්මක බවෙන් වැඩි නිර්මාණයක් බෙබ් අඩ විවලට යොමු | වෙන වැඩත් එක්ක අවංක දේශපාලකයෙකුට ඉතුරුවෙලා ඉන්නේ මෙච්චරයි ගුණාත්මක බවින් වැඩි නිර්මාණයක් වෙබ් අඩවිවලට යොමු |
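
To see which words differ between the two columns of a row like the one above, a word-level diff is often enough. This sketch uses `difflib.SequenceMatcher` from the Python standard library on whitespace-separated tokens; `word_diff` is a hypothetical helper name, and whitespace tokenization is an assumption that may not suit every Sinhala text.

```python
import difflib
from typing import List, Tuple


def word_diff(noisy: str, clean: str) -> List[Tuple[str, str, str]]:
    """Return (operation, noisy_words, clean_words) for each differing span."""
    a, b = noisy.split(), clean.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / insert / delete spans
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

Calling `word_diff(row["dyslexic_sentence"], row["clean_sentence"])` on a row lists the misspelled spans next to their corrections, which is handy for spot-checking the data before training.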

Provider: SPEAK-PP



