
SPEAK-PP/openslr-sinhala-spelling-correction-prediction-reference-60000-modified-v1

Hugging Face | Updated: 2026-03-21 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/SPEAK-PP/openslr-sinhala-spelling-correction-prediction-reference-60000-modified-v1
Resource description:
---
dataset_info:
  features:
  - name: dyslexic_sentence
    dtype: string
  - name: clean_sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 21155146
    num_examples: 60417
  - name: eval
    num_bytes: 2642681
    num_examples: 7676
  download_size: 10995742
  dataset_size: 23797827
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: eval
    path: data/eval-*
---

# OpenSLR Sinhala Spelling Correction Dataset (Modified v1)

## Dataset Description

This dataset contains pairs of misspelled and correctly spelled Sinhala sentences for spelling correction tasks. It is derived from the OpenSLR Sinhala dataset and includes roughly 60,000 examples of spelling correction patterns in the Sinhala language.

## Dataset Structure

The dataset contains the following columns:

- **dyslexic_sentence**: Misspelled or incorrectly written Sinhala sentence
- **clean_sentence**: Correctly spelled and formatted Sinhala sentence

## Dataset Size

- **Total Examples**: about 60,000 sentence pairs (60,417 in the `train` split and 7,676 in the `eval` split, per the metadata above)
- **Format**: CSV (Comma-Separated Values)

## Use Cases

This dataset can be used for:

- Sinhala spelling correction model training
- Language model fine-tuning
- Sequence-to-sequence (seq2seq) model development
- Text normalization and cleaning
- Grammar correction research

## Language

- **Language**: Sinhala (සිංහල)
- **Script**: Sinhala script (Unicode)

## License

Refer to the original OpenSLR dataset license terms.

## Citation

If you use this dataset, please cite the original OpenSLR project and this modified version:

```
@dataset{
  title={OpenSLR Sinhala Spelling Correction Dataset (Modified v1)},
  organization={SPEAK-PP},
  year={2026}
}
```

## Data Format Example

| dyslexic_sentence | clean_sentence |
|---|---|
| එන වැඩත් එක්ක වංක දේශපාලකයෙකුට ඉතුවෙලා ඉන්නෙ නිච්චර උනාත්මක බවෙන් වැඩි නිර්මාණයක් බෙබ් අඩ විවලට යොමු | වෙන වැඩත් එක්ක අවංක දේශපාලකයෙකුට ඉතුරුවෙලා ඉන්නේ මෙච්චරයි ගුණාත්මක බවින් වැඩි නිර්මාණයක් වෙබ් අඩවිවලට යොමු |
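
## Usage Sketch

The card does not include loading code, so the following is only a minimal sketch of how the two columns might be used together. It assumes the third-party Hugging Face `datasets` library is installed and that the repository ID and split names (`train`, `eval`) match the metadata above. The helper names `load_pairs` and `word_edits` are hypothetical and are not part of the dataset or of any library. `word_edits` uses Python's standard `difflib` to list the whitespace-separated words that differ between a noisy sentence and its clean counterpart, which can help when inspecting what kinds of corrections a row contains.

```python
import difflib

# Repository ID as it appears in the listing above.
REPO_ID = "SPEAK-PP/openslr-sinhala-spelling-correction-prediction-reference-60000-modified-v1"


def load_pairs(split="train"):
    """Load one split ("train" or "eval") from the Hugging Face Hub.

    Assumption: the `datasets` package is installed and network access is
    available. The import is lazy so the rest of this module works without it.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def word_edits(dyslexic, clean):
    """Return (tag, noisy_words, clean_words) for each span of differing words.

    `tag` is one of difflib's opcode tags: "replace", "delete", or "insert".
    Sentences are split on whitespace; unchanged spans are skipped.
    """
    src, tgt = dyslexic.split(), clean.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (tag, src[i1:i2], tgt[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# First four words of each column from the example row above.
sample_edits = word_edits("එන වැඩත් එක්ක වංක", "වෙන වැඩත් එක්ක අවංක")
```

With a loaded split, the same helper could be applied per row, for example `word_edits(row["dyslexic_sentence"], row["clean_sentence"])`. Word-level alignment is a deliberate simplification: as the example row shows, some errors split or merge words, and those appear as multi-word `replace` spans instead of single-word substitutions.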
Provider:
SPEAK-PP