MahdiMaaref/PersianToEnglishDataset-1M
Source: Hugging Face. Updated 2025-12-06; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/MahdiMaaref/PersianToEnglishDataset-1M
Description:
---
license: mit
task_categories:
- translation
size_categories:
- 100K<n<1M
---
## 🔍 Dataset Source & Details
This cleaned dataset contains approximately **1 million records** curated from the **CCMatrix** parallel corpus, specifically from the Persian-English subset available at:
[https://opus.nlpl.eu/CCMatrix/en&fa/v1/CCMatrix](https://opus.nlpl.eu/CCMatrix/en&fa/v1/CCMatrix)
Each record in the dataset includes the following fields:
- **ID**: Unique identifier for the sentence pair
- **Score**: A translation confidence score between 0 and 100
- **fa_text**: The original Persian text
- **en_text**: The corresponding English translation
The dataset was filtered and cleaned in a multi-stage curation process aimed at high translation quality and alignment accuracy.
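As a rough illustration of how the fields listed above might be used, the sketch below keeps only sentence pairs whose `Score` meets a threshold. The field names follow this card, but their exact casing in the published files is an assumption. The sample rows and the cutoff of 90 are hypothetical, chosen only for demonstration.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def filter_by_score(records: Iterable[Record], min_score: float = 90.0) -> Iterator[Record]:
    """Yield records whose Score is at least min_score.

    Assumes each record is a dict with the fields named on this card:
    ID, Score, fa_text, en_text. The 90.0 default is an arbitrary
    illustrative cutoff, not a recommendation from the dataset author.
    """
    for rec in records:
        if float(rec["Score"]) >= min_score:
            yield rec


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"ID": 1, "Score": 97.5, "fa_text": "سلام دنیا", "en_text": "Hello world"},
    {"ID": 2, "Score": 62.0, "fa_text": "صبح بخیر", "en_text": "Good morning"},
]

kept = list(filter_by_score(sample, min_score=90.0))
print([rec["ID"] for rec in kept])
```

The same function should work on rows loaded with the Hugging Face `datasets` library, for example `load_dataset("MahdiMaaref/PersianToEnglishDataset-1M", split="train")`, since iterating such a dataset yields one dict per row. The split name `train` is an assumption here.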
## 🚀 Trained Model Repository
A translation model has been trained using this dataset and is available for use:
🔗 **Model Repository:** [https://github.com/Mahdi-Maaref/Persian-To-English-Translator](https://github.com/Mahdi-Maaref/Persian-To-English-Translator)
You can use this pre-trained model for Persian-English translation tasks, either through direct inference or by fine-tuning it for your specific applications.
Provider:
MahdiMaaref



