Cong123779/mtet-vi-en-bidirectional

Hugging Face · Updated 2026-02-23 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Cong123779/mtet-vi-en-bidirectional
Resource description:
---
language:
- vi
- en
license: mit
task_categories:
- translation
tags:
- vietnamese
- english
- parallel-corpus
- bidirectional
size_categories:
- 1M<n<10M
---

# MTET Vietnamese–English Bidirectional Dataset (Cleaned)

This is the cleaned, bidirectional version of the MTET (Multi-domain Translation for English and Vietnamese) Vietnamese–English parallel corpus, used to train the [vi-en-transformer-25m](https://huggingface.co/Cong123779/vi-en-transformer-25m) model.

## Dataset Description

- **Source:** MTET corpus (public Vietnamese–English parallel data)
- **Processing:** Cleaned with heuristic filters; bidirectional pairs added (both VI→EN and EN→VI)
- **Format:** CSV with columns `source` and `target`
- **Size:** ~2 GB (cleaned)

## Files

| File | Description |
|---|---|
| `mtet_bidirectional_cleaned.csv` | Main training data (bidirectional, cleaned) |

## Usage

```python
import pandas as pd

# Load the whole CSV (~2 GB) into memory; columns are `source` and `target`.
df = pd.read_csv("mtet_bidirectional_cleaned.csv")
print(df.head())
# Example rows (illustrative):
#   source     target
#   xin chào   hello
#   hello      xin chào
```

## License

The dataset follows the original MTET distribution license (MIT).
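The card states only that the corpus was "cleaned with heuristic filters" and that bidirectional pairs were added. It does not publish the filters or say whether direction tags were used. The following is a minimal, hypothetical sketch of that kind of pipeline. It assumes simple length and copy filters, and it builds EN→VI rows by swapping the two columns. The thresholds and the helper names `keep_pair`, `bidirectional`, and `write_csv` are illustrative assumptions, not the author's code.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple

# Hypothetical thresholds: the dataset card does not publish the real filters.
MIN_CHARS = 1
MAX_CHARS = 512
MAX_LEN_RATIO = 3.0


def keep_pair(vi: str, en: str) -> bool:
    """Return True if a (vi, en) pair passes some simple, assumed filters."""
    vi, en = vi.strip(), en.strip()
    # Drop empty or overly long sides.
    if not (MIN_CHARS <= len(vi) <= MAX_CHARS and MIN_CHARS <= len(en) <= MAX_CHARS):
        return False
    # Drop untranslated copies (source identical to target).
    if vi == en:
        return False
    # Drop pairs whose character lengths differ too much.
    ratio = max(len(vi), len(en)) / min(len(vi), len(en))
    return ratio <= MAX_LEN_RATIO


def bidirectional(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield each kept pair twice: once as VI->EN, then swapped as EN->VI."""
    for vi, en in pairs:
        if keep_pair(vi, en):
            vi, en = vi.strip(), en.strip()
            yield vi, en
            yield en, vi


def write_csv(rows: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write rows to a CSV with the card's `source` and `target` header."""
    writer = csv.writer(out)
    writer.writerow(["source", "target"])
    writer.writerows(rows)


if __name__ == "__main__":
    pairs = [("xin chào", "hello"), ("", "empty"), ("ok", "ok")]
    buf = io.StringIO()
    write_csv(bidirectional(pairs), buf)
    print(buf.getvalue())
```

In this sketch only the first pair survives filtering, and it becomes two rows, one per direction. Swapping columns doubles the row count, so each sentence appears once as input and once as output. Without an explicit direction tag, a model trained on such a file has to infer the translation direction from the language of the input.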
Provider:
Cong123779