Yokii2/jmdict-ja-en-sentence-pairs
Hugging Face. Updated 2026-03-15. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Yokii2/jmdict-ja-en-sentence-pairs
Resource description:
---
license: mit
task_categories:
- translation
language:
- ja
- en
size_categories:
- 1M<n<10M
---
# jmdict-ja-en-sentence-pairs
A Japanese–English sentence pair dataset with 3,416,308 entries, generated using vocabulary from [JMDict-eng 3.6.2](https://www.edrdg.org/wiki/index.php/JMDict-JMnedict_and_EDICT-J_to_E_Dictionary_Files) as the word source.
Each sentence was constructed around a specific vocabulary word drawn from JMDict, covering a wide range of styles and contexts: casual conversation, business, travel, narrative prose, and more.
## Dataset Structure
| Column | Description |
|---|---|
| `input` | Japanese sentence |
| `output` | English translation |
| `word_used` | The JMDict vocabulary word the sentence was built around |
| `word_id` | JMDict entry ID for the word |
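
As a rough illustration of how the four columns fit together, the sketch below maps rows to `ja`/`en` translation pairs and groups them by `word_id`. The sample rows are made up for illustration (they are not taken from the dataset, and the IDs are placeholders), the helper names are hypothetical, and storing `word_id` as a string is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Made-up rows for illustration only. They follow the documented columns,
# but the sentences are not from the dataset and the IDs are placeholders.
SAMPLE_ROWS = [
    {"input": "猫が窓のそばで寝ている。",
     "output": "The cat is sleeping by the window.",
     "word_used": "猫", "word_id": "0000001"},
    {"input": "うちの猫は魚が好きだ。",
     "output": "Our cat likes fish.",
     "word_used": "猫", "word_id": "0000001"},
    {"input": "駅まで歩きましょう。",
     "output": "Let's walk to the station.",
     "word_used": "駅", "word_id": "0000002"},
]


def to_translation_pair(row: Dict[str, str]) -> Dict[str, str]:
    """Map the `input`/`output` columns to language-keyed fields."""
    return {"ja": row["input"], "en": row["output"]}


def group_by_word(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Collect all translation pairs built around the same `word_id`."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        grouped[row["word_id"]].append(to_translation_pair(row))
    return dict(grouped)


if __name__ == "__main__":
    for word_id, pairs in group_by_word(SAMPLE_ROWS).items():
        print(word_id, len(pairs), pairs[0]["ja"])
```

With the Hugging Face `datasets` library, real rows could presumably be obtained with `load_dataset("Yokii2/jmdict-ja-en-sentence-pairs", split="train")` and passed to `group_by_word` in place of `SAMPLE_ROWS`; the `train` split name is an assumption.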
## Generation
- Vocabulary source: JMDict-eng 3.6.2
- ~134.4M tokens processed in total during generation (89.3M input / 45.1M output)
- Sentence styles vary across everyday conversation, formal speech, business, travel, emotion, narrative, and more
- Quotation mark consistency between Japanese and English was validated and corrected during generation
- Models used for generation: DeepSeek-V3.2, Gemini 3.1 Flash Lite, and Mistral-Large-3-675B-Instruct-2512
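
The card does not describe how quotation mark consistency was checked. The sketch below shows one way such a check could look, assuming "consistent" means both sides have balanced quotation marks and the same number of quoted spans. The function names and this definition are assumptions, not the author's actual procedure.

```python
from typing import Optional


def ja_quote_spans(text: str) -> Optional[int]:
    """Count quoted spans in Japanese text; None if brackets are unbalanced."""
    total = 0
    for open_q, close_q in (("「", "」"), ("『", "』")):
        if text.count(open_q) != text.count(close_q):
            return None
        total += text.count(open_q)
    return total


def en_quote_spans(text: str) -> Optional[int]:
    """Count double-quoted spans in English text; None if unbalanced.

    Only double quotes are considered, so apostrophes are ignored.
    """
    straight = text.count('"')
    curly_open = text.count("\u201c")
    curly_close = text.count("\u201d")
    if straight % 2 != 0 or curly_open != curly_close:
        return None
    return straight // 2 + curly_open


def quotes_consistent(ja: str, en: str) -> bool:
    """True when both sides are balanced and have the same span count."""
    ja_n = ja_quote_spans(ja)
    en_n = en_quote_spans(en)
    return ja_n is not None and en_n is not None and ja_n == en_n


if __name__ == "__main__":
    print(quotes_consistent("彼は「ありがとう」と言った。", 'He said, "Thank you."'))
    print(quotes_consistent("彼は「ありがとう」と言った。", "He said thank you."))
```

A filter like this could be applied to the `input` and `output` columns to flag pairs for review. It only compares counts and does not verify that the quoted content itself matches.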
## Citation
```bibtex
@misc{deepseekai2025deepseekv32,
title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
author={DeepSeek-AI},
year={2025}
}
```
Provider:
Yokii2



