
Yokii2/jmdict-ja-en-sentence-pairs

Source: Hugging Face. Updated: 2026-03-15. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Yokii2/jmdict-ja-en-sentence-pairs
Resource description:
---
license: mit
task_categories:
- translation
language:
- ja
- en
size_categories:
- 1M<n<10M
---

# jmdict-ja-en-sentence-pairs

A Japanese–English sentence pair dataset with 3,416,308 entries, generated using vocabulary from [JMDict-eng 3.6.2](https://www.edrdg.org/wiki/index.php/JMDict-JMnedict_and_EDICT-J_to_E_Dictionary_Files) as the word source.

Each sentence was constructed around a specific vocabulary word drawn from JMDict, covering a wide range of styles and contexts: casual conversation, business, travel, narrative prose, and more.

## Dataset Structure

| Column | Description |
|---|---|
| `input` | Japanese sentence |
| `output` | English translation |
| `word_used` | The JMDict vocabulary word the sentence was built around |
| `word_id` | JMDict entry ID for the word |

## Generation

- Vocabulary source: JMDict-eng 3.6.2
- ~134.4M tokens generated in total (89.3M input / 45.1M output)
- Sentence styles vary across everyday conversation, formal speech, business, travel, emotion, narrative, and more
- Quotation mark consistency between Japanese and English was validated and corrected during generation
- Models used for generation: DeepSeek-V3.2, Gemini 3.1 Flash Lite, and Mistral-Large-3-675B-Instruct-2512

## Citation

```bibtex
@misc{deepseekai2025deepseekv32,
  title={DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models},
  author={DeepSeek-AI},
  year={2025}
}
```
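
## Example: checking quotation mark consistency

The card says quotation mark consistency between Japanese and English was validated during generation, but it does not publish the check itself. The following is only a minimal sketch of what such a check might look like for rows with the `input` and `output` columns described above. The helper names (`count_ja_quotes`, `count_en_quotes`, `quotes_consistent`), the choice of quote characters, and the sample rows are all assumptions made for illustration, not the dataset author's method.

```python
from typing import Dict

# Assumption: only these Japanese bracket pairs are treated as quotation marks.
JA_QUOTE_PAIRS = [("「", "」"), ("『", "』")]


def count_ja_quotes(text: str) -> int:
    """Count complete Japanese quotation pairs in the text."""
    return sum(min(text.count(o), text.count(c)) for o, c in JA_QUOTE_PAIRS)


def count_en_quotes(text: str) -> int:
    """Count complete English double-quoted spans (straight or curly)."""
    straight = text.count('"') // 2
    curly = min(text.count("\u201c"), text.count("\u201d"))
    return straight + curly


def quotes_consistent(row: Dict[str, str]) -> bool:
    """Return True when both sides contain the same number of quoted spans."""
    return count_ja_quotes(row["input"]) == count_en_quotes(row["output"])


# Illustrative rows only; word_id is a placeholder, not a real JMDict entry ID.
good_row = {
    "input": "彼は「ありがとう」と言った。",
    "output": 'He said, "Thank you."',
    "word_used": "ありがとう",
    "word_id": "0000000",
}
bad_row = dict(good_row, output="He said thank you.")

print(quotes_consistent(good_row))
print(quotes_consistent(bad_row))
```

A count-based comparison like this is deliberately coarse: it flags a pair when the number of quoted spans differs, and it does not verify that the quoted content itself matches.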
Provider:
Yokii2