Korean - English Parallel Corpus
Source: www.kaggle.com | Updated: 2020-08-19 | Indexed: 2025-03-24
Download link:
https://www.kaggle.com/rareloto/naver-dictionary-conversation-of-the-day
Description:
### Context
As part of my Korean language learning hobby, I write and type out daily conversations from Naver Conversation of the Day.
After getting introduced to data science and machine learning, I wanted to use programming to facilitate my learning process by collecting data and trying out projects. So I scraped data from [Naver Dictionary](https://dict.naver.com) using a Python [script](https://github.com/rareloto/beginnerwebscraping-naverdictionary) to be used later when I train a bilingual AI study buddy chatbot or automate Anki [flashcards](https://quizlet.com/_8lh2dv?x=1jqt&i=224vll).
### Content
This is a corpus of paired Korean-English conversational parallel text extracted from [Naver Dictionary](https://dict.naver.com). The dataset consists of 4563 parallel text pairs from Naver's Conversation of the Day, covering December 4, 2017 to August 19, 2020.
The files and their headers are listed below.
* conversations.csv
  * date - 'Conversation of the Day' date
  * conversation_id - ordered numbering to indicate conversation flow
  * kor_sent - Korean sentence
  * eng_sent - English translation
  * qna_id - from sender or receiver, message or feedback
* conversation_titles.csv
  * date - 'Conversation of the Day' date
  * kor_title - 'Conversation of the Day' title in Korean
  * eng_title - English translation of the title
  * grammar - grammar of the day
  * grammar_desc - grammar description
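
As a rough illustration of how the two files fit together, the sketch below groups sentence pairs by `date`, orders them by `conversation_id`, and attaches each day's title and grammar point. It uses only the column names listed above. The function name `load_conversations`, the ISO date format, the integer `conversation_id` values, the `qna_id` values, and every sample row are assumptions made for the example, not taken from the actual files.

```python
import csv
import io
from collections import defaultdict


def load_conversations(conv_file, titles_file):
    """Group sentence pairs by date and attach each day's title (hypothetical helper)."""
    # One title row per date is assumed; a later duplicate would overwrite an earlier one.
    titles = {row["date"]: row for row in csv.DictReader(titles_file)}

    by_date = defaultdict(list)
    for row in csv.DictReader(conv_file):
        by_date[row["date"]].append(row)

    result = {}
    for date, rows in by_date.items():
        # Assumption: conversation_id is an integer giving the turn order.
        rows.sort(key=lambda r: int(r["conversation_id"]))
        title = titles.get(date, {})
        result[date] = {
            "eng_title": title.get("eng_title", ""),
            "grammar": title.get("grammar", ""),
            "pairs": [(r["kor_sent"], r["eng_sent"]) for r in rows],
        }
    return result


# Made-up sample rows in the documented column layout (not real dataset content).
SAMPLE_CONVERSATIONS = """date,conversation_id,kor_sent,eng_sent,qna_id
2020-08-19,2,감사합니다.,Thank you.,A
2020-08-19,1,안녕하세요.,Hello.,Q
"""

SAMPLE_TITLES = """date,kor_title,eng_title,grammar,grammar_desc
2020-08-19,인사,Greetings,sample grammar,sample description
"""

if __name__ == "__main__":
    data = load_conversations(
        io.StringIO(SAMPLE_CONVERSATIONS), io.StringIO(SAMPLE_TITLES)
    )
    for date, day in data.items():
        print(date, day["eng_title"])
        for kor, eng in day["pairs"]:
            print(" ", kor, "|", eng)
```

To run this against the downloaded files, the `io.StringIO` objects could be replaced with `open("conversations.csv", encoding="utf-8", newline="")` and the equivalent call for `conversation_titles.csv`, assuming the files are UTF-8 encoded.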
### Acknowledgements
The data was collected from [Naver Dictionary](https://dict.naver.com) and the conversations were from the Korean Language Institute of Yonsei University.
Provider:
www.kaggle.com



