
Korean - English Parallel Corpus

Source: www.kaggle.com | Updated: 2020-08-19 | Indexed: 2025-03-24
Download link:
https://www.kaggle.com/rareloto/naver-dictionary-conversation-of-the-day
Description:

### Context

As part of my Korean language learning hobby, I write and type out daily conversations from Naver Conversation of the Day. After being introduced to data science and machine learning, I wanted to use programming to support my learning by collecting data and trying out projects. So I scraped data from [Naver Dictionary](https://dict.naver.com) using a Python [script](https://github.com/rareloto/beginnerwebscraping-naverdictionary), to be used later when I train a bilingual AI study buddy chatbot or automate Anki [flashcards](https://quizlet.com/_8lh2dv?x=1jqt&i=224vll).

### Content

This is a corpus of Korean-English parallel text: paired conversation sentences extracted from [Naver Dictionary](https://dict.naver.com). The dataset consists of 4563 parallel text pairs from Naver's Conversation of the Day, covering December 4, 2017 to August 19, 2020. The files and their headers are listed below.

* conversations.csv
  * date - 'Conversation of the Day' date
  * conversation_id - ordered numbering to indicate conversation flow
  * kor_sent - Korean sentence
  * eng_sent - English translation
  * qna_id - from sender or receiver, message or feedback
* conversation_titles.csv
  * date - 'Conversation of the Day' date
  * kor_title - 'Conversation of the Day' title in Korean
  * eng_title - English translation of the title
  * grammar - grammar of the day
  * grammar_desc - grammar description

### Acknowledgements

The data was collected from [Naver Dictionary](https://dict.naver.com), and the conversations are from the Korean Language Institute of Yonsei University.
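As a rough illustration of how the two files might be used together, the sketch below groups sentence pairs by `date`, orders them by `conversation_id`, and attaches the matching `eng_title`. It uses only the Python standard library. The helper names (`read_rows`, `group_by_date`, `attach_titles`) and the sample rows are made up for this example; it also assumes that `conversation_id` parses as an integer and that `date` is written identically in both files, neither of which is confirmed by the description above.

```python
import csv
import io
from collections import defaultdict


def read_rows(source):
    """Read CSV rows as dicts from an open text file object."""
    return list(csv.DictReader(source))


def group_by_date(conversation_rows):
    """Group sentence rows by date, ordered by conversation_id.

    Assumption: conversation_id is an integer-like string.
    """
    grouped = defaultdict(list)
    for row in conversation_rows:
        grouped[row["date"]].append(row)
    for rows in grouped.values():
        rows.sort(key=lambda r: int(r["conversation_id"]))
    return dict(grouped)


def attach_titles(grouped, title_rows):
    """Pair each day's sentences with its English title, if one exists."""
    titles = {row["date"]: row for row in title_rows}
    return [
        {
            "date": date,
            "eng_title": titles.get(date, {}).get("eng_title"),
            "pairs": [(r["kor_sent"], r["eng_sent"]) for r in rows],
        }
        for date, rows in grouped.items()
    ]


# Made-up sample rows standing in for the real files (not actual dataset
# content). Only the columns used here are included; the date format is a guess.
SAMPLE_CONVERSATIONS = """date,conversation_id,kor_sent,eng_sent
2020-08-19,2,감사합니다.,Thank you.
2020-08-19,1,안녕하세요.,Hello.
"""
SAMPLE_TITLES = """date,kor_title,eng_title
2020-08-19,인사,Greetings
"""

days = attach_titles(
    group_by_date(read_rows(io.StringIO(SAMPLE_CONVERSATIONS))),
    read_rows(io.StringIO(SAMPLE_TITLES)),
)
for day in days:
    print(day["date"], day["eng_title"])
    for kor, eng in day["pairs"]:
        print(f"  {kor} -> {eng}")
```

With the real files, the `io.StringIO` objects would be replaced by something like `open("conversations.csv", encoding="utf-8", newline="")`. The UTF-8 encoding is an assumption about how the files were saved.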

Provider: www.kaggle.com