Korean - English Parallel Corpus

Name: Korean - English Parallel Corpus
Creator: www.kaggle.com
Published: 2020-08-19 00:00:00
License: No description provided

Source: www.kaggle.com. Updated 2020-08-19. Indexed 2025-03-24.

Download link:

https://www.kaggle.com/rareloto/naver-dictionary-conversation-of-the-day

Description:

### Context

As part of my Korean language learning hobby, I write and type out daily conversations from Naver Conversation of the Day. After being introduced to data science and machine learning, I wanted to use programming to support my learning by collecting data and trying out projects. So I scraped data from [Naver Dictionary](https://dict.naver.com) using a Python [script](https://github.com/rareloto/beginnerwebscraping-naverdictionary), to be used later when I train a bilingual AI study buddy chatbot or automate Anki [flashcards](https://quizlet.com/_8lh2dv?x=1jqt&i=224vll).

### Content

This is a corpus of paired Korean - English conversational parallel text extracted from [Naver Dictionary](https://dict.naver.com). The dataset consists of 4563 parallel text pairs from Naver's Conversation of the Day, covering December 4, 2017 to August 19, 2020. The files and their headers are listed below.

* conversations.csv
  * date - 'Conversation of the Day' date
  * conversation_id - ordered numbering to indicate conversation flow
  * kor_sent - Korean sentence
  * eng_sent - English translation
  * qna_id - from sender or receiver, message or feedback
* conversation_titles.csv
  * date - 'Conversation of the Day' date
  * kor_title - 'Conversation of the Day' title in Korean
  * eng_title - English translation of the title
  * grammar - grammar of the day
  * grammar_desc - grammar description

### Acknowledgements

The data was collected from [Naver Dictionary](https://dict.naver.com) and the conversations were from the Korean Language Institute of Yonsei University.
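As a rough illustration of how the two files described under Content might be combined, the sketch below groups sentence pairs by date and attaches each day's English title. It uses only the Python standard library. Several things here are assumptions rather than facts from the dataset: that `conversation_id` holds integers, that `date` is written identically in both files, and that each date has at most one title row. The helper names and the sample rows (including the date format and the `qna_id`, `grammar`, and `grammar_desc` values) are invented placeholders, not actual corpus content.

```python
import csv
import io
from collections import defaultdict


def load_conversations(fileobj):
    """Group conversations.csv rows by date, ordered by conversation_id.

    Assumes conversation_id can be parsed as an integer.
    """
    by_date = defaultdict(list)
    for row in csv.DictReader(fileobj):
        by_date[row["date"]].append(row)
    for rows in by_date.values():
        rows.sort(key=lambda r: int(r["conversation_id"]))
    return dict(by_date)


def load_titles(fileobj):
    """Index conversation_titles.csv rows by date (assumes one row per date)."""
    return {row["date"]: row for row in csv.DictReader(fileobj)}


def sentence_pairs(conversations, titles):
    """Yield (date, eng_title, kor_sent, eng_sent) tuples in conversation order."""
    for date, rows in conversations.items():
        eng_title = titles.get(date, {}).get("eng_title", "")
        for row in rows:
            yield date, eng_title, row["kor_sent"], row["eng_sent"]


# Invented placeholder rows that only mimic the documented headers.
SAMPLE_CONVERSATIONS = """date,conversation_id,kor_sent,eng_sent,qna_id
2020-08-19,2,감사합니다.,Thank you.,placeholder
2020-08-19,1,안녕하세요.,Hello.,placeholder
"""

SAMPLE_TITLES = """date,kor_title,eng_title,grammar,grammar_desc
2020-08-19,인사,Greetings,placeholder,placeholder
"""

conversations = load_conversations(io.StringIO(SAMPLE_CONVERSATIONS))
titles = load_titles(io.StringIO(SAMPLE_TITLES))
pairs = list(sentence_pairs(conversations, titles))

for date, eng_title, kor_sent, eng_sent in pairs:
    print(f"{date} [{eng_title}] {kor_sent} -> {eng_sent}")
```

To run this against the downloaded files instead of the in-memory samples, pass `open("conversations.csv", newline="", encoding="utf-8")` and the equivalent for conversation_titles.csv. The UTF-8 encoding is also an assumption and may need adjusting.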

Provider:

www.kaggle.com
