Informal-Formal Persian Corpus
Source: arXiv | Updated: 2023-08-10 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2308.05336v1
Overview:
The Informal-Formal Persian Corpus is a parallel corpus developed by the School of Computer Science and Engineering, Iran. It contains 50,014 pairs of informal and formal Persian sentences. The sentences were collected from diverse sources, including social media, books, and films, so that the corpus captures the lexical and syntactic differences between informal and formal written Persian. During construction, the research team combined automatic and manual methods to ensure both data quality and diversity. The corpus is intended mainly for two uses: developing tools that automatically convert informal Persian into formal Persian, and helping linguists study informal Persian grammar and orthography.
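This catalog entry does not state how the corpus files are distributed. The sketch below assumes, hypothetically, one informal/formal pair per line separated by a single tab. It shows how such a parallel corpus could be parsed into (informal, formal) pairs and split into training and development sets for an informal-to-formal conversion model. The function names, the file layout, and the Persian sample pairs (written for this sketch, not taken from the corpus) are all illustrative assumptions, not the authors' tooling.

```python
import random
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (informal sentence, formal sentence)


def read_pairs(lines: Iterable[str]) -> List[Pair]:
    """Parse hypothetical 'informal<TAB>formal' lines.

    Rows without exactly one tab, or with an empty side, are skipped.
    """
    pairs: List[Pair] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed row: not exactly two tab-separated fields
        informal, formal = (p.strip() for p in parts)
        if informal and formal:
            pairs.append((informal, formal))
    return pairs


def split_pairs(
    pairs: List[Pair], dev_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle deterministically, then hold out a development set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]  # (train, dev)


if __name__ == "__main__":
    # Illustrative pairs written for this sketch, not taken from the corpus.
    sample = [
        "نمیدونم\tنمی‌دانم\n",
        "می‌خوام برم خونه\tمی‌خواهم به خانه بروم\n",
        "this line has no tab\n",
    ]
    pairs = read_pairs(sample)
    train, dev = split_pairs(pairs, dev_fraction=0.5)
    print(len(pairs), len(train), len(dev))  # 2 1 1
```

A fixed seed keeps the split reproducible across runs, which matters when comparing conversion models on the same held-out pairs.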
Provided by:
School of Computer Science and Engineering, Iran
Created:
2023-08-10