LinCE
Source: arXiv · Updated: 2020-05-09 · Indexed: 2024-07-25
Download link:
https://ritual.uh.edu/lince/
Description:
LinCE (Linguistic Code-switching Evaluation) is a benchmark for evaluating language models on code-switched (CS) text. It was created by the Department of Computer Science at the University of Houston. It combines ten corpora covering four code-switching language pairs: Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic. These corpora span four tasks: language identification, named entity recognition (NER), part-of-speech (POS) tagging, and sentiment analysis. To build the benchmark, the researchers inspected the original datasets and re-partitioned them to improve their quality and consistency. The benchmark is intended to support research on processing code-switched language.
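Several of the tasks above (language identification, NER, POS tagging) are token-level sequence labeling. The sketch below shows one way to read such data and measure code-switching in it. It assumes a hypothetical layout of one `token<TAB>label` pair per line, with a blank line between sentences. It also assumes placeholder language labels `lang1`/`lang2`. Neither assumption is the confirmed LinCE file format, so check the official download page before relying on it. The function names are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def parse_token_tagged(text: str) -> List[Sentence]:
    """Parse 'token<TAB>label' lines; a blank line ends a sentence.

    ASSUMPTION: this two-column layout is hypothetical, not the
    documented LinCE format.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"expected 'token<TAB>label', got {raw!r}")
        current.append((parts[0], parts[1]))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def label_counts(sentences: List[Sentence]) -> Counter:
    """Count how often each label occurs across all tokens."""
    return Counter(label for sent in sentences for _, label in sent)


def count_switches(sentence: Sentence,
                   langs: Sequence[str] = ("lang1", "lang2")) -> int:
    """Count switch points between adjacent language-tagged tokens.

    Tokens whose label is not in `langs` (e.g. punctuation or named
    entities) are skipped. The label names are hypothetical placeholders.
    """
    tags = [label for _, label in sentence if label in langs]
    return sum(1 for prev, cur in zip(tags, tags[1:]) if prev != cur)


if __name__ == "__main__":
    sample = "I\tlang1\nlike\tlang1\nla\tlang2\ncomida\tlang2\n\nmuy\tlang2\ngood\tlang1\n"
    sents = parse_token_tagged(sample)
    print(dict(label_counts(sents)))            # {'lang1': 3, 'lang2': 3}
    print([count_switches(s) for s in sents])   # [1, 1]
```

Counting switch points only between language-tagged tokens keeps neutral tokens, such as punctuation, from inflating the count. Whether to skip those tokens is a design choice, not something the LinCE description prescribes.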
Provided by:
Department of Computer Science, University of Houston
Created:
2020-05-09



