GLUECoS

Source: arXiv, updated 2020-05-14, indexed 2024-08-06
Download link:
http://arxiv.org/abs/2004.12376v2
Resource overview:
GLUECoS is an evaluation benchmark built for code-mixed language, covering several NLP tasks in English-Hindi and English-Spanish. Created by Microsoft Research, it comprises six tasks: language identification, part-of-speech tagging, named entity recognition, sentiment analysis, question answering, and natural language inference. This listing does not give the dataset's size, but it emphasizes the benchmark's diversity and complexity, which are intended to reflect the challenges of processing code-mixed language. In building the benchmark, the researchers used multiple techniques along with artificially generated code-mixed data. GLUECoS is aimed at applications that process and understand mixed-language text and speech, particularly in multilingual societies.
Provider:
Microsoft Research, Bangalore, India
Created:
2020-04-26