mmaguero/gua-spa-2023-task-1-2
Source: Hugging Face. Updated 2025-11-07; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/mmaguero/gua-spa-2023-task-1-2
Description:
GUA-SPA is the dataset for the first shared task on detecting and analyzing code-switching between Guarani and Spanish. It contains 1500 texts extracted from news articles and tweets, approximately 250,000 tokens in total, annotated for three tasks: identifying the language of each token, named entity recognition (NER), and a novel task of classifying how a Spanish span is used in the code-switched context.
Provided by:
mmaguero



