
Catalonia Independence Corpus (CIC)

Source: arXiv; updated 2020-04-01; indexed 2024-06-21
Download link:
https://github.com/ixa-ehu/catalonia-independence-corpus
Description:
The Catalonia Independence Corpus (CIC) is a multilingual dataset for detecting stance on the question of Catalan independence, created by the IXA Group at the HiTZ Center of the University of the Basque Country. It contains 10,048 tweets in Catalan and Spanish, annotated for stance, and is intended to support stance detection research in multilingual and cross-lingual settings. Building the corpus involved user classification and a semi-automatic annotation method based on Twitter user behavior, which reduces the manual annotation workload. Its main applications are in social media analysis, particularly political stance and public opinion analysis.
Provider:
IXA Group, HiTZ Center, University of the Basque Country
Created:
2020-04-01