TuPy-E
Source: arXiv | Updated: 2023-12-30 | Indexed: 2024-06-21
Download link:
https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset
Description:
TuPy-E is the largest annotated corpus for hate speech detection in Brazilian Portuguese social media. It was created by the Alberto Luiz Coimbra Institute for Graduate Studies and Research in Engineering (COPPE) and the Federal University of Rio de Janeiro, and it is released openly to encourage collaboration within the research community. The dataset contains 43,668 annotated instances covering several hate speech categories, including sexism and racism. The creators used BERT-based models, among other techniques, to analyze the dataset, which is intended to support both academic study and practical applications, particularly model training and evaluation. TuPy-E also aims to support the development of more robust and culturally sensitive tools that contribute to a safer and more inclusive online environment.
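Because the description targets model training and evaluation, a minimal loading sketch may be useful. It assumes the Hugging Face `datasets` library is installed and that the repository behind the download link above is publicly readable. The helper names `repo_id_from_url` and `load_tupye` are hypothetical and introduced only for illustration. The listing does not state which configurations or columns the dataset exposes, so the sketch asks the Hub for the configuration names instead of guessing one.

```python
from urllib.parse import urlparse

# Download link as given in this listing.
DATASET_URL = "https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repository id from a Hugging Face dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_tupye(config_name=None):
    """Load TuPy-E from the Hub (hypothetical helper; needs network access).

    The configuration names are not given in the listing, so when none is
    passed the first one reported by the Hub is used (an assumption).
    """
    # Imported lazily so repo_id_from_url works without the library installed.
    from datasets import get_dataset_config_names, load_dataset

    repo_id = repo_id_from_url(DATASET_URL)
    if config_name is None:
        config_name = get_dataset_config_names(repo_id)[0]
    return load_dataset(repo_id, config_name)
```

Calling `load_tupye()` should return a `DatasetDict`; printing it shows the available splits and column names, which is the simplest way to check the label schema before training.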
Provider:
Alberto Luiz Coimbra Institute for Graduate Studies and Research in Engineering (COPPE)
Created:
2023-12-30



