Mongabay conservation news dataset
Source: arXiv (2023-10-24); updated 2024-06-21
Download link:
https://huggingface.co/datasets/Datasaur/mongabay-experiment
Description:
The Mongabay conservation news dataset is an NLP dataset that Datasaur.ai built from Indonesian-language Mongabay news articles using weak supervision. It contains 4,896 records covering two task types: multi-label classification and sentiment classification. Its construction involved writing simple labeling functions and running experiments with various pretrained language models. The dataset was developed mainly to address the scarcity of Indonesian-language NLP resources and to support the advancement of NLP through large-scale datasets.
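Weak supervision in general works by applying several cheap heuristic labeling functions to each unlabeled text and aggregating their (possibly conflicting) votes into a single label. The sketch below illustrates that idea for sentiment only. The Indonesian keyword lists, the function names, and the majority-vote aggregation rule are hypothetical assumptions for illustration; they are not the labeling functions or aggregation method Datasaur.ai actually used.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label scheme: each labeling function votes or abstains.
ABSTAIN = -1
NEGATIVE = 0
POSITIVE = 1

# Hypothetical Indonesian keyword lists, for illustration only.
NEGATIVE_WORDS = {"deforestasi", "kebakaran", "ilegal", "rusak", "perburuan"}
POSITIVE_WORDS = {"berhasil", "dilindungi", "pemulihan", "lestari"}


def lf_negative_keywords(text: str) -> int:
    """Vote NEGATIVE if any negative keyword appears (naive whitespace tokens)."""
    tokens = set(text.lower().split())
    return NEGATIVE if tokens & NEGATIVE_WORDS else ABSTAIN


def lf_positive_keywords(text: str) -> int:
    """Vote POSITIVE if any positive keyword appears (naive whitespace tokens)."""
    tokens = set(text.lower().split())
    return POSITIVE if tokens & POSITIVE_WORDS else ABSTAIN


def lf_negated_success(text: str) -> int:
    """Vote NEGATIVE for the phrase 'tidak berhasil' ('did not succeed')."""
    return NEGATIVE if "tidak berhasil" in text.lower() else ABSTAIN


LFS: List[Callable[[str], int]] = [
    lf_negative_keywords,
    lf_positive_keywords,
    lf_negated_success,
]


def majority_vote(text: str, lfs: List[Callable[[str], int]]) -> int:
    """Combine non-abstaining votes; return ABSTAIN if none voted or on a tie."""
    counts = Counter(vote for vote in (lf(text) for lf in lfs) if vote != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting labels with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for sentence in [
        "Kebakaran hutan ilegal merusak habitat orangutan",
        "Program pemulihan hutan berhasil",
        "Upaya konservasi tidak berhasil",
    ]:
        print(sentence, "->", majority_vote(sentence, LFS))
```

In the third example the positive-keyword function and the negation function disagree, so the tie rule leaves the text unlabeled. Practical weak-supervision pipelines often replace plain majority voting with a learned label model that weights each function by its estimated accuracy.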
Provider:
Datasaur.ai
Created:
2023-10-17



