110k Dutch Book Reviews Dataset (110kDBRD)
Source: arXiv | Updated: 2019-10-02 | Indexed: 2024-06-21
Download link:
https://benjaminvdb.github.io/110kDBRD/
Description:
The 110k Dutch Book Reviews Dataset (110kDBRD) was created at the Leiden Institute of Advanced Computer Science (LIACS), Leiden University, and contains 110,000 Dutch book reviews. Inspired by the Large Movie Review Dataset (IMDb), it is intended as a benchmark for Dutch sentiment classification. The dataset consists of book reviews with associated binary sentiment polarity labels; the reviews were collected by web crawling the Hebban website. During curation, each review's star rating (1 to 5) was converted into a class label: ratings of 1 and 2 are negative, 3 is neutral, and 4 and 5 are positive. Because the polarity labels are binary, neutral (3-star) reviews fall outside the positive/negative classification task. The dataset is mainly used for sentiment classification, in particular for studying how fine-tuning a pre-trained language model can improve classifier performance when only limited labeled data is available.
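The rating-to-label rule described above can be sketched as a small Python function. This is a minimal illustration, not code from the dataset's authors: the function names `rating_to_label` and `binary_polarity`, the string label names, and the 0/1 integer encoding are hypothetical assumptions, and the dataset's own files may store labels differently. The sample reviews in the usage lines are made up.

```python
from typing import Optional

# Hypothetical label names; the released dataset may encode labels differently.
NEGATIVE, NEUTRAL, POSITIVE = "negative", "neutral", "positive"


def rating_to_label(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment class (1-2 negative, 3 neutral, 4-5 positive)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {stars}")
    if stars <= 2:
        return NEGATIVE
    if stars == 3:
        return NEUTRAL
    return POSITIVE


def binary_polarity(stars: int) -> Optional[int]:
    """Return 1 (positive) or 0 (negative); None for neutral reviews, which get no binary label.

    The 0/1 encoding is an assumption made for this sketch.
    """
    label = rating_to_label(stars)
    if label == NEUTRAL:
        return None
    return 1 if label == POSITIVE else 0


# Usage: keep only reviews that receive a binary polarity label (made-up examples).
reviews = [("Prachtig boek!", 5), ("Saai en te lang.", 1), ("Gaat wel.", 3)]
labeled = [(text, binary_polarity(s)) for text, s in reviews if binary_polarity(s) is not None]
```

Here `labeled` keeps the 5-star review with label 1 and the 1-star review with label 0, while the 3-star review is dropped. Returning `None` for neutral ratings, rather than a third integer class, keeps the binary task explicit and makes the filtering step a simple `is not None` check.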
Provided by:
Leiden Institute of Advanced Computer Science (LIACS), Leiden University
Created:
2019-10-02



