Mouwiya/imdb_reviews_with_labels

Name: Mouwiya/imdb_reviews_with_labels
Creator: Mouwiya
Published: 2024-04-26 15:49:17
License: No description available

Hugging Face · Updated 2024-04-26 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/Mouwiya/imdb_reviews_with_labels

Resource summary:

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  - name: predicted_sentiment_facebook/bart-large-mnli
    dtype: string
  - name: predicted_sentiment_distilbert-base-uncased
    dtype: string
  - name: predicted_sentiment_roberta-base
    dtype: string
  splits:
  - name: train
    num_bytes: 1361555
    num_examples: 1000
  download_size: 862047
  dataset_size: 1361555
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: odbl
task_categories:
- text-classification
language:
- en
size_categories:
- n<1K
---

### Dataset Description

In this task, we performed zero-shot sentiment analysis on a subset of the IMDb movie reviews dataset using multiple language models. The goal was to predict the sentiment (positive or negative) of each movie review without fine-tuning the models on this task. We used three pre-trained language models for zero-shot classification: BART-large, DistilBERT-base, and RoBERTa-base. For each model, we generated predicted sentiment labels for a subset of 100 movie reviews from the IMDb dataset. The reviews were randomly sampled to give a diverse mix of sentiments.

After running the reviews through each model, we saved the predicted sentiment labels alongside the original reviews in a CSV file named "imdb_reviews_with_labels.csv". This file contains the reviews and each model's predicted sentiment labels. We also uploaded both the dataset and the CSV file to the Hugging Face Hub for easy access and sharing. The dataset is available at: https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels

This task illustrates zero-shot classification with pre-trained language models for sentiment analysis and provides a resource for further analysis and experimentation.

- **Curated by:** Mouwiya S. A. AlQaisieh
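The card does not include its inference code. The sketch below shows one way to label a review. It assumes the Hugging Face `transformers` "zero-shot-classification" pipeline and the candidate labels `"positive"` and `"negative"`; the card confirms neither.

- The classifier is passed in as a callable, so the helpers can be tried without downloading a model.
- `build_bart_classifier` is a hypothetical helper and covers only the `facebook/bart-large-mnli` checkpoint.
- The card does not say how the DistilBERT and RoBERTa checkpoints were set up for zero-shot use, so those cases are not shown.

```python
from typing import Any, Callable, Dict, List, Sequence

# Assumed candidate labels; the card only says the sentiment is "positive or negative".
CANDIDATE_LABELS: List[str] = ["positive", "negative"]

# Any callable with the calling convention of the transformers
# "zero-shot-classification" pipeline: classifier(text, candidate_labels=[...])
# returns a dict whose "labels" list is sorted by descending score.
Classifier = Callable[..., Dict[str, Any]]


def predict_sentiment(classifier: Classifier, text: str,
                      labels: Sequence[str] = CANDIDATE_LABELS) -> str:
    """Return the highest-scoring candidate label for a single review."""
    result = classifier(text, candidate_labels=list(labels))
    return result["labels"][0]


def predict_all(classifier: Classifier, texts: Sequence[str]) -> List[str]:
    """Label every review with the same classifier, preserving input order."""
    return [predict_sentiment(classifier, text) for text in texts]


def build_bart_classifier() -> Classifier:
    """Hypothetical helper: build the BART-large-MNLI zero-shot pipeline.

    Requires the `transformers` package and downloads the checkpoint on first use.
    """
    # Imported lazily so the helpers above need no extra packages.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
```

With `transformers` installed, `predict_all(build_bart_classifier(), reviews)` returns one label per review, using the pipeline's default hypothesis template.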

Provided by:

Mouwiya

Summary of original information

Dataset overview

Dataset information

Features:
- text: string; the movie review text.
- label: int64; the label of the movie review.
- predicted_sentiment_facebook/bart-large-mnli: string; the sentiment predicted by the BART-large model.
- predicted_sentiment_distilbert-base-uncased: string; the sentiment predicted by the DistilBERT-base model.
- predicted_sentiment_roberta-base: string; the sentiment predicted by the RoBERTa-base model.
Splits:
- train: training split with 1,000 examples, 1,361,555 bytes in total.
Download size: 862,047 bytes.
Dataset size: 1,361,555 bytes.
Configs:
- default: training data files at data/train-*.
License: Open Database License (ODbL).
Task categories: text classification (text-classification).
Language: English (en).
Size categories: fewer than 1K (n<1K).
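The card does not report any scores. The sketch below compares each model's predictions with the `label` column. It rests on two assumptions that the card does not state:

- `label` uses the usual IMDb encoding, 0 = negative and 1 = positive.
- The prediction columns hold the strings `"negative"` and `"positive"`.

```python
from typing import Dict, Iterable, Mapping

# Prediction column names as listed in the dataset card.
PREDICTION_COLUMNS = [
    "predicted_sentiment_facebook/bart-large-mnli",
    "predicted_sentiment_distilbert-base-uncased",
    "predicted_sentiment_roberta-base",
]

# Assumption: 0 = negative, 1 = positive (usual IMDb convention; not stated in the card).
LABEL_NAMES = {0: "negative", 1: "positive"}


def accuracy_per_model(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return, for each prediction column, the fraction of rows matching `label`."""
    correct = {column: 0 for column in PREDICTION_COLUMNS}
    total = 0
    for row in rows:
        gold = LABEL_NAMES[int(row["label"])]
        total += 1
        for column in PREDICTION_COLUMNS:
            # Normalise case and surrounding whitespace before comparing.
            if str(row[column]).strip().lower() == gold:
                correct[column] += 1
    if total == 0:
        raise ValueError("no rows to score")
    return {column: correct[column] / total for column in PREDICTION_COLUMNS}
```

If the `datasets` library is installed, `load_dataset("Mouwiya/imdb_reviews_with_labels", split="train")` returns a `Dataset`. Iterating it yields one dict per row, so it can be passed to `accuracy_per_model` directly.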

Dataset description

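This dataset is intended for sentiment analysis. Several pre-trained language models were used to predict the sentiment of a subset of IMDb movie reviews in a single zero-shot pass. Three models (BART-large, DistilBERT-base, and RoBERTa-base) performed zero-shot classification, and each model produced predicted sentiment labels for 100 movie reviews. The dataset contains the original reviews together with each model's predicted sentiment labels, in a file named imdb_reviews_with_labels.csv.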
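The predictions are stored side by side in imdb_reviews_with_labels.csv, so inter-model agreement can be checked without the gold labels. The sketch below uses only the Python standard library. It assumes the CSV has a header row with the same column names as the dataset features; the card does not show the file's layout. `read_rows`, `load_rows` and `pairwise_agreement` are hypothetical helper names.

```python
import csv
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Prediction column names as listed in the dataset features.
PREDICTION_COLUMNS = [
    "predicted_sentiment_facebook/bart-large-mnli",
    "predicted_sentiment_distilbert-base-uncased",
    "predicted_sentiment_roberta-base",
]


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines (header first) into one dict per review."""
    return list(csv.DictReader(lines))


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file such as imdb_reviews_with_labels.csv from disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_rows(handle)


def pairwise_agreement(rows: List[Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    """Fraction of reviews on which each pair of models predicts the same label."""
    if not rows:
        raise ValueError("no rows to compare")
    agreement = {}
    for first, second in combinations(PREDICTION_COLUMNS, 2):
        same = sum(
            1 for row in rows
            if row[first].strip().lower() == row[second].strip().lower()
        )
        agreement[(first, second)] = same / len(rows)
    return agreement
```

For example, `pairwise_agreement(load_rows("imdb_reviews_with_labels.csv"))` returns one agreement rate per model pair. The local file path is an assumption.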
