five

dumitrescustefan/ro_sent

收藏
Hugging Face2024-01-18 更新2024-05-25 收录
下载链接:
https://hf-mirror.com/datasets/dumitrescustefan/ro_sent
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - found language_creators: - found language: - ro license: - unknown multilinguality: - monolingual size_categories: - 10K<n<100K source_datasets: - original task_categories: - text-classification task_ids: - sentiment-classification pretty_name: RoSent dataset_info: features: - name: original_id dtype: string - name: id dtype: string - name: sentence dtype: string - name: label dtype: class_label: names: '0': negative '1': positive splits: - name: train num_bytes: 8367687 num_examples: 17941 - name: test num_bytes: 6837430 num_examples: 11005 download_size: 14700057 dataset_size: 15205117 --- # Dataset Card for RoSent ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Instances](#data-instances) - [Data Fields](#data-fields) - [Data Splits](#data-splits) - [Dataset Creation](#dataset-creation) - [Curation Rationale](#curation-rationale) - [Source Data](#source-data) - [Annotations](#annotations) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Dataset Curators](#dataset-curators) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) - [Contributions](#contributions) ## Dataset Description - **Homepage:** [GitHub](https://github.com/dumitrescustefan/Romanian-Transformers/tree/examples/examples/sentiment_analysis) - **Repository:** [GitHub](https://github.com/dumitrescustefan/Romanian-Transformers/tree/examples/examples/sentiment_analysis) - **Paper:** [arXiv preprint](https://arxiv.org/pdf/2009.08712.pdf) - **Leaderboard:** - **Point of Contact:** ### Dataset Summary This dataset is a Romanian Sentiment Analysis dataset. It is present in a processed form, as used by the authors of [`Romanian Transformers`](https://github.com/dumitrescustefan/Romanian-Transformers) in their examples and based on the original data present in at [this GitHub repository](https://github.com/katakonst/sentiment-analysis-tensorflow). The original data contains product and movie reviews in Romanian. ### Supported Tasks and Leaderboards [More Information Needed] ### Languages This dataset is present in Romanian language. ## Dataset Structure ### Data Instances An instance from the `train` split: ``` {'id': '0', 'label': 1, 'original_id': '0', 'sentence': 'acest document mi-a deschis cu adevarat ochii la ceea ce oamenii din afara statelor unite s-au gandit la atacurile din 11 septembrie. acest film a fost construit in mod expert si prezinta acest dezastru ca fiind mai mult decat un atac asupra pamantului american. urmarile acestui dezastru sunt previzionate din multe tari si perspective diferite. cred ca acest film ar trebui sa fie mai bine distribuit pentru acest punct. de asemenea, el ajuta in procesul de vindecare sa vada in cele din urma altceva decat stirile despre atacurile teroriste. si unele dintre piese sunt de fapt amuzante, dar nu abuziv asa. acest film a fost extrem de recomandat pentru mine, si am trecut pe acelasi sentiment.'} ``` ### Data Fields - `original_id`: a `string` feature containing the original id from the file. - `id`: a `string` feature . - `sentence`: a `string` feature. - `label`: a classification label, with possible values including `negative` (0), `positive` (1). ### Data Splits This dataset has two splits: `train` with 17941 examples, and `test` with 11005 examples. ## Dataset Creation ### Curation Rationale [More Information Needed] ### Source Data #### Initial Data Collection and Normalization The source dataset is present at the [this GitHub repository](https://github.com/katakonst/sentiment-analysis-tensorflow) and is based on product and movie reviews. The original source is unknown. #### Who are the source language producers? [More Information Needed] ### Annotations #### Annotation process [More Information Needed] #### Who are the annotators? [More Information Needed] ### Personal and Sensitive Information [More Information Needed] ## Considerations for Using the Data ### Social Impact of Dataset [More Information Needed] ### Discussion of Biases [More Information Needed] ### Other Known Limitations [More Information Needed] ## Additional Information ### Dataset Curators Stefan Daniel Dumitrescu, Andrei-Marious Avram, Sampo Pyysalo, [@katakonst](https://github.com/katakonst) ### Licensing Information [More Information Needed] ### Citation Information ``` @article{dumitrescu2020birth, title={The birth of Romanian BERT}, author={Dumitrescu, Stefan Daniel and Avram, Andrei-Marius and Pyysalo, Sampo}, journal={arXiv preprint arXiv:2009.08712}, year={2020} } ``` ### Contributions Thanks to [@gchhablani](https://github.com/gchhablani) and [@iliemihai](https://github.com/iliemihai) for adding this dataset.
提供机构:
dumitrescustefan
原始信息汇总

数据集概述

名称: RoSent

语言: 罗马尼亚语

许可证: 未知

多语言性: 单语种

大小分类: 10K<n<100K

源数据集: 原始数据

任务类别: 文本分类

任务ID: 情感分类

数据集结构

数据实例

  • 字段:
    • original_id:字符串类型,原始文件中的ID。
    • id:字符串类型。
    • sentence:字符串类型,句子内容。
    • label:分类标签,值包括negative(0)和positive(1)。

数据分片

  • 训练集: 17941个实例
  • 测试集: 11005个实例

数据集创建

源数据

数据注释

  • 创建者: 未详细说明

数据集使用考虑

许可证信息

  • 状态: 未知

引用信息

@article{dumitrescu2020birth, title={The birth of Romanian BERT}, author={Dumitrescu, Stefan Daniel and Avram, Andrei-Marius and Pyysalo, Sampo}, journal={arXiv preprint arXiv:2009.08712}, year={2020} }

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作