
QCRI/ThatiAR

Source: Hugging Face. Updated 2024-10-21; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/QCRI/ThatiAR
Resource description:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-classification
language:
- ar
tags:
- subjectivity
- sentiment
pretty_name: 'ThatiAR: Subjectivity Detection in Arabic News Sentences'
size_categories:
- 10K<n
dataset_info:
- config_name: ThatiAR
  splits:
  - name: train
    num_examples: 2558
  - name: dev
    num_examples: 373
  - name: test
    num_examples: 742
- config_name: ThatiAR-Instruct
  splits:
  - name: train
    num_examples: 2558
  - name: dev
    num_examples: 373
  - name: test
    num_examples: 742
configs:
- config_name: ThatiAR
  data_files:
  - split: train
    path: data/subjectivity_2024_train.tsv
  - split: dev
    path: data/subjectivity_2024_dev.tsv
  - split: test
    path: data/subjectivity_2024_test.tsv
- config_name: ThatiAR-Instruct
  data_files:
  - split: train
    path: instruction_explanation_dataset/subjectivity_2024_instruct_train.json
  - split: dev
    path: instruction_explanation_dataset/subjectivity_2024_instruct_dev.json
  - split: test
    path: instruction_explanation_dataset/subjectivity_2024_instruct_test.json
---

# ThatiAR: Subjectivity Detection in Arabic News Sentences

Along with the paper, we release the dataset and other experimental resources. The directory structure is described below.

### Files Description

- **data/**
  - `subjectivity_2024_dev.tsv`: Development set for subjectivity detection in Arabic news sentences.
  - `subjectivity_2024_test.tsv`: Test set for subjectivity detection in Arabic news sentences.
  - `subjectivity_2024_train.tsv`: Training set for subjectivity detection in Arabic news sentences.
- **instruction_explanation_dataset/**
  - `subjectivity_2024_instruct_dev.json`: Development set with instruction explanations.
  - `subjectivity_2024_instruct_test.json`: Test set with instruction explanations.
  - `subjectivity_2024_instruct_train.json`: Training set with instruction explanations.
- `licenses_by-nc-sa_4.0_legalcode.txt`: License information for the dataset, under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- `README.md`: This readme file, containing information about the dataset and its structure.

## License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can view the full license in the `licenses_by-nc-sa_4.0_legalcode.txt` file.

## Usage

To use this dataset, load the TSV or JSON files into your data processing pipeline.

### Example (Python)

```python
import json

import pandas as pd


def load_tsv(file_path):
    """Read a tab-separated file into a pandas DataFrame."""
    return pd.read_csv(file_path, sep='\t')


def load_json(file_path):
    """Read a standard JSON file (not JSON Lines) into Python objects."""
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)  # Use json.load() for reading standard JSON files
    return data


# Load training data
train_data_tsv = load_tsv('data/subjectivity_2024_train.tsv')
train_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_train.json')

# Load development data
dev_data_tsv = load_tsv('data/subjectivity_2024_dev.tsv')
dev_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_dev.json')

# Load test data
test_data_tsv = load_tsv('data/subjectivity_2024_test.tsv')
test_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_test.json')
```

### Data splits

We split the dataset in a stratified manner, allocating 70%, 10%, and 20% for training, development, and testing, respectively.

## Citation

```
@article{ThatiAR2024,
  title = {{ThatiAR}: Subjectivity Detection in Arabic News Sentences},
  author = {Suwaileh, Reem and Hasanain, Maram and Hubail, Fatema and Zaghouani, Wajdi and Alam, Firoj},
  year = {2024},
  journal = {arXiv: 2406.05559},
}
```
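
For readers who want to see what a stratified 70/10/20 split like the one described under "Data splits" involves, here is a minimal sketch in plain Python. It is not the authors' splitting script. The function name `stratified_split`, the `label` field, the `OBJ`/`SUBJ` label values, and the toy rows are all assumptions made for illustration, since the README does not list the column names of the released files.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", ratios=(0.7, 0.1, 0.2), seed=42):
    """Split rows into train/dev/test, keeping label proportions roughly equal.

    `label_key` is a hypothetical field name; adjust it to the real column.
    """
    rng = random.Random(seed)

    # Group the rows by label so that each class is split separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, dev, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_dev = round(len(group) * ratios[1])
        train.extend(group[:n_train])
        dev.extend(group[n_train:n_train + n_dev])
        test.extend(group[n_train + n_dev:])  # the remainder goes to test
    return train, dev, test


# Toy data (hypothetical): 60 objective and 40 subjective sentences.
rows = [{"sentence": f"s{i}", "label": "OBJ" if i < 60 else "SUBJ"} for i in range(100)]
train, dev, test = stratified_split(rows)
print(len(train), len(dev), len(test))  # prints: 70 10 20
```

Splitting each class separately keeps the objective-to-subjective ratio the same in all three subsets. The released files are already split, so this is only needed when re-splitting or applying the same scheme to other data.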
Provider:
QCRI