ThatiAR

Name: ThatiAR
Creator: maas
Published: 2025-08-29 16:35:54
License: No description provided

ModelScope Community. Updated 2025-08-29. Indexed 2025-06-21.

Download link:

https://modelscope.cn/datasets/QCRI/ThatiAR

Resource description:

# ThatiAR: Subjectivity Detection in Arabic News Sentences

Along with the paper, we release the dataset and other experimental resources. The directory structure is described below.

### Files Description

- **data/**
  - `subjectivity_2024_dev.tsv`: Development set for subjectivity detection in Arabic news sentences.
  - `subjectivity_2024_test.tsv`: Test set for subjectivity detection in Arabic news sentences.
  - `subjectivity_2024_train.tsv`: Training set for subjectivity detection in Arabic news sentences.
- **instruction_explanation_dataset/**
  - `subjectivity_2024_instruct_dev.json`: Development set with instruction explanations.
  - `subjectivity_2024_instruct_test.json`: Test set with instruction explanations.
  - `subjectivity_2024_instruct_train.json`: Training set with instruction explanations.
- `licenses_by-nc-sa_4.0_legalcode.txt`: License information for the dataset, under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- `README.md`: This readme file, containing information about the dataset and its structure.

## License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can view the full license in the `licenses_by-nc-sa_4.0_legalcode.txt` file.

## Usage

To use this dataset, load the TSV or JSON files into your data processing pipeline.

### Example (Python)

```python
import json

import pandas as pd


def load_tsv(file_path):
    # The files in data/ are tab-separated.
    return pd.read_csv(file_path, sep='\t')


def load_json(file_path):
    # The instruction files are standard JSON, so json.load() reads each one whole.
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)


# Load training data
train_data_tsv = load_tsv('data/subjectivity_2024_train.tsv')
train_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_train.json')

# Load development data
dev_data_tsv = load_tsv('data/subjectivity_2024_dev.tsv')
dev_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_dev.json')

# Load test data
test_data_tsv = load_tsv('data/subjectivity_2024_test.tsv')
test_data_json = load_json('instruction_explanation_dataset/subjectivity_2024_instruct_test.json')
```

### Data splits

We split the dataset in a stratified manner, allocating 70%, 10%, and 20% for training, development, and testing, respectively.

## Citation

```
@article{ThatiAR2024,
  title = {{ThatiAR}: Subjectivity Detection in Arabic News Sentences},
  author = {Suwaileh, Reem and Hasanain, Maram and Hubail, Fatema and Zaghouani, Wajdi and Alam, Firoj},
  year = {2024},
  journal = {arXiv: 2406.05559},
}
```
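The README says the data was split in a stratified manner into 70%, 10%, and 20% portions, but it does not describe the procedure. The following is a minimal sketch of one way such a split could be done with the Python standard library; it is not the authors' code. The function name `stratified_split`, the `label` key, and the `OBJ`/`SUBJ` values are assumptions made for illustration and are not taken from the dataset files.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split rows into train/dev/test so each label keeps roughly the same share."""
    rng = random.Random(seed)

    # Group rows by label so that each group can be split independently.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, dev, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_dev = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        dev.extend(group[n_train:n_train + n_dev])
        # Whatever remains after train and dev goes to test.
        test.extend(group[n_train + n_dev:])
    return train, dev, test


# Toy data with hypothetical labels: 60 "OBJ" rows and 40 "SUBJ" rows.
rows = [{"id": i, "label": "OBJ" if i < 60 else "SUBJ"} for i in range(100)]
train, dev, test = stratified_split(rows, "label")
print(len(train), len(dev), len(test))
```

Splitting within each label group, rather than over the whole list, is what keeps the label proportions similar across the three sets. To apply the same idea to the released TSV files, replace `label` with the actual label column name found in the file header.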


Provider: maas

Created: 2025-06-17
