
iahlt/arabic_ner_mafat_folds

Source: Hugging Face. Updated: 2024-01-13. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/iahlt/arabic_ner_mafat_folds
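
The dataset can presumably be fetched with the Hugging Face `datasets` library, one configuration (fold) at a time. The following is a minimal sketch, assuming `datasets` is installed and that the mirror host works as an alternative endpoint through the `HF_ENDPOINT` environment variable. The helper names `config_name` and `load_fold` are hypothetical and not part of the dataset.

```python
import os

REPO_ID = "iahlt/arabic_ner_mafat_folds"
MIRROR = "https://hf-mirror.com"  # host taken from the download link


def config_name(fold: int) -> str:
    """Return the config name for a fold; the card lists fold_1 to fold_4."""
    if fold not in (1, 2, 3, 4):
        raise ValueError(f"fold must be 1-4, got {fold}")
    return f"fold_{fold}"


def load_fold(fold: int, use_mirror: bool = False):
    """Download one fold; the result should hold train/validation/test splits."""
    if use_mirror:
        # Assumption: HF_ENDPOINT is read when huggingface_hub is first
        # imported, so it has to be set before `datasets` is imported.
        os.environ.setdefault("HF_ENDPOINT", MIRROR)
    from datasets import load_dataset  # lazy import; needs `pip install datasets`
    return load_dataset(REPO_ID, config_name(fold))
```

Calling `load_fold(1)` would request the `fold_1` configuration; nothing is downloaded until the function is called.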
Resource description:
---
dataset_info:
- config_name: fold_1
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: raw_tags
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': U-ANG
          '1': L-ANG
          '2': B-ANG
          '3': I-ANG
          '4': U-DUC
          '5': I-DUC
          '6': L-DUC
          '7': B-DUC
          '8': I-EVE
          '9': L-EVE
          '10': B-EVE
          '11': U-EVE
          '12': L-FAC
          '13': U-FAC
          '14': I-FAC
          '15': B-FAC
          '16': L-GPE
          '17': B-GPE
          '18': I-GPE
          '19': U-GPE
          '20': U-INFORMAL
          '21': I-INFORMAL
          '22': L-INFORMAL
          '23': B-INFORMAL
          '24': U-LOC
          '25': I-LOC
          '26': L-LOC
          '27': B-LOC
          '28': I-MISC
          '29': U-MISC
          '30': B-MISC
          '31': L-MISC
          '32': O
          '33': I-ORG
          '34': L-ORG
          '35': U-ORG
          '36': B-ORG
          '37': L-PER
          '38': B-PER
          '39': I-PER
          '40': U-PER
          '41': I-TIMEX
          '42': L-TIMEX
          '43': U-TIMEX
          '44': B-TIMEX
          '45': U-TTL
          '46': L-TTL
          '47': B-TTL
          '48': I-TTL
          '49': B-WOA
          '50': L-WOA
          '51': U-WOA
          '52': I-WOA
  - name: record
    dtype: string
  splits:
  - name: train
    num_bytes: 87741254
    num_examples: 30000
  - name: validation
    num_bytes: 28643406
    num_examples: 10000
  - name: test
    num_bytes: 28643406
    num_examples: 10000
  download_size: 45076618
  dataset_size: 145028066
- config_name: fold_2
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: raw_tags
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': U-ANG
          '1': L-ANG
          '2': B-ANG
          '3': I-ANG
          '4': U-DUC
          '5': I-DUC
          '6': L-DUC
          '7': B-DUC
          '8': I-EVE
          '9': L-EVE
          '10': B-EVE
          '11': U-EVE
          '12': L-FAC
          '13': U-FAC
          '14': I-FAC
          '15': B-FAC
          '16': L-GPE
          '17': B-GPE
          '18': I-GPE
          '19': U-GPE
          '20': U-INFORMAL
          '21': I-INFORMAL
          '22': L-INFORMAL
          '23': B-INFORMAL
          '24': U-LOC
          '25': I-LOC
          '26': L-LOC
          '27': B-LOC
          '28': I-MISC
          '29': U-MISC
          '30': B-MISC
          '31': L-MISC
          '32': O
          '33': I-ORG
          '34': L-ORG
          '35': U-ORG
          '36': B-ORG
          '37': L-PER
          '38': B-PER
          '39': I-PER
          '40': U-PER
          '41': I-TIMEX
          '42': L-TIMEX
          '43': U-TIMEX
          '44': B-TIMEX
          '45': U-TTL
          '46': L-TTL
          '47': B-TTL
          '48': I-TTL
          '49': B-WOA
          '50': L-WOA
          '51': U-WOA
          '52': I-WOA
  - name: record
    dtype: string
  splits:
  - name: train
    num_bytes: 86867948
    num_examples: 30000
  - name: validation
    num_bytes: 29516712
    num_examples: 10000
  - name: test
    num_bytes: 29516712
    num_examples: 10000
  download_size: 45337784
  dataset_size: 145901372
- config_name: fold_3
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: raw_tags
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': U-ANG
          '1': L-ANG
          '2': B-ANG
          '3': I-ANG
          '4': U-DUC
          '5': I-DUC
          '6': L-DUC
          '7': B-DUC
          '8': I-EVE
          '9': L-EVE
          '10': B-EVE
          '11': U-EVE
          '12': L-FAC
          '13': U-FAC
          '14': I-FAC
          '15': B-FAC
          '16': L-GPE
          '17': B-GPE
          '18': I-GPE
          '19': U-GPE
          '20': U-INFORMAL
          '21': I-INFORMAL
          '22': L-INFORMAL
          '23': B-INFORMAL
          '24': U-LOC
          '25': I-LOC
          '26': L-LOC
          '27': B-LOC
          '28': I-MISC
          '29': U-MISC
          '30': B-MISC
          '31': L-MISC
          '32': O
          '33': I-ORG
          '34': L-ORG
          '35': U-ORG
          '36': B-ORG
          '37': L-PER
          '38': B-PER
          '39': I-PER
          '40': U-PER
          '41': I-TIMEX
          '42': L-TIMEX
          '43': U-TIMEX
          '44': B-TIMEX
          '45': U-TTL
          '46': L-TTL
          '47': B-TTL
          '48': I-TTL
          '49': B-WOA
          '50': L-WOA
          '51': U-WOA
          '52': I-WOA
  - name: record
    dtype: string
  splits:
  - name: train
    num_bytes: 87175881
    num_examples: 30000
  - name: validation
    num_bytes: 29208779
    num_examples: 10000
  - name: test
    num_bytes: 29208779
    num_examples: 10000
  download_size: 45201250
  dataset_size: 145593439
- config_name: fold_4
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: raw_tags
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': U-ANG
          '1': L-ANG
          '2': B-ANG
          '3': I-ANG
          '4': U-DUC
          '5': I-DUC
          '6': L-DUC
          '7': B-DUC
          '8': I-EVE
          '9': L-EVE
          '10': B-EVE
          '11': U-EVE
          '12': L-FAC
          '13': U-FAC
          '14': I-FAC
          '15': B-FAC
          '16': L-GPE
          '17': B-GPE
          '18': I-GPE
          '19': U-GPE
          '20': U-INFORMAL
          '21': I-INFORMAL
          '22': L-INFORMAL
          '23': B-INFORMAL
          '24': U-LOC
          '25': I-LOC
          '26': L-LOC
          '27': B-LOC
          '28': I-MISC
          '29': U-MISC
          '30': B-MISC
          '31': L-MISC
          '32': O
          '33': I-ORG
          '34': L-ORG
          '35': U-ORG
          '36': B-ORG
          '37': L-PER
          '38': B-PER
          '39': I-PER
          '40': U-PER
          '41': I-TIMEX
          '42': L-TIMEX
          '43': U-TIMEX
          '44': B-TIMEX
          '45': U-TTL
          '46': L-TTL
          '47': B-TTL
          '48': I-TTL
          '49': B-WOA
          '50': L-WOA
          '51': U-WOA
          '52': I-WOA
  - name: record
    dtype: string
  splits:
  - name: train
    num_bytes: 87368897
    num_examples: 30000
  - name: validation
    num_bytes: 29015763
    num_examples: 10000
  - name: test
    num_bytes: 29015763
    num_examples: 10000
  download_size: 45120027
  dataset_size: 145400423
configs:
- config_name: fold_1
  data_files:
  - split: train
    path: fold_1/train-*
  - split: validation
    path: fold_1/validation-*
  - split: test
    path: fold_1/test-*
- config_name: fold_2
  data_files:
  - split: train
    path: fold_2/train-*
  - split: validation
    path: fold_2/validation-*
  - split: test
    path: fold_2/test-*
- config_name: fold_3
  data_files:
  - split: train
    path: fold_3/train-*
  - split: validation
    path: fold_3/validation-*
  - split: test
    path: fold_3/test-*
- config_name: fold_4
  data_files:
  - split: train
    path: fold_4/train-*
  - split: validation
    path: fold_4/validation-*
  - split: test
    path: fold_4/test-*
---
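
The `ner_tags` column stores integer ids rather than tag strings. The prefixes B, I, L, U plus the bare O tag look like the BILOU scheme (Begin, Inside, Last, Unit, Outside); that reading is an assumption, since the card does not name the scheme. Under that assumption, the sketch below decodes ids and groups tags into entity spans. `ID2LABEL`, `decode`, and `bilou_spans` are illustrative names, not part of the dataset.

```python
# Label names copied from the card, in id order (0..52).
ID2LABEL = (
    "U-ANG L-ANG B-ANG I-ANG U-DUC I-DUC L-DUC B-DUC I-EVE L-EVE B-EVE U-EVE "
    "L-FAC U-FAC I-FAC B-FAC L-GPE B-GPE I-GPE U-GPE "
    "U-INFORMAL I-INFORMAL L-INFORMAL B-INFORMAL U-LOC I-LOC L-LOC B-LOC "
    "I-MISC U-MISC B-MISC L-MISC O I-ORG L-ORG U-ORG B-ORG "
    "L-PER B-PER I-PER U-PER I-TIMEX L-TIMEX U-TIMEX B-TIMEX "
    "U-TTL L-TTL B-TTL I-TTL B-WOA L-WOA U-WOA I-WOA"
).split()


def decode(ids):
    """Map integer ner_tags ids to their tag strings."""
    return [ID2LABEL[i] for i in ids]


def bilou_spans(tags):
    """Return (entity_type, start, end) tuples, end exclusive.

    Assumes BILOU semantics; inconsistent sequences are silently dropped.
    """
    spans, start, current = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "U":  # single-token entity
            spans.append((etype, i, i + 1))
            start = current = None
        elif prefix == "B":  # open a multi-token entity
            start, current = i, etype
        elif prefix == "I" and etype == current:
            continue  # still inside the open entity
        elif prefix == "L" and etype == current:  # close the open entity
            spans.append((etype, start, i + 1))
            start = current = None
        else:  # "O" or an inconsistent tag: abandon any open span
            start = current = None
    return spans
```

For example, `decode([32, 38, 39, 37, 19])` gives `["O", "B-PER", "I-PER", "L-PER", "U-GPE"]`, and passing that list to `bilou_spans` gives `[("PER", 1, 4), ("GPE", 4, 5)]`.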

Dataset information:
- Configuration: fold_1
  Features:
  - id (identifier): string
  - tokens: sequence of strings
  - raw_tags (raw labels): sequence of strings
  - ner_tags (named entity recognition labels): sequence of class labels, mapped as follows:
    0: U-ANG, 1: L-ANG, 2: B-ANG, 3: I-ANG, 4: U-DUC, 5: I-DUC, 6: L-DUC, 7: B-DUC,
    8: I-EVE, 9: L-EVE, 10: B-EVE, 11: U-EVE, 12: L-FAC, 13: U-FAC, 14: I-FAC, 15: B-FAC,
    16: L-GPE, 17: B-GPE, 18: I-GPE, 19: U-GPE,
    20: U-INFORMAL, 21: I-INFORMAL, 22: L-INFORMAL, 23: B-INFORMAL,
    24: U-LOC, 25: I-LOC, 26: L-LOC, 27: B-LOC, 28: I-MISC, 29: U-MISC, 30: B-MISC, 31: L-MISC,
    32: O, 33: I-ORG, 34: L-ORG, 35: U-ORG, 36: B-ORG, 37: L-PER, 38: B-PER, 39: I-PER, 40: U-PER,
    41: I-TIMEX, 42: L-TIMEX, 43: U-TIMEX, 44: B-TIMEX, 45: U-TTL, 46: L-TTL, 47: B-TTL, 48: I-TTL,
    49: B-WOA, 50: L-WOA, 51: U-WOA, 52: I-WOA
  - record (original record): string
  Splits:
  - train: 87741254 bytes, 30000 examples
  - validation: 28643406 bytes, 10000 examples
  - test: 28643406 bytes, 10000 examples
  Download size: 45076618 bytes; total dataset size: 145028066 bytes
- Configuration: fold_2 (features identical to fold_1)
  Splits:
  - train: 86867948 bytes, 30000 examples
  - validation: 29516712 bytes, 10000 examples
  - test: 29516712 bytes, 10000 examples
  Download size: 45337784 bytes; total dataset size: 145901372 bytes
- Configuration: fold_3 (features identical to fold_1)
  Splits:
  - train: 87175881 bytes, 30000 examples
  - validation: 29208779 bytes, 10000 examples
  - test: 29208779 bytes, 10000 examples
  Download size: 45201250 bytes; total dataset size: 145593439 bytes
- Configuration: fold_4 (features identical to fold_1)
  Splits:
  - train: 87368897 bytes, 30000 examples
  - validation: 29015763 bytes, 10000 examples
  - test: 29015763 bytes, 10000 examples
  Download size: 45120027 bytes; total dataset size: 145400423 bytes

Data files per configuration:
- fold_1: train: fold_1/train-*, validation: fold_1/validation-*, test: fold_1/test-*
- fold_2: train: fold_2/train-*, validation: fold_2/validation-*, test: fold_2/test-*
- fold_3: train: fold_3/train-*, validation: fold_3/validation-*, test: fold_3/test-*
- fold_4: train: fold_4/train-*, validation: fold_4/validation-*, test: fold_4/test-*
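
The byte counts in the card can be cross-checked with simple arithmetic: for each fold, the three split sizes should add up to the reported dataset size. The sketch below uses only numbers copied from the card; the `FOLDS` table and the `check_sizes` helper are illustrative names.

```python
# (train, validation, test, dataset_size) in bytes, copied from the card.
FOLDS = {
    "fold_1": (87741254, 28643406, 28643406, 145028066),
    "fold_2": (86867948, 29516712, 29516712, 145901372),
    "fold_3": (87175881, 29208779, 29208779, 145593439),
    "fold_4": (87368897, 29015763, 29015763, 145400423),
}


def check_sizes(folds=FOLDS):
    """Return {config: bool}: do the split sizes sum to dataset_size?"""
    return {
        name: train + val + test == total
        for name, (train, val, test, total) in folds.items()
    }
```

With these figures every fold checks out. In each fold the validation and test splits also report the same byte count, and train plus validation comes to 116384660 bytes in all four folds. This may indicate a 4-fold split over one shared pool of examples, but the card does not say so.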
Provider:
iahlt