gagan3012/areta
Source: Hugging Face. Updated 2023-06-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gagan3012/areta
Resource description:
---
dataset_info:
features:
- name: 'Unnamed: 0'
dtype: int64
- name: raw
sequence: string
- name: corrected
sequence: string
- name: tag
sequence:
class_label:
names:
'0': ''
'1': MG
'2': MG+PC
'3': MG+PM
'4': MI
'5': MI+OA+OD+OH
'6': MI+OA+OH
'7': MI+OA+OR
'8': MI+OA+OR+PM
'9': MI+OD
'10': MI+OD+OH
'11': MI+OD+OH+OM
'12': MI+OD+OM
'13': MI+OD+OM+OR
'14': MI+OD+OM+PM
'15': MI+OD+OR
'16': MI+OD+OT
'17': MI+OD+PM
'18': MI+OH
'19': MI+OH+OM
'20': MI+OH+OM+OR
'21': MI+OH+OM+PM
'22': MI+OH+OT
'23': MI+OH+PM
'24': MI+OM
'25': MI+OM+OR
'26': MI+OM+OT
'27': MI+OM+PM
'28': MI+OR
'29': MI+OR+OT
'30': MI+OR+PM
'31': MI+OT
'32': MI+OT+PM
'33': MI+PM
'34': MI+PT
'35': MI+XM
'36': MT
'37': MT+OA+OR
'38': MT+OD
'39': MT+OD+OR
'40': MT+OD+XG
'41': MT+OD+XN
'42': MT+OH
'43': MT+OH+OM
'44': MT+OM
'45': MT+OR
'46': MT+PM
'47': MT+XG
'48': MT+XN
'49': O
'50': OA
'51': OA+OD+OR
'52': OA+OD+OR+PM
'53': OA+OD+OR+XT
'54': OA+OH
'55': OA+OH+OM
'56': OA+OH+OM+PM
'57': OA+OH+PM
'58': OA+OM+OR
'59': OA+OR
'60': OA+OR+PM
'61': OA+OR+XC
'62': OA+OR+XF
'63': OA+OR+XG
'64': OA+OR+XN
'65': OA+OR+XT
'66': OA+PM
'67': OC
'68': OC+PM
'69': OD
'70': OD+OG
'71': OD+OG+PM
'72': OD+OH
'73': OD+OH+OM
'74': OD+OH+OM+OR
'75': OD+OH+OM+PM
'76': OD+OH+OT
'77': OD+OH+PM
'78': OD+OH+XT
'79': OD+OM
'80': OD+OM+OR
'81': OD+OM+OR+PM
'82': OD+OM+OT
'83': OD+OM+PM
'84': OD+OM+XC
'85': OD+OM+XN
'86': OD+OR
'87': OD+OR+OT
'88': OD+OR+PM
'89': OD+OR+PM+XF
'90': OD+OR+XC
'91': OD+OR+XF
'92': OD+OR+XG
'93': OD+OR+XG+XN
'94': OD+OR+XN
'95': OD+OR+XT
'96': OD+OT
'97': OD+OT+PM
'98': OD+PM
'99': OD+PM+XC
'100': OD+PM+XG
'101': OD+PM+XN
'102': OD+XC
'103': OD+XC+XN
'104': OD+XF
'105': OD+XG
'106': OD+XG+XN
'107': OD+XN
'108': OD+XN+XT
'109': OD+XT
'110': OH
'111': OH+OM
'112': OH+OM+OR
'113': OH+OM+OR+OT
'114': OH+OM+OR+PM
'115': OH+OM+OR+XN
'116': OH+OM+OT
'117': OH+OM+OT+PM
'118': OH+OM+PM
'119': OH+OM+XC
'120': OH+OM+XF
'121': OH+OM+XG
'122': OH+OM+XG+XN
'123': OH+OM+XN
'124': OH+OM+XT
'125': OH+OR
'126': OH+OR+PM
'127': OH+OT
'128': OH+OT+PM
'129': OH+OT+XN
'130': OH+PM
'131': OH+PM+XC
'132': OH+PM+XG
'133': OH+PM+XN
'134': OH+PM+XT
'135': OH+PT
'136': OH+XC
'137': OH+XC+XN
'138': OH+XC+XT
'139': OH+XF
'140': OH+XG
'141': OH+XG+XN
'142': OH+XN
'143': OH+XT
'144': OM
'145': OM+ON
'146': OM+OR
'147': OM+OR+OT
'148': OM+OR+PM
'149': OM+OR+XC
'150': OM+OR+XN
'151': OM+OT
'152': OM+OT+PM
'153': OM+PM
'154': OM+XC
'155': OM+XF
'156': OM+XF+XG
'157': OM+XG
'158': OM+XG+XN
'159': OM+XN
'160': OM+XT
'161': 'ON'
'162': OR
'163': OR+OT
'164': OR+OT+PM
'165': OR+PM
'166': OR+PM+XC
'167': OR+PM+XG
'168': OR+PM+XN
'169': OR+PM+XT
'170': OR+XC
'171': OR+XC+XT
'172': OR+XF
'173': OR+XG
'174': OR+XG+XN
'175': OR+XN
'176': OR+XT
'177': OT
'178': OT+PC
'179': OT+PM
'180': OT+PM+XF
'181': OT+XF
'182': OT+XN
'183': OT+XT
'184': OW
'185': OW+PM
'186': PC
'187': PC+XM
'188': PC+XT
'189': PM
'190': PM+SF
'191': PM+SW
'192': PM+UNK
'193': PM+XC
'194': PM+XC+XG
'195': PM+XC+XN
'196': PM+XF
'197': PM+XG
'198': PM+XG+XN
'199': PM+XM
'200': PM+XM+XN
'201': PM+XN
'202': PM+XT
'203': PT
'204': PT+UNK
'205': PT+XM
'206': PT+XT
'207': S
'208': SF
'209': SW
'210': UC
'211': UNK
'212': X
'213': XC
'214': XC+XF
'215': XC+XG
'216': XC+XM
'217': XC+XN
'218': XC+XT
'219': XF
'220': XF+XG
'221': XF+XM
'222': XF+XN
'223': XF+XT
'224': XG
'225': XG+XM
'226': XG+XN
'227': XG+XT
'228': XM
'229': XM+XN
'230': XN
'231': XN+XT
'232': XT
'233': unk
splits:
- name: train
num_bytes: 34643782
num_examples: 19406
- name: validation
num_bytes: 1823164
num_examples: 1017
- name: test
num_bytes: 1734797
num_examples: 968
download_size: 11272127
dataset_size: 38201743
---
# Dataset Card for "areta"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
gagan3012
Summary of original information
Dataset overview
Features
- Unnamed: 0: data type int64.
- raw: sequence of string.
- corrected: sequence of string.
- tag: sequence of class labels. The label ids and names are:
- 0: '' (empty string)
- 1: MG
- 2: MG+PC
- 3: MG+PM
- 4: MI
- 5: MI+OA+OD+OH
- 6: MI+OA+OH
- 7: MI+OA+OR
- 8: MI+OA+OR+PM
- 9: MI+OD
- 10: MI+OD+OH
- 11: MI+OD+OH+OM
- 12: MI+OD+OM
- 13: MI+OD+OM+OR
- 14: MI+OD+OM+PM
- 15: MI+OD+OR
- 16: MI+OD+OT
- 17: MI+OD+PM
- 18: MI+OH
- 19: MI+OH+OM
- 20: MI+OH+OM+OR
- 21: MI+OH+OM+PM
- 22: MI+OH+OT
- 23: MI+OH+PM
- 24: MI+OM
- 25: MI+OM+OR
- 26: MI+OM+OT
- 27: MI+OM+PM
- 28: MI+OR
- 29: MI+OR+OT
- 30: MI+OR+PM
- 31: MI+OT
- 32: MI+OT+PM
- 33: MI+PM
- 34: MI+PT
- 35: MI+XM
- 36: MT
- 37: MT+OA+OR
- 38: MT+OD
- 39: MT+OD+OR
- 40: MT+OD+XG
- 41: MT+OD+XN
- 42: MT+OH
- 43: MT+OH+OM
- 44: MT+OM
- 45: MT+OR
- 46: MT+PM
- 47: MT+XG
- 48: MT+XN
- 49: O
- 50: OA
- 51: OA+OD+OR
- 52: OA+OD+OR+PM
- 53: OA+OD+OR+XT
- 54: OA+OH
- 55: OA+OH+OM
- 56: OA+OH+OM+PM
- 57: OA+OH+PM
- 58: OA+OM+OR
- 59: OA+OR
- 60: OA+OR+PM
- 61: OA+OR+XC
- 62: OA+OR+XF
- 63: OA+OR+XG
- 64: OA+OR+XN
- 65: OA+OR+XT
- 66: OA+PM
- 67: OC
- 68: OC+PM
- 69: OD
- 70: OD+OG
- 71: OD+OG+PM
- 72: OD+OH
- 73: OD+OH+OM
- 74: OD+OH+OM+OR
- 75: OD+OH+OM+PM
- 76: OD+OH+OT
- 77: OD+OH+PM
- 78: OD+OH+XT
- 79: OD+OM
- 80: OD+OM+OR
- 81: OD+OM+OR+PM
- 82: OD+OM+OT
- 83: OD+OM+PM
- 84: OD+OM+XC
- 85: OD+OM+XN
- 86: OD+OR
- 87: OD+OR+OT
- 88: OD+OR+PM
- 89: OD+OR+PM+XF
- 90: OD+OR+XC
- 91: OD+OR+XF
- 92: OD+OR+XG
- 93: OD+OR+XG+XN
- 94: OD+OR+XN
- 95: OD+OR+XT
- 96: OD+OT
- 97: OD+OT+PM
- 98: OD+PM
- 99: OD+PM+XC
- 100: OD+PM+XG
- 101: OD+PM+XN
- 102: OD+XC
- 103: OD+XC+XN
- 104: OD+XF
- 105: OD+XG
- 106: OD+XG+XN
- 107: OD+XN
- 108: OD+XN+XT
- 109: OD+XT
- 110: OH
- 111: OH+OM
- 112: OH+OM+OR
- 113: OH+OM+OR+OT
- 114: OH+OM+OR+PM
- 115: OH+OM+OR+XN
- 116: OH+OM+OT
- 117: OH+OM+OT+PM
- 118: OH+OM+PM
- 119: OH+OM+XC
- 120: OH+OM+XF
- 121: OH+OM+XG
- 122: OH+OM+XG+XN
- 123: OH+OM+XN
- 124: OH+OM+XT
- 125: OH+OR
- 126: OH+OR+PM
- 127: OH+OT
- 128: OH+OT+PM
- 129: OH+OT+XN
- 130: OH+PM
- 131: OH+PM+XC
- 132: OH+PM+XG
- 133: OH+PM+XN
- 134: OH+PM+XT
- 135: OH+PT
- 136: OH+XC
- 137: OH+XC+XN
- 138: OH+XC+XT
- 139: OH+XF
- 140: OH+XG
- 141: OH+XG+XN
- 142: OH+XN
- 143: OH+XT
- 144: OM
- 145: OM+ON
- 146: OM+OR
- 147: OM+OR+OT
- 148: OM+OR+PM
- 149: OM+OR+XC
- 150: OM+OR+XN
- 151: OM+OT
- 152: OM+OT+PM
- 153: OM+PM
- 154: OM+XC
- 155: OM+XF
- 156: OM+XF+XG
- 157: OM+XG
- 158: OM+XG+XN
- 159: OM+XN
- 160: OM+XT
- 161: ON
- 162: OR
- 163: OR+OT
- 164: OR+OT+PM
- 165: OR+PM
- 166: OR+PM+XC
- 167: OR+PM+XG
- 168: OR+PM+XN
- 169: OR+PM+XT
- 170: OR+XC
- 171: OR+XC+XT
- 172: OR+XF
- 173: OR+XG
- 174: OR+XG+XN
- 175: OR+XN
- 176: OR+XT
- 177: OT
- 178: OT+PC
- 179: OT+PM
- 180: OT+PM+XF
- 181: OT+XF
- 182: OT+XN
- 183: OT+XT
- 184: OW
- 185: OW+PM
- 186: PC
- 187: PC+XM
- 188: PC+XT
- 189: PM
- 190: PM+SF
- 191: PM+SW
- 192: PM+UNK
- 193: PM+XC
- 194: PM+XC+XG
- 195: PM+XC+XN
- 196: PM+XF
- 197: PM+XG
- 198: PM+XG+XN
- 199: PM+XM
- 200: PM+XM+XN
- 201: PM+XN
- 202: PM+XT
- 203: PT
- 204: PT+UNK
- 205: PT+XM
- 206: PT+XT
- 207: S
- 208: SF
- 209: SW
- 210: UC
- 211: UNK
- 212: X
- 213: XC
- 214: XC+XF
- 215: XC+XG
- 216: XC+XM
- 217: XC+XN
- 218: XC+XT
- 219: XF
- 220: XF+XG
- 221: XF+XM
- 222: XF+XN
- 223: XF+XT
- 224: XG
- 225: XG+XM
- 226: XG+XN
- 227: XG+XT
- 228: XM
- 229: XM+XN
- 230: XN
- 231: XN+XT
- 232: XT
- 233: unk
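The label list above maps integer ids to tag names, and many names join several codes with `+`. The following is a minimal sketch of decoding a sequence of tag ids into lists of individual codes. It assumes that `+` separates individual codes, which the card itself does not state. `TAG_NAMES` is only a small excerpt of the 234 entries, and `split_tag` and `decode` are hypothetical helper names introduced for illustration.

```python
# Excerpt of the id -> name mapping from the label list above
# (illustrative subset only; the full list has ids 0 through 233).
TAG_NAMES = {
    0: "",
    1: "MG",
    5: "MI+OA+OD+OH",
    189: "PM",
    211: "UNK",
    233: "unk",
}


def split_tag(name):
    """Split a compound name such as 'MI+OA+OD+OH' into its parts.

    Assumption: '+' joins individual codes. The empty name (id 0)
    yields an empty list.
    """
    return name.split("+") if name else []


def decode(tag_ids):
    """Map a sequence of integer tag ids to lists of individual codes."""
    return [split_tag(TAG_NAMES[i]) for i in tag_ids]


decoded = decode([0, 5, 189])
print(decoded)
```

When the dataset is loaded with the Hugging Face `datasets` library, the same id-to-name mapping should be available from the `tag` feature's `ClassLabel` object rather than from a hand-written dictionary; the excerpt here only keeps the sketch self-contained.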
Data splits
- train: 19406 examples, 34643782 bytes in total.
- validation: 1017 examples, 1823164 bytes in total.
- test: 968 examples, 1734797 bytes in total.
Dataset size
- Download size: 11272127 bytes
- Dataset size: 38201743 bytes
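As a quick consistency check on the figures above, the per-split byte counts can be summed and compared with the reported dataset size, and the share of examples in each split can be computed. This is only an arithmetic sketch using the numbers from this card; the names `SPLITS` and `DATASET_SIZE` are introduced here for illustration.

```python
# Split statistics copied from the card.
SPLITS = {
    "train": {"num_examples": 19406, "num_bytes": 34643782},
    "validation": {"num_examples": 1017, "num_bytes": 1823164},
    "test": {"num_examples": 968, "num_bytes": 1734797},
}
DATASET_SIZE = 38201743  # reported dataset_size in bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Fraction of all examples that falls into each split.
fractions = {
    name: s["num_examples"] / total_examples for name, s in SPLITS.items()
}

print(total_examples, total_bytes == DATASET_SIZE)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The split byte counts add up exactly to the reported dataset size, and the training split holds roughly nine tenths of the 21391 examples.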



