shreyas-singh/autotrain-data-MedicalTokenClassification
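AutoTrain Dataset for project: MedicalTokenClassification
Dataset Description
- Language: The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: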
```json
[
  {
    "feat_id": "13104",
    "tokens": [ "Jackie", "Frank" ],
    "feat_pos_tags": [ 21, 21 ],
    "feat_chunk_tags": [ 5, 16 ],
    "tags": [ 3, 7 ]
  },
  {
    "feat_id": "9297",
    "tokens": [ "U.S.", "lauds", "Russian-Chechen", "deal", "." ],
    "feat_pos_tags": [ 21, 20, 15, 20, 7 ],
    "feat_chunk_tags": [ 5, 16, 16, 16, 22 ],
    "tags": [ 0, 8, 1, 8, 8 ]
  }
]
```
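In the sample, each per-token field carries one entry per token. A minimal sketch of that consistency check follows, using only the two records shown here; the helper name `is_aligned` is hypothetical and not part of the dataset or any library.

```python
import json

# The two records from the sample above, copied verbatim.
SAMPLE = json.loads("""
[
  {"feat_id": "13104", "tokens": ["Jackie", "Frank"],
   "feat_pos_tags": [21, 21], "feat_chunk_tags": [5, 16], "tags": [3, 7]},
  {"feat_id": "9297",
   "tokens": ["U.S.", "lauds", "Russian-Chechen", "deal", "."],
   "feat_pos_tags": [21, 20, 15, 20, 7],
   "feat_chunk_tags": [5, 16, 16, 16, 22],
   "tags": [0, 8, 1, 8, 8]}
]
""")

# Fields that are expected to hold one integer label per token.
PER_TOKEN_FIELDS = ("feat_pos_tags", "feat_chunk_tags", "tags")


def is_aligned(record):
    """Return True if every per-token field has exactly one entry per token."""
    n_tokens = len(record["tokens"])
    return all(len(record[field]) == n_tokens for field in PER_TOKEN_FIELDS)


for record in SAMPLE:
    print(record["feat_id"], is_aligned(record))
```

A check like this can be useful before training, since a token classification model generally needs labels and tokens to line up one-to-one.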
Dataset Fields
The dataset has the following fields:
```json
{
  "feat_id": "Value(dtype=string, id=None)",
  "tokens": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)",
  "feat_pos_tags": "Sequence(feature=ClassLabel(num_classes=47, names=[\", #, $, \"\", (, ), ,, ., :, CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD, NN, NNP, NNPS, NNS, NN|SYM, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB, ``], id=None), length=-1, id=None)",
  "feat_chunk_tags": "Sequence(feature=ClassLabel(num_classes=23, names=[B-ADJP, B-ADVP, B-CONJP, B-INTJ, B-LST, B-NP, B-PP, B-PRT, B-SBAR, B-UCP, B-VP, I-ADJP, I-ADVP, I-CONJP, I-INTJ, I-LST, I-NP, I-PP, I-PRT, I-SBAR, I-UCP, I-VP, O], id=None), length=-1, id=None)",
  "tags": "Sequence(feature=ClassLabel(num_classes=9, names=[B-LOC, B-MISC, B-ORG, B-PER, I-LOC, I-MISC, I-ORG, I-PER, O], id=None), length=-1, id=None)"
}
```
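The integer values in `tags` and `feat_chunk_tags` are indices into the `names` lists of the corresponding `ClassLabel` features. The sketch below decodes the second sample record by plain list indexing. It assumes the label order is exactly as listed in the field description; the `decode` helper is hypothetical. The POS labels are left out because their punctuation entries are ambiguous in the listing.

```python
# Label names copied from the field description, in the order listed there.
NER_NAMES = [
    "B-LOC", "B-MISC", "B-ORG", "B-PER",
    "I-LOC", "I-MISC", "I-ORG", "I-PER", "O",
]
CHUNK_NAMES = [
    "B-ADJP", "B-ADVP", "B-CONJP", "B-INTJ", "B-LST", "B-NP", "B-PP",
    "B-PRT", "B-SBAR", "B-UCP", "B-VP", "I-ADJP", "I-ADVP", "I-CONJP",
    "I-INTJ", "I-LST", "I-NP", "I-PP", "I-PRT", "I-SBAR", "I-UCP",
    "I-VP", "O",
]


def decode(ids, names):
    """Map integer class ids to their label strings by list position."""
    return [names[i] for i in ids]


# Second record from the sample instances.
tokens = ["U.S.", "lauds", "Russian-Chechen", "deal", "."]
ner_labels = decode([0, 8, 1, 8, 8], NER_NAMES)
chunk_labels = decode([5, 16, 16, 16, 22], CHUNK_NAMES)

for token, ner, chunk in zip(tokens, ner_labels, chunk_labels):
    print(f"{token:16} {ner:8} {chunk}")
```

Under that assumption, "U.S." decodes to `B-LOC` and "Russian-Chechen" to `B-MISC`, with the remaining tokens tagged `O`.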
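Dataset Splits
The dataset is split into a training set and a validation set. The split sizes are as follows:
| Split name | Number of samples |
|---|---|
| train | 10014 |
| validation | 4028 |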
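As a quick piece of arithmetic on the table above, the sketch below computes each split's share of the total. The dictionary keys are illustrative names for the two splits, not identifiers confirmed by the source.

```python
# Split sizes copied from the table; the key names are illustrative.
SPLIT_SIZES = {"train": 10014, "validation": 4028}

total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, size in SPLIT_SIZES.items():
    print(f"{name}: {size} samples ({fractions[name]:.1%} of {total})")
```

The two splits sum to 14042 samples, which puts the ratio at roughly 71% training to 29% validation.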



