KenziL/autotrain-data-test
AutoTrain Dataset for project: test
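Dataset Description
This dataset was processed automatically by AutoTrain for the project test.
Languages
The dataset is in French; its BCP-47 language code is fr.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: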
```json
[
  {
    "tokens": [
      "CCI", "CCI", "CCI",
      "CCI bifocal G3, 7 et 25 mm", "CCI bifocal G3, 7 et 25 mm",
      "CCI",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "RO+ 20%", " RO+ 20%", "RO+", "RO+",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "RP-", "RP-",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "HER2 2+", "HER2 2+", "HER2 +", "HER2 +",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "Fish+", "Fish+",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "N+ 17/19", "N+ 17/19",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "CA15-3 : 12 UI", "CA15-3 : 12 UI",
      "18/04/2019 : mammectomie dt + CA", "18/04/2019 : mammectomie dt + CA",
      "PS-0", "PS-0", "PS-0", "PS-0",
      " 03/2020", "08/2020", " 03/2020", "08/2020"
    ],
    "tags": [
      28, 28, 28, 37, 37, 28, 14, 14, 29, 29, 29, 29, 32, 32, 33, 33,
      34, 34, 19, 19, 19, 19, 20, 20, 17, 17, 18, 18, 23, 23, 24, 24,
      6, 6, 7, 7, 27, 27, 27, 27, 12, 12, 12, 12
    ]
  },
  {
    "tokens": [
      "K sein D", "1992 : K sein D", "CA15-3 =1890", "CA 15-3 : 5200",
      "10/18", "11/21", "PS-2", "10/18"
    ],
    "tags": [28, 14, 6, 6, 7, 7, 27, 12]
  }
]
```
Dataset Fields
The dataset has the following fields (also called "features"):
```json
{
  "tokens": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)",
  "tags": "Sequence(feature=ClassLabel(names=[ALK, ALK_DATE, BRAF, BRAF_DATE, BRCA, BRCA_DATE, CA15-3, CA15-3_DATE, CK20, CK20_DATE, CK7, CK7_DATE, Date PS, Date arrêt traitement, Date du diagnostic de la tumeur primitive, EGFR, EGFR_DATE, FISH, FISH_DATE, HER2, HER2_DATE, KI67, KI67_DATE, N+, N+_DATE, PDL1, PDL1_DATE, PS, Premier type histologique de cancer, RO, ROS, ROS_DATE, RO_DATE, RP, RP_DATE, TTF1, TTF1_DATE, Taille de la tumeur primitive au diagnostic, motif arrêt traitement, récepteurs hormonaux, récepteurs_hormonaux_DATE], id=None), length=-1, id=None)"
}
```
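Since the `tags` field stores integer class ids, reading a sample usually means mapping each id back to its label name. The following is a minimal sketch of that mapping in plain Python, not AutoTrain's own code. It assumes the tag id is the zero-based position of the name in the `ClassLabel` list above, and that the escaped names decode to `arrêt` and `récepteurs`. The helper `decode_tags` and the constant `LABEL_NAMES` are hypothetical names introduced for illustration.

```python
# Label names copied from the ClassLabel definition in the field list.
# Assumption: a tag id is the zero-based index into this list.
LABEL_NAMES = [
    "ALK", "ALK_DATE", "BRAF", "BRAF_DATE", "BRCA", "BRCA_DATE",
    "CA15-3", "CA15-3_DATE", "CK20", "CK20_DATE", "CK7", "CK7_DATE",
    "Date PS", "Date arrêt traitement",
    "Date du diagnostic de la tumeur primitive",
    "EGFR", "EGFR_DATE", "FISH", "FISH_DATE", "HER2", "HER2_DATE",
    "KI67", "KI67_DATE", "N+", "N+_DATE", "PDL1", "PDL1_DATE", "PS",
    "Premier type histologique de cancer", "RO", "ROS", "ROS_DATE",
    "RO_DATE", "RP", "RP_DATE", "TTF1", "TTF1_DATE",
    "Taille de la tumeur primitive au diagnostic",
    "motif arrêt traitement", "récepteurs hormonaux",
    "récepteurs_hormonaux_DATE",
]


def decode_tags(tokens, tags):
    """Pair each token with the label name for its integer tag (hypothetical helper)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(tok, LABEL_NAMES[tag]) for tok, tag in zip(tokens, tags)]


# Second sample from the data instance shown earlier in this card.
sample = {
    "tokens": ["K sein D", "1992 : K sein D", "CA15-3 =1890", "CA 15-3 : 5200",
               "10/18", "11/21", "PS-2", "10/18"],
    "tags": [28, 14, 6, 6, 7, 7, 27, 12],
}

decoded = decode_tags(sample["tokens"], sample["tags"])
for token, label in decoded:
    print(f"{token!r:20} -> {label}")
```

When the dataset is loaded through the Hugging Face `datasets` library, the `ClassLabel` feature carries the same name list, so a hard-coded copy like this is mainly useful for quick inspection without that dependency.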
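Dataset Splits
The dataset is divided into a training split and a validation split. The split sizes are as follows:
| Split name | Number of samples |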
|---|---|
| train | 999 |
| valid | 508 |



