thangvip/cti-dataset-split
Dataset Overview
Dataset Configurations
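Default configuration (default)
- Features:
  - sentence_idx: integer (int64)
  - words: sequence of strings
  - POS: sequence of integers (int64)
  - tag: sequence of integers (int64)
- Splits:
  - train:
    - Size in bytes: 16917605
    - Number of examples: 17480
- Download size: 2164774 bytes
- Dataset size: 16917605 bytes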
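Subset 1 (subset1)
- Features:
  - sentence_idx: integer (int64)
  - words: sequence of strings
  - POS: sequence of integers (int64)
  - tag: sequence of integers (int64)
- Splits:
  - train:
    - Size in bytes: 13350196.989130436
    - Number of examples: 13794
- Download size: 2008529 bytes
- Dataset size: 13350196.989130436 bytes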
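Subset 2 (subset2)
- Features:
  - sentence_idx: integer (int64)
  - words: sequence of strings
  - POS: sequence of integers (int64)
  - tag: sequence of integers (int64)
- Splits:
  - test:
    - Size in bytes: 3338033.1604691073
    - Number of examples: 3449
- Download size: 502967 bytes
- Dataset size: 3338033.1604691073 bytes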
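All three configurations list the same four features, so a single check can cover records from any of them. The following is a minimal sketch of such a check, not part of the dataset itself. The helper name `validate_record` is hypothetical, and the requirement that `POS` and `tag` have the same length as `words` is an assumption (one label per token) that the card does not state explicitly.

```python
# Field names taken from the feature list above.
EXPECTED_FIELDS = ("sentence_idx", "words", "POS", "tag")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {name}"
                for name in EXPECTED_FIELDS if name not in record]
    if problems:
        return problems

    if not isinstance(record["sentence_idx"], int):
        problems.append("sentence_idx is not an integer")
    if not all(isinstance(word, str) for word in record["words"]):
        problems.append("words contains a non-string item")

    for name in ("POS", "tag"):
        if not all(isinstance(value, int) for value in record[name]):
            problems.append(f"{name} contains a non-integer item")
        # Assumption: labels are token-aligned, one per word.
        if len(record[name]) != len(record["words"]):
            problems.append(f"{name} length differs from words length")
    return problems


# Hypothetical record, made up for illustration (not taken from the dataset).
example = {
    "sentence_idx": 0,
    "words": ["The", "parser", "crashes"],
    "POS": [9, 17, 35],
    "tag": [23, 23, 23],
}
print(validate_record(example))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a record in a single pass.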
Data File Configuration
Default configuration (default)
- Data files:
  - train: data/train-*
Subset 1 (subset1)
- Data files:
  - train: subset1/train-*
Subset 2 (subset2)
- Data files:
  - test: subset2/test-*
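The configuration names and splits listed above can be passed to the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository is reachable. The helper `load_config` and the two lookup tables are hypothetical names introduced here, and the row counts are copied from the figures in this card rather than verified against the hosted files.

```python
REPO_ID = "thangvip/cti-dataset-split"

# Each configuration exposes exactly one split, per the card above.
CONFIG_SPLITS = {"default": "train", "subset1": "train", "subset2": "test"}

# Example counts as stated in this card.
EXPECTED_ROWS = {"default": 17480, "subset1": 13794, "subset2": 3449}


def load_config(config_name):
    """Load the single split of one configuration (requires network access)."""
    # Imported lazily so the lookup tables can be used without the library.
    from datasets import load_dataset

    split = CONFIG_SPLITS[config_name]
    return load_dataset(REPO_ID, config_name, split=split)


# Usage (downloads data, so it is left commented out here):
# ds = load_config("subset2")
# assert ds.num_rows == EXPECTED_ROWS["subset2"]
```

Note that `subset2` has only a `test` split, so requesting `split="train"` for it would fail.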
Dictionary Mappings
POS Tag Mapping
- POS to ID (Python):
  pos_2_id = {'#': 0, '$': 1, "''": 2, '(': 3, ')': 4, '.': 5, ':': 6, 'CC': 7, 'CD': 8, 'DT': 9, 'EX': 10, 'FW': 11, 'IN': 12, 'JJ': 13, 'JJR': 14, 'JJS': 15, 'MD': 16, 'NN': 17, 'NNP': 18, 'NNPS': 19, 'NNS': 20, 'PDT': 21, 'POS': 22, 'PRP': 23, 'PRP$': 24, 'RB': 25, 'RBR': 26, 'RBS': 27, 'RP': 28, 'TO': 29, 'VB': 30, 'VBD': 31, 'VBG': 32, 'VBN': 33, 'VBP': 34, 'VBZ': 35, 'WDT': 36, 'WP': 37, 'WP$': 38, 'WRB': 39}
- ID to POS (Python):
  id_2_pos = {0: '#', 1: '$', 2: "''", 3: '(', 4: ')', 5: '.', 6: ':', 7: 'CC', 8: 'CD', 9: 'DT', 10: 'EX', 11: 'FW', 12: 'IN', 13: 'JJ', 14: 'JJR', 15: 'JJS', 16: 'MD', 17: 'NN', 18: 'NNP', 19: 'NNPS', 20: 'NNS', 21: 'PDT', 22: 'POS', 23: 'PRP', 24: 'PRP$', 25: 'RB', 26: 'RBR', 27: 'RBS', 28: 'RP', 29: 'TO', 30: 'VB', 31: 'VBD', 32: 'VBG', 33: 'VBN', 34: 'VBP', 35: 'VBZ', 36: 'WDT', 37: 'WP', 38: 'WP$', 39: 'WRB'}
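Since the `POS` feature stores integer IDs, reading a record usually means mapping them back to tag strings. The sketch below shows one way to do that. For brevity it uses only four entries copied from the mapping above; the names `POS_2_ID_SUBSET`, `ID_2_POS_SUBSET` and `decode_pos` are hypothetical, and the fallback value for unknown IDs is an illustrative choice.

```python
# A few entries copied from the POS-to-ID mapping above (not the full table).
POS_2_ID_SUBSET = {"DT": 9, "JJ": 13, "NN": 17, "VBZ": 35}

# The reverse table can be derived instead of being maintained by hand.
ID_2_POS_SUBSET = {pos_id: pos for pos, pos_id in POS_2_ID_SUBSET.items()}


def decode_pos(pos_ids, id_2_pos, unknown="<UNK>"):
    """Map integer POS IDs to tag strings, using `unknown` for missing IDs."""
    return [id_2_pos.get(pos_id, unknown) for pos_id in pos_ids]


print(decode_pos([9, 13, 17, 35], ID_2_POS_SUBSET))
# ['DT', 'JJ', 'NN', 'VBZ']
```

Deriving the inverse dictionary from the forward one keeps the two directions consistent if the table is ever edited.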
Tag Mapping
- Tag to ID (Python):
  tag_2_id = {'B-application': 0, 'B-cve id': 1, 'B-edition': 2, 'B-file': 3, 'B-function': 4, 'B-hardware': 5, 'B-language': 6, 'B-method': 7, 'B-os': 8, 'B-parameter': 9, 'B-programming language': 10, 'B-relevant_term': 11, 'B-update': 12, 'B-vendor': 13, 'B-version': 14, 'I-application': 15, 'I-edition': 16, 'I-hardware': 17, 'I-os': 18, 'I-relevant_term': 19, 'I-update': 20, 'I-vendor': 21, 'I-version': 22, 'O': 23}
- ID to tag (Python):
  id_2_tag = {0: 'B-application', 1: 'B-cve id', 2: 'B-edition', 3: 'B-file', 4: 'B-function', 5: 'B-hardware', 6: 'B-language', 7: 'B-method', 8: 'B-os', 9: 'B-parameter', 10: 'B-programming language', 11: 'B-relevant_term', 12: 'B-update', 13: 'B-vendor', 14: 'B-version', 15: 'I-application', 16: 'I-edition', 17: 'I-hardware', 18: 'I-os', 19: 'I-relevant_term', 20: 'I-update', 21: 'I-vendor', 22: 'I-version', 23: 'O'}
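The `B-`/`I-`/`O` prefixes in the tag names suggest a BIO labelling scheme, in which `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. Assuming that reading is correct, entity spans can be recovered from a record roughly as sketched below. The function `extract_entities` is a hypothetical helper, the example sentence is made up for illustration, and dropping a stray `I-` tag that does not continue the current entity is one possible policy among several.

```python
# Tag names in ID order, copied from the ID-to-tag mapping above.
ID_2_TAG = [
    "B-application", "B-cve id", "B-edition", "B-file", "B-function",
    "B-hardware", "B-language", "B-method", "B-os", "B-parameter",
    "B-programming language", "B-relevant_term", "B-update", "B-vendor",
    "B-version", "I-application", "I-edition", "I-hardware", "I-os",
    "I-relevant_term", "I-update", "I-vendor", "I-version", "O",
]


def extract_entities(words, tag_ids):
    """Group tokens into (entity_type, text) pairs, assuming BIO tagging."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for word, tag_id in zip(words, tag_ids):
        prefix, _, label = ID_2_TAG[tag_id].partition("-")
        if prefix == "B":
            flush()
            current_type, current_tokens = label, [word]
        elif prefix == "I" and label == current_type:
            current_tokens.append(word)
        else:
            # "O", or an "I-" tag that does not continue the open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical sentence, not taken from the dataset.
words = ["Adobe", "Flash", "Player", "before", "11.2"]
tag_ids = [13, 0, 15, 23, 14]
print(extract_entities(words, tag_ids))
# [('vendor', 'Adobe'), ('application', 'Flash Player'), ('version', '11.2')]
```

Several entity types in the mapping (for example `file`, `function` and `cve id`) have a `B-` tag but no `I-` counterpart, so under this scheme they can only appear as single-token entities.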



