ICT2214Team7/Test_Dataset
Source: Hugging Face. Updated 2024-06-26; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/ICT2214Team7/Test_Dataset
Description:
This dataset contains text data with part-of-speech tags, chunk tags, and named entity recognition tags. The dataset is divided into training, validation, and test sets, containing 80, 10, and 10 samples respectively. Each sample includes features such as id, tokens, pos_tags, chunk_tags, and ner_tags. pos_tags are used for part-of-speech tagging, chunk_tags for chunking, and ner_tags for named entity recognition.
Provider:
ICT2214Team7
Original information summary
Dataset overview
Features
- id: string
- tokens: sequence of strings
- pos_tags: sequence of part-of-speech labels
  - Label names:
    - 0: "
    - 1: ''
    - 2: #
    - 3: $
    - 4: (
    - 5: )
    - 6: ,
    - 7: .
    - 8: :
    - 9: ``
    - 10: CC
    - 11: CD
    - 12: DT
    - 13: EX
    - 14: FW
    - 15: IN
    - 16: JJ
    - 17: JJR
    - 18: JJS
    - 19: LS
    - 20: MD
    - 21: NN
    - 22: NNP
    - 23: NNPS
    - 24: NNS
    - 25: NN|SYM
    - 26: PDT
    - 27: POS
    - 28: PRP
    - 29: PRP$
    - 30: RB
    - 31: RBR
    - 32: RBS
    - 33: RP
    - 34: SYM
    - 35: TO
    - 36: UH
    - 37: VB
    - 38: VBD
    - 39: VBG
    - 40: VBN
    - 41: VBP
    - 42: VBZ
    - 43: WDT
    - 44: WP
    - 45: WP$
    - 46: WRB
- chunk_tags: sequence of chunk (phrase) labels
  - Label names:
    - 0: O
    - 1: B-ADJP
    - 2: I-ADJP
    - 3: B-ADVP
    - 4: I-ADVP
    - 5: B-CONJP
    - 6: I-CONJP
    - 7: B-INTJ
    - 8: I-INTJ
    - 9: B-LST
    - 10: I-LST
    - 11: B-NP
    - 12: I-NP
    - 13: B-PP
    - 14: I-PP
    - 15: B-PRT
    - 16: I-PRT
    - 17: B-SBAR
    - 18: I-SBAR
    - 19: B-UCP
    - 20: I-UCP
    - 21: B-VP
    - 22: I-VP
    - 23: B-PNP
    - 24: I-PNP
- ner_tags: sequence of named-entity labels
  - Label names:
    - 0: O
    - 1: B-PER
    - 2: I-PER
    - 3: B-ORG
    - 4: I-ORG
    - 5: B-LOC
    - 6: I-LOC
    - 7: B-MISC
    - 8: I-MISC
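The tag columns store integer ids, so a reader usually has to map them back to label names before the B-/I-/O scheme becomes readable. A minimal sketch of that decoding, assuming the ner_tags label order listed above: the function names `decode_tags` and `bio_spans` and the sample sentence are illustrative and are not taken from the dataset itself.

```python
# Label names copied from the ner_tags list above; list index = integer tag id.
NER_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def decode_tags(tag_ids, label_names):
    """Map integer tag ids to their string labels."""
    return [label_names[i] for i in tag_ids]


def bio_spans(tokens, labels):
    """Group B-/I- labelled tokens into (entity_type, text) spans.

    Lenient by design: an I- tag with no matching open span starts a new one.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and kind == current_type:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        if current_type is not None:
            # Any other label closes the open span.
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            current_type, current_tokens = kind, [token]
        else:
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical sample in the same shape as one dataset row (not real data).
tokens = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
ner_tags = [3, 0, 7, 0, 0, 0, 7, 0, 0]

labels = decode_tags(ner_tags, NER_LABELS)
spans = bio_spans(tokens, labels)
print(spans)
```

The same two helpers should work for chunk_tags if the chunk label list is passed in place of `NER_LABELS`, since both columns use B-/I-/O prefixes.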
Dataset splits
- train:
  - Bytes: 44800
  - Samples: 80
- validation:
  - Bytes: 5542
  - Samples: 10
- test:
  - Bytes: 5280
  - Samples: 10
Dataset size
- Download size: 25019 bytes
- Total dataset size: 55622 bytes
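As a quick consistency check, the per-split byte counts can be summed and compared with the reported total, and the split proportions derived from the sample counts. The sketch below uses only the numbers listed above; the dictionary layout and key names are an arbitrary choice for illustration.

```python
# Split statistics copied from the figures above.
SPLITS = {
    "train": {"num_bytes": 44800, "num_examples": 80},
    "validation": {"num_bytes": 5542, "num_examples": 10},
    "test": {"num_bytes": 5280, "num_examples": 10},
}
REPORTED_DATASET_SIZE = 55622  # total dataset size in bytes, as reported

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of all samples that falls into each split.
fractions = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}

# Average size of one sample per split, in bytes.
avg_bytes = {name: s["num_bytes"] / s["num_examples"] for name, s in SPLITS.items()}

print(total_bytes == REPORTED_DATASET_SIZE)
print(fractions)
print(avg_bytes)
```

The per-split bytes add up to the reported total, and the sample counts give an 80/10/10 split. The download size is smaller than the total, which would be consistent with the files being stored in a compressed format, although the card does not say so.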
Configuration
- config_name: default
- Data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
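The data file paths are glob patterns that assign repository files to splits. A minimal sketch of how they might be used, assuming the repository is still available under the id in the title and that the Hugging Face `datasets` package is installed. The helper names and the example file name are hypothetical, and `load_splits` needs network access, so it is defined here but not called.

```python
from fnmatch import fnmatch

REPO_ID = "ICT2214Team7/Test_Dataset"

# Split -> glob pattern, copied from the configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path):
    """Return the split whose pattern matches a repository file path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits():
    """Load all three splits from the Hub (requires `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the module loads without it
    return load_dataset(REPO_ID, data_files=DATA_FILES)


# Hypothetical file name, used only to show how the patterns resolve.
print(split_for("data/train-00000-of-00001.parquet"))
print(split_for("README.md"))
```

Because the card declares a default configuration with these same patterns, calling `load_dataset(REPO_ID)` without `data_files` would be expected to resolve the same splits. Passing the mapping explicitly simply makes the assumption visible.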



