disi-unibo-nlp/PileUncopyrighted-NER-BIO
Source: Hugging Face | Updated: 2025-08-07 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/disi-unibo-nlp/PileUncopyrighted-NER-BIO
Description:
The dataset has three features: tokens (the tokenized text sequence), ner_tags (the named entity recognition label sequence, one label per token), and domain (the domain or industry category of the text). It provides a single training split of approximately 91,513 examples, about 156 MB in total. It is suited to named entity recognition and related information extraction tasks in natural language processing.
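To illustrate how the tokens and ner_tags features relate, here is a minimal sketch that groups BIO-tagged tokens into entity spans. It rests on assumptions not stated in the listing: that ner_tags holds string labels of the form `B-<type>`, `I-<type>`, and `O` (suggested only by "BIO" in the dataset name), and that the two sequences are aligned one-to-one. The function name `bio_to_spans`, the sample record, and its label names are hypothetical and are not taken from the dataset.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes string tags such as "B-person", "I-person", and "O".
    An I- tag that does not continue an open entity of the same type
    is treated like "O" and dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one first.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends any open entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Hypothetical record shaped like the three features described above.
record = {
    "tokens": ["Marie", "Curie", "worked", "in", "Paris", "."],
    "ner_tags": ["B-person", "I-person", "O", "O", "B-location", "O"],
    "domain": "biography",
}
print(bio_to_spans(record["tokens"], record["ner_tags"]))
```

If the actual dataset stores ner_tags as integer class IDs rather than strings, the IDs would first need to be mapped to label names before a function like this could be applied.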
Provider:
disi-unibo-nlp



