SESAME
Source: arXiv | Updated: 2019-08-13 | Indexed: 2024-06-21
Download link:
https://sesame-pt.github.io
Overview:
SESAME is a dataset for Portuguese named entity recognition (NER), created by researchers from the Department of Computer Science at the Pontifical Catholic University of Rio de Janeiro. It contains 3,650,909 sentences totaling 87,769,158 tokens (roughly 24 tokens per sentence on average), drawn mainly from Wikipedia and DBpedia. The annotations were generated automatically: the researchers used the links between DBpedia and Wikipedia, together with their structured data, to label entities without manual annotation. The dataset is intended for NER tasks in natural language processing (NLP), where its scale makes it possible to train complex neural network models and thereby improve entity recognition accuracy.
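To make the idea of link-based automatic annotation concrete, the following is a minimal sketch and not the SESAME authors' actual pipeline. It assumes that each Wikipedia link in a sentence is already available as a token span pointing at a page title, and that a lookup table stands in for DBpedia type information. The names `ENTITY_TYPES` and `tag_sentence`, the label set, and the BIO tagging scheme are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for DBpedia type information:
# maps a linked page title to a coarse entity label.
ENTITY_TYPES: Dict[str, str] = {
    "Machado_de_Assis": "PER",
    "Rio_de_Janeiro": "LOC",
}

# A link is (start_token, end_token_exclusive, target_page_title).
Link = Tuple[int, int, str]


def tag_sentence(tokens: List[str], links: List[Link]) -> List[str]:
    """Project link spans onto tokens as BIO tags (illustrative only)."""
    tags = ["O"] * len(tokens)
    for start, end, target in links:
        label = ENTITY_TYPES.get(target)
        if label is None:
            # Link target has no known type: leave its tokens untagged.
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


tokens = ["Machado", "de", "Assis", "nasceu", "no", "Rio", "de", "Janeiro", "."]
links = [(0, 3, "Machado_de_Assis"), (5, 8, "Rio_de_Janeiro")]
tags = tag_sentence(tokens, links)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this sketch, a link whose target has no entry in the lookup table leaves its tokens tagged `O`, which is one simple way to avoid guessing a label for entities of unknown type.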
Provided by:
Department of Computer Science, Pontifical Catholic University of Rio de Janeiro
Created:
2019-08-13



