
VoxEL

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/VoxEL/6104759
Description:
This dataset provides manual annotations of entity mentions with respect to Wikipedia over the same text written in five languages: German (de), English (en), Spanish (es), French (fr) and Italian (it). It comprises 15 annotated news articles in each of the 5 languages (75 articles in total), with the same number of sentences in each language and the same set of annotations for each corresponding sentence across languages. Each language has a total of 94 sentences across the 15 articles. We provide two annotated versions of the dataset: a strict version that annotates only persons, organizations and places (following, for example, traditional NER/MUC definitions of an entity), and a relaxed version that includes a larger number of annotations (e.g., capturing entity mentions such as “inflation” that have a corresponding Wikipedia article). Both versions share the same text in the same languages. The strict version has 204 annotations per language, while the relaxed version has 674 annotations per language.
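The per-language figures above imply corpus-wide totals that the description does not state directly. The following is a minimal sketch that derives them by simple multiplication. The constant and function names are illustrative assumptions, not part of the dataset or any official tooling, and the averages assume annotations are simply divided by the sentence count.

```python
# Figures taken from the dataset description; all names here are
# hypothetical and chosen only for this illustration.
LANGUAGES = ["de", "en", "es", "fr", "it"]
ARTICLES_PER_LANGUAGE = 15
SENTENCES_PER_LANGUAGE = 94
ANNOTATIONS_PER_LANGUAGE = {"strict": 204, "relaxed": 674}


def corpus_totals():
    """Multiply the per-language counts by the number of languages."""
    n = len(LANGUAGES)
    return {
        "articles": ARTICLES_PER_LANGUAGE * n,
        "sentences": SENTENCES_PER_LANGUAGE * n,
        "annotations": {
            version: count * n
            for version, count in ANNOTATIONS_PER_LANGUAGE.items()
        },
    }


def mean_annotations_per_sentence():
    """Average annotation density per sentence, identical in every language."""
    return {
        version: count / SENTENCES_PER_LANGUAGE
        for version, count in ANNOTATIONS_PER_LANGUAGE.items()
    }


totals = corpus_totals()
density = mean_annotations_per_sentence()
print(totals)
print({version: round(value, 2) for version, value in density.items()})
```

Because every language carries the same sentences and annotations, the density is the same for all five languages; the relaxed version is markedly denser than the strict one.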
Created:
2018-04-05