WDV

Source: Figshare. Updated 2022-05-04; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/WDV/17159045
Description:
WDV is a dataset for the verbalisation of Wikidata triples, described in detail in the paper that accompanies it. It contains over 7.6k entries, each aligning a Wikidata claim (drawn from a broad collection of claims) with its verbalisation; about 1.4k of these entries carry crowdsourced annotations.

Each entry records: attributes of the claim itself, such as its Wikidata ID (claim id) and its rank (normal, deprecated or preferred); attributes of each of the claim's components (subject, predicate and object), including their Wikidata IDs (e.g. subject id), labels (e.g. subject label), descriptions (e.g. subject desc) and aliases (e.g. subject alias); a JSON representation of the object together with its type (object datatype) as defined by Wikidata; attributes of the claim's theme, such as its root class's Wikidata ID (theme root class id) and label (theme label); the aligned verbalisation, both before and after tokens unknown to the model are replaced (verbalisation unk replaced); the weight assigned to the entry by the stratified sampling process; and, for the annotated entries only, the crowdsourced annotations and their aggregations.

WDV is a 3-star dataset under the 5-star deployment scheme for Linked Data: it is available on the web in a structured, machine-readable and non-proprietary format. It is intended to help manage reference quality in Wikidata by closing the gap in form between the data stored in the knowledge graph (KG) and the text of its sources, and it has already enabled work towards automated fact verification in Wikidata.
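To make the entry structure described above concrete, here is a minimal Python sketch that models a subset of an entry's attributes and computes a sampling-weighted mean over the annotated entries. Everything in it is an assumption: the snake_case field names (including predicate_label and a single numeric aggregated_score) are hypothetical stand-ins rather than the dataset's documented schema, the input is assumed to be a JSON array of objects, and using sampling_weight as the weight in a weighted average is an illustrative guess at how stratified sampling weights might be applied, not a method taken from the paper.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class WDVEntry:
    # Hypothetical field names; check them against the released files before use.
    claim_id: str
    rank: str  # "normal", "deprecated" or "preferred"
    subject_label: str
    predicate_label: str
    object_datatype: str
    verbalisation: str
    sampling_weight: float
    # Hypothetical single numeric aggregate; None for unannotated entries.
    aggregated_score: Optional[float] = None


# Names of the attributes this sketch models; any other keys are dropped on load.
_KNOWN = {f.name for f in fields(WDVEntry)}


def load_entries(raw_json: str) -> list[WDVEntry]:
    """Parse a JSON array of objects, ignoring keys WDVEntry does not model."""
    return [
        WDVEntry(**{k: v for k, v in obj.items() if k in _KNOWN})
        for obj in json.loads(raw_json)
    ]


def annotated(entries: list[WDVEntry]) -> list[WDVEntry]:
    """Keep only entries that carry an aggregated annotation."""
    return [e for e in entries if e.aggregated_score is not None]


def weighted_mean_score(entries: list[WDVEntry]) -> float:
    """Sampling-weighted mean of aggregated_score over the annotated entries."""
    rows = annotated(entries)
    total = sum(e.sampling_weight for e in rows)
    if total <= 0:
        raise ValueError("no annotated entries with positive total weight")
    return sum(e.sampling_weight * e.aggregated_score for e in rows) / total


if __name__ == "__main__":
    # Synthetic toy records for illustration only; not taken from WDV.
    toy = json.dumps([
        {"claim_id": "c1", "rank": "normal", "subject_label": "A",
         "predicate_label": "p", "object_datatype": "string",
         "verbalisation": "A p x.", "sampling_weight": 1.0,
         "aggregated_score": 4.0, "extra_key": "ignored"},
        {"claim_id": "c2", "rank": "preferred", "subject_label": "B",
         "predicate_label": "q", "object_datatype": "quantity",
         "verbalisation": "B q 3.", "sampling_weight": 3.0,
         "aggregated_score": 2.0},
        {"claim_id": "c3", "rank": "normal", "subject_label": "C",
         "predicate_label": "r", "object_datatype": "time",
         "verbalisation": "C r 1900.", "sampling_weight": 5.0},
    ])
    entries = load_entries(toy)
    print(len(annotated(entries)))       # 2
    print(weighted_mean_score(entries))  # (1*4 + 3*2) / 4 = 2.5
```

Unknown keys are ignored when loading, so extra attributes do not break parsing; a missing required field still raises a TypeError from the dataclass constructor. Unannotated entries contribute nothing to the weighted mean, mirroring the fact that only part of the dataset is annotated.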
Created:
2022-05-04