WhoisWho
arXiv · Updated 2020-07-04 · Indexed 2024-06-21
Download link:
https://www.aminer.cn/billboard/whoiswho
Description:
WhoisWho is a large-scale, manually labeled author name disambiguation dataset created by Tsinghua University. It contains 399,255 documents and 45,187 authors, covering 421 common author names. The data is sampled from the AMiner database and labeled through a purpose-built annotation framework intended to improve annotation efficiency and accuracy. The annotation process combines human and computer effort to resolve author name ambiguity. The dataset is suited to evaluating and improving name disambiguation algorithms, which in turn supports the accuracy of academic digital records.
Provider:
Tsinghua University
Created:
2020-07-04



