Entity Normalization
Source: Figshare. Updated: 2019-06-04. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/Entity_Normalization/8184365
Description:
These JSON documents contain mappings for materials science entity normalization. Each entity is mapped onto its most frequently occurring synonym that is not an acronym. We provide entity normalization for materials science properties (pro), applications (apl), sample descriptors (dsc), symmetry/phase labels (spl), synthesis methods (smt), and characterization methods (cmt). Each term has a "most common" entity to which it can be mapped. Sub-entities are also included and have likewise been normalized.
*Please note: entities that occur infrequently in our corpus are unlikely to be normalized (and less likely to be normalized correctly). In line with Zipf's law for NLP, infrequently occurring entities make up the largest portion of unique entities in the corpus, so a large fraction of the entities in these JSON files are not normalized. However, frequently occurring terms like "XRD" are very likely to be normalized and should be normalized correctly.
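As a minimal sketch of how such a mapping might be applied, the Python snippet below looks up a term and falls back to the original string when no entry exists, which is the expected case for infrequent entities. The flat `{"term": "normalized term"}` layout and the sample entries are assumptions made for illustration only; the actual structure of the JSON files should be checked before use.

```python
import json

# Hypothetical excerpt standing in for a characterization-method (cmt) file.
# The flat {"term": "normalized term"} layout and these entries are
# assumptions for illustration, not contents of the actual dataset.
SAMPLE_CMT_JSON = """
{
  "XRD": "x-ray diffraction",
  "XRD patterns": "x-ray diffraction",
  "x-ray diffraction": "x-ray diffraction"
}
"""

mapping = json.loads(SAMPLE_CMT_JSON)


def normalize(term, mapping):
    """Return the mapped form of `term`, or `term` itself if it has no entry."""
    return mapping.get(term, term)


def is_changed(term, mapping):
    """True when the mapping rewrites `term` to a different string."""
    return normalize(term, mapping) != term


print(normalize("XRD", mapping))
print(normalize("some rare technique", mapping))
```

Falling back to the input string keeps rare, unmapped terms intact rather than raising an error. `is_changed` is a hypothetical helper for estimating how many terms in a list a given mapping actually rewrites.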
Created:
2019-06-04



