Entity Normalization
Source: figshare.com | Updated: 2019-06-04 | Indexed: 2025-01-15
Download link:
https://figshare.com/articles/dataset/Entity_Normalization/8184365/1
Description:
These JSON documents contain mappings for materials science entity normalization. Each entity is mapped onto the most frequently occurring synonym that is not an acronym. We provide entity normalization for materials science properties (pro), applications (apl), sample descriptors (dsc), symmetry/phase labels (spl), synthesis methods (smt), and characterization methods (cmt). Each term has a "most common" entity to which it can be mapped. Sub-entities, which have also been normalized, are included as well.

*Please note: entities that occur infrequently in our corpus are unlikely to be normalized (and less likely to be normalized correctly). In line with Zipf's law as observed in NLP, infrequently occurring entities make up the largest portion of unique entities in the corpus, so a large fraction of the entities in these JSON files are not normalized. However, frequently occurring terms like "XRD" are very likely to be normalized, and to be normalized correctly.
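The selection rule described above (map every synonym onto its most frequent non-acronym form) can be illustrated with a short sketch. It is not the authors' code: the acronym test (`is_acronym`), the synonym groups with counts, the helper names, and the assumption that a released file is a flat JSON object of `{"term": "normalized term"}` are all hypothetical and should be checked against the actual files.

```python
import json
from collections import Counter


def is_acronym(term: str) -> bool:
    # Hypothetical heuristic: at least two letters and every letter upper-case
    # (e.g. "XRD", "FTIR"); hyphens, digits and spaces are ignored.
    letters = [c for c in term if c.isalpha()]
    return len(letters) >= 2 and all(c.isupper() for c in letters)


def canonical_form(synonym_counts: Counter) -> str:
    # Prefer the most frequent synonym that is not an acronym; if every
    # synonym looks like an acronym, fall back to the most frequent one.
    candidates = [t for t in synonym_counts if not is_acronym(t)] or list(synonym_counts)
    # Highest count wins; alphabetical order breaks ties deterministically.
    return min(candidates, key=lambda t: (-synonym_counts[t], t))


def build_mapping(groups: list[Counter]) -> dict[str, str]:
    # Map every member of each synonym group onto that group's canonical form.
    mapping: dict[str, str] = {}
    for counts in groups:
        target = canonical_form(counts)
        for term in counts:
            mapping[term] = target
    return mapping


def load_mapping(path: str) -> dict[str, str]:
    # Assumes a flat JSON object of term -> normalized term (hypothetical schema).
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def normalize(term: str, mapping: dict[str, str]) -> str:
    # Terms missing from the mapping (mostly rare ones) are returned unchanged.
    return mapping.get(term, term)


# Illustrative counts only, not taken from the dataset.
mapping = build_mapping([Counter({"XRD": 120, "X-ray diffraction": 45, "x-ray diffraction": 30})])
print(normalize("XRD", mapping))
print(normalize("grazing-incidence XRD", mapping))
```

With these made-up counts, "XRD" is the most frequent synonym but is skipped as an acronym, so the group normalizes to "X-ray diffraction". An unseen term passes through unchanged, mirroring the note that rare entities are usually left un-normalized.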
Provider: figshare



