
MaterialBERT for Natural Language Processing of Materials Science Texts

Taylor & Francis Group | Updated: 2022-12-12 | Indexed: 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/MaterialBERT_for_Natural_Language_Processing_of_Materials_Science_Texts/21130151/1
Description:
A BERT (Bidirectional Encoder Representations from Transformers) model, which we named “MaterialBERT,” was generated using a corpus of scientific papers covering a wide range of materials science. A new vocabulary list for the tokenizer was generated from this materials science corpus. Two BERT models with different tokenizer vocabulary lists were generated: one using the original list made by Google and the other using the list newly made by the authors. Word vectors embedded during pre-training of the two MaterialBERT models reasonably reflect the meanings of material names, both in material-class clustering and in the relationship between base materials and their compounds or derivatives. This holds not only for inorganic materials but also for organic materials and organometallic compounds. Fine-tuning on CoLA (the Corpus of Linguistic Acceptability) starting from the pre-trained MaterialBERT gave a higher score than the original BERT. The two MaterialBERTs could also be used as a starting point for transfer learning of a narrower, domain-specific BERT.
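BERT tokenizers split each word against their vocabulary list using WordPiece, a greedy longest-match-first lookup in which continuation pieces carry a "##" prefix. The following minimal sketch illustrates why a materials-science vocabulary can matter. It assumes two tiny hypothetical vocabularies (`general_vocab` and `materials_vocab`, not the actual Google or MaterialBERT lists) and a hypothetical helper `wordpiece_tokenize`; it is not the authors' tokenizer code.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word.

    Continuation pieces carry the "##" prefix, as in BERT vocabularies.
    """
    tokens: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabularies (assumptions, not the real lists).
general_vocab = {"per", "##ov", "##sk", "##ite"}
materials_vocab = general_vocab | {"perovskite"}

print(wordpiece_tokenize("perovskite", general_vocab))
print(wordpiece_tokenize("perovskite", materials_vocab))
```

With the general toy vocabulary, "perovskite" is broken into four pieces (`per`, `##ov`, `##sk`, `##ite`), whereas the domain vocabulary keeps it as one token, so the model can learn a single embedding for the material name.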
Provided by:
Kawano, Hiroyuki; Teraoka, Hiroshi; Sato, Fumitaka; Yoshitake, Michiko
Created:
2022-09-16