OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain
Indexed by NIAID Data Ecosystem on 2026-03-14
Download link:
https://figshare.com/articles/dataset/OpticalBERT_and_OpticalTable-SQA_Text-_and_Table-Based_Language_Models_for_the_Optical-Materials_Domain/22306081
Description:
Text mining in the optical-materials domain is becoming increasingly
important as the number of scientific publications in this area grows
rapidly. Language models such as Bidirectional Encoder Representations
from Transformers (BERT) have opened up a new era and brought a significant
boost to state-of-the-art natural-language-processing (NLP) tasks.
In this paper, we present two “materials-aware” text-based
language models for optical research, OpticalBERT and OpticalPureBERT,
which are trained on a large corpus of scientific literature in the
optical-materials domain. These two models outperform BERT and previous
state-of-the-art models in a variety of text-mining tasks about optical
materials. We also release the first “materials-aware”
table-based language model, OpticalTable-SQA. This is a querying facility
that solicits answers to questions about optical materials using tabular
information that pertains to this scientific domain. The OpticalTable-SQA
model was realized by fine-tuning the Tapas-SQA model using a manually
annotated OpticalTableQA data set which was curated specifically for
this work. While preserving its sequential question-answering performance
on general tables, the OpticalTable-SQA model significantly outperforms
Tapas-SQA on optical-materials-related tables. All models and data
sets are available to the optical-materials-science community.
Created:
2023-03-20



