five

OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain

收藏
NIAID Data Ecosystem2026-03-14 收录
下载链接:
https://figshare.com/articles/dataset/OpticalBERT_and_OpticalTable-SQA_Text-_and_Table-Based_Language_Models_for_the_Optical-Materials_Domain/22306081
下载链接
链接失效反馈
官方服务:
资源简介:
Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a significant boost to state-of-the-art natural-language-processing (NLP) tasks. In this paper, we present two “materials-aware” text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain. These two models outperform BERT and previous state-of-the-art models in a variety of text-mining tasks about optical materials. We also release the first “materials-aware” table-based language model, OpticalTable-SQA. This is a querying facility that solicits answers to questions about optical materials using tabular information that pertains to this scientific domain. The OpticalTable-SQA model was realized by fine-tuning the Tapas-SQA model using a manually annotated OpticalTableQA data set which was curated specifically for this work. While preserving its sequential question-answering performance on general tables, the OpticalTable-SQA model significantly outperforms Tapas-SQA on optical-materials-related tables. All models and data sets are available to the optical-materials-science community.
创建时间:
2023-03-20
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作