JonyC/ScienceGlossary-NER_fit
Source: Hugging Face. Updated 2025-04-06; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/JonyC/ScienceGlossary-NER_fit
Description:
The ScienceGlossary-NER dataset contains scientific terms and phrases from various disciplines. Each term is accompanied by 3-4 example sentences that illustrate its usage or provide a definition; these sentences were generated with the google/flan-t5-xl model. All sentences are tokenized with spaCy so that tokens and labels stay aligned during NER training. Each token carries a label in the BIO tagging scheme indicating whether it is part of a scientific term. The dataset was created through web scraping and AI generation, and is intended to support scientific entity recognition and to improve models that simplify scientific texts.
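To make the BIO scheme mentioned in the description concrete, here is a minimal sketch of how one sentence could be labeled once it has been tokenized. It is an illustration only, not the dataset author's pipeline: the function name `bio_tag`, the label name `TERM`, and the example sentence are assumptions, and the token list is written by hand where the dataset would use spaCy's tokenizer output.

```python
def bio_tag(tokens, term_tokens, label="TERM"):
    """Return one BIO tag per token, marking each occurrence of term_tokens.

    `label` is a hypothetical tag name; the dataset's real label names
    are not stated in the description.
    """
    tags = ["O"] * len(tokens)
    n = len(term_tokens)
    if n == 0:
        return tags

    # Case-insensitive matching so "Black hole" and "black hole" both match.
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in term_tokens]

    i = 0
    while i <= len(tokens) - n:
        if lowered[i:i + n] == target:
            tags[i] = f"B-{label}"          # first token of the term
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"      # remaining tokens of the term
            i += n                          # skip past the matched span
        else:
            i += 1
    return tags


# Hand-written tokens standing in for spaCy output (assumption).
tokens = ["The", "black", "hole", "bends", "light", "."]
tags = bio_tag(tokens, ["black", "hole"])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Tokens outside any term receive `O`, the first token of a term receives `B-`, and any further tokens of the same term receive `I-`, which is what lets a model recover multi-word terms as single entities.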
Provider:
JonyC



