Document-level Relation Extraction Method Based on Data Augmentation and Dynamic Threshold

China Scientific Data (中国科学数据), updated 2026-04-13, indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0070117
Resource description:
Relation extraction (RE) tasks in the biomedical field often suffer from data scarcity, class imbalance, and multi-label instances. To address these issues, this work proposes a method that combines data augmentation with a dynamic threshold strategy. First, a GPT model is fine-tuned with a custom loss function, and new data is generated from feature templates obtained with a Word2Vec model. Second, a BERT classifier screens the generated data, and the high-quality samples are combined with the original dataset to form a richer training set. Finally, a learnable dynamic threshold strategy adjusts the classification threshold according to document length and the difference between the model output and the ground-truth labels, enabling the model to handle multi-label documents flexibly. On two publicly available medical datasets, the method achieved F1 scores of 84.1% and 69.3%, which are 1.6 and 1.1 percentage points higher than those of the ATLOP method, respectively, verifying its effectiveness.
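The description does not give the threshold formula, so the following is only a minimal sketch of what a length-dependent classification threshold could look like at prediction time. The function names, the sigmoid-of-linear form, the parameters `w` and `b`, and the `max_len` normalizer are all assumptions made for illustration, not the authors' method. In a real system `w` and `b` would be learned from the gap between model outputs and labels; that training step is not shown here.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_threshold(doc_len: int, w: float, b: float, max_len: int = 512) -> float:
    """Hypothetical threshold: sigmoid of a linear function of normalized length.

    w and b stand in for learnable parameters (assumed, not from the source).
    """
    return sigmoid(w * (doc_len / max_len) + b)


def predict_labels(probs: List[float], doc_len: int, w: float, b: float) -> List[int]:
    """Return indices of relation labels whose probability exceeds the
    per-document threshold, so a document may receive zero, one, or many labels."""
    t = dynamic_threshold(doc_len, w, b)
    return [i for i, p in enumerate(probs) if p > t]
```

With `w = 0` and `b = 0` the threshold reduces to a fixed 0.5 regardless of length. A positive `w` raises the threshold for longer documents, which is one possible way length could enter the decision.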
Created:
2026-04-13