five

DataSheet1_Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks.xlsx

收藏
NIAID Data Ecosystem2026-03-13 收录
下载链接:
https://figshare.com/articles/dataset/DataSheet1_Using_Image_Recognition_to_Process_Unbalanced_Data_in_Genetic_Diseases_From_Biobanks_xlsx/19128512
下载链接
链接失效反馈
官方服务:
资源简介:
With precision medicine as the goal, the human biobank of each country should be analyzed to determine the complete research results related to genetic diseases. In addition, with the increase in medical imaging data, automatic image processing with image recognition has been widely studied and applied in biomedicine. However, case–control data imbalance often occurs in human biobanks, which is usually solved by the statistical method SAIGE. Due to the huge amount of genetic data in human biobanks, the direct use of the SAIGE method often faces the problem of insufficient computer memory to support calculations and excessive calculation time. The other method is to use sampling to adjust the data to balance the case–control ratio, which is called Synthetic Minority Oversampling Technique (SMOTE). Our study employed the Manhattan plot and genetic disease information from the Taiwan Biobank to adjust the imbalance in the case–control ratio by SMOTE, called “TW-SMOTE.” We further used a deep learning image recognition system to identify the TW-SMOTE. We found that TW-SMOTE can achieve the same results as that of SAIGE and the UK Biobank (UKB). The processing of the technical data can be equivalent to the use of data plots with a relatively large UKB sample size and achieve the same effect as that of SAIGE in addressing data imbalance.
创建时间:
2022-02-07
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作