SEED-ML: A Multi-Parametric Clinical Dataset on Male Infertility for Predictive Modeling and AI Research.

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/sc8rsz2vd7

Description:

Authors: N. Sánchez-Gómez [1] (nicolassg@us.es), J.A. García-García [*, 1] (juliangg@us.es), J. Navarro-Pando [2,3,4,5] (jose.navarro@inebir.com), MJ Escalona-Cuaresma [1] (mjescalona@us.es).

Affiliations: [1] ES3 Group (Engineering and Science for Software Systems group), University of Seville, Avenida Reina Mercedes, s/n., 41012, Seville, Spain. [2] Cátedra de Reproducción y Genética Humana del Instituto para el Estudio de la Biología de la Reproducción Humana (INEBIR), Seville, Spain. [3] Universidad Europea del Atlántico (UNEATLANTICO), Santander, Spain. [4] Fundación Universitaria Iberoamericana (FUNIBER), Seville, Spain. [5] San Juan de Dios Hospital, Seville, Spain.

Abstract: SEED-ML (Semen Examination and Evaluation Dataset for Machine Learning) is an openly available, multi-parametric clinical dataset designed to support research on the diagnosis and prediction of male infertility. The dataset comprises records from 10,124 patients, including detailed semen analysis parameters (pre- and post-treatment), morphological classifications, and clinical alterations. Infertility diagnosis is categorized into nine clinically relevant classes, ranging from normal fertility to complex multi-factor conditions such as oligoasthenoteratozoospermia. All data were anonymized and curated following strict ethical and privacy guidelines to ensure compliance with applicable medical data protection regulations.

The dataset reflects real-world clinical distributions: class prevalence ranges from 62.7% (Normozoospermia) to 0.16% (Azoospermia), providing a high-fidelity benchmark for testing machine learning algorithms under significant class imbalance. SEED-ML is a resource for developing and benchmarking machine learning models, enabling research in predictive analytics, decision support systems, and computational andrology. The dataset aims to facilitate interdisciplinary collaboration between clinicians, data scientists, and artificial intelligence (AI) researchers, accelerating the development of data-driven solutions in reproductive medicine. It is publicly available on Mendeley Data under a CC BY 4.0 license.
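To give a rough sense of the class imbalance described in the abstract, the following Python sketch works through the arithmetic using only the figures reported above (10,124 patients, nine classes, and the 62.7% and 0.16% class shares). The patient counts it derives are approximations from rounded percentages, not values taken from the data files, and the inverse-frequency weighting formula is one common convention assumed here for illustration, not a method prescribed by the dataset authors. The shares of the other seven classes are not given in this listing, so they are left out.

```python
# Figures quoted in the abstract.
N_PATIENTS = 10_124
N_CLASSES = 9
REPORTED_SHARE = {
    "Normozoospermia": 0.627,   # most frequent class
    "Azoospermia": 0.0016,      # least frequent class
}


def approx_count(share: float, n_patients: int = N_PATIENTS) -> int:
    """Approximate number of patients in a class, from its rounded share."""
    return round(share * n_patients)


def balanced_weight(share: float, n_classes: int = N_CLASSES) -> float:
    """Inverse-frequency class weight: n_samples / (n_classes * n_in_class).

    Written in terms of the class share, this is 1 / (n_classes * share).
    This weighting is an assumption chosen for illustration.
    """
    return 1.0 / (n_classes * share)


if __name__ == "__main__":
    for name, share in REPORTED_SHARE.items():
        print(
            f"{name}: ~{approx_count(share)} patients, "
            f"weight ~{balanced_weight(share):.3f}"
        )

    # Ratio between the largest and smallest reported class shares.
    ratio = REPORTED_SHARE["Normozoospermia"] / REPORTED_SHARE["Azoospermia"]
    print(f"Imbalance ratio: ~{ratio:.0f}:1")

    # A model that always predicts the majority class would already reach
    # about this accuracy, so plain accuracy is a weak benchmark metric here.
    print(f"Majority-class baseline accuracy: {REPORTED_SHARE['Normozoospermia']:.1%}")
```

Under these assumptions the rarest class corresponds to roughly 16 patients, which suggests stratified splitting and per-class metrics (for example macro-averaged recall) when benchmarking on this dataset.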


Created:

2026-01-08
