Tianjin Breast Cancer Classification Auxiliary Diagnosis Model Training Data

Zhejiang Province Data Intellectual Property Registration Platform | Updated 2024-01-13 | Indexed 2024-05-08
Download link:
https://www.zjip.org.cn/home/announce/trends/27346
Description:
This dataset is processed and curated from clinical samples for training auxiliary diagnosis AI models. It is intended to help such models learn breast cancer subtyping on Tianjin clinical samples, extract features, and discover patterns, with the goal of improving the models' accuracy, robustness, and generalization.

1. Data collection: Anonymized clinical sample data are obtained from medical institutions under formal cooperation agreements. The fields include whether a postoperative pathology result exists, postoperative ER (estrogen receptor) status, postoperative PR (progesterone receptor) status, postoperative HER2 status, and postoperative FISH status, together with the postoperative HER2 negative/positive label recorded in the institution's system.
2. Data processing: The data are checked to confirm that every record is de-identified, fully anonymized, and cannot be re-identified. Records without a pathology result are removed, abnormal records are cleaned out, and partially missing values are filled by generative imputation.
3. Data labeling: Breast cancer classification labels are generated from the original data by a fixed rule: if ER is positive, PR is positive, and HER2 is negative, the sample is labeled as the Luminal type; otherwise it is labeled as another type.
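The labeling rule in step 3, together with the removal of records that lack a pathology result in step 2, can be sketched in a few lines of Python. This is only an illustrative sketch: the field names (`has_pathology`, `er_positive`, `pr_positive`, `her2_positive`) and the label strings are assumptions, not the dataset's actual schema, and the listing does not say how equivocal HER2 results are resolved with FISH, so the sketch assumes each marker has already been reduced to a boolean.

```python
LUMINAL = "Luminal"
OTHER = "Other"


def label_subtype(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Apply the rule from the description: ER+, PR+ and HER2- means Luminal."""
    if er_positive and pr_positive and not her2_positive:
        return LUMINAL
    return OTHER


def label_records(records: list[dict]) -> list[dict]:
    """Drop records without a pathology result, then attach a subtype label.

    The dictionary keys are hypothetical names chosen for this sketch.
    """
    labeled = []
    for rec in records:
        if not rec.get("has_pathology"):
            continue  # step 2: records without a pathology result are removed
        subtype = label_subtype(
            rec["er_positive"], rec["pr_positive"], rec["her2_positive"]
        )
        labeled.append({**rec, "subtype": subtype})
    return labeled


if __name__ == "__main__":
    samples = [
        {"has_pathology": True, "er_positive": True, "pr_positive": True, "her2_positive": False},
        {"has_pathology": True, "er_positive": True, "pr_positive": False, "her2_positive": False},
        {"has_pathology": False, "er_positive": True, "pr_positive": True, "her2_positive": False},
    ]
    for row in label_records(samples):
        print(row["subtype"])
```

Because the rule is a pure function of three markers, the label adds no information beyond those fields; a model trained on them can reproduce it exactly, which is worth keeping in mind when evaluating accuracy on this data.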
Provider:
杭州智圆惠方科技有限公司
Created:
2023-12-29
Collected summary
Dataset introduction
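Features
The dataset contains 150 clinical records of breast cancer patients in Tianjin, intended for training auxiliary diagnosis AI models to improve the accuracy and generalization of breast cancer classification. The data are anonymized, labeled according to a fixed algorithmic rule, and updated once a year.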
The content above was collected and summarized by 遇见数据集.