five

AI used to diagnose and treat genetic diseases.

Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-04-16.
Download link:
https://data.mendeley.com/datasets/f63nhynzhx
Resource description:
Step 2: Data Collection & Preprocessing

We will need genetic datasets such as:
• 1000 Genomes Project (for genetic variants)
• ClinVar (for pathogenic mutations)
• GTEx (for gene expression)

Python Code for Data Loading and Preprocessing

Generate a Synthetic Genetic Dataset

This dataset will include:
• Gene Mutations (Encoded as numerical values)
• Expression Levels (Simulating gene expression data)
• Mutation Type (Categorical: Missense, Nonsense, Frameshift)
• Disease Labels (Binary classification: 0 = No Disease, 1 = Genetic Disease)

    import pandas as pd
    import numpy as np

    # Set random seed for reproducibility
    np.random.seed(42)

    # Generate data
    num_samples = 1000
    gene_mutations = np.random.randint(0, 10, num_samples)  # 10 different mutation types
    expression_levels = np.random.uniform(0.1, 10.0, num_samples)  # Simulated expression levels
    mutation_types = np.random.choice(["Missense", "Nonsense", "Frameshift"], num_samples)
    disease_labels = np.random.choice([0, 1], num_samples)  # 0 = No Disease, 1 = Disease

    # Create DataFrame
    df = pd.DataFrame({
        "Gene_Mutation": gene_mutations,
        "Expression_Level": expression_levels,
        "Mutation_Type": mutation_types,
        "Disease_Label": disease_labels
    })

    # Save to CSV
    df.to_csv("genetic_data.csv", index=False)
    print("Synthetic genetic dataset saved as 'genetic_data.csv'.")

       Gene_Mutation  Expression_Level Mutation_Type  Disease_Label
    0              6          2.634554      Missense              0
    1              3          7.288346      Missense              1
    2              7          5.970333    Frameshift              0
    3              4          1.111905    Frameshift              1
    4              6          9.195630      Missense              0

    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 4 columns):
     #   Column            Non-Null Count  Dtype
    ---  ------            --------------  -----
     0   Gene_Mutation     1000 non-null   int64
     1   Expression_Level  1000 non-null   float64
     2   Mutation_Type     1000 non-null   object
     3   Disease_Label     1000 non-null   int64
    dtypes: float64(1), int64(2), object(1)
    memory usage: 31.4+ KB
    None

       Gene_Mutation  Expression_Level  Mutation_Type  Disease_Label
    0              6          2.634554              1              0
    1              3          7.288346              1              1
    2              7          5.970333              0              0
    3              4          1.111905              0              1
    4              6          9.195630              1              0

    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 4 columns):
     #   Column            Non-Null Count  Dtype
    ---  ------            --------------  -----
     0   Gene_Mutation     1000 non-null   int64
     1   Expression_Level  1000 non-null   float64
     2   Mutation_Type     1000 non-null   int32
     3   Disease_Label     1000 non-null   int64
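The description does not include the step that turns Mutation_Type from text into integers between the first and second previews. The sketch below shows one way this could be done, assuming a fixed alphabetical mapping. The codes for Frameshift (0) and Missense (1) match the rows shown; the code for Nonsense (2), the name MUTATION_CODES, and the helper encode_mutation_type are assumptions for illustration and may differ from what the dataset author used.

```python
import pandas as pd

# Hypothetical mapping in alphabetical order. Frameshift -> 0 and Missense -> 1
# agree with the preview rows; Nonsense -> 2 is an assumption.
MUTATION_CODES = {"Frameshift": 0, "Missense": 1, "Nonsense": 2}


def encode_mutation_type(df):
    """Return a copy of df with Mutation_Type replaced by integer codes."""
    unknown = set(df["Mutation_Type"]) - set(MUTATION_CODES)
    if unknown:
        raise ValueError(f"Unmapped mutation types: {sorted(unknown)}")
    encoded = df.copy()
    # Cast to int32 to mirror the dtype reported in the second summary.
    encoded["Mutation_Type"] = (
        encoded["Mutation_Type"].map(MUTATION_CODES).astype("int32")
    )
    return encoded


# Stand-in for the first five rows of genetic_data.csv shown in the preview.
sample = pd.DataFrame({
    "Gene_Mutation": [6, 3, 7, 4, 6],
    "Expression_Level": [2.634554, 7.288346, 5.970333, 1.111905, 9.195630],
    "Mutation_Type": ["Missense", "Missense", "Frameshift", "Frameshift", "Missense"],
    "Disease_Label": [0, 1, 0, 1, 0],
})

encoded_sample = encode_mutation_type(sample)
print(encoded_sample)
```

An explicit dictionary keeps the mapping visible and fails loudly on unexpected categories. To apply it to the full file, the same helper could be called on the result of pd.read_csv("genetic_data.csv").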
Provider:
Mendeley Data
Created:
2025-02-24