# Datasets

### Appendix S1–S4

These files contain the methylation data, sample information, and predicted age for each target species or species group. The data in these files were used to build the age estimation models.

The filename identifies the target group:

* 'domestic cat' denotes the domestic cat.
* 'leopard cat' denotes the Tsushima leopard cat.
* 'panthera' denotes the *Panthera* species (i.e., jaguar, leopard, lion, snow leopard, and tiger).
* 'all' denotes all samples from all species.

### Appendix S5

For the best age estimation model of each species or species group, this file contains:

* the CpG selection results;
* the frequency with which each CpG site was selected in elastic net feature selection;
* the correlation coefficient between the methylation rate of each CpG site and chronological age;
* the NCBI sequence ID with position.

### CpG No renamed fulllist\_all felidae.csv

This file lists the CpGs contained in at least one species.

### M%+sampleinfo\*.csv

These files are versions of Appendix S1–S4 from before the predicted age was added.

### indextable\_skf\_cor\*.csv

Raw results of feature selection (correlation-based).

### indextable\_skf\_loio\_ela\*.csv

Raw results of feature selection (elastic net-based, leave-one-individual-out cross-validation).

### indextable\_skf\_loso(\_raw)\_ela\*.csv

Raw results of feature selection (elastic net-based, leave-one-species-out cross-validation).

*P.S. Appendix S1–S5 are referred to in our paper. The other files were used only in the analysis.*

# Description of the data sets and file structures

### Appendix S1–S4, M%+sampleinfo\*.csv

* Column headers beginning with amp3\_, amp4\_, amp8\_, amp9\_, or bs38\_ are the names of CpG sites, and these columns contain the methylation rates. The proximal genes and genomic positions of these sites are given in Appendix S5 and CpG No renamed fulllist\_all felidae.csv.
* Health_condition_ed: health condition at the time of sampling (good, diseased).
* Health_condition (Appendix S2–S4, species other than domestic cats): raw health condition data.
* Health condition information in Appendix S1 (domestic cats):
  * Health_condition_Healthy (column K): healthy sample
  * Health_condition_CKD (column L): sample with chronic kidney disease
  * Health_condition_Diabetes (column M): sample with diabetes
  * Health_condition_Cancer (column N): sample with cancer
  * Health_condition_DigestiveDisease (column O): sample with digestive diseases
  * Health_condition_Others (column P): sample with other diseases
* Fold: the data were split into five folds (0–4) with similar age and species distributions using stratified k-fold.
* Age_class: age class of each sample.
* Predictedage\_\*: age predicted with the methods below.

| Feature selection method | Regression method | Column name (after 'Predictedage\_') |
| ------------------------ | ----------------- | ------------------------------------ |
| elastic net | only once (a single elastic net run) | ela |
| elastic net | elastic net | ela\_ela |
| elastic net | SVMr | ela\_svmr |
| cor ≥ 0.5 | elastic net | cor0\_5\_ela |
| cor ≥ 0.7 | elastic net | cor0\_7\_ela |
| cor ≥ 0.5 | SVMr | cor0\_5\_svmr |
| cor ≥ 0.7 | SVMr | cos0\_7\_svmr |

* For Appendix S2 and M%+sampleinfo_leopardcat_paper_final_fold+ageclass.csv:
  * 'Age_stage_at_time_of_protection' gives each individual's age stage at the time it was taken into protection, as estimated by morphological methods.
  * 'Death_date' gives the date of death. An empty cell means the individual was still alive in 2023. These data were not used in the analysis.
  * Empty cells mean no data. Captive-born individuals have no data in 'Age_stage_at_time_of_protection'. Wild-born individuals have no data in 'Age', 'Health_condition_ed', 'Fold', and 'Age_class'. These columns are available only for captive-born individuals of known age. The predicted epigenetic age was calculated only with the best model and is given in 'Predictedage_ela_svmr'.
* For Appendix S3 and M%+sampleinfo_panthera_paper_final_fold+ageclass.csv, and for Appendix S4 and M%+sampleinfo_all_paper_final_fold+relative_ageclass.csv:
  * 'Predictedage\_\*\_loso(\_raw)' is the age predicted when the models were evaluated with leave-one-species-out cross-validation.
* For Appendix S4:
  * 'Predictedage\_\*' is the predicted relative age of each sample.
  * 'Predictedage\_\*\_chronoloical age' is the predicted chronological age under the best models.
  * Empty cells mean no data. Health conditions were summarized to different standards for domestic cats and for the other species. As a result, the health condition-related columns contain empty cells.

### Appendix S5, CpG No renamed fulllist\_all felidae.csv

* Columns E to M indicate whether each CpG site exists in each species group: 1 means the CpG exists in the species, and 0 means it does not.
  * Panthera_spp. (column L) covers the species in columns G–K (i.e., jaguar, leopard, lion, snow leopard, and tiger).
  * All_spp. (column M) covers all species.

### Appendix S5

* Column colours:
  * Green, yellow, orange, and red columns represent different levels of the correlation coefficient between the methylation rates of the selected CpG sites and chronological age.
  * White columns are CpG sites that were not selected.
  * Grey columns are CpG sites that do not exist in the species group.
* Columns named "Features in the best model (correlation_coefficient)—Elastic net + SVMr (frequency ≥ 4 or 5)" show the correlation coefficient between chronological age and the methylation rates of the features (i.e., CpGs) used in the best models.
  * Elastic net-based feature selection followed by SVMr regression (Elastic net + SVMr) produced the best models for all species groups.
  * For some species groups, the explanatory variables of the best model were the CpGs selected in at least four of the five training data sets (frequency ≥ 4).
  * For the other species groups, they were the CpGs selected in all five training data sets (frequency ≥ 5).
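Columns E to M of CpG No renamed fulllist\_all felidae.csv form a 0/1 presence matrix. The CpGs available for one species group can therefore be found by filtering on that group's column. This is a minimal sketch under a few assumptions:

* The file has a header row.
* The CpG names sit in a column headed `CpG`. That header name is a hypothetical placeholder, so check it against the real file.
* The rows in the example are invented for illustration.

```python
import csv
import io


def cpgs_present_in(csv_text, group_column, cpg_column="CpG"):
    """Return the CpG names whose 0/1 flag in `group_column` is 1.

    `cpg_column` defaults to "CpG", a hypothetical header name; replace it
    with the actual CpG-name header of the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[cpg_column] for row in reader
            if row[group_column].strip() == "1"]


# Toy presence matrix with made-up rows (not real data).
example = """CpG,Panthera_spp.,All_spp.
amp3_1,1,1
amp4_7,0,1
bs38_2,1,1
"""

print(cpgs_present_in(example, "Panthera_spp."))  # ['amp3_1', 'bs38_2']
```

For the real file, the text can be read with `open(path, newline="").read()` and passed in the same way.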
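The frequency thresholds for the best models (frequency ≥ 4 or ≥ 5 across the five training data sets) amount to counting how many training sets selected each CpG. This is a sketch, not the authors' code:

* It assumes the per-fold selections are already available as sets of CpG names. This input format is hypothetical and is not the layout of the indextable\_\*.csv files.
* The fold contents are invented.

```python
from collections import Counter


def stable_features(selected_per_fold, min_frequency):
    """Return the CpGs selected in at least `min_frequency` training sets.

    `selected_per_fold` holds one iterable of selected CpG names per
    training set (five here, one per fold).
    """
    counts = Counter()
    for selected in selected_per_fold:
        counts.update(set(selected))  # count each CpG at most once per fold
    return sorted(cpg for cpg, n in counts.items() if n >= min_frequency)


# Hypothetical selections from five training sets.
folds = [
    {"cpgA", "cpgB", "cpgC"},
    {"cpgA", "cpgB"},
    {"cpgA", "cpgB", "cpgD"},
    {"cpgA", "cpgC"},
    {"cpgA", "cpgB"},
]

print(stable_features(folds, 4))  # ['cpgA', 'cpgB']
print(stable_features(folds, 5))  # ['cpgA']
```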
# Code/Software

* 2023_Qi_etal_paper Rscript.R was run in R 4.3.1.
* 2023_Qi_etal_Pythonscript.py was run in Python 3.8.8.
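As a rough sketch of how the M%+sampleinfo\*.csv files can be read outside the authors' scripts, the columns can be split by header prefix:

* CpG methylation columns start with amp3\_, amp4\_, amp8\_, amp9\_, or bs38\_.
* Prediction columns start with Predictedage\_.

Empty cells are kept as `None`. The example header and the `SampleID` column are hypothetical, and the values are invented.

```python
import csv
import io

# Prefixes of CpG-site column headers, as listed for Appendix S1-S4.
CPG_PREFIXES = ("amp3_", "amp4_", "amp8_", "amp9_", "bs38_")


def split_columns(csv_text):
    """Return (cpg_columns, predicted_age_columns, other_columns)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cpg = [h for h in header if h.startswith(CPG_PREFIXES)]
    predicted = [h for h in header if h.startswith("Predictedage_")]
    other = [h for h in header if h not in cpg and h not in predicted]
    return cpg, predicted, other


def read_methylation(csv_text):
    """Map each CpG column to its methylation rates (None for empty cells)."""
    cpg, _, _ = split_columns(csv_text)
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {c: [float(r[c]) if r[c].strip() else None for r in rows]
            for c in cpg}


# Invented two-sample file; "SampleID" is a hypothetical column name.
example = ("SampleID,Age,Fold,amp3_1,bs38_5,Predictedage_ela_svmr\n"
           "S1,2,0,0.5,,3.1\n"
           "S2,7,1,0.25,0.8,6.9\n")

print(split_columns(example))
# (['amp3_1', 'bs38_5'], ['Predictedage_ela_svmr'], ['SampleID', 'Age', 'Fold'])
print(read_methylation(example))
# {'amp3_1': [0.5, 0.25], 'bs38_5': [None, 0.8]}
```

A real file can be loaded with `open(path, newline="", encoding="utf-8-sig").read()`. The `-sig` variant is only a suggestion that tolerates a byte-order mark if one is present.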
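The 'cor ≥ 0.5' and 'cor ≥ 0.7' rows of the Predictedage\_\* table refer to correlation-based feature selection. Here is a minimal sketch under these assumptions:

* It uses Pearson correlation.
* By default it thresholds the absolute coefficient, so sites whose methylation falls with age are also kept. This README does not say whether the signed or the absolute coefficient was used, so `absolute` is an assumed switch.
* The toy data are invented and perfectly linear for clarity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no age information
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(methylation, ages, threshold, absolute=True):
    """Keep the CpGs whose correlation with age meets `threshold`.

    `methylation` maps CpG name -> methylation rates in the same sample
    order as `ages`. `absolute=True` is an assumption (see lead-in).
    Returns {cpg: r} for the kept sites.
    """
    selected = {}
    for cpg, rates in methylation.items():
        r = pearson_r(rates, ages)
        if (abs(r) if absolute else r) >= threshold:
            selected[cpg] = r
    return selected


# Invented toy data: one rising, one falling, one uninformative site.
ages = [1, 3, 5, 8, 12]
methylation = {
    "cpg_up": [0.10, 0.20, 0.30, 0.45, 0.65],
    "cpg_down": [0.90, 0.80, 0.70, 0.55, 0.35],
    "cpg_flat": [0.5, 0.4, 0.6, 0.5, 0.5],
}

print(sorted(select_by_correlation(methylation, ages, 0.7)))
# ['cpg_down', 'cpg_up']
```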
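The Fold column assigns every sample to one of five folds (0–4) so that age and species distributions are similar across folds. One simple way to get that property:

* Group samples by a stratum label, for example species plus Age_class. This choice of stratum is an assumption.
* Shuffle each group, then deal it round-robin across the folds.

This hand-rolled illustration is not necessarily how the authors' script assigned folds. scikit-learn's `StratifiedKFold` is a common library alternative.

```python
import random
from collections import defaultdict


def stratified_folds(strata, n_folds=5, seed=0):
    """Assign each sample a fold in 0..n_folds-1.

    Samples sharing a stratum label (e.g. a (species, age_class) tuple)
    are shuffled and dealt round-robin, so each stratum is spread as
    evenly as possible over the folds.
    """
    by_stratum = defaultdict(list)
    for index, label in enumerate(strata):
        by_stratum[label].append(index)
    rng = random.Random(seed)
    folds = [None] * len(strata)
    offset = 0
    for label in sorted(by_stratum):
        indices = by_stratum[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[index] = (offset + position) % n_folds
        offset += len(indices)  # continue the round-robin across strata
    return folds


# Invented strata: seven adult lions and three young tigers.
strata = [("lion", "adult")] * 7 + [("tiger", "young")] * 3
print(stratified_folds(strata))
```

Continuing the round-robin offset from one stratum to the next keeps the overall fold sizes balanced as well as the per-stratum counts.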
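The 'Predictedage\_\*\_loso(\_raw)' columns come from leave-one-species-out cross-validation. Each species is held out in turn, the model is trained on the remaining species, and the held-out species is then predicted. The sketch below generates only those splits; model fitting is omitted. It assumes each sample record carries its species under a hypothetical `Species` key, and the samples are invented.

```python
def leave_one_species_out(samples, species_key="Species"):
    """Yield (held_out_species, train_indices, test_indices) per species.

    `species_key` is a hypothetical field name for the species label.
    """
    species_list = sorted({s[species_key] for s in samples})
    for held_out in species_list:
        train = [i for i, s in enumerate(samples) if s[species_key] != held_out]
        test = [i for i, s in enumerate(samples) if s[species_key] == held_out]
        yield held_out, train, test


# Invented samples.
samples = [{"Species": "lion"}, {"Species": "tiger"},
           {"Species": "lion"}, {"Species": "jaguar"}]

for species, train, test in leave_one_species_out(samples):
    print(species, train, test)
# jaguar [0, 1, 2] [3]
# lion [1, 3] [0, 2]
# tiger [0, 2, 3] [1]
```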