A Single-cell Transcriptomic Sequencing Dataset of Early Female and Male Chicken (Gallus gallus) Embryos

Source: Figshare. Updated 2025-02-06. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_A_Single-cell_Transcriptomic_Sequencing_Dataset_of_Early_Female_and_Male_Chicken_b_b_i_Gallus_gallus_i_b_b_Embryos_b_/28357844
Description:
Quality Control of Single-Cell Data

Raw sequencing data were processed using SCOPE-tools (v1.4.0) to generate a gene expression matrix. After barcodes and unique molecular identifiers (UMIs) were extracted and corrected, adapter sequences and poly(A) tails were removed. The trimmed reads were aligned to the chicken reference genome (GRCg6a) using the integrated STAR (v2.7.9a) algorithm in CellRanger (v5.0.0). Reads were assigned to genes with featureCounts, followed by UMI correction and quantification to produce a complete gene expression matrix, which was then compiled into a matrix file.

The expression matrix was further analyzed using the Seurat (v4.3.0.1) package to ensure data quality. Genes and cells were filtered by detection thresholds (min.cells > 3 and min.features > 200). Cells with fewer than 1,000 UMIs or a log10GenesPerUMI value exceeding 0.7 were excluded, as were cells with mitochondrial gene content exceeding 25%. These quality control measures were intended to ensure the reliability of downstream analyses.

Dimensionality Reduction and Clustering of Single-Cell Data

To reduce technical noise, the gene expression matrix was normalized and scaled using the NormalizeData and ScaleData functions in the Seurat package. The FindVariableFeatures function was applied to calculate the mean expression and dispersion of each gene, identifying 2,000 highly variable genes. Principal component analysis (PCA) was then performed on the high-dimensional data, retaining the top 20 principal components.

Simulated doublets were generated to match the expected doublet rate and merged with the original dataset. Each cell was assigned a doublet score using a k-nearest neighbor (k-NN) classifier. Potential doublets were identified using the doubletFinder_v3 function with the parameter pN = 0.25 and removed based on the expected doublet threshold, resulting in a final dataset of 70,361 high-quality cells for downstream analyses.

To correct for potential batch effects, the Harmony algorithm was applied. For clustering, the FindClusters function was used with a resolution of 0.4, followed by dimensionality reduction and visualization using uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE). UMAP was run with a neighborhood size of 20 to obtain well-separated clusters and a clear visual representation of the cell populations.

Differential Gene Screening

To characterize the functional properties of different cell clusters, we identified differentially expressed genes (DEGs) using the "FindAllMarkers" function in the Seurat package. The selection criteria required genes to be expressed in more than 25% of cells in the target cell subpopulation (min.pct = 0.25) and to exhibit significantly higher expression in the target cluster than in the others (test.use = "MAST"). To ensure the biological relevance of the results, more stringent thresholds were applied: p-value 1.

Cell types were annotated by integrating literature-supported evidence and classical marker genes, allowing cell populations to be classified and their biological functions to be examined. The expression patterns of marker genes were visualized using the DoHeatmap, DotPlot, and VlnPlot functions in the Seurat package. These visualizations further clarified cell identities and highlighted their functional characteristics.
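
The per-cell thresholds in the quality control description above can be illustrated with a small sketch in plain Python. This is not the Seurat workflow used for the dataset, only an illustration of the stated cutoffs under several assumptions. The function names, the toy cells, and the mitochondrial gene names are hypothetical. The definition log10GenesPerUMI = log10(genes) / log10(UMIs) is a commonly used one, but the description does not spell it out. The direction of each cutoff follows the wording of the description.

```python
import math

# Cutoffs taken from the description; the constant names are illustrative.
MIN_UMIS = 1000                 # cells with fewer UMIs are excluded
MAX_LOG10_GENES_PER_UMI = 0.7   # cells exceeding this value are excluded (as stated)
MAX_MITO_PERCENT = 25.0         # cells exceeding this mitochondrial share are excluded

# Hypothetical mitochondrial gene names, used only for this toy example.
MITO_GENES = {"ND1", "COX1"}


def qc_metrics(counts, mito_genes):
    """Compute per-cell QC metrics from a {gene: UMI count} mapping."""
    n_umis = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito_umis = sum(c for gene, c in counts.items() if gene in mito_genes)
    # Assumed definition: log10(detected genes) / log10(total UMIs).
    if n_umis > 1 and n_genes > 0:
        complexity = math.log10(n_genes) / math.log10(n_umis)
    else:
        complexity = 0.0
    mito_percent = 100.0 * mito_umis / n_umis if n_umis else 0.0
    return {
        "n_umis": n_umis,
        "n_genes": n_genes,
        "log10_genes_per_umi": complexity,
        "mito_percent": mito_percent,
    }


def passes_qc(metrics):
    """Return True if a cell survives all three stated cutoffs."""
    return (
        metrics["n_umis"] >= MIN_UMIS
        and metrics["log10_genes_per_umi"] <= MAX_LOG10_GENES_PER_UMI
        and metrics["mito_percent"] <= MAX_MITO_PERCENT
    )


# Toy cells (fabricated for illustration, not taken from the dataset).
good = {f"G{i}": 12 for i in range(248)}
good.update({"ND1": 12, "COX1": 12})                 # 250 genes, 3000 UMIs
low_depth = {f"G{i}": 5 for i in range(100)}         # only 500 UMIs
high_mito = dict(good, ND1=1500)                     # about a third mitochondrial
high_ratio = {f"G{i}": 2 for i in range(1000)}       # genes-per-UMI ratio above 0.7

cells = {
    "good": good,
    "low_depth": low_depth,
    "high_mito": high_mito,
    "high_ratio": high_ratio,
}
kept = [name for name, c in cells.items() if passes_qc(qc_metrics(c, MITO_GENES))]
```

In this toy input only the cell named `good` is retained; each of the other three violates exactly one of the stated cutoffs.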
Created:
2025-02-06