
Transcriptomic profiling of Calu3 cells infected with SARS-CoV-2 and treated with Xuanfei Baidu Granules

Source: Science Data Bank. Updated 2025-11-01; indexed 2026-04-23.
Download link:
https://www.scidb.cn/detail?dataSetId=2feff62d2fd144b48a9a396d351cdd8f
Resource description:
1. Data Generation Procedure, Processing Methods, Steps, Equipment and Tools

1.1 Data Generation

Experimental Design: Calu3 cells were divided into four groups with 4 biological replicates per group: Normal Control (NC), SARS-CoV-2-Infected Group (V), Xuanfei Baidu Granules High-Dose Group (XF500), and Xuanfei Baidu Granules Low-Dose Group (XF250). Cells in the V, XF500, and XF250 groups were infected with SARS-CoV-2, and the XF500 and XF250 groups were additionally treated with the corresponding dose of Xuanfei Baidu Granules.
Sequencing Service Provider: Transcriptome sequencing and data generation were performed by Personalbio Technology Co., Ltd. (Shanghai, China).
Sequencing Equipment: Illumina high-throughput sequencing platform (specific model consistent with the platform's standard configuration for RNA-seq).
Data Types Generated: Raw sequencing reads (Raw Data) and gene expression quantification data (FPKM values).

1.2 Data Processing Methods and Steps

1.2.1 Raw Data Collation

After sequencing, image files were converted into raw sequencing data (Raw Data) in FASTQ format using the built-in software of the Illumina platform. Basic statistics were computed for each sample, including sample name, percentage of ambiguous bases, Q20 value (percentage of bases with a Phred quality score ≥ 20), and Q30 value (percentage of bases with a Phred quality score ≥ 30).

1.2.2 Raw Data Filtering

To keep low-quality sequences and adapters from interfering with subsequent analysis, the following filtering criteria were applied using Cutadapt:
Remove 3' adapters that overlap the known adapter sequence (AGATCGGAAG) by at least 10 bp, allowing up to 20% base mismatches.
Discard reads with an average Phred quality score below 20.

1.2.3 Quality Assessment

Base Quality Distribution: The quality of individual base positions was evaluated using single-base quality distribution plots. The sequencing error rate increases with read length and is also elevated in the first 6 base positions (caused by incomplete binding of random primers to RNA templates), which is a normal characteristic of the Illumina platform.
Base Content Distribution: AT/GC separation was checked using base content distribution plots. For RNA-seq data, GC and AT contents are theoretically equal across sequencing cycles and remain stable throughout the run (strand-specific libraries may exhibit normal AT/GC separation). Biased nucleotide composition in the first few positions, caused by the 6 bp random primers, is considered normal.
Average Read Quality Distribution: The overall sequencing quality was assessed using average read quality distribution plots. A distinct peak indicates that most reads are of high quality; a broad peak or front tailing indicates partial low-quality data; a low peak value indicates poor overall quality; and the absence of a peak indicates excellent overall read quality.

1.3 Tools and Software

Raw data conversion: Illumina platform's built-in software (e.g., bcl2fastq).
Adapter trimming and low-quality read filtering: Cutadapt.
Quality control analysis: FastQC (used to generate base quality distribution, base content distribution, and average read quality distribution plots).
FPKM calculation: Standard RNA-seq quantification pipeline (e.g., HISAT2 + StringTie).

2. Temporal and Geographic Scope

Geographic Scope: The experiment and sequencing were conducted at Personalbio Technology Co., Ltd. (Shanghai, China).
Temporal Scope: The experiment was performed in 2024, and sequencing and data processing were completed within 1 month after sample collection.
Temporal and Spatial Resolution: No additional temporal or spatial resolution parameters apply, as the data represent static gene expression profiles of cells at the endpoint of the treatment period.

3. Tabular Data Description

Total number of data files: 1.
File content: Filtered high-quality sequencing reads, including sequence information.
Format: Text file (txt).
Size: Approximately 151 MB per file.
Total number of entries in the tabular data: 20,042.
Row labels: Gene IDs (unique identifiers for individual genes).
Column labels: Expression levels of corresponding genes across different samples.
Measurement units for gene expression levels derived from sequencing data: FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) and reads (counts of sequencing fragments without normalization).

4. Missing Data Description

No missing data were present in the dataset. All 16 samples successfully completed sequencing, filtering, and quality control. FPKM values were calculated for all detected genes, and each sample has complete raw reads and expression quantification data.
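
The per-sample statistics in section 1.2.1 (percentage of ambiguous bases, Q20, Q30) and the mean-quality criterion in section 1.2.2 can be illustrated with a minimal Python sketch. It is not the provider's pipeline and does not reproduce Cutadapt's behavior. It assumes Phred+33 quality encoding, and the function names and the two toy reads are hypothetical, introduced only for illustration.

```python
from typing import Dict, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding of FASTQ quality strings


def phred_scores(quality_line: str) -> List[int]:
    """Decode one FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_line]


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from simple 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def quality_summary(records: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute total bases and the ambiguous-base, Q20 and Q30 percentages."""
    total = ambiguous = q20 = q30 = 0
    for _name, seq, qual in records:
        scores = phred_scores(qual)
        total += len(scores)
        ambiguous += seq.upper().count("N")
        q20 += sum(s >= 20 for s in scores)
        q30 += sum(s >= 30 for s in scores)
    return {
        "bases": total,
        "ambiguous_pct": 100 * ambiguous / total,
        "q20_pct": 100 * q20 / total,
        "q30_pct": 100 * q30 / total,
    }


def passes_mean_quality(quality_line: str, threshold: float = 20.0) -> bool:
    """Keep a read only if its average Phred score reaches the threshold."""
    scores = phred_scores(quality_line)
    return bool(scores) and sum(scores) / len(scores) >= threshold


# Two toy reads (hypothetical data): one high quality, one low quality.
EXAMPLE_FASTQ = "@read1\nACGTN\n+\nIIII#\n@read2\nACGTA\n+\n5+++#\n"

records = list(parse_fastq(EXAMPLE_FASTQ))
summary = quality_summary(records)
kept = [name for name, _seq, qual in records if passes_mean_quality(qual)]
print(summary)
print(kept)
```

In this toy input, read1 has a mean Phred score above 20 and is kept, while read2 falls below the threshold and is discarded.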
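
The FPKM unit named in section 3 follows directly from its definition: fragments mapped to a gene, divided by the gene length in kilobases and by the total mapped fragments in millions. The sketch below shows that arithmetic only. The dataset names its quantification pipeline only by example, so this is the textbook formula rather than the provider's exact computation; the function name and the input numbers are hypothetical.

```python
def fpkm(fragments: float, length_bp: int, total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (length_bp * total_mapped_fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript,
# in a sample with 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
print(value)  # 12.5
```

Because FPKM normalizes for both transcript length and sequencing depth, it is not interchangeable with the unnormalized read counts that the table also reports.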
Provided by:
Tingting Zhao; Peifang Xie; Ruihan Chen; Yongjie Su; Xuanxuan Li; First Affiliated Hospital of Guangzhou Medical University; Zhenyang Liu; Qinhai Ma
Created:
2025-11-01