Gut Microbiome Dataset | Colorectal Cancer Dataset

库帕思 · Updated 2025-12-08 · Indexed 2025-12-20

Download link:

https://www.kupasai.com/corpus/detail?id=412&type=1

Resource description:

This dataset contains 16S rRNA (V4 region) gene sequencing data from feces, tumor core, tumor surface, and healthy adjacent tissue collected from 34 colorectal cancer (CRC) patients who underwent surgery. It comprises 28 fecal samples and 39 tissue samples. The data come from NCBI BioProject, accession number PRJNA909427.

Data characteristics: This is a medical metagenomics dataset focused on the association between CRC, the gut microbiota, and T cell infiltration. Its design links clinical samples to several types of measurement:

- Focused sample and assay types: Samples come from 34 CRC patients who underwent surgery and fall into two categories: 28 fecal samples (the main carrier of the gut microbiota) and 39 tissue samples (from three sites: tumor core, tumor surface, and healthy adjacent tissue). Together they cover both the gut-wide microbiota and the microbiota of the local tumor microenvironment. The primary assay is 16S rRNA (V4 region) gene sequencing, which characterizes the microbial community composition of each sample; the raw reads are stored as SRA files. The sequencing data are paired with T cell infiltration indicators (concentrations of the immune cell markers CD3, CD45R0, and CD8) and clinical annotations, forming a "microbiota - immunity - clinical" data chain.
- Structured annotation: The accompanying metadata.txt table contains 10 core fields: sample identifiers (Run, Sample Name), sample source (host_body_product, sample_site; e.g., feces or tumor core), patient clinical information (host_sex, Host_Age), immune indicators (CD3, CD45R0, and CD8 concentrations; some samples have missing values, which are explicitly marked), and mesenteric lymph node status (Mesenteric_Nodes; Present or Absent). The fields are clearly defined and consistently formatted, so the table can be used for association analysis without further format conversion.
- Traceable data source: The raw sequencing data originate from NCBI BioProject PRJNA909427. Sample collection and assays are stated to follow medical research standards, and the SRA files (the raw sequencing data format) can be traced through NCBI or DataONE, which supports reproducibility.

Dataset scale: 67 samples from 34 CRC patients (28 fecal samples + 39 tissue samples), corresponding to 67 raw sequencing files in SRA format.

Application scenarios: The dataset primarily serves research on CRC microbiomics and immune mechanisms:

- Basic medical research: Analyze the structure of gut microbial communities in CRC patients, for example by comparing microbial composition between feces and tumor tissue, or between tumor core and healthy adjacent tissue, and screen for microbial species associated with the onset and progression of CRC (e.g., pathogenic or probiotic bacteria). The T cell infiltration indicators make it possible to study whether microbial communities influence tumor progression by modulating immune cell infiltration, providing data support for research on microbial mechanisms in CRC.
- Metagenomic algorithm development and validation: As a real-world 16S rRNA sample set, it can be used to check the taxonomic assignment accuracy of microbial analysis pipelines (e.g., QIIME, mothur), or to optimize microbial detection workflows for low-biomass samples such as tumor tissue (reducing host DNA interference), improving the applicability of these tools to clinical samples.
- CRC diagnostic biomarker screening: By comparing the microbiota of CRC patients with that of healthy individuals (which requires external healthy control data), or by analyzing microbial characteristics across clinical stages, candidate microbial diagnostic biomarkers (e.g., a relative abundance threshold for a bacterial group) can be screened to support the development of non-invasive early CRC screening tools such as fecal microbial test kits.
- Teaching and experimental demonstration: As a teaching dataset for university medical and bioinformatics programs, it can be used to demonstrate a metagenomic data processing workflow (SRA file extraction → quality control → OTU clustering → species annotation) and the joint analysis of clinical and omics data (e.g., using a statistical model to analyze the correlation between CD3 concentration and the abundance of a given microbe), helping students understand the full workflow of medical omics research.
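The metadata fields described above could be loaded along the following lines. This is a minimal sketch, not the provider's tooling: it assumes metadata.txt is tab-separated and that missing marker values appear as blank or non-numeric entries, neither of which the description confirms. The rows are hypothetical placeholders (including the `host_body_product` values), not real values from PRJNA909427.

```python
import csv
import io
import math
from collections import Counter

# Column names follow the 10 fields named in the description.
HEADER = ["Run", "Sample Name", "host_body_product", "sample_site", "host_sex",
          "Host_Age", "CD3", "CD45R0", "CD8", "Mesenteric_Nodes"]

# Hypothetical placeholder rows for illustration only (NOT real dataset values).
PLACEHOLDER_ROWS = [
    ["RUN_A", "S01", "feces", "feces", "female", "61", "12.5", "8.0", "4.1", "Absent"],
    ["RUN_B", "S02", "tissue", "tumor core", "male", "58", "", "6.3", "2.2", "Present"],
    ["RUN_C", "S03", "tissue", "tumor surface", "male", "58", "NA", "5.9", "", "Present"],
    ["RUN_D", "S04", "tissue", "healthy adjacent", "female", "70", "7.25", "3.0", "1.5", "Absent"],
]

# Assumption: the real file is tab-delimited text with one header row.
metadata_text = "\n".join("\t".join(fields) for fields in [HEADER] + PLACEHOLDER_ROWS)

MARKERS = ("CD3", "CD45R0", "CD8")


def parse_float(value):
    """Return a float, or None for blank, non-numeric, or NaN entries."""
    text = (value or "").strip()
    try:
        number = float(text)
    except ValueError:
        return None
    return None if math.isnan(number) else number


def load_metadata(handle):
    """Read tab-separated metadata and convert marker columns to float or None."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for marker in MARKERS:
            row[marker] = parse_float(row.get(marker))
        rows.append(row)
    return rows


rows = load_metadata(io.StringIO(metadata_text))

# Count samples per collection site and list runs lacking a CD3 measurement.
site_counts = Counter(row["sample_site"] for row in rows)
missing_cd3 = [row["Run"] for row in rows if row["CD3"] is None]

print(dict(site_counts))
print("Runs without CD3:", missing_cd3)
```

With the real file, `io.StringIO(metadata_text)` would be replaced by `open("metadata.txt", newline="")`, and the per-site counts could be checked against the stated totals of 28 fecal and 39 tissue samples.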
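The correlation analysis mentioned in the teaching scenario (CD3 concentration versus the abundance of a given microbe) might look like the sketch below. Spearman rank correlation is one reasonable choice of statistical model here, not one the source prescribes. The taxon names, counts, and CD3 values are hypothetical placeholders, and samples with a missing CD3 value are simply dropped.

```python
import math


def relative_abundance(counts, taxon):
    """Fraction of a sample's reads assigned to `taxon` (0.0 if the sample is empty)."""
    total = sum(counts.values())
    return counts.get(taxon, 0) / total if total else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman(pairs):
    """Spearman correlation over pairs, skipping any pair with a missing value."""
    complete = [(a, b) for a, b in pairs if a is not None and b is not None]
    if len(complete) < 3:
        return None
    x, y = zip(*complete)
    return pearson(average_ranks(list(x)), average_ranks(list(y)))


# Hypothetical placeholder samples (NOT real dataset values).
samples = [
    {"cd3": 10.0, "counts": {"Taxon_A": 5, "Taxon_B": 95}},
    {"cd3": 20.0, "counts": {"Taxon_A": 20, "Taxon_B": 80}},
    {"cd3": None, "counts": {"Taxon_A": 50, "Taxon_B": 50}},  # missing CD3, dropped
    {"cd3": 30.0, "counts": {"Taxon_A": 30, "Taxon_B": 70}},
    {"cd3": 40.0, "counts": {"Taxon_A": 60, "Taxon_B": 40}},
]

pairs = [(s["cd3"], relative_abundance(s["counts"], "Taxon_A")) for s in samples]
rho = spearman(pairs)
print("Spearman rho (placeholder data):", rho)
```

A rank-based measure is used because relative abundances are compositional and often heavily skewed. With only 67 samples and repeated samples per patient, any real analysis would also need to account for non-independence and multiple testing.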

Provider:

库帕思

Created:

2025-09-23

The content above was collected and summarized by 遇见数据集.