CCLE (Cancer Cell Line Encyclopedia) | Cancer Research Dataset | Molecular Biology Dataset

Source: portals.broadinstitute.org, indexed 2024-10-26

Cancer research

Molecular biology

Download link:

https://portals.broadinstitute.org/ccle

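Resource summary:

The CCLE dataset contains gene expression, copy number variation, mutation, and drug response data from a wide range of cancer cell lines. It is intended to help researchers understand the molecular basis of cancer and develop new treatments.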

Provider:

portals.broadinstitute.org

AI-Collected Summary

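Dataset Introduction

Construction

The CCLE (Cancer Cell Line Encyclopedia) dataset is built on comprehensive genomic and phenotypic profiling of many cancer cell lines. Using high-throughput sequencing and related profiling technologies, the research team systematically measured gene expression, mutations, copy number variation, protein expression, and other features across roughly 1,000 cancer cell lines. These data were standardized and integrated into a large database intended to serve as a detailed resource for cancer research.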
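
As a rough illustration of what integrating several data types can look like, the sketch below joins three per-cell-line tables on a shared cell line identifier. Everything in it is an assumption made for illustration: the cell line names, gene names, and values are invented, and `integrate` is a hypothetical helper rather than part of any CCLE tooling.

```python
# Hypothetical toy tables keyed by cell line identifier (all values invented).
expression = {
    "LINE_A": {"GENE1": 5.2, "GENE2": 1.1},
    "LINE_B": {"GENE1": 3.8, "GENE2": 2.4},
    "LINE_C": {"GENE1": 4.5, "GENE2": 0.9},
}
copy_number = {
    "LINE_A": {"GENE1": 2.0},
    "LINE_B": {"GENE1": 3.0},
}
mutations = {
    "LINE_A": {"GENE2"},
    "LINE_B": set(),
    "LINE_C": {"GENE1"},
}


def integrate(expression, copy_number, mutations):
    """Inner-join the three tables, keeping only cell lines present in all of them."""
    shared = set(expression) & set(copy_number) & set(mutations)
    merged = {}
    for line in sorted(shared):
        merged[line] = {
            "expression": expression[line],
            "copy_number": copy_number[line],
            "mutated_genes": sorted(mutations[line]),
        }
    return merged


merged = integrate(expression, copy_number, mutations)
for line, record in merged.items():
    print(line, record)
```

An inner join is only one possible choice. It drops cell lines that lack any data type (here `LINE_C`), which keeps downstream analyses simple at the cost of discarding partial records.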

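Characteristics

The CCLE dataset stands out for its breadth and depth. It covers many cancer types, including but not limited to breast, lung, and colorectal cancer, and each cancer type includes multiple subtypes. Beyond gene-level data, CCLE also provides phenotypic information such as drug sensitivity and cell growth rate, which makes multidimensional studies of cancer possible.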

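Usage

The CCLE dataset can be used in many ways and suits a range of cancer research scenarios. Researchers can use it for gene expression profiling to identify key genes associated with specific cancers. Combined with the drug sensitivity data, it supports drug screening and the design of individualized treatment strategies. The dataset can also be used to train machine learning models that predict cancer progression and treatment response.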
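
One simple way to combine expression and drug sensitivity data is to rank genes by how strongly their expression correlates with a drug response measure across cell lines. The sketch below does this with a Spearman rank correlation implemented from scratch. The gene names, expression values, and response scores are invented, and the helper functions are illustrative assumptions, not a CCLE-provided method.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: expression of three genes across five cell lines,
# and a viability-like drug response score for the same five lines.
expression = {
    "GENE1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE2": [5.0, 3.0, 4.0, 1.0, 2.0],
    "GENE3": [2.0, 2.5, 1.0, 3.0, 2.2],
}
drug_response = [0.9, 0.8, 0.6, 0.3, 0.1]

scores = {gene: spearman(values, drug_response) for gene, values in expression.items()}
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
for gene in ranked:
    print(gene, round(scores[gene], 3))
```

With real data, a correlation of this kind only flags candidate genes. Multiple-testing correction and confounders such as tissue of origin would need to be handled before drawing conclusions.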

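Background and Challenges

Background

The CCLE (Cancer Cell Line Encyclopedia) dataset was initiated in 2009 by the Broad Institute, in collaboration with Novartis, to characterize the molecular features of cancer cell lines through large-scale genomic and drug sensitivity profiling. It integrates data from cell lines representing many cancer types, including gene expression, mutations, copy number variation, and drug response. CCLE marked a shift in cancer research from single-gene analysis toward systems biology approaches, and it provides a valuable resource for individualized therapy and drug development. Its influence extends beyond academia: it has also prompted the pharmaceutical industry to re-evaluate cancer treatment strategies.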

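Current Challenges

Building the CCLE dataset involved several challenges. First, data integration spans multiple technology platforms and data types, so ensuring consistency and accuracy is difficult. Second, the heterogeneity of cancer cell lines complicates the extraction of meaningful patterns from the data. Third, acquiring and standardizing drug response data is a major challenge, because differences in laboratory conditions and methods can affect the reliability of results. Finally, how to translate this large volume of data into clinical practice effectively still requires further research.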
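
To make the standardization problem concrete, the sketch below rescales drug response measurements within each laboratory to z-scores before they are compared. This is one simple normalization option chosen for illustration, not a description of how CCLE processes its data. The lab names, cell line names, and measurements are invented.

```python
from statistics import mean, pstdev

# Invented measurements of the same drug on the same cell lines by two
# hypothetical labs that report on different scales.
lab_measurements = {
    "lab_1": {"LINE_A": 2.0, "LINE_B": 4.0, "LINE_C": 6.0},
    "lab_2": {"LINE_A": 20.0, "LINE_B": 40.0, "LINE_C": 60.0},
}


def zscore_within_lab(values_by_line):
    """Center and scale one lab's measurements to mean 0 and unit variance."""
    values = list(values_by_line.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant measurements")
    return {line: (value - mu) / sigma for line, value in values_by_line.items()}


standardized = {lab: zscore_within_lab(values) for lab, values in lab_measurements.items()}
for lab, values in standardized.items():
    print(lab, {line: round(z, 3) for line, z in values.items()})
```

In this toy case the two labs differ only by a scale factor, so their z-scores coincide. Real inter-lab differences are rarely this clean, and z-scoring assumes each lab profiled a comparable set of cell lines.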

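Development History

Creation and Updates

The CCLE (Cancer Cell Line Encyclopedia) project dates to 2009 and was set up to provide a comprehensive cell line database for cancer research. The dataset has since gone through several major updates. The most recent large-scale update listed here took place in 2020 and further extended its content and coverage.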

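Key Milestones

A key milestone was the first publication of CCLE in 2012, which described gene expression, copy number variation, and mutation data for 947 cancer cell lines, together with pharmacological profiles for 24 anticancer compounds across 479 of those lines. This release greatly advanced cancer genomics research. A 2019 update expanded the collection to more than 1,000 cell lines and added further data types, including RNA sequencing, whole-exome sequencing, and protein array data, which lets researchers study the heterogeneity of cancer cell lines in more depth. The 2020 update added new cell line data and further drug sensitivity data, providing a valuable resource for individualized medicine.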

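Current Status

CCLE is now a core resource in cancer research, and its data are widely used in genomics, drug screening, and biomarker discovery. Through continued updates and expansion, CCLE has improved the understanding of cancer biology and supported new drug development and clinical trial design. As technology advances and data accumulate, CCLE is expected to keep supporting progress toward precision medicine.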

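Timeline

2009: The CCLE project is launched to build a comprehensive cancer cell line database in support of cancer research.
2012: The CCLE dataset is first published, covering gene expression, copy number variation, and mutation data for 947 cancer cell lines.
2015: The CCLE dataset is expanded to more than 1,000 cancer cell lines, with added drug sensitivity data.
2019: The CCLE dataset is updated to include comprehensive genomic and phenotypic data for more than 1,500 cancer cell lines.
2021: The CCLE dataset is further extended with single-cell RNA sequencing data to support finer-grained analysis of cancer cell lines.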

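Common Scenarios

Classic Use Cases

In cancer research, the CCLE (Cancer Cell Line Encyclopedia) dataset is widely used to explore properties of cancer cell lines such as gene expression, mutations, and drug response. By analyzing these data, researchers can gain insight into the molecular mechanisms of different cancer types, which provides a theoretical basis for individualized therapy. Classic use cases include gene expression profiling, drug sensitivity prediction, and the identification of cancer driver genes. Such studies provide important evidence for optimizing cancer treatment strategies.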

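Practical Applications

In practice, the CCLE dataset is widely used in drug development and clinical trial design. Pharmaceutical companies use it to screen candidate anticancer drugs and to assess their efficacy in simulation experiments. Information from CCLE can also help inform individualized treatment strategies, with the aim of improving treatment outcomes and patient survival. In addition, the dataset supports interdisciplinary collaboration among cancer research institutions and helps move basic research toward clinical application.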

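Derived Work

The release of the CCLE dataset has led to a large body of related research and advanced both cancer biology and drug discovery. For example, studies based on CCLE have revealed the function and regulatory mechanisms of several cancer driver genes, providing new targets for targeted therapy. CCLE has also promoted the use of machine learning and artificial intelligence in cancer research, leading to a variety of predictive models and algorithms for drug response prediction and cancer risk assessment. This derived work has enriched the theoretical basis of cancer research and provided technical support for practical applications.

The content above was collected and summarized by AI.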
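
As a minimal illustration of the kind of predictive model mentioned above, the sketch below predicts drug response for a new cell line by averaging the responses of its nearest neighbors in expression space. It is a toy k-nearest-neighbors example under stated assumptions, not a reproduction of any published CCLE model. The feature vectors, response values, and the `predict_knn` helper are all invented for illustration.

```python
import math

# Invented training data: (expression feature vector, drug response) per cell line.
training = [
    ([1.0, 0.2], 0.90),
    ([1.2, 0.1], 0.85),
    ([3.0, 2.0], 0.30),
    ([3.2, 2.1], 0.25),
]


def predict_knn(features, training, k=2):
    """Average the responses of the k training lines closest in Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    return sum(response for _, response in nearest) / len(nearest)


prediction = predict_knn([1.1, 0.15], training)
print(round(prediction, 3))
```

Published models typically use thousands of features and regularized or nonlinear learners. The nearest-neighbor rule is used here only because it fits in a few lines and makes the input and output of such a model explicit.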
