bulk-tumour-api: a programmatically accessible dataset of pre-processed bulk tumour sequencing data
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6670390
Description:
This repository, including the API, is currently under development.
bulk-tumour-api is a programmatically accessible dataset of pre-processed bulk tumour sequencing data. The Python API can be found at https://github.com/tomouellette/bulk-tumour-api. All data stored in this repository have been collected from open-access online sources. Original references and sources are provided in database.tsv (for empirical patient data) and synthetic.tsv (for simulated data).
A note on datasets:
All empirical patient sequencing samples have been processed into pseudo-VCF files, which contain at minimum the following columns: sample identifier (sample), patient identifier (patient), chromosome (chr), position (pos), variant allele frequency (VAF), alternate read counts (t_alt_count), depth (DP), and total copy number (total_cn). If more data are required, unprocessed data, including copy number segments or gene-level calls and clinical and/or biopsy-level information, can be found in the /raw/ directory.
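As a rough illustration of the minimum column set described above, the following Python sketch reads a pseudo-VCF and checks that the listed columns are present. It does not use the bulk-tumour-api package. The function name read_pseudo_vcf is hypothetical, the tab-delimited layout with a header row is an assumption (the description does not state the file format), and the example row is made up purely for demonstration.

```python
import csv
import io

# Minimum columns named in the dataset description.
REQUIRED_COLUMNS = (
    "sample", "patient", "chr", "pos", "VAF", "t_alt_count", "DP", "total_cn",
)


def read_pseudo_vcf(handle, delimiter="\t"):
    """Return the rows of a pseudo-VCF as a list of dicts.

    Assumption: the file is delimited text with a header row.
    Raises ValueError if any of the minimum columns is missing.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


# Made-up single-row file, used only to show the call.
example = io.StringIO(
    "sample\tpatient\tchr\tpos\tVAF\tt_alt_count\tDP\ttotal_cn\n"
    "S1\tP1\t1\t12345\t0.25\t10\t40\t2\n"
)
rows = read_pseudo_vcf(example)
```

All values come back as strings, so numeric columns such as VAF or DP would need converting before any analysis.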
All synthetic datasets have also been processed into pseudo-VCF files. In some cases, all ground-truth information (e.g. subclone frequency) is contained within the pseudo-VCF. In other cases, additional meta/ground-truth information is stored in separate files; any simulated sample with a column marked has_meta = True will have multiple files that are downloaded together.
Created:
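To see which simulated samples come with extra ground-truth files, one could filter the synthetic index on the has_meta column. The sketch below assumes synthetic.tsv is tab-delimited with a header row and stores the flag as the text True; the function name samples_with_meta and the name column in the example are hypothetical, and the example rows are made up.

```python
import csv
import io


def samples_with_meta(handle, flag_column="has_meta"):
    """Return index rows whose has_meta flag reads as true.

    Assumption: the index is tab-delimited and the flag is stored as
    text such as "True"; the comparison ignores case and whitespace.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if (row.get(flag_column) or "").strip().lower() == "true"
    ]


# Made-up index with a hypothetical "name" column, used only to show the call.
index = io.StringIO(
    "name\thas_meta\n"
    "sim_a\tTrue\n"
    "sim_b\tFalse\n"
)
with_meta = samples_with_meta(index)
```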
2022-06-21



