songlab/gpn-msa-hg38-scores
GPN-MSA predictions for all possible SNPs in the human genome (~9 billion)
Querying the dataset
Installing tabix
- Install into the current conda environment:

  ```bash
  conda install -c bioconda -c conda-forge htslib=1.18
  ```
- Or install into a new conda environment:

  ```bash
  conda create -n tabix -c bioconda -c conda-forge htslib=1.18
  conda activate tabix
  ```
Querying a specific region
- Example query against the remote file (for example, BRCA1):

  ```bash
  tabix https://huggingface.co/datasets/songlab/gpn-msa-hg38-scores/resolve/main/scores.tsv.bgz 17:43,044,295-43,125,364
  ```
Output format:

| chrom | pos | ref | alt | GPN-MSA score |
|-------|-----|-----|-----|---------------|

Example output:

```tsv
17	43044295	T	A	-1.60
17	43044295	T	C	-1.47
17	43044295	T	G	-1.61
17	43044296	G	A	-1.12
17	43044296	G	C	-1.46
17	43044296	G	T	-1.45
17	43044297	G	A	-1.45
17	43044297	G	C	-1.55
17	43044297	G	T	-1.54
17	43044298	A	C	-1.64
```
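Because each output row is a plain tab-separated record, the result of a query can be piped straight into standard text tools. The following is a minimal sketch, not part of the dataset's own tooling: `lowest_score` and `sample_rows` are hypothetical helper names, and the sample rows are copied from the example output rather than fetched live, so the snippet runs without `tabix` or network access.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read five-column rows on stdin and print the row
# whose 5th column (the GPN-MSA score) is numerically lowest.
lowest_score() {
  awk -F'\t' '
    NR == 1 || $5 + 0 < min { min = $5 + 0; row = $0 }
    END { if (NR > 0) print row }
  '
}

# Hypothetical stand-in for a live query: rows copied from the example output.
# printf reuses the format string for every group of five arguments.
sample_rows() {
  printf '%s\t%s\t%s\t%s\t%s\n' \
    17 43044295 T A -1.60 \
    17 43044295 T C -1.47 \
    17 43044295 T G -1.61 \
    17 43044296 G A -1.12 \
    17 43044296 G C -1.46 \
    17 43044296 G T -1.45 \
    17 43044297 G A -1.45 \
    17 43044297 G C -1.55 \
    17 43044297 G T -1.54 \
    17 43044298 A C -1.64
}

# Prints the row for 17:43044298 A>C, whose score is -1.64.
sample_rows | lowest_score
```

With `tabix` installed, the same filter could presumably be applied to a real query by replacing `sample_rows` with the `tabix` command for the region of interest.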
Querying a local file
- Download the data file and its tabix index (`.tbi`):

  ```bash
  wget https://huggingface.co/datasets/songlab/gpn-msa-hg38-scores/resolve/main/scores.tsv.bgz
  wget https://huggingface.co/datasets/songlab/gpn-msa-hg38-scores/resolve/main/scores.tsv.bgz.tbi
  ```
- Example query against the local file:

  ```bash
  tabix scores.tsv.bgz 17:43,044,295-43,125,364
  ```
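A region query returns every alternate allele at each position, so looking up one specific variant takes an extra filtering step. The sketch below shows one possible way to do that. `pick_allele` and `variant_score` are hypothetical helper names, not part of `tabix` or the dataset, and `variant_score` assumes `scores.tsv.bgz` and its index have already been downloaded. Only the filter is exercised here, on rows copied from the example output.

```bash
#!/usr/bin/env bash

# Hypothetical filter: keep rows whose ref (column 3) and alt (column 4)
# match the two arguments, and print only the score (column 5).
pick_allele() {
  awk -F'\t' -v ref="$1" -v alt="$2" '$3 == ref && $4 == alt { print $5 }'
}

# Hypothetical wrapper: query a single position, then filter to one allele.
# Defined but not called here, since it needs tabix and the local files.
# Usage: variant_score scores.tsv.bgz 17 43044295 T C
variant_score() {
  tabix "$1" "$2:$3-$3" | pick_allele "$4" "$5"
}

# Offline check of the filter, using rows from the example output.
# Prints -1.47
printf '%s\t%s\t%s\t%s\t%s\n' \
  17 43044295 T A -1.60 \
  17 43044295 T C -1.47 \
  17 43044295 T G -1.61 | pick_allele T C
```

Keeping the filter separate from the `tabix` call means it can be checked without the multi-gigabyte score file and reused against either the remote or the local query.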




