Human open reading frame data
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4048342
Description:
Data used in the human de novo ORF paper (Dowling et al. 2020, Genome Biology and Evolution, evaa194).
The amino-acid (AA) and DNA sequences were used to predict sequence properties such as aggregation propensity and intrinsic structural disorder.
Figures 1, 3, 4, 5, and 7 of the publication are based on this data.
Figure 2 uses different ORFs, and figure 6 uses these ORFs together with their predicted chimpanzee homologs.
Last updated: 24/09/2020
Contains:
1. hsapiens.orfs.aa.fa
- all human ORFs as amino-acid sequences
- contains 36524 sequences
Note: this has not been filtered for minimum expression of 0.5 TPM
2. hsapiens.orfs.dna.fa
- all human ORFs as DNA sequences
- contains 36524 sequences
Note: this has not been filtered for minimum expression of 0.5 TPM
3. orf_age_annotation.csv
- The 29751 ORFs used for the plots in the human de novo publication.
- Contains ORF_id, minimum age (million years) and annotation status.
- ORFs here were filtered for a minimum expression value of 0.5 TPM
Note: In the publication, figures were made using ORFs belonging to annotation classes 0 (intergenic), 3 (intron), and 5, 6, and 7 (CDS/exon).
0 = intergenic
1 = close to gene (same strand)
2 = close to gene (opposite strand)
3 = intron (same strand)
4 = intron (opposite strand)
5 = CDS (out of frame, same strand)
6 = CDS (out of frame, opposite strand)
7 = CDS (in frame)
- information on minimum age given in million years (column 'age')
0 = human-restricted
6.65 = homolog in Pan
9.06 = homolog in Gorilla
15.76 = homolog in Pongo
28.44 = homolog in Macaca
90 = homolog in Mus
Note: in the publication these correspond to conservation classes 0-5 respectively.
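The files described above could be read along the following lines. This is only a minimal sketch using the Python standard library, not the authors' pipeline. The column names `ORF_id` and `age` come from the description; the name of the annotation column is not stated here, so `annotation` is an assumption and may need to be changed to match the actual CSV header. The function names are hypothetical.

```python
import csv

# Annotation classes used for the publication figures (see the note above).
PLOT_CLASSES = {0, 3, 5, 6, 7}

# Minimum age (million years) -> conservation class 0-5, in the order listed above.
AGE_TO_CONSERVATION = {0.0: 0, 6.65: 1, 9.06: 2, 15.76: 3, 28.44: 4, 90.0: 5}


def read_fasta(handle):
    """Return {sequence_id: sequence} from an open FASTA text handle."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def load_plot_orfs(handle, id_col="ORF_id", age_col="age", annot_col="annotation"):
    """Read the annotation CSV, keep only the plotted classes, add conservation class.

    annot_col is an assumed column name; adjust it to the real header.
    """
    rows = []
    for row in csv.DictReader(handle):
        annot = int(row[annot_col])
        if annot not in PLOT_CLASSES:
            continue
        age = float(row[age_col])
        rows.append({
            "orf_id": row[id_col],
            "age": age,
            "annotation": annot,
            # None if the age is not one of the six listed values.
            "conservation": AGE_TO_CONSERVATION.get(age),
        })
    return rows
```

With the files downloaded, `read_fasta(open("hsapiens.orfs.aa.fa"))` would be expected to return 36524 entries, and the ids returned by `load_plot_orfs(open("orf_age_annotation.csv", newline=""))` could then be used to select the expression-filtered subset from either FASTA file, assuming the FASTA headers match the `ORF_id` values.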
Created:
2020-09-25



