3′UTR Annotation Files and DaPars Alternative Polyadenylation Analysis Resources for the Human hs1 Genome

DataONE. Updated 2026-01-12; indexed 2026-01-24.
Download link:
https://search.dataone.org/view/sha256:89dd7e89938a4f694f0b0738000733ec944257cb10720d2283c18102c6e5309e
Description:
To extract 3′ untranslated regions (3′UTRs), RefSeq gene annotations for the human hs1 genome were downloaded in BED format from the UCSC Genome Browser Table Browser (file: RefSeq_hs1_geneAnnotation.bed). The following options were selected: Assembly: Jan 2022 (T2T CHM13v2.0/hs1); Group: Genes and Gene Predictions; Track: NCBI RefSeq; Table: RefSeq All.

To map RefSeq transcript IDs to common gene symbols, the RefSeq gene annotation was downloaded a second time with all fields selected, including name (the RefSeq transcript ID) and name2 (the gene symbol), and saved as RefSeq_hs1_geneAnnotation_withName.bed. These two fields were extracted with the following AWK command:

    awk 'BEGIN {OFS="\t"} {print $4, $13}' RefSeq_hs1_geneAnnotation_withName.bed > hs1_refseq_IDmapping.txt

The 3′UTR regions were then extracted with the Python script DaPars_Extract_Anno.py from DaPars v2.1 (https://github.com/3UTR/DaPars2/wiki) using the following command:

    python DaPars_Extract_Anno.py \
        -b RefSeq_hs1_geneAnnotation.bed \
        -s hs1_refseq_IDmapping.txt \
        -o hs1_RefSeq_extracted_3UTR.bed

The resulting file (hs1_RefSeq_extracted_3UTR.bed) was used as the annotation input in the DaPars configuration file; it is the only annotation-related parameter required by DaPars_main.py from DaPars v1.0.0 (https://xiazlab.org/dapars_tutorial/html/DaPars.html). DaPars_main.py was used for pairwise comparisons of the Percentage of Distal poly(A) site Usage Index (PDUI) between two experimental conditions, with replicates for each condition.

Both DaPars_Extract_Anno.py and DaPars_main.py were executed with Python 3. DaPars_Extract_Anno.py internally calls the subtractBed function from BEDTools, so if BEDTools is not installed the script fails with the error message: “sh: 1: subtractBed: not found”.

Using this pipeline, 42,234, 66,579, and 64,936 3′UTRs were extracted for the hg19, hg38, and hs1 genome builds, respectively.
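The ID-mapping step and the BEDTools dependency described above can be sketched in Python. This is a minimal illustration and not part of the deposited pipeline: the function names build_id_mapping and bedtools_available are hypothetical, and the column positions (4 and 13, 1-based, whitespace-delimited) are simply taken from the AWK command. Unlike AWK, the sketch skips lines with fewer than 13 fields instead of printing empty values.

```python
import shutil
from pathlib import Path


def build_id_mapping(annotation_path, mapping_path, id_col=4, symbol_col=13):
    """Write two tab-separated columns (transcript ID, gene symbol).

    Columns are 1-based and lines are split on whitespace, mirroring
    the default field splitting of the AWK command in the description.
    Returns the number of lines written.
    """
    written = 0
    with open(annotation_path) as src, open(mapping_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < max(id_col, symbol_col):
                continue  # assumption: skip short lines rather than emit blanks
            dst.write(f"{fields[id_col - 1]}\t{fields[symbol_col - 1]}\n")
            written += 1
    return written


def bedtools_available():
    """Return True if the subtractBed executable can be found on PATH."""
    return shutil.which("subtractBed") is not None


if __name__ == "__main__":
    source = Path("RefSeq_hs1_geneAnnotation_withName.bed")
    if source.exists():
        count = build_id_mapping(source, "hs1_refseq_IDmapping.txt")
        print(f"wrote {count} ID mappings")
    if not bedtools_available():
        print("subtractBed not found on PATH; install BEDTools "
              "before running DaPars_Extract_Anno.py")
```

Checking for subtractBed before launching the extraction script surfaces the missing dependency up front, rather than through the shell error quoted above.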
The numbers of RefSeq genes used as input for 3′UTR extraction were 42,217 (hg19), 172,809 (hg38), and 178,178 (hs1). Post-processing and visualization were performed in R, using the output of DaPars_main.py as input. Genes with significant 3′UTR lengthening (higher PDUI in Group A than in Group B [wild-type or control]) were labeled in blue, and genes with significant shortening were labeled in red. A gene was called significant only if it met all of the following criteria: absolute PDUI difference ≥ 0.2, absolute log2 fold change ≥ 1, and a concordant direction of PDUI change between groups. Genes not meeting these criteria were labeled in gray.
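The labeling rule can be expressed as a small function. The original post-processing was done in R and its code is not shown here, so the following Python sketch is only an illustration under stated assumptions: the name classify_apa and its arguments are hypothetical, the PDUI difference is taken as Group A minus Group B, the log2 fold change is assumed to be precomputed upstream, and "concordant" is read as the two values having the same sign.

```python
def classify_apa(pdui_diff, log2_fc, min_diff=0.2, min_log2_fc=1.0):
    """Return the plot color for one gene.

    pdui_diff: PDUI(Group A) - PDUI(Group B), an assumed sign convention.
    log2_fc:   log2 fold change between the groups, computed upstream.
    """
    passes_thresholds = abs(pdui_diff) >= min_diff and abs(log2_fc) >= min_log2_fc
    # Assumption: "concordant" means both measures point the same way.
    concordant = (pdui_diff > 0) == (log2_fc > 0)
    if not (passes_thresholds and concordant):
        return "gray"
    # Higher PDUI in Group A means more distal site usage, i.e. lengthening.
    return "blue" if pdui_diff > 0 else "red"


if __name__ == "__main__":
    print(classify_apa(0.35, 1.4))    # blue: lengthening in Group A
    print(classify_apa(-0.30, -1.2))  # red: shortening in Group A
    print(classify_apa(0.10, 1.5))    # gray: PDUI difference below 0.2
    print(classify_apa(0.30, -1.1))   # gray: directions disagree
```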
Created:
2026-01-15