Advance in understanding cis regulation of the plant gene with an emphasis on comparative genomics
Source: DataCite Commons; updated 2024-03-24; indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Advance_in_understanding_cis_regulation_of_the_plant_gene_with_an_emphasis_on_comparative_genomics/1393390
Description:
This dataset lists <em>Arabidopsis thaliana</em> conserved noncoding sequences (CNSs) present in all three of the following CNS lists:
1) Haudry et al. (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45:891-898.
2) PL3.0 (TAIR10 version): Turco et al. (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Frontiers in Plant Genetics and Genomics 4:170-180.
3) Van de Velde et al. (2014) Inferences of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26:2729-2745.
CNSs present in all three lists were identified using intersectBed from the BEDTools suite (Quinlan AR and Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841-842).
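As a rough illustration of the three-way intersection step, the Python sketch below keeps CNSs from one list that overlap at least one CNS in each of the other two lists, using BED's 0-based, half-open coordinates. This is similar in spirit to chaining `intersectBed -u` twice, but the exact options used and which list served as the reference set are not stated, so the filtering rule here is an assumption. The `Interval` type and all coordinates are hypothetical toy values, not records from the dataset.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive (BED convention)
    end: int         # 0-based, exclusive (BED convention)
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open BED intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def any_overlap(a: Interval, others: list[Interval]) -> bool:
    return any(overlaps(a, b) for b in others)


def shared_cnss(list1, list2, list3):
    """Keep entries of list1 that overlap >= 1 entry in list2 AND >= 1 in list3."""
    return [a for a in list1 if any_overlap(a, list2) and any_overlap(a, list3)]


# Hypothetical toy intervals standing in for the three published CNS lists
haudry = [Interval("Chr1", 100, 150, "h1"), Interval("Chr1", 500, 540, "h2")]
pl30 = [Interval("Chr1", 120, 160, "p1"), Interval("Chr2", 500, 540, "p2")]
vandevelde = [Interval("Chr1", 90, 130, "v1")]

print([c.name for c in shared_cnss(haudry, pl30, vandevelde)])  # ['h1']
```

A linear scan per interval is fine for a sketch; BEDTools itself uses more efficient indexing to handle genome-wide interval lists.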
CNSs from the verified list were assigned to <em>Arabidopsis thaliana</em> genes on the basis of their PL3.0 component. PL3.0 CNSs are defined as syntenic conserved noncoding regions shared between <em>Arabidopsis thaliana</em> and <em>Aethionema arabicum</em>, an early-branching member of the Brassicaceae. Orthologous <em>Arabidopsis thaliana-Aethionema arabicum</em> genes were identified using a combination of CoGe SynFind (Tang et al. (2011) BMC Bioinformatics 12:102) and the PL3.0 CNS pipeline (Turco et al. 2013). closestBed (BEDTools) was then used to map each PL3.0 CNS to the closest <em>Arabidopsis thaliana</em> gene that has an <em>Aethionema arabicum</em> ortholog; the distance to that gene is included in the closestBed output.

Proximal regions were defined as the 1000 bp upstream of the transcription start site (5' proximal) and the 1000 bp downstream of the 3' end of the gene (3' proximal). For intragenic CNSs, a custom Perl script was used to determine whether each CNS lies in an intron or a UTR. Overlap with UTRs and CDSs was calculated with intersectBed (BEDTools), using BED files created from the GFF "UTR" and "CDS" features. CNSs overlapping a CDS by 50% or more were designated "CDS"; CNSs overlapping a UTR by 50% or more were designated 5' UTR or 3' UTR.
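The assignment and annotation rules above can be sketched in Python as follows. This is a minimal illustration, not the original pipeline (which used closestBed, intersectBed and a custom Perl script). It assumes BED-style 0-based, half-open coordinates, gene spans that run from transcription start to 3' end, and the 50% threshold measured as a fraction of the CNS length. The `Gene` class, the feature-kind strings, the "distal" label, checking CDS before UTRs, and treating any remaining intragenic CNS as intronic are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field

PROXIMAL_BP = 1000   # proximal window size stated in the text
MIN_FRACTION = 0.5   # ">= 50% overlap" rule stated in the text


@dataclass
class Gene:  # hypothetical record type for illustration
    name: str
    chrom: str
    start: int          # 0-based, half-open gene span
    end: int
    strand: str         # "+" or "-"
    has_ortholog: bool  # True if an Aethionema arabicum ortholog exists
    # (kind, start, end); kind is "CDS", "five_prime_UTR" or "three_prime_UTR"
    features: list = field(default_factory=list)


def overlap_bp(s1, e1, s2, e2):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(e1, e2) - max(s1, s2))


def distance(cns, gene):
    """0 if the CNS overlaps the gene, otherwise the gap in bp."""
    _, s, e = cns
    if overlap_bp(s, e, gene.start, gene.end) > 0:
        return 0
    return gene.start - e if e <= gene.start else s - gene.end


def closest_gene(cns, genes):
    """Nearest gene on the same chromosome that has an ortholog."""
    candidates = [g for g in genes if g.chrom == cns[0] and g.has_ortholog]
    if not candidates:
        return None, None
    best = min(candidates, key=lambda g: distance(cns, g))
    return best, distance(cns, best)


def classify(cns, gene):
    _, s, e = cns
    length = e - s
    if overlap_bp(s, e, gene.start, gene.end) > 0:
        # Intragenic: apply the >= 50% rule (CDS checked first: an assumption)
        for kind, label in (("CDS", "CDS"), ("five_prime_UTR", "5' UTR"),
                            ("three_prime_UTR", "3' UTR")):
            covered = sum(overlap_bp(s, e, fs, fe)
                          for k, fs, fe in gene.features if k == kind)
            if covered >= MIN_FRACTION * length:
                return label
        return "intron"  # assumption: remaining intragenic CNSs are intronic
    # Intergenic: "upstream" depends on the strand of the gene
    upstream = (e <= gene.start) if gene.strand == "+" else (s >= gene.end)
    if distance(cns, gene) <= PROXIMAL_BP:
        return "5' proximal" if upstream else "3' proximal"
    return "distal"  # hypothetical label for CNSs outside both windows


# Toy genes: geneB is nearer to the third CNS but lacks an ortholog
genes = [
    Gene("geneA", "Chr1", 2000, 5000, "+", True,
         [("five_prime_UTR", 2000, 2100), ("CDS", 2100, 4800),
          ("three_prime_UTR", 4800, 5000)]),
    Gene("geneB", "Chr1", 9000, 12000, "-", False),
]
for cns in [("Chr1", 1500, 1550), ("Chr1", 3000, 3040), ("Chr1", 7000, 7030)]:
    gene, dist = closest_gene(cns, genes)
    print(gene.name, dist, classify(cns, gene))
# geneA 450 5' proximal
# geneA 0 CDS
# geneA 2000 distal
```

The third toy CNS shows why the ortholog filter matters: geneB is physically closer, but only genes with an <em>Aethionema arabicum</em> ortholog are eligible, so the CNS is assigned to geneA.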
Note: CNS assignments to <em>Arabidopsis thaliana</em> genes are best-guess computational assignments; an individual PL3.0 CNS may actually regulate a gene other than the closest <em>Arabidopsis thaliana</em> gene with an <em>Aethionema arabicum</em> ortholog. This is particularly likely for genes with complex regulation. In the GEvo links included in this spreadsheet, such cases often appear as clusters of CNSs extending beyond the midpoint between two <em>Arabidopsis thaliana</em> genes. Adding further orthologous genes to the GEvo panels often makes it possible to assign a CNS with greater confidence: if only one of the two <em>Arabidopsis thaliana</em> genes is retained alongside the CNS in all genomes, the CNS can be assigned to that gene.
Provider:
figshare
Created:
2015-04-30



