Advances in understanding cis regulation of the plant gene with an emphasis on comparative genomics

Source: DataCite Commons. Updated 2025-06-01; indexed 2024-07-27.
Download link:
https://figshare.com/articles/dataset/Advances_in_understanding_cis_regulation_of_the_plant_gene_with_an_emphasis_on_comparative_genomics/1397563/1
Description:
This dataset is a list of Arabidopsis thaliana conserved noncoding sequences (CNSs) concatenated from the following CNS lists:
1) Haudry et al. (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45:891-898.
2) PL3.0 (TAIR10 version): Turco et al. (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Front. Plant Sci. 4:170.
3) Van de Velde et al. (2014) Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26:2729-2745.

PL3.0 CNSs are syntenic conserved noncoding regions shared by Arabidopsis thaliana and the early-branching Brassicaceae species Aethionema arabicum. Each PL3.0 CNS was assigned to the closest Arabidopsis thaliana gene that has an Aethionema arabicum ortholog. Orthologous Arabidopsis thaliana-Aethionema arabicum genes were identified using a combination of CoGe SynFind (Tang et al. (2011) BMC Bioinformatics 12:102) and the output of the PL3.0 CNS pipeline (Turco et al. 2013). closestBed (BEDTools) was then used to map each PL3.0 CNS to the closest such gene; the distance to that gene is included in the closestBed output.

Proximal regions were defined as the 1,000 bp upstream of the transcription start site (5' proximal) and the 1,000 bp downstream of the 3' end of the gene (3' proximal). For intragenic CNSs, a custom Perl script was used to determine whether each CNS lies in an intron or a UTR. Overlap with UTRs and CDSs was calculated with intersectBed (BEDTools), using BED files created from the GFF "UTR" and "CDS" features. CNSs overlapping a CDS by 50% or more were designated "CDS", and CNSs overlapping a UTR by 50% or more were designated 5' or 3' UTR.
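The closest-gene assignment and position labels described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' BEDTools/Perl pipeline: intervals are BED-style half-open coordinates on a single chromosome, a gene's span is assumed to start at its TSS and include its UTRs, the hypothetical `has_ortholog` flag stands in for the SynFind/PL3.0 ortholog calls, any CNS touching the gene span is treated as intragenic, and the "distal" label for CNSs outside the 1,000 bp windows is hypothetical. All names (`Interval`, `Gene`, `gap`, `closest_gene`, `classify`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROXIMAL_BP = 1000       # proximal window size stated in the description
MIN_OVERLAP_FRAC = 0.5   # "50% or more" rule for CDS/UTR designations


@dataclass
class Interval:
    """BED-style half-open interval [start, end); assumed to lie on one chromosome."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start

    def overlap(self, other: "Interval") -> int:
        return max(0, min(self.end, other.end) - max(self.start, other.start))


@dataclass
class Gene:
    name: str
    span: Interval            # assumption: runs from the TSS to the 3' end, UTRs included
    strand: str               # "+" or "-"
    cds: List[Interval] = field(default_factory=list)
    utr5: List[Interval] = field(default_factory=list)
    utr3: List[Interval] = field(default_factory=list)
    has_ortholog: bool = True  # hypothetical flag: gene has an Aethionema arabicum ortholog


def gap(a: Interval, b: Interval) -> int:
    """Number of bases separating two intervals; 0 if they overlap or abut."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def closest_gene(cns: Interval, genes: List[Gene]) -> Optional[Gene]:
    """Nearest gene among those with an ortholog (ties go to the first in the list)."""
    candidates = [g for g in genes if g.has_ortholog]
    if not candidates:
        return None
    return min(candidates, key=lambda g: gap(cns, g.span))


def covered_fraction(cns: Interval, features: List[Interval]) -> float:
    """Fraction of the CNS covered by features (assumes the features do not overlap each other)."""
    return sum(cns.overlap(f) for f in features) / cns.length()


def classify(cns: Interval, gene: Gene) -> str:
    """Label a CNS's position relative to its assigned gene."""
    if cns.overlap(gene.span) > 0:
        # Intragenic: apply the >= 50% overlap rule, checking CDS first, then UTRs.
        if covered_fraction(cns, gene.cds) >= MIN_OVERLAP_FRAC:
            return "CDS"
        if covered_fraction(cns, gene.utr5) >= MIN_OVERLAP_FRAC:
            return "5' UTR"
        if covered_fraction(cns, gene.utr3) >= MIN_OVERLAP_FRAC:
            return "3' UTR"
        return "intron"  # also catches mixed overlaps below 50% for every feature type
    # Intergenic: decide which side of the gene the CNS is on, respecting strand.
    if gene.strand == "+":
        upstream = cns.end <= gene.span.start
    else:
        upstream = cns.start >= gene.span.end
    side = "5'" if upstream else "3'"
    # "distal" is a hypothetical label for CNSs outside the 1,000 bp windows.
    kind = "proximal" if gap(cns, gene.span) <= PROXIMAL_BP else "distal"
    return f"{side} {kind}"


# Hypothetical plus-strand gene in BED coordinates.
gene = Gene("geneA", Interval(1000, 5000), "+",
            cds=[Interval(1200, 1800), Interval(2500, 4500)],
            utr5=[Interval(1000, 1200)], utr3=[Interval(4500, 5000)])
print(classify(Interval(500, 600), gene))    # 5' proximal
print(classify(Interval(1150, 1250), gene))  # CDS (exactly 50% in CDS)
print(classify(Interval(1900, 2000), gene))  # intron
```

Checking CDS before UTRs means a CNS that sits exactly half in a CDS and half in a UTR is labeled CDS; the description does not say how such ties were resolved, so that ordering is also an assumption.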
CNSs from the Haudry and Van de Velde lists were then assigned to an Arabidopsis thaliana gene if they fell within that gene's genespace, defined as the region spanning from the 5'-most to the 3'-most PL3.0 CNS assigned to the gene, inclusive. Once a CNS was assigned to a gene, its distance to that gene was calculated with closestBed (BEDTools), and intersectBed was used, as above, to determine the position of intragenic CNSs.

An Arabidopsis thaliana genome decorated with two sets of CNSs is available on CoGe (dsgid 25725): 1) the PL3.0 CNSs from this datasheet and 2) a merged set of CNSs from the PL3.0, Haudry, and Van de Velde lists. To display the CNSs, set "Show preannotated CNSs?" to "Yes" under Results Visualization Options.

Note: CNS-to-gene assignments are best-guess computational assignments. An individual PL3.0 CNS may actually regulate a gene other than the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog, particularly for genes with complex regulation. In the GEvo links included in this spreadsheet, such cases often appear as clusters of CNSs extending beyond the midpoint between two Arabidopsis thaliana genes. Adding further orthologous genes to the GEvo panels often allows a CNS to be assigned with greater confidence: if only one of the two Arabidopsis thaliana genes is retained together with the CNS in all genomes, that gene is the more likely target.
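The genespace rule used to assign Haudry and Van de Velde CNSs can likewise be sketched, again as a hypothetical illustration rather than the original workflow. The sketch assumes BED-style half-open coordinates on one chromosome, reads "present in the genespace" as full containment, and takes the genespace literally as the span of a gene's PL3.0 CNSs (the description does not say whether the gene body is always included). Returning a list makes overlapping genespaces visible instead of silently picking one gene. The names `Span`, `genespaces`, `assign_by_genespace`, and the gene IDs are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: BED-style half-open (start, end) coordinates on a single chromosome.
Span = Tuple[int, int]


def genespaces(pl3_by_gene: Dict[str, List[Span]]) -> Dict[str, Span]:
    """Span from the 5'-most to the 3'-most PL3.0 CNS assigned to each gene.

    Taking min(start) and max(end) yields the same span on either strand.
    Genes without any PL3.0 CNS get no genespace.
    """
    spaces: Dict[str, Span] = {}
    for gene, cnss in pl3_by_gene.items():
        if cnss:
            spaces[gene] = (min(s for s, _ in cnss), max(e for _, e in cnss))
    return spaces


def assign_by_genespace(cns: Span, spaces: Dict[str, Span]) -> List[str]:
    """Genes whose genespace fully contains the CNS (sorted; may be empty or hold several genes)."""
    start, end = cns
    return sorted(g for g, (gs, ge) in spaces.items() if gs <= start and end <= ge)


# Hypothetical PL3.0 assignments for three genes.
pl3 = {"geneA": [(100, 150), (900, 950)], "geneB": [(2000, 2050)], "geneC": []}
spaces = genespaces(pl3)
print(spaces)                                     # {'geneA': (100, 950), 'geneB': (2000, 2050)}
print(assign_by_genespace((300, 320), spaces))    # ['geneA']
print(assign_by_genespace((1500, 1600), spaces))  # []
```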
Provider: figshare
Created: 2016-01-19