
Arabidopsis thaliana CNSs verified in at least 2 CNS lists

Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Arabidopsis_thaliana_CNSs_verified_in_at_least_2_CNS_lists/1422166
Description:
This dataset lists Arabidopsis thaliana conserved noncoding sequences (CNSs) present in at least two of the following three CNS lists:
1) Haudry et al. (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45:891-898.
2) PL3.0 (TAIR 10 version): Turco et al. (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Frontiers in Plant Genetics and Genomics 4:170-180.
3) Van de Velde et al. (2014) Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26:2729-2745.
CNSs found in at least 2 of the 3 lists were identified using multiIntersectBed from the BEDTools suite (Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842). CNSs from the verified2 list were assigned to an Arabidopsis thaliana gene based on their PL3.0 component. PL3.0 CNSs are defined as syntenic conserved noncoding regions between Arabidopsis thaliana and the early-branching Brassicaceae species Aethionema arabicum. Orthologous Arabidopsis thaliana-Aethionema arabicum genes were identified using a combination of CoGe SynFind (Tang et al. (2011) BMC Bioinformatics 12:102) and the PL3.0 CNS pipeline (Turco et al. 2013). closestBed (BEDTools) was then used to map each PL3.0 CNS to the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog. The distance to the nearest gene is included in the closestBed output. Proximal regions were defined as the 1000 bp upstream of the transcription start site (5' proximal) or the 1000 bp downstream of the gene (3' proximal).
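To make the "at least 2 of the 3 lists" criterion concrete, here is a minimal pure-Python sketch of the underlying idea: a sweep over interval endpoints that keeps every region covered by two or more input lists. It is not the dataset authors' pipeline and does not call BEDTools. The function name `regions_in_at_least`, the variable names, and all coordinates are hypothetical. The sketch assumes BED-style 0-based, half-open intervals and merges adjacent supported regions, which may differ from how the multiIntersectBed output was actually filtered.

```python
from collections import defaultdict


def regions_in_at_least(lists, min_lists=2):
    """Return (chrom, start, end) regions covered by at least `min_lists` inputs.

    `lists` is a sequence of interval lists; each interval is a
    (chrom, start, end) tuple in 0-based, half-open coordinates (assumption).
    """
    # Record a +1 event at every start and a -1 event at every end,
    # tagged with the index of the list the interval came from.
    events = defaultdict(list)
    for idx, intervals in enumerate(lists):
        for chrom, start, end in intervals:
            events[chrom].append((start, 1, idx))
            events[chrom].append((end, -1, idx))

    result = []
    for chrom in sorted(events):
        depth = [0] * len(lists)  # coverage depth contributed by each list
        region_start = None
        evs = sorted(events[chrom])
        i = 0
        while i < len(evs):
            pos = evs[i][0]
            # Apply every event at this position before testing support,
            # so that abutting supported regions are merged.
            while i < len(evs) and evs[i][0] == pos:
                _, delta, idx = evs[i]
                depth[idx] += delta
                i += 1
            supported = sum(1 for d in depth if d > 0) >= min_lists
            if supported and region_start is None:
                region_start = pos
            elif not supported and region_start is not None:
                result.append((chrom, region_start, pos))
                region_start = None
    return result


# Hypothetical coordinates, for illustration only (not taken from the dataset).
list_a = [("Chr1", 100, 200), ("Chr1", 500, 600)]
list_b = [("Chr1", 150, 250), ("Chr2", 10, 50)]
list_c = [("Chr1", 180, 300), ("Chr1", 550, 580)]

print(regions_in_at_least([list_a, list_b, list_c]))
# -> [('Chr1', 150, 250), ('Chr1', 550, 580)]
```

In this toy input, the Chr2 interval is dropped because only one list supports it, while the Chr1 regions are kept wherever two or more lists overlap.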
CNSs without a PL3.0 component were also assigned to an Arabidopsis thaliana gene if they were intragenic or if they fell within the genespace of an Arabidopsis thaliana gene, where the genespace is defined as the region between and encompassing the 5'-most PL3.0 CNS and the 3'-most PL3.0 CNS. For intragenic CNSs, a custom Perl script was used to determine whether the CNS lies in an intron or a UTR. Overlap with UTRs and CDS regions was calculated using intersectBed (BEDTools) with BED files created from GFF "UTR", "gene", and "CDS" features. CNSs overlapping CDSs by 50% or more were given "CDS" designations. CNSs overlapping UTRs by 50% or more were given 5' or 3' UTR designations. Note: CNS assignments to Arabidopsis thaliana genes are best-guess computational assignments; an individual PL3.0 CNS may actually regulate a gene other than the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog. This is particularly true for genes with complex regulation. In the GEvo links included in this spreadsheet, such cases often appear as clusters of CNSs extending beyond the midpoint between two Arabidopsis thaliana genes. By adding additional orthologous genes to GEvo panels, it is often possible to assign a CNS to an Arabidopsis thaliana gene with greater confidence if only one of the two Arabidopsis thaliana genes is retained in all genomes along with the CNS.
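The 50% overlap rule for CDS and UTR designations can be sketched as follows. This is an illustration under stated assumptions, not the authors' script. It assumes the 50% is measured as a fraction of the CNS length against a single feature, that "CDS" is checked before the UTR labels, and that a CNS meeting neither threshold gets a placeholder label. The function names, the `"unassigned"` label, and the coordinates are all hypothetical.

```python
def overlap_fraction(cns, feature):
    """Fraction of the CNS covered by one feature.

    Both arguments are (start, end) tuples on the same chromosome,
    in 0-based, half-open coordinates (assumption).
    """
    cns_start, cns_end = cns
    feat_start, feat_end = feature
    overlap = max(0, min(cns_end, feat_end) - max(cns_start, feat_start))
    length = cns_end - cns_start
    return overlap / length if length > 0 else 0.0


def designate(cns, features, threshold=0.5):
    """Return the first label whose features cover >= `threshold` of the CNS.

    `features` maps a label ("CDS", "5' UTR", "3' UTR") to a list of
    (start, end) intervals. The label order below is an assumption.
    """
    for label in ("CDS", "5' UTR", "3' UTR"):
        for feature in features.get(label, []):
            if overlap_fraction(cns, feature) >= threshold:
                return label
    return "unassigned"  # hypothetical placeholder, not a label from the dataset


# Hypothetical coordinates, for illustration only.
gene_features = {
    "5' UTR": [(90, 180)],
    "CDS": [(180, 400)],
}
print(designate((100, 200), gene_features))  # -> 5' UTR (80% of the CNS is in the UTR)
print(designate((150, 250), gene_features))  # -> CDS (70% of the CNS is in the CDS)
```

Measuring overlap against the CNS length corresponds in spirit to a minimum-overlap fraction on the query interval; the description does not state which of the two intervals the 50% applies to, so treat that choice as an assumption.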
Created:
2015-05-21