Data from: Size distribution of function-based human gene sets and the split-merge model
Source: DataONE. Updated: 2016-07-05. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Description:
The sizes of paralog groups (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets, that is, gene families in which genes are grouped by a similar function or a shared property. The size distribution of HUGO Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and is fitted much better by a Beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split in two to account for sub-functional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. These operations are not uncommon for a curator of gene sets. A simulation shows that iterating these operations on the size distribution of Ensembl paralogs can lead to a distribution fitted by a Beta rank function. We further illustrate the application of the Beta rank function with the distribution of transcription factors and drug target genes among HGNC gene families.
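The split-merge mechanism described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' simulation: the function names, the random cut point, the pairwise merging rule, and the parameter values are all hypothetical, and the starting sizes are a synthetic power law rather than the Ensembl paralog data. The Beta rank function is written here in its commonly used form, size(r) = C (N + 1 - r)^b / r^a, which reduces to a power law when b = 0; the abstract itself does not state the formula.

```python
import random


def beta_rank(r, n, c, a, b):
    """Commonly used Beta rank form (assumed): c * (n + 1 - r)**b / r**a."""
    return c * (n + 1 - r) ** b / r ** a


def power_law_sizes(n, max_size, alpha):
    """Synthetic rank-size power law, a stand-in for real paralog sizes."""
    return [max(1, round(max_size * r ** -alpha)) for r in range(1, n + 1)]


def split_merge_step(sizes, merge_fraction, rng):
    """Split the largest set in two, then merge a small fraction of the rest in pairs."""
    sizes = sorted(sizes, reverse=True)
    largest = sizes.pop(0)
    if largest >= 2:
        cut = rng.randint(1, largest - 1)  # hypothetical rule: uniform cut point
        pieces = [cut, largest - cut]
    else:
        pieces = [largest]
    n_merge = int(merge_fraction * len(sizes))
    n_merge -= n_merge % 2  # keep an even count so sets can be paired
    rng.shuffle(sizes)
    chosen, rest = sizes[:n_merge], sizes[n_merge:]
    merged = [chosen[i] + chosen[i + 1] for i in range(0, n_merge, 2)]
    return sorted(rest + merged + pieces, reverse=True)


def simulate(sizes, steps, merge_fraction, seed=0):
    """Iterate the split-merge step; the total number of genes is conserved."""
    rng = random.Random(seed)
    for _ in range(steps):
        sizes = split_merge_step(sizes, merge_fraction, rng)
    return sizes


if __name__ == "__main__":
    start = power_law_sizes(n=200, max_size=500, alpha=1.0)
    end = simulate(start, steps=50, merge_fraction=0.02)
    print("largest sets before:", start[:5])
    print("largest sets after: ", end[:5])
```

Splitting removes mass from the top ranks while merging removes small sets from the tail, so the rank-size curve bends away from a straight line on a log-log plot at both ends. Fitting the parameters a and b of the Beta rank function to the simulated sizes is left out of this sketch.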
Created:
2016-07-05



