
Enformer Celltyping training positions and cell types

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-27.
Download link:
https://figshare.com/articles/dataset/Enformer_Celltyping_training_positions_and_cell_types/22040393
Description:
The training positions and cell types for the Enformer Celltyping model. More on this here.

In detail: when training Enformer Celltyping, we used the following approach to identify training positions:

1. Bin the genome based on the predictive window.
2. Filter bins to select the training set based on DNA and cell type filters.
   * DNA filters:
     1. Leave a buffer at the start/end of each chromosome large enough for the DNA and local chromatin accessibility windows.
     2. Exclude blacklist regions.
   * Cell type filters:
     1. Require coverage for the histone mark > 12.5% of the returned window, to prioritise training on regions with peaks.
3. Down-sample the resulting regions to equal the lowest count of regions for any histone mark, so that each histone mark has equal representation. This avoids biasing training towards one mark.

This results in 67,007 training and validation positions (cell type and genomic region combinations) and 14,188 unique genomic positions, which is similar to the number of positions Basenji and Enformer were trained on (14,533). The approach ensures the model sees peaks for all histone marks. The validation set positions are randomly shifted by up to a quarter of the predictive window so that the model's performance doesn't overfit to the initial genomic bins.
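The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`make_bins`, `select_positions`, `shift_validation`), the `coverage` dictionary layout, and the half-open `(chrom, start, end)` interval convention are all hypothetical, and the actual window and buffer sizes are not given in this description.

```python
import random
from collections import defaultdict


def make_bins(chrom_sizes, window, buffer):
    """Tile each chromosome with non-overlapping bins of `window` bp,
    leaving `buffer` bp untouched at both chromosome ends."""
    bins = []
    for chrom, size in chrom_sizes.items():
        start = buffer
        while start + window <= size - buffer:
            bins.append((chrom, start, start + window))
            start += window
    return bins


def overlaps_blacklist(bin_, blacklist):
    """True if the half-open bin overlaps any blacklist interval."""
    chrom, start, end = bin_
    return any(c == chrom and s < end and start < e for c, s, e in blacklist)


def select_positions(bins, blacklist, coverage, min_cov=0.125, seed=0):
    """Apply the DNA and cell type filters, then down-sample per mark.

    `coverage` (hypothetical layout) maps (mark, cell_type, bin) to the
    fraction of the window covered by peaks for that mark.
    Marks with no passing region are simply absent from the result.
    """
    rng = random.Random(seed)
    # DNA filter: drop bins that touch a blacklist region.
    ok_bins = {b for b in bins if not overlaps_blacklist(b, blacklist)}
    # Cell type filter: keep (cell type, bin) pairs with coverage > 12.5%.
    by_mark = defaultdict(list)
    for (mark, cell, b), frac in coverage.items():
        if b in ok_bins and frac > min_cov:
            by_mark[mark].append((cell, b))
    if not by_mark:
        return {}
    # Down-sample every mark to the smallest per-mark count.
    n = min(len(v) for v in by_mark.values())
    return {mark: rng.sample(sorted(v), n) for mark, v in by_mark.items()}


def shift_validation(bin_, window, rng):
    """Shift a validation bin by a random offset of up to window / 4."""
    chrom, start, end = bin_
    quarter = window // 4
    shift = rng.randint(-quarter, quarter)
    return (chrom, start + shift, end + shift)
```

Each entry in the returned dictionary is a (cell type, genomic bin) combination, which matches the way the description counts training and validation positions. Sampling with a fixed seed keeps the down-sampling reproducible; that choice is also an assumption of this sketch.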
Created:
2024-01-31