Mouse data for whole-embryo lineage reconstruction with linajea
Source: janelia.figshare.com. Updated 2024-05-09; indexed 2025-03-21.
Download link:
https://janelia.figshare.com/articles/dataset/Mouse_data_for_whole-embryo_lineage_reconstruction_with_linajea/24768798/1
Description:
This article provides access to the mouse dataset (140521) for "Automated reconstruction of whole-embryo cell lineages by learning from sparse annotations" (Malin-Mayor et al. 2023, DOI: https://doi.org/10.1038/s41587-022-01427-7).

Here we provide the ground truth tracks used to train the deep learning model, the trained networks, and the predicted tracks. Additionally, we provide information on how to access the image data, although it is not uploaded here due to size. Related artifacts include the source code for experiments and methods.

Image Data

The image dataset in n5/zarr format (as used in Malin-Mayor et al. 2023) can be accessed at the following Dropbox link: https://www.dropbox.com/scl/fi/2mt7jxmtl80s3zf2byfyr/140521_mouse.tar.gz?rlkey=n5r311whn8ky4gdabybjdekcc&dl=0. This image dataset was originally published in "In Toto Imaging and Reconstruction of Post-Implantation Mouse Development at the Single-Cell Level" (McDole et al. 2018, DOI: https://doi.org/10.1016/j.cell.2018.09.031), and can also be accessed in the Image Data Resource (IDR) in .klb format, along with associated metadata, at https://idr.openmicroscopy.org/webclient/?show=project-502.

Ground Truth Tracks

Inside gt_tracks.zip there are several files containing different subsets of tracks. Each has the following tab-separated columns: time, z, y, x, cell_id, parent_id, track_id.

tracks.txt is the main file, containing the manual annotations of individual cells, from start to end of the video, that were used to train the model. These tracks are sparse, but each cell included in tracks.txt had its whole lineage traced as completely as possible from start to end of the video.

division_tracks.txt is a different set of manually annotated tracks, where each track is around 5 frames long and centers around a division.

daughter_cells.txt is a subset of division_tracks.txt containing only the cells directly after a division event. It was generated for convenient and efficient training of models where divisions are oversampled.

full_frame_divisions.txt is a set of manually annotated division points (points right before the cell divides) that are as complete as possible for target time points 120, 240, and 360 and the adjacent time frames. It was used for evaluation and not for model training.

Trained Models

trained_networks.zip includes all networks trained on the mouse dataset. The model we suggest using for best performance is described in 140521_mouse_simple_train_all_config.json, and its weights are included in train_net_checkpoint_400000.*. This model was trained and validated on all available ground truth data, and as such is NOT the same as the models used to report results in the paper.

supp_figure_2 includes the configs and models used to report results in Figure 2a of the main text and Supplemental Figure 2a of the paper. We separated the data into train/validation/test splits on "early" (times 50-100), "middle" (times 225-275) and "late" (times 400-450). Each model has two time splits held out for validation and testing, and therefore was trained on the remaining split as well as all time frames not in one of the splits. For the mouse, this resulted in 3 trained networks for the main ("setup11_simple") architecture.

supp_figure_6b contains the configs and trained models presented in the ablation study in Supplemental Figure 6B.

Predicted Tracks

predicted_tracks.zip contains both the TGMM baseline results and the results for the linajea method.

tgmm/140521_shifted_TGMM.xml contains the TGMM results provided to us by the authors of the TGMM method.

The linajea results are organized similarly to the trained models. mouse_all_results_071621.txt contains the tracks predicted by the model trained on all ground truth tracks (140521_mouse_simple_train_all).
Again, these are NOT the tracks evaluated in the paper, but they are likely to be the most correct, since the model that produced them was trained on the most data.

supp_figure_2 contains the tracks used in Figure 2a of the main text and in Supplemental Figure 2a.

supp_figure_6b contains the tracks used in Supplemental Figure 6B (ablation study).
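
The tab-separated track format described under Ground Truth Tracks (time, z, y, x, cell_id, parent_id, track_id) can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset or of linajea: the names Cell, read_tracks, and find_divisions are hypothetical, and it assumes the files have no header row, that IDs and time points are written as integers, and that a cell without a parent carries a parent_id that matches no cell_id (the value -1 in the demo is only an illustration).

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    """One annotated cell position (one row of a track file)."""
    time: int
    z: float
    y: float
    x: float
    cell_id: int
    parent_id: int
    track_id: int


def read_tracks(handle):
    """Parse rows of: time, z, y, x, cell_id, parent_id, track_id (tab-separated).

    Assumption: no header row; a malformed row raises ValueError
    rather than being skipped silently.
    """
    cells = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        time, z, y, x, cell_id, parent_id, track_id = row[:7]
        cells.append(Cell(int(time), float(z), float(y), float(x),
                          int(cell_id), int(parent_id), int(track_id)))
    return cells


def find_divisions(cells):
    """Map each parent cell_id to its children when it has two or more.

    Parent IDs that do not correspond to any cell in the file (for example
    a placeholder for "no parent") are ignored.
    """
    known_ids = {c.cell_id for c in cells}
    children = defaultdict(list)
    for c in cells:
        if c.parent_id in known_ids:
            children[c.parent_id].append(c.cell_id)
    return {p: sorted(kids) for p, kids in children.items() if len(kids) >= 2}


if __name__ == "__main__":
    # Tiny made-up example: cell 1 -> cell 2, which divides into cells 3 and 4.
    demo = (
        "0\t10.0\t20.0\t30.0\t1\t-1\t0\n"
        "1\t10.5\t20.0\t30.0\t2\t1\t0\n"
        "2\t11.0\t19.0\t30.0\t3\t2\t0\n"
        "2\t10.0\t21.0\t31.0\t4\t2\t0\n"
    )
    print(find_divisions(read_tracks(io.StringIO(demo))))  # prints {2: [3, 4]}
```

To apply this to a real file, open it with `open("tracks.txt", newline="")` after extracting gt_tracks.zip and pass the handle to read_tracks. If the files turn out to contain a header or non-integer IDs, the parsing step will need adjusting.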
Provided by:
Janelia Research Campus



