Data for common data models to streamline metabolomics processing and annotation, and implementation in a Python pipeline

Mendeley Data. Updated 2024-06-05. Indexed 2024-06-30.
Download link:
https://zenodo.org/records/10974781
Description:
This upload contains the HZV029 Plasma and HZV029 Two-Phase datasets for reviewers of the "Data for common data models to streamline metabolomics processing and annotation, and implementation in a Python pipeline" submission. Both datasets will be uploaded to Metabolomics Workbench, and that upload will be completed before final publication of the manuscript. For the HZV029 Plasma dataset, only the final run is included for any sample (i.e., failed injections and other samples with data quality issues that were rerun during acquisition were omitted).

Also included in the upload is the source code for MetDataModel and the pcpfm as of manuscript resubmission, along with the pcpfm itself. If you find this upload in the future, please check the GitHub repos for more recent versions:
https://github.com/shuzhao-li-lab/PythonCentricPipelineForMetabolomics
https://github.com/shuzhao-li-lab/metDataModel

For space reasons, the GitHub repos do not store the input data; they contain only the notebooks. The .zip here, however, has both the notebooks by themselves in the notebook subdirectory and a separate directory with the notebooks plus the data used to generate all the figures and results in the manuscript.

Some information that is needed to rerun this analysis:

Sequence files are critical to the functioning of the pipeline. The sequence files for all analyses are provided in sequence_files.zip. They can be used to recapitulate the analysis either by changing the filepath of each acquisition to where you put it on your system or by placing the sequence file in the same directory as the mzML or raw files. In the latter case, the pipeline will search for filenames matching the sample names. The sequence files also store some sample metadata, such as the type of sample a given acquisition is (unknown, pooled, qc, etc.).

.raw to .mzML conversion works well on macOS but may not work well on other systems. In that case, you will need to either specify your own conversion command or convert the files outside of the pipeline.

To replicate the results, you need to have the annotation sources downloaded, which can be done using the pipeline. MS2 annotation requires the files in the AcquireX directory, which contains MS2 acquisitions on pooled HZV029 plasma samples.

For the comparison between MetaboAnalystR and the pcpfm, subsets of the datasets were used. These subsets and their sequence files are in Subsets_for_performance_testing.zip. The sequences are also in the sequence_files directory.

The notebooks reference data in the analysis folders. Copies of these files are located with the notebooks to ease reproduction of the exact results in the paper; to use them, however, you will need to change the paths to this data in the notebooks. This arrangement lets the notebooks be run during a rerun without copying intermediates back and forth, and it keeps the GitHub repo clean.

Version History:
This version follows reviewer comments and is for resubmission.

Contributions:
Joshua M Mitchell implemented the pipeline and was first author on the manuscript. Shuzhao Li is the corresponding author on the manuscript. Maheshwor Thapa performed the experiments to collect the HZV029 data. Yuanye Chi helped with testing and documenting the pipeline. Jiangou (Jeff) Xia and Zhiqiang Pang provided the R portion of the analysis.
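
As a rough illustration of the first option described for the sequence files (pointing each acquisition's filepath at wherever the data sits on your system), the sketch below rewrites one column of a CSV sequence file. It is not part of the pcpfm. The function name relocate_acquisitions, the column name "Filepath", and the assumption that the sequence file is a comma-separated file with a header row are all hypothetical; check the actual sequence files in sequence_files.zip and adjust accordingly.

```python
import csv
import ntpath
import os


def relocate_acquisitions(src_csv, dst_csv, new_data_dir, path_column="Filepath"):
    """Copy src_csv to dst_csv, pointing every acquisition path into new_data_dir.

    Assumption: the sequence file is a CSV with a header row and the column
    named by path_column holds the path to each .mzML or .raw acquisition.
    """
    with open(src_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        if path_column not in fieldnames:
            raise KeyError(f"column {path_column!r} not found in {src_csv}")
        rows = list(reader)

    for row in rows:
        # Keep only the file name. ntpath.basename splits on both "/" and "\",
        # so paths recorded on macOS, Linux, or Windows are all handled.
        file_name = ntpath.basename(row[path_column])
        row[path_column] = os.path.join(new_data_dir, file_name)

    # All other columns (e.g. sample type metadata) are written back unchanged.
    with open(dst_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage (file and directory names are placeholders):
# relocate_acquisitions("plasma_sequence.csv", "plasma_sequence_local.csv", "/data/HZV029")
```

Writing to a new file rather than editing in place keeps the provided sequence file intact, so the second option (placing the original sequence file next to the mzML or raw files) remains available.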
Created:
2024-05-10