Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset - PXD017710 - ISA representation
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3742218
Description:
Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset deposited in the PRIDE database with accession number PXD017710
ISA-Tab annotation for the "SARS-CoV-2 infected host cell proteomics reveal potential therapy targets" publication.
Github repository: https://github.com/ISA-tools/PXD017710
This is part of an effort to (re-)annotate: https://dx.doi.org/10.21203/rs.3.rs-17218/v1
Additional work done as part of:
https://github.com/virtual-biohackathons/covid-19-bh20
https://github.com/virtual-biohackathons/covid-19-bh20/wiki/FairData
Proteomics data
Available from PRIDE at https://www.ebi.ac.uk/pride/archive/projects/PXD017710
and as the MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710
ISA-Tab representation:
Rationale: Demonstrate the suitability of the ISA format for representing MS-based protein profiling experiments with more granularity and detail, thus providing a better representation of the experimental design.
The formatting and re-annotation are based on information extracted from:
- the original publication
- the supplementary tables available from the publisher's site
- the 'filtered-results.csv' helper file as supplied to @sneumann during the HUPO-PSI meeting March 2020
Viewing the ISA-tab formatted and re-annotated PXD017710 with ISATab-Viewer
To view the ISA-Tab formatted and re-annotated PXD017710 locally, run the following from the directory that contains `isaviewer-demo.html`:
```bash
python -m http.server 8000
```
Then point your browser to `http://localhost:8000/isaviewer-demo.html`
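The same local viewing step can also be scripted. The following is only a sketch using the Python standard library: the helper names `serve_directory` and `page_is_reachable` are made up for this illustration, and it assumes the directory being served contains `isaviewer-demo.html`.

```python
import functools
import http.server
import threading
import urllib.error
import urllib.request


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost from a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Bind to the loopback interface only; port=0 lets the OS pick a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def page_is_reachable(server, page="isaviewer-demo.html"):
    """Return True if `page` is served with HTTP status 200."""
    host, port = server.server_address[:2]
    # Bypass any configured proxy, since the request targets the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/{page}", timeout=5) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

Calling `serve_directory(".")` from the repository checkout and then `page_is_reachable(server)` gives a quick check before opening the page in a browser; `server.shutdown()` stops the server again.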
Curation tasks performed:
* initial structure of the study design in ISA format
* linkage of Proteome and Translatome data (supplementary material) to ISA assay tables (via Derived Data File)
* processing of the Proteome and Translatome data (supplementary material) with the Python pandas library to generate the following files:
- proteome_intensities_long_table_ggplot2.txt
- proteome_diffanal_ratio_pvalue_long_table_ggplot2.txt
- translatome_intensities_long_table_ggplot2.txt
- translatome_diffanal_ratio_pvalue_long_table_ggplot2
Each file is a long-format table, obtained by applying a `melt` to the original Excel file, and can be loaded directly into the R ggplot2 library for graphical representation.
The statistically relevant elements have been annotated with the STATO ontology, and the tables are described as a Frictionless Data Package.
A Jupyter notebook for the transformation is available.
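To make the wide-to-long reshaping concrete, here is a minimal plain-Python sketch of what a `melt` does. It is not the notebook's code: the column names and values below are invented for illustration, and with pandas the equivalent call would be `DataFrame.melt(id_vars=..., var_name=..., value_name=...)`.

```python
def melt(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide rows (a list of dicts) into long rows, one per measured value."""
    if not rows:
        return []
    # Every column that is not an identifier is treated as a measured variable.
    value_vars = [column for column in rows[0] if column not in id_vars]
    long_rows = []
    # Column-major order, matching the row order pandas' melt produces.
    for column in value_vars:
        for row in rows:
            record = {key: row[key] for key in id_vars}
            record[var_name] = column
            record[value_name] = row[column]
            long_rows.append(record)
    return long_rows


# Hypothetical wide table: one row per protein, one column per sample.
wide = [
    {"protein": "P1", "control_2h": 1.0, "virus_2h": 2.0},
    {"protein": "P2", "control_2h": 3.0, "virus_2h": 4.0},
]
long_table = melt(wide, id_vars=["protein"], var_name="sample", value_name="intensity")
for record in long_table:
    print(record)
```

The long layout has one observation per row, which is the shape ggplot2 expects when mapping a column such as `sample` to an aesthetic.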
* conversion of raw data to mzML format: detailed in https://github.com/ISA-tools/PXD017710
install docker:
```bash
brew update
brew install docker
```
sign in to Docker Hub:
```bash
# make sure the Docker daemon is running first (for example, by launching Docker Desktop)
docker login
```
pull docker container for ProteoWizard:
```bash
docker pull chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
```
:warning: be sure to sign up and log in to https://hub.docker.com/
in order to be able to reach
https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
run the pwiz tool from the container over the raw data:
```bash
docker run -it --rm -e WINEDEBUG=-all -v /Users/Downloads/PXD017710/raw/:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /data/*.raw --mzML
```
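When the conversion has to be repeated for several directories, the command above can be assembled programmatically. The sketch below only builds and prints the command; the function name `msconvert_command` is hypothetical, and it assumes msconvert expands the `*.raw` pattern itself, as the original command implies.

```python
import shlex
from pathlib import Path

# Image name as used in the docker run command above.
IMAGE = "chambm/pwiz-skyline-i-agree-to-the-vendor-licenses"


def msconvert_command(raw_dir, pattern="*.raw"):
    """Build the docker argument list that converts raw files in `raw_dir` to mzML."""
    host_dir = Path(raw_dir).resolve()  # docker -v needs an absolute host path
    return [
        "docker", "run", "-it", "--rm",
        "-e", "WINEDEBUG=-all",         # silence wine debug output
        "-v", f"{host_dir}:/data",      # mount the raw files at /data in the container
        IMAGE,
        "wine", "msconvert", f"/data/{pattern}", "--mzML",
    ]


# The directory below is a placeholder; substitute the real location of the raw files.
command = msconvert_command("PXD017710/raw")
print(shlex.join(command))
```

`shlex.join` quotes the `/data/*.raw` argument, so the host shell passes the pattern through unchanged instead of trying to expand it locally. Because of the `-it` flags, the printed command is meant to be pasted into an interactive terminal.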
* ontology markup for:
* declaration of independent variables as ISA Study Factors: {biological agent, dose, time point, replicate} -> OBI
* Taxonomic information (host cells and virus) -> NCBITaxonomy
* Cell line: CaCo-2 cells -> Cell Line Ontology
* Disease: Colon Cancer -> Human Phenotype Ontology
* MS-specific aspects (TMT reagent, instrument, ...) -> PSI-MS
* Statistical Tests -> STATO
Unresolved curatorial issues:
1. ambiguities related to Tandem Mass Tag labelling protocol
- the publication mentions TMT11 (see Figure 2 in https://www.researchsquare.com/article/rs-17218/v1)
- the information available from PRIDE mentions TMT6 (https://www.ebi.ac.uk/pride/archive/projects/PXD017710)
This may require another round of annotation on the TMT agents and fractions in the ISA a_assay representation
2. SARS-CoV-2 isolate: there is no clear NCBI Taxonomy anchoring and the origin is unclear, so the markup points to the parent class (as of 06.04.2020)
Release and packaging as a BDBAG:
The tgz file associated with this upload was produced using https://github.com/fair-research/bdbag. It contains several manifest files detailing the metadata and data files, with md5 and sha256 checksums for each.
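The checksums in those manifests can be re-checked after download. The following is a minimal sketch, not the bdbag tool itself. It assumes the archive has been extracted, that all payload files are present locally, and that the manifests follow the BagIt layout (`manifest-<algorithm>.txt` with one `checksum path` pair per line); the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Compute the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return the payload paths whose checksum differs from the manifest entry."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line holds the checksum, whitespace, then a path relative to the bag root.
        expected, relative_path = line.split(maxsplit=1)
        if file_digest(bag_dir / relative_path, algorithm) != expected.lower():
            mismatches.append(relative_path)
    return mismatches
```

An empty list from `verify_manifest("path/to/extracted/bag")` means every listed file matches its recorded checksum; passing `algorithm="md5"` checks the md5 manifest instead. A file that is listed but missing raises `FileNotFoundError`.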
Created: 2020-04-07



