
Datasets of the study: "Describing variability in pig genes involved in coronavirus infections: towards a One Health perspective in conservation of animal genetic resources"

Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/3992969
Resource description:
Dataset description

Sequencing data (*.bam files) of four pig genes (ACE2, ANPEP, DPP4 and TMPRSS2) that can serve as coronavirus receptors or as proteases that prime infection. The datasets cover 22 European pig breeds and wild boar (Alentejana, AL; Apulo-Calabrese, AC; Basque, BA; Bísara, BI; Black Slavonian, BS; Casertana, CA; Cinta Senese, CS; Gascon, GA; Krškopolje, KR; Lithuanian Indigenous Wattle, LIW; Lithuanian White Old Type, LWOT; Majorcan Black, MB; Mora Romagnola, MR; Moravka, MO; Nero Siciliano, NS; Sarda, SA; Schwäbisch-Hällisches Schwein, SHS; Swallow-Bellied Mangalitsa, SBMA; Turopolje, TU; Italian Duroc, IDU; Italian Large White, ILW; Italian Landrace, ILA; Wild Boar, WB). This work took advantage of a study design developed within the Horizon 2020 TREASURE project. Each folder contains *.bam files and the related *.bai indexes. The breed code and the gene name are part of the file name (e.g. ILW.ACE2.bam identifies the sequencing data for the ACE2 gene in the Italian Large White pig breed). Details of the sequencing and of the bioinformatic pipeline are reported below.

Sequencing data

A total of 22 DNA pools were constructed from the European pig breeds and one DNA pool was constructed from European wild boars, each pool comprising 30 or 35 individual DNA samples combined at equimolar concentration. For the 22 breed DNA pools, libraries were prepared and sequenced on an Illumina HiSeq X Ten in paired-end mode, yielding 150 bp reads. The wild boar DNA pool was sequenced from 250 bp fragment libraries, with 100 bp paired-end reads, on the BGISeq 500 platform, following the provider’s procedures.

Data processing

Reads obtained from the sequenced libraries were cleaned by removing adapter sequences and by filtering out reads with more than 10% unknown bases (N) and/or reads in which low-quality bases (Q ≤ 5) made up more than 50% of the total sequenced bases.
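The read-level filter described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used for the dataset: the function name `passes_filter` and its parameters are hypothetical, base qualities are assumed to be already decoded to integer Phred scores, and adapter removal is not covered.

```python
def passes_filter(seq, quals, max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if a read survives the two thresholds quoted in the text.

    seq   -- read sequence as a string
    quals -- per-base Phred scores as integers (assumed already decoded)
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    n = len(seq)
    if n == 0:
        return False  # an empty read carries no information

    # Fraction of unknown bases (N): reads with more than 10% are discarded.
    n_frac = seq.upper().count("N") / n

    # Fraction of low-quality bases (Q <= 5): reads with more than 50% are discarded.
    low_q_frac = sum(1 for q in quals if q <= low_q) / n

    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac
```

For paired-end data, a common convention (assumed here, not stated in the description) is to drop the pair when either mate fails the filter.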
Filtered high-quality reads were then mapped to the latest version of the Sus scrofa reference genome (Sscrofa11.1; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/025/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.fna.gz) using the BWA-MEM algorithm v.0.7.17 with the parameters for paired-end data. Picard v.2.1.1 (https://broadinstitute.github.io/picard/) was used to remove duplicate reads. Whole sequence data are available in the EMBL-EBI European Nucleotide Archive (ENA) repository (http://www.ebi.ac.uk/ena) under the study accession PRJEB36830. Reads covering the four genes (ACE2: NC_010461.5:12094853-12156275; ANPEP: NC_010449.5:55346083-55378881; DPP4: NC_010457.5:68655849-68748818; TMPRSS2: NC_010455.5:204871561-204907561) were extracted with samtools v.1.7 and exported as aligned, sorted and indexed *.bam files. Gene length includes UTRs and flanking regions of 5 kbp upstream [flanking (5’-UTR)] and downstream [flanking (3’-UTR)].
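As a rough illustration of the extraction step, the Python sketch below turns the four gene intervals listed above into samtools region strings and builds the corresponding `samtools view` and `samtools index` command lines, with output names following the BREED.GENE.bam convention. The exact commands used by the authors are not given in the description, so this is an assumption; the helper names and the input path `ILW.genome.bam` are hypothetical, and the input BAM is assumed to be coordinate-sorted and indexed.

```python
# Gene intervals copied from the description (1-based, inclusive coordinates).
GENE_REGIONS = {
    "ACE2": ("NC_010461.5", 12094853, 12156275),
    "ANPEP": ("NC_010449.5", 55346083, 55378881),
    "DPP4": ("NC_010457.5", 68655849, 68748818),
    "TMPRSS2": ("NC_010455.5", 204871561, 204907561),
}


def region_string(gene):
    """Format a gene interval as a samtools region (contig:start-end)."""
    contig, start, end = GENE_REGIONS[gene]
    return f"{contig}:{start}-{end}"


def region_length(gene):
    """Length in bp of the interval, flanking regions included."""
    _, start, end = GENE_REGIONS[gene]
    return end - start + 1


def extraction_commands(breed, gene, genome_bam):
    """Build (but do not run) the commands that extract and index one gene."""
    out_bam = f"{breed}.{gene}.bam"  # naming convention from the description
    return [
        ["samtools", "view", "-b", "-o", out_bam, genome_bam, region_string(gene)],
        ["samtools", "index", out_bam],
    ]


if __name__ == "__main__":
    # Hypothetical input file name, used only for illustration.
    for cmd in extraction_commands("ILW", "ACE2", "ILW.genome.bam"):
        print(" ".join(cmd))
```

With samtools installed, each command list could be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them keeps the region bookkeeping easy to check without the tool present.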
Created:
2020-08-21