Amplicon sequence variants from the Insect Biome Atlas project

Name: Amplicon sequence variants from the Insect Biome Atlas project
Creator: Fredrik Ronquist
Published: 2025-01-15 14:56:46
License: No description available

DataCite Commons. Updated 2025-01-15; indexed 2024-07-13

Download link:

https://figshare.scilifelab.se/articles/dataset/Amplicon_sequence_variants_from_the_Insect_Biome_Atlas_project/25480681/1


Description:

General information

The Insect Biome Atlas project was supported by the Knut and Alice Wallenberg Foundation (dnr 2017.0088). The project analyzed the insect faunas of Sweden and Madagascar, and their associated microbiomes, mainly using DNA metabarcoding of Malaise trap samples collected in 2019 (Sweden) or 2019–2020 (Madagascar).

Please cite this version of the dataset as: Miraldo A, Iwaszkiewicz-Eggebrecht E, Sundh J, Lokeshwaran M, Granqvist E, Andersson AF, Lukasik P, Roslin T, Tack A, Ronquist F. 2024. Dataset of amplicon sequence variants (ASVs) from the Insect Biome Atlas Project, version 1. https://doi.org/10.17044/scilifelab.25480681

Dataset description

This dataset (version 1) contains amplicon sequence variants (ASVs) generated from high-throughput sequencing of the cytochrome c oxidase subunit I (CO1) gene from Malaise trap samples processed with mild lysis, with the exception of 15 samples for which we also provide sequencing data from homogenates and preservative ethanol. It includes both the ASV sequences and abundance information (number of reads) and it also contains metadata files that are needed to interpret and analyse the data further. Future versions of the dataset will include additional data.

Methods

Samples were sequenced using Illumina technology. Raw data are available at the European Nucleotide Archive (ENA) under project PRJEB61109. The raw sequence data was preprocessed using a Snakemake workflow available at https://github.com/biodiversitydata-se/amplicon-multi-cutadapt. Preprocessed reads were then used as input to the AmpliSeq Nextflow (v.2.1.0) pipeline to generate Amplicon Sequence Variants (ASVs).

Available data

In this dataset we provide two types of files: ASV files and metadata files. Files marked with 'SE' contain data from Sweden while those marked with 'MG' contain data from Madagascar.

The file shasum.txt contains checksums for each of the files.
After downloading, you can run shasum -c shasum.txt to check file integrity.

ASV files

This dataset contains ASV sequences in fasta format (CO1_asv_seqs_SE.fasta.gz and CO1_asv_seqs_MG.fasta.gz) and counts of ASVs in each sample (CO1_asv_counts_SE.tsv.gz and CO1_asv_counts.MG.tsv.gz). Files marked with 'SE' are from samples in Sweden while those marked with 'MG' are from Madagascar. The Swedish dataset contains 636,297 ASVs in 4,873 samples (including negative and positive control samples). The Madagascar dataset contains 559,023 ASVs in 2,081 samples (including negative and positive control samples).

Metadata files

There are three types of metadata files included in this dataset:
sequencing_metadata files with information about samples that were processed in the lab and sequenced.
samples_metadata files with information about samples that were collected in the field.
sites_metadata files with information about sites where samples were collected.

Sequencing metadata files

Two sequencing metadata files are included in this dataset (CO1_sequencing_metadata_SE.tsv and CO1_sequencing_metadata_MG.tsv) with information about samples that were sequenced. Columns in these files are as follows:
sampleID_NGI: Sample id given by the sequencing facility (matching the columns in the counts file)
sampleID_HISTORICAL: Custom user id
sampleID_FIELD: Sample id from field sampling
sampleID_LAB: Sample id from handling in the lab
dataset: Dataset designation for each sample
lab_sample_type: Type of sample, e.g. 'sample', 'buffer_blank', 'pcr_neg' etc.
country: Country of origin for sample
biological_spikes: True if sample has biological spike ins added
artificial_spikes: True if sample has artificial spike ins added at the time of DNA purification
sample_metadata_file: Corresponding metadata file for sample
lysate_rack_ID: Identification of 96-well plate where lysate aliquot is stored in the lab (internal use only)
lysate_well_ID: Identification of well position where lysate aliquot is stored in the lab (internal use only)
dna_plate_ID: Identification of 96-well plate where purified DNA is stored in the lab (internal use only)
dna_plate_well_ID: Identification of well position where lysate is stored in the lab (internal use only)
sequencing_batch: Custom user id for sequencing batch number
sequencing_batch_NGI: Sequencing batch number given by the sequencing facility
notes_lab: Additional information about sample processing in the lab (only for SE file)
sequencing_status: Additional information about sample sequencing status. If a sample has a value of “sequencing failed” in this column, then this sample will be missing from the ASV counts file
study_accession_ENA: Study identification at the European Nucleotide Archive
sample_accession_ENA: Sample identification at the European Nucleotide Archive
experiment_accession_ENA: Experiment identification at the European Nucleotide Archive
run_accession_ENA: Run identification at the European Nucleotide Archive

Samples metadata files

Two samples_metadata files are included in this dataset (samples_metadata_malaise_SE.tsv and samples_metadata_malaise_MG.tsv) with information about each sample that was collected in the field.
Columns in these files are as follows:
sampleID_FIELD: Sample id from field sampling
trapID: Malaise trap id from field sampling
biomass_grams: Wet weight of each bulk sample
placing_time: Time when sampling started
placing_date: Date when sampling started
collecting_time: Time when sampling ended
collecting_date: Date when sampling ended
duration_min: Total number of minutes the sample was collecting
trap_condition_collection: Condition of the Malaise trap at the time of collecting the sample from the trap (good; acceptable; poor)
sample_ethanol_conc: Concentration of preservative ethanol at the time of DNA extraction (only for SE file)
processing_group: Processing batch id (for internal use only)
sample_accession_ENA: Sample identification at the European Nucleotide Archive
sample_status: Additional information about sample processing status in the lab

Sites metadata files

There are two files that contain information about sampling sites, one for each country: sites_metadata_SE.tsv and sites_metadata_MG.tsv. Columns in these files are as follows:
siteID: Sampling site id number. Note that for some sites there can be several Malaise traps assembled (malaise_trap_type=Multitrap)
trapID: Malaise trap id from field sampling
latitude_WGS84: Latitude in WGS84 coordinate system. This info specifies the Malaise trap location at the sampling site
longitude_WGS84: Longitude in WGS84 coordinate system. This info specifies the Malaise trap location at the site
trap_habitat: Habitat where the Malaise trap was located
malaise_trap_type: Identifies if there are multiple traps assembled at the sampling site (Multitrap) or only one (Single_trap)
parkID: Name of national park (for MG only)
provinceID: Name of province (for MG only)
NILS_mhabitat: Habitat for nearest plot of the National Inventory of Landscapes in Sweden (NILS) from the Malaise trap location (only for SE file). For more information about NILS sampling design, check: https://www.slu.se/centrumbildningar-och-projekt/nils_old/Datainsamling/bakgrund-och-mal/
NILS_square: Identification of nearest NILS square for sampling site (only for SE file)
NILS_plot: Identification of nearest NILS plot to the Malaise trap location (only for SE file)
trap_orientation_degrees_S: Orientation in degrees of the collection head of the Malaise trap
notes: Notes associated with the Malaise trap (only for SE file)
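
The column descriptions imply a chain of keys between the files: the sample columns of the counts file match sampleID_NGI, the sequencing metadata maps sampleID_NGI to sampleID_FIELD, and the samples metadata maps sampleID_FIELD to trapID. The following Python sketch shows one way to follow that chain and sum reads per trap. It is an illustration only, not part of the dataset: the function names read_tsv and reads_per_trap are hypothetical, and it assumes that the first column of the counts file holds the ASV identifier and that counts are integers.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def reads_per_trap(counts_text, sequencing_text, samples_text):
    """Sum read counts per Malaise trap, skipping blanks and controls.

    Assumption: the first column of the counts table is the ASV id and
    every other column is named after a sampleID_NGI value.
    """
    # sampleID_NGI -> sampleID_FIELD, keeping only real field samples
    ngi_to_field = {
        row["sampleID_NGI"]: row["sampleID_FIELD"]
        for row in read_tsv(sequencing_text)
        if row["lab_sample_type"] == "sample"
    }
    # sampleID_FIELD -> trapID
    field_to_trap = {
        row["sampleID_FIELD"]: row["trapID"]
        for row in read_tsv(samples_text)
    }

    reader = csv.reader(io.StringIO(counts_text), delimiter="\t")
    header = next(reader)
    totals = {}
    for row in reader:
        for ngi_id, value in zip(header[1:], row[1:]):
            trap = field_to_trap.get(ngi_to_field.get(ngi_id))
            if trap is None:
                # Control sample, or a sample without field metadata
                continue
            totals[trap] = totals.get(trap, 0) + int(value)
    return totals
```

For the real files, the text of a gzipped table can be obtained with gzip.open(path, "rt").read() from the Python standard library. The full counts tables are large, so a streaming or chunked reader may be preferable in practice. Samples marked “sequencing failed” in sequencing_status have no column in the counts file and therefore contribute nothing to the totals.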

Provider:

Fredrik Ronquist

Created:

2024-05-06
