Body size modulates the extent of seasonal diet switching by large mammalian herbivores in Yellowstone National Park

Source: Mendeley Data. Updated 2024-05-30; indexed 2024-06-27.

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.h18931zst

Description:

# Body size modulates the extent of seasonal diet switching by large mammalian herbivores in Yellowstone National Park

[https://doi.org/10.5061/dryad.h18931zst](https://doi.org/10.5061/dryad.h18931zst)

CHANGES: Version 2 (May 2024) contains updated files that reflect a name change for one plant taxon included in each file.

---

Python scripts, R scripts, and input/output files used to quantify fine-grained dietary variation within and among populations of five large-herbivore species (pronghorn, bighorn sheep, mule deer, elk, bison) in Yellowstone National Park, USA. First, global (step 1) and local (step 2) reference libraries are built for the *trn*L-P6 locus. Raw sequence reads from large-herbivore fecal samples are then cleaned and prepared (step 3) for taxonomy assignment (step 4). The taxonomies assigned using the local and global reference libraries are combined (step 5), and analyses are then conducted to determine correlations between body size and key indicators of diet seasonality (step 6).

This dataset contains all code associated with:

1. Creating global reference library for the trnL locus in plants (obitools_Step 1_global ref lib.sh)
2. Creating local Yellowstone National Park reference library for the trnL locus in plants (obitools_Step 2_local ref lib.sh)
3. Preparing and cleaning sequence reads from fecal samples (obitools_Step 3_prepare sequence reads.sh)
4. Assigning taxonomy to cleaned sequence reads using the local and global reference libraries (obitools_Step 4_assign taxonomy.sh)
5. Combining the local and global reference library taxonomy assignment outputs (R_Step 5_combine local and global library outputs.R)
6. Data analyses in R (R_Step 6_Data analyses.R)

**Local reference library files:** This dataset also includes the specimen data (bold-specimendata-DS-YNPBP-R1.xlsx), *trn*L input fasta (bold-trnL-DS-YNPBP-R1.fas), and output fasta (YNPP6_completeDB_20230414.fasta) that were used to create the local reference library.
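
The FASTA files listed above can be read with a few lines of Python. The sketch below is illustrative only and is not part of the deposited scripts: the function name `read_fasta` and the two example records are hypothetical, and it assumes the standard FASTA convention that each description line starts with `>`.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {description: sequence}."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may span several lines
    return records


# Hypothetical records, made up for illustration (not taken from the dataset).
example = """>taxon_A
GGGCAATCC
TGAGCCAA
>taxon_B
ATCCGTTGA
"""
records = read_fasta(example.splitlines())
print({name: len(seq) for name, seq in records.items()})
```

An open file handle can be passed in place of `example.splitlines()`, for instance one opened on YNPP6_completeDB_20230414.fasta. In this sketch, records that share an identical description line would overwrite one another.
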
Both the input and output files for the local reference library are in FASTA format, in which each sequence record begins with a single-line description (plant taxonomy ID) followed by lines of sequence data for that taxon.

In bold-specimendata-DS-YNPBP-R1.xlsx, there are 3 tabs; each tab holds information about the plant specimens collected for the local reference library. All columns in each tab are outlined below (cells where information wasn't recorded for a specimen are shown with "n/a"):

*"Lab Sheet" tab:*

* Project Code = unique identifier for the data project
* Process ID = unique code automatically generated by BOLD systems for each new record added to the project
* Sample ID = internal identifier for the sample being sequenced
* Field ID = identifier for the specimen assigned in the field
* BIN = Barcode index number
* Catalog Num = identifier for the specimen assigned by the formal collection upon accessioning (museum ID)
* rbcL Seq. Length = sequence length (bps) of rbcL locus for specimen
* rbcL Trace Count = number of trace files for rbcL locus per specimen
* rbcL Accession = GenBank accession number for rbcL specimen record
* matK Seq. Length = sequence length (bps) of matK locus for specimen
* matK Trace Count = number of trace files for matK locus per specimen
* matK Accession = GenBank accession number for matK specimen record
* trnL-F Seq. Length = sequence length (bps) of trnL-F locus for specimen
* trnL-F Trace Count = number of trace files for trnL-F locus per specimen
* trnL-F Accession = GenBank accession number for trnL-F specimen record
* trnH-psbA Seq. Length = sequence length (bps) of trnH-psbA locus for specimen
* trnH-psbA Trace Count = number of trace files for trnH-psbA locus per specimen
* trnH-psbA Accession = GenBank accession number for trnH-psbA specimen record
* Image Count = number of images associated with the specimen on BOLD systems
* Barcode Compliant = barcode index numbers are marked as compliant if they contain at least one sequence that meets BOLD systems standards
* Contamination = indicates a specimen flagged for contamination
* Stop Codon = indicates presence of a stop codon in loci
* Flagged Record = indicates a specimen or sequence that was flagged as an issue
* Collection Date = date of specimen collection in the field
* Identification = taxonomic assignment of specimen
* Life Stage = life stage of specimen
* Extra Info = extra information about specimen
* Voucher Type = indicates special case for accessioning process
* Institution = full name of the institution that has physical possession of the voucher specimen
* Notes = comments or notes regarding the collection event

*"Taxonomy" tab:*

* SampleID = internal identifier for the sample being sequenced
* Phylum = scientific name of collected specimen identified to phylum
* Class = scientific name of collected specimen identified to class
* Order = scientific name of collected specimen identified to order
* Family = scientific name of collected specimen identified to family
* Subfamily = scientific name of collected specimen identified to subfamily
* Tribe = scientific name of collected specimen identified to tribe
* Genus = scientific name of collected specimen identified to genus
* Species = scientific name of collected specimen identified to species
* Subspecies = scientific name of collected specimen identified to subspecies
* Identifier = full name of the primary individual who assigned the specimen to a taxonomic group
* Identification Method = the method used to identify the specimen

*"Collection Data" tab:*

* Sample ID = internal identifier for the sample being sequenced
* Collectors = the full or abbreviated names of the individuals or team responsible for collecting the sample in the field
* Collection Date = date of specimen collection
* Country/Ocean = country where the specimen was collected
* State/Province = state where the specimen was collected
* Region = region where the specimen was collected
* Lat = latitude at which the specimen was collected (decimal degrees)
* Lon = longitude at which the specimen was collected (decimal degrees)
* Elev = elevation at which the specimen was collected (m)
* Habitat = habitat classification of the site where the specimen was collected
* Collection Notes = additional collection notes

## Sharing/Access information

Illumina sequence data and sample metadata are available at NCBI (BioProject accession number: PRJNA780500).

## Code/Software

These coding steps are designed to follow on from one another. The files created in steps 1, 2, and 3 will be used in step 4. The files created in step 4 will be used in step 5. The files created in step 5 will be used in step 6. All code is annotated.

Steps 1-4 require the following Python packages:

* cutadapt
* obitools

Steps 5-6 require the following R packages:

* plyr
* dplyr
* here
* tidyverse
* phyloseq
* vegan
* vegetarian
* ggplot2
* reshape2
* ggpubr
* cowplot
* car
* devtools
* moments
* nlme
* bipartite
* RColorBrewer
* iNEXT
* cetcolor
* phangorn
* padr

**obitools_Step 1_global ref lib.sh -** code to build a global reference database for plants. We use the ecoPCR program in obitools to simulate a PCR and to extract all sequences from EMBL that may be amplified *in silico* by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification. The steps for building this reference database are:

1. Download the whole set of EMBL sequences
2. Download the NCBI taxonomy
3. Format them into the ecoPCR format
4. Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information

**obitools_Step 2_local ref lib.sh -** code to build a local Yellowstone National Park reference database for plants. We use the ecoPCR program in obitools to simulate a PCR. All local barcode sequences can be found on BOLD and can be amplified *in silico* by the two primers (GGGCAATCCTGAGCCAA and CCATTGAGTCTCTGCACCTATC) used for PCR amplification. The code creates the file "YNPP6_completeDB_20230414.fasta", which is included in this dataset. The steps for building this reference database are:

1. Extract *trn*L-P6 from BOLD sequences
2. Format them into the ecoPCR format
3. Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information

**obitools_Step 3_prepare sequence reads.sh -** code to clean and prepare raw sequence reads from large-herbivore fecal samples to determine their diets. The following steps are taken:

1. Remove primers from forward and reverse reads using cutadapt
2. Recover full sequence reads from forward and reverse reads
3. Remove unaligned sequence records
4. Dereplicate reads into unique sequences
5. Denoise the sequence dataset
6. Clean the sequences for PCR/sequencing errors

**obitools_Step 4_assign taxonomy.sh -** code to assign taxonomy to sequences using the global and local reference libraries in order to get the complete list of species associated with each sample. Taxonomic assignment of sequences requires a reference database compiling all possible species to be identified in the sample. Assignment is then done by comparing sample sequences with reference sequences. The following steps are taken for both the global and local reference libraries:

1. Assign each sequence to a taxon
2. Generate the final result table

**R_Step 5_combine local and global library outputs.R -** R code to combine local and global reference library outputs. The following steps are taken:

1. Subset databases to perfect matches (100% matches)
2. Generate summary statistics for subset databases
3. Make output files required to create a phyloseq object
4. Build the physeq object for further analyses

**R_Step 6_Data analyses.R -** R code for all analyses conducted on this comparative dietary dataset. The main analyses performed:

1. Data filtering
2. Rarefaction
3. Calculation of Bray-Curtis dissimilarity
4. Calculation of dietary richness
5. Calculation of total dietary breadth
6. Calculation of sample uniqueness at the sample level
7. Calculation of sample uniqueness at the species level
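
To make two of the quantities listed for step 6 concrete, here is a minimal Python sketch of dietary richness and Bray-Curtis dissimilarity. It is not the deposited R code, and the function names and read counts are hypothetical. It uses the standard definition BC = sum|a_i - b_i| / sum(a_i + b_i); whether the analysis applies this to raw counts, rarefied counts, or proportions is not stated here, so treat that choice as an assumption.

```python
from typing import Dict


def richness(sample: Dict[str, float]) -> int:
    """Number of plant taxa with a non-zero read count in one sample."""
    return sum(1 for count in sample.values() if count > 0)


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)  # taxa absent from a sample count as zero
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if den == 0:
        raise ValueError("both samples are empty")
    return num / den


# Hypothetical read counts per plant taxon for two fecal samples.
summer = {"grass_1": 60, "forb_1": 30, "shrub_1": 10}
winter = {"grass_1": 20, "shrub_1": 70, "shrub_2": 10}

print(richness(summer), richness(winter))
print(bray_curtis(summer, winter))
```

A value of 0 means the two samples have identical composition, and 1 means they share no taxa.
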

Created:

2023-11-18