RMQS1 16S unrarefied cleaned sequences

Name: RMQS1 16S unrarefied cleaned sequences
Creator: Recherche Data Gouv
Published: 2025-05-16 23:22:42
License: no description provided

DataCite Commons, updated 2025-05-16, indexed 2025-04-16

Download link:

https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/02XLWL

Description:

RMQS: The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. The network monitors 2,240 sites representative of French soils and their land use. The sites are spread over the whole French territory (metropolitan and overseas) on a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land, and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.

Dataset: This dataset contains the cleaned but unrarefied 16S (Archaea and Bacteria) sequences obtained for each RMQS sample. The soil 16S rDNA gene was sequenced by pyrosequencing (GS FLX Titanium, Roche 454) at Genoscope. Bioinformatics analysis was performed with the BIOCOM-PIPE (previously named GNS-PIPE) metabarcoding pipeline (Terrat et al. (2019)). See the associated articles for details (keep in mind that the first results were obtained without post-clustering; richness is therefore higher after post-clustering for each sample). Raw sequencing data are available at EBI under project PRJEB21351.

File structure: Clean_Hunt_Derep_Mpreprocess__.fasta: FASTA file containing the dereplicated sequences for each sample.
sample: Sample ID, containing the id_site found in the RMQS dataset. The id_site can be extracted from the Sample ID by deleting the first character and the last three characters (which hold other details useful to RMQS sampling survey managers). For example, the Sample ID 102325C17 corresponds to id_site 2325.
Library number: the library prepared to sequence each sample. Each library can contain 30 samples and one control sample.
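
The id_site rule described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset: the function name `site_id_from_sample` is hypothetical, and it assumes that the remaining characters are digits and that a leading zero (as in `02325`) is padding that can be dropped by converting to an integer.

```python
def site_id_from_sample(sample_id: str) -> int:
    """Return the id_site embedded in an RMQS Sample ID (hypothetical helper).

    Rule from the dataset description: delete the first character and the
    last three characters. Assumption: what remains is a zero-padded number.
    """
    if len(sample_id) <= 4:
        raise ValueError(f"Sample ID too short: {sample_id!r}")
    core = sample_id[1:-3]  # drop first character and last three characters
    if not core.isdigit():
        raise ValueError(f"unexpected Sample ID format: {sample_id!r}")
    return int(core)  # int() drops the leading zero padding (assumption)


# Example taken from the dataset description.
print(site_id_from_sample("102325C17"))  # 2325
```

Returning an integer makes the result comparable with numeric site identifiers; if the RMQS tables store id_site as text, returning `core.lstrip("0")` instead may be more appropriate.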

Details: The description line of each FASTA sequence gives the sequence ID, the number of identical sequences found in the sample, and the sequence length, separated by "_" characters. For example, IY3741T02JSO16_32_length=371 corresponds to the sequence ID IY3741T02JSO16, with a length of 371 bases, found 32 times in the sequenced sample.

Samples could not be collected at some sites; these sites do not appear in the dataset. For sites that did not pass the laboratory or bioinformatics steps required to reach 10,000 sequences before post-clustering, the sequences that were obtained are also available here. This dataset can be linked with 10.15454/QSXKGA to retrieve the physico-chemical properties, land use and coordinates of each sample, or to filter sites using its site_officiel column. Sites whose ID has more than four digits are supplementary sites that are not located at the center of their cell (e.g. 10797 and 20797, which came from cell 797).
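
The description-line layout given above (ID, occurrence count, and length, separated by "_") can be parsed along these lines. This is a sketch under stated assumptions: the names `DerepHeader`, `parse_header` and `total_reads` are hypothetical, header lines are assumed to start with the usual FASTA ">" character, and the line is split from the right in case a sequence ID itself contains "_".

```python
from typing import Iterable, NamedTuple


class DerepHeader(NamedTuple):
    seq_id: str   # sequence ID, e.g. "IY3741T02JSO16"
    count: int    # number of identical sequences found in the sample
    length: int   # sequence length in bases


def parse_header(line: str) -> DerepHeader:
    """Parse one '<id>_<count>_length=<n>' description line (hypothetical helper)."""
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    # Split from the right so an ID containing "_" stays intact (assumption).
    parts = text.rsplit("_", 2)
    if len(parts) != 3 or not parts[2].startswith("length="):
        raise ValueError(f"unexpected header format: {line!r}")
    seq_id, count, length_field = parts
    return DerepHeader(seq_id, int(count), int(length_field[len("length="):]))


def total_reads(fasta_lines: Iterable[str]) -> int:
    """Sum the occurrence counts over all header lines of a dereplicated FASTA."""
    return sum(parse_header(l).count for l in fasta_lines if l.startswith(">"))


# Example taken from the dataset description.
print(parse_header(">IY3741T02JSO16_32_length=371"))
```

Because the files are dereplicated, the number of FASTA records is the number of distinct sequences; summing the counts, as `total_reads` does, gives the number of reads before dereplication for a sample.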

Provider:

Recherche Data Gouv

Created:

2024-08-26
