Full Dataset prior to Cleaning
收藏Figshare2023-03-31 更新2026-04-08 收录
下载链接:
https://figshare.com/articles/dataset/Full_Dataset_prior_to_Cleaning/22455616/1
下载链接
链接失效反馈官方服务:
资源简介:
This dataset includes all of the data downloaded from GBIF (DOIs provided in README.md as well as below, downloaded Feb 2021) as well as data downloaded from SCAN. This dataset has 2,808,432 records and can be used as a reference to the verbatim data before it underwent the cleaning process. The only modifications made to this datset after direct download from the data portals are the following: <br> 1) for GBIF records, I renamed the countryCode column to be "country" so that the column title is consistent across both GBIF and SCAN 2) A source column was added where I specify if the record came from GBIF or SCAN 3) Duplicate records across SCAN and GBIF were removed by identifying identical instances "catalogNumber" and "institutionCode" 4) Only the Darwin core columns (DwC) that were shared across downloaded datasets were retained. GBIF contained ~249 DwC variables, and SCAN data contained fewer, so this combined dataset only includes the ~80 columns shared between the two datasets <br> ------------- <br> For GBIF, we downloaded the data in three separate chunks, therefore there are three DOIs. See below: <br> GBIF.org (3 February 2021) GBIF Occurrence Downloadhttps://doi.org/10.15468/dl.6cxfsw GBIF.org (3 February 2021) GBIF Occurrence Downloadhttps://doi.org/10.15468/dl.b9rfa7 GBIF.org (3 February 2021) GBIF Occurrence Downloadhttps://doi.org/10.15468/dl.w2nndm
提供机构:
Chesshire, Paige
创建时间:
2023-03-31



