
Pl@ntNet-CrowdSWE: Pl@ntNet collaborative learning with South-Western-Europe dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10782464
Resource description:
This repository contains the files for the Pl@ntNet South Western Europe (SWE) crowdsourced dataset. It contains all species identifications and user votes for observations made between 2017 and 2023 in the SWE flora. In total, more than 6 699 593 plant observations were labeled by 823 251 users between January 2017 and October 2023. In addition, 98 experts were selected to obtain ground truth values for 26 811 observations. The structure of the dataset is described below, and a `readme.md` file is available in the record.

In short, the directory structure is:

    Pl@ntNet SWE dataset
    ├── answers
    │   ├── answers.json
    │   └── ground_truth.txt
    ├── converters
    │   ├── tasks.json
    │   └── classes.json
    └── aggregation
        ├── authors.txt
        ├── ai_classes.json
        ├── ai_answers.json
        ├── ai_scores.json
        └── k-southwestern-europe.json

Crowdsourced data

The answers folder holds the crowdsourced answers and the associated ground truths. The crowdsourced answers are stored in the answers.json file. It gathers more than 6 million tasks with answers from 823 251 users. It is formatted as a JSON object whose levels represent the observation ID, the users, and each user's vote for the species label:

    { obsID: {userID: vote, userID2: vote, ...}, ... }

A list of 98 experts was created to gather a partial ground truth in the ground_truth.txt file. Each row represents an observation, and the associated class label is the currently considered ground truth. This file lets us compute several performance metrics, such as the accuracy of the label aggregation.

Converters

The converters folder contains the mappings needed to recover the official Pl@ntNet observation numbers (the last part of the URL https://identify.plantnet.org/fr/k-world-flora/observations/) from the obsID used in answers.json. This mapping is stored in the tasks.json file. A similar dictionary converts the species proposed by users to a single label in {0, 1, 2, ...}. This mapping is stored in classes.json.

As plant species can also have synonyms, we release the two files used to clean the user answers. The species.json file contains a list of all the accepted species determinations from the World Checklist of Vascular Plants. Then, we focused on the SWE flora and replaced synonyms with the underlying species using the k-southwestern-europe.json checklist from Plants of the World Online (POWO) by the Royal Botanic Gardens, Kew. This checklist is written as follows:

    [
      {
        "species": species name,
        "synonyms": [synonym1, synonym2, ...]
      },
      ...
    ]

Files to run the Pl@ntNet label aggregation strategy

To run the Pl@ntNet label aggregation strategy available in the peerannot library, several other pieces of information are needed. They are located in the aggregation folder.

- First, we need to know for each task which user was the author (if they proposed an initial species determination). This information is stored in the authors.txt file, where each row is the obsID and the value is the userID of the author. If the author did not propose any species, this identification is set to -1.
- Then, to run the label aggregation strategies that take the AI vote into account, we extend the `classes.json` file with the AI-predicted classes into the ai_classes.json file. Each species is associated with a number, including species newly introduced by the AI.
- Then, we need the AI predictions. The AI answers are stored in the ai_answers.json file, where each key is the obsID and each value is the class predicted by the AI. Synonyms were also removed using the k-southwestern-europe.json file.
- Finally, for strategies that take the prediction score into account, we release the ai_scores.json file, where each key is the obsID and each value is the probability given for the predicted class.
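As a minimal sketch of how the ground truth can be used to score a label aggregation, the snippet below applies a plain majority vote to data shaped like answers.json and measures accuracy on the observations that have an expert label. This is not the Pl@ntNet strategy from peerannot; majority vote is swapped in only because it is simple to follow. The toy values are invented, and the use of -1 to mark rows of ground_truth.txt without an expert label is an assumption, as are the helper names `majority_vote` and `accuracy`.

```python
import json
from collections import Counter

# Toy stand-in for answers.json: {obsID: {userID: vote}} (invented values).
answers_json = """
{"0": {"10": 3, "11": 3, "12": 5},
 "1": {"10": 2},
 "2": {"11": 7, "13": 4, "14": 4}}
"""

# Toy stand-in for ground_truth.txt: one label per row, row i <-> obsID i.
# ASSUMPTION: -1 marks an observation that has no expert ground truth.
ground_truth_txt = "3\n-1\n7\n"


def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(votes.values())
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def accuracy(answers, truth):
    """Share of expert-labeled observations where majority vote agrees."""
    scored = [
        (majority_vote(answers[str(obs_id)]), label)
        for obs_id, label in enumerate(truth)
        if label != -1 and str(obs_id) in answers
    ]
    if not scored:
        return float("nan")
    return sum(pred == label for pred, label in scored) / len(scored)


answers = json.loads(answers_json)
truth = [int(row) for row in ground_truth_txt.split()]

predictions = {obs_id: majority_vote(v) for obs_id, v in answers.items()}
acc = accuracy(answers, truth)
print(predictions)
print(acc)
```

With the real files, the two toy strings would be replaced by the contents of answers.json and ground_truth.txt. Given the size of answers.json (more than 6 million tasks), loading it in one call may need several gigabytes of memory.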
Created:
2024-12-06