Train and Test Ground-Truth Benchmarks with LLMs-Augmented Data for Matching Agroecological Experimental Variables

Name: Train and Test Ground-Truth Benchmarks with LLMs-Augmented Data for Matching Agroecological Experimental Variables
Creator: CIRAD Dataverse
Published: 2026-05-14 11:38:17
License: No description available

DataCite Commons | Updated 2026-05-14 | Indexed 2026-05-17

Download link:

https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/VP53W2

Description:

This dataset provides five English-language ground-truth benchmarks for matching agroecological experimental variables. The benchmarks were manually annotated and contain 533 pairs in total. One benchmark contains source–candidate variable pairs (SourceAEGIS–CandidateAEGIS), one contains source–model variable pairs (IntercropValues–STICS), and the remaining three contain candidate–model variable pairs. Each benchmark is split into a training set and a test set. For the training sets only, each ground-truth pair was used to generate four additional LLM-augmented pairs with each of six different large language models (LLMs), yielding 24 augmented pairs per ground-truth pair. The resulting augmented data are provided in the file "LLMs-generated pairs from the full train ground truth.tab", which contains 7,680 generated pairs.
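The stated counts fit together: 4 pairs × 6 LLMs = 24 augmented pairs per ground-truth pair, and 7,680 / 24 = 320 training ground-truth pairs. If the train and test splits partition the 533 annotated pairs without overlap (an assumption, not stated explicitly), that would leave 213 pairs across the test sets. The Python sketch below checks this arithmetic and counts rows in a tab-separated file, assuming the .tab file has a single header row; the function names are hypothetical and no column layout is assumed.

```python
import csv
import io
from typing import TextIO

# Figures stated in the dataset description.
TOTAL_GROUND_TRUTH_PAIRS = 533
PAIRS_PER_LLM = 4
NUM_LLMS = 6
TOTAL_GENERATED_PAIRS = 7_680


def augmented_per_pair(pairs_per_llm: int, num_llms: int) -> int:
    """Augmented pairs produced from one training ground-truth pair."""
    return pairs_per_llm * num_llms


def implied_split(total_pairs: int, generated: int, per_pair: int) -> tuple[int, int]:
    """Return the (train, test) ground-truth counts implied by the generated total.

    Assumes every training pair was augmented exactly `per_pair` times and
    that train and test partition the annotated pairs without overlap.
    """
    train, remainder = divmod(generated, per_pair)
    if remainder:
        raise ValueError("generated total is not a multiple of per_pair")
    return train, total_pairs - train


def count_data_rows(stream: TextIO) -> int:
    """Count non-empty data rows in tab-separated text, skipping one (assumed) header row."""
    reader = csv.reader(stream, delimiter="\t")
    next(reader, None)  # skip the assumed header row; no-op on empty input
    return sum(1 for row in reader if row)


per_pair = augmented_per_pair(PAIRS_PER_LLM, NUM_LLMS)
train, test = implied_split(TOTAL_GROUND_TRUTH_PAIRS, TOTAL_GENERATED_PAIRS, per_pair)
print(per_pair, train, test)  # 24 320 213

# Demonstration on a small in-memory sample rather than the real file.
sample = io.StringIO("source\tcandidate\nA\tB\nC\tD\n")
print(count_data_rows(sample))  # 2
```

To check the downloaded file, open it with `open(path, encoding="utf-8", newline="")` and compare `count_data_rows(...)` with 7,680. Passing `newline=""` lets the `csv` module handle quoted fields that span lines.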

Provider:

CIRAD Dataverse

Created:

2026-05-13
