Train and Test Ground-Truth Benchmarks with LLMs-Augmented Data for Matching Agroecological Experimental Variables
Source: DataCite Commons. Updated 2026-05-14. Indexed 2026-05-17.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/VP53W2
Description:
This dataset provides five English-language ground-truth benchmarks for matching agroecological experimental variables. The benchmarks were manually annotated and contain 533 pairs in total. One benchmark contains source–candidate variable pairs (SourceAEGIS–CandidateAEGIS), one contains source–model variable pairs (IntercropValues–STICS), and the remaining three contain candidate–model variable pairs. Each benchmark is split into a training set and a test set.
For the training sets only, each ground-truth pair was used to generate four additional pairs with each of six different large language models (LLMs), yielding 24 LLM-augmented pairs per ground-truth pair. The augmented data are provided in the file “LLMs-generated pairs from the full train ground truth.tab”, which contains 7,680 generated pairs.
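The augmentation arithmetic can be checked with a short sketch. The three constants are the figures stated in the description; the variable names are illustrative, and the implied number of training ground-truth pairs is derived here rather than stated in the dataset record.

```python
# Figures taken from the dataset description.
PAIRS_PER_LLM = 4        # augmented pairs generated per ground-truth pair, per LLM
NUM_LLMS = 6             # number of different LLMs used
TOTAL_GENERATED = 7680   # generated pairs reported for the .tab file

# Augmented pairs produced for each training ground-truth pair.
augmented_per_pair = PAIRS_PER_LLM * NUM_LLMS

# Derived (not stated in the record): how many training ground-truth
# pairs the reported total implies, and whether it divides evenly.
implied_train_pairs, remainder = divmod(TOTAL_GENERATED, augmented_per_pair)

print(augmented_per_pair, implied_train_pairs, remainder)  # prints: 24 320 0
```

Under these figures, the reported total is consistent with 320 of the 533 ground-truth pairs belonging to the training sets, which would leave 213 pairs for the test sets. The record does not give these split sizes explicitly.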
Provider:
CIRAD Dataverse
Created:
2026-05-13



