Train and Test Ground-Truth Benchmarks with LLMs-Augmented Data for Matching Agroecological Experimental Variables

Source: DataCite Commons. Updated: 2026-05-14. Indexed: 2026-05-17.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/VP53W2
Description:
This dataset provides five English-language ground-truth benchmarks for matching agroecological experimental variables. The benchmarks were manually annotated and contain 533 pairs in total. One benchmark contains source–candidate variable pairs (SourceAEGIS–CandidateAEGIS), one contains source–model variable pairs (IntercropValues–STICS), and the remaining three contain candidate–model variable pairs. Each benchmark is divided into a training set and a test set. For the training sets only, each ground-truth pair was used to generate four additional augmented pairs with each of six different large language models (LLMs), yielding 24 augmented pairs per ground-truth pair. The augmented data are provided in the file named “LLMs-generated pairs from the full train ground truth.tab”, which contains 7,680 generated pairs.
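The counts in the description can be cross-checked with a little arithmetic. The sketch below is only a consistency check, not part of the dataset. It assumes that every training pair has exactly 24 generated pairs in the file and that the 533 annotated pairs cover both the training and test sets. Under those assumptions, the implied training and test sizes are inferences, not figures stated by the authors.

```python
# Figures stated in the dataset description.
TOTAL_GROUND_TRUTH_PAIRS = 533
PAIRS_PER_LLM = 4
NUM_LLMS = 6
TOTAL_GENERATED_PAIRS = 7680

# Augmented pairs produced for each training ground-truth pair.
augmented_per_pair = PAIRS_PER_LLM * NUM_LLMS

# ASSUMPTION: the generated file holds exactly `augmented_per_pair` rows
# for every training pair, so the training size follows by division.
implied_train_pairs, remainder = divmod(TOTAL_GENERATED_PAIRS, augmented_per_pair)

# ASSUMPTION: the 533 annotated pairs are the union of train and test sets.
implied_test_pairs = TOTAL_GROUND_TRUTH_PAIRS - implied_train_pairs

print(f"augmented pairs per ground-truth pair: {augmented_per_pair}")
print(f"implied training pairs: {implied_train_pairs} (remainder {remainder})")
print(f"implied test pairs: {implied_test_pairs}")
```

A zero remainder shows that the stated totals are mutually consistent. The actual train/test split should be confirmed against the benchmark files themselves.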
Provider:
CIRAD Dataverse
Created:
2026-05-13