
farming-dataset-synthetic-generator-classification

Source: ModelScope community. Updated 2025-10-09; indexed 2025-04-12.
Download link:
https://modelscope.cn/datasets/burtenshaw/farming-dataset-synthetic-generator-classification
Resource description:
# Dataset Card for farming-dataset-synthetic-generator-classification

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it, using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/burtenshaw/farming-dataset-synthetic-generator-classification/raw/main/pipeline.yaml"
```

Or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/burtenshaw/farming-dataset-synthetic-generator-classification/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

Configuration: default

```json
{
    "label": 4,
    "text": "A meta-analysis of 32 studies revealed that regenerative agriculture practices, such as no-till farming and cover cropping, significantly increased soil organic carbon by an average of 1.4% over a period of 10 years, thereby enhancing soil fertility and water retention capacity."
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("burtenshaw/farming-dataset-synthetic-generator-classification", "default")
```

Or simply as follows, since there is only one configuration and it is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("burtenshaw/farming-dataset-synthetic-generator-classification")
```
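Once records are available, it can help to check that each one matches the `label`/`text` structure shown above and to see how the labels are distributed. The following is a minimal sketch in plain Python, not part of the original card. The helper names `parse_record` and `label_counts` are hypothetical, the sample rows are shortened illustrations rather than real dataset rows, and the meaning of each integer label is not stated in the card, so no class names are assumed.

```python
import json
from collections import Counter


def parse_record(line: str) -> dict:
    """Parse one JSON record and check it has an integer label and string text.

    Hypothetical helper: the card only documents the two fields, so this
    checks exactly those and nothing else.
    """
    record = json.loads(line)
    label = record.get("label")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(label, int) or isinstance(label, bool):
        raise ValueError(f"'label' must be an integer, got {label!r}")
    if not isinstance(record.get("text"), str):
        raise ValueError("'text' must be a string")
    return record


def label_counts(records) -> Counter:
    """Count how many records carry each integer label."""
    return Counter(r["label"] for r in records)


# Illustrative rows only (assumed, shortened); real rows come from the dataset.
sample_lines = [
    '{"label": 4, "text": "A meta-analysis of 32 studies on no-till farming."}',
    '{"label": 4, "text": "Cover cropping and soil organic carbon."}',
    '{"label": 1, "text": "A placeholder sentence for another class."}',
]

records = [parse_record(line) for line in sample_lines]
counts = label_counts(records)
print(counts)
```

The same `label_counts` function should work on rows obtained from `load_dataset`, since each row is a mapping with `label` and `text` keys; which split names exist is not stated in the card, so check `ds` before indexing into it.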

Provider: maas
Created: 2025-04-07