farming-dataset-synthetic-generator-classification
Source: ModelScope community. Updated: 2025-10-09. Indexed: 2025-04-12.
Download link:
https://modelscope.cn/datasets/burtenshaw/farming-dataset-synthetic-generator-classification
Resource description:
# Dataset Card for farming-dataset-synthetic-generator-classification
This dataset has been created with [distilabel](https://distilabel.argilla.io/).
## Dataset Summary
This dataset contains a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it, using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/burtenshaw/farming-dataset-synthetic-generator-classification/raw/main/pipeline.yaml"
```
or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/burtenshaw/farming-dataset-synthetic-generator-classification/raw/main/pipeline.yaml"
```
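Both commands point at the same raw `pipeline.yaml` URL. The following is a minimal sketch of how that URL and the CLI invocation could be assembled from Python. The helper names `pipeline_config_url` and `distilabel_command` are hypothetical, and the URL pattern is read off the commands above rather than taken from the distilabel documentation.

```python
REPO_ID = "burtenshaw/farming-dataset-synthetic-generator-classification"


def pipeline_config_url(repo_id: str, revision: str = "main") -> str:
    """Build the raw pipeline.yaml URL, following the pattern in the commands above."""
    # Assumption: the file lives at the repository root on the given revision.
    return f"https://huggingface.co/datasets/{repo_id}/raw/{revision}/pipeline.yaml"


def distilabel_command(action: str, repo_id: str) -> list[str]:
    """Return the argv list for `distilabel pipeline run` or `distilabel pipeline info`."""
    if action not in ("run", "info"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["distilabel", "pipeline", action, "--config", pipeline_config_url(repo_id)]


if __name__ == "__main__":
    # Print the command rather than executing it.
    print(" ".join(distilabel_command("info", REPO_ID)))
```

To execute the command instead of printing it, the returned list can be passed to `subprocess.run`, provided `distilabel` is installed.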
## Dataset structure
The examples have the following structure per configuration:
Configuration: default
```json
{
  "label": 4,
  "text": "A meta-analysis of 32 studies revealed that regenerative agriculture practices, such as no-till farming and cover cropping, significantly increased soil organic carbon by an average of 1.4% over a period of 10 years, thereby enhancing soil fertility and water retention capacity."
}
```
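As a rough illustration of this structure, the sketch below parses the example record and checks that it has an integer `label` and a string `text`. The function `validate_record` is a hypothetical helper, not part of `datasets` or distilabel. The card does not say what the integer labels mean, so the sketch treats them as opaque class ids.

```python
import json

# The example record shown above, copied verbatim.
RAW_RECORD = """
{
  "label": 4,
  "text": "A meta-analysis of 32 studies revealed that regenerative agriculture practices, such as no-till farming and cover cropping, significantly increased soil organic carbon by an average of 1.4% over a period of 10 years, thereby enhancing soil fertility and water retention capacity."
}
"""


def validate_record(record: dict) -> dict:
    """Check that a record has exactly the fields `label` (int) and `text` (str)."""
    if set(record) != {"label", "text"}:
        raise ValueError(f"unexpected fields: {sorted(record)}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(record["label"], int) or isinstance(record["label"], bool):
        raise ValueError("label must be an integer")
    if not isinstance(record["text"], str):
        raise ValueError("text must be a string")
    return record


record = validate_record(json.loads(RAW_RECORD))
print(record["label"], len(record["text"]))
```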
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("burtenshaw/farming-dataset-synthetic-generator-classification", "default")
```
Or simply as follows, since there is only one configuration and it is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("burtenshaw/farming-dataset-synthetic-generator-classification")
```
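Once the data is loaded, a common first step for a classification dataset is to look at the label distribution. The sketch below counts labels over any iterable of records shaped like the example above. `label_counts` is a hypothetical helper, and the sample rows are placeholders written for illustration, not real rows from the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping


def label_counts(records: Iterable[Mapping]) -> Counter:
    """Count how many records carry each integer label."""
    return Counter(record["label"] for record in records)


# Hypothetical placeholder records, NOT real rows from the dataset.
sample = [
    {"label": 4, "text": "placeholder a"},
    {"label": 4, "text": "placeholder b"},
    {"label": 1, "text": "placeholder c"},
]

print(label_counts(sample).most_common())
```

Since iterating over a `datasets.Dataset` yields one dictionary per row, the same function should also accept a loaded split, for example `label_counts(ds["train"])`. The split name `train` is an assumption; the card does not list the available splits.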
Provider:
maas
Created:
2025-04-07



