asoria/datasets_features_outputs

Name: asoria/datasets_features_outputs
Creator: asoria
Published: 2024-05-06 13:29:49
License: No description available

Source: Hugging Face. Updated 2024-05-06. Indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/asoria/datasets_features_outputs

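Resource overview:

This dataset was generated with the distilabel tool and includes a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it. Its structure comprises the fields `dataset`, `columns`, `instruction`, `generation_model`, and `generation`, which hold the dataset name, column information, instruction, generation model, and generated content, respectively. The dataset is under 1K in size and contains 73 training examples.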
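As a rough illustration of the record layout described above, the sketch below models one row with the five fields and decodes the `columns` value. It assumes, based on the sample record shown on the dataset card, that `columns` is stored as a JSON-encoded string. The `Record` type and the `column_dtypes` helper are hypothetical names introduced here; they are not part of distilabel or of the dataset itself.

```python
import json
from typing import TypedDict


class Record(TypedDict):
    """One row of the dataset (field names taken from the dataset card)."""
    dataset: str           # name of the source dataset
    columns: str           # column schema, assumed to be a JSON-encoded string
    instruction: str       # prompt sent to the model
    generation_model: str  # model that produced the output
    generation: str        # raw model output


def column_dtypes(record: Record) -> dict[str, str]:
    """Map each column name to its dtype (hypothetical helper)."""
    schema = json.loads(record["columns"])
    return {
        name: spec.get("dtype", "unknown") if isinstance(spec, dict) else "unknown"
        for name, spec in schema.items()
    }


# Example row; the long text fields are replaced with placeholder strings.
example: Record = {
    "dataset": "huggingartists/bushido-zho",
    "columns": '{"text": {"dtype": "string", "_type": "Value"}}',
    "instruction": "(placeholder)",
    "generation_model": "mistralai/Mistral-7B-Instruct-v0.2",
    "generation": "(placeholder)",
}
print(column_dtypes(example))
```

Nested feature types (for example sequences) may not carry a top-level `dtype`; the helper reports those as `"unknown"` rather than guessing.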

Provider:

asoria

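Original information summary

Dataset Card for datasets_features_outputs

Dataset overview

This dataset contains a `pipeline.yaml` file that can be used to reproduce, in distilabel, the pipeline that generated it: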

distilabel pipeline run --config "https://huggingface.co/datasets/asoria/datasets_features_outputs/raw/main/pipeline.yaml"

Or explore the configuration:

distilabel pipeline info --config "https://huggingface.co/datasets/asoria/datasets_features_outputs/raw/main/pipeline.yaml"
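For scripted use, the two commands can be assembled programmatically. The following is a minimal sketch that only builds the argument list and prints it; nothing is executed. The helper names `pipeline_config_url` and `distilabel_command` are hypothetical, and the URL pattern is simply the one shown in the commands on this card.

```python
import shlex

REPO_ID = "asoria/datasets_features_outputs"


def pipeline_config_url(repo_id: str) -> str:
    """Build the raw pipeline.yaml URL, following the pattern used on the card."""
    return f"https://huggingface.co/datasets/{repo_id}/raw/main/pipeline.yaml"


def distilabel_command(action: str, repo_id: str = REPO_ID) -> list[str]:
    """Return the argv list for `distilabel pipeline run` or `distilabel pipeline info`."""
    if action not in ("run", "info"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["distilabel", "pipeline", action, "--config", pipeline_config_url(repo_id)]


# Print the shell-quoted command instead of running it.
print(shlex.join(distilabel_command("info")))
```

To actually run a command, the list could be passed to `subprocess.run(cmd, check=True)`, assuming distilabel is installed and on the `PATH`.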

Dataset structure

The examples in each configuration have the following structure:

<details><summary> Configuration: default </summary><hr>

{
  "columns": "{"text": {"dtype": "string", "_type": "Value"}}",
  "dataset": "huggingartists/bushido-zho",
  "generation": "Question: Which words appear most frequently in the text column of the dataset? {"question": "Which words appear most frequently in the text column of the dataset?", "sql_query": "SELECT word, COUNT(*) as frequency FROM (SELECT TRIM(REGEXP_SPLIT_TO_TABLE(text, \s+)) as word FROM data) words GROUP BY word ORDER BY frequency DESC LIMIT 10"}",
  "generation_model": "mistralai/Mistral-7B-Instruct-v0.2",
  "instruction": "You are a data analyst tasked with exploring a dataset named huggingartists/bushido-zho. Below is the dataset schema in SQL format along with a sample of 5 rows: CREATE TABLE "data"("text" VARCHAR); Sample rows: {text: } {text: ...тян
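In the sample above, the `generation` value appears to hold a plain-text question followed by a JSON object with `question` and `sql_query` keys. A minimal sketch of pulling that object out of the text is shown below. It assumes every generation follows this pattern, which the card does not guarantee; `extract_generation_json` is a hypothetical helper, and the sample string is made up for illustration.

```python
import json


def extract_generation_json(generation: str) -> dict:
    """Return the first JSON object embedded in a generation string."""
    decoder = json.JSONDecoder()
    start = generation.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any text that follows it.
            obj, _end = decoder.raw_decode(generation, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = generation.find("{", start + 1)
    raise ValueError("no JSON object found in generation text")


# Made-up generation text in the same shape as the sample record.
sample = (
    "Question: How many rows are there? "
    '{"question": "How many rows are there?", '
    '"sql_query": "SELECT COUNT(*) FROM data"}'
)
parsed = extract_generation_json(sample)
print(parsed["sql_query"])
```

Model output is free-form, so some rows may contain no parsable object at all; the helper raises `ValueError` in that case so callers can decide whether to skip or log the row.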
