plaguss/argilla_sdk_docs_queries
Dataset Card for argilla_sdk_docs_queries
Dataset Summary
This dataset contains a pipeline.yaml file that can be used with the distilabel CLI to reproduce the pipeline that generated it:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/plaguss/argilla_sdk_docs_queries/raw/main/pipeline.yaml"
```
Or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/plaguss/argilla_sdk_docs_queries/raw/main/pipeline.yaml"
```
Dataset Structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
    "anchor": "# Welcome to Argilla. Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.. \u003cdiv class=\"grid cards\" markdown\u003e. - Get started in 5 minutes!. ---. Install argilla with pip and deploy a Docker locally or for free on Hugging Face to get up and running in minutes.. :octicons-arrow-right-24: Quickstart. - Educational guides. ---",
    "distilabel_metadata": {
        "raw_output_generate_sentence_pair": "## Positive\nCan Argilla\u0027s collaboration platform ensure high-quality outputs and full data ownership for AI engineers and domain experts?\n## Negative\nThe beautiful scenery of the Italian town of Argilla inspired her to write a novel about love and freedom."
    },
    "filename": "argilla-python/docs/index.md",
    "model_name_query": "meta-llama/Meta-Llama-3-70B-Instruct",
    "negative": "The beautiful scenery of the Italian town of Argilla inspired her to write a novel about love and freedom.",
    "positive": "Can Argilla\u0027s collaboration platform ensure high-quality outputs and full data ownership for AI engineers and domain experts?"
}
```
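The raw_output_generate_sentence_pair field in the record above holds the model's unparsed reply, with the two sentences under "## Positive" and "## Negative" headings. The following is a minimal sketch of how such a reply could be split into the positive and negative fields. The helper name parse_sentence_pair and its regular expression are illustrative assumptions, not the parsing logic that distilabel itself uses.

```python
import re

# Hypothetical helper (not part of distilabel): matches a heading line
# such as "## Positive" or "## Negative" on a line of its own.
_SECTION = re.compile(r"^##\s*(Positive|Negative)\s*$", re.MULTILINE)


def parse_sentence_pair(raw_output: str) -> dict:
    """Return {"positive": ..., "negative": ...}; a missing section maps to None."""
    result = {"positive": None, "negative": None}
    # With one capturing group, split() yields
    # [prefix, name1, body1, name2, body2, ...].
    parts = _SECTION.split(raw_output)
    for name, body in zip(parts[1::2], parts[2::2]):
        result[name.lower()] = body.strip() or None
    return result


raw = (
    "## Positive\n"
    "Can Argilla's collaboration platform ensure high-quality outputs "
    "and full data ownership for AI engineers and domain experts?\n"
    "## Negative\n"
    "The beautiful scenery of the Italian town of Argilla inspired her "
    "to write a novel about love and freedom."
)
pair = parse_sentence_pair(raw)
print(pair["positive"])
print(pair["negative"])
```

Returning None for a missing section keeps malformed replies easy to detect and filter out before they reach the positive and negative columns.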
This subset can be loaded as:
```python
from datasets import load_dataset

ds = load_dataset("plaguss/argilla_sdk_docs_queries", "default")
```
Or simply as follows, since there is only one configuration and it is named default:
```python
from datasets import load_dataset

ds = load_dataset("plaguss/argilla_sdk_docs_queries")
```
</details>
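Each record pairs an anchor passage with a positive and a negative sentence, so one way to consume the data is as (anchor, positive, negative) triplets. The sketch below shows this on a hand-written record with shortened text, so it runs without downloading anything. The names Triplet and to_triplet are hypothetical, and the same function could be applied to each row of the loaded dataset.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Illustrative container for the three text fields of one record."""
    anchor: str
    positive: str
    negative: str


def to_triplet(record: dict) -> Triplet:
    """Extract the three text fields, rejecting records where any is missing or empty."""
    missing = [key for key in Triplet._fields if not record.get(key)]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return Triplet(*(record[key] for key in Triplet._fields))


# Hand-written stand-in for one dataset row (anchor text shortened).
record = {
    "anchor": "# Welcome to Argilla. Argilla is a collaboration platform for AI engineers and domain experts.",
    "positive": "Can Argilla's collaboration platform ensure high-quality outputs and full data ownership for AI engineers and domain experts?",
    "negative": "The beautiful scenery of the Italian town of Argilla inspired her to write a novel about love and freedom.",
    "filename": "argilla-python/docs/index.md",
}
triplet = to_triplet(record)
print(triplet.positive)
```

Extra columns such as filename or distilabel_metadata are ignored, so the helper tolerates the full record shape shown in the card.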




