rasdani/cohere-wikipedia-2023-11-en-queries
# Dataset Card for cohere-wikipedia-2023-11-en-queries
## Dataset Summary
This dataset includes a `pipeline.yaml` file that can be used with the `distilabel` CLI to reproduce the pipeline that generated it:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/rasdani/cohere-wikipedia-2023-11-en-queries/raw/main/pipeline.yaml"
```
Or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/rasdani/cohere-wikipedia-2023-11-en-queries/raw/main/pipeline.yaml"
```
## Dataset Structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>

```json
{
    "_id": "20231101.en_399353_52",
    "model_name": "gpt-4o",
    "query": "When did the show \"Bewitched\" start airing on WGN America?",
    "score": 1.0,
    "text": "In September 2008, the show began to air on WGN America, and in October 2012, it began to air on Logo, limited to the middle seasons.",
    "title": "Bewitched",
    "url": "https://en.wikipedia.org/wiki/Bewitched",
    "views": 22918.295425348486
}
```
This subset can be loaded as:
```python
from datasets import load_dataset

ds = load_dataset("rasdani/cohere-wikipedia-2023-11-en-queries", "default")
```
Or simply as follows, since there is only one configuration and it is named `default`:
```python
from datasets import load_dataset

ds = load_dataset("rasdani/cohere-wikipedia-2023-11-en-queries")
```
</details>
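To check that records match the structure of the `default` configuration's example, a small validator can help. The sketch below is a minimal illustration, assuming every record has exactly the eight fields shown in the example record, with the types seen there; the card does not publish a formal schema, so `EXPECTED_FIELDS` and `schema_errors` are hypothetical names inferred from that single example. It also parses the example as raw JSON, which only works when the inner quotes around `Bewitched` are escaped.

```python
import json

# Field names and types inferred from the card's single example record.
# This is an assumption: the card does not publish a formal schema.
NUMBER = (int, float)  # JSON numbers may parse as int (e.g. 1) or float (e.g. 1.0)
EXPECTED_FIELDS = {
    "_id": str,
    "model_name": str,
    "query": str,
    "score": NUMBER,
    "text": str,
    "title": str,
    "url": str,
    "views": NUMBER,
}


def _type_name(t) -> str:
    """Render a type or a tuple of types for error messages."""
    if isinstance(t, tuple):
        return " or ".join(x.__name__ for x in t)
    return t.__name__


def schema_errors(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    errors = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(
                f"{name}: expected {_type_name(expected)}, "
                f"got {type(record[name]).__name__}"
            )
    # Sorted so that the order of reported extra fields is deterministic.
    for name in sorted(record.keys() - EXPECTED_FIELDS.keys()):
        errors.append(f"unexpected field: {name}")
    return errors


# The card's example record as raw JSON (note the escaped inner quotes).
SAMPLE = json.loads(r'''{"_id": "20231101.en_399353_52", "model_name": "gpt-4o",
 "query": "When did the show \"Bewitched\" start airing on WGN America?",
 "score": 1.0,
 "text": "In September 2008, the show began to air on WGN America, and in October 2012, it began to air on Logo, limited to the middle seasons.",
 "title": "Bewitched", "url": "https://en.wikipedia.org/wiki/Bewitched",
 "views": 22918.295425348486}''')

print(schema_errors(SAMPLE))  # → []
```

Numeric fields accept both `int` and `float` because a JSON value such as `1` parses as an `int` in Python, which would otherwise be reported as a false mismatch.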
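Once loaded, each row pairs a generated `query` with the Wikipedia passage in `text`. The sketch below turns rows into `(query, text)` pairs and keeps only those at or above a score threshold. It assumes that a higher `score` means a better query; the card does not document what `score` measures or its range, so `min_score` and the helper name `query_passage_pairs` are hypothetical. The function accepts any iterable of dicts, so it works on toy rows as well as on a loaded `datasets.Dataset`, which yields one dict per row when iterated.

```python
from collections.abc import Iterable, Iterator


def query_passage_pairs(
    rows: Iterable[dict], min_score: float = 1.0
) -> Iterator[tuple[str, str]]:
    """Yield (query, text) pairs for rows whose score is at least min_score.

    Assumption: a higher `score` means a better query. The card does not
    document the meaning or range of `score`, so `min_score` is a
    hypothetical threshold.
    """
    for row in rows:
        if row["score"] >= min_score:
            yield row["query"], row["text"]


# Hypothetical toy rows that use the same field names as the card's example.
toy_rows = [
    {"query": "q1", "text": "passage 1", "score": 1.0},
    {"query": "q2", "text": "passage 2", "score": 0.5},
]
print(list(query_passage_pairs(toy_rows)))  # → [('q1', 'passage 1')]

# With the real data (needs network access; the split name "train" is an
# assumption, since the card does not list its splits):
# from datasets import load_dataset
# ds = load_dataset("rasdani/cohere-wikipedia-2023-11-en-queries")
# pairs = list(query_passage_pairs(ds["train"], min_score=1.0))
```

If you would rather stay within `datasets`, `Dataset.filter(lambda row: row["score"] >= 1.0)` performs the same selection and returns a new `Dataset` instead of a Python generator.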



