GPT vs Stack Overflow: data collection (A2I2 T2 2023)

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://zenodo.org/record/8403467

Description:

About

This record contains the dataset components produced by the linked repo. Please see the documentation there for more information. Each CSV has been individually zipped so that you only have to download the specific file(s) that you want.

Overview of Files

From using the Stack Exchange Data Dump as the data source (these zip files have a DD_ prefix):
- Raw dataset before processing: saved_dataset.csv (DD_saved_dataset.zip)
- Completed tag count: tag_count.csv (DD_tag_count.zip)
- Processed dataset with completed evaluations: dataset_results.csv (DD_dataset_results.zip)

From using Google BigQuery as the data source (these zip files have a BQ_ prefix):
- Raw dataset before processing: saved_dataset.csv (BQ_saved_dataset.zip)
- Completed tag count: tag_count.csv (BQ_tag_count.zip)

No large-scale evaluation was completed when using BigQuery as a data source. As noted in the linked repo, the use of Google BigQuery as a data source is not recommended for this work, but the working code and dataset have nonetheless been provided for completeness.

License

This dataset is licensed under CC BY-SA 4.0, the same license used by the Stack Exchange Data Dump.
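Because each CSV is zipped individually, a file can be read straight from its archive without unpacking it to disk. The following is a minimal sketch using only the Python standard library. It assumes the archive has already been downloaded locally, that it contains a CSV with a header row, and that the file is UTF-8 encoded; none of these details are confirmed by the record. The helper name `read_zipped_csv` and its parameters are hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_path, member=None, limit=None):
    """Return rows (as dicts keyed by header) from a CSV inside a zip archive.

    zip_path: local path to a downloaded archive, e.g. "DD_tag_count.zip".
    member:   name of the CSV inside the archive; if None, the first entry
              ending in ".csv" is used (an assumption about the layout).
    limit:    optional maximum number of data rows to read.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError(f"no CSV file found in {zip_path}")
            member = csv_names[0]
        with archive.open(member) as raw:
            # Decode bytes to text; UTF-8 is assumed, not stated by the source.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = []
            for i, row in enumerate(csv.DictReader(text)):
                if limit is not None and i >= limit:
                    break
                rows.append(row)
            return rows
```

For example, `read_zipped_csv("DD_tag_count.zip", limit=5)` would return the first five rows of the tag count file. The column names depend on the actual CSV header, which this listing does not document. Passing `limit` is useful for inspecting the larger files before loading them in full.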

Created:

2023-10-06
