GPT vs Stack Overflow: data collection (A2I2 T2 2023)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8403467
Resource description:
About
This record contains the dataset components produced by the linked repo. Please see the documentation there for more information.
Each CSV has been individually zipped so that you only have to download the specific file(s) that you want.
Overview of Files
Using the Stack Exchange Data Dump as the data source (these zip files have a DD_ prefix):
Raw dataset before processing: saved_dataset.csv (DD_saved_dataset.zip)
Completed tag count: tag_count.csv (DD_tag_count.zip)
Processed dataset with completed evaluations: dataset_results.csv (DD_dataset_results.zip)
Using Google BigQuery as the data source (these zip files have a BQ_ prefix):
Raw dataset before processing: saved_dataset.csv (BQ_saved_dataset.zip)
Completed tag count: tag_count.csv (BQ_tag_count.zip)
No large-scale evaluation was completed with BigQuery as the data source, so there is no BQ_ counterpart to dataset_results.csv.
As noted in the linked repo, the use of Google BigQuery as a data source is not recommended for this work, but the working code and dataset have nonetheless been provided for completeness.
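Because each CSV ships in its own zip, a downloaded archive can be read without extracting it first. The following is a minimal sketch using only the Python standard library. It assumes that each archive holds one UTF-8 CSV whose file name matches the listing above. The `FILES` mapping and the `iter_zipped_csv` helper are hypothetical names used for illustration. Column names come from each CSV's own header row, which is not described on this page.

```python
import csv
import io
import zipfile
from pathlib import Path

# Archive name -> CSV file expected inside it, taken from the file overview.
# DD_ = Stack Exchange Data Dump, BQ_ = Google BigQuery.
# Assumption: each archive contains exactly this CSV, UTF-8 encoded.
FILES = {
    "DD_saved_dataset.zip": "saved_dataset.csv",
    "DD_tag_count.zip": "tag_count.csv",
    "DD_dataset_results.zip": "dataset_results.csv",
    "BQ_saved_dataset.zip": "saved_dataset.csv",
    "BQ_tag_count.zip": "tag_count.csv",
}


def iter_zipped_csv(zip_path):
    """Yield rows (as dicts keyed by the header row) from one downloaded archive."""
    zip_path = Path(zip_path)
    expected = FILES[zip_path.name]  # KeyError for an archive not listed above
    with zipfile.ZipFile(zip_path) as archive:
        # Match on the base name so a folder inside the zip would still work.
        matches = [n for n in archive.namelist() if Path(n).name == expected]
        if not matches:
            raise FileNotFoundError(f"{expected} not found in {zip_path.name}")
        with archive.open(matches[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Stream rows lazily instead of loading the whole CSV into memory.
            yield from csv.DictReader(text)
```

For example, `for row in iter_zipped_csv("DD_tag_count.zip"): ...` streams rows one at a time. Using a generator rather than building a list keeps memory use flat for a large file such as saved_dataset.csv.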
License
This dataset is licensed under the CC BY-SA 4.0 license, the same license used by the Stack Exchange Data Dump.
Created:
2023-10-06