
GPT vs Stack Overflow: data collection (A2I2 T2 2023)

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8403467
Description:
About

The dataset components produced by this repo. Please see the documentation there for more information. Each CSV has been individually zipped so that you only have to download the specific file(s) that you want.

Overview of Files

From using the Stack Exchange Data Dump as the data source (these zip files have a DD_ prefix):
    Raw dataset before processing: saved_dataset.csv (DD_saved_dataset.zip)
    Completed tag count: tag_count.csv (DD_tag_count.zip)
    Processed dataset with completed evaluations: dataset_results.csv (DD_dataset_results.zip)

From using Google BigQuery as the data source (these zip files have a BQ_ prefix):
    Raw dataset before processing: saved_dataset.csv (BQ_saved_dataset.zip)
    Completed tag count: tag_count.csv (BQ_tag_count.zip)

No large-scale evaluation was completed when using BigQuery as a data source. As noted in the linked repo, the use of Google BigQuery as a data source is not recommended for this work, but the working code and dataset have nonetheless been provided for completeness.

License

This dataset is licensed under the CC BY-SA 4.0 license, the same license used by the Stack Exchange Data Dump.
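Because each CSV ships in its own zip archive, a single file can be read without unpacking anything to disk. The following is a minimal sketch using only the Python standard library. It assumes the archive member has the same name as the CSV listed in the description (for example, tag_count.csv inside DD_tag_count.zip) and that the file is UTF-8 encoded; neither is confirmed by the description. The function name read_csv_from_zip is hypothetical, and no column names are assumed.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path, member_name):
    """Return the rows of one CSV inside a zip archive as a list of dicts.

    Assumptions (not confirmed by the dataset description): the member
    name matches the CSV name listed for that archive, and the CSV is
    UTF-8 encoded with a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Hypothetical usage, after downloading DD_tag_count.zip from the record:
# rows = read_csv_from_zip("DD_tag_count.zip", "tag_count.csv")
# print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else [])
```

This loads the whole file into memory, which is likely fine for the tag counts but may not be for the larger saved_dataset.csv files; iterating over the csv.DictReader instead of calling list() would avoid that.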
Created:
2023-10-06