ivanleomk/gpt4-chain-of-density

Name: ivanleomk/gpt4-chain-of-density
Creator: ivanleomk
Published: 2023-11-12 14:12:28
License: MIT

Hugging Face · Updated 2023-11-12 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/ivanleomk/gpt4-chain-of-density

Description:

---
license: mit
task_categories:
- summarization
language:
- en
---

# Introduction

This dataset consists of Chain of Density summaries that we generated using GPT-4. The approach is slightly modified to account for GPT-4 timeouts, with some additional validation that we added using the [Instructor](https://github.com/jxnl/instructor) library.

We wrote a short blog post about how we generated this data [here](https://jxnl.github.io/instructor/blog/2023/11/05/better-summaries-by-finetuning-chain-of-density/#results-and-benchmarks).

Here is a quick summary of the individual files:

1. `summarization_20`, `summarization_50` and `summarization_all`: the `.jsonl` files that we used to fine-tune our models. They contain 20, 50 and 76 examples respectively.
2. `test.csv`: a randomly selected group of 100 test articles, sampled from the original `griffin/chain-of-density` dataset, that was used to create our training sets with GPT-4.
3. `train.csv`: a randomly selected group of 20 test articles, sampled from the original `griffin/chain-of-density` dataset, which were not provided to our fine-tuned models. These were then used to evaluate their quality and performance.
4. `validation-summaries`: the summaries generated by `GPT-4` on the test set. We include the following fields:
   - `text`: The original article that was summarized
   - `model`: This has a single value of `GPT-4`
   - `Summary 1`: The first summary created
   - `Summary 2`: The second rewritten summary
   - `Summary 3`: The third rewritten summary
   - `Summary 4`: The fourth rewritten summary
   - `time`: The time taken for the entire chain of density to be created
5. `vanilla_35.csv`: the summaries generated by a vanilla GPT-3.5 model that was prompted to generate an entity-dense summary.
6. `results.csv`: the summaries generated by the individual fine-tuned models. We include the following fields:
   - `Article`: The original article that was summarized
   - `model`: Either `fine-tuned-20`, `fine-tuned-50` or `fine-tuned-76`, representing our GPT-3.5 model fine-tuned on 20, 50 or 76 examples respectively
   - `Summary`: The summary generated by the fine-tuned model
   - `Time`: The amount of time it took for the summary to be generated
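As a rough illustration of how the `results.csv` fields could be used to compare the fine-tuned models, here is a minimal sketch that groups rows by `model` and averages `Time` and the word count of `Summary`. The function name `summarize_results` is hypothetical, and the code assumes that `Time` parses as a plain number (the source does not state its units) and that the file is a standard comma-separated CSV with a header row.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize_results(path):
    """Return {model: {"n": rows, "mean_time": ..., "mean_words": ...}}."""
    times = defaultdict(list)
    words = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            model = row["model"]
            # Assumption: `Time` is a plain numeric string; units are unknown.
            times[model].append(float(row["Time"]))
            # Whitespace-separated word count as a crude length measure.
            words[model].append(len(row["Summary"].split()))
    return {
        model: {
            "n": len(times[model]),
            "mean_time": mean(times[model]),
            "mean_words": mean(words[model]),
        }
        for model in times
    }
```

Calling `summarize_results("results.csv")` on a local copy of the file would return one entry per value of `model`, which makes it easy to see whether training on more examples changes summary length or latency.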

Provider:

ivanleomk

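Summary of original information

Dataset overview

Dataset contents

File list

summarization_20, summarization_50, summarization_all
- Type: .jsonl files
- Description: Datasets used to fine-tune the models, containing 20, 50 and 76 examples respectively.
test.csv
- Type: CSV file
- Description: 100 test articles randomly selected from the original griffin/chain-of-density dataset.
train.csv
- Type: CSV file
- Description: 20 test articles randomly selected from the original griffin/chain-of-density dataset, not used for fine-tuning and used to evaluate model quality and performance.
validation-summaries
- Description: Summaries of the test set generated by GPT-4, with the following fields:
  - text: The original article
  - model: The model name, with the value GPT-4
  - Summary 1: The first summary generated
  - Summary 2: The second rewritten summary
  - Summary 3: The third rewritten summary
  - Summary 4: The fourth rewritten summary
  - time: The time taken to generate the entire chain of density
vanilla_35.csv
- Type: CSV file
- Description: Entity-dense summaries generated by a vanilla GPT 3.5 model.
results.csv
- Type: CSV file
- Description: Summaries generated by the individual fine-tuned models, with the following fields:
  - Article: The original article
  - model: The model name, with the value fine-tuned-20, fine-tuned-50 or fine-tuned-76
  - Summary: The summary generated by the fine-tuned model
  - Time: The time taken to generate the summary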
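
The `Summary 1` to `Summary 4` fields of `validation-summaries` record successive rewrites of the same summary, so one simple check is to compare their average lengths. The sketch below is only an illustration: the helper name `mean_words_per_step` is hypothetical, and it assumes the file is a comma-separated CSV with a header row (the listing gives no file extension for `validation-summaries`).

```python
import csv
from statistics import mean

# Column names as given in the field list for `validation-summaries`.
SUMMARY_COLUMNS = ["Summary 1", "Summary 2", "Summary 3", "Summary 4"]


def mean_words_per_step(path):
    """Return the mean whitespace-separated word count for each summary column."""
    counts = {col: [] for col in SUMMARY_COLUMNS}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in SUMMARY_COLUMNS:
                counts[col].append(len(row[col].split()))
    # Skip columns with no rows so an empty file yields an empty dict.
    return {col: mean(vals) for col, vals in counts.items() if vals}
```

Word count is a crude proxy here; measuring entity density properly would need an entity extractor, which this sketch deliberately leaves out.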
