
ivanleomk/gpt4-chain-of-density

Hugging Face · Updated 2023-11-12 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/ivanleomk/gpt4-chain-of-density
Resource description:
---
license: mit
task_categories:
- summarization
language:
- en
---

# Introduction

This dataset consists of chain of density summaries that we generated using GPT-4. The approach is slightly modified to account for GPT-4 timeouts, with some additional validation that we added using the [Instructor](https://github.com/jxnl/instructor) library.

We wrote a short blog post about how we generated this data [here](https://jxnl.github.io/instructor/blog/2023/11/05/better-summaries-by-finetuning-chain-of-density/#results-and-benchmarks).

Here's a quick summary of the individual files:

1. `summarization_20`, `summarization_50` and `summarization_all` are the respective `.jsonl` files that we used to fine-tune our models. They contain 20, 50 and 76 examples respectively.
2. `test.csv`: A randomly selected group of 100 test articles sampled from the original `griffin/chain-of-density` dataset and used to create our training sets with GPT-4.
3. `train.csv`: A randomly selected group of 20 test articles sampled from the original `griffin/chain-of-density` dataset that were not provided to our fine-tuned models. These were then used to evaluate their quality and performance.
4. `validation-summaries`: These are the summaries generated by `GPT-4` on the test set. We include the following fields:
   - `text`: The original article that was summarized
   - `model`: This has a single value of `GPT-4`
   - `Summary 1`: The first summary created
   - `Summary 2`: The second rewritten summary
   - `Summary 3`: The third rewritten summary
   - `Summary 4`: The fourth rewritten summary
   - `time`: The time taken for the entire chain of density to be created
5. `vanilla_35.csv`: This contains the summaries generated by a vanilla GPT 3.5 model that was prompted to generate an entity-dense summary.
6. `results.csv`: These are the summaries generated by the individual fine-tuned models. We include the following fields:
   - `Article`: The original article that was summarized
   - `model`: One of `fine-tuned-20`, `fine-tuned-50` or `fine-tuned-76`, which represent our GPT 3.5 model fine-tuned on 20, 50 or 76 examples respectively
   - `Summary`: The summary generated by the fine-tuned model
   - `Time`: The amount of time it took for the summary to be generated
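The `results.csv` fields listed above lend themselves to a per-model comparison. The following is a minimal sketch of one way to do that, not the authors' evaluation code. It assumes the `Time` column holds a plain number (the unit is not stated in the description) and uses a few hypothetical in-memory rows in place of the real file. The helper name `summarize_by_model` is made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize_by_model(rows):
    """Group rows by `model` and return per-model count, mean time, mean word count."""
    times = defaultdict(list)
    words = defaultdict(list)
    for row in rows:
        model = row["model"]
        # Assumption: `Time` parses as a plain number; the unit is not documented.
        times[model].append(float(row["Time"]))
        # Whitespace-separated word count as a rough length measure.
        words[model].append(len(row["Summary"].split()))
    return {
        m: {"n": len(times[m]), "mean_time": mean(times[m]), "mean_words": mean(words[m])}
        for m in times
    }


# Hypothetical stand-in rows. For the real file, pass
# csv.DictReader(open("results.csv", newline="", encoding="utf-8")) instead.
SAMPLE = io.StringIO(
    "Article,model,Summary,Time\n"
    "a1,fine-tuned-20,one two three,2.0\n"
    "a2,fine-tuned-20,one two three four five,4.0\n"
    "a1,fine-tuned-50,one two,1.5\n"
)
stats = summarize_by_model(csv.DictReader(SAMPLE))
for model, s in stats.items():
    print(model, s)
```

Using only the standard library keeps the sketch dependency-free; the same grouping could be written with a dataframe library if one is already in use.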
Provider:
ivanleomk
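Summary of original information

Dataset overview

Dataset contents

File list

  1. summarization_20, summarization_50, summarization_all

    • Type: .jsonl files
    • Description: Datasets used to fine-tune the models, containing 20, 50 and 76 examples respectively.
  2. test.csv

    • Type: CSV file
    • Description: 100 test articles randomly selected from the original griffin/chain-of-density dataset.
  3. train.csv

    • Type: CSV file
    • Description: 20 test articles randomly selected from the original griffin/chain-of-density dataset; they were not provided to the fine-tuned models and were used to evaluate model quality and performance.
  4. validation-summaries

    • Description: Summaries generated by GPT-4 on the test set, with the following fields:
      • text: The original article
      • model: The model name, always GPT-4
      • Summary 1: The first summary created
      • Summary 2: The second rewritten summary
      • Summary 3: The third rewritten summary
      • Summary 4: The fourth rewritten summary
      • time: The time taken to create the entire chain of density
  5. vanilla_35.csv

    • Type: CSV file
    • Description: Entity-dense summaries generated by a vanilla GPT 3.5 model.
  6. results.csv

    • Type: CSV file
    • Description: Summaries generated by the individual fine-tuned models, with the following fields:
      • Article: The original article
      • model: The model name, one of fine-tuned-20, fine-tuned-50 or fine-tuned-76
      • Summary: The summary generated by the fine-tuned model
      • Time: The time taken to generate the summary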
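The four summary columns of `validation-summaries` can be inspected step by step to see how each rewrite changes in length relative to the source article. The sketch below is illustrative only. It assumes the file can be read as CSV with exactly the column headers listed above (the file extension is not given), and it uses one hypothetical in-memory row. The helper names `chain_lengths` and `compression_ratios` are made up for this example.

```python
import csv
import io

# Column names as listed in the field description above.
SUMMARY_COLUMNS = ["Summary 1", "Summary 2", "Summary 3", "Summary 4"]


def chain_lengths(row):
    """Return the whitespace-separated word count of each summary, in chain order."""
    return [len(row[col].split()) for col in SUMMARY_COLUMNS]


def compression_ratios(row):
    """Return summary words divided by article words for each step of the chain."""
    article_words = len(row["text"].split())
    if article_words == 0:
        raise ValueError("article text is empty")
    return [n / article_words for n in chain_lengths(row)]


# Hypothetical stand-in row; the real data would come from the dataset file.
SAMPLE = io.StringIO(
    "text,model,Summary 1,Summary 2,Summary 3,Summary 4,time\n"
    "w1 w2 w3 w4 w5 w6 w7 w8,GPT-4,a b c d,a b c d,a b c,a b,12.5\n"
)
rows = list(csv.DictReader(SAMPLE))
lengths = chain_lengths(rows[0])
ratios = compression_ratios(rows[0])
print(lengths, ratios)
```

Word counts are only a rough proxy; measuring entity density itself would need an entity extractor, which is outside this sketch.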