A Catalog of DeepMind's Large Language Models including their Seminal Chinchilla Work
Source: DataCite Commons. Updated: 2023-09-01. Indexed: 2024-07-13.
Download link:
https://orkg.org/comparison/R609271/
Resource description:
In their 2022 paper "Training Compute-Optimal Large Language Models," often called the "Chinchilla paper," researchers at DeepMind investigated the optimal model size and training data volume for language models under a fixed compute budget. The study suggested that many models with more than 100B parameters, such as GPT-3, are over-parameterized and under-trained, and that smaller models can match the performance of larger ones when trained on more data. As a rule of thumb, the paper finds that the compute-optimal number of training tokens is roughly 20 times the parameter count; for instance, a 70B parameter model calls for about 1.4 trillion tokens. Models such as GPT-3, OPT, and BLOOM, each with roughly 175B parameters, were trained on fewer tokens than this rule recommends, leaving them potentially under-trained. In comparison, LLaMA 65B was trained on a token count close to the Chinchilla recommendation. Notably, the compute-optimal Chinchilla model outperformed models such as GPT-3 on a range of tasks. These findings have prompted a shift towards smaller, more efficient models, challenging the earlier "bigger is always better" trend. This comparison characterizes Chinchilla and other DeepMind models according to salient properties of LLMs.
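The 20-tokens-per-parameter rule of thumb from the description can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function name `compute_optimal_tokens` and the constant `TOKENS_PER_PARAM` are hypothetical names chosen for illustration, and the ratio of 20 is the approximate figure quoted above, not an exact law.

```python
# Approximate ratio quoted in the description above (assumption: treated as exactly 20).
TOKENS_PER_PARAM = 20


def compute_optimal_tokens(n_params: int, ratio: int = TOKENS_PER_PARAM) -> int:
    """Return the rule-of-thumb training token count for a model with n_params parameters."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return n_params * ratio


if __name__ == "__main__":
    # Parameter counts taken from the description: 65B, 70B, and 175B.
    for n_params in (65_000_000_000, 70_000_000_000, 175_000_000_000):
        tokens = compute_optimal_tokens(n_params)
        print(f"{n_params / 1e9:.0f}B parameters -> {tokens / 1e12:.1f}T tokens")
```

For a 70B parameter model this gives 1.4 trillion tokens, matching the example in the description; for a 175B parameter model the same rule suggests about 3.5 trillion tokens.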
Provider:
Open Research Knowledge Graph
Created:
2023-09-01



