KomeijiForce/llama3_vocabulary_cluster
Source: Hugging Face | Updated: 2024-12-01 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/KomeijiForce/llama3_vocabulary_cluster
Description:
This dataset contains the clusters found in the vocabulary embeddings of the llama3-8b-instruct model. The 128,256 vocabulary embeddings are partitioned into 1,024 clusters with k-means, and the resulting clusters show pattern correlations that are probably undesirable for diverse generation. GPT-4o was also prompted to summarize what the vocabulary entries in each cluster have in common; these summaries can be used for further analysis. The dataset is part of a work on diverse LLM generation.
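The description names k-means but does not say which implementation or settings were used. The following is only a minimal sketch of plain Lloyd's k-means in standard-library Python, run on toy 2-D vectors that stand in for the embedding rows. The function name `kmeans`, the toy data, and all parameter values are assumptions made for illustration, not the authors' pipeline.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy stand-in for the embedding matrix (hypothetical data, not the real embeddings).
rng = random.Random(42)
toy_embeddings = [
    [rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)]
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(20)
]
centroids, labels = kmeans(toy_embeddings, k=3)
```

In the setting described above, the points would be the 128,256 rows of the model's embedding matrix and `k` would be 1,024. At that scale a library implementation such as scikit-learn's `KMeans` would be the more practical choice.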
Provider:
KomeijiForce



