BunkaTopics
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/charlesdedampierre/BunkaTopics
Description:
BunkaTopics is a tool for identifying topics in a dataset. It is used in particular to analyze and visualize topics related to the training of the GPT-4 language model. The tool can also separate the topics found in chosen answers from those found in rejected answers, which reveals what is distinctive about each kind of model response. The analysis identifies 30 topics, each described by 10 specific terms; the task is to identify and visualize the topics in a dataset in order to evaluate the performance and biases of language models.
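The contrast between chosen and rejected answers can be illustrated with a minimal sketch. It does not use the BunkaTopics API and is not topic modeling: it is a plain term-frequency comparison, included only to show what "terms distinctive to one group of answers" might look like. The function names `term_counts` and `distinctive_terms` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def term_counts(texts):
    """Count lowercase alphabetic tokens across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def distinctive_terms(group_a, group_b, k=10):
    """Return up to k terms that are relatively more frequent in group_a than in group_b.

    Illustrative stand-in only: scores are differences in relative frequency,
    not the output of a topic model.
    """
    a, b = term_counts(group_a), term_counts(group_b)
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    scores = {t: a[t] / total_a - b[t] / total_b for t in a}
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:k]


# Hypothetical sample data standing in for chosen and rejected answers.
chosen = ["The proof uses induction on n", "Induction gives the proof"]
rejected = ["I cannot help with that", "Sorry, I cannot answer that"]

print(distinctive_terms(chosen, rejected, k=3))
```

With `k=10`, the returned list has the same shape as one of the 10-term topic descriptions mentioned above, although a real topic model groups documents before extracting terms.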



