A Comparison of DeepSeek and Other LLMs
Source: Figshare | Updated: 2026-01-02 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/A_Comparison_of_DeepSeek_and_Other_LLMs/30988470
Resource description:
Recently, DeepSeek has been the focus of attention in and beyond the AI community. An interesting problem is how DeepSeek compares to other large language models (LLMs). There are many tasks an LLM can do, and in this article, we compare the models on the task of predicting an outcome from a short text. We consider two settings, an authorship classification setting and a citation classification setting. In the first one, the goal is to determine whether a short text was written by a human or by AI. In the second one, the goal is to classify a citation into one of four types using its textual content. For each experiment, we compare DeepSeek with four popular LLMs: Claude, Gemini, GPT, and Llama. We find that, in terms of classification accuracy, DeepSeek outperforms Gemini, GPT, and Llama in most cases, but underperforms Claude. We also find that DeepSeek is comparatively slower than the others but inexpensive to use, while Claude is much more expensive than all the others. Finally, we find that in terms of similarity, the output of DeepSeek is most similar to those of Gemini and Claude (and among all five LLMs, Claude and Gemini have the most similar outputs). In this article, we also present a fully labeled dataset that we collected ourselves, and propose a recipe for generating new datasets using the LLMs and a recent dataset, MADStat. The datasets in our article can be used as benchmarks for future studies of LLMs.
Created:
2026-01-02



