Generated And Grounded Language Examples (GAGLE)
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/datasets/ibomohsin/gagle
Description:
This dataset contains over 240,000 articles generated by various LLMs under different decoding temperatures and prompting methods, alongside the corresponding human-written texts. Its distinguishing feature is that it pairs machine-generated articles with human-written texts, which makes it possible to analyze how the fractal characteristics of LLM-generated text differ from those of natural language. The dataset is intended to support research on text generation and on evaluating the fractal properties of LLM-generated text.



