Don Quixote
Source: arXiv. Indexed 2025-09-30.
Download link:
https://www.gutenberg.org/ebooks/996
Description:
This dataset contains an English translation of *Don Quixote* by Miguel de Cervantes. It is used to study the inherent self-similarity of natural language and to measure its correlation dimension. The text, 150,000 words long, serves as input for high-dimensional sequence analysis with large language models (LLMs).
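For readers unfamiliar with the term, the correlation dimension is commonly estimated from the correlation integral C(eps), the fraction of point pairs lying closer than a radius eps, by taking the slope of log C(eps) against log eps. The sketch below is only an illustration on toy points along a line segment. It is not the procedure applied to this dataset, which works on sequences derived from LLMs, and the function names `correlation_integral` and `correlation_dimension` are hypothetical.

```python
import math
from itertools import combinations


def correlation_integral(points, eps):
    """Fraction of distinct point pairs whose Euclidean distance is below eps."""
    n = len(points)
    close = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) < eps)
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, eps_small, eps_large):
    """Two-radius slope estimate of log C(eps) versus log eps.

    Assumption: both radii fall inside the scaling region and both
    correlation integrals are non-zero.
    """
    c_small = correlation_integral(points, eps_small)
    c_large = correlation_integral(points, eps_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(eps_large) - math.log(eps_small)
    )


# Toy data (not taken from the dataset): 1000 evenly spaced points on a line.
line = [(i / 999, 0.0) for i in range(1000)]
estimate = correlation_dimension(line, 0.01, 0.1)
print(f"estimated correlation dimension: {estimate:.2f}")  # close to 1 for a line
```

A two-radius slope is the crudest possible estimator. In practice the slope is usually fitted over many radii, and the pairwise loop shown here scales quadratically with the number of points.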
Provider:
Project Gutenberg



