richard-park/llm-corpus
Source: Hugging Face. Updated 2024-10-08; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/richard-park/llm-corpus
Resource description:
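This dataset is a reorganized version of the aihub 초거대 언어 모델 말뭉치 (large-scale language model corpus) dataset. The maximum token count is not explicitly stated, and no details are given about the dataset's background, intended use, structure, or creation process.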
This is a large-scale language model corpus dataset curated by aihub, containing a string feature named text. The training set has 24,568,085 samples with a total size of 8,421,180,187 bytes.
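The figures above allow a quick size estimate, and the single `text` feature makes the split straightforward to read. The following is a minimal Python sketch, not an official loader. It assumes the Hugging Face `datasets` package is installed, that the repository id `richard-park/llm-corpus` resolves on the Hub, and that the quoted byte count refers to the training split as a whole. The helper names are hypothetical.

```python
# Figures quoted in the resource description above.
NUM_SAMPLES = 24_568_085
TOTAL_BYTES = 8_421_180_187


def average_bytes_per_sample(total_bytes: int, num_samples: int) -> float:
    """Return the mean size of one sample in bytes."""
    return total_bytes / num_samples


def stream_first_texts(n: int = 3):
    """Yield the first n `text` values without downloading the whole split.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    The import is done lazily so the arithmetic helper works without it.
    """
    from datasets import load_dataset

    # streaming=True iterates over the split lazily instead of
    # fetching all of it up front.
    ds = load_dataset("richard-park/llm-corpus", split="train", streaming=True)
    for i, row in enumerate(ds):
        if i >= n:
            break
        yield row["text"]


if __name__ == "__main__":
    avg = average_bytes_per_sample(TOTAL_BYTES, NUM_SAMPLES)
    print(f"{avg:.1f} bytes per sample, {TOTAL_BYTES / 2**30:.2f} GiB in total")
```

By this arithmetic a sample averages roughly 343 bytes, so streaming a handful of rows is a cheap way to inspect the data before committing to the full download.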
Provider:
richard-park



