Myuxiisoya/Tolando
Source: Hugging Face | Updated: 2025-12-10 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/Myuxiisoya/Tolando
Description:
---
language:
- en
- id
license: odc-by
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- feature-extraction
dataset_info:
- config_name: default
  features:
  - name: text
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 22784812902
    num_examples: 5000000
  download_size: 13920512648
  dataset_size: 22784812902
- config_name: mdformat
  features:
  - name: text
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 22803501521
    num_examples: 5000000
  download_size: 13828649999
  dataset_size: 22803501521
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: mdformat
  data_files:
  - split: train
    path: mdformat/train-*
tags:
- legal
pretty_name: Myx
---
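
The metadata above declares two configurations, `default` and `mdformat`, each with a single `train` split and the columns `text` and `subset`. A minimal sketch of loading either one with the `datasets` library follows. It assumes the repository id `Myuxiisoya/Tolando` (taken from the download link) is reachable and still matches this metadata; `stream_config` and `CONFIG_PATHS` are hypothetical names introduced only for illustration. Streaming is used because the listed `download_size` is roughly 13.9 GB per configuration.

```python
# Config name -> data file pattern, copied from the card's `configs` section.
CONFIG_PATHS = {
    "default": "data/train-*",
    "mdformat": "mdformat/train-*",
}


def stream_config(config_name: str = "default", repo_id: str = "Myuxiisoya/Tolando"):
    """Return a streaming iterator over the train split of one configuration.

    `repo_id` is an assumption based on the download link in this listing.
    """
    if config_name not in CONFIG_PATHS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIG_PATHS)}"
        )
    # Third-party dependency, imported lazily so the validation above
    # works even where `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True avoids downloading every shard before the first row is read.
    return load_dataset(repo_id, config_name, split="train", streaming=True)
```

If the repository is available, `next(iter(stream_config("mdformat")))` should return a dictionary with `text` and `subset` keys.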
# BEE-spoke-data/TxT360-5M-sample-en
English-only sample from [LLM360/TxT360](https://hf.co/datasets/LLM360/TxT360):
- min length 256 GPT-4 tokens
- max length 24576 GPT-4 tokens
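
The length bounds above can be sketched as a simple filter. This is not the script used to build the sample. It is an illustration that assumes inclusive bounds (the card does not say whether they are inclusive; the smallest reported count in this card is 257) and that "GPT-4 tokens" means the encoding returned by `tiktoken.encoding_for_model("gpt-4")`. The names `gpt4_token_counter` and `filter_by_length` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

MIN_TOKENS = 256    # lower bound stated in the card
MAX_TOKENS = 24576  # upper bound stated in the card


def gpt4_token_counter() -> Callable[[str], int]:
    """Build a function that counts GPT-4 tokens using tiktoken."""
    import tiktoken  # third-party; imported lazily so the filter itself has no dependency

    enc = tiktoken.encoding_for_model("gpt-4")
    # disallowed_special=() encodes strings such as "<|endoftext|>" as ordinary
    # text instead of raising an error on web-scraped documents.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def filter_by_length(
    texts: Iterable[str],
    count_tokens: Callable[[str], int],
    min_tokens: int = MIN_TOKENS,
    max_tokens: int = MAX_TOKENS,
) -> Iterator[str]:
    """Yield only the texts whose token count lies in [min_tokens, max_tokens]."""
    for text in texts:
        n = count_tokens(text)
        if min_tokens <= n <= max_tokens:
            yield text
```

With tiktoken installed, `filter_by_length(docs, gpt4_token_counter())` applies the card's bounds to any iterable of strings; a different counter can be passed in for quick experiments.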
Token count statistics (GPT-4 tokenizer via tiktoken):
```
        token_count
count  5.000000e+06
mean   1.003614e+03
std    1.424231e+03
min    2.570000e+02
25%    4.020000e+02
50%    6.220000e+02
75%    1.050000e+03
max    2.457400e+04
```
- Total count: 5018.07 M tokens
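
The total follows from the statistics as count times mean. The short check below reproduces that arithmetic. Because the mean is rounded in the card, the result is approximate. The bytes-per-token figure treats the `dataset_size` of the `default` config as a rough proxy for text size, which is an assumption rather than something the card states.

```python
# Values copied from the card's statistics and metadata.
NUM_EXAMPLES = 5_000_000        # count
MEAN_TOKENS = 1.003614e+03      # mean (rounded in the card)
DATASET_BYTES = 22_784_812_902  # dataset_size of the default config


def total_tokens_millions(num_examples: int, mean_tokens: float) -> float:
    """Total tokens in millions: count * mean / 1e6."""
    return num_examples * mean_tokens / 1e6


total_m = total_tokens_millions(NUM_EXAMPLES, MEAN_TOKENS)
# Rough ratio only: dataset_size may include storage overhead beyond raw text.
bytes_per_token = DATASET_BYTES / (total_m * 1e6)

print(f"total: {total_m:.2f} M tokens")
print(f"bytes per token (approx.): {bytes_per_token:.2f}")
```

The first printed value matches the 5018.07 M tokens reported above.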
Provider:
Myuxiisoya



