Harley-ml/i-statements
Source: Hugging Face | Updated: 2026-04-21 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Harley-ml/i-statements
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- statements
- simple
- synthetic
- i-statements
---
# I-Statements
This dataset contains approximately 5,335 I-statements generated by a Q4_K_M-quantized [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model running under [Ollama](https://ollama.com/).
## Stats
| Metric | Value |
|-----------------------|-------------|
| Entries | 5,334 |
| Total tokens (GPT2) | 36,032 |
| Total words | 29,735 |
| Avg. tokens per entry | 6.67 |
| Avg. words per entry | 5.57 |
| Word range | 3–10 |
| Unique vocab (words) | 2,237 |
| Unique verbs | 252 |

Token counts were computed with the [GPT2](https://huggingface.co/openai-community/gpt2) tokenizer.
Note: Token counts will vary depending on the tokenizer used.
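
The metrics in the table can be recomputed with a short script. The following is a minimal sketch, not the script used to build this card: the function names `compute_stats` and `load_gpt2_tokenizer` are hypothetical, and counting words by whitespace splitting and building the vocabulary from lowercased, punctuation-stripped words are assumptions, since the card does not say how words were counted. Unique verbs are not covered because they would need a part-of-speech tagger.

```python
def compute_stats(entries, tokenize=None):
    """Compute corpus statistics for a list of short text entries.

    tokenize: callable mapping a string to a list of tokens or ids.
    Defaults to whitespace splitting (an assumption, not the card's method).
    """
    if not entries:
        raise ValueError("entries must not be empty")
    if tokenize is None:
        tokenize = str.split

    word_counts = [len(entry.split()) for entry in entries]
    token_counts = [len(tokenize(entry)) for entry in entries]

    # Vocabulary: lowercased words with surrounding punctuation removed.
    vocab = set()
    for entry in entries:
        for word in entry.split():
            cleaned = word.lower().strip(".,!?;:\"'")
            if cleaned:
                vocab.add(cleaned)

    n = len(entries)
    return {
        "entries": n,
        "total_tokens": sum(token_counts),
        "total_words": sum(word_counts),
        "avg_tokens_per_entry": round(sum(token_counts) / n, 2),
        "avg_words_per_entry": round(sum(word_counts) / n, 2),
        "word_range": (min(word_counts), max(word_counts)),
        "unique_vocab": len(vocab),
    }


def load_gpt2_tokenizer():
    """Return a GPT2 tokenize function (requires the transformers package)."""
    from transformers import AutoTokenizer  # imported lazily; optional dependency

    tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
    return tokenizer.encode


if __name__ == "__main__":
    # Made-up sample entries, not taken from the dataset.
    sample = ["I like green tea.", "I walk to work every day.", "I like music."]
    print(compute_stats(sample))
```

Passing `tokenize=load_gpt2_tokenizer()` switches the token columns from whitespace tokens to GPT2 token ids, which should be closer to how the table's token figures were obtained.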
## Use Cases
This dataset is best suited for pretraining or fine-tuning models under 1 million total parameters.
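
To give a sense of what fits under 1 million parameters, the sketch below counts the parameters of a GPT-2-style decoder with tied input and output embeddings. The architecture choice, the function name `gpt2_style_param_count`, and the example configuration are all assumptions for illustration; the card does not prescribe a model.

```python
def gpt2_style_param_count(vocab_size, n_ctx, d_model, n_layers):
    """Count parameters of a GPT-2-style decoder with tied embeddings."""
    token_embedding = vocab_size * d_model
    position_embedding = n_ctx * d_model
    # Per block: attention (4*d^2 + 4*d), MLP (8*d^2 + 5*d), two LayerNorms (4*d).
    per_layer = 12 * d_model * d_model + 13 * d_model
    final_layer_norm = 2 * d_model
    return token_embedding + position_embedding + n_layers * per_layer + final_layer_norm


if __name__ == "__main__":
    # Hypothetical tiny configuration: a word-level vocabulary the size of the
    # dataset's unique vocab (2,237), a 16-token context, width 64, 4 layers.
    print(gpt2_style_param_count(vocab_size=2237, n_ctx=16, d_model=64, n_layers=4))
```

Under these assumptions the tiny configuration comes to 344,256 parameters. With the full GPT2 tokenizer vocabulary (50,257 entries) the embedding table alone would exceed 1 million parameters at width 64, so a model in this size range would likely need a much smaller vocabulary.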
Provider:
Harley-ml



