umarzein/microlang
Source: Hugging Face. Updated 2023-06-29; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/umarzein/microlang
Resource description:
---
license: mit
language:
- en
---
Microlang was designed to test text generation architectures.
It consists of 16 tokens:
1. special (implicit):
<>
2. noun:
bob, tom, bike, speech
3. transitive verb:
take, use
4. intransitive verb:
talk, go
5. adjective:
good, active
6. adverb:
not
7. conjunction:
and, then, but
8. punctuation:
.
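
As a rough illustration, the token inventory listed above can be written down as a small Python mapping and used to check whether a string stays inside the vocabulary. This is only a sketch: the names `MICROLANG_VOCAB`, `ALL_TOKENS`, and `is_microlang` are hypothetical, and splitting on whitespace is an assumption that may not match what the actual tokenizer does.

```python
# Token inventory copied from the list above, grouped by category.
# The dictionary layout and all names here are illustrative, not part of the dataset.
MICROLANG_VOCAB = {
    "special": ["<>"],
    "noun": ["bob", "tom", "bike", "speech"],
    "transitive verb": ["take", "use"],
    "intransitive verb": ["talk", "go"],
    "adjective": ["good", "active"],
    "adverb": ["not"],
    "conjunction": ["and", "then", "but"],
    "punctuation": ["."],
}

# Flatten the categories into a single list of tokens.
ALL_TOKENS = [tok for group in MICROLANG_VOCAB.values() for tok in group]


def is_microlang(text):
    """Return True if every whitespace-separated piece is a listed token.

    Assumption: tokens are separated by whitespace, including the final period.
    """
    return all(piece in ALL_TOKENS for piece in text.split())
```

For example, `is_microlang("bob take bike .")` accepts a made-up sentence built only from listed tokens, while any word outside the inventory causes the check to fail.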
The tokenizer can be found at `umarzein/microlang-utils` and can be loaded like this:
```python
import transformers
tokenizer = transformers.PreTrainedTokenizerFast.from_pretrained("umarzein/microlang-utils")
```
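
Once loaded, the tokenizer could be exercised roughly as follows. This is a hedged sketch: the helper names `load_microlang_tokenizer` and `tokenize_sentence` are hypothetical, loading requires network access to the Hugging Face Hub, and how the tokenizer treats spacing or the implicit special token is not documented here, so no particular output is claimed.

```python
def load_microlang_tokenizer():
    """Load the tokenizer from the Hub (needs `transformers` and network access)."""
    import transformers  # imported lazily so the helper below works without it

    return transformers.PreTrainedTokenizerFast.from_pretrained(
        "umarzein/microlang-utils"
    )


def tokenize_sentence(tokenizer, text):
    """Encode `text` and return (ids, tokens) for inspection.

    Works with any object exposing `encode` and `convert_ids_to_tokens`,
    which `PreTrainedTokenizerFast` does.
    """
    ids = tokenizer.encode(text)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return ids, tokens
```

Calling `tokenize_sentence(load_microlang_tokenizer(), "bob take bike .")` with a made-up sentence is one way to inspect which ids and token strings the tokenizer actually produces.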
Provider:
umarzein



