kinianlo/MMTS
Source: Hugging Face. Updated 2023-11-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kinianlo/MMTS
Resource description:
---
dataset_info:
- config_name: laion2B-en-words-count
  features:
  - name: count
    dtype: int64
  - name: word
    dtype: string
  splits:
  - name: train
    num_bytes: 2040588603
    num_examples: 91658096
  download_size: 1365127988
  dataset_size: 2040588603
- config_name: shakespeare_laion2B-en_words
  features:
  - name: word
    dtype: string
  - name: word_lemma
    dtype: string
  - name: tag
    dtype: string
  - name: count_corpus_tag
    dtype: int64
  - name: count_corpus
    dtype: int64
  - name: count_laion2B-en
    dtype: int64
  - name: is_physical_entity
    dtype: bool
  - name: concreteness
    dtype: float64
  - name: concreteness_lemma
    dtype: float64
  splits:
  - name: train
    num_bytes: 1244660
    num_examples: 18548
  download_size: 0
  dataset_size: 1244660
- config_name: shakespeare_words
  features:
  - name: word
    dtype: string
  - name: count_corpus
    dtype: int64
  - name: count_laion2B-en
    dtype: int64
  splits:
  - name: train
    num_bytes: 309689
    num_examples: 11456
  download_size: 193309
  dataset_size: 309689
configs:
- config_name: laion2B-en-words-count
  data_files:
  - split: train
    path: laion2B-en-words-count/train-*
- config_name: shakespeare_laion2B-en_words
  data_files:
  - split: train
    path: shakespeare_laion2B-en_words/train-*
- config_name: shakespeare_words
  data_files:
  - split: train
    path: shakespeare_words/train-*
---
# Dataset Card for "MMTS"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
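
The card itself gives no usage instructions, so the following is only a minimal sketch of how one of the three configurations named in the metadata might be loaded. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the identifier `kinianlo/MMTS`. The helper name `load_mmts_config` and the constant `MMTS_CONFIGS` are hypothetical and introduced here for illustration.

```python
# Config names copied from the dataset card metadata.
MMTS_CONFIGS = (
    "laion2B-en-words-count",
    "shakespeare_laion2B-en_words",
    "shakespeare_words",
)


def load_mmts_config(config_name, split="train", streaming=False):
    """Load one MMTS configuration (hypothetical helper, not part of the dataset)."""
    if config_name not in MMTS_CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {MMTS_CONFIGS}"
        )
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the repository id on the Hub is "kinianlo/MMTS".
    return load_dataset(
        "kinianlo/MMTS", config_name, split=split, streaming=streaming
    )
```

Calling `load_mmts_config("shakespeare_words")` would fetch the smallest configuration (11456 rows per the metadata). For `laion2B-en-words-count`, which the card lists at roughly 1.4 GB to download, passing `streaming=True` may be the more practical choice.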
Provided by:
kinianlo
Summary of original information
Dataset overview
Dataset configurations
laion2B-en-words-count
- Features:
  - count: int64
  - word: string
- Splits:
  - train: 2040588603 bytes, 91658096 examples
- Download size: 1365127988 bytes
- Dataset size: 2040588603 bytes
shakespeare_laion2B-en_words
- Features:
  - word: string
  - word_lemma: string
  - tag: string
  - count_corpus_tag: int64
  - count_corpus: int64
  - count_laion2B-en: int64
  - is_physical_entity: bool
  - concreteness: float64
  - concreteness_lemma: float64
- Splits:
  - train: 1244660 bytes, 18548 examples
- Download size: 0 bytes
- Dataset size: 1244660 bytes
shakespeare_words
- Features:
  - word: string
  - count_corpus: int64
  - count_laion2B-en: int64
- Splits:
  - train: 309689 bytes, 11456 examples
- Download size: 193309 bytes
- Dataset size: 309689 bytes
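
The size figures above are easier to compare once normalized. The sketch below derives the average bytes per row and the download-to-dataset size ratio for each configuration, using only the numbers reported in the metadata. The names `CONFIG_STATS`, `bytes_per_example`, and `download_ratio` are hypothetical. A reported download size of 0 is treated as "not available" rather than as a real measurement, which is an assumption about how that value should be read.

```python
# Figures copied from the dataset card metadata (bytes and row counts).
CONFIG_STATS = {
    "laion2B-en-words-count": {
        "num_bytes": 2040588603,
        "num_examples": 91658096,
        "download_size": 1365127988,
    },
    "shakespeare_laion2B-en_words": {
        "num_bytes": 1244660,
        "num_examples": 18548,
        "download_size": 0,
    },
    "shakespeare_words": {
        "num_bytes": 309689,
        "num_examples": 11456,
        "download_size": 193309,
    },
}


def bytes_per_example(stats):
    """Average size of one row, in bytes."""
    return stats["num_bytes"] / stats["num_examples"]


def download_ratio(stats):
    """Download size as a fraction of dataset size; None if the card reports 0."""
    if stats["download_size"] == 0:
        return None
    return stats["download_size"] / stats["num_bytes"]


for name, stats in CONFIG_STATS.items():
    ratio = download_ratio(stats)
    ratio_text = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{name}: {bytes_per_example(stats):.1f} bytes/row, "
          f"download ratio {ratio_text}")
```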
Data file configuration
laion2B-en-words-count
- Data files:
  - train: path laion2B-en-words-count/train-*
shakespeare_laion2B-en_words
- Data files:
  - train: path shakespeare_laion2B-en_words/train-*
shakespeare_words
- Data files:
  - train: path shakespeare_words/train-*
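
To illustrate how the `train-*` patterns separate the three configurations, here is a small sketch that maps a file path to its configuration using Python's standard `fnmatch` module. This only approximates glob matching and is not necessarily how the Hub resolves `data_files`. The shard file names in the example are hypothetical, since the card does not list the actual files, and `config_for_path` is an illustrative helper.

```python
from fnmatch import fnmatchcase

# Patterns copied from the card's `configs` section.
DATA_FILE_PATTERNS = {
    "laion2B-en-words-count": "laion2B-en-words-count/train-*",
    "shakespeare_laion2B-en_words": "shakespeare_laion2B-en_words/train-*",
    "shakespeare_words": "shakespeare_words/train-*",
}


def config_for_path(path):
    """Return the config whose train pattern matches `path`, or None."""
    for config_name, pattern in DATA_FILE_PATTERNS.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(path, pattern):
            return config_name
    return None


# Hypothetical shard names, used only to exercise the patterns.
example_paths = [
    "shakespeare_words/train-00000-of-00001.parquet",
    "laion2B-en-words-count/train-00003-of-00005.parquet",
    "README.md",
]
for path in example_paths:
    print(path, "->", config_for_path(path))
```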



