kinianlo/prlang
Source: Hugging Face. Updated 2024-02-08; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kinianlo/prlang
Description:
---
dataset_info:
- config_name: conceptnet5_vocabulary_en
  features:
  - name: word
    dtype: string
  - name: tag
    dtype: string
  splits:
  - name: train
    num_bytes: 123167929
    num_examples: 6846008
  download_size: 45799508
  dataset_size: 123167929
- config_name: wiki_20220301_en_nltk_adjectives
  features:
  - name: adj_id
    dtype: uint32
  - name: adj
    dtype: string
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 39119443
    num_examples: 1323576
  download_size: 24403987
  dataset_size: 39119443
- config_name: wiki_20220301_en_nltk_nouns
  features:
  - name: noun_id
    dtype: uint32
  - name: noun
    dtype: string
  splits:
  - name: train
    num_bytes: 12442756.0
    num_examples: 676770
  download_size: 11115529
  dataset_size: 12442756.0
- config_name: wiki_20220301_en_nltk_phrases
  features:
  - name: phrase_id
    dtype: uint32
  - name: adj_id
    dtype: uint32
  - name: noun_id
    dtype: uint32
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 207602960
    num_examples: 10380148
  download_size: 129734024
  dataset_size: 207602960
- config_name: wiki_20220301_en_nltk_phrases_with_string
  features:
  - name: phrase_id
    dtype: uint32
  - name: adj
    dtype: string
  - name: noun
    dtype: string
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 377124084
    num_examples: 10380148
  download_size: 172157247
  dataset_size: 377124084
- config_name: wiki_20220301_en_nltk_uncased_adjectives
  features:
  - name: adj_id
    dtype: uint32
  - name: adj
    dtype: string
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 36784396
    num_examples: 1235601
  download_size: 22724468
  dataset_size: 36784396
- config_name: wiki_20220301_en_nltk_uncased_nouns
  features:
  - name: noun_id
    dtype: uint32
  - name: noun
    dtype: string
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 17153952
    num_examples: 647524
  download_size: 10809791
  dataset_size: 17153952
- config_name: wiki_20220301_en_nltk_uncased_phrases
  features:
  - name: phrase_id
    dtype: uint32
  - name: adj_id
    dtype: uint32
  - name: noun_id
    dtype: uint32
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 198626820
    num_examples: 9931341
  download_size: 124034311
  dataset_size: 198626820
- config_name: wiki_20220301_en_nltk_uncased_phrases_clean
  features:
  - name: phrase_id
    dtype: uint32
  - name: adj_id
    dtype: uint32
  - name: noun_id
    dtype: uint32
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 67986800
    num_examples: 3399340
  download_size: 41983842
  dataset_size: 67986800
- config_name: wiki_20220301_en_nltk_uncased_phrases_with_string
  features:
  - name: phrase_id
    dtype: uint32
  - name: adj
    dtype: string
  - name: noun
    dtype: string
  - name: count
    dtype: uint64
  splits:
  - name: train
    num_bytes: 361160989
    num_examples: 9931341
  download_size: 164282553
  dataset_size: 361160989
- config_name: wiki_20220301_simple_tags_nltk_adjectives
  features:
  - name: id
    dtype: int32
  - name: adjective
    dtype: string
  - name: count
    dtype: int64
  splits:
  - name: train
    num_bytes: 508056
    num_examples: 21152
  download_size: 351437
  dataset_size: 508056
- config_name: wiki_20220301_simple_tags_nltk_contexts
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: noun1_bert_id
    dtype: int64
  - name: noun2_bert_id
    dtype: int64
  - name: adjective1_id
    dtype: int64
  - name: adjective2_id
    dtype: int64
  - name: schema_id
    dtype: int64
  - name: sentence
    dtype: string
  - name: mask_position
    dtype: int64
  splits:
  - name: train
    num_bytes: 5162738640
    num_examples: 34644320
  download_size: 562170983
  dataset_size: 5162738640
- config_name: wiki_20220301_simple_tags_nltk_contexts_epsilon
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: adjective1_id
    dtype: int64
  - name: adjective2_id
    dtype: int64
  - name: schema_id
    dtype: int64
  - name: epsilon
    dtype: float64
  splits:
  - name: train
    num_bytes: 1662927360
    num_examples: 34644320
  download_size: 342106520
  dataset_size: 1662927360
- config_name: wiki_20220301_simple_tags_nltk_contexts_epsilon_no_intro
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: adjective1_id
    dtype: int64
  - name: adjective2_id
    dtype: int64
  - name: schema_id
    dtype: int64
  - name: epsilon
    dtype: float64
  splits:
  - name: train
    num_bytes: 1662927360
    num_examples: 34644320
  download_size: 337961367
  dataset_size: 1662927360
- config_name: wiki_20220301_simple_tags_nltk_contexts_no_intro
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: noun1_bert_id
    dtype: int64
  - name: noun2_bert_id
    dtype: int64
  - name: adjective1_id
    dtype: int64
  - name: adjective2_id
    dtype: int64
  - name: schema_id
    dtype: int64
  - name: sentence
    dtype: string
  - name: mask_position
    dtype: int64
  splits:
  - name: train
    num_bytes: 4022762320
    num_examples: 34644320
  download_size: 285243023
  dataset_size: 4022762320
- config_name: wiki_20220301_simple_tags_nltk_filtered_noun_pairs
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: adjectives_id
    sequence: int64
  splits:
  - name: train
    num_bytes: 25983240
    num_examples: 433054
  download_size: 4499602
  dataset_size: 25983240
- config_name: wiki_20220301_simple_tags_nltk_noun_pairs
  features:
  - name: noun1_id
    dtype: int32
  - name: noun2_id
    dtype: int32
  - name: adjectives_id
    sequence: int32
  splits:
  - name: train
    num_bytes: 125583432
    num_examples: 3245260
  download_size: 44230314
  dataset_size: 125583432
- config_name: wiki_20220301_simple_tags_nltk_nouns
  features:
  - name: id
    dtype: int32
  - name: noun
    dtype: string
  - name: count
    dtype: int64
  splits:
  - name: train
    num_bytes: 221774
    num_examples: 9521
  download_size: 154872
  dataset_size: 221774
- config_name: wiki_20220301_simple_tags_nltk_phrases
  features:
  - name: adjective_id
    dtype: int32
  - name: noun_id
    dtype: int32
  - name: count
    dtype: int64
  splits:
  - name: train
    num_bytes: 3514128
    num_examples: 219633
  download_size: 1993091
  dataset_size: 3514128
- config_name: wiki_20220301_simple_tags_nltk_scenarios
  features:
  - name: noun1_id
    dtype: uint32
  - name: noun2_id
    dtype: uint32
  - name: adjectives_id
    sequence: uint32
  - name: epsilons
    sequence: float64
  splits:
  - name: train
    num_bytes: 2702256960
    num_examples: 51966480
  download_size: 553286399
  dataset_size: 2702256960
- config_name: wiki_20220301_simple_tags_nltk_scenarios_epsilon
  features:
  - name: noun1_id
    dtype: int64
  - name: noun2_id
    dtype: int64
  - name: adjectives_id
    sequence: int64
  - name: epsilons
    sequence: float64
  - name: adjectives_entropy
    dtype: float64
  splits:
  - name: train
    num_bytes: 4157318400
    num_examples: 51966480
  download_size: 0
  dataset_size: 4157318400
configs:
- config_name: conceptnet5_vocabulary_en
  data_files:
  - split: train
    path: conceptnet5_vocabulary_en/train-*
- config_name: wiki_20220301_en_nltk_adjectives
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_adjectives/train-*
- config_name: wiki_20220301_en_nltk_nouns
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_nouns/train-*
- config_name: wiki_20220301_en_nltk_phrases
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_phrases/train-*
- config_name: wiki_20220301_en_nltk_phrases_with_string
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_phrases_with_string/train-*
- config_name: wiki_20220301_en_nltk_uncased_adjectives
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_uncased_adjectives/train-*
- config_name: wiki_20220301_en_nltk_uncased_nouns
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_uncased_nouns/train-*
- config_name: wiki_20220301_en_nltk_uncased_phrases
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_uncased_phrases/train-*
- config_name: wiki_20220301_en_nltk_uncased_phrases_clean
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_uncased_phrases_clean/train-*
- config_name: wiki_20220301_en_nltk_uncased_phrases_with_string
  data_files:
  - split: train
    path: wiki_20220301_en_nltk_uncased_phrases_with_string/train-*
- config_name: wiki_20220301_simple_tags_nltk_adjectives
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_adjectives/train-*
- config_name: wiki_20220301_simple_tags_nltk_contexts
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_contexts/train-*
- config_name: wiki_20220301_simple_tags_nltk_contexts_epsilon
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_contexts_epsilon/train-*
- config_name: wiki_20220301_simple_tags_nltk_contexts_epsilon_no_intro
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_contexts_epsilon_no_intro/train-*
- config_name: wiki_20220301_simple_tags_nltk_contexts_no_intro
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_contexts_no_intro/train-*
- config_name: wiki_20220301_simple_tags_nltk_filtered_noun_pairs
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_filtered_noun_pairs/train-*
- config_name: wiki_20220301_simple_tags_nltk_noun_pairs
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_noun_pairs/train-*
- config_name: wiki_20220301_simple_tags_nltk_nouns
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_nouns/train-*
- config_name: wiki_20220301_simple_tags_nltk_phrases
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_phrases/train-*
- config_name: wiki_20220301_simple_tags_nltk_scenarios
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_scenarios/train-*
- config_name: wiki_20220301_simple_tags_nltk_scenarios_epsilon
  data_files:
  - split: train
    path: wiki_20220301_simple_tags_nltk_scenarios_epsilon/train-*
---
# Dataset Card for "prlang"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
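
Each config in the front matter maps its `train` split to files matching `<config_name>/train-*`. The sketch below shows one way a single config might be loaded with the Hugging Face `datasets` library. It assumes the `datasets` package is installed and the `kinianlo/prlang` repository is reachable; the helper names `data_files_pattern` and `load_prlang_config` are hypothetical and not part of the dataset.

```python
REPO_ID = "kinianlo/prlang"  # repository name as listed on this page


def data_files_pattern(config_name: str, split: str = "train") -> str:
    """Return the glob that the `configs` section assigns to a config's split."""
    return f"{config_name}/{split}-*"


def load_prlang_config(config_name: str, split: str = "train", streaming: bool = True):
    """Load one config of the dataset (hypothetical helper).

    Streaming is the default here because several configs are multiple
    gigabytes in size according to the metadata.
    """
    # Imported lazily so data_files_pattern works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split, streaming=streaming)


if __name__ == "__main__":
    print(data_files_pattern("wiki_20220301_simple_tags_nltk_nouns"))
    # The call below needs network access, so it is left commented out:
    # nouns = load_prlang_config("wiki_20220301_simple_tags_nltk_nouns")
    # print(next(iter(nouns)))
```

With `streaming=True` the result is an iterable dataset, so rows are fetched on demand rather than downloaded in full.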
Provider:
kinianlo
Summary of original information
Dataset overview
Dataset configurations
All configs have a single `train` split. Sizes are in bytes.

| Config | Features | Examples | Bytes | Download size | Dataset size |
|---|---|---|---|---|---|
| conceptnet5_vocabulary_en | word: string, tag: string | 6846008 | 123167929 | 45799508 | 123167929 |
| wiki_20220301_en_nltk_adjectives | adj_id: uint32, adj: string, count: uint64 | 1323576 | 39119443 | 24403987 | 39119443 |
| wiki_20220301_en_nltk_nouns | noun_id: uint32, noun: string | 676770 | 12442756.0 | 11115529 | 12442756.0 |
| wiki_20220301_en_nltk_phrases | phrase_id: uint32, adj_id: uint32, noun_id: uint32, count: uint64 | 10380148 | 207602960 | 129734024 | 207602960 |
| wiki_20220301_en_nltk_phrases_with_string | phrase_id: uint32, adj: string, noun: string, count: uint64 | 10380148 | 377124084 | 172157247 | 377124084 |
| wiki_20220301_en_nltk_uncased_adjectives | adj_id: uint32, adj: string, count: uint64 | 1235601 | 36784396 | 22724468 | 36784396 |
| wiki_20220301_en_nltk_uncased_nouns | noun_id: uint32, noun: string, count: uint64 | 647524 | 17153952 | 10809791 | 17153952 |
| wiki_20220301_en_nltk_uncased_phrases | phrase_id: uint32, adj_id: uint32, noun_id: uint32, count: uint64 | 9931341 | 198626820 | 124034311 | 198626820 |
| wiki_20220301_en_nltk_uncased_phrases_clean | phrase_id: uint32, adj_id: uint32, noun_id: uint32, count: uint64 | 3399340 | 67986800 | 41983842 | 67986800 |
| wiki_20220301_en_nltk_uncased_phrases_with_string | phrase_id: uint32, adj: string, noun: string, count: uint64 | 9931341 | 361160989 | 164282553 | 361160989 |
| wiki_20220301_simple_tags_nltk_adjectives | id: int32, adjective: string, count: int64 | 21152 | 508056 | 351437 | 508056 |
| wiki_20220301_simple_tags_nltk_contexts | noun1_id: int64, noun2_id: int64, noun1_bert_id: int64, noun2_bert_id: int64, adjective1_id: int64, adjective2_id: int64, schema_id: int64, sentence: string, mask_position: int64 | 34644320 | 5162738640 | 562170983 | 5162738640 |
| wiki_20220301_simple_tags_nltk_contexts_epsilon | noun1_id: int64, noun2_id: int64, adjective1_id: int64, adjective2_id: int64, schema_id: int64, epsilon: float64 | 34644320 | 1662927360 | 342106520 | 1662927360 |
| wiki_20220301_simple_tags_nltk_contexts_epsilon_no_intro | noun1_id: int64, noun2_id: int64, adjective1_id: int64, adjective2_id: int64, schema_id: int64, epsilon: float64 | 34644320 | 1662927360 | 337961367 | 1662927360 |
| wiki_20220301_simple_tags_nltk_contexts_no_intro | noun1_id: int64, noun2_id: int64, noun1_bert_id: int64, noun2_bert_id: int64, adjective1_id: int64, adjective2_id: int64, schema_id: int64, sentence: string, mask_position: int64 | 34644320 | 4022762320 | 285243023 | 4022762320 |
| wiki_20220301_simple_tags_nltk_filtered_noun_pairs | noun1_id: int64, noun2_id: int64, adjectives_id: sequence of int64 | 433054 | 25983240 | 4499602 | 25983240 |
| wiki_20220301_simple_tags_nltk_noun_pairs | noun1_id: int32, noun2_id: int32, adjectives_id: sequence of int32 | 3245260 | 125583432 | 44230314 | 125583432 |
| wiki_20220301_simple_tags_nltk_nouns | id: int32, noun: string, count: int64 | 9521 | 221774 | 154872 | 221774 |
| wiki_20220301_simple_tags_nltk_phrases | adjective_id: int32, noun_id: int32, count: int64 | 219633 | 3514128 | 1993091 | 3514128 |
| wiki_20220301_simple_tags_nltk_scenarios | noun1_id: uint32, noun2_id: uint32, adjectives_id: sequence of uint32, epsilons: sequence of float64 | 51966480 | 2702256960 | 553286399 | 2702256960 |
| wiki_20220301_simple_tags_nltk_scenarios_epsilon | noun1_id: int64, noun2_id: int64, adjectives_id: sequence of int64, epsilons: sequence of float64, adjectives_entropy: float64 | 51966480 | 4157318400 | 0 | 4157318400 |
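
For configs whose features are all fixed-width scalars, the reported byte counts match the product of row width and row count. The sketch below checks this for three configs using the figures listed above. The `DTYPE_BYTES` table and the function names are illustrative, and reading `num_bytes` as the uncompressed size of the split is an assumption rather than something the card states.

```python
# Width in bytes of each fixed-width dtype that appears in the metadata.
DTYPE_BYTES = {"int32": 4, "uint32": 4, "int64": 8, "uint64": 8, "float64": 8}


def row_bytes(dtypes):
    """Total width in bytes of one row made of fixed-width columns."""
    return sum(DTYPE_BYTES[d] for d in dtypes)


def expected_num_bytes(dtypes, num_examples):
    """Size of a split if every row occupies exactly row_bytes(dtypes)."""
    return row_bytes(dtypes) * num_examples


# config name -> (column dtypes, num_examples, reported num_bytes)
CONFIGS = {
    "wiki_20220301_en_nltk_phrases": (
        ["uint32", "uint32", "uint32", "uint64"], 10380148, 207602960),
    "wiki_20220301_simple_tags_nltk_phrases": (
        ["int32", "int32", "int64"], 219633, 3514128),
    "wiki_20220301_simple_tags_nltk_contexts_epsilon": (
        ["int64"] * 5 + ["float64"], 34644320, 1662927360),
}

for name, (dtypes, num_examples, reported) in CONFIGS.items():
    matches = expected_num_bytes(dtypes, num_examples) == reported
    print(f"{name}: {row_bytes(dtypes)} bytes/row, matches reported size: {matches}")
```

The same check does not apply to configs with `string` or `sequence` features, whose rows vary in width.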



