kinianlo/prlang

Source: Hugging Face. Updated 2024-02-08; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kinianlo/prlang
Resource description:
Dataset card for "prlang". The card's YAML metadata (dataset_info and configs) declares 21 configurations. Each configuration has a single train split whose data files match <config_name>/train-*, for example conceptnet5_vocabulary_en/train-*. The features, example counts, and sizes of each configuration are itemized per configuration later on this page. The card body itself contains only a placeholder:

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
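As a rough illustration of how one of these configurations might be loaded, the sketch below calls `load_dataset` from the Hugging Face `datasets` library with the repository id and a configuration name. The helper names `train_glob` and `load_config` are hypothetical and not part of the dataset; the import is deferred so the snippet runs without downloading anything until `load_config` is actually called.

```python
REPO_ID = "kinianlo/prlang"


def train_glob(config_name: str) -> str:
    """Return the data_files pattern the card declares for a configuration."""
    return f"{config_name}/train-*"


def load_config(config_name: str, streaming: bool = True):
    """Load the train split of one configuration (requires network access).

    streaming=True iterates over rows lazily instead of downloading the whole
    configuration first, which helps with the multi-gigabyte configurations.
    """
    from datasets import load_dataset  # deferred: optional dependency

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)


if __name__ == "__main__":
    # Only prints the file pattern; no download happens here.
    print(train_glob("wiki_20220301_simple_tags_nltk_nouns"))
```

Streaming is chosen here as a cautious default; passing `streaming=False` would download and cache the full configuration instead.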
Provider:
kinianlo
Summary of original information

Dataset overview

Dataset configurations

conceptnet5_vocabulary_en

  • Features:
    • word: string
    • tag: string
  • Splits:
    • train:
      • Bytes: 123167929
      • Examples: 6846008
  • Download size: 45799508 bytes
  • Dataset size: 123167929 bytes

wiki_20220301_en_nltk_adjectives

  • Features:
    • adj_id: uint32
    • adj: string
    • count: uint64
  • Splits:
    • train:
      • Bytes: 39119443
      • Examples: 1323576
  • Download size: 24403987 bytes
  • Dataset size: 39119443 bytes

wiki_20220301_en_nltk_nouns

  • Features:
    • noun_id: uint32
    • noun: string
  • Splits:
    • train:
      • Bytes: 12442756.0
      • Examples: 676770
  • Download size: 11115529 bytes
  • Dataset size: 12442756.0 bytes

wiki_20220301_en_nltk_phrases

  • Features:
    • phrase_id: uint32
    • adj_id: uint32
    • noun_id: uint32
    • count: uint64
  • Splits:
    • train:
      • Bytes: 207602960
      • Examples: 10380148
  • Download size: 129734024 bytes
  • Dataset size: 207602960 bytes

wiki_20220301_en_nltk_phrases_with_string

  • Features:
    • phrase_id: uint32
    • adj: string
    • noun: string
    • count: uint64
  • Splits:
    • train:
      • Bytes: 377124084
      • Examples: 10380148
  • Download size: 172157247 bytes
  • Dataset size: 377124084 bytes
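The column names suggest that wiki_20220301_en_nltk_phrases_with_string is the id-based wiki_20220301_en_nltk_phrases table with adj_id and noun_id resolved against the adjectives and nouns tables; both have 10380148 examples. The card does not state this, so the following is only a minimal sketch under that assumption. The function `attach_strings` is hypothetical, and the toy rows are invented for illustration rather than taken from the dataset.

```python
from typing import Dict, Iterable, Iterator


def attach_strings(
    phrases: Iterable[dict],
    adjectives: Iterable[dict],
    nouns: Iterable[dict],
) -> Iterator[dict]:
    """Replace adj_id / noun_id in phrase rows with the matching strings."""
    # Build id -> string lookups once, then stream over the phrase rows.
    adj_by_id: Dict[int, str] = {row["adj_id"]: row["adj"] for row in adjectives}
    noun_by_id: Dict[int, str] = {row["noun_id"]: row["noun"] for row in nouns}
    for row in phrases:
        yield {
            "phrase_id": row["phrase_id"],
            "adj": adj_by_id[row["adj_id"]],
            "noun": noun_by_id[row["noun_id"]],
            "count": row["count"],
        }


# Toy rows, invented for illustration only (assumption: not real dataset rows).
toy_adjectives = [
    {"adj_id": 0, "adj": "red", "count": 7},
    {"adj_id": 1, "adj": "old", "count": 5},
]
toy_nouns = [
    {"noun_id": 0, "noun": "car"},
    {"noun_id": 1, "noun": "house"},
]
toy_phrases = [
    {"phrase_id": 0, "adj_id": 0, "noun_id": 0, "count": 4},
    {"phrase_id": 1, "adj_id": 1, "noun_id": 1, "count": 3},
]

joined = list(attach_strings(toy_phrases, toy_adjectives, toy_nouns))
```

In-memory dictionaries are used because the adjective and noun tables are comparatively small (about 1.3 million and 0.7 million rows), while the phrase rows can be streamed one at a time.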

wiki_20220301_en_nltk_uncased_adjectives

  • Features:
    • adj_id: uint32
    • adj: string
    • count: uint64
  • Splits:
    • train:
      • Bytes: 36784396
      • Examples: 1235601
  • Download size: 22724468 bytes
  • Dataset size: 36784396 bytes

wiki_20220301_en_nltk_uncased_nouns

  • Features:
    • noun_id: uint32
    • noun: string
    • count: uint64
  • Splits:
    • train:
      • Bytes: 17153952
      • Examples: 647524
  • Download size: 10809791 bytes
  • Dataset size: 17153952 bytes

wiki_20220301_en_nltk_uncased_phrases

  • Features:
    • phrase_id: uint32
    • adj_id: uint32
    • noun_id: uint32
    • count: uint64
  • Splits:
    • train:
      • Bytes: 198626820
      • Examples: 9931341
  • Download size: 124034311 bytes
  • Dataset size: 198626820 bytes

wiki_20220301_en_nltk_uncased_phrases_clean

  • Features:
    • phrase_id: uint32
    • adj_id: uint32
    • noun_id: uint32
    • count: uint64
  • Splits:
    • train:
      • Bytes: 67986800
      • Examples: 3399340
  • Download size: 41983842 bytes
  • Dataset size: 67986800 bytes
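The byte counts of the three id-based phrase configurations can be cross-checked against their schema. The sketch below assumes that the listed byte count covers only the raw fixed-width column values, with no per-row overhead; that is an assumption, although it agrees with the listed figures. The names `DTYPE_BYTES`, `PHRASE_SCHEMA`, `LISTED`, `row_bytes`, and `expected_bytes` are hypothetical.

```python
# Byte widths of the fixed-size dtypes used by the phrase configurations.
DTYPE_BYTES = {"uint32": 4, "uint64": 8}

# Schema shared by the three id-based phrase configurations.
PHRASE_SCHEMA = {
    "phrase_id": "uint32",
    "adj_id": "uint32",
    "noun_id": "uint32",
    "count": "uint64",
}

# (num_examples, num_bytes) as listed for each configuration on this page.
LISTED = {
    "wiki_20220301_en_nltk_phrases": (10380148, 207602960),
    "wiki_20220301_en_nltk_uncased_phrases": (9931341, 198626820),
    "wiki_20220301_en_nltk_uncased_phrases_clean": (3399340, 67986800),
}


def row_bytes(schema: dict) -> int:
    """Sum the byte widths of all columns in one row."""
    return sum(DTYPE_BYTES[dtype] for dtype in schema.values())


def expected_bytes(schema: dict, num_examples: int) -> int:
    """Raw size of num_examples rows, assuming no per-row overhead."""
    return row_bytes(schema) * num_examples


if __name__ == "__main__":
    for name, (num_examples, num_bytes) in LISTED.items():
        matches = expected_bytes(PHRASE_SCHEMA, num_examples) == num_bytes
        print(name, row_bytes(PHRASE_SCHEMA), matches)
```

Each row comes to 3 × 4 + 8 = 20 bytes, and 20 times the example count reproduces the listed byte count for all three configurations. The same check does not apply to configurations with string or sequence columns, whose rows vary in length.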

wiki_20220301_en_nltk_uncased_phrases_with_string

  • Features:
    • phrase_id: uint32
    • adj: string
    • noun: string
    • count: uint64
  • Splits:
    • train:
      • Bytes: 361160989
      • Examples: 9931341
  • Download size: 164282553 bytes
  • Dataset size: 361160989 bytes

wiki_20220301_simple_tags_nltk_adjectives

  • Features:
    • id: int32
    • adjective: string
    • count: int64
  • Splits:
    • train:
      • Bytes: 508056
      • Examples: 21152
  • Download size: 351437 bytes
  • Dataset size: 508056 bytes

wiki_20220301_simple_tags_nltk_contexts

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • noun1_bert_id: int64
    • noun2_bert_id: int64
    • adjective1_id: int64
    • adjective2_id: int64
    • schema_id: int64
    • sentence: string
    • mask_position: int64
  • Splits:
    • train:
      • Bytes: 5162738640
      • Examples: 34644320
  • Download size: 562170983 bytes
  • Dataset size: 5162738640 bytes

wiki_20220301_simple_tags_nltk_contexts_epsilon

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • adjective1_id: int64
    • adjective2_id: int64
    • schema_id: int64
    • epsilon: float64
  • Splits:
    • train:
      • Bytes: 1662927360
      • Examples: 34644320
  • Download size: 342106520 bytes
  • Dataset size: 1662927360 bytes

wiki_20220301_simple_tags_nltk_contexts_epsilon_no_intro

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • adjective1_id: int64
    • adjective2_id: int64
    • schema_id: int64
    • epsilon: float64
  • Splits:
    • train:
      • Bytes: 1662927360
      • Examples: 34644320
  • Download size: 337961367 bytes
  • Dataset size: 1662927360 bytes

wiki_20220301_simple_tags_nltk_contexts_no_intro

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • noun1_bert_id: int64
    • noun2_bert_id: int64
    • adjective1_id: int64
    • adjective2_id: int64
    • schema_id: int64
    • sentence: string
    • mask_position: int64
  • Splits:
    • train:
      • Bytes: 4022762320
      • Examples: 34644320
  • Download size: 285243023 bytes
  • Dataset size: 4022762320 bytes

wiki_20220301_simple_tags_nltk_filtered_noun_pairs

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • adjectives_id: sequence of int64
  • Splits:
    • train:
      • Bytes: 25983240
      • Examples: 433054
  • Download size: 4499602 bytes
  • Dataset size: 25983240 bytes

wiki_20220301_simple_tags_nltk_noun_pairs

  • Features:
    • noun1_id: int32
    • noun2_id: int32
    • adjectives_id: sequence of int32
  • Splits:
    • train:
      • Bytes: 125583432
      • Examples: 3245260
  • Download size: 44230314 bytes
  • Dataset size: 125583432 bytes

wiki_20220301_simple_tags_nltk_nouns

  • Features:
    • id: int32
    • noun: string
    • count: int64
  • Splits:
    • train:
      • Bytes: 221774
      • Examples: 9521
  • Download size: 154872 bytes
  • Dataset size: 221774 bytes

wiki_20220301_simple_tags_nltk_phrases

  • Features:
    • adjective_id: int32
    • noun_id: int32
    • count: int64
  • Splits:
    • train:
      • Bytes: 3514128
      • Examples: 219633
  • Download size: 1993091 bytes
  • Dataset size: 3514128 bytes

wiki_20220301_simple_tags_nltk_scenarios

  • Features:
    • noun1_id: uint32
    • noun2_id: uint32
    • adjectives_id: sequence of uint32
    • epsilons: sequence of float64
  • Splits:
    • train:
      • Bytes: 2702256960
      • Examples: 51966480
  • Download size: 553286399 bytes
  • Dataset size: 2702256960 bytes

wiki_20220301_simple_tags_nltk_scenarios_epsilon

  • Features:
    • noun1_id: int64
    • noun2_id: int64
    • adjectives_id: sequence of int64
    • epsilons: sequence of float64
    • adjectives_entropy: float64
  • Splits:
    • train:
      • Bytes: 4157318400
      • Examples: 51966480
  • Download size: 0 bytes
  • Dataset size: 4157318400 bytes