
MoritzLaurer/dataset_test_disaggregated_nli

Source: Hugging Face | Updated: 2023-11-29 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/MoritzLaurer/dataset_test_disaggregated_nli
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: mnli_m
    path: data/mnli_m-*
  - split: mnli_mm
    path: data/mnli_mm-*
  - split: fevernli
    path: data/fevernli-*
  - split: anli_r1
    path: data/anli_r1-*
  - split: anli_r2
    path: data/anli_r2-*
  - split: anli_r3
    path: data/anli_r3-*
  - split: wanli
    path: data/wanli-*
  - split: lingnli
    path: data/lingnli-*
  - split: wellformedquery
    path: data/wellformedquery-*
  - split: rottentomatoes
    path: data/rottentomatoes-*
  - split: amazonpolarity
    path: data/amazonpolarity-*
  - split: imdb
    path: data/imdb-*
  - split: yelpreviews
    path: data/yelpreviews-*
  - split: hatexplain
    path: data/hatexplain-*
  - split: massive
    path: data/massive-*
  - split: banking77
    path: data/banking77-*
  - split: emotiondair
    path: data/emotiondair-*
  - split: emocontext
    path: data/emocontext-*
  - split: empathetic
    path: data/empathetic-*
  - split: agnews
    path: data/agnews-*
  - split: yahootopics
    path: data/yahootopics-*
  - split: biasframes_sex
    path: data/biasframes_sex-*
  - split: biasframes_offensive
    path: data/biasframes_offensive-*
  - split: biasframes_intent
    path: data/biasframes_intent-*
  - split: financialphrasebank
    path: data/financialphrasebank-*
  - split: appreviews
    path: data/appreviews-*
  - split: hateoffensive
    path: data/hateoffensive-*
  - split: trueteacher
    path: data/trueteacher-*
  - split: spam
    path: data/spam-*
  - split: wikitoxic_toxicaggregated
    path: data/wikitoxic_toxicaggregated-*
  - split: wikitoxic_obscene
    path: data/wikitoxic_obscene-*
  - split: wikitoxic_identityhate
    path: data/wikitoxic_identityhate-*
  - split: wikitoxic_threat
    path: data/wikitoxic_threat-*
  - split: wikitoxic_insult
    path: data/wikitoxic_insult-*
  - split: manifesto
    path: data/manifesto-*
  - split: capsotu
    path: data/capsotu-*
dataset_info:
  features:
  - name: text
    dtype: string
  - name: hypothesis
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': entailment
          '1': not_entailment
  - name: task_name
    dtype: string
  - name: label_text
    dtype: string
  splits:
  - name: mnli_m
    num_bytes: 2055427
    num_examples: 9815
  - name: mnli_mm
    num_bytes: 2181179
    num_examples: 9832
  - name: fevernli
    num_bytes: 7532028
    num_examples: 19652
  - name: anli_r1
    num_bytes: 433064
    num_examples: 1000
  - name: anli_r2
    num_bytes: 432927
    num_examples: 1000
  - name: anli_r3
    num_bytes: 501290
    num_examples: 1200
  - name: wanli
    num_bytes: 940472
    num_examples: 5000
  - name: lingnli
    num_bytes: 1078241
    num_examples: 4893
  - name: wellformedquery
    num_bytes: 815799
    num_examples: 5934
  - name: rottentomatoes
    num_bytes: 493664
    num_examples: 2132
  - name: amazonpolarity
    num_bytes: 10798222
    num_examples: 20000
  - name: imdb
    num_bytes: 27862150
    num_examples: 20000
  - name: yelpreviews
    num_bytes: 15688830
    num_examples: 20000
  - name: hatexplain
    num_bytes: 710204
    num_examples: 2922
  - name: massive
    num_bytes: 23911774
    num_examples: 175466
  - name: banking77
    num_bytes: 40018400
    num_examples: 221760
  - name: emotiondair
    num_bytes: 2202560
    num_examples: 12000
  - name: emocontext
    num_bytes: 3575972
    num_examples: 22036
  - name: empathetic
    num_bytes: 52139926
    num_examples: 81344
  - name: agnews
    num_bytes: 9630696
    num_examples: 30400
  - name: yahootopics
    num_bytes: 343270530
    num_examples: 500000
  - name: biasframes_sex
    num_bytes: 1830030
    num_examples: 8808
  - name: biasframes_offensive
    num_bytes: 1785704
    num_examples: 7676
  - name: biasframes_intent
    num_bytes: 1592094
    num_examples: 7296
  - name: financialphrasebank
    num_bytes: 514854
    num_examples: 2070
  - name: appreviews
    num_bytes: 2414054
    num_examples: 8000
  - name: hateoffensive
    num_bytes: 493480
    num_examples: 2586
  - name: trueteacher
    num_bytes: 24821652
    num_examples: 17910
  - name: spam
    num_bytes: 292810
    num_examples: 2070
  - name: wikitoxic_toxicaggregated
    num_bytes: 9026954
    num_examples: 20000
  - name: wikitoxic_obscene
    num_bytes: 7951550
    num_examples: 17382
  - name: wikitoxic_identityhate
    num_bytes: 5734460
    num_examples: 11424
  - name: wikitoxic_threat
    num_bytes: 5174652
    num_examples: 10422
  - name: wikitoxic_insult
    num_bytes: 7364528
    num_examples: 16854
  - name: manifesto
    num_bytes: 417565056
    num_examples: 953008
  - name: capsotu
    num_bytes: 24646828
    num_examples: 70455
  download_size: 10536386
  dataset_size: 1057482061
---
# Dataset Card for "dataset_test_disaggregated_nli"

Dataset for testing a universal classifier. Additional information and training code available here: https://github.com/MoritzLaurer/zeroshot-classifier
Provided by:
MoritzLaurer
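Summary of Original Information

Dataset Overview

Dataset Configuration

  • Default configuration: contains multiple data files, each corresponding to a different data split.

Data Files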

  • mnli_m: path data/mnli_m-*
  • mnli_mm: path data/mnli_mm-*
  • fevernli: path data/fevernli-*
  • anli_r1: path data/anli_r1-*
  • anli_r2: path data/anli_r2-*
  • anli_r3: path data/anli_r3-*
  • wanli: path data/wanli-*
  • lingnli: path data/lingnli-*
  • wellformedquery: path data/wellformedquery-*
  • rottentomatoes: path data/rottentomatoes-*
  • amazonpolarity: path data/amazonpolarity-*
  • imdb: path data/imdb-*
  • yelpreviews: path data/yelpreviews-*
  • hatexplain: path data/hatexplain-*
  • massive: path data/massive-*
  • banking77: path data/banking77-*
  • emotiondair: path data/emotiondair-*
  • emocontext: path data/emocontext-*
  • empathetic: path data/empathetic-*
  • agnews: path data/agnews-*
  • yahootopics: path data/yahootopics-*
  • biasframes_sex: path data/biasframes_sex-*
  • biasframes_offensive: path data/biasframes_offensive-*
  • biasframes_intent: path data/biasframes_intent-*
  • financialphrasebank: path data/financialphrasebank-*
  • appreviews: path data/appreviews-*
  • hateoffensive: path data/hateoffensive-*
  • trueteacher: path data/trueteacher-*
  • spam: path data/spam-*
  • wikitoxic_toxicaggregated: path data/wikitoxic_toxicaggregated-*
  • wikitoxic_obscene: path data/wikitoxic_obscene-*
  • wikitoxic_identityhate: path data/wikitoxic_identityhate-*
  • wikitoxic_threat: path data/wikitoxic_threat-*
  • wikitoxic_insult: path data/wikitoxic_insult-*
  • manifesto: path data/manifesto-*
  • capsotu: path data/capsotu-*
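
Because every data file above is registered as a named split of the default configuration, a single split can in principle be requested by name. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable over the network; the helper name `load_split` and the constant names are illustrative and not part of the dataset card.

```python
# Repository ID as given on this page.
REPO_ID = "MoritzLaurer/dataset_test_disaggregated_nli"

# Split names copied from the data file list above.
SPLITS = (
    "mnli_m", "mnli_mm", "fevernli", "anli_r1", "anli_r2", "anli_r3",
    "wanli", "lingnli", "wellformedquery", "rottentomatoes",
    "amazonpolarity", "imdb", "yelpreviews", "hatexplain", "massive",
    "banking77", "emotiondair", "emocontext", "empathetic", "agnews",
    "yahootopics", "biasframes_sex", "biasframes_offensive",
    "biasframes_intent", "financialphrasebank", "appreviews",
    "hateoffensive", "trueteacher", "spam", "wikitoxic_toxicaggregated",
    "wikitoxic_obscene", "wikitoxic_identityhate", "wikitoxic_threat",
    "wikitoxic_insult", "manifesto", "capsotu",
)


def load_split(split: str):
    """Load one named split (hypothetical helper; needs `datasets` and network access)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the name check above also works without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    # Small split, chosen only to keep the download short.
    ds = load_split("anli_r1")
    print(ds)
```

Validating the name before calling the library gives a clear local error for typos instead of a failed remote lookup.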

Dataset Information

Features

  • text: string
  • hypothesis: string
  • labels: class_label with two classes: entailment (0) and not_entailment (1)
  • task_name: string
  • label_text: string
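
To make the feature schema concrete, here is a small sketch of one row as a Python dataclass, with a helper that maps the integer `labels` value back to its class name. The field names and the 0/1 mapping come from the card; the `Row` class, the `label_name` helper, and the example values are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass

# Index order follows the card: '0' -> entailment, '1' -> not_entailment.
LABEL_NAMES = ("entailment", "not_entailment")


@dataclass
class Row:
    text: str
    hypothesis: str
    labels: int      # class_label index, 0 or 1
    task_name: str
    label_text: str


def label_name(label_id: int) -> str:
    """Return the class name for a class_label index."""
    if label_id not in (0, 1):
        raise ValueError(f"labels must be 0 or 1, got {label_id}")
    return LABEL_NAMES[label_id]


# Illustrative values only; they were invented for this sketch.
example = Row(
    text="The app crashes every time I open it.",
    hypothesis="This review is negative.",
    labels=0,
    task_name="appreviews",
    label_text="negative",
)
print(label_name(example.labels))  # entailment
```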

Data Splits

  • mnli_m: 2055427 bytes, 9815 examples
  • mnli_mm: 2181179 bytes, 9832 examples
  • fevernli: 7532028 bytes, 19652 examples
  • anli_r1: 433064 bytes, 1000 examples
  • anli_r2: 432927 bytes, 1000 examples
  • anli_r3: 501290 bytes, 1200 examples
  • wanli: 940472 bytes, 5000 examples
  • lingnli: 1078241 bytes, 4893 examples
  • wellformedquery: 815799 bytes, 5934 examples
  • rottentomatoes: 493664 bytes, 2132 examples
  • amazonpolarity: 10798222 bytes, 20000 examples
  • imdb: 27862150 bytes, 20000 examples
  • yelpreviews: 15688830 bytes, 20000 examples
  • hatexplain: 710204 bytes, 2922 examples
  • massive: 23911774 bytes, 175466 examples
  • banking77: 40018400 bytes, 221760 examples
  • emotiondair: 2202560 bytes, 12000 examples
  • emocontext: 3575972 bytes, 22036 examples
  • empathetic: 52139926 bytes, 81344 examples
  • agnews: 9630696 bytes, 30400 examples
  • yahootopics: 343270530 bytes, 500000 examples
  • biasframes_sex: 1830030 bytes, 8808 examples
  • biasframes_offensive: 1785704 bytes, 7676 examples
  • biasframes_intent: 1592094 bytes, 7296 examples
  • financialphrasebank: 514854 bytes, 2070 examples
  • appreviews: 2414054 bytes, 8000 examples
  • hateoffensive: 493480 bytes, 2586 examples
  • trueteacher: 24821652 bytes, 17910 examples
  • spam: 292810 bytes, 2070 examples
  • wikitoxic_toxicaggregated: 9026954 bytes, 20000 examples
  • wikitoxic_obscene: 7951550 bytes, 17382 examples
  • wikitoxic_identityhate: 5734460 bytes, 11424 examples
  • wikitoxic_threat: 5174652 bytes, 10422 examples
  • wikitoxic_insult: 7364528 bytes, 16854 examples
  • manifesto: 417565056 bytes, 953008 examples
  • capsotu: 24646828 bytes, 70455 examples

Dataset Size

  • Download size: 10536386 bytes
  • Dataset size: 1057482061 bytes
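
The per-split figures and the overall size can be cross-checked with a few lines of arithmetic. The sketch below copies the byte and example counts from the card and confirms that the split sizes add up to the reported dataset size; the variable names are illustrative.

```python
# (num_bytes, num_examples) per split, copied from the card.
SPLIT_STATS = {
    "mnli_m": (2055427, 9815),
    "mnli_mm": (2181179, 9832),
    "fevernli": (7532028, 19652),
    "anli_r1": (433064, 1000),
    "anli_r2": (432927, 1000),
    "anli_r3": (501290, 1200),
    "wanli": (940472, 5000),
    "lingnli": (1078241, 4893),
    "wellformedquery": (815799, 5934),
    "rottentomatoes": (493664, 2132),
    "amazonpolarity": (10798222, 20000),
    "imdb": (27862150, 20000),
    "yelpreviews": (15688830, 20000),
    "hatexplain": (710204, 2922),
    "massive": (23911774, 175466),
    "banking77": (40018400, 221760),
    "emotiondair": (2202560, 12000),
    "emocontext": (3575972, 22036),
    "empathetic": (52139926, 81344),
    "agnews": (9630696, 30400),
    "yahootopics": (343270530, 500000),
    "biasframes_sex": (1830030, 8808),
    "biasframes_offensive": (1785704, 7676),
    "biasframes_intent": (1592094, 7296),
    "financialphrasebank": (514854, 2070),
    "appreviews": (2414054, 8000),
    "hateoffensive": (493480, 2586),
    "trueteacher": (24821652, 17910),
    "spam": (292810, 2070),
    "wikitoxic_toxicaggregated": (9026954, 20000),
    "wikitoxic_obscene": (7951550, 17382),
    "wikitoxic_identityhate": (5734460, 11424),
    "wikitoxic_threat": (5174652, 10422),
    "wikitoxic_insult": (7364528, 16854),
    "manifesto": (417565056, 953008),
    "capsotu": (24646828, 70455),
}
DOWNLOAD_SIZE = 10536386    # bytes, as reported
DATASET_SIZE = 1057482061   # bytes, as reported

total_bytes = sum(num_bytes for num_bytes, _ in SPLIT_STATS.values())
total_examples = sum(num_examples for _, num_examples in SPLIT_STATS.values())
largest_split = max(SPLIT_STATS, key=lambda name: SPLIT_STATS[name][1])

print("split bytes match dataset_size:", total_bytes == DATASET_SIZE)
print("total examples:", total_examples)
print("largest split by examples:", largest_split)
print("dataset_size / download_size:", round(DATASET_SIZE / DOWNLOAD_SIZE, 1))
```

The ratio printed on the last line only compares the two reported numbers; the card does not explain why the download is so much smaller than the loaded dataset.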