
Someman/hindi-summarization

Hugging Face · Updated 2023-05-30 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Someman/hindi-summarization
Resource description:
---
license: mit
task_categories:
- summarization
language: hi
original_source: >-
  https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus
dataset_info:
  features:
  - name: headline
    dtype: string
  - name: summary
    dtype: string
  - name: article
    dtype: string
  splits:
  - name: train
    num_bytes: 410722079.5542422
    num_examples: 55226
  - name: test
    num_bytes: 102684238.44575782
    num_examples: 13807
  - name: valid
    num_bytes: 128376473
    num_examples: 17265
  download_size: 150571314
  dataset_size: 641782791
pretty_name: hindi summarization
size_categories:
- 10K<n<100K
---

# Dataset Card for hindi-summarization

## Dataset Description

- Homepage: https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus?select=test.csv

### Dataset Summary

Hindi Text Short and Large Summarization Corpus is a collection of ~180k articles, with their headlines and summaries, collected from Hindi news websites. It is the first dataset of its kind in Hindi and can be used to benchmark Hindi text summarization models. It does not include the articles in the Hindi Text Short Summarization Corpus, which is being released in parallel with this dataset. The articles retain their original punctuation, numbers, etc.

### Languages

The language is Hindi.

### Licensing Information

MIT

### Citation Information

https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus?select=test.csv

### Contributions
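The card above declares three string columns and three named splits. The following is a minimal sketch of loading the dataset with the Hugging Face `datasets` library and comparing what arrives against that metadata. It makes these assumptions:

- `datasets` is installed (`pip install datasets`).
- The Hub is reachable. If it is not, `huggingface_hub` can be pointed at a mirror such as the one in the download link by setting the `HF_ENDPOINT` environment variable before Python starts, assuming the mirror serves the same API.

`EXPECTED_COLUMNS`, `EXPECTED_SPLITS`, `schema_problems` and `load_and_check` are hypothetical names introduced for illustration. They are not part of the dataset or the library.

```python
from typing import Dict, Iterable, List

DATASET_ID = "Someman/hindi-summarization"

# Column and split names/sizes copied from the card's dataset_info metadata.
EXPECTED_COLUMNS = {"headline", "summary", "article"}
EXPECTED_SPLITS = {"train": 55226, "test": 13807, "valid": 17265}


def schema_problems(split_columns: Dict[str, Iterable[str]],
                    split_rows: Dict[str, int]) -> List[str]:
    """Return human-readable mismatches between loaded data and the card."""
    problems = []
    for split, expected_rows in EXPECTED_SPLITS.items():
        if split not in split_rows:
            problems.append(f"missing split: {split}")
            continue
        if split_rows[split] != expected_rows:
            problems.append(
                f"{split}: {split_rows[split]} rows, card says {expected_rows}")
        missing = EXPECTED_COLUMNS - set(split_columns.get(split, ()))
        if missing:
            problems.append(f"{split}: missing columns {sorted(missing)}")
    return problems


def load_and_check():
    """Download the dataset and compare it with the card (needs network)."""
    # Lazy import so the helper above works without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID)  # DatasetDict keyed by split name
    columns = {name: split.column_names for name, split in ds.items()}
    rows = {name: split.num_rows for name, split in ds.items()}
    return ds, schema_problems(columns, rows)
```

Calling `ds, problems = load_and_check()` should yield an empty `problems` list as long as the hosted files still match the card.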
Provided by:
Someman
Summary of Original Information

Dataset Overview

Dataset Name

  • Name: Hindi Text Short and Large Summarization Corpus

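Dataset Description

  • Description: The dataset contains about 180,000 articles collected from Hindi news websites, each with its headline and summary. It is the first dataset of its kind for benchmarking Hindi text summarization models.

Language

  • Language: Hindi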

License Information

  • License: MIT

Dataset Features

  • Features:
    • Name: headline
      • Type: string
    • Name: summary
      • Type: string
    • Name: article
      • Type: string
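The feature list gives the three fields but does not say which one a summarization model should predict. The following is a minimal sketch of turning one row into a (source, target) pair. It assumes the common setup of using `article` as the source and either `summary` or the typically shorter `headline` as the target. `SummarizationPair` and `to_pair` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The three fields listed in the card; all have dtype string.
FIELDS = ("headline", "summary", "article")


@dataclass(frozen=True)
class SummarizationPair:
    source: str
    target: str


def to_pair(row: Mapping[str, Any], target_field: str = "summary") -> SummarizationPair:
    """Build a (source, target) pair from one row, using `article` as the source.

    Assumption: `summary` or `headline` is the target; the card itself does not
    prescribe a task setup.
    """
    if target_field not in ("summary", "headline"):
        raise ValueError(f"target_field must be 'summary' or 'headline', got {target_field!r}")
    for name in FIELDS:
        if not isinstance(row.get(name), str):
            raise TypeError(f"field {name!r} is missing or not a string")
    # strip() only trims leading/trailing whitespace; punctuation and numbers
    # inside the text are kept, as the card says the corpus retains them.
    return SummarizationPair(source=row["article"].strip(), target=row[target_field].strip())
```

With the data loaded as `ds = load_dataset("Someman/hindi-summarization")`, the expression `[to_pair(r) for r in ds["valid"]]` builds pairs for the validation split, since iterating a `Dataset` yields one dict per row.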

Dataset Splits

  • Train:
    • Examples: 55,226
    • Size: 410,722,079.5542422 bytes
  • Test:
    • Examples: 13,807
    • Size: 102,684,238.44575782 bytes
  • Validation (split name: valid):
    • Examples: 17,265
    • Size: 128,376,473 bytes
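The split counts can be cross-checked with a little arithmetic. Together they give 86,298 examples, which is consistent with the 10K<n<100K size category in the card. That total is smaller than the ~180k articles mentioned in the description. This sketch uses only the numbers listed above; `SPLITS` and `split_summary` are illustrative names.

```python
# Example counts per split, copied from the card's split metadata.
SPLITS = {"train": 55226, "test": 13807, "valid": 17265}


def split_summary(splits):
    """Return the total example count and each split's share of it."""
    total = sum(splits.values())
    shares = {name: n / total for name, n in splits.items()}
    return total, shares


total, shares = split_summary(SPLITS)
print(total)  # 86298
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The loop prints shares of 64.0%, 16.0% and 20.0% for train, test and valid respectively.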

Dataset Size

  • Download size: 150,571,314 bytes
  • Dataset size: 641,782,791 bytes
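The byte figures are internally consistent. The three per-split `num_bytes` values add up to the reported dataset size, and the download is roughly 23% of that size. The gap is presumably because the downloaded files are compressed while `dataset_size` describes the prepared data. This sketch checks both points using only the card's numbers. `SPLIT_BYTES` and `byte_report` are illustrative names.

```python
import math

# Byte counts copied from the card; the fractional train/test values appear
# exactly as published in the metadata.
SPLIT_BYTES = {"train": 410722079.5542422, "test": 102684238.44575782, "valid": 128376473}
DOWNLOAD_SIZE = 150571314
DATASET_SIZE = 641782791
MIB = 1024 ** 2


def byte_report():
    """Compare per-split byte counts with the totals and express sizes in MiB."""
    split_total = math.fsum(SPLIT_BYTES.values())  # fsum limits float rounding error
    return {
        "splits_match_dataset_size": math.isclose(split_total, DATASET_SIZE, abs_tol=1.0),
        "download_mib": DOWNLOAD_SIZE / MIB,
        "dataset_mib": DATASET_SIZE / MIB,
        "download_ratio": DOWNLOAD_SIZE / DATASET_SIZE,
    }


for key, value in byte_report().items():
    print(key, value if isinstance(value, bool) else round(value, 3))
```

A tolerance of one byte is used because the fractional train/test values only sum to a whole number up to floating-point rounding.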

Dataset Category

  • Size category: 10K<n<100K