Someman/hindi-summarization
Source: Hugging Face. Updated 2023-05-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Someman/hindi-summarization
Resource description:
---
license: mit
task_categories:
- summarization
language: hi
original_source: >-
  https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus
dataset_info:
  features:
  - name: headline
    dtype: string
  - name: summary
    dtype: string
  - name: article
    dtype: string
  splits:
  - name: train
    num_bytes: 410722079.5542422
    num_examples: 55226
  - name: test
    num_bytes: 102684238.44575782
    num_examples: 13807
  - name: valid
    num_bytes: 128376473
    num_examples: 17265
  download_size: 150571314
  dataset_size: 641782791
pretty_name: hindi summarization
size_categories:
- 10K<n<100K
---
# Dataset Card for hindi-summarization
## Dataset Description
- Homepage: https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus?select=test.csv
### Dataset Summary
Hindi Text Short and Large Summarization Corpus is a collection of ~180k articles, with their headlines and summaries, collected from Hindi news websites.
It is a first-of-its-kind dataset in Hindi that can be used to benchmark models for Hindi text summarization. It does not contain the articles in the Hindi Text Short Summarization Corpus, which is being released in parallel with this dataset.
The dataset retains the original punctuation, numbers, etc. in the articles.
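
As a minimal sketch of how the corpus might be loaded, the snippet below assumes the Hugging Face `datasets` library is installed and that the repository ID from the download link above is reachable. The helper name `load_split` is hypothetical; the split names come from this card's metadata.

```python
REPO_ID = "Someman/hindi-summarization"
SPLITS = ("train", "test", "valid")  # split names listed in the card metadata


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub (requires network access)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def preview(split: str = "valid") -> None:
    """Print the headline, summary, and article length of the first record."""
    row = load_split(split)[0]
    print(row["headline"])
    print(row["summary"])
    print(len(row["article"]), "characters in the article")
```

Calling `preview("valid")` downloads the data on first use and prints one record; nothing is fetched until one of the functions is called.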
### Languages
The language is Hindi.
### Licensing Information
MIT
### Citation Information
https://www.kaggle.com/datasets/disisbig/hindi-text-short-and-large-summarization-corpus?select=test.csv
### Contributions
Provider:
Someman
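Original information summary
Dataset overview
Dataset name
- Name: Hindi Text Short and Large Summarization Corpus
Dataset description
- Description: The dataset contains about 180,000 articles from Hindi news websites, each with a headline and a summary. It is described as the first dataset of its kind for benchmarking Hindi text summarization models.
Language
- Language: Hindi
License information
- License: MIT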
Dataset features
- Features:
  - Name: headline
    - Type: string
  - Name: summary
    - Type: string
  - Name: article
    - Type: string
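
The three string features listed above can be sketched as a small record type. This is an illustration only: the class name `Example` and the helper `parse_row` are hypothetical, and the sample row uses placeholder strings rather than real data.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Example:
    """One record, mirroring the three string features of the dataset."""
    headline: str
    summary: str
    article: str


def parse_row(row: dict) -> Example:
    """Check that a raw row has the three string fields and wrap it."""
    names = [f.name for f in fields(Example)]
    missing = [n for n in names if n not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    for n in names:
        if not isinstance(row[n], str):
            raise TypeError(f"field {n!r} must be a string")
    return Example(**{n: row[n] for n in names})


# Placeholder row, not taken from the dataset.
sample = parse_row({
    "headline": "<headline text>",
    "summary": "<summary text>",
    "article": "<full article text>",
})
print(sample.headline)
```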
Dataset splits
- Train:
  - Examples: 55,226
  - Bytes: 410,722,079.5542422
- Test:
  - Examples: 13,807
  - Bytes: 102,684,238.44575782
- Validation:
  - Examples: 17,265
  - Bytes: 128,376,473
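
To make the split proportions concrete, the short calculation below totals the example counts given above and prints each split's share. The function name `split_shares` is hypothetical. The total comes to 86,298 examples, fewer than the roughly 180k articles quoted in the description, which may refer to the original Kaggle corpus rather than this copy.

```python
# Example counts copied from the split listing above.
NUM_EXAMPLES = {"train": 55_226, "test": 13_807, "valid": 17_265}


def split_shares(counts: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(NUM_EXAMPLES.values())
shares = split_shares(NUM_EXAMPLES)

print(total)  # 86298
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # train: 64.0%, test: 16.0%, valid: 20.0%
```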
Dataset size
- Download size: 150,571,314 bytes
- Dataset size: 641,782,791 bytes
Dataset category
- Size category: 10K<n<100K
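
The size figures can be cross-checked with a little arithmetic. The sketch below assumes that the download size measures the files as stored on the Hub and the dataset size measures the loaded data; the variable names are illustrative.

```python
# Figures copied from the card metadata.
NUM_BYTES = {
    "train": 410_722_079.5542422,
    "test": 102_684_238.44575782,
    "valid": 128_376_473,
}
DATASET_SIZE = 641_782_791
DOWNLOAD_SIZE = 150_571_314
TOTAL_EXAMPLES = 55_226 + 13_807 + 17_265

# The per-split byte counts should add up to the reported dataset size.
bytes_sum = sum(NUM_BYTES.values())
print(round(bytes_sum) == DATASET_SIZE)  # True

# Download size relative to the loaded dataset size.
ratio = DOWNLOAD_SIZE / DATASET_SIZE
print(f"{ratio:.1%}")  # 23.5%

# Average size of one record (headline + summary + article).
mean_bytes = DATASET_SIZE / TOTAL_EXAMPLES
print(f"{mean_bytes:.0f} bytes per example")  # 7437 bytes per example
```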



