
India News Headlines Dataset

Source: www.kaggle.com; updated 2023-11-11; indexed 2025-01-16
Download link:
https://www.kaggle.com/therohk/india-headlines-news-dataset
Description:
### Context

This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to Q2 2023, recorded in real time by the journalists of India. It contains approximately 3.8 million events published by Times of India. The majority of the data focuses on Indian local news, including national, city-level and entertainment coverage.

Prepared by **Rohit Kulkarni**

### Content

Time Range: Start Date: 2001-01-01; End Date: 2023-06-30

CSV Rows: 3,876,557

Columns:

1. **publish_date**: Date the article was published online, in yyyyMMdd format
2. **headline_category**: Category of the headline; ASCII, dot-delimited, lowercase values
3. **headline_text**: Text of the headline in English, ASCII characters only

### Inspiration

Times Group, as a news agency, reaches a very wide audience across Asia and dwarfs every other agency in the quantity of English articles published per day. Due to the heavy daily volume (avg. 600 articles) over multiple years, this data offers deep insight into Indian society, its priorities, events, issues and talking points, and how they have unfolded over time.

It is possible to chop this dataset into smaller pieces based on one or more facets:

- Time Range: Headlines during the 2006 Mumbai bombings, the 2014 election, the ongoing health crisis
- One or more Categories: like [Citywise](/therohk/india-news-publishing-trends-and-cities), Bollywood, ICC updates, Magazine, Middle East
- One or more Keywords: like crime or ecology related [tokens](https://www.kaggle.com/code/therohk/times-of-india-tokens-library), names of political parties, celebrities, corporations

Similar news datasets exploring other attributes, countries and topics can be seen on my profile.
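The three columns and the facets described above (time range, category, keyword) can be illustrated with a minimal sketch using only the Python standard library. It assumes the CSV has a header row with the column names listed under Content. The function names `load_headlines` and `filter_headlines`, and the three sample rows (including their category values), are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import date, datetime

# Invented sample rows in the documented three-column layout (not real data).
SAMPLE = """publish_date,headline_category,headline_text
20060712,city.mumbai,Example headline about a city event
20140516,india,Example headline about an election result
20200401,entertainment.hindi.bollywood,Example headline about a film release
"""


def load_headlines(fileobj):
    """Read rows and parse publish_date from yyyyMMdd into a date object."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "publish_date": datetime.strptime(row["publish_date"], "%Y%m%d").date(),
            "headline_category": row["headline_category"],
            "headline_text": row["headline_text"],
        })
    return rows


def filter_headlines(rows, start=None, end=None, category_prefix=None, keyword=None):
    """Keep rows matching every facet that is not None."""
    selected = []
    for row in rows:
        if start is not None and row["publish_date"] < start:
            continue
        if end is not None and row["publish_date"] > end:
            continue
        if category_prefix is not None:
            cat = row["headline_category"]
            # Match whole dot-delimited segments only, so "enter" does not
            # match "entertainment.hindi.bollywood".
            if cat != category_prefix and not cat.startswith(category_prefix + "."):
                continue
        if keyword is not None and keyword.lower() not in row["headline_text"].lower():
            continue
        selected.append(row)
    return selected


rows = load_headlines(io.StringIO(SAMPLE))
in_2014 = filter_headlines(rows, start=date(2014, 1, 1), end=date(2014, 12, 31))
entertainment = filter_headlines(rows, category_prefix="entertainment")
```

To run this against the downloaded file instead of the sample, pass an open file handle to `load_headlines`; for the full 3.8 million rows, a streaming variant that filters while reading may be preferable to building a list in memory.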

Provider:
Kaggle
Collected Summary
Dataset Introduction
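Background and Challenges
Background Overview
This dataset contains 3.8 million news headlines published by the Times of India between 2001 and 2023. It mainly covers Indian local news, including national, city-level and entertainment content, and has three fields: publish date, headline category and headline text.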
The summary above was collected and generated by 遇见数据集.