
ISNA-Set

Source: arXiv. Updated: 2018-08-22. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/1808.07046v1
Description:
ISNA-Set is an English-language news dataset created by the Iranian Students News Agency (ISNA). It contains 1,015 news articles across several categories, including politics and the economy. The articles were collected from ISNA's English website with a web crawler, and the dataset was built through HTML parsing, data cleaning, and model generation. It is intended to support natural language processing research on news text, in particular sentiment analysis, entity extraction, and part-of-speech tagging.
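The HTML parsing and data cleaning steps named in the description can be illustrated with a minimal sketch. This is not the dataset authors' pipeline: the class `ParagraphExtractor`, the helpers `clean_text` and `parse_article`, and the assumption that article text lives in `<p>` elements are all hypothetical, chosen only to show the general shape of such a step using Python's standard library.

```python
from html.parser import HTMLParser
import re


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # raw text of each finished <p>
        self._buffer = []      # text chunks of the <p> currently open
        self._in_p = False
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer))
            self._in_p = False

    def handle_data(self, data):
        # Character references such as &amp; arrive already decoded.
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)


def clean_text(text):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def parse_article(html):
    """Return the cleaned, non-empty paragraphs of one article page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    cleaned = [clean_text(p) for p in parser.paragraphs]
    return [p for p in cleaned if p]


if __name__ == "__main__":
    # Made-up markup for illustration; not taken from the ISNA website.
    sample = "<html><body><p>  First   paragraph </p><p> </p></body></html>"
    print(parse_article(sample))
```

A real crawler would also need to fetch pages, respect the site's crawling policy, and extract fields such as title, category, and publication date, none of which this listing specifies.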
Provider:
Center of Excellence for Intelligent Computing and Intelligent Information Processing
Created:
2018-08-22