jrtec/Superheroes
Hugging Face | Updated: 2023-01-08 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jrtec/Superheroes
Resource description:
---
license: cc0-1.0
task_categories:
- summarization
language:
- en
tags:
- superheroes
- heroes
- anime
- manga
- marvel
size_categories:
- 1K<n<10K
---
# Dataset Card for Superheroes
## Dataset Description
History and powers descriptions for 1400+ superheroes, intended for text mining and NLP. [Original source](https://www.kaggle.com/datasets/jonathanbesomi/superheroes-nlp-dataset/code?resource=download)
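As a starting point, the dataset can probably be loaded with the Hugging Face `datasets` library. The sketch below is a minimal example under assumptions: it uses the repository ID `jrtec/Superheroes` from this page, the helper name `load_superheroes` is hypothetical, and the available splits and column names are not stated in this card, so inspect the returned object before relying on them.

```python
# Repository ID as listed on this page.
DATASET_ID = "jrtec/Superheroes"


def load_superheroes():
    """Download the dataset from the Hugging Face Hub and return it.

    Hypothetical helper for illustration. Requires network access and
    the third-party `datasets` package (`pip install datasets`).
    """
    # Imported lazily so the helper can be defined without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


# Usage (not executed here because it needs network access):
# dataset = load_superheroes()
# print(dataset)  # shows the available splits and column names
```

Printing the returned object is a cheap way to check which splits and columns actually exist before writing any analysis code.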
## Context
The aim of this dataset is to make text analytics and NLP even more fun. All of us have dreamed of being a superhero and saving the world, yet we are still on Kaggle figuring out how Python works. So why not improve our NLP skills by analyzing superheroes' histories and powers?
What sets this dataset apart is that it contains categorical and numerical features such as overall_score, intelligence_score, creator, alignment, gender and eye_color, as well as the text features history_text and powers_text. Combining the two can yield many interesting insights.
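One simple way to combine a categorical feature with a text feature is to compare text length across categories. The sketch below averages the word count of `history_text` per `alignment` value. The helper `mean_word_count_by` is hypothetical, the toy rows are invented rather than taken from the dataset, and the `"Unknown"` bucket for missing categories is an assumption of this example.

```python
from collections import defaultdict


def mean_word_count_by(rows, category_field, text_field):
    """Average whitespace-separated word count of text_field per category value."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        # Missing or empty categories are grouped under "Unknown" (an assumption).
        category = row.get(category_field) or "Unknown"
        text = row.get(text_field) or ""
        totals[category] += len(text.split())
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in counts}


# Invented toy rows for illustration only; real rows come from the dataset.
toy_rows = [
    {"alignment": "Good", "history_text": "Raised on a farm far from home"},
    {"alignment": "Good", "history_text": "Trained for years"},
    {"alignment": "Bad", "history_text": "Betrayed by a former ally"},
    {"alignment": "", "history_text": None},
]

averages = mean_word_count_by(toy_rows, "alignment", "history_text")
for alignment, average in sorted(averages.items()):
    print(f"{alignment}: {average:.1f} words on average")
```

The same function works for any categorical column (for example `gender` or `creator`) paired with either text column.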
## Content
We collected all data from superherodb and prepared it for you in a clean tabular format.
The dataset contains 1447 different Superheroes. Each superhero row has:
* overall_score - derived by superherodb from the power stats features. Can you find the relationship?
* history_text - History of the Superhero (text features)
* powers_text - Description of the Superhero's powers (text features)
* intelligence_score, strength_score, speed_score, durability_score, power_score and combat_score. (power stats features)
* "Origin" (full_name, alter_egos, …)
* "Connections" (occupation, base, teams, …)
* "Appearance" (gender, type_race, height, weight, eye_color, …)
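A first step toward the question about overall_score is to check how strongly each power stat correlates with it. The sketch below computes a Pearson correlation per stat using only the standard library. It is an exploratory example, not the formula superherodb uses. The sample rows are invented, and skipping non-numeric values (such as `∞` or `-` in the toy data) is an assumption about how to handle entries that do not parse as numbers.

```python
import csv
import io
import math

POWER_STATS = [
    "intelligence_score", "strength_score", "speed_score",
    "durability_score", "power_score", "combat_score",
]


def to_float(value):
    """Return value as a finite float, or None if it is missing or not numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def stat_correlations(rows):
    """Correlate each power stat with overall_score, skipping unparsable rows."""
    result = {}
    for stat in POWER_STATS:
        pairs = [(to_float(r.get(stat)), to_float(r.get("overall_score"))) for r in rows]
        pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            result[stat] = pearson(xs, ys)
    return result


# Invented sample rows for illustration only; they are not real dataset values.
SAMPLE_CSV = """name,overall_score,intelligence_score,strength_score,speed_score,durability_score,power_score,combat_score
Hero A,10,80,60,50,70,90,65
Hero B,5,60,30,25,40,45,50
Hero C,∞,100,100,100,100,100,100
Hero D,-,50,10,10,10,20,30
Hero E,7,70,40,35,55,60,55
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
correlations = stat_correlations(rows)
for stat, r in sorted(correlations.items(), key=lambda item: -item[1]):
    print(f"{stat}: r = {r:.3f}")
```

A high correlation only suggests that a stat matters. Recovering the actual relationship would likely need a regression or a closer look at how superherodb defines the score.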
## Acknowledgements
The following [GitHub repository](https://github.com/jbesomi/texthero/tree/master/dataset/Superheroes%20NLP%20Dataset) contains the code used to scrape this dataset.
Provider:
jrtec
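Summary of original information
Dataset overview
Basic information
- License: CC0-1.0
- Task category: summarization
- Language: English
- Tags: superheroes, heroes, anime, manga, marvel
- Size category: 1K<n<10K
Dataset description
- Name: Superheroes
- Description: History and powers descriptions for more than 1400 superheroes, intended for text mining and natural language processing.
- Original source: Kaggle
Content
- Data source: collected from superherodb and organized into a tabular format.
- Dataset size: 1447 different superheroes.
- Data fields:
  - Overall score (overall_score)
  - History text (history_text)
  - Powers description (powers_text)
  - Power stats (intelligence_score, strength_score, speed_score, durability_score, power_score, combat_score)
  - Origin information (full_name, alter_egos, …)
  - Connections information (occupation, base, teams, …)
  - Appearance information (gender, type_race, height, weight, eye_color, …)
Dataset purpose
- Purpose: make text analytics and natural language processing more fun, and improve NLP skills by analyzing superheroes' histories and powers.
Dataset characteristics
- Characteristics: combines categorical and numerical features with text features, which together can yield rich insights.
Acknowledgements
- Code source: GitHub repository containing the code used to scrape this dataset.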



