jinaai/cities_wiki_clustering
Source: Hugging Face | Updated: 2023-10-27 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jinaai/cities_wiki_clustering
Description:
---
language:
- en
---
# WikiCities Clustering Dataset
This dataset was created from the [Wikipedia](https://huggingface.co/datasets/wikipedia) training dataset by taking a list of countries,
retrieving all cities for each country, and then finding each city's corresponding article in the Wikipedia dataset. Postprocessing
removed the bottom 25% of countries ranked by number of city articles and kept at most 200 articles per country.
The final set covers 126 countries and 3531 cities in total.
Below is a distribution of cities by country.
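
The postprocessing described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' actual script: the function name `postprocess`, its parameters, the tie-breaking rule, and the choice to keep the first articles in input order when a country exceeds the cap are all assumptions, and the toy data is made up.

```python
from collections import Counter, defaultdict

def postprocess(articles, drop_fraction=0.25, max_per_country=200):
    """Filter (country, city) pairs.

    Hypothetical helper: drops the `drop_fraction` of countries with the
    fewest city articles, then caps each remaining country at
    `max_per_country` articles (keeping the first ones in input order).
    """
    counts = Counter(country for country, _ in articles)
    # Rank countries from fewest to most articles; ties broken by name (assumption).
    ranked = sorted(counts, key=lambda c: (counts[c], c))
    n_drop = int(len(ranked) * drop_fraction)
    kept = set(ranked[n_drop:])

    taken = defaultdict(int)
    result = []
    for country, city in articles:
        if country in kept and taken[country] < max_per_country:
            taken[country] += 1
            result.append((country, city))
    return result

# Toy input: four made-up countries with 1, 2, 3 and 5 city articles.
toy = (
    [("A", "a0")]
    + [("B", f"b{i}") for i in range(2)]
    + [("C", f"c{i}") for i in range(3)]
    + [("D", f"d{i}") for i in range(5)]
)
filtered = postprocess(toy, drop_fraction=0.25, max_per_country=3)
per_country = Counter(country for country, _ in filtered)
print(per_country)
```

With the toy input, the country with the fewest articles is dropped and the largest one is capped at three articles; the real dataset uses a cap of 200.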

Provider:
jinaai
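Summary of the original information
WikiCities Clustering Dataset
Dataset overview
- Source: created from the Wikipedia training dataset.
- Creation process:
  - Take a list of countries and retrieve all cities for each country.
  - Find the corresponding article for each city in the Wikipedia dataset.
  - Postprocessing:
    - Remove the bottom 25% of countries ranked by number of city articles.
    - Keep at most 200 articles per country.
- Final dataset:
  - 126 countries.
  - 3531 cities.
Data distribution
- A chart of the number of cities per country is provided.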



