
Enriched Tourism Dataset Paris (POIs)

Source: doi.org; indexed 2025-01-15
Download link:
http://doi.org/10.17632/vh4g4g2322.1
Description:
This dataset contains the Paris subset of the Tourpedia dataset, restricted to points of interest (POIs) categorized as attractions (original data available at http://tour-pedia.org/download/paris-attraction.csv). The original dataset comprises 4,351 entries covering a variety of attractions across Paris. Each POI has a unique identifier, POI name, category, location information (address), latitude, longitude, specific details, and user-generated reviews. The review fields contain textual feedback from users, aggregated from platforms such as Google Places, Foursquare, and Facebook, and give qualitative insight into each location.

Because the original dataset has a high proportion of incomplete or inconsistently structured entries, a rigorous cleaning process was applied. Erroneous and incomplete data points were removed, reducing the dataset to 477 entries that meet the criteria for quality and structural coherence. The selected entries were then validated further to ensure data integrity, giving a more accurate representation of Paris's attractions.

- Paris.csv
  Contains columns for a unique identifier, POI name, category, location information (address), latitude, longitude, specific details, and user-generated reviews. The reviews were retrieved from Google Places, Foursquare, and Facebook, pre-processed, and are provided in several formats: all words, nouns only, nouns + verbs, nouns + adjectives, and nouns + verbs + adjectives.

- Paris_annotated.csv
  Contains the ground truth for the previous file: manual annotations made by humans that assign each POI to one or more of 12 pre-defined categories. It has the following columns:
  * POI name
  * POI's address
  * One column for each of the 12 categories. A value of 1 means that the POI belongs to the category; a blank means that it does not.
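The 1/blank encoding described for Paris_annotated.csv can be turned into a per-POI list of categories with a few lines of Python. The following is a minimal sketch only: the description does not give the actual header names or the names of the 12 categories, so `name`, `address`, and `category_1` to `category_3` are placeholders, the sample rows are invented for illustration, and the helper `read_annotations` is hypothetical.

```python
import csv
import io

# Invented sample in the layout the description gives: POI name, POI address,
# then one column per category. Header names here are placeholders, not the
# real ones, and only 3 of the 12 category columns are shown.
SAMPLE = """name,address,category_1,category_2,category_3
POI A,1 Example Street,1,,1
POI B,2 Example Street,,1,
"""


def read_annotations(handle, key_columns=("name", "address")):
    """Map (name, address) to the categories marked with 1, in column order."""
    reader = csv.DictReader(handle)
    # Every column that is not part of the key is treated as a category flag.
    category_columns = [c for c in reader.fieldnames if c not in key_columns]
    labels = {}
    for row in reader:
        key = tuple(row[c].strip() for c in key_columns)
        # "1" means the POI belongs to the category; blank means it does not.
        labels[key] = [
            c for c in category_columns if (row[c] or "").strip() == "1"
        ]
    return labels


labels = read_annotations(io.StringIO(SAMPLE))
for poi, categories in labels.items():
    print(poi, categories)
```

To read the real file, one would pass `open("Paris_annotated.csv", newline="", encoding="utf-8")` instead of the in-memory sample and set `key_columns` to the file's actual name and address headers. The same (name, address) key could then be used to look up the matching row in Paris.csv, assuming the two files spell those fields identically.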

Provider:
Mendeley Data