Enriched Tourism Dataset London (POIs)

DataCite Commons | Updated: 2026-05-10 | Added: 2026-02-09
Download link:
https://figshare.com/articles/dataset/Enriched_Tourism_Dataset_London_POIs_/27628029
Description:
====================================================================
OVERVIEW
====================================================================
This dataset contains an enriched and curated subset of Points of Interest (POIs) located in London, derived from the original Tourpedia dataset. The dataset has been specifically designed for research purposes in areas such as recommender systems, tourism analytics, natural language processing, machine learning, semantic enrichment, and knowledge graph construction.

The original Tourpedia London attractions dataset contained several thousand entries collected from multiple online social platforms. Due to the presence of incomplete, duplicated, inconsistent, or noisy entries, a rigorous cleaning and validation process was performed. The final dataset contains validated POIs with high-quality metadata and manually curated annotations.

The dataset is intended to support reproducible research in tourism recommendation, semantic analysis, user modelling, and artificial intelligence applications.

====================================================================
DATASET CONTENTS
====================================================================
The dataset includes the following files:

1. London.csv
Main enriched dataset containing information about tourism-related POIs in London.
Direct download: https://figshare.com/ndownloader/files/50292426
Format: CSV (UTF-8)

2. London_annotated.csv
Ground-truth annotations manually created by human annotators.
The annotations categorise POIs into predefined tourism-related categories.
Direct download: https://figshare.com/ndownloader/files/56874800
Format: CSV (UTF-8)

====================================================================
PROVENANCE
====================================================================
Original source: Tourpedia (http://tour-pedia.org)
Original file: http://tour-pedia.org/download/london-attraction.csv

====================================================================
DATA PROCESSING PIPELINE
====================================================================
The following processing steps were applied:
1. Removal of incomplete entries
2. Elimination of duplicated POIs
3. Correction of malformed fields
4. Validation of geographical coordinates
5. Cleaning and normalisation of textual reviews
6. Semantic enrichment of POI descriptions
7. Manual validation of selected entries
8. Human annotation of POI categories

====================================================================
DATASET VERSION
====================================================================
Version: 1.0
DOI: 10.6084/m9.figshare.27628029
Persistent URL: https://doi.org/10.6084/m9.figshare.27628029

====================================================================
LICENSE
====================================================================
This dataset is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
License URL: https://creativecommons.org/licenses/by/4.0/

Users are free to:
- Share
- Adapt
- Redistribute
provided that appropriate attribution is given.

====================================================================
ACCESS INFORMATION
====================================================================
This is an open-access dataset. No authentication or registration is required to access the files.

====================================================================
RECOMMENDED CITATION
====================================================================
Please cite the following publication when using this dataset:
R. Hermoso, S. Ilarri, R. Trillo-Lado, & C. Marzo. Recommending Needles in a Haystack: The SURGE Approach. International Journal of Geographical Information Science, 2025.
Dataset DOI: https://doi.org/10.6084/m9.figshare.27628029

====================================================================
FILE DESCRIPTION
====================================================================
London.csv
Description: This file contains the enriched tourism POIs.

The dataset includes:
- Unique POI identifiers
- POI names
- Categories
- Addresses
- Geographical coordinates
- Additional POI details
- User-generated reviews

Reviews were originally collected from:
- Google Places
- Foursquare
- Facebook

Different preprocessed versions of the reviews are included, such as:
- Raw text
- Nouns only
- Nouns + verbs
- Nouns + adjectives
- Nouns + verbs + adjectives

====================================================================
LONDON.CSV COLUMNS
====================================================================
id
Type: Integer
Description: Unique POI identifier

name
Type: String
Description: Name of the POI

category
Type: String
Description: Tourism category

address
Type: String
Description: Postal address

latitude
Type: Float
Description: WGS84 latitude

longitude
Type: Float
Description: WGS84 longitude

details
Type: String
Description: Additional POI information

review_text
Type: String
Description: User-generated review

review_nouns
Type: String
Description: Review nouns only

review_nouns_verbs
Type: String
Description: Review nouns and verbs

review_nouns_adjectives
Type: String
Description: Review nouns and adjectives

review_nouns_verbs_adjectives
Type: String
Description: Review nouns, verbs, and adjectives

====================================================================
LONDON_ANNOTATED.CSV
====================================================================
Description: This file contains manually created ground-truth annotations for the POIs. Each POI may belong to one or more tourism-related categories. Annotations were manually validated by human annotators.

====================================================================
LONDON_ANNOTATED.CSV COLUMNS
====================================================================
poi_name
Type: String
Description: Name of the POI

address
Type: String
Description: POI address

category_*
Type: Binary
Description: Category membership (1 = the POI belongs to the category, 0 = it does not)

====================================================================
SEMANTIC RESOURCES AND INTEROPERABILITY
====================================================================
The dataset follows interoperable and reusable standards whenever possible.

Semantic standards:
- schema.org/Dataset
- schema.org/Place
- WGS84 geospatial vocabulary
- DataCite Metadata Schema

Related technologies:
- Knowledge graphs
- Semantic enrichment
- Recommender systems
- Natural language processing

Suggested controlled keywords:
- Tourism recommender systems
- Artificial Intelligence
- Machine Learning
- Semantic Web
- Tourism Analytics
- POI Recommendation
- schema.org
- GeoNames
- WGS84

====================================================================
METADATA STANDARDS
====================================================================
The dataset metadata is exposed through:
- DataCite Metadata Schema
- schema.org

====================================================================
DATA REUSABILITY NOTES
====================================================================
The dataset is provided in open, text-based formats to facilitate:
- Long-term preservation
- Reproducibility
- Reusability
- Interoperability

Recommended software:
- Python
- R
- MATLAB
- Weka
- RapidMiner
- Apache Spark
- Pandas

====================================================================
CHARACTER ENCODING
====================================================================
All files use UTF-8 encoding.

====================================================================
FILE FORMATS
====================================================================
Formats included:
- CSV
- TXT
These formats are open and machine-readable.

====================================================================
AUTHORS
====================================================================
Ramon Hermoso
University of Zaragoza
ORCID: https://orcid.org/0000-0002-1517-2820

Sergio Ilarri
University of Zaragoza

Raquel Trillo-Lado
University of Zaragoza

====================================================================
RELATED RESOURCES
====================================================================
Dataset DOI: https://doi.org/10.6084/m9.figshare.27628029
Related publication: https://doi.org/10.1080/13658816.2025.2582692
Original dataset: http://tour-pedia.org

====================================================================
CONTACT
====================================================================
For questions, issues, or collaborations related to this dataset, please contact the dataset authors through their institutional affiliations.
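The processing pipeline described above has already been applied to London.csv, but a reader may still want to re-check a local copy before using it. The following is a minimal Python sketch, using only the standard `csv` module, that mirrors steps 1, 2 and 4 (incomplete entries, duplicates, coordinate validation). It assumes that London.csv has a header row with the column names listed under LONDON.CSV COLUMNS, that the file holds one row per POI, and that duplicates can be detected by the `id` column. The names `load_pois` and `REQUIRED` are hypothetical and are not part of the dataset or of the authors' tooling.

```python
import csv
import io

# Assumption (hypothetical): these documented London.csv columns must be
# non-empty for a row to count as complete.
REQUIRED = ("id", "name", "latitude", "longitude")


def load_pois(text):
    """Parse London.csv content (a UTF-8 decoded string) into a list of dicts.

    Rows with missing required fields, unparsable or out-of-range WGS84
    coordinates, or an already-seen `id` are skipped. These checks are a
    hypothetical re-implementation of pipeline steps 1, 2 and 4.
    """
    pois, seen_ids = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        # Step 1: skip incomplete entries (missing or blank required fields).
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        # Step 4: coordinates must parse as floats...
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except ValueError:
            continue
        # ...and lie within the WGS84 ranges (NaN fails both comparisons).
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        # Step 2: keep only the first row for each POI identifier.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        row["latitude"], row["longitude"] = lat, lon
        pois.append(row)
    return pois
```

To run it on the downloaded file, open it with `open("London.csv", encoding="utf-8", newline="")` and pass the result of `f.read()` to `load_pois`. If the file turns out to store one row per review rather than per POI, the duplicate check on `id` would discard reviews and should be dropped or keyed differently.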
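London_annotated.csv stores categories as one binary `category_*` column per category, so each POI's label set has to be read off the columns set to 1. The following is a minimal standard-library sketch. It assumes that membership values are written as `0`/`1`, and, because the documented annotation columns include no `id`, it assumes that annotations can be matched to London.csv rows by pairing (`poi_name`, `address`) with (`name`, `address`). The helper names `load_annotations` and `attach_labels` are hypothetical.

```python
import csv
import io


def load_annotations(text, prefix="category_"):
    """Map (poi_name, address) to the categories flagged 1 in London_annotated.csv."""
    reader = csv.DictReader(io.StringIO(text))
    # Every header starting with the prefix is treated as one binary category column.
    category_cols = [c for c in (reader.fieldnames or []) if c.startswith(prefix)]
    labels = {}
    for row in reader:
        key = ((row["poi_name"] or "").strip(), (row["address"] or "").strip())
        # Assumption: membership is written as "1"; any other value means "not a member".
        labels[key] = [c[len(prefix):] for c in category_cols
                       if (row.get(c) or "").strip() == "1"]
    return labels


def attach_labels(pois, labels):
    """Return copies of POI dicts (with `name` and `address` keys) plus a `labels` entry.

    Matching on (name, address) is a hypothetical join key; POIs without a
    matching annotation get `labels = None`.
    """
    return [
        {**poi, "labels": labels.get(((poi.get("name") or "").strip(),
                                      (poi.get("address") or "").strip()))}
        for poi in pois
    ]
```

Returning `None` rather than an empty list for unmatched POIs keeps "not annotated" distinct from "annotated with no category", which matters when the annotations are used as ground truth for evaluation.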
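The preprocessed review columns (for example `review_nouns_adjectives`) lend themselves to simple content-based comparisons between POIs. The sketch below is not the SURGE approach from the cited publication; it is a plain bag-of-words cosine similarity written with the Python standard library. It assumes that each review column holds whitespace-separated tokens and that a POI may appear in several rows (for instance one per review), so tokens are aggregated per `id`. The names `term_profiles`, `cosine` and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def term_profiles(rows, column="review_nouns_adjectives"):
    """Aggregate the whitespace-separated tokens of one review column per POI id."""
    profiles = {}
    for row in rows:
        tokens = (row.get(column) or "").lower().split()
        profiles.setdefault(row["id"], Counter()).update(tokens)
    return profiles


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(profiles, poi_id, k=5):
    """Return up to k other POI ids ranked by cosine similarity to poi_id."""
    target = profiles[poi_id]
    scored = [(cosine(target, prof), other)
              for other, prof in profiles.items() if other != poi_id]
    # Highest similarity first; ties broken by id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]
```

For example, `most_similar(term_profiles(rows), "1")` ranks the other POIs by review-vocabulary overlap with POI `1`, where `rows` are dicts such as those produced by `csv.DictReader` over London.csv. Swapping the `column` argument makes it easy to compare how the raw text and the different part-of-speech variants affect the ranking.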
Provider: figshare
Created: 2024-11-07