Enriched Tourism Dataset Paris (POIs)
Source: DataCite Commons. Updated: 2026-05-10. Indexed: 2026-02-09.
Download link:
https://figshare.com/articles/dataset/Enriched_Tourism_Dataset_Paris_POIs_/27628344
Resource description:
## Dataset Contents

The dataset includes the following files:

### 1. Paris.csv

Main enriched dataset containing information about tourism-related POIs in Paris.

Direct download: https://figshare.com/ndownloader/files/50293953
Format: CSV (UTF-8)

### 2. Paris_annotated.csv

Ground-truth annotations manually created by human annotators. The annotations categorise POIs into predefined tourism-related categories.

Direct download: https://figshare.com/ndownloader/files/56874794
Format: CSV (UTF-8)

---

## Provenance

### Original Source

The dataset was derived from:

Tourpedia: http://tour-pedia.org
Original file: http://tour-pedia.org/download/paris-attraction.csv

---

## Data Processing Pipeline

The following processing steps were applied:

1. Removal of incomplete entries
2. Elimination of duplicated POIs
3. Correction of malformed fields
4. Validation of geographical coordinates
5. Cleaning and normalisation of textual reviews
6. Semantic enrichment of POI descriptions
7. Manual validation of selected entries
8. Human annotation of POI categories

---

## Dataset Version

Version: 1.0
Publication date: 2025-10-24
Last modification date: 2026-05-10
DOI: 10.6084/m9.figshare.27628344
Persistent URL: https://doi.org/10.6084/m9.figshare.27628344

---

## License

This dataset is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

License URL: https://creativecommons.org/licenses/by/4.0/

Users are free to:

- Share
- Adapt
- Redistribute

provided that appropriate attribution is given.

---

## Access Information

This is an open-access dataset.
No authentication or registration is required to access the files.

---

## Recommended Citation

Please cite the following publication when using this dataset:

R. Hermoso, S. Ilarri, R. Trillo-Lado, & C. Marzo. Recommending Needles in a Haystack: The SURGE Approach. International Journal of Geographical Information Systems, 2025.

Dataset DOI: https://doi.org/10.6084/m9.figshare.27628344

---

# File Description

## Paris.csv

### Description

This file contains the enriched tourism POIs.

The dataset includes:

- Unique POI identifiers
- POI names
- Categories
- Addresses
- Geographical coordinates
- Additional POI details
- User-generated reviews

Reviews were originally collected from:

- Google Places
- Foursquare
- Facebook

Different preprocessed versions of the reviews are included, such as:

- Raw text
- Nouns only
- Nouns + verbs
- Nouns + adjectives
- Nouns + verbs + adjectives

---

## Paris.csv Columns

| Column | Type | Description |
|---|---|---|
| id | Integer | Unique POI identifier |
| name | String | Name of the POI |
| category | String | Tourism category |
| address | String | Postal address |
| latitude | Float | WGS84 latitude |
| longitude | Float | WGS84 longitude |
| details | String | Additional POI information |
| review_text | String | User-generated review |
| review_nouns | String | Review nouns only |
| review_nouns_verbs | String | Review nouns and verbs |
| review_nouns_adjectives | String | Review nouns and adjectives |
| review_nouns_verbs_adjectives | String | Review nouns, verbs, and adjectives |

---

# Paris_annotated.csv

## Description

This file contains manually created ground-truth annotations for the POIs.
Each POI may belong to one or more tourism-related categories.
Annotations were manually validated by human annotators.

---

## Paris_annotated.csv Columns

| Column | Type | Description |
|---|---|---|
| poi_name | String | Name of the POI |
| address | String | POI address |
| category_* | Binary | Category membership (1 = belongs to category) |

---

## Semantic Resources and Interoperability

The dataset follows interoperable and reusable standards whenever possible.

### Semantic Standards

- schema.org/Dataset
- schema.org/Place
- WGS84 geospatial vocabulary
- DataCite Metadata Schema

### Related Technologies

- Knowledge Graphs
- Semantic enrichment
- Recommender systems
- Natural Language Processing

### Suggested Controlled Keywords

- Tourism recommender systems
- Artificial Intelligence
- Machine Learning
- Knowledge Graphs
- Semantic Web
- Tourism Analytics
- POI Recommendation
- schema.org
- GeoNames
- WGS84

---

## Metadata Standards

The dataset metadata is exposed through:

- DataCite Metadata Schema
- schema.org
- JSON-LD

---

## Data Reusability Notes

The dataset is provided in open, text-based formats to facilitate:

- Long-term preservation
- Reproducibility
- Reusability
- Interoperability

Recommended software:

- Python
- R
- MATLAB
- Weka
- RapidMiner
- Apache Spark
- Pandas

---

## Character Encoding

All files use UTF-8 encoding.

---

## File Formats

- CSV
- Markdown
- JSON-LD

These formats are open and machine-readable.

---

## Dataset Size

Approximate dataset size: 1.7 MB

---

## Checksums

### Paris.csv

SHA256: TO_BE_COMPLETED

### Paris_annotated.csv

SHA256: TO_BE_COMPLETED

---

## Authors

### Ramon Hermoso

University of Zaragoza
ORCID: https://orcid.org/0000-0002-1517-2820

### Sergio Ilarri

University of Zaragoza

### Raquel Trillo-Lado

University of Zaragoza

---

## Related Resources

### Dataset DOI

https://doi.org/10.6084/m9.figshare.27628344

### Related Publication

https://doi.org/10.1080/13658816.2025.2582692

### Original Dataset

http://tour-pedia.org

---

## Contact

For questions, issues, or collaborations related to this dataset, please contact the dataset authors through their institutional affiliations.
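
The column descriptions for Paris.csv and Paris_annotated.csv above can be turned into a small loading sketch. The following is a minimal, dependency-free illustration using Python's standard `csv` module, not the authors' own tooling. It rests on several assumptions: that a POI can be matched across the two files by pairing `name`/`address` with `poi_name`/`address` (the README does not state a join key), that a `category_*` cell holding `1` means membership, and that the sample rows and the category names `category_museum` and `category_park` are purely hypothetical stand-ins for the real data.

```python
import csv
import io


def read_rows(handle):
    """Parse CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def has_valid_wgs84(row):
    """True if latitude/longitude parse as floats within the WGS84 range."""
    try:
        lat = float(row["latitude"])
        lon = float(row["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def categories_by_poi(annotation_rows):
    """Map (poi_name, address) to the sorted category_* columns set to 1.

    Assumption: (poi_name, address) identifies a POI; the README does not
    document the key that links the two files.
    """
    result = {}
    for row in annotation_rows:
        key = (row["poi_name"], row["address"])
        result[key] = sorted(
            col
            for col, value in row.items()
            if isinstance(col, str)
            and col.startswith("category_")
            and (value or "").strip() == "1"
        )
    return result


# Hypothetical rows for illustration only; not taken from the dataset.
pois = read_rows(io.StringIO(
    "id,name,address,latitude,longitude\n"
    "1,Example Museum,1 Rue Exemple,48.86,2.34\n"
    "2,Broken Entry,2 Rue Exemple,148.86,2.34\n"
))
annotations = read_rows(io.StringIO(
    "poi_name,address,category_museum,category_park\n"
    "Example Museum,1 Rue Exemple,1,0\n"
))

labels = categories_by_poi(annotations)
valid_pois = [poi for poi in pois if has_valid_wgs84(poi)]
for poi in valid_pois:
    key = (poi["name"], poi["address"])
    print(poi["name"], labels.get(key, []))
```

To run this against the real files, the in-memory samples could be replaced with handles such as `open("Paris.csv", encoding="utf-8", newline="")`. If names or addresses turn out to differ between the two files, the assumed key would need to be revisited.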
Provider:
figshare
Created:
2024-11-07



