
fsq-os-places

ModelScope community · Updated 2026-01-06 · Indexed 2024-12-07
Download link:
https://modelscope.cn/datasets/AI-ModelScope/fsq-os-places
Description:
Foursquare OS Places is now a gated dataset on Hugging Face. Read more about why we are making this change here: https://medium.com/@foursquare/evolving-fsq-os-places-fa7a3f5197cd

# Access FSQ OS Places

With Foursquare’s Open Source Places, you can access free data to accelerate geospatial innovation and insights. View the [Places OS Data Schemas](https://docs.foursquare.com/data-products/docs/places-os-data-schema) for a full list of available attributes.

## Prerequisites

To access Foursquare's Open Source Places data, we recommend using Spark. Here is how to load the Places data in Spark from Hugging Face.

- For Spark 3, you can use the `read_parquet` helper function from the [HF Spark documentation](https://huggingface.co/docs/hub/datasets-spark). It provides an easy API to load a Spark DataFrame from Hugging Face without having to download the full dataset locally:

  ```python
  # read_parquet is the helper defined in the HF Spark documentation linked above
  places = read_parquet("hf://datasets/foursquare/fsq-os-places/release/dt=2025-12-16/places/parquet/*.parquet")
  ```

- For Spark 4, there will be an official Hugging Face Spark data source available.

Alternatively, you can download the following files to your local disk or cluster:

- Parquet Files:
  - **Places** - [release/dt=2025-12-16/places/parquet](https://huggingface.co/datasets/foursquare/fsq-os-places/tree/main/release/dt%3D2025-11-18/places/parquet)
  - **Categories** - [release/dt=2025-12-16/categories/parquet](https://huggingface.co/datasets/foursquare/fsq-os-places/tree/main/release/dt%3D2025-11-18/categories/parquet)

Hugging Face provides the following [download options](https://huggingface.co/docs/hub/datasets-downloading).

## Example Queries

The following are examples of how to query FSQ Open Source Places using Athena and Spark:

- Filter [Categories](https://docs.foursquare.com/data-products/docs/categories#places-open-source--propremium-flat-file) by the parent level
- Filter out [non-commercial venues](#non-commercial-categories-table)
- Find open and recently active POI

### Filter by Parent Level Category

**SparkSQL**

```sql
WITH places_exploded_categories AS (
  -- Unnest categories array
  SELECT fsq_place_id,
         name,
         explode(fsq_category_ids) AS fsq_category_id
  FROM places
),
distinct_places AS (
  SELECT DISTINCT(fsq_place_id) -- Get distinct ids to reduce duplicates from explode function
  FROM places_exploded_categories p
  JOIN categories c -- Join to categories to filter on Level 2 Category
    ON p.fsq_category_id = c.category_id
  WHERE c.level2_category_id = '4d4b7105d754a06374d81259' -- Restaurants
)
SELECT *
FROM places
WHERE fsq_place_id IN (SELECT fsq_place_id FROM distinct_places)
```

### Filter out Non-Commercial Categories

**SparkSQL**

```sql
SELECT *
FROM places
WHERE arrays_overlap(fsq_category_ids, array(
  '4bf58dd8d48988d1f0931735', -- Airport Gate
  '62d587aeda6648532de2b88c', -- Beer Festival
  '4bf58dd8d48988d12b951735', -- Bus Line
  '52f2ab2ebcbc57f1066b8b3b', -- Christmas Market
  '50aa9e094b90af0d42d5de0d', -- City
  '5267e4d9e4b0ec79466e48c6', -- Conference
  '5267e4d9e4b0ec79466e48c9', -- Convention
  '530e33ccbcbc57f1066bbff7', -- Country
  '5345731ebcbc57f1066c39b2', -- County
  '63be6904847c3692a84b9bb7', -- Entertainment Event
  '4d4b7105d754a06373d81259', -- Event
  '5267e4d9e4b0ec79466e48c7', -- Festival
  '4bf58dd8d48988d132951735', -- Hotel Pool
  '52f2ab2ebcbc57f1066b8b4c', -- Intersection
  '50aaa4314b90af0d42d5de10', -- Island
  '58daa1558bbb0b01f18ec1fa', -- Line
  '63be6904847c3692a84b9bb8', -- Marketplace
  '4f2a23984b9023bd5841ed2c', -- Moving Target
  '5267e4d9e4b0ec79466e48d1', -- Music Festival
  '4f2a25ac4b909258e854f55f', -- Neighborhood
  '5267e4d9e4b0ec79466e48c8', -- Other Event
  '52741d85e4b0d5d1e3c6a6d9', -- Parade
  '4bf58dd8d48988d1f7931735', -- Plane
  '4f4531504b9074f6e4fb0102', -- Platform
  '4cae28ecbf23941eb1190695', -- Polling Place
  '4bf58dd8d48988d1f9931735', -- Road
  '5bae9231bedf3950379f89c5', -- Sporting Event
  '530e33ccbcbc57f1066bbff8', -- State
  '530e33ccbcbc57f1066bbfe4', -- States and Municipalities
  '52f2ab2ebcbc57f1066b8b54', -- Stoop Sale
  '5267e4d8e4b0ec79466e48c5', -- Street Fair
  '53e0feef498e5aac066fd8a9', -- Street Food Gathering
  '4bf58dd8d48988d130951735', -- Taxi
  '530e33ccbcbc57f1066bbff3', -- Town
  '5bae9231bedf3950379f89c3', -- Trade Fair
  '4bf58dd8d48988d12a951735', -- Train
  '52e81612bcbc57f1066b7a24', -- Tree
  '530e33ccbcbc57f1066bbff9'  -- Village
)) = false
```

### Find Open and Recently Active POI

**SparkSQL**

```sql
SELECT *
FROM places p
WHERE p.date_closed IS NULL
  AND p.date_refreshed >= DATE_SUB(current_date(), 365);
```

## Licensing information

The dataset is available under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

## Appendix

### Non-Commercial Categories Table

| Category Name             | Category ID              |
| :------------------------ | :----------------------- |
| Airport Gate              | 4bf58dd8d48988d1f0931735 |
| Beer Festival             | 62d587aeda6648532de2b88c |
| Bus Line                  | 4bf58dd8d48988d12b951735 |
| Christmas Market          | 52f2ab2ebcbc57f1066b8b3b |
| City                      | 50aa9e094b90af0d42d5de0d |
| Conference                | 5267e4d9e4b0ec79466e48c6 |
| Convention                | 5267e4d9e4b0ec79466e48c9 |
| Country                   | 530e33ccbcbc57f1066bbff7 |
| County                    | 5345731ebcbc57f1066c39b2 |
| Entertainment Event       | 63be6904847c3692a84b9bb7 |
| Event                     | 4d4b7105d754a06373d81259 |
| Festival                  | 5267e4d9e4b0ec79466e48c7 |
| Hotel Pool                | 4bf58dd8d48988d132951735 |
| Intersection              | 52f2ab2ebcbc57f1066b8b4c |
| Island                    | 50aaa4314b90af0d42d5de10 |
| Line                      | 58daa1558bbb0b01f18ec1fa |
| Marketplace               | 63be6904847c3692a84b9bb8 |
| Moving Target             | 4f2a23984b9023bd5841ed2c |
| Music Festival            | 5267e4d9e4b0ec79466e48d1 |
| Neighborhood              | 4f2a25ac4b909258e854f55f |
| Other Event               | 5267e4d9e4b0ec79466e48c8 |
| Parade                    | 52741d85e4b0d5d1e3c6a6d9 |
| Plane                     | 4bf58dd8d48988d1f7931735 |
| Platform                  | 4f4531504b9074f6e4fb0102 |
| Polling Place             | 4cae28ecbf23941eb1190695 |
| Road                      | 4bf58dd8d48988d1f9931735 |
| State                     | 530e33ccbcbc57f1066bbff8 |
| States and Municipalities | 530e33ccbcbc57f1066bbfe4 |
| Stoop Sale                | 52f2ab2ebcbc57f1066b8b54 |
| Street Fair               | 5267e4d8e4b0ec79466e48c5 |
| Street Food Gathering     | 53e0feef498e5aac066fd8a9 |
| Taxi                      | 4bf58dd8d48988d130951735 |
| Town                      | 530e33ccbcbc57f1066bbff3 |
| Trade Fair                | 5bae9231bedf3950379f89c3 |
| Train                     | 4bf58dd8d48988d12a951735 |
| Tree                      | 52e81612bcbc57f1066b7a24 |
| Village                   | 530e33ccbcbc57f1066bbff9 |
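
The row-level logic behind the "Filter out Non-Commercial Categories" and "Find Open and Recently Active POI" queries can be sketched in plain Python, which may help when checking a handful of downloaded rows without a Spark session. This is only an illustrative sketch: the sample rows, the helper names (`is_commercial`, `is_open_and_recent`, `select_places`), and the use of `datetime.date` values for `date_refreshed` and `date_closed` are assumptions, and only three of the category IDs from the table above are included. It also assumes that a row whose `fsq_category_ids` is NULL is dropped, on the understanding that `NULL = false` does not evaluate to true in SQL.

```python
from datetime import date, timedelta

# Subset of the non-commercial category IDs from the appendix table.
NON_COMMERCIAL_IDS = {
    "4bf58dd8d48988d1f9931735",  # Road
    "4bf58dd8d48988d130951735",  # Taxi
    "52e81612bcbc57f1066b7a24",  # Tree
}


def is_commercial(place):
    """Mirror `arrays_overlap(fsq_category_ids, array(...)) = false`."""
    ids = place.get("fsq_category_ids")
    if ids is None:
        # Assumption: a NULL array makes the SQL predicate NULL, so the row is dropped.
        return False
    return NON_COMMERCIAL_IDS.isdisjoint(ids)


def is_open_and_recent(place, today, window_days=365):
    """Mirror `date_closed IS NULL AND date_refreshed >= DATE_SUB(today, 365)`."""
    refreshed = place.get("date_refreshed")
    if place.get("date_closed") is not None or refreshed is None:
        return False
    return refreshed >= today - timedelta(days=window_days)


def select_places(places, today):
    """Return the IDs of rows that pass both filters."""
    return [
        p["fsq_place_id"]
        for p in places
        if is_commercial(p) and is_open_and_recent(p, today)
    ]


# Hypothetical rows for illustration only; the place IDs are made up.
RESTAURANTS = "4d4b7105d754a06374d81259"
SAMPLE_PLACES = [
    {"fsq_place_id": "a", "fsq_category_ids": [RESTAURANTS],
     "date_closed": None, "date_refreshed": date(2025, 10, 1)},
    {"fsq_place_id": "b", "fsq_category_ids": ["4bf58dd8d48988d1f9931735"],
     "date_closed": None, "date_refreshed": date(2025, 10, 1)},
    {"fsq_place_id": "c", "fsq_category_ids": [RESTAURANTS],
     "date_closed": date(2025, 1, 1), "date_refreshed": date(2025, 10, 1)},
    {"fsq_place_id": "d", "fsq_category_ids": [RESTAURANTS],
     "date_closed": None, "date_refreshed": date(2023, 1, 1)},
    {"fsq_place_id": "e", "fsq_category_ids": None,
     "date_closed": None, "date_refreshed": date(2025, 10, 1)},
]

if __name__ == "__main__":
    print(select_places(SAMPLE_PLACES, date(2025, 12, 16)))
```

If uncategorized places should be kept rather than dropped, the NULL branch in `is_commercial` is the line to change; the equivalent SQL adjustment would be an explicit `fsq_category_ids IS NULL OR ...` condition.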

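
The `hf://` path passed to `read_parquet` in the Prerequisites section embeds the release date as a `dt=YYYY-MM-DD` partition. A small helper for building that glob for a given release might look like the sketch below. The function name `release_glob` and the `DATASET_ROOT` constant are hypothetical, and the `categories` path is inferred from the download listing rather than stated as an `hf://` path in the README, so treat it as an assumption.

```python
from datetime import date

# Dataset root as it appears in the README's read_parquet example.
DATASET_ROOT = "hf://datasets/foursquare/fsq-os-places"

# Tables listed under "Parquet Files" in the README.
KNOWN_TABLES = ("places", "categories")


def release_glob(release_date, table):
    """Build the Parquet glob for one table of one dated release."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return (
        f"{DATASET_ROOT}/release/dt={release_date.isoformat()}"
        f"/{table}/parquet/*.parquet"
    )


if __name__ == "__main__":
    print(release_glob(date(2025, 12, 16), "places"))
```

The resulting string can be passed to whichever loader is in use; whether a given release date actually exists on the Hub has to be checked against the repository itself.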
Provider:
maas
Created:
2024-12-06
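Dataset summary
Background overview
This dataset is Foursquare's Open Source Places. It provides free geospatial data containing place and category information, stored in Parquet format, and is intended to accelerate geospatial innovation and insight. Spark is the recommended way to access and query the dataset, and example queries are provided, such as filtering by category and finding active places. The dataset is licensed under Apache 2.0 but is now a gated dataset, so access restrictions apply.
The summary above was collected and generated by 遇见数据集.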