
Search engine pages results and web pages on urban rivers

Source: DataCite Commons. Updated 2026-03-12; indexed 2026-05-04.
Download link:
https://nakala.fr/10.34847/nkl.3bc0e3e0
Description:
This corpus contains 72,722 web pages about urban rivers. The pages were retrieved in April 2024 by sending Google queries of the form {river AND city} for 303 cities worldwide and their associated river(s) (373 city-river pairs in total; see the 'list_city_river' file). The cities were selected as part of the GloUrb project based on several criteria, including hydrological, geomorphological and population factors. The selected rivers are located in dense urban centres as defined by the Global Human Settlement Layer (JRC 2019) and are at least 30 metres wide. The 'corpus' file includes information about the websites (website title, position on the results page, snippet, web address, domain, etc.) and their tokenized content (i.e. each word of the text is replaced by its dictionary form, and stop words are removed). The original content is not provided for data protection reasons (GDPR). The results were collected using the free software R and the Value SERP API. The queries were first run in English without any specific location; they were then run in the local language(s) of each city, with the search located in the corresponding country. Finally, each web page was scraped using R.
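As a rough illustration of the {river AND city} query pattern described above, the following Python sketch builds one query string per unique city-river pair. It is not the authors' pipeline (which used R and the Value SERP API). The `CityRiver` type, the `build_queries` helper, the exact query formatting and the sample pairs are all assumptions made for illustration, and none of them are taken from the 'list_city_river' file.

```python
from typing import Iterable, NamedTuple


class CityRiver(NamedTuple):
    """Hypothetical record for one city-river pair."""
    city: str
    river: str


def build_queries(pairs: Iterable[CityRiver]) -> list[str]:
    """Return one 'river AND city' query string per unique pair.

    Duplicates are detected case-insensitively after trimming whitespace.
    The exact formatting used by the dataset authors is not documented
    here, so this string layout is an assumption.
    """
    seen: set[tuple[str, str]] = set()
    queries: list[str] = []
    for pair in pairs:
        city, river = pair.city.strip(), pair.river.strip()
        key = (city.lower(), river.lower())
        if key in seen:
            continue  # skip repeated city-river pairs
        seen.add(key)
        queries.append(f"{river} AND {city}")
    return queries


# Illustrative pairs only (assumed, not read from 'list_city_river').
pairs = [
    CityRiver("Lyon", "Rhône"),
    CityRiver("Lyon", "Saône"),
    CityRiver("Lyon", "Rhône"),  # duplicate, dropped by build_queries
]
queries = build_queries(pairs)
```

A city with several rivers yields one query per river, which is consistent with the corpus covering more city-river pairs (373) than cities (303).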
Provider:
NAKALA - https://nakala.fr (Huma-Num - CNRS)
Created:
2025-07-29