Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/10636246

Description:

Main dataset (main.csv)

The main file contains one entry per search result across all collected pages (N=28530). It comprises the following columns:

id: Unique identifier of the file (corresponds to the last part of the filename)
filename: Name of the file associated with the row (the file is in serp_html.zip)
engine: The search engine used (Google Scholar or Semantic Scholar)
browser: The web browser used for the search (Firefox or Chrome)
region: The geographical region where the search was made
year: The year when the search was made
month: The month when the search was made
day: The day when the search was made
query: The full search query that was used
query_type: The type of the search query (health or technology)
topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')
trt: Treatment variable associated with the search (benefits or risks)
url: The URL of the (article) search result
title: The title of the (article) search result
authorship: The author(s) of the (article) search result
abstract_id: Unique identifier for the abstract of the (article) search result, which connects with annotated-abstracts_v0.6.xlsx
abstract_hash: Hash value of the abstract for data integrity
link_n: The total number of results in the search page
rank: The rank of the search result on the search engine results page
annotation: Any annotations associated with the (article's abstract) search result. One of: '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '1. Confirms benefits', '2. Confirms risks', '5. Abstract not related to {topic}'
valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts only containing benefits

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Manually annotated abstracts resulting from the searches.
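As an illustration of how the columns described above might be used, the following sketch averages valence for each combination of engine and trt using only the Python standard library. The helper names mean_valence and load_rows are hypothetical, and the code assumes that main.csv is UTF-8 encoded, has a header row with the column names listed above, and may contain rows with an empty or non-numeric valence; none of this is confirmed by the dataset description.

```python
import csv
from collections import defaultdict


def mean_valence(rows):
    """Average the valence column per (engine, trt) pair.

    `rows` is any iterable of dict-like records keyed by column name.
    Rows whose valence is empty or not numeric are skipped (assumption:
    such rows may exist, e.g. for abstracts that were not annotated).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        raw = (row.get("valence") or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue
        acc = totals[(row["engine"], row["trt"])]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def load_rows(path="main.csv"):
    """Read the main dataset into a list of dicts (path is an assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling mean_valence(load_rows()) on a local copy of main.csv would return a dictionary keyed by (engine, trt). A mean closer to 1 for a given pair would indicate that more of the returned abstracts report only benefits.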
Raw search engine result pages (serp_html.zip)

The zip contains one HTML file per collected search engine result page (N=2853). Each file corresponds to a value in the filename column of the main dataset.
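Because each row of the main dataset points to a page in serp_html.zip through the filename column, a quick consistency check can confirm that every referenced page is present in the archive. The sketch below is one possible way to do that; missing_pages is a hypothetical helper, and matching by base name is an assumption, since the internal folder layout of the zip is not documented here.

```python
import csv
import zipfile
from pathlib import PurePosixPath


def missing_pages(csv_file, zip_file):
    """Return sorted filenames referenced in the CSV but absent from the zip.

    `csv_file` is an open text handle for main.csv; `zip_file` is a path or
    a binary file-like object for serp_html.zip.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Compare by base name only (assumption: files may sit in a subfolder).
        archived = {
            PurePosixPath(name).name
            for name in archive.namelist()
            if not name.endswith("/")
        }
    referenced = {row["filename"] for row in csv.DictReader(csv_file)}
    return sorted(
        name for name in referenced if PurePosixPath(name).name not in archived
    )
```

With local copies of both files, the check could be run as follows; an empty list would mean every row maps to an archived page.

```python
with open("main.csv", newline="", encoding="utf-8") as handle:
    print(missing_pages(handle, "serp_html.zip"))
```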

Created:

2024-02-08