Replication Data for: A Multi-site Data Sample for Analyzing the Online Commercial Sex Ecosystem
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/4FMSQD
Description:
This entry contains an anonymized sample of over 10 million advertisements collected from active commercial sex sites between 5/1/2022 and 8/1/2022. Please note that, given the nature of the source sites, the text data contains sexually explicit content. The corresponding paper is currently being submitted to journals for publication consideration. The data is split across several Parquet files, each of which begins with the prefix "data_url." In addition to the data, we provide code demonstrating how to work with it and how to perform analyses such as assessing the degree of post and perceptual hash (phash) duplication among sites, identifying frequent emojis, and linking the data in a graph representation. We use the Python programming language for all analyses, and the code is provided as a Jupyter Notebook. The first cell in the notebook details the third-party packages needed to replicate the analyses and the expected file structure. We use the Anaconda Python distribution and provide an environment file to simplify installing dependencies for Anaconda users on Windows, Mac, or Linux machines.
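As a rough illustration of the workflow described above (this is not the notebook's code), the sketch below locates the Parquet files by their "data_url" prefix, loads them into one table, and finds values that appear on more than one site. The `.parquet` file suffix, the function names, and the column names `site` and `phash` are assumptions made for illustration; the notebook's first cell remains the reference for the actual file structure and schema.

```python
from pathlib import Path


def find_data_files(data_dir, prefix="data_url"):
    """Return, sorted by name, the files in data_dir that start with the
    given prefix. The ".parquet" suffix is an assumption."""
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix == ".parquet"
    )


def load_sample(data_dir):
    """Concatenate every matching Parquet file into a single DataFrame."""
    # Imported here so the helpers above and below work without pandas.
    # Reading Parquet needs pandas plus an engine such as pyarrow.
    import pandas as pd

    files = find_data_files(data_dir)
    if not files:
        raise FileNotFoundError(f"no data_url*.parquet files in {data_dir}")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)


def cross_site_duplicates(pairs):
    """Given (site, value) pairs, return {value: sorted list of sites} for
    every value observed on more than one distinct site."""
    sites_by_value = {}
    for site, value in pairs:
        sites_by_value.setdefault(value, set()).add(site)
    return {v: sorted(s) for v, s in sites_by_value.items() if len(s) > 1}


# Hypothetical usage; "site" and "phash" are assumed column names:
#   df = load_sample("path/to/downloaded/files")
#   shared = cross_site_duplicates(zip(df["site"], df["phash"]))
#   print(len(shared), "hashes appear on more than one site")
```

Grouping by value and counting distinct sites separates reposts within one site from content shared across sites; the same helper could be applied to post text instead of image hashes, again assuming a suitable column exists.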
Created:
2024-12-23



