
Replication Data for: A Multi-site Data Sample for Analyzing the Online Commercial Sex Ecosystem

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/4FMSQD
Description:
This entry contains an anonymized sample of over 10 million advertisements collected from active commercial sex sites between 5/1/2022 and 8/1/2022. Please note that given the nature of the sites the data is collected from, the text data contains sexually explicit content. The corresponding paper is currently being submitted to journals for publication consideration. The data is split across several Parquet files. Each data file begins with the prefix "data_url." In addition to data, we provide code demonstrating how to work with the provided data. The code also demonstrates how to perform analyses such as assessing the degree of post and phash duplication among sites, understanding frequent emojis, and linking the data in a graph representation. We use the Python programming language for all analyses, and the code is given as a Jupyter Notebook. The first cell in the notebook provides explicit details regarding third-party packages needed to replicate the analyses and the expected file structure. We use the Anaconda Python distribution and have provided an environment file to simplify the installation of dependencies for Anaconda users on Windows, Mac, or Linux machines.
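As a rough illustration of working with the files described above, the following Python sketch collects the Parquet files by name prefix and reads them into a single table. It is not the notebook's code. The prefix `data_url` and the `.parquet` extension are assumptions drawn from the description, the helper names `find_data_files` and `load_sample` are hypothetical, and no column names are assumed because the description does not list them.

```python
from pathlib import Path


def find_data_files(data_dir, prefix="data_url"):
    """Return the Parquet files in data_dir whose names start with prefix, sorted by name.

    Both the prefix and the ".parquet" extension are assumptions based on the
    dataset description, not verified file names.
    """
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix == ".parquet"
    )


def load_sample(data_dir, prefix="data_url", max_files=None):
    """Read the matching files into one DataFrame (hypothetical helper).

    pandas and a Parquet engine such as pyarrow are imported here, so file
    discovery above still works when they are not installed.
    """
    import pandas as pd

    # Slicing with None keeps every file; pass a small number to limit memory use.
    files = find_data_files(data_dir, prefix)[:max_files]
    if not files:
        raise FileNotFoundError(f"no '{prefix}*.parquet' files found in {data_dir}")
    frames = [pd.read_parquet(f) for f in files]
    return pd.concat(frames, ignore_index=True)
```

With more than 10 million rows in the full sample, calling `load_sample(path, max_files=1)` first and inspecting the resulting columns is likely a safer starting point than loading every file at once.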
Created:
2024-12-23