
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

DataCite Commons | Updated: 2025-07-03 | Indexed: 2025-04-09
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/UR3RVE
Description:
The MATCHED dataset is a novel multimodal collection of escort advertisements curated to support research in Authorship Attribution (AA) and related tasks. It comprises 27,619 unique text descriptions and 55,115 images (in JPG format) sourced from Backpage escort ads across seven major U.S. cities: Atlanta, Dallas, Detroit, Houston, Chicago, San Francisco, and New York. These cities are further grouped into four geographical regions (South, Midwest, West, and Northeast), which gives the dataset a structure that supports both in-distribution and out-of-distribution (OOD) evaluation. Each ad contains metadata that links its text and visual components, providing a rich resource for studying multimodal patterns, vendor identification, and verification tasks. The dataset is particularly suited to multimodal authorship attribution, vendor linking, stylometric analysis, and the study of how textual and visual patterns interact in advertisements. All text descriptions are processed to redact explicit references to phone numbers, email addresses, advertisement IDs, age-related information, and other contact details that could be used to identify individuals or vendors. The structured metadata allows researchers to explore how multimodal features help uncover latent patterns in stylometry and vendor behavior. A demo data file showing the format and structure of the MATCHED dataset is attached to the entry.
Given the sensitivity of the subject matter, the actual dataset resides securely on Maastricht University's servers. Only the metadata will be publicly released on Dataverse to ensure ethical use. Researchers interested in accessing the full dataset must sign a Non-Disclosure Agreement (NDA) and a Data Transfer Agreement with Prof. Dr. Gijs Van Dijck from Maastricht University. Access will be granted only under strict restrictions, and recipients must adhere to the ethical guidelines established by the university's committee. These guidelines emphasize responsible use of the dataset to prevent misuse and to safeguard the privacy and dignity of all individuals involved.
Provider:
DataverseNL
Created:
2024-12-09