
zidcenek/XCampaignDataset

Hugging Face · updated 2026-01-30 · indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/zidcenek/XCampaignDataset
Description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- reinforcement-learning
language:
- en
- multilingual
tags:
- recommendation-system
- recommendation
- machine-learning
- email
- tabular
- marketing
- click-through-rate-prediction
pretty_name: XCampaign Dataset
size_categories:
- 10M<n<100M
---

# XCampaign Dataset

<h4>
  <a href="https://github.com/zidcenek/Active-Learning-for-Email-Interaction-Dynamics" target="_blank">💻 GitHub Repo</a>
  <a href="https://dl.acm.org/doi/10.1145/3746252.3760832" target="_blank">📖 Paper Link</a>
</h4>

## Introduction

This repository contains the **XCampaign Dataset**, provided by Mailprofiler. [XCampaign](https://xcampaign.info/switzerland-en/) is an email campaign management platform. The dataset was published alongside our CIKM 2025 paper *Active Recommendation for Email Outreach Dynamics*.

The dataset of almost 15 million interactions captures user-level interactions with periodic marketing mailshots, including whether an email was opened and the time-to-open (TTO).

## Dataset and Fields

The **XCampaign Dataset** includes the following fields:

- `mailshot_id`: identifier of the mailshot campaign (or template id)
- `user_id`: anonymized recipient identifier
- `opened`: binary label (\(1\) if opened, \(0\) otherwise)
- `time_to_open`: time delta between send and open (a parseable timedelta string, e.g. `0 days 09:39:32`)

## Global Statistics

All statistics below are computed from the full dataset.

- `Rows`: 14,908,085; `Users`: 131,918; `Mailshots`: 160
- Global open rate: 9.09%
- Per-mailshot open rate: $9.13\% \pm 3.58\%$
- Per-user open rate: mean $12.33\% \pm 20.46\%$
- Time-to-open (opened only): mean 1d 17h 25m; median 6h 25m
- Fraction opened within 1h: 25.9%; within 24h: 71.2%; within 7d: 93.0%
- Sent to users at each mailshot: $93,175 \pm 19,162$
- Item \(\times\) User interaction matrix density: 70.63%

## How to Use and Cite

The XCampaign Dataset is made available under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**. This license allows you to share and adapt the dataset for any purpose, **including commercial use**, as long as you provide appropriate credit.

If you use this dataset in your work, please **cite the following paper**, which introduced the dataset:

### Plain Text Citation

> Čeněk Žid, Rodrigo Alves, and Pavel Kordík. 2025. Active Recommendation for Email Outreach Dynamics. In *Proceedings
> of the 34th ACM International Conference on Information and Knowledge Management (CIKM '25)*. Association for
> Computing Machinery, New York, NY, USA, 5540–5544.
> https://doi.org/10.1145/3746252.3760832

### BibTeX Citation

```bibtex
@inproceedings{10.1145/3746252.3760832,
  author = {\v{Z}id, \v{C}en\v{e}k and Kord\'{\i}k, Pavel and Alves, Rodrigo},
  title = {Active Recommendation for Email Outreach Dynamics},
  year = {2025},
  isbn = {9798400720406},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3746252.3760832},
  doi = {https://doi.org/10.1145/3746252.3760832},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  pages = {5540–5544},
  numpages = {5},
  keywords = {email outreach, reinforcement learning, shallow autoencoder},
  location = {Seoul, Republic of Korea},
  series = {CIKM '25}
}
```

![Global open rate and distribution of per-user open rates.](./assets/user_open_rate_hist.png)

Global open rate and distribution of per-user open rates.

## Time to Open (TTO)

Time-to-open is heavy-tailed: the median is about 6.4 hours and 93.0% of opens arrive within 7 days, but the remaining 7.0% arrive later than 7 days. The histogram and CDF below are truncated at 7 days to emphasize the main mass of the distribution.

![Distribution of time-to-open for opened emails.](./assets/time_to_open_hist.png)

Distribution of time-to-open for opened emails.

![CDF of time-to-open for opened emails.](assets/time_to_open_cdf.png)

CDF of time-to-open for opened emails.

The heavy-tailed TTO suggests using robust objectives and appropriate censoring strategies. The two user segments motivate segment-aware priors and exploration strategies; mailshot-level heterogeneity motivates per-mailshot features or random effects.

## Dataset Versions

The current version of the dataset contains 12 months of data (2024-04 to 2025-03). Data collection is still ongoing, so future versions might include additional months.
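
As an illustration of how the fields and statistics described in this card fit together, the following minimal sketch parses `time_to_open` strings of the form `0 days 09:39:32` and computes the open rate, matrix density, and TTO summaries on a tiny made-up sample. The helper names `parse_tto` and `summarize`, the toy rows, and the use of `None` for unopened emails are assumptions for illustration, not part of the dataset or its official tooling. The density computation also assumes that each (user, mailshot) pair appears at most once.

```python
import re
from datetime import timedelta
from statistics import median

# Matches strings such as "0 days 09:39:32", optionally with fractional seconds.
_TTO_PATTERN = re.compile(r"^(\d+) days? (\d{1,2}):(\d{2}):(\d{2})(?:\.(\d+))?$")


def parse_tto(value):
    """Parse a `time_to_open` string into a timedelta; return None if unparseable."""
    if not isinstance(value, str):
        return None
    match = _TTO_PATTERN.match(value.strip())
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    fraction = match.group(5)
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, microseconds=micros)


def summarize(rows):
    """Compute card-style statistics from dicts holding the four dataset fields."""
    rows = list(rows)
    users = {r["user_id"] for r in rows}
    mailshots = {r["mailshot_id"] for r in rows}
    opened = [r for r in rows if int(r["opened"]) == 1]
    ttos = [t for t in (parse_tto(r["time_to_open"]) for r in opened) if t is not None]
    return {
        "open_rate": len(opened) / len(rows),
        # Assumes at most one row per (user, mailshot) pair.
        "density": len(rows) / (len(users) * len(mailshots)),
        "median_tto": median(ttos) if ttos else None,
        "opened_within_24h": (
            sum(t <= timedelta(hours=24) for t in ttos) / len(ttos) if ttos else None
        ),
    }


# Made-up sample rows (not real data) using the field names from this card.
toy_rows = [
    {"mailshot_id": 1, "user_id": "a", "opened": 1, "time_to_open": "0 days 09:39:32"},
    {"mailshot_id": 1, "user_id": "b", "opened": 0, "time_to_open": None},
    {"mailshot_id": 2, "user_id": "a", "opened": 1, "time_to_open": "0 days 00:30:00"},
    {"mailshot_id": 2, "user_id": "b", "opened": 1, "time_to_open": "3 days 02:00:00"},
    {"mailshot_id": 2, "user_id": "c", "opened": 0, "time_to_open": None},
]
stats = summarize(toy_rows)
print(stats)

# Sanity check on the published counts: rows / (users * mailshots).
published_density = 14_908_085 / (131_918 * 160)
print(f"{published_density:.4f}")  # about 0.7063, consistent with the reported 70.63%
```

For the full dataset, a vectorized parser such as `pandas.to_timedelta`, which accepts strings in this format, is likely a better fit than the regular expression used here; the pure-Python version is shown only to keep the sketch free of dependencies.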

## Acknowledgements

Čeněk Žid's research was supported by the Grant Agency of the Czech Technical University (SGS20/213/OHK3/3T/18). We warmly thank *Mailprofiler* for providing the dataset for this research.

<p align="center">
  <a href="https://fit.cvut.cz/en" target="_blank">
    <img src="assets/logo-fit-en-modra.jpg" alt="FIT CTU" width="200"/>
  </a>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  <a href="https://xcampaign.info/switzerland-en/" target="_blank">
    <img src="assets/Xcampaign_logo.svg" alt="XCampaign" width="220"/>
  </a>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  <a href="https://www.recombee.com/" target="_blank">
    <img src="assets/recombee_logo.png" alt="Recombee" width="150"/>
  </a>
</p>
Provided by:
zidcenek