TWINS dataset used for the experiments in the paper "How to select predictive models for causal inference?"

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14674617
Description:
Dataset obtained from the shalit-lab GitHub repository and used in our experiments. Raw URL for the dataset: "https://raw.githubusercontent.com/shalit-lab/Benchmarks/master/Twins/Final_data_twins.csv".

Explanation of the dataset:

Louizos et al. (2017) introduced the Twins dataset as an augmentation of the real data on twin births and twin mortality rates in the USA from 1989-1991 (Almond et al., 2005). The treatment is "born the heavier twin", so, in one sense, we can observe both potential outcomes. Louizos et al. (2017) create an observational dataset out of this by hiding one of the twins (for each pair) in the dataset. To ensure there is some confounding, they simulate the treatment assignment (which twin is heavier) as a function of the GESTAT10 covariate, which is the number of gestation weeks prior to birth. GESTAT10 is highly correlated with the outcome, and it seems intuitive that it would be a cause of the outcome, so this should simulate some confounding.

They simulate this "treatment" with a sigmoid model based on GESTAT10 (denoted z) and x, the 45 other covariates:

$\mathbf{t}_{i} \mid \mathbf{x}_{i}, \mathbf{z}_{i} \sim \operatorname{Bern}\left(\sigma\left(w_{o}^{\top} \mathbf{x}_{i}+w_{h}(\mathbf{z}_{i} / 10-0.1)\right)\right) \quad \text{with} \; w_{o} \sim \mathcal{N}(0,0.1 \cdot I), \; w_{h} \sim \mathcal{N}(5,0.1)$

Furthermore, to make sure the twins are very similar, they limit the data to twins of the same sex. To look at data with higher mortality rates, they further limit the dataset to twins that were born weighing less than 2 kg.

References:

Almond, D., Chay, K. Y., & Lee, D. S. (2005). The costs of low birth weight. The Quarterly Journal of Economics, 120(3), 1031-1083.

Louizos, C., Shalit, U., Mooij, J. M., Sontag, D., Zemel, R., & Welling, M. (2017). Causal effect inference with deep latent-variable models. In Advances in Neural Information Processing Systems (pp. 6446-6456).

Neal, B., Huang, C.-W., & Raghupathi, S. (2021). RealCause: Realistic Causal Inference Benchmarking. arXiv:2011.15007 [cs, stat], March 2021.
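The treatment simulation and the "hide one twin" step described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `simulate_treatment` and `hide_one_twin` are hypothetical, the arrays are synthetic placeholders rather than the real Twins data, and the second argument of each normal distribution is assumed to be a variance (so the standard deviation is its square root).

```python
import numpy as np


def simulate_treatment(x, z, rng):
    """Draw t_i ~ Bern(sigmoid(w_o^T x_i + w_h * (z_i / 10 - 0.1))).

    x: (n, d) array of covariates; z: (n,) array standing in for GESTAT10.
    Assumption: N(0, 0.1 * I) and N(5, 0.1) are parameterised by variance.
    """
    d = x.shape[1]
    w_o = rng.normal(0.0, np.sqrt(0.1), size=d)  # w_o ~ N(0, 0.1 * I)
    w_h = rng.normal(5.0, np.sqrt(0.1))          # w_h ~ N(5, 0.1)
    logits = x @ w_o + w_h * (z / 10.0 - 0.1)
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid
    t = rng.binomial(1, p)                       # one Bernoulli draw per pair
    return t, p


def hide_one_twin(y_lighter, y_heavier, t):
    """Keep the heavier twin's outcome where t == 1, else the lighter twin's."""
    return np.where(t == 1, y_heavier, y_lighter)


# Synthetic placeholder data (NOT the real dataset).
rng = np.random.default_rng(0)
n, d = 1000, 45
x = rng.normal(size=(n, d))
z = rng.integers(0, 10, size=n).astype(float)    # placeholder GESTAT10 codes
y_lighter = rng.binomial(1, 0.2, size=n)         # placeholder mortality outcomes
y_heavier = rng.binomial(1, 0.15, size=n)

t, p = simulate_treatment(x, z, rng)
y_obs = hide_one_twin(y_lighter, y_heavier, t)
print("share treated:", t.mean())
```

Because the assignment probability depends on z, and z would also influence the outcome in the real data, a naive comparison of observed outcomes between the two groups is confounded, which is the purpose of the construction.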
Created:
2025-01-16