{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "80f01878", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "\"package 'pwr' was built under R version 3.6.3\"" ] } ], "source": [ "library(pwr)\n", "library(purrr)\n", "\n", "inv.logit <- function(x) {\n", " 1 / (1 + exp(-x))\n", "}\n", "\n", "data_causal_inf <- function(n, tau = 1, b = 2, base = 100) {\n", " ## DGP\n", " age <- rnbinom(n, 10, 0.3)\n", " ps <- inv.logit(0.2*(age- mean(age)))\n", " Y0s <- round(base + b * age + rnorm(n, 0, 3))\n", " Y1s <- round(Y0s + tau)\n", " Z <- map_dbl(ps, ~ sample(c(0,1), 1, F, c(1-., .)))\n", " Y <- Z * Y1s + (1-Z) * Y0s\n", " return(list(age = age, Y = Y, Z = Z, ps = ps))\n", "}\n", "\n", "data_causal_inf_rand_exp <- function(n, tau = 1, b = 2, base = 100) {\n", " ## DGP\n", " age <- rnbinom(n, 10, 0.3)\n", " ps <- inv.logit(0.2*(age- mean(age)))\n", " Y0s <- round(base + b * age + rnorm(n, 0, 3))\n", " Y1s <- round(Y0s + tau)\n", " Z <- rbinom(n, 1, 0.5)\n", " Y <- Z * Y1s + (1-Z) * Y0s\n", " return(list(age = age, Y = Y, Z = Z, ps = ps))\n", "}\n", "\n", "set.seed(7)\n", "n <- 500\n", "\n", "causal_int_obs <- data_causal_inf(n, tau = 0.9123, b = - 1.0314)\n", "\n", "Age <- causal_int_obs$age; StreamingMinutes <- causal_int_obs$Y; AccountType <- ifelse(causal_int_obs$Z, \"Premium\", \"Free\")\n", "musicfi <- data.frame(Age, AccountType, StreamingMinutes)\n", "\n", "causal_int_obsrand <- data_causal_inf_rand_exp(n=500, tau = 0.9123, b = - 0.5314)\n", "\n", "Age <- causal_int_obsrand$age; StreamingMinutes <- causal_int_obsrand$Y; AccountType <- ifelse(causal_int_obsrand$Z, \"Premium\", \"Free\")\n", "musicfiExp <- data.frame(Age, AccountType, StreamingMinutes)" ] }, { "cell_type": "markdown", "id": "09922b5b", "metadata": {}, "source": [ "# Randomized Experiments\n", "\n", "Randomized experiments remove the selection problem and ensure that there are no confounding variables 
(observed or unobserved). They do this by removing the individual’s opportunity to select whether or not they receive the treatment. In the Musicfi example, suppose we ran an experiment with 500 participants in which we randomly upgraded some free accounts to premium accounts. Because the upgrade is assigned at random rather than chosen by the user, we no longer have to adjust for age: on average, there will be no age difference between the treated and control subjects. Suppose the data from our experiment are stored in a data frame called `musicfiExp`:" ] }, { "cell_type": "code", "execution_count": 2, "id": "5d3cdcbb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Age | AccountType | StreamingMinutes |
---|---|---|
20 | Premium | 92 |
28 | Free | 88 |
20 | Premium | 89 |
24 | Premium | 86 |
22 | Premium | 91 |
15 | Premium | 93 |