Simulate the branching process of flipping until failure for K clusters

simulate_bp(
  K,
  inf_params,
  sample_covariates_df,
  covariate_names,
  covariate_weights = NULL,
  max_size = 50
)

Arguments

K	number of total clusters to simulate
inf_params	vector with beta coefficients to use in logistic function for probability of transmission
sample_covariates_df	Data frame of covariates to sample from
covariate_names	Names of the covariates. Its length must equal length(inf_params) - 1.
covariate_weights	Default is NULL, which draws uniformly at random with replacement from sample_covariates_df. Otherwise, the supplied weights are used for sampling.
max_size	maximum size a cluster can be
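
As an illustration of how inf_params and the covariates combine, the sketch below computes the probability of transmission for each row of a covariate data frame. It reuses the values from the Examples section and assumes that the first element of inf_params is the intercept (suggested by the name beta_0, but not stated on this page); the design matrix X is a hypothetical helper, not part of the package.

```r
# Values taken from the Examples section of this page.
inf_params <- c("beta_0" = -2, "beta_1" = 2)
sample_covariates_df <- data.frame(x = c(0, 1))
covariate_names <- "x"

# Length requirement stated for covariate_names.
stopifnot(length(covariate_names) == length(inf_params) - 1)

# Assumption: a column of ones for the intercept, then the named covariates.
X <- cbind(1, as.matrix(sample_covariates_df[, covariate_names, drop = FALSE]))

# Inverse logit of the linear predictor gives the probability of transmission.
p <- as.numeric(plogis(X %*% inf_params))
print(round(p, 3))
```

Under these assumptions, an individual with x = 0 transmits with probability of about 0.119 and an individual with x = 1 with probability 0.5.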

Value

data frame with the following columns

cluster_id

unique cluster ID

person_id

order of infection in the cluster

gen

generation number (>= 1; the root infector is generation 1)

inf_id

ID of the infector

n_inf

number of people infected by person

censored

whether the cluster end was censored or not

cluster_size

size of the cluster

covariates

covariates of the individuals

Details

Generate a branching process according to the following process. First, a root infector is drawn with covariates $X$ from some distribution $F$ (given by the set of covariates in sample_covariates_df) and has probability of transmission given by an inverse logit function. The number of infections produced by the root node, $N_{(1,1)}$, is a geometric random variable with probability $p_{(1,1)}$, where the indexing represents $(g, i)$ with $g=$ generation and $i=$ index. If $N_{(1,1)} > 0$, then the $N_{(1,1)}$ infections are added to the cluster and assigned to generation $g=2$ with indices $i=1, \dots, N_{(1,1)}$, and covariates are drawn for these new infections. The infection process continues with individuals $(2, 1)$ through $(2, N_{(1,1)})$, whose new infections are added, in order, to the subsequent generation. The process terminates when either there are no new infections or the maximum number of infections specified in max_size is reached.

$$X_{(g,i)} \sim F$$
$$p_{(g,i)} = \mathrm{logit}^{-1}\left( X_{(g,i)} \beta \right)$$
$$N_{(g,i)} \sim \mathrm{Geometric}(p_{(g,i)})$$
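
The process described above can be sketched for a single cluster as follows. This is a minimal illustration under stated assumptions, not the package's implementation: simulate_one_cluster is a hypothetical name, covariates are drawn uniformly (the covariate_weights = NULL case), the first element of inf_params is treated as the intercept, and the geometric draw is taken to count transmissions before the first failure, which in base R is rgeom() with prob = 1 - p. The handling of max_size and the censored flag is likewise a guess at reasonable behavior.

```r
# Hypothetical sketch of one cluster; not the package's implementation.
simulate_one_cluster <- function(inf_params, sample_covariates_df,
                                 covariate_names, max_size = 50) {
  n_pool <- nrow(sample_covariates_df)
  rows <- sample.int(n_pool, 1)   # covariate row drawn for the root, X ~ F
  gen <- 1L                       # the root is generation 1
  inf_id <- NA_integer_           # the root has no infector
  n_inf <- integer(0)
  censored <- FALSE
  i <- 1L
  # Visit individuals in order of infection; new infections join the queue.
  while (i <= length(gen)) {
    x_i <- unlist(sample_covariates_df[rows[i], covariate_names, drop = FALSE])
    p_i <- plogis(sum(c(1, x_i) * inf_params))   # inverse logit of X beta
    # Assumption: number of transmissions before the first failure.
    n_new <- rgeom(1, prob = 1 - p_i)
    room <- max_size - length(gen)
    if (n_new > room) {           # truncate at max_size and flag the cluster
      n_new <- room
      censored <- TRUE
    }
    n_inf[i] <- as.integer(n_new)
    if (n_new > 0) {
      gen <- c(gen, rep(gen[i] + 1L, n_new))
      inf_id <- c(inf_id, rep(i, n_new))
      rows <- c(rows, sample.int(n_pool, n_new, replace = TRUE))
    }
    i <- i + 1L
  }
  covs <- sample_covariates_df[rows, covariate_names, drop = FALSE]
  rownames(covs) <- NULL
  cbind(data.frame(person = seq_along(gen), gen = gen, inf_id = inf_id,
                   n_inf = n_inf, censored = censored,
                   cluster_size = length(gen)),
        covs)
}

set.seed(2020)
cluster <- simulate_one_cluster(c(beta_0 = -2, beta_1 = 2),
                                data.frame(x = c(0, 1)), "x")
print(cluster)
```

Unlike the output of simulate_bp, this sketch identifies people by their integer order of infection and records 0 rather than NA for individuals who infect no one.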

Examples

set.seed(2020)
inf_params <- c("beta_0" = -2, "beta_1" = 2)
df <- data.frame(x= c(0, 1))
branching_processes <- simulate_bp(K = 10,
                                   inf_params = inf_params,
                                   covariate_names = "x",
                                   sample_covariates_df = df)
head(branching_processes)
#>   cluster_id person_id gen   inf_id n_inf cluster_size x censored
#> 1          1  C1-G1-N1   1     <NA>     1            2 1    FALSE
#> 2          1  C1-G2-N1   2 C1-G1-N1    NA            2 1    FALSE
#> 3          2  C2-G1-N1   1     <NA>    NA            1 0    FALSE
#> 4          3  C3-G1-N1   1     <NA>     1            3 1    FALSE
#> 5          3  C3-G2-N1   2 C3-G1-N1     1            3 1    FALSE
#> 6          3  C3-G3-N1   3 C3-G2-N1    NA            3 1    FALSE
table(branching_processes$cluster_size) /
  sort(unique(branching_processes$cluster_size))
#> 
#>  1  2  3  4 11 
#>  6  1  1  1  1
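
The final expression in the example divides row counts by cluster size because the returned data frame has one row per person, so a cluster of size n contributes n rows. A small sketch of that arithmetic, using only the cluster_size values of the six rows shown in the head() output (the variable names here are illustrative):

```r
# cluster_size column for the six rows printed by head(): clusters 1, 2, 3.
cluster_size <- c(2, 2, 1, 3, 3, 3)

# table() counts rows (people), not clusters, for each size.
rows_per_size <- table(cluster_size)

# Dividing by the size itself converts people counts into cluster counts;
# sort(unique()) lines up with the ordering table() uses for numeric values.
sizes <- sort(unique(cluster_size))
clusters_per_size <- rows_per_size / sizes
print(clusters_per_size)
```

For these six rows the result is one cluster each of size 1, 2, and 3.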