Calculate the estimated general loglikelihood

general_loglike(
  inf_params,
  mc_trees,
  return_neg = TRUE,
  cov_mat = NULL,
  cov_names = NULL,
  multiple_outside_transmissions = FALSE,
  use_outsider_prob = FALSE,
  return_clust_loglikes = FALSE,
  messages = FALSE
)

Arguments

inf_params

vector of p parameters

cluster_id

unique cluster ID

person_id

order of infection in the cluster

gen

generation number (>=0)

inf_id

ID of the infector

n_inf

number of people infected by person

censored

whether the cluster end was censored or not

cluster_size

size of the cluster

covariates

covariates of the individuals

mc_trees

data frame of Monte Carlo transmission tree samples that correspond to the observed data. This is the output of general_cond_tree_sims(). See Details.

return_neg

logical. If TRUE (the default), the negative log likelihood is returned.

cov_mat

optional matrix of covariates corresponding to the mc_trees

cov_names

vector of covariate names of length p, given in the same order as the corresponding betas

multiple_outside_transmissions

logical indicating whether to use the multiple outside transmissions model to compute the likelihood. See Details.

use_outsider_prob

logical indicating whether to use a separate parameter for the outsider infections. Default is FALSE.

return_clust_loglikes

logical. If TRUE, the function returns the log likelihood of each individual cluster.

messages

logical indicating whether to print messages.

Value

estimated average log likelihood for the observed data (negated when return_neg = TRUE)

Details

This is a specialized log likelihood function in which we first estimate, through sampling, the average likelihood of trees conditioned on their total size. The estimate therefore depends heavily on the sampled trees supplied in mc_trees.

The base likelihood for a single tree \(T\) with \(n\) individuals is $$L(T) = \prod_{i=1}^n (1-p_i)p_i^{N_i}$$ where \(p_i\) is the probability of transmission for individual \(i\) and \(N_i\) is the number of individuals infected by individual \(i\).

The approximate average likelihood for a given cluster \(C_m\) is then $$\bar{L}_K(C_m) = \frac{1}{K}\sum_{k=1}^K L(T_k)$$ where \(K\) is the number of Monte Carlo transmission tree samples \(T_k\) for cluster \(C_m\).

Finally, the log likelihood is the sum of the log average likelihoods over the clusters, $$\ell(C_1, \dots, C_M) = \sum_{m=1}^M \log(\bar{L}_K(C_m))$$

For the multiple outside transmissions model, the above calculation changes only in the likelihood of a single tree (and the transmission trees are of a different form). The likelihood \(L_O(T)\) is $$L_O(T) = (1-p_1)p_1^{N_1-1}\prod_{i=2}^n(1-p_i)p_i^{N_i}$$ because we condition on the outsider (individual 1) having at least one successful infection.
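The formulas above can be checked by hand on a toy example. The following is a minimal sketch in base R, not the package's implementation: the helper names (tree_like, tree_like_outside, cluster_avg_like) and the list-of-trees layout are hypothetical, and the transmission probabilities are supplied directly rather than derived from inf_params and covariates.

```r
# Hypothetical helpers illustrating the formulas; not part of the package.

# Base likelihood of one tree: prod_i (1 - p_i) * p_i^{N_i}
tree_like <- function(p, n_inf) {
  prod((1 - p) * p^n_inf)
}

# Multiple outside transmissions variant: individual 1 is taken to be the
# outsider, whose exponent is reduced by one because we condition on at
# least one successful infection.
tree_like_outside <- function(p, n_inf) {
  n_adj <- n_inf
  n_adj[1] <- n_adj[1] - 1
  prod((1 - p) * p^n_adj)
}

# Average likelihood over the K sampled trees of one cluster
cluster_avg_like <- function(trees, like_fun = tree_like) {
  mean(vapply(trees, function(tr) like_fun(tr$p, tr$n_inf), numeric(1)))
}

# Toy data (made up for illustration): one cluster of size 3 with K = 2
# sampled trees. p holds per-individual transmission probabilities and
# n_inf the number of people each individual infects in that tree.
p <- c(0.2, 0.5, 0.5)
trees_c1 <- list(
  list(p = p, n_inf = c(2, 0, 0)),  # individual 1 infects both others
  list(p = p, n_inf = c(1, 1, 0))   # chain: 1 infects 2, 2 infects 3
)
clusters <- list(trees_c1)

# Total log likelihood: sum over clusters of log(average likelihood)
loglike <- sum(vapply(clusters,
                      function(trees) log(cluster_avg_like(trees)),
                      numeric(1)))
neg_loglike <- -loglike  # analogous to what return_neg = TRUE requests
```

Averaging is done on the likelihood scale before taking the log, which matches the definition of \(\bar{L}_K(C_m)\). For large clusters the individual tree likelihoods can underflow, so a log-sum-exp formulation may be preferable in practice.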