Summarize binary covariate clusters

summarize_binary_clusters(df, covariate_name = "x")

Arguments

df

data frame with the following columns

cluster_id

unique cluster ID

covariate_name

the column whose name is given by the covariate_name argument (for example, x), holding the binary (0/1) covariate to summarize over

covariate_name

name of the column in df that holds the single binary covariate; defaults to "x"

Value

a data frame with the following columns

freq

number of clusters that share the combination of cluster_size, x_pos, and x_neg given in the remaining columns

cluster_size

total size of the cluster

x_pos

number of individuals in the cluster with the feature of interest = 1

x_neg

number of individuals in the cluster with the feature of interest = 0

Details

Condense a data frame with one row per individual into a summary of the number of individuals in each cluster who have (1) or do not have (0) a particular binary covariate feature. This assumes the trees are in order by generation.
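
The condensing step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name summarize_binary_sketch is hypothetical, and sorting the result by cluster_size is a guess based on the printed example.

```r
# Hypothetical sketch of the condensing step; NOT the package's source code.
summarize_binary_sketch <- function(df, covariate_name = "x") {
  x <- df[[covariate_name]]

  # Per-cluster totals: size, and counts of 1s and 0s of the covariate
  per_cluster <- data.frame(
    cluster_size = as.integer(tapply(x, df$cluster_id, length)),
    x_pos = as.integer(tapply(x, df$cluster_id, function(v) sum(v == 1))),
    x_neg = as.integer(tapply(x, df$cluster_id, function(v) sum(v == 0)))
  )

  # Count how many clusters share each (cluster_size, x_pos, x_neg) combination
  per_cluster$freq <- 1L
  out <- aggregate(freq ~ cluster_size + x_pos + x_neg,
                   data = per_cluster, FUN = sum)

  # Assumed ordering and column layout, chosen to mirror the Value section
  out <- out[order(out$cluster_size, out$x_pos),
             c("freq", "cluster_size", "x_pos", "x_neg")]
  rownames(out) <- NULL
  out
}

# Clusters 1 and 2 have the same composition, so they collapse into one row
toy <- data.frame(cluster_id = c(1, 1, 2, 2, 3),
                  x = c(1, 0, 0, 1, 0))
res <- summarize_binary_sketch(toy)
print(res)
```

In this toy input, two clusters each contain one positive and one negative individual, so they are reported as a single row with freq equal to 2, while the singleton cluster gets its own row.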

Examples

example_cluster <- data.frame(cluster_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
                              x = c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0))
summarize_binary_clusters(example_cluster)
#> # A tibble: 4 x 4
#>    freq cluster_size x_pos x_neg
#>   <int>        <int> <int> <int>
#> 1     1            1     0     1
#> 2     1            2     0     2
#> 3     1            3     2     1
#> 4     1            4     3     1