Overview

This vignette explores tb_clean, a data set loaded with data(tb_clean) that was originally collected and analyzed by Xie et al. (2018).

Exploratory Data Analysis

The tb_clean data set has 389 rows and 8 columns. Each row corresponds to an individual with a detected case of tuberculosis (TB).

The columns include demographic characteristics of the patients as well as two diagnostic tests of the disease, including an AFB sputum smear test, which we analyze here.

tb_sub <- tb_clean %>%
  mutate(cluster_id = group) %>%
  group_by(cluster_id) %>%
  mutate(cluster_size = n()) %>%
  ungroup()

The data set has 158 clusters, with the following distribution of cluster sizes:

sizes <- tb_sub %>%
  group_by(cluster_id) %>%
  summarize(cluster_size = n()) %>%
  ungroup() %>%
  group_by(cluster_size) %>%
  summarize(freq = n())
  
sizes %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
cluster_size  freq
           1    75
           2    41
           3    20
           4     5
           5     4
           6     3
           7     3
           8     2
           9     1
          10     1
          16     1
          17     1
          25     1

This means that 47.5% of the clusters are singletons and 86.1% of the clusters have size at most 3.
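
These percentages can be recomputed from the frequency table. The sketch below re-enters the printed table by hand, so it does not depend on tb_clean being loaded, and uses only base R. The object name size_tab is illustrative and not part of the original analysis.

```r
# Cluster-size frequencies, copied by hand from the table above
size_tab <- data.frame(
  cluster_size = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 25),
  freq         = c(75, 41, 20, 5, 4, 3, 3, 2, 1, 1, 1, 1, 1)
)

# Totals implied by the table
n_clusters    <- sum(size_tab$freq)                          # number of clusters
n_individuals <- sum(size_tab$cluster_size * size_tab$freq)  # number of individuals

# Share of clusters that are singletons, and share with at most 3 members
pct_singleton <- 100 * size_tab$freq[size_tab$cluster_size == 1] / n_clusters
pct_at_most_3 <- 100 * sum(size_tab$freq[size_tab$cluster_size <= 3]) / n_clusters

round(c(singleton = pct_singleton, at_most_3 = pct_at_most_3), 1)
```

The two totals also serve as a consistency check: the frequencies sum to 158 clusters, and the sizes weighted by frequency sum to 389 individuals, matching the row count of the data.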

A single cluster of individuals looks like the following:

tb_sub %>%
  filter(group == "27") %>%
  arrange(rel_time) %>%
  select(group, rel_time, everything()) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
group  rel_time   sex     county          race                       spsmear   hivstatus  homeless  cluster_id  cluster_size
27     0 days     Male    MONTGOMERY      Black or African American  Positive  Negative   No        27          7
27     442 days   Female  MONTGOMERY      Black or African American  Positive  Negative   No        27          7
27     805 days   Female  MONTGOMERY      Black or African American  Positive  Negative   No        27          7
27     834 days   Female  MONTGOMERY      Black or African American  Positive  Negative   No        27          7
27     941 days   Male    PRINCE GEORGES  Black or African American  Positive  Negative   No        27          7
27     1015 days  Male    PRINCE GEORGES  Black or African American  Positive  Positive   No        27          7
27     1167 days  Male    PRINCE GEORGES  Black or African American  Positive  Negative   No        27          7

Of the 389 individuals in our data set, 56 (14%) are HIV-positive. Among the 172 individuals in clusters with more than 3 members, 37 (22%) are HIV-positive.
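
Counts like these could be computed along the following lines. This is a minimal sketch, not the original analysis code. It assumes HIV status is coded as the string "Positive" in the hivstatus column (as in the cluster printed above) and that a cluster_size column is present, as in tb_sub. The helper hiv_share() and the toy data frame are hypothetical and exist only so the example runs on its own. Base R is used here instead of the dplyr pipeline.

```r
# Hypothetical helper: number and percentage of HIV-positive individuals
# among rows whose cluster has more than `min_size` members.
# Rows with a missing hivstatus stay in the denominator but are never
# counted as positive.
hiv_share <- function(df, min_size = 0) {
  keep  <- df[df$cluster_size > min_size, ]
  n_pos <- sum(keep$hivstatus == "Positive", na.rm = TRUE)
  data.frame(n = nrow(keep), n_pos = n_pos, pct = 100 * n_pos / nrow(keep))
}

# Toy data for illustration only (not values from tb_clean)
toy <- data.frame(
  cluster_size = c(1, 2, 2, 4, 4, 4, 4),
  hivstatus    = c("Negative", "Positive", "Negative",
                   "Positive", "Positive", "Negative", "Negative")
)

hiv_share(toy)                # all individuals
hiv_share(toy, min_size = 3)  # clusters with more than 3 individuals
```

Under the same assumptions, the corresponding calls on the real data would be hiv_share(tb_sub) and hiv_share(tb_sub, min_size = 3).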