In this vignette, we explore the data set provided here in data(tb_clean), which was originally collected and analyzed in Xie et al. (2018). The data set tb_clean has 389 rows and 8 columns, where each row corresponds to an individual with a detected case of tuberculosis (TB). The columns include demographic characteristics of the patients as well as two diagnostic tests of the disease, including an AFB sputum smear test, which we analyze here.
# Copy the group label to cluster_id and attach each cluster's size to every row
tb_sub <- tb_clean %>%
  mutate(cluster_id = group) %>%
  group_by(cluster_id) %>%
  mutate(cluster_size = n()) %>%
  ungroup()
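To make the effect of this step concrete, here is a minimal sketch on a toy data frame. The labels are hypothetical and not taken from tb_clean, and it uses base R's ave() as an analogue of the grouped mutate() rather than the dplyr calls themselves. Every row receives the size of the cluster it belongs to.

```r
# Toy group labels (hypothetical, not taken from tb_clean)
toy <- data.frame(group = c("a", "b", "a", "c", "a", "b"))

# Base R analogue of group_by(cluster_id) %>% mutate(cluster_size = n()):
# ave() applies length() within each group and returns one value per row.
toy$cluster_id   <- toy$group
toy$cluster_size <- ave(seq_along(toy$cluster_id), toy$cluster_id, FUN = length)

toy
```

Rows labelled "a" get a cluster_size of 3, rows labelled "b" get 2, and the single "c" row gets 1.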
The data contain 158 clusters, with the following distribution of cluster sizes:
# Step 1: one row per cluster with its size
# Step 2: count how many clusters share each size
sizes <- tb_sub %>%
  group_by(cluster_id) %>%
  summarize(cluster_size = n()) %>%
  ungroup() %>%
  group_by(cluster_size) %>%
  summarize(freq = n())

# Render the frequency table
sizes %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
cluster_size | freq |
---|---|
1 | 75 |
2 | 41 |
3 | 20 |
4 | 5 |
5 | 4 |
6 | 3 |
7 | 3 |
8 | 2 |
9 | 1 |
10 | 1 |
16 | 1 |
17 | 1 |
25 | 1 |
This means that 47.5% of the clusters are singletons and 86.1% of the clusters have size at most 3.
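As a quick check, these two percentages can be recomputed from the frequency table alone. The sketch below re-enters the table by hand in base R instead of using tb_sub. The object name size_freq is ours for illustration and is not part of the vignette.

```r
# Frequency table of cluster sizes, re-entered by hand from the table above
# (size_freq is an illustrative name, not an object from the vignette).
size_freq <- data.frame(
  cluster_size = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 25),
  freq         = c(75, 41, 20, 5, 4, 3, 3, 2, 1, 1, 1, 1, 1)
)

# Total number of clusters
n_clusters <- sum(size_freq$freq)

# Percentage of clusters that are singletons
pct_singleton <- 100 * sum(size_freq$freq[size_freq$cluster_size == 1]) / n_clusters

# Percentage of clusters with at most 3 individuals
pct_at_most_3 <- 100 * sum(size_freq$freq[size_freq$cluster_size <= 3]) / n_clusters

round(c(singleton = pct_singleton, at_most_3 = pct_at_most_3), 1)
```

The singleton share is 75 of 158 clusters, and the share of clusters with at most 3 individuals is 136 of 158.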
A single cluster of individuals looks like the following:
# Show all members of cluster 27, ordered by relative detection time
tb_sub %>%
  filter(group == "27") %>%
  arrange(rel_time) %>%
  select(group, rel_time, everything()) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
group | rel_time | sex | county | race | spsmear | hivstatus | homeless | cluster_id | cluster_size |
---|---|---|---|---|---|---|---|---|---|
27 | 0 days | Male | MONTGOMERY | Black or African American | Positive | Negative | No | 27 | 7 |
27 | 442 days | Female | MONTGOMERY | Black or African American | Positive | Negative | No | 27 | 7 |
27 | 805 days | Female | MONTGOMERY | Black or African American | Positive | Negative | No | 27 | 7 |
27 | 834 days | Female | MONTGOMERY | Black or African American | Positive | Negative | No | 27 | 7 |
27 | 941 days | Male | PRINCE GEORGES | Black or African American | Positive | Negative | No | 27 | 7 |
27 | 1015 days | Male | PRINCE GEORGES | Black or African American | Positive | Positive | No | 27 | 7 |
27 | 1167 days | Male | PRINCE GEORGES | Black or African American | Positive | Negative | No | 27 | 7 |
Of the 389 individuals in our data set, 56 (14%) are HIV+. Among the 172 individuals in clusters with more than 3 individuals, 37 (22%) are HIV+.
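The denominators and percentages quoted here can be checked with a few lines of arithmetic. This sketch re-enters the cluster size table by hand and takes the HIV+ counts (56 and 37) directly from the text. It does not recompute them from tb_sub, and all object names are ours for illustration.

```r
# Cluster size table, re-entered by hand from the frequency table earlier
cluster_size <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 25)
freq         <- c(75, 41, 20, 5, 4, 3, 3, 2, 1, 1, 1, 1, 1)

# Individuals per size class = cluster size times number of such clusters
n_per_size <- cluster_size * freq

n_total <- sum(n_per_size)                    # all individuals
n_large <- sum(n_per_size[cluster_size > 3])  # individuals in clusters of size > 3

# HIV+ counts as stated in the text (assumed here, not derived from the data)
hiv_total <- 56
hiv_large <- 37

round(100 * c(overall = hiv_total / n_total, large_clusters = hiv_large / n_large))
```

The two denominators come out to 389 and 172, matching the figures in the text.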