R/agent-to-aggregate.R
agents_to_aggregate.Rd
This function converts data on an agent-based level (1 row = 1 agent) relative to when an agent is in each state and aggregates it, so that the user can know how many agents are in each state at a given (integer-based) time point. This function can take standard `data.frame`s and `grouped_df` `data.frame`s (from `dplyr`). For the latter, this function aggregates within grouping parameters and also provides the columns associated with the grouping.
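As a rough illustration of what this aggregation computes, here is a hand-rolled sketch in base R (not the package implementation) for a toy SIR setting with state-entry columns `tI` and `tR`, as in the examples below:

```r
# Toy SIR agent data: one row per agent, with the times each agent
# entered the infected (tI) and recovered (tR) states.
agents <- data.frame(tI = c(1, 2, 2, NA),  # NA: never infected
                     tR = c(3, 4, 5, NA))

# Hand-rolled aggregation over integer times 0..5: an agent is in
# X0 (susceptible) before tI, X1 (infected) in [tI, tR), and X2
# (recovered) from tR onward. NA means the state was never entered.
counts <- t(sapply(0:5, function(tt) {
  x1 <- !is.na(agents$tI) & agents$tI <= tt &
        (is.na(agents$tR) | agents$tR > tt)
  x2 <- !is.na(agents$tR) & agents$tR <= tt
  x0 <- !(x1 | x2)
  c(t = tt, X0 = sum(x0), X1 = sum(x1), X2 = sum(x2))
}))
counts
```

Each output row sums to the total number of agents (here 4), since every agent is in exactly one state at every time point.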
```r
agents_to_aggregate(
  agents,
  states,
  death = NULL,
  birth = NULL,
  min_max_time = c(0, NA),
  integer_time_expansion = TRUE
)
```
| Argument | Description |
|---|---|
| `agents` | data frame style object (currently either of class `data.frame` or `grouped_df`) |
| `states` | time entered state. Do not include a column for the original state. These need to be ordered; for example, for an SIR model, columns `"tI"` and `"tR"` |
| `death` | string for column with death time information (default `NULL`) |
| `birth` | string for column with birth time information (default `NULL`) |
| `min_max_time` | vector (length 2) of minimum and maximum integer time; the second value can be `NA` |
| `integer_time_expansion` | boolean, whether every integer time point in the range of `min_max_time` should be included in the output (default `TRUE`) |
dataset with aggregated information, where we label classes `X{i}` for `i` in `0:(length(states))`. Potentially calculated per group of a `grouped_df` (and retains grouping columns).
D.1. What each column should have (NAs, orderings, births & deaths,...)
The parameters `states`, `death`, `birth`, and `min_max_time` provide the user with the flexibility to capture any potential structure related to an agent's progression through the epidemic (and life).
As mentioned in the `states` parameter details, we expect a set of column names `X1`, `X2`, ..., `XK` that contain information on when an individual enters each state. Also mentioned in the parameter details is that the function assumes that each agent is in the initial state `X0` until `X1` (except if `min_max_time[1] >= X1`, which means the agent starts out at state `X1`).
This function expects transitions in an ordered fashion, i.e. `X(I+1) >= X(I)`, but does allow agents to jump states. A jump can be recorded either with a value at the jumped state equal to that of the next non-jumped state, or with an `NA` (the authors of this package believe the latter is a cleaner approach, and it matches the expectation in `birth` and `death`).
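For example (a hypothetical agent in the SIR setting who jumps straight from `X0` to `X2` at time 4), both encodings below describe the same trajectory; the `state_at` helper here is a hand-rolled check, not part of the package:

```r
# Option 1: value at the jumped state equals the next non-jumped state.
jump_same <- data.frame(tI = 4, tR = 4)
# Option 2 (preferred by the package authors): NA at the jumped state.
jump_na <- data.frame(tI = NA, tR = 4)

# Hand-rolled state lookup at integer time tt, using the same rule as
# above: X2 from tR onward, X1 from tI onward, X0 before that.
state_at <- function(tI, tR, tt) {
  if (!is.na(tR) && tR <= tt) return("X2")
  if (!is.na(tI) && tI <= tt) return("X1")
  "X0"
}

states_same <- sapply(0:6, function(tt) state_at(jump_same$tI, jump_same$tR, tt))
states_na   <- sapply(0:6, function(tt) state_at(jump_na$tI, jump_na$tR, tt))
identical(states_same, states_na)  # the two encodings agree at every time
```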
Specifically, `birth` and `death` can contain `NA` values, which the function interprets as an individual not being born (or dying, respectively) in the given time interval.
The time interval (defined by `min_max_time`) can be moved, which abstractly just shifts the rows (or time points) the user gets at the end.
D.2. Changing time points
Beyond defining the time interval with `min_max_time`, if a user wishes to have finer (smaller) time steps than integers, we recommend they simply multiply all values by \(1/s\), where \(s\) is the length of the desired time step. A transformation of the output's `t` column by \(s\) would get the time back to the standard scale.
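A minimal arithmetic sketch of this rescaling, using hypothetical state-entry times at quarter-unit resolution:

```r
# Desired time step of s = 0.25 instead of whole integers.
s <- 0.25
event_times <- c(1.25, 2.0, 2.75)  # hypothetical state-entry times

# Rescale before aggregating: multiply all times by 1/s so the
# desired steps land on the integer grid the function uses.
scaled <- event_times * (1 / s)
scaled
#> [1] 5 8 11

# After aggregating on the integer grid, multiply the output's t
# column by s to return to the original time scale:
t_integer  <- 0:12            # the t column the aggregation would return
t_original <- t_integer * s   # back on the original scale, steps of 0.25
```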
```r
agents <- EpiCompare::hagelloch_raw

# making babies
set.seed(5)
babies <- sample(nrow(agents), size = 5)
agents$tBIRTH <- NA
agents$tBIRTH[babies] <- agents$tI[babies] - 5

aggregate_b <- agents_to_aggregate(agents,
                                   states = c(tI, tR),
                                   death = NULL,
                                   birth = tBIRTH)

# looking at when babies were born:
agents %>%
  dplyr::filter(!is.na(.data$tBIRTH)) %>%
  dplyr::pull(.data$tBIRTH) %>%
  ceiling() %>%
  sort()
#> [1] 23 26 29 29 32

# vs: the total number of tracked agents grows as each birth
# enters the population (output truncated)
data.frame(counts = 1:nrow(aggregate_b),
           num_people = aggregate_b %>% select(-t) %>% apply(1, sum))
#>    counts num_people
#> 1       1        183
#> ...
#> 23     23        183
#> 24     24        184
#> ...
#> 33     33        188
#> ...
#> 94     94        188

# including death
aggregate_d <- agents_to_aggregate(agents,
                                   states = c(tI, tR),
                                   death = tDEAD,
                                   birth = NULL)

# looking at when people died:
agents %>%
  dplyr::filter(!is.na(.data$tDEAD)) %>%
  dplyr::pull(.data$tDEAD) %>%
  ceiling() %>%
  sort()
#> [1] 20 41 44 44 44 45 46 47 47 47 49 60

# vs: the total number of tracked agents drops as deaths occur
# (output truncated)
data.frame(counts = 1:nrow(aggregate_d),
           num_people = aggregate_d %>% select(-t) %>% apply(1, sum))
#>    counts num_people
#> 1       1        188
#> ...
#> 20     20        188
#> 21     21        187
#> ...
#> 94     94        176

###
# for grouped_df objects (agents_to_aggregate.grouped_df)
###
max_time <- 100
agents_g <- hagelloch_raw %>%
  filter(SEX %in% c("female", "male")) %>%
  group_by(SEX)
sir_group <- agents_to_aggregate(agents_g,
                                 states = c(tI, tR),
                                 min_max_time = c(0, max_time))

agents <- agents_g %>%
  filter(SEX == "female") %>%
  ungroup()
sir_group1 <- agents_to_aggregate(agents,
                                  states = c(tI, tR),
                                  min_max_time = c(0, max_time))
sir_group_1 <- sir_group %>% filter(SEX == "female")

assertthat::are_equal(sir_group1,
                      sir_group_1 %>% ungroup() %>% select(t, X0, X1, X2))
#> [1] TRUE
```