Aggregate Datasets With Several Kinds of Missing Values
Aggregate datasets with constraints on missing values
Usage
aggregateData(dat, subunits, units, aggregatemissings = NULL,
rename = FALSE, recodedData = TRUE, suppressErr = FALSE,
recodeErr = "mci", verbose = FALSE)
Arguments
- dat
A data frame containing the data to be aggregated.
- subunits
A data frame with subunit information. See ‘Details’.
- units
A data frame with unit information. See ‘Details’.
- aggregatemissings
Optional. A symmetric n x n matrix specifying how combinations of missing values are aggregated. If no matrix is given (NULL, the default), a default matrix is used. See ‘Examples’.
- rename
Logical indicating whether units with only one subunit should be renamed to their unit name. Default is FALSE.
- recodedData
Logical indicating whether the column names of dat are the original subunit names (as in subunits$subunit) or the recoded subunit names (as in subunits$subunitRecoded). Default is TRUE, meaning that the column names are recoded subunit names.
- suppressErr
Logical indicating whether aggregated cells coded err (see ‘Details’) should be recoded to another value. Default is FALSE.
- recodeErr
Character vector of length 1 giving the value to which err is recoded. This argument is only evaluated when suppressErr = TRUE. Default is "mci".
- verbose
Logical. If TRUE, additional information is printed.
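The effect of rename = TRUE can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper renameSingleSubunits and the toy tables are hypothetical, and the column name unit in the subunit table is an assumption (only subunit and subunitRecoded are documented above).

```r
# Toy subunit table. ASSUMPTION: the unit membership column is called 'unit'.
subunits <- data.frame(
  unit           = c("I11", "I12", "I12", "I13"),
  subunitRecoded = c("I11R", "I12aR", "I12bR", "I13R"),
  stringsAsFactors = FALSE
)
dat <- data.frame(I11R = c(1, 0), I12aR = c(1, 1), I12bR = c(0, 1), I13R = c(1, 1))

# Hypothetical helper: give columns of single-subunit units their unit name
renameSingleSubunits <- function(dat, subunits) {
  counts <- table(subunits$unit)                 # number of subunits per unit
  singleUnits <- names(counts)[counts == 1]      # units with exactly one subunit
  lookup <- subunits[subunits$unit %in% singleUnits, ]
  idx <- match(lookup$subunitRecoded, names(dat))  # their columns in 'dat'
  names(dat)[idx[!is.na(idx)]] <- lookup$unit[!is.na(idx)]
  dat
}

renamed <- renameSingleSubunits(dat, subunits)
names(renamed)
```

Here I11 and I13 each have a single subunit, so their columns take the unit name, while the two subunits of I12 keep their names and remain subject to aggregation.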
Details
aggregateData aggregates units in data frames with special consideration of missing values. How missing values are aggregated is specified in the argument aggregatemissings. The row and column names of this n x n matrix correspond to the missing codes in the data (see collapseMissings for supported missing values). In addition, the codes vc (valid code) and err (error) are used. If aggregatemissings is a data frame, it is coerced to a matrix, with the first column of the data frame becoming the row names of the matrix. A warning is given if the matrix is not symmetric.
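The coercion and symmetry check described above might look roughly like this in base R. It is a sketch under assumptions, not the package's code: the helper asMissingMatrix and its warning text are hypothetical, and "symmetric" is taken to mean that the matrix is square and equal to its transpose.

```r
# Hypothetical helper: coerce a data frame to a missing-aggregation matrix
asMissingMatrix <- function(x) {
  if (is.data.frame(x)) {
    rn <- as.character(x[[1]])               # first column holds the row names
    x <- as.matrix(x[, -1, drop = FALSE])    # remaining columns form the matrix
    rownames(x) <- rn
  }
  # Warn (but do not stop) if the matrix is not square and equal to its transpose
  if (nrow(x) != ncol(x) || !all(x == t(x))) {
    warning("'aggregatemissings' is not symmetrical.", call. = FALSE)
  }
  x
}

amDf <- data.frame(
  code = c("vc", "mbi", "mbd"),
  vc   = c("vc", "mbi", "err"),
  mbi  = c("mbi", "mbi", "err"),
  mbd  = c("err", "err", "mbd"),
  stringsAsFactors = FALSE
)
am <- asMissingMatrix(amDf)
am["mbi", "vc"]
```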
aggregateData combines the subunits one by one: it aggregates the first two subunits of a unit, then adds the third subunit to the newly aggregated variable, and continues in this manner until all subunits are aggregated. In each step, a value of the first variable (i.e., the first subunit or the aggregated variable from the previous step) is matched with the row names of aggregatemissings, and the corresponding value of the second variable (i.e., the next subunit to be aggregated) is matched with the column names of aggregatemissings. The new value of the aggregated variable is therefore aggregatemissings[firstVar, secondVar].
If the value in the final aggregated variable is vc, either the mean or the sum of the subunits is calculated. The rule given in units$unitAggregateRule determines which one is chosen; SUM is the default if column units$unitAggregateRule is empty.
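The stepwise matching described above is a left fold over the subunits of one unit, which base R expresses with Reduce(). The sketch below is illustrative only: aggregateUnit is a hypothetical helper working on a single row, the small matrix is an excerpt of the one in ‘Examples’, and two points are assumptions: any cell that is not a missing code counts as vc, and SUM or MEAN is taken over the cells coded vc only.

```r
codeNames <- c("vc", "mnr", "mbd", "mbi", "err")
am <- matrix(c(
  "vc" , "vc" , "err", "mbi", "err",
  "vc" , "mnr", "err", "mnr", "err",
  "err", "err", "mbd", "err", "err",
  "mbi", "mnr", "err", "mbi", "err",
  "err", "err", "err", "err", "err"),
  nrow = 5, byrow = TRUE, dimnames = list(codeNames, codeNames))

# Hypothetical helper: aggregate the subunit cells of one unit for one person
aggregateUnit <- function(cells, am, rule = c("SUM", "MEAN")) {
  rule <- match.arg(rule)
  cells <- as.character(cells)
  missingCodes <- setdiff(rownames(am), "vc")
  # ASSUMPTION: every cell that is not a missing code counts as a valid code
  codes <- ifelse(cells %in% missingCodes, cells, "vc")
  # Left fold: am[result of previous step, next subunit]
  result <- Reduce(function(first, second) am[first, second], codes)
  if (result != "vc") return(result)
  # ASSUMPTION: sum or mean over the valid cells only
  valid <- as.numeric(cells[codes == "vc"])
  as.character(if (rule == "SUM") sum(valid) else mean(valid))
}

aggregateUnit(c("1", "0", "1"), am)
aggregateUnit(c("1", "mbi", "0"), am)
aggregateUnit(c("mbd", "mbi"), am)
```

Because the lookup result of each step becomes the row index of the next, the order of subunits only matters if the matrix is not symmetric, which is one reason a symmetric matrix is expected.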
The user can mark combinations of missing values that cannot occur together in one unit by setting the respective cell in aggregatemissings to err. For example, it is unlikely that one subunit was not administered (missing by design, mbd) while another subunit of the same unit was intentionally left blank by the person working on the test booklet (missing by intention, mbi). By default, this combination therefore produces an error (err) in the aggregated variable. If the aggregation produces err at any point, a warning is given. Cells coded err can be recoded to a different value via the arguments suppressErr and recodeErr.
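The warning and the optional recoding of err might be sketched as follows. The helper handleErr and its message are hypothetical; the sketch only mirrors the documented behavior of warning whenever err occurs and recoding it to recodeErr when suppressErr = TRUE.

```r
# Hypothetical helper: warn about 'err' cells and optionally recode them
handleErr <- function(aggregated, suppressErr = FALSE, recodeErr = "mci") {
  isErr <- !is.na(aggregated) & aggregated == "err"
  if (any(isErr)) {
    warning(sum(isErr), " aggregated cell(s) coded 'err'.", call. = FALSE)
    if (suppressErr) aggregated[isErr] <- recodeErr  # recode only when requested
  }
  aggregated
}

handleErr(c("1", "err", "mbi"), suppressErr = TRUE, recodeErr = "mci")
```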
Example data frames for subunits and units can be found via data(inputList).
Warning
Missing values are only aggregated correctly if their codes correspond to the codes in aggregatemissings. aggregateData checks neither value types nor whether codes are valid, so using checkData and recodeData before aggregateData is strongly recommended.
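As a quick complement (not a replacement) to checkData, a base-R check can list codes in the data that aggregatemissings does not know about. The helper findUnknownCodes is hypothetical, and it assumes that valid values are numeric while missing codes are non-numeric strings.

```r
# Hypothetical helper: non-numeric codes in 'dat' that are not in 'knownCodes'
findUnknownCodes <- function(dat, knownCodes) {
  cells <- unique(unlist(lapply(dat, as.character), use.names = FALSE))
  cells <- cells[!is.na(cells)]
  # ASSUMPTION: valid values parse as numbers, missing codes do not
  isNumeric <- !is.na(suppressWarnings(as.numeric(cells)))
  setdiff(cells[!isNumeric], knownCodes)
}

dat <- data.frame(a = c("1", "mbi"), b = c("0", "mnr"), c = c("xyz", "1"),
                  stringsAsFactors = FALSE)
findUnknownCodes(dat, c("vc", "mnr", "mbd", "mbi", "err"))
```

In practice, knownCodes would be rownames(aggregatemissings).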
Examples
data(inputDat)
data(inputList)
dat1 <- inputDat[[1]] # get first dataset from inputDat
# recode data
datRec <- recodeData(dat1, inputList$values, inputList$subunits)
# define matrix for missing aggregation:
# combination of valid code and missing by intention produces missing by intention
# --> if any subunit is missing by intention,
# the aggregated unit is coded missing by intention
am <- matrix(c(
"vc" , "mvi", "vc" , "mci", "err", "vc" , "mbi", "err",
"mvi", "mvi", "err", "mci", "err", "err", "err", "err",
"vc" , "err", "mnr", "mci", "err", "mir", "mnr", "err",
"mci", "mci", "mci", "mci", "err", "mci", "mci", "err",
"err", "err", "err", "err", "mbd", "err", "err", "err",
"vc" , "err", "mir", "mci", "err", "mir", "mir", "err",
"mbi", "err", "mnr", "mci", "err", "mir", "mbi", "err",
"err", "err", "err", "err", "err", "err", "err", "err" ),
nrow = 8, ncol = 8, byrow = TRUE)
dimnames(am) <-
list(c("vc" ,"mvi", "mnr", "mci", "mbd", "mir", "mbi", "err"),
c("vc" ,"mvi", "mnr", "mci", "mbd", "mir", "mbi", "err"))
print(am)
#> vc mvi mnr mci mbd mir mbi err
#> vc "vc" "mvi" "vc" "mci" "err" "vc" "mbi" "err"
#> mvi "mvi" "mvi" "err" "mci" "err" "err" "err" "err"
#> mnr "vc" "err" "mnr" "mci" "err" "mir" "mnr" "err"
#> mci "mci" "mci" "mci" "mci" "err" "mci" "mci" "err"
#> mbd "err" "err" "err" "err" "mbd" "err" "err" "err"
#> mir "vc" "err" "mir" "mci" "err" "mir" "mir" "err"
#> mbi "mbi" "err" "mnr" "mci" "err" "mir" "mbi" "err"
#> err "err" "err" "err" "err" "err" "err" "err" "err"
datAggr <- aggregateData(datRec, inputList$subunits, inputList$units,
aggregatemissings = am, rename = TRUE, recodedData = TRUE,
suppressErr = TRUE, recodeErr = "mci", verbose = TRUE)
#> All aggregation rules will be defaulted to 'SUM', because no other type is currently supported.
#> Found 20 unit(s) with only one subunit in 'dat'. This/these subunit(s) will not be aggregated and renamed to their respective unit name(s).
#> 1 units were aggregated: I12.