Aggregate datasets with constraints on missing values

Usage

aggregateData(dat, subunits, units, aggregatemissings = NULL,
    rename = FALSE, recodedData = TRUE, suppressErr = FALSE,
    recodeErr = "mci", verbose = FALSE)

Arguments

dat

A data frame containing the data to be aggregated.

subunits

A data frame with subunit information. See ‘Details’.

units

A data frame with unit information. See ‘Details’.

aggregatemissings

Optional: A symmetrical n x n matrix with information on how missing values should be aggregated. If no matrix is given, the default will be used. See 'Examples'.

rename

Logical indicating whether units with only one subunit should be renamed to their unit name. Default is FALSE.

recodedData

Logical indicating whether colnames in dat are the subunit names (as in subunits$subunit) or recoded subunit names (as in subunits$subunitRecoded). Default is TRUE, meaning that colnames are recoded subunit names.

suppressErr

Logical indicating whether aggregated cells with err (see ‘Details’) should be recoded to another value. Default is FALSE.

recodeErr

Character vector of length 1 giving the value to which err should be recoded. This argument is only evaluated when suppressErr = TRUE. Default is "mci".

verbose

Logical. If TRUE, additional information is printed. Default is FALSE.

Details

aggregateData aggregates units in data frames with special consideration of missing values. How missing values are aggregated is specified in the argument aggregatemissings. The rownames and colnames of this n x n matrix correspond to the missing codes in the data (see collapseMissings for supported missing values). Additionally, the values vc (for valid code) and err (for error) are used. If aggregatemissings is a data frame, it will be coerced to a matrix, with the first column of the data frame becoming the rownames of the matrix. A warning will be given if the matrix is not symmetrical.
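The coercion described above can be sketched in base R. This is only an illustration under assumptions, not the package's internal code: the 3 x 3 rule table and the names amDf, amMat, and symmetric are hypothetical.

```r
# Hypothetical 3 x 3 rule table stored as a data frame.
# The first column holds the codes that become the rownames.
amDf <- data.frame(
  code = c("vc", "mbi", "err"),
  vc   = c("vc", "mbi", "err"),
  mbi  = c("mbi", "mbi", "err"),
  err  = c("err", "err", "err"),
  stringsAsFactors = FALSE
)

# Drop the first column, convert the rest to a character matrix,
# and use the first column as rownames.
amMat <- as.matrix(amDf[, -1])
rownames(amMat) <- amDf[[1]]

# A symmetric rule matrix yields the same result regardless of
# the order in which two subunits are combined.
symmetric <- all(amMat == t(amMat)) &&
  identical(rownames(amMat), colnames(amMat))
symmetric
```

A check like this before calling aggregateData may help catch an asymmetric rule table early, rather than relying on the warning.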

aggregateData combines the subunits one by one, i.e. it aggregates the first two subunits of a unit, then adds the third subunit to the new aggregated variable, and continues in this manner until all subunits are aggregated. In every step of the process, a value of the first variable (e.g., the aggregated variable from the previous step) is matched with the rownames of aggregatemissings, and the corresponding value of the second variable (e.g., the next subunit to be aggregated) is matched with the colnames of aggregatemissings. The new value of the aggregated variable is therefore the value in aggregatemissings[firstVar, secondVar].

If the value in the final aggregated variable is vc, either the mean or the sum of the subunits will be calculated. The rule given in units$unitAggregateRule determines which one will be chosen, with SUM being the default if column units$unitAggregateRule is empty.
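The stepwise lookup can be illustrated with a small base R sketch. It assumes a toy rule matrix and toy data; the names rules, sub, toType, and combineTypes are hypothetical, and this is not how the package itself is implemented.

```r
# Toy rule matrix (hypothetical, much smaller than a realistic one).
rules <- matrix(
  c("vc",  "mbi", "err",
    "mbi", "mbi", "err",
    "err", "err", "err"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("vc", "mbi", "err"), c("vc", "mbi", "err"))
)

# Three recoded subunits of one unit: scores as strings, or a missing code.
sub <- data.frame(
  a = c("1", "1",   "mbi"),
  b = c("0", "mbi", "mbi"),
  c = c("1", "1",   "mbi"),
  stringsAsFactors = FALSE
)

# Map every value that is not a known code to "vc" so it can be looked up.
toType <- function(x) ifelse(x %in% rownames(rules), x, "vc")

# Element-wise lookup: first variable selects the row, second the column.
combineTypes <- function(first, second) rules[cbind(first, second)]

# Fold over the subunits from left to right.
types <- Reduce(combineTypes, lapply(sub, toType))

# Where the final type is "vc", sum the scores (SUM rule);
# otherwise keep the aggregated missing code.
scores <- rowSums(sapply(sub, function(x) suppressWarnings(as.numeric(x))))
aggregated <- ifelse(types == "vc", as.character(scores), types)
aggregated
```

In this toy case the first row has only valid codes and is summed, while the other two rows end up as mbi because the rule matrix maps any combination involving mbi and a valid code to mbi.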

The user can specify combinations of missing values that cannot occur simultaneously in one unit by setting the respective cell in aggregatemissings to err. For example, it is unlikely that one subunit was not administered (missing by design, mbd) while another subunit of the same unit was intentionally left blank by the person working on the test booklet (missing by intention, mbi). By default, this combination of missing values therefore produces an error (err) in the aggregated variable. If the aggregation produces err at any point, a warning is issued. Values of err can be recoded to a different value by specifying the arguments suppressErr and recodeErr.
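The effect of suppressErr and recodeErr on an already aggregated variable can be sketched as follows. The helper handleErr is hypothetical and only mirrors the documented behaviour; it does not reproduce the warning and is not part of the package.

```r
# Hypothetical helper: recode "err" only when suppressErr is TRUE.
handleErr <- function(x, suppressErr = FALSE, recodeErr = "mci") {
  if (suppressErr) {
    x[x %in% "err"] <- recodeErr
  }
  x
}

aggregated <- c("2", "err", "mbi")

kept    <- handleErr(aggregated)                      # err stays in the data
recoded <- handleErr(aggregated, suppressErr = TRUE)  # err becomes "mci"
recoded
```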

Examples of data frames subunits and units can be found via data(inputList).

Value

A data frame with aggregated units and, if rename = TRUE, renamed subunits.

Author

Nicole Mahler, Karoline Sachse, Anna Lenski

Warning

Missings are only correctly aggregated if their values correspond to the values in aggregatemissings. aggregateData does not check for value types or whether codes are valid. Use of checkData and recodeData before using aggregateData is therefore strongly recommended.
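As a rough complement to checkData and recodeData, one could look for values that are neither scores nor known codes before aggregating. The sketch below assumes that valid codes in the recoded data are numeric-like strings, which may not hold for every dataset; the toy data and the names knownCodes, vals, and unknown are hypothetical.

```r
# Codes used as dimnames of the rule matrix in the example on this page.
knownCodes <- c("vc", "mvi", "mnr", "mci", "mbd", "mir", "mbi", "err")

# Toy recoded data with one mistyped missing code ("mbo").
dat <- data.frame(
  I1a = c("1", "0",   "mbi"),
  I1b = c("1", "mbo", "mnr"),
  stringsAsFactors = FALSE
)

# Collect all distinct values across columns.
vals <- unique(unlist(dat, use.names = FALSE))

# Assumption: anything that converts to a number is a valid score.
isScore <- !is.na(suppressWarnings(as.numeric(vals)))

# Remaining values should all be known missing codes.
unknown <- setdiff(vals[!isScore], knownCodes)
unknown
```

Any value reported here would not match a row or column of aggregatemissings and should be fixed before calling aggregateData.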

Examples

data(inputDat)
data(inputList)

dat1 <- inputDat[[1]]  # get first dataset from inputDat

# recode data
datRec <- recodeData(dat1, inputList$values, inputList$subunits)

# define matrix for missing aggregation:
# combination of valid code and missing by intention produces missing by intention
# --> if any subunit is missing by intention,
# the aggregated unit is coded missing by intention

am <- matrix(c(
  "vc" , "mvi", "vc" , "mci", "err", "vc" , "mbi", "err",
  "mvi", "mvi", "err", "mci", "err", "err", "err", "err",
  "vc" , "err", "mnr", "mci", "err", "mir", "mnr", "err",
  "mci", "mci", "mci", "mci", "err", "mci", "mci", "err",
  "err", "err", "err", "err", "mbd", "err", "err", "err",
  "vc" , "err", "mir", "mci", "err", "mir", "mir", "err",
  "mbi", "err", "mnr", "mci", "err", "mir", "mbi", "err",
  "err", "err", "err", "err", "err", "err", "err", "err" ),
  nrow = 8, ncol = 8, byrow = TRUE)

dimnames(am) <-
  list(c("vc" ,"mvi", "mnr", "mci",  "mbd", "mir", "mbi", "err"),
       c("vc" ,"mvi", "mnr", "mci",  "mbd", "mir", "mbi", "err"))

print(am)
#>     vc    mvi   mnr   mci   mbd   mir   mbi   err  
#> vc  "vc"  "mvi" "vc"  "mci" "err" "vc"  "mbi" "err"
#> mvi "mvi" "mvi" "err" "mci" "err" "err" "err" "err"
#> mnr "vc"  "err" "mnr" "mci" "err" "mir" "mnr" "err"
#> mci "mci" "mci" "mci" "mci" "err" "mci" "mci" "err"
#> mbd "err" "err" "err" "err" "mbd" "err" "err" "err"
#> mir "vc"  "err" "mir" "mci" "err" "mir" "mir" "err"
#> mbi "mbi" "err" "mnr" "mci" "err" "mir" "mbi" "err"
#> err "err" "err" "err" "err" "err" "err" "err" "err"

datAggr <- aggregateData(datRec, inputList$subunits, inputList$units,
    aggregatemissings = am, rename = TRUE, recodedData = TRUE,
    suppressErr = TRUE, recodeErr = "mci", verbose = TRUE)
#> All aggregation rules will be defaulted to 'SUM', because no other type is currently supported.
#> Found 20 unit(s) with only one subunit in 'dat'. This/these subunit(s) will not be aggregated and renamed to their respective unit name(s).
#> 1 units were aggregated: I12.