
Automate Data Preparation using Functions from Package eatPrep
automateDataPreparation.RdThis function facilitates automated data preparation and wraps most functions from the eatPrep package.
Usage
automateDataPreparation(datList = NULL, inputList, path = NULL,
readSpss, checkData, mergeData, recodeData, recodeMnr = FALSE,
aggregateData, scoreData, writeSpss, collapseMissings = FALSE,
filedat = "mydata.txt", filesps = "readmydata.sps", breaks=NULL,
nMbi = 2, rotation.id = NULL, suppressErr = FALSE, recodeErr = "mci",
aggregatemissings = NULL, rename = TRUE, recodedData = TRUE,
addLeadingZeros=FALSE, truncateSpaceChar = TRUE,
newID = NULL, oldIDs = NULL, addMbd = TRUE, overwriteMbdSilently=TRUE,
missing.rule = list(mvi = 0, mnr = 0, mci = NA, mbd = NA, mir = 0, mbi = 0),
verbose=FALSE)Arguments
- datList
A list of data frames (see
data(inputDat)). IfNULL,readSPSShas to beTRUE. In this case, the function attempts to read SPSS .sav files.- inputList
A list of data frames containing neccessary information for data preparaton (see
data(inputList)for details).- path
A character vector containing the path required by for
writeSpss. Default is the current R working directory.- readSpss
Logical: If
TRUE, the functionreadSpsswill be called.- checkData
Logical: If
TRUE, the functioncheckDatawill be called.- mergeData
Logical: If
TRUE, the functionmergeDatawill be called.- recodeData
Logical: If
TRUE, the functionrecodeDatawill be called.- recodeMnr
Logical: If
TRUE, the functionmnrCodingwill be called.- aggregateData
Logical: If
TRUE, the functionaggregateDatawill be called.- scoreData
Logical: If
TRUE, the functionscoreDatawill be called.- collapseMissings
Logical: If
TRUE, the functioncollapseMissingswill be called and a data frame with recoded missing values according to argumentmissing.rulewill be returned.- writeSpss
Logical: If
TRUE, the functionwriteSpsswill be called.- filedat
a character string containing the name of the output data file for
writeSpss.- filesps
a character string containing the name of the output syntax file for
writeSpss.- breaks
Numeric vector passed on to function
mnrCodingcontaining the number of blocks after whichmbishall be recoded tomnr, e.g.,c(1,2)to specify breaks after the first and second block. numeric vector (argument used by ).- nMbi
Numeric vector of length 1 passed on to function
mnrCodingcontaining the number ofmbi-Codes required at the end of a block to codemnr. Needs to be > 0.- rotation.id
-
Character vector of length 1 passed on to function
mnrCodingindicating the name of the rotation indicator (e.g. “booklet”) in the dataset. - suppressErr
Logical passed on to function
aggregateDataindicating whether aggregated cells witherrshould be recoded to another value..- recodeErr
Character vector of length 1 passed on to function
aggregateDataindicating to whicherrshould be recoded. This argument is only evaluated whensuppressErr = TRUE
.
- missing.rule
A named list with definitions how to recode the different types of missings in the dataset. If
writeSPSS = TRUE, missing values will be recoded to 0 orNAprior to writing the SPSS dataset. SeecollapseMissingsfor supported missing values.- aggregatemissings
A symmetrical n x n matrix or a data frame from
inputList$aggrMisspassed on to functionaggregateDatawith information on how missing values should be aggregated. If no matrix is given, the default will be used. See 'Details' inaggregateData.- rename
Logical passed on to function
aggregateDataindicating whether units with only one subunit should be renamed to their unit name? Default isFALSE.- recodedData
Logical passed on to function
aggregateDataindicating whether colnames indatare the subunit names (as insubunits$subunit) or recoded subunit names (as insubunits$subunitRecoded). Default isTRUE, meaning that colnames are recoded subitem names.- addLeadingZeros
logical. See
readSpss.- truncateSpaceChar
logical. See
readSpss.- newID
A character string containing the case IDs name in the final data frame. Default is
IDor a character string specified ininputList$newID.- oldIDs
A vector of character strings containing the IDs names in the original SPSS datasets. Default is as specified in
inputList$savFiles.- addMbd
Logical. Whether
mbdshould be added when merging, seemergeData. Also used inprep2GADS.- overwriteMbdSilently
Logical. Whether
mbdwill overwritten silently when other non-empty values are available when merging, seemergeData.- verbose
Logical: If
TRUE, progress and additional information is printed.
Examples
data(inputList)
data(inputDat)
preparedData <- automateDataPreparation(inputList = inputList,
datList = inputDat, path = getwd(),
readSpss = FALSE, checkData = TRUE, mergeData = TRUE,
recodeData = TRUE, recodeMnr = TRUE, breaks = c(1,2),
aggregateData = TRUE, scoreData = TRUE,
writeSpss = FALSE, verbose = TRUE)
#> Starting automateDataPreparation 2025-09-12 10:57:11.466176
#>
#> Check data...
#>
#> Checking dataset booklet1
#> Only valid codes in ID variable.
#> No duplicated entries in ID variable.
#> No duplicated variable names.
#> Found no variable information about variable(s) hisei. This/These variables will not be checked for missings and invalid codes.
#> Found no invalid codes.
#>
#> Checking dataset booklet2
#> Only valid codes in ID variable.
#> No duplicated entries in ID variable.
#> No duplicated variable names.
#> Found no variable information about variable(s) hisei. This/These variables will not be checked for missings and invalid codes.
#> Found no invalid codes.
#>
#> Checking dataset booklet3
#> Only valid codes in ID variable.
#> No duplicated entries in ID variable.
#> No duplicated variable names.
#> Found no variable information about variable(s) hisei. This/These variables will not be checked for missings and invalid codes.
#> Found no invalid codes.
#>
#> Start merging.
#> Start merging of dataset 1.
#> Start merging of dataset 2.
#> Start merging of dataset 3.
#> Start adding mbd according to data pattern.
#>
#> Start recoding.
#>
#> Found no recode information for variable(s):
#> ID, hisei.
#> This/These variable(s) will not be recoded.
#>
#> Variables... I01, I02, I03, I04, I05, I06, I07, I08, I09, I10, I11, I12a, I12b, I12c, I13, I14, I15, I16, I17, I18, I19, I20, I21, I22, I23, I24, I25, I26, I27, I28
#> ...have been recoded.
#>
#> Start recoding Mbi to Mnr.
#> ...identifying items in data (reference is blocks$subunit)
#> Variables in data not recognized as items:
#> ID, booklet, hisei
#> If some of these excluded variables should have been identified as items (and thus be used for mnr coding) check 'blocks', 'subunits', 'dat'.
#> ...identifying items with no mbi-codes ('mbi'):
#> I04R, I08R
#> If you expect mbi-codes on these variables check your data and option 'mbiCode'
#> mnr statistics:
#> mnr cells: 553
#> unique cases with at least one mnr code: 89
#> unique items with at least one mnr code: 16
#> unique cases ('ID') per booklet and booklet section (0s omitted):
#> booklet booklet.section N.ID
#> 1 booklet1 2 11
#> 2 booklet1 3 28
#> 3 booklet2 1 28
#> 4 booklet2 2 11
#> 5 booklet2 3 1
#> 6 booklet3 3 31
#>
#> start recoding (item-wise)
#> done
#> elapsed time: 0.1 secs
#>
#> Start aggregating
#> Since inputList$aggrMiss exists, this will be used instead of default.
#> All aggregation rules will be defaulted to 'SUM', because no other type is currently supported.
#> Found 27 unit(s) with only one subunit in 'dat'. This/these subunit(s) will not be aggregated and renamed to their respective unit name(s).
#> 1 units were aggregated: I12.
#>
#> Start scoring.
#> ✔ 1 unit was scored: `I12`.
#>
#> No SPSS-File has been written.
#>
#> Missings are UNcollapsed.
#> automateDataPreparation terminated successfully! 2025-09-12 10:57:11.727791