Getting Started
getting_started.Rmd
Introduction
#remotes::install_github("https://github.com/sachseka/eatPrep")
library(eatPrep)
#> Loading required package: readxl
The goal of eatPrep
, short for
Educational Assesment Tools: Preparation
, is to automate
the preparation of item response test data; especially but not
exclusively for typical use cases at the Institute for Educational
Quality Improvement (IQB). Therefore it works together with
IQB-specific applications such as the ItemDB
or
ZKDeamon
.
The main purpose of eatPrep
is to read, score, aggregate
and merge data from a regular SPSS
(.sav
) data
file (via the function readSpss()
) and metadata from
databases (readDaemonXlsx()
) and Excel spreadsheets
(readMerkmalXlsx()
). Additional functions include checking
the data (checkData()
), meta data
(checkInputList()
), and the design
(checkDesign()
), recoding of (missing) values
(recodeData()
, mnrCoding()
,
aggregateData()
,
scoreData()
,collapseMissings()
), working with
rater data (meanAgree()
, meanKappa()
),
evaluating item categories (catPbc()
,
evalPbc()
) and exporting prepared or raw item data sets to
different formats (writeSpss()
, prep2GADS()
).
Most of this can be done with a single function call to
automateDataPreparation()
.
In a prototypical scenario, we are constructing a test at the IQB,
e.g. for educational purposes. For our pilot study, we have created
several overlapping booklets and a variety of items with different item
formats (constructed response, multiple choice).
Thereby, the data consists of several layers:
booklets | containing blocks |
blocks | containing units (= items) |
units | containing subunits/subitems |
subunits | having values and |
values | including missings and recode values |
Now we would like to prepare the data collected in this way for
scaling. In doing so, we would like to know if the booklets are speeded
(number of mnr). Additionally, we intend to use different missing
treatments for the actual scaling, so we preserve missing values during
data preparation and recode them right before scaling the data.
eatPrep
can help us with all of that.
As a reminder, here is an overview for different value types the IQB uses:
Code | Label | Abbr | Explanation |
---|---|---|---|
-98 | mir | missing invalid response | (1) Item was edited, and (2a) empty answer or (2b) invalid (joke) answer. |
-99 | mbo | missing by omission | Item wasn’t edited but seen, or wasn’t seen, but there are seen or edited subsequent Items. |
-96 | mnr | missing not reached | (1) Item wasn’t seen, and (2) all subsequent Items weren’t seen, either. |
-97 | mci | missing coding impossible | (1) Item should/could have been edited, and (2) answer can’t be analysed due to technical problems. |
-94 | mbd | missing by design | no answer, because Item wasn’t shown to the testperson by design. |
Input
Information/metadata about items is stored in many different ways,
e.g., databases, Excel sheets, sometimes even in item names.
eatPrep
requires this metadata to have a specific form,
namely a series of relational tables. For eatPrep
it is
most convenient to store these tables in a list. When downloading the
eatPrep
package, you can find files with some example
tables in the data
folder.
Sheets have specific names, also the columns in those sheets have specific names.
inputMinimal
contains the bare minimum for all functions
in eatPrep
to work, inputList
contains
additional metadata about the items (such as information about the
task’s content).
# Example: inputMinimal
inputMinimal <- list(units = inputList$units[ -nrow(inputList$units), c("unit", "unitAggregateRule")],
subunits = inputList$subunits[, c("unit", "subunit","subunitRecoded")],
values = inputList$values[ , c("subunit", "value", "valueRecode", "valueType")],
unitRecodings = inputList$unitRecodings[ , c("unit", "value", "valueRecode", "valueType")],
blocks = inputList$blocks,
booklets = inputList$booklets,
rotation = inputList$rotation)
Input tables
Let’s look at the tables from bottom to top, starting with items and their values. In the values sheet, we define values and recode values for each subitem.
Here we see 3 items (called units) from inputList
bzw.
inputMinimal
. One mc (item 1), one short response (item 5),
one item with 3 subitems short response (item 12)
items <- c("I01", "I05", "I12")
subitems <- c("I01", "I05", "I12a", "I12b", "I12c")
kable(inputMinimal$units[which(inputMinimal$units$unit %in% items), ])
unit | unitAggregateRule | |
---|---|---|
1 | I01 | |
5 | I05 | |
12 | I12 | SUM |
The values sheet contains all possible values for each subitem and scoring information for each value (e.g., whether it is true, false, or missing). There are several types of missing values.
subunit | value | valueRecode | valueType | |
---|---|---|---|---|
1 | I01 | 1 | 0 | vc |
2 | I01 | 2 | 0 | vc |
3 | I01 | 3 | 1 | vc |
4 | I01 | 6 | mnr | mnr |
5 | I01 | 7 | mbd | mbd |
6 | I01 | 8 | mir | mir |
7 | I01 | 9 | mbi | mbi |
30 | I05 | 0 | 0 | vc |
31 | I05 | 1 | 1 | vc |
32 | I05 | 6 | mnr | mnr |
33 | I05 | 7 | mbd | mbd |
34 | I05 | 8 | mir | mir |
35 | I05 | 9 | mbi | mbi |
80 | I12a | 0 | 0 | vc |
81 | I12a | 1 | 1 | vc |
82 | I12a | 6 | mnr | mnr |
83 | I12a | 7 | mbd | mbd |
84 | I12a | 8 | mir | mir |
85 | I12a | 9 | mbi | mbi |
86 | I12b | 0 | 0 | vc |
87 | I12b | 1 | 1 | vc |
88 | I12b | 6 | mnr | mnr |
89 | I12b | 7 | mbd | mbd |
90 | I12b | 8 | mir | mir |
91 | I12b | 9 | mbi | mbi |
92 | I12c | 1 | 0 | vc |
93 | I12c | 2 | 0 | vc |
94 | I12c | 3 | 0 | vc |
95 | I12c | 4 | 1 | vc |
96 | I12c | 6 | mnr | mnr |
97 | I12c | 7 | mbd | mbd |
98 | I12c | 8 | mir | mir |
99 | I12c | 9 | mbi | mbi |
The subunits sheet contains information about the
relationship between subitems and items. This information is used to
aggregate subitems to items. If recoded subitems should be named
differently than the unrecoded variables in the original data, this can
be specified via subunitRecoded
. The only type of
aggregation currently supported by eatPrep
is to count the
number of correct subitems per item.
unit | subunit | subunitRecoded | |
---|---|---|---|
1 | I01 | I01 | I01R |
5 | I05 | I05 | I05R |
12 | I12 | I12a | I12aR |
13 | I12 | I12b | I12bR |
14 | I12 | I12c | I12cR |
unitrecodings contains the same information as the values sheet, but for aggregated items. Here, the value on an item is the aggregated number of correct subitems and the recode value gives a threshold of how many subitems need to be solved to obtain a credit for the item. In the example, I12 has 3 subitems, and one receives a credit if all subitems are solved correctly.
kable(inputMinimal$unitRecodings)
unit | value | valueRecode | valueType |
---|---|---|---|
I12 | 0 | 0 | vc |
I12 | 1 | 0 | vc |
I12 | 2 | 0 | vc |
I12 | 3 | 1 | vc |
I12 | mbd | mbd | mbd |
I12 | mir | mir | mir |
I12 | mbi | mbi | mbi |
The minimal use case is that each subitem corresponds to an item. In this case, the value sheet contains all information needed for recoding the items (apart from possibly renaming the recoded items).
input data
We have several overlapping booklets with several blocks in each booklet. Moreover, there is a unique identifier for each person and some additional information about each student like their gender or their socioeconomic status. There is one data set per booklet. In order to prepare the data, we need to construct one large data set.
# looking at the data
str(inputDat)
#> List of 3
#> $ booklet1:'data.frame': 100 obs. of 25 variables:
#> ..$ ID : chr [1:100] "person100" "person101" "person102" "person103" ...
#> ..$ hisei: num [1:100] 49 NA 57 32 59 56 55 47 NA 50 ...
#> ..$ I01 : chr [1:100] "1" "9" "3" "2" ...
#> ..$ I02 : chr [1:100] "4" "4" "4" "4" ...
#> ..$ I03 : chr [1:100] "1" "2" "2" "3" ...
#> ..$ I04 : chr [1:100] "2" "2" "3" "1" ...
#> ..$ I05 : chr [1:100] "0" "0" "0" "1" ...
#> ..$ I06 : chr [1:100] "0" "9" "0" "1" ...
#> ..$ I07 : chr [1:100] "2" "2" "2" "3" ...
#> ..$ I08 : chr [1:100] "0" "1" "0" "0" ...
#> ..$ I09 : chr [1:100] "2" "2" "2" "3" ...
#> ..$ I10 : chr [1:100] "2" "1" "1" "4" ...
#> ..$ I11 : chr [1:100] "4" "1" "3" "2" ...
#> ..$ I12a : chr [1:100] "1" "0" "1" "0" ...
#> ..$ I12b : chr [1:100] "0" "9" "0" "1" ...
#> ..$ I12c : chr [1:100] "4" "1" "2" "4" ...
#> ..$ I13 : chr [1:100] "9" "4" "9" "4" ...
#> ..$ I14 : chr [1:100] "9" "4" "9" "1" ...
#> ..$ I15 : chr [1:100] "9" "3" "9" "1" ...
#> ..$ I16 : chr [1:100] "9" "2" "9" "2" ...
#> ..$ I17 : chr [1:100] "9" "3" "9" "4" ...
#> ..$ I18 : chr [1:100] "9" "1" "9" "1" ...
#> ..$ I19 : chr [1:100] "9" "4" "9" "2" ...
#> ..$ I20 : chr [1:100] "9" "1" "9" "1" ...
#> ..$ I21 : chr [1:100] "9" "2" "9" "3" ...
#> $ booklet2:'data.frame': 100 obs. of 25 variables:
#> ..$ ID : chr [1:100] "person200" "person201" "person202" "person203" ...
#> ..$ hisei: num [1:100] 69 76 47 58 62 78 70 26 57 70 ...
#> ..$ I08 : chr [1:100] "0" "0" "1" "0" ...
#> ..$ I09 : chr [1:100] "2" "4" "1" "2" ...
#> ..$ I10 : chr [1:100] "3" "1" "2" "2" ...
#> ..$ I11 : chr [1:100] "4" "2" "1" "4" ...
#> ..$ I12a : chr [1:100] "1" "0" "1" "1" ...
#> ..$ I12b : chr [1:100] "1" "0" "1" "1" ...
#> ..$ I12c : chr [1:100] "4" "3" "2" "2" ...
#> ..$ I13 : chr [1:100] "4" "2" "1" "2" ...
#> ..$ I14 : chr [1:100] "3" "4" "2" "3" ...
#> ..$ I15 : chr [1:100] "4" "1" "4" "4" ...
#> ..$ I16 : chr [1:100] "4" "3" "3" "2" ...
#> ..$ I17 : chr [1:100] "4" "1" "2" "3" ...
#> ..$ I18 : chr [1:100] "1" "4" "3" "2" ...
#> ..$ I19 : chr [1:100] "2" "1" "4" "3" ...
#> ..$ I20 : chr [1:100] "4" "9" "2" "2" ...
#> ..$ I21 : chr [1:100] "2" "9" "2" "1" ...
#> ..$ I22 : chr [1:100] "1" "9" "2" "1" ...
#> ..$ I23 : chr [1:100] "9" "9" "2" "3" ...
#> ..$ I24 : chr [1:100] "1" "9" "3" "3" ...
#> ..$ I25 : chr [1:100] "1" "9" "0" "0" ...
#> ..$ I26 : chr [1:100] "9" "9" "1" "0" ...
#> ..$ I27 : chr [1:100] "0" "9" "1" "0" ...
#> ..$ I28 : chr [1:100] "1" "9" "1" "0" ...
#> $ booklet3:'data.frame': 100 obs. of 23 variables:
#> ..$ ID : chr [1:100] "person300" "person301" "person302" "person303" ...
#> ..$ hisei: num [1:100] 49 NA 57 32 59 56 55 47 NA 50 ...
#> ..$ I01 : chr [1:100] "2" "3" "2" "1" ...
#> ..$ I02 : chr [1:100] "4" "3" "3" "3" ...
#> ..$ I03 : chr [1:100] "3" "1" "1" "1" ...
#> ..$ I04 : chr [1:100] "1" "3" "1" "1" ...
#> ..$ I05 : chr [1:100] "0" "1" "1" "0" ...
#> ..$ I06 : chr [1:100] "9" "1" "0" "1" ...
#> ..$ I07 : chr [1:100] "8" "2" "2" "1" ...
#> ..$ I15 : chr [1:100] "3" "9" "9" "9" ...
#> ..$ I16 : chr [1:100] "1" "3" "3" "3" ...
#> ..$ I17 : chr [1:100] "4" "4" "4" "4" ...
#> ..$ I18 : chr [1:100] "4" "4" "4" "4" ...
#> ..$ I19 : chr [1:100] "2" "9" "9" "9" ...
#> ..$ I20 : chr [1:100] "3" "4" "2" "4" ...
#> ..$ I21 : chr [1:100] "3" "2" "2" "2" ...
#> ..$ I22 : chr [1:100] "3" "4" "4" "9" ...
#> ..$ I23 : chr [1:100] "3" "1" "1" "9" ...
#> ..$ I24 : chr [1:100] "1" "1" "1" "9" ...
#> ..$ I25 : chr [1:100] "0" "1" "9" "9" ...
#> ..$ I26 : chr [1:100] "1" "0" "9" "9" ...
#> ..$ I27 : chr [1:100] "0" "9" "9" "9" ...
#> ..$ I28 : chr [1:100] "0" "9" "9" "9" ...