# Resources - CCS


## Appendix B - Statistical expressions

This appendix provides the details on computing the statistics in the Test Response System. The statistics are summarized below.

Statistics included in the Test Response System

| Statistic | Course and section | Course only |
|---|---|---|
| Alpha | Part 2 | |
| Count of items | Part 1, Part 2 | |
| Count of students | Part 1, Part 2 | |
| Guessing penalty | Parts 4, 5 | |
| Item difficulty | Part 2 | |
| Item-test correlation | Part 2 | |
| Kuder-Richardson | Part 2 | |
| Mean score | Parts 3, 4 | Part 2 |
| Response frequencies | Parts 1, 2 | |
| Score: Raw | Parts 3, 4, 8, 11 | |
| Score: Percentage | Parts 4, 5, 9, 12 | |
| Score: New maximum | Parts 6, 7, 10, 13 | |
| Standard deviation | Part 2 | |
| Standard error of measurement | Part 2 | |
| Student rank | Parts 3, 4 | |
| Unweighted results | Part 4 | |
| Variance | Part 2 | |
| Weighted results | Part 4 | |

Data for the examples are calculated from the example described in the printout guide. The reader will note that the data used in these examples are not the same as the data used to illustrate the Topics examples.

```
Question number           1 2 3 4 5 6 7 8
Correct answers           1 2 3 4 5 1 2 3

Student number   Name   Student's answers
1111111119       Ann    1 2 3 4 5 1 2 1
1111111111       Bob    1 2 4 4 * 1 2 3
1111111118       Cam    1 2 5 4 5 5 2 2
1111111112       Don    1 2 3 4 1 3 2 4
1111111117       Eve    . 2 1 4 5 2 2 3
1111111113       Fay    1 2 3 3 4 1 3 5
1111111116       Guy    1 . 3 5 1 1 2 1
1111111114       Hal    1 2 2 1 * 1 2 2
1111111115       Ian    1 . 3 2 3 1 3 4
1111111110       Joy    1 . 3 1 2 4 3 5
```

Mean score
The mean is computed as

    M = ΣX / N

where M is the mean score for each section or for the entire class,
N is the number of students, and
X is a student's score.

In the example, the calculation of the mean is:

    M = 45 / 10 = 4.5
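As a quick check, the mean can be reproduced in a short script (Python here, purely for illustration; the scores are the raw scores from the example data):

```python
# Raw scores for the ten students in the example (Ann .. Joy).
scores = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]

N = len(scores)        # number of students
M = sum(scores) / N    # M = (sum of X) / N

print(M)  # 4.5
```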

Standard deviation
The standard deviation is an indication of the spread of scores in the test. It is computed as

    SDt = sqrt( Σ(X - M)² / N )

where SDt is the standard deviation of the test,
M is the mean of all students' scores,
X is a student's score, and
N is the number of students.

In the example the calculation is:

    SDt = sqrt(18.5 / 10) = 1.36

See Part 2 for more details.
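The calculation can be checked in a short script (Python, for illustration). The population form of the standard deviation (divide by N rather than N - 1) is assumed here because it reproduces the KR20 value of .15 reported later in this appendix:

```python
scores = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]   # raw scores, Ann .. Joy

N = len(scores)
M = sum(scores) / N                        # mean = 4.5
ss = sum((x - M) ** 2 for x in scores)     # total squared deviation = 18.5
SDt = (ss / N) ** 0.5                      # population standard deviation

print(round(SDt, 2))  # 1.36
```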

Item-test correlation and correction (Magnusson, p. 200)
The item-test correlation is a point-biserial correlation coefficient that indicates the extent to which an item measures the same attribute as the test as a whole. The item-test correlation, rit, is the correlation between item i and the scores for all the items. Correlations of at least +.30 are desirable.

    rit = ((Mc - Mw) / SDt) × SDi

where rit is the item-test correlation of an item i with the total test score,
Mc is the mean score of students who answered the item correctly,
Mw is the mean score of students who answered the item incorrectly,
SDt is the standard deviation of the test,
SDi is the standard deviation of the item, SDi = sqrt(p × q), where
p is the proportion of students who answered the item correctly, and
q is the proportion of students who answered the item incorrectly.

See Part 2 for more details.

Since the item contributes to the total score, it should be removed from the correlation. To correct for inclusion of item i the following correction is applied (Magnusson, p. 212):

    ri(t-i) = (rit × SDt - SDi) / sqrt(SDt² + SDi² - 2 × rit × SDt × SDi)

where ri(t-i) is the corrected item-test correlation of an item,
SDt is the standard deviation of the test,
SDi is the standard deviation of an item, SDi = sqrt(p × q), where
p is the proportion of students who answered the item correctly,
q is the proportion of students who answered the item incorrectly, and
rit is the unadjusted item-test correlation.

The table below contains the data required to calculate the correction for Question 1. This item was answered correctly by students Ann, Bob, Cam, Don, Fay, Guy, Hal, Ian and Joy, whose mean score for all items was 4.4. Eve, the only student who answered the item incorrectly, received a score of 5.0 for all items.

The calculation of the item-test correlation for Question 1, using Mc = 4.44, Mw = 5.00, SDt = 1.36 and SDi = sqrt(0.9 × 0.1) = 0.30, is:

    rit = ((4.44 - 5.00) / 1.36) × 0.30 = -.12

| Question | Number who answered correctly | p | q |
|---|---|---|---|
| 1 | 9 | 0.9 | 0.1 |
| 2 | 7 | 0.7 | 0.3 |
| 3 | 6 | 0.6 | 0.4 |
| 4 | 5 | 0.5 | 0.5 |
| 5 | 3 | 0.3 | 0.7 |
| 6 | 6 | 0.6 | 0.4 |
| 7 | 7 | 0.7 | 0.3 |
| 8 | 2 | 0.2 | 0.8 |

| | Ann | Bob | Cam | Don | Eve | Fay | Guy | Hal | Ian | Joy | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Score (X) | 7 | 6 | 5 | 5 | 5 | 4 | 4 | 4 | 3 | 2 | 45 |
| Squared deviation (X - M)² | 6.25 | 2.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 2.25 | 6.25 | 18.5 |

The correction to exclude Question 1 is:

    ri(t-i) = ((-.12 × 1.36) - 0.30) / sqrt(1.36² + 0.30² - 2 × (-.12) × 1.36 × 0.30) = -.33
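The full computation for Question 1 can be sketched in Python (for illustration; the point-biserial and correction formulas are those given above, and the data are from the example):

```python
import math

# Example data: raw scores (Ann .. Joy) and who answered Question 1
# correctly (everyone except Eve).
scores  = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]
correct = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

N = len(scores)
M = sum(scores) / N
SDt = math.sqrt(sum((x - M) ** 2 for x in scores) / N)      # 1.36

p = sum(correct) / N                                        # 0.9
q = 1 - p                                                   # 0.1
SDi = math.sqrt(p * q)                                      # 0.3

Mc = sum(x for x, c in zip(scores, correct) if c) / sum(correct)            # 4.44
Mw = sum(x for x, c in zip(scores, correct) if not c) / (N - sum(correct))  # 5.0

# Point-biserial item-test correlation for Question 1
rit = (Mc - Mw) / SDt * SDi

# Correction excluding Question 1 from the total score
r_corr = (rit * SDt - SDi) / math.sqrt(SDt**2 + SDi**2 - 2 * rit * SDt * SDi)

print(round(rit, 2), round(r_corr, 2))  # -0.12 -0.33
```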

Kuder-Richardson 20 measure of reliability (dichotomous answers) (Magnusson, p. 116)
See Part 2 for more details.

The Kuder-Richardson Formula 20 (KR20) is an estimate of the test's reliability. It varies between 0.00 and 1.00.

    KR20 = (k / (k - 1)) × (1 - Σpq / SDt²)

where KR20 is the Kuder-Richardson 20 reliability estimate of the test, assuming dichotomous answers (answers with only two possible outcomes, i.e., correct or incorrect),
k is the number of items in the test,
p is the proportion of students who answered the item correctly,
q is the proportion of students who answered the item incorrectly, and
SDt is the standard deviation of the test.

The calculation of Kuder-Richardson in the example is:

    KR20 = (8 / 7) × (1 - 1.61 / 1.85) = .15
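A short Python sketch reproduces the KR20 value (for illustration; the p values are those tabulated above):

```python
scores = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]
p = [0.9, 0.7, 0.6, 0.5, 0.3, 0.6, 0.7, 0.2]   # proportion correct, Q1 .. Q8

k = len(p)                                     # 8 items
N = len(scores)
M = sum(scores) / N
var_t = sum((x - M) ** 2 for x in scores) / N  # SDt² = 1.85
sum_pq = sum(pi * (1 - pi) for pi in p)        # 1.61

KR20 = (k / (k - 1)) * (1 - sum_pq / var_t)
print(round(KR20, 2))  # 0.15
```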

The statistical significance of KR20 is assessed using the F distribution with the formula

    F = 1 / (1 - KR20)

In the example the computation for F of Kuder-Richardson (carrying KR20 unrounded) is

    F = 1 / (1 - .148) = 1.17

The probability of the F value is obtained from the SAS probf distribution (SAS, p.579), which returns the probability that an observation from an F distribution is less than or equal to the observed numeric random variable. It has the following form,

1 - PROBF(x, ndf ,ddf),

where x is the F value of the Kuder Richardson statistic,
ndf is the degrees of freedom in the numerator (number of students in the class - 1), and
ddf is the degrees of freedom in the denominator ((number of students - 1) × (number of items - 1)).

In the example the result of .3270 is obtained with the following parameters:

F = 1.17
ndf = 10 - 1 = 9
ddf = (n - 1)(k - 1) = (10 - 1)(8 - 1) = 63.

Under the null hypothesis (KR20=0, no test reliability), the probability of the observed statistic (KR20=.15) is .3270. By convention, only probabilities below .05 are considered significant. Therefore the data provide no evidence that this examination is reliable.
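PROBF is simply the cumulative distribution function of the F distribution. For readers without SAS, the probability can be approximated numerically; the sketch below (an illustration, not the SAS implementation) integrates the F density with Simpson's rule:

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    lbeta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    logf = ((d1 / 2) * math.log(d1) + (d2 / 2) * math.log(d2)
            + (d1 / 2 - 1) * math.log(x)
            - ((d1 + d2) / 2) * math.log(d2 + d1 * x) - lbeta)
    return math.exp(logf)

def probf(x, ndf, ddf, steps=10000):
    """P(F <= x): Simpson's-rule integral of the density from 0 to x."""
    h = x / steps
    s = f_pdf(0.0, ndf, ddf) + f_pdf(x, ndf, ddf)
    for i in range(1, steps):
        s += f_pdf(i * h, ndf, ddf) * (4 if i % 2 else 2)
    return s * h / 3

KR20 = 0.148                      # unrounded KR20 (the appendix reports .15)
F = 1 / (1 - KR20)                # about 1.17
p_value = 1 - probf(F, 9, 63)     # 1 - PROBF(x, ndf, ddf); about .327

print(round(F, 2), round(p_value, 4))
```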

Raw coefficient alpha (continuous answers) (Nunnally, p. 214)
See Part 2 for more details.

Cronbach's α is a more general measure of reliability than Kuder-Richardson 20. It applies to a continuous distribution of values and is therefore appropriate for measuring the reliability of attitudes, opinions or behaviour with the five-point scale used in the Test Response System. Its computation is confined to students who answered all the items. Because of these computational differences, α and KR20 are not directly comparable in TRS reports.

There are two ways to express Cronbach's α. In the first, perfect reliability is given the value 1 and the error component is the ratio between the sum of the item variances and the variance of the student scores.

    α = (k / (k - 1)) × (1 - ΣSDi² / SDt²)

where α is Cronbach's alpha,
k is the number of items on the test,
SDi² is the variance of an item, and
SDt² is the variance of the test.

A second approach is equivalent to the first but expresses the relation as a ratio built from the variances and covariances of the scores.

    α = (k × c̄) / (v̄ + (k - 1) × c̄)

where α is Cronbach's alpha,
k is the number of items on the test,
c̄ is the average covariance between pairs of items, and
v̄ is the average item variance.

The computation is confined to students who answered all items. As a result, it is difficult to compare the results between Kuder-Richardson and Cronbach's alpha unless we use an artificial example. Only four students answered all questions: Ann, Cam, Don and Fay.

Notice that the data, shown in the table below, represent values on a scale of 1 to 5, unlike a binary correct-incorrect dichotomy used to assess reliability of academic performance in the Kuder Richardson example.

| | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Ann | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 1 | 19 |
| Cam | 1 | 2 | 5 | 4 | 5 | 5 | 2 | 2 | 26 |
| Don | 1 | 2 | 3 | 4 | 1 | 3 | 2 | 4 | 20 |
| Fay | 1 | 2 | 3 | 3 | 4 | 1 | 3 | 5 | 22 |
| Item variance SDi² | 0.00 | 0.00 | 1.00 | 0.25 | 3.58 | 3.67 | 0.25 | 3.33 | 12.08 |

Test variance SDt² = 9.58 (the variance of the Total column).

The computation using the ratio of the sum of the item variances to the test variance is:

    α = (8 / 7) × (1 - 12.08 / 9.58) = -.30
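This variance-ratio form of α can be reproduced from the four complete response records (Python, for illustration; sample variances divide by N - 1, as in the table above):

```python
# Responses of the four students who answered every item (scale 1-5).
responses = [
    [1, 2, 3, 4, 5, 1, 2, 1],   # Ann
    [1, 2, 5, 4, 5, 5, 2, 2],   # Cam
    [1, 2, 3, 4, 1, 3, 2, 4],   # Don
    [1, 2, 3, 3, 4, 1, 3, 5],   # Fay
]

def var(xs):                     # sample variance (divide by N - 1)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = 8
totals = [sum(r) for r in responses]                               # [19, 26, 20, 22]
var_t = var(totals)                                                # 9.58
sum_var_i = sum(var([r[i] for r in responses]) for i in range(k))  # 12.08

alpha = (k / (k - 1)) * (1 - sum_var_i / var_t)
print(round(alpha, 2))  # -0.3
```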

The second method, mathematically equivalent to the method shown above, uses a ratio of variance and covariance. Covariance is the mean product of the variation about the mean between each pair of items. Covariance is calculated by dividing the total covariation by N - 1, where N is the number of students who answered all items on the test. An example of covariance is shown below for items 3 and 4:

| Student | Q3 (X) | Variation (X - X̄) | Q4 (Y) | Variation (Y - Ȳ) | Covariation |
|---|---|---|---|---|---|
| Ann | 3 | -0.5 | 4 | 0.25 | -0.125 |
| Cam | 5 | 1.5 | 4 | 0.25 | 0.375 |
| Don | 3 | -0.5 | 4 | 0.25 | -0.125 |
| Fay | 3 | -0.5 | 3 | -0.75 | 0.375 |
| Mean | 3.5 | | 3.75 | | |
| Sum of squared variation | 3 | | 0.75 | | |
| Variance = sum of squared variation ÷ (N - 1) | 1 | | 0.25 | | |
| Total covariation | | | | | 0.5 |
| Covariance = total covariation ÷ (N - 1) | | | | | 0.1667 |
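The covariance calculation for items 3 and 4 can be checked directly (Python, for illustration):

```python
q3 = [3, 5, 3, 3]   # Ann, Cam, Don, Fay
q4 = [4, 4, 4, 3]

n = len(q3)
m3, m4 = sum(q3) / n, sum(q4) / n    # means 3.5 and 3.75
cov = sum((x - m3) * (y - m4) for x, y in zip(q3, q4)) / (n - 1)

print(round(cov, 4))  # 0.1667
```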

Covariance matrix for all questions in the example

| | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 |
|---|---|---|---|---|---|---|---|
| Q1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q2 | | 0 | 0 | 0 | 0 | 0 | 0 |
| Q3 | | | 0.1667 | 0.8333 | 1.6667 | -0.1667 | -0.6667 |
| Q4 | | | | -0.0833 | 0.5 | -0.25 | -0.6667 |
| Q5 | | | | | -0.1667 | 0.0833 | -2 |
| Q6 | | | | | | -0.5 | -0.6667 |
| Q7 | | | | | | | 0.6667 |

The mean covariance (28 cells) is -.0446. Coefficient alpha also requires the mean variance of the items: the total squared deviation for each item is divided by N - 1, and the mean of these variances for the 8 items is 1.5104. The calculation in the example is:

    α = (8 × -.0446) / (1.5104 + 7 × -.0446) = -.30

The negative coefficient is an anomaly arising from an extremely small sample and artificial data.
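The two formulas can be checked against each other; the sketch below (Python, for illustration) computes the 28 pairwise covariances, the mean item variance, and α by the second formula:

```python
# Responses of the four students who answered every item (scale 1-5).
rows = [
    [1, 2, 3, 4, 5, 1, 2, 1],   # Ann
    [1, 2, 5, 4, 5, 5, 2, 2],   # Cam
    [1, 2, 3, 4, 1, 3, 2, 4],   # Don
    [1, 2, 3, 3, 4, 1, 3, 5],   # Fay
]
k = 8
cols = list(zip(*rows))          # one tuple of responses per item

def cov(xs, ys):                 # sample covariance (divide by N - 1)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

pairs = [cov(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k)]
c_mean = sum(pairs) / len(pairs)            # mean of the 28 covariances: -0.0446
v_mean = sum(cov(c, c) for c in cols) / k   # mean item variance: 1.5104

alpha = (k * c_mean) / (v_mean + (k - 1) * c_mean)
print(round(alpha, 2))  # -0.3
```

The result agrees with the variance-ratio form, as expected from the algebraic equivalence of the two expressions.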

The statistical significance of α is assessed with the F distribution (as for KR20). The F value for α is obtained from the formula

    F = 1 / (1 - α)

In the example the computation for F of α is

    F = 1 / (1 - (-.30)) = .77

The probability of the F value is obtained from the SAS probf distribution (SAS, p.579), which returns the probability that an observation from an F distribution is less than or equal to the observed numeric random variable. It has the following form,

1 - PROBF(x, ndf ,ddf),

where x is the F value of the coefficient alpha statistic,
ndf is the degrees of freedom in the numerator (number of students in the class - 1), and
ddf is the degrees of freedom in the denominator ((number of students - 1) × (number of items - 1)).

In the example the result of .6439 is obtained with the following parameters:

F = .7703
ndf = 10 - 1 = 9
ddf = (n - 1)(k - 1) = (10 - 1)(8 - 1) = 63.

There is a high probability (.6439) that sampling error could account for the obtained α. We conclude that α does not differ significantly from zero and that the examination is not reliable.

Standard error of measurement
See Part 2 for more details.

The standard error of measurement (SE) is an estimate of the error component in a student's score due to imperfections in the test and to chance factors such as illness, distractions, and fatigue.

    SE = SDt × sqrt(1 - reliability coefficient)

where SE is the standard error of measurement,
SDt is the standard deviation of the test, and
the reliability coefficient is Kuder-Richardson 20 or Cronbach's α.

The calculation in the example is:

    SE = 1.36 × sqrt(1 - .15) = 1.25
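The calculation can be reproduced with the example values (Python, for illustration; SDt = 1.36 and KR20 = .15 from earlier in this appendix):

```python
import math

SDt = 1.36           # standard deviation of the test (from the example)
reliability = 0.15   # KR20 from the example

SE = SDt * math.sqrt(1 - reliability)
print(round(SE, 2))  # 1.25
```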
