Guide to Results
The Test Response System issues a printout that shows the results of the examination. Grading reports in plain text and Excel formats will be included in the email.
Results are described from two points of view:
 Students' performance (measured absolutely and relatively) and
 The efficiency of the examination in gauging student achievement.
The examples in this section will be illustrated with results from ten students who answered an examination with eight questions. The students' names and their answers are shown in the table below. Questions whose answers were left blank are represented with a period (.) and answers where the student chose more than one response are shown with an asterisk (*).
Emslie & Emslie (1992) present a detailed analysis of the example. The article offers suggestions for the interpretation of TRS statistics and shows how the psychometric indices help identify good and bad questions. The paper is available on request from the authors.
The printout is divided into fourteen parts:
The number at the right side of each row shows the number of students answering the question. The number at the bottom of each column shows the number of times an answer is chosen.
The example shows frequencies for the eight questions answered by the class of ten students.
The examination elicited 80 responses (the product of the number of students and the number of questions). Failure to answer any question occurred 4 times and multiple answers (i.e., two or more response circles darkened) occurred 2 times. The most frequently chosen answer ("1") occurred 22 times.
Each cell contains three numbers.
 The count of students stating an answer: in Question 3, 6 students answered "3";
 The percentage of students stating an answer: in Question 4, 10.00% of the students chose "5";
 The column percentage: 27.27% of all "1"s occurred in Question 6. Column percentages are useful in revealing whether there is a concentration of nonresponse or multiresponse answers.
Part 2 describes the efficiency of the examination in terms of criteria established by psychologists for mental measurement. Each row contains a summary for one question. Refer to page 13 as you read the description of each column. All results in Part 2 are for unweighted responses.
Response frequencies
For each question, response frequency is the proportion of students who responded in the ways indicated, i.e., chose a particular alternative, failed to answer or gave multiple answers. These values are the same as the row percentages described in Part 1. The sum of these proportions will always equal 1.
Correct answer
This is the correct answer for each question as indicated by the corresponding entry on the master sheet.
Item difficulty
This column (p) is the proportion of students answering the item correctly. Paradoxically, therefore, the higher the value the easier the item. Values range from .00 (no one answered correctly) to 1.00 (all respondents answered correctly). The closer the value is to .5 the better the item discriminates among students.
Item variance
Item variance is the product of item difficulty (p) and 1 - p. Thus, if item difficulty is .60, the variance for that item is (.60)(.40) = .24. Values range from .00 to .25. The higher the value the greater the potential contribution of the item to test reliability.
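As a rough illustration of the two columns just described, the sketch below computes item difficulty and item variance for the eight questions of the worked example. The counts of correct answers are taken from the item table in the statistics appendix; the function names are illustrative and are not part of TRS.

```python
# Number of students (out of 10) who answered each of the 8 questions
# correctly, as listed in the item table of the worked example.
correct_counts = [9, 7, 6, 5, 3, 6, 7, 2]
n_students = 10


def item_difficulty(n_correct, n_total):
    """Proportion of students answering the item correctly (p)."""
    return n_correct / n_total


def item_variance(p):
    """Variance of a right/wrong item: p * (1 - p)."""
    return p * (1 - p)


difficulties = [item_difficulty(c, n_students) for c in correct_counts]
variances = [item_variance(p) for p in difficulties]

for question, (p, v) in enumerate(zip(difficulties, variances), start=1):
    print(f"Question {question}: p = {p:.2f}, variance = {v:.2f}")
```

Question 4, with p = .50, reaches the maximum possible variance of .25, while Question 1, with p = .90, contributes much less.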
Item-test correlation
The item-test correlation is a point biserial correlation coefficient that indicates the extent to which an item measures the same attribute as the test as a whole. Values range between -1.00 and +1.00. To avoid spuriousness, the item is not included in the test when item-test correlations are calculated.
Positive values indicate that persons who answer an item correctly also tend to do well on the entire test. The higher the positive correlation the stronger this tendency and the greater the item's contribution to test reliability as measured by the Kuder-Richardson coefficient (described below).
A negative correlation means that persons who obtain high scores on the test tend to answer that item incorrectly. Items with negative values are counterproductive because they reduce the reliability of the test.
Because situations vary, there is no absolute value for an acceptable item-test correlation. As a rule of thumb, correlations of at least +.30 are desirable. If the test is to be refined for future use, items with low or negative correlations should be replaced or reworded.
Mean score
The mean score is the average test score obtained by the respondents. It is the raw unweighted sum of correct responses divided by the number of respondents.
Standard deviation
The standard deviation is an indication of the range of scores in the test. If the scores are normally distributed the range will be about six standard deviations. For example, in a test with a mean of 70 and a standard deviation of 5, the highest score will be about 85 (70 plus 3 standard deviations) and the lowest score about 55 (70 minus 3 standard deviations). The approximate range would be 85 - 55 = 30 = 6 standard deviations. The higher the standard deviation the more widely scattered the scores of the individuals taking the test.
Kuder-Richardson 20
The Kuder-Richardson Formula 20 (KR20) is an estimate of the test's reliability. It varies between 0.00 and 1.00. Negative values occur in very rare circumstances and indicate severe violation of the assumptions on which the statistic is based. High KR20 values indicate that the items in the test sheet are a relatively homogeneous set, i.e., that the individual items measure something in common.
A low KR20 is a warning that the test contains items that are only loosely related and that the test is impure because it appears to be measuring several different attributes. In this event the meaning of the test score is unclear.
What constitutes an acceptable level of consistency varies with the testing situation. KR20 coefficients below 0.64 are often an indication that the test scores should be interpreted with caution.
Standard error of measurement
The score obtained by a student on an achievement test is determined in part by that individual's true knowledge and in part by errors of measurement due to imperfections in the test. Other chance events such as illness or distractions during testing also affect performance. The standard error of measurement (SE) is an estimate of the error component.
Assuming the errors are random (nonsystematic), the probability is approximately 95% that a student's true score falls within two standard errors of the obtained score. Thus, if a student achieved a score of 80 on a test with a SE of 3, the instructor is reasonably confident that the student's true score lies within the range 74 to 86.
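The two-standard-error band described above can be sketched in a few lines. The function name is hypothetical, and the default of two standard errors simply mirrors the approximate 95% rule stated in the text.

```python
def true_score_band(obtained_score, standard_error, n_errors=2):
    """Return the (low, high) range within n_errors standard errors of a score."""
    margin = n_errors * standard_error
    return obtained_score - margin, obtained_score + margin


low, high = true_score_band(80, 3)
print(f"True score probably lies between {low} and {high}")
```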
The weights specified on the weight sheet control the contribution that each question makes toward the test results. In the absence of a weight sheet, the weights are set to 1, i.e., all questions are considered of equal importance.
The weights and correct responses are shown under a ruler that gives question numbers from 1 to 100. A blank for a correct response signifies an omitted question. In the example, all weights are assigned a value of 1 because the weight sheet was omitted from the submission.
Student numbers and names are at the left side of each row, followed by the number of correct (C) and wrong (W) responses. Part 3 is designed as a confidential document because it contains both student numbers and names.
Rank
Student rank shows the position of each student in the entire class according to the number of correct responses. The students with the lowest ranks are the highest achievers. Tied students all receive the lowest of the ranks they occupy.
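One way to read the ranking rule is that a student's rank is one plus the number of students with a strictly higher score, so that tied students share the lowest rank of their group. The sketch below applies that reading to the scores of the ten students in the worked example; it is an interpretation, not a description of the TRS source code.

```python
# Scores (number of correct answers) from the worked example.
scores = {
    "Ann": 7, "Bob": 6, "Cam": 5, "Don": 5, "Eve": 5,
    "Fay": 4, "Guy": 4, "Hal": 4, "Ian": 3, "Joy": 2,
}


def competition_ranks(scores):
    """Rank = 1 + number of students with a strictly higher score."""
    all_scores = list(scores.values())
    return {
        name: 1 + sum(other > score for other in all_scores)
        for name, score in scores.items()
    }


ranks = competition_ranks(scores)
for name, rank in ranks.items():
    print(f"{name}: score {scores[name]}, rank {rank}")
```

Under this reading Cam, Don and Eve all receive rank 3, and the next rank awarded is 6.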
Part 4 is a confidential document because it includes student names. Use Part 5, if you want to post the results in a public place without revealing student names.
Note: Part 4 is always generated, regardless of the presence or absence of the system sheet, or what reports have been requested.
An example of Part 4 is shown below. Results are shown with each section of the class starting on a fresh page. The number of questions indicates the highest question number chosen on the master sheet. The number of omitted questions refers to the total number of response circles left blank on the master sheet between the lowest question number and the highest question number. The heading at the top of the page reminds you that in Part 4 final results are based on 100 percent. If you wish to alter the maximum score, you must use Part 6 or Part 7.
Name and rank
Student names are listed in alphabetical order with their ranks based on their standing in the entire class. Results are shown both for unweighted and weighted scores.
Guessing adjustment
You may adjust scores for guessing by deducting a proportion of wrong answers. The available penalties are shown below. In theory, the appropriate deduction is 1/(k - 1) where k is the number of alternative answers per question.
C = None
C - W = All wrong
C - W/2 = Half of wrong
C - W/3 = One-third of wrong
C - W/4 = One-quarter of wrong
The calculations for Joy are given as examples. She had 2 correct answers and 5 incorrect responses. Note that she failed to answer question 2. All percentages use weighted scores.
C = (2 / 8) * 100 = 25.0%
C - W = ((2 - 5) / 8) * 100 = -37.5%
C - W/2 = ((2 - (5/2)) / 8) * 100 = -6.3%
C - W/3 = ((2 - (5/3)) / 8) * 100 = 4.2%
C - W/4 = ((2 - (5/4)) / 8) * 100 = 9.4%
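The five guessing adjustments can be reproduced with a short sketch. It assumes, as in the example, that every question carries a weight of 1, so weighted and unweighted results coincide; the function and dictionary names are illustrative only.

```python
def adjusted_percentage(correct, wrong, n_questions, penalty):
    """Percentage score after deducting `penalty` points per wrong answer."""
    return 100 * (correct - penalty * wrong) / n_questions


# Penalty per wrong answer for each of the five reported columns.
penalties = {
    "C": 0,
    "C - W": 1,
    "C - W/2": 1 / 2,
    "C - W/3": 1 / 3,
    "C - W/4": 1 / 4,
}

# Joy: 2 correct, 5 wrong, 1 unanswered, on an 8-question test.
joy = {
    label: adjusted_percentage(2, 5, 8, penalty)
    for label, penalty in penalties.items()
}

for label, value in joy.items():
    print(f"{label}: {value:.2f}%")
```

Unanswered questions do not enter the penalty; only the five wrong answers are deducted.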
Unweighted results
Numbers shown are results without weights.
Weighted scores
These scores are the results when the credit for each correct answer is multiplied by the weight assigned to that question.
Part 14 is a confidential document because it includes student names. Use Part 5, if you want to post the results in a public place without revealing student names.
An example of Part 14 is shown below. Results are shown with each section of the class starting on a fresh page. The number of questions indicates the highest question number chosen on the master sheet. The number of omitted questions refers to the total number of response circles left blank on the master sheet between the lowest question number and the highest question number. The heading at the top of the page reminds you that in Part 14 final results are based on 100 percent. If you wish to alter the maximum score, you must use Part 6 or Part 7.
Name and rank
Student names are listed in alphabetical order with their ranks based on their standing in the entire class. Results are shown both for unweighted and weighted scores.
Guessing adjustment
You may adjust scores for guessing by deducting a proportion of wrong answers. The available penalties are shown below. In theory, the appropriate deduction is 1/(k - 1) where k is the number of alternative answers per question.
C = None
C - W = All wrong
C - W/2 = Half of wrong
C - W/3 = One-third of wrong
C - W/4 = One-quarter of wrong
The calculations for Joy are given as examples. She had 2 correct answers and 5 incorrect responses. Note that she failed to answer question 2. All percentages use weighted scores.
C = (2 / 8) * 100 = 25.0%
C - W = ((2 - 5) / 8) * 100 = -37.5%
C - W/2 = ((2 - (5/2)) / 8) * 100 = -6.3%
C - W/3 = ((2 - (5/3)) / 8) * 100 = 4.2%
C - W/4 = ((2 - (5/4)) / 8) * 100 = 9.4%
Unweighted results
Numbers shown are results without weights.
Weighted scores
These scores are the results when the credit for each correct answer is multiplied by the weight assigned to that question.
Part 5 provides the same information as Part 4 but students are identified only by number. It is suitable for posting on a bulletin board.
Part 6 provides the same information as Part 4 but the maximum score is adjusted to a new base specified on the System sheet. The example shows results from Part 4 adjusted to a maximum score of 60 points.
Part 7 provides the same information as Part 5 but the maximum score is adjusted to a new base specified on the System sheet. The example shows results from Part 5 adjusted to a maximum score of 60 points.
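The guide does not spell out how the new base is applied. A minimal sketch, assuming the adjustment is a simple proportional rescaling of the percentage score, might look like this (the function name is hypothetical):

```python
def rescale_percentage(percent, new_maximum):
    """Convert a percentage score to points on a scale with a new maximum.

    Assumes a plain proportional rescaling; this is an illustration, not a
    statement of how TRS computes Parts 6 and 7.
    """
    return percent * new_maximum / 100


# Joy's unadjusted score of 25.0% expressed on a 60-point scale.
joy_points = rescale_percentage(25.0, 60)
print(joy_points)
```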
Reports by Topics, see instructions for Including Topics in your Results.
Questions arranged by topic - Student names
This example shows the results for a class of six students who answered an examination with 80 questions arranged in three topics. All results are fictitious and are taken from a different data set than the one used to illustrate the statistics in previous parts.
In Part 8, student names appear on the left edge of the page. One column is shown for each topic. The left side of the display contains unweighted results and the right side contains weighted results. The final two columns show the total raw scores for each student.
The "Points available" row shows the number of points in each topic. In the Unweighted portion of the display, this number is equal to the number of questions assigned to the topic. In the Weighted portion, this number is the weighted value of the questions in each topic.
In the example, the numbers in this row show that 30 questions were assigned to Surgery and that this topic was worth 82 points as a result of weighting. Together, the 80 questions were worth 210 points.
Student Bailey obtained 9 unweighted points and 19 weighted points for Surgery.
The results in Part 8 are converted into percentages.
The raw scores found in Part 8 are adjusted to reflect the maximum values marked in the Identification Number area of the Topic sheet.
Compare the results in Part 10 to the results in Part 8. Surgery is now worth 40 points. Student Bailey now has 12 unweighted points and 9 weighted points in Surgery.
Questions arranged by topic - Student identification numbers
The left edge contains Student Identification Numbers.
The left edge contains Student Identification Numbers.
The left edge contains Student Identification Numbers. Same as Part 10 with student numbers instead of names.
It is important that the instructions for completing a test are clear and complete. To a large extent, the directions depend on how the examiner intends to deal with guessing. Will there be a penalty for incorrect guesses or not? Will guessing be discouraged or encouraged?
The argument for a guessing penalty
Because of the low number of alternative answers in multiple-choice tests, students' performances are probably inflated by guessing. Only true knowledge should be credited. Therefore, the enhancement due to guessing should be subtracted from the obtained scores as follows.
Each multiple-choice item has c alternative answers: 1 correct alternative and c - 1 wrong alternatives. If the student chooses at random, the probability of a correct guess is 1/c and the probability of an incorrect guess is (c - 1)/c. On average there will be 1 correct guess for every c - 1 incorrect guesses. Wrong answers on a test indicate how often the student has guessed incorrectly. From this it can be inferred that the number of correct guesses is the number of wrong answers multiplied by 1/(c - 1). Thus, an appropriate scoring formula is,
Guessing-adjusted score = RIGHTS - WRONGS/(c - 1)
where
RIGHTS is the number of items answered correctly,
WRONGS is the number of items answered incorrectly and
c is the number of alternative answers per item.
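The reasoning behind the formula can be checked numerically: if a student guesses at random, the expected gain from a guessed item under this penalty is exactly zero. The sketch below uses exact fractions to show this; the function names are illustrative.

```python
from fractions import Fraction


def expected_gain_per_random_guess(c):
    """Expected points from one blind guess on an item with c alternatives."""
    p_right = Fraction(1, c)        # chance of a lucky guess
    p_wrong = Fraction(c - 1, c)    # chance of an unlucky guess
    penalty = Fraction(1, c - 1)    # deduction per wrong answer
    return p_right * 1 - p_wrong * penalty


def guessing_adjusted_score(rights, wrongs, c):
    """RIGHTS - WRONGS / (c - 1), kept as an exact fraction."""
    return rights - Fraction(wrongs, c - 1)


for c in (2, 3, 4, 5):
    print(c, expected_gain_per_random_guess(c))

# A student with 2 right and 5 wrong on five-alternative items.
print(guessing_adjusted_score(2, 5, 5))
```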
TRS provides unadjusted scores and four adjustments for guessing. Part 4 of the printout displays the students' weighted results with a penalty for each wrong answer of 0, 1, 1/2, 1/3 and 1/4, suitable respectively for unadjusted scoring, true-false tests, and multiple-choice tests with 3, 4, or 5 alternative answers.
The argument against a guessing penalty
Students do not guess blindly. They are usually able to eliminate one or more alternatives on the basis of partial knowledge. Good students make educated guesses. Thus the assumption on which the correction formula is based (random guessing) is not met. In any event, guessing-corrected scores correlate very highly with uncorrected scores. Finally, students find it hard to accept a test score that is less than the number of items they have answered correctly. This engenders negative attitudes toward education.
The argument for discouraging guessing
To advocate guessing is to invite the students to rely on chance rather than learning. This encourages a cynical attitude toward education. Lucky guesses do not represent true knowledge and should be considered as errors of measurement from the moral and accuracy points of view. Therefore, guessing should not be condoned.
The argument against discouraging guessing
Even if the instruction is to refrain from guessing, some students will disobey, putting students who follow directions at a disadvantage. Correct guesses are made on the basis of partial information, not pure luck. In life, many decisions have to be made on the basis of incomplete information. It is appropriate, therefore, that all students be encouraged to make use of whatever information they have at their disposal.
The majority view
Most authorities conclude that the research favours the view that students should be instructed to answer every item (guessing if necessary) and that no penalty for guessing should be imposed.
Whatever the individual examiner decides, it is essential that the test instructions and scoring scheme are compatible. In addition, students should be informed of the scoring system and of the tactics that maximize their scores. Sample instructions for common situations are given below.
Guessing discouraged
If the examiner assumes that students guess randomly, suitable instructions are:
"Choose the single best alternative. If you are sure of the answer to an item, mark it as explained. If you are not sure of the answer, you can either guess or omit the item. If you guess correctly, you will receive 1 point. If you guess incorrectly, you will be penalized by [choose a penalty: 1, ½, 1/3, 1/4] of a point. If you omit the item, you will receive zero points. Because there is a penalty for wrong answers, to maximize your score, mark only the answers you know and avoid making random guesses."
Guessing encouraged with a penalty
If the examiner assumes that students guess on partial information, suitable instructions are:
"Choose the single best alternative. If you are sure of the answer to an item, mark it as explained. If you are not sure of the answer, you can either guess or omit the item. If you guess correctly, you will receive 1 point. If you guess incorrectly, you will be penalized by [choose a penalty: 1, ½, 1/3, 1/4] of a point. If you omit the item, you will receive zero points. Although there is a penalty for wrong answers, there is also the possibility of guessing correctly. Research shows that it is to your advantage to answer every item even if you have no idea about the right answer."
Guessing encouraged with no penalty
If the examiner is unconcerned about guessing, appropriate instructions are:
"Because there is no penalty for wrong answers, it is to your advantage to answer every item even if you have no idea about the right answer."
This appendix provides the details on computing the statistics in the Test Response System. The statistics are summarized below.
Statistics in Test Response System
Statistic  Course and section  Course only  
Alpha  Part 2  
Count of items  Part 1, Part 2  
Count of students  Part 1, Part 2  
Guessing penalty  Part 4, 5  
Item difficulty  Part 2  
Item-test correlation  Part 2  
Kuder-Richardson  Part 2  
Mean score  Parts 3, 4  Part 2 
Response frequencies  Parts 1, 2  
Score: Raw  Parts 3, 4, 8, 11  
Score: Percentage  Parts 4, 5, 9, 12  
Score: New maximum  Parts 6, 7, 10, 13  
Standard deviation  Part 2  
Standard error of measurement  Part 2  
Student rank  Parts 3, 4  
Unweighted results  Part 4  
Variance  Part 2  
Weighted results  Part 4 
Data for the examples are calculated from the example described in the printout guide. The reader will note that the data used in these examples are not the same as the data used to illustrate the Topics examples.
Mean score
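The mean is the sum of the students' scores divided by the number of students:
$$M = \frac{\sum X}{N}$$
where M is the mean score for each section or for the entire class,
N is the number of students, and
X is a student's score.
In the example, the calculation of the mean is:
$$M = \frac{45}{10} = 4.5$$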
Standard deviation
The standard deviation is an indication of the range of scores in the test.
$$SD_t = \sqrt{\frac{\sum (X - M)^2}{N}}$$
where SDt is the standard deviation of the test,
M is the mean of all students' scores,
X is a student's score and
N is the number of students.
In the example, the calculation is:
$$SD_t = \sqrt{\frac{18.5}{10}} = \sqrt{1.85} \approx 1.36$$
See Part 2 for more details.
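The mean and standard deviation of the example scores can be checked with the sketch below. It assumes the divisor is N (the number of students) rather than N - 1, which is the choice that reproduces the KR20 value of .15 reported for the example.

```python
import math

# Scores of the ten students in the worked example.
scores = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]


def mean(values):
    """Arithmetic mean: sum of scores divided by number of students."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Standard deviation with divisor N (assumed, see lead-in)."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / len(values))


m = mean(scores)
sd = standard_deviation(scores)
print(f"M = {m:.2f}, SDt = {sd:.2f}")
```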
Item-test correlation and correction (Magnusson, p. 200)
The item-test correlation is a point biserial correlation coefficient that indicates the extent to which an item measures the same attribute as the test as a whole. The item-test correlation, r_{it}, is the correlation between item i and the total scores for all the items. Correlations of at least +.30 are desirable.
$$r_{it} = \frac{M_c - M_w}{SD_t}\, SD_i = \frac{M_c - M_w}{SD_t}\sqrt{pq}$$
where r_{it} is the item-test correlation of an item i with the total test score,
M_{c} is the mean score of students who answered the item correctly,
M_{w} is the mean score of students who answered the item incorrectly,
SDt is the standard deviation of the test,
SDi is the standard deviation of the item, √(pq), where
p is the proportion of students who answered the item correctly, and
q is the proportion of students who answered the item incorrectly
See Part 2 for more details.
Since the item contributes to the total score, it should be removed from the correlation. To correct for inclusion of item i, the following correction is applied (Magnusson, p. 212):
$$r_{i(t-i)} = \frac{r_{it}\,SD_t - SD_i}{\sqrt{SD_t^2 + SD_i^2 - 2\,r_{it}\,SD_t\,SD_i}}$$
where r_{i(t-i)} is the corrected item-test correlation of an item,
SDt is the standard deviation of the test,
SDi is the standard deviation of an item, √(pq), where
p is the proportion of students who answered the item correctly,
q is the proportion of students who answered the item incorrectly and
r_{it} is the unadjusted item-test correlation.
The table below contains the data required to calculate the correction for Question 1. This item was answered correctly by students Ann, Bob, Cam, Don, Fay, Guy, Hal, Ian and Joy, whose mean score for all items was 4.4. Eve, the only student who answered the item wrongly, received a score of 5.0 for all items.
The calculation for the item-test correlation for Question 1 is:
$$r_{it} = \frac{4.444 - 5.000}{1.360}\sqrt{(.9)(.1)} \approx -.123$$
The data required to calculate the correction for Question 1.
Question  Σx  p  q  
1  9  0.9  0.1  
2  7  0.7  0.3  
3  6  0.6  0.4  
4  5  0.5  0.5  
5  3  0.3  0.7  
6  6  0.6  0.4  
7  7  0.7  0.3  
8  2  0.2  0.8  
Student  Ann  Bob  Cam  Don  Eve  Fay  Guy  Hal  Ian  Joy  Σ  
Score (X)  7  6  5  5  5  4  4  4  3  2  45  
Squared deviation (X - M)²  6.25  2.25  0.25  0.25  0.25  0.25  0.25  0.25  2.25  6.25  Σ(X - M)² = 18.5 
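The correction to exclude Question 1 is:
$$r_{i(t-i)} = \frac{(-.123)(1.360) - .300}{\sqrt{1.85 + .09 - 2(-.123)(1.360)(.300)}} \approx -.33$$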
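The two Question 1 calculations can be reproduced from the student scores with the sketch below. It assumes the point biserial form (mean of correct minus mean of wrong, divided by the test standard deviation, times √(pq)) and the standard part-whole correction, with a divisor of N for the test standard deviation. Variable names are illustrative.

```python
import math

# Total scores from the worked example; only Eve missed Question 1.
scores = {
    "Ann": 7, "Bob": 6, "Cam": 5, "Don": 5, "Eve": 5,
    "Fay": 4, "Guy": 4, "Hal": 4, "Ian": 3, "Joy": 2,
}
wrong_on_q1 = {"Eve"}

n = len(scores)
mean_all = sum(scores.values()) / n
sd_test = math.sqrt(sum((x - mean_all) ** 2 for x in scores.values()) / n)

right = [s for name, s in scores.items() if name not in wrong_on_q1]
wrong = [s for name, s in scores.items() if name in wrong_on_q1]
p = len(right) / n            # proportion answering correctly
q = 1 - p                     # proportion answering incorrectly
sd_item = math.sqrt(p * q)    # standard deviation of a right/wrong item

# Uncorrected item-test (point biserial) correlation.
mean_right = sum(right) / len(right)
mean_wrong = sum(wrong) / len(wrong)
r_it = (mean_right - mean_wrong) / sd_test * sd_item

# Correction that removes the item's own contribution to the total score.
r_corrected = (r_it * sd_test - sd_item) / math.sqrt(
    sd_test ** 2 + sd_item ** 2 - 2 * r_it * sd_test * sd_item
)

print(f"r_it = {r_it:.3f}, corrected = {r_corrected:.3f}")
```

The corrected value is more negative than the uncorrected one because removing the item from the total score takes away the part of the correlation that the item owes to itself.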
Kuder-Richardson 20 measurement of reliability (dichotomous answers) (Magnusson, p. 116)
See Part 2 for more details.
The Kuder-Richardson Formula 20 (KR20) is an estimate of the test's reliability. It varies between 0.00 and 1.00.
$$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{SD_t^2}\right)$$
where KR20 is the Kuder-Richardson 20 reliability estimate of the test, assuming dichotomous answers (answers containing only two choices, i.e., correct or incorrect),
k is the number of items in the test,
p is the proportion of students who answered the item correctly,
q is the proportion of students who answered the item incorrectly and
SDt is the standard deviation of the test
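The calculation of Kuder-Richardson in the example is:
$$KR_{20} = \frac{8}{8-1}\left(1 - \frac{1.61}{1.85}\right) \approx .148$$
where 1.61 is the sum of pq over the eight questions and 1.85 is the test variance (SDt squared).
The statistical significance of KR20 is assessed using the F distribution with the formula,
$$F = \frac{1}{1 - KR_{20}}$$
In the example, the computation for F of Kuder-Richardson is:
$$F = \frac{1}{1 - .148} \approx 1.17$$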
The probability of the F value is obtained from the SAS probf distribution (SAS, p.579), which returns the probability that an observation from an F distribution is less than or equal to the observed numeric random variable. It has the following form,
1 - PROBF(x, ndf, ddf),
where x is the F value of the Kuder-Richardson statistic,
ndf is the degrees of freedom in the numerator (number of students in the class - 1), and
ddf is the degrees of freedom in the denominator ((number of students - 1) × (number of items - 1)).
In the example, the result of .3270 is obtained with the following parameters:
F = 1.17
ndf = 10 - 1 = 9
ddf = (n - 1)(k - 1) = (10 - 1)(8 - 1) = 63.
Under the null hypothesis (KR20=0, no test reliability), the probability of the observed statistic (KR20=.15) is .3270. By convention, only probabilities below .05 are considered significant. Therefore, the data provide no evidence that this examination is reliable.
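The KR20 value, its F ratio and the degrees of freedom can be reproduced from the example data with the sketch below. It assumes the test variance uses a divisor of N and that F is computed as 1 / (1 - KR20), which matches the reported values of .15 and 1.17. The tail probability itself needs an F-distribution routine such as SAS PROBF and is not computed here.

```python
# Total scores and per-question counts of correct answers (worked example).
scores = [7, 6, 5, 5, 5, 4, 4, 4, 3, 2]
correct_counts = [9, 7, 6, 5, 3, 6, 7, 2]

n = len(scores)           # number of students
k = len(correct_counts)   # number of items

mean_score = sum(scores) / n
test_variance = sum((x - mean_score) ** 2 for x in scores) / n

# Sum of item variances p * q over all items.
p_values = [count / n for count in correct_counts]
sum_pq = sum(p * (1 - p) for p in p_values)

kr20 = k / (k - 1) * (1 - sum_pq / test_variance)
f_value = 1 / (1 - kr20)

ndf = n - 1               # numerator degrees of freedom
ddf = (n - 1) * (k - 1)   # denominator degrees of freedom

print(f"KR20 = {kr20:.3f}, F = {f_value:.2f}, ndf = {ndf}, ddf = {ddf}")
```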
Raw coefficient alpha (continuous answers) (Nunnally, p. 214)
See Part 2 for more details.
Cronbach's α is a more general measurement of reliability than Kuder-Richardson. It applies to a continuous distribution of values and is therefore appropriate for measuring the reliability of attitudes, opinions or behaviour with the five-point scale used in the Test Response System. It is confined to students who answered all the items. Because of these computational differences, α and KR20 are not directly comparable in TRS reports.
There are two ways to express Cronbach's α. In the first, perfect reliability is given the value 1 and the error component is the ratio between the sum of the item variances and the variance of the student scores.
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum SD_i^2}{SD_t^2}\right)$$
where α is Cronbach's α,
k is the number of items on the test,
SDi is the standard deviation of an item,
SDt is the standard deviation of the test
A second approach is equivalent to the first but expresses the relation as a ratio of two ratios composed of the variances and covariances of the scores.
$$\alpha = \frac{k\,(\overline{cov}/\overline{var})}{1 + (k-1)\,(\overline{cov}/\overline{var})}$$
where α is Cronbach's α,
k is the number of items on the test,
cov is the average covariance, and var is the average variance.
The computation is confined to students who answered all items. As a result, it is difficult to compare the results between Kuder-Richardson and Cronbach's alpha unless we use an artificial example. Only four students answered all questions: Ann, Cam, Don and Fay.
Notice that the data, shown in the table below, represent values on a scale of 1 to 5, unlike the binary correct-incorrect dichotomy used to assess reliability of academic performance in the Kuder-Richardson example.
Data in the table represent values on a scale of 1 to 5
Student  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Total  
Ann  1  2  3  4  5  1  2  1  19 
Cam  1  2  5  4  5  5  2  2  26 
Don  1  2  3  4  1  3  2  4  20 
Fay  1  2  3  3  4  1  3  5  22 
Item variance (SDi²)  0.00  0.00  1.00  0.25  3.58  3.67  0.25  3.33  12.08 (sum) 
Test variance (SDt²)  9.58  
The computation using the ratio of the sum of the item variances to the test variance is:
$$\alpha = \frac{8}{8-1}\left(1 - \frac{12.08}{9.58}\right) \approx -.298$$
The second method, mathematically equivalent to the method shown above, uses a ratio of variance and covariance. Covariance is the mean product of the variation about the mean between each pair of items. Covariance is calculated by dividing the total covariation by N - 1, where N is the number of students who answered all items on the test. An example of covariance is shown below for items 3 and 4:
Data for the covariance of items 3 and 4
Student  Q3 (X)  Variation (X - X̅)  Q4 (Y)  Variation (Y - Y̅)  Covariation (X - X̅)(Y - Y̅) 
Ann  3  -0.5  4  0.25  -0.125 
Cam  5  1.5  4  0.25  0.375 
Don  3  -0.5  4  0.25  -0.125 
Fay  3  -0.5  3  -0.75  0.375 
Mean: Q3 = 3.5, Q4 = 3.75  
Sum of squared variation: Q3 = 3, Q4 = 0.75  
Variance = Sum of squared variation ÷ (N - 1): Q3 = 1, Q4 = 0.25  
Total covariation Σ(X - X̅)(Y - Y̅) = 0.5 
Covariance = Σ(X - X̅)(Y - Y̅) ÷ (N - 1) = 0.1667 
Covariance matrix for all questions in the example
         Q2       Q3       Q4       Q5       Q6       Q7       Q8
Q1        0        0        0        0        0        0        0
Q2                 0        0        0        0        0        0
Q3                     0.1667   0.8333   1.6667  -0.1667  -0.6667
Q4                             -0.0833      0.5    -0.25  -0.6667
Q5                                      -0.1667   0.0833       -2
Q6                                                  -0.5  -0.6667
Q7                                                         0.6667
The mean covariance (28 cells) is -.0446. Coefficient alpha also requires the mean variance of the items. This value is derived from the total squared deviation for each item, divided by N - 1, whose mean for the 8 items is 1.5104. The calculation in the example is:
$$\alpha = \frac{8\,(-.0446/1.5104)}{1 + (8-1)(-.0446/1.5104)} \approx -.298$$
The negative coefficient is an anomaly arising from an extremely small sample and artificial data.
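Both routes to coefficient alpha can be verified against the four-student data with the sketch below. It uses a divisor of N - 1 for variances and covariances, as in the tables above; the helper and variable names are illustrative.

```python
from itertools import combinations

# Ratings (1 to 5) of the four students who answered all eight items.
ratings = {
    "Ann": [1, 2, 3, 4, 5, 1, 2, 1],
    "Cam": [1, 2, 5, 4, 5, 5, 2, 2],
    "Don": [1, 2, 3, 4, 1, 3, 2, 4],
    "Fay": [1, 2, 3, 3, 4, 1, 3, 5],
}


def sample_covariance(xs, ys):
    """Covariance with divisor N - 1; the variance when xs is ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


items = list(zip(*ratings.values()))   # one tuple of ratings per question
totals = [sum(row) for row in ratings.values()]
k = len(items)

# First method: ratio of summed item variances to the test variance.
item_variances = [sample_covariance(col, col) for col in items]
test_variance = sample_covariance(totals, totals)
alpha_from_variances = k / (k - 1) * (1 - sum(item_variances) / test_variance)

# Second method: mean covariance relative to mean variance.
pair_covariances = [sample_covariance(a, b) for a, b in combinations(items, 2)]
mean_cov = sum(pair_covariances) / len(pair_covariances)
mean_var = sum(item_variances) / k
ratio = mean_cov / mean_var
alpha_from_covariances = k * ratio / (1 + (k - 1) * ratio)

print(f"alpha (variances)   = {alpha_from_variances:.4f}")
print(f"alpha (covariances) = {alpha_from_covariances:.4f}")
```

The two results agree because the test variance equals the sum of the item variances plus twice the sum of the 28 pairwise covariances.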
The F value for α is obtained from the formula,
$$F = \frac{1}{1 - \alpha}$$
In the example, the computation for F of α is:
$$F = \frac{1}{1 - (-.2982)} \approx .7703$$
The statistical significance of α is assessed with the F distribution (as for KR20).
The probability of the F value is obtained from the SAS probf distribution (SAS, p.579), which returns the probability that an observation from an F distribution is less than or equal to the observed numeric random variable. It has the following form,
1 - PROBF(x, ndf, ddf),
where x is the F value of the raw coefficient alpha,
ndf is the degrees of freedom in the numerator (number of students in the class - 1), and
ddf is the degrees of freedom in the denominator ((number of students - 1) × (number of items - 1)).
In the example, the result of .6439 is obtained with the following parameters:
F = .7703
ndf = 10 - 1 = 9
ddf = (n - 1)(k - 1) = (10 - 1)(8 - 1) = 63.
There is a high probability (.6439) that sampling error could account for the obtained α. We conclude that α does not differ significantly from zero and that the examination is not reliable.
Standard error of measurement
See Part 2 for more details.
The standard error of measurement (SE) is an estimate of the error component in a student's score due to imperfections in the test and to chance events such as illness, distractions, and fatigue.
$$SE = SD_t\sqrt{1 - \text{Reliability coefficient}}$$
where SE is the standard error of measurement,
SDt is the standard deviation of the test, and
Reliability coefficient is Kuder-Richardson 20 or Cronbach's α.
The calculation in the example is:
$$SE = 1.36\sqrt{1 - .148} \approx 1.26$$
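A minimal sketch of the standard error of measurement, assuming the usual form SDt × √(1 - reliability) and using the example's test variance of 1.85 and a KR20 of about .148 (both derived from the example data rather than quoted from a TRS printout):

```python
import math


def standard_error_of_measurement(sd_test, reliability):
    """SE = SDt * sqrt(1 - reliability coefficient)."""
    return sd_test * math.sqrt(1 - reliability)


sd_test = math.sqrt(1.85)   # standard deviation of the example test
kr20 = 0.148                # KR20 of the example, to three decimals

se = standard_error_of_measurement(sd_test, kr20)
print(f"SE = {se:.2f}")
```

With a reliability this low, the standard error is nearly as large as the standard deviation of the test itself.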
Teachers can capitalize on the diagnostic features of the Test Response System to improve the fairness and accuracy of their tests. Using a worked example, this appendix describes the specific numerical criteria for enhancing aptness, simplicity, and mutual compatibility of questions. We wish to thank Gordon Emslie and Judith Emslie for writing the paper which forms this appendix.
Improving Classroom Multiple-Choice Tests: A Worked Example Using Statistical Criteria
 Cronbach, L. J. (1951), Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
 Cucchiara, A. J., Kenny, S. J., & Costiloe, J. P. (1990), The distribution of Cronbach's coefficient alpha from the CORR procedure of SAS software. SUGI Conference Proceedings, 15.
 Diamond, J. & Evans, W. (1973), The correction for guessing. Review of Educational Research, 43, 181-191.
 Emslie, G. R. & Emslie, J. R. (1992), Testing the test: Using the Ryerson Test Response System to assess the quality of educational measurement.
 Ferguson, George A. and Yoshi Takane (1989), Statistical Analysis in Psychology and Education, Sixth edition, Toronto: McGraw-Hill Book Company.
 Klein, W. J. & Emslie, Gordon (1991), Psychometric applications with optical scanning and analysis. Paper presented at SAS Users Group International 16, New Orleans.
 Lord, F. M. (1952), A theory of test scores. Psychometric Monograph, No. 7.
 Magnusson, David (1966), Test Theory, Don Mills: Addison-Wesley.
 Nunnally, J. (1978), Psychometric Theory, Second edition, Toronto: McGraw-Hill.