BIOSTATISTICS: PRACTICE PROBLEM SOLUTIONS


     These are the solutions to the set of questions [1-56] for ENH440, as well as to the additional questions 57-81, which appear on THIS page below the answers.  (Just keep scrolling.)  For most questions the simple numeric answers have been given; you will need to supply the interpretation for each situation.   Some of the more complex questions also link to a more detailed solution. 


     EXACT 'P' RESULTS:  In some cases I have given the P value you should have extracted using the tables. In other cases an "exact" P value from a computerized calculation has been given to illustrate the interpretation of such a value. For example, if P is shown as 0.0347, this still agrees with P < 0.05 but not with P < 0.01.   Likewise, P = 0.0017 is clearly < 0.005 but not < 0.001.   We WILL be calculating ONE example of an 'exact P' as part of the "Fisher's Exact Test".
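
     As a rough illustration of how an exact P maps onto a table-style statement, here is a minimal Python sketch. The list of levels and the function name `table_statement` are assumptions made for this example; your tables may list other levels (such as 0.02).

```python
# Significance levels as they might appear in a printed table (an assumption).
LEVELS = [0.05, 0.01, 0.005, 0.001]

def table_statement(p_exact):
    """Return the strictest 'P < level' statement that an exact P satisfies."""
    met = [level for level in LEVELS if p_exact < level]
    if not met:
        return "P > 0.05"
    return f"P < {min(met)}"

print(table_statement(0.0347))  # P < 0.05
print(table_statement(0.0017))  # P < 0.005
```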



1 OR: 1.94,  1.39,  1.19,  3.26 (CHI-SQ=4.06, P=0.04),  0.63  detailed solution
2    t = 1.2905,  248 df,   P > 0.05    detailed solution
3   only #4
4/5   F(treat) = 2.385,   F(age) = 5.474    F(Tr x Age) =  0.987      detailed solution:
6  OR: 5.00, P=0.215, (FET)
7    y' = 150.252 - 1.423(x),  t= 11.917
8    CL: +0.2514, -0.2194  t=0.145,   328.8
9 River rafting ...  (F.Exact.T)   P= 0.0035      Detailed solution
10  y'= 0.72 + 0.27(x), y=2.88 cm, t=20.0  NEW SCATTERPLOT to illustrate this solution 
11   t=2.429  P<0.05,  76,  127,  178 μgPb/L  Detailed solution
13   46 df,  P < 0.05
14   7.512  P<0.01,   4.814   P<0.05,     1.951   P>0.05
15   t = 3.015,  8 df,  P < 0.02    detailed solution
16   t = 3.953,  38 df,  P < 0.001
17   t = 1.272,  21df   detailed solution
18  t=2.494, 18 df
19  t=3.993, 5df
20   chi-sq = 22.1  detailed solution
21  t (age) = 1.62, 58 df, P>0.05;     t (weight )= 0.74, 58 df, P>0.05;      t (bp) = 6.04, 58 df, P<0.001
22    t = 0.5606,  11 df,   P = 0.59
23    t = 0.7845,  4 df,  P = 0.527 (this is paired)
24    t = 0.6978,  11 df,  P > 0.05       detailed solution
25    t = 1.736,   16 df,  P >0.05
26  Ho: "No difference between Nur and Eng students in terms of mean GPA"   P < 0.05
27   P > 0.05
28   chi-sq    3.525    P = 0.06    OR = 2.85
29   chi-sq    6.764    1 df,   P < 0.01   RR = 3.3 (protective)
30    t = 4.202, 18 df,  P < 0.001       detailed solution
31       t = 2.583,   22 df,   P < 0.02     detailed solution
32   (individual calculation)
33   ms(species) = 73.44,   F=4.54 ,  p>0.05        MS CHEM= 13.083,  F = 0.808   p>0.05     
35  chi-sq = 7.18, 2df, P<0.05    (Actually P=0.0276)
36  F = 0.693,  5.403,   6.500
37  This is a paired t-test but requires a log transform because the data are exponential     t = 1.109, P=0.3306 
38   MPchi-sq:0.93, P:0.335;   HPchi-sq:7.10, P:0.0077(prot);    ESFET: P:0.278
39 (1) RH: r=0.75, y=0.654+0.0892(x), t=2.971, P<0.05;   (2)  VEL: r=0.74, y=-0.510+0.0884(x); t=2.879, P<0.05   (3) CORR betw RH&Vel: r=0.22
40  [please change 6.0 to 60, and 7.0 to 70]   Now this is correct:   Pb = 233.4 - 31.73(pH)
41  chi-sq: 4.43,   P=0.035,  OR 3.18,  CL: 0.94 to 11.15
42    t = 3.315,  7 df,   P = 0.0105 
43  F=10.939,  2,12 df, P=0.002301     detailed solution
44  (chi-sq goodness of fit)   (You don't need this for the exam)
45  F(freq):8.44, P<0.01  F(size): 11.25, P<0.01   F(Freq x Size): 4.85, P<0.05  see detailed solution here
46  b= 0.1429,    CL: 0.1074 to 0.1783,   t = 34.65,  4df,    P <0.001 
47  d= 2.688,  t= 9.622,   7df,  P <0.001


48  (a) 0.13, (b) 0.81, (c) 0.61, (d) 0.18, (e) >.05, <.002,  >.05,  >.05  (f) best return in 2 yrs   detailed solution 

49   inverse,   r=0.58,  10 df,  P<0.05
50 Fisher's Exact Test      P = 0.0204  
51  d̄ = 7.4,   t = 2.608,   P = 0.025
52  F= 6.17;  P<0.01       Detailed solution   
53  Completed in class.   t(unpaired) = 1.077   t(paired) = 8.624  (The correct method is paired)  detailed solution
54 (a) X² 0.47, P=0.49;    (b) X² 17.30, (prot)  P=0.000032,    (c) X² 9.49, P=0.00207
55  (a) X² 0.08, P=0.78;    (b) X² 0.82, P=0.365,    (c) (error:please omit)
56  X² 6.94,  P=0.0084 (or "P<0.01");    OR: 2.37       
Page 27

 [A]   F/ratios  =  6.37,   22.96,   0.91;    detailed solution

 [B]  F/ratios   =  15.79,   5.85,   63.16     detailed solution

ANSWERS TO EXTRA QUESTION SET #57-81  (Scroll down for these questions)

57  OR: 1.0, 1.17, 0.63, 2.00, 5.44  X² 11.1, P=0.0009   (Scroll down for question)
58  (c) at  250M, Y= 480 ppm,  at  500M, Y= 182 ppm,  (d)  t=9.88     (Scroll down for question)  please note the changes here
59  (a) >0.05    (b) no   (d) 26                         (Scroll down for question)
60  t = 9.7,  98 df,   ss  P<0.001                                       (Scroll down for question)
61  RR=4.02   ChiSq: 18.9,  1 df    P=0.000014  (P<0.001)   Exposed personnel >4x as likely to develop lung ca.  reject Ho.      (Scroll down for question) 
62 Only expected values are important here.  2 of 9 is 22%, so more than 20% of the cells have expected (E) values less than 5.  ChiSq not valid
63  lead(ppm) = 724 - 0.899 M    6df,   t = 9.90,   P <0.001     (inverse rel.  stat sig.)        (Scroll down for question)
64 removed. (essentially the same as #61)
65  HC% = 47.19 - 0.3695 (ppm LEAD).  t = 13.5,   11df,    P <.001,   r= - 0.97,  r2 = 0.94    (Scroll down for question)
70A     SOLUTION: We have here TWO variables. The independent variable is the 'treatment' and is a categorical variable with two levels (a hand barrier cream either antiseptic or not).  The dependent variable is presence or absence of E. coli, clearly a categorical variable, with two levels.  So the data will be displayed as a 2x2 table.  Analysis would be by ChiSquare. (Of course odds ratio would be useful for explaining the strength and direction of the association). 
70B         SOLUTION: The independent variable is the same but the dependent variable is now continuous.  This is a candidate for either 1-way ANOVA or unpaired t-test, either of which would be applicable.
70C         SOLUTION: This introduces a second independent variable, also categorical, with two levels.  If the dependent variable remains as in 70B (continuous), then this is a 2-way ANOVA, and with 240 workers, there will obviously be a number in each group, allowing the "factorial design with interaction".
70D         SOLUTION:   This is a more complicated arrangement.   If the arrangement in 70B is used, we have a pre-post test (paired t-test) with all 240 people tested before and after using the antiseptic hand cream, thus 240 pairs of data (239 df).     But if the arrangement in 70C is used in this way, we have TWO independent variables, and the solution is beyond the scope of this course, but might include Raw foods pre-post and Cooked foods pre-post.  A t-test would be used in either case.
71         SOLUTION:  The data would be appropriately analysed by means of the paired t-test.  Note that the 'pairing' is taking place on the water samples, each of which is being assessed using BOTH filters.  Thus the 15 water samples with known lead content produce 30 separate results, or 15 pairs.   (14 df)  
72         SOLUTION: t-test for unpaired data.  (18 df) 
73         SOLUTION: Chi-square analysis in 2x2 contingency table.  True relative risk is appropriate here because you DO have the true incidence data. The two exposure groups were all healthy at the start of the study 30 years ago, and have been followed.  Therefore you have the incidence data.  RR = Ie/Io or Incidence rate for exposed group over the Incidence rate for non-exposed group.
74  SOLUTION: Two Variables: Ind.var is categorical with 3 levels.  Dep.var is continuous.   1-way ANOVA
75   SOLUTION: first Ind.var is categorical (3 levels), second Ind.var is also categorical with three levels, Dep.var. is continuous.  Two-way ANOVA.  Block design if only one obs per group, or Factorial if >1 obs per group
76  SOLUTION: Chi-square analysis in 3x3 contingency table.  But things can get complex.  If we do this and just have the total count in each cell, we get a test of the relationship between haz types and training groups.  (No pass/fail)  So we may stratify - and that is beyond the scope of this course
77   SOLUTION: Chi-square analysis in 2x3 contingency table.  
78  detailed solution
79  F= 23.53,   2, 33 df,    P<0.01 detailed solution   
80  F= 3.190,   2,12 df,    P>0.05  detailed solution
81  detailed solution
57.  You are investigating risk factors among farm workers for contracting leptospirosis. A group of 30 patients has been identified, as well as a group of 40 non-leptospirosis controls.  The following data are the results from enquiries about possible exposures in the previous three months.  Data shown are the number stating "yes" to each exposure:
Have you handled wild animals?
Do you have a mice infestation?
Have you visited a zoo?
Have you handled garden soil?
Have you repaired sewer pipes or drains?
58.  Data: lead (in ppm) in soil samples (Y) measured at a distance (X) metres from the smoke stack of a lead processing plant.  The regression coefficient is  -1.1927; the standard error of the regression coeff. is 0.119488;   estimated lead concentration at the base of the stack is 778.05 ppm.  (a) Plot the data on a scattergram; (b) show the least-squares line; (c) predict the lead in the soil at 250M and 500M distance from the stack; (d) test the null hypothesis of no association; (e) clearly summarize. (Note: you do NOT have to calculate the parameters from the original data.)
lead(ppm) y
distance(M) x
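
For part (c) of question 58, the prediction step can be sketched in Python as below. The intercept and slope are the values stated in the question; the function name `predicted_lead` is only an illustrative choice.

```python
# Parameters as given in question 58.
INTERCEPT = 778.05   # estimated lead (ppm) at the base of the stack (x = 0)
SLOPE = -1.1927      # regression coefficient (ppm per metre of distance)

def predicted_lead(distance_m):
    """Least-squares prediction y' = a + b(x) for a distance in metres."""
    return INTERCEPT + SLOPE * distance_m

for distance in (250, 500):
    print(distance, "M:", round(predicted_lead(distance)), "ppm")
```

Rounded to whole ppm, this gives 480 ppm at 250M and 182 ppm at 500M, in agreement with the answer listed for 58(c).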
59 The results of an investigation into the cadmium content in the blood of children from two areas (A and B) conclude with a statement that the (mean of A) minus (mean of B) was 8.3 µg Cd/100ml blood; 24 df, t=2.01.  (a) What is the probability that a mean difference of this amount could have been observed in these two sample groups if there was really no difference between the two areas in terms of children's blood-cadmium?   (b) Is this difference statistically significant at the 5% rejection level?  (c)  Describe clearly what you understand from the results;  (d) how many children participated in the study?  
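
The reasoning for 59 (a), (b) and (d) can be sketched in a few lines of Python. This assumes an unpaired two-sample t-test (so df = n1 + n2 - 2) and takes the two-tailed 5% critical value for 24 df, 2.064, from standard t-tables; the variable names are illustrative only.

```python
T_OBSERVED = 2.01
DF = 24
T_CRIT_05 = 2.064   # two-tailed 5% critical value of t for 24 df (from t-tables)

# (a)/(b): the observed t falls short of the critical value, so P > 0.05.
significant_at_5_percent = abs(T_OBSERVED) >= T_CRIT_05

# (d): for an unpaired t-test, df = n1 + n2 - 2, so total n = df + 2.
n_children = DF + 2

print(significant_at_5_percent)  # False
print(n_children)                # 26
```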
60.  In correlating haemoglobin levels with estimated exposure to toxic solvents among 100 victims of a chemical spill, the linear correlation coefficient is found to be -0.70    Explain this relationship in detail, including a test of Ho that rho=zero.  
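
The test of Ho: rho = zero in question 60 uses t = r√(n-2) / √(1-r²) with n-2 df. A minimal Python sketch, with a helper name (`t_from_r`) chosen just for this example:

```python
import math

def t_from_r(r, n):
    """t statistic and df for testing rho = 0 from a sample correlation r."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

t, df = t_from_r(-0.70, 100)
print(round(abs(t), 1), df)  # 9.7 98
```

The sign of t simply follows the sign of r (an inverse relationship here); the size, 9.7 with 98 df, is what is compared with the tables.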
61 N=520 men who worked at a Northern Ontario asbestos mill between 1950 and 1970 have been identified through employment records and medical records, and their health/survival status since then is compared with paper mill workers in the same region during the same period.  It is found that of the asbestos workers, 25 have developed (or died from) lung cancer. Of 1005 paper mill workers, 12 have been diagnosed (or died from) the disease.  (a) Calculate an appropriate risk measurement, and (b) explain what this measurement means.
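
For question 61, the relative risk and the chi-square for the 2x2 table can be checked with the sketch below. It uses the shortcut formula N(ad-bc)²/[(a+b)(c+d)(a+c)(b+d)] without a continuity correction, which is an assumption on my part; it does reproduce the listed chi-square of 18.9. The function name is illustrative.

```python
def rr_and_chi_square(a, b, c, d):
    """2x2 table: a, b = exposed with/without disease; c, d = unexposed with/without."""
    n = a + b + c + d
    rr = (a / (a + b)) / (c / (c + d))   # incidence exposed / incidence unexposed
    chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return rr, chi_sq

# Asbestos workers: 25 of 520 with lung cancer; paper mill workers: 12 of 1005.
rr, chi_sq = rr_and_chi_square(25, 520 - 25, 12, 1005 - 12)
print(rr)                # a little over 4: exposed men >4x as likely to develop lung ca.
print(round(chi_sq, 1))  # 18.9
```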
62.  Three cells in a 3x3 contingency table have observed values less than 5, and two of them have expected values less than 5.   (a) Is chi-square analysis appropriate here?  (b)  Why or why not?
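
The rule applied in question 62 (chi-square is doubtful when more than 20% of cells have expected values below 5) amounts to one division. A small Python sketch, with variable names chosen for illustration:

```python
cells_total = 3 * 3          # 3x3 contingency table
cells_low_expected = 2       # cells with EXPECTED values below 5

share_low = cells_low_expected / cells_total
chi_square_ok = share_low <= 0.20

print(round(share_low * 100))  # 22
print(chi_square_ok)           # False
```

Note that the three low OBSERVED values play no part in the check.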
63. The effect of distance from an incinerator smoke stack is being investigated. The following data show the lead content in the soil at various distances downwind from the base of the stack.  (A) construct a scattergram showing the data and the least-squares line.   (B) From your regression model (the equation, not the scattergram) estimate the lead in soil at 650M and 400M.   (C) test the null hypothesis that beta=zero.    (D) summarize fully.
Distance (M)
Lead in soil (ppm)
65. The following data describe the effect of blood lead on haematocrit (packed red blood cells as % of whole blood). Calculate correlation and regression coefficients.   Plot the data and the least-squares line on the scattergram.  Test the hypotheses (2) that beta=zero and rho=zero
lead µg/dl


70.  What method would you use for these? (70 a-b-c-d-)


70A... A study of effectiveness of new skin antiseptic barrier cream to be used in food manufacturing.  Two hundred & forty workers are recruited and randomized into two groups - one group to receive the antiseptic cream, and the other group to receive a similar-looking product without antiseptic action.  Each day a hand swab test is taken from the workers and analyzed for E. coli  which are reported as "present" or "absent"


70B.... The above study but the outcome (E.coli) is required as a count.

70C...Someone has suggested that those people working with raw foods would be exposed to a much greater bacterial load, so a further division is made between "raw" and "cooked" foods.  How would this change the analysis?

70D....As an alternative to the basic study in (70C) above, the investigators contemplate taking all 240 workers and letting them work for a month (with hand swabs every day) before they are asked to use the antiseptic hand cream daily.  They are then swabbed daily for another month.  What is the statistical analysis you would recommend here?



71... A filter system for removing lead from the water supply is being compared with a conventional filter system.  Fifteen water samples with known (but different) concentrations of lead are passed through each of the filters, and the resulting filtrate is examined for lead content.  There are 30 samples tested in this way.  What method of analysis would you use here?


72....The number of new cases of diarrhoea in infants in a six month period is the outcome measure used in a study of the effectiveness (if any) of a strict double-handwash procedure after changing diapers.  Ten centres are selected and invited to participate in the trial, along with another 10 which will undertake normal handwashing practices.


73...  In 1975, during the replacement of fuel rods at a nuclear plant, 32 workers were accidentally exposed to radiation for about 4 hours.  They have been followed and their health outcomes compared to a group of 48 welders and pipe fitters in a conventional power station.  After thirty years, 8 of the nuclear plant workers and 9 of the conventional plant workers have been diagnosed with some form of cancer.  What is the appropriate risk measurement?  What is the method of analysis to be used?


74    Three methods of training ESL workers  on WHMIS procedures are being compared: lecture, power-point, and animated cartoon.  The outcome is a knowledge test score out of 40.   What method of analysis is to be used here?


75    Same three methods of WHMIS training but this time someone suggests the type of work may play an important part.  So the workers are divided into low hazard, medium hazard, and high hazard jobs. What method of analysis is to be used here?

76    Same project comparison between three WHMIS training groups x three types of hazard with which the people are working, but this time the dependent var is just 'pass/fail' on the standard test. What method of analysis is to be used here?





77      A herbal treatment for contact dermatitis is being evaluated against the conventional treatment of corticosteroids and emollient creams. The outcome after 7 days is recorded as improved? unchanged? or worse? What method of analysis is to be used here?


78        You are investigating the extent to which INDOOR air quality (IAQ) in office buildings without openable windows compares to air quality in similar buildings equipped with windows that open for 5 hours/day. Measurements are taken in three types of EXTERNAL air quality (EAQ). The dependent variable is the count of suspended particles (10 microns or less) per 10 cc. Complete the ANOVA table and summarize.

N=10        N=10        N=10        N=10        N=10        N=10
MEAN=49.6   MEAN=36.3   MEAN=49.4   MEAN=45.6   MEAN=49.7   MEAN=54.5

 source        SS         df    MS    F    P
 all groups    1842.53
 EAQ            810.13
 wind O/C       240.00
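
As a starting point for completing the table in question 78, the sketch below works out the degrees of freedom and the interaction sum of squares from the values shown. It assumes the usual balanced two-way factorial partition (SS all groups = SS EAQ + SS windows + SS interaction); the variable names are illustrative. The error (within-group) SS is not among the values shown above, so the F ratios are not computed here.

```python
# Values shown in the question.
SS_GROUPS = 1842.53
SS_EAQ = 810.13
SS_WINDOWS = 240.00
LEVELS_EAQ, LEVELS_WINDOWS, N_PER_GROUP = 3, 2, 10

# Interaction SS by subtraction (balanced factorial design assumed).
ss_interaction = SS_GROUPS - SS_EAQ - SS_WINDOWS

df_groups = LEVELS_EAQ * LEVELS_WINDOWS - 1                 # 5
df_eaq = LEVELS_EAQ - 1                                     # 2
df_windows = LEVELS_WINDOWS - 1                             # 1
df_interaction = df_eaq * df_windows                        # 2
df_error = LEVELS_EAQ * LEVELS_WINDOWS * (N_PER_GROUP - 1)  # 54

# Mean squares for the effects: MS = SS / df.
ms_eaq = SS_EAQ / df_eaq
ms_windows = SS_WINDOWS / df_windows
ms_interaction = ss_interaction / df_interaction

print(round(ss_interaction, 2), df_interaction, df_error)  # 792.4 2 54
```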





# 79   

Three methods of teaching infection control are being compared using three groups of students selected at random.  Their scores out of 20 are shown.  Is there a significant difference between the means of each group?   Explain clearly.


Method A:    5.0    9.0    5.0    8.0    5.0    7.0    6.0    7.0    6.0    6.0    6.0    7.0

Method B:   12.0    6.0   11.0    7.0   10.0    7.0   10.0    8.0   10.0    9.0    9.0    9.0

Method C:    3.0    3.0    8.0    7.0    4.0    4.0    6.0    4.0    5.0    5.0    4.0    4.0
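
The one-way ANOVA for question 79 can be checked with the following Python sketch, which uses the "sums and correction term" method done by hand in this kind of problem. The function name `one_way_anova` is an illustrative choice, not something from the course material.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total          # (grand total)^2 / N
    ss_total = sum(x * x for x in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

method_a = [5, 9, 5, 8, 5, 7, 6, 7, 6, 6, 6, 7]
method_b = [12, 6, 11, 7, 10, 7, 10, 8, 10, 9, 9, 9]
method_c = [3, 3, 8, 7, 4, 4, 6, 4, 5, 5, 4, 4]

f_ratio, df_b, df_w = one_way_anova([method_a, method_b, method_c])
print(round(f_ratio, 2), df_b, df_w)  # 23.53 2 33
```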




# 80   

The pH of three types of canned tomatoes is being compared; the pH readings are shown.  Is there a significant difference between the means of each group?   Explain clearly.


Method 1:   1.9,   2.4,   2.5,   2.6,   3.0

Method 2:   2.0,   2.1,   3.0,   3.1,   3.3

Method 3:   2.8,   2.9,   3.3,   3.3,   3.8
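
Because the three groups in question 80 are the same size, the F ratio can also be obtained from the group means and variances: MS(between) = n x variance of the group means, and MS(within) = average of the group variances. The sketch below takes that route with Python's `statistics` module. The 5% critical value F(2,12) = 3.89 is taken from standard F-tables, and the variable names are illustrative.

```python
from statistics import mean, variance

groups = [
    [1.9, 2.4, 2.5, 2.6, 3.0],
    [2.0, 2.1, 3.0, 3.1, 3.3],
    [2.8, 2.9, 3.3, 3.3, 3.8],
]
n = len(groups[0])                     # equal group sizes (5 each)

ms_between = n * variance([mean(g) for g in groups])
ms_within = mean([variance(g) for g in groups])
f_ratio = ms_between / ms_within

F_CRIT_05 = 3.89                       # F(2, 12) at the 5% level, from F-tables
print(round(f_ratio, 3))               # 3.19
print(f_ratio >= F_CRIT_05)            # False, so P > 0.05
```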




81.  In the following ANOVA analysis, two systems of purifying water (‘old’ and ‘new’) are being compared, with the pH of the water also being studied. The dependent variable is the ppm of the contaminant (least is better). Twenty-eight separate measurements have been made. Note that you do not have the original data, just the means, total, and N for each group.  The ANOVA table has been completed and is shown below.  




Low pH     High pH     Low pH     High pH

ALL GRPS
EQ X pH
TOTAL      1850.11


 [a]   Enter the missing values in the ANOVA table above                     

 [b]   Now select the single correct statement that summarizes the analysis                                                            

            A. Old equipm and new equipm perform equally in low pH situations 

            B. New equipm is not affected by pH, but performs consistently better than the old

            C. In high pH older equipm performs better, but in low pH new equipm is better 

            D. Old equipm and new equipm perform equally in high pH situations 

            E.  The performance of the old equipment is not affected by pH