 
BIOSTATISTICS: PRACTICE PROBLEM SOLUTIONS
These are the solutions to the set of questions [1-56] for ENH440, as well as
to the additional questions (57 onward), which appear on THIS page below the
answers. (Just keep scrolling.) For most questions the simple numeric answers
have been given; you will need to supply the interpretation for each situation.
Some of the more complex questions will also link to a more detailed solution.
EXACT 'P' RESULTS: In some cases I have given the P value you should have
extracted using the tables. In other cases an "exact" P value from a
computerized calculation has been given to illustrate the interpretation of
such a value. For example, if P is shown as = 0.0347, this still agrees with
P < 0.05 but not with P < 0.01. Likewise, P = 0.0017 is clearly < 0.005 but
not < 0.001. We WILL be calculating ONE example of an 'exact P' as part of the
"Fisher's Exact Test".
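The bracketing of an exact P value against table levels can be illustrated with a minimal Python sketch. The function name `bracket_p` and the particular list of tabulated levels are assumptions chosen for illustration; they are not part of the course materials.

```python
# Hypothetical helper: find the smallest tabulated level an exact P still falls below.
TABLE_LEVELS = [0.05, 0.02, 0.01, 0.005, 0.001]  # assumed set of table levels

def bracket_p(p, levels=TABLE_LEVELS):
    """Return the smallest tabulated level that p is below, or None if p >= 0.05."""
    passed = [level for level in levels if p < level]
    return min(passed) if passed else None

for p in (0.0347, 0.0017, 0.215):
    level = bracket_p(p)
    if level is None:
        print(f"P = {p}: P > 0.05 (not significant at the 5% level)")
    else:
        print(f"P = {p}: report as P < {level}")
```

With the two values quoted in the text, the helper reports P < 0.05 for 0.0347 and P < 0.005 for 0.0017.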

1. OR: 1.94, 1.39, 1.19, 3.26 (CHISQ = 4.06, P = 0.04), 0.63 (detailed solution)
2. t = 1.2905, 248 df, P > 0.05 (detailed solution)
3. only #4
4/5. F(treat) = 2.385, F(age) = 5.474, F(Tr x Age) = 0.987 (detailed solution)
6. OR: 5.00, P = 0.215 (FET)
7. y' = 150.252 - 1.423(x), t = 11.917
8. CL: +0.2514, 0.2194 t = 0.145, 328.8
9. River rafting ... (F.Exact.T) P = 0.0035 (detailed solution)
10. y' = 0.72 + 0.27(x), y = 2.88 cm, t = 20.0 (new scatterplot to illustrate this solution)
11. t = 2.429, P < 0.05; 76, 127, 178 µgPb/L (detailed solution)
12. (no answer listed)
13. 46 df, P < 0.05
14. 7.512 P < 0.01, 4.814 P < 0.05, 1.951 P > 0.05
15. t = 3.015, 8 df, P < 0.02 (detailed solution)
16. t = 3.953, 38 df, P < 0.001
17. t = 1.272, 21 df (detailed solution)
18. t = 2.494, 18 df
19. t = 3.993, 5 df
20. chisq = 22.1 (detailed solution)
21. t(age) = 1.62, 58 df, P > 0.05; t(weight) = 0.74, 58 df, P > 0.05; t(bp) = 6.04, 58 df, P < 0.001
22. t = 0.5606, 11 df, P = 0.59
23. t = 0.7845, 4 df, P = 0.527 (this is paired)
24. t = 0.6978, 11 df, P > 0.05 (detailed solution)
25. t = 1.736, 16 df, P > 0.05
26. Ho: "No difference between Nur and Eng students in terms of mean GPA"; P < 0.05
27. P > 0.05
28. chisq 3.525, P = 0.06, OR = 2.85
29. chisq 6.764, 1 df, P < 0.01, RR = 3.3 (protective)
30. t = 4.202, 18 df, P < 0.001 (detailed solution)
31. t = 2.583, 22 df, P < 0.02 (detailed solution)

32. (individual calculation)
33. MS(species) = 73.44, F = 4.54, P > 0.05; MS(chem) = 13.083, F = 0.808, P > 0.05
34. (no answer listed)
35. chisq = 7.18, 2 df, P < 0.05 (actually P = 0.0276)
36. F = 0.693, 5.403, 6.500
37. This is a paired t-test but requires a log transform because the data are exponential; t = 1.109, P = 0.3306
38. MP chisq: 0.93, P: 0.335; HP chisq: 7.10, P: 0.0077 (prot); ES FET: P: 0.278
39. (1) RH: r = 0.75, y = 0.654 + 0.0892(x), t = 2.971, P < 0.05; (2) VEL: r = 0.74, y = 0.510 + 0.0884(x), t = 2.879, P < 0.05; (3) CORR betw RH & Vel: r = 0.22
40. [please change 6.0 to 60, and 7.0 to 70] Now this is correct: Pb = 233.4 - 31.73(pH)
41. chisq: 4.43, P = 0.035, OR 3.18, CL: 0.94 to 11.15
42. t = 3.315, 7 df, P = 0.0105
43. F = 10.939; 2, 12 df; P = 0.002301 (detailed solution)
44. (chisq goodness of fit) (You don't need this for the exam)
45. F(freq): 8.44, P < 0.01; F(size): 11.25, P < 0.01; F(Freq x Size): 4.85, P < 0.05 (see detailed solution)
46. b = -0.1429, CL: -0.1074 to -0.1783, t = 34.65, 4 df, P < 0.001
47. d = 2.688, t = 9.622, 7 df, P < 0.001
48. (a) 0.13, (b) 0.81, (c) 0.61, (d) 0.18, (e) >.05, <.002, >.05, >.05, (f) best return in 2 yrs (detailed solution)
49. inverse, r = 0.58, 10 df, P < 0.05
50. Fisher's Exact Test P = 0.0204
51. đ = 7.4, t = 2.608, P = 0.025
52. F = 6.17; P < 0.01 (detailed solution)
53. Completed in class. t(unpaired) = 1.077, t(paired) = 8.624 (the correct method is paired) (detailed solution)
54. (a) X² 0.47, P = 0.49; (b) X² 17.30, (prot) P = 0.000032; (c) X² 9.49, P = 0.00207
55. (a) X² 0.08, P = 0.78; (b) X² 0.82, P = 0.365; (c) (error: please omit)
56. X² 6.94, P = 0.0084 (or "P < 0.01"); OR: 2.37

ANSWERS TO EXTRA QUESTION SET #57-81
(Scroll down for these questions)
57. OR: 1.0, 1.17, 0.63, 2.00, 5.44; X² 11.1, P = 0.0009
58. (c) at 250M, Y = 480 ppm; at 500M, Y = 182 ppm; (d) t = 9.88 (please note the changes here)
59. (a) > 0.05 (b) no (d) 26
60. t = 9.7, 98 df, ss P < 0.001
61. RR = 4.02; ChiSq: 18.9, 1 df, P = 0.000014 (P < 0.001). Exposed personnel >4x as likely to develop lung ca. Reject Ho.
62. Only expected values are important here. 2 of 9 is 22%, so >20% of cells have E values 5 or less. ChiSq not valid.
63. lead(ppm) = 724 - 0.899 M; 6 df, t = 9.90, P < 0.001 (inverse rel., stat. sig.)
64. removed (essentially the same as #61)
65. HC% = 47.19 - 0.3695 (ppm LEAD); t = 13.5, 11 df, P < .001, r = -0.97, r2 = 0.94
70A.
SOLUTION: We have here TWO variables. The independent variable is the
'treatment' and is a categorical variable with two levels (a hand barrier
cream, either antiseptic or not). The dependent variable is presence or
absence of E. coli, clearly a categorical variable with two levels. So the
data will be displayed as a 2x2 table. Analysis would be by chi-square. (Of
course the odds ratio would be useful for explaining the strength and direction
of the association.)
70B.
SOLUTION: The independent variable is the same but the dependent variable is
now continuous. This is a candidate for either 1-way ANOVA or unpaired
t-test, either of which would be applicable.
70C.
SOLUTION: This introduces a second independent variable, also categorical,
with two levels. If the dependent variable remains as in 70B
(continuous), then this is a 2-way ANOVA, and with 240 workers there will
obviously be a number in each group, allowing the "factorial design with
interaction".
70D.
SOLUTION: Now this takes on a complicated arrangement.
If the arrangement in 70B is used, we have a pre-post test (paired t-test) with
all 240 people tested before and after using the antiseptic hand cream, thus 240
pairs of data (239 df). But if 70C is used in this way
we have TWO independent variables, and the solution is beyond the scope of this
course, but might include Raw foods pre-post and Cooked foods pre-post. A
t-test in either case would be used.
71.
SOLUTION: The data would be appropriately analysed
by means of the paired t-test. Note that the 'pairing' is taking place on
the water samples, each of which is being assessed using BOTH filters. Thus
the 15 water samples with known lead content produce 30 separate results, or 15
pairs. (14 df)
72.
SOLUTION: t-test for unpaired data. (18 df)
73.
SOLUTION: Chi-square analysis in 2x2 contingency table.
True relative risk is appropriate here because you DO have the true incidence
data. The two exposure groups were all healthy at the start of the study 30
years ago, and have been followed. Therefore you have the incidence data.
RR = Ie/Io, or incidence rate for the exposed group over the incidence rate for
the non-exposed group.
74.
SOLUTION: Two variables: Ind.var is categorical with 3
levels. Dep.var is continuous. 1-way ANOVA.
75.
SOLUTION: First Ind.var is categorical (3 levels),
second Ind.var is also categorical with three levels, Dep.var is
continuous. Two-way ANOVA. Block design if only one obs per
group, or Factorial if >1 obs per group.
76.
SOLUTION: Chi-square analysis in 3x3 contingency
table. But things can get complex. If we do this and just have
the total count in each cell, we get a test of the relationship between haz
types and training groups. (No pass/fail.) So we may stratify,
and that is beyond the scope of this course.
77.
SOLUTION: Chi-square analysis in
2x3 contingency table.

78. (detailed solution)
79. F = 23.53; 2, 33 df; P < 0.01 (detailed solution)
80. F = 3.190; 2, 12 df; P > 0.05 (detailed solution)
81. (detailed solution)
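The method-selection answers for questions 70A to 77 above follow a fairly regular pattern based on the type of the dependent variable, the number of categorical independent variables, and whether the measurements are paired. The sketch below encodes that pattern as a small lookup. The function `choose_test` and its parameters are hypothetical, and the rule is a simplification: cases such as 76 (two categorical factors with a categorical outcome) need stratification, which the solutions describe as beyond the scope of the course.

```python
# Hypothetical helper mirroring the reasoning in solutions 70A to 77.
def choose_test(dependent, n_factors=1, levels=2, paired=False):
    """Suggest an analysis from the variable types.

    dependent : "categorical" or "continuous"
    n_factors : number of categorical independent variables
    levels    : number of levels of the (first) independent variable
    paired    : True when each subject or sample is measured under both conditions
    """
    if dependent == "categorical":
        return "chi-square test on a contingency table"
    if paired:
        return "paired t-test"
    if n_factors >= 2:
        return "2-way ANOVA"
    if levels == 2:
        return "unpaired t-test (or 1-way ANOVA)"
    return "1-way ANOVA"

print(choose_test("categorical"))               # as in 70A, 73, 77
print(choose_test("continuous", levels=2))      # as in 70B, 72
print(choose_test("continuous", n_factors=2))   # as in 70C, 75
print(choose_test("continuous", paired=True))   # as in 70D, 71
print(choose_test("continuous", levels=3))      # as in 74
```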
EXTRA QUESTION SET [ #57-80 ]
57. You are investigating risk factors among farm workers
for contracting leptospirosis. A group of 30 patients has been identified, as well as a
group of 40 non-leptospirosis controls. The following data are the results from
enquiries about possible exposures in the previous three months. Data shown are the
number stating "yes" to each exposure:
(Q57)
Exposure question                             cases   controls
Have you handled wild animals?                  6        8
Do you have a mice infestation?                10       12
Have you visited a zoo?                         4        6
Have you handled garden soil?                  20       20
Have you repaired sewer pipes or drains?       21       12
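As a check on the odds ratios for Q57, the cross-product calculation can be sketched in Python. The function name `odds_ratio` and the dictionary labels are illustrative assumptions; the counts are those printed in the question, with 30 cases and 40 controls.

```python
# Counts of "yes" answers from the Q57 table: (cases, controls).
N_CASES, N_CONTROLS = 30, 40
exposures = {
    "wild animals": (6, 8),
    "mice infestation": (10, 12),
    "zoo visit": (4, 6),
    "garden soil": (20, 20),
    "sewer pipes or drains": (21, 12),
}

def odds_ratio(cases_yes, controls_yes, n_cases=N_CASES, n_controls=N_CONTROLS):
    """Cross-product odds ratio (a*d)/(b*c) for a 2x2 case-control table."""
    a, b = cases_yes, n_cases - cases_yes            # cases: exposed, unexposed
    c, d = controls_yes, n_controls - controls_yes   # controls: exposed, unexposed
    return (a * d) / (b * c)

results = {name: odds_ratio(*counts) for name, counts in exposures.items()}
for name, value in results.items():
    print(f"{name}: OR = {value:.2f}")
```

Four of the five values agree with the answer key (1.0, 1.17, 2.00, 5.44). With the counts as printed, the zoo question comes out near 0.87 rather than 0.63, so the key's figure may rest on different counts.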

58. Data: lead (in ppm) in soil samples (Y) measured at a
distance (X) metres from the smoke stack of a lead processing plant. The regression
coefficient is -1.1927; the standard error of the regression coeff. is 0.119488;
estimated lead concentration at the base of the stack is 778.05 ppm. (a) Plot
the data on a scattergram; (b) show the least-squares line; (c) predict the lead in the
soil at 250M and 500M distance from the stack; (d) test the null hypothesis of no
association; (e) clearly summarize. (Note: you do NOT have to calculate the parameters
from the original data.)
lead (ppm), y       40   510   330   160   700   610   220   440
distance (M), x    650   180   405   510    90   190   380   290
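Parts (c) and (d) of Q58 use only the summary values given in the question, so they can be sketched in a few lines of Python. The slope is taken as negative here, which is an assumption consistent with the key's predictions of 480 and 182 ppm; the function name `predict_lead` is illustrative.

```python
# Values stated in Q58; the slope is assumed negative (lead falls with distance).
INTERCEPT = 778.05   # estimated lead (ppm) at the base of the stack
SLOPE = -1.1927      # regression coefficient, ppm per metre
SE_SLOPE = 0.119488  # standard error of the regression coefficient
N = 8                # number of soil samples in the table

def predict_lead(distance_m):
    """Predicted soil lead (ppm) at a given distance from the stack."""
    return INTERCEPT + SLOPE * distance_m

t_stat = SLOPE / SE_SLOPE   # test of Ho: beta = 0
df = N - 2

print(f"250 M: {predict_lead(250):.0f} ppm")
print(f"500 M: {predict_lead(500):.0f} ppm")
print(f"t = {t_stat:.2f} with {df} df")
```

The predictions round to the key's 480 and 182 ppm. Dividing the stated coefficient by its stated standard error gives a t of about 9.98 in absolute value, close to the 9.88 listed in the key.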

59. The results of an investigation into the cadmium content in
the blood of children from two areas (A and B) conclude with a statement that the (mean
of A) minus (mean of B) was 8.3 µg Cd/100ml blood; 24 df, t=2.01. (a) What is the
probability that a mean difference of this amount could have been observed in these two
sample groups if there was really no difference between the two areas in terms of
children's blood cadmium? (b) Is this difference statistically significant at
the 5% rejection level? (c) Describe clearly what you understand from the
results; (d) how many children participated in the study?
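The reasoning behind parts (a), (b) and (d) of Q59 can be written out as a short worked calculation. It assumes an unpaired two-sample t-test and a two-tailed table value at the 5% level; the symbols n_A and n_B for the two group sizes are introduced here for illustration.

```latex
\[
df = n_A + n_B - 2 = 24 \quad\Longrightarrow\quad n_A + n_B = 26
\]
\[
|t| = 2.01 \;<\; t_{0.05,\,24} = 2.064 \quad\Longrightarrow\quad P > 0.05
\]
```

So under these assumptions 26 children took part, and the difference does not reach significance at the 5% level.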
60. In correlating haemoglobin levels with estimated
exposure to toxic solvents among 100 victims of a chemical spill, the linear correlation
coefficient is found to be 0.70. Explain this relationship in detail,
including a test of Ho that rho=zero.
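The test of Ho: rho = 0 in Q60 can be sketched as a worked equation, using the usual t statistic for a correlation coefficient with n - 2 degrees of freedom and the values given in the question (r = 0.70, n = 100).

```latex
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.70\,\sqrt{98}}{\sqrt{1-0.49}}
  \approx \frac{0.70 \times 9.899}{0.714}
  \approx 9.70, \qquad df = n - 2 = 98
\]
```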
61. N=520 men who worked at a Northern Ontario asbestos
mill between 1950 and 1970 have been identified through employment records and medical
records, and their health/survival status since then is compared with paper mill workers
in the same region during the same period. It is found that of the asbestos workers,
25 have developed (or died from) lung cancer. Of 1005 paper mill workers, 12 have been
diagnosed with (or died from) the disease. (a) Calculate an appropriate risk measurement,
and (b) explain what this measurement means.
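For Q61, the relative risk and the accompanying chi-square can be sketched as follows. The function names are illustrative, and the chi-square shown is the ordinary Pearson statistic without continuity correction, which is an assumption about the method used in the key.

```python
# Cohort counts from Q61.
exposed_cases, exposed_total = 25, 520        # asbestos mill workers
unexposed_cases, unexposed_total = 12, 1005   # paper mill workers

def relative_risk(a, n1, c, n0):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    return (a / n1) / (c / n0)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

rr = relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total)
chi2 = chi_square_2x2(
    exposed_cases, exposed_total - exposed_cases,
    unexposed_cases, unexposed_total - unexposed_cases,
)
print(f"RR = {rr:.2f}, chi-square = {chi2:.1f} (1 df)")
```

The result is a relative risk of about 4 and a chi-square of about 18.9, in line with the answer key.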
62. Three cells in a 3x3 contingency table have observed
values less than 5, and two of them have expected values less than 5. (a) Is
chi-square analysis appropriate here? (b) Why or why not?
63. The effect of distance from an incinerator smoke stack is being investigated. The
following data show the lead content in the soil at various distances downwind from the
base of the stack. (A) Construct a scattergram showing the data and the
least-squares line. (B) From your regression model (the equation, not the
scattergram) estimate the lead in soil at 650M and 400M. (C) Test the null
hypothesis that beta=zero. (D) Summarize fully.
Distance (M)          100   180   220   340   430   500   610   700
Lead in soil (ppm)    680   580   450   420   380   205   200   110
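The least-squares fit and the test of beta = 0 for Q63 can be reproduced from the tabulated data with plain Python. The function name `least_squares` is illustrative; the formulas are the standard ones for simple linear regression.

```python
import math

distance = [100, 180, 220, 340, 430, 500, 610, 700]   # metres
lead = [680, 580, 450, 420, 380, 205, 200, 110]       # ppm

def least_squares(x, y):
    """Return slope, intercept, t statistic for Ho: beta = 0, and df."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance about the fitted line, then the standard error of the slope.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_slope = math.sqrt(ss_res / df / sxx)
    return slope, intercept, slope / se_slope, df

slope, intercept, t_stat, df = least_squares(distance, lead)
print(f"lead = {intercept:.1f} + ({slope:.3f}) * distance")
print(f"t = {t_stat:.2f}, {df} df")
for d in (650, 400):
    print(f"at {d} M: {intercept + slope * d:.0f} ppm")
```

The fitted line agrees with the key (intercept about 724, slope about -0.899, |t| about 9.90 on 6 df).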

65. The following data describe the effect of blood lead on haematocrit (packed red
blood cells as % of whole blood). Calculate correlation and regression coefficients.
Plot the data and the least-squares line on the scattergram. Test the
hypotheses (2) that beta=zero and rho=zero.
lead µg/dl     5   11   13   18   21   27   32   38   41   44   50   58   60
H.crit %      45   43   44   39   41   35   37   33   29   32   31   26   24
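The correlation part of Q65 can be sketched in Python as below; the function name `pearson_r` is illustrative. In simple linear regression the t statistic for rho = 0 is the same as the one for beta = 0, so a single test covers both hypotheses.

```python
import math

blood_lead = [5, 11, 13, 18, 21, 27, 32, 38, 41, 44, 50, 58, 60]     # µg/dl
haematocrit = [45, 43, 44, 39, 41, 35, 37, 33, 29, 32, 31, 26, 24]   # %

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(blood_lead, haematocrit)
df = len(blood_lead) - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)   # tests Ho: rho = 0
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, t = {t_stat:.1f}, {df} df")
```

The correlation and its square match the key's -0.97 and 0.94, and the t value is close to the 13.5 listed there.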

70. What method would you use for these? (70 abcd)
70A. A study of the effectiveness of a new skin antiseptic barrier cream to be used
in food manufacturing. Two hundred & forty workers are recruited and randomized
into two groups: one group to receive the antiseptic cream, and the other group
to receive a similar-looking product without antiseptic action. Each day a
hand swab test is taken from the workers and analyzed for
E. coli, which are reported as "present" or "absent".
70B. The above study, but the outcome (E. coli) is required as a count.
70C. Someone has suggested that those people working with raw foods would be exposed to
a much greater bacterial load, so a further division is made between "raw" and
"cooked" foods. How would this change the analysis?
70D. As an alternative to the basic study in (70C) above, the investigators contemplate
taking all 240 workers and letting them work for a month (with hand swabs every day)
before they are asked to use the antiseptic hand cream daily. They are then swabbed daily
for another month. What is the statistical analysis you would recommend here?
71. A filter system for removing lead from the water supply is being compared with a
conventional filter system. Fifteen water samples with known (but different)
concentrations of lead are passed through each of the filters, and the resulting filtrate
is examined for lead content. There are 30 samples tested in this way. What
method of analysis would you use here?
72. The number of new cases of diarrhoea in infants in a six month period is the
outcome measure used in a study of the effectiveness (if any) of a strict double-handwash
procedure after changing diapers. Ten centres are selected and invited to
participate in the trial, along with another 10 which will undertake normal handwashing
practices.
73. In 1975, during
the replacement of fuel rods at a nuclear plant, 32 workers were accidentally
exposed to radiation for about 4 hours. They have been followed and their
health outcomes compared to a group of 48 welders and pipe fitters in a
conventional power station. After thirty years, 8 of the nuclear plant
workers and 9 of the conventional plant workers have been diagnosed with some form
of cancer. What is the appropriate risk measurement? What is the
method of analysis to be used?
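Because both groups in Q73 were followed from the time of exposure, incidence can be computed directly. A worked sketch of the relative risk from the counts in the question, with I_e and I_o denoting incidence in the exposed and non-exposed groups:

```latex
\[
RR = \frac{I_e}{I_o} = \frac{8/32}{9/48} = \frac{0.25}{0.1875} \approx 1.33
\]
```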
74. Three methods of training ESL workers
on WHMIS procedures are being compared: lecture, powerpoint, and
animated cartoon. The outcome is a
knowledge test score out of 40.
What method of analysis is to be used here?
75. Same three methods of WHMIS training, but this time someone suggests the type of
work may play an important part. So
the workers are divided into low hazard, medium hazard, and high hazard
jobs.
What method of analysis is to be used here?
76. Same project comparison between three WHMIS training groups x three
types of hazard
with which the people are working,
but this time the dependent var is just
'pass/fail' on the standard test.
What method of analysis is to be used here?
77. A herbal treatment for contact dermatitis is being
evaluated against the conventional treatment of corticosteroids and
emollient creams. The outcome after 7 days is recorded as improved, unchanged,
or worse.
What method of analysis is to be used here?
78
You are investigating the extent to which INDOOR air quality (IAQ) in office buildings
without openable windows compares to air quality in similar buildings equipped with
windows that open for 5 hours/day. Measurements are taken in three types of EXTERNAL air
quality (EAQ). The dependent variable is the count of suspended particles (10 microns or
less) per 10 cc. Complete the ANOVA table and summarize.
              GOOD              MODERATE          POOR
              CLO      OPN      CLO      OPN      CLO      OPN
N             10       10       10       10       10       10
MEAN          49.6     36.3     49.4     45.6     49.7     54.5

source        SS        df    MS    F    P
all groups    1842.53
EAQ           810.13
wind O/C      240.00
interaction
residual
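Part of the Q78 table can be filled in from the values shown, assuming a balanced 3 x 2 factorial design with 10 observations per cell and assuming "all groups" means the between-cell sum of squares. The sketch below covers the degrees of freedom, the interaction SS by subtraction, and the mean squares for the three effects. The residual SS does not appear in the table as given here, so the F ratios are left to the detailed solution.

```python
# Sums of squares given in the Q78 table.
ss_all_groups = 1842.53   # between the six EAQ x window cells (assumed meaning)
ss_eaq = 810.13
ss_window = 240.00

levels_eaq, levels_window, n_per_cell = 3, 2, 10

# In a balanced factorial design the between-cell SS splits into the two
# main effects plus the interaction, so the interaction SS follows by subtraction.
ss_interaction = ss_all_groups - ss_eaq - ss_window

df = {
    "all groups": levels_eaq * levels_window - 1,
    "EAQ": levels_eaq - 1,
    "wind O/C": levels_window - 1,
    "interaction": (levels_eaq - 1) * (levels_window - 1),
    "residual": levels_eaq * levels_window * (n_per_cell - 1),
}
ms = {
    "EAQ": ss_eaq / df["EAQ"],
    "wind O/C": ss_window / df["wind O/C"],
    "interaction": ss_interaction / df["interaction"],
}
print(f"interaction SS = {ss_interaction:.2f}")
print(df)
print({name: round(value, 2) for name, value in ms.items()})
```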





# 79
Three methods of teaching infection control are
being compared using three groups of students selected at random.
Their scores out of 20 are shown. Is there a significant difference
between the means of each group? Explain clearly.

Method A:   5.0   9.0   5.0   8.0   5.0   7.0   6.0   7.0   6.0   6.0   6.0   7.0
Method B:  12.0   6.0  11.0   7.0  10.0   7.0  10.0   8.0  10.0   9.0   9.0   9.0
Method C:   3.0   3.0   8.0   7.0   4.0   4.0   6.0   4.0   5.0   5.0   4.0   4.0
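A one-way ANOVA for Q79 can be sketched in plain Python. The split of the 36 scores into three groups of 12, taken in reading order, is an assumption about the flattened table; it is supported by the fact that it reproduces the key's F and degrees of freedom. The function name `one_way_anova` is illustrative.

```python
# Q79 scores, assumed to be 12 per method in reading order.
scores = {
    "A": [5, 9, 5, 8, 5, 7, 6, 7, 6, 6, 6, 7],
    "B": [12, 6, 11, 7, 10, 7, 10, 8, 10, 9, 9, 9],
    "C": [3, 3, 8, 7, 4, 4, 6, 4, 5, 5, 4, 4],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups SS: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviations from each group's own mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_ratio, df_b, df_w = one_way_anova(list(scores.values()))
print(f"F = {f_ratio:.2f} with {df_b}, {df_w} df")
```

The same function can be applied to the three pH samples in question 80.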

# 80
The pH values of three types of canned tomatoes are being
compared; the pH readings are shown. Is there a
significant difference between the means of each group? Explain
clearly.

Method 1:   1.9, 2.4, 2.5, 2.6, 3.0
Method 2:   2.0, 2.1, 3.0, 3.1, 3.3
Method 3:   2.8, 2.9, 3.3, 3.3, 3.8
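The arithmetic behind the F ratio for Q80 can be laid out as a worked sketch. The group means are 2.48, 2.70 and 3.22, and the grand mean is 2.80; the subscripts B and W for between-groups and within-groups are notation introduced here.

```latex
\[
SS_B = 5\left[(2.48-2.80)^2 + (2.70-2.80)^2 + (3.22-2.80)^2\right] = 1.444,
\qquad df_B = 2
\]
\[
SS_W = 0.628 + 1.460 + 0.628 = 2.716, \qquad df_W = 15 - 3 = 12
\]
\[
F = \frac{SS_B/df_B}{SS_W/df_W} = \frac{0.722}{0.2263} \approx 3.19
\]
```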

NEW
81.
In the following ANOVA analysis, two systems of purifying water ('old'
and 'new') are being compared, with the pH of the water also being
studied. The dependent variable is the ppm of the contaminant (less is
better). Twenty-eight separate measurements have been made. Note that
you do not have the original data, just the means, total, and N for each
group. The ANOVA table is shown below.

            NEW EQUIPM              OLD EQUIPM
            Low pH      High pH     Low pH      High pH
TOTAL       174.00      172.00      303.00      238.00
MEAN        24.86       24.57       43.29       34.00
N           7.00        7.00        7.00        7.00

source      SS         DF    MS    F    P
ALL GRPS
EQUIP
pH
EQ X pH
RESIDUAL

TOTAL       1850.11




_________________________________________________________________
[a] Enter the missing values in the ANOVA table above
[b] Now select the single correct statement that summarizes the analysis
A. Old equipm and new equipm perform equally in low pH situations
B. New equipm is not affected by pH, but performs consistently better than the old
C. In high pH older equipm performs better, but in low pH new equipm is better
D. Old equipm and new equipm perform equally in high pH situations
E. The performance of the old equipment is not affected by pH
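The missing entries for part [a] can be derived from the cell totals and the stated total SS, assuming a balanced 2 x 2 factorial design with 7 observations per cell. The sketch below uses the usual "totals squared over n, minus correction factor" shortcut; the helper `ss_for` and the variable names are illustrative.

```python
# Cell totals from the Q81 table (n = 7 per cell) and the stated total SS.
totals = {
    ("new", "low"): 174.0, ("new", "high"): 172.0,
    ("old", "low"): 303.0, ("old", "high"): 238.0,
}
n_cell = 7
ss_total = 1850.11

n_all = n_cell * len(totals)
correction = sum(totals.values()) ** 2 / n_all   # (grand total)^2 / N

def ss_for(group_totals, n_each):
    """Sum of squares from totals that each rest on n_each observations."""
    return sum(t ** 2 for t in group_totals) / n_each - correction

ss_cells = ss_for(totals.values(), n_cell)
equip_totals = [sum(v for (e, _), v in totals.items() if e == name) for name in ("new", "old")]
ph_totals = [sum(v for (_, p), v in totals.items() if p == name) for name in ("low", "high")]
ss_equip = ss_for(equip_totals, 2 * n_cell)
ss_ph = ss_for(ph_totals, 2 * n_cell)
ss_inter = ss_cells - ss_equip - ss_ph      # interaction by subtraction
ss_resid = ss_total - ss_cells

df_resid = n_all - len(totals)
ms_resid = ss_resid / df_resid
for label, ss in (("EQUIP", ss_equip), ("pH", ss_ph), ("EQ x pH", ss_inter)):
    print(f"{label}: SS = {ss:.2f}, df = 1, F = {ss / ms_resid:.2f}")
print(f"RESIDUAL: SS = {ss_resid:.2f}, df = {df_resid}, MS = {ms_resid:.3f}")
```

Under these assumptions all three F ratios are large relative to the 5% table value for 1 and 24 df. Together with the nearly identical means for the new equipment at low and high pH, this appears to fit statement B.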

