INFERENTIAL STATISTICS
Inferential statistics deal with, of all things, inferences. Inferences about what? Inferences about populations based on the results of samples. Inferential statistics allow researchers to generalize to a population of individuals based on information obtained from a limited number of research participants. Most educational research studies deal with samples from larger populations. The more representative a sample is, the more generalizable its results will be to the population from which the sample was selected. Results that are representative only of that particular sample are of very limited research use. Consequently, random samples are preferred. Inferential statistics are concerned with determining whether results obtained from a sample or samples are the same as would have been obtained for the entire population.
Inferential statistics are statistical procedures used to make inferences or generalizations about a population from a set of data. Statistical inference is based on probability theory. (Jack C. Richard, 2002:256)
The Null Hypothesis
Hypothesis testing is a process of decision making about the results of a study. If the experimental group’s mean is 35 and the control group’s mean is 27, the researcher has to decide whether the difference reflects the effect of the treatments or simply sampling error. When we talk about the difference between two sample means being a true or real difference, we mean that the difference was caused by the treatment (the independent variable) and not by chance. In other words, an observed difference is either caused by the treatment, as stated in the research hypothesis, or is the result of chance (random sampling error). The chance explanation for the difference is called the null hypothesis. The null hypothesis states that there is no true difference or relationship between parameters in the populations and that any difference or relationship found for the samples is the result of sampling error. A null hypothesis might state:
There is no significant difference between the mean reading comprehension of first-grade students who receive whole language reading instruction and first-grade students who receive basal reading instruction.
This hypothesis says that there really is not any difference between the two methods, and if you find one in your study, it is not a true difference but a chance difference resulting from sampling error.
The null hypothesis for a study is usually (although not necessarily) different from the research hypothesis. The research hypothesis typically states that one method is expected to be more effective than another, while the null hypothesis states that there is no difference between the methods.
In a research study, the test of significance selected to determine whether a difference between means is a true difference provides a test of the null hypothesis. As a result, the null hypothesis is either rejected as being probably false or not rejected as being probably true. Notice the word probably. We never know with total certainty that we are making the correct decision; what we can do is estimate the probability of being wrong. After we make the decision to reject or not reject the null hypothesis, we make an inference back to our research hypothesis. If, for example, our research hypothesis states that A is better than B, and if we reject the null hypothesis (that there is no difference between A and B), and if the mean for A is greater than the mean for B, then we conclude that our research hypothesis was supported, not proven! If we do not reject the null hypothesis (A is not different from B), then we conclude that our research hypothesis was not supported.
In order to test a null hypothesis we need a test of significance and we need to select a probability level that indicates how much risk we are willing to take that the decision we make is wrong.
Test of Significance
The test of significance is carried out using a preselected probability level that serves as the criterion for deciding whether to reject or fail to reject the null hypothesis. The usual preselected probability level is either 5 out of 100 or 1 out of 100: the probability that the observed difference occurred by chance. If a difference as large as the one between the two means would occur by chance fewer than 5 times in 100 (or 1 time in 100), it is very unlikely that the difference is due to chance, that is, to sampling error. Thus, there is a high (but not perfect) probability that the difference between the means did not occur by chance, and the most likely explanation for the difference is that the two treatments were differentially effective; there was a real difference between the means. Obviously, if we can say we would expect such a difference by chance only 1 time in 100, we are more confident in our decision than if we say we would expect such a chance difference 5 times in 100. How confident we are depends on the level of significance, or probability level, at which we perform our test of significance.
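The decision rule described above can be sketched in a few lines of code. This is an illustration only: the function name and its inputs are hypothetical, and the p value would come from an actual test of significance.

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when the observed difference would
    occur by chance with probability less than the preselected level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# A difference this large would occur by chance only 3 times in 100:
print(decide(0.03, alpha=0.05))   # reject H0
# At the stricter .01 level, the same result is not significant:
print(decide(0.03, alpha=0.01))   # fail to reject H0
```

Note that the same observed difference can be significant at one level and not at another, which is exactly the trade-off between the 5-in-100 and 1-in-100 criteria discussed above.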
Degrees of Freedom
Degrees of freedom (df) are dependent upon the number of participants and the number of groups. Suppose I ask you to name any five numbers. You agree and say “1, 2, 3, 4, 5.” In this case N is equal to 5: you had 5 choices, or 5 degrees of freedom, in selecting the numbers. Now suppose I tell you to name 5 numbers and you say “1, 2, 3, 4, …,” and I say, “Wait! The mean of the five numbers you choose must be 4.” Now you have no choice: your last number must be 10, because 1 + 2 + 3 + 4 + 10 = 20 and 20 divided by 5 = 4. You lost one degree of freedom because of the restriction (lack of freedom) that the mean must be 4. In other words, instead of having 5 degrees of freedom, you had only N – 1 = 5 – 1 = 4 degrees of freedom.
Each test of significance has its own formula for determining degrees of freedom. For the correlation coefficient, r, the formula is N – 2. The number 2 is a constant, requiring that degrees of freedom for r are always determined by subtracting 2 from N, the number of participants.
The t Test
The t test is used to determine whether two means are significantly different at a selected probability level. In determining significance, the t test adjusts for the fact that the distribution of scores for small samples departs increasingly from the normal distribution as sample size decreases. For example, distributions for smaller samples tend to be lower at the mean and higher at the two ends (the tails) of the distribution. Because of this, the t values required to reject a null hypothesis are higher for small samples. As the samples become larger, the score distribution approaches normality. There are two types of t test: the t test for independent samples and the t test for nonindependent samples.
Independent samples are two samples that are randomly formed without any type of matching. The members of one sample are not related to members of the other sample in any systematic way, other than that they are selected from the same population. If two groups are randomly formed, the expectation is that at the beginning of a study they are essentially the same with respect to performance on the dependent variable. Therefore, if they are also essentially the same at the end of the study (their means are close), the null hypothesis is probably true. If, on the other hand, their means are not close at the end of the study, the null hypothesis is probably false and should be rejected. The key word is essentially.
The t test for nonindependent samples is used to compare groups that are formed by some type of matching, or to compare a single group’s performance on a pretest and posttest or on two different treatments. When samples are not independent, the members of one group are systematically related to the members of the second group (especially when it is the same group at two different times). If samples are nonindependent, scores on the dependent variable are expected to be correlated with each other, and a special t test for correlated, or nonindependent, means is used. When samples are nonindependent, the error term of the t test tends to be smaller, and there is therefore a higher probability that the null hypothesis will be rejected. Thus, the t test for nonindependent samples is used to determine whether there is probably a significant difference between the means of two matched, or nonindependent, samples or between the means for one sample at two different times.
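As an illustration of the independent-samples case, the t statistic can be computed with the pooled-variance formula. This is a sketch under the equal-variance assumption; the function name is ours, not from any particular statistics package.

```python
import math

def t_independent(x, y):
    """Pooled-variance t statistic for two independent samples;
    degrees of freedom are len(x) + len(y) - 2."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)   # sum of squared deviations
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
```

For nonindependent samples, the analogous computation is applied to the difference scores of the matched pairs, which is why the error term shrinks when scores in the two groups are correlated.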
Simple Analysis of Variance
Simple, or one-way, analysis of variance (ANOVA) is used to determine whether there is a significant difference between two or more means at a selected probability level. Thus, for a study involving three groups, ANOVA is the appropriate analysis technique. Like two posttest means in the t test, three (or more) posttest means in ANOVA are unlikely to be identical, so the key question is whether the differences among the means represent true, significant differences or chance differences due to sampling error. To answer this question, ANOVA is used and an F ratio is computed. You may be wondering why you cannot just compute a bunch of t tests, one for each pair of means. Aside from the statistical problem that multiple tests distort your probability level, it is more convenient to perform one ANOVA than several t tests. For example, to analyze four means, six separate t tests would be required (X1 – X2, X1 – X3, X1 – X4, X2 – X3, X2 – X4, X3 – X4). ANOVA is much more efficient and keeps the error rate under control.
The concept underlying ANOVA is that the total variation, or variance, of scores can be divided into two sources: treatment variance (variance between groups, caused by the treatment) and error variance (variance within groups). A ratio, the F ratio, is formed with the treatment variance (between groups) as the numerator and the error variance (within groups) as the denominator. It is assumed that the groups are randomly formed and are therefore essentially the same on a measure of the dependent variable at the beginning of the study. At the end of the study, we determine whether the between-groups variance differs from the error variance by more than what would be expected by chance. If the treatment variance is sufficiently larger than the error variance, a significant F ratio results; the null hypothesis is rejected, and it is concluded that the treatment had a significant effect on the dependent variable. If, on the other hand, the treatment variance and error variance do not differ by more than what would be expected by chance, the resulting F ratio is not significant and the null hypothesis is not rejected. The greater the difference between the two variances, the larger the F ratio. To determine whether the F ratio is significant, consult an F table at the place corresponding to the selected probability level and the appropriate degrees of freedom. The degrees of freedom for the F ratio are a function of the number of groups and the number of participants.
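The between/within partition just described can be expressed directly as code. The following sketch (the function name is ours, not a standard API) returns the F ratio for any number of groups:

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: between-groups mean square
    divided by within-groups (error) mean square."""
    scores = [v for g in groups for v in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    # Treatment variance: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Error variance: how far individual scores sit from their group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

When the group means are far apart relative to the spread inside each group, the numerator dominates and F grows large; when the means are close, F stays near 1.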
Suppose we have the following set of scores for the same group of students on three different posttests.
| No | Code of Students | Sample 1: X1 | X1² | Sample 2: X2 | X2² | Sample 3: X3 | X3² |
|----|------------------|--------------|-----|--------------|-----|--------------|-----|
| 1 | S1 | 55.7 | 3102.5 | 63.7 | 4057.69 | 68.7 | 4719.69 |
| 2 | S2 | 68.3 | 4664.9 | 76.3 | 5821.69 | 83 | 6889 |
| 3 | S3 | 66 | 4356 | 69.7 | 4858.09 | 75.3 | 5670.09 |
| 4 | S4 | 60.3 | 3636.1 | 62 | 3844 | 81 | 6561 |
| 5 | S5 | 49.3 | 2430.5 | 56.7 | 3214.89 | 71.3 | 5083.69 |
| 6 | S6 | 47.7 | 2275.3 | 59 | 3481 | 78.7 | 6193.69 |
| 7 | S7 | 59.3 | 3516.5 | 69.7 | 4858.09 | 82 | 6724 |
| 8 | S8 | 47.7 | 2275.3 | 51.3 | 2631.69 | 73.7 | 5431.69 |
| 9 | S9 | 61 | 3721 | 65 | 4225 | 78.7 | 6193.69 |
| 10 | S10 | 56 | 3136 | 61.7 | 3806.89 | 74.3 | 5520.49 |
| 11 | S11 | 49 | 2401 | 56.3 | 3169.69 | 69.7 | 4858.09 |
| 12 | S12 | 46 | 2116 | 59 | 3481 | 81 | 6561 |
| 13 | S13 | 63.7 | 4057.7 | 78 | 6084 | 84.7 | 7174.09 |
| 14 | S14 | 64 | 4096 | 67.7 | 4583.29 | 71.7 | 5140.89 |
| 15 | S15 | 60.3 | 3636.1 | 71.3 | 5083.69 | 75 | 5625 |
| 16 | S16 | 55.3 | 3058.1 | 65.3 | 4264.09 | 75.3 | 5670.09 |
| 17 | S17 | 48 | 2304 | 61.3 | 3757.69 | 78 | 6084 |
| 18 | S18 | 45.3 | 2052.1 | 63 | 3969 | 66.7 | 4448.89 |
| 19 | S19 | 63.3 | 4006.9 | 68.7 | 4719.69 | 76.7 | 5882.89 |
| 20 | S20 | 62.3 | 3881.3 | 74 | 5476 | 85.3 | 7276.09 |
| 21 | S21 | 68 | 4624 | 76.7 | 5882.89 | 83.3 | 6938.89 |
| 22 | S22 | 67.7 | 4583.3 | 72.3 | 5227.29 | 86.3 | 7447.69 |
| 23 | S23 | 56.3 | 3169.7 | 75 | 5625 | 87.3 | 7621.29 |
| 24 | S24 | 49.7 | 2470.1 | 60.3 | 3636.09 | 72.3 | 5227.29 |
| 25 | S25 | 51 | 2601 | 52.7 | 2777.29 | 73 | 5329 |
| 26 | S26 | 51.3 | 2631.7 | 70 | 4900 | 73.3 | 5372.89 |
| 27 | S27 | 51.3 | 2631.7 | 74.7 | 5580.09 | 77.3 | 5975.29 |
| 28 | S28 | 49 | 2401 | 62.3 | 3881.29 | 73.7 | 5431.69 |
| 29 | S29 | 53.3 | 2840.9 | 60.7 | 3684.49 | 73.3 | 5372.89 |
| 30 | S30 | 52.3 | 2735.3 | 60.3 | 3636.09 | 74 | 5476 |
| 31 | S31 | 59.3 | 3516.5 | 75.7 | 5730.49 | 81 | 6561 |
| 32 | S32 | 68.3 | 4664.9 | 72 | 5184 | 82 | 6724 |
| 33 | S33 | 50 | 2500 | 71.3 | 5083.69 | 82 | 6724 |
| 34 | S34 | 60 | 3600 | 66.3 | 4395.69 | 74.3 | 5520.49 |
| 35 | S35 | 61.3 | 3757.7 | 65.7 | 4316.49 | 77.7 | 6037.29 |
| 36 | S36 | 63.7 | 4057.7 | 66.3 | 4395.69 | 78.7 | 6193.69 |
| 37 | S37 | 46.7 | 2180.9 | 57.7 | 3329.29 | 70.7 | 4998.49 |
| 38 | S38 | 61.7 | 3806.9 | 70.7 | 4998.49 | 82.7 | 6839.29 |
| 39 | S39 | 56.7 | 3214.9 | 68.7 | 4719.69 | 75 | 5625 |
| 40 | S40 | 67 | 4489 | 80 | 6400 | 81.7 | 6674.89 |
| 41 | S41 | 61 | 3721 | 73.3 | 5372.89 | 78.3 | 6130.89 |
| 42 | S42 | 52 | 2704 | 67.3 | 4529.29 | 72.3 | 5227.29 |
| | ∑ | 2386.1 | 137625 | 2799.7 | 188673 | 3241 | 251157 |
| | | n1 = 42 | | n2 = 42 | | n3 = 42 | |
∑X = ∑X1 + ∑X2 + ∑X3 = 2386.1 + 2799.7 + 3241 = 8426.8
∑X² = ∑X1² + ∑X2² + ∑X3² = 137625 + 188673 + 251157 = 577455.7
N = n1 + n2 + n3 = 42 + 42 + 42 = 126
First, find SStotal:

SStotal = ∑X² – (∑X)²/N
        = 577455.7 – (8426.8)²/126
        = 577455.7 – 563579.0
SStotal = 13876.7
Next, find SSbetween:

SSbetween = (∑X1)²/n1 + (∑X2)²/n2 + (∑X3)²/n3 – (∑X)²/N
          = (2386.1)²/42 + (2799.7)²/42 + (3241)²/42 – (8426.8)²/126
          = 135558.9 + 186626.7 + 250097.2 – 563579.0
          = 572282.8 – 563579.0
SSbetween = 8703.8
Now how do we get SSwithin? We subtract SSbetween from SStotal:

SSwithin = SStotal – SSbetween
         = 13876.7 – 8703.8
SSwithin = 5172.9
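The three sums of squares can be checked with a few lines of arithmetic, using the shortcut formulas and the totals from the table (values rounded as in the text):

```python
group_sums = [2386.1, 2799.7, 3241.0]   # ∑X1, ∑X2, ∑X3 from the table
group_n = 42                             # participants per group
sum_x = sum(group_sums)                  # ∑X = 8426.8
sum_x2 = 577455.7                        # ∑X² as used in the text
n = 126                                  # total participants

correction = sum_x ** 2 / n              # (∑X)²/N, about 563579.0
ss_total = sum_x2 - correction           # about 13876.7
ss_between = sum(s ** 2 / group_n for s in group_sums) - correction
ss_within = ss_total - ss_between        # about 5172.9
```

The small discrepancies in the last decimal place come from the rounding of the column totals, not from the method.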
Now we have everything we need to begin! Seriously, we have all the pieces, but we are not quite there yet. Let us fill in a summary table with what we have and you will see what is missing:
| Source of Variation | Sum of Squares | df | Mean Square | F |
|---------------------|----------------|----|-------------|---|
| Between | 8703.8 | (K – 1) | | |
| Within | 5172.9 | (N – K) | | |
| Total | 13876.7 | (N – 1) | | |
The first thing you probably noticed is that each term has its own formula for degrees of freedom. The formula for the between term is K – 1, where K is the number of treatment groups; thus, the degrees of freedom are K – 1 = 3 – 1 = 2. The formula for the within term is N – K, where N is the total sample size and K is still the number of treatment groups; thus, degrees of freedom for the within term are N – K = 126 – 3 = 123. We do not need them for the F ratio, but for the total term, df = N – 1 = 126 – 1 = 125. Now what about mean squares? Mean squares are found by dividing each sum of squares by its degrees of freedom. We will refer to a mean square as MS, using the subscript B for between and W for within. Thus, we have the equation:
Mean square = Sum of squares / Degrees of freedom, or

MS = SS / df
For between, MSB, we get:

MSB = SSB / df = 8703.8 / 2
MSB = 4351.9 (≈ 4352)
For within, MSW, we get:

MSW = SSW / df = 5172.9 / 123
MSW ≈ 42
Now all we need is our F ratio, which is the ratio of MSB to MSW:

F = MSB / MSW = 4352 / 42
F = 103.6
Filling in the rest of our summary table, we have:

| Source of Variation | Sum of Squares | df | Mean Square | F |
|---------------------|----------------|----|-------------|---|
| Between | 8703.8 | (K – 1) = 2 | 4352 | 103.6 |
| Within | 5172.9 | (N – K) = 123 | 42 | |
| Total | 13876.7 | (N – 1) = 125 | | |
Note that we simply divided across (8703.8 / 2 = 4352 and 5172.9 / 123 ≈ 42) and then down (4352 / 42 ≈ 103.6). Thus, F = 103.6 with 2 and 123 degrees of freedom.
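The mean squares and the F ratio follow mechanically from the summary table; a quick check, carrying slightly more precision than the rounded text values:

```python
ss_between, ss_within = 8703.8, 5172.9
k, n = 3, 126                            # groups, total participants

ms_between = ss_between / (k - 1)        # 8703.8 / 2 = 4351.9
ms_within = ss_within / (n - k)          # 5172.9 / 123, about 42.06
f_ratio = ms_between / ms_within         # about 103.5 (the text gets
                                         # 103.6 by rounding MSW to 42 first)
```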
Assuming that α = .05, we are now ready to go to our F table (see the table of F values in any statistics text). For 2 and 123 degrees of freedom at α = .05, the table value is approximately 3.07, the value of F required for statistical significance (required in order to reject the null hypothesis). The question is whether our F value, 103.6, is greater than 3.07. Obviously it is. Therefore, we reject the null hypothesis and conclude that there is a significant difference among the three group means.
