If we have normal populations with independent observations, and the population standard deviations \sigma_1 and \sigma_2 are known, the test statistic is

z=\frac{(\overline{x}_1 - \overline{x}_2) - d_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

(Wackerly, D., Mendenhall, W., & Scheaffer, R. L., 2001, p. 401)
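As a sketch in pure Python (the summary data below are hypothetical), the z statistic can be computed directly from the means, known standard deviations, and sample sizes:

```python
import math

def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, d0=0.0):
    """Two-sample z statistic when the population standard deviations are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (x1_bar - x2_bar - d0) / se

# Hypothetical summary data: means 10.3 and 9.8, known sigmas 1.2 and 1.5
z = two_sample_z(10.3, 9.8, 1.2, 1.5, n1=40, n2=50)
```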


Unequal sample sizes, unequal and unknown variances


This test is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately. See also Welch’s t-test. The t statistic to test whether the population means are different can be calculated as follows:

t = {\overline{X}_1 - \overline{X}_2 \over s_{\overline{X}_1 - \overline{X}_2}}


s_{\overline{X}_1 - \overline{X}_2} = \sqrt{{s_1^2 \over n_1} + {s_2^2  \over n_2}}.

where s_i^2 is the unbiased estimator of the variance of sample i, n_i is the sample size of group i, and the subscripts 1 and 2 denote the two groups. Note that in this case, {s_{\overline{X}_1 - \overline{X}_2}}^2 is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

 \mathrm{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}.
(Reinard, 2006, p. 165)

This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances: see Behrens–Fisher problem.
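A minimal pure-Python sketch of Welch's t statistic and the Welch–Satterthwaite degrees of freedom (the sample data are hypothetical):

```python
import math

def welch_t(x1, x2, d0=0.0):
    """Welch's t statistic and approximate degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    # Unbiased sample variances (divide by n - 1)
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2  # squared standard error; not a pooled variance
    t = (m1 - m2 - d0) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
```

The approximate degrees of freedom always fall between min(n_1, n_2) - 1 and n_1 + n_2 - 2.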


If the standard deviations of the two populations are assumed to be equal, the formula simplifies to one involving a single common variance.

We do not know the population variance, but we can estimate it.

The maximum likelihood estimate

s_p^2=\frac{\sum_{i=1}^k((n_i - 1)s_i^2)}{\sum_{i=1}^k n_i }

is biased, but dividing instead by the total degrees of freedom yields an unbiased estimate:

s_p^2=\frac{\sum_{i=1}^k((n_i - 1)s_i^2)}{\sum_{i=1}^k(n_i - 1)}
This pooled variance is calculated by weighting each subset's variance by its degrees of freedom (one less than the size of the subset at each level of x). Thus, the pooled variance is defined by

S_P^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2 + \cdots + (n_k - 1)S_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots +(n_k - 1)}

where n_1, n_2, \ldots, n_k are the sizes of the data subsets at each level of the variable x, and S_1^2, S_2^2, \ldots, S_k^2 are their respective variances.
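A sketch of this degrees-of-freedom weighting in pure Python (the group data are made up):

```python
def sample_var(g):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(g) / len(g)
    return sum((v - m) ** 2 for v in g) / (len(g) - 1)

def pooled_variance(groups):
    """Weight each group's variance by its degrees of freedom."""
    num = sum((len(g) - 1) * sample_var(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return num / den

sp2 = pooled_variance([[1, 3], [2, 6]])  # variances 2 and 8, equal weights
```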

Proof that the pooled estimator is unbiased:
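Since each sample variance s_i^2 is an unbiased estimator of the common variance \sigma^2, i.e. E[s_i^2] = \sigma^2, linearity of expectation gives

E[s_p^2] = \frac{\sum_{i=1}^k (n_i - 1)E[s_i^2]}{\sum_{i=1}^k (n_i - 1)} = \frac{\sigma^2 \sum_{i=1}^k (n_i - 1)}{\sum_{i=1}^k (n_i - 1)} = \sigma^2.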



It can be shown that when this estimator is used, the statistic follows a Student's t distribution with n_1 + n_2 - 2 degrees of freedom:

t=\frac{(\overline{x}_1 - \overline{x}_2) - d_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
s_p^2=\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
df=n_1 + n_2 - 2

(Wackerly, Mendenhall, & Scheaffer, 2001, p. 402)
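The pooled two-sample t statistic above can be sketched in pure Python (the data are hypothetical):

```python
import math

def pooled_t(x1, x2, d0=0.0):
    """Pooled two-sample t statistic; assumes equal population variances."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2 - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and degrees of freedom

t, df = pooled_t([1, 2, 3], [2, 3, 4])
```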

Wackerly, D., Mendenhall, W., & Scheaffer, R. L. (2001). Mathematical Statistics with Applications (6th ed.). Duxbury Press.
Reinard, J. C. (2006). Communication Research Statistics. Sage Publications, Inc.
One-way analysis of variance (ANOVA)
Used to compare the means of three or more samples (using the F distribution). This technique can be used only for numerical data.

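A pure-Python sketch of the one-way ANOVA F statistic (the group data are illustrative):

```python
def one_way_anova(groups):
    """F statistic with between- and within-group degrees of freedom."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean
    ss_within = sum(sum((v - m) ** 2 for v in g)
                    for g, m in zip(groups, means))
    df_b, df_w = k - 1, N - k
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, df_b, df_w

F, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
```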


Two-way analysis of variance (ANOVA)

examines the influence of two different categorical independent variables on one continuous dependent variable.

It is used when there is a single continuous dependent variable and two categorical independent variables (factors); the analysis reports a p-value for each main effect and for their interaction.
It helps to answer:
-Does each independent variable (factor) have a significant effect on the dependent variable?
-Is there a significant interaction between the two independent variables?
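For a balanced design (equal replicates per cell), the two-way ANOVA sums of squares can be sketched in pure Python. The cell data below are hypothetical and built with purely additive effects, so the interaction and error sums of squares come out zero:

```python
def two_way_ss(cells):
    """Sums of squares for a balanced two-way design.

    cells[i][j] is the list of replicates at level i of factor A
    and level j of factor B; every cell must have the same size n.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    grand = sum(v for row in cells for cell in row for v in cell) / (a * b * n)
    cell_m = [[sum(c) / n for c in row] for row in cells]
    a_m = [sum(cell_m[i][j] for j in range(b)) / b for i in range(a)]
    b_m = [sum(cell_m[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * n * sum((m - grand) ** 2 for m in a_m)   # main effect of A
    ss_b = a * n * sum((m - grand) ** 2 for m in b_m)   # main effect of B
    ss_ab = n * sum((cell_m[i][j] - a_m[i] - b_m[j] + grand) ** 2
                    for i in range(a) for j in range(b))  # interaction
    ss_e = sum((v - cell_m[i][j]) ** 2
               for i in range(a) for j in range(b) for v in cells[i][j])
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "E": ss_e}

# Additive cells: value = i + j, two identical replicates per cell
ss = two_way_ss([[[0, 0], [1, 1]], [[1, 1], [2, 2]]])
```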


