Friday, January 23, 2009

ANOVA

Analysis of variance (ANOVA) is a parametric statistical technique used to compare the means of two or more groups. It was developed by R. A. Fisher and is therefore often referred to as Fisher’s ANOVA. It is similar in application to the t-test and z-test, in that it compares means relative to the variability in the data, but ANOVA is best applied where more than two populations or samples are to be compared.

Using ANOVA rather than other methods such as multiple t-tests offers significant advantages:

· A larger number of samples can be compared in a single test. Running many pairwise t-tests instead increases the chance of finding a difference purely by chance.

· ANOVA can be used to evaluate much larger and more complex problems involving multiple variables and datasets, through its different types as well as its extensions.
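
To put a rough number on the risk of running many pairwise t-tests instead of one ANOVA, the sketch below computes the chance of at least one false positive across all pairwise comparisons. It assumes the tests are independent and each uses a 5% significance level, which is a simplification; the function name is made up for illustration.

```python
from math import comb

def familywise_error_rate(num_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests.

    Assumes the pairwise tests are independent (a simplifying assumption).
    """
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** num_tests

for k in (2, 3, 5):
    print(k, "groups:", comb(k, 2), "tests, error rate",
          round(familywise_error_rate(k), 3))
```

Under these assumptions, three groups already need three tests and five groups need ten, so the overall error rate climbs well above the 5% used for each individual test.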

ANOVA can be applied in a number of ways, depending on the number of samples, variables or datasets in question and on how many variables are to be tested at once.

· One-way between groups

The simplest form of ANOVA. It involves measuring the differences in the observations between two or more different groups, to determine whether there is a statistically significant difference between their means.

· One-way repeated measures

This is similar to the one-way between-groups ANOVA, except that the same subjects are measured again and again on the same variable, for example over a period of time or across multiple iterations of a treatment.

· Two-way between groups

This is used where the impact of two independent variables needs to be measured in a single analysis. For instance, a researcher can look at performance as affected by two different factors (independent variables) such as health (good vs. bad) and worker efficiency (high vs. low).

· Two-way repeated measures

Again, this is the same as a two-way ANOVA, except that the analysis is done on multiple observations of the same subjects.
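
As a minimal sketch of the simplest case above, the one-way between-groups ANOVA, the code below computes the F statistic by hand using only the Python standard library. The scores are made up for illustration and the function name is hypothetical. It stops at the F statistic and its degrees of freedom; turning that into a p-value needs the F distribution, which a statistics package would normally supply.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up scores for three independent groups
scores = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

f_stat, df_b, df_w = one_way_f(scores)
print("F =", f_stat, "on", (df_b, df_w), "degrees of freedom")
```

With these scores the group means are 5, 7 and 9, and F works out to 12 on (2, 6) degrees of freedom. That is above the tabulated 5% critical value of about 5.14, so the difference between the group means would be judged statistically significant.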

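In addition to these types of ANOVA, extensions such as MANOVA and ANCOVA are used where the need exists. MANOVA, or multivariate analysis of variance, is typically used where more than one dependent variable needs to be analyzed at once, which an ordinary one-way or two-way ANOVA cannot do. ANCOVA, or analysis of covariance, adds one or more continuous covariates to the analysis so that their influence can be controlled for.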

The use of ANOVA involves certain key assumptions, including the following:

· Normality

As a parametric test, ANOVA assumes that the data in each group (more precisely, the error terms) are approximately normally distributed; if they are not, the data are usually transformed first. Normality can be assessed with tests such as the Kolmogorov-Smirnov and Shapiro-Wilk tests, and by checking how closely the mean, median and mode agree.

· Homogeneity

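When using ANOVA, the sample variances are expected to be equal. In other words, the spread of the error terms should be roughly the same in every sample. If this is not the case, the samples do not satisfy the assumption and are considered heterogeneous. Homogeneity of variance can be tested with Bartlett’s test, Levene’s test or similar tests. The Durbin-Watson test is sometimes mentioned alongside these, but it checks for autocorrelation in the error terms rather than for equal variances. Where the variances are unequal, common remedies are to transform the data or to use a variant such as Welch’s ANOVA, which does not assume equal variances.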

· Independence

ANOVA requires case independence, which means that each observation is independent of the others. It is important to remember, however, that this does not mean every measurement has to be unrelated to every other: in a repeated-measures experiment, where ANOVA is widely used, measurements taken on the same subject are related, and the repeated-measures form of ANOVA is designed to account for this.
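
The informal checks mentioned above can be sketched with the Python standard library: comparing each group’s mean and median as a rough look at symmetry, and comparing the largest and smallest group variances as a rough look at homogeneity. The data and function names are made up for illustration, and these quick checks are not a substitute for formal tests such as Shapiro-Wilk or Bartlett’s, which statistics packages provide.

```python
from statistics import mean, median, variance

def describe_group(values):
    """Summary figures for one group: mean, median and sample variance."""
    return {
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),
    }

def variance_ratio(groups):
    """Largest sample variance divided by the smallest.

    A value near 1 is consistent with equal variances; a large value
    suggests the homogeneity assumption deserves a formal test.
    """
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Made-up data: the second group is twice as spread out as the first
groups = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
]

for g in groups:
    print(describe_group(g))
print("variance ratio:", variance_ratio(groups))
```

How large a ratio is too large is a judgment call that depends on the design and the group sizes, so it is best treated as a prompt to run a formal test rather than as a verdict.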

Like any other statistical and scientific technique, ANOVA has its drawbacks. The most important ones to remember while using it are:

· ANOVA needs normally distributed data, yet it is relatively difficult to find a real-world sample or population that is truly normal. The need for normality is usually met by transforming the data, most often by taking logarithms.

· ANOVA can be fairly unreliable if the error term is not consistent throughout the samples. This problem of heteroscedasticity implies that unless the variances of the samples in question are roughly equal, ANOVA’s results may not be reliable. For instance, ANOVA may suggest statistical significance where there is none.
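
To illustrate the log transformation mentioned above, the sketch below takes a made-up, right-skewed sample and compares the gap between its mean and median before and after taking logarithms. The gap is divided by the standard deviation so that the two scales are comparable. The data and function name are hypothetical, and the log transform only applies to strictly positive values.

```python
import math
from statistics import mean, median, stdev

def standardized_gap(values):
    """(mean - median) / standard deviation: a rough indicator of skew.

    Values near 0 suggest a roughly symmetric sample.
    """
    return (mean(values) - median(values)) / stdev(values)

# Made-up right-skewed sample: a few large values pull the mean upward
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]

# Natural log of each value (requires every value to be > 0)
logged = [math.log(x) for x in raw]

print("raw gap:   ", round(standardized_gap(raw), 3))
print("logged gap:", round(standardized_gap(logged), 3))
```

For this sample the gap shrinks after the transformation, which is the effect the log transform is meant to have on right-skewed data. It does not guarantee normality, so the transformed data should still be checked.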
