# Anova with Repeated Samples

Consider a hypothetical experiment comparing the effects of placebo, ibuprofen, and naproxen on blood pressure. One option is to recruit a separate group of volunteers for each treatment: one group for placebo, one for ibuprofen, and one for naproxen. One advantage is that the number of individuals in each group does not have to be exactly the same; Anova analysis of one variable considers only the mean, standard deviation, and size of each group.

However, because blood pressure can vary widely between individuals, a small treatment effect may be masked by differences between the group averages that arise purely from the random assignment of individuals to groups.

An alternative that is likely to work better is to measure baseline blood pressure for a group of individuals, treat them with naproxen for one week, measure the same patients again, allow some time to pass, then repeat with ibuprofen. In that way, each patient serves as their own control.

The latter design is known as Anova with Repeated Measures, and can be considered a special case of multi-variable (multi-factor) Anova. Each combination of a patient and a treatment is a separate sample, and each sample in the list has two variables defined: the treatment and the individual ID. A sample sheet for such an experiment would look like

To take advantage of patients being their own controls in the differential analysis, select the variable "Treatment" as a primary variable, and "Patient ID" as a control variable in the variable analysis menu.

Mathematically, the consequence is that when the workbench computes the ratio of the between-group variation to the within-group variation in order to assess the significance of the difference between means, the within-group variation for each treatment group is corrected for the between-patient variation. As a result, the significance of a genuine treatment effect will generally be enhanced. The "signal to noise ratio" is improved by eliminating a source of noise, namely the variation between patients.
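The effect of this correction can be sketched numerically. The example below is a minimal illustration using the textbook sums-of-squares decomposition for a balanced design, not the workbench's actual implementation; the readings, the patient labels `P1` to `P3`, and the function name `f_ratios` are all hypothetical.

```python
from statistics import mean

# Hypothetical blood-pressure readings: data[treatment][patient] -> replicates.
data = {
    "placebo":   {"P1": [120, 122], "P2": [140, 142], "P3": [160, 158]},
    "ibuprofen": {"P1": [123, 125], "P2": [144, 144], "P3": [162, 164]},
    "naproxen":  {"P1": [126, 126], "P2": [146, 148], "P3": [166, 164]},
}

def f_ratios(data):
    """Return (F ignoring patient ID, F with patient ID as a control variable).

    Assumes a balanced, fully populated design (same replicate count per cell).
    """
    treatments = list(data)
    patients = list(data[treatments[0]])
    values = [v for t in treatments for p in patients for v in data[t][p]]
    grand = mean(values)

    def ss_between(groups):
        # Sum of squares of group means around the grand mean, weighted by size.
        return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

    ss_total = sum((v - grand) ** 2 for v in values)
    ss_treat = ss_between(
        [[v for p in patients for v in data[t][p]] for t in treatments]
    )
    ss_patient = ss_between(
        [[v for t in treatments for v in data[t][p]] for p in patients]
    )

    df_treat = len(treatments) - 1
    df_within = len(values) - len(treatments)
    df_patient = len(patients) - 1

    # Single-variable Anova: all variation inside a treatment group is noise.
    ss_within = ss_total - ss_treat
    f_one_way = (ss_treat / df_treat) / (ss_within / df_within)

    # Patient ID as control: remove between-patient variation from the noise.
    ss_resid = ss_within - ss_patient
    df_resid = df_within - df_patient
    f_controlled = (ss_treat / df_treat) / (ss_resid / df_resid)
    return f_one_way, f_controlled

f_one_way, f_controlled = f_ratios(data)
print(f"F without patient control: {f_one_way:.2f}")
print(f"F with patient control:    {f_controlled:.2f}")
```

In this made-up data the patients differ from each other by about 20 units while the treatments differ by only a few, so the ratio computed without the patient correction is small and the corrected ratio is much larger.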

If there are several such variables, for instance patient ID, time of measurement ("blood pressure taken in the morning" vs "blood pressure taken in the evening"), and formulation (capsule vs tablet), all of them can be treated as control variables.

A disadvantage of such a design relative to the simple design is that the matrix of patients and treatments must be fully populated: every treatment must include every patient, and every patient must have received all the treatments. Any patient who missed a treatment, or who wasn't measured prior to treatment, should be dropped. If such patients are not dropped, the workbench will issue a warning and fall back to single-variable Anova.
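One way to check this requirement before running the analysis is sketched below. It is an illustrative pre-filter, not part of the workbench; the row layout `(patient, treatment, value)`, the function name `complete_patients`, the patient "Sally", and all readings are assumptions made for the example.

```python
from collections import defaultdict

def complete_patients(rows):
    """Split rows into those from fully measured patients and a list of
    patients who are missing at least one treatment.

    rows: iterable of (patient, treatment, value) tuples (hypothetical layout).
    """
    rows = list(rows)
    all_treatments = {treatment for _, treatment, _ in rows}

    # Record which treatments each patient actually has measurements for.
    seen = defaultdict(set)
    for patient, treatment, _ in rows:
        seen[patient].add(treatment)

    keep = {p for p, ts in seen.items() if ts == all_treatments}
    kept_rows = [r for r in rows if r[0] in keep]
    dropped = sorted(set(seen) - keep)
    return kept_rows, dropped

# Made-up readings: Sally has no naproxen measurement.
rows = [
    ("Harry", "baseline", 128), ("Harry", "ibuprofen", 131), ("Harry", "naproxen", 133),
    ("Sally", "baseline", 142), ("Sally", "ibuprofen", 144),
]
kept_rows, dropped = complete_patients(rows)
print("dropped patients:", dropped)
```

Filtering by patient rather than by individual row keeps the matrix rectangular: a patient is either present for every treatment or absent entirely.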

There should also be more than one entry in each sub-group, in order to define the within-group variance. The number of replicates does not need to be the same in every experimental cell. For instance, in the above example, Harry could have 5 BP measurements with ibuprofen and 3 BP measurements with naproxen.
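A quick way to verify the replicate requirement is to count the entries in each patient/treatment cell. The sketch below assumes the same hypothetical `(patient, treatment, value)` row layout; the helper name `sparse_cells`, the patient "Sally", and the readings are invented for illustration. It only inspects cells that appear in the data, so entirely missing cells need a separate check.

```python
from collections import Counter

def sparse_cells(rows, minimum=2):
    """Return the (patient, treatment) cells with fewer than `minimum` entries."""
    counts = Counter((patient, treatment) for patient, treatment, _ in rows)
    return sorted(cell for cell, n in counts.items() if n < minimum)

# Made-up readings: unequal replicate counts are fine, a single entry is not.
readings = (
    [("Harry", "ibuprofen", bp) for bp in (131, 129, 132, 130, 133)]
    + [("Harry", "naproxen", bp) for bp in (134, 133, 135)]
    + [("Sally", "ibuprofen", bp) for bp in (144, 146)]
    + [("Sally", "naproxen", 147)]
)
print("cells needing more replicates:", sparse_cells(readings))
```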