Number of groups                                                        Test
Two groups (e.g. Case / Control)                                        Two-sample t-test with correction for multiple comparisons
Multiple groups (e.g. non-smoker, exposed to passive smoke,             ANOVA test with correction for multiple comparisons
light smoker, heavy smoker)
When there is a single probe of interest and just two groups, the null hypothesis is that the mean of the probe is the same in both groups. The mean signals in the wells of the first group are treated as the data points for that group, the well means of the other group as the data points for the second group, and a general two-sample t-test is performed.
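As a sketch of the arithmetic, the pooled-variance two-sample t-test described above can be computed directly from the per-well means (the function name and example values are illustrative, not part of the assay software):

```python
import math

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom.

    `a` and `b` are the per-well mean signals for the two groups.
    The p-value is then obtained from a t distribution with `df`
    degrees of freedom (in practice via a statistics library such
    as scipy.stats).
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Pooled variance: combined squared deviations over combined df.
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = ss / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (ma - mb) / se, df

t, df = two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

For these example well means the statistic works out to t = -1.0 with 8 degrees of freedom, which is far from significant.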
When multiple probes are assayed at once, the issue of multiple comparisons arises.
The two-sample t-test used for a single probe is likely to generate many "significant" results by chance alone when many probes are tested at once, just as the odds of winning the lottery with 70 tickets are better than with 1. To put it another way, how seriously would you take somebody who rolled a pair of dice 20 times, came up with snake eyes three times, and decided the dice were biased towards snake eyes because they got a roll that has only a 2.77% probability on a single toss? The many probes that are not significant in a multiplex assay matter in just the same way as the many rolls of the dice that did not turn up snake eyes. xkcd has an excellent cartoon on the subject.
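The dice analogy can be checked with a short simulation (the trial counts below are arbitrary choices for illustration): even though snake eyes has only a 1/36 ≈ 2.77% chance on a single roll, the chance of seeing it at least once in 20 rolls is about 43%.

```python
import random

random.seed(42)   # reproducible illustration

TRIALS = 10_000   # number of simulated 20-roll "experiments"
ROLLS = 20        # dice rolls per experiment

# Count experiments in which snake eyes (1,1) came up at least once.
hits = 0
for _ in range(TRIALS):
    if any(random.randint(1, 6) == 1 and random.randint(1, 6) == 1
           for _ in range(ROLLS)):
        hits += 1
simulated = hits / TRIALS

# Analytic probability of at least one snake eyes in 20 rolls.
analytic = 1 - (35 / 36) ** ROLLS
```

The same logic applies to probes: run enough independent tests at a 5% threshold and apparently "significant" results are nearly guaranteed.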
To compensate for the statistical likelihood of finding an apparently significant result when many different probes are analyzed, both the "raw" and an "adjusted" score are presented by the software. The raw score is the t-test result for each probe as if that probe had been the only probe. The Bonferroni method provides an adjusted score which corrects for the issue of multiple comparisons and provides an improved estimate of significance. That is, it provides a better estimate of how likely that observed result would have occurred by chance even if the null hypothesis were true (H0 = the probe is the same in every group).
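The Bonferroni adjustment described above amounts to multiplying each raw p-value by the number of probes, capped at 1. A minimal sketch (the function name is illustrative, not the software's API):

```python
def bonferroni_adjust(pvals):
    """Bonferroni-adjusted p-values: each raw p-value is multiplied
    by the number of probes n, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]

adj = bonferroni_adjust([0.004, 0.03, 0.5])
```

With three probes, a raw p-value of 0.004 becomes 0.012 and one of 0.5 is capped at 1.0, so only results that survive the n-fold inflation are reported as significant.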
The Bonferroni method controls the experiment-wise false-positive rate. An alternative approach preferred by some is the Benjamini-Hochberg (BH) method, which controls the False Discovery Rate (FDR). The procedure is to rank the raw p-values p1 ≤ p2 ≤ p3 ≤ ... ≤ pn starting with the smallest, and then to compare each pi with an adjusted significance threshold of (i/n)α, where α is the desired FDR, by default 5%. The null hypothesis is rejected (i.e. the data are considered significant) for all p-values up to and including the largest rank i for which pi falls below its adjusted threshold.
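The BH procedure above can be sketched in a few lines (the function name is illustrative; indices of rejected probes are returned for clarity):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the indices of probes rejected under the
    Benjamini-Hochberg FDR procedure at level `alpha`."""
    n = len(pvals)
    # Rank raw p-values from smallest to largest, remembering indices.
    order = sorted(range(n), key=lambda i: pvals[i])
    # Find the largest rank i (1-based) with p_(i) <= (i / n) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / n * alpha:
            cutoff = rank
    # Reject every p-value up to and including that rank.
    return sorted(order[:cutoff])

rejected = benjamini_hochberg([0.20, 0.01, 0.04, 0.03, 0.02], alpha=0.05)
```

For these five example p-values BH rejects four probes, whereas a Bonferroni threshold of 0.05/5 = 0.01 would reject only one, illustrating how the two methods diverge as the number of significant probes grows.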
Very crudely speaking, the Bonferroni method tries to keep the number of false positives less than 1, while the BH method tries to limit the number of false positives to no more than 5% of all the positives. Unless there are on the order of 20 significant probes, the two methods can be expected to give similar results. The figure below is one way to visualize the difference. In each case, the probes that are potentially significant are those whose p-values fall below the green line representing the significance threshold.
Figure panels, left to right:
1. Raw p-values and a significance threshold of 5%.
2. Bonferroni-adjusted p-values and a significance threshold of 5%. The raw p-values are shifted up, approximately by a factor equal to the number of probes n.
3. Raw p-values and the Benjamini-Hochberg adjusted significance threshold. The start of the significance curve is adjusted down by a factor 1/n, the second point by 2/n, the third by 3/n, etc.
Chapter 12, "Fundamentals of Biostatistics", 7th Edition, Bernard Rosner. Brooks/Cole, Cengage Learning (2011). ISBN-10: 0538733497.
General ANOVA analysis (Wikipedia)
Multiple comparisons (Wikipedia)
Bonferroni method (Wikipedia)
False Discovery Rate (Wikipedia)
Benjamini-Hochberg method (Wikipedia)