Differential Expression

When variables are defined, the software computes the fold-change between the groups of each variable and determines whether the changes are statistically significant.

The data are presented in both a table and a chart.

To clarify the discussion of variables, it is helpful to break out four cases of increasing complexity.

Simplex assay, two groups (e.g. Case / Control): two-sample t-test
Simplex assay, multiple groups (e.g. non-smoker, exposed to passive smoke, light smoker, heavy smoker): Anova test
Multiplex assay, two groups: two-sample t-test with correction for multiple comparisons
Multiplex assay, multiple groups: Anova test with correction for multiple comparisons

In addition to these standard cases, repeated-sample Anova is also supported.

Two-sample t-test

When there is a single probe of interest and just two groups, the null hypothesis is that the mean of the probe is the same in both groups. The mean signal in each well of the first group is treated as a data point for that group, the well means of the second group are treated as the data points for that group, and a general two-sample t-test is performed.
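The calculation can be sketched as follows. This is a minimal illustration, not the software's own code: it assumes the unequal-variance (Welch) form of the two-sample t-test, and the function name `welch_t` and the well means are hypothetical. It returns the t statistic and the degrees of freedom; converting these to a p-value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test with unequal variances (Welch)."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb                        # squared standard error of the difference
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical per-well mean signals for one probe (illustrative values only)
case_wells = [12.1, 11.8, 12.6, 12.3]
control_wells = [10.2, 10.9, 10.5, 10.4]

t_stat, dof = welch_t(case_wells, control_wells)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Each well contributes exactly one data point (its mean signal), so the sample size for the test is the number of wells in the group, not the number of individual measurements.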

Two-sample t-test with correction for multiple comparisons

When multiple probes are assayed at once, the issue of multiple comparisons arises. The two-sample t-test used for a single probe is likely to generate many "significant" results by chance alone when many probes are tested at once, just as the odds of winning the lottery with 70 tickets are better than with 1. To put it another way, how seriously would you take somebody who rolled the dice 20 times, came up with snake eyes three times, and decided the dice were biased towards snake eyes because they got a roll that has only a 2.78% probability on a single toss? The many probes that are not significant in a multiplex assay matter, in just the same way as the many rolls of the dice that did not turn up snake eyes. Xkcd has an excellent cartoon on this subject.
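The size of the effect can be estimated with a short calculation. The sketch below assumes the individual tests (or rolls) are independent, which is a simplification for real probes; the function name is hypothetical and the figure of 45 probes is borrowed from the FAQ below.

```python
def prob_at_least_one(p_single, n_trials):
    """Chance that an event of per-trial probability p_single occurs at least once in n_trials independent trials."""
    return 1.0 - (1.0 - p_single) ** n_trials


# At least one snake eyes (probability 1/36 per roll) somewhere in 20 rolls
snake_eyes = prob_at_least_one(1 / 36, 20)

# At least one raw p-value below 0.05 among 45 probes when every null hypothesis is true
false_hit = prob_at_least_one(0.05, 45)

print(f"snake eyes in 20 rolls: {snake_eyes:.2f}")
print(f"false positive among 45 probes: {false_hit:.2f}")
```

Under these assumptions, at least one uncorrected "significant" probe is the expected outcome of a 45-probe assay rather than a surprise.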

To compensate for the statistical likelihood of finding an apparently significant result when many different probes are analyzed, the software presents both a "raw" and an "adjusted" score. The raw score is the t-test result for each probe as if that probe had been the only probe. The Bonferroni method provides an adjusted score which corrects for the issue of multiple comparisons and provides an improved estimate of significance. That is, it provides a better estimate of how likely the observed result would be to occur by chance if the null hypothesis were true (H0: the mean of the probe is the same in every group).
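In its usual form, the Bonferroni adjustment multiplies each raw p-value by the number of probes tested. The sketch below assumes that form, with the result capped at 1; whether the software applies the cap is an assumption, and the function name and p-values are hypothetical.

```python
def bonferroni_adjust(raw_p_values):
    """Multiply each raw p-value by the number of tests, capping the result at 1."""
    n = len(raw_p_values)
    return [min(1.0, p * n) for p in raw_p_values]


# Hypothetical raw p-values for four probes
raw = [0.001, 0.012, 0.030, 0.200]
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]

for r, a, s in zip(raw, adjusted, significant):
    print(f"raw = {r:.3f}  adjusted = {a:.3f}  significant = {s}")
```

The third probe has a raw p-value below 5% but is no longer significant once the adjustment accounts for the four probes tested.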

The Bonferroni method controls the experiment-wise false-positive rate. An alternative approach preferred by some is the Benjamini-Hochberg (BH) method, which controls the False Discovery Rate (FDR). The procedure is to rank the raw p-values p1, p2, p3, ... pn starting with the smallest, and then to compare each pi with an adjusted significance threshold of (i/n) α, where α is the desired FDR, by default 5%. The null hypothesis is rejected (i.e. the result is considered significant) for the probe with the largest i whose p-value is at or below its adjusted threshold, and for every probe ranked ahead of it.
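The ranking procedure can be sketched as below. This follows the standard step-up formulation of Benjamini-Hochberg, in which every probe ranked at or ahead of the largest rank that meets its threshold is rejected; that the software implements exactly this variant is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(raw_p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected at FDR alpha."""
    n = len(raw_p_values)
    # Indices of the probes ordered from smallest to largest raw p-value
    order = sorted(range(n), key=lambda i: raw_p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Compare the rank-th smallest p-value with its threshold (rank / n) * alpha
        if raw_p_values[idx] <= rank / n * alpha:
            cutoff_rank = rank
    rejected = [False] * n
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical raw p-values for four probes, in probe order
raw_p = [0.030, 0.001, 0.200, 0.012]
bh_rejected = benjamini_hochberg(raw_p)
print(bh_rejected)
```

With these four values the thresholds are 1.25%, 2.5%, 3.75% and 5%, so the probe with raw p = 0.030 is accepted as significant by BH, whereas a Bonferroni adjustment (0.030 × 4 = 0.12) would not accept it.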

Very crudely speaking, the Bonferroni method tries to keep the number of false positives below 1, while the BH method tries to limit the number of false positives to no more than 5% of all the positives. Unless there are on the order of 20 or more significant probes, the two methods can be expected to give similar results. The figure below is one way to visualize the difference. In each case, the probes that are potentially significant are those whose p-values fall below the green line representing the significance threshold.

[Figure: raw data]
Panel 1: Raw p-values and significance threshold of 5%.
Panel 2: Bonferroni-adjusted p-values and significance threshold of 5%. The raw p-values are shifted up, approximately by a factor equal to the number of probes n.
Panel 3: Raw p-values and Benjamini-Hochberg adjusted significance threshold. The start of the significance curve is adjusted down by 1/n, the second point by 2/n, the third by 3/n, etc.

Anova Test

When a single probe is considered over a variable that has several different levels (e.g. non-smoker, exposed to passive smoking, light smoker, heavy smoker), the question and the null hypothesis need to be re-framed. Instead of "is there a difference between the two groups", the question becomes "does this variable make a difference". The null hypothesis is that the mean value of the probe is the same in all the groups.

Caution: even one group that differs from all the others is sufficient to reject the null hypothesis. In a three-way comparison between cancer pre-treatment, cancer post-treatment, and healthy, the two cancer groups could be the same as each other and different from the healthy group. No conclusion regarding treatment can be drawn unless the healthy group is removed from consideration.

Statistically, the analysis proceeds by comparing the variance between groups to the variance within groups, using an F-test. If the value of the test statistic exceeds a threshold which depends on the number of groups and the number of members of each group, the result is likely significant. When there are only two groups, the Anova test can be shown to be equivalent to the equal-variance two-sample t-test (the F statistic is the square of the t statistic).
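The F statistic for a one-way Anova can be computed as in the sketch below. It is an illustration under the usual textbook definitions (between-group and within-group mean squares), not the software's own code; the function name and the well means are hypothetical. Comparing F against its threshold requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way Anova over a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the data points around their own group mean
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-well mean signals for one probe at three levels of a variable
non_smoker = [10.2, 10.9, 10.5, 10.4]
light_smoker = [11.0, 11.4, 10.8, 11.2]
heavy_smoker = [12.1, 11.8, 12.6, 12.3]

f_three, df_b, df_w = one_way_anova_f([non_smoker, light_smoker, heavy_smoker])
print(f"F = {f_three:.1f} on ({df_b}, {df_w}) degrees of freedom")

# With only two groups, F equals the square of the equal-variance t statistic
f_two, _, _ = one_way_anova_f([non_smoker, heavy_smoker])
print(f"two-group F = {f_two:.1f}")
```

A large F indicates that the group means differ by more than the scatter within the groups would explain, but it does not say which group is responsible.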

Anova Test with correction for multiple comparisons

Analyzing many different probes over a variable with several levels raises the same multiple comparisons issues as analyzing them over a variable with just two levels. The same methods (Bonferroni and Benjamini-Hochberg) are used to adjust the p-values and estimate the significance cutoff as in the two-level case.

Frequently asked questions

Q: How are assay control probes and blank probes considered in the analysis?
A: They are removed from consideration.

Q: My experiment is designed around 12 particular probes which may or may not be biomarkers, but I also included 12 other probes for normalization or other purposes. How should they be treated?
A: If the extraneous probes will not be considered as biomarkers no matter how the data comes out, it is reasonable to remove them from analysis by using the "hide" checkbox on the probe table.

Q: Suppose I start with 45 potential biomarkers and find that none are significantly different among groups after adjusting for multiple comparisons. If I hide all the probes except the most significant (or rather, least insignificant) ones, the adjusted p-value becomes significant. Is the probe significant or not?
A: No.  That is equivalent to ignoring 17 out of 20 failed rolls of the dice in the snake eyes example.

See Also

Bernard Rosner, "Fundamentals of Biostatistics", 7th Edition, Chapter 12. Brooks/Cole, Cengage Learning (2011). ISBN-10: 0538733497.

General Anova analysis (Wikipedia)
Multiple comparisons (Wikipedia)
Bonferroni method  (Wikipedia)
False Discovery Rate (Wikipedia)
Benjamini-Hochberg method (Wikipedia)
