# geNorm algorithm and Probe Stability

If the experiment contains a subset of "housekeeping" probes, meaning probes that are expressed at the same level in every sample, they can serve as good normalization probes.

In a given experiment, it is usually not known which probes were in fact unchanged between samples, since other random variables also affect the measured data. However, if such a subset of housekeeping probes exists, its members share a characteristic: many of those random variables affect the measurements of all of them together. For example, pipetting variation, sample dilution, variation in sample collection technique, temperature variation between wells, and so on, usually act across the whole set of probes.

As a consequence, the probes that are expressed at the same level in each sample rise and fall in a constant ratio to each other. For example, if mir-7a is expressed at twice the level of mir-7b in all samples, the vector of mir-7a in all samples divided by the vector of mir-7b in all samples is a vector of 2.0's, and the standard deviation of that vector is zero.
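The idealized example can be checked numerically. The sketch below uses made-up expression values and a hypothetical per-sample factor (`sample_factor`) standing in for effects such as pipetting or dilution; none of these numbers come from real data.

```python
import statistics

# Hypothetical per-sample technical factor (pipetting, dilution, ...).
# It scales every probe in a sample by the same amount.
sample_factor = [1.0, 0.8, 1.3, 0.9, 1.1]

# Made-up expression values: mir-7a is always twice mir-7b.
mir_7b = [100.0 * f for f in sample_factor]
mir_7a = [2.0 * v for v in mir_7b]

# Element-wise ratio across samples, and its standard deviation.
ratio = [a / b for a, b in zip(mir_7a, mir_7b)]
ratio_sd = statistics.stdev(ratio)

print(ratio)
print(ratio_sd)
```

Although both probes vary from sample to sample, the ratio is 2.0 in every sample, so its standard deviation is zero.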

These considerations led Vandesompele and others to define the distance between two probes P and Q as the standard deviation, across all samples, of the ratio P/Q (more precisely, of log2(P/Q)). Given this distance, the "stability" of a probe is defined as its average distance to all the other probes. If there is a core group of housekeeping probes, they will all lie within a short distance of each other (zero in the idealized example above), while the probes of biological interest, which vary from sample to sample due to treatment, disease, age, or some other factor, will be further from the average probe: they will be less "stable".
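A minimal sketch of the distance and stability definitions follows. The function names (`probe_distance`, `stability`), the probe names, and the data are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; this is not the workbench's own code.

```python
import math
import statistics

def probe_distance(p, q):
    """Standard deviation across samples of log2(P/Q)."""
    log_ratios = [math.log2(a / b) for a, b in zip(p, q)]
    return statistics.stdev(log_ratios)

def stability(expression):
    """Map each probe name to its mean distance to all other probes."""
    result = {}
    for name, values in expression.items():
        distances = [
            probe_distance(values, other_values)
            for other, other_values in expression.items()
            if other != name
        ]
        result[name] = statistics.mean(distances)
    return result

# Made-up data: three probes that track a shared per-sample factor,
# and one probe that varies independently of it.
factor = [1.0, 0.8, 1.3, 0.9, 1.1]
expression = {
    "hk_1": [100.0 * f for f in factor],
    "hk_2": [250.0 * f for f in factor],
    "hk_3": [40.0 * f for f in factor],
    "variable": [50.0, 400.0, 60.0, 900.0, 30.0],
}

m = stability(expression)
# Lower values mean more stable; sort ascending to rank the probes.
ranked = sorted(m, key=m.get)
print(ranked)
```

In this toy data set the three housekeeping-like probes are at essentially zero distance from each other, so the independently varying probe ends up last in the ranking.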

The geNorm algorithm chooses a set of normalization probes by starting with the most stable probe and progressively accumulating more until the average of the accumulated probes converges.
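The accumulation step might be sketched as follows. The convergence test used here (standard deviation of the change in the per-sample log2 average when one more probe is added) and the `tolerance` value are assumptions chosen for illustration, as are the function names and data; the workbench's actual criterion may differ.

```python
import math
import statistics

def log2_average(expression, probes):
    """Per-sample mean of log2 expression over the given probes."""
    n_samples = len(expression[probes[0]])
    return [
        statistics.mean([math.log2(expression[p][i]) for p in probes])
        for i in range(n_samples)
    ]

def accumulate_normalizers(expression, ranked, tolerance=0.05):
    """Add probes in order of stability until the average stops changing.

    `ranked` lists probe names from most to least stable.
    `tolerance` is an arbitrary, assumed threshold.
    """
    chosen = [ranked[0]]
    previous = log2_average(expression, chosen)
    for probe in ranked[1:]:
        candidate = chosen + [probe]
        current = log2_average(expression, candidate)
        # How much did adding this probe change the per-sample average?
        change = statistics.stdev([c - p for c, p in zip(current, previous)])
        chosen, previous = candidate, current
        if change < tolerance:
            break  # the average has converged; keep the probes chosen so far
    return chosen

# Made-up data: three probes tracking a shared factor, one varying freely.
factor = [1.0, 0.8, 1.3, 0.9, 1.1]
expression = {
    "hk_1": [100.0 * f for f in factor],
    "hk_2": [250.0 * f for f in factor],
    "hk_3": [40.0 * f for f in factor],
    "variable": [50.0, 400.0, 60.0, 900.0, 30.0],
}
ranked = ["hk_1", "hk_2", "hk_3", "variable"]  # most stable first

normalizers = accumulate_normalizers(expression, ranked)
print(normalizers)
```

With this idealized data the average converges as soon as the second probe is added, so the loop stops long before it reaches the independently varying probe. Real data would typically need more probes before the change drops below the threshold.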

The workbench lets you visualize the process by displaying the stability chart shown below. The vertical axis is each probe's stability, that is, its mean distance to all the other probes, using the probe-probe distance defined above. The shape of the chart is fairly typical: many probes of similar stability on the left, and a few "unstable" probes on the right, corresponding to probes that vary a lot between samples with a different pattern than most other probes.

The "unstable" probes are often the ones of most interest, though they may also just be noisy. For example, hemolysis markers in serum samples might vary substantially due to variability in serum purification. It is not unusual to find the "unstable" probes at the top of the significance table in differential analysis, and the most "stable" probes at the bottom.

A custom normalization scheme can be designed by, for instance, selecting the 12 most stable probes (left-most in the chart), then perhaps de-selecting one or two that are expected to vary between samples in an interesting way. While that selection is active, the selected probes can be set as normalizers using the popup menu in the header of the "norm" column of the probe table.