Data processing pipeline


Figure: data processing pipeline

Data

Internal data is represented as a statistical object for each sample and probe.  
The statistical object contains all the data values for that sample and probe combination. It also holds their mean, standard deviation, standard error of the mean, confidence interval of the mean, and coefficient of variation. By default, the statistical object computes the mean and the variance measures from a central cohort of the data values, with appropriate corrections to the variance measures. The default cohort is the 25th to 75th percentile, and this can be changed on the analysis settings page. Using the full population lets a single outlying data value unduly influence the results. Using too narrow a cohort gives poor estimates of the variance.
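The following is a minimal sketch of how such a statistical object could be computed. It assumes the cohort is chosen by rank, trimming the lowest and highest values. This page does not say which variance correction is used. As a stand-in, the sketch uses the winsorized standard deviation and Yuen's standard error for a trimmed mean, with a normal-approximation 95% confidence interval. The function name `cohort_stats` and its parameters are hypothetical.

```python
import math
import statistics


def cohort_stats(values, low_pct=25.0, high_pct=75.0):
    """Summary statistics computed from the central cohort of `values`.

    Hypothetical sketch: the cohort keeps the values ranked between the
    low_pct and high_pct percentiles. The variance correction (winsorized SD
    and Yuen's standard error) is an assumed stand-in, not a documented method.
    """
    xs = sorted(values)
    n = len(xs)
    # Number of values trimmed from the bottom and top tails
    g_low = math.floor(n * low_pct / 100.0)
    g_high = math.floor(n * (100.0 - high_pct) / 100.0)
    cohort = xs[g_low:n - g_high]
    h = len(cohort)
    if h < 2:
        raise ValueError("cohort must contain at least two values")

    mean = statistics.fmean(cohort)
    # Winsorize: replace each trimmed value with the nearest cohort value
    winsorized = [cohort[0]] * g_low + cohort + [cohort[-1]] * g_high
    sd_w = statistics.stdev(winsorized)
    # Yuen's standard error of the trimmed mean
    sem = math.sqrt((n - 1) * sd_w ** 2 / (h * (h - 1)))
    # Normal-approximation 95% interval (a t quantile would be slightly wider)
    z = statistics.NormalDist().inv_cdf(0.975)
    return {
        "values": list(values),
        "mean": mean,
        "sd_winsorized": sd_w,
        "sem": sem,
        "ci95": (mean - z * sem, mean + z * sem),
        "cv": sd_w / mean if mean != 0 else float("nan"),
    }
```

With the default 25th to 75th percentile cohort, a single extreme value such as `1000` in `[1, 2, 3, 4, 5, 6, 7, 1000]` does not move the mean. The cohort is `[3, 4, 5, 6]` either way.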

For flow cytometry data, the data values are typically positive numbers in the range 1 to 10,000, in units that depend on the cytometer manufacturer. After background subtraction (blank or negative well subtraction, described below), some values in the matrix may be negative. Such values correspond to wells whose signal falls below background because of noise. Either the background was particularly high in that well, or the probe signal was particularly low. When taking the log of subtracted data, a log floor is applied to avoid taking the log of zero or negative numbers.
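A log floor can be sketched as clamping each value up to a minimum before taking the log. The function `log_with_floor`, the use of base 10, and the default floor of 1.0 are hypothetical. The actual floor value is not given here.

```python
import math


def log_with_floor(values, floor=1.0):
    """Base-10 log of each value, raising values below `floor` to `floor` first.

    Hypothetical sketch: the floor must be positive, and 1.0 is an assumed default.
    """
    if floor <= 0:
        raise ValueError("floor must be positive")
    return [math.log10(max(v, floor)) for v in values]
```

For example, `log_with_floor([100.0, -5.0, 0.5])` gives `[2.0, 0.0, 0.0]`. The negative and sub-floor values both map to the log of the floor instead of raising an error.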

Background subtraction refers to the combination of blank probe subtraction and negative well subtraction. Normalization and standardization are never performed together. An experiment is either normalized or standardized (or neither).

Blanks and Negatives

Negative control probes ("blanks") and negative control samples ("negatives" or "water wells") can be defined for each experiment.

A negative control probe corresponds to particles that have either no probe or an off-species probe (e.g. a platypus microRNA) bound to them. Any signal from such a probe comes from non-specific binding of target to the particle and is a measure of background signal. If any probes are marked as "blank" in the probe table, they are subtracted from the other probes in each sample. In the following table, the right-most column (blank) would be subtracted from the other columns.


mir-A mir-B
mir-C
mir-D
mir-E
mir-F
mir-G
mir-H
blank
A01
nnn
nnn nnn nnn nnn nnn nnn nnn nnn
A02
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A03
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A04
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A05
nnn nnn nnn nnn nnn nnn nnn nnn nnn
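Blank subtraction can be sketched as follows, with the data held as a mapping from well to a mapping from probe to value. This page does not say how several blank probes are combined. The sketch assumes their values in a well are averaged (arithmetic mean) to give that well's background level. It also assumes the blank columns are dropped from the result. The function name `subtract_blanks` is hypothetical.

```python
import statistics


def subtract_blanks(matrix, blank_probes):
    """Subtract each well's blank-probe background from the other probes in that well.

    matrix: dict mapping well -> dict mapping probe -> value.
    Assumption: several blank probes in a well are averaged into one background level.
    """
    blanks = set(blank_probes)
    result = {}
    for well, row in matrix.items():
        # Background level for this well, taken from the blank probe(s)
        background = statistics.fmean(row[p] for p in blanks)
        # Subtract it from every non-blank probe in the same well
        result[well] = {p: v - background for p, v in row.items() if p not in blanks}
    return result
```

Because the background is computed per well, a well with unusually high non-specific binding gets correspondingly more signal removed from each of its probes.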



Negative control wells contain the same particles as other wells, but hold only inert material such as water or PBS in place of sample. If any wells are marked as "negative" in the sample table, they are subtracted probe by probe from the data in the other wells. In the table below, the two water rows would be averaged and subtracted from the other rows.



mir-A mir-B
mir-C
mir-D
mir-E
mir-F
mir-G
mir-H
blank
A01
nnn
nnn nnn nnn nnn nnn nnn nnn nnn
A02
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A03
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A04
nnn nnn nnn nnn nnn nnn nnn nnn nnn
A05
nnn nnn nnn nnn nnn nnn nnn nnn nnn
water
nnn nnn nnn nnn nnn nnn nnn nnn nnn
water
nnn nnn nnn nnn nnn nnn nnn nnn nnn
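Negative well subtraction can be sketched the same way. The negative wells are averaged probe by probe, and that per-probe background is subtracted from every other well. The function name `subtract_negatives` and the well labels in the usage note are hypothetical. The sketch also assumes the negative wells themselves are dropped from the result.

```python
import statistics


def subtract_negatives(matrix, negative_wells):
    """Average the negative wells per probe and subtract that from every other well.

    matrix: dict mapping well -> dict mapping probe -> value.
    Assumption: the negative wells are removed from the returned data.
    """
    negatives = list(negative_wells)
    # Background level for each probe: mean over the negative wells
    background = {
        probe: statistics.fmean(matrix[w][probe] for w in negatives)
        for probe in matrix[negatives[0]]
    }
    return {
        well: {p: v - background[p] for p, v in row.items()}
        for well, row in matrix.items()
        if well not in negatives
    }
```

For example, if two water wells read 10 and 20 for mir-A, then 15 is subtracted from mir-A in every sample well.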


Negative well subtraction is carried out whenever negative control wells are present, unless there is a standard curve. With a standard curve, the negative wells are used only to help define a minimum detectable dose. No background subtraction is needed in that case, because the curve automatically compensates for the background.


Normalization

Normalization probes correct for well-to-well variability. Sources include different sample concentrations in different wells, different volumes due to pipetting imprecision, and other normal handling variation in sample preparation. See the normalization page. A normalization probe ideally corresponds to a housekeeping target that can be expected to vary in the same way among all samples. A variety of normalization styles are supported, including:

The default is to normalize using probes auto-selected by the geNorm algorithm. Both the geNorm implementation and the geometric mean style skip probes that have very low expression levels or whose signal is affected by hemolysis.
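The following sketches the core of geNorm as it is usually described. Each candidate probe gets a stability value M: the mean, over all other candidates, of the standard deviation of its log2 ratios across samples. The least stable probe (highest M) is then removed repeatedly. Two parts are hypothetical simplifications and not necessarily what this software does. The first is the fixed `n_keep` stopping rule; published geNorm uses a pairwise-variation criterion to choose how many probes to keep. The second is the `min_median` low-expression filter. Hemolysis-affected probes are assumed to be excluded by the caller.

```python
import math
import statistics


def genorm_m_values(data):
    """geNorm stability value M for each probe (lower M = more stable).

    data: dict mapping probe -> list of positive values, one per sample,
    in the same sample order for every probe.
    """
    m = {}
    for j in data:
        sds = []
        for k in data:
            if k == j:
                continue
            # Variation of the log2 expression ratio j/k across samples
            ratios = [math.log2(a / b) for a, b in zip(data[j], data[k])]
            sds.append(statistics.stdev(ratios))
        m[j] = statistics.fmean(sds)
    return m


def genorm_select(data, n_keep=2, min_median=None):
    """Repeatedly drop the least stable probe until n_keep probes remain.

    n_keep and min_median are hypothetical parameters for this sketch.
    """
    # Optionally skip probes with very low expression (hypothetical threshold)
    remaining = {
        p: v for p, v in data.items()
        if min_median is None or statistics.median(v) >= min_median
    }
    if n_keep < 2 or len(remaining) < 2:
        raise ValueError("need at least two candidate probes and n_keep >= 2")
    while len(remaining) > n_keep:
        m = genorm_m_values(remaining)
        del remaining[max(m, key=m.get)]
    return sorted(remaining)
```

In the usage below, probes A and B keep a constant ratio across samples, so their pairwise variation is zero. The erratic probe C has the highest M and is removed first.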

Normalization is applied whenever normalization probes are selected.

Whatever the normalization style, a synthetic probe is created by taking the geometric mean of the selected normalization probes in each well. This synthetic probe is then scaled to have a mean of 1.0 across wells. In each well, every probe is divided by the synthetic probe. Because the synthetic probe has mean 1.0, the normalized probes stay in the same numeric range as their original values.
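The normalization step can be sketched as below. It assumes all values are positive, so the geometric mean is defined. It also assumes that "mean 1.0" means the arithmetic mean of the synthetic probe across wells. The function name `normalize` and the data layout are hypothetical.

```python
import statistics


def normalize(matrix, norm_probes):
    """Divide every probe in each well by a synthetic normalization probe.

    matrix: dict mapping well -> dict mapping probe -> positive value.
    Returns (normalized matrix, synthetic probe per well).
    """
    # Synthetic probe: geometric mean of the selected normalization probes per well
    synthetic = {
        well: statistics.geometric_mean([row[p] for p in norm_probes])
        for well, row in matrix.items()
    }
    # Scale so the synthetic probe has mean 1.0 across wells (assumed arithmetic mean)
    scale = statistics.fmean(synthetic.values())
    synthetic = {well: s / scale for well, s in synthetic.items()}
    # Divide each probe in each well by that well's synthetic probe
    normalized = {
        well: {p: v / synthetic[well] for p, v in row.items()}
        for well, row in matrix.items()
    }
    return normalized, synthetic
```

Consider two wells whose normalization probes are uniformly four times higher in the second well, as if it received four times the sample. A target probe that is also four times higher there ends up with the same normalized value in both wells.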

Standardization


A standard curve for each probe is used to estimate the absolute concentration of each input value by interpolating it on the curve. See the standard curve page for more details. If the input MFI data is background subtracted (using negative control wells or negative control probes), the standard curve MFI data is also background subtracted so that the interpolation is consistent.
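The actual curve model is described on the standard curve page and is not reproduced here. As a simple stand-in, this sketch interpolates linearly between standard points in log-log space. It also shows the same background being subtracted from both the sample and the standards. The sketch assumes MFI increases with concentration. It returns `None` for values outside the range of the standards. The function `interpolate_concentration` and its arguments are hypothetical.

```python
import bisect
import math


def interpolate_concentration(sample_mfi, standards, background=0.0):
    """Estimate a concentration by log-log linear interpolation on a standard curve.

    standards: list of (concentration, mfi) pairs for one probe.
    The same background is subtracted from the sample and from the standards.
    Returns None when the sample lies outside the range of the standards.
    """
    # Background-subtract the standards; keep only points usable on a log scale
    curve = sorted(
        (m - background, c) for c, m in standards if m - background > 0 and c > 0
    )
    y = sample_mfi - background
    if len(curve) < 2 or y < curve[0][0] or y > curve[-1][0]:
        return None
    signals = [m for m, _ in curve]
    i = bisect.bisect_left(signals, y)
    if signals[i] == y:
        return curve[i][1]
    # Interpolate between the two bracketing standards in log-log space
    (m0, c0), (m1, c1) = curve[i - 1], curve[i]
    frac = (math.log10(y) - math.log10(m0)) / (math.log10(m1) - math.log10(m0))
    return 10 ** (math.log10(c0) + frac * (math.log10(c1) - math.log10(c0)))
```

Subtracting the background on both sides matters here. If only the sample were subtracted, every sample would be read against a curve sitting one background level too high.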

Synchronization

The data pipeline for an experiment is refreshed every time there is any change to its inputs, including the choice of blanks, negative wells, normalization probes, wells removed from the experiment, etc.   The displayed data is always consistent with the displayed settings.

Summary

To recap:

               Source                            Yields                             Usage
Blanks         Derived from one or a few probes  A background level for each well   Subtracted on a well-by-well basis from each probe
Negatives      Derived from a few wells          A background level for each probe  Subtracted on a probe-by-probe basis in every well
Normalization  Derived from one or a few probes  A reference probe with mean 1.0    Each probe in each well is divided by the reference probe in that well

See Also

Data Statistics
Normalization
Standard curve
