# Data statistics

In each sample, numerous particles carrying the same probe are detected. The signal levels from these particles are aggregated to form an estimate of the mean signal in the well, together with a confidence interval for that mean. The processing steps are:
• outlier removal
• quantile choice
• mean
• variance, standard deviation, standard error
• confidence interval

### Outlier removal

Occasionally a particle may carry the barcode of the probe but provide an invalid signal. For instance, the barcode might have been misread, so that the particle actually belongs to some other probe. The particle may have been folded or twisted while passing through the flow cell, so that the signal was not read properly. Two particles might have passed through together, confusing the particle-calling algorithm. Whatever the cause, such outliers can severely distort the statistics of the distribution. For instance, if the expected value for the probe is of order 50 ± 5 and one particle has a value of 9000, that single particle would outweigh the contributions of dozens of valid particles.

To remove outliers, a preliminary estimate of the mean and standard deviation is formed from the particles in the 25-75% quantile of the signal distribution. Each particle is then assigned a p-value for belonging to that distribution, based on its signal and the preliminary mean and standard deviation. Particles with p < 10% are considered outliers and removed from consideration. The probability threshold can be modified using the probe settings panel.
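The outlier test described above can be sketched as follows. This is a minimal illustration and not the software's actual implementation: the function name `remove_outliers`, the normal model, the two-sided p-value, and the conversion of the interquartile range to a standard deviation (IQR ≈ 1.349 σ for a normal distribution) are all assumptions made for the example.

```python
from statistics import NormalDist, mean, quantiles

def remove_outliers(signals, p_threshold=0.10):
    """Keep particles whose two-sided normal p-value is at least p_threshold."""
    # Quartile cut points of the signal distribution.
    q1, _, q3 = quantiles(signals, n=4, method="inclusive")
    # Preliminary mean from the particles inside the 25-75% quantile.
    core = [s for s in signals if q1 <= s <= q3]
    mu = mean(core)
    # Assumption: estimate sigma from the interquartile range (normal model).
    sigma = (q3 - q1) / 1.349
    if sigma == 0:
        return list(signals)  # no spread, so nothing can be tested
    reference = NormalDist(mu, sigma)
    kept = []
    for s in signals:
        # Two-sided p-value of a signal this far from the preliminary mean.
        p = 2 * (1 - reference.cdf(mu + abs(s - mu)))
        if p >= p_threshold:
            kept.append(s)
    return kept

signals = [45, 47, 48, 49, 50, 50, 51, 52, 53, 55, 9000]
kept = remove_outliers(signals)
```

Because the preliminary estimates come only from the central quantile, the particle at 9000 does not inflate the standard deviation used to judge it, so it is rejected cleanly.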

### Central quantile

After outlier removal, the remaining particles are ranked, and a quantile from which the statistics are computed is chosen. By default, this is the 25-75% quantile. It provides statistical estimates comparable to those obtained from the full 0-100% quantile, while adding a second layer of protection against extreme values. The quantile can be modified using the probe settings panel. It can be imagined as a slider between wide open (0-100%), which gives unbiased estimates of the mean and variance, and very tight (49-51%), which reduces the mean to essentially the median but gives a very poor estimate of the variance. The default setting of 25-75% gives a good balance between estimation accuracy and robustness to outliers.
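As a rough sketch of the ranking and selection step, the particles can be sorted and sliced by rank. The function name `central_quantile` and the rounding convention at the quantile edges are assumptions for illustration; the software may treat the boundaries differently.

```python
import math

def central_quantile(signals, lower=0.25, upper=0.75):
    """Return the ranked signals that fall between the lower and upper quantiles."""
    ranked = sorted(signals)
    n = len(ranked)
    # Assumption: round outward so that boundary particles are kept.
    start = math.floor(lower * n)
    stop = math.ceil(upper * n)
    return ranked[start:stop]

core = central_quantile([8, 1, 7, 2, 6, 3, 5, 4])  # default 25-75% quantile
```

Setting `lower=0.0` and `upper=1.0` corresponds to the wide-open slider position and returns every particle.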

### Mean signal

The mean is taken conventionally, as the arithmetic mean of the particle signals in the central quantile.

### Variance, standard deviation, standard error

The variance is computed conventionally from the RMS deviation of the particle signals in the central quantile about their mean. It is then corrected by a factor that accounts for the quantile width, to give an unbiased estimate of the population variance.

The standard deviation is the square root of the variance.

The standard error of the mean (SEM) is calculated as the standard deviation divided by the square root of the number of particles: SEM = stdDev / √n. The SEM estimates the standard deviation of the means of samples of size n chosen at random from the same population. In other words, it is an estimate of the standard deviation of the mean.
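The sketch below illustrates one way the mean, corrected variance, standard deviation and SEM could be computed from the central quantile. The text does not state how the correction factor is derived, so the example assumes normally distributed signals and uses the variance of a truncated normal distribution. The function names, and the choice of n as the number of particles inside the central quantile, are likewise assumptions.

```python
import math
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()

def variance_correction(lower=0.25, upper=0.75):
    """Population variance divided by central-quantile variance (normal model).

    Assumes 0 < lower < upper < 1, or the fully open 0-100% quantile.
    """
    if lower <= 0.0 and upper >= 1.0:
        return 1.0  # full distribution, no correction needed
    a = _STD_NORMAL.inv_cdf(lower)
    b = _STD_NORMAL.inv_cdf(upper)
    mass = upper - lower
    pdf_a, pdf_b = _STD_NORMAL.pdf(a), _STD_NORMAL.pdf(b)
    # Variance of a standard normal truncated to the interval [a, b].
    shift = (pdf_a - pdf_b) / mass
    truncated_var = 1 + (a * pdf_a - b * pdf_b) / mass - shift ** 2
    return 1 / truncated_var

def quantile_statistics(core, lower=0.25, upper=0.75):
    """Return (mean, variance, standard deviation, SEM) for the central quantile."""
    n = len(core)
    mu = mean(core)
    raw_var = sum((s - mu) ** 2 for s in core) / n  # squared RMS deviation
    var = raw_var * variance_correction(lower, upper)
    std = math.sqrt(var)
    sem = std / math.sqrt(n)  # assumption: n counts particles in the quantile
    return mu, var, std, sem

mu, var, std, sem = quantile_statistics([48, 49, 50, 51, 52])
```

Under the normal assumption the factor for the default 25-75% quantile is roughly 7, which shows how strongly the raw spread of the central quantile underestimates the population variance.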

### Confidence interval

The confidence interval is computed as the 95% interval around the mean, using n_s standard errors of the mean:

CI = μ ± n_s · SEM

Student's t statistics are used to calculate the appropriate n_s. In the limit of a large number of particles, n_s converges to 1.96, the same as for normal statistics. When the number of particles is small, the number of standard errors needed to form a confident estimate of the mean increases: for 4 particles n_s is 3.2, and with just 3 particles n_s is 4.3.

Given the set of particles measured, the mean signal in the well has a 95% likelihood of falling inside the confidence interval.
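The multiplier for the confidence interval can be reproduced with a short standard-library sketch. It assumes n − 1 degrees of freedom for n particles and finds the two-sided Student's t critical value by numerically integrating the t density and bisecting. The function names are hypothetical, and a statistics library would normally be used in place of the hand-written integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """CDF for t >= 0, using Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(n, level=0.95):
    """Two-sided critical value for n particles (n >= 2), by bisection."""
    target = 0.5 + level / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, n - 1) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(mean, sem, n, level=0.95):
    """Return (lower, upper) bounds of the interval mean ± n_s · SEM."""
    n_s = t_critical(n, level)
    return mean - n_s * sem, mean + n_s * sem

low, high = confidence_interval(50.0, 1.0, 4)
```

For 4 particles this gives a multiplier of about 3.18 and for 3 particles about 4.30, matching the rounded values quoted above. For large n the multiplier approaches 1.96.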