Advanced | Help | Encyclopedia
Directory


Algorithms for calculating variance

In statistics, a formula for calculating the variance of a population of size n is:

<math>\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 – (\sum_{i=1}^{n} x_i)^2}{n^2}.<math>

A formula for calculating the unbiased estimation of the population variance from n finite samples is:

<math>\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 – (\sum_{i=1}^{n} x_i)^2}{n(n-1)}.<math>

The method of calculation may be more easily understood from the table below where the mean is 8.

i xi xi − mean (xi − mean)2
(index) (datum) (deviation) (squared deviation)
1 5 −3 9
2 7 −1 1
3 8 0 0
4 10 2 4
5 10 2 4
n = 5 sum = 40 0 18
  • mean = 40/5 = 8
  • variance = (5 · 338 − 402)/(5 · 4) = 4.5
  • standard deviation = <math>\sqrt{\mathrm{Variance}} = 2.12<math>

Note: Details of the variance calculation:

338 = [52 + 72 + 82 + 102 + 102]
40 = [5 + 7 + 8 + 10 + 10]

Algorithm

Therefore a simple algorithm to calculate variance can be described by the following pseudocode:

long n = 0;
double sum = 0;
double sum_sqr = 0;
double variance;

foreach x in data:
  n += 1;
  sum += x;
  sum_sqr += x*x;
end for

variance = (sum_sqr – sum*sum/n)/(n-1);

Algorithm

Another algorithm which avoids large numbers in sum_sqr while summing up

double avg = 0;
double var = 0;
long n = data.length; // number of elements
// the array is 1-indexed

for i = 1 to n
 avg = (avg*(i-1) + data[i]) / i;
 var = (var * (i – 1) + (data[i] – avg)*(data[i] – avg)) / i;
end for

return var; // resulting variance







Links: Addme | Keyword Research | Paid Inclusion | Femail | Software | Completive Intelligence

Add URL | About Slider | FREE Slider Toolbar - Simply Amazing
Copyright © 2000-2008 Slider.com. All rights reserved.
Content is distributed under the GNU Free Documentation License.