Is there an intuitive explanation for the difference between standard deviation and sample standard deviation?

Answer by Edwin Chen:

Suppose you have a single datapoint x from a population P. What's the variance of P?

Well, if the whole population P consists solely of the single datapoint x (i.e., P = {x}), then the population variance formula correctly matches our intuition that there is zero variance.

But what if x is only a sample from P, i.e., P actually consists of many more unknown points? It's very unlikely that these other points all equal x, so if we used the population variance formula to estimate P's variance (giving an estimate of 0), we'd be underestimating the actual variance. Thus, we need to define a sample variance formula for when we don't have the entire population in hand. How should we define this formula? If we accept that the formula should have the form
[math] \frac{1}{N - \alpha}\sum(X_i - \bar{X})^2 [/math]
for some constant [math] \alpha [/math] (which is natural, since it's the simplest form that ensures the sample variance approaches the population variance as [math] N [/math] grows), then the only choice of [math] \alpha [/math] that keeps the sample variance from being zero in the case of a single datapoint is [math] \alpha = 1 [/math]. Admittedly, that choice makes the result undefined (0/0) rather than positive, but perhaps that's exactly what we want, since a single sample point gives us no information about variance.
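
To make this concrete, here is a minimal Python sketch of the formula above. The names `variance_with_alpha` and `average_estimate`, the toy population 0 through 9, and the choice to return `None` for the 0/0 case are all assumptions made for illustration; they are not part of the original answer. The second function gives a rough check of the underestimation claim by averaging each estimate over many small random samples.

```python
import random


def variance_with_alpha(data, alpha):
    """Return sum((x - mean)^2) / (N - alpha), or None when N - alpha == 0."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    denominator = n - alpha
    if denominator == 0:
        # With a single datapoint and alpha = 1 this is 0/0: undefined.
        return None
    return squared_deviations / denominator


def average_estimate(population, sample_size, alpha, trials=20000, seed=0):
    """Average the estimate over many random samples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(sample_size)]
        total += variance_with_alpha(sample, alpha)
    return total / trials


if __name__ == "__main__":
    # A single datapoint: alpha = 0 claims zero variance, alpha = 1 refuses to answer.
    print(variance_with_alpha([5.0], alpha=0))  # 0.0
    print(variance_with_alpha([5.0], alpha=1))  # None

    # Hypothetical toy population, used only for this illustration.
    population = list(range(10))
    print(variance_with_alpha(population, alpha=0))  # 8.25 (true population variance)

    # Averages over many samples of size 2 (exact values depend on the seed).
    print(average_estimate(population, 2, alpha=0))
    print(average_estimate(population, 2, alpha=1))
```

If the sketch behaves as expected, the [math] \alpha = 0 [/math] average for samples of size 2 should land near half the true variance, while the [math] \alpha = 1 [/math] average should land near the true variance itself.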
