Answer by Edwin Chen:

Suppose you want to summarize the 50 numbers you received (about the appropriate age to be on a social networking site) with a single number. How do you figure out the "best" number to use? Well, you need an objective metric in order to choose the best number, in this case a metric that measures the error in using your number to approximate the others. Here are some possible error metrics (let your number be x and let the other numbers be [math] d_1, \ldots, d_{50} [/math]):

- One-zero error: For each of the [math] d_i [/math], there's a penalty of 1 if [math] d_i \neq x [/math] and no penalty if [math] d_i = x [/math]. In other words, all you care about is whether [math] x = d_i [/math] or not; you don't care how far off x is when they differ.
- Absolute error: Your error is the sum of all the [math] |d_i - x| [/math]. Thus, a single error of 5 (say x = 0, and one of the answers you got was 5) is just as bad as five errors of 1 (say x = 0, and five of the answers you got were 1).
- Squared error: Your error is the sum of all the [math] (d_i - x)^2 [/math]. Thus, a single error of 5 (which costs 25) is much worse than five errors of 1 (which cost 5 in total).

It turns out that if you use absolute error as your metric, the best number to choose (the number that minimizes absolute error) is the median. (In other words, the median minimizes L1 error.)
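
To make the metrics concrete, here is a minimal Python sketch. The function names and the `ages` list are made up for illustration (they are not the real survey answers), and the search only tries whole-number candidates. It reproduces the "one error of 5 versus five errors of 1" comparison and then checks by brute force that the median gives the smallest absolute error on this sample.

```python
from statistics import median

def zero_one_error(x, data):
    """Penalty of 1 for every answer that differs from x."""
    return sum(1 for d in data if d != x)

def absolute_error(x, data):
    """Sum of |d_i - x| over all answers."""
    return sum(abs(d - x) for d in data)

def squared_error(x, data):
    """Sum of (d_i - x)^2 over all answers."""
    return sum((d - x) ** 2 for d in data)

# One error of 5 versus five errors of 1, both measured from x = 0.
one_big = [5]
five_small = [1, 1, 1, 1, 1]
print(absolute_error(0, one_big), absolute_error(0, five_small))  # 5 5
print(squared_error(0, one_big), squared_error(0, five_small))    # 25 5

# Hypothetical survey answers (illustration only).
ages = [13, 13, 13, 14, 16, 18, 18, 21, 36]

# Try every whole-number candidate and keep the one with the least absolute error.
best_abs = min(range(10, 41), key=lambda x: absolute_error(x, ages))
print(best_abs, median(ages))  # 16 16
```

With an even number of answers (such as 50), every value between the two middle answers ties for the smallest absolute error, and the usual median is one of those values.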

Similarly, the mean minimizes squared error (which is why Neil said the mean is more affected by outliers than the median), and the mode minimizes one-zero error.
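
The same kind of brute-force check works for the mean and the mode. This is only a sketch: the `ages` list is hypothetical, the helper names are mine, and only whole-number candidates are tried. The sample is chosen so that the mean, median, and mode all differ, with a single large answer (36) pulling the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical survey answers (illustration only); 36 is the outlier.
ages = [13, 13, 13, 14, 16, 18, 18, 21, 36]

def squared_error(x, data):
    """Sum of (d_i - x)^2 over all answers."""
    return sum((d - x) ** 2 for d in data)

def zero_one_error(x, data):
    """Penalty of 1 for every answer that differs from x."""
    return sum(1 for d in data if d != x)

candidates = range(10, 41)  # whole-number guesses from 10 to 40

# The candidate with the least squared error matches the mean.
best_sq = min(candidates, key=lambda x: squared_error(x, ages))
print(best_sq == mean(ages))  # True

# The candidate with the least one-zero error matches the mode.
best_01 = min(candidates, key=lambda x: zero_one_error(x, ages))
print(best_01 == mode(ages))  # True

# The outlier drags the mean above the median.
print(mean(ages) > median(ages))  # True
```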

What does the median show about data?