Answer by Jerry Coffin:
Let's consider a really simple case: a least squares linear regression.
The most obvious method would probably be to simply minimize the sum of the absolute values of the errors.
This, however, can lead to some problems. Let's consider nearly the simplest possible case, with just three points: A, B, and C.
Here we'll assume the line needs to start at point A, and we're only going to choose what angle to draw it at from there. Unfortunately, since C is farther away from A than B is, the error from B to the line will grow more slowly than the error from C to the line as we change the angle. Therefore, if we just minimize the absolute error, we'll draw the line from A directly through C, and the data for point B will effectively be ignored.
Depending on the exact angles formed, we can end up with either of two situations. Either one point's data is ignored completely and the other is treated as the only one that really matters, or else (for a few specific cases) the two are treated as precisely equal in importance, and we can draw a line through one, or the other, or at any angle in between, and all of them will come out as an equally good fit.
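To make that concrete, here's a minimal sketch that brute-forces the angle. The coordinates for B and C are made up for illustration (they're not taken from any figure), A is put at the origin, and "error" is assumed to mean the perpendicular distance from a point to the line; `distance_to_line` and `best_angle` are just helper names invented for this example.

```python
import math

# Hypothetical coordinates chosen for illustration; A is the origin.
B = (1.0, 1.0)   # the nearer point, at 45 degrees from A
C = (4.0, 1.0)   # the farther point, at about 14 degrees from A

def distance_to_line(point, theta):
    """Perpendicular distance from point to the line through A at angle theta (radians)."""
    x, y = point
    return abs(x * math.sin(theta) - y * math.cos(theta))

def best_angle(cost, steps=90_000):
    """Brute-force search over [0, 90] degrees for the angle with the lowest cost."""
    candidates = (math.radians(90.0 * i / steps) for i in range(steps + 1))
    return min(candidates, key=cost)

def abs_cost(theta):
    """Sum of the absolute errors for B and C."""
    return distance_to_line(B, theta) + distance_to_line(C, theta)

theta_abs = best_angle(abs_cost)
angle_abs_deg = math.degrees(theta_abs)
angle_C_deg = math.degrees(math.atan2(C[1], C[0]))

print("best angle, absolute error:", angle_abs_deg)
print("angle of the line through C:", angle_C_deg)
print("error left at B:", distance_to_line(B, theta_abs))
print("error left at C:", distance_to_line(C, theta_abs))
```

With these assumed points, the search settles (to within the grid step) on the angle that runs straight through C, leaving essentially all of the remaining error at B.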
Squaring the errors eliminates these problems. The square of an error grows faster than the error itself, so one large error costs more than the same total error split into several smaller ones. Minimizing the sum of the squares therefore tends to spread the error across the points rather than pile all of it onto one. In the case above, the line would pass between B and C, so both points influence the result instead of one being ignored.
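Here's the same kind of brute-force sketch with the errors squared. As before, the coordinates are hypothetical, A is the origin, "error" is assumed to be the perpendicular distance to the line, and the helper names are invented for this example.

```python
import math

# Hypothetical coordinates chosen for illustration; A is the origin.
B = (1.0, 1.0)   # the nearer point
C = (4.0, 1.0)   # the farther point

def distance_to_line(point, theta):
    """Perpendicular distance from point to the line through A at angle theta (radians)."""
    x, y = point
    return abs(x * math.sin(theta) - y * math.cos(theta))

def best_angle(cost, steps=90_000):
    """Brute-force search over [0, 90] degrees for the angle with the lowest cost."""
    candidates = (math.radians(90.0 * i / steps) for i in range(steps + 1))
    return min(candidates, key=cost)

def squared_cost(theta):
    """Sum of the squared errors for B and C."""
    return distance_to_line(B, theta) ** 2 + distance_to_line(C, theta) ** 2

theta_sq = best_angle(squared_cost)
angle_sq_deg = math.degrees(theta_sq)
angle_B_deg = math.degrees(math.atan2(B[1], B[0]))
angle_C_deg = math.degrees(math.atan2(C[1], C[0]))
dist_B = distance_to_line(B, theta_sq)
dist_C = distance_to_line(C, theta_sq)

print("angle through B:", angle_B_deg)
print("angle through C:", angle_C_deg)
print("best angle, squared error:", angle_sq_deg)
print("error left at B:", dist_B)
print("error left at C:", dist_C)
```

With these assumed points, the chosen angle falls strictly between the two and both errors are nonzero, so neither point is ignored. Under this particular error measure the two distances don't have to come out exactly equal; the farther point still pulls somewhat harder.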