
Sunday, May 02, 2004

Counting crowds 

A fun article on counting crowds, like the recent pro-choice march in DC. Typically, after a big protest one sees the organizers' (high) estimate of attendees and the police's (low) estimate. A good instinct is to guess that the truth is somewhere in between (but where?); a better instinct is to actually try to measure the size in some verifiable way (e.g., aerial photography).

Two fun things about the article: an amusing (and rather accurate) metaphorical use of the word "vector", and a nice, gentle discussion of bias and uncertainty in estimates. I recently came across a great line, to the effect that a "conservative estimate is always biased by definition". Suppose you had two techniques for estimating crowd size: a "conservative" one that tends to lowball, and an unbiased one. If you used these techniques repeatedly on a series of protests, the conservative one would on average be off the mark (too low), while the unbiased one would, on average, be just right.
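To make "biased on average" concrete, here is a minimal simulation sketch. Both estimators are hypothetical: the "conservative" one is assumed to report about 80% of the true size plus some noise, and the unbiased one is assumed to be right on average but noisier. Averaging each method's relative error over many simulated protests exposes the systematic lowball.

```python
import random
from statistics import mean


def conservative_estimate(true_size, rng):
    # Hypothetical lowballing method: about 80% of the truth, plus a little noise.
    return 0.8 * true_size + rng.gauss(0, 0.05 * true_size)


def unbiased_estimate(true_size, rng):
    # Hypothetical unbiased method: right on average, but noisier.
    return true_size + rng.gauss(0, 0.10 * true_size)


def mean_relative_error(method, true_sizes, seed=0):
    """Average of (estimate - truth) / truth over many protests: the method's relative bias."""
    rng = random.Random(seed)
    return mean((method(n, rng) - n) / n for n in true_sizes)


if __name__ == "__main__":
    # A made-up series of protests ranging from 10,000 to 1,000,000 people.
    sizes_rng = random.Random(42)
    protests = [sizes_rng.uniform(10_000, 1_000_000) for _ in range(10_000)]
    print("conservative:", round(mean_relative_error(conservative_estimate, protests), 3))
    print("unbiased:    ", round(mean_relative_error(unbiased_estimate, protests), 3))
```

The conservative method's average error settles near -0.2 (20% too low) no matter how many protests you add, while the unbiased method's average error shrinks toward zero.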

Bias is about accuracy. A totally separate question is precision: how much do you miss by on average? The less biased estimate is not necessarily the more precise one. To take an extreme example, compare a stopped clock to one that is two minutes fast. The stopped clock is unbiased (right on average, and exactly right twice a day) but misses by 3 hours on average, and by as much as 6. The fast clock is biased (in fact, its estimate is always wrong) but is off by only 2 minutes. Clearly, the fast clock is better in this case, but in general the quality of an estimation technique depends on both bias and precision (statisticians use a concept called mean squared error to combine the two).
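The clock comparison can be checked directly. This sketch assumes a 12-hour dial, measures each clock's signed error (reading minus true time, wrapped onto the dial) at the middle of every minute, and reports bias, mean absolute error, and mean squared error, using the standard identity MSE = bias² + variance. The stopped clock is assumed, for illustration, to be stuck at 12:00.

```python
from statistics import mean

DIAL = 12 * 60  # minutes on a 12-hour clock face


def wrap(minutes):
    """Map a difference in minutes onto the dial's range [-360, 360)."""
    return (minutes + DIAL / 2) % DIAL - DIAL / 2


def clock_errors(reading):
    """Signed error (reading - truth) at the middle of every minute on the dial.

    `reading` maps the true time (in minutes) to what the clock shows.
    """
    true_times = [k + 0.5 for k in range(DIAL)]
    return [wrap(reading(t) - t) for t in true_times]


def summarize(errors):
    bias = mean(errors)                    # accuracy: average signed miss
    mae = mean(abs(e) for e in errors)     # average size of the miss
    mse = mean(e * e for e in errors)      # combines bias and spread
    variance = mse - bias ** 2             # MSE = bias^2 + variance
    return bias, mae, mse, variance


def stopped(t):
    return 0.0        # hypothetical: stuck at 12:00


def fast(t):
    return t + 2.0    # always two minutes fast


if __name__ == "__main__":
    for name, clock in [("stopped", stopped), ("fast", fast)]:
        bias, mae, mse, var = summarize(clock_errors(clock))
        print(f"{name}: bias={bias:.1f} min, mean abs error={mae:.1f} min, MSE={mse:.1f} min^2")
```

The stopped clock comes out with zero bias but a mean absolute error of 180 minutes (3 hours) and an MSE of roughly 43,200 square minutes, all of it variance. The fast clock's MSE is just 4 square minutes, all of it bias, so by mean squared error it wins easily.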

So be wary of "conservative" estimates, but don't be so quick to assume that an unbiased estimate is great either. Another problem with speaking of "conservative" estimates is that which direction counts as conservative is up in the air. The implicit concept is caution, but that could cut different ways for different folks, even with the same estimate. If you are an abortion rights activist trying to figure out how much support you have, the "conservative" estimate is the lower bound. But if you are a pro-lifer gauging the strength of the opposition, the conservative estimate is the upper bound. Perhaps we should replace the phrase "conservative estimate" with "cautious estimate", since that would make people think a bit harder about what they mean. Better yet, report the mean estimate, the upper bound, and the lower bound, and let people use the information to answer their own questions.
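Here is a small sketch of "report all three numbers and let readers choose". It assumes several independent, unbiased estimates of the same crowd are available and builds a rough normal-approximation interval around their mean; that interval method, the names `CrowdReport`, `report`, and `cautious_figure`, and the `avoid` parameter are all hypothetical choices for illustration. The same report then yields a different "cautious" figure depending on which mistake the reader most wants to avoid.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class CrowdReport:
    center: float
    lower: float
    upper: float


def report(estimates, z=1.96):
    """Mean of independent estimates plus a rough normal-approximation interval.

    Assumes the estimates are unbiased and independent; a biased method
    shifts the whole interval, and averaging more estimates won't fix that.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to gauge spread")
    m = statistics.mean(estimates)
    half_width = z * statistics.stdev(estimates) / math.sqrt(len(estimates))
    return CrowdReport(m, m - half_width, m + half_width)


def cautious_figure(r, avoid):
    """Pick the bound that guards against the mistake this reader fears most."""
    if avoid == "overstating":    # e.g. organizers not wanting to overrate their support
        return r.lower
    if avoid == "understating":   # e.g. opponents not wanting to underrate the other side
        return r.upper
    raise ValueError("avoid must be 'overstating' or 'understating'")
```

For example, `report([90_000, 100_000, 110_000])` centers on 100,000, and `cautious_figure` hands the activist the lower bound and the opponent the upper bound from the very same report.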