Flightstats.com is a website that reports the on-time performance of individual airline flights. If you look up, oh, say, USAirways Flight 464, you’ll find this assessment:
Now the puzzle: How, exactly, does one go about controlling for standard deviation and mean?
Hat tip to Michael Lugo at God Plays Dice
That’s an enigma wrapped up in a conundrum wrapped up in the original enigma!
Presumably they meant “taking into account” rather than “controlling for”. The way they worded it actually makes me think that they had the right idea, since using mean and std. deviation is exactly how you would figure out whether your number was 90% higher than the other data. Someone just sleepy at the keyboard I guess.
The abuse of statistics in marketing is remarkably widespread. I’d say 83% of companies report statistics incorrectly. When controlling for sample size, mean and deviation, it’s more like 100%.
In all seriousness, it sounds like they’re attempting to report numbers from a logistic regression.
Presumably they mean that when you convert it to unitless form, and look at how many sds they are from the mean, and do the same for other flights, then you see the flight is “on time” (by whatever criteria) more than 90% of other flights are on time. But it sure did make me shake my head and go whaaaaaaa!
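To make the z-score reading concrete, here is a minimal sketch in Python. The on-time rates are made up for illustration (they are not Flightstats data), and `percentile_rank` and `standardize` are hypothetical helper names. It converts each flight's on-time rate to standard-deviation units and then asks what share of flights sits below the one we care about.

```python
from statistics import mean, pstdev

def percentile_rank(values, x):
    """Fraction of values strictly below x."""
    return sum(v < x for v in values) / len(values)

def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical on-time rates for ten flights (illustrative only).
rates = [0.62, 0.70, 0.74, 0.77, 0.79, 0.80, 0.81, 0.82, 0.83, 0.90]
flight = 0.83

z = standardize(rates)
z_flight = z[rates.index(flight)]

raw_rank = percentile_rank(rates, flight)  # rank on the raw rates
z_rank = percentile_rank(z, z_flight)      # rank after standardizing
print(raw_rank, z_rank)
```

Standardizing subtracts a constant and divides by a positive constant, so it preserves the ordering of the flights. The two ranks therefore agree, which suggests that "controlling for" the mean and standard deviation adds nothing to a plain percentile claim.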
Shorter form: we’re late almost 20% of the time. Compared to most flights that’s pretty good!
I’d have assumed they were trying to hand-wavingly describe something like a Bayesian sort:
http://andrewgelman.com/2007/03/21/bayesian_sortin/
If every flight we have records for is late at least 1% of the time, and then a new flight starts up which is on time for its maiden voyage, then our best explanation of that is not “this flight is on-time more often than 100% of other flights”, it’s “this flight comes from the same prior as the others, so despite being a little lucky on its first trip, its frequentist on-time probability will probably regress back towards the prior probability once we have more data points”.
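A rough sketch of that shrinkage idea, assuming a Beta prior on each flight's on-time probability. This is one common way to do it, not necessarily what the linked post or Flightstats does. The prior mean (0.80), the prior strength (20 pseudo-flights), and the function name `shrunk_on_time_rate` are all invented for illustration.

```python
def shrunk_on_time_rate(on_time, flights, prior_mean, prior_strength):
    """Posterior mean of the on-time probability under a Beta prior.

    prior_mean and prior_strength are assumptions: the prior behaves like
    prior_strength earlier flights with an on-time rate of prior_mean.
    """
    a = prior_mean * prior_strength        # prior "on-time" pseudo-count
    b = (1 - prior_mean) * prior_strength  # prior "late" pseudo-count
    return (a + on_time) / (a + b + flights)

# A brand-new flight that was on time for its only trip.
new_flight = shrunk_on_time_rate(1, 1, prior_mean=0.80, prior_strength=20)

# An established flight with 830 on-time arrivals out of 1000.
veteran = shrunk_on_time_rate(830, 1000, prior_mean=0.80, prior_strength=20)

print(round(new_flight, 3), round(veteran, 3))
```

With one data point the estimate barely moves from the prior, so the new flight does not jump to the top of the ranking. With a thousand data points the flight's own record dominates.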
Why use fancy pants statistics in the first place? Why not just say this flight is on time more often than 90% of flights?
@Dave Smith 6:
This is just a guess, but I suspect it’s because it isn’t so! They are probably counting “on time” as within some range after scheduled time. And that range might vary by flight. The first rule of statistics is: if he hides the data, he’s lying. They have hidden the data.
Controlling for standard deviation and mean, this flight never actually flies
I am more curious about how they controlled for sample size.
Impressive that it’s statistically more often rather than, you know, non-statistically more often. I hate when they don’t specify that.
@Ken B #7, I think you are correct. “On time” means within 15 minutes of scheduled arrival time, and is only measured by 15 airlines at 29 airports. See http://www.rita.dot.gov/bts/help/aviation/index.html
It doesn’t matter, because “on time” refers to departure, not arrival. Since the plane never leaves early, the only thing that really concerns the customer is arrival time. That’s not what is being measured.
On-time departure may simply mean that you’re burning fuel, flying in circles waiting for a landing slot – hardly optimal for anyone involved.
OLS regression?
Maybe you could say that this flight has a score of 83% and this score corresponds to the 90th percentile of all flights that they have data for? Can we say that the on-time frequency of all the flights is normally distributed?
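If one did assume normality (a big if, since on-time rates are bounded between 0 and 1), the mean and standard deviation would be enough to turn a score into a percentile. Here is a small sketch using Python's `statistics.NormalDist`. The mean of 0.75 and standard deviation of 0.06 are invented numbers, not anything Flightstats has published.

```python
from statistics import NormalDist

# z-score that marks the 90th percentile of a standard normal (about 1.28).
z_90 = NormalDist().inv_cdf(0.90)

# Hypothetical distribution of on-time rates across all flights.
assumed_mean = 0.75
assumed_sd = 0.06
flight_rate = 0.83

# How many standard deviations above the mean is this flight?
z = (flight_rate - assumed_mean) / assumed_sd

# Share of flights expected to fall below it, under the normal assumption.
share_below = NormalDist(assumed_mean, assumed_sd).cdf(flight_rate)

print(round(z_90, 2), round(z, 2), round(share_below, 3))
```

Under these made-up numbers the flight lands a little above the 90th percentile. That is using the mean and standard deviation rather than controlling for them.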
We’re still not controlling for mean and standard deviation. I’m not sure if this is a math problem or a language problem.