Basic Statistics for Writers, Editors and Journalists
By Jason Gillikin | July 24, 2011
There’s an old joke to the effect that if writers could do math, they’d be accountants. Maybe yes, maybe no, but one thing is for certain — any writing professional who fails to understand some very basic statistical concepts works at a distinct disadvantage.
When a source throws numbers at you, and you don’t really understand what she means, the temptation is to simply nod and regurgitate. That tendency isn’t good form, but it is consistent with human nature. Yet statistical competency can help writers cut through the fog that too many sources, particularly in government, blow over a difficult story. The bottom line is that you have to understand how to calculate the bottom line.
Herewith are some very basic pointers to refresh your statistical thinking:
- Always track your statistics back to the original source. Never rely on someone else’s interpretation, even (or especially) if the source is an advocacy group. Remember the claim that domestic violence spikes on Super Bowl Sunday? It’s a myth propagated by lazy journalists who didn’t do their homework properly.
- If you can, visualize the data. Get the raw numbers and plug them into a histogram. This will let you see whether the data set roughly fits the bell curve. If it doesn’t, then the value of most descriptive statistics diminishes significantly. For example, at my hospital, there is a spike in patient discharges at 10 a.m. and another at 2 p.m. Yet saying we have an “average” discharge time of noon, although mathematically correct, would be seriously misleading, because there are relatively few discharges at noon. It’s better to understand that the discharge pattern is a “bi-modal distribution” (a curve with two humps instead of the bell curve’s one) and explain your data accordingly.
- Learn how to use the three measures of central tendency — mean, median and mode. The mean is the mathematical average of the data points within a given set. A trimmed mean has the top and bottom X percent removed, to reduce the influence of outliers. The median is the middle number in the set when all items are sorted from smallest to largest. The mode is the number that appears most frequently.
- Understand the usefulness of the standard deviation. This measure helps identify the “spread” of a bell curve, or how tightly packed the data are relative to the mean. A normal (bell) curve behaves in certain, predictable ways, often summarized as the empirical rule. For example, about 68 percent of all data values under the normal curve fall within a single standard deviation of the mean; about 95.5 percent fall within two standard deviations. So, if you know the mean of a data set and its standard deviation, you can get a sense of how much of an outlier any other data point may be. For example, if the height of an average adult male is 72 inches with a standard deviation of 3 inches, then you know that someone standing at 78 inches (2.0 standard deviations above the mean) will be in roughly the top 2.3 percent of men for his height.
- Get your percentages right. To calculate the percent change in something, subtract the old value from the new value, divide by the old value, and multiply by 100. For example, if the population of Paris dropped to 1.8 million from a high of 2.2 million, this represents a loss of 18.2 percent of cheese-eating surrender monkeys.
- Learn how to calculate a rate. In simplest terms, a rate is the frequency of incidence per X number of opportunities. Crime is typically expressed as a rate; news stories often trumpet burglary rates of, say, 15 per every 100,000 residents. To calculate it, divide the number of occurrences by the population and then multiply by the base you want to report against (100,000 in this case). For example, if Detroit with a population of 750,000 experienced 125 burglaries last year, the rate is 16.7 per 100,000 people. Rates can be compared against each other, which makes them a useful way of contrasting across population or geographic clusters.
- Master the art of sampling. Any reputable poll or survey reports a confidence level. This is typically 95 percent, but sometimes it can be 99 percent. The confidence level tells you how often you would get basically the same results if you repeated the survey; on the flip side, a poll with a 95 percent confidence level means you will get abnormal data one time in every 20. Such is life. Don’t confuse the confidence level with the margin of error, either. The MoE tells you the plus-or-minus spread of a given survey; at the 95 percent level you can approximate it with the formula of 1 divided by the square root of the number of people in the sample. The range you get by adding and subtracting the MoE is the confidence interval. So putting it together: If you randomly surveyed 2,500 people about whether they preferred boxers or briefs, and 55 percent said they preferred boxers, then you would have 55 percent for boxers with a margin of error of 2.0 percentage points. At a 95 percent confidence level, 19 of 20 times that you repeat this survey you should expect 53 to 57 percent of respondents to share their love of boxers and 43 to 47 percent to laud their briefs. And the 20th time? Who knows. That’s why smart money says you shouldn’t put too much faith in a single aberrant poll.
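If you would rather let a script do the arithmetic, the pointers on central tendency and the standard deviation can be sketched in a few lines of Python. This is only an illustration: the data set is made up, trimmed_mean is a hypothetical helper written for this example, and the height calculation assumes heights really do follow a normal curve.

```python
import statistics
from statistics import NormalDist


def trimmed_mean(values, trim_fraction):
    """Mean after dropping the lowest and highest trim_fraction of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # points to cut from each end
    return statistics.mean(ordered[k:len(ordered) - k])


# Made-up data set with one large outlier (60).
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 60]

mean_value = statistics.mean(values)      # 10, dragged upward by the outlier
trimmed = trimmed_mean(values, 0.10)      # 4.75, with the 2 and the 60 removed
median_value = statistics.median(values)  # 5.0
modes = statistics.multimode(values)      # [5]
spread = statistics.stdev(values)         # sample standard deviation

# The height example: mean 72 inches, standard deviation 3 inches,
# assuming a normal curve.
heights = NormalDist(mu=72, sigma=3)
z_score = (78 - 72) / 3                   # 2.0 standard deviations above the mean
share_taller = 1 - heights.cdf(78)        # about 0.023
within_one_sd = heights.cdf(75) - heights.cdf(69)  # about 0.68

print(mean_value, trimmed, median_value, modes)
print(f"{share_taller:.1%} of men would be taller than 78 inches")
```

Note how far the plain mean (10) sits from the median and mode (both 5). One outlier is enough to do that, which is the case for checking all three measures before quoting an "average."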
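The margin-of-error shortcut from the sampling pointer can be compared against the fuller textbook formula for a proportion from a simple random sample. The sketch below is an illustration under that assumption; real polls often apply weighting that widens the margin. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def quick_margin(n):
    """Rule-of-thumb margin of error at roughly the 95 percent level."""
    return 1 / math.sqrt(n)


def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion p from a simple random sample of n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    return z * math.sqrt(p * (1 - p) / n)


# The boxers-or-briefs survey: 2,500 respondents, 55 percent prefer boxers.
n, p = 2500, 0.55
rough = quick_margin(n)          # 0.02, or 2 percentage points
exact = margin_of_error(p, n)    # slightly under 0.02
low, high = p - rough, p + rough

print(f"Rule of thumb: +/- {rough:.1%}; fuller formula: +/- {exact:.2%}")
print(f"Boxers: {low:.0%} to {high:.0%}")
```

The shortcut works because the fuller formula tops out near 1 divided by the square root of n when the split is close to 50-50. For lopsided results, or for a 99 percent confidence level, the two will diverge.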




