There are many results from scientific studies thrown around in the world of Behavioral Economics and the other behavioral sciences, especially among practitioners. But a published scientific finding is not the same thing as a sound result that replicates and that practitioners can use successfully. I’ll go into detail below, but in short, published research is biased in these ways:
- “sexy” results get published more readily than more important but unsexy ones
- research documenting a finding (“we compared x and y, and found x to be much superior to y in…”) is far more likely to be published than the same study finding no effect (“we compared x and y, and found no statistically significant difference between them regarding…”)
- research that successfully replicates an existing study has a hard time getting published; research that fails to replicate one is published very, very rarely
- research relying on very weak measures of statistical validity is almost the norm, compared to studies relying on sound ones
- research based on a single sample pool of minuscule size still gets published despite having essentially zero validity
- research from big names at Western universities is far more likely to be published than stronger research done elsewhere; outside the US and Western Europe, getting published is much more difficult
For those interested, more details follow, along with my blog posts on the same subject.
Blog posts on research-study weaknesses from the Marktisans blog
“Why Most Published Research Findings Are False”
The above is the title of a 2005 paper1 by John Ioannidis, now at the Stanford School of Medicine. Beyond the papers themselves, his work is very well summarized in this profile from The Atlantic. It boils down to this:
He and his team have shown, again and again, and in many different ways, that much of what biomedical researchers conclude in published studies—conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain—is misleading, exaggerated, and often flat-out wrong. He charges that as much as 90 percent of the published medical information that doctors rely on is flawed. His work has been widely accepted by the medical community; it has been published in the field’s top journals, where it is heavily cited; and he is a big draw at conferences. — The Atlantic
And it’s not just obscure studies that are false, but a large share of the most cited and most used results in medical science. From The Atlantic again (drawing on another of his papers2):
He zoomed in on 49 of the most highly regarded research findings in medicine over the previous 13 years, as judged by the science community’s two standard measures: the papers had appeared in the journals most widely cited in research articles, and the 49 articles themselves were the most widely cited articles in these journals. These were articles that helped lead to the widespread popularity of treatments such as the use of hormone-replacement therapy for menopausal women, vitamin E to reduce the risk of heart disease, coronary stents to ward off heart attacks, and daily low-dose aspirin to control blood pressure and prevent heart attacks and strokes. Ioannidis was putting his contentions to the test not against run-of-the-mill research, or even merely well-accepted research, but against the absolute tip of the research pyramid. Of the 49 articles, 45 claimed to have uncovered effective interventions. Thirty-four of these claims had been retested, and 14 of these, or 41 percent, had been convincingly shown to be wrong or significantly exaggerated. If between a third and a half of the most acclaimed research in medicine was proving untrustworthy, the scope and impact of the problem were undeniable.— The Atlantic
These issues are not isolated to the medical research community; they are pervasive across scientific disciplines. Why? A host of reasons, often related to the chase for research funding, publication bias toward positive findings and sensational headlines, the lack of incentives to replicate anything but the most major results, and so on.
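Ioannidis’ core argument can be made concrete with a little arithmetic. Below is a minimal sketch of the positive-predictive-value calculation from his 2005 paper; the specific values for pre-study odds and power are illustrative assumptions of mine, not his.

```python
# Positive predictive value (PPV) of a claimed finding, following the
# model in Ioannidis (2005): R is the pre-study odds that the tested
# relationship is true, alpha the significance threshold, and power
# the probability of detecting a true effect.
def ppv(R, alpha=0.05, power=0.8):
    true_positives = power * R  # true relationships correctly flagged
    false_positives = alpha     # null relationships flagged by chance
    return true_positives / (true_positives + false_positives)

# The lower the pre-study odds and the power, the less a "positive" means.
for R, power in [(1.0, 0.8), (0.1, 0.8), (0.1, 0.2)]:
    print(f"pre-study odds R={R}, power={power}: PPV = {ppv(R, power=power):.2f}")
```

With long pre-study odds and underpowered designs, most published positives are false, which is exactly Ioannidis’ headline claim.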
Is Bilingualism Really an Advantage?
A great example of publication bias comes from Angela de Bruin3. She studied whether bilingualism really confers advantages on cognitive tasks, as most of the published research claims it does. Here’s what she found:
When de Bruin looked at the data, though, in three of the four tasks testing inhibitory control, including the Simon task, the advantage wasn’t there. Monolinguals and bilinguals had performed identically. “We thought, Maybe the existing literature is not a full, reliable picture of this field,” she said. So, she decided to test it further.
Systematically, de Bruin combed through conference abstracts from a hundred and sixty-nine conferences, between 1999 and 2012, that had to do with bilingualism and executive control. The rationale was straightforward: conferences are places where people present in-progress research. They report on studies that they are running, initial results, initial thoughts. If there were a systematic bias in the field against reporting negative results—that is, results that show no effects of bilingualism—then there should be many more findings of that sort presented at conferences than actually become published.
That’s precisely what de Bruin found. At conferences, about half the presented results provided either complete or partial support for the bilingual advantage on certain tasks, while half provided partial or complete refutation. When it came to the publications that appeared after the preliminary presentation, though, the split was decidedly different. Sixty-eight per cent of the studies that demonstrated a bilingual advantage found a home in a scientific journal, compared to just twenty-nine per cent of those that found either no difference or a monolingual edge. “Our overview,” de Bruin concluded, “shows that there is a distorted image of the actual study outcomes on bilingualism, with researchers (and media) believing that the positive effect of bilingualism on nonlinguistic cognitive processes is strong and unchallenged.” — The New Yorker
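The distortion de Bruin documents compounds mechanically. A quick back-of-the-envelope calculation, plugging in the rough numbers quoted above (a 50/50 split at conferences, 68% vs. 29% publication rates):

```python
# What share of *published* studies end up supporting the bilingual
# advantage, given de Bruin's conference split and publication rates?
conf_positive, conf_negative = 0.50, 0.50  # conference-result split
pub_rate_positive = 0.68                   # positive results that get published
pub_rate_negative = 0.29                   # null/negative results that get published

published_positive = conf_positive * pub_rate_positive
published_negative = conf_negative * pub_rate_negative
share = published_positive / (published_positive + published_negative)
print(f"{share:.0%} of the published literature reports an advantage")
```

An evenly split body of evidence thus ends up reading as roughly 70/30 in print, which is how a contested effect comes to look “strong and unchallenged.”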
Eat a chocolate bar every day to lose weight, or why statistical significance by itself is useless
A recent example highlights another concept widely used by both academics and consultants to unduly justify findings: statistical significance. Here’s the context in a nutshell:
“Slim by Chocolate!” the headlines blared. A team of German researchers had found that people on a low-carb diet lost weight 10 percent faster if they ate a chocolate bar every day. It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages. It was discussed on television news shows. It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily,” page 128). Not only does chocolate accelerate weight loss, the study found, but it leads to healthier cholesterol levels and overall increased well-being. The Bild story quotes the study’s lead author, Johannes Bohannon, Ph.D., research director of the Institute of Diet and Health: “The best part is you can buy chocolate everywhere.”
I am Johannes Bohannon, Ph.D. Well, actually my name is John, and I’m a journalist. I do have a Ph.D., but it’s in the molecular biology of bacteria, not humans. The Institute of Diet and Health? That’s nothing more than a website.
Other than those fibs, the study was 100 percent authentic. My colleagues and I recruited actual human subjects in Germany. We ran an actual clinical trial, with subjects randomly assigned to different diet regimes. And the statistically significant benefits of chocolate that we reported are based on the actual data. It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science. The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded. — io9
And zooming in on their use and manipulation of statistical significance:
If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives. […]
With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.
It’s called p-hacking—fiddling with your experimental design and data to push p under 0.05—and it’s a big problem. Most scientists are honest and do it unconsciously. They get negative results, convince themselves they goofed, and repeat the experiment until it “works.” Or they drop “outlier” data points. — io9
Not looking at statistical significance when analyzing a data set is bad, but trusting it blindly may be even worse.
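The 60% figure in the quote is just multiple-comparisons arithmetic: with 18 independent measurements of pure noise, the chance that at least one crosses p < 0.05 is 1 − 0.95^18 ≈ 0.60. A quick simulation (assuming independent measurements, which the quote notes theirs were not) confirms it:

```python
import random

# With no real effect at all, 18 independent measurements still yield
# at least one "significant" p < 0.05 about 60% of the time.
ALPHA, MEASUREMENTS, TRIALS = 0.05, 18, 100_000

analytic = 1 - (1 - ALPHA) ** MEASUREMENTS

random.seed(42)
hits = sum(
    # under the null hypothesis, each p-value is uniform on [0, 1]
    any(random.random() < ALPHA for _ in range(MEASUREMENTS))
    for _ in range(TRIALS)
)

print(f"analytic: {analytic:.3f}, simulated: {hits / TRIALS:.3f}")
```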
How can we use academic research properly then?
Well, the first way to use it while mitigating all the issues mentioned above is to rely only on research results that have been replicated many times and have proven very robust. The myopic loss aversion behavioral trait is a good example, as it has been identified in many different settings and contexts. Another is Milgram’s experiment on obedience to authority, replicated across decades, countries, and settings.
Another proper use of academic research is to treat it as a mere hint of what could work in an applied setting. Some published experimental work is very weak and may not replicate, yet it can still be a source of ideas for choice-architecture interventions. Such findings should not be given more weight than other sources of inspiration simply because they originate from academia, though.
If I’ve missed a good practical use for academic research, do let me know!
For additional color on the problem of relying solely on p-values, Regina Nuzzo has a fantastic article in Nature4. And Valen Johnson5 gives a credible estimate of how often t-tests produce false positives compared to Bayesian tests, arriving at a scary 25%…
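One way to build intuition for results like Johnson’s is the classic lower bound on the Bayes factor implied by a p-value (the Sellke–Bayarri–Berger calibration, −e·p·ln p for p < 1/e). The sketch below, which assumes 50/50 prior odds as an illustration, shows how weakly a “significant” p-value actually discounts the null:

```python
import math

# Sellke-Bayarri-Berger calibration: for p < 1/e, the Bayes factor in
# favor of the null can be no smaller than -e * p * ln(p). Combined
# with prior odds, that floors the posterior probability of the null.
def min_posterior_null(p, prior_null=0.5):
    bf_null = -math.e * p * math.log(p)         # minimum Bayes factor for the null
    prior_odds = prior_null / (1 - prior_null)  # odds in favor of the null
    posterior_odds = prior_odds * bf_null
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01):
    print(f"p = {p}: the null remains at least {min_posterior_null(p):.0%} probable")
```

Even at p = 0.05, the null keeps a floor of roughly 29% posterior probability under these assumptions, the same ballpark as Johnson’s false-positive estimate.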
- Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLOS Med, 2(8), e124. doi:10.1371/journal.pmed.0020124 (link) ↩
- Ioannidis, J. P. A. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. ACC Current Journal Review, 14(10), 6. doi:10.1016/j.accreview.2005.09.021 (link) ↩
- de Bruin, A., Treccani, B., & Della Sala, S. (2014). Cognitive Advantage in Bilingualism: An Example of Publication Bias? Psychological Science, 26(1), 99–107. doi:10.1177/0956797614557866 (link) ↩
- Nuzzo, R. (2014). Scientific method: Statistical errors. Nature News, 506(7487), 150–3. doi:10.1038/506150a (link) ↩
- Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317. doi:10.1073/pnas.1313476110 (link) ↩