When a group of economists look at the same data, do they all see it the same way?
A new study put this exact proposition to the test. One hundred sixty-four research teams separately analyzed the same set of financial-market data and wrote up their findings in 164 short articles. The teams then received several rounds of comments, mimicking the kind of informal peer review economists engage in before submitting to an academic journal. The question everyone involved wanted answered: how much variation would there be across the articles?
It turns out, a lot.
Data can be messy, notoriously so. Scientists and researchers have therefore developed a multitude of strategies for cleaning data, analyzing it, and ultimately drawing conclusions from it. But this unusual study – an analysis of 164 separate analyses – suggests that the decisions that go into cleaning a dataset, analyzing it, and coming to a conclusion may add as much noise as the data themselves.
In an increasingly data-driven world, it’s important to keep this in mind, according to Robert Korajczyk, professor of finance at Kellogg. Korajczyk and former Kellogg doctoral student Dermot Murphy, now a professor at the University of Illinois at Chicago, were among 164 research teams involved in the project.
Kellogg Insight recently spoke to Korajczyk about the experience and what researchers and the general public can take away from the study’s surprising finding.
This conversation has been edited for length and clarity.
Kellogg Insight: Can you start by explaining the data that you and the 163 other research teams were asked to analyze?
Korajczyk: Yes. Each research team was given a dataset spanning 17 years of trading activity in Europe’s most liquid futures contract, the Euro Stoxx 50. That amounted to roughly 720 million trades. And there were six research questions the teams were asked to consider. For example, has pricing become more or less efficient? Have the markets become more or less liquid? And has the fraction of agency transactions changed over time?
KI: These are pretty fundamental trends that you would want to understand if you were to try to assess the health of this market.
Korajczyk: Absolutely. But the broader focus of the research was what really interested me.
KI: Namely, how would different research teams approach the same set of questions?
Korajczyk: Yes. These kinds of “crowdsourced” projects have taken place in other fields, but this is the first I know of in finance. And few are on this scale; it’s more typical to have 15 or 20 teams, and 164 is really big. So my co-author Dermot Murphy and I decided to team up and get involved.
KI: Tell me about the 164 different articles that were submitted. What should we take away from them?
Korajczyk: There is a statistical concept called “standard error,” which tells you about the uncertainty of a parameter estimate such as an average. The standard error of an average will be larger when the data is noisy and it will be smaller when there are more observations.
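To make that concrete, here is a minimal sketch in Python of how the standard error of an average behaves. It is illustrative only, with made-up numbers and distributions; it is not from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def standard_error(x):
    """Standard error of the sample mean: sample std. dev. / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / np.sqrt(len(x))

# Noisier data -> larger standard error; more observations -> smaller.
quiet = rng.normal(0, 1, size=1_000)    # low-noise data
noisy = rng.normal(0, 5, size=1_000)    # same sample size, noisier data
big   = rng.normal(0, 5, size=100_000)  # noisy data, many more observations

print(standard_error(quiet))  # roughly 0.03
print(standard_error(noisy))  # roughly 0.16
print(standard_error(big))    # roughly 0.016
```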
But then there is another type of “error,” or noise, to consider: all the decisions that come into play before you arrive at that estimate. There are many different ways to measure market efficiency, for example, so that is one decision a research team has to make. When you clean the data, how do you deal with outliers? Do you throw them away, or change them to some other value that is large but not as extreme? What will be the form of your statistical model? What software are you using? Are you a good coder or a bad coder?
All of these choices made by a research team, along with its inherent ability, introduce additional variation into the results. We call this “non-standard error.”
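A hypothetical sketch can make the distinction vivid. In the Python below, the data and the two cleaning rules are invented for illustration (they are not the study’s): two defensible ways of handling outliers, applied to the same dataset, produce different estimates, and that spread is noise added by the analyst rather than by the data.

```python
import numpy as np

rng = np.random.default_rng(1)
# One shared dataset with a few extreme values, standing in for the
# kind of messy market data every team received (values are made up).
data = np.concatenate([rng.normal(0.0, 1.0, 10_000), [25.0, -30.0, 40.0]])

def drop_outliers(x, z=3.0):
    """Choice A: throw extreme observations away entirely."""
    return x[np.abs(x - x.mean()) < z * x.std()]

def winsorize(x, pct=1.0):
    """Choice B: clip extremes to a less extreme value."""
    lo, hi = np.percentile(x, [pct, 100 - pct])
    return np.clip(x, lo, hi)

estimates = {
    "raw mean": data.mean(),
    "drop outliers": drop_outliers(data).mean(),
    "winsorized mean": winsorize(data).mean(),
}
for choice, est in estimates.items():
    print(f"{choice:>16}: {est:+.4f}")

# The spread across these equally reasonable estimates is the
# "non-standard error": noise from the analyst's choices, not the data.
print("spread:", max(estimates.values()) - min(estimates.values()))
```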
KI: And when the teams initially submitted their papers, these non-standard errors were about as large as the standard errors.
Korajczyk: Right. One way to think about it is that you read an article and ask, “Okay, how much trust do I have in these results?” Standard errors tell you something about the noise in the data. But the researchers also made a lot of choices that I might not have made. So maybe the noise in the results is actually double what it appears to be from the standard errors alone.
KI: Did that surprise you?
Korajczyk: It didn’t surprise me that there was variation. But the size was bigger than I expected. There were also some clear outliers that seemed totally out of line to me.
Another surprise was that some of those wacky results were present in every round. With each step you learn something about what the reviewers think or what other teams have done, and you are allowed to revise your article with that knowledge. But even after peer review and the opportunity to see articles from other teams, many weird results remained.
At each stage, though, the dispersion across teams did diminish somewhat.
KI: There appear to have been some real philosophical differences in how the questions should be approached and how the analyses should be conducted.
Korajczyk: Absolutely. And in a way, this project actually constrained those differences. We were told, “Here is the data, and you are only allowed to use this data.” You were not allowed to bring in any other data that might be relevant to answering a question. Allowing that would probably have increased the dispersion across teams even further.
KI: There is certainly a “researcher beware!” message in this work, because it bears on how much you can trust the findings in the literature. It only adds to scientists’ growing concern about a “replication crisis.”
In your opinion, are there some changes that should be made to account for these pervasive non-standard errors? For example, should academic papers give more space to methods sections so that researchers can communicate more transparently about their choices?
Korajczyk: The norm has always been that someone who has read your article and decides to reproduce it should be able to do so from what you have written in the journal. If you have truncated some outliers, they should know exactly how you truncated them. Now, I cannot guarantee that every article is written this way. But that’s the standard for good writing, and that standard has been around for a long time.
But what a journal article doesn’t normally tell people is, “I tried this specification and decided not to use it, and I tried that specification and decided not to use it. And, oh yeah, I should have checked that other variable.”
But there are some changes for the better. Nowadays, it is much more common to have long appendices available on the journal’s website. These can go into much more detail about the robustness of the results, which can give the reader some reassurance that you can look at the data in different ways and get the same answers. Does everyone go through and read the 120-page appendix? No, but people who are very interested in the topic might. Another thing that is becoming more and more common is requiring researchers to publish their code. That makes it easier to reproduce the results and to determine whether they are robust.
KI: What should the general public make of this research? If I read an article in Bloomberg or The Wall Street Journal that cites a new finance study, how seriously should I take its findings?
Korajczyk: Well, whether it’s finance research, medical research, psychology, or sociology, it always helps to be skeptical. One thing news reports rarely tell you, for example, is the sample size of the study. That has changed a bit with Covid-19, but knowing the sample size tells me a lot about whether I should take a result seriously.
I also think it’s helpful to ask, “What are the incentives?” If this is someone trying to get tenure, there is a bias in favor of finding statistically significant results. If this is someone who works for a fund-management company, their financial incentives could be aligned with economically significant results that point in a particular direction.
Finally, be aware that researchers have to make many different choices. If you read “we did X” in a line of a paper or in a footnote, it might not be as trivial as it sounds.