Last week, I submitted the methods for the project I’ve recently started to the Center for Open Science’s Preregistration Challenge. Briefly, the goal of the challenge is to get more scientists to preregister their research, and it’s got a monetary incentive. The goals of preregistration itself are to increase transparency and reproducibility in scientific research.
I’d never done a preregistration before, but it seemed like a Good Thing to Do in the name of Open Science. The monetary incentive pushed me over the learning-curve barrier and made up for the fact that it involves a bit more work than usual. I consider my preregistration a bit of an experiment. Having written one now, I have some opinions about the pluses and minuses.
Let’s start with the drawbacks. I found three significant ones. The first is simply that preregistration is a foreign concept to most ecologists, so I had to explain what I was doing (and justify it) a number of times to other people. That was only a slight annoyance in and of itself, but it made the other two drawbacks harder.
The second drawback was time: it took me a few months to put together the preregistration plan, largely because of the nature of the project. I am using data produced by NEON and doing a series of complex statistical analyses on them. To do a preregistration means thinking about all the parts of the analyses in depth: what variables am I going to use, how am I going to transform them, what will be the structure of my equations, and how am I going to do inference from model results to scientific meaning. In addition, I had to think about all the “what ifs”: What if I found that some variable was far from normally distributed? What if the data didn’t have good coverage or the response variables didn’t vary in the way I thought they would? What follow-on tests or modeling was I going to do if I got result A versus result B? Note that I didn’t look at the data while I was doing any of this, as part of the conditions of the Preregistration Challenge.
These are all very important things to think about, but like most everyone else in ecology, I am accustomed to figuring out many of the answers to these questions when (and if) the situation arises. That classical approach leaves room for “researcher degrees of freedom” (analysis choices made after seeing the data), however, so I understand why it might be a good idea to preregister. On the other hand, having to figure out so many different contingencies might be a waste of time. If I have to figure out a bunch of contingencies that never happen, that’s time I could have spent moving forward with analyses. I haven’t yet done the analyses, so we’ll see how much this drawback matters.
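To make the “what if” planning concrete, here is a minimal sketch of what one pre-specified contingency could look like when written down as code before any real data are seen. It is only an illustration, not the plan I actually filed: the skewness threshold, the function names, and the fallback to a rank transform are all hypothetical, and the data are simulated.

```python
import random
import statistics

# Hypothetical threshold, fixed in advance of seeing any real data.
SKEW_LIMIT = 1.0


def sample_skewness(values):
    """Population-style skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def choose_transform(values):
    """Apply the pre-specified rule and return the name of the transform to use."""
    if abs(sample_skewness(values)) <= SKEW_LIMIT:
        return "none"   # close enough to symmetric: use the variable as-is
    if min(values) > 0:
        return "log"    # strongly skewed and strictly positive: log-transform
    return "rank"       # hypothetical fallback when a log is not defined


# Simulated stand-ins for real variables, used only to exercise the rule.
rng = random.Random(1)
symmetric = [rng.gauss(10, 2) for _ in range(500)]
skewed = [rng.lognormvariate(0, 1) for _ in range(500)]

print(choose_transform(symmetric))
print(choose_transform(skewed))
```

Writing the rule this way means the decision is already made when the real data arrive, whichever branch they fall into.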
The final and probably biggest drawback was that I didn’t have any progress to report for three months. No doubt about it, I was making progress, but I didn’t have anything to show for it. I didn’t have any preliminary analyses or graphs or numbers or anything else to show that I was doing something. My lab does weekly progress updates and many of mine were feeble sounding: “I worked on some more mathematical modeling.” Blah. Because the NEON staff know I am working with their data, they also asked for my opinions about some of the data for their annual review. But because I hadn’t performed any analyses yet, I couldn’t provide any useful feedback, other than “ask me next year! I’ll have all the answers.” Pushing all the results to the end of the project can be a real detriment to projects focused on analysis of existing data and to applied projects.
Now for the advantages of doing a preregistration plan.
Working through the full scope of my analysis without playing with the real data made me think very hard and carefully about the questions I wanted to ask and the kind of results I expected to get. Instead of just plugging data in, I had to ask, “What if the data are like this? What if the data are like that? What would that mean?” It made me figure out my assumptions in a way that I don’t think I usually do when I figure out analyses as I go along. It made me clarify my qualitative thoughts into quantitative predictions. I think the process made me a better scientist.
I think that having scoped out all my analyses in detail at the start will mean that doing the analyses themselves will go really quickly. In fact, if they do, I think figuring out analyses ahead of time will have saved me time in the long run. I remember playing with a big data set as a grad student and trying to figure out all the various questions I could ask of it. Instead of thinking about what questions were important to ask, I tried to ask as many questions as possible. It took a lot of time and left me with many loose threads that were hard to tie together into a coherent story (for a paper). Being super clear about my questions means, I hope, that writing the paper will be fairly straightforward, which would be yet another time-saver. But all of this depends on the analyses working out okay. That is, I hope that I have enough data with enough variation, and that at least some of my predictors actually contribute to predicting the response.
The preregistration queries on the Center for Open Science’s website were super useful in helping me think through my research. I’d recommend using them even if you don’t plan to file an official plan. In particular, when I got to the question about drawing scientific inference from analytical results, I realized I didn’t have a concrete plan. While a p-value of 0.05 is a pretty standard cutoff for a lot of traditional ecology research, I am using Bayesian statistics and am not a fan of arbitrary cutoffs generally. I didn’t have a good answer off the top of my head, so I emailed some colleagues and that turned into an interesting discussion about good/normal/accepted ways to report Bayesian posterior distributions. I don’t think I’d ever have made a conscious effort to figure out how to interpret results otherwise.
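For readers wondering what reporting a posterior without a hard cutoff might look like, here is a minimal sketch. It is one common option, not necessarily what my colleagues and I settled on: it summarizes posterior draws with a median, an equal-tailed credible interval, and the posterior probability that the effect is positive. The 90% interval mass, the function names, and the simulated draws (standing in for real MCMC output) are all assumptions made for illustration.

```python
import random
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_posterior(draws, mass=0.90):
    """Describe posterior draws for one parameter without a significance cutoff."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    return {
        "median": statistics.median(ordered),
        # Equal-tailed credible interval containing `mass` of the draws.
        "interval": (quantile(ordered, tail), quantile(ordered, 1.0 - tail)),
        # Share of posterior draws above zero.
        "prob_positive": sum(d > 0 for d in ordered) / len(ordered),
    }


# Simulated draws standing in for MCMC output for a single coefficient.
rng = random.Random(42)
draws = [rng.gauss(0.3, 0.2) for _ in range(4000)]

summary = summarize_posterior(draws)
print(summary)
```

Reporting the whole summary lets readers judge the size and uncertainty of an effect for themselves rather than relying on a yes-or-no threshold.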
Finally, if you do want to take the Preregistration Challenge, I have a couple more notes to recommend it. First, David Mellor has been super responsive and helpful as I waded through my preregistration. Any questions? Ask him. And while the Preregistration Challenge website states that it can take up to two weeks to have your preregistration approved — and that you shouldn’t start your analyses until it is — mine was approved within 24 hours. I’m looking forward to actually putting the data through my models now!