A couple of weeks ago, I posted a proposal for four academic reforms. Most notably, I suggested that academic journals should pre-accept articles based on their design before results are available to authors, which would reduce the system's bias toward statistically significant findings that fail to replicate in future studies.
Unsurprisingly, other people have had similar ideas. Here are some comments on the related proposals that I've come across:
1. Chris Said, a postdoc at the Center for Neural Science at NYU, emphasizes the importance of funding agencies in promoting replication:
Granting agencies should reward scientists who publish in journals that have acceptance criteria that are aligned with good science. In particular, the agencies should favor journals that devote special sections to replications, including failures to replicate. More directly, the agencies should devote more grant money to submissions that specifically propose replications.
The problem, however, is that most replications will continue to fail given the incentives produced by the current system. That's why I'm most interested in his proposal for funding agencies to give preference to scientists who publish in "outcome-unbiased" journals:
I would like to see some preference given to fully “outcome-unbiased” journals that make decisions based on the quality of the experimental design and the importance of the scientific question, not the outcome of the experiment. This type of policy naturally eliminates the temptation to manipulate data towards desired outcomes.
He makes a convincing argument that NIH, NSF, etc. could play a key role in overcoming the collective action problem inherent to switching to a new system. I still think leading journals could make a contribution on their own, but the funding agencies could be pivotal, especially in fields that are heavily grant-driven. In practice, this could mean rewarding both scientists who have published in outcome-unbiased journals in the past and grant proposals that promise to submit the proposed study to such a journal.
2. George Mason economist Robin Hanson proposes results-blind peer review, a potentially more general approach that could be applied to non-experimental data:
I’d add an extra round of peer review. In the first round, all conclusions about signs, amounts, and significance would be blanked out. After a paper had passed the first round, the reviewers would see the full paper. While reviewers might then allow the conclusions to influence their evaluation, they could not as easily hide such bias. Reviewers who rejected on the second round after accepting on the first round would feel pressure to explain what about the actual results, over and above the method, suggested that the paper was poor.
Glymour and Kawachi offered a similar proposal in BMJ in 2005:
We offer a solution to this problem that lies at the disposal of journal editors. Preliminary editorial decisions could be based solely on the peer review of the introduction and methods sections of submitted papers. These two sections deal with the key issues on which editorial decisions would ideally be based: the importance of the research question and the potential for the study design and proposed analyses to inform that question.
Blinding reviewers to the results and discussion sections may pose some challenges to the reviewing process because elements of these later sections are also relevant for editorial decisions. However, these difficulties would probably be outweighed by the benefits of reducing publication bias. Peer reviewers might be asked to make a preliminary recommendation to the editor (reject or continue further review) on the basis of the merit of the study design and proposed data analyses—not on the findings themselves.
If manuscripts pass this initial stage then reviewers could be unblinded to the results and discussion sections. Our proposal could have the additional benefit of improving the clarity and detail of methods sections.
The problem, as a commenter on Hanson's blog notes, is that many reviewers have already read relevant papers in their field or seen talks about them at conferences, especially in social science (which is characterized by very long publication lags). Even if the reviewer has not read the paper in question before being assigned the review, it's often easy to look it up online and find the results. As a result, this approach could only work in fields where papers are not made public before they are published.
3. Columbia's Macartan Humphreys, Raul Sanchez de la Sierra, and Peter van der Windt have proposed "comprehensive registration" for experiments in political science:
Researchers in political science generally enjoy substantial latitude in selecting measures and models for hypothesis testing. Coupled with publication and related biases, this latitude raises the concern that researchers may intentionally or unintentionally select models that yield positive findings, leading to an unreliable body of published research. To combat this problem of "data fishing" in medical studies, leading journals now require preregistration of designs that emphasize the prior identification of dependent and independent variables. However, we demonstrate here that even with this level of advanced specification, the scope for fishing is considerable when there is latitude over selection of covariates, subgroups, and other elements of an analysis plan. These concerns could be addressed through the use of a form of "comprehensive registration." We experiment with such an approach in the context of an ongoing field experiment for which we drafted a complete "mock report" of findings using fake data on treatment assignment. We describe the advantages and disadvantages of this form of comprehensive registration and propose that a comprehensive but non-binding approach be adopted as a first step in registration in political science. Likely effects of a comprehensive but non-binding registration are discussed, the principal advantage being communication rather than commitment, in particular that it generates a clear distinction between exploratory analyses and genuine tests.
Unfortunately, the incentives to engage in this form of registration are weak. The comprehensive report format limits authors' ability to produce the statistically significant findings that reviewers demand and may lead authors to opt out of registration or to shelve non-significant findings. That's why it's essential that pre-accepted articles be offered as an option to authors by top journals.
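To make the fishing concern concrete, here is a rough simulation sketch of my own (it is not taken from the Humphreys et al. paper). It assumes a world where the treatment has no effect at all, then compares an analyst who tests only the pre-specified main effect with one who is also free to test the effect within subgroups and report the best-looking result. The function names, sample sizes, number of covariates, and the simplifying assumption of a known unit outcome variance are all illustrative choices, not anything from the sources above.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(treated, control):
    """Two-sided z-test p-value for a difference in means.

    Assumes (for simplicity) that outcomes have known variance 1.
    """
    if len(treated) < 2 or len(control) < 2:
        return 1.0  # too few observations to test; treat as non-significant
    se = (1 / len(treated) + 1 / len(control)) ** 0.5
    z = (fmean(treated) - fmean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_trials=2000, n_subjects=200, n_covariates=5, alpha=0.05, seed=0):
    """Return (pre-specified, fished) false positive rates under a true null."""
    rng = random.Random(seed)
    prespecified_hits = 0
    fished_hits = 0
    for _ in range(n_trials):
        # Null world: the outcome is pure noise, unrelated to treatment.
        treat = [rng.random() < 0.5 for _ in range(n_subjects)]
        outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
        covariates = [
            [rng.random() < 0.5 for _ in range(n_subjects)]
            for _ in range(n_covariates)
        ]

        def p_for(mask):
            t = [y for y, d, m in zip(outcome, treat, mask) if m and d]
            c = [y for y, d, m in zip(outcome, treat, mask) if m and not d]
            return two_sided_p(t, c)

        # The pre-specified analysis: one test on the full sample.
        main_p = p_for([True] * n_subjects)

        # The fishing analyst also tests within each half of every covariate
        # and reports whichever p-value is smallest.
        candidate_ps = [main_p]
        for cov in covariates:
            candidate_ps.append(p_for(cov))
            candidate_ps.append(p_for([not v for v in cov]))

        prespecified_hits += main_p < alpha
        fished_hits += min(candidate_ps) < alpha
    return prespecified_hits / n_trials, fished_hits / n_trials


if __name__ == "__main__":
    prespecified_rate, fished_rate = simulate()
    print(f"pre-specified main effect only: {prespecified_rate:.3f}")
    print(f"best of main effect + subgroups: {fished_rate:.3f}")
```

The pre-specified rate should land near the nominal 5 percent, while the fished rate comes out several times higher even though nothing real is going on. That gap is the space that a comprehensive analysis plan is meant to close.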
4. MIT's David Karger has suggested changing the submission requirements for conference papers in computer science so that evaluations of the proposed system are conducted after acceptance, increasing the incentives for evaluation and reducing incentives to report that the evaluation results were successful.
5. Perhaps most notably, Northwestern's Philip Greenland, the past editor of the Archives of Internal Medicine, conducted a pilot study of "mechanisms that might identify and reduce biases," including a two-stage review process:
First, to understand the tendency of authors to submit mostly positive studies, we assessed the percentage of positive articles that authors submitted to the Archives. Of 100 consecutive submitted manuscripts assessed in June and July of 2008, 77% reported significant primary results, based on editors' assessments of the results. If the articles had been categorized based on the authors' interpretation of their analyses, a higher percentage of manuscripts would have fallen into the positive category. Of the manuscripts sent out for external peer review, over 83% of positive studies were accepted by the Archives. Only 3 negative studies were sent to external review, of which only 1 was ultimately accepted. Overall, only 5.3% of all negative studies that were submitted were accepted.
Recognizing that publication bias can result from reviewers' enthusiasm for positive results, we next evaluated the willingness of our 58 most highly rated and prolific peer reviewers to participate in an alternate peer-review process. The proposed hypothetical alternate process involved 2 steps. First, peer reviewers would have access only to a modified abstract containing no mention of results, the full introduction describing the nature of the research question, and a complete "Methods" section to allow an evaluation of the quality of the research. With this information available, the reviewers would be asked to provide a preliminary assessment of the manuscript in the absence of the "Results" section. Following this preliminary assessment, we proposed that reviewers would gain access to the full article, including the "Results" section, and be asked to make a final evaluation of the manuscript. We hypothesized that this 2-stage procedure would force peer reviewers to make an initial evaluation solely based on the quality of the methods and that the result would be a more equitable consideration of well-performed negative studies. Of the 43 respondents, 37 (>86%) stated that they were willing to complete a full review following an abbreviated one as described herein.
We then turned to an assessment of the role of the editorial board. Prior to peer review, editors may decide to reject articles on their face value. Furthermore, editors assign reviewers and render final decisions after receiving reviewer comments. At the Archives, an editorial estimate of study rejection without any external peer review was roughly 70% of all submissions, whereas a JAMA study reported a 50% editorial rejection rate at that journal. These substantial figures suggest that any investigation of publication bias at the journal level ought to begin with, or at least include, the editors. Consequently, the aforementioned alternate review process was applied to the editorial review that occurred prior to outside peer review. In a pilot study, among a selection of submitted articles, a study was characterized as positive if an author's conclusion about his or her primary outcome was portrayed as such. Of the 46 articles examined, 28 were positive, and 18 were negative (with an explicit attempt to oversample negative studies in this pilot research). Ultimately, 36 of the 46 articles (>77%) were rejected, consistent with prior publication decisions at this journal. Of note, editors were consistent in their assessment of a manuscript in both steps of the review process in over 77% of cases. This suggests that most of the time the editors' decision after reviewing the "Methods" section alone does not change after reviewing the full results.
Although this provides some comfort, it is important to look at not only the majority of manuscripts but also the tail ends of the curve, because this is most likely where any bias would lie. In doing so, we found that over 7% of positive articles benefited from editors changing their minds between steps 1 and 2 of the alternate review process, deciding to push forward with peer review after reading the results. By contrast, in this small study, we found that this never occurred with the negative studies. Indeed, 1 negative study, which was originally queued for peer review after an editor's examination of the introduction and "Methods" section, was removed from such consideration after the results were made available.
We admit that these findings are neither conclusive nor definitive but rather a descriptive analysis from a pilot study. Certainly, it is reassuring that the editors were mostly consistent in their opinions regardless of the results. However, in the minority of cases in which bias matters, the influence of the results on the editor's decision to move to peer review and ultimately to publication is still uncertain. There is a dearth of rigorous research on editorial bias and the possible interventions that may attenuate it. The alternate review process piloted at the Archives has never been performed before, to the best of our knowledge, although it has been suggested. Importantly, such a mechanism can be implemented both with editors and peer reviewers, addressing 2 sources of potential bias over which a medical journal can have the most direct impact. The negative trial by Etter et al published in this issue of the Archives was a part of our pilot study. Obviously, the editors supported peer review and publication of this study based on the rigor and quality of its methods alone, and that decision was sustained even when the negative results were revealed to them.
Greenland is to be commended for his willingness to innovate, but the results reported above suggest some of the challenges that a two-stage review system will face and the need for further experimentation by journals. Most disappointingly, the current instructions to reviewers provided by the journal make no mention of the two-stage process, suggesting that the approach has been abandoned by his successors. Let's hope some other journal editor out there is willing to experiment further.
Update 4/30 10:06 AM: One challenge raised by Chris Said via email is how these approaches could be adapted to fields like neuroscience in which articles typically include several studies that build on each other. Here are two possible approaches I've contemplated:
1. The journal offers rounds of results-blind reviewing in which authors propose Study 1, get results, and then come back for a second round of results-blind reviewing. This approach would ensure that each round was fully outcome-unbiased, but would increase the burden on reviewers and editors.
2. An alternate option would be for authors to conduct a set of exploratory studies 1...x and then submit the design and analysis plan for study x+1 on a pre-accepted basis. Readers would then be told that the results of studies 1...x were not pre-specified but that study x+1 was pre-specified.
Also, I've updated the Humphreys item above to include his co-authors on the paper in question (which is not yet available publicly). Finally, see Hanson's follow-up item here.