Bern-Based AI Tool for More Reliable Research

Psychologist Jamie Cummins has developed an AI tool that helps researchers assess the quality of scientific publications. In this interview, he explains how it works—and why AI doesn’t replace humans.

Jamie Cummins in conversation with Martina Huber, freelance journalist.

uniAKTUELL: Jamie Cummins, you developed an AI tool called RegCheck at the University of Bern. How would you explain it to laypeople in a single sentence over dinner?

Jamie Cummins: Essentially, the AI tool helps researchers compare two documents and verify whether they match: the original study protocol and the scientific paper published after the research is completed. RegCheck stands for Registration Check.

Why is this important?

In the scientific community, there is a growing recognition that it is important to specify in advance exactly what one intends to investigate and how—as well as what results one hopes to achieve. Especially in medical research, but also in quantitative social sciences such as psychology and economics, it has become standard practice to transparently document these research plans in a publicly accessible registry or repository at the start of the study. We call this documentation pre-registration. This allows colleagues, editors, reviewers, or other interested parties to review it later and check whether researchers deviate from their original plans.

How significant are such deviations?

Medical research forms the basis for clinical guidelines and thus has a direct impact on patients. The so-called Cochrane Reviews are of particular importance here: systematic reviews in which researchers collect all relevant studies on a specific topic, critically evaluate them, and synthesize them into an overall picture. If the underlying medical research is not robust, the clinical guidelines derived from it may also be less robust. For example, if researchers do not make it clear that a result from a clinical trial was merely a chance finding and not a planned result that confirms a hypothesis supported by extensive data, this can give the results more weight than they deserve. In the worst case, they compromise the quality of the Cochrane Review on which doctors and other healthcare professionals base their decisions.

Can you give a concrete example?

Imagine I’m conducting a clinical trial to test the effectiveness of a new therapy for depression. In the pre-registration, I state that I will compare the depression rates of patients with and without the therapy. During the trial, however, I observe that the therapy has no significant effect on depression rates but does work against anxiety disorders. Then there is a strong temptation to write the article as if I had always planned to find an effect of the therapy on anxiety disorders. In research, we refer to this as “outcome switching” or “HARKing”: Hypothesizing After the Results are Known.

After all, new discoveries and scientific breakthroughs are often based on serendipitous findings and results that weren’t planned in advance …

That is absolutely correct. RegCheck’s goal is not to restrict researchers to what they originally planned. On the contrary: some of the best discoveries arise precisely from this openness to all possible outcomes.

But it is important that researchers make transparent which findings were planned and which were unexpected or not part of the original plan, and why the results may ultimately deviate from the originally planned outcomes. So it’s perfectly fine to write: “The treatment had no effect on the rate of depression. Instead, we found an effect on anxiety disorders. We didn’t expect that, and we will investigate more closely in future studies whether this effect on anxiety is confirmed.” If deviations from the pre-registration are made this transparent, that’s great.

Cochrane Reviews – Key Decision-Making Tools in Medicine

Health decisions should be based on current, reliable evidence. The non-profit Cochrane Network produces systematic reviews, known as Cochrane Reviews: Authors collect relevant studies, critically evaluate them, and synthesize them into an overall picture. Flawed studies can skew results and compromise the basis for decision-making by doctors, patients, and policymakers. The INSPECT-SR tool supports the quality assessment of the included studies. The AI tool RegCheck, developed at the University of Bern, will be integrated into INSPECT-SR in the future, thereby enhancing the quality of Cochrane Reviews.

Your tool is intended to help improve Cochrane Reviews in the future. How exactly does that work?

First, researchers can use RegCheck before submitting a publication to check whether their article transparently and accurately reflects the original registration, and revise it if necessary. Reviewers or editors can also use the tool during the review process and, if necessary, ask authors to make any deviations more transparent. Of course, we don’t want to force anyone to use RegCheck, but when it is used, it can help improve the quality of published studies. Above all, however, Cochrane authors are already explicitly required to check whether there are significant discrepancies between a publication and the pre-registration for studies they screen for a Cochrane Review. So far, this is rarely done because it is very time-consuming. With RegCheck, they have a tool that allows them to do this with minimal effort.

And what happens if they find such discrepancies?

Studies in which problems are identified—whether these are discrepancies between the registration and the publication or other anomalies—can be flagged accordingly in systematic reviews, given lower weighting, or, in the worst case, even excluded entirely. This improves the quality of the review.

How exactly does RegCheck work?

RegCheck operates in four steps. First, pre-registration and publication data are imported and processed as plain text—references, for example, are removed. Then the researcher defines exactly what they want to compare: sample size, hypotheses, outcomes. This is intentional: We don’t want to leave it up to the AI to decide what’s relevant and where the focus should lie. It is merely a tool designed to help people make decisions.

«AI cannot take on responsibility. In the end, the human must always make the important decisions.»

Jamie Cummins

And then?

In the third step, a classic natural language processing system extracts the text passages relevant to the desired comparison from both documents. This step is completely deterministic: the same input always yields the same output, and hallucinations are ruled out. Only in the final step does a generative large language model receive these very specific text excerpts, together with the question: Do these match in terms of the desired dimension? This is how we minimize the margin for error.

What does the researcher ultimately see?

A report containing original quotes from both documents, along with the language model’s assessment of whether they match. But ultimately, the responsibility lies with the researcher to review this themselves and decide whether the discrepancies are relevant and sufficiently justified or not.
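For readers who think in code, the four steps and the resulting report could be sketched roughly as follows. This is only an illustration under stated assumptions, not RegCheck's actual implementation: the keyword lists, the function names (`preprocess`, `extract`, `compare`) and the `placeholder_judge` function, which stands in for the language model call, are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical keyword lists per comparison dimension (an assumption for
# this sketch; the interview does not say how RegCheck finds passages).
KEYWORDS: Dict[str, List[str]] = {
    "sample size": ["sample size", "participants", "recruit"],
    "outcomes": ["outcome", "primary", "measure"],
}

def preprocess(text: str) -> str:
    """Step 1: reduce a document to plain text and drop the reference list."""
    body = re.split(r"\n\s*References\s*\n", text, maxsplit=1)[0]
    return re.sub(r"\s+", " ", body).strip()

def extract(text: str, dimension: str) -> List[str]:
    """Step 3: deterministic extraction of sentences mentioning the dimension."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    words = KEYWORDS[dimension]
    return [s for s in sentences if any(w in s.lower() for w in words)]

def compare(registration: str, paper: str, dimension: str,
            judge: Callable[[str, List[str], List[str]], str]) -> dict:
    """Steps 1 to 4; `dimension` is the human's choice from step 2."""
    reg_quotes = extract(preprocess(registration), dimension)
    paper_quotes = extract(preprocess(paper), dimension)
    # Step 4: only the short excerpts are handed to the judging model.
    return {
        "dimension": dimension,
        "registration_quotes": reg_quotes,
        "paper_quotes": paper_quotes,
        "model_assessment": judge(dimension, reg_quotes, paper_quotes),
    }

def placeholder_judge(dimension: str, reg: List[str], pub: List[str]) -> str:
    """Stand-in for the language model; a real system would call an LLM here."""
    return "match" if reg == pub else "possible deviation: human review needed"

# Invented toy documents, loosely based on the depression-trial example.
registration = ("We will recruit 200 participants. The primary outcome is "
                "depression rate.\nReferences\nSmith 2020.")
paper = ("We recruited 120 participants. We report effects on anxiety."
         "\nReferences\nJones 2021.")

report = compare(registration, paper, "sample size", placeholder_judge)
```

The report keeps the original quotes next to the model's assessment, so the researcher can check the verdict against the source text and make the final decision.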

So the AI doesn’t replace the researcher’s judgment?

No, absolutely not! I’d like to quote here from a 1979 IBM training manual, which stated: “A computer can never be held accountable, which is why a computer should never make a management decision.” That is precisely why “the human in the loop” is so important: not only because human expertise must be preserved, but above all because AI cannot take responsibility for errors or hallucinations. In the end, important decisions must always be made by a human.

Why did you come to the University of Bern specifically to develop RegCheck?

The University of Bern is very strong both in the field of AI tools and methods and in meta-science, which examines and improves how research is conducted, funded, and published. In particular, the Institute of Psychology has one of the world’s leading research groups in the field of meta-science. In addition to RegCheck, we have also co-developed other AI tools that support researchers in their work and contribute to quality assurance. And just recently, we received research funding from the Swiss National Science Foundation for another such project.

The tools we’re developing here in Bern aren’t just relevant for researchers in Switzerland: I’ve already received feedback from people in other parts of Europe, America, and Australia, all of whom say they’ve used RegCheck and that it has made their work easier. That’s really motivating!

About the person:

Jamie Cummins studied psychology in Ireland, where he completed his master’s degree in 2017. He then earned his Ph.D. at Ghent University and conducted research there for over six years—focusing on measurement instruments for beliefs and attitudes as well as computer-based training to improve children’s academic performance. Today he works at the University of Bern, where he developed the AI tool RegCheck.
