14/07/2025 Opinion

CREAF opens up its science to avoid going back to the cave

Florencia Florido

Knowledge Management & Open Science Officer

A former English-Spanish translator, I hold a bachelor’s degree in Conservation and Restoration of Archaeological Heritage from the ESCRBCC, a master’s in Condition Assessment of Cultural Heritage from the UPO, and

From May 26 to June 4, CREAF held the course “Caring for your data: Ensuring quality and protection”, the first internal training on data management organized by the new Open Science and Knowledge Management office. Framed within the center’s Watering Talents program and funded by Fundae, this eight-hour course was structured as three masterclass-style sessions that condensed the most current knowledge and trends in research data management, with scientific integrity and quality as the backdrop.

The sessions, led by collaborators from the University of Barcelona (UB), the Institute of Economic Analysis of the CSIC (IAE-CSIC) and the Barcelona School of Economics (BSE), delved into philosophical, legal and practical aspects to give a holistic answer to questions such as: Why do we have to reproduce scientific results? Do we know how to handle sensitive data? How do we plan data management?

You may be wondering why such intensive training was necessary and what Plato’s cave has to do with it. Let’s take a sneak peek:

  • Most scientific results cannot be reproduced. We will delve into the causes and consequences of this phenomenon.
  • Is there a solution to this crisis? Hint: self-criticism.

Hold on, let us explain!

The irreproducibility of science

As Oriol Pujol, professor in the Department of Mathematics and Computer Science at the UB and a collaborator of the course, reminded us, scientific knowledge is built on conclusions that rest on, and can be verified through, reproducible results, which hold regardless of who reports or checks them. Unfortunately, most researchers are unable to reproduce the scientific results published by their colleagues, and more than half cannot even reproduce their own. This is the conclusion of a study published in 2016 describing a phenomenon whose roots go back to the mid-20th century: the reproducibility crisis.

Fueled by academic pressure to publish, failures in the peer-review system, and editorial bias that filters out negative or unflattering results, the accumulation of erroneous and distorted scientific literature is leading us to build a false knowledge base (i.e., back to the cave). For example, between 1996 and 2010 several articles on bee navigation were published that relied on duplicated and manipulated data and on calculation errors. These articles, published in journals such as Science, PNAS, and PLOS Biology, were cited more than 1,000 times. María Ángeles Oviedo-García, professor in the Department of Business Administration and Marketing at the University of Seville, is blunt: “Other researchers will base their research on this false information, which is terrifying.” All of this undermines the credibility of science and fuels denialist narratives.


In flames. Source: KC Green

The watchdogs of science

The prestige of so-called scientific luminaries and of the journals in which they publish dazzles and captivates, but the brass beneath the gilding is starting to show, thanks to the growth of post-publication review and forensic metascience. In 2023, more than 10,000 scientific articles had to be retracted, of which 8,000 were published by the Wiley publishing house. Every day, watchdogs like RetractionWatch reveal cases of fabricated data, cherry-picking, the buying and selling of authorship, non-existent quality controls, and more. These practices are not unknown even among influential researchers who reach senior positions, publish in journals like Science or Nature, and make headlines in the media. One such case is that of a researcher in research ethics who, ironically, falsified data. As Oriol emphasized in his session, “Claiming that something is true based on our expertise is not enough.” Paraphrasing him, we could say that claiming that something is true because it was published in a supposedly high-impact or renowned journal is not enough either.


Stating that something is true based on our expertise is not enough.

Oriol Pujol

Obviously, not everything is fraud or malpractice; there are also errors and unconscious biases. For example, according to a study with ecologists, researchers can reach opposing conclusions when faced with the same data, which is explained by the subjective decisions each one makes during the analysis. This only reinforces the Russian proverb “Trust, but verify”, since retractions do not solve the problem: fewer than 5% of retracted articles are flagged as such.

Reproduce vs. replicate

When we talk about confirming the results of a scientific study, we must distinguish two concepts: reproducibility and replicability. Reproducibility refers to the possibility of using the same data and methods to reach the same result. Replicability, on the other hand, means reaching the same or a similar result (within a reasonable margin of difference) using new data and new methods, new data with the original method, or the original data with a new method. For instance, rerunning the authors’ deposited code on their deposited data and obtaining the same figures is reproduction; collecting a new dataset and finding the same effect is replication. As Joan Llull, professor of Economics at the IAE-CSIC and the BSE and a collaborator of the course, explained to us, journals verify reproducibility so that the scientific community can pursue replicability. He added: “The impact derives from the fact that others can build on what we have investigated. That is why we must facilitate the replicability of the data with the greatest possible clarity and detail.”


Impact comes from others being able to build on what we have researched. That is why we must facilitate the replicability of the data with the greatest possible clarity and detail.

Joan Llull

To contribute quality knowledge and scientific impact, and to help ourselves, Joan taught us the basic rules for creating a reproducibility package that follows a data and code availability standard. From sharing raw data to providing detailed documentation, the fundamental rule is to be empathetic: we must do everything possible so that the users of our reproducibility package understand what we did.
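To make this concrete, here is a minimal sketch of what the entry point of such a package could look like. It is our own illustration, not material from the course: the file names (data/raw/measurements.csv, results/) and the cleaning steps are hypothetical, and the point is simply that a single documented script should take the raw data all the way to every published output.

```python
"""run_all.py -- hypothetical entry point of a reproducibility package.

Assumed (illustrative) layout:
    data/raw/measurements.csv   untouched raw data, exactly as collected
    results/                    regenerated in full by running this script
"""
from pathlib import Path

import pandas as pd

RAW = Path("data/raw/measurements.csv")
RESULTS = Path("results")


def main() -> None:
    RESULTS.mkdir(exist_ok=True)

    # 1. Load the raw data without any manual preprocessing.
    df = pd.read_csv(RAW)

    # 2. Apply the documented cleaning steps (here, simply dropping incomplete rows).
    clean = df.dropna()
    clean.to_csv(RESULTS / "clean_data.csv", index=False)

    # 3. Recompute the summary statistics reported in the paper.
    clean.describe().to_csv(RESULTS / "summary_table.csv")


if __name__ == "__main__":
    main()
```

Whatever the language, the test is the same: someone with only the package and its README should be able to regenerate every figure and table.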

The spectrum of open science

With the mantra “As open as possible, as closed as necessary”, Open Science encourages us to make research accessible for reuse in balance with what needs to be protected. How can we put this into practice at CREAF?

For example, geographic coordinates or satellite images of critical habitats, or records of the presence of threatened species, are data of scientific interest but also a gateway to activities that endanger conservation, such as illegal logging or poaching. One way to take care of this sensitive data without limiting its reuse is to deposit it in a trusted repository (such as the CORA RDR) with restricted access and open metadata.
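As a toy illustration of the same idea (ours, not a CREAF or CORA RDR procedure), the exact coordinates of a sensitive record can stay in the restricted deposit while only a generalized, coarser version travels with the open metadata:

```python
# Illustrative sketch: keep the exact location under restricted access and
# publish only a generalized coordinate. Species name and coordinates are invented.

def generalize(value: float, grid_deg: float = 0.1) -> float:
    """Snap a coordinate to a coarser grid (about 11 km at 0.1 degrees)."""
    return round(round(value / grid_deg) * grid_deg, 4)


exact_record = {"species": "hypothetical threatened orchid", "lat": 41.5012, "lon": 2.1034}

open_record = dict(exact_record)
open_record["lat"] = generalize(exact_record["lat"])
open_record["lon"] = generalize(exact_record["lon"])

print(open_record)  # {'species': ..., 'lat': 41.5, 'lon': 2.1}
```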

There are also specific techniques and software to identify and anonymize personal data or to manage consent. As a center with close ties to society, CREAF processes data from volunteers and end users of citizen science and co-creation projects, partners, donors, newsletter subscribers, and others. In this regard, Ruben Ortiz, former Data Protection Officer of the UB and a collaborator of the course, debunked a myth: “It is not true that data protection regulations prevent us from doing anything. What the regulation wants is for data to be used, to circulate, but within certain limits of security and trust.” One of the course attendees, Agustí Escobar, reflects that data protection laws are more complex than they seem, and that some aspects which may initially look like arbitrary impositions make perfect sense once they are well argued. To navigate this complexity, CREAF has the Data Protection Officer’s inquiry mailbox: dpo@creaf.uab.cat
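By way of example (this is our own sketch, not a tool presented in the course), one common technique is pseudonymisation: replacing a direct identifier such as an e-mail address with a salted, non-reversible code, so that a volunteer’s observations can still be linked together without storing who they belong to. How the salt is kept and who may reverse-link identities are, of course, questions for the DPO.

```python
# Minimal pseudonymisation sketch; names and the salt are hypothetical.
import hashlib
import hmac

SECRET_SALT = b"store-me-outside-the-shared-dataset"  # kept separately, never published


def pseudonymise(email: str) -> str:
    """Return a stable, non-reversible identifier derived from an e-mail address."""
    return hmac.new(SECRET_SALT, email.strip().lower().encode(), hashlib.sha256).hexdigest()[:16]


observation = {"volunteer": "jane.doe@example.org", "species": "Parus major", "count": 3}
observation["volunteer"] = pseudonymise(observation["volunteer"])

print(observation)  # the e-mail address itself never reaches the shared dataset
```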

"It's not true that data protection regulations prevent us from doing anything. What the regulation wants is for data to be used, to circulate, but within limits of security and trust," said Ruben Ortiz from the University of Barcelona.

Other cases that require special protection have to do with intellectual property. It is key to assign licenses to our creations in order to protect authorship and indicate what uses are allowed. In addition, the publication of results from a potentially patentable invention must be planned so that it does not get ahead of the patent application process.

Finally, there are alternatives such as publishing synthetic or simulated data, or using Data Sharing Agreements (DSAs), which allow data exchange under clauses on intellectual property rights, purpose, restrictions, security and so on, to prevent misuse and unauthorized dissemination.
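As a very small illustration of the first alternative (again our sketch, not a course recipe), synthetic data mimics the statistical shape of the real values so that collaborators can develop and test analyses without ever seeing the originals:

```python
# Toy synthetic-data sketch: the real counts below are invented placeholders.
import random
import statistics

real_counts = [3, 5, 2, 8, 4, 6]  # hypothetical sensitive measurements

mu = statistics.mean(real_counts)
sigma = statistics.stdev(real_counts)

random.seed(42)  # make the synthetic sample reproducible
synthetic_counts = [max(0, round(random.gauss(mu, sigma))) for _ in real_counts]

print(synthetic_counts)  # same size and similar spread, but the originals stay private
```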

The recipe for a data management plan

Now that we know what we need to ensure that our results can be reproduced and protected, let’s roll up our sleeves with the Data Management Plan (DMP).

A DMP is a living document that helps us organize our research data from start to finish. Like a cooking recipe, a DMP includes the following (a schematic example follows the list):

  • Ingredients: description of the data that we will collect or reuse.
  • Preparation instructions: indications of the methods, tools and standards that we will use to collect and process the data with integrity.
  • Storage and retention: details on how to share or store data securely and how and for how long it will be accessible.
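To make the recipe tangible, here is a skeletal DMP expressed as a small Python structure. The field names and example entries are our own illustration, not an official template; funders and institutional DMP platforms have their own forms.

```python
# Schematic DMP skeleton; every entry is an illustrative placeholder.
import json

dmp = {
    "ingredients": {
        "data_collected": ["field observations (CSV)", "satellite imagery (GeoTIFF)"],
        "data_reused": ["hypothetical open-repository dataset, CC BY 4.0"],
    },
    "preparation": {
        "methods_and_tools": ["cleaning scripts under version control", "instrument calibration log"],
        "standards": ["community metadata standard for occurrence records"],
    },
    "storage_and_retention": {
        "repository": "trusted repository, restricted access where needed, open metadata",
        "retention_years": 10,
        "access": "as open as possible, as closed as necessary",
    },
}

print(json.dumps(dmp, indent=2, ensure_ascii=False))
```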

Cooked data? No thanks, I prefer it raw!


Modified still from the film The Lord of the Rings. Source: tooomanysteves

The twenty participants in the training, including research and management staff, learned that carrying out scientific practice with ethics and rigor requires us to question our own processes. This exercise does not devalue or delegitimize science; rather, it improves its accuracy, efficiency and utility.

Meritxell Batalla and Agustí Escobar, research technicians at CREAF, value the course and agree on the importance of taking good care of the data that is handled and managed in the scientific world. Agustí comments that “it is something that can involve additional work, but it brings very tangible benefits in the long run”. Meritxell adds: “If I had been aware of this when I started, I would have saved myself a lot of extra effort later on!” For her part, Laura Force, a Citizen Science and Environmental Education technician at CREAF, reflects that “the training has helped us, even more, to reinforce the fact that when people outside CREAF enter the scene, such as volunteers in citizen science projects, we must take special care of their personal data and of the research data they generate”.

Plato’s allegory of the cave reminds us that open science, with its obsession with quality, responsibility and collaboration, breaks the chains that restrict access to knowledge.