Tag Archives: Bibliometrics

How do we make data count?

Data generated through the course of research is as valuable an asset as research publications. Access to research data enables the validation and verification of published results, allows the data to be reused in different ways, helps to prevent duplication of research effort, enables expansion on prior research and therefore increases the returns from investment. Yet the quality and quantity of a researcher’s publications continue to provide the key measure of their research productivity. Sharing data, it seems, still does not count for nearly enough.

In recent years there has been a proliferation of policies strongly encouraging and sometimes even requiring researchers to share their data for the reasons outlined above. This includes policies from governments (e.g. USA, Australia), publishers (e.g. PLOS, Nature), and research funders (e.g. NIH, ARC). These policies are certainly opening up more data but even more research data remains locked away and therefore undiscoverable. So how do we unlock more data? One of the ways is to figure out how to make data count so that researchers have more incentives to undertake the extra (and in the main, unfunded) work required to share their data.

A 2013 study by Heather Piwowar and Todd Vision looked into the link between open data and citation counts. They found that the citation benefit intensified over time: publications from 2004 and 2005 were cited 30 per cent more often if their data was freely available. Every 100 papers with open data prompted 150 “data reuse papers” within five years. Original authors tended to use their data for only two years, but others reused it for up to six years. More studies like this one are needed to demonstrate and track over time the link between opening up data and making it count, in this case in the form of citations, which – like it or not – are still the primary measure of research impact.

Counting data citations – whether to gather citation metrics or alternative metrics (altmetrics) – is challenging in and of itself because data is cited very differently to publications. Data can be cited within an article text rather than in the references section, which means the article must be open access in order for the citation to be discovered. Sometimes the article that referenced the data is cited rather than the data itself, even where the reference applies only to the data. Reference managers don’t tend to recognise datasets and therefore don’t record the Digital Object Identifier (DOI), which creates difficulties since DOIs make it so much easier to track citations. There are also many self-citations, where researchers cite their own data, so it is difficult to pick out the articles that have cited another person’s data. And there are likely to be differences between how data is cited in the sciences as compared to the humanities.
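To make a couple of these difficulties concrete, here is a minimal Python sketch of what counting might involve. It is only an illustration under stated assumptions, not a description of how any citation service actually works: the DOI pattern is deliberately simplified, the self-citation check is a naive name match, and the function names, the article records and the `10.1234/...` identifier are all hypothetical.

```python
import re

# Simplified DOI pattern (an assumption for this sketch; real DOIs can be messier).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return lower-cased DOIs found anywhere in the text, not just the reference list."""
    found = []
    for match in DOI_PATTERN.findall(text):
        # Strip punctuation that usually belongs to the sentence, not the DOI.
        # (This would damage a DOI that genuinely ends in one of these characters.)
        doi = match.rstrip(".,;:)").lower()
        if doi not in found:
            found.append(doi)
    return found


def is_self_citation(article_authors, dataset_creators):
    """Naive check: does any article author share a name with a dataset creator?"""
    authors = {name.strip().lower() for name in article_authors}
    creators = {name.strip().lower() for name in dataset_creators}
    return bool(authors & creators)


def count_citations(articles, dataset_doi, dataset_creators):
    """Return (all citing articles, citing articles by other people)."""
    total = external = 0
    for article in articles:
        if dataset_doi.lower() in find_dois(article["text"]):
            total += 1
            if not is_self_citation(article["authors"], dataset_creators):
                external += 1
    return total, external


# Hypothetical article records for illustration only.
articles = [
    {"authors": ["A. Author"], "text": "We reused our survey data (doi:10.1234/example.dataset.1)."},
    {"authors": ["B. Other"], "text": "Data came from 10.1234/EXAMPLE.dataset.1, collected in 2010."},
    {"authors": ["C. Third"], "text": "This article cites no datasets."},
]
total, external = count_citations(articles, "10.1234/example.dataset.1", ["A. Author"])
```

Even this toy version only works when the full text is readable and the dataset has a DOI that someone bothered to quote, which is exactly where the real-world counting tends to break down.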

Fortunately, the California Digital Library, PLOS and DataONE have partnered in an NSF-funded project called Make Data Count. The project will “design and develop metrics that track and measure data use i.e data-level metrics”. The findings promise to be highly valuable and may also shape future recommendations for the way data should be cited in order for it to be counted.

Sharing impact stories of data reuse is perhaps another way that can help make data count. A number of organisations around the world that promote better data management have been collecting data reuse stories (e.g. DataONE, ANDS). Some researchers may see these stories as a negative because they show that “someone else might get the scoop on ‘my’ data”. But these stories can also inspire researchers to spend the extra effort to make their data available when they feel they are ready to. The rewards may not only be in the metrics but in the unexpected ‘buzz’ of seeing ‘your’ data have a longer life and be reused in ways you had not even imagined. Are there other ways that we can help make data count? It’s worth thinking about because “data sharing is good for science, good for you”.

#Thead5 Dive in and out of communications (multi dimensional)

A little provocation…

Here are a couple of excerpts from The Past, Present & Future of Scholarly Publishing by Michael Eisen of UC Berkeley to kick off discussion ahead of the Sydney Conference:

“And interested members of the public – like many of you – find it difficult to engage with scientific research. Is it any wonder that such a large fraction of the population rejects basic scientific findings when the scientific community thumbs its collective noses at them by making it impossible for them to read about what we’re doing with all of their money?”

“…the only thing that distinguishes a contemporary paper from a 17th century one is the occasional color photograph. The multilayered, hyperlinked structure of the Web was made for scientific communication, and yet papers today are largely dispersed and read as static PDFs – another relic of the days of printed papers. We are working with the community to enable the “paper of the future”, that embeds not only things like movies, but access to raw data and the tools used to analyze them.”

Image: Print Paradigm RIP (CC BY-SA 2.0)

“…while it is a nice idea to imagine peer review as defender of scientific integrity – it isn’t. Flaws in a paper are far more often uncovered after the paper is published than in peer review. And yet, because we have a system that places so much emphasis on where a paper is published, we have no effective way to annotate previously published papers that turn out to be wrong… …So what would be better? The outlines of an ideal system are simple to spell out. There should be no journal hierarchy, only broad journals like PLOS ONE. When papers are submitted to these journals, they should be immediately made available for free online – clearly marked to indicate that they have not yet been reviewed, but there to be used by people in the field capable of deciding on their own if the work is sound and important.”