About Martin Donnelly

Research Data Support Manager, Library and University Collections, University of Edinburgh

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is that of reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions which can be relied upon to actually help humans. Other challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.


A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring and processing inconsistent file formats between health boards, causing me to wonder whether it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else. Emily earlier noted confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague Digital Research Services Lead Facilitator Lisa Otty, who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance are iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data while keeping it useful. A question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? Is it useful? Ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction between the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings – which, compared with transcriptions and phonetic renderings, are much closer to the data source (what Kant would call the thing-in-itself, das Ding an sich, as opposed to the phenomenon, the thing as it appears to an observer) – is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. We support this by providing access to our workings and the evidence behind publications, and by joining these things up – as Ewan McAndrew described, linked data is key (this is the fifth star in the aforementioned 5-star Open Data scheme). Open Science, and all its various constituent parts, supports this goal, which is after all one of the goals of research and of scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – and Open Data and Open Science can be transformative too. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day will be published on MediaHopper in the coming weeks.


Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh


Greater Expectations? Writing and supporting Data Management Plans

“A blueprint for what you’re going to do”

This series of videos was arranged before I joined the Research Data Service team, otherwise I’d no doubt have had plenty to say myself on a range of data-related topics! But the release today of this video – “How making a Data Management Plan can help you” – provides an opportunity to offer a few thoughts and reflections on the purpose and benefits of data management planning (DMP), along with the support that we offer here at Edinburgh.


“Win that funding”

We have started to hear anecdotal tales of projects being denied funding due – in part at least – to inadequate or inappropriate data management plans. While these stories remain relatively rare, the direction of travel is clear: we are moving towards greater expectations, more scrutiny, and ultimately the risk of sanctions for failing to manage and share data in line with funder policies and community standards. As Niamh Moore puts it, various stakeholders are paying “much more attention to data management”. From the researcher’s point of view this ‘new normal’ is a significant change, requiring a transition that we should not underestimate. The Research Data Service exists to support researchers through that transition: normalising research data management (RDM), embedding it as a core scholarly norm and competency, developing skills and awareness, and building broader comfort zones as researchers adjust to these new expectations.

“Put the time in…”

My colleague Robin Rice mentions the various types of data management planning support available to Edinburgh’s research community, citing the online self-directed MANTRA training module, our tailored version of the DCC’s DMPonline tool, and bespoke support from experienced staff. Each of these requires an investment of time. MANTRA asks the researcher to take time to work through it, and took the team a considerable amount of time to produce, in order to provide a concise yet wide-ranging grounding in the major constituent strands of RDM. DMPonline took hundreds, probably thousands, of hours of developer time and input from a broad range of stakeholders to reach its current levels of stability, maturity and esteem; that investment has resulted in a tool that makes the process of creating a data management plan much more straightforward for researchers. PhD student Lis is quick to note the direct support that she was able to draw upon from the Research Data Service staff at the University, citing quick response times, fluent communication, and ongoing support as the plan evolves and responds to change. Each of these is an example of spending time to save time – not quite Dusty Springfield’s “taking time to make time”, but not a million miles away.

There is a cost to all of this, of course, and we should be under no illusions: we are fortunate at the University of Edinburgh to be in a position to provide and make use of this level of tailored service. We are working towards a goal of RDM-related costs being stably funded to the greatest degree possible, through a combination of project funding and sustained core budget.

“You may not have thought of everything”

Plans are not set in stone. They can, and indeed should, be kept updated in order to reflect reality, and the Horizon 2020 guidelines state that DMPs should be updated “as the implementation of the project progresses and when significant changes occur”, e.g. new data; changes in consortium policies (such as new innovation potential, or a decision to file for a patent); and changes in consortium composition and external factors (such as new consortium members joining or old members leaving).

Essentially, data management planning provides a framework for thinking things through (Niamh uses the term “a series of prompts”, and Lis “a structure”). As Robin says, you won’t necessarily think of everything beforehand – a plan is a living document which will change over time – but the important thing is to document and explain the decisions that are taken, so that others (and your future self is among these others!) can understand your work. A good approach that I’ve seen first-hand while reviewing DMPs for the European Commission is to leave place markers to identify deferred decisions, so that these details are not forgotten. (This is also a good reason for using a template: an empty heading signals an issue that has not yet been addressed, whereas it is deceptively easy to read a free-text DMP and get the sense that everything is in good shape, only to find on more rigorous inspection that important information is missing, or that some responses are ambiguous.)

“Cutting and pasting”

It has often been said that plans are less important than the process of planning, and I have historically been resistant to sharing plans for “benchmarking”, which is often just another word for copying. However, Robin is right to point out that there are some circumstances where copying and pasting boilerplate text makes sense, for example when referring to standard processes or services, where it makes no sense – and indeed can in some cases be unnecessarily risky – to duplicate effort or reinvent the wheel. That said, I would still generally urge researchers to resist the temptation to do too much benchmarking. By all means use standards and cite norms, but also think things through for yourself (and in conjunction with your colleagues, project partners, support staff and other stakeholders) – and take time to communicate with your contemporaries and the future via your data management plan… or record?

“The structure and everything”

Because data management plans are increasingly seen as part of the broader scholarly record, it’s worth concluding with some thoughts on how all of this hangs together. Just as Open Science depends on a variety of Open Things, including publications, data and code, the documentation that enables us to understand it also has multiple strands. Robin talks about the relationship between data management and consent, and as a reviewer it is certainly reassuring to see sample consent agreement forms when assessing data management plans, but other plans and records are also relevant, such as Data Protection Impact Assessments, Software Management Plans and other outputs management processes and products. Ultimately the ideal (and perhaps idealistic) picture is of an interlinked, robust, holistic and transparent record documenting and evidencing all aspects of the research process, explaining rights and supporting re-use, all in the overall service of long-lasting, demonstrably rigorous, highest-quality scholarship.

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh
