Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions that can be relied upon to actually help humans. These challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different from the status quo. Indeed, Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.

A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring and processing inconsistent file formats between health boards, causing me to wonder whether it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else. Emily earlier noted a confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague, Digital Research Services Lead Facilitator Lisa Otty, who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance are iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data while keeping it useful. A question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? Is it useful? Ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction from the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings, which compared with transcription and phonetic renderings are much closer to the data source (what Kant would call the thing-in-itself (das Ding an sich) as opposed to the phenomenon, the thing as it appears to an observer) is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Show, don’t tell.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. We support this by providing access to our workings and the evidence behind publications, and by joining these things up – as Ewan McAndrew described, linked data is key (this is the fifth star in the aforementioned 5-star Open Data scheme). Open Science, and all its various constituent parts, supports this goal, which is after all one of the goals of research and scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – and so can Open Data and Open Science. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day will be published on MediaHopper in the coming weeks.

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh


New research data storage

The latest BITS magazine for University of Edinburgh staff (Issue 8, Autumn/Winter 2013) contains a lead article on new data storage facilities that Information Services have recently procured and will be making available to researchers for their research data management.

“The arrival of the RDM storage and its imminent roll out is an exciting step in the development of our new set of services under the Research Data Management banner. Ensuring that the service we deploy is fair, useful and transparent are key principles for the IS team.” – John Scally

 

Information Services is very pleased to announce that our new Research Data Storage hardware has been safely delivered.

Following a competitive procurement process, a range of suppliers were selected to provide the various parts of the infrastructure, including Dell, NetApp, Brocade and Cisco. The bulk of the order was assembled over the summer in China and shipped to the King’s Buildings campus at the end of August. Since then IT Infrastructure staff have been installing, testing and preparing the storage for roll-out.

How good is the storage?
Information Services recognises the importance of the University’s research data and has procured enterprise-class storage infrastructure to underpin the programme of Research Data services. The infrastructure ranges from the highest class of flash storage (delivering 375,000 IO operations per second) to 1.6PB (1 Petabyte = 1,024 Terabytes) of bulk storage arrays. The data in the Research Data Management (RDM) file-store is automatically replicated to an off-site disaster facility and also backed up with a 60-day retention period, with 10 days of file history visible online.

Who qualifies for an allocation?
Every active researcher in the University! This is an agreement between the University and the researcher to provide quality active data storage, service support and long term curation for researchers. This is for all researchers, not just Principal Investigators or those in receipt of external grants to fund research.

When do I get my allocation?
We are planning to roll out to early adopter Schools and institutes in late November this year. This is dependent on all of the quality checks and performance testing on the system being completed successfully; however, confidence is high that the deadline will be met.
The early adopters for the initial service roll-out are: the School of GeoSciences; the School of Philosophy, Psychology and Language Sciences; and the Centre for Population Health Sciences. Phased roll-out to all areas of the University will follow.

How much free allocation will I receive?
The University has committed 0.5TB (500GB) of high quality storage with guaranteed backup and resilience to every active researcher. The important principle at work is that the 0.5TB is for the individual researcher to use primarily to store their active research data. This ensures that they can work in a high quality and resilient environment and, hopefully, move valuable data from potentially unstable local drives. Research groups and Schools will be encouraged to pool their allocations in order to facilitate shared data management and collaboration.

This formula was developed in close consultation with College and School representatives; however, there will be discipline differences in how much storage is required and individual need will not be uniform. A degree of flexibility will be built into the allocation model and roll-out, though if researchers go over their 0.5TB free allocation they will have to pay.
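As an illustrative back-of-the-envelope check (not an official capacity plan – actual provisioning will depend on replication, backup and pooling overheads), the figures quoted above can be combined to estimate how many free 0.5TB allocations the 1.6PB bulk tier could notionally hold:

```python
# Back-of-the-envelope sizing using the figures quoted in the article.
TB_PER_PB = 1024                     # 1 Petabyte = 1,024 Terabytes, as stated above
bulk_storage_tb = 1.6 * TB_PER_PB    # 1.6PB of bulk storage arrays
allocation_tb = 0.5                  # free allocation per active researcher

max_allocations = int(bulk_storage_tb / allocation_tb)
print(max_allocations)  # prints 3276
```

In other words, the bulk tier alone could notionally accommodate over three thousand researchers at the free allocation level, before any pooling of allocations by research groups or Schools.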

Why is the University doing this?
The storage roll-out is one component of a suite of existing and planned services known as our Research Data Management Initiative. An awareness-raising campaign accompanies the storage allocation to Schools, units and individuals to encourage best practice in research data management planning and sharing.

Research Data Management support services:
www.ed.ac.uk/is/data-management

University’s Research Data Management Policy:
www.ed.ac.uk/is/research-data-policy

BITS magazine (Issue 8, Autumn/Winter 2013)
http://www.ed.ac.uk/schools-departments/information-services/about/news/edinburgh-bits


RDM reflection – finishing the data life cycle

Research Data Management and I were a chance acquaintance. I was asked to stand in for one of the steering group members, despite having only tenuous qualifications for the role. That said, I quickly realised that it is an important and complex initiative, and one in which our University is leading.

Progressing with RDM in the University is not straightforward but it is essential.

This reflection could go off on many tracks, but it will concentrate on one: finishing the data life cycle.

If we consider in a very simplistic way the funding of a researcher, it might look like this:

The point at which data should transfer to Data Stewardship may coincide with higher priorities for the researcher.

A big hurdle that RDM has to cross is the final point of data transition. The data manager wants to see data moved into Data Stewardship. The researcher’s priorities are publication and the next grant application. The result:

Data will not flow easily from stage 2 (Active Data Management) to stage 3 (Data Stewardship).

Of course, a researcher and a data manager may look at the above diagram and say it is wrong. They will see solutions. And when they do, this reflection will have succeeded in communicating what it needed to say.

James Jarvis, Senior Computing Officer

IS User Services Division
