DataVault – larger deposits and new review process notifications

New deposit size limit: 10TB

Great news for DataVault users: you can now deposit up to a whopping ten terabytes in a single deposit in the Edinburgh DataVault! That’s five times greater than the previous deposit limit, saving you time that might have been wasted splitting your data artificially and making multiple deposits.

It’s still a good idea to divide up your data into deposits that correspond well to whatever subsets of the dataset you and your colleagues are likely to want to retrieve at any one time. That’s because you can only retrieve a single deposit in its entirety; you cannot select individual files in the deposit to retrieve. Smaller deposits are quicker to retrieve. And remember you’ll need enough space for the retrieved data to arrive in.

We’ve made some performance improvements thanks to our brilliant technical team, so depositing now goes significantly faster. Nonetheless, please bear in mind that any deposit of multiple terabytes will probably take several days to complete (depending on how many deposits are queueing and some characteristics of the fileset), because the DataVault needs time to encrypt the data and store it on the tape archives and into the cloud. Remember not to delete your original copy from your working area on DataStore until you receive our email confirming that the deposit has completed!

And you can archive as many deposits as you like into a vault, as long as you have the resources to pay the bill when we send you the eIT!

A reminder on how to structure your data:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/prepare-datavault/structure
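If you would like a rough check of whether a folder is likely to fit in a single deposit before you start, a short script can add up the file sizes for you. The following is only a minimal sketch and not an official DataVault tool: the function names are hypothetical, and it assumes the 10TB limit means decimal terabytes (10 × 10^12 bytes), which may not match how the DataVault itself counts.

```python
import os

# Assumption: 10 TB is read as decimal terabytes; the DataVault may count differently.
DEPOSIT_LIMIT_BYTES = 10 * 10**12


def folder_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fits_in_one_deposit(path, limit=DEPOSIT_LIMIT_BYTES):
    """Return True if the folder's total size is within the limit (hypothetical helper)."""
    return folder_size_bytes(path) <= limit
```

Running `fits_in_one_deposit` on each candidate subfolder gives a quick indication of which ones are small enough to deposit individually. Remember that a retrieval will need at least that much free space in your working area too.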

Ensuring good stewardship of your data through the review process

Another great feature that’s now up and running is the review process notification system, and the accompanying dashboard which allows the curators to implement decisions about retaining or deleting data.

If you are a vault owner, you should receive an email when the chosen review date is six months away, seeking your involvement in the review process. The email will provide you with the information you need about when the funder’s minimum retention period (if there is one) expires, and how to access the vault. Don’t worry if you think you might have moved on by then; the system is designed to allow the University to implement good stewardship of all the data vaults, even when the Principal Investigator (PI) is no longer contactable. Our curators use a review dashboard to see all vaults whose review dates are approaching, and who the Nominated Data Managers (NDMs) are. In the absence of the Owner, the system notifies the NDMs instead. We will consult with the NDMs or the School about the vault, to ensure all deposits that should be deleted are deleted in good time, and all deposits that should be kept longer are kept safe and sound and still accessible to all authorised users.

DataVault Review Process:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/review-process 

The new max. deposit size of 10 TB is equivalent to over five million images of around 2 MB each – that’s one selfie for every person in Scotland. Image: A selfie on the cliffs at Bell Hill, St Abbs
cc-by-sa/2.0 – © Walter Baxter – geograph.org.uk/p/5967905

Pauline Ward
Research Data Support Assistant
Library & University Collections

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is that of reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions which can be relied upon to actually help humans. Other challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.

A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring/processing inconsistent file formats between health boards, causing me to wonder if it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else? Emily earlier noted a confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague Digital Research Services Lead Facilitator Lisa Otty who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance were iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data and keeping it useful. A question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? / is it useful? In many cases, ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction between the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings is hugely beneficial to linguistic scholarship: compared with transcriptions and phonetic renderings, recordings are much closer to the data source (what Kant would call the thing-in-itself, das Ding an sich, as opposed to the phenomenon, the thing as it appears to an observer). Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. By providing access to our workings and the evidence behind publications, and by joining these things up (as Ewan McAndrew described, linked data is key, and is the fifth star in the aforementioned 5-star Open Data scheme), Open Science and all its various constituent parts support this goal, which is after all one of the goals of research and of scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – Open Data and Open Science can be transformative. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day are published on MediaHopper.

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh

EPSRC Expectations Awareness Survey

As many of you will already know, EPSRC set out its research data management (RDM) expectations for institutions in receipt of EPSRC grant funding in May 2011; these included the development of an institutional ‘Roadmap’. EPSRC assessment of compliance with these expectations will begin on 1 May 2015 for research outputs published on or after that date.

In order to comply with EPSRC expectations and to implement the University’s RDM Policy, the University of Edinburgh has invested significantly in RDM services, infrastructure (incl. storage and security) and support as detailed in the University of Edinburgh’s RDM Roadmap.

In an effort to gauge the University of Edinburgh’s ‘readiness’ in relation to EPSRC’s RDM expectations, we are conducting a short survey of EPSRC grant holders.

The survey aims to find out more about researcher awareness of those expectations concerning the management and provision of access to EPSRC-funded research data as detailed in the EPSRC Policy Framework on Research Data.

We aim to conduct follow-up interviews with EPSRC grant holders who are willing to talk through these issues in a bit more detail to help shape the development of the RDM services at the University of Edinburgh.

We will endeavour to make available some of our findings shortly. In the meantime, if you want to use or refer to our survey, we have posted a ‘demo’ version below:
https://edinburgh.onlinesurveys.ac.uk/epsrc-expectations-awareness-demo

Should you decide to make use of our survey, let us know, as we can potentially share our data with each other to benchmark our progress.

(As an aside, Oxford University have crafted a useful data decision tree for EPSRC-funded researchers at Oxford.)

Regards
Stuart Macdonald
RDM Services Coordinator
stuart.macdonald@ed.ac.uk

Update: A link to the findings can be found at: http://datablog.is.ed.ac.uk/files/2016/07/EPSRC-RDM-Expectations-Awareness-Survey-Findings.pdf
