DataVault – larger deposits and new review process notifications

New deposit size limit: 10 TB

Great news for DataVault users: you can now deposit up to a whopping ten terabytes in a single deposit in the Edinburgh DataVault! That’s five times greater than the previous deposit limit, saving you time that might have been wasted splitting your data artificially and making multiple deposits.

It’s still a good idea to divide up your data into deposits that correspond well to whatever subsets of the dataset you and your colleagues are likely to want to retrieve at any one time. That’s because you can only retrieve a single deposit in its entirety; you cannot select individual files in the deposit to retrieve. Smaller deposits are quicker to retrieve. And remember you’ll need enough space for the retrieved data to arrive in.
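If you want to check in advance whether a folder will fit into a single deposit, something like the following Python sketch could help. It is only an illustration, not part of the DataVault: the function names are made up for this example, and it assumes the 10 TB limit is counted in decimal terabytes (10 × 10^12 bytes), which is the more cautious reading.

```python
import os

# Assumption: the 10 TB limit is read as decimal terabytes (10 * 10**12 bytes).
DEPOSIT_LIMIT_BYTES = 10 * 10**12


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that linked data is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_one_deposit(size_bytes, limit_bytes=DEPOSIT_LIMIT_BYTES):
    """Return True if a fileset of size_bytes is within the deposit limit."""
    return size_bytes <= limit_bytes


# Example with a hypothetical path:
# size = directory_size_bytes("/path/to/my/dataset")
# print(f"{size / 10**12:.3f} TB, fits: {fits_in_one_deposit(size)}")
```

The same total also tells you roughly how much free space you would need in order to retrieve that deposit later.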

We’ve made some performance improvements thanks to our brilliant technical team, so depositing now goes significantly faster. Nonetheless, please bear in mind that any deposit of multiple terabytes will probably take several days to complete (depending on how many deposits are queueing and some characteristics of the fileset), because the DataVault needs time to encrypt the data and store it on the tape archives and in the cloud. Remember not to delete your original copy from your working area on DataStore until you receive our email confirming that the deposit has completed!

And you can archive as many deposits as you like into a vault, as long as you have the resources to pay the bill when we send you the eIT!

A reminder on how to structure your data:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/prepare-datavault/structure

Ensuring good stewardship of your data through the review process

Another great feature that’s now up and running is the review process notification system, and the accompanying dashboard which allows the curators to implement decisions about retaining or deleting data.

As a vault owner, you should receive an email six months before the chosen review date, asking you to take part in the review process. The email will give you the information you need about when the funder’s minimum retention period (if there is one) expires, and how to access the vault. Don’t worry if you think you might have moved on by then; the system is designed to allow the University to implement good stewardship of all the data vaults, even when the Principal Investigator (PI) is no longer contactable. Our curators use a review dashboard to see all vaults whose review dates are approaching, and who the Nominated Data Managers (NDMs) are. In the absence of the Owner, the system notifies the NDMs instead. We will consult with the NDMs or the School about the vault, to ensure that all deposits that should be deleted are deleted in good time, and that all deposits that should be kept longer are kept safe and sound and still accessible to all authorised users.

DataVault Review Process:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/review-process 

The new max. deposit size of 10 TB is equivalent to over five million images of around 2 MB each – that’s one selfie for every person in Scotland. Image: A selfie on the cliffs at Bell Hill, St Abbs
cc-by-sa/2.0 – © Walter Baxter – geograph.org.uk/p/5967905
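For anyone who wants to check the selfie arithmetic, here is a quick Python sketch. Whether the figure comes out at exactly or just over five million depends on how the units are counted, so both readings are shown; the choice of units is an assumption made for this example, not a statement about how the DataVault measures deposits.

```python
# Assumption: "binary" units use powers of 1024, "decimal" units powers of 1000.
BINARY_TB = 1024**4
BINARY_MB = 1024**2
DECIMAL_TB = 10**12
DECIMAL_MB = 10**6


def images_per_deposit(deposit_bytes, image_bytes):
    """Return how many whole images of image_bytes fit into deposit_bytes."""
    return deposit_bytes // image_bytes


binary_count = images_per_deposit(10 * BINARY_TB, 2 * BINARY_MB)
decimal_count = images_per_deposit(10 * DECIMAL_TB, 2 * DECIMAL_MB)

print(binary_count)   # 5242880
print(decimal_count)  # 5000000
```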

Pauline Ward
Research Data Support Assistant
Library & University Collections

Research Data Workshops: Sensitive Data Challenges and Solutions

This workshop at the Bioquarter was attended by 27 research staff representing all three colleges, with the majority from Medicine and Veterinary Medicine. It began with an introductory presentation from Robin Rice covering the new Data Safe Haven facility of the Research Data Service, and was followed by brief presentations from Lynne Forrest (Research Support Officer on Scottish Longitudinal Study); Fiona Strachan (Clinical Research Manager, Centre for Cardiovascular Science); and Jonathan Crook (Professor of Business Economics). Each speaker shared their experiences of both conducting research using sensitive data and supporting other researchers. Although they work with very different types of data, it was easy to identify certain common requirements:

  • Easy access to secure data storage and analysis platforms;
  • Consistent and comprehensive training and guidance on working with sensitive data;
  • Support to meet the necessary requirements to gain access to the data they need.

In the discussion groups that followed, participants were asked about their experiences of working with sensitive data, what researchers need services such as data safe havens to provide, and the ramifications of the cost recovery model for including costs in grant proposals.

The major themes that emerged were training, data governance, and meeting the costs of protecting sensitive data. There was a strong feeling that more and better training was required for all those working with sensitive data. There was also confusion about the number, location, and criteria of the different Data Safe Havens now available, with no single place to find clear information on these.

When talking specifically about the Data Safe Haven offered by IS for UoE researchers, the biggest concern was around cost. The standard price was considered high for the majority of grants, which are either small or need to be highly competitive. In some disciplines grant funding is not common, so it is unclear how the costs could be met. The Research Data Service representatives encouraged people to get a bespoke quote and discuss requirements with the team as early as possible, as flexibility on both cost and build specifications (e.g. high performance computing) is built in.

Some specific points arising from the discussions were:

  • One negative experience about working with sensitive data is the length of time needed to get data approvals (e.g. from NHS bodies). Participants wondered if the University could help to speed those up.
  • Participants wanted more training in sensitive data management, and better ways to structure training for students.
  • Learning outcomes need to focus on change of behaviour, with a focus on local procedures.
  • One participant felt that schools need a researcher portfolio system, some way of keeping track of who has what data. A suggestion was made to have an asset manager in the university, similar to the one in NHS.
  • Less than optimal security practices can be observed, such as leaving a clinical notebook in a coffee room. More training is needed, but this is not fully covered in either clinical practice courses or ethics courses.
  • There were concerns around data governance – how to set up gatekeepers for research projects using the Data Safe Haven, how long to store things in the DataVault. ACCORD was pointed to for having good structure in data governance.
  • Long-running projects (e.g. ten years) would have trouble meeting the annual costs.
  • Projects are invested in locally run services and expertise; added-value centralised services need to be low-cost.

Overall, researchers were in favour of having a Data Safe Haven available for projects that need it, but they would also like support to correctly anonymise and manage their data so that they can continue to use standard data storage and analysis platforms. This would mean that only those with the most sensitive data would need to rely upon the UoE DSH to conduct their research.

Those with a University log-in may read the full set of notes on the RDM wiki.

Kerry Miller
Research Data Support Officer
Library & University Collections

Research Data Training – Semester 1

*UPDATE* – We have just added two new and exciting courses to our training schedule:

  • Assessing Disclosure Risk in Quantitative Data (RDS006)
  • Assessing Data Quality in Quantitative Data (RDS007)

To find out more about these courses just visit our training page.

Each semester the Research Data Support team puts together a training programme for researchers and research support staff in all schools, and at all points in their careers. Our programme this year introduces a number of new courses, including one designed especially for undergraduates planning their final year dissertation. We have also reviewed and refreshed all of our existing courses to ensure that they are not only up-to-date but also more engaging and interactive.

Full course list:

  • Realising the Benefits of Good Research Data Management (RDS001)
  • Writing a Data Management Plan for your Research (RDS002)
  • Working with Personal and Sensitive Data (RDS003)
  • Data Cleaning with OpenRefine (RDS004)
  • Handling Data Using SPSS (RDS005)
  • Assessing Disclosure Risk in Quantitative Data (RDS006)
  • Assessing Data Quality in Quantitative Data (RDS007)
  • Data Mindfulness: Making the Most of your Dissertation (RDS009)
  • Introduction to Visualising Data in ArcGIS (RDS011)
  • Introduction to Visualising Data in QGIS (RDS012)

Full details of all these courses, with direct booking links, can be found on our training webpage https://www.ed.ac.uk/information-services/research-support/research-data-service/training

Courses can also be found and booked via the MyEd Events page.

We are always happy to deliver tailored versions of these courses suitable for a specific school, institute or discipline. Just contact us at data-support@ed.ac.uk to let us know what you need!

Kerry Miller
Research Data Support Officer
Library and University Collections

Updated MANTRA content: Research data in context

The Research Data Support team is pleased to announce the launch of the first in a series of updates to MANTRA, the free and open online research data management training course.

The first updated module ‘Research data in context’ (previously ‘Research data explained’) is now live on the MANTRA site and provides an introduction to research data, alongside detail on the contexts in which data are generated, and the challenges presented by big data in society.

MANTRA is designed to give postgraduate students, early career researchers, and information professionals the knowledge and skills needed to work effectively with research data.

Since launching in 2011, MANTRA has been through a number of significant rewrites to keep up with current trends, and over 10,000 different learners have visited MANTRA in the last academic year.

The ‘Research data in context’ module has been substantially revised in order to:

  • remove dated and obsolete content;
  • simplify and improve the readability of existing material;
  • add information on data literacy and data science.

The changes in this module include:

  • Revised pages: Introduction; Why is research data management important?; What are data?; What are research data?; Data as research output; Module Summary; Next & further reading.
  • New pages: Data in society; Data Science; Video: machine learning; Data literacy and skills.

A change log detailing all changes in this release is available on request from the Research Data Support team (data-support@ed.ac.uk).

We hope you find this update interesting and useful and welcome any feedback you may have.

Further MANTRA updates are forthcoming, focusing on FAIR data and newer data protection legislation, and we will announce these in future blog posts.

Bob Sanders
Research Data Support
