DataVault – larger deposits and new review process notifications

New deposit size limit: 10 TB

Great news for DataVault users: you can now deposit up to a whopping ten terabytes in a single deposit in the Edinburgh DataVault! That’s five times the previous deposit limit, saving you the time you might otherwise have spent splitting your data artificially into multiple deposits.

It’s still a good idea to divide your data into deposits that correspond to whatever subsets of the dataset you and your colleagues are likely to want to retrieve at any one time. That’s because you can only retrieve a deposit in its entirety; you cannot select individual files within a deposit to retrieve. Smaller deposits are quicker to retrieve. And remember that you’ll need enough free space to receive the retrieved data.
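To make this advice concrete, here is a minimal Python sketch of one way to plan deposits before archiving. It assumes, purely for illustration, that each top-level folder in your working area is a unit your team would retrieve together, and that the 10 TB limit means 10 × 10¹² bytes; the function name `plan_deposits` is hypothetical and is not part of the DataVault.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: "10 TB" read as decimal terabytes (10 * 10**12 bytes).
MAX_DEPOSIT_BYTES = 10 * 10**12


def plan_deposits(files):
    """Group (relative_path, size_in_bytes) pairs into candidate deposits.

    Each top-level folder becomes one candidate deposit; a file at the
    top level forms a group of its own. Returns a dict mapping group name
    to total bytes, plus a sorted list of groups over the size limit.
    """
    totals = defaultdict(int)
    for path, size in files:
        group = PurePosixPath(path).parts[0]  # first path component
        totals[group] += size
    oversized = sorted(g for g, total in totals.items() if total > MAX_DEPOSIT_BYTES)
    return dict(totals), oversized


files = [
    ("run1/a.dat", 4 * 10**12),
    ("run1/b.dat", 3 * 10**12),
    ("run2/c.dat", 12 * 10**12),
    ("docs/readme.txt", 2_000),
]
plan, oversized = plan_deposits(files)
print(plan)       # {'run1': 7000000000000, 'run2': 12000000000000, 'docs': 2000}
print(oversized)  # ['run2']
```

Any group reported as oversized would need splitting into smaller subsets that still make sense to retrieve together, and retrieving any planned deposit needs at least its total size in free space.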

We’ve made some performance improvements thanks to our brilliant technical team, so deposits now complete significantly faster. Nonetheless, please bear in mind that any deposit of multiple terabytes will probably take several days to complete (depending on how many deposits are queued and on characteristics of the file set), because the DataVault needs time to encrypt the data and store it both in the tape archives and in the cloud. Remember not to delete your original copy from your working area on DataStore until you receive our email confirming that the deposit has completed!

And you can archive as many deposits as you like in a vault, as long as you have the resources to pay the bill when we send you the eIT!

A reminder on how to structure your data:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/prepare-datavault/structure

Ensuring good stewardship of your data through the review process

Another great feature that’s now up and running is the review process notification system, and the accompanying dashboard which allows the curators to implement decisions about retaining or deleting data.

As a vault Owner, you should receive an email when the chosen review date is six months away, inviting you to take part in the review process. The email will tell you when the funder’s minimum retention period (if there is one) expires, and how to access the vault. Don’t worry if you think you might have moved on by then; the system is designed to let the University exercise good stewardship of all data vaults, even when the Principal Investigator (PI) is no longer contactable. Our curators use a review dashboard to see all vaults whose review dates are approaching, and who each vault’s Nominated Data Managers (NDMs) are. In the absence of the Owner, the system notifies the NDMs instead. We will consult the NDMs or the School about the vault, to ensure that all deposits that should be deleted are deleted in good time, and that all deposits that should be kept longer remain safe and sound and accessible to all authorised users.
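The notification rule described above can be sketched as follows. This is an illustration, not the DataVault’s actual code: the function names are hypothetical, "six months away" is taken to mean six calendar months (clamped to the end of shorter months), and an Owner who is no longer contactable is modelled as `None`.

```python
import calendar
from datetime import date


def notification_date(review_date: date) -> date:
    """Return the date six calendar months before review_date.

    If the target month is shorter (e.g. 31 August -> February), the day
    is clamped to the last day of that month.
    """
    year, month = review_date.year, review_date.month - 6
    if month < 1:
        month += 12
        year -= 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(review_date.day, last_day))


def recipients(owner, ndms):
    """Notify the Owner if contactable, otherwise the Nominated Data Managers."""
    return [owner] if owner is not None else list(ndms)


print(notification_date(date(2030, 8, 31)))    # 2030-02-28
print(recipients(None, ["ndm1@example.org"]))  # ['ndm1@example.org']
```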

DataVault Review Process:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/review-process 

The new maximum deposit size of 10 TB is equivalent to over five million images of around 2 MB each – that’s one selfie for every person in Scotland. Image: A selfie on the cliffs at Bell Hill, St Abbs
cc-by-sa/2.0 – © Walter Baxter – geograph.org.uk/p/5967905

Pauline Ward
Research Data Support Assistant
Library & University Collections


DataVault user roles let you share access to archived data

The Edinburgh DataVault is a secure long-term retention solution for research data.

Thanks to the hard work of our software developers in the Digital Library and EDINA, the Edinburgh DataVault now supports five different user roles. This means busy PIs can delegate the work of depositing and retrieving data to members of their team or other collaborators within the University. PIs can also nominate support staff to deposit and retrieve data on their behalf, or grant access to new members of their team.

Diagram representing a PI and two postdocs using the roles of Owner and Nominated Data Manager to share access to data in the DataVault

There are five user roles:

  • Data Owner
    Usually the Principal Investigator. Can add or remove other users on their vault(s).
  • Nominated Data Manager (of a given vault)
    Can view and edit metadata fields, deposit data and retrieve any deposit in the vault. May add or remove Depositors on the vault.
  • Depositor (of a given vault)
    Can view the vault contents, deposit data and retrieve any deposit in the vault.
  • School Support Officer
    Acting on behalf of the Head of School, may view all vaults and associated deposits belonging to the School.
  • School Data Manager
    Assigned only with the express permission of the Head of School, may view, deposit into and retrieve data from any vault belonging to the School.
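One way to picture these roles is as a permission table, where vault-level roles apply to a single vault and School-level roles apply to every vault in the School. The Python sketch below is hypothetical (the names `Perm`, `ROLE_PERMS` and `can` are not part of the DataVault) and encodes only the permissions summarised in the list above; the full roles and permissions page remains the authoritative reference.

```python
from enum import Enum, auto


class Perm(Enum):
    VIEW = auto()               # view vault contents / metadata
    EDIT_METADATA = auto()
    DEPOSIT = auto()
    RETRIEVE = auto()
    MANAGE_USERS = auto()       # add/remove other users
    MANAGE_DEPOSITORS = auto()  # add/remove Depositors only


# Only the permissions stated in the summary list; the full table may grant more.
ROLE_PERMS = {
    "Data Owner": {Perm.MANAGE_USERS},
    "Nominated Data Manager": {Perm.VIEW, Perm.EDIT_METADATA, Perm.DEPOSIT,
                               Perm.RETRIEVE, Perm.MANAGE_DEPOSITORS},
    "Depositor": {Perm.VIEW, Perm.DEPOSIT, Perm.RETRIEVE},
    "School Support Officer": {Perm.VIEW},
    "School Data Manager": {Perm.VIEW, Perm.DEPOSIT, Perm.RETRIEVE},
}

# These roles apply to every vault in a School rather than to one vault.
SCHOOL_SCOPED = {"School Support Officer", "School Data Manager"}


def can(assignments, perm, vault_id, vault_school):
    """Check whether any (role, scope) assignment grants perm on a vault.

    scope is a vault id for vault-level roles, or a School name for
    School-level roles.
    """
    for role, scope in assignments:
        target = vault_school if role in SCHOOL_SCOPED else vault_id
        if scope == target and perm in ROLE_PERMS[role]:
            return True
    return False


postdoc = [("Depositor", "vault-42")]
print(can(postdoc, Perm.RETRIEVE, "vault-42", "Physics"))  # True
print(can(postdoc, Perm.RETRIEVE, "vault-7", "Physics"))   # False
```

Storing the scope alongside each role is one simple design choice: the same check then covers both a Depositor on one vault and a School Data Manager across the whole School.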

Full details of the permissions associated with each role:
Roles and permissions

Support staff who need to view reporting data for their School, or admin access to their School’s vaults, should attend our training – Edinburgh DataVault: supporting users archiving their research data.

Further information on why and how to use the DataVault is available on the Research Data Service website:
DataVault long-term retention

If you have any questions about using DataVault please don’t hesitate to contact the Research Data Support team at data-support@ed.ac.uk.

Pauline Ward, Research Data Support Assistant
Library and University Collections
@PaulineData


Research Data Workshops: DataVault Summary

Having soft-launched the DataVault facility in early 2019, the Research Data Support team, with the support of the project board, held five workshops in different colleges and locations to find out what the user community thought of it. This post summarises what we learned from participants, who were split roughly equally between researchers (mainly staff) and support professionals (mainly computing officers based in the Schools and Colleges).

Each workshop began with presentations and a demonstration by Research Data Service staff, explaining the rationale of the DataVault, what it should and should not be used for, how it works, how the University will handle long-term management of data assets deposited in the DataVault, and practicalities such as how to recover costs through grant proposals or get assistance to deposit.

After a networking lunch we held discussion groups covering topics such as the prioritisation of features and functionality, roles such as the University as data asset owner, and the nature of the costs (pricing).

The team was relieved to learn that the majority (albeit from a somewhat self-selecting sample) agreed that the service fulfilled a real need; some data does need to be kept securely for a named period to comply with research funders’ rules, and participants welcomed a centralised platform to do this. The levels of usability and functionality we have managed to reach so far were met with somewhat less approval: clearly the development team has more work to do, and we are glad to have won further funding from the Digital Research Services programme in 2019-2020 in order to do it.

Attitudes toward University ownership of data assets were also a mixed bag; some participants were sceptical and wondered whether researchers would participate in such a scheme, but others found it a realistic option for dealing with staff turnover and the inevitability of data outlasting their owners. Attitudes toward cost were largely accepting (the DataVault provides a cheaper alternative to our baseline DataStore disk storage), but concerns about the safekeeping of legacy and unfunded research data were raised at every workshop.

A sample of points raised follows:

  • Utility? “Everyone I know has everything on OneDrive.”
  • Regarding prioritisation of features: security first; file integrity first; depositing data from sources other than DataStore; facilitating larger deposit sizes; ease of use.
  • Quickness of deposit and retrieval? Quick deposit was deemed more important than quick retrieval.
  • University as data asset owner?
    • Under GDPR the data are already university assets (because the Uni is the data controller).
    • People who manage the data should be close to the research; IT people can manage users but shouldn’t be making decisions about data. There is a danger that, because it’s related to IT, it gets dumped on IT officers. The formal review process helps to ensure decisions will be made properly. Build flexibility into the review hierarchy to allow for variation in School infrastructure.
    • When I heard that I was – not shocked – but concerned. If I move to another university how do I get access? This might be a problem. Researchers might prefer to retain three copies themselves.
  • Is the cost recovery mechanism valid?
    • Vault costs are legitimate costs.
    • Ideally should come from grant overheads; until then, need to charge.
    • Possible to charge for a small/medium/large project at the start rather than per TB?
  • Is the 100 GB threshold sufficient for unfunded research? How else could unfunded or legacy data be covered (who pays)?
    • Alumni sponsor a dataset scheme?
    • There will be people with a ‘whole bunch of data somewhere’ that would be more appropriately stored in DataVault.

The team is grateful to all of the workshop participants for their time and thoughts; the report will be considered further by the project board and the Research Data Service Steering Group members. The full set of workshop notes is colour-coded to show comments from different venues and is available to read on the RDM wiki for anyone with a University login (EASE).


Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections


Research Data Workshop Series 2019

Over the spring of 2019 the Research Data Service (RDS) is holding a series of workshops with the aim of gathering feedback and requirements from our researchers on a number of important Research Data topics.

Each workshop will consist of a small number of short presentations from researchers and research support staff who have experience of the topic. These will then be followed by guided discussions so that the RDS can gather your input on the tools we currently provide, the gaps in our services, and how you go about addressing the challenges and issues raised in the talks.

The workshops for 2019 are:

Electronic Notebooks 1
14th March at King’s Buildings (Fully Booked)

DataVault
1200-1400, 10th April at 6301 JCMB, King’s Buildings
Booking Link – https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34308
The DataVault was developed to offer UoE staff a long-term retention solution for research data collected by research projects that are at the completion stage. Each ‘Vault’ can contain multiple files associated with a research project that will be securely stored for an identified period, such as ten years. It is designed to fill gaps left by existing research data services such as DataStore (active data storage platform) and DataShare (open access online data repository). The service enables you to comply with funder and University requirements to preserve research data for the long term, and to confidently store your data for retrieval at a future date. This workshop is intended to gather the views of researchers and support staff in Schools to explore the utility of the new service and discuss potential practicalities around its roll-out and long-term sustainability.

Sensitive Data Challenges and Solutions
1200-1430, 16th April in Seminar Room 2, Chancellors Building, Bioquarter
Booking Link – https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34321
Researchers face a number of technical, ethical and legal challenges in creating, analysing and managing research data, including pressure to increase transparency and conduct research openly. But for those who have collected or are re-using sensitive or confidential data, these challenges can be particularly taxing. Tools and services can help to alleviate some of the problems of using sensitive data in research. But cloud-based tools are not necessarily trustworthy, and services are not necessarily geared for highly sensitive data. Those that are may not be very user-friendly or efficient for researchers, and often restrict the types of analysis that can be done. Researchers attending this workshop will have the opportunity to hear from experienced researchers on related topics.

Electronic Notebooks 2
1200-1430, 9th May at Training & Skills Room, ECCI, Central Area
Booking Link – https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34287
Electronic notebooks, both computational and lab-based, are gaining ground as productivity tools for researchers and their collaborators. Electronic notebooks can help facilitate reproducibility, longevity and controlled sharing of information. There are many different notebook options available, both commercial and free. Each application has different features and will have different advantages depending on a researcher’s or lab’s requirements. Jupyter Notebook, RSpace and Benchling are some of the platforms used at the University, and all will be represented by researchers who use them on a daily basis.

Data, Software, Reproducibility and Open Research
Due to unforeseen circumstances this event has been postponed. We will update with the new event details as soon as they are confirmed.
In this workshop we will examine real-life use cases wherein datasets combine with software and/or notebooks to provide a richer, more reusable and long-lived record of Edinburgh’s research. We will also discuss user needs and wants, capturing requirements for future development of the University’s central research support infrastructure in line with (e.g.) the LERU Roadmap for Open Science, which the Library Research Support team has sought to map its existing and planned provision against, and domain-oriented Open Research strategies within the Colleges.

Kerry Miller
Research Data Support Officer
Library & University Collections
