After extended development, the Research Data Service’s DataVault system is now operational, adding value to research data for principal investigators and their funders alike by offering a long-term retention solution for important datasets.
DataVault is a companion service to DataShare, the institutional digital repository for researchers to openly license and share datasets and related outputs via the Web. DataVault comprises an online interface connected to the university’s data centre infrastructure and cloud storage.
Each research project can store data in a single vault made up of any number of deposits. DataVault is currently able to accept individual deposits (groups of files) of up to 2 TB each; this will increase over time as project development continues.
DataVault sprint meeting before launch
DataVault is designed for long-term retention of research data, to meet funder requirements and ensure future access to high value datasets. It meets digital preservation requirements by storing three copies in different locations (two on tape, one in the cloud) with integrity checking built-in, so that the data owner can retrieve their data with confidence until the end of the retention period (typically ten years).
The DataVault interface helps to guide users in how to deposit personal and sensitive data, using anonymisation or pseudonymisation techniques whenever possible, as prescribed by the University’s Data Protection Officer (DPO). Because all data are encrypted before deposit, they are protected from unauthorised disclosure. Only the data owner or their nominated delegate is allowed to retrieve data during the retention period. Any decisions about allowing access to others are made by the data owner and are conducted outside the DataVault system, once they have been retrieved onto a private area on DataStore and decrypted.
Although DataVault offers a form of closed archive, the design encourages good research data management practice by requiring a metadata record for each vault in Pure. These records are discoverable on the Web, and linked to the respective data creators, projects and publications.
In exchange for creating this high level public metadata record, the Principal Investigator benefits from the assignment of a unique digital object identifier (DOI) which can be used to cite the data in publications.
The open nature of the metadata means that any reader may make a request to access the dataset. The data owner decides who may have access and under what conditions. Advice can be provided by the Research Data Support team and the DPO.
University data assets
DataVault’s workflow takes into account the possibility/likelihood that the original data owner will have left the university when the period of retention comes to an end. Each vault will be reviewed by representatives of the university in schools, colleges or the Library, acting as the data owner, to make decisions on disposal or further retention and curation. If kept, the vault contents become university data assets.
Plan ahead for data archiving
The Research Data Support team encourages researchers to plan ahead for data archiving, right from the earliest conception stages of the project, so that appropriate costs are included in bids, and enabling the appropriate steps to be carried out to prepare data for either open or closed long-term archiving.
The team can be contacted through the IS Helpline and offers assistance with writing data management plans and making archival decisions. See our service website and contact information at https://www.ed.ac.uk/is/research-data-service or go straight to the DataVault page to learn more about it, get instructions for use, or look up charges. An introductory demo video is available at https://media.ed.ac.uk/media/Getting+started+with+the+DataVault/1_h4r4glf7 .
Data Librarian and Head, Research Data Support
Library & University Collections