DataShare 3.0: The ‘Download Release’ means deposits up to 100 GB

With the DataShare 3.0 release, completed on 6 October 2017, the data repository can manage data items of up to 100 GB. This means a single dataset of up to 100 GB can be cited with a single DOI, viewed at a single URL, and downloaded through the browser with a single click of our big red “Download all files” button. We’re not saying the system cannot handle datasets larger than this, but 100 GB is what we’ve tested for and can offer with confidence. The release joins up the DSpace asset store with our managed filestore space (DataStore), which is what makes this milestone possible.

How to deposit up to 100 GB

In practice, what this means for users is:

– You can still upload up to 20 GB of data files as part of a single deposit via our web submission form.

– For sets of files over 20 GB, depositors may contact the Research Data Service team on data-support@ed.ac.uk to arrange a batch import. The key improvement in this step is that all the files can be in a single deposit, displayed together on one page with their descriptive metadata, rather than split up into five separate deposits.

Users of DataShare can now also benefit from MD5 integrity checking

The MD5 checksum of every file in DataShare is displayed (on the Full Item view), including historic deposits. This allows users downloading files to check their integrity.

For example, suppose I download Professor Richard Ribchester’s fluorescence microscopy of the neuromuscular junction from http://datashare.is.ed.ac.uk/handle/10283/2749. N.B. the “Download all files” button works differently in this release than before. One difference users will see is that the zip file it downloads is now named with the two numbers from the deposit’s handle identifier, separated by an underscore instead of a forward slash. So I’ve downloaded the file “DS_10283_2749.zip”.
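As a rough illustration of that naming rule, here is a minimal Python sketch. The function name is hypothetical, and the “DS_” prefix and “.zip” extension are inferred only from the single example above, so treat it as an assumption rather than a description of how DataShare builds the name internally.

```python
def zip_name_for_handle(handle: str) -> str:
    """Guess the download file name for a handle such as "10283/2749".

    Assumption: the name is "DS_" + the handle with "/" replaced by "_"
    + ".zip", as suggested by the one example in the post.
    """
    handle = handle.strip()
    prefix, slash, suffix = handle.partition("/")
    # A handle is expected to have two parts separated by a forward slash.
    if not (prefix and slash and suffix):
        raise ValueError(f"not a two-part handle: {handle!r}")
    return "DS_" + handle.replace("/", "_") + ".zip"
```

Called with the handle from the URL above, "10283/2749", this returns "DS_10283_2749.zip".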

I want to ensure there was no glitch in the download – I want to know the file I’ve downloaded is identical to the one in the repository. So, I do the following:

  • Click on “Show full item record”.
  • Scroll down to the red button labelled “Download all files”, where I see “zip file MD5 Checksum: a77048c58a46347499827ce6fe855127” (see screenshot). I copy the checksum (highlighted in yellow).

    [Screenshot: DataShare displays the MD5 checksum hash of the zip file]

  • On my PC, I generate the MD5 checksum hash of the downloaded copy, and then I check that the hash on DataShare matches. There are a number of free tools available for this task: I could use the Windows command line, or I could use an MD5 utility such as the free “MD5 and SHA Checksum Utility”. In the case of the Checksum Utility, I do this as follows:
    • I paste the hash I copied from DataShare into the desktop utility (ignoring the fact the program confusingly displays the checksum hashes all in upper case).
    • I click the “Verify” button.

In this case they are identical – I have a match. I’ve confirmed the integrity of the file I downloaded.

[Screenshot: the MD5 checksum hashes match each other]

More confidence in request-a-copy for embargoed files

Another improvement we’ve made is to give depositors confidence in the request-a-copy feature. If the files in your deposit are under temporary embargo, they will not be available for users to download directly. However, users can send you a request for the files through DataShare, which you’ll receive via email, as described in an earlier blogpost. If you then agree to the request using the form and the “Send” button in DataShare, the system will attempt to email the files to the user. However, as we all know, some files are too large for email servers.

If the email server refuses to send the email message because the attachment is too large, DataShare 3.0 will immediately display an error message for you in the browser saying “File too large”, allowing you to make alternative arrangements to get those files to the user. Otherwise, the system moves on to offer you a chance to change the permissions on the file to open access. So, if you see no error after clicking “Send”, you’ll have peace of mind that the files have been sent successfully.

Pauline Ward, Research Data Service Assistant
EDINA and Data Library


Updates from the fourth meeting of the RDM Forum

Guest blog post by Ewa Lipinska

On 28th August members of the RDM Forum gathered in the stunning Old Library at the Department of Geography in the Old Infirmary building, to hear the latest updates from the Research Data Service team and discuss all things data. It’d been a good few months since the last time we met, so the event presented us with the perfect opportunity to catch up on new developments, network with colleagues working on RDM in different parts of the University, and prepare ourselves for the new academic year which will see the University take up a pivotal role in making Edinburgh the Data Capital of Europe.

We started off with an RDM update from Cuna Ekmekcioglu, who gave us an overview of developments to University research data services: the launch of the interim DataVault long-term retention service, the continuing development of the Data Safe Haven aimed at research projects dealing with sensitive data, and a new release of DataShare which will allow larger datasets. We also learned about RDM training courses planned for the new academic year, most of which can be booked via MyEd.

Next, Pauline Ward gave a presentation which went into a bit more detail about the DataVault service allowing researchers to comply with their funders’ requirements to preserve data for the long term in cases where the datasets cannot be made public. The current interim service requires a mediated deposit which can be done by contacting data-support[at]ed.ac.uk. Comprehensive guidance on how to prepare your data before storing it in DataVault can be found on the service website.

This was followed by a demonstration of the new Research Data Service promotional video, which outlines the range of tools and support offered by the team and can be a very good resource for new members of staff who would like to find out about the types of services available. Diarmuid McDonnell, who presented the video, also gave us a quick overview of a recent project called Scoping Statistical Analysis Support, which looked at the demand for statistical analysis training for current postgraduate students. The final project report is full of current information about statistical training around the University.

We then went on to discuss the potential impact of data sharing, which tied in nicely with a recent panel discussion at Repository Fringe 2017 that focused on how repositories and associated services can support researchers in achieving and evidencing impact in preparation for the next Research Excellence Framework exercise (live notes from the day are available). Pauline Ward presented examples of popular public datasets by Edinburgh University researchers, described ways to access information about their usage, and talked about how datasets can be shared more widely to engage external audiences, which may lead to potential impact. Research data usage statistics are not, on their own, enough to demonstrate significant impact beyond academia, but they are a good (though perhaps still slightly overlooked) starting point for tracking how and by whom datasets are used, and how that benefits individuals and communities.

The meeting concluded with a presentation by Robin Rice, who shared with us the draft Research Data Service Roadmap. As the goals set out in the previous roadmap have now largely been achieved, the time has come to look to the future and identify new objectives for the next few years. It was interesting to hear about the team’s long-term plans which include unification of the service (aiming to ensure the best user experience and interoperability between systems), advocacy of data management planning, support around active data, enhanced data stewardship, improved communications and more training opportunities.

Overall, it was a very useful and informative meeting, and I’d very much encourage anyone interested in research data management and sharing to join us next time. In the meantime Cuna’s slides, together with lots of other useful resources and points for discussion, are available on the RDM Sharepoint (access on request).

Ewa Lipinska
Research Outcomes Co-Ordinator
College of Arts, Humanities and Social Sciences


New video about the Research Data Service

The Research Data Service team is delighted to announce a new resource to help researchers and research support staff become familiar with the wide range of tools and support that we offer:

[Embedded video]

The video, produced by Senate Media, outlines how the University of Edinburgh Research Data Service can help you access, manage, store, share and preserve your research data. The permanent location for the video on our service website is: http://edin.ac/2hbswRw.

Robin Rice
Data Librarian & Head, Research Data Support
EDINA and Data Library


Research Data Management Forum: Third meeting – 28/03/2017

Harkening back to a bygone era of libraries, when books were printed on paper and research data management meant not accidentally burning your notes with your candle, the third meeting of the university RDM forum was held in the impressively aged Old Library in Geography’s Old Infirmary building at the end of March.

As a regular participant, I find the RDM forum is a very useful platform for everyone who has an interest in supporting research data management. It is an opportunity for me to update myself on the support and services that the university has in place in this area, to ask the daft questions but get a sensible answer and, more generally, to meet the others in the university who are working in the same area as myself and face the same issues and challenges.

This edition of the RDM forum was no different. After a quick introduction of the participants, Cuna, leading the forum, took us through the following agenda:

  • Cuna Ekmekcioglu – RDM update
  • Dominic Tate – DataVault update
  • Pauline Ward – DataShare new features
  • Cuna Ekmekcioglu – development of Data Safe Haven

The session began with the RDM update, which went into detail about the RDM Sharepoint site and some of the tools and documents that have been uploaded to it. There are some useful threads looking to collect information about the different types of data that we have, as well as guidance on recording datasets in PURE, an RDM journey flowchart and sample Data Management Plans, amongst other things. The Sharepoint site can be accessed by request, and can be found here: https://uoe.sharepoint.com/sites/rdmforum (access is only for UoE staff and students).

We had updates on the existing services such as DataShare and details about the development of both DataVault and the future Data Safe Haven, a system which will allow the storage and analysis of very sensitive data. There were some discussions around the new systems and practical issues such as cost and training/guidance for the new services.

It was a very worthwhile event and I shall be looking forward to the next forum.

Michelle O’Hara
Research Data & Information Officer
School of Social and Political Science

 
