The Edinburgh DataShare Awards!

The Research Data Service team applauds those researchers at the University of Edinburgh who share their data. We therefore decided to show our appreciation by presenting awards to our most successful depositors, as part of the Dealing With Data conference. Unfortunately, the prizes themselves do not come with a cash research grant attached. However, the winners did receive a certificate bearing an image of our mascot for the day, Databot. We think you’ll agree the winning depositors and their data demonstrate the diversity of our collections, in terms of subject matter, formats and sheer size. We were particularly pleased with the reactions from both the recipients and the attendees, whether in person, by email or on Twitter (#UoEData was the Dealing with Data hashtag). Who doesn’t love the drama of an awards ceremony! A video is available.

Photograph of Pauline Ward announcing the award winners

Photo: CC-BY Lorna M. Campbell

The winners in full…

MOST DATASHARING SCHOOL: Edinburgh Medical School

– the School which boasts the greatest number of Edinburgh DataShare Collections currently. Thirty-three eligible Collections (already containing at least one dataset) such as “Connectomic analysis of motor units in the mouse fourth deep lumbrical muscle”, the Edinburgh Imaging “Image Library” and “Generation Scotland”.

MOST PROLIFIC DATASHARER: Professor Richard Baldock
– the most prolific depositor into Edinburgh DataShare for the academic year 2016-17, and over the lifetime of the repository, having shared a grand total of 1,105 data items with full metadata. These are grouped together into numerous Collections under the heading of “e-Mouse Atlas”. The majority of these detailed images show microscope slides of stained tissue; others are 3D models. They accompany a book and website published by Professor Baldock, building on the seminal work of Professor Matt Kaufman in developmental biology. The metadata for each of the slides links to a lower definition version within the e-Mouse Atlas website, where the data may be viewed and navigated in context. The original slides themselves are held by the University’s Centre for Research Collections.

detail of histological slide showing stained cells

Detail from Elizabeth Graham; Julie Moss; Nick Burton; Yogmatee Roochun; Chris Armit; Lorna Richardson; Richard Baldock. (2015). eHistology Kaufman Atlas Plate 21a image d, [image]. University of Edinburgh. College of Medicine and Veterinary Medicine. http://dx.doi.org/10.7488/ds/735.

MOST PROLIFIC DATASHARER (CSE): Professor Euan Brechin
– the depositor of the greatest number of Edinburgh DataShare items from the College of Science and Engineering in academic year 2016-2017. Euan deposits his coordination chemistry research data so frequently that we set up a Collection template on the Brechin Research Group, which automatically pre-populates some of the metadata fields for him, saving Euan time. If only we could find a way to mention metallosupramolecular cubes here.

The certificate awarded to Professor Euan Brechin

MOST PROLIFIC DATASHARER (CAHSS): Dr Andrea Martin
– the depositor of the greatest number of Edinburgh DataShare items from the College of Arts, Humanities and Social Sciences in academic year 2016-2017. Some of these “Language Cognition and Communication” data items are still under temporary embargo. Users may nonetheless see all the metadata.

MOST POPULAR SHARED DATA: Professor Peter Sandercock
– the depositor of the Edinburgh DataShare item which has attracted the greatest number of page views over the lifetime of the repository: “International Stroke Trial database (version 2)” (aka IST-1). These data from the International Stroke Trial provide a great example of how clinical trial data may be anonymised to allow them to be shared. For more information, you may want to watch Prof Sandercock’s very accessible and detailed public lecture. Admittedly, one other item is higher up DataShare’s table of page views than IST. However, we believe the traffic drawn by “RCrO3-xNx ChemComm 2016” to be artifactual, arising from the appearance of the word ‘doping’ in its abstract, and the fact the deposit was made at a time when doping in sport was very prominent in the news media. Additionally, the earlier, superseded, version of the IST-1 dataset also appears in the all-time top ten, and if we combine the number of views for the two versions, IST-1 is in the No.1 spot outright :-)

MOST POPULAR DATA 2016-17: Dr. Junichi Yamagishi
– the depositor of the Edinburgh DataShare item which has attracted the greatest number of page views (1,720 to be precise, as counted by Google Analytics) over the academic year 2016-17: “Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015) Database”. Here’s the suggested citation, which DataShare compiles automatically, and displays prominently, to encourage users to cite the data:

Wu, Zhizheng; Kinnunen, Tomi; Evans, Nicholas; Yamagishi, Junichi. (2015). Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015) Database, [dataset]. University of Edinburgh. The Centre for Speech Technology Research (CSTR). http://dx.doi.org/10.7488/ds/298.

MOST POPULAR DATA 2016-17 (CAHSS): Professor Miles Glendinning

– the depositor of the Edinburgh DataShare item from the College of Arts, Humanities and Social Sciences which has attracted the greatest number of page views (1,374 to be precise, as counted by Google Analytics), over the academic year 2016-17: “Hong Kong Public Housing Archive”. The Research Data Service is working closely with Miles, Personal Chair of Architectural Conservation, on a series of batch imports to put his fabulous array of photographs of public housing tower blocks from all around the world on DataShare over the coming months – keep an eye on DOCOMOMO International Mass Housing Archive.

Sunny image of the façade of several tower blocks; a tree is visible in the foreground.

Image cropped from “HKI_H_Yue_Fai_Ct.jpg” from Glendinning, Miles; Forsyth, Louise; Maxwell, Gavin; Wood, Michael. (2015). Hong Kong Public Housing Database, 2006-2015 [image]. University of Edinburgh. Edinburgh College of Art. http://dx.doi.org/10.7488/ds/322.

MOST POPULAR DATA 2016-17 (MVM): Dr. Tom Pennycott
– the depositor of the Edinburgh DataShare Collection page from the College of Medicine and Veterinary Medicine which has attracted the greatest number of page views over the academic year 2016-17: “Diseases of Wild Birds”. Hundreds of grotesquely beautiful photographs of dead wild birds, bodies ravaged with viruses, bacteria and protists, found at locations all around the United Kingdom; these images support the PhD thesis of Dr Tom Pennycott from our Veterinary School.

You can see usage statistics for any DataShare Item or Collection simply by clicking on the “View usage statistics” button on the right-hand-side of the page.

Pauline Ward, Research Data Service Assistant
EDINA and Data Library

Research Data MANTRA gets a refresh

Research Data MANTRA updates

MANTRA, the free online training course which provides guidelines for good practice in research data management (RDM), has recently been refreshed. The course content remains applicable to all research disciplines, and is particularly appropriate for postgraduate students and early career researchers who would like to learn more about managing their research data.

The latest release helps ensure that content from each of the eight learning modules remains up-to-date, with interactive elements across all units being revised to make them more user friendly, and new content added to some units.

Additionally, as part of the CEPAL, United Nations project, some video content used within MANTRA has been translated. Claudia Vilches and Gabriela Andaur from Hernán Santa Cruz Library (Santiago, Chile) have helpfully translated several of the video interviews with research staff, and these can now be viewed with Spanish subtitles within MANTRA or on our YouTube channel, helping to widen accessibility to these training materials for researchers outside the UK. Please contact us if you wish to translate any of the MANTRA materials.

MANTRA learning units now available via Zenodo

In addition to being a free-of-charge online learning resource, all content from MANTRA is openly available for use and re-use by others. For those interested in developing their own RDM training materials based on MANTRA content, all MANTRA units (along with four sets of data handling exercises) are now available for direct download from the Zenodo repository’s RDM Open Training Materials community. The eight individual MANTRA units were created using open source software Xerte Online Toolkits and units can be imported and edited in Virtual Learning Environments (VLE) such as Moodle. All that we ask is for attribution according to our CC-BY licence.

Content from a number of shorter MANTRA ‘taster’ units is also openly available from Zenodo. These provide an overview of RDM in four very short modules which can be edited so as to add information about local RDM support services, before deploying locally in a VLE or on the Web.

DataShare 3.0: The ‘Download Release’ means deposits up to 100 GB

With the DataShare 3.0 release, completed on 6 October, 2017, the data repository can manage data items of 100 GB. This means a single dataset of up to 100 GB can be cited with a single DOI, viewed at a single URL, and downloaded through the browser with a single click of our big red “Download all files” button. We’re not saying the system cannot handle datasets larger than this, but 100 GB is what we’ve tested for, and can offer with confidence. This milestone was made possible by joining up the DSpace asset store to our managed filestore space (DataStore).

How to deposit up to 100 GB

In practice, what this means for users is:

– You can still upload up to 20 GB of data files as part of a single deposit via our web submission form.

– For sets of files over 20 GB, depositors may contact the Research Data Service team on data-support@ed.ac.uk to arrange a batch import. The key improvement in this step is that all the files can be in a single deposit, displayed together on one page with their descriptive metadata, rather than split up into five separate deposits.

Users of DataShare can now also benefit from MD5 integrity checking

The MD5 checksum of every file in DataShare is displayed (on the Full Item view), including historic deposits. This allows users downloading files to check their integrity.

For example, suppose I download Professor Richard Ribchester’s fluorescence microscopy of the neuromuscular junction from http://datashare.is.ed.ac.uk/handle/10283/2749. N.B. the “Download all files” button in this release works differently from before. One of the differences users will see is that the zip file it downloads is now named with the two numbers from the deposit’s handle identifier, separated by an underscore instead of a forward slash. So I’ve downloaded the file “DS_10283_2749.zip”.
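The naming pattern just described can be sketched in a few lines of Python. This is only an illustration based on the single example above: the function name zip_name_for_handle is made up for this post, and the assumption that every deposit follows the “DS_” plus two-numbers pattern is ours, not something DataShare documents.

```python
def zip_name_for_handle(handle):
    """Guess the "Download all files" zip name for a DataShare handle.

    Accepts either a bare handle ("10283/2749") or a full handle URL.
    Assumption: the "DS_<prefix>_<suffix>.zip" pattern seen in the
    example deposit applies to other deposits too.
    """
    # Keep only the last two path segments, i.e. the two handle numbers.
    parts = handle.rstrip("/").split("/")[-2:]
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a recognisable handle: {handle!r}")
    prefix, suffix = parts
    return f"DS_{prefix}_{suffix}.zip"


if __name__ == "__main__":
    print(zip_name_for_handle("http://datashare.is.ed.ac.uk/handle/10283/2749"))
```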

I want to ensure there was no glitch in the download – I want to know the file I’ve downloaded is identical to the one in the repository. So, I do the following:

  • Click on “Show full item record”.
  • Scroll down to the red button labelled “Download all files”, where I see “zip file MD5 Checksum: a77048c58a46347499827ce6fe855127” (see screenshot). I copy the checksum (highlighted in yellow).

    screenshot from DataShare showing where the MD5 checksum hash of the zip file is displayed

    DataShare displays MD5 checksum hash

  • On my PC, I generate the MD5 checksum hash of the downloaded copy, and then I check that the hash on DataShare matches. There are a number of free tools available for this task: I could use the Windows command line, or I could use an MD5 utility such as the free “MD5 and SHA Checksum Utility”. In the case of the Checksum Utility, I do this as follows:
    • I paste the hash I copied from DataShare into the desktop utility (ignoring the fact the program confusingly displays the checksum hashes all in upper case).
    • I click the “Verify” button.

In this case they are identical – I have a match. I’ve confirmed the integrity of the file I downloaded.

Screenshot showing result of MD5 match

The MD5 checksum hashes match each other.
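For anyone who would rather script the check than use a desktop utility, here is a minimal sketch using Python’s standard hashlib module. The function names are made up for this post, and the file name and checksum are simply the ones from the example above; the comparison ignores case, because some tools display hashes in upper case.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks
    so that large downloads do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published):
    """Compare the local file's MD5 with the hash copied from DataShare.

    The published value is trimmed and lower-cased first, since
    hexdigest() always returns lower-case hex.
    """
    return md5_of_file(path) == published.strip().lower()


if __name__ == "__main__":
    # File name and checksum taken from the worked example in this post.
    zip_path = Path("DS_10283_2749.zip")
    published = "a77048c58a46347499827ce6fe855127"
    if zip_path.exists():
        ok = matches_published_checksum(zip_path, published)
        print("Checksums match" if ok else "Checksums DO NOT match")
    else:
        print(f"{zip_path} not found; download it first")
```

If the two values differ, the safest course is to download the file again and repeat the check.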

More confidence in request-a-copy for embargoed files

Another improvement we’ve made is to give depositors confidence in the request-a-copy feature. If the files in your deposit are under temporary embargo, they will not be available for users to download directly. However, users can send you a request for the files through DataShare, which you’ll receive via email, as described in an earlier blogpost. If you then agree to the request using the form and the “Send” button in DataShare, the system will attempt to email the files to the user. However, as we all know, some files are too large for email servers.

If the email server refuses to send the email message because the attachment is too large, DataShare 3.0 will immediately display an error message for you in the browser saying “File too large”, allowing you to make alternative arrangements to get those files to the user. Otherwise, the system moves on to offer you a chance to change the permissions on the file to open access. So, if you see no error after clicking “Send”, you’ll have peace of mind the files have been sent successfully.

Pauline Ward, Research Data Service Assistant
EDINA and Data Library

Updates from the fourth meeting of the RDM Forum

Guest blog post by Ewa Lipinska

On 28th August members of the RDM Forum gathered in the stunning Old Library at the Department of Geography in the Old Infirmary building, to hear the latest updates from the Research Data Service team and discuss all things data. It’d been a good few months since the last time we met, so the event presented us with the perfect opportunity to catch up on new developments, network with colleagues working on RDM in different parts of the University, and prepare ourselves for the new academic year which will see the University take up a pivotal role in making Edinburgh the Data Capital of Europe.

We started off with an RDM update from Cuna Ekmekcioglu, who gave us an overview of developments to University research data services: the launch of the interim DataVault long-term retention service, continuing development of the Data Safe Haven aimed at research projects dealing with sensitive data, and a new release of DataShare which will allow larger datasets. We also learned about RDM training courses planned for the new academic year, most of which can be booked via MyEd.

Next, Pauline Ward gave a presentation which went into a bit more detail about the DataVault service, which allows researchers to comply with their funders’ requirements to preserve data for the long term in cases where the datasets cannot be made public. The current interim service requires a mediated deposit, which can be arranged by contacting data-support[at]ed.ac.uk. Comprehensive guidance on how to prepare your data before storing it in DataVault can be found on the service website.

This was followed by a demonstration of the new Research Data Service promotional video which outlines the range of tools and support offered by the team, and which can be a very good resource for new members of staff who would like to find out about the types of services available. Diarmuid McDonnell who presented the video also gave us a quick overview of a recent project called Scoping Statistical Analysis Support, which looked at the demand for statistical analysis training for current postgraduate students. The final project report is full of current information about statistical training around the University.

We then went on to discuss the potential impact of data sharing, which tied in nicely with a recent panel discussion at Repository Fringe 2017 that focused on how repositories and associated services can feature in supporting researchers to achieve and evidence impact in preparation for the next Research Excellence Framework exercise (live notes from the day are available). Pauline Ward presented examples of popular public datasets by Edinburgh University researchers, described ways to access information about their usage, and talked about how datasets can be shared more widely to engage external audiences, which may lead to potential impact. Even though on their own research data usage statistics are not enough to demonstrate significant impact beyond academia, they are a good (though perhaps still slightly overlooked) starting point for tracking how and by whom datasets are used, and how that benefits individuals and communities.

The meeting concluded with a presentation by Robin Rice, who shared with us the draft Research Data Service Roadmap. As the goals set out in the previous roadmap have now largely been achieved, the time has come to look to the future and identify new objectives for the next few years. It was interesting to hear about the team’s long-term plans which include unification of the service (aiming to ensure the best user experience and interoperability between systems), advocacy of data management planning, support around active data, enhanced data stewardship, improved communications and more training opportunities.

Overall, it was a very useful and informative meeting, and I’d very much encourage anyone interested in research data management and sharing to join us next time. In the meantime Cuna’s slides, together with lots of other useful resources and points for discussion, are available on the RDM Sharepoint (access on request).

Ewa Lipinska
Research Outcomes Co-Ordinator
College of Arts, Humanities and Social Sciences
