Updated MANTRA content: Research data in context

The Research Data Support team is pleased to announce the launch of the first in a series of updates to MANTRA, the free and open online research data management training course.

The first updated module ‘Research data in context’ (previously ‘Research data explained’) is now live on the MANTRA site and provides an introduction to research data, alongside detail on the contexts in which data are generated, and the challenges presented by big data in society.

MANTRA is designed to give post-graduate students, early career researchers, and information professionals the knowledge and skills needed to work effectively with research data.

Since launching in 2011, MANTRA has been through a number of significant rewrites to keep up with current trends, and over 10,000 different learners have visited MANTRA in the last academic year.

The ‘Research data in context’ module has been substantially revised in order to:

  • remove dated and obsolete content;
  • simplify and improve the readability of existing material;
  • add information on data literacy and data science.

The changes in this module include:

  • Revised pages: Introduction; Why is research data management important?; What are data?; What are research data?; Data as research output; Module Summary; Next & further reading.
  • New pages: Data in society; Data Science; Video: machine learning; Data literacy and skills.

A change log detailing all changes in this release is available on request from the Research Data Support team (data-support@ed.ac.uk).

We hope you find this update interesting and useful and welcome any feedback you may have.

Further MANTRA updates are forthcoming, focusing on FAIR data and newer data protection legislation and we will announce these in future blog posts.

Bob Sanders
Research Data Support

DataShare 3.0: The ‘Download Release’ means deposits up to 100 GB

With the DataShare 3.0 release, completed on 6 October, 2017, the data repository can manage data items of 100 GB. This means a single dataset of up to 100 GB can be cited with a single DOI, viewed at a single URL, and downloaded through the browser with a single click of our big red “Download all files” button. We’re not saying the system cannot handle datasets larger than this, but 100 GB is what we’ve tested for, and can offer with confidence. This release joins up the DSpace asset store to our managed filestore space (DataStore) making this milestone release possible.

How to deposit up to 100 GB

In practice, what this means for users is:

– You can still upload up to 20 GB of data files as part of a single deposit via our web submission form.

– For sets of files over 20 GB, depositors may contact the Research Data Service team on data-support@ed.ac.uk to arrange a batch import. The key improvement in this step is that all the files can be in a single deposit, displayed together on one page with their descriptive metadata, rather than split up into five separate deposits.

Users of DataShare can now also benefit from MD5 integrity checking

The MD5 checksum of every file in DataShare is displayed (on the Full Item view), including historic deposits. This allows users downloading files to check their integrity.

For example, suppose I download Professor Richard Ribchester’s fluorescence microscopy of the neuromuscular junction from http://datashare.is.ed.ac.uk/handle/10283/2749. N.B. the “Download all files” button in this release works differently than before. And one of the differences which users will see is that the zip file it downloads is now named with the two numbers from the deposit’s handle identifier, separated by an underscore instead of a forward slash. So I’ve downloaded the file “DS_10283_2749.zip”.

I want to ensure there was no glitch in the download – I want to know the file I’ve downloaded is identical to the one in the repository. So, I do the following:

  • Click on “Show full item record”.
  • Scroll down to the red button labelled “Download all files”, where I see “zip file MD5 Checksum: a77048c58a46347499827ce6fe855127” (see screenshot). I copy the checksum (highlighted in yellow).

    screenshot from DataShare showing where the MD5 checksum hash of the zip file is displayed

    DataShare displays MD5 checksum hash

  • On my PC, I generate the MD5 checksum hash of the downloaded copy, and then I check that the hash on DataShare matches. There are a number of free tools available for this task: I could use the Windows command line, or I could use an MD5 utility such as the free “MD5 and SHA Checksum Utility”. In the case of the Checksum Utility, I do this as follows:
    • I paste the hash I copied from DataShare into the desktop utility (ignoring the fact the program confusingly displays the checksum hashes all in upper case).
    • I click the “Verify” button.

In this case they are identical – I have a match. I’ve confirmed the integrity of the file I downloaded.

Screenshot showing result of MD5 match

The MD5 checksum hashes match each other.

More confidence in request-a-copy for embargoed files

Another improvement we’ve made is to give depositors confidence in the request-a-copy feature. If the files in your deposit are under temporary embargo, they will not be available for users to download directly. However, users can send you a request for the files through DataShare, which you’ll receive via email. If you then agree to the request using the form and the “Send” button in DataShare, the system will attempt to email the files to the user. However, as we all know, some files are too large for email servers.

If the email server refuses to send the email message because the attachment is too large, DataShare 3.0 will immediately display an error message for you in the browser saying “File too large”. Thus allowing you to make alternative arrangements to get those files to the user. Otherwise, the system moves on to offer you a chance to change the permissions on the file to open access. So, if you see no error after clicking “Send”, you’ll have peace of mind the files have been sent successfully.

Pauline Ward, Research Data Service Assistant
EDINA and Data Library

Analytics platform trial

Information Services is evaluating a new collaborative platform for data-science and analytics as part of its expanding portfolio of services for researchers. We are looking for researchers with suitable problems who expect to achieve results in the one-year trial. We will be able to work closely with a small number of projects to help them get the most out of the platform, and training will be available. In addition, we encourage further researchers to use the platform with less formal support.

The Aridhia AnalytiXagility Platform

AnalytiXagility is a purpose-built, user-friendly, collaborative platform for data science and analytics. It allows your team to easily create, discuss, modify and share analyses in a single, secure system accessed conveniently through a web browser.
The platform handles routine data management tasks such as confidentiality, availability, integrity and audit, reducing time to insight and discovery. In particular, it is ideally suited for:

  • Exploring, comparing and linking structured datasets including data quality profiling
  • Supporting data management, accountability and provenance
  • Processing large datasets that do not fit in memory

Bring your team

Project members collaborate through a private workspace configured with compute, storage and analytical tools. Embedded social media tools allow teams to post and share questions, updates, comments and insights, building an active record of the research undertaken.

Bring your data

Users import their datasets using the secure and reliable file transfer mechanism, SFTP. Working files (documents, images, analysis scripts) can be uploaded directly through the web interface, and tagged for easy management and retrieval by the team.

Bring your analysis

AnalytiXagility provides an analysis platform, based on R, which can be accessed through a web browser. Combining R with an SQL database and an associated access library allows researchers to analyse their data in a faster and more scalable way than with R alone.

Generate your output

The platform supports generation of PDF reports for communication and publication using LaTeX templates, such as those provided by many leading journals, in which users can embed active analytical scripts to auto-generate images and tabular data within the report at runtime.

More information

If you are interested in participating in the trial, please email IS.Helpline@ed.ac.uk with the subject “XAP Trial”.

Further information can be found at:

Steve Thorn
Research Services
IT Infrastructure

Dancing with Data

I went to an interesting talk yesterday by Prof Chris Speed called “Dancing with Data”, on how our interactions and relationships with each other, with the objects in our lives and with companies and charities are changing as a result of the data that is now being generated by those objects (particularly smartphones, but increasingly by other objects too). New phenomena such as 3D printing, airbnb, foursquare and iZettle are giving us choices we never had before, but also leading to things being done with our data which we might not have expected or known about. The relationships between individuals and our data are being re-defined as we speak. Prof Speed challenged us to think about the position of designers in this new world where push-to-pull markets are being replaced by new models. He also told us about his research collaborations with Oxfam, looking at how technology might enhance the value of the second-hand objects they sell by allowing customers to hear their stories from their previous owners.   Logo for the Tales of Things project

All very thought-provoking, but what about the implications for academic research, aside from those working in the fields of Design, Economics or Sociology who must now develop new models to reflect this changing landscape? Well, the question arises, if all this data is being generated and collected by companies, are the academics (and indeed the charity sector) falling behind the curve? Here at the University of Edinburgh, my colleagues in Informatics are doing Data Science research, looking into the infrastructure and the algorithms used to analyse the kind of commercial Big Data flowing out of the smartphones in our pockets, while Prof Speed and his colleagues are looking at how design itself is being affected. But perhaps academics in all disciplines need to be tuning their antennae to this wavelength and thinking seriously about how their research can adapt to and be enhanced by the new ways we are all dancing with data.

For more about the University of Edinburgh’s Design Informatics research and forthcoming seminars see www.designinformatics.org. Prof Chris Speed tweets @ChrisSpeed.

Pauline Ward is a Data Library Assistant working at the University of Edinburgh and EDINA.