DataShare 3.0: The ‘Download Release’ means deposits up to 100 GB

With the DataShare 3.0 release, completed on 6 October 2017, the data repository can manage data items of up to 100 GB. This means a single dataset of up to 100 GB can be cited with a single DOI, viewed at a single URL, and downloaded through the browser with a single click of our big red “Download all files” button. We’re not saying the system cannot handle datasets larger than this, but 100 GB is what we’ve tested for and can offer with confidence. The release joins up the DSpace asset store with our managed filestore space (DataStore), which is what makes this milestone possible.

How to deposit up to 100 GB

In practice, what this means for users is:

– You can still upload up to 20 GB of data files as part of a single deposit via our web submission form.

– For sets of files over 20 GB, depositors may contact the Research Data Service team at data-support@ed.ac.uk to arrange a batch import. The key improvement in this step is that all the files can be in a single deposit, displayed together on one page with their descriptive metadata, rather than split up into five separate deposits.
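As a rough illustration of the two routes above, here is a minimal Python sketch that adds up the size of a folder of files and suggests which route applies. The function names are hypothetical, and the sketch assumes decimal gigabytes (the post does not say which definition of GB is meant). The advice for datasets above 100 GB is likewise an assumption rather than stated policy.

```python
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes; the post does not define the unit
WEB_FORM_LIMIT = 20 * GB   # per-deposit limit of the web submission form
TESTED_LIMIT = 100 * GB    # largest deposit size tested for DataShare 3.0


def total_size(folder):
    """Sum the sizes, in bytes, of all regular files under a folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def deposit_route(total_bytes):
    """Suggest a deposit route for a dataset of the given size in bytes."""
    if total_bytes <= WEB_FORM_LIMIT:
        return "web submission form"
    if total_bytes <= TESTED_LIMIT:
        return "batch import via data-support@ed.ac.uk"
    # Assumption: above the tested size, the sensible step is to ask the team.
    return "larger than the tested 100 GB: ask data-support@ed.ac.uk"
```

For example, `deposit_route(total_size("my_dataset"))` would report the suggested route for a local folder called `my_dataset` (a made-up name).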

Users of DataShare can now also benefit from MD5 integrity checking

The MD5 checksum of every file in DataShare is displayed (on the Full Item view), including historic deposits. This allows users downloading files to check their integrity.

For example, suppose I download Professor Richard Ribchester’s fluorescence microscopy of the neuromuscular junction from http://datashare.is.ed.ac.uk/handle/10283/2749. N.B. the “Download all files” button works differently in this release. One difference users will see is that the zip file it downloads is now named with the two numbers from the deposit’s handle identifier, separated by an underscore instead of a forward slash. So I’ve downloaded the file “DS_10283_2749.zip”.
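The naming rule just described can be sketched in a few lines of Python. This is only an illustration inferred from the single example above: the function name is hypothetical, and the “DS” prefix is assumed to be the same for every deposit.

```python
def zip_name_for_handle(handle, prefix="DS"):
    """Build the expected zip file name from a handle or handle URL.

    Accepts either '10283/2749' or a URL ending in '/10283/2749'.
    The 'DS' prefix is an assumption based on one observed file name.
    """
    # Keep only the last two slash-separated parts: the handle's two numbers.
    parts = handle.strip().rstrip("/").split("/")[-2:]
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a handle of the form 'NNNNN/NNNN': {handle!r}")
    return f"{prefix}_{parts[0]}_{parts[1]}.zip"


print(zip_name_for_handle("http://datashare.is.ed.ac.uk/handle/10283/2749"))
```

Run against the handle URL above, this prints the same file name as the one I downloaded.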

I want to ensure there was no glitch in the download – I want to know the file I’ve downloaded is identical to the one in the repository. So, I do the following:

  • Click on “Show full item record”.
  • Scroll down to the red button labelled “Download all files”, where I see “zip file MD5 Checksum: a77048c58a46347499827ce6fe855127” (see screenshot). I copy the checksum (highlighted in yellow).

    Screenshot: DataShare displays the MD5 checksum hash of the zip file.

  • On my PC, I generate the MD5 checksum hash of the downloaded copy, and then I check that the hash on DataShare matches. There are a number of free tools available for this task: I could use the Windows command line, or I could use an MD5 utility such as the free “MD5 and SHA Checksum Utility”. In the case of the Checksum Utility, I do this as follows:
    • I paste the hash I copied from DataShare into the desktop utility (ignoring the fact the program confusingly displays the checksum hashes all in upper case).
    • I click the “Verify” button.

In this case they are identical – I have a match. I’ve confirmed the integrity of the file I downloaded.

Screenshot: the MD5 checksum hashes match each other.
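For anyone who would rather script the check than use a desktop utility, here is a minimal Python sketch of the same comparison using the standard library’s hashlib module. The function names and the local file path are illustrative assumptions. The file is read in chunks so that a large zip does not need to fit in memory, and the comparison ignores upper or lower case.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published):
    """Compare a file's MD5 with a published hash, ignoring case and whitespace."""
    return md5_of_file(path) == published.strip().lower()


if __name__ == "__main__":
    zip_path = Path("DS_10283_2749.zip")  # assumed location of the downloaded copy
    published = "a77048c58a46347499827ce6fe855127"  # copied from the full item record
    if zip_path.exists():
        ok = matches_published_checksum(zip_path, published)
        print("match" if ok else "MISMATCH")
    else:
        print(f"{zip_path} not found")
```

A mismatch would suggest the download was corrupted or incomplete, in which case downloading the zip file again is the obvious first step.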

More confidence in request-a-copy for embargoed files

Another improvement we’ve made is to give depositors confidence in the request-a-copy feature. If the files in your deposit are under temporary embargo, they will not be available for users to download directly. However, users can send you a request for the files through DataShare, which you’ll receive via email, as described in an earlier blogpost. If you then agree to the request using the form and the “Send” button in DataShare, the system will attempt to email the files to the user. However, as we all know, some files are too large for email servers.

If the email server refuses to send the email message because the attachment is too large, DataShare 3.0 will immediately display an error message for you in the browser saying “File too large”, allowing you to make alternative arrangements to get those files to the user. Otherwise, the system moves on to offer you a chance to change the permissions on the file to open access. So, if you see no error after clicking “Send”, you’ll have peace of mind that the files have been sent successfully.

Pauline Ward, Research Data Service Assistant
EDINA and Data Library

Analytics platform trial

Information Services is evaluating a new collaborative platform for data-science and analytics as part of its expanding portfolio of services for researchers. We are looking for researchers with suitable problems who expect to achieve results in the one-year trial. We will be able to work closely with a small number of projects to help them get the most out of the platform, and training will be available. In addition, we encourage further researchers to use the platform with less formal support.

The Aridhia AnalytiXagility Platform

AnalytiXagility is a purpose-built, user-friendly, collaborative platform for data science and analytics. It allows your team to easily create, discuss, modify and share analyses in a single, secure system accessed conveniently through a web browser.
The platform handles routine data management tasks such as confidentiality, availability, integrity and audit, reducing time to insight and discovery. In particular, it is ideally suited for:

  • Exploring, comparing and linking structured datasets including data quality profiling
  • Supporting data management, accountability and provenance
  • Processing large datasets that do not fit in memory

Bring your team

Project members collaborate through a private workspace configured with compute, storage and analytical tools. Embedded social media tools allow teams to post and share questions, updates, comments and insights, building an active record of the research undertaken.

Bring your data

Users import their datasets using the secure and reliable file transfer mechanism, SFTP. Working files (documents, images, analysis scripts) can be uploaded directly through the web interface, and tagged for easy management and retrieval by the team.

Bring your analysis

AnalytiXagility provides an analysis platform, based on R, which can be accessed through a web browser. Combining R with an SQL database and an associated access library allows researchers to analyse their data in a faster and more scalable way than with R alone.

Generate your output

The platform supports generation of PDF reports for communication and publication using LaTeX templates, such as those provided by many leading journals, in which users can embed active analytical scripts to auto-generate images and tabular data within the report at runtime.

More information

If you are interested in participating in the trial, please email IS.Helpline@ed.ac.uk with the subject “XAP Trial”.

Further information can be found at:

Steve Thorn
Research Services
IT Infrastructure

Dancing with Data

I went to an interesting talk yesterday by Prof Chris Speed called “Dancing with Data”, on how our interactions and relationships with each other, with the objects in our lives and with companies and charities are changing as a result of the data that is now being generated by those objects (particularly smartphones, but increasingly by other objects too). New phenomena such as 3D printing, airbnb, foursquare and iZettle are giving us choices we never had before, but also leading to things being done with our data which we might not have expected or known about. The relationships between individuals and our data are being re-defined as we speak. Prof Speed challenged us to think about the position of designers in this new world where push-to-pull markets are being replaced by new models. He also told us about his research collaborations with Oxfam, looking at how technology might enhance the value of the second-hand objects they sell by allowing customers to hear their stories from their previous owners.

All very thought-provoking, but what about the implications for academic research, aside from those working in the fields of Design, Economics or Sociology who must now develop new models to reflect this changing landscape? Well, the question arises, if all this data is being generated and collected by companies, are the academics (and indeed the charity sector) falling behind the curve? Here at the University of Edinburgh, my colleagues in Informatics are doing Data Science research, looking into the infrastructure and the algorithms used to analyse the kind of commercial Big Data flowing out of the smartphones in our pockets, while Prof Speed and his colleagues are looking at how design itself is being affected. But perhaps academics in all disciplines need to be tuning their antennae to this wavelength and thinking seriously about how their research can adapt to and be enhanced by the new ways we are all dancing with data.

For more about the University of Edinburgh’s Design Informatics research and forthcoming seminars see www.designinformatics.org. Prof Chris Speed tweets @ChrisSpeed.

Pauline Ward is a Data Library Assistant working at the University of Edinburgh and EDINA.

Dealing with Data Conference & RDM Service Launch – summary

Information Services (IS) held a half-day conference in the Main Library on the subject of ‘Dealing with Data’ to coincide with the launch of the University of Edinburgh’s Research Data Management support services on 26 August.

University researchers presented to over 120 delegates from across the disciplinary and support spectrum on many aspects of working with data, particularly research with novel methods of creating, using, storing, or sharing data. Subjects included Big Data for disease control, managing West Nilotic language sound files, sharing brain images, geospatial metadata services, and visualising qualitative data via carpets!

The RDM Programme team are currently collecting feedback and will report on this and the conference in more detail via this blog.

‘Dealing with Data Conference’ delegates then gathered in the Main Library foyer to hear brief talks by Professor Jeff Haywood, Professor Peter Clarke and Dr John Scally. These were followed by the formal launch of the RDM Services by the University’s Principal, Sir Timothy O’Shea, who underlined the successful collaboration between research and support service communities in establishing research support services worthy of a leading UK research-intensive university.

University of Edinburgh RDM Service launch by Sir Timothy O'Shea

A ‘storify’ story of tweets collected during the launch and the conference is available, with pictures and perspectives from various attendees.

The launch of the IS-led RDM Services is the culmination of work detailed in the RDM Roadmap which began in earnest in August 2012 following approval of the RDM Policy by the University Court in May 2011.

Details of available and planned RDM Services for University of Edinburgh researchers were reported on in the blogpost: RDM Roadmap: Completion of Phase 1

Conference presentations can be downloaded from Edinburgh Research Archive (ERA) at: https://www.era.lib.ed.ac.uk/handle/1842/9389

Stuart Macdonald
RDM Service Coordinator
stuart.macdonald@ed.ac.uk
