EDINA’s ShareGeo Open content into DataShare

Many fascinating datasets can be found in our new ShareGeo Open Collection: http://datashare.is.ed.ac.uk/handle/10283/2345 .

This collection holds the entire contents of EDINA’s geospatial repository, ShareGeo Open, which we have successfully imported into DataShare. We took this step to preserve the ShareGeo Open data after the decision was taken to end the service. Not only have we kept the data accessible, we have also redirected all the handle persistent identifiers, so that any existing links to the data, including those cited in academic journal articles, still work. One example is the link in this paper: http://dx.doi.org/10.1007/s10393-016-1131-y .

Similarly, should the day ever arrive when DataShare itself were to be closed, we would endeavour to find a suitable repository to which we could migrate our data to ensure its preservation, as per item 13 of our Preservation policy.

We were able to copy the content of almost all metadata fields from ShareGeo to DataShare. The fact that both repositories use the Dublin Core metadata standard, and that both were running on DSpace, made the task a little easier. The University of Edinburgh supports the Dublin Core Metadata Initiative. DataShare’s metadata schema, which sets out what our metadata fields are and which values are permitted in them, can be found at https://www.wiki.ed.ac.uk/display/datashare/Current+metadata+schema .

Our EDINA sysadmin (and developer) George was very helpful in answering our questions and joining the discussions in which the team settled on the most appropriate correspondence between the two schemas. The existing documentation was a great help too. George then produced a Python script to harvest the data, using OAI-PMH to get a list of ShareGeo items, then METS to retrieve the metadata and bitstreams for each item. Finally, he used SWORD to deposit them all in DataShare.
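To give a flavour of the first step, here is a minimal sketch of listing items over OAI-PMH. It is not George's script: the function names, the example.org base URL and the sample response are made up for illustration. Only the `ListIdentifiers` verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH specification. It parses a response held in a string, so it runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListIdentifiers request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # When resuming a paged list, the token replaces the other arguments.
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_identifiers(xml_text):
    """Return (identifiers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    return identifiers, (token_el.text.strip() if has_token else None)


# Invented sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:10672/51</identifier></header>
    <header><identifier>oai:example.org:10672/52</identifier></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

ids, token = parse_list_identifiers(SAMPLE)
```

A real harvester would keep requesting pages with the returned token until none comes back, then fetch each item's METS package and deposit it via SWORD.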

The team took the opportunity to use DSpace’s batch metadata editing utility and web interface to clean up some of the metadata: adding dates to the temporal coverage field and adding placenames and country abbreviations to the spatial coverage field, to enhance the discoverability of the data.
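For anyone curious what such a batch edit might look like, here is a rough sketch in Python that adds placenames to a spatial coverage column in an exported metadata CSV. It assumes the spreadsheet has an `id` column and separates multiple values in one cell with `||`, which is my understanding of the DSpace default. The item id `101`, the exact column header and the helper names are invented for illustration, and this is not the procedure the team actually followed.

```python
import csv
import io

SEPARATOR = "||"  # assumed multi-value separator in the exported CSV


def add_values(cell, new_values):
    """Append values to a multi-valued cell, skipping any already present."""
    existing = [v for v in cell.split(SEPARATOR) if v]
    for value in new_values:
        if value not in existing:
            existing.append(value)
    return SEPARATOR.join(existing)


def add_placenames(csv_text, column, placenames_by_id):
    """Return new CSV text with extra placenames added to `column` per item id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        extra = placenames_by_id.get(row["id"], [])
        row[column] = add_values(row[column], extra)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical one-row export: item 101 currently says only "Great Britain".
exported = "id,collection,dc.coverage.spatial\n101,10283/2345,Great Britain\n"
edited = add_placenames(
    exported, "dc.coverage.spatial", {"101": ["Scotland", "England", "Wales"]}
)
```

The edited CSV would then be uploaded back through the batch metadata editing tool, which shows the proposed changes for review before applying them.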

For example, “GB Postcode Areas” can be found using the original handle persistent identifier, http://hdl.handle.net/10672/51 , as well as the new DOI which DataShare has given it: 10.7488/ds/1755. Each of the 255 items migrated to our ShareGeo Open Collection contains a file called metadata.xml, which holds all the metadata exactly as it was when exported from ShareGeo itself.

I have manually added placenames to the spatial coverage field, which was used differently in ShareGeo: there it held a bounding box, e.g. “northlimit=60.7837;eastlimit=2.7043;southlimit=49.8176;westlimit=-7.4856;”. Many of these datasets cover Great Britain, so they include Scotland, England and Wales but not Northern Ireland. In these cases I’ve added the words “Scotland”, “England” and “Wales” to Spatial Coverage (‘dc.coverage.spatial’), even though these are already implicit in the “Great Britain” value in the same field, because I believe doing so:

  • enhances the accessibility of the data (by making the geographical extent clearer for users unfamiliar with Great Britain) and…
  • enhances the discoverability of the data (users searching Google for “Wales” now have a chance of seeing this dataset among the hits).
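The bounding box strings that ShareGeo stored are easy to read by machine, even if they mean little to a human searcher. As an illustration only, here is a small Python sketch that parses the example value quoted above and checks whether a point falls inside it. The function names are my own, and the check assumes a box that does not cross the 180° meridian.

```python
def parse_bounding_box(value):
    """Turn 'northlimit=..;eastlimit=..;' into a dict of floats."""
    box = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        key, _, number = part.partition("=")
        box[key.strip()] = float(number)
    return box


def contains(box, lat, lon):
    """True if the point lies within the box (assumes no antimeridian crossing)."""
    return (
        box["southlimit"] <= lat <= box["northlimit"]
        and box["westlimit"] <= lon <= box["eastlimit"]
    )


gb_box = parse_bounding_box(
    "northlimit=60.7837;eastlimit=2.7043;southlimit=49.8176;westlimit=-7.4856;"
)
in_edinburgh = contains(gb_box, 55.95, -3.19)  # roughly Edinburgh
```

A box like this cannot tell a user that the dataset covers “Wales”, which is why the placenames are worth adding alongside it.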

James Crone, who compiled this “GB Postcode Areas” data, is part of EDINA’s highly renowned geospatial services team.

Part of James’ work for EDINA involves producing census geography data for the UK Data Service. He has recently added updated boundary data for use with the latest anonymised census microdata (that’s from the 2011 census): see the Boundary Data Selector at https://census.ukdataservice.ac.uk/get-data/boundary-data .

Pauline Ward is a Research Data Service Assistant for the University of Edinburgh, based at EDINA.

Detail from GB Postcode Areas data, viewed using QGIS.



DataShare upgraded to v2.3 – The embargo enhancement release

The latest upgrade of Edinburgh DataShare, from version 2.2 to 2.3, brings in several usability improvements.

  • Embargo expiry reminder
    If you want to deposit your data in DataShare, but you want to impose a delay before your files become freely downloadable, you can apply an embargo to your submission – see our “Checklist for deposit” for a fuller explanation of the embargo feature. As of DataShare v2.3, if you apply an embargo to your deposit, DataShare will now send you an email reminder one week before the embargo is due to expire. This gives you time to make us aware if you need the embargo to be extended, or to send us the details of your paper if it has been published, so that we can add those to the metadata, to help users understand your data.
  • DOI added to the citation field immediately
    When your DataShare deposit is approved by the curator, the system mints a new DOI for you. As of version 2.3, DataShare immediately appends the URL containing that DOI to the “Citation” field, which is visible at the top of the summary view page of your item. The “Citation” field makes it easy for others to cite your data, because it provides them with text which they can copy and paste into any manuscript (or any other document where they want to cite the data). Previously you would have had to click on “Show full item record” to look for the DOI in the “Persistent identifier” field, or wait for an overnight script to paste the DOI onto the end of the “Citation” field.
  • Tombstone records
    We now have the ability to leave a ‘tombstone’ record in place for any DataShare item that is withdrawn. We only withdraw items in exceptional circumstances, for example where there is a substantive error or omission in the data, such that we feel merely labelling the item as “Superseded” is not sufficient. Now, when we tombstone an item, the files become unavailable indefinitely, but the metadata remain visible at the DOI and handle URLs. Until now, every withdrawn item became completely invisible, so that the original DOI and handle URLs produced a ‘not found’ error.
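The timing of the embargo expiry reminder described in the first item above can be sketched in a few lines of Python. This is only an illustration of the “one week before expiry” rule, assuming a job that runs once a day; the function names are hypothetical and this is not the code in DataShare.

```python
from datetime import date, timedelta

REMINDER_LEAD = timedelta(days=7)  # "one week before the embargo is due to expire"


def reminder_date(embargo_expiry):
    """Date on which the reminder email would be sent."""
    return embargo_expiry - REMINDER_LEAD


def reminder_due(embargo_expiry, today):
    """True on the single day a daily job should send the reminder."""
    return reminder_date(embargo_expiry) == today


expiry = date(2017, 3, 15)     # made-up embargo expiry date
send_on = reminder_date(expiry)
```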
Screenshot of a DataShare item's citation field with the DOI

Cortical parcellation citation – now with DOI!

Enjoy!

Pauline Ward

Research Data Service

P.S. Many thanks to our software developer at EDINA, George Hamilton, who actually coded all these enhancements to DataShare, which uses the open-source DSpace system. EDINA’s DataShare code is available at https://github.com/edina/dspace .


Data-X Symposium

Registrations have been coming in thick and fast for the Data-X Symposium, to be held on 1 December in the Main Lecture Theatre, Edinburgh College of Art (programme below).

Data-X is a University of Edinburgh IS Innovation Fund initiative supported by the Data Lab & ASCUS | Art & Science. It brings together PhD researchers from the arts and sciences to develop collaborative data ‘installations’.

To register visit: https://www.eventbrite.com/e/data-x-symposium-tickets-29076676121

Programme:

10.00 – 10.30: Registration & coffee

10.30 – 10.40: Welcome – Stuart Macdonald (EDINA, Data-X Project Manager) & Introduction – Dr Martin Parker (Director of Outreach, Edinburgh College of Art)

10.40 – 11.20: Guest speaker: ASCUS & the ASCUS Lab: catalysts for Artiscience – Dr James Howie (Co-Founder, ASCUS)

Session 1 presentations: Chair – Dr. Rocio von Jungenfeld (School of Engineering & Arts, University of Kent)

· 11.20 – 11.35: PUROS Sound Box – Dr Sophia Banou, Dr Christos Kakalis (both School of Architecture & Landscape Architecture, Edinburgh College of Art), Matt Giannotti (Reid School of Music)

· 11.35 – 11.50: eTunes – Dr Siraj Sabihuddin (School of Engineering)

· 11.50 – 12.05: Inside the black box – Luis Fernando Montaño (Centre for Synthetic and Systems Biology) & Bohdan Mykhaylyk (School of Chemistry)

· 12.05 – 12.20: Wind Gust 42048 – Matt Giannotti (Reid School of Music)

· 12.20 – 12.30: Session 1 wrap-up

12.30 – 13.15: Lunch

Session 2 presentations: Chair – Martin Donnelly (Digital Curation Centre)

· 13.15 – 13.30: Elegy for Philippines Eagle – Oli Jan (Reid School of Music)

· 13.30 – 13.45: Feel the Heat: World Temperature Data Quilt – Nathalie Vladis (Centre for Integrative Physiology) & Julia Zaenker (School of Engineering)

· 13.45 – 14.00: o ire – Prof. Nick Fells (School of Culture and Creative Arts, University of Glasgow)

· 14.00 – 14.15: Sinterbot – Adela Rabell Montiel (Queen’s Medical Research Institute) & Dr Siraj Sabihuddin (School of Engineering)

· 14.15 – 14.25: Session 2 wrap-up

14.25 – 15.05: Guest speaker: FUSION – where art meets neuroscience – Dr Jane Haley (Edinburgh Neuroscience)

15.05 – 15.15: Closing remarks: Stuart Macdonald (EDINA, Data-X Project Manager)

15.20: Close

Data-X is supported by: The Data Lab, ASCUS, Information Services

Stuart Macdonald
DATA-X Project Manager / Associate Data Librarian
EDINA


Twenty’s Plenty: DataShare v2.1 Upload Upgrade

We have upgraded DataShare (to v2.1) to enable HTML5 resumable upload. This means depositors can now use the user-friendly web deposit interface to upload numerous files at once via drag’n’drop, and to upload files up to 15 GB in size, regardless of network ‘blips’.
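For readers wondering why a resumable upload shrugs off network blips, the general idea is that the file is sent in numbered chunks, so after an interruption only the chunks the server has not yet received need to be sent again. Here is a minimal Python sketch of that bookkeeping. It illustrates the technique in general rather than DataShare's actual browser code, and the chunk size and function names are assumptions of mine.

```python
def chunk_ranges(file_size, chunk_size):
    """Yield (chunk_number, start, end) with 1-based numbers and end exclusive."""
    total_chunks = (file_size + chunk_size - 1) // chunk_size  # ceiling division
    for number in range(1, total_chunks + 1):
        start = (number - 1) * chunk_size
        yield number, start, min(start + chunk_size, file_size)


def chunks_to_send(file_size, chunk_size, already_received):
    """List the chunks still missing, given the numbers the server already has."""
    return [
        chunk
        for chunk in chunk_ranges(file_size, chunk_size)
        if chunk[0] not in already_received
    ]


# Toy numbers: a 10-byte "file" in 4-byte chunks, interrupted after chunk 2.
remaining = chunks_to_send(10, 4, already_received={1, 2})
```

With real files the chunks would be a megabyte or more each, so a dropped connection costs at most one chunk's worth of re-sending rather than the whole upload.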

In fact we have reason to believe it may be possible to upload a 20 GB file this way: in testing, I gave it 2 hours till the progress bar said 100%, and even though the browser then produced an error message instead of the green tick I was hoping for, I found when I retrieved the submission from the Submissions page that I was able to resume, and the file had been added.

*** So our new advice to depositors is: our current Item size limit and file size limit are both 20 GB. Files larger than 15 GB may not upload through your browser. If you have files over 15 GB or data totalling over 20 GB which you’d like to share online, please contact the Data Library team to discuss your options. ***
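If you want to check a set of files against this advice before you start, the rule boils down to two comparisons. The Python sketch below is only an illustration: it assumes “GB” means 10^9 bytes, which may not match how the limits are actually measured, and the function name and return strings are my own.

```python
GB = 10**9  # assumption: decimal gigabytes

BROWSER_FILE_LIMIT = 15 * GB  # single files above this may not upload via the browser
ITEM_LIMIT = 20 * GB          # total size limit for one Item


def deposit_advice(file_sizes):
    """Suggest a route for a deposit, given file sizes in bytes."""
    too_big_for_browser = any(size > BROWSER_FILE_LIMIT for size in file_sizes)
    over_item_limit = sum(file_sizes) > ITEM_LIMIT
    if too_big_for_browser or over_item_limit:
        return "contact the Data Library team"
    return "web deposit"


advice = deposit_advice([10 * GB, 5 * GB])  # two files totalling 15 GB
```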

See screenshots below. Once the files have been selected and the upload commenced, the ‘Status’ column shows the percentage uploaded. A 10 GB file may take in the region of 1 hour to upload in this way. 15 GB files have been uploaded with Chrome, Firefox and Internet Explorer using this interface.

Until now, any file over 1 GB caused browsers difficulties, meaning many prospective depositors were not able to use the web deposit interface. Instead they had to email the curation team and arrange to transfer their files to us via Dropbox, USB or the Windows network. The curator then had to transfer these same files to our server, collate the metadata into an XML file, log into the Linux system and run a batch import script, often with many hiccups concerning permissions, virus checkers and memory along the way. All very time-consuming.

Soon we will begin working on a download upgrade, to integrate a means for users to download much bigger files from DataShare outside of the limitations of HTTP (perhaps using FTP). The aim is to allow some of the datasets we have in the university which are in the region of 100 GB to be shared online in a way that makes it reasonably quick and easy for users to download them. We have depositors queueing up to use this feature. Watch this space.

Further technical detail about both the HTML5 upload feature and plans for an optimised large download release are available on the slides for the presentation I made at Open Repositories 2016 in Dublin this week: http://www.slideshare.net/paulineward/growing-open-data-making-the-sharing-of-xxlsized-research-data-files-online-a-reality-using-edinburgh-datashare .


A simple interface invites the depositor to select files to upload.


A 15 GB file uploaded via Firefox on Windows and included in a submitted Item.

A 20 GB file uploaded and included in an incomplete submission.


Pauline Ward, Data Library Assistant, University of Edinburgh
