Data-X Symposium

Registrations have been coming in thick and fast for the Data-X Symposium, to be held on 1 December in the Main Lecture Theatre, Edinburgh College of Art (programme below).

Data-X is a University of Edinburgh IS Innovation Fund initiative supported by the Data Lab & ASCUS | Art & Science. It brings together PhD researchers from the arts and sciences to develop collaborative data ‘installations’.

To register visit: https://www.eventbrite.com/e/data-x-symposium-tickets-29076676121

Programme:

10.00 – 10.30: Registration & coffee

10.30 – 10.40: Welcome – Stuart Macdonald (Edina, Data-X Project Manager) & Introduction – Dr Martin Parker (Director of Outreach, Edinburgh College of Art)

10.40 – 11.20: Guest speaker: ASCUS & the ASCUS Lab: catalysts for Artiscience – Dr James Howie (Co-Founder, ASCUS)

Session 1 presentations: Chair – Dr. Rocio von Jungenfeld (School of Engineering & Arts, University of Kent)

· 11.20 – 11.35: PUROS Sound Box – Dr Sophia Banou, Dr Christos Kakalis (both School of Architecture & Landscape Architecture, Edinburgh College of Art), Matt Giannotti (Reid School of Music)

· 11.35 – 11.50: eTunes – Dr Siraj Sabihuddin (School of Engineering)

· 11.50 – 12.05: Inside the black box – Luis Fernando Montaño (Centre for Synthetic and Systems Biology) & Bohdan Mykhaylyk (School of Chemistry)

· 12.05 – 12.20: Wind Gust 42048 – Matt Giannotti (Reid School of Music)

· 12.20 – 12.30: Session 1 wrap-up

12.30 – 13.15: Lunch

Session 2 presentations: Chair – Martin Donnelly (Digital Curation Centre)

· 13.15 – 13.30: Elegy for Philippines Eagle – Oli Jan (Reid School of Music)

· 13.30 – 13.45: Feel the Heat: World Temperature Data Quilt – Nathalie Vladis (Centre for Integrative Physiology) & Julia Zaenker (School of Engineering)

· 13.45 – 14.00: o ire – Prof. Nick Fells (School of Culture and Creative Arts, University of Glasgow)

· 14.00 – 14.15: Sinterbot – Adela Rabell Montiel (Queen’s Medical Research Institute) & Dr Siraj Sabihuddin (School of Engineering)

· 14.15 – 14.25: Session 2 wrap-up

14.25 – 15.05: Guest speaker: FUSION – where art meets neuroscience – Dr Jane Haley (Edinburgh Neuroscience)

15.05 – 15.15: Closing remarks: Stuart Macdonald (Edina, Data-X Project Manager)

15.20: Close

Data-X is supported by: The Data Lab, ASCUS, Information Services

Stuart Macdonald
DATA-X Project Manager / Associate Data Librarian
EDINA


Twenty’s Plenty: DataShare v2.1 Upload Upgrade

We have upgraded DataShare to v2.1, enabling HTML5 resumable upload. This means depositors can now use the user-friendly web deposit interface to upload numerous files at once via drag-and-drop, and to upload files up to 15 GB in size, regardless of network ‘blips’.

In fact we have reason to believe it may be possible to upload a 20 GB file this way: in testing, I gave it two hours until the progress bar said 100%. Even though the browser then produced an error message instead of the green tick I was hoping for, when I retrieved the submission from the Submissions page I found I was able to resume, and the file had been added.
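Behind the scenes, resumable upload works by sending the file in fixed-size chunks and re-sending only the chunks the server has not acknowledged after an interruption. A minimal sketch of that bookkeeping in Python (the chunk size and function names are illustrative, not DataShare's actual implementation):

```python
# Sketch of chunked, resumable upload bookkeeping (illustrative only --
# not DataShare's actual code). The file is split into fixed-size chunks;
# after an interruption, only unacknowledged chunks are re-sent.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (hypothetical value)

def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Return (start, end) byte ranges covering the whole file."""
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]

def chunks_to_resend(file_size, acknowledged):
    """Given the set of chunk indices the server has confirmed,
    return the indices that still need uploading on resume."""
    total = len(chunk_ranges(file_size))
    return [i for i in range(total) if i not in acknowledged]

# Example: a 20 MiB file in 8 MiB chunks -> 3 chunks; if the first
# two were acknowledged before the connection dropped, only chunk 2
# needs re-sending on resume.
size = 20 * 1024 * 1024
print(chunk_ranges(size))              # three byte ranges
print(chunks_to_resend(size, {0, 1}))  # [2]
```

The same logic is why a network ‘blip’ costs only the chunk in flight, not the whole upload.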

*** So our new advice to depositors is: our current Item size limit and file size limit are both 20 GB. Files larger than 15 GB may not upload through your browser. If you have files over 15 GB, or data totalling over 20 GB, which you’d like to share online, please contact the Data Library team to discuss your options. ***

See screenshots below. Once the files have been selected and the upload commenced, the ‘Status’ column shows the percentage uploaded. A 10 GB file may take in the region of 1 hour to upload in this way. 15 GB files have been uploaded with Chrome, Firefox and Internet Explorer using this interface.
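The timing above implies a sustained throughput of roughly 10 GB per hour, which gives a rough rule of thumb for other file sizes (an estimate only; actual speeds will depend on your network):

```python
# Back-of-the-envelope upload time estimate, assuming the roughly
# 1 hour per 10 GB throughput observed in our testing holds steady.

def estimated_hours(size_gb, throughput_gb_per_hour=10):
    """Estimated upload time in hours for a file of size_gb gigabytes."""
    return size_gb / throughput_gb_per_hour

print(estimated_hours(15))  # a 15 GB file: about 1.5 hours
print(estimated_hours(20))  # a 20 GB file: about 2 hours
```

The 2-hour figure for 20 GB matches what we saw in testing.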

Until now, any file over 1 GB caused browsers difficulties, meaning many prospective depositors were not able to use the web deposit interface. Instead they had to email the curation team and arrange to transfer their files to us via DropBox, USB or the Windows network; the curator then had to transfer those same files to our server, collate the metadata into an XML file, log into the Linux system and run a batch import script, often with hiccups along the way concerning permissions, virus checkers and memory. All very time-consuming.
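For a flavour of what that manual route involved, here is a sketch of generating the kind of Dublin Core metadata file a batch import script consumes (assuming a DSpace-style Simple Archive Format; the field values are invented, and DataShare's actual import format may differ):

```python
# Sketch: generate a Dublin Core metadata file of the kind a batch
# import script might consume (assuming a DSpace-style Simple Archive
# Format; DataShare's actual format may differ in detail).
import xml.etree.ElementTree as ET

def dublin_core_xml(title, creator, date_issued):
    """Build a minimal dublin_core.xml document as a string."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in [
        ("title", None, title),
        ("contributor", "author", creator),
        ("date", "issued", date_issued),
    ]:
        dcvalue = ET.SubElement(root, "dcvalue", element=element)
        if qualifier:
            dcvalue.set("qualifier", qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")

# Invented example values, for illustration only.
xml_text = dublin_core_xml("Example dataset", "Ward, Pauline", "2016-06-17")
print(xml_text)
```

One such file had to be assembled per deposit, which is part of why the manual route was so time-consuming.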

Soon we will begin working on a download upgrade, to integrate a means for users to download much bigger files from DataShare outside of the limitations of HTTP (perhaps using FTP). The aim is to allow some of the datasets we have in the university which are in the region of 100 GB to be shared online in a way that makes it reasonably quick and easy for users to download them. We have depositors queueing up to use this feature. Watch this space.

Further technical detail about both the HTML5 upload feature and plans for an optimised large download release are available on the slides for the presentation I made at Open Repositories 2016 in Dublin this week: http://www.slideshare.net/paulineward/growing-open-data-making-the-sharing-of-xxlsized-research-data-files-online-a-reality-using-edinburgh-datashare .


A simple interface invites the depositor to select files to upload.

 

 


A 15 GB file uploaded via Firefox on Windows and included in a submitted Item.

 

 

A 20 GB file uploaded and included in an incomplete submission.


Pauline Ward, Data Library Assistant, University of Edinburgh


Publishing Data Workflows

[Guest post from Angus Whyte, Digital Curation Centre]

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand, and DataShare’s workflow and checklist for deposit is identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo. [1]

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]


As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible. That has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELN) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and the University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so researchers can deposit data directly from their lab notebooks. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made in describing that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. The report makes five short, to-the-point recommendations:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:

RSpace-DataShare Workflows

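As a rough illustration of the packaging step in this workflow, a SWORD deposit of this kind typically carries a zip archive containing a METS metadata file alongside the data files. A sketch under that assumption (the archive layout and file names are illustrative, not RSpace's actual output):

```python
# Sketch: package a data file with a METS metadata stub into a zip,
# the kind of payload a SWORD deposit typically carries (illustrative
# only; RSpace's actual packaging will differ in detail).
import io
import zipfile

def build_sword_package(mets_xml, files):
    """files: dict mapping archive member names to bytes content.
    Returns the zip package as bytes, ready to POST to a SWORD endpoint."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mets.xml", mets_xml)
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()

# Invented example content, for illustration only.
package = build_sword_package(
    "<mets xmlns='http://www.loc.gov/METS/'></mets>",
    {"labbook/entry1.csv": b"a,b\n1,2\n"},
)
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())  # ['mets.xml', 'labbook/entry1.csv']
```

The METS file carries the descriptive metadata, so the receiving repository can build the Item record without manual re-keying.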

For me one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team can identify an expert user who is planning a large number of data deposits over a period of time, and train them to apply DataShare’s curation standards themselves, that user can be given administrative rights over the relevant Collection in the database, with the curation step for that Collection entrusted to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.

 

[1] Austin, C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation, Vol. 10, No. 2, pp. 1-11. https://doi.org/10.2218/ijdc.v10i2.377

[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. (2015) Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation, Vol. 10, No. 1, pp. 163-172. https://doi.org/10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.

 


Fostering open science in social science

On 10th June, the Data Library team ran two workshops in association with the EU Horizon 2020 project, FOSTER (Facilitate Open Science Training for European Research), and the Scottish Graduate School of Social Science.

The aim of the morning workshop, “Good practice in data management & data sharing with social research,” was to provide new entrants into the Scottish Graduate School of Social Science with a grounding in research data management using our online interactive training resource MANTRA, which covers good practice in data management and issues associated with data sharing.

The morning started with a brief presentation by Robin Rice on ‘open science’ and its meaning for the social sciences. Pauline Ward then demonstrated the importance of data management plans to ensure work is safeguarded and that data sharing is made possible. I introduced MANTRA briefly, and then Laine Ruus assigned different MANTRA units to participants, asking them to go through the units, extract one or two key messages, and report back to the rest of the group. After the coffee break we had another presentation on ethics, informed consent and the barriers to sharing, and we finished the morning session with a ‘Do’s and Don’ts’ exercise in which we asked participants to write on post-it notes the things they remembered and would take away from the workshop: green for things they should DO, and pink for those they should NOT. Here are some of the points the learners posted:

DO
– consider your usernames & passwords
– read the Data Protection Act
– check funder/institution regulations/policies
– obtain informed consent
– design a clear consent form
– give participants info about the research
– inform participants of how we will manage data
– confidentiality
– label your data with enough info to retrieve it in future
– develop a data management plan
– follow the certain policies when you re-use dataset[s] created by others
– have a clear data storage plan
– think about how & how long you will store your data
– store data in at least 3 places, in at least 2 separate locations
– backup!
– consider how/where you back up your data
– delete or archive old versions
– data preservation
– keep your data safe and secure with the help of facilities of fund bodies or university
– think about sharing
– consider sharing at all stages. Think about who will use my data next
– share data (responsibly)

DON’T
– unclear informed consent
– a sense of forcing participants to be part of research
– do not store sensitive information unless necessary
– don’t staple consent forms to de-identified data records/store them together
– take information security for granted
– assume all software will be able to handle your data
– don’t assume you will remember stuff. Document your data
– assume people understand
– disclose participants’ identity
– leave computer on
– share confidential data
– leave your laptop on the bus!
– leave your laptop on the train!
– leave your files on a train!
– don’t forget it is not just my data, it is public data
– forget to future proof

Robin Rice presenting at FOSTERing Open Science workshop

Our message was that open science will thrive when researchers:

  • organise and version their data files effectively,
  • provide documentation comprehensive enough for others to understand and replicate results, and thus cite the source properly,
  • know how to store and transport their data safely and securely (ensuring backup and encryption),
  • understand the legal and ethical requirements for managing data about human subjects,
  • recognise the importance of good research data management practice in their own context.

The afternoon workshop on “Overcoming obstacles to sharing data about human subjects” built on one of the main themes introduced in the morning, with a large overlap of attendees. The ethical and regulatory issues in this area can appear daunting. However, data created from research with human subjects are valuable, and therefore are worth sharing for all the same reasons as other research data (impact, transparency, validation etc). So it was heartening to find ourselves working with a group of mostly new PhD students, keen to find ways to anonymise, aggregate, or otherwise transform their data appropriately to allow sharing.

Robin Rice introduced the Data Protection Act, as it relates to research with human subjects, and ethical considerations. Naturally, we directed our participants to MANTRA, which has detailed information on the ethical and practical issues, with specific modules on “Data protection, rights & access” and “Sharing, preservation & licensing”. Of course not all data are suitable for sharing, and there are risks to be considered.

In many cases, data can be anonymised effectively to allow sharing. Richard Welpton from the UK Data Archive shared practical information on anonymisation approaches and tools for ‘statistical disclosure control’, recommending sdcMicroGUI, a graphical interface for carrying out anonymisation techniques; it is built on an R package but should require no knowledge of the R language.
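Statistical disclosure control typically starts from a k-anonymity check: every combination of quasi-identifying attributes (age band, postcode district and the like) should occur at least k times in the dataset. A toy illustration of the concept in Python (sdcMicro itself is an R package, and real disclosure control goes far beyond this):

```python
# Toy k-anonymity check: records whose combination of quasi-identifiers
# appears fewer than k times risk re-identification. This only sketches
# the concept behind tools like sdcMicro, which go much further.
from collections import Counter

def risky_records(records, quasi_identifiers, k=3):
    """Return the records whose quasi-identifier combination
    occurs fewer than k times in the dataset."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]

# Invented example data, for illustration only.
data = [
    {"age_band": "20-29", "district": "EH8", "answer": "yes"},
    {"age_band": "20-29", "district": "EH8", "answer": "no"},
    {"age_band": "20-29", "district": "EH8", "answer": "yes"},
    {"age_band": "60-69", "district": "EH1", "answer": "no"},
]
at_risk = risky_records(data, ["age_band", "district"], k=3)
print(len(at_risk))  # 1: the last record is unique on (age_band, district)
```

Records flagged this way are candidates for aggregation into broader categories (e.g. wider age bands) before sharing.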

Finally, Dr Niamh Moore from the University of Edinburgh shared her experiences of sharing qualitative data. She spoke about the need to respect the wishes of subjects in her oral history research, and about the enthusiasm of many of her human subjects to be named in her research outputs: in a sense, to own their own story, their own words.


Rocio von Jungenfeld & Pauline Ward
EDINA and Data Library
