Publishing Data Workflows

[Guest post from Angus Whyte, Digital Curation Centre]

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand, and DataShare’s workflow and checklist for deposit is identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo. [1]

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]


As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible. That has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELNs) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and the University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so researchers can deposit directly into it. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made to describe that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. There are five of these recommendations, each short and to the point:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:


RSpace-DataShare Workflows
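The packaging step mentioned above, where lab book content is wrapped with a METS manifest before being deposited over SWORD, can be sketched roughly as follows. This is only an illustration and not the RSpace or DataShare implementation: the function names, the `mets.xml` file name, the choice of Dublin Core fields and the sample content are all assumptions. The SWORD deposit itself (an HTTP request to the repository) is left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(title, creator, filenames):
    """Return a minimal METS manifest (as bytes) describing the given files."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata: a tiny Dublin Core record wrapped inside METS.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="dmd_1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC_NS}}}creator").text = creator

    # File inventory, plus a structural map that points at each file.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="CONTENT")
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", DMDID="dmd_1")
    for i, name in enumerate(filenames, start=1):
        file_el = ET.SubElement(group, f"{{{METS_NS}}}file", ID=f"file_{i}")
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID=f"file_{i}")

    return ET.tostring(mets, encoding="utf-8", xml_declaration=True)


def build_package(title, creator, files):
    """Zip the data files with a mets.xml manifest; files maps name -> bytes."""
    names = sorted(files)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # "mets.xml" is an assumed manifest name for this sketch.
        archive.writestr("mets.xml", build_mets(title, creator, names))
        for name in names:
            archive.writestr(name, files[name])
    return buffer.getvalue()


# Hypothetical usage: one exported notebook page packaged for deposit.
package = build_package(
    "Lab notebook export",
    "A. Researcher",
    {"notebook.html": b"<html><body>Experiment notes</body></html>"},
)
```

The resulting bytes are what a SWORD client would send to the repository's deposit endpoint. Keeping the manifest and the files in one zip means the repository receives the metadata and the content as a single unit.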

For me, one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team can identify an expert user who is planning a large number of data deposits over a period of time, they can train that user to apply DataShare’s curation standards themselves. The user would then be given administrative rights over the relevant Collection in the database, and the curation step for that Collection would be entrusted to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.

 

[1] Austin, C. et al. (2015) Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation, Vol. 10, No. 2, pp. 1-11. doi:10.2218/ijdc.v10i2.377

[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data. Available at: http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. (2015) Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation, Vol. 10, No. 1, pp. 163-172. doi:10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.

 


Using an electronic lab notebook to deposit data into Edinburgh DataShare

This is a heads up about a ‘coming attraction’. For the past several months a group at Research Space has been working with the DataShare team, including Robin Rice and George Hamilton, to make it possible to deposit research data from our new RSpace electronic notebook into DataShare.

I gave the first public preview of this integration last month in a presentation called ‘Electronic lab notebooks and data repositories: Complementary responses to the scientific data problem’, given to a session on Research Data and Electronic Lab Notebooks at the American Chemical Society conference in Dallas.

When the RSpace ELN becomes available to researchers at Edinburgh later this spring, users of RSpace will be able to make deposits to DataShare directly from RSpace using a simple interface we have built into RSpace. The whole process only takes a few clicks, and starts with selecting records to be deposited into DataShare and clicking on the DataShare button as illustrated in the following screenshot:

[Screenshot: RSpace workspace with the DataShare button highlighted]

You are then asked to enter some information about the deposit:

[Screenshot: DataShare deposit dialog, filled in]

After confirming a few details about the deposit, the data is deposited directly into DataShare, and information about the deposit appears in DataShare.

[Screenshot: the deposit viewed in DataShare]

We will provide details about how to sign up for an RSpace account in a future post later in the spring. In the meantime, I’d like to thank Robin and George for working with us at RSpace on this exciting project. As far as we know this is the first time an electronic lab notebook has ever been integrated with an institutional data repository, so this is a pioneering and very exciting experiment! We hope to use it as a model for similar integrations with other institutional and domain-specific repositories.

Rory MacNeil
Chief Executive, Research Space


Electronic Laboratory Notebooks – help or hindrance to academic research?

On 30 October 2013 the University of Edinburgh (UoE) organised what I believe to be the first University-wide meeting on Electronic Lab Notebooks (ELNs), giving a number of Principal Investigators (PIs) and others the opportunity to provide feedback on their user experiences. It was an excellent opportunity to discuss what the UoE can do to help its researchers, and whether there is likely to be one ‘solution’ that could be implemented across the UoE or whether a more bespoke, individual or discipline-specific approach would be required.

Lab Notes by S.S.K. – Flickr

Good research and good research data management (RDM) stem from the ability of researchers to accurately record, find, retrieve and store the information from their research endeavours. For many, but by no means all, this will initially be done by recording their outputs on the humble piece of paper, albeit one contained within a hardbound notebook (to ensure an accurate chronological record of the work) and supplemented liberally with printouts, photographs, x-rays, etc. and, ideally at least, reminders of where to look for the electronic data relevant to the day’s work.

Presentations from University researchers

Slides from these presentations are available to UoE members via the wiki.

The event kicked off with a live demonstration from a member of the School of Physics & Astronomy, who described his positive experiences with the Livescribe system. This demonstration impressively articulated the functions of the electronic pen, which allows its user to record their writing stroke by stroke, pass on this information to others either as a movie or a document, and store the output electronically. Some disadvantages were noted, such as the physical size of the pen, the reliance on WiFi for certain features, and the fact that, to date, only certain iOS 7 devices are supported (although this list will grow in 2014). Even so, this device has clearly had a positive effect on both the presenter’s research and teaching duties. However, the Livescribe pen does not in itself help address how to store these digital files.

The remainder of the presentations from the academic researchers were from the fields of life science, although their experiences were quite diverse.  This helpfully provided a good set-up for a healthy discussion, on both ELNs and indeed the wider aspects of RDM at the UoE.

Of the active researchers who presented, two were PIs from the School of Molecular, Genetic and Population Health Sciences and one was a postdoctoral researcher from the School of Biological Sciences.  All three had prior experience in using previous versions of ELNs, and had sought an ELN to address a range of similar issues with paper laboratory notebooks.

Merits and pitfalls of electronic notebooks

I have chosen not to provide feedback on the specific ELNs trialled here, but the software discussed was Evernote, eCAT and Accelrys. As the UoE does not recommend or discourage the use of any particular ELN to date, I won’t either.

In all cases these electronic systems were purchased for help with key areas:

Motivation/Benefits

  • Searchable data resource
  • Safe archive
  • Sharing data
  • Copy and paste functions
  • Functionality for reviewing lab members’ progress
  • Ability to organise by experiments (not just chronologically)
  • One system to store reagents/freezer contents with experimental data

And in general, key problematic issues raised with these systems were:

Barriers/Problems

  • Need for reliable internet access
  • Hardware integration into lab environment
  • Required more time to document and import data
  • Poor user interface/experience
  • Copy and paste functions (although time-saving, may increase errors as data are not reviewed)
  • Administration time by PI is required
  • PhD students and postdocs (when given the choice*) preferred to use paper notebooks

*it was mooted that no choice should be given.

Infrastructure

A common theme with the use of ELNs was that of the hardware, and the reliance on WiFi. Clearly, when working at the bench with reagents that are potentially hazardous (chemicals, radiation, etc.) or with biologicals that you don’t wish to contaminate (primary cell cultures, for instance), the hardware used is not supposed to be moved between such locations and ‘dry areas’ such as your office. A number of groups have attempted to solve this problem by using tablets that sync to both the “cloud” and their office computers, and this is of course dependent on WiFi. Without WiFi, you might unexpectedly find yourself with no access to any of your data or protocols, which leads to real problems if you are in the middle of an experiment. Additionally, this requires an outlay of money to purchase the tablets, and provides a tempting means of distraction to group members (both of which may be frowned upon by many PIs). The cost was identified as a potential problem for the larger groups, where multiple tablets would be required.

Research Data Management & Electronic Laboratory Notebooks

From an RDM perspective the subsequent discussions raised a number of interesting issues.  Firstly, as a number of these ELN services utilise the “cloud” for storage, it was clear that many researchers, PIs included, were unaware of what was expected from them by both their funding councils and the UoE.

Secure Cloud Computing by FutUndBeidl – Flickr

The Data Protection Act 1998 sets out how organisations may use personal data, and the Records Management Section’s guidance on ‘Taking sensitive information and personal data outside the University’s secure computing environment’ details the UoE position on this matter. Essentially, all sensitive or personal information leaving the UoE should be encrypted. This guidance would seem not to have reached a significant proportion of researchers yet.

ELN? – not for academic research!

Whilst the first two presentations were broadly supportive of ELNs, the third researcher’s presentation was distinctly negative, and he provided his interpretation of the use of an ELN in an academic setting. Although broadly speaking this presentation was on one product, it was made clear that his opinions were not based on one ‘software product’ alone. In this case the PI has since abandoned the ELN (after four years of use and requiring his lab members to use it), citing reasons of practicality: it took too long to document the results (paper is always quicker), there is no standard for writing up documentation online**, and the data have effectively been stored twice.

He was also of the strong opinion that the use of ELNs:

“were not going to improve your research quality – it’s for those who want to spend time making their data look pretty.”

And –

“it is not for academic research, but more suited for service labs and industry.”

These would seem to be viewpoints that cannot easily be addressed.

The role of the PI

**Of course this is also true for paper versions. The National Postdoctoral Association (USA) notes in its toolkit section on ‘Data Acquisition, Management, Sharing and Ownership’ that, given the multinational approach to research:

“many [postdocs] may prefer to keep their notes in their native language instead of English. Postdoc supervisors need to take this into consideration and establish guidelines for the extent to which record keeping must be generally accessible.”

The role of the PI cannot be overlooked in this process and, to date, even if a paper notebook is used, there is often no standard to observe.

The next generation of ELNs

Despite these concerns, ResearchSpace Ltd are poised to release the next generation of an ELN, an enterprise release of their popular eCAT ELN to be called RSpace. The RSpace team seem confident that they are both aware of these various user requirements and capable of addressing them, and it will certainly be interesting to see how they get on. They provided clear evidence of improved user interfaces, enhanced tools and knowledge of University policy, with the prospect of integration into the existing UoE digital infrastructure, such as the data repository, Edinburgh DataShare.

Researcher engagement

Importantly whilst this programme identified concerns and benefits with the various software systems available, it also highlighted issues with the UoE dissemination of RDM knowledge to the research community, and so perhaps fittingly the last word will be from the chair:

“The University has a lot of useful information on this area of data management; please look at the research support pages!”

So the fundamental question remains: what is the best way to engage researchers in RDM, and how can we best address this need at all levels?

David Girdwood
EDINA & Data Library
