Research Data Workshops: Electronic Notebooks Summary of Feedback

In the spring of this year (March & May) the Research Data Service ran two workshops on Electronic Notebooks (ENs) where researchers from all three colleges were invited to share their experiences of using ENs with other researchers. Presentations and demos were given on RSpace, Benchling, Jupyter Notebooks, WikiBench, and Lab Archives. Almost 70 research and support staff attended and participated in the discussions.

This post is a distillation of those discussions, which we will use to inform our plans around Electronic Notebooks over the coming year. It was obvious from the level of attendance and engagement with the discussions that there was quite a lot of enthusiasm for the idea of adopting ENs across a variety of different schools and disciplines. However, it also quickly became clear that many researchers and support staff had quite justified reservations about how effectively they could replace traditional paper notebooks. In addition to the ENs which were the subject of presentations, a number of other solutions were also discussed, including LabGuru, OneNote, SharePoint, and Wikis.

It appears that across the University there is a very wide range of platforms being used, and not all of them are intended to serve the function of an EN. This is unsurprising as different disciplines have different requirements and an EN designed for the biological sciences, such as Benchling, is unlikely to meet the needs of a researcher in veterinary medicine or humanities. There is also a huge element of personal preference involved: some researchers want a simple system that will work straight out of the box, while others want something more customisable and with greater functionality for an entire lab to use in tandem.

So, within this complex and varied landscape are there any general lessons we can learn? The answer is “Yes” because regardless of platform or discipline there are a number of common functions an EN has to serve, and a number of hurdles they will have to overcome to replace traditional paper lab books.

Firstly, let’s look at common functional requirements:

  1. Entries in ENs must be trustworthy: anyone using one has to be confident that once an entry is made it cannot be accidentally deleted or altered. All updates or changes must be clearly recorded and timestamped to provide a complete and accurate record of the research conducted and the data collected. This is fundamental to research integrity and to their acceptance by funders or regulators as a suitable replacement for the traditional, co-signed lab books.
  2. They must make sharing within groups and between collaborators easier – it is, in theory, far easier to share the contents of an EN with interested parties whether they are in the same lab or in another country. But in doing so they must not make the contents inappropriately available to others; security is also very important.
  3. Integration is the next requirement. Any EN should be able to integrate smoothly with the other software packages that a researcher uses on a regular basis, as well as with external (or University central) storage, data repositories, and other relevant systems. If it doesn’t do this then researchers may lose the benefits of being able to record, view, and analyse all of their data in one place, and the time savings from being able to directly deposit data into a suitable repository when a project ends or a publication is coming out.
  4. Portability is also required. It must be possible for a researcher to move from one EN platform to another if, for example, they change institutions. This means they need to be able to extract all of their entries and data in a format that can be understood by another system and which will still allow analysis. Most ENs support PDF exports, which are fine for some purposes but of no use if processing or analysis is desired.
  5. Finally, all ENs need to be stable and reliable. This is a particular issue with web-based ENs, which require an internet connection to access and use. This is also an area where the University will have to play a significant role in providing long-term and reliable support for selected ENs. They also need the same longevity as a paper notebook: the records they contain must not disappear if an individual leaves a group, or a group moves to another EN platform.
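
To make requirements 1 and 4 more concrete, here is a minimal sketch of one way an EN could keep timestamped, tamper-evident entries and export them in a structured format. It is an illustration only, not how any of the platforms discussed at the workshops actually work: the `NotebookLog` class, its method names, and the hash-chain approach are all assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timezone


class NotebookLog:
    """Hypothetical append-only notebook: entries are added, never edited in place."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(record):
        # Hash every field except the hash itself, using a stable key order.
        payload = {key: value for key, value in record.items() if key != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, author, text, timestamp=None):
        # Each record stores the hash of the previous record, forming a chain.
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        record = {
            "author": author,
            "text": text,
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        record["hash"] = self._digest(record)
        self._entries.append(record)
        return record

    def verify(self):
        # Recompute the chain: altering an entry, or removing one from the
        # middle, makes a stored hash disagree with the recomputed one.
        prev_hash = ""
        for record in self._entries:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != self._digest(record):
                return False
            prev_hash = record["hash"]
        return True

    def export_json(self):
        # Structured export that another system could parse, unlike a PDF.
        return json.dumps(self._entries, indent=2)


log = NotebookLog()
log.append("researcher", "Prepared samples for run 1")
log.append("researcher", "Correction: run 1 used the second batch of samples")
print(log.verify())
print(log.export_json())
```

In this sketch a correction is recorded as a new entry rather than an edit, so the original entry and the time of the change both remain visible. A real system would also need user authentication and access controls, which are out of scope here.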

Secondly, barriers to adoption and support required:

  1. Hardware:
    1. Many research environments are not suitable for digital devices; phones and tablets are banned from some “wet” labs on health and safety grounds. If they are allowed in the lab they may not be allowed out again, so space for storage and charging will need to be found. What happens if they get contaminated?
    2. Field-based research may not have reliable internet access, so web-based platforms wouldn’t work.
    3. There is unlikely to be space in most labs for desktop computers.
    4. All of this means there will still be a need for paper-based notes in labs with later transfer to the EN, which will result in duplication of effort.
  2. Cost:
    1. tablets and similar are not always an allowable research expense for a grant, so who will fund this?
    2. if the University does not have an enterprise licence for the EN a group uses, they will also need to find the funds for this
    3. additional training and support may also be required
  3. Support:
    1. technical support for University-adopted systems will need to be provided
    2. ISG staff will need to be clear on what is available to researchers and able to provide advice on suitable platforms for different needs
    3. clear incentives for moving to an EN need to be communicated to staff at all levels
    4. funders, publishers, and regulatory bodies will also need to be clear that ENs are acceptable for their purposes

So, what next? The Research Data Support service will now take all of this feedback and use it to inform our future Electronic Notebook strategy for the University. We will work with other areas of Information Services, the Colleges, and Schools to try to provide researchers in all disciplines with the information they need to use ENs in ways that make their research more efficient and effective. If you have any suggestions, comments, or questions about ENs please visit our ENs page (https://www.ed.ac.uk/information-services/research-support/research-data-service/during/eln). You can also contact us on data-support@ed.ac.uk.

The notes that were taken during both events can be read here: Combined_discussion_notes_V1.2

Some presentations from the two workshops are available below, others will be added when they become available:

| Speaker(s) | Topic | Link |
| --- | --- | --- |
| Mary Donaldson (Service Coordinator, Research Data Management Service, University of Glasgow) | Jisc Research Notebooks Study | Mary_Donaldson_ELN_Jisc |
| Ralitsa Madsen (Postdoctoral Research Fellow, Centre for Cardiovascular Science) | RSpace | 2019-03-14_ELN_RSpace_RRM |
| Uriel Urquiza Garcia (Postdoctoral Research Associate, Institute of Molecular Plant Science) | Benchling | |
| Yixi Chen (PhD Student, Kunath Group, Institute for Stem Cell Research) | Lab Archives | 20190509_LabArchives_Yixi_no_videos |
| Andrew Millar (Chair of Systems Biology) | WikiBench | |
| Ugur Ozdemir (Lecturer – Quantitative Political Science or Quantitative IR) | Jupyter Notebooks | WS_Talk |
| James Slack & Núria Ruiz (Digital Learning Applications and Media) | Jupyter Notebooks for Research | Jupyter_Noteable_Research_Presentation |

Kerry Miller, Research Data Support Officer, Research Data Service

DataVault is now live

After extended development, the Research Data Service’s DataVault system is now operational, adding value to research data for principal investigators and their funders alike by offering a long-term retention solution for important datasets.

DataVault is a companion service to DataShare, the institutional digital repository for researchers to openly license and share datasets and related outputs via the Web. DataVault comprises an online interface connected to the university’s data centre infrastructure and cloud storage.

Each research project can store data in a single vault made up of any number of deposits. DataVault is currently able to accept individual deposits (groups of files) of up to 2 TB each; this will increase over time as project development continues.

DataVault sprint meeting before launch

Immutable

DataVault is designed for long-term retention of research data, to meet funder requirements and ensure future access to high value datasets. It meets digital preservation requirements by storing three copies in different locations (two on tape, one in the cloud) with integrity checking built-in, so that the data owner can retrieve their data with confidence until the end of the retention period (typically ten years).
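
As a rough illustration of what integrity checking across several copies can involve, the sketch below compares the SHA-256 checksum of each copy against a checksum recorded at deposit time. This is not a description of DataVault's actual implementation: the function names, the choice of SHA-256, and the idea of copies being readable as local files are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large deposits do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_digest, copies):
    """Map each copy's path to True if its checksum matches the recorded one."""
    return {str(Path(path)): sha256_of(path) == expected_digest for path in copies}
```

Under these assumptions, a copy that fails the check could be replaced from one of the copies that still matches, which is why holding more than one copy matters for long-term retention.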

Secure

The DataVault interface helps to guide users in how to deposit personal and sensitive data, using anonymisation or pseudonymisation techniques whenever possible, as prescribed by the University’s Data Protection Officer (DPO). Because all data are encrypted before deposit, they are protected from unauthorised disclosure. Only the data owner or their nominated delegate is allowed to retrieve data during the retention period. Any decisions about allowing access to others are made by the data owner and are conducted outside the DataVault system, once they have been retrieved onto a private area on DataStore and decrypted.

Discoverable

Although DataVault offers a form of closed archive, the design encourages good research data management practice by requiring a metadata record for each vault in Pure. These records are discoverable on the Web, and linked to the respective data creators, projects and publications.

In exchange for creating this high level public metadata record, the Principal Investigator benefits from the assignment of a unique digital object identifier (DOI) which can be used to cite the data in publications.

The open nature of the metadata means that any reader may make a request to access the dataset. The data owner decides who may have access and under what conditions. Advice can be provided by the Research Data Support team and the DPO.

University data assets

DataVault’s workflow takes into account the possibility/likelihood that the original data owner will have left the university when the period of retention comes to an end. Each vault will be reviewed by representatives of the university in schools, colleges or the Library, acting as the data owner, to make decisions on disposal or further retention and curation. If kept, the vault contents become university data assets.

Plan ahead for data archiving

The Research Data Support team encourages researchers to plan ahead for data archiving, right from the earliest conception stages of the project, so that appropriate costs are included in bids and the appropriate steps can be carried out to prepare data for either open or closed long-term archiving.

The team can be contacted through the IS Helpline and offers assistance with writing data management plans and making archival decisions. See our service website and contact information at https://www.ed.ac.uk/is/research-data-service or go straight to the DataVault page to learn more about it, get instructions for use, or look up charges. An introductory demo video is available at https://media.ed.ac.uk/media/Getting+started+with+the+DataVault/1_h4r4glf7.

Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is that of reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions which can be relied upon to actually help humans. Other challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.

A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring/processing inconsistent file formats between health boards, causing me to wonder if it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else? Emily earlier noted a confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague Digital Research Services Lead Facilitator Lisa Otty who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance were iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data and keeping it useful. A question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? / is it useful? In many cases, ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction between the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings, which compared with transcription and phonetic renderings are much closer to the data source (what Kant would call the thing-in-itself (das Ding an sich) as opposed to the phenomenon, the thing as it appears to an observer), is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. By providing access to our workings and the evidence behind publications, and by joining these things up (as Ewan McAndrew described, linked data is key; this is the fifth star in the aforementioned 5-star Open Data scheme), Open Science, and all its various constituent parts, support this goal, which is after all one of the goals of research and of scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – Open Data and Open Science can be transformative. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day will be published on MediaHopper in the coming weeks.

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh

“Archiving Your Data” – new videos from the Research Data Service

In three new videos released today, researchers from the University of Edinburgh talk about why and how they archive their research data, and the ways in which they make their data openly available using the support, tools and resources provided by the University’s Research Data Service.

Professor Richard Baldock from the MRC Human Genetics Unit explains how he’s been able to preserve important research data relating to developmental biology – and make it available for the long term using Edinburgh DataShare – in a way that was not possible by other means owing to the large amount of histology data produced.

Dr Marc Metzger from the School of GeoSciences tells how he saves himself time by making his climate mapping research data openly available so that others can download it for themselves, rather than him having to send out copies in response to requests. This approach represents best practice – making the data openly available is also more convenient for users, removing a potential barrier to the re-use of the data.

Professor Miles Glendinning from Edinburgh College of Art talks about how his architectural photographs of social housing are becoming more discoverable as a result of being shared on Edinburgh DataShare. And Robin Rice, the University’s Data Librarian, discusses the difference between the open (DataShare) and restricted (DataVault) archiving options provided by the Research Data Service.

For more details about Edinburgh’s Research Data Service, including the DataShare and DataVault systems, see:

https://www.ed.ac.uk/is/research-data-service

Pauline Ward
Research Data Service Assistant
Library and University Collections
University of Edinburgh
