University of Edinburgh Data Safe Haven: a new facility for sensitive data

Information Services has implemented a remote-access “Safe Haven” environment to protect data confidentiality, satisfy concerns about data loss and reassure Data Controllers about the University’s secure management and processing of their data in compliance with Data Protection Legislation.

The Data Safe Haven (DSH) provides secure storage and a secure analytic environment suitable for research projects working with many kinds of sensitive data. It has its own firewall, is isolated from the University network, and is housed in a secure facility with controlled access. All traffic between the DSH and the user’s computer is encrypted, and no internet access is available from within the environment. Access is restricted to authorised users via an assigned YubiKey and the secure VMware Horizon Client, and is only possible from managed desktops that have been whitelisted for DSH access.

A range of analytic and supporting applications (e.g., SPSS, STATA, SAS, MATLAB, and R) is available. These are delivered dynamically and assigned to the project. The applications available to users depend on the arrangement made with the DSH technical team before project registration and on the licensing arrangements with the software provider.

The DSH’s initial security review (penetration test) was carried out by a CREST-accredited organisation in August 2018. The DSH demonstrated a good overall security posture and resilience against the various types of tests performed by the consultants. This initial review formed part of our ongoing drive towards ISO 27001 certification; we expect to complete this phase of the project and obtain the certificate by November 2019.

We successfully closed the pilot phase of the DSH, with five projects, in October 2018, and soft-launched the service at our “Dealing with Data” conference in November 2018. The DSH technical team is currently migrating the Centre for Clinical Brain Sciences’ National CJD Research and Surveillance project data from the walled garden into the DSH.

The DSH operates on a cost-recovery basis, and this cost should be included in grant applications. We welcome enquiries from researchers as early as possible in their project planning. Costing is based on bespoke project requirements (see the DSH Overview for users at https://www.ed.ac.uk/is/data-safe-haven).

The DSH Operations team also provides:

  • advice and input for funding and permissions applications;
  • guidance on meeting Approved Researcher requirements;
  • advice about meeting data sharing requirements and archiving of data.

We can set up a demo environment for researchers on request to explore the use of the DSH for their projects. If you need further information, please contact the RDS Team via data-support@ed.ac.uk.

Cuna Ekmekcioglu, Data Safe Haven Manager, Research Data Service

Research Data Workshops: Electronic Notebooks Summary of Feedback

In the spring of this year (March and May) the Research Data Service ran two workshops on Electronic Notebooks (ENs), at which researchers from all three colleges were invited to share their experiences of using them. Presentations and demos were given on RSpace, Benchling, Jupyter Notebooks, WikiBench, and Lab Archives. Almost 70 research and support staff attended and took part in the discussions.

This post is a distillation of those discussions, which we will use to inform our plans for Electronic Notebooks over the coming year. The level of attendance and engagement made it obvious that there is considerable enthusiasm for adopting ENs across a variety of schools and disciplines. However, it also quickly became clear that many researchers and support staff have justified reservations about how effectively ENs can replace traditional paper notebooks. In addition to the ENs that were the subject of presentations, a number of other solutions were discussed, including LabGuru, OneNote, SharePoint, and wikis.

It appears that a very wide range of platforms is in use across the University, and not all of them are intended to serve the function of an EN. This is unsurprising: different disciplines have different requirements, and an EN designed for the biological sciences, such as Benchling, is unlikely to meet the needs of a researcher in veterinary medicine or the humanities. There is also a large element of personal preference: some researchers want a simple system that works straight out of the box, while others want something more customisable, with greater functionality, for an entire lab to use in tandem.

So, within this complex and varied landscape, are there any general lessons we can learn? The answer is “yes”: regardless of platform or discipline, there are a number of common functions an EN has to serve, and a number of hurdles it must overcome to replace traditional paper lab books.

Firstly, let’s look at common functional requirements:

  1. Entries in ENs must be trustworthy: anyone using one has to be confident that once an entry is made it cannot be accidentally deleted or altered. All updates or changes must be clearly recorded and timestamped to provide a complete and accurate record of the research conducted and the data collected. This is fundamental to research integrity, and to ENs’ acceptance by funders and regulators as a suitable replacement for traditional, co-signed lab books.
  2. They must make sharing within groups and between collaborators easier – it is, in theory, far easier to share the contents of an EN with interested parties, whether they are in the same lab or in another country. But in doing so, ENs must not make the contents inappropriately available to others; security is equally important.
  3. Integration is the next requirement: any EN should integrate smoothly with the other software packages a researcher uses on a regular basis, as well as with external (or central University) storage, data repositories, and other relevant systems. Without this, researchers lose the benefit of being able to record, view, and analyse all of their data in one place, and the time savings of depositing data directly into a suitable repository when a project ends or a publication comes out.
  4. Portability is also required: it must be possible for a researcher to move from one EN platform to another if, for example, they change institutions. This means they need to be able to extract all of their entries and data in a format that another system can understand and that still allows analysis. Most ENs support PDF export, which is fine for some purposes but of no use if further processing or analysis is needed.
  5. Finally, all ENs need to be stable and reliable. This is a particular issue with web-based ENs, which require an internet connection to access and use. It is also an area where the University will have to play a significant role in providing long-term, reliable support for selected ENs. ENs also need the same longevity as a paper notebook: the records they contain must not disappear if an individual leaves a group, or if a group moves to another EN platform.
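Requirement 1 above – tamper-evident, timestamped entries – is commonly implemented as an append-only, hash-chained log: each entry records a hash of its predecessor, so any later edit breaks the chain and is detectable. The following Python sketch illustrates the idea only; it is hypothetical and not the implementation of any particular EN.

```python
import hashlib
import json
from datetime import datetime, timezone

class NotebookLog:
    """Append-only log: each entry hashes its predecessor, so any later
    edit or deletion breaks the chain and is detectable."""

    def __init__(self):
        self.entries = []

    def append(self, author, text):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {
            "author": author,
            "text": text,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        # Hash the entry contents plus the previous hash to form the chain.
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; return False if any entry was altered."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

log = NotebookLog()
log.append("researcher-a", "Prepared buffer solution, pH 7.4")
log.append("researcher-a", "Ran assay; raw data in run_01.csv")
print(log.verify())                # True: chain intact
log.entries[0]["text"] = "edited"  # simulate tampering
print(log.verify())                # False: tampering detected
```

A real EN would add signatures and server-side timestamps on top of this, but the chained hashes are what make silent alteration of past entries detectable.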

Secondly, barriers to adoption and support required:

  1. Hardware:
    1. Many research environments are not suitable for digital devices; phones and tablets are banned from some “wet” labs on health and safety grounds. If they are allowed into the lab, they may not be allowed out again, so space for storage and charging will need to be found. And what happens if they become contaminated?
    2. Field-based research may not have reliable internet access, so web-based platforms would not work.
    3. There is unlikely to be space in most labs for desktop computers.
    4. All of this means there will still be a need for paper-based notes in labs, with later transfer to the EN, resulting in duplication of effort.
  2. Cost:
    1. tablets and similar devices are not always an allowable research expense on a grant, so who will fund them?
    2. if the University does not have an enterprise licence for the EN a group uses, the group will also need to find funds for this;
    3. additional training and support may also be required.
  3. Support:
    1. technical support for University-adopted systems will need to be provided;
    2. ISG staff will need to be clear on what is available to researchers and able to advise on suitable platforms for different needs;
    3. clear incentives for moving to an EN need to be communicated to staff at all levels;
    4. funders, publishers, and regulatory bodies will also need to be clear that ENs are acceptable for their purposes.

So, what next? The Research Data Support service will now take all of this feedback and use it to inform our future Electronic Notebook strategy for the University. We will work with other areas of Information Services, the Colleges, and Schools to try to provide researchers in all disciplines with the information they need to use ENs in ways that make their research more efficient and effective. If you have any suggestions, comments, or questions about ENs please visit our ENs page (https://www.ed.ac.uk/information-services/research-support/research-data-service/during/eln). You can also contact us on data-support@ed.ac.uk.

The notes taken during both events can be read here: Combined_discussion_notes_V1.2

Some presentations from the two workshops are available below, others will be added when they become available:

  • Mary Donaldson (Service Coordinator, Research Data Management Service, University of Glasgow) – Jisc Research Notebooks Study – Mary_Donaldson_ELN_Jisc
  • Ralitsa Madsen (Postdoctoral Research Fellow, Centre for Cardiovascular Science) – RSpace – 2019-03-14_ELN_RSpace_RRM
  • Uriel Urquiza Garcia (Postdoctoral Research Associate, Institute of Molecular Plant Science) – Benchling
  • Yixi Chen (PhD Student, Kunath Group, Institute for Stem Cell Research) – Lab Archives – 20190509_LabArchives_Yixi_no_videos
  • Andrew Millar (Chair of Systems Biology) – WikiBench
  • Ugur Ozdemir (Lecturer, Quantitative Political Science or Quantitative IR) – Jupyter Notebooks – WS_Talk
  • James Slack & Núria Ruiz (Digital Learning Applications and Media) – Jupyter Notebooks for Research – Jupyter_Noteable_Research_Presentation

Kerry Miller, Research Data Support Officer, Research Data Service

New training: Assessing Data Quality and Disclosure Risk in Numeric Data

The Research Data Service, in collaboration with the UK Data Service, is running workshops on the theme ‘Assessing Data Quality and Disclosure Risk in Numeric Data’. These hands-on sessions introduce the key elements of data quality and disclosure risk, and include practical demonstrations of two tools for evaluating the quality (QAMyData) and disclosure risk (sdcMicro) of numeric research data.

Workshops will run across two days, with sessions on different days for researchers interested in social survey data (10th June) and health data (11th June).

Session 1: Assessing Data Quality in Numeric Data

This workshop will introduce the key elements of data quality assessment, including file checks, and undertaking data and metadata checks. Attendees will gain hands-on experience using QAMyData, a purpose-built configurable tool to quickly and automatically detect some of the most common problems in survey and other numeric data (SPSS, STATA, SAS & csv files).
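The kinds of checks QAMyData automates – duplicate identifiers, missing values, out-of-range values and leftover sentinel codes – can be illustrated with a small hand-rolled sketch. This is not QAMyData itself (which is a standalone configurable tool), and the field names, ranges, and example records below are invented for illustration.

```python
def quality_report(rows, id_field, ranges):
    """Flag common survey-data problems.

    rows:     list of dicts, one per record
    id_field: name of the unique-identifier field
    ranges:   {field: (low, high)} valid value ranges
    Returns a list of (row_number, field, problem) tuples.
    """
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # File/record check: duplicate identifiers
        rid = row.get(id_field)
        if rid in seen_ids:
            issues.append((i, id_field, "duplicate id"))
        seen_ids.add(rid)
        for field, (low, high) in ranges.items():
            value = row.get(field)
            # Data check: missing values
            if value is None or value == "":
                issues.append((i, field, "missing value"))
            # Data check: out-of-range values (e.g. a 999 sentinel left in)
            elif not low <= value <= high:
                issues.append((i, field, "out of range"))
    return issues

survey = [
    {"id": 1, "age": 34, "income": 28000},
    {"id": 2, "age": 999, "income": 31000},  # sentinel code left in
    {"id": 2, "age": 51, "income": None},    # duplicate id, missing value
]
checks = {"age": (0, 120), "income": (0, 10**7)}
for row_no, field, problem in quality_report(survey, "id", checks):
    print(row_no, field, problem)
```

QAMyData applies checks like these (and many more, such as metadata and label checks) directly to SPSS, STATA, SAS, and CSV files from a configuration file, rather than requiring any code.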

Session 2: Assessing Disclosure Risk in Numeric Data

This workshop will provide an introduction to statistical disclosure control (SDC), covering: types of Identifiers; de-identification and anonymization; types of disclosure; SDC approaches; k-anonymity and l-diversity. The workshop introduces sdcMicro, a practical R package for measuring disclosure risk in numeric data. The session will give attendees hands-on experience using sdcMicro to assess disclosure risk and apply SDC methods to anonymize numeric data, while evaluating the balance between disclosure risk and data loss.
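sdcMicro itself is an R package, but the k-anonymity measure it reports is easy to illustrate in a few lines of Python. The records and field names below are invented: a dataset is k-anonymous when every combination of quasi-identifier values (attributes an intruder might already know) is shared by at least k records.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest group size over all quasi-identifier
    combinations; the dataset is k-anonymous for that k."""
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(groups.values())

patients = [
    {"age_band": "30-39", "postcode": "EH8", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "EH8", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "EH9", "diagnosis": "A"},
]
# 'age_band' and 'postcode' are the quasi-identifiers here
print(k_anonymity(patients, ["age_band", "postcode"]))  # 1: the third record is unique
```

A k of 1 means at least one record is uniquely identifiable from the quasi-identifiers alone; SDC methods such as recoding or suppression raise k, at the cost of some data loss, which is exactly the trade-off the workshop examines.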

These sessions are available to research staff and students and can be booked using the links below:

Assessing Data Quality in Numeric Data (Social Survey Data) – 10th June, 0930–1200, Lister Learning and Teaching Centre, Room 1.16 (Central Area): https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34939

Assessing Disclosure Risk in Numeric Data (Social Survey Data) – 10th June, 1330–1700, Lister Learning and Teaching Centre, Room 1.16 (Central Area): https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34941

Assessing Data Quality in Numeric Data (Health Data) – 11th June, 0930–1230, Microlab 1, Chancellor’s Building (Little France): https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34940

Assessing Disclosure Risk in Numeric Data (Health Data) – 11th June, 1300–1700, Microlab 1, Chancellor’s Building (Little France): https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=34942

Bob Sanders
Research Data Support
Library & University Collections

FAIR dues to the Research Data Alliance

It has been a while since we blogged about the Research Data Alliance (RDA), an organisation that has come into its own since its beginnings in 2013. Its interest groups and working groups, which meet at plenary events held every six months, offer a reliable view of the international state of the art across a range of data-related topics. That is why I attended the 13th RDA Plenary in Philadelphia earlier this month, and I was not disappointed.

I arrived Monday morning in time for the second day of a pre-conference event on FAIR and Responsible Research Data Management, sponsored by CODATA and held at Drexel University. FAIR is a concept popular amongst research funders for illustrating data management done right: by the time you complete your research project (or shortly after), your data should be Findable, Accessible, Interoperable, and Reusable.

Fair enough, but we data repository providers also want to know how to build the ecosystems that will make it super-easy for researchers to make their data FAIR, so we need to talk to each other to compare notes and decide exactly what each letter means in practice.

[Image: FAIR principles diagram, borrowed from OpenAIRE]

Amongst the highlights were some tools and resources for researchers or data providers mentioned by various speakers.

  • The Australian Research Data Commons (ARDC) has created a FAIR self-assessment tool.
  • For those who like stories, the Danish National Archives have created a FAIRytale to help understand the FAIR principles.
  • ARDC with Library Carpentry conducted a sprint that led to a disciplinary smorgasbord called Top Ten Data and Software Things.
  • DataCite offers a Repository Finder tool, through its partnership with re3data.org, to find the most appropriate repository in which to deposit your data.
  • Resources for “implementation networks” from the EU-funded project GO FAIR, including training materials under the rubric of GO TRAIN.
  • The Geo-science focused Enabling FAIR Data Project is signing up publishers and repositories to commitment statements, and has a user-friendly FAQ explaining why researchers should care and what they can do.
  • A brand-new EU-funded project, FAIRsFAIR (Fostering FAIR Data Practice in Europe), is taking things to the next level, building new networks to certify learners and trainers, researchers and repositories in FAIRdom.

That last project’s ambitions are described in this blog post by Joy Davidson at DCC. Another good blog post I found about the FAIR pre-conference event is by Rebecca Springer at Ithaka S+R. If I get a chance I’ll add another brief post for the main conference.

Robin Rice
Data Librarian & Head of Research Data Support
Library & University Collections
