Training researchers for a software and data-intensive world with Edinburgh Carpentries

This is a guest post from Giacomo Peru and the EdCarp Committee (https://edcarp.github.io/committee/). Sections of this post were published previously on the EPCC blog.

EdCarpLogo

The Edinburgh Carpentries (EdCarp) is a training initiative that offers the Carpentries computing and data skills curriculum in Edinburgh. The workshops train researchers in the fundamental skills needed for conducting efficient, open, and reproducible research. The EdCarp team comprises staff and student volunteers from across disciplines, academic units, and career stages.

Since 2018, EdCarp has organised 25 workshops across the academic institution, training over 300 staff and students in data cleaning, manipulation, visualisation and version control methods using tools such as R, Python, Unix shell, Git, SQL and OpenRefine. Courses are free to participants and become oversubscribed very quickly. We are now rolling out our 2020 schedule and announcing workshops.

EdCarp are working to establish collaborations with other organisations, external and internal to the university: the Scottish Funding Council, the Institute for Academic Development and the Data Driven Innovation programme.

EdCarp can work with your academic unit or doctoral training program to help promote the fundamental data skills that your colleagues need.

A crucial aspect of EdCarp and its training model is the participation and voluntary commitment of the community, where trainees go on to become helpers, helpers go on to become instructors, and so on. EdCarp are always looking for new people willing to help, in any capacity; please sign up here if you would like to be kept updated and/or get involved: https://eepurl.com/gl4MsX.

 

Research Data Workshops: DataVault Summary

Having soft-launched the DataVault facility in early 2019, the Research Data Support team, with the support of the project board, held five workshops in different colleges and locations to find out what the user community thought about it. This post summarises what we learned from participants, who were made up roughly equally of researchers (mainly staff) and support professionals (mainly computing officers based in the Schools and Colleges).

Each workshop began with presentations and a demonstration by Research Data Service staff, explaining the rationale of the DataVault, what it should and should not be used for, how it works, how the University will handle long-term management of data assets deposited in the DataVault, and practicalities such as how to recover costs through grant proposals or get assistance to deposit.

After a networking lunch we held discussion groups, covering topics such as prioritisation of features and functionality, roles such as the university as data asset owner, and the nature of the costs (price).

The team was relieved to learn that the majority (albeit from a somewhat self-selecting sample) agreed that the service fulfilled a real need; some data does need to be kept securely for a named period to comply with research funders’ rules, and participants welcomed a centralised platform to do this. The levels of usability and functionality we have managed to reach so far were met with somewhat less approval: clearly the development team has more work to do, and we are glad to have won further funding from the Digital Research Services programme in 2019-2020 in order to do it.

Attitudes toward university ownership of data assets were also mixed; some participants were sceptical and wondered if researchers would participate in such a scheme, but others found it a realistic option for dealing with staff turnover and the inevitability of data outlasting data owners. Attitudes toward cost were largely accepting (the DataVault provides a cheaper alternative to our baseline DataStore disk storage), but concerns about the safekeeping of legacy and unfunded research data were raised at each workshop.

A sample of points raised follows:

  • Utility? “Everyone I know has everything on OneDrive.”
  • Regarding prioritisation of features – security first; file integrity first; depositing data from sources other than DataStore; facilitating larger deposit sizes; ease of use.
  • Quickness of deposit and retrieval? Quick deposit was deemed more important than quick retrieval.
  • University as data asset owner?
    • Under GDPR the data are already university assets (because the Uni is the data controller).
    • People who manage the data should be close to the research; IT people can manage users but shouldn’t be making decisions about data. Danger that because it’s related to IT it gets dumped on IT officers. The formal review process helps to ensure decisions will be made properly. Include flexibility into the review hierarchy to allow for variation in school infrastructure.
    • When I heard that I was – not shocked – but concerned. If I move to another university how do I get access? This might be a problem. Researchers might prefer to retain three copies themselves.
  • Is the cost recovery mechanism valid?
    • Vault costs are legitimate costs.
    • Ideally should come from grant overheads, until then need to charge.
    • Possible to charge for small/medium/large project at start rather than per TB?
  • Is the 100 GB threshold sufficient for unfunded research? How else could unfunded or legacy data be covered (who pays)?
    • Alumni sponsor a dataset scheme?
    • There will be people with a ‘whole bunch of data somewhere’ that would be more appropriately stored in DataVault.

The team is grateful to all of the workshop participants for their time and thoughts; the report will be considered further by the project board and the Research Data Service Steering Group members. The full set of workshop notes is colour-coded to show comments from different venues and is available to read on the RDM wiki, for anyone with a University log-in (EASE).


Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections

Research Data Workshops: Sensitive Data Challenges and Solutions

This workshop at the Bioquarter was attended by 27 research staff representing all three colleges, with a majority from Medicine and Veterinary Medicine. It began with an introductory presentation from Robin Rice covering the new Data Safe Haven facility of the Research Data Service and was followed by brief presentations from Lynne Forrest (Research Support Officer on Scottish Longitudinal Study); Fiona Strachan (Clinical Research Manager, Centre for Cardiovascular Science); and Jonathan Crook (Professor of Business Economics). Each speaker shared their experiences of both conducting research using sensitive data and supporting other researchers. Although they work with very different types of data, it was easy to identify certain common requirements:

  • Easy access to secure data storage and analysis platforms;
  • Consistent & comprehensive training and guidance on working with sensitive data;
  • Support to meet the necessary requirements to gain access to the data they need.

In the discussion groups that followed, participants were asked about their experiences working with sensitive data, the requirements researchers needed services such as data safe havens to fulfil, and ramifications of the cost recovery model, with regard to including costs in grant proposals.

The major themes that emerged were training, data governance, and meeting the costs of protecting sensitive data. There was a strong feeling that more and better training was required for all those working with sensitive data. There was also confusion about the number, location, and criteria of different Data Safe Havens now available, and no single place to find clear information on these.

When talking specifically about the Data Safe Haven offered by IS for UoE researchers, the biggest concern was around cost. The standard price was considered high for the majority of grants, which are either small or need to be highly competitive. In some disciplines grant funding is not common, so it is unclear how the costs would be met. The Research Data Service representatives encouraged people to get a bespoke quote and discuss requirements with the team as early as possible, as flexibility on both cost and build specifications (e.g. high performance computing) is built in.

Some specific points arising from the discussions were:

  • One negative experience about working with sensitive data is the length of time needed to get data approvals (e.g. from NHS bodies). Participants wondered if the University could help to speed those up.
  • More training was desired in sensitive data management and better ways to structure training for students.
  • Learning outcomes need to focus on change of behaviour, with a focus on local procedures.
  • One participant felt that schools need a researcher portfolio system, some way of keeping track of who has what data. A suggestion was made to have an asset manager in the university, similar to the one in NHS.
  • Less than optimal security practices can be observed, such as leaving a clinical notebook in a coffee room. More training is needed, but this is not fully covered in either clinical practice courses or ethics courses.
  • There were concerns around data governance – how to set up gatekeepers for research projects using Data Safe Haven, how long to store things in the DataVault. ACCORD was pointed to for having good structure in data governance.
  • Long-running projects (e.g. ten years) would have trouble meeting the annual costs.
  • Projects are invested in locally run services and expertise; added-value centralised services need to be low-cost.

Overall, researchers were in favour of having a Data Safe Haven available for projects that need it, but they would also like support to correctly anonymise and manage their data so that they could continue to use standard data storage and analysis platforms. This would mean that only those with the most sensitive data would need to rely upon the UoE DSH to conduct their research.

Those with a University log-in may read the full set of notes on the RDM wiki.

Kerry Miller
Research Data Support Officer
Library & University Collections

RDM Training for Undergraduate Students

The Research Data Service at the University of Edinburgh provides research data support and training for staff and postgraduate students. Over the last year, however, through an Innovation Grant, we have branched out to produce training materials to support our undergraduate students as well.

The result is a new handbook called ‘Data Mindfulness: Making the most of your dissertation’, along with a set of face-to-face workshops that we have delivered during the spring semester and will be delivering again this autumn. The idea behind this handbook and the workshops is to take UG students through all the stages of their dissertation journey: from choosing their question to dealing with literature and data to preserving their data after submission.

Unlike existing material for postgraduates and researchers, this handbook has been written by one of our PhD interns from the perspective of a student; and it places data management tips within the broader experience of conducting a UG dissertation. We believe this student perspective is what makes this handbook unique and particularly innovative.

Download the ‘Data Mindfulness: Making the most of your dissertation’ handbook for your own use or to customise for your own UG students.

Candela Sanchez-Rodilla Espeso
UG Research Data Management Skills Co-ordinator
