Highlights from the RDM Programme Progress Report: November 2015 – January 2016

Data Seal of Approval have awarded DataShare Trusted Repository status; their assessment of our service can be read at https://assessment.datasealofapproval.org/assessment_175/seal/html/. In addition, a major new release of DataShare was completed in November; this makes the code openly available on GitHub and brings general improvements to the look and feel of the website.

The ‘interim’ DataVault is now in final testing and will be rolled out on a request basis to those researchers who can demonstrate an urgent need to use the service now rather than waiting until the final version is ready later this year. The phase three funding for development of the DataVault has been received from Jisc; it runs from March to August, so the final version should be ready for launch sometime after this. The project was presented at the International Digital Curation Conference in February 2016.

Over the three-month period a total of 328 staff and postgraduate researchers attended a Research Data Management (RDM) course or workshop.

Work on the MANTRA MOOC (Massive Open Online Course) was expected to be finalised in February and launched on 1st March, at the following URL: https://www.coursera.org/learn/data-management.

The University of Edinburgh wrote the Working with Data section (one of the course’s five weeks) and, with the help of the Learning, Teaching and Web division of Information Services, completed two video interviews with researchers and a ‘vox pop’ video clip of clinical researchers at the EQUATOR conference in Edinburgh in autumn 2015. The content is open source and videos can be added to our YouTube channel to help with promotion. There will be some income from this, based on certificates of completion priced at $49 or £33, but our portion will be smaller than that of our partner, the University of North Carolina.

The need to create a dataset record in PURE for each dataset published, or referenced in a publication, is now being emphasised in all Research Data Service communications, formal and informal, and to staff at all levels. Uptake is understandably low at this point, but we hope to see a steady increase as researchers and support staff begin to see the benefits of adding datasets to their research profile. In the case of DataShare records, a draft mapping of fields between DataShare and PURE has been produced as the start of a plan for migrating records from DataShare to PURE.

By the end of January 2016, 69 records had been created and published on Edinburgh Research Explorer.

Four interns have been employed using funding from Jisc as part of the UK Research Data Discovery Service (UKRDDS) project, which aims to create a national aggregate register of data sets. A trial site is available at: http://ckan.data.alpha.jisc.ac.uk/. The UKRDDS interns will help to create PURE records and upload open data into DataShare, and raise awareness of RDM generally within their schools. There are currently three PhD interns in place, in LLC, SOS and Roslin; two more, in LLC and DIPM, will start in February. The approach each intern takes will depend on the nature and structure of their school and will, in some cases, be mediated by research administrators.

An innovation fund grant has been received to fund the delivery of an exhibition, “Pioneering Research Data”. Each college will be represented by a PhD intern; recruitment has already begun and the interns should be in post by the end of March. The exhibition is due to be delivered in November of this year.

National and International Engagement Activities

Robin Rice led a panel at the IPRES conference, Chapel Hill, North Carolina, on 3rd November called ‘“Good, better, best”? Examining the range and rationales of institutional data curation practices’.

Robin Rice had a proposal accepted for the forthcoming Force11 (2016) conference, on Overcoming Obstacles to Sharing Data about Human Subjects, building on the training course we are delivering, Working with Personal and Sensitive Data.

Fostering open science in social science

On 10th June, the Data Library team ran two workshops in association with the EU Horizon 2020 project, FOSTER (Facilitate Open Science Training for European Research), and the Scottish Graduate School of Social Science.

The aim of the morning workshop, “Good practice in data management & data sharing with social research,” was to provide new entrants into the Scottish Graduate School of Social Science with a grounding in research data management using our online interactive training resource MANTRA, which covers good practice in data management and issues associated with data sharing.

The morning started with a brief presentation by Robin Rice on ‘open science’ and its meaning for the social sciences. Pauline Ward then demonstrated the importance of data management plans to ensure work is safeguarded and that data sharing is made possible. I introduced MANTRA briefly, and then Laine Ruus assigned different MANTRA units to participants and asked them to go through the units briefly, extract one or two key messages and report back to the rest of the group. After the coffee break we had another presentation on ethics, informed consent and the barriers to sharing, and we finished the morning session with a ‘Do’s and Don’ts’ exercise, in which we asked participants to write on post-it notes the things they remembered and were taking with them from the workshop: green for things they should DO, and pink for those they should NOT. Here are some of the points the learners posted:

– consider your usernames & passwords
– read the Data Protection Act
– check funder/institution regulations/policies
– obtain informed consent
– design a clear consent form
– give participants info about the research
– inform participants of how we will manage data
– confidentiality
– label your data with enough info to retrieve it in future
– develop a data management plan
– follow the certain policies when you re-use dataset[s] created by others
– have a clear data storage plan
– think about how & how long you will store your data
– store data in at least 3 places, in at least 2 separate locations
– backup!
– consider how/where you back up your data
– delete or archive old versions
– data preservation
– keep your data safe and secure with the help of facilities of fund bodies or university
– think about sharing
– consider sharing at all stages. Think about who will use my data next
– share data (responsibly)

– unclear informed consent
– a sense of forcing participants to be part of research
– do not store sensitive information unless necessary
– don’t staple consent forms to de-identified data records/store them together
– take information security for granted
– assume all software will be able to handle your data
– don’t assume you will remember stuff. Document your data
– assume people understand
– disclose participants’ identity
– leave computer on
– share confidential data
– leave your laptop on the bus!
– leave your laptop on the train!
– leave your files on a train!
– don’t forget it is not just my data, it is public data
– forget to future proof

Robin Rice presenting at FOSTERing Open Science workshop

Our message was that open science will thrive when researchers:

  • organise and version their data files effectively,
  • provide comprehensive and sufficient documentation for others to understand and replicate results and thus cite the source properly,
  • know how to store and transport their data safely and securely (ensuring backup and encryption),
  • understand legal and ethical requirements for managing data about human subjects,
  • recognise the importance of good research data management practice in their own context.

The afternoon workshop on “Overcoming obstacles to sharing data about human subjects” built on one of the main themes introduced in the morning, with a large overlap of attendees. The ethical and regulatory issues in this area can appear daunting. However, data created from research with human subjects are valuable, and are therefore worth sharing for all the same reasons as other research data (impact, transparency, validation, etc.). So it was heartening to find ourselves working with a group of mostly new PhD students, keen to find ways to anonymise, aggregate, or otherwise transform their data appropriately to allow sharing.

Robin Rice introduced the Data Protection Act, as it relates to research with human subjects, and ethical considerations. Naturally, we directed our participants to MANTRA, which has detailed information on the ethical and practical issues, with specific modules on “Data protection, rights & access” and “Sharing, preservation & licensing”. Of course not all data are suitable for sharing, and there are risks to be considered.

In many cases, data can be anonymised effectively, to allow the data to be shared. Richard Welpton from the UK Data Archive shared practical information on anonymisation approaches and tools for ‘statistical disclosure control’, recommending sdcMicroGUI (a graphical interface for carrying out anonymisation techniques, which is an R package, but should require no knowledge of the R language).
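To give a flavour of what a statistical disclosure control check involves, here is a minimal Python sketch of one common idea, k-anonymity: counting how many records share the same combination of potentially identifying values, and seeing how coarsening a variable changes that count. This is our own illustration, not sdcMicroGUI and not something presented at the workshop; the field names, the sample records and the helper functions `k_anonymity` and `generalise_age` are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all the given quasi-identifier fields."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def generalise_age(record, band=10):
    """Return a copy of the record with the exact age replaced
    by an age band, e.g. 34 becomes '30-39'."""
    low = (record["age"] // band) * band
    return {**record, "age": f"{low}-{low + band - 1}"}


# Hypothetical survey records, invented purely for illustration.
records = [
    {"age": 34, "postcode_area": "EH8", "response": "agree"},
    {"age": 37, "postcode_area": "EH8", "response": "disagree"},
    {"age": 52, "postcode_area": "EH9", "response": "agree"},
    {"age": 58, "postcode_area": "EH9", "response": "agree"},
]
quasi_identifiers = ["age", "postcode_area"]

# With exact ages, every participant is unique on age + postcode area.
k_before = k_anonymity(records, quasi_identifiers)

# After banding ages, each participant shares a group with someone else.
banded = [generalise_age(record) for record in records]
k_after = k_anonymity(banded, quasi_identifiers)

print(k_before, k_after)
```

A higher value means each participant is harder to single out, at the cost of some detail in the data. Dedicated tools cover many more techniques and risk measures than this sketch does.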

Finally, Dr Niamh Moore from the University of Edinburgh shared her experiences of sharing qualitative data. She spoke about the need to respect the wishes of subjects, her research gathering oral history, and the enthusiasm of many of her human subjects to be named in her research outputs, in a sense to own their own story, their own words.


open.ed report


Lorna M. Campbell, a Digital Education Manager with EDINA and the University of Edinburgh, writes about the ideas shared and discussed at the open.ed event this week.


Earlier this week I was invited by Ewan Klein and Melissa Highton to speak at Open.Ed, an event focused on Open Knowledge at the University of Edinburgh.  A storify of the event is available here: Open.Ed – Open Knowledge at the University of Edinburgh.

“Open Knowledge encompasses a range of concepts and activities, including open educational resources, open science, open access, open data, open design, open governance and open development.”

 – Ewan Klein

Ewan set the benchmark for the day by reminding us that open data is only open by virtue of having an open licence such as CC0, CC BY or CC BY-SA. CC Non Commercial should not be regarded as an open licence as it restricts use. Melissa expanded on this theme, suggesting that there must be an element of rigour around definitions of openness and the use of open licences. There is a reputational risk to the institution if we’re vague about copyright and not clear about what we mean by open. Melissa also reminded us not to forget open education in discussions about open knowledge, open data and open access. Edinburgh has a long tradition of openness, as evidenced by the Edinburgh Settlement, but we need a strong institutional vision for OER, backed up by developments such as the Scottish Open Education Declaration.


I followed Melissa, providing a very brief introduction to Open Scotland and the Scottish Open Education Declaration, before changing tack to talk about open access to cultural heritage data and its value to open education. This isn’t a topic I usually talk about, but with a background in archaeology and an active interest in digital humanities and historical research, it’s an area that’s very close to my heart. As a short case study I used the example of Edinburgh University’s excavations at Loch na Berie broch on the Isle of Lewis, which I worked on in the late 1980s. Although the site has been extensively published, it’s not immediately obvious how to access the excavation archive. I’m sure it’s preserved somewhere, possibly within the university, perhaps at RCAHMS, or maybe at the National Museum of Scotland. Wherever it is, it’s not openly available, which is a shame, because if I were teaching a course on the North Atlantic Iron Age there is some data from the excavation that I might want to share with students. This is no reflection on the directors of the fieldwork project; it’s just one small example of how greater access to cultural heritage data would benefit open education. I also flagged up a rather frightening blog post, Dennis the Paywall Menace Stalks the Archives, by Andrew Prescott, which highlights the dangers of what can happen if we do not openly license archival and cultural heritage data – it becomes locked behind commercial paywalls. However, there are some excellent examples of open practice in the cultural heritage sector, such as the National Portrait Gallery’s clearly licensed digital collections and the work of the British Library Labs. Openness comes at a cost, though, and we need to make greater efforts to explore new business and funding models to ensure that our digital cultural heritage is openly available to us all.

Ally Crockford, Wikimedian in Residence at the National Library of Scotland, spoke about the hugely successful Women, Science and Scottish History editathon recently held at the university. However, she noted that as members of the university we are in a privileged position that enables us to use non-open resources (books, journal articles, databases, artefacts) to create open knowledge. Furthermore, with Wikipedia’s push to cite published references, there is a danger of replicating existing knowledge hierarchies. Ally reminded us that as part of the educated elite, we have a responsibility to open our mindsets to all modes of knowledge creation. Publishing in Wikipedia also provides an opportunity to reimagine feedback in teaching and learning. Feedback should be an open participatory process, and what better way for students to learn this than from editing Wikipedia?

Robin Rice, of EDINA & Data Library, asked what Open Access and Open Data sharing look like. Open Access publications are increasingly becoming the norm, but we’re not quite there yet with open data. It’s not clear whether researchers will be cited if they make their data openly available, and career rewards are uncertain. However, there are huge benefits to opening access to data and to citizen science initiatives: public engagement, crowd funding, data gathering and cleaning, and an informed citizenry. In addition, social media can play an important role in working openly and transparently.

James Bednar, talking about computational neuroscience and the problem of reproducibility, picked up this theme, adding that accountability is a big attraction of open data sharing. James recommended using iPython Notebook for recording and sharing data and computational results and helping to make them reproducible. This prompted Anne-Marie Scott to comment on Twitter:

@ammienoot: "Imagine students creating iPython notebooks... and then sharing them as OER #openEd"

Very cool indeed.

James Stewart spoke about the benefits of crowdsourcing and citizen science. Despite the buzzwords, this is not a new idea; there’s a long tradition of citizens engaging in science. Darwin regularly received reports and data from amateur scientists. Maintaining transparency and openness is currently a big problem for science, but openness and citizen science can help to build trust and quality. James also cited OpenStreetMap as a good example of building community around crowdsourcing data and citizen science. Crowdsourcing initiatives create a deep sense of community – it’s not just about the science, it’s also about engagement.


After coffee (accompanied by Tunnock’s caramel wafers – I approve!) we had a series of presentations on the student experience and student engagement with open knowledge.

Paul Johnson and Greg Tyler, from the Web, Graphics and Interaction section of IS, spoke about the necessity of being more open and transparent with institutional data and the importance of providing more open data to encourage students to innovate. Hayden Bell highlighted the importance of having institutional open data directories and urged us to spend less time gathering data and more making something useful from it. Students are the source of authentic experience about being a student – we should use this! Student data hacks are great, but students often have to spend longer getting and parsing the data than doing interesting stuff with it. Steph Hay also spoke about the potential of opening up student data. VLEs inform the student experience; how can we open up this data and engage with students using their own data? Anonymised data from Learn was provided at Smart Data Hack 2015 but students chose not to use it, though it is not clear why. Finally, Hans Christian Gregersen brought the day to a close with a presentation of Book.ed, one of the winning entries of the Smart Data Hack. Book.ed is an app that uses open data to allow students to book rooms and facilities around the university.

What really struck me about Open.Ed was the breadth of vision and the wide range of open knowledge initiatives scattered across the university. The value of events like this is that they help to share this vision with colleagues, as that’s when the cross-fertilisation of ideas really starts to take place.

P.S. Another interesting talk came from Bert Remijsen, who spoke of the benefits he has found from publishing his linguistics research data using DataShare, particularly the ability to enable others to hear recordings of the sounds, words and songs described in his research papers, spoken and sung by the native speakers of Shilluk with whom he works during his field research in South Sudan.


EPSRC Expectations Awareness Survey

As many of you will already know, EPSRC set out its research data management (RDM) expectations for institutions in receipt of EPSRC grant funding in May 2011; these included the development of an institutional ‘Roadmap’. EPSRC assessment of compliance with these expectations will begin on 1 May 2015 for research outputs published on or after that date.

In order to comply with EPSRC expectations and to implement the University’s RDM Policy, the University of Edinburgh has invested significantly in RDM services, infrastructure (incl. storage and security) and support as detailed in the University of Edinburgh’s RDM Roadmap.

In an effort to gauge the University of Edinburgh’s ‘readiness’ in relation to EPSRC’s RDM expectations, we are conducting a short survey of EPSRC grant holders.

The survey aims to find out more about researcher awareness of those expectations concerning the management and provision of access to EPSRC-funded research data as detailed in the EPSRC Policy Framework on Research Data.

We aim to conduct follow-up interviews with EPSRC grant holders who are willing to talk through these issues in a bit more detail to help shape the development of the RDM services at the University of Edinburgh.

We will endeavour to make some of our findings available shortly. In the meantime, if you want to use or refer to our survey, we have posted a ‘demo’ version below:

Should you decide to make use of our survey, let us know, as we can potentially share our data with each other to benchmark our progress.

(As an aside, Oxford University have crafted a useful data decision tree for EPSRC-funded researchers at Oxford.)

Update: A link to the findings can be found at: http://datablog.is.ed.ac.uk/files/2016/07/EPSRC-RDM-Expectations-Awareness-Survey-Findings.pdf