Highlights from the RDM Programme Progress Report: November 2015 – January 2016

The Data Seal of Approval has awarded DataShare Trusted Repository status; its assessment of our service can be read at https://assessment.datasealofapproval.org/assessment_175/seal/html/. In addition, a major new release of DataShare was completed in November; this makes the code open on GitHub as well as making general improvements to the look and feel of the website.

The ‘interim’ DataVault is now in final testing and will be rolled out on a request basis to those researchers who can demonstrate an urgent need to use the service now rather than waiting until the final version is ready later this year. Phase three funding for development of the DataVault has been received from Jisc; this runs from March to August, so the final version should be ready for launch shortly afterwards. The project was presented at the International Digital Curation Conference in February 2016.

Over the three-month period a total of 328 staff and postgraduate researchers attended a Research Data Management (RDM) course or workshop.

Work on the MANTRA MOOC (Massive Open Online Course) was expected to be finalised in February and launched on 1st March, at the following URL: https://www.coursera.org/learn/data-management.

The University of Edinburgh wrote the Working with Data section (one of the five weeks of the course) and, with the help of the Learning, Teaching and Web division of Information Services, completed two video interviews with researchers and a ‘vox pop’ video clip of clinical researchers at the EQUATOR conference in Edinburgh in autumn 2015. The content is open source, and the videos can be added to our YouTube channel to help with promotion. There will be some income from this, though a smaller share than that of our partner, the University of North Carolina, based on certificates of completion priced at $49 or £33.

The need to create a dataset record in PURE for each dataset published, or referenced in a publication, is now being emphasised in all Research Data Service communications, formal and informal, and to staff at all levels. Uptake is understandably low at this point, but we hope to see a steady increase as researchers and support staff begin to see the benefits of adding datasets to their research profile. In the case of DataShare records, a draft mapping of fields between DataShare and PURE has been produced as the start of a plan for migrating records from DataShare to PURE.
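To make the idea of a field mapping concrete, a crosswalk like the draft one mentioned above might be sketched as a simple lookup table. This is purely illustrative: DataShare (built on DSpace) uses qualified Dublin Core, so the `dc.*` keys below are real DSpace field names, but the PURE-side names are hypothetical placeholders, not the actual PURE schema.

```python
# Illustrative DataShare (DSpace Dublin Core) -> PURE field crosswalk.
# The PURE-side names are invented for this sketch.
FIELD_MAP = {
    "dc.title": "title",
    "dc.contributor.author": "persons",
    "dc.date.issued": "publicationDate",
    "dc.description.abstract": "description",
    "dc.identifier.uri": "doi_or_handle",
    "dc.subject": "keywords",
}

def map_record(datashare_record):
    """Translate a flat DataShare metadata record into PURE-style keys,
    dropping any field with no agreed mapping."""
    return {
        FIELD_MAP[key]: value
        for key, value in datashare_record.items()
        if key in FIELD_MAP
    }

record = {
    "dc.title": "Rainfall data 2015",
    "dc.rights": "CC-BY",       # no mapping agreed -> dropped
    "dc.subject": "hydrology",
}
pure_record = map_record(record)
# → {"title": "Rainfall data 2015", "keywords": "hydrology"}
```

A real migration would also need to handle repeatable fields (multiple authors, multiple subjects) and fields that require transformation rather than simple renaming, which is exactly why a documented mapping is a sensible first step.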

By the end of January 2016, 69 records had been created and published on Edinburgh Research Explorer.

Four interns have been employed using funding from Jisc as part of the UK Research Data Discovery Service (UKRDDS) project, which aims to create a national aggregate register of datasets. A trial site is available at: http://ckan.data.alpha.jisc.ac.uk/. The UKRDDS interns will help to create PURE records, upload open data into DataShare, and raise awareness of RDM generally within their schools. There are currently three PhD interns in place, in LLC, SOS and Roslin; two more, in LLC and DIPM, will start in February. The approach each intern takes will depend on the nature and structure of their school and will, in some cases, be mediated by research administrators.

An innovation fund grant has been received to fund the delivery of an exhibition, “Pioneering Research Data”. Each college will be represented by a PhD intern; recruitment has already begun and the interns should be in post by the end of March. The exhibition is due to be delivered in November of this year.

National and International Engagement Activities

Robin Rice led a panel at the iPRES conference, Chapel Hill, North Carolina, on 3rd November, called ‘Good, better, best? Examining the range and rationales of institutional data curation practices’.

Robin Rice had a proposal accepted for the forthcoming Force11 2016 conference, ‘Overcoming Obstacles to Sharing Data about Human Subjects’, building on the training course we are delivering, ‘Working with Personal and Sensitive Data’.

Kerry Miller
RDM Service Coordinator


Publishing Data Workflows

[Guest post from Angus Whyte, Digital Curation Centre]

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand: DataShare’s workflow and checklist for deposit is identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo. [1]

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]

As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible. That has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELNs) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and the University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so researchers can deposit directly into the repository. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made in describing that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. There are five of these short, to-the-point recommendations:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:

RSpace-DataShare Workflows
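The packaging-and-deposit step of a workflow like this can be sketched in a few lines. The following is a minimal, illustrative sketch only: the METS manifest is drastically simplified, and the file names and metadata are placeholders. The one concrete detail is the packaging identifier, which is the standard SWORD value for DSpace METS SIPs.

```python
# Sketch of packaging an ELN export as a METS zip for a SWORD deposit,
# in the spirit of the RSpace-to-DataShare workflow. Illustrative only.
import io
import zipfile

METS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<mets xmlns="http://www.loc.gov/METS/" LABEL="{title}">
  <fileSec>
    <fileGrp>
      {files}
    </fileGrp>
  </fileSec>
</mets>
"""

def build_mets_package(title, files):
    """Zip the data files together with a (heavily simplified) METS manifest."""
    file_entries = "\n      ".join(
        '<file ID="f{0}"><FLocat href="{1}"/></file>'.format(i, name)
        for i, name in enumerate(files)
    )
    manifest = METS_TEMPLATE.format(title=title, files=file_entries)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mets.xml", manifest)
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()

def sword_headers(package_name):
    """HTTP headers for a SWORD (v1-style) deposit of a METS DSpace SIP."""
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": "filename={0}".format(package_name),
        # Standard SWORD packaging identifier for DSpace METS SIPs
        "X-Packaging": "http://purl.org/net/sword/package/METSDSpaceSIP",
    }

files = {"notebook.pdf": b"%PDF-...", "readme.txt": b"Exported from ELN"}
package = build_mets_package("Lab notebook export", files)
headers = sword_headers("deposit.zip")
# An actual deposit would POST `package` with `headers` (plus authentication)
# to the repository's SWORD collection URL.
```

In the real integration the ELN generates the METS manifest and package itself; the point of the sketch is simply that open standards (METS for packaging, SWORD for deposit) are what let two independently developed systems hand data across cleanly.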

For me one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team can identify an expert user who is planning a large number of data deposits over a period of time, they can train that user to apply DataShare’s curation standards, grant them administrative rights over the relevant Collection, and entrust the curation step for that Collection to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.


[1] Austin, C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation, Vol. 10, No. 2, pp. 1-11.


[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. (2015) Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation, Vol. 10, No. 1, pp. 163-172. http://doi.org/10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.



New MOOC! Research Data Management and Sharing

[Guest post from Dr. Helen Tibbo, University of North Carolina-Chapel Hill]

The School of Information and Library Science and the Odum Institute at the University of North Carolina-Chapel Hill and EDINA at the University of Edinburgh are pleased to announce the forthcoming Coursera MOOC (Massive Open Online Course), Research Data Management and Sharing.

This is a collaboration of the UNC-CH CRADLE team (Curating Research Assets and Data Using Lifecycle Education) and MANTRA. CRADLE has been funded in part by the Institute of Museum and Library Services to develop training for both researchers and library professionals. MANTRA was designed as a prime resource for postgraduate training in research data management skills and is used by learners worldwide.

The MOOC uses the Coursera on-demand format to provide short, video-based lessons and assessments across a five-week period, but learners can proceed at their own pace. Although no formal credit is assigned for the MOOC, Statements of Accomplishment will be available, for a small fee, to any learner who completes the course.

The Research Data Management and Sharing MOOC will launch on 1st March, 2016, and enrolment is open now. Subjects covered in the five-week course follow the stages of any research project. They are:

  • Understanding Research Data
  • Data Management Planning
  • Working with Data
  • Sharing Data
  • Archiving Data

Dr. Helen Tibbo from the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill delivers four of the five sets of lessons, and Sarah Jones, Digital Curation Centre, delivers the University of Edinburgh-developed content in Week 3 (Working with Data). Quizzes and supplementary videos add to the learning experience, and assignments are peer reviewed by fellow learners, with questions and answers handled by peers and team teachers in the forum.

Staff from both organizations will monitor the learning forums and the peer-reviewed assignments to make sure learners are on the right track, and to watch for adjustments needed in course content.

The course is open to enrolment now, and will ‘go live’ on 1st March.

Hashtag: #RDMSmooc

A preview of one of the supplementary videos is now available on YouTube:

Please join us in this data adventure.

Dr. Helen R. Tibbo, Alumni Distinguished Professor
President, 2010-2011 & Fellow, Society of American Archivists
School of Information and Library Science
201 Manning Hall, CB#3360
University of North Carolina at Chapel Hill
Chapel Hill, NC 27599-3360
Tel: 919-962-8063
Fax: 919-962-8071


MANTRA @ Melbourne

The aim of the Melbourne_MANTRA project was to review, adapt and pilot an online training program in research data management (RDM) for graduate researchers at the University of Melbourne. Based on the UK-developed and acclaimed MANTRA program, the project reviewed current UK content and assessed its suitability for the Australian and Melbourne research context. The project team adapted the original MANTRA modules and incorporated new content as required, in order to develop the refreshed Melbourne_MANTRA local version. Local expert reviewers ensured the localised content met institutional and funder requirements. Graduate researchers were recruited to complete the training program and contribute to the detailed evaluation of the content and associated resources.

The project delivered eight revised training modules, which were evaluated as part of the pilot via eight online surveys (one for each module) plus a final, summative evaluation survey. Overall, the Melbourne_MANTRA pilot training program was well received by participants. The content of the training modules generally scored highly, with low scores rare across all eight modules. Participants felt that the content of the training program should be tailored to the institutional context, as opposed to providing general information and theory around the training topics. In its current form, the content of the modules only partly satisfies the requirements of our evaluators, who made valuable recommendations for further improving the training program.

In 2016, the University of Melbourne will revisit MANTRA with a view to implementing evaluation feedback in the program; updating the modules with new content, audiovisual materials and exercises; augmenting targeted delivery via the University’s LMS; and working towards incorporating Melbourne_MANTRA in induction and/or reference materials for new and current postgraduates and early career researchers.

The current version is available at: http://library.unimelb.edu.au/digitalscholarship/training_and_outreach/mantra2

Dr Leo Konstantelos
Manager, Digital Scholarship
Research | Research & Collections
Academic Services
University of Melbourne
Melbourne, Australia