Highlights from the RDM Programme Progress Report: February to April 2016

The membership of the Research Data Service Virtual Team, drawn from four divisions of IS, was confirmed. The team met for the first time on 11 February (replacing the former action group meetings), where it was agreed that meetings would be held approximately every six weeks for information and decision-making.

In February, the DataShare metadata was mapped to the PURE metadata, and staff in L&UC and the Data Library trained each other to create dataset records in PURE and to review submissions in DataShare. It was agreed that staff would create records in PURE for items deposited in DataShare until the company (Elsevier) provides a mechanism for automatically inputting records into PURE.

In March, Jisc announced that the University of Edinburgh was selected as a framework supplier for their new Research Data Management Shared Service.

A review of the existing ethics processes in each college is in progress with Jacqueline McMahon at the College of Arts, Humanities and Social Sciences (CAHSS) to create a University-wide ethics template. There is also engagement with the School ethics committees at the School of Health in Social Sciences (HiSS), Moray House School of Education (MHSE), Law and School of Social and Political Science (SPS) in CAHSS.

The Research Data Management and Sharing (RDMS) Coursera MOOC opened for enrolment on 1 March 2016. It was completed in partnership with the University of North Carolina-Chapel Hill CRADLE project. Statistics from the Coursera Dashboard show that, as of 23 May 2016, there had been 5,429 visitors and 1,526 active learners, and 335 visitors had completed the course.

The large data sharing investigation was completed for DataShare and reported previously. Two new DataShare releases have been defined, one for upload and one for download. The upload release (2.1) is due to go live on 23 May 2016.

PURE dataset functionality is now included in standard PURE and Research Data Management (RDM) training. There are now 210 dataset records in PURE.

Four PhD interns were hired in mid-March to act as College representatives for the IS Innovation Fund Pioneering Research Data Exhibition. They will be employed until mid-December 2016.

A total of 363 staff and postgraduates attended RDM courses and workshops during this quarter.

There were 30 new DMPonline users and 55 new plans created during this quarter.

A total of 56 datasets were deposited in DataShare during this quarter.

The total number of DataStore users rose from 12,948 in the previous quarter to 13,239 in this quarter, an increase of 291 new users.

National and International Engagement Activities

In February

  • Stuart Lewis gave a DataVault presentation at the International Digital Curation Conference (IDCC) in Amsterdam.

In March

  • A University news item was released to mark the launch of the Research Data Management and Sharing (RDMS) MOOC on Coursera. http://www.ed.ac.uk/news/2016/dataskills-010316
  • Stuart MacDonald gave an RDM presentation to trainee physicians at the Royal College of Physicians Edinburgh Course: Critical appraisal and research for trainees, Edinburgh. http://www.slideshare.net/smacdon2/rdm-for-trainee-physicians
  • Three delegates from Göttingen University were hosted here. The delegates have shared interests in RDM and visited to gain more insight into RDM support and experiences here.
  • Robin Rice gave an invited talk about the RDMS MOOC and the web-based Survey Documentation and Analysis (SDA) tool to the Learning, Teaching and Web and elearning@Ed Showcase and Network monthly gathering.

In April

As part of my responsibilities in covering the one-year interim of Kerry Miller’s maternity leave, I will be writing blogs for this page until Kerry returns next summer.

Prior to this post, I worked for the past 12 years as the geospatial metadata co-ordinator at EDINA. My primary role was to promote and support research data management and sharing amongst UK researchers and students using spatial data and geographical information.

Tony Mathys
Research Data Management Service Co-ordinator


Highlights from the RDM Programme Progress Report: November 2015 – January 2016

Data Seal of Approval have awarded DataShare Trusted Repository status; their assessment of our service can be read at https://assessment.datasealofapproval.org/assessment_175/seal/html/. In addition, a major new release of DataShare was completed in November. This makes the code open on GitHub and makes general improvements to the look and feel of the website.

The ‘interim’ DataVault is now in final testing and will be rolled out on a request basis to those researchers who can demonstrate an urgent need to use the service now rather than waiting until the final version is ready later this year. The phase three funding for development of the DataVault has been received from Jisc. This runs from March to August, so the final version should be ready for launch sometime after this. The project was presented at the International Digital Curation Conference in February 2016.

Over the three-month period a total of 328 staff and postgraduate researchers attended a Research Data Management (RDM) course or workshop.

Work on the MANTRA MOOC (Massive Open Online Course) was expected to be finalised in February and launched on 1st March, at the following URL: https://www.coursera.org/learn/data-management.

The University of Edinburgh wrote the Working with Data section (one of the five weeks of the course) and, with the help of the Learning, Teaching and Web division of Information Services, completed two video interviews with researchers and a ‘vox pop’ video clip of clinical researchers at the EQUATOR conference in Edinburgh in autumn 2015. The content is open source and videos can be added to our YouTube channel to help with promotion. There will be some income from this, based on certificates of completion priced at $49 or £33, but our portion will be smaller than that of our partner, the University of North Carolina.

The need to create a dataset record in PURE for each dataset published, or referenced in a publication, is now being emphasised in all Research Data Service communications, formal and informal, and to staff at all levels. Uptake is understandably low at this point but we hope to see a steady increase as researchers and support staff begin to see the benefits of adding datasets to their research profile. In the case of DataShare records, a draft mapping of fields between DataShare and PURE has been produced as a start of a plan for migrating records from DataShare to PURE.
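A field mapping of this kind can be pictured as a simple crosswalk table. The sketch below is only an illustration: the source field names are Dublin Core style names chosen for the example, the target field names are invented, and neither is taken from the actual draft mapping or from the PURE data model.

```python
# Hypothetical crosswalk: source (repository) field -> target (research
# information system) field. All names here are assumptions made for
# illustration, not the real DataShare or PURE field names.
CROSSWALK = {
    "dc.title": "title",
    "dc.creator": "people",
    "dc.description.abstract": "description",
    "dc.identifier.uri": "doi_link",
}


def map_record(source, crosswalk=CROSSWALK):
    """Translate one metadata record using the crosswalk.

    Returns the mapped record plus a sorted list of source fields that
    have no target, so gaps in a draft mapping are easy to spot.
    """
    mapped = {}
    unmapped = []
    for field, value in source.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)
```

Reporting the unmapped fields, rather than silently dropping them, is a design choice that suits a draft: it shows which fields still need a decision before any records are migrated.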

By the end of January 2016, 69 records had been created and published on Edinburgh Research Explorer.

Four interns have been employed using funding from Jisc as part of the UK Research Data Discovery Service (UKRDDS) project, which aims to create a national aggregate register of data sets. A trial site is available at: http://ckan.data.alpha.jisc.ac.uk/. The UKRDDS interns will help to create PURE records and upload open data into DataShare, and raise awareness of RDM generally within their schools. There are currently three PhD interns in place in LLC, SOS and Roslin; two more, in LLC and DIPM, will start in February. The approach each intern takes will depend on the nature and structure of their school and will, in some cases, be mediated by research administrators.

An innovation fund grant has been received to fund the delivery of an exhibition, “Pioneering Research Data”. Each college will be represented by a PhD intern. Recruitment of the interns has already begun and they should be in post by the end of March. The exhibition is due to be delivered in November of this year.

National and International Engagement Activities

Robin Rice led a panel at the IPRES conference, Chapel Hill, North Carolina, on 3rd November called ‘“Good, better, best”? Examining the range and rationales of institutional data curation practices’.

Robin Rice had a proposal accepted for the forthcoming Force11 (2016) conference, on Overcoming Obstacles to Sharing Data about Human Subjects, building on the training course we are delivering, Working with Personal and Sensitive Data.

Kerry Miller
RDM Service Coordinator


Publishing Data Workflows

[Guest post from Angus Whyte, Digital Curation Centre]

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand, and DataShare’s workflow and checklist for deposit is identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo. [1]

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]

As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible. That has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELN) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so researchers can deposit directly into it. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made to describe that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. There are five of these recommendations, each short and to the point:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:

RSpace-DataShare Workflows

For me, one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team can identify an expert user who is planning a large number of data deposits over a period of time, and train them to apply DataShare’s curation standards themselves, that user would be given administrative rights over the relevant Collection in the database, and the curation step for that Collection would be entrusted to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.


[1] Austin, C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation, Vol. 10, No. 2, pp. 1-11


[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. (2015) Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation, Vol. 10, No. 1, pp. 163-172. doi:10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.



Edinburgh DataShare – new features for users and depositors

I was asked recently on Twitter if our data library was still happily using DSpace for data – the topic of a 2009 presentation I gave at a DSpace User Group meeting. In responding (answer: yes!) I recalled that I’d intended to blog about some of the rich new features we’ve either adopted from the open source community or developed ourselves to deliver our data users and depositors a better service and fulfill deliverables in the University’s Research Data Management Roadmap.

Edinburgh DataShare was built as an output of the DISC-UK DataShare project, which explored pathways for academics to share their research data over the Internet at the Universities of Edinburgh, Oxford and Southampton (2007-2009). The repository is based on DSpace software, the most popular open source repository system in use globally. Managed by the Data Library team within Information Services, it is now a key component in the UoE’s Research Data Programme, endorsed by its academic-led steering group.

An open access, institutional data repository, Edinburgh DataShare currently holds 246 datasets across collections in 17 out of 22 communities (schools) of the University and is listed in the Re3data Registry of Research Data Repositories and indexed by Thomson-Reuters’ Data Citation Index.

Last autumn, the university joined DataCite, an international standards body that assigns persistent identifiers in the form of Digital Object Identifiers (DOIs) to datasets. DOIs are now assigned to every item in the repository, and are included in the citation that appears on each landing page. This helps to ensure that even after the DataShare system no longer exists, as long as the data have a home, the DOI will be able to direct the user to the new location. Just as importantly, it helps data creators gain credit for their published data through proper data citation in textual publications, including their own journal articles that explain the results of their data analyses.
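The persistence described here rests on an indirection: the citation carries the DOI, and a resolver redirects it to wherever the data currently live. As a minimal sketch, the helper below (a hypothetical function, not part of DataShare or of any DataCite library) turns the DOI forms commonly seen in citations into a link on the public doi.org resolver. The validity check is deliberately loose and is an assumption of this example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes often found in front of a DOI in reference lists.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI given in any of the common forms."""
    cleaned = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Loose sanity check (an assumption of this sketch): DOIs start
    # with "10." and have a prefix/suffix separated by a slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return DOI_RESOLVER + quote(cleaned, safe="/")
```

Because only the resolver's record needs updating when a dataset moves, a link built this way keeps working even if the landing page changes address.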

The autumn release also streamlined our batch ingest process to assist depositors with large and voluminous data files by getting around the web upload front-end. Currently we are able to accept files up to 10 GB in size but we are being challenged to allow ever greater file sizes.

Making the most of metadata

Discover panel screenshot

Example from Geosciences community

Every landing page (home, community, collection) now has a ‘Discover’ panel giving top hits for each metadata field (such as subject classification, keyword, funder, data type, spatial coverage). The panel acts as a filter when drilling down to different levels, allowing the most common values to be ‘discovered’ within each section.
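The idea behind such a panel can be sketched in a few lines: count how often each value of a metadata field occurs, show the most common ones, and recount after the user picks a value. This is an illustration only, not how DSpace implements its Discover panel; the record structure and function names are assumptions made for the example.

```python
from collections import Counter


def top_facet_values(records, field, limit=3):
    """Return the most common values of one metadata field.

    Each record is a dict mapping a field name to a list of values.
    A value is counted once per record, even if it is repeated.
    """
    counts = Counter()
    for record in records:
        counts.update(set(record.get(field, [])))
    return counts.most_common(limit)


def filter_records(records, field, value):
    """Keep only the records that carry the chosen facet value."""
    return [r for r in records if value in r.get(field, [])]
```

Drilling down then amounts to calling `filter_records` with the chosen value and recomputing `top_facet_values` on the smaller set, which is why the common values shown change at each level.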

The usage statistics at each level are now publicly viewable as well, so depositors and others can see how often an item is viewed or downloaded. This is useful for many reasons. Users can see what is most useful in the repository; depositors can see if their datasets are being used; stakeholders can compare the success of different communities. By being completely open and transparent, this is a step towards ‘alt-metrics’ or alternative ways of measuring scholarly or scientific impact. The repository is now also part of IRUS-UK (Institutional Repository Usage Statistics UK), which uses the COUNTER standard to make repository usage statistics nationally comparable.

What’s coming?

Stay tuned for future improvements around a new look and feel, preview and display by data type, streaming support, BitTorrent downloading, and Linked Open Data.

Robin Rice
EDINA and Data Library