FAIR dues to the Research Data Alliance

It has been a while since we’ve blogged about the Research Data Alliance (RDA), and as an organisation it has come into its own since its beginnings in 2013. One can count on discovering the international state of the art in a range of data-related topics covered by its interest groups and working groups which meet at its plenary events, held every six months. That is why I attended the 13th RDA Plenary held in Philadelphia earlier this month and I was not disappointed.

I arrived Monday morning in time for the second day of a pre-conference sponsored by CODATA on FAIR and Responsible Research Data Management at Drexel University. FAIR is a popular concept amongst research funders for illustrating data management done right: by the time you complete your research project (or shortly after) your data should be Findable, Accessible, Interoperable and Reusable.

Fair enough, but we data repository providers also want to know how to build the ecosystems that will make it super-easy for researchers to make their data FAIR, so we need to talk to each other to compare notes and decide exactly what each letter means in practice.


Amongst the highlights were some tools and resources for researchers or data providers mentioned by various speakers.

  • The Australian Research Data Commons (ARDC) has created a FAIR self-assessment tool.
  • For those who like stories, the Danish National Archives have created a FAIRytale to help understand the FAIR principles.
  • ARDC with Library Carpentry conducted a sprint that led to a disciplinary smorgasbord called Top Ten Data and Software Things.
  • DataCite offers a Repository Finder tool through its union with re3data.org to find the most appropriate repository in which to deposit your data.
  • The EU-funded project GO FAIR offers resources for “implementation networks”, including training materials under the rubric of GO TRAIN.
  • The geoscience-focused Enabling FAIR Data Project is signing up publishers and repositories to commitment statements, and has a user-friendly FAQ explaining why researchers should care and what they can do.
  • A brand new EU-funded project, FAIRsFAIR (Fostering FAIR Data Practice in Europe), is taking things to the next level, building new networks to certify learners and trainers, researchers and repositories in FAIRdom.

That last project’s ambitions are described in this blog post by Joy Davidson at DCC. Another good blog post I found about the FAIR pre-conference event is by Rebecca Springer at Ithaka S+R. If I get a chance I’ll add another brief post for the main conference.

Robin Rice
Data Librarian & Head of Research Data Support
Library & University Collections


Reflections on Repository Fringe 2017

The following is a guest post by Mick Eadie, Research Information Management Officer at University of Glasgow, on his impressions of Repository Fringe 2017.

From the Arts

Many of the presentations in the first day’s afternoon 10×10 (lightning talk) sessions were on research data topics. We heard talks about repositories in the arts; evolving research data policy at national and pan-national level; and archival storage and integrations between research data repositories and other systems like Archivematica, EPrints and Pure.

The theme of repositories and their use in managing research data in the arts was kicked off by Nicola Siminson from the Glasgow School of Art with her talk on What RADAR did next: developing a peer review process for research plans. Nicola explained how EPrints has been developed to maximise the value of research data content at GSA by making it more visually appealing and better able to deal with a multitude of non-text based objects and artefacts. She then outlined GSA’s recently developed Annual Research Planning (ARP) tool, an EPrints add-on that allows researchers to provide information on their current and planned research activities and potential impact.

GSA have built on this functionality to enable the peer-reviewing of ARPs, which means they can be shared and commented on by others. This has led to significant uptake in the use of the repository by researchers, as they are keen to keep their research profile up-to-date, which has in turn raised the repository’s profile and increased data deposits. There are also likely to be cost benefits to the institution from using an existing system to manage research information as well as outputs, as it keeps content accessible from one place and means the School doesn’t need to procure separate systems.

On Policy

We heard from Martin Donnelly from the DCC on National Open Data and Open Science Policies in Europe. Martin talked about the work done by the DCC and SPARC Europe in reviewing policies from across Europe to assess the methodologies used by countries and funders to promote the concept of Open Data across the continent. They found some interesting variations across countries: some policies are funder-driven, while others take the form of national directives, plans and roadmaps. It was interesting to see how a consensus was emerging around best practice and how the EU, through its Horizon 2020 Open Research Data Pilot, seemed to be emerging as a driver for increased take-up and action.

Storage, Preservation and Integration

No research data day would be complete without discussing archival storage and preservation. Pauline Ward from Edinburgh University gave us an update on Edinburgh DataVault: Local implementation of Jisc DataVault: the value of testing. She highlighted the initial work done at national level by Jisc and the Research Data Spring project, and went on to discuss the University of Edinburgh’s local version of DataVault, which integrates with their CRIS (Pure), allowing a once-only upload of the data which links to metadata in the CRIS and creates an archival version of the data. Pauline also hinted at future integration with Dropbox, which will be interesting to see develop.

Alan Morrison from the University of Strathclyde continued on the systems integration and preservation theme by giving an assessment of Data Management & Preservation using PURE and Archivematica. He gave us the background to Strathclyde’s systems and workflows between Pure and Archivematica, highlighting some interesting challenges in dealing with file formats in the STEM subjects, which are often proprietary and non-standard.


Publishing Data Workflows

[Guest post from Angus Whyte, Digital Curation Centre]

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand: DataShare’s workflow and checklist for deposit are identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo. [1]

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]


As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible. That has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELN) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so researchers can deposit directly into it. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made to describe that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. There are five of these short, to-the-point recommendations:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:

RSpace-DataShare Workflows


For me one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team can identify an expert user who is planning a large number of data deposits over a period of time, and train them to apply DataShare’s curation standards themselves, that user would be given administrative rights over the relevant Collection in the database, and the curation step for that Collection would be entrusted to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.

 

[1] Austin, C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation. 2015, Vol. 10, No. 2, pp. 1-11. doi:10.2218/ijdc.v10i2.377

[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation. 2015, Vol. 10, No. 1, pp. 163-172. doi:10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.

 


MANTRA @ Melbourne

The aim of the Melbourne_MANTRA project was to review, adapt and pilot an online training program in research data management (RDM) for graduate researchers at the University of Melbourne. Based on the UK-developed and acclaimed MANTRA program, the project reviewed current UK content and assessed its suitability for the Australian and Melbourne research context. The project team adapted the original MANTRA modules and incorporated new content as required, in order to develop the refreshed Melbourne_MANTRA local version. Local expert reviewers ensured the localised content met institutional and funder requirements. Graduate researchers were recruited to complete the training program and contribute to the detailed evaluation of the content and associated resources.

The project delivered eight revised training modules, which were evaluated as part of the pilot via eight online surveys (one for each module) plus a final, summative evaluation survey. Overall, the Melbourne_MANTRA pilot training program was well received by participants. The content of the training modules generally received high scores, with very few low scores across all eight modules. The participants recognised that the content of the training program should be tailored to the institutional context, as opposed to providing general information and theory around the training topics. In its current form, the content of the modules only partly satisfies the requirements of our evaluators, who made valuable recommendations for further improving the training program.

In 2016, the University of Melbourne will revisit MANTRA with a view to implementing evaluation feedback in the program; updating the modules with new content, audiovisual materials and exercises; augmenting targeted delivery via the University’s LMS; and working towards incorporating Melbourne_MANTRA in induction and/or reference materials for new and current postgraduates and early career researchers.

The current version is available at: http://library.unimelb.edu.au/digitalscholarship/training_and_outreach/mantra2

Dr Leo Konstantelos
Manager, Digital Scholarship
Research | Research & Collections
Academic Services
University of Melbourne
Melbourne, Australia
